
    The Machine Learning “Advent Calendar” Day 4: k-Means in Excel

    By ProfitlyAI, December 4, 2025


    This is Day 4 of the Machine Learning Advent Calendar.

    During the first three days, we explored distance-based models for supervised learning.

    In all these models, the idea was the same: we measure distances, and we decide the output based on the nearest points or nearest centers.

    Today, we stay in this same family of ideas. But we use the distances in an unsupervised way: k-means.

    Now, one question for those who already know this algorithm: which model does k-means resemble more, the k-NN classifier or the Nearest Centroid classifier?

    And if you remember, for all the models we have seen so far, there was not really a “training” phase or hyperparameter tuning.

    • For k-NN, there is no training at all.
    • For LDA, QDA, or GNB, training is just computing means and variances. And there are also no real hyperparameters.

    Now, with k-means, we are going to implement a training algorithm that finally looks like “real” machine learning.

    We start with a tiny 1D example. Then we move to 2D.

    Goal of k-means

    In the training dataset, there are no initial labels.

    The goal of k-means is to create meaningful labels by grouping points that are close to each other.

    Let us look at the illustration below. You can clearly see two groups of points. Each centroid (the red square and the green square) is in the middle of its cluster, and every point is assigned to the closest one.

    This gives a very intuitive picture of how k-means discovers structure using only distances.

    And here, k means the number of centers we try to find.

    k-means in Excel – image by author

    Now, let us answer the question: which algorithm is k-means closer to, the k-NN classifier or the Nearest Centroid classifier?

    Don’t be fooled by the k in k-NN and k-means.
    They don’t mean the same thing:

    • in k-NN, k is the number of neighbors, not the number of classes;
    • in k-means, k is the number of centroids.

    K-means is much closer to the Nearest Centroid classifier.

    Both models are represented by centroids, and for a new observation we simply compute the distance to each centroid to decide which one it belongs to.

    The difference, of course, is that in the Nearest Centroid classifier, we already know the centroids because they come from labeled classes.

    In k-means, we do not know the centroids. The whole goal of the algorithm is to discover suitable ones directly from the data.

    The business problem is completely different: instead of predicting labels, we are trying to create them.

    And in k-means, the value of k (the number of centroids) is unknown. So it becomes a hyperparameter that we can tune.
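The step that both models share, assigning a new observation to its nearest centroid, can be sketched in a few lines of Python. This is only an illustration: the function name `predict` and the centroid values are assumptions, not something taken from the worksheet.

```python
def predict(x, centroids):
    """Return the index of the centroid nearest to the 1D observation x."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))

# Hypothetical centroids. In the Nearest Centroid classifier they would come
# from labeled classes; in k-means they have to be discovered from the data.
centroids = [2.0, 12.0]

print(predict(4.5, centroids))   # 0, closer to the first centroid
print(predict(10.0, centroids))  # 1, closer to the second centroid
```

Whatever way the centroids were obtained, the assignment rule itself is identical, which is why the two models look so similar.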

    k-means with Only One Feature

    We start with a tiny 1D example so that everything is visible on one axis. And we will choose the values in such a trivial way that we can instantly see the two centroids.

    1, 2, 3, 11, 12, 13

    Yes, 2 and 12.

    But how would the computer know? The machine will “learn” by guessing step by step.

    This is where Lloyd’s algorithm comes in.

    We will implement it in Excel with the following loop:

    1. choose initial centroids
    2. compute the distance from each point to each centroid
    3. assign each point to the nearest centroid
    4. recompute the centroids as the average of the points in each cluster
    5. repeat steps 2 to 4 until the centroids no longer move

    1. Choose initial centroids

    Pick two initial centers, c_1 and c_2.

    They should be within the data range (between 1 and 13).

    k-means in Excel – image by author

    2. Compute distances

    For each data point x:

    • compute the distance to c_1,
    • compute the distance to c_2.

    Typically, we use the absolute distance in 1D.

    We now have two distance values for each point.

    k-means in Excel – image by author
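For readers who want to check this step outside Excel, here is a minimal Python sketch of the two distance columns. The initial centroids c_1 = 4 and c_2 = 9 are arbitrary values chosen for illustration; they are an assumption and not necessarily the ones used in the worksheet.

```python
# Data from the 1D example.
points = [1, 2, 3, 11, 12, 13]

# Hypothetical initial centroids (an assumption; any values in [1, 13] work).
c_1, c_2 = 4.0, 9.0

# One row per point: (x, |x - c_1|, |x - c_2|), like two distance columns.
distance_table = [(x, abs(x - c_1), abs(x - c_2)) for x in points]

for x, d_1, d_2 in distance_table:
    print(f"x={x:>2}  d_1={d_1:>4}  d_2={d_2:>4}")
```

Each tuple plays the role of one worksheet row, with the point followed by its distance to each centroid.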

    3. Assign clusters

    For each point:

    • compare the two distances,
    • assign the cluster with the smallest distance (1 or 2).

    In Excel, this is simple IF- or MIN-based logic.

    k-means in Excel – image by author
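The same comparison can be written as a small Python function, shown here as a sketch. The initial centroids (4 and 9) are again assumed values, and sending ties to cluster 1 is a design choice of this example, not a rule stated in the article.

```python
points = [1, 2, 3, 11, 12, 13]
c_1, c_2 = 4.0, 9.0  # hypothetical initial centroids (assumption)

def assign_cluster(x, c_1, c_2):
    """Return 1 if x is at least as close to c_1 as to c_2, otherwise 2."""
    return 1 if abs(x - c_1) <= abs(x - c_2) else 2

clusters = [assign_cluster(x, c_1, c_2) for x in points]
print(clusters)  # [1, 1, 1, 2, 2, 2]
```

With these starting values, the three small points already fall into cluster 1 and the three large points into cluster 2.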

    4. Compute the new centroids

    For each cluster:

    • take the points assigned to that cluster,
    • compute their average,
    • this average becomes the new centroid.

    k-means in Excel – image by author
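As a sketch of this update step, the conditional average can be computed in Python as follows. The cluster assignments are assumed for illustration (first three points in cluster 1, last three in cluster 2), and the helper name `update_centroid` is hypothetical.

```python
points = [1, 2, 3, 11, 12, 13]
clusters = [1, 1, 1, 2, 2, 2]  # assumed assignments, one per point

def update_centroid(points, clusters, cluster_id):
    """Average of the points currently assigned to cluster_id."""
    members = [x for x, c in zip(points, clusters) if c == cluster_id]
    return sum(members) / len(members)

new_c_1 = update_centroid(points, clusters, 1)
new_c_2 = update_centroid(points, clusters, 2)
print(new_c_1, new_c_2)  # 2.0 12.0
```

Note that this simple version raises a ZeroDivisionError if a cluster has no points, a case that does not occur with the assignments above.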

    5. Iterate until reaching convergence

    Now in Excel, thanks to the formulas, we can simply paste the new centroid values into the cells of the initial centroids.

    The update is immediate, and after doing this a few times, you will see that the values stop changing. That is when the algorithm has converged.

    k-means in Excel – image by author
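Putting the five steps together, the whole loop can be sketched in plain Python. This is one possible implementation under stated assumptions, not the worksheet itself: the starting centroids (4 and 9) are hypothetical, labels are 0-based indices, and an empty cluster simply keeps its previous centroid.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Lloyd's algorithm in 1D; returns (centroids, labels, history)."""
    centroids = list(centroids)
    history = [tuple(centroids)]  # centroids at each iteration
    labels = []
    for _ in range(max_iter):
        # Steps 2 and 3: assign each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
            for x in points
        ]
        # Step 4: recompute each centroid as the mean of its points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [x for x, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if the cluster is empty (a design choice).
            new_centroids.append(sum(members) / len(members) if members else old)
        # Step 5: stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
        history.append(tuple(centroids))
    return centroids, labels, history

points = [1, 2, 3, 11, 12, 13]
centroids, labels, history = kmeans_1d(points, [4.0, 9.0])  # hypothetical start
print(centroids)  # [2.0, 12.0]
print(labels)     # [0, 0, 0, 1, 1, 1]
print(history)    # [(4.0, 9.0), (2.0, 12.0)]
```

The `history` list corresponds to recording each step in the worksheet: it shows the centroids moving from the starting guess to 2 and 12 in a single update.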

    We can also record each step in Excel, so we can see how the centroids and clusters evolve over time.

    k-means in Excel – image by author

    k-means with Two Features

    Now let us use two features. The process is exactly the same; we simply use the Euclidean distance in 2D.

    You can either copy and paste the new centroids as values (with just a few cells to update),

    k-means in Excel – image by author

    or you can display all the intermediate steps to see the full evolution of the algorithm.

    k-means in Excel – image by author
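To illustrate that only the distance changes in 2D, here is a short Python sketch using `math.dist` for the Euclidean distance. The six points and the two starting centroids are invented for this example (the article's 2D dataset is only shown in the images), and the function name `kmeans_2d` is hypothetical.

```python
import math

def kmeans_2d(points, centroids, max_iter=100):
    """Lloyd's algorithm with Euclidean distance; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assign each point to the centroid with the smallest Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Recompute each centroid as the per-feature mean of its points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean_x1 = sum(p[0] for p in members) / len(members)
                mean_x2 = sum(p[1] for p in members) / len(members)
                new_centroids.append((mean_x1, mean_x2))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical 2D data (x1, x2) and starting centroids, for illustration only.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans_2d(points, [(0.0, 0.0), (5.0, 5.0)])
print(centroids)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Compared with the 1D case, the only real differences are the distance function and the fact that each centroid is now a pair of averages, one per feature.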

    Visualizing the Moving Centroids in Excel

    To make the process more intuitive, it is helpful to create plots that show how the centroids move.

    Unfortunately, Excel and Google Sheets are not ideal for this kind of visualization, and the data tables quickly become a bit complex to organize.

    If you want to see a full example with detailed plots, you can read this article I wrote almost three years ago, where each step of the centroid movement is shown clearly.

    k-means in Excel – image by author

    As you can see in this picture, the worksheet became quite disorganized, especially compared to the earlier table, which was very straightforward.

    k-means in Excel – image by author

    Choosing the optimal k: The Elbow Method

    So now, it is possible to try k = 2 and k = 3 in our case, and compute the inertia for each one. Then we simply compare the values.

    We can even begin with k = 1.

    For each value of k:

    • we run k-means until convergence,
    • compute the inertia, which is the sum of squared distances between each point and its assigned centroid.

    In Excel:

    • For each point, take the distance to its centroid and square it.
    • Sum all these squared distances.
    • This gives the inertia for this k.
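The inertia computation can be sketched in Python as follows. For simplicity this example reuses the 1D dataset from earlier with k = 2 and its centroids 2 and 12, rather than the 2D data; the 0-based labels are a convention of this sketch.

```python
points = [1, 2, 3, 11, 12, 13]
centroids = [2.0, 12.0]        # centroids for k = 2 on the 1D example
labels = [0, 0, 0, 1, 1, 1]    # index of the centroid assigned to each point

# Squared distance from each point to its assigned centroid, then the sum.
squared_distances = [(x - centroids[lab]) ** 2 for x, lab in zip(points, labels)]
inertia = sum(squared_distances)

print(squared_distances)  # [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
print(inertia)            # 4.0
```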

    For example:

    • for k = 1, the centroid is just the overall mean of x1 and x2,
    • for k = 2 and k = 3, we take the converged centroids from the sheets where you ran the algorithm.

    Then we can plot inertia as a function of k, for example for k = 1, 2, 3.

    For this dataset:

    • from 1 to 2, the inertia drops a lot,
    • from 2 to 3, the improvement is much smaller.

    The “elbow” is the value of k after which the decrease in inertia becomes marginal. In the example, it suggests that k = 2 is sufficient.

    k-means in Excel – image by author
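The elbow comparison can also be reproduced numerically. The sketch below uses the 1D dataset from earlier rather than the 2D one, so the numbers are not those of the worksheet. The k = 3 centroids are one possible converged solution (reached, for instance, from the hypothetical start 1, 3, 12); other starting points can converge elsewhere.

```python
points = [1, 2, 3, 11, 12, 13]

# Converged centroids for each k on the 1D example (k = 3 is one possible outcome).
converged = {
    1: [sum(points) / len(points)],  # overall mean
    2: [2.0, 12.0],
    3: [1.5, 3.0, 12.0],
}

def inertia(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min((x - c) ** 2 for c in centroids) for x in points)

inertias = {k: inertia(points, cs) for k, cs in converged.items()}
# Decrease in inertia gained by adding one more centroid.
drops = {k: inertias[k - 1] - inertias[k] for k in (2, 3)}

print(inertias)  # {1: 154.0, 2: 4.0, 3: 2.5}
print(drops)     # {2: 150.0, 3: 1.5}
```

At convergence, every point is assigned to its nearest centroid, so taking the minimum squared distance gives the same value as using the assigned centroid. The drop from k = 1 to k = 2 is large, while the drop from k = 2 to k = 3 is marginal, which is the elbow pattern described above.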

    Conclusion

    K-means is a very intuitive algorithm once you see it step by step in Excel.

    We start with simple centroids, compute distances, assign points, update the centroids, and repeat. Now, we can see how “machines learn”, right?

    Well, this is only the beginning; we will see that different models “learn” in really different ways.

    And here is the transition for tomorrow’s article: the unsupervised version of the Nearest Centroid classifier is indeed k-means.

    So what would be the unsupervised version of LDA or QDA? We will answer this in the next article.

    k-means – image by author


