
    The Machine Learning “Advent Calendar” Day 9: LOF in Excel

    By ProfitlyAI, December 9, 2025, 8 min read


    Yesterday, we worked with Isolation Forest, which is an anomaly detection method.

    Today, we look at another algorithm with the same goal. But unlike Isolation Forest, it does not build trees.

    It is called LOF, or Local Outlier Factor.

    People often summarize LOF in one sentence: does this point live in a region with a lower density than its neighbors?

    This sentence is actually tricky to understand. I struggled with it for a long time.

    However, there is one part that is immediately easy to understand,
    and we will see that it becomes the key point:
    there is a notion of neighbors.

    And as soon as we talk about neighbors,
    we naturally come back to distance-based models.

    We will explain this algorithm in 3 steps.

    To keep things very simple, we will use this dataset, again:

    1, 2, 3, 9

    Do you remember that I have the copyright on this dataset? We did Isolation Forest with it, and we will do LOF with it again. We can also compare the two results.

    LOF in Excel in 3 steps – all images by author

    Step 1 – k Neighbors and k-distance

    LOF starts with something very simple:

    Look at the distances between points.
    Then find the k nearest neighbors of each point.

    Let us take k = 2, just to keep things minimal.

    Nearest neighbors for each point

    • Point 1 → neighbors: 2 and 3
    • Point 2 → neighbors: 1 and 3
    • Point 3 → neighbors: 2 and 1
    • Point 9 → neighbors: 3 and 2
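    The neighbor search above can be sketched in a few lines of Python. This is a minimal sketch, not the author's Excel workbook: it assumes 1-D data with distinct values, uses the absolute difference as the distance, and `k_nearest_neighbors` is a hypothetical helper name. Ties (point 2 has two neighbors at distance 1) are kept in dataset order because Python's sort is stable.

```python
def k_nearest_neighbors(data, k):
    """Return {point: its k nearest neighbors, closest first}.

    Assumes 1-D data with distinct values; distance = absolute difference.
    """
    neighbors = {}
    for p in data:
        others = [o for o in data if o != p]
        # Stable sort: equal distances keep their order in the dataset.
        others.sort(key=lambda o: abs(p - o))
        neighbors[p] = others[:k]
    return neighbors


data = [1, 2, 3, 9]
for point, nbrs in k_nearest_neighbors(data, k=2).items():
    print(f"Point {point} -> neighbors: {nbrs}")
```

    With k = 2 this prints the same four neighbor lists as above.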

    Already, we see a clear structure emerging:

    • 1, 2, and 3 form a tight cluster
    • 9 lives alone, far from the others

    The k-distance: a local radius

    The k-distance is simply the largest distance among the k nearest neighbors, that is, the distance to the k-th nearest neighbor.

    And this is actually the key point.

    Because this single number tells you something very concrete:
    the local radius around the point.

    If the k-distance is small, the point is in a dense area.
    If the k-distance is large, the point is in a sparse area.

    With just this one measure, you already have a first signal of “isolation”.

    Here, we use the idea of “k nearest neighbors”, which of course reminds us of k-NN (the classifier or regressor).
    The context here is different, but the calculation is exactly the same.

    And if you think of k-means, do not mix them up:
    the “k” in k-means has nothing to do with the “k” here.

    The k-distance calculation

    For point 1, the two nearest neighbors are 2 and 3 (distances 1 and 2), so k-distance(1) = 2.

    For point 2, the neighbors are 1 and 3 (both at distance 1), so k-distance(2) = 1.

    For point 3, the two nearest neighbors are 2 and 1 (distances 1 and 2), so k-distance(3) = 2.

    For point 9, the neighbors are 3 and 2 (distances 6 and 7), so k-distance(9) = 7. This is huge compared to all the others.

    In Excel, we can build a pairwise distance matrix to get the k-distance of each point.
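    A rough Python equivalent of that distance matrix, assuming 1-D data with distinct values (`distance_matrix` and `k_distances` are hypothetical names). Each row holds the distances from one point to all points, including a 0 on the diagonal, so the k-th nearest neighbor is the (k + 1)-th smallest value of the row. In Excel, one way to read the same number from a row would be `SMALL(row, k + 1)`, though the workbook may compute it differently.

```python
def distance_matrix(data):
    """Pairwise absolute distances, one row per point (like the Excel grid)."""
    return [[abs(p - q) for q in data] for p in data]


def k_distances(data, k):
    """k-distance of each point: the (k + 1)-th smallest value of its row.

    Index 0 of the sorted row is the 0 on the diagonal (the point itself),
    so the k-th nearest neighbor sits at index k. Assumes distinct values.
    """
    return {p: sorted(row)[k] for p, row in zip(data, distance_matrix(data))}


data = [1, 2, 3, 9]
for p, row in zip(data, distance_matrix(data)):
    print(p, row)
print(k_distances(data, k=2))  # {1: 2, 2: 1, 3: 2, 9: 7}
```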

    LOF in Excel – image by author

    Step 2 – Reachability Distances

    For this step, I will simply define the calculations and apply the formulas in Excel. Because, to be honest, I never managed to find a truly intuitive way to explain the results.

    So, what is the “reachability distance”?

    For a point p and a neighbor o, we define the reachability distance as:

    reach-dist(p, o) = max(k-dist(o), distance(p, o))

    Why take the maximum?

    The purpose of the reachability distance is to stabilize the density comparison.

    If the neighbor o lives in a very dense region (small k-dist), we do not want to allow an unrealistically small distance: the reachability distance from p to o can never be smaller than the k-distance of o.

    In particular, for point 2:

    • Distance to 1 = 1, but k-distance(1) = 2 → reach-dist(2, 1) = 2
    • Distance to 3 = 1, but k-distance(3) = 2 → reach-dist(2, 3) = 2

    Both neighbors push the reachability distance upward.
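    The formula and the point-2 example can be checked with a small sketch. It assumes 1-D data with distinct values; `k_distance` and `reach_dist` are hypothetical helper names. Note that reach-dist is not symmetric: it uses the k-distance of the neighbor o, not of p, which is why reach-dist(2, 1) and reach-dist(1, 2) differ.

```python
def k_distance(p, data, k):
    """Distance from p to its k-th nearest neighbor (1-D, distinct values)."""
    return sorted(abs(p - o) for o in data if o != p)[k - 1]


def reach_dist(p, o, data, k):
    """reach-dist(p, o) = max(k-dist(o), distance(p, o))."""
    return max(k_distance(o, data, k), abs(p - o))


data = [1, 2, 3, 9]
print(reach_dist(2, 1, data, k=2))  # 2: distance 1, raised to k-distance(1) = 2
print(reach_dist(2, 3, data, k=2))  # 2: distance 1, raised to k-distance(3) = 2
print(reach_dist(1, 2, data, k=2))  # 1: k-distance(2) = 1, so nothing changes
```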

    In Excel, we keep a matrix format to display the reachability distances: one point compared to all the others.

    LOF in Excel – image by author

    Average reachability distance

    For each point, we can now compute the average value, which answers the question: on average, how far do I have to travel to reach my local neighborhood?

    And now, do you notice something? Point 2 has a larger average reachability distance (2) than points 1 and 3 (1.5 each).
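    The averages can be reproduced with this sketch, under the same assumptions (1-D data, distinct values, k = 2, hypothetical helper names). Only the k nearest neighbors of each point enter the average, not the whole row of the matrix.

```python
def k_nearest(p, data, k):
    """The k nearest neighbors of p, closest first (ties kept in data order)."""
    return sorted((o for o in data if o != p), key=lambda o: abs(p - o))[:k]


def k_distance(p, data, k):
    """Distance from p to its k-th nearest neighbor."""
    return abs(p - k_nearest(p, data, k)[-1])


def avg_reach_dist(p, data, k):
    """Mean of reach-dist(p, o) over the k nearest neighbors o of p."""
    nbrs = k_nearest(p, data, k)
    reach = [max(k_distance(o, data, k), abs(p - o)) for o in nbrs]
    return sum(reach) / len(reach)


data = [1, 2, 3, 9]
print({p: avg_reach_dist(p, data, k=2) for p in data})
# {1: 1.5, 2: 2.0, 3: 1.5, 9: 6.5}
```

    Point 2 gets 2.0 because both of its reachability distances were pushed up to 2, while points 1 and 3 each keep one reachability distance of 1.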

    This is not that intuitive to me!

    Step 3 – LRD and the LOF Score

    The final step is a kind of “normalization” to obtain an anomaly score.

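    First, we define the LRD, Local Reachability Density, which is simply the inverse of the average reachability distance:

    LRD(p) = 1 / mean of reach-dist(p, o) over the k nearest neighbors o of p

    And the final LOF score is calculated as:

    LOF(p) = mean of LRD(o) / LRD(p) over the k nearest neighbors o of p

    So, LOF compares the density of a point to the density of its neighbors.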
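    Putting the three steps together, here is a compact sketch of the whole LOF computation in plain Python. It assumes 1-D data with distinct values and keeps exactly k neighbors per point (the original LOF definition also admits extra neighbors tied at the k-th distance, which this sketch ignores); `lof_scores` is a hypothetical name. On the dataset 1, 2, 3, 9 it gives point 2 a slightly higher score than points 1 and 3, and point 9 by far the highest.

```python
def k_nearest(p, data, k):
    """The k nearest neighbors of p, closest first (ties kept in data order)."""
    return sorted((o for o in data if o != p), key=lambda o: abs(p - o))[:k]


def lof_scores(data, k):
    """LOF score of every point in a 1-D dataset with distinct values."""
    neighbors = {p: k_nearest(p, data, k) for p in data}
    # Step 1: k-distance = distance to the k-th (farthest kept) neighbor.
    k_dist = {p: abs(p - neighbors[p][-1]) for p in data}

    # Step 2 + LRD: inverse of the average reachability distance.
    lrd = {}
    for p in data:
        reach = [max(k_dist[o], abs(p - o)) for o in neighbors[p]]
        lrd[p] = len(reach) / sum(reach)

    # Step 3: LOF(p) = mean of LRD(o) / LRD(p) over the neighbors o of p.
    return {
        p: sum(lrd[o] for o in neighbors[p]) / (len(neighbors[p]) * lrd[p])
        for p in data
    }


scores = lof_scores([1, 2, 3, 9], k=2)
print({p: round(s, 3) for p, s in scores.items()})
# {1: 0.875, 2: 1.333, 3: 0.875, 9: 3.792}
```

    Computing LRD first and dividing afterwards mirrors the Excel layout: one column of densities, then one ratio per neighbor.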

    Interpretation:

    • If LRD(p) ≈ LRD(neighbors), then LOF ≈ 1.
    • If LRD(p) is much smaller, then LOF >> 1. So p is in a sparse region.
    • If LRD(p) is much larger, then LOF < 1. So p is in a very dense pocket.

    I also made a version with more detailed intermediate steps and shorter formulas.

    Understanding What “Anomaly” Means in Unsupervised Models

    In unsupervised learning, there is no ground truth. And this is exactly where things can become tricky.

    We do not have labels.
    We do not have the “correct answer”.
    We only have the structure of the data.

    Take this tiny sample:

    1, 2, 3, 7, 8, 12
    (I also have the copyright on it.)

    If you look at it intuitively, which one seems like an anomaly?

    Personally, I would say 12.

    Now let us look at the results. LOF says the outlier is 7.

    (And you may notice that with the k-distance, we would say that it is 12.)
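    To see both rankings on this second sample, here is the same sketch applied to 1, 2, 3, 7, 8, 12. It assumes k = 2 as in the first example (the text does not restate k for this sample), 1-D data and distinct values; helper names are hypothetical. Under these assumptions, the highest LOF goes to 7 while the largest k-distance goes to 12. One of 7's two neighbors is 3, which belongs to the dense cluster 1, 2, 3, so the density ratio makes 7 look sparse.

```python
def k_nearest(p, data, k):
    """The k nearest neighbors of p, closest first (ties kept in data order)."""
    return sorted((o for o in data if o != p), key=lambda o: abs(p - o))[:k]


def lof_and_k_distance(data, k):
    """Return (LOF scores, k-distances) for a 1-D dataset with distinct values."""
    neighbors = {p: k_nearest(p, data, k) for p in data}
    k_dist = {p: abs(p - neighbors[p][-1]) for p in data}
    # LRD(p): inverse of the mean reachability distance to p's neighbors.
    lrd = {
        p: len(neighbors[p]) / sum(max(k_dist[o], abs(p - o)) for o in neighbors[p])
        for p in data
    }
    # LOF(p): mean ratio LRD(o) / LRD(p) over p's neighbors.
    lof = {
        p: sum(lrd[o] for o in neighbors[p]) / (len(neighbors[p]) * lrd[p])
        for p in data
    }
    return lof, k_dist


data = [1, 2, 3, 7, 8, 12]
lof, k_dist = lof_and_k_distance(data, k=2)
print("Highest LOF:", max(lof, key=lof.get))                # 7
print("Largest k-distance:", max(k_dist, key=k_dist.get))  # 12
print("Neighbors of 7:", k_nearest(7, data, k=2))           # [8, 3]
```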

    LOF in Excel – image by author

    Now, we can compare Isolation Forest and LOF side by side.

    On the left, with the dataset 1, 2, 3, 9, both methods agree:
    9 is the clear outlier.
    Isolation Forest gives it the lowest score,
    and LOF gives it the highest LOF value.

    If we look closer, Isolation Forest gives 1, 2 and 3 the same score, while LOF gives a higher score to 2. This is what we already noticed.

    With the dataset 1, 2, 3, 7, 8, 12, the story changes.

    • Isolation Forest points to 12 as the most isolated point.
      This matches the intuition: 12 is far from everyone.
    • LOF, however, highlights 7 instead.

    LOF in Excel – image by author

    So who is right?

    It is hard to say.

    In practice, we first need to agree with business teams on what “anomaly” actually means in the context of our data.

    Because in unsupervised learning, there is no single truth.

    There is only the definition of “anomaly” that each algorithm uses.

    This is why it is extremely important to understand
    how the algorithm works, and what kind of anomalies it is designed to detect.

    Only then can you decide whether LOF, k-distance, or Isolation Forest is the right choice for your specific situation.

    And this is the whole message of unsupervised learning:

    Different algorithms look at the data differently.
    There is no “true” outlier.
    There is only the definition of what an outlier means for each model.

    This is why understanding how the algorithm works
    is more important than the final score it produces.

    Conclusion

    LOF and Isolation Forest both detect anomalies, but they look at the data through completely different lenses.

    • k-distance captures how far a point must travel to find its neighbors.
    • LOF compares local densities.
    • Isolation Forest isolates points using random splits.

    And even on very simple datasets, these methods can disagree.
    One algorithm may flag a point as an outlier, while another highlights a completely different one.

    And this is the key message:

    In unsupervised learning, there is no “true” outlier.
    Each algorithm defines anomalies according to its own logic.

    This is why understanding how a method works is more important than the number it produces.
    Only then can you choose the right algorithm for the right situation and interpret the results with confidence.


