
    The Dangers of Deceptive Data Part 2–Base Proportions and Bad Statistics

    By ProfitlyAI, May 9, 2025 (7 min read)


    This is a follow-up to my earlier article: The Dangers of Deceptive Data–Confusing Charts and Misleading Headlines. That first article focused on how visualizations can be used to mislead, diving into a form of data presentation widely used in public affairs.

    In this article, I go a bit deeper, looking at how a misunderstanding of statistical ideas is a breeding ground for being deceived by data. Specifically, I'll walk through how correlation, base rates, summary statistics, and misinterpretation of uncertainty can lead people astray.

    Let's get right into it.

    Correlation ≠ Causation

    Let's start with a classic to get in the right frame of mind for some more complex ideas. From the earliest statistics classes in grade school, we're all told that correlation is not equal to causation.

    If you do a bit of Googling or reading, you'll find "statistics" that show a high correlation between cigarette consumption and average life expectancy [1]. Interesting. Well, does that mean we should all start smoking to live longer?

    Of course not. We're missing a confounding factor: buying cigarettes requires money, and countries with greater wealth understandably have higher life expectancies. There is no causal link between cigarettes and age. I like this example because it's so blatantly misleading and highlights the point well. In general, it's important to be wary of any data that only shows a correlational link.
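
    To make the confounding effect concrete, here is a minimal simulation sketch. Everything in it is invented for illustration: the variable names, the noise levels, and the assumption that wealth drives both quantities linearly. It is not the data behind the figures cited above. The two variables never influence each other, yet they correlate strongly until the effect of wealth is removed.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a least-squares linear fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical countries: wealth is the hidden common cause.
wealth = [rng.random() for _ in range(2000)]
cigarettes = [w + rng.gauss(0, 0.2) for w in wealth]       # depends only on wealth
life_expectancy = [w + rng.gauss(0, 0.2) for w in wealth]  # depends only on wealth

raw_corr = pearson(cigarettes, life_expectancy)
adjusted_corr = pearson(
    residuals(cigarettes, wealth), residuals(life_expectancy, wealth)
)

print(f"raw correlation:              {raw_corr:.2f}")
print(f"after accounting for wealth:  {adjusted_corr:.2f}")
```

    The raw correlation comes out strongly positive, while the correlation between the wealth-adjusted residuals sits close to zero. Adjusting like this only works for confounders you have thought to measure.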

    From a scientific standpoint, a correlation can be identified through observation, but the only way to claim causation is to actually conduct a randomized trial controlling for potential confounding factors, which is a fairly involved process.

    I chose to start here because, while introductory, this concept also highlights a key idea that underpins understanding data effectively: The data only shows what it shows, and nothing else.

    Keep that in mind as we move forward.

    Remember Base Rates

    In 1978, Dr. Stephen Casscells and his team famously asked a group of 60 physicians, residents, and students at Harvard Medical School the following question:

    “If a test to detect a disease whose prevalence is 1 in 1,000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person's symptoms or signs?”

    Though presented in medical terms, this question is really about statistics. Accordingly, it also has connections to data science. Take a moment to think about your own answer to this question before reading further.

    Photo by Getty Images on Unsplash

    The answer is (approximately) 2%. Now, if you read through this quickly (and aren't up to speed with your statistics), you may have guessed significantly higher.

    This was certainly the case with the medical school folks. Only 11/60 people correctly answered the question, with 27/60 going as high as 95% in their response (presumably just subtracting the false positive rate from 100).

    It's easy to assume that the actual value should be high because of the positive test result, but this assumption contains a crucial reasoning error: It fails to account for the extremely low prevalence of the disease in the population.

    Said another way, if only one in every 1,000 people has the disease, this needs to be taken into account when calculating the probability of a random person having the disease. The probability doesn't depend only on the positive test result. As soon as the test accuracy falls below 100%, the influence of the base rate comes into play quite significantly.
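
    The 2% figure follows from Bayes' theorem. The sketch below is one way to work it out. The function name and its parameters are my own, and it assumes the test catches every true case (a sensitivity of 100%), which the question does not state but which is the usual reading.

```python
def posterior_given_positive(prevalence, false_positive_rate, sensitivity=1.0):
    """P(disease | positive test) via Bayes' theorem.

    sensitivity=1.0 is an assumption: the original question gives only the
    prevalence and the false positive rate.
    """
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

p = posterior_given_positive(prevalence=1 / 1000, false_positive_rate=0.05)
print(f"{p:.1%}")  # prints 2.0%
```

    Out of every 1,000 people tested, about 1 is a true positive and about 50 are false positives, so a positive result points to the disease only about 1 time in 51.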

    Formally, this reasoning error is known as the base rate fallacy.

    To see this more clearly, imagine that only one in every 1,000,000 people had the disease, but the test still has a false positive rate of 5%. Would you still assume that a positive test result immediately indicates a 95% chance of having the disease? What if it was 1 in a billion?

    Base rates are extremely important. Remember that.
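
    Counting people rather than multiplying probabilities makes this thought experiment easy to check. The sketch below imagines testing one billion people at each prevalence. The population size, the helper name, and the assumption of a test that never misses a true case are mine, chosen only for illustration.

```python
def positives_breakdown(population, prevalence, false_positive_rate, sensitivity=1.0):
    """Expected number of (true positives, false positives) in a tested population."""
    sick = population * prevalence
    healthy = population - sick
    return sick * sensitivity, healthy * false_positive_rate

POPULATION = 1_000_000_000  # hypothetical number of people tested
shares = {}

for one_in in (1_000, 1_000_000, 1_000_000_000):
    tp, fp = positives_breakdown(POPULATION, 1 / one_in, false_positive_rate=0.05)
    shares[one_in] = tp / (tp + fp)  # chance a positive result is a real case
    print(
        f"1 in {one_in:>13,}: {tp:>12,.0f} true vs {fp:>12,.0f} false positives"
        f" -> {shares[one_in]:.6%} of positives are real"
    )
```

    The number of false positives stays near 50 million in every row, while the number of true positives collapses as the disease gets rarer, so the share of positives that are real cases shrinks along with the base rate.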

    Statistical Measures Are NOT Equal to the Data

    Let's take a look at the following quantitative data sets (13 of them, to be precise), all of which are visualized as scatter plots. One is even in the shape of a dinosaur.

    Image by Author. Generated using code available under MIT license at https://jumpingrivers.github.io/datasauRus/

    Do you see anything interesting about these data sets?

    I'll point you in the right direction. Here is a set of summary statistics for the data:

    X-Mean 54.26
    Y-Mean 47.83
    X-SD (Standard Deviation) 16.76
    Y-SD 26.93
    Correlation -0.06

    If you're wondering why there is only one set of statistics, it's because they're all the same. Every single one of the 13 charts above has the same mean, standard deviation, and correlation between variables (to two decimal places).

    This famous collection of 13 data sets is known as the Datasaurus Dozen [5], and was published some years ago as a stark example of why summary statistics can't always be trusted. It also highlights the value of visualization as a tool for data exploration. In the words of renowned statistician John Tukey,

    “The greatest value of a picture is when it forces us to notice what we never expected to see.”
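
    The same effect can be checked by hand on a much smaller example. The sketch below does not use the Datasaurus Dozen. It swaps in Anscombe's quartet, an older set of four tiny data sets built to make the same point, with values typed in from the commonly published version. The helper names are my own.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by sets I to III
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (
        [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
    ),
}

summary = {}
for name, (xs, ys) in quartet.items():
    summary[name] = (
        statistics.mean(xs),
        statistics.mean(ys),
        statistics.stdev(xs),
        statistics.stdev(ys),
        pearson(xs, ys),
    )
    mean_x, mean_y, sd_x, sd_y, corr = summary[name]
    print(
        f"{name:>3}: x-mean={mean_x:.2f} y-mean={mean_y:.2f} "
        f"x-sd={sd_x:.2f} y-sd={sd_y:.2f} corr={corr:.2f}"
    )
```

    All four rows print essentially the same numbers, yet plotting the points shows a noisy line, a smooth curve, a line with one outlier, and a vertical stack with one stray point. The summary table alone cannot tell them apart.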

    Understanding Uncertainty

    To conclude, I want to talk about a slight variation of deceptive data, but one that is equally important: mistrusting data that is actually correct. In other words, false deception.

    The following chart is taken from a study analyzing the sentiment of headlines taken from left-leaning, right-leaning, and centrist news outlets [6]:

    “Average yearly sentiment of headlines grouped by the ideological leanings of news outlets” by the authors of the study (David Rozado, Ruth Hughes, Jamin Halberstadt) is licensed under CC BY 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/?ref=openverse.

    There's quite a bit going on in the chart above, but there is one particular aspect I want to draw your attention to: the vertical lines extending from each plotted point. You may have seen these before. Formally, these are called error bars, and they are one way that scientists often depict uncertainty in the data.

    Let me say that again. In statistics and data science, “error” is synonymous with “uncertainty.” Crucially, it does not mean something is wrong or incorrect about what is being shown. When a chart depicts uncertainty, it depicts a carefully calculated measure of the range of a value and the level of confidence at various points within that range. Unfortunately, many people just take it to mean that whoever made the chart is essentially guessing.
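
    To show that an error bar is a calculation rather than a guess, here is a minimal sketch of one common recipe: the sample mean plus or minus 1.96 standard errors, a normal-approximation 95% confidence interval. The scores are made up for illustration, and this is not necessarily how the bars in the study above were computed. Error bars can also show standard deviations or other intervals, so always check the caption.

```python
import statistics

# Hypothetical sentiment scores for one group of headlines in one year.
scores = [0.12, -0.05, 0.08, 0.01, -0.02, 0.10, 0.03, -0.07, 0.06, 0.04]

mean = statistics.mean(scores)
# Standard error of the mean: sample standard deviation / sqrt(n).
sem = statistics.stdev(scores) / len(scores) ** 0.5
# 1.96 is the normal-approximation multiplier for a 95% interval;
# with a sample this small, a t-distribution multiplier would be wider.
half_width = 1.96 * sem
low, high = mean - half_width, mean + half_width

print(f"mean = {mean:.3f}, error bar from {low:.3f} to {high:.3f}")
```

    The plotted point would sit at the mean, and the vertical line would run from the lower to the upper bound. More data shrinks the standard error and shortens the bar, which is exactly the information the bar is meant to convey.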

    This is a serious error in reasoning, for the damage is twofold: Not only does the data at hand get misinterpreted, but the presence of this misconception also contributes to the dangerous societal belief that science is not to be trusted. Being upfront about the limitations of data should actually increase our confidence in a claim's reliability, but mistaking that limitation for an admission of foul play leads to the opposite effect.

    Learning how to interpret uncertainty is challenging but incredibly important. At a minimum, a good place to start is realizing what the so-called “error” is actually trying to convey.

    Recap and Final Thoughts

    Here's a cheat sheet for being wary of deceptive data:

    • Correlation ≠ causation. Look for the confounding factor.
    • Remember base rates. The probability of a phenomenon is highly influenced by its prevalence in the population, no matter how accurate your test is (excluding 100% accuracy, which is rare).
    • Beware summary statistics. Means and medians will only take you so far; you need to explore your data.
    • Don't misunderstand uncertainty. It isn't an error; it's a carefully considered description of confidence levels.

    Remember these, and you'll be well positioned to tackle the next data science problem that makes its way to you.

    Until next time.

    References

    [1] How Charts Lie, Alberto Cairo

    [2] https://pmc.ncbi.nlm.nih.gov/articles/PMC4955674

    [3] https://data88s.org/textbook/content/Chapter_02/04_Use_and_Interpretation.html

    [4] https://visualizing.jp/the-datasaurus-dozen

    [5] https://dl.acm.org/doi/abs/10.1145/3025453.3025912

    [6] https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0276367


