
    Spearman Correlation Coefficient for When Pearson Isn’t Enough

    By ProfitlyAI, November 13, 2025 (8 min read)


    In the earlier post on the Pearson correlation coefficient, we discussed how it is used to measure the strength of the linear relationship between two variables (years of experience and salary).

    Not all relationships between variables are linear, and Pearson correlation works best when the relationship follows a straight-line pattern.

    When the relationship is not linear but still moves consistently in one direction, we use the Spearman correlation coefficient to capture that pattern.

    To understand the Spearman correlation coefficient, let’s consider the Fish Market dataset.

    This dataset includes physical attributes of each fish, such as:

    • Weight – the weight of the fish in grams (this will be our target variable)
    • Length1, Length2, Length3 – various length measurements (in cm)
    • Height – the height of the fish (in cm)
    • Width – the diagonal width of the fish body (in cm)

    We need to predict the weight of the fish based on the various length measurements, height, and width.

    This is the same example we used to understand the math behind multiple linear regression in an earlier blog, where we first used only height and width as independent variables to get the individual equations for slopes and intercepts.

    Here we are trying to fit a multiple linear regression model, and we have five independent variables and one target variable.

    Now let’s calculate the Pearson correlation coefficient between each independent variable and the target variable.

    Code:

    import pandas as pd
    
    # Load the Fish Market dataset
    df = pd.read_csv("C:/Fish.csv")
    
    # Drop the categorical 'Species' column
    if 'Species' in df.columns:
        df_numeric = df.drop(columns=['Species'])
    else:
        df_numeric = df.copy()
    
    # Calculate Pearson correlation between each independent variable and the target (Weight)
    target = 'Weight'
    pearson_corr = df_numeric.corr(method='pearson')[target].drop(target)  # drop self-correlation
    
    pearson_corr.sort_values(ascending=False)

    The Pearson correlation coefficient between Weight and

    • Length3 is 0.923044
    • Length2 is 0.918618
    • Length1 is 0.915712
    • Width is 0.886507
    • Height is 0.724345

    Among all the variables, Height has the weakest Pearson correlation coefficient, and we might think that we should drop this variable before applying the multiple linear regression model.

    But before that, is it correct to drop an independent variable based on the Pearson correlation coefficient alone?

    No.

    First, let’s look at the scatter plot between Height and Weight.

    Image by Author

    From the scatter plot we can observe that as height increases, weight also increases, but the relationship is not linear.

    At smaller heights, the weight increases slowly. At larger heights, it increases more quickly.

    Here the trend is non-linear but still monotonic, because it moves in one direction.

    Since the Pearson correlation coefficient assumes a straight-line relationship (linearity), it gives a lower value here.

    This is where the Spearman correlation coefficient comes in.
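To make the difference concrete, here is a minimal sketch on hypothetical data (not the Fish Market dataset): `y` is the cube of `x`, so the relationship is perfectly monotonic but curved. The helper names `pearson` and `ranks` are illustrative assumptions, and `ranks` assumes there are no tied values.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1 for the smallest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

# Hypothetical data: monotonic but clearly non-linear
x = list(range(1, 11))
y = [v ** 3 for v in x]

pearson_r = pearson(x, y)                  # measures straight-line fit
spearman_r = pearson(ranks(x), ranks(y))   # Pearson applied to the ranks

print(f"Pearson:  {pearson_r:.4f}")
print(f"Spearman: {spearman_r:.4f}")
```

Because every increase in `x` comes with an increase in `y`, the two rank lists are identical and the rank-based value reaches 1, while the Pearson value stays below 1 because the points do not lie on a straight line.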

    Now let’s calculate the Spearman correlation coefficient between Height and Weight.

    Code:

    import pandas as pd
    from scipy.stats import spearmanr
    
    # Load the dataset
    df = pd.read_csv("C:/Fish.csv") 
    
    # Calculate Spearman correlation coefficient between Height and Weight
    spearman_corr = spearmanr(df["Height"], df["Weight"])[0]
    
    print(f"Spearman Correlation Coefficient: {spearman_corr:.4f}")

    The Spearman correlation coefficient is 0.8586, which indicates a strong positive relationship between Height and Weight.

    This means that as the height of the fish increases, the weight also tends to increase.

    Earlier, we got a Pearson correlation coefficient of 0.72 between Height and Weight, which underestimates the actual relationship between these variables.

    If we select features based only on the Pearson correlation and remove the Height feature, we might lose an important variable that actually has a strong relationship with the target, leading to less accurate predictions.

    This is where the Spearman correlation coefficient helps, as it captures non-linear but monotonic trends.

    By using the Spearman correlation, we can also decide the next steps, such as applying transformations like log or lag values, or considering algorithms like decision trees or random forests that can handle both linear and non-linear relationships.
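As a rough illustration of the transformation idea, the sketch below uses hypothetical data in which `y` grows exponentially with `x`. This is an assumption made for the example only; the real Height and Weight relationship may need a different transformation. Taking the log of `y` turns the curve into a straight line, so the Pearson value rises.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: exponential growth (monotonic, not linear)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(0.5 * v) for v in x]

raw_r = pearson(x, y)
# log(y) equals 0.5 * x here, which is an exact straight line in x
log_r = pearson(x, [math.log(v) for v in y])

print(f"Pearson on raw y:  {raw_r:.4f}")
print(f"Pearson on log(y): {log_r:.4f}")
```

A high Spearman value next to a noticeably lower Pearson value is a hint that a monotonic transformation of this kind may be worth trying before fitting a linear model.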


    Now that we have understood the significance of the Spearman correlation coefficient, it’s time to understand the math behind it.

    How is the Spearman correlation coefficient calculated so that it captures the relationship even when the data is non-linear but monotonic?

    To understand this, let’s consider a 10-point sample from the dataset.

    Image by Author

    Now, we sort the values in ascending order in each column and then assign ranks.

    Image by Author

    Now that we have given ranks to both Height and Weight, we don’t keep them in the sorted order.

    Each value needs to go back to its original position in the dataset so that every fish’s height rank is matched with its own weight rank.

    We sort the columns only to assign ranks. After that, we place the ranks back in their original order and then calculate the Spearman correlation using these two sets of ranks.

    Image by Author

    Here, while assigning ranks after sorting the values in ascending order in the Weight column, we encountered a tie at ranks 5 and 6, so we assigned both values the average rank of 5.5.

    Similarly, we found another tie across ranks 7, 8, 9, and 10, so we assigned all of them the average rank of 8.5.
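The ranking procedure described above (sort, assign ranks, average the ranks of tied values, then put the ranks back in the original order) can be sketched in plain Python. The function name `average_ranks` and the weights below are hypothetical; they are chosen only to produce ties and are not the sample from the article. In practice, `scipy.stats.rankdata` and the pandas `rank` method use the same averaging rule by default.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks.

    The returned list is in the ORIGINAL order of `values`.
    """
    # Indices of the values in ascending order of value
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start..end hold ranks start+1..end+1; use their mean
        shared_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

# Hypothetical weights in grams, with a two-way tie and a four-way tie
weights = [200, 250, 300, 250, 300, 300, 300, 100]
print(average_ranks(weights))
# [2.0, 3.5, 6.5, 3.5, 6.5, 6.5, 6.5, 1.0]
```

The two fish weighing 250 g would have taken ranks 3 and 4, so both get 3.5. The four fish weighing 300 g would have taken ranks 5 to 8, so all four get 6.5.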

    Now, we calculate the Spearman correlation coefficient, which is actually the Pearson correlation applied to the ranks.

    We already know the formula for calculating the Pearson correlation coefficient.

    \[
    r = \frac{\text{Cov}(X, Y)}{s_X \cdot s_Y}
    = \frac{\frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
    {\sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2} \cdot \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
    \]

    \[
    = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
    {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \cdot \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
    \]

    Now, the formula for the Spearman correlation coefficient is:

    \[
    \rho_s =
    \frac{
    \sum_{i=1}^{n}
    \underbrace{(R_{X_i} - \bar{R}_X)}_{\text{Rank deviation of } X_i}
    \cdot
    \underbrace{(R_{Y_i} - \bar{R}_Y)}_{\text{Rank deviation of } Y_i}
    }{
    \sqrt{
    \sum_{i=1}^{n}
    \underbrace{(R_{X_i} - \bar{R}_X)^2}_{\text{Squared rank deviations of } X}
    }
    \cdot
    \sqrt{
    \sum_{i=1}^{n}
    \underbrace{(R_{Y_i} - \bar{R}_Y)^2}_{\text{Squared rank deviations of } Y}
    }
    }
    \]

    where:

    \[
    \begin{aligned}
    R_{X_i} & = \text{ rank of the } i^{\text{th}} \text{ value in variable } X \\
    R_{Y_i} & = \text{ rank of the } i^{\text{th}} \text{ value in variable } Y \\
    \bar{R}_X & = \text{ mean of all ranks in } X \\
    \bar{R}_Y & = \text{ mean of all ranks in } Y
    \end{aligned}
    \]

    Now, let’s calculate the Spearman correlation coefficient for the sample data.

    \[
    \textbf{Step 1: Ranks from the original data}
    \]

    \[
    \begin{array}{lcccccccccc}
    R_{x_i} & 3 & 1 & 2 & 5 & 8 & 4 & 7 & 9 & 10 & 6 \\[2pt]
    R_{y_i} & 1 & 2 & 4 & 5.5 & 8.5 & 3 & 5.5 & 8.5 & 8.5 & 8.5
    \end{array}
    \]

    \[
    \textbf{Step 2: Formula of Spearman’s correlation (Pearson on ranks)}
    \]

    \[
    \rho_s =
    \frac{\sum_{i=1}^{n}\bigl(R_{x_i}-\bar{R_x}\bigr)\bigl(R_{y_i}-\bar{R_y}\bigr)}
    {\sqrt{\sum_{i=1}^{n}\bigl(R_{x_i}-\bar{R_x}\bigr)^2} \;
    \sqrt{\sum_{i=1}^{n}\bigl(R_{y_i}-\bar{R_y}\bigr)^2}},
    \qquad n = 10
    \]

    \[
    \textbf{Step 3: Mean of rank variables}
    \]

    \[
    \bar{R_x} = \frac{3+1+2+5+8+4+7+9+10+6}{10} = \frac{55}{10} = 5.5
    \]

    \[
    \bar{R_y} = \frac{1+2+4+5.5+8.5+3+5.5+8.5+8.5+8.5}{10}
    = \frac{55}{10} = 5.5
    \]

    \[
    \textbf{Step 4: Deviations and cross-products}
    \]

    \[
    \begin{array}{cccc}
    i & R_{x_i}-\bar{R_x} & R_{y_i}-\bar{R_y} & (R_{x_i}-\bar{R_x})(R_{y_i}-\bar{R_y}) \\ \hline
    1 & -2.5 & -4.5 & 11.25 \\
    2 & -4.5 & -3.5 & 15.75 \\
    3 & -3.5 & -1.5 & 5.25 \\
    4 & -0.5 & 0.0 & 0.00 \\
    5 & 2.5 & 3.0 & 7.50 \\
    6 & -1.5 & -2.5 & 3.75 \\
    7 & 1.5 & 0.0 & 0.00 \\
    8 & 3.5 & 3.0 & 10.50 \\
    9 & 4.5 & 3.0 & 13.50 \\
    10 & 0.5 & 3.0 & 1.50
    \end{array}
    \]

    \[
    \sum (R_{x_i}-\bar{R_x})(R_{y_i}-\bar{R_y}) = 69.0
    \]

    \[
    \textbf{Step 5: Sum of squares for each rank variable}
    \]

    \[
    \sum (R_{x_i}-\bar{R_x})^2 = 82.5,
    \qquad
    \sum (R_{y_i}-\bar{R_y})^2 = 77.0
    \]

    The sum of squares for the Weight ranks is smaller than 82.5 because the tied ranks reduce the spread of the rank values.

    \[
    \textbf{Step 6: Substitute into the formula}
    \]

    \[
    \rho_s
    = \frac{69.0}{\sqrt{(82.5)(77.0)}}
    = \frac{69.0}{79.70}
    \approx 0.866
    \]

    \[
    \textbf{Step 7: Interpretation}
    \]

    \[
    \rho_s \approx 0.866
    \]

    A value of \( \rho_s \approx 0.866 \) shows a strong positive monotonic relationship between Height and Weight: as height increases, weight also tends to increase.

    This is how we calculate the Spearman correlation coefficient.
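To double-check the hand calculation, here is a minimal sketch that recomputes each quantity directly from the ranks listed in Step 1. The variable names are my own; only the rank values come from the worked example.

```python
import math

# Ranks copied from Step 1 of the worked example
rank_x = [3, 1, 2, 5, 8, 4, 7, 9, 10, 6]
rank_y = [1, 2, 4, 5.5, 8.5, 3, 5.5, 8.5, 8.5, 8.5]

n = len(rank_x)
mean_x = sum(rank_x) / n
mean_y = sum(rank_y) / n

# Numerator: sum of cross-products of the rank deviations
cross = sum((rx - mean_x) * (ry - mean_y) for rx, ry in zip(rank_x, rank_y))

# Denominator terms: sum of squared rank deviations for each variable
ss_x = sum((rx - mean_x) ** 2 for rx in rank_x)
ss_y = sum((ry - mean_y) ** 2 for ry in rank_y)

rho_s = cross / math.sqrt(ss_x * ss_y)

print(f"mean_x = {mean_x}, mean_y = {mean_y}")
print(f"cross-products = {cross}, ss_x = {ss_x}, ss_y = {ss_y}")
print(f"rho_s = {rho_s:.3f}")
```

Printing the intermediate sums makes it easy to compare each step of the manual calculation with the computed values.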

    There is also another formula for the Spearman correlation coefficient, but it is used only when there are no tied ranks.

    \[
    \rho_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}
    \]

    where:

    \[
    \begin{aligned}
    \rho_s & : \text{ Spearman correlation coefficient} \\[4pt]
    d_i & : \text{ difference between the ranks of each observation, } (R_{x_i} - R_{y_i}) \\[4pt]
    n & : \text{ total number of paired observations}
    \end{aligned}
    \]
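For readers who want to see where this shortcut comes from, here is a brief derivation sketch. It assumes there are no ties, so both sets of ranks are simply the numbers 1 to n in some order. The symbol \( S \) is introduced here only for convenience.

\[
\bar{R_x} = \bar{R_y} = \frac{n + 1}{2},
\qquad
S = \sum_{i=1}^{n} (R_{x_i} - \bar{R_x})^2 = \sum_{i=1}^{n} (R_{y_i} - \bar{R_y})^2 = \frac{n(n^2 - 1)}{12}
\]

\[
\sum_{i=1}^{n} d_i^2
= \sum_{i=1}^{n} \bigl[(R_{x_i} - \bar{R_x}) - (R_{y_i} - \bar{R_y})\bigr]^2
= 2S - 2\sum_{i=1}^{n} (R_{x_i} - \bar{R_x})(R_{y_i} - \bar{R_y})
\]

\[
\rho_s
= \frac{\sum_{i=1}^{n} (R_{x_i} - \bar{R_x})(R_{y_i} - \bar{R_y})}{S}
= \frac{S - \frac{1}{2}\sum d_i^2}{S}
= 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}
\]

The first line is the only place where the no-ties assumption is used: with tied ranks, the sums of squares are no longer equal to \( n(n^2 - 1)/12 \).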

    If ties are present, the rank differences no longer represent the exact distances between positions, and we instead calculate \( \rho_s \) using the “Pearson correlation on ranks” formula.
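A small sketch can show the effect of ties on the shortcut formula. It compares the two formulas on a hypothetical set of ranks without ties and on the tied ranks from Step 1. The function names `pearson` and `shortcut` are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def shortcut(rank_x, rank_y):
    """Shortcut formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)); exact only without ties."""
    n = len(rank_x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical ranks with no ties: both formulas agree
no_ties_x = [1, 2, 3, 4, 5]
no_ties_y = [2, 1, 4, 3, 5]
print(f"no ties:   shortcut = {shortcut(no_ties_x, no_ties_y):.4f}, "
      f"Pearson on ranks = {pearson(no_ties_x, no_ties_y):.4f}")

# Ranks from Step 1 (the Weight ranks contain ties): the formulas differ
tied_x = [3, 1, 2, 5, 8, 4, 7, 9, 10, 6]
tied_y = [1, 2, 4, 5.5, 8.5, 3, 5.5, 8.5, 8.5, 8.5]
print(f"with ties: shortcut = {shortcut(tied_x, tied_y):.4f}, "
      f"Pearson on ranks = {pearson(tied_x, tied_y):.4f}")
```

The gap in the tied case is small for this sample, but it grows as more values are tied, which is why the Pearson-on-ranks formula is the safer choice whenever ties appear.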


    Dataset

    The dataset used in this blog is the Fish Market dataset, which contains measurements of fish species sold in markets, including attributes like weight, height, and width.

    It is publicly available on Kaggle and is licensed under the Creative Commons Zero (CC0 Public Domain) license. This means it can be freely used, modified, and shared for both non-commercial and commercial purposes without restriction.


    Spearman’s correlation coefficient helps us understand how two variables move together when the relationship is not perfectly linear.

    By converting the data into ranks, it shows how consistently one variable increases or decreases as the other increases, capturing any upward or downward pattern.

    It is very helpful when the data has outliers, is not normally distributed, or when the relationship is monotonic but curved.

    I hope this post helped you see not just how to calculate the Spearman correlation coefficient, but also when to use it and why it is an important tool in data analysis.

    Thanks for reading!


