    Multiple Linear Regression, Explained Simply (Part 1)



In this blog post, we discuss multiple linear regression.

This is one of the first algorithms we learn on our Machine Learning journey, as it is an extension of simple linear regression.

We know that in simple linear regression we have one independent variable and one target variable, while in multiple linear regression we have two or more independent variables and one target variable.

Instead of just applying the algorithm in Python, in this blog let's explore the math behind the multiple linear regression algorithm.

We'll use the Fish Market dataset to understand the math behind multiple linear regression.

This dataset includes physical attributes of each fish, such as:

• Species – the type of fish (e.g., Bream, Roach, Pike)
• Weight – the weight of the fish in grams (this will be our target variable)
• Length1, Length2, Length3 – various length measurements (in cm)
• Height – the height of the fish (in cm)
• Width – the diagonal width of the fish body (in cm)

To understand multiple linear regression, we'll use two independent variables to keep things simple and easy to visualize.

We'll consider a 20-point sample from this dataset.

Image by Author

We took a 20-point sample from the Fish Market dataset, containing the height, width, and weight of 20 individual fish. These three values will help us understand how multiple linear regression works in practice.

First, let's use Python to fit a multiple linear regression model on our 20-point sample.

    Code:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# 20-point sample from the Fish Market dataset: [Height, Width, Weight]
data = [
    [11.52, 4.02, 242.0],
    [12.48, 4.31, 290.0],
    [12.38, 4.70, 340.0],
    [12.73, 4.46, 363.0],
    [12.44, 5.13, 430.0],
    [13.60, 4.93, 450.0],
    [14.18, 5.28, 500.0],
    [12.67, 4.69, 390.0],
    [14.00, 4.84, 450.0],
    [14.23, 4.96, 500.0],
    [14.26, 5.10, 475.0],
    [14.37, 4.81, 500.0],
    [13.76, 4.37, 500.0],
    [13.91, 5.07, 340.0],
    [14.95, 5.17, 600.0],
    [15.44, 5.58, 600.0],
    [14.86, 5.29, 700.0],
    [14.94, 5.20, 700.0],
    [15.63, 5.13, 610.0],
    [14.47, 5.73, 650.0]
]

# Create DataFrame
df = pd.DataFrame(data, columns=["Height", "Width", "Weight"])

# Independent variables (Height and Width)
X = df[["Height", "Width"]]

# Target variable (Weight)
y = df["Weight"]

# Fit the model
model = LinearRegression().fit(X, y)

# Extract coefficients
b0 = model.intercept_           # β₀
b1, b2 = model.coef_            # β₁ (Height), β₂ (Width)

# Print results
print(f"Intercept (β₀): {b0:.4f}")
print(f"Height slope (β₁): {b1:.4f}")
print(f"Width slope  (β₂): {b2:.4f}")

Results:

Intercept (β₀): -1005.2810

Height slope (β₁): 78.1404

Width slope (β₂): 82.0572

Here, we haven't done a train-test split because the dataset is small and our goal is to understand the math behind the model, not to build a production model.


We applied multiple linear regression in Python on our sample dataset and got these results.

What's the next step?

Evaluating the model to see how good it is at prediction?

Not today!

We aren't going to evaluate the model until we understand how we got these slope and intercept values in the first place.

First, we will understand how the model works behind the scenes, and then we will arrive at these slope and intercept values using math.


First, let's plot our sample data.

Image by Author

In simple linear regression, we have only one independent variable, so the data is two-dimensional, and we try to find the line that best fits the data.

In multiple linear regression, we have two or more independent variables, so the data is three-dimensional (or higher), and we try to find a plane that best fits the data.

Here, we considered two independent variables, which means we have to find the plane that best fits the data.

Image by Author

The Equation of the Plane is:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2
$$

where

y: the predicted value of the dependent (target) variable

β₀: the intercept (the value of y when all x's are 0)

β₁: the coefficient (or slope) for feature x₁

β₂: the coefficient for feature x₂

x₁, x₂: the independent variables (features)

Let's say we have calculated the intercept and slope values and we want to estimate the weight at a particular point i.

For that, we substitute the respective values into the equation; we call the result the predicted value, while the actual value is the one recorded in our dataset.

Let us denote the predicted value by ŷᵢ.

$$
\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}
$$

yᵢ represents the actual value and ŷᵢ represents the predicted value.

Now, at point i, let's find the difference between the actual value and the predicted value, i.e. the residual.

$$
\text{Residual}_i = y_i - \hat{y}_i
$$

For n data points, the total residual will be

$$
\sum_{i=1}^{n} (y_i - \hat{y}_i)
$$

If we compute just the sum of residuals, the positive and negative errors can cancel out, resulting in a misleadingly small total error.

Squaring the residuals solves this by ensuring all errors contribute positively, while also giving more importance to larger deviations.

So, we calculate the sum of squared residuals:

$$
\text{SSR} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$
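
To make this concrete, here is a small sketch in Python (NumPy assumed; the three data points and the coefficient values are taken from earlier in this post) that computes predictions, residuals, and the SSR:

import numpy as np

# First three points of the sample and the fitted coefficients from above
height = np.array([11.52, 12.48, 12.38])
width = np.array([4.02, 4.31, 4.70])
weight = np.array([242.0, 290.0, 340.0])

b0, b1, b2 = -1005.28, 78.14, 82.06

y_hat = b0 + b1 * height + b2 * width      # predicted weights ŷᵢ
residuals = weight - y_hat                 # yᵢ - ŷᵢ
ssr = np.sum(residuals ** 2)               # sum of squared residuals

print(residuals)
print(f"SSR: {ssr:.2f}")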

Visualizing Residuals in Multiple Linear Regression

In multiple linear regression, the model tries to fit a plane through the data such that the sum of squared residuals is minimized.

We already know the equation of the plane:

$$
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2
$$

Now we need to find the equation of the plane that best fits our sample data, i.e. the one that minimizes the sum of squared residuals.

We already know that ŷ is the predicted value and x₁ and x₂ are values from the dataset.

That leaves the terms β₀, β₁ and β₂.

How do we find these slope and intercept values?

Before that, let's see what happens to the plane when we change the intercept (β₀).

GIF by Author

Now, let's see what happens when we change the slopes β₁ and β₂.

GIF by Author
GIF by Author

We can observe how changing the slopes and intercept affects the regression plane.

We need to find the exact values of the slopes and intercept at which the sum of squared residuals is minimal.


Now, we want to find the best-fitting plane

$$
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2
$$

that minimizes the Sum of Squared Residuals (SSR):

$$
SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2})^2
$$

where

$$
\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}
$$


How do we find the equation of this best-fitting plane?

Before proceeding further, let's go back to our school days.

I used to wonder why we needed to learn topics like differentiation, integration, and limits. Do we really use them in real life?

I thought that way because I found these topics hard to understand. But when it came to comparatively simpler topics like matrices (at least to some extent), I never questioned why we were learning them or what their use was.

It was only when I began learning about Machine Learning that I started paying attention to these topics.


Now, coming back to the discussion, let's consider a straight line:

y = 2x + 1

Image by Author

Let's plot these values.

Image by Author

Let's consider two points on the straight line:

(x₁, y₁) = (2, 3) and (x₂, y₂) = (3, 5)

Now we find the slope.

$$
m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{\text{change in } y}{\text{change in } x}
$$

$$
m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{5 - 3}{3 - 2} = \frac{2}{1} = 2
$$

The slope is 2.

If we pick any two points and calculate the slope, the value stays the same, which means the change in y with respect to the change in x is the same throughout the line.
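
A quick sketch to confirm this: the slope computed between any two points on y = 2x + 1 always comes out to 2.

def slope(p, q):
    """Slope between two points p = (x1, y1) and q = (x2, y2)."""
    return (q[1] - p[1]) / (q[0] - p[0])

line = lambda x: 2 * x + 1                       # y = 2x + 1
points = [(x, line(x)) for x in [0, 1, 2, 3, 5, 10]]

print(slope(points[0], points[1]))   # 2.0
print(slope(points[2], points[5]))   # 2.0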


Now, let's consider the equation y = x².

Image by Author

Let's plot these values.

Image by Author

y = x² represents a curve (a parabola).

What is the slope of this curve?

Do we have a single slope for this curve?

NO.

We can observe that the slope changes continuously, meaning the rate of change in y with respect to x is not the same throughout the curve.

This shows that the slope changes from one point on the curve to another.

In other words, we can find the slope at each specific point, but there isn't one single slope that represents the entire curve.

So, how do we find the slope of this curve?

This is where we introduce differentiation.

First, let's consider a point x on the x-axis and another point that is at a distance h from it, i.e., the point x + h.

The corresponding y-coordinates for these x-values would be f(x) and f(x + h), since y is a function of x.

Now we have two points on the curve, (x, f(x)) and (x + h, f(x + h)).

We join these two points, and the line that joins two points on a curve is called a secant line.

Let's find the slope between these two points.

$$
\text{slope} = \frac{f(x + h) - f(x)}{(x + h) - x}
$$

This gives us the average rate of change of y with respect to x over that interval.

But since we want to find the slope at a particular point, we gradually decrease the distance h between the two points.

As these two points come closer and eventually coincide, the secant line (which joins the two points) becomes a tangent line to the curve at that point. This limiting value of the slope can be found using the concept of limits.

A tangent line is a straight line that just touches a curve at one single point.

It shows the instantaneous slope of the curve at that point.

$$
\frac{dy}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
$$

Image by Author
GIF by Author

This is the idea behind differentiation.
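
As a small numerical sketch of this limit, we can shrink h and watch the secant slope at x = 2 settle toward the tangent slope:

def f(x):
    return x ** 2

def secant_slope(x, h):
    """Average rate of change of f between x and x + h."""
    return (f(x + h) - f(x)) / h

# As h shrinks, the secant slope at x = 2 approaches the tangent slope 2x = 4
for h in [1.0, 0.1, 0.01, 0.001, 1e-6]:
    print(h, secant_slope(2.0, h))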

Now let's find the slope of the curve y = x².

$$
\text{Given: } f(x) = x^2
$$

$$
\text{Derivative: } f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
$$
$$
= \lim_{h \to 0} \frac{(x + h)^2 - x^2}{h}
$$
$$
= \lim_{h \to 0} \frac{x^2 + 2xh + h^2 - x^2}{h}
$$
$$
= \lim_{h \to 0} \frac{2xh + h^2}{h}
$$
$$
= \lim_{h \to 0} (2x + h)
$$
$$
= 2x
$$

2x is the slope of the curve y = x².

For example, at x = 2 on the curve y = x², the slope is 2x = 2 × 2 = 4.

At this point, we have the coordinate (2, 4) on the curve, and the slope at that point is 4.

This means that at that exact point, for every 1 unit change in x, there is a 4 unit change in y.

Now consider x = 0: the slope is 2 × 0 = 0,
which means there is no change in y with respect to x,
and y = 0.

At the point (0, 0) the slope is 0, which means (0, 0) is the minimum point.
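
The same conclusion can be reproduced symbolically; here is a small sketch using SymPy (an extra library, not used elsewhere in this post):

import sympy as sp

x = sp.symbols("x")
y = x ** 2

slope = sp.diff(y, x)           # derivative: 2*x
critical = sp.solve(slope, x)   # where the slope is zero: [0]

print(slope)      # 2*x
print(critical)   # [0]  ->  minimum of y = x² at x = 0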

Now that we've understood the basics of differentiation, let's proceed to find the best-fitting plane.


Now, let's return to the cost function

$$
SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2})^2
$$

This also represents a curve, since it contains squared terms.

In simple linear regression the cost function is:

$$
SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2
$$

When we try a range of slope and intercept values and plot the SSR for each, we see a bowl-shaped surface.

Image by Author

Just as in simple linear regression, we need to find the point where the slope equals zero, which is the point at which we get the minimum value of the Sum of Squared Residuals (SSR).

Here, this corresponds to finding the values of β₀, β₁ and β₂ at which the SSR is minimal. This happens when the derivatives of SSR with respect to each coefficient are equal to zero.

In other words, at this point there is no change in SSR even for a slight change in β₀, β₁ or β₂, indicating that we have reached the minimum of the cost function.
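
Written out, these are the three conditions we will derive below:

$$
\frac{\partial\, SSR}{\partial \beta_0} = 0, \qquad
\frac{\partial\, SSR}{\partial \beta_1} = 0, \qquad
\frac{\partial\, SSR}{\partial \beta_2} = 0
$$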


In simple terms: in our y = x² example, the derivative (slope) 2x equals 0 at x = 0, and at that point y is minimal, which in this case is zero.

Now, in our loss function, think of SSR as playing the role of y. We are looking for the point where the slope of the loss function becomes zero.

In the y = x² example, the slope depends on just one variable x, but in our loss function the slope depends on three variables: β₀, β₁ and β₂.

So, we need to find this point in a four-dimensional space. Just as we got (0, 0) as the minimum point for y = x², in MLR we need to find the point (β₀, β₁, β₂, SSR) where the slope (derivative) equals zero.


Now let's proceed with the derivation.

Since the Sum of Squared Residuals (SSR) depends on the parameters β₀, β₁ and β₂,
we can represent it as a function of these parameters:

$$
L(\beta_0, \beta_1, \beta_2) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2})^2
$$

Derivation:

Here, we are working with three variables, so we cannot use ordinary differentiation. Instead, we differentiate with respect to each variable separately while keeping the others constant. This process is called partial differentiation.

Partial Differentiation w.r.t. β₀

$$
\textbf{Loss:}\quad L(\beta_0,\beta_1,\beta_2)=\sum_{i=1}^{n}\big(y_i-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2}\big)^2
$$

$$
\textbf{Let } e_i = y_i-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2}\quad\Rightarrow\quad L=\sum e_i^2.
$$
$$
\textbf{Differentiate:}\quad
\frac{\partial L}{\partial \beta_0}
= \sum_{i=1}^{n} 2 e_i \cdot \frac{\partial e_i}{\partial \beta_0}
\quad\text{(chain rule: } \tfrac{d}{d\theta}u^2=2u\,\tfrac{du}{d\theta}\text{)}
$$
$$
\text{But }\frac{\partial e_i}{\partial \beta_0}
=\frac{\partial}{\partial \beta_0}(y_i-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2})
=\frac{\partial y_i}{\partial \beta_0}
-\frac{\partial \beta_0}{\partial \beta_0}
-\frac{\partial (\beta_1 x_{i1})}{\partial \beta_0}
-\frac{\partial (\beta_2 x_{i2})}{\partial \beta_0}.
$$
$$
\text{Since } y_i,\; x_{i1},\; x_{i2} \text{ are constants w.r.t. } \beta_0,\;
\text{their derivatives are zero. Hence } \frac{\partial e_i}{\partial \beta_0}=-1.
$$
$$
\Rightarrow\quad \frac{\partial L}{\partial \beta_0}
= \sum 2 e_i \cdot (-1) = -2\sum_{i=1}^{n} e_i.
$$
$$
\textbf{Set to zero (first-order condition):}\quad
\frac{\partial L}{\partial \beta_0}=0 \;\Rightarrow\; \sum_{i=1}^{n} e_i = 0.
$$
$$
\textbf{Expand } e_i:\quad
\sum_{i=1}^{n}\big(y_i-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2}\big)=0
\;\Rightarrow\;
\sum y_i - n\beta_0 - \beta_1\sum x_{i1} - \beta_2\sum x_{i2}=0.
$$
$$
\textbf{Solve for } \beta_0:\quad
\beta_0=\bar{y}-\beta_1 \bar{x}_1-\beta_2 \bar{x}_2
\quad\text{(divide by }n\text{ and use } \bar{y}=\tfrac{1}{n}\sum y_i,\; \bar{x}_k=\tfrac{1}{n}\sum x_{ik}).
$$
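
As a quick numerical check of this first-order condition (a sketch that assumes X, y and model from the fitting code above are still in scope), the residuals of the fitted model sum to zero, and the x-weighted sums we derive next are zero as well:

# Assumes X, y, and model from the earlier fitting code are in scope
residuals = y - model.predict(X)

print(residuals.sum())                    # ~0 (up to floating-point error)
print((X["Height"] * residuals).sum())    # ~0, matching the β₁ condition below
print((X["Width"] * residuals).sum())     # ~0, matching the β₂ condition below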


Partial Differentiation w.r.t. β₁

$$
\textbf{Differentiate:}\quad
\frac{\partial L}{\partial \beta_1}
= \sum_{i=1}^{n} 2 e_i \cdot \frac{\partial e_i}{\partial \beta_1}.
$$

$$
\text{Here }\frac{\partial e_i}{\partial \beta_1}
=\frac{\partial}{\partial \beta_1}(y_i-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2})=-x_{i1}.
$$
$$
\Rightarrow\quad
\frac{\partial L}{\partial \beta_1}
= \sum 2 e_i (-x_{i1})
= -2\sum_{i=1}^{n} x_{i1} e_i.
$$
$$
\textbf{Set to zero:}\quad
\frac{\partial L}{\partial \beta_1}=0
\;\Rightarrow\; \sum_{i=1}^{n} x_{i1} e_i = 0.
$$
$$
\textbf{Expand } e_i:\quad
\sum x_{i1}\big(y_i-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2}\big)=0
$$
$$
\Rightarrow\;
\sum x_{i1}y_i - \beta_0\sum x_{i1} - \beta_1\sum x_{i1}^2 - \beta_2\sum x_{i1}x_{i2}=0.
$$


Partial Differentiation w.r.t. β₂

$$
\textbf{Differentiate:}\quad
\frac{\partial L}{\partial \beta_2}
= \sum_{i=1}^{n} 2 e_i \cdot \frac{\partial e_i}{\partial \beta_2}.
$$

$$
\text{Here }\frac{\partial e_i}{\partial \beta_2}
=\frac{\partial}{\partial \beta_2}(y_i-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2})=-x_{i2}.
$$
$$
\Rightarrow\quad
\frac{\partial L}{\partial \beta_2}
= \sum 2 e_i (-x_{i2})
= -2\sum_{i=1}^{n} x_{i2} e_i.
$$
$$
\textbf{Set to zero:}\quad
\frac{\partial L}{\partial \beta_2}=0
\;\Rightarrow\; \sum_{i=1}^{n} x_{i2} e_i = 0.
$$
$$
\textbf{Expand } e_i:\quad
\sum x_{i2}\big(y_i-\beta_0-\beta_1 x_{i1}-\beta_2 x_{i2}\big)=0
$$
$$
\Rightarrow\;
\sum x_{i2}y_i - \beta_0\sum x_{i2} - \beta_1\sum x_{i1}x_{i2} - \beta_2\sum x_{i2}^2=0.
$$


We obtained these three equations after performing partial differentiation:

$$
\sum y_i - n\beta_0 - \beta_1\sum x_{i1} - \beta_2\sum x_{i2} = 0 \quad (1)
$$

$$
\sum x_{i1}y_i - \beta_0\sum x_{i1} - \beta_1\sum x_{i1}^2 - \beta_2\sum x_{i1}x_{i2} = 0 \quad (2)
$$
$$
\sum x_{i2}y_i - \beta_0\sum x_{i2} - \beta_1\sum x_{i1}x_{i2} - \beta_2\sum x_{i2}^2 = 0 \quad (3)
$$
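
These normal equations are linear in β₀, β₁ and β₂, so before working through the algebra by hand, here is a sketch that assembles and solves them numerically with NumPy (reusing the data list from the fitting code above):

import numpy as np

d = np.array(data)                      # columns: Height (x1), Width (x2), Weight (y)
x1, x2, yv = d[:, 0], d[:, 1], d[:, 2]
n = len(yv)

# Coefficient matrix and right-hand side of equations (1)-(3)
A = np.array([
    [n,         x1.sum(),        x2.sum()],
    [x1.sum(),  (x1**2).sum(),   (x1*x2).sum()],
    [x2.sum(),  (x1*x2).sum(),   (x2**2).sum()],
])
b = np.array([yv.sum(), (x1*yv).sum(), (x2*yv).sum()])

beta0, beta1, beta2 = np.linalg.solve(A, b)
print(beta0, beta1, beta2)              # ≈ -1005.28, 78.14, 82.06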

Now we solve these three equations to get the values of β₀, β₁ and β₂.

From equation (1):

$$
\sum y_i - n\beta_0 - \beta_1\sum x_{i1} - \beta_2\sum x_{i2} = 0
$$

Rearranged:

$$
n\beta_0 = \sum y_i - \beta_1\sum x_{i1} - \beta_2\sum x_{i2}
$$

Divide both sides by n:

$$
\beta_0 = \frac{1}{n}\sum y_i - \beta_1\frac{1}{n}\sum x_{i1} - \beta_2\frac{1}{n}\sum x_{i2}
$$

Define the averages:

$$
\bar{y} = \frac{1}{n}\sum y_i,\quad
\bar{x}_1 = \frac{1}{n}\sum x_{i1},\quad
\bar{x}_2 = \frac{1}{n}\sum x_{i2}
$$

Final form for the intercept:

$$
\beta_0 = \bar{y} - \beta_1\bar{x}_1 - \beta_2\bar{x}_2
$$


Let's substitute β₀ into equation (2).

Step 1: Start with Equation (2)

$$
\sum x_{i1}y_i - \beta_0\sum x_{i1} - \beta_1\sum x_{i1}^2 - \beta_2\sum x_{i1}x_{i2} = 0
$$

Step 2: Substitute the expression for β₀

$$
\beta_0 = \frac{\sum y_i - \beta_1\sum x_{i1} - \beta_2\sum x_{i2}}{n}
$$

Step 3: Substitute into Equation (2)

$$
\sum x_{i1}y_i
- \left( \frac{\sum y_i - \beta_1\sum x_{i1} - \beta_2\sum x_{i2}}{n} \right)\sum x_{i1}
- \beta_1 \sum x_{i1}^2
- \beta_2 \sum x_{i1}x_{i2} = 0
$$

Step 4: Expand and simplify

$$
\sum x_{i1}y_i
- \frac{ \sum x_{i1} \sum y_i }{n}
+ \beta_1 \cdot \frac{ ( \sum x_{i1} )^2 }{n}
+ \beta_2 \cdot \frac{ \sum x_{i1} \sum x_{i2} }{n}
- \beta_1 \sum x_{i1}^2
- \beta_2 \sum x_{i1}x_{i2}
= 0
$$

Step 5: Rearranged form (Equation 4)

$$
\beta_1 \left( \sum x_{i1}^2 - \frac{ ( \sum x_{i1} )^2 }{n} \right)
+
\beta_2 \left( \sum x_{i1}x_{i2} - \frac{ \sum x_{i1} \sum x_{i2} }{n} \right)
=
\sum x_{i1}y_i - \frac{ \sum x_{i1} \sum y_i }{n}
\quad \text{(4)}
$$


Now substituting β₀ into equation (3):

Step 1: Start with Equation (3)

$$
\sum x_{i2}y_i - \beta_0\sum x_{i2} - \beta_1\sum x_{i1}x_{i2} - \beta_2\sum x_{i2}^2 = 0
$$

Step 2: Use the expression for β₀

$$
\beta_0 = \frac{\sum y_i - \beta_1\sum x_{i1} - \beta_2\sum x_{i2}}{n}
$$

Step 3: Substitute β₀ into Equation (3)

$$
\sum x_{i2}y_i
- \left( \frac{\sum y_i - \beta_1\sum x_{i1} - \beta_2\sum x_{i2}}{n} \right)\sum x_{i2}
- \beta_1 \sum x_{i1}x_{i2}
- \beta_2 \sum x_{i2}^2 = 0
$$

Step 4: Expand the expression

$$
\sum x_{i2}y_i
- \frac{ \sum x_{i2} \sum y_i }{n}
+ \beta_1 \cdot \frac{ \sum x_{i1} \sum x_{i2} }{n}
+ \beta_2 \cdot \frac{ ( \sum x_{i2} )^2 }{n}
- \beta_1 \sum x_{i1}x_{i2}
- \beta_2 \sum x_{i2}^2 = 0
$$

Step 5: Rearranged form (Equation 5)

$$
\beta_1 \left( \sum x_{i1}x_{i2} - \frac{ \sum x_{i1} \sum x_{i2} }{n} \right)
+
\beta_2 \left( \sum x_{i2}^2 - \frac{ ( \sum x_{i2} )^2 }{n} \right)
=
\sum x_{i2}y_i - \frac{ \sum x_{i2} \sum y_i }{n}
\quad \text{(5)}
$$


We now have these two equations:

$$
\beta_1 \left( \sum x_{i1}^2 - \frac{ \left( \sum x_{i1} \right)^2 }{n} \right)
+
\beta_2 \left( \sum x_{i1}x_{i2} - \frac{ \sum x_{i1} \sum x_{i2} }{n} \right)
=
\sum x_{i1}y_i - \frac{ \sum x_{i1} \sum y_i }{n}
\quad \text{(4)}
$$

$$
\beta_1 \left( \sum x_{i1}x_{i2} - \frac{ \sum x_{i1} \sum x_{i2} }{n} \right)
+
\beta_2 \left( \sum x_{i2}^2 - \frac{ \left( \sum x_{i2} \right)^2 }{n} \right)
=
\sum x_{i2}y_i - \frac{ \sum x_{i2} \sum y_i }{n}
\quad \text{(5)}
$$

Now, we use Cramer's rule on equations (4) and (5) to get the formulas for β₁ and β₂.

Let us define:

$$
A = \sum x_{i1}^2 - \frac{(\sum x_{i1})^2}{n}, \qquad
B = \sum x_{i1}x_{i2} - \frac{(\sum x_{i1})(\sum x_{i2})}{n}, \qquad
D = \sum x_{i2}^2 - \frac{(\sum x_{i2})^2}{n}
$$

$$
C = \sum x_{i1}y_i - \frac{(\sum x_{i1})(\sum y_i)}{n}, \qquad
E = \sum x_{i2}y_i - \frac{(\sum x_{i2})(\sum y_i)}{n}
$$

Now, rewrite the system:

$$
\begin{cases}
\beta_1 A + \beta_2 B = C \\
\beta_1 B + \beta_2 D = E
\end{cases}
$$

We solve this 2×2 system using Cramer's Rule.

First, compute the determinant:

$$
\Delta = AD - B^2
$$

Then apply Cramer's Rule:

$$
\beta_1 = \frac{CD - BE}{AD - B^2}, \qquad
\beta_2 = \frac{AE - BC}{AD - B^2}
$$
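
Here is a short sketch of the same 2×2 solve in code, building A, B, C, D and E from the Height (x₁), Width (x₂) and Weight (y) columns of our sample (again reusing the data list from the fitting code):

import numpy as np

d = np.array(data)                      # [Height, Width, Weight]
x1, x2, yv = d[:, 0], d[:, 1], d[:, 2]
n = len(yv)

A = (x1**2).sum() - x1.sum()**2 / n
B = (x1*x2).sum() - x1.sum()*x2.sum() / n
D = (x2**2).sum() - x2.sum()**2 / n
C = (x1*yv).sum() - x1.sum()*yv.sum() / n
E = (x2*yv).sum() - x2.sum()*yv.sum() / n

beta1 = (C*D - B*E) / (A*D - B**2)      # Cramer's rule
beta2 = (A*E - B*C) / (A*D - B**2)
beta0 = yv.mean() - beta1*x1.mean() - beta2*x2.mean()

print(beta0, beta1, beta2)              # ≈ -1005.28, 78.14, 82.06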

Now substitute back the original summation terms:

$$
\beta_1 =
\frac{
\left( \sum x_{i2}^2 - \frac{(\sum x_{i2})^2}{n} \right)
\left( \sum x_{i1}y_i - \frac{(\sum x_{i1})(\sum y_i)}{n} \right)
-
\left( \sum x_{i1}x_{i2} - \frac{(\sum x_{i1})(\sum x_{i2})}{n} \right)
\left( \sum x_{i2}y_i - \frac{(\sum x_{i2})(\sum y_i)}{n} \right)
}{
\left( \sum x_{i1}^2 - \frac{(\sum x_{i1})^2}{n} \right)
\left( \sum x_{i2}^2 - \frac{(\sum x_{i2})^2}{n} \right)
-
\left( \sum x_{i1}x_{i2} - \frac{(\sum x_{i1})(\sum x_{i2})}{n} \right)^2
}
$$

$$
\beta_2 =
\frac{
\left( \sum x_{i1}^2 - \frac{(\sum x_{i1})^2}{n} \right)
\left( \sum x_{i2}y_i - \frac{(\sum x_{i2})(\sum y_i)}{n} \right)
-
\left( \sum x_{i1}x_{i2} - \frac{(\sum x_{i1})(\sum x_{i2})}{n} \right)
\left( \sum x_{i1}y_i - \frac{(\sum x_{i1})(\sum y_i)}{n} \right)
}{
\left( \sum x_{i1}^2 - \frac{(\sum x_{i1})^2}{n} \right)
\left( \sum x_{i2}^2 - \frac{(\sum x_{i2})^2}{n} \right)
-
\left( \sum x_{i1}x_{i2} - \frac{(\sum x_{i1})(\sum x_{i2})}{n} \right)^2
}
$$

If the data are centered (means are zero), then the second terms vanish and we get the simplified form:

$$
\beta_1 =
\frac{
(\sum x_{i2}^2)(\sum x_{i1}y_i)
-
(\sum x_{i1}x_{i2})(\sum x_{i2}y_i)
}{
(\sum x_{i1}^2)(\sum x_{i2}^2) - (\sum x_{i1}x_{i2})^2
}
$$

$$
\beta_2 =
\frac{
(\sum x_{i1}^2)(\sum x_{i2}y_i)
-
(\sum x_{i1}x_{i2})(\sum x_{i1}y_i)
}{
(\sum x_{i1}^2)(\sum x_{i2}^2) - (\sum x_{i1}x_{i2})^2
}
$$

Finally, we have derived the formulas for β₁ and β₂.


Let us compute β₀, β₁ and β₂ for our sample dataset, but before that let's understand what centering actually means.

We start with a small dataset of 3 observations and 2 features:

$$
\begin{array}{cccc}
\hline
i & x_{i1} & x_{i2} & y_i \\
\hline
1 & 2 & 3 & 10 \\
2 & 4 & 5 & 14 \\
3 & 6 & 7 & 18 \\
\hline
\end{array}
$$

Step 1: Compute means

$$
\bar{x}_1 = \frac{2 + 4 + 6}{3} = 4, \quad
\bar{x}_2 = \frac{3 + 5 + 7}{3} = 5, \quad
\bar{y} = \frac{10 + 14 + 18}{3} = 14
$$

Step 2: Center the data (subtract the mean)

$$
x'_{i1} = x_{i1} - \bar{x}_1, \quad
x'_{i2} = x_{i2} - \bar{x}_2, \quad
y'_i = y_i - \bar{y}
$$

$$
\begin{array}{cccc}
\hline
i & x'_{i1} & x'_{i2} & y'_i \\
\hline
1 & -2 & -2 & -4 \\
2 & 0 & 0 & 0 \\
3 & +2 & +2 & +4 \\
\hline
\end{array}
$$

Now check the sums:

$$
\sum x'_{i1} = -2 + 0 + 2 = 0, \quad
\sum x'_{i2} = -2 + 0 + 2 = 0, \quad
\sum y'_i = -4 + 0 + 4 = 0
$$

Step 3: Understand what centering does to certain terms

In the normal equations, we see terms like:

$$
\sum x_{i1} y_i - \frac{ \sum x_{i1} \sum y_i }{n}
$$

If the data are centered:

$$
\sum x_{i1} = 0, \quad \sum y_i = 0 \quad \Rightarrow \quad \frac{0 \cdot 0}{n} = 0
$$

So the term becomes:

$$
\sum x_{i1} y_i
$$

And if we directly use the centered values:

$$
\sum x'_{i1} y'_i
$$

These are equal:

$$
\sum (x_{i1} - \bar{x}_1)(y_i - \bar{y}) = \sum x_{i1} y_i - \frac{ \sum x_{i1} \sum y_i }{n}
$$

Step 4: Compare raw and centered calculations

Using the original values:

$$
\sum x_{i1} y_i = (2)(10) + (4)(14) + (6)(18) = 184
$$

$$
\sum x_{i1} = 12, \quad \sum y_i = 42, \quad n = 3
$$

$$
\frac{12 \cdot 42}{3} = 168
$$

$$
\sum x_{i1} y_i - \frac{ \sum x_{i1} \sum y_i }{n} = 184 - 168 = 16
$$

Now using the centered values:

$$
\sum x'_{i1} y'_i = (-2)(-4) + (0)(0) + (2)(4) = 8 + 0 + 8 = 16
$$

Same result.
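
The same check in code, for the toy 3-point example above:

import numpy as np

x1 = np.array([2.0, 4.0, 6.0])
yv = np.array([10.0, 14.0, 18.0])
n = len(yv)

raw = (x1 * yv).sum() - x1.sum() * yv.sum() / n          # 184 - 168 = 16
centered = ((x1 - x1.mean()) * (yv - yv.mean())).sum()   # 16

print(raw, centered)   # both 16.0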

Step 5: Why we center

– Simplifies the formulas by removing extra terms
– Ensures the mean of every variable is zero
– Improves numerical stability
– Makes the intercept easier to calculate:

$$
\beta_0 = \bar{y} - \beta_1 \bar{x}_1 - \beta_2 \bar{x}_2
$$

Step 6:

After centering, we can directly use:

$$
\sum (x'_{i1})(y'_i), \quad
\sum (x'_{i2})(y'_i), \quad
\sum (x'_{i1})^2, \quad
\sum (x'_{i2})^2, \quad
\sum (x'_{i1})(x'_{i2})
$$

and the simplified formulas for β₁ and β₂ become easier to compute.

This is how we derived the formulas for β₀, β₁ and β₂.

$$
\beta_1 =
\frac{
\left( \sum x_{i2}^2 \right)\left( \sum x_{i1} y_i \right)
-
\left( \sum x_{i1} x_{i2} \right)\left( \sum x_{i2} y_i \right)
}{
\left( \sum x_{i1}^2 \right)\left( \sum x_{i2}^2 \right)
-
\left( \sum x_{i1} x_{i2} \right)^2
}
$$

$$
\beta_2 =
\frac{
\left( \sum x_{i1}^2 \right)\left( \sum x_{i2} y_i \right)
-
\left( \sum x_{i1} x_{i2} \right)\left( \sum x_{i1} y_i \right)
}{
\left( \sum x_{i1}^2 \right)\left( \sum x_{i2}^2 \right)
-
\left( \sum x_{i1} x_{i2} \right)^2
}
$$

$$
\beta_0 = \bar{y}
\quad \text{(since the data are centered)}
$$

Note: After centering, we continue using the same symbols xᵢ₁, xᵢ₂, yᵢ to represent the centered variables.


Now, let's compute β₀, β₁ and β₂ for our sample dataset.

Step 1: Compute Means (Original Data)

$$
\bar{x}_1 = \frac{1}{n} \sum x_{i1} = 13.841, \quad
\bar{x}_2 = \frac{1}{n} \sum x_{i2} = 4.9385, \quad
\bar{y} = \frac{1}{n} \sum y_i = 481.5
$$

Step 2: Center the Data

$$
x'_{i1} = x_{i1} - \bar{x}_1, \quad
x'_{i2} = x_{i2} - \bar{x}_2, \quad
y'_i = y_i - \bar{y}
$$

Step 3: Compute Centered Summations

$$
\sum x'_{i1} y'_i = 2465.60, \quad
\sum x'_{i2} y'_i = 816.57
$$

$$
\sum (x'_{i1})^2 = 24.3876, \quad
\sum (x'_{i2})^2 = 3.4531, \quad
\sum x'_{i1} x'_{i2} = 6.8238
$$

Step 4: Compute the Shared Denominator

$$
\Delta = (24.3876)(3.4531) - (6.8238)^2 = 37.6470
$$

Step 5: Compute the Slopes

$$
\beta_1 =
\frac{
(3.4531)(2465.60) - (6.8238)(816.57)
}{
37.6470
}
=
\frac{2940.99}{37.6470}
= 78.14
$$

$$
\beta_2 =
\frac{
(24.3876)(816.57) - (6.8238)(2465.60)
}{
37.6470
}
=
\frac{3089.79}{37.6470}
= 82.06
$$

Note: While the slopes were computed using centered variables, the final model uses the original variables.
So, compute the intercept using:

$$
\beta_0 = \bar{y} - \beta_1 \bar{x}_1 - \beta_2 \bar{x}_2
$$

Step 6: Compute the Intercept

$$
\beta_0 = 481.5 - (78.14)(13.841) - (82.06)(4.9385)
$$

$$
= 481.5 - 1081.77 - 405.01 = -1005.28
$$

Final Regression Equation:

$$
\hat{y}_i = -1005.28 + 78.14 \cdot x_{i1} + 82.06 \cdot x_{i2}
$$

This is how we get the final slope and intercept values reported when we apply multiple linear regression in Python.
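
As a final sanity check (a sketch assuming df and model from the fitting code at the top are still in scope), the hand-derived equation reproduces sklearn's predictions up to the rounding of the coefficients:

# Hand-derived equation vs. sklearn's fitted model
manual_pred = -1005.28 + 78.14 * df["Height"] + 82.06 * df["Width"]
sklearn_pred = model.predict(df[["Height", "Width"]])

# Largest absolute difference, driven only by rounding the coefficients
print((manual_pred - sklearn_pred).abs().max())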


    Dataset

The dataset used in this blog is the Fish Market dataset, which contains measurements of fish species sold in markets, including attributes like weight, height, and width.

It is publicly available on Kaggle and is licensed under the Creative Commons Zero (CC0 Public Domain) license. This means it can be freely used, modified, and shared for both non-commercial and commercial purposes without restriction.


Whether you're new to machine learning or simply curious about the math behind multiple linear regression, I hope this blog gave you some clarity.

Stay tuned for Part 2, where we'll see what changes when more than two predictors come into play.

In the meantime, if you're curious about how credit scoring models are evaluated, my recent blog on the Gini Coefficient explains it in simple terms. You can read it here.

Thanks for reading!


