    PyTorch Tutorial for Beginners: Build a Multiple Regression Model from Scratch

    By ProfitlyAI · November 19, 2025 · 15 min read


    Before LLMs became hyped, there was an almost visible line separating Machine Learning frameworks from Deep Learning frameworks.

    The talk centered on Scikit-Learn, XGBoost, and similar libraries for ML, while PyTorch and TensorFlow dominated the scene when Deep Learning was the topic.

    After the AI explosion, though, I have been seeing PyTorch dominate the scene much more than TensorFlow. Both frameworks are really powerful, enabling Data Scientists to solve different kinds of problems, Natural Language Processing being one of them, which has increased the popularity of Deep Learning once again.

    Well, in this post, my idea is not to talk about NLP. Instead, I will work through a multivariable linear regression problem with two goals in mind:

    • Teaching how to create a model using PyTorch
    • Sharing knowledge about Linear Regression that is not always found in other tutorials.

    Let’s dive in.

    Preparing the Data

    Alright, let me spare you a fancy definition of Linear Regression. You have probably seen it too many times in various tutorials all over the Internet. Suffice it to say that when you have a variable Y that you want to predict and another variable X that can explain Y's variation using a straight line, that is, in essence, Linear Regression.
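    In the multiple (multivariable) case we will fit here, that line simply extends to several predictors: Y = β₀ + β₁X₁ + β₂X₂ + … + βₚXₚ + ε.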

    Dataset

    For this exercise, let's use the Abalone dataset [1].

    Nash, W., Sellers, T., Talbot, S., Cawthorn, A., & Ford, W. (1994). Abalone [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C55C7W.

    According to the dataset documentation, the age of an abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope, a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age.

    So, let's go ahead and load the data. Additionally, we will One-Hot Encode the variable Sex, since it is the only categorical one.

    # Data load
    from ucimlrepo import fetch_ucirepo
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.set_style('darkgrid')
    from feature_engine.encoding import OneHotEncoder
    
    # fetch dataset
    abalone = fetch_ucirepo(id=1)
    
    # data (as pandas dataframes)
    X = abalone.data.features
    y = abalone.data.targets
    
    # One-Hot Encode Sex
    ohe = OneHotEncoder(variables=['Sex'])
    X = ohe.fit_transform(X)
    
    # View
    df = pd.concat([X, y], axis=1)

    Here is the dataset.

    Dataset header. Image by the author.

    So, in order to create a better model, let's explore the data.

    Exploring the Data

    The first steps I like to perform when exploring a dataset are:

    1. Checking the target variable's distribution.

    # Our target variable
    plt.hist(y)
    plt.title('Rings [Target Variable] Distribution');

    The graph shows that the target variable is not normally distributed. That can impact the regression, but it can usually be corrected with a power transformation, such as log or Box-Cox.

    Target variable distribution. Image by the author.
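    To make that concrete, here is a minimal sketch of both transformations applied to the target (assuming scipy is available in the environment):

    # Sketch: comparing power transformations of the target
    from scipy import stats
    
    rings = y['Rings']
    log_rings = np.log(rings)                # simple log transform
    boxcox_rings, lam = stats.boxcox(rings)  # Box-Cox estimates the best lambda
    
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    axes[0].hist(log_rings, bins=25)
    axes[0].set_title('log(Rings)')
    axes[1].hist(boxcox_rings, bins=25)
    axes[1].set_title(f'Box-Cox(Rings), lambda={lam:.2f}')
    plt.show()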

    2. Looking at the statistical description.

    The stats show us important information like the mean and standard deviation, and let us quickly spot discrepancies in the minimum or maximum values. The explanatory variables are pretty much fine, within a small range and on the same scale. The target variable (Rings) is on a different scale.

    # Statistical description
    df.describe()
    Statistical description. Image by the author.

    Next, let's check the correlations.

    # Looking at the correlations
    (df
     .drop(['Sex_M', 'Sex_I', 'Sex_F'], axis=1)
     .corr()
     .style
     .background_gradient(cmap='coolwarm')
    )
    Correlations. Image by the author.

    The explanatory variables have a moderate to strong correlation with Rings. We can also see that there is some collinearity between Whole_weight and Shucked_weight, Viscera_weight, and Shell_weight. Length and Diameter are also collinear. We can test removing them later.

    sns.pairplot(df);

    When we plot the pairwise scatterplots and look at the relationship of the variables with Rings, we can quickly identify some problems:

    • The assumption of homoscedasticity is violated, meaning the relationship is not homogeneous in terms of variance.
    • Look how the plots form a cone shape, with the variance of Y increasing as the X values increase. When estimating the value of Rings for higher values of the X variables, the estimate will not be very accurate.
    • The variable Height has at least two outliers that are clearly visible when Height > 0.3.
    Pairplots, no transformation. Image by the author.

    Removing the outliers and transforming the target variable to logarithms results in the next pair plot. It is better, but still does not solve the homoscedasticity problem.

    Pairplots after transformation. Image by the author.

    Another quick exploration we can do is plotting some graphs to check the relationship of the variables when grouped by the Sex variable.

    The variable Diameter has the most linear relationship when Sex=I, but that is about it.

    # Create a FacetGrid with scatterplots
    sns.lmplot(x="Diameter", y="Rings", hue="Sex", col="Sex", order=2, data=df);
    Diameter x Rings. Image by the author.

    On the other hand, Shell_weight shows too much dispersion for high values, distorting the linear relationship.

    # Create a FacetGrid with scatterplots
    sns.lmplot(x="Shell_weight", y="Rings", hue="Sex", col="Sex", data=df);
    Shell_weight x Rings. Image by the author.

    All of this shows that a Linear Regression model will be really challenging for this dataset, and will probably fail. But we still want to do it.

    By the way, I don't remember seeing a post where we actually go through what went wrong. So, by doing this, we can also learn valuable lessons.

    Modeling: Using Scikit-Learn

    Let's run the sklearn model and evaluate it using the Root Mean Squared Error.

    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import root_mean_squared_error
    
    df2 = df.query('Height < 0.3 and Rings > 2').copy()
    X = df2.drop(['Rings'], axis=1)
    y = np.log(df2['Rings'])
    
    lr = LinearRegression()
    lr.fit(X, y)
    
    predictions = lr.predict(X)
    
    df2['Predictions'] = np.exp(predictions)
    print(root_mean_squared_error(df2['Rings'], df2['Predictions']))
    2.2383762717104916

    If we look at the header, we can confirm that the model struggles with the estimates for higher values (e.g., rows 0, 6, 7, and 9).

    Header with predictions. Image by the author.
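    If you want to reproduce that view, a quick sketch (reusing the df2 frame with the Predictions column) is simply:

    # Inspect actuals vs. predictions for the first rows
    print(df2[['Rings', 'Predictions']].head(10))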

    One Step Back: Trying Other Transformations

    Alright. So what can we do now?

    Probably remove more outliers and try again. Let's try using an unsupervised algorithm to find some more outliers. We will apply the Local Outlier Factor, dropping 5% of the points as outliers.

    We will also remove the multicollinearity by dropping Whole_weight and Length.

    from sklearn.neighbors import LocalOutlierFactor
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import Pipeline
    
    # fetch dataset
    abalone = fetch_ucirepo(id=1)
    
    # data (as pandas dataframes)
    X = abalone.data.features
    y = abalone.data.targets
    
    # One-Hot Encode Sex
    ohe = OneHotEncoder(variables=['Sex'])
    X = ohe.fit_transform(X)
    
    # Drop Whole_weight and Length (multicollinearity)
    X.drop(['Whole_weight', 'Length'], axis=1, inplace=True)
    
    # View
    df = pd.concat([X, y], axis=1)
    
    # Let's create a Pipeline to scale the data and find outliers with the Local Outlier Factor
    steps = [
        ('scale', StandardScaler()),
        ('LOF', LocalOutlierFactor(contamination=0.05))
    ]
    # Fit and predict
    outliers = Pipeline(steps).fit_predict(X)
    
    # Add column
    df['outliers'] = outliers
    
    # Modeling
    df2 = df.query('Height < 0.3 and Rings > 2 and outliers != -1').copy()
    X = df2.drop(['Rings', 'outliers'], axis=1)
    y = np.log(df2['Rings'])
    
    lr = LinearRegression()
    lr.fit(X, y)
    
    predictions = lr.predict(X)
    
    df2['Predictions'] = np.exp(predictions)
    print(root_mean_squared_error(df2['Rings'], df2['Predictions']))
    2.238174395913869

    Same result. Hmm…

    Okay. We can keep playing with the variables and feature engineering, and we will start seeing some improvements here and there, like when we add the squares of Height, Diameter, and Shell_weight. That, added to the outlier treatment, drops the RMSE to 2.196.

    # Second Order Variables
    X['Diameter_2'] = X['Diameter'] ** 2
    X['Height_2'] = X['Height'] ** 2
    X['Shell_2'] = X['Shell_weight'] ** 2

    Certainly, it is fair to note that every variable added to a Linear Regression model will impact the R² and sometimes inflate the result, giving the false idea that the model is improving when it is not. In this case, the model is actually improving, since we are adding some non-linear components to it with the second-order variables. We can verify that by calculating the adjusted R², which penalizes extra predictors. It went from 0.495 to 0.517.

    # Adjusted R²
    from sklearn.metrics import r2_score
    
    r2 = r2_score(df2['Rings'], df2['Predictions'])
    n = df2.shape[0]
    p = df2.shape[1] - 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    print(f'R²: {r2}')
    print(f'Adjusted R²: {adj_r2}')

    On the other hand, bringing back Whole_weight and Length can improve the numbers a little more, but I would not recommend it. If we do that, we are adding multicollinearity and inflating the importance of some variables' coefficients, leading to potential estimation errors in the future.
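    If you want to quantify that multicollinearity instead of just eyeballing the correlation matrix, a common diagnostic is the Variance Inflation Factor. Here is a minimal sketch, assuming statsmodels is installed (it is not used anywhere else in this post):

    # Sketch: Variance Inflation Factor (VIF) for each feature
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    
    vif = pd.Series(
        [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
        index=X.columns,
    )
    print(vif.sort_values(ascending=False))  # large values signal strong collinearity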

    Modeling: Using PyTorch

    Okay. Now that we have a base model, the idea is to create a linear model using Deep Learning and try to beat the RMSE of 2.196.

    Right. To start, let me state this upfront: Deep Learning models work better with scaled data. However, as our X variables are all on a similar scale, we won't need to worry about that. So let's keep moving.

    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import DataLoader, TensorDataset

    We need to prepare the data for modeling with PyTorch. Here, we need some adjustments to make the data acceptable to the PyTorch framework, since it won't take regular pandas dataframes.

    • Use the same data frame from our base model.
    • Split X and Y.
    • Transform the Y variable to log.
    • Transform both to numpy arrays, since PyTorch won't take dataframes.

    df2 = df.query('Height < 0.3 and Rings > 2 and outliers != -1').copy()
    X = df2.drop(['Rings', 'outliers'], axis=1)
    y = np.log(df2[['Rings']])
    
    # X and Y to numpy
    X = X.to_numpy()
    y = y.to_numpy()

    Next, using TensorDataset, we turn X and Y into Tensor objects and print the result.

    # Prepare with TensorDataset
    # TensorDataset helps us transform the dataset into a Tensor object
    dataset = TensorDataset(torch.tensor(X).float(), torch.tensor(y).float())
    
    input_sample, label_sample = dataset[0]
    print(f'** Input sample: {input_sample}, \n** Label sample: {label_sample}')
    ** Input sample: tensor([0.3650, 0.0950, 0.2245, 0.1010, 0.1500, 1.0000, 
    0.0000, 0.0000, 0.1332, 0.0090, 0.0225]), 
    ** Label sample: tensor([2.7081])

    Then, using DataLoader, we can create batches of data. That means the Neural Network will handle batch_size rows of data at a time.

    # Next, let's use DataLoader
    batch_size = 500
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    PyTorch models are best defined as classes.

    • The class inherits from nn.Module, which is PyTorch's base class for neural networks.
    • We define the model layers we want to use in the __init__ method.
      • super().__init__() ensures the class will behave like a torch module.
    • The forward method describes what happens to the input when it is passed to the model.

    Here, we pass the input through the Linear layers defined in the __init__ method, and use ReLU activation functions to add some non-linearity to the model in the forward pass.

    # 2. Creating a class
    class AbaloneModel(nn.Module):
      def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(in_features=X.shape[1], out_features=128)
        self.linear2 = nn.Linear(128, 64)
        self.linear3 = nn.Linear(64, 32)
        self.linear4 = nn.Linear(32, 1)
    
      def forward(self, x):
        x = self.linear1(x)
        x = nn.functional.relu(x)
        x = self.linear2(x)
        x = nn.functional.relu(x)
        x = self.linear3(x)
        x = nn.functional.relu(x)
        x = self.linear4(x)
        return x
    
    # Instantiate the model
    model = AbaloneModel()

    Next, let's try the model for the first time using a script that simulates a Random Search.

    • Create an error criterion for model evaluation.
    • Create a list to hold the data from the best model and set best_loss to a high value, so it will be replaced by better loss values during the iterations.
    • Set the range for the learning rate. We will sample power factors from 2 to 5 (i.e., learning rates from 0.01 down to 0.00001).
    • Set a range for the momentum from 0.90 to 0.99.
    • Get the data.
    • Zero the gradients to clear gradient calculations from previous iterations.
    • Fit the model.
    • Compute the loss and register the best model's numbers.
    • Compute the gradients of the weights and biases with the backward pass.
    • Iterate N times and print the best model.

    # Mean Squared Error (MSE) is standard for regression
    criterion = nn.MSELoss()
    
    # Random Search
    values = []
    best_loss = 999
    for idx in range(1000):
      # Randomly sample a learning rate factor between 2 and 5
      factor = np.random.uniform(2, 5)
      lr = 10 ** -factor
    
      # Randomly pick a momentum between 0.90 and 0.99
      momentum = np.random.uniform(0.90, 0.99)
    
      # 1. Get data
      feature, target = dataset[:]
      # 2. Zero Gradients: Clear old gradients before the backward pass
      optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)
      optimizer.zero_grad()
      # 3. Forward Pass: Compute prediction
      y_pred = model(feature)
      # 4. Compute Loss
      loss = criterion(y_pred, target)
      # 4.1 Register the best loss
      if loss < best_loss:
        best_loss = loss
        best_lr = lr
        best_momentum = momentum
        best_idx = idx
    
      # 5. Backward Pass: Compute gradient of the loss w.r.t. W and b
      loss.backward()
      # 6. Update Parameters: Adjust W and b using the calculated gradients
      optimizer.step()
      values.append([idx, lr, momentum, loss])
    
    print(f'n: {idx},lr: {lr}, momentum: {momentum}, loss: {loss}')
    n: 999,lr: 0.004782946959508322, momentum: 0.9801209929050066, loss: 0.06135804206132889

    Once we have the best learning rate and momentum, we can move on.

    # --- 3. Loss Function and Optimizer ---
    
    # Mean Squared Error (MSE) is standard for regression
    criterion = nn.MSELoss()
    
    # Stochastic Gradient Descent (SGD) with a small learning rate (lr)
    optimizer = optim.SGD(model.parameters(), lr=0.004, momentum=0.98)

    Then, we will re-train this model using the same steps as before, but this time keeping the learning rate and momentum fixed.

    Fitting a PyTorch model requires a longer script than the usual fit() method from Scikit-Learn, but it is not a big deal. The structure will always be similar to these steps:

    1. Turn on training mode with model.train().
    2. Create a loop for the number of iterations you want. Each iteration is called an epoch.
    3. Zero the gradients from previous passes with optimizer.zero_grad().
    4. Get the batches from the dataloader.
    5. Compute the predictions with model(X).
    6. Calculate the loss using criterion(y_pred, target).
    7. Do the backward pass to compute the gradients of the weights and bias: loss.backward().
    8. Update the weights and bias with optimizer.step().

    We will train this model for 1000 epochs (iterations). Here, we are only adding a step to keep track of the best model, so we make sure to use the model with the best loss at the end.

    # 4. Training
    torch.manual_seed(42)
    NUM_EPOCHS = 1001
    loss_history = []
    best_loss = 999
    
    # Put the model in training mode
    model.train()
    
    for epoch in range(NUM_EPOCHS):
      for data in dataloader:
    
        # 1. Get data
        feature, target = data
    
        # 2. Zero Gradients: Clear old gradients before the backward pass
        optimizer.zero_grad()
    
        # 3. Forward Pass: Compute prediction
        y_pred = model(feature)
    
        # 4. Compute Loss
        loss = criterion(y_pred, target)
        loss_history.append(loss.item())
    
        # Keep track of the best model
        if loss < best_loss:
          best_loss = loss
          best_model_state = model.state_dict()  # save the best model
    
        # 5. Backward Pass: Compute gradient of the loss w.r.t. W and b
        loss.backward()
    
        # 6. Update Parameters: Adjust W and b using the calculated gradients
        optimizer.step()
    
        # Load the best model before returning predictions
        model.load_state_dict(best_model_state)
    
      # Print status every 200 epochs
      if epoch % 200 == 0:
        print(epoch, loss.item())
        print(f'Best Loss: {best_loss}')
    
    0 0.061786893755197525
    Best Loss: 0.06033024191856384
    200 0.036817338317632675
    Best Loss: 0.03243456035852432
    400 0.03307393565773964
    Best Loss: 0.03077109158039093
    600 0.032522525638341904
    Best Loss: 0.030613820999860764
    800 0.03488151729106903
    Best Loss: 0.029514113441109657
    1000 0.0369877889752388
    Best Loss: 0.029514113441109657

    Nice. The model is trained. Now it's time to evaluate it.

    Evaluation

    Let's check whether this model did better than the regular regression. For that, I'll put the model in evaluation mode with model.eval(), so PyTorch knows it needs to switch from training behavior to inference mode. It turns off dropout and switches batch normalization to its inference statistics, for example.

    # Get features
    features, targets = dataset[:]
    
    # Get predictions
    model.eval()
    with torch.no_grad():
      predictions = model(features)
    
    # Add to the dataframe
    df2['Predictions'] = np.exp(predictions.detach().numpy())
    
    # RMSE
    print(root_mean_squared_error(df2['Rings'], df2['Predictions']))
    2.1108551025390625

    The improvement was modest, about 4%.

    Let's look at some predictions from each model.

    Predictions from both models. Image by the author.
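    A table like the one above can be built with a couple of lines. This is only a sketch, and it assumes the sklearn predictions were kept in a separate (hypothetical) Sklearn_Pred column before df2['Predictions'] was overwritten by the PyTorch output:

    # Sketch: side-by-side view of both models (Sklearn_Pred is a hypothetical column)
    comparison = df2[['Rings', 'Sklearn_Pred', 'Predictions']].rename(
        columns={'Predictions': 'PyTorch_Pred'}
    )
    print(comparison.head(10))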

    Both models get very similar results. They struggle more as the number of Rings grows. That is because of the cone shape of the target variable.

    If we think that through for a moment:

    • As the number of Rings increases, there is more variance coming from the explanatory variables.
    • An abalone with 15 rings will fall within a much wider range of measurement values than one with 4 rings, as the sketch below shows.
    • This confuses the model because it needs to draw a single line through the middle of data that is not that linear.
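    To see that spread numerically, here is a minimal sketch, reusing the df2 frame from the modeling step:

    # Sketch: the spread of Shell_weight widens as the ring count grows
    spread = df2.groupby('Rings')['Shell_weight'].agg(['mean', 'std', 'count'])
    print(spread.loc[spread.index.isin([4, 8, 12, 15])])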

    Before You Go

    We learned a lot in this project:

    • How to explore the data.
    • How to check whether a linear model would be a good option.
    • How to create a PyTorch model for a multivariable Linear Regression.

    In the end, we saw that a target variable that is not homogeneous, even after power transformations, can lead to a low-performing model. Our model is still better than predicting the average value for everything, but the error is still high, at about 20% of the mean value.
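    To ground that comparison, here is a quick sketch that reuses df2 and the metrics import from earlier:

    # Sketch: naive baseline (predict the mean) vs. our model
    baseline = np.full(len(df2), df2['Rings'].mean())
    print(root_mean_squared_error(df2['Rings'], baseline))    # baseline RMSE
    
    model_rmse = root_mean_squared_error(df2['Rings'], df2['Predictions'])
    print(model_rmse / df2['Rings'].mean())                   # roughly 0.2, i.e. ~20% of the mean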

    We tried to use Deep Learning to improve the result, but all that power was not enough to lower the error considerably. I would probably go with the Scikit-Learn model, since it is simpler and more explainable.

    Another option to try to improve the results would be creating a custom ensemble model with a Random Forest + Linear Regression, as the sketch below illustrates. But that is a task I will leave to you, if you want.
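    Here is one minimal way that ensemble could look: average both models' predictions in log space and back-transform. This is only an illustration of the idea, assuming the numpy X / y arrays prepared for PyTorch (where y is the log of Rings):

    # Sketch: simple averaging ensemble of Random Forest + Linear Regression
    from sklearn.ensemble import RandomForestRegressor
    
    rf = RandomForestRegressor(n_estimators=200, random_state=42).fit(X, y.ravel())
    lin = LinearRegression().fit(X, y.ravel())
    
    # Average the predictions in log space, then back-transform with exp
    ensemble_pred = np.exp((rf.predict(X) + lin.predict(X)) / 2)
    print(root_mean_squared_error(df2['Rings'], ensemble_pred))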

    If you liked this content, find me on my website.

    https://gustavorsantos.me

    GitHub Repository

    The code for this exercise.

    https://github.com/gurezende/Linear-Regression-PyTorch

    References

    [1. Abalone Dataset – UCI Repository, CC BY 4.0 license.] https://archive.ics.uci.edu/dataset/1/abalone

    [2. Eval mode] https://stackoverflow.com/questions/60018578/what-does-model-eval-do-in-pytorch

    https://docs.pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval

    [3. PyTorch Docs] https://docs.pytorch.org/docs/stable/nn.html

    [4. Kaggle Notebook] https://www.kaggle.com/code/samlakhmani/s4e4-deeplearning-with-oof-strategy

    [5. GitHub Repo] https://github.com/gurezende/Linear-Regression-PyTorch


