    The Hidden Trap of Fixed and Random Effects

By ProfitlyAI | July 18, 2025


What Are Random Effects and Fixed Effects?

When designing a study, we often aim to isolate the independent variables from variables of no interest, so that we can observe their true effects on the dependent variables. For example, suppose we want to study the effect of using GitHub Copilot (the independent variable) on developer productivity (the dependent variable). One approach is to measure how much time developers spend using Copilot and how quickly they complete coding tasks. At first glance, we may observe a strong positive correlation: more Copilot usage, faster task completion.

However, other factors can also influence how quickly developers finish their work. For example, Company A might have faster CI/CD pipelines or deal with smaller, simpler tasks, while Company B may require lengthy code reviews or handle more complex, time-consuming tasks. If we don't account for these organizational differences, we might mistakenly conclude that Copilot is less effective for developers at Company B, even though it is the environment, not Copilot, that actually slows them down.

These kinds of group-level differences (variations across teams, companies, or projects) are often known as "random effects" or "fixed effects".

Fixed effects are variables of interest, where each group is treated individually using one-hot (dummy) coding. This way, since the within-group variability is captured neatly within each dummy variable, we are assuming the variance of each group is similar, or homoscedastic.

\[ y_i = \beta_0 + \beta_1 x_i + \gamma_1 D_{1i} + \gamma_2 D_{2i} + \cdots + \varepsilon_i \]

where D₁ᵢ, D₂ᵢ, … are dummy variables indicating whether observation i belongs to group 1, group 2, …, and γ₁, γ₂, … are the fixed-effect coefficients for the corresponding groups.
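In R, this is just a linear model with the grouping variable entered as a factor; lm() builds the dummy columns automatically. A minimal sketch, assuming a hypothetical data frame df with columns y, x, and group:

# Fixed-effects model: one dummy (one-hot) column per group level,
# matching the gamma * D terms in the equation above.
fit_fixed <- lm(y ~ x + factor(group), data = df)
summary(fit_fixed)  # each group level gets its own coefficient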

Random effects, on the other hand, are typically not variables of interest. We assume each group is part of a broader population, and that each group's effect lies somewhere within a probability distribution over that population. As such, the variance of each group is heterogeneous.

\[ y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij} \]

where uⱼ is the random effect of group j, the group containing observation i, drawn from a distribution, typically a normal distribution 𝒩(0, σ²ᵤ).
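The random-intercept version is typically fit with the lme4 package. A minimal sketch with the same hypothetical df:

library(lme4)

# Random-intercept model: u_j is estimated as a single variance component
# N(0, sigma_u^2) rather than one coefficient per group.
fit_random <- lmer(y ~ x + (1 | group), data = df)
summary(fit_random)  # reports Var(u_j) under 'Random effects'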

Rethink Fixed and Random Effects Carefully

However, these effects can mislead your analysis if you simply insert them into your model without thinking carefully about what kinds of variation they are actually capturing.

I recently worked on a project analyzing the Environmental Impacts of AI models, in which I studied how certain architectural features (number of parameters, amount of compute, dataset size, and training time) and hardware choices (hardware type, hardware quantity) of AI models affect energy use during training. I found that Training_time, Hardware_quantity, and Hardware_type significantly affected energy usage. The relationship can be roughly modeled as:

\[ \text{Energy} = \text{Training\_time} + \text{Hardware\_quantity} + \text{Hardware\_type} \]

Since I suspected there might be differences between organizations, for example in coding style, code structure, or algorithm preferences, I thought that including Group as a random effect would help account for all of these unobserved potential differences. To test this assumption, I compared the results of two models, one with Group and one without, to see which is the better fit. In both models, the dependent variable Energy was extremely right-skewed, so I applied a log transformation to stabilize its variance. Here I used Generalized Linear Models (GLMs), because the distribution of my data was not normal.
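(As a sketch of that preprocessing step, assuming the raw response column is named Energy:)

# Log-transform the right-skewed response before fitting both models.
df$log_Energy <- log(df$Energy)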

library(lme4)  # provides glmer() for mixed-effects models

glm <- glm(
  log_Energy ~ Training_time_hour + 
               Hardware_quantity + 
               Training_hardware,
  data = df)
summary(glm)

glm_random_effects <- glmer(
  log_Energy ~ Training_time_hour + 
               Hardware_quantity + 
               Training_hardware + 
               (1 | Group),  # random intercept per Group
  data = df)
summary(glm_random_effects)
AIC(glm_random_effects)

The GLM model without Group produced an AIC of 312.55, with Training_time_hour, Hardware_quantity, and certain types of hardware being statistically significant.

> summary(glm)
    
Call:
glm(formula = log_Energy ~ Training_time_hour + Hardware_quantity + 
    Training_hardware, data = df)
    
    Coefficients:
                                                 Estimate Std. Error t value Pr(>|t|)    
    (Intercept)                                     7.134e+00  1.393e+00   5.123 5.07e-06 ***
    Training_time_hour                              1.509e-03  2.548e-04   5.922 3.08e-07 ***
    Hardware_quantity                               3.674e-04  9.957e-05   3.690 0.000563 ***
    Training_hardwareGoogle TPU v3                  1.887e+00  1.508e+00   1.251 0.216956    
    Training_hardwareGoogle TPU v4                  3.270e+00  1.591e+00   2.055 0.045247 *  
    Training_hardwareHuawei Ascend 910              2.702e+00  2.485e+00   1.087 0.282287    
    Training_hardwareNVIDIA A100                    2.528e+00  1.511e+00   1.674 0.100562    
    Training_hardwareNVIDIA A100 SXM4 40 GB         3.103e+00  1.750e+00   1.773 0.082409 .  
    Training_hardwareNVIDIA A100 SXM4 80 GB         3.866e+00  1.745e+00   2.216 0.031366 *  
    Training_hardwareNVIDIA GeForce GTX 285        -4.077e+00  2.412e+00  -1.690 0.097336 .  
    Training_hardwareNVIDIA GeForce GTX TITAN X    -9.706e-01  1.969e+00  -0.493 0.624318    
    Training_hardwareNVIDIA GTX Titan Black        -8.423e-01  2.415e+00  -0.349 0.728781    
    Training_hardwareNVIDIA H100 SXM5 80GB          3.600e+00  1.864e+00   1.931 0.059248 .  
    Training_hardwareNVIDIA P100                   -1.663e+00  1.899e+00  -0.876 0.385436    
    Training_hardwareNVIDIA Quadro P600            -1.970e+00  2.419e+00  -0.814 0.419398    
    Training_hardwareNVIDIA Quadro RTX 4000        -1.367e+00  2.424e+00  -0.564 0.575293    
    Training_hardwareNVIDIA Quadro RTX 5000        -2.309e+00  2.418e+00  -0.955 0.344354    
    Training_hardwareNVIDIA Tesla K80               1.761e+00  1.988e+00   0.886 0.380116    
    Training_hardwareNVIDIA Tesla V100 DGXS 32 GB   3.415e+00  1.833e+00   1.863 0.068501 .  
    Training_hardwareNVIDIA Tesla V100S PCIe 32 GB  3.698e+00  2.413e+00   1.532 0.131852    
    Training_hardwareNVIDIA V100                   -3.638e-01  1.582e+00  -0.230 0.819087    
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
(Dispersion parameter for gaussian family taken to be 3.877685)
    
    Null deviance: 901.45  on 69  degrees of freedom
Residual deviance: 190.01  on 49  degrees of freedom
    AIC: 312.55
    
Number of Fisher Scoring iterations: 2

On the other hand, the GLM model with Group produced an AIC of 300.38, much lower than the previous model, indicating a better fit. However, on closer inspection I noticed a significant problem: the statistical significance of the other variables had gone away, as if Group had taken the significance from them!

> summary(glm_random_effects)
Linear mixed model fit by REML ['lmerMod']
Formula: log_Energy ~ Training_time_hour + Hardware_quantity + Training_hardware +  
    (1 | Group)
   Data: df
    
    REML criterion at convergence: 254.4
    
    Scaled residuals: 
         Min       1Q   Median       3Q      Max 
    -1.65549 -0.24100  0.01125  0.26555  1.51828 
    
Random effects:
 Groups   Name        Variance Std.Dev.
 Group    (Intercept) 3.775    1.943   
 Residual             1.118    1.057   
Number of obs: 70, groups:  Group, 44
    
Fixed effects:
                                                 Estimate Std. Error t value
    (Intercept)                                     6.132e+00  1.170e+00   5.243
    Training_time_hour                              1.354e-03  2.111e-04   6.411
    Hardware_quantity                               3.477e-04  7.035e-05   4.942
    Training_hardwareGoogle TPU v3                  2.949e+00  1.069e+00   2.758
    Training_hardwareGoogle TPU v4                  2.863e+00  1.081e+00   2.648
    Training_hardwareHuawei Ascend 910              4.086e+00  2.534e+00   1.613
    Training_hardwareNVIDIA A100                    3.959e+00  1.299e+00   3.047
    Training_hardwareNVIDIA A100 SXM4 40 GB         3.728e+00  1.551e+00   2.404
    Training_hardwareNVIDIA A100 SXM4 80 GB         4.950e+00  1.478e+00   3.349
    Training_hardwareNVIDIA GeForce GTX 285        -3.068e+00  2.502e+00  -1.226
    Training_hardwareNVIDIA GeForce GTX TITAN X     4.503e-02  1.952e+00   0.023
    Training_hardwareNVIDIA GTX Titan Black         2.375e-01  2.500e+00   0.095
    Training_hardwareNVIDIA H100 SXM5 80GB          4.197e+00  1.552e+00   2.704
    Training_hardwareNVIDIA P100                   -1.132e+00  1.512e+00  -0.749
    Training_hardwareNVIDIA Quadro P600            -1.351e+00  1.904e+00  -0.710
    Training_hardwareNVIDIA Quadro RTX 4000        -2.167e-01  2.503e+00  -0.087
    Training_hardwareNVIDIA Quadro RTX 5000        -1.203e+00  2.501e+00  -0.481
    Training_hardwareNVIDIA Tesla K80               1.559e+00  1.445e+00   1.079
    Training_hardwareNVIDIA Tesla V100 DGXS 32 GB   3.751e+00  1.536e+00   2.443
    Training_hardwareNVIDIA Tesla V100S PCIe 32 GB  3.487e+00  1.761e+00   1.980
    Training_hardwareNVIDIA V100                    7.019e-01  1.434e+00   0.489
    
Correlation matrix not shown by default, as p = 21 > 12.
Use print(x, correlation=TRUE)  or
    vcov(x)        if you need it

fit warnings:
Some predictor variables are on very different scales: consider rescaling
    > AIC(glm_random_effects)
    [1] 300.3767
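A compact way to place the two fits side by side is a single AIC() call. One caveat worth noting (a step beyond the original analysis): the mixed model above was fit by REML, so its AIC is not strictly comparable to the ML-fitted GLM; refitting with REML = FALSE gives a cleaner comparison.

# Refit the mixed model by maximum likelihood for a like-for-like AIC.
glm_random_effects_ml <- update(glm_random_effects, REML = FALSE)
AIC(glm, glm_random_effects_ml)  # lower AIC indicates the better fit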

Thinking it over carefully, it made a lot of sense. Certain organizations may consistently prefer specific kinds of hardware, or larger organizations may be able to afford more expensive hardware and the resources to train bigger AI models. In other words, the random effects here likely overlapped with and over-explained the variation in our available independent variables, so they absorbed a large portion of what we were trying to study.
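One quick way to check for this kind of overlap (a hedged sketch, assuming df contains the Group and Training_hardware columns used above) is to cross-tabulate groups against hardware choices:

# If each group mostly uses one hardware type, the random intercept and the
# hardware dummies are competing to explain the same variation.
hardware_by_group <- table(df$Group, df$Training_hardware)

# Share of each group's models trained on its single most-used hardware;
# values near 1 suggest the two variables are nearly confounded.
prop_dominant <- apply(prop.table(hardware_by_group, margin = 1), 1, max)
summary(prop_dominant)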

This highlights an important point: while random or fixed effects are useful tools for controlling unwanted group-level variation, they can also unintentionally capture the underlying variation of our independent variables. We should carefully consider what these effects truly represent before blindly introducing them into our models and hoping they will happily absorb all the noise.


References: Steve Midway, Data Analysis in R, https://bookdown.org/steve_midway/DAR/random-effects.html



