    Stochastic Differential Equations and Temperature — NASA Climate Data pt. 2

By ProfitlyAI · September 3, 2025 · 17 min read


You might wonder why we'd want to model a temperature time series as an Ornstein-Uhlenbeck process, and whether this has any practical implications. Well, it does. While the O-U process is used in physics to model the velocity of particles under friction, it is also used in finance to model interest rates, volatility, or spread mean reversion. This temperature application comes from an area where I have personal experience: my thesis, "Neural Network Approaches to Pricing Weather Derivatives," focuses on developing new methods for pricing these derivatives.

The O-U process is often used in pricing weather derivatives: financial derivatives designed to deliver a payout based on an underlying weather index (temperature, rainfall, wind speed, snowfall). These derivatives enable anyone exposed to weather risk to manage that exposure. That could be farmers concerned about drought, energy companies worried about rising heating costs, or seasonal retailers concerned about a poor season. You can read much more about this in Weather Derivatives by Antonis Alexandridis K. (with whom I am currently working) and Achilleas D. Zapranis.

SDEs show up wherever uncertainty lies. The Black-Scholes model, the Schrödinger equation, and geometric Brownian motion are all tied to stochastic differential equations. These equations describe how systems evolve when a random component is involved. SDEs are particularly useful for forecasting under uncertainty, whether that is population growth, epidemic spread, or stock prices. Today, we will be looking at how we can model temperature using the Ornstein–Uhlenbeck process, which behaves quite a bit like the heat equation, a PDE describing temperature flow.

Graph by Author

In my previous two stories:

The climate data is also available on my GitHub, so we can skip this step.

Today we'll:

• Explain how the O-U process can model a temperature time series.
• Talk about mean-reverting processes, stochastic differential equations, and how the O-U process relates to the heat equation.
• Use Python to fit an Ornstein-Uhlenbeck (O-U) process, a stochastic differential equation (SDE), to NASA climate data.
Graph by Author

In the figure above, we can see how volatility changes with the seasons. This emphasizes an important reason why the O-U process is well suited to modelling this time series: it does not assume constant volatility.

The O-U process, mean reversion, and the heat equation

We'll use the O-U process to model how temperature changes over time at a fixed point. In our equation, T(t) represents the temperature as a function of time t. As you intuitively understand, the Earth's temperatures cycle around a 365-day mean. These seasonal changes are modelled using s(t). We can think of these means as the blue and orange lines in the temperature graphs for Barrow, Alaska, and Niamey, Niger.

\[dT(t) = ds(t) - \kappa \left( T(t) - s(t) \right) dt + \sigma(t)\, dB(t)\]

Graph by Author
Graph by Author

Kappa (κ) is the speed of mean reversion. Much like its name implies, this constant controls how fast our temperature reverts back to its seasonal mean. Mean reversion in this context means that if today is an icy day, colder than the average, then we would expect tomorrow's or next week's temperature to be warmer as it fluctuates around that seasonal mean.

\[dT(t) = ds(t) - {\color{red}{\kappa}} \big( T(t) - s(t) \big)\, dt + \sigma(t)\, dB(t)\]

This constant plays a similar role in the heat equation, where it describes how fast a rod increases in temperature as it is being heated. In the heat equation, κ is the thermal diffusivity, which describes how fast heat flows through a material.

\[\frac{\partial u(\mathbf{x},t)}{\partial t} = {\color{red}{\kappa}}\, \nabla^2 u(\mathbf{x},t) + q(\mathbf{x},t)\]

• κ = 0: No mean reversion. The process reduces to Brownian motion with drift.
• 0 < κ < 1: Weak mean reversion. Shocks persist, and the autocorrelation decays slowly.
• κ > 1: Strong mean reversion. Deviations are corrected quickly, and the process stays tightly clustered around its mean (see the simulation sketch after this list).
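
To make these regimes concrete, here is a minimal simulation sketch (not from the article's code): Euler–Maruyama paths of a simple O-U process dX = -κ(X - μ)dt + σ dB for a few κ values. All parameter values are purely illustrative.

    # Illustrative O-U paths for different mean-reversion speeds (a sketch, not the article's pipeline)
    import numpy as np
    import matplotlib.pyplot as plt

    np.random.seed(0)
    mu, sigma, dt, n_steps = 0.0, 1.0, 1.0, 365          # daily steps over one synthetic year

    fig, ax = plt.subplots(figsize=(10, 4))
    for kappa in (0.0, 0.2, 0.9):                         # no / weak / strong mean reversion
        x = np.empty(n_steps)
        x[0] = 5.0                                        # start well away from the mean
        for i in range(1, n_steps):
            dB = np.sqrt(dt) * np.random.normal()
            x[i] = x[i-1] - kappa * (x[i-1] - mu) * dt + sigma * dB
        ax.plot(x, lw=1, label=f"kappa = {kappa}")
    ax.axhline(mu, color="k", lw=1, ls="--", label="mean")
    ax.set_xlabel("Day"); ax.set_ylabel("X_t"); ax.legend()
    plt.tight_layout(); plt.show()

With κ = 0 the path wanders off like Brownian motion, while larger κ pulls the path back toward the mean within a few days.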

Fourier Series

Estimating Mean and Volatility

\[s(t) = a + b t + \sum_{k=1}^{K_1} \left( \alpha_k \sin\!\left( \frac{2\pi k t}{365} \right) + \beta_k \cos\!\left( \frac{2\pi k t}{365} \right) \right)\]

\[\sigma^2(t) = c + \sum_{k=1}^{K_2} \left( \gamma_k \sin\!\left( \frac{2\pi k t}{365} \right) + \delta_k \cos\!\left( \frac{2\pi k t}{365} \right) \right)\]
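
In practice, both truncated Fourier series become columns of sine and cosine harmonics in a regression design matrix. Here is a minimal sketch of such a feature builder, assuming a daily time index t (days since the start of the sample); the column names (trend, sin1, cos1, ...) follow the naming the later snippets expect.

    # Build trend + annual sin/cos harmonic features for the Fourier regressions (a sketch)
    import numpy as np
    import pandas as pd

    def fourier_features(t, n_harmonics, period=365.0):
        """Return a DataFrame with a linear trend and sin/cos harmonics of the annual cycle."""
        out = {"trend": np.asarray(t, dtype=float)}
        for k in range(1, n_harmonics + 1):
            out[f"sin{k}"] = np.sin(2 * np.pi * k * np.asarray(t) / period)
            out[f"cos{k}"] = np.cos(2 * np.pi * k * np.asarray(t) / period)
        return pd.DataFrame(out)

    # Example: features for the first 10 days with two harmonics (K1 = 2)
    print(fourier_features(np.arange(10), n_harmonics=2).head())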

Fitting the Ornstein–Uhlenbeck process to Mumbai data

We fit our temperature process in three steps:

• Fit the mean using a Fourier series
• Model the short-term dependence and estimate the mean reversion parameter κ using an AR(1) process
• Fit the volatility using a Fourier series

Fitting the mean

We fit the model to 80% of the data so that we can later use our SDE to forecast temperatures on the held-out 20%. Our mean is an OLS regression fitting s(t) to the data.

Graph by Author
    # --------------------
    # Config (parameters and paths for the analysis)
    # --------------------

    from pathlib import Path

    CITY        = "Mumbai, India"     # Name of the city being analyzed (used in labels/plots)
    SEED        = 42                  # Random seed for reproducibility of results
    SPLIT_FRAC  = 0.80                # Fraction of the data to use for training (rest for testing)
    MEAN_HARM   = 2                   # Number of harmonic terms used to model the mean (seasonality)
    VOL_HARM    = 3                   # Number of harmonic terms used to model the volatility (seasonality)
    LJUNG_LAGS  = 10                  # Number of lags for the Ljung-Box test (checks autocorrelation in residuals)
    EPS         = 1e-12               # Small value to avoid division by zero or log(0) issues
    MIN_TEST_N  = 8                   # Minimum number of test points required to keep a valid test set

    # --------------------
    # Paths (where input/output files are stored)
    # --------------------

    # Base directory in Google Drive where climate data and results are stored
    BASE_DIR    = Path("/content/drive/MyDrive/TDS/Climate")

    # Subdirectory specifically for outputs of the Mumbai analysis
    OUT_BASE    = BASE_DIR / "Benth_Mumbai"

    # Subfolder for saving plots generated during the analysis
    PLOTS_DIR   = OUT_BASE / "plots"

    # Ensure the output directories exist (create them if they don't)
    OUT_BASE.mkdir(parents=True, exist_ok=True)
    PLOTS_DIR.mkdir(parents=True, exist_ok=True)

    # Path to the climate data CSV file (input dataset)
    CSV_PATH    = BASE_DIR / "climate_data.csv"
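
The snippets that follow reference a few objects that are not shown in this excerpt: a chronological train/test split, the fitted seasonal-mean model mean_fit, and small helpers such as DIAG_DIR, m_slug, save_model_summary_txt, and coef_ci_frame. Here is a minimal setup sketch, assuming climate_data.csv contains a Date column and a DAT (daily average temperature) column and that the harmonic columns are named sin1, cos1, and so on.

    # Minimal setup sketch (assumptions noted above; not part of the article's excerpt)
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import statsmodels.formula.api as smf

    df = pd.read_csv(CSV_PATH, parse_dates=["Date"]).sort_values("Date").reset_index(drop=True)
    df["trend"] = np.arange(len(df), dtype=float)              # t = days since start of sample
    for k in range(1, max(MEAN_HARM, VOL_HARM) + 1):            # annual sin/cos harmonics
        df[f"sin{k}"] = np.sin(2 * np.pi * k * df["trend"] / 365.0)
        df[f"cos{k}"] = np.cos(2 * np.pi * k * df["trend"] / 365.0)

    split = int(SPLIT_FRAC * len(df))                           # 80/20 chronological split
    train, test = df.iloc[:split].copy(), df.iloc[split:].copy()

    # Output folder and filename slug used by the diagnostics below
    DIAG_DIR = OUT_BASE / "diagnostics"
    DIAG_DIR.mkdir(parents=True, exist_ok=True)
    m_slug = CITY.split(",")[0].lower()                         # "mumbai"

    def save_model_summary_txt(fit, path):
        """Write a fitted statsmodels summary to a text file."""
        Path(path).write_text(fit.summary().as_text())

    def coef_ci_frame(fit):
        """Coefficient table (term, estimate, stderr) used for the CI plot."""
        return pd.DataFrame({"term": fit.params.index,
                             "estimate": fit.params.values,
                             "stderr": fit.bse.values})

    # OLS fit of the seasonal mean s(t): intercept + trend + the first MEAN_HARM harmonics
    mean_terms = " + ".join(["trend"] + [f"{b}{k}" for k in range(1, MEAN_HARM + 1)
                                         for b in ("sin", "cos")])
    mean_fit = smf.ols(f"DAT ~ {mean_terms}", data=train).fit()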
    

It is much harder to see the signal in the noise when looking at the residuals of our regression. We might be tempted to assume this is white noise, but we would be mistaken. There is still dependence in this data, a serial dependence. If we wish to model the volatility later using a Fourier series, we must account for this dependence.

Graph by Author
    # =================================================================
    # 1) MEAN MODEL: residuals after fitting the seasonal mean + diagnostics
    # =================================================================

    # Residuals after subtracting the fitted seasonal mean (before the AR step)
    mu_train = mean_fit.predict(train)
    x_train = train["DAT"] - mu_train

    # Save the mean-model regression summary to a text file
    save_model_summary_txt(mean_fit, DIAG_DIR / f"{m_slug}_mean_OLS_summary.txt")

    # --- Plot: observed DAT vs fitted seasonal mean (train + test) ---
    fig = plt.figure(figsize=(12,5))
    plt.plot(train["Date"], train["DAT"], lw=1, alpha=0.8, label="DAT (train)")
    plt.plot(train["Date"], mu_train, lw=2, label="μ̂(t) (train)")

    # Predict and plot the fitted mean for the test set
    mu_test = mean_fit.predict(test[["trend"] + [c for c in train.columns if c.startswith(("cos","sin"))][:2*MEAN_HARM]])
    plt.plot(test["Date"], test["DAT"], lw=1, alpha=0.8, label="DAT (test)")
    plt.plot(test["Date"], mu_test, lw=2, label="μ̂(t) (test)")

    plt.title("Mumbai — DAT and seasonal mean fit")
    plt.xlabel("Date"); plt.ylabel("Temperature (DAT)")
    plt.legend()
    fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_mean_fit_timeseries.png", dpi=160); plt.close(fig)

    # --- Plot: mean residuals (time series) ---
    fig = plt.figure(figsize=(12,4))
    plt.plot(train["Date"], x_train, lw=1)
    plt.axhline(0, color="k", lw=1)
    plt.title("Mumbai — Residuals after mean fit (x_t = DAT - μ̂)")
    plt.xlabel("Date"); plt.ylabel("x_t")
    fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_mean_residuals_timeseries.png", dpi=160); plt.close(fig)
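
To see that serial dependence concretely, a quick diagnostic sketch (not part of the original pipeline) is to look at the autocorrelation of the mean residuals x_t and run a Ljung-Box test.

    # ACF and Ljung-Box check on the mean residuals (a sketch)
    from statsmodels.graphics.tsaplots import plot_acf
    from statsmodels.stats.diagnostic import acorr_ljungbox

    fig = plot_acf(x_train, lags=30, title="Mumbai — ACF of mean residuals x_t")
    fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_mean_residuals_acf.png", dpi=160); plt.close(fig)

    lb = acorr_ljungbox(x_train, lags=[LJUNG_LAGS], return_df=True)
    print(lb)   # a small p-value indicates the residuals are not white noise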

κ, ϕ, and the autoregressive process

Fit an AR(1) process. This tells you the speed of mean reversion.

An AR(1) process captures the dependence of our residual at time step t on its value at t-1. The exact value of κ depends on the location. For Mumbai, κ = 0.861, which means temperatures return to the mean fairly quickly.

\[X_t = c + \phi X_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0,\sigma^2)\]
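
The snippet below assumes a fitted AR(1) model ar_fit. A minimal sketch of how it could be produced: an AR(1) without an intercept (trend="n"), since the seasonal mean has already been removed from x_t.

    # Fit an AR(1) to the deseasonalized residuals x_t (a sketch; ar_fit is assumed by the snippet below)
    from statsmodels.tsa.ar_model import AutoReg

    ar_fit = AutoReg(x_train, lags=1, trend="n").fit()   # no intercept: the mean is already removed
    print(ar_fit.params)                                 # the single coefficient is φ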

    # =======================================================
    # 2) AR(1) MODEL: fit and residual diagnostics
    # =======================================================

    # Extract the AR(1) parameter φ and save the summary
    phi = float(ar_fit.params.iloc[0]) if len(ar_fit.params) else np.nan
    save_model_summary_txt(ar_fit, DIAG_DIR / f"{m_slug}_ar1_summary.txt")

    # --- Scatterplot: x_t vs x_{t-1} with the fitted line φ x_{t-1} ---
    x_t = x_train.iloc[1:].values
    x_tm1 = x_train.iloc[:-1].values
    x_pred = phi * x_tm1
    fig = plt.figure(figsize=(5.8,5.2))
    plt.scatter(x_tm1, x_t, s=10, alpha=0.6, label="Observed")
    xline = np.linspace(np.min(x_tm1), np.max(x_tm1), 2)
    plt.plot(xline, phi*xline, lw=2, label=f"Fitted: x_t = {phi:.3f} x_(t-1)")
    plt.title("Mumbai — AR(1) scatter: x_t vs x_{t-1}")
    plt.xlabel("x_{t-1}"); plt.ylabel("x_t"); plt.legend()
    fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_ar1_scatter.png", dpi=160); plt.close(fig)

    # --- Time series: observed vs fitted AR(1) values ---
    fig = plt.figure(figsize=(12,4))
    plt.plot(train["Date"].iloc[1:], x_t, lw=1, label="x_t")
    plt.plot(train["Date"].iloc[1:], x_pred, lw=2, label=r"$\hat{x}_t = \phi x_{t-1}$")
    plt.axhline(0, color="k", lw=1)
    plt.title("Mumbai — AR(1) fitted vs observed deviations")
    plt.xlabel("Date"); plt.ylabel("x_t")
    plt.legend()
    fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_ar1_timeseries_fit.png", dpi=160); plt.close(fig)

    # --- Save AR diagnostics (φ + Ljung-Box test p-value on the AR(1) residuals) ---
    from statsmodels.stats.diagnostic import acorr_ljungbox
    e_train = ar_fit.resid.astype(float).values   # AR(1) residuals ε_t (also used below)
    try:
        lb = acorr_ljungbox(e_train, lags=[10], return_df=True)
        lb_p = float(lb["lb_pvalue"].iloc[0])   # test for remaining autocorrelation
        lb_stat = float(lb["lb_stat"].iloc[0])
    except Exception:
        lb_p = np.nan; lb_stat = np.nan

    pd.DataFrame([{"phi": phi, "ljungbox_stat_lag10": lb_stat, "ljungbox_p_lag10": lb_p}]) \
        .to_csv(DIAG_DIR / f"{m_slug}_ar1_diagnostics.csv", index=False)
Graph by Author

In the graph above, we can see how our AR(1) process, in orange, estimates the next day's temperature from 1981 to 2016. We can further justify the use of the AR(1) process, and gain further intuition into it, by plotting Xₜ against Xₜ₋₁. Here we can see that these values are clearly correlated. By framing it this way, we can also see that the AR(1) process is simply linear regression. We don't even have to add a drift/intercept term, as we have removed the mean; hence, our intercept is always zero.

Graph by Author
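
As a quick sanity check of the "AR(1) is just linear regression" point, a two-line sketch: the OLS slope of x_t on x_{t-1} through the origin should match the AR(1) coefficient φ obtained above.

    # Sanity-check sketch: zero-intercept OLS slope of x_t on x_{t-1} reproduces φ
    slope = float(np.dot(x_tm1, x_t) / np.dot(x_tm1, x_tm1))
    print(f"OLS slope through origin = {slope:.3f}   vs   AR(1) phi = {phi:.3f}")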

Now, looking at our AR(1) residuals, we can begin to think about how to model the volatility of our temperature across time. From the graph below, we can see how the residuals pulsate across time in a seemingly periodic way. We can postulate that the points in the year with larger residuals correspond to time intervals with higher volatility.

    # --- Residuals from the AR(1) model ---
    e_train = ar_fit.resid.astype(float).values
    fig = plt.figure(figsize=(12,4))
    plt.plot(train["Date"].iloc[1:], e_train, lw=1)
    plt.axhline(0, color="k", lw=1)
    plt.title("Mumbai — AR(1) residuals ε_t")
    plt.xlabel("Date"); plt.ylabel("ε_t")
    fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_ar1_residuals_timeseries.png", dpi=160); plt.close(fig)
    
Graph by Author

Volatility Fourier Series

Our solution is to model the volatility using a Fourier series, much like we did for the mean. To do this, we must make some transformations to our residual εₜ. We consider the square εₜ² because volatility cannot be negative by definition.

\[\varepsilon_t = \sigma_t \eta_t \qquad \varepsilon_t^{2} = \sigma_t^{2}\eta_t^{2}\]

We can think of these residuals as being composed of two parts: standardized shocks ηₜ, with a mean of 0 and a variance of 1 (representing the randomness), and the volatility σₜ. We want to isolate σₜ. This can be done more easily by taking the log. More importantly, doing so makes our error terms additive.

\[y_t := \log(\varepsilon_t^2) = \underbrace{\log\sigma_t^2}_{\text{deterministic, seasonal}} + \underbrace{\log\eta_t^2}_{\text{random noise}}\]

In summary, modeling log(εₜ²) helps to:

• keep σₜ² > 0
• reduce the effect that outliers have on the fit

Once we have fitted log(εₜ²), we can recover εₜ² whenever needed by exponentiating:

\[\widehat{\sigma}_t^{\,2} = \exp\!\big(\widehat{y}_t\big)\]

    # =======================================================
    # 3) VOLATILITY REGRESSION: fit log(ε_t^2) on seasonal harmonics
    # =======================================================
    # vol_fit is assumed to be the OLS fit of log(ε_t^2) on the vol_terms harmonics (not shown in this excerpt)
    if vol_fit is not None:
        # Compute log-squared residuals (proxy for the variance)
        log_eps2 = np.log(e_train**2 + 1e-12)

        # Use cosine/sine harmonics as regressors for the volatility
        vol_terms = [f"{b}{k}" for k in range(1, VOL_HARM + 1) for b in ("cos","sin")]
        Xbeta = np.asarray(vol_fit.predict(train.iloc[1:][vol_terms]), dtype=float)
        sigma_tr = np.exp(0.5 * Xbeta)   # fitted conditional volatility σ̂(t) on the training set

        # --- Time plot: observed vs fitted log-variance ---
        fig = plt.figure(figsize=(12,4))
        plt.plot(train["Date"].iloc[1:], log_eps2, lw=1, label="log(ε_t^2)")
        plt.plot(train["Date"].iloc[1:], Xbeta, lw=2, label="Fitted log-variance")
        plt.title("Mumbai — Volatility regression (log ε_t^2)")
        plt.xlabel("Date"); plt.ylabel("log variance")
        plt.legend()
        fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_volatility_logvar_timeseries.png", dpi=160); plt.close(fig)

        # --- Plot: estimated volatility σ̂(t) over time ---
        fig = plt.figure(figsize=(12,4))
        plt.plot(train["Date"].iloc[1:], sigma_tr, lw=1.5, label="σ̂ (train)")
        if 'sigma_te' in globals():
            plt.plot(test["Date"], sigma_te, lw=1.5, label="σ̂ (test)")
        plt.title("Mumbai — Conditional volatility σ̂(t)")
        plt.xlabel("Date"); plt.ylabel("σ̂")
        plt.legend()
        fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_volatility_sigma_timeseries.png", dpi=160); plt.close(fig)

        # --- Coefficients of the volatility regression (with 95% CI) ---
        vol_coef = coef_ci_frame(vol_fit).sort_values("estimate")
        fig = plt.figure(figsize=(8, max(4, 0.4*len(vol_coef))))
        y = np.arange(len(vol_coef))
        plt.errorbar(vol_coef["estimate"], y, xerr=1.96*vol_coef["stderr"], fmt="o", capsize=3)
        plt.yticks(y, vol_coef["term"])
        plt.axvline(0, color="k", lw=1)
        plt.title("Mumbai — Volatility model coefficients (95% CI)")
        plt.xlabel("Estimate")
        fig.tight_layout(); fig.savefig(DIAG_DIR / f"{m_slug}_volatility_coefficients.png", dpi=160); plt.close(fig)

        # Save the volatility regression summary
        save_model_summary_txt(vol_fit, DIAG_DIR / f"{m_slug}_volatility_summary.txt")
    else:
        print("Volatility model not available (too few points or regression failed). Skipping vol plots.")

    print("Diagnostics saved to:", DIAG_DIR)
Graph by Author

\[dT(t) = ds(t) - \kappa \left( T(t) - s(t) \right) dt + \exp\!\left(\frac{1}{2}\left(\alpha + \beta^{T} z(t)\right)\right) dB(t)\]

    Demo of log stabilization

The graphs below illustrate how the logarithm helps when fitting the volatility. The residual εₜ² contains significant outliers, partly due to the squaring operation performed to ensure positivity. Without the log stabilization, the large shocks dominate the fit, and the final result is biased.

Graph by Author
Graph by Author
Graph by Author
Graph by Author
    # Fix the seasonal plot from the previous cell (indexing bug) and
    # add a *second scenario* with rare large outliers to illustrate the instability
    # of regressing directly on the skewed eps^2.

    # ------------------------------
    # 1) Seasonal view (fixed indexing)
    # ------------------------------
    # Recreate the arrays from the prior simulation cell by re-running the same seed & setup
    np.random.seed(7)                 # deterministic reproducibility
    n_days = 730                      # two synthetic years (365 * 2)
    t = np.arange(n_days)             # day index 0..729
    omega = 2 * np.pi / 365.0         # seasonal frequency (one-year period)

    # True (data-generating) log-variance is seasonal via sin/cos
    a_true, b_true, c_true = 0.2, 0.9, -0.35
    log_var_true = a_true + b_true * np.sin(omega * t) + c_true * np.cos(omega * t)

    # Convert log-variance to standard deviation: sigma = exp(0.5 * log var)
    sigma_true = np.exp(0.5 * log_var_true)

    # White-noise innovations; scale set by sigma_true
    z = np.random.normal(size=n_days)
    eps = sigma_true * z              # heteroskedastic residuals
    eps2 = eps**2                     # "variance-like" target when regressing on raw eps^2
    
    # Design matrix with intercept + annual sin/cos harmonics
    X = np.column_stack([np.ones(n_days), np.sin(omega * t), np.cos(omega * t)])

    # --- Fit A: OLS on raw eps^2 (sensitive to skew/outliers) ---
    beta_A, *_ = np.linalg.lstsq(X, eps2, rcond=None)
    var_hat_A = X @ beta_A            # fitted variance (can be negative from OLS)
    var_hat_A = np.clip(var_hat_A, 1e-8, None)  # clip to avoid negatives
    sigma_hat_A = np.sqrt(var_hat_A)  # convert to sigma

    # --- Fit B: OLS on log(eps^2) (stabilizes the scale & reduces skew) ---
    eps_safe = 1e-12                  # small epsilon to avoid log(0)
    y_log = np.log(eps2 + eps_safe)   # stabilized target
    beta_B, *_ = np.linalg.lstsq(X, y_log, rcond=None)
    log_var_hat_B = X @ beta_B
    sigma_hat_B = np.exp(0.5 * log_var_hat_B)

    # Day-of-year index for seasonal averaging across years
    doy = t % 365

    # Correct ordering for the first year's DOY values
    order365 = np.argsort(doy[:365])

    def seasonal_mean(x):
        """
        Average the two years day-by-day to get a single seasonal curve.
        Assumes x has length 730 (two years); returns a length-365 array.
        """
        return 0.5 * (x[:365] + x[365:730])

    # Plot one "synthetic year" view of the seasonal sigma pattern
    plt.figure(figsize=(12, 5))
    plt.plot(np.sort(doy[:365]), seasonal_mean(sigma_true)[order365], label="True sigma seasonality")
    plt.plot(np.sort(doy[:365]), seasonal_mean(sigma_hat_A)[order365], label="Fitted from eps^2 regression", alpha=0.9)
    plt.plot(np.sort(doy[:365]), seasonal_mean(sigma_hat_B)[order365], label="Fitted from log(eps^2) regression", alpha=0.9)
    plt.title("Seasonal Volatility Pattern: True vs Fitted (one-year view) – Fixed")
    plt.xlabel("Day of year")
    plt.ylabel("Sigma")
    plt.legend()
    plt.tight_layout()
    plt.show()
    
    # ------------------------------
    # 2) OUTLIER SCENARIO to illustrate instability
    # ------------------------------
    np.random.seed(21)                # separate seed for the outlier experiment

    def run_scenario(n_days=730, outlier_rate=0.05, outlier_scale=8.0):
        """
        Generate two years of heteroskedastic residuals with occasional huge shocks
        to mimic heavy tails. Compare:
          - Fit A: regress raw eps^2 on sin/cos (can be unstable, negative fits)
          - Fit B: regress log(eps^2) on sin/cos (more stable under heavy tails)
        Return the fitted sigmas and error metrics (MAE/MAPE), plus diagnostics.
        """
        t = np.arange(n_days)
        omega = 2 * np.pi / 365.0

        # Same true seasonal log-variance as above
        a_true, b_true, c_true = 0.2, 0.9, -0.35
        log_var_true = a_true + b_true * np.sin(omega * t) + c_true * np.cos(omega * t)
        sigma_true = np.exp(0.5 * log_var_true)

        # Base normal innovations
        z = np.random.normal(size=n_days)

        # Inject rare, huge shocks to create heavy tails in eps^2
        mask = np.random.rand(n_days) < outlier_rate
        z[mask] *= outlier_scale

        # Heteroskedastic residuals and their squares
        eps = sigma_true * z
        eps2 = eps**2

        # Same sin/cos design
        X = np.column_stack([np.ones(n_days), np.sin(omega * t), np.cos(omega * t)])

        # --- Fit A: raw eps^2 on X (OLS) ---
        beta_A, *_ = np.linalg.lstsq(X, eps2, rcond=None)
        var_hat_A_raw = X @ beta_A
        neg_frac = np.mean(var_hat_A_raw < 0.0)        # fraction of negative variance predictions
        var_hat_A = np.clip(var_hat_A_raw, 1e-8, None) # clip to ensure non-negative variance
        sigma_hat_A = np.sqrt(var_hat_A)

        # --- Fit B: log(eps^2) on X (OLS on the log scale) ---
        y_log = np.log(eps2 + 1e-12)
        beta_B, *_ = np.linalg.lstsq(X, y_log, rcond=None)
        log_var_hat_B = X @ beta_B
        sigma_hat_B = np.exp(0.5 * log_var_hat_B)

        # Error metrics comparing the fitted sigmas to the true sigma path
        mae = lambda a, b: np.mean(np.abs(a - b))
        mape = lambda a, b: np.mean(np.abs((a - b) / (a + 1e-12))) * 100

        mae_A = mae(sigma_true, sigma_hat_A)
        mae_B = mae(sigma_true, sigma_hat_B)
        mape_A = mape(sigma_true, sigma_hat_A)
        mape_B = mape(sigma_true, sigma_hat_B)

        return {
            "t": t,
            "sigma_true": sigma_true,
            "sigma_hat_A": sigma_hat_A,
            "sigma_hat_B": sigma_hat_B,
            "eps2": eps2,
            "y_log": y_log,
            "neg_frac": neg_frac,
            "mae_A": mae_A, "mae_B": mae_B,
            "mape_A": mape_A, "mape_B": mape_B
        }
    
    # Run with 5% outliers scaled 10x to make the point obvious
    res = run_scenario(outlier_rate=0.05, outlier_scale=10.0)

    print("\nOUTLIER SCENARIO (5% of days have 10x shocks) — illustrating instability when using eps^2 directly")
    print(f"  MAE  (sigma):  raw eps^2 regression = {res['mae_A']:.4f}   |   log(eps^2) regression = {res['mae_B']:.4f}")
    print(f"  MAPE (sigma):  raw eps^2 regression = {res['mape_A']:.2f}% |   log(eps^2) regression = {res['mape_B']:.2f}%")
    print(f"  Negative variance predictions before clipping (raw fit): {res['neg_frac']:.2%}")

    # Visual comparison: true sigma vs the two fitted approaches under outliers
    plt.figure(figsize=(12, 5))
    plt.plot(res["t"], res["sigma_true"], label="True sigma(t)")
    plt.plot(res["t"], res["sigma_hat_A"], label="Fitted sigma from eps^2 regression", alpha=0.9)
    plt.plot(res["t"], res["sigma_hat_B"], label="Fitted sigma from log(eps^2) regression", alpha=0.9)
    plt.title("True vs Fitted Volatility with Rare Large Shocks")
    plt.xlabel("Day")
    plt.ylabel("Sigma")
    plt.legend()
    plt.tight_layout()
    plt.show()

    # Show how the targets behave when outliers are present
    plt.figure(figsize=(12, 5))
    plt.plot(res["t"], res["eps2"], label="eps^2 (now extremely heavy-tailed due to outliers)")
    plt.title("eps^2 under outliers: unstable target for regression")
    plt.xlabel("Day")
    plt.ylabel("eps^2")
    plt.legend()
    plt.tight_layout()
    plt.show()

    plt.figure(figsize=(12, 5))
    plt.plot(res["t"], res["y_log"], label="log(eps^2): compressed & stabilized")
    plt.title("log(eps^2) under outliers: stabilized scale")
    plt.xlabel("Day")
    plt.ylabel("log(eps^2)")
    plt.legend()
    plt.tight_layout()
    plt.show()
    

    Conclusion

Now that we have fitted all of these parameters to the SDE, we can consider using it to forecast; a small simulation sketch follows the list below. But getting here has not been easy. Our solution relied heavily on two key ideas.

• Functions can be approximated by a Fourier series.
• The mean reversion parameter κ is equivalent to ϕ from our AR(1) process.
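
Here is a sketch, not from the article, of how the fitted pieces could be combined to simulate paths over the test period. It uses the discrete form the model was estimated in, x_{t+1} = φ x_t + σ(t+1) ε with T(t) = s(t) + x_t, and it assumes the fitted test-period volatility sigma_te from the earlier plotting snippet is available.

    # Simulate temperature paths over the test period from the fitted pieces (a sketch)
    np.random.seed(SEED)
    n_sims = 200
    s_test = np.asarray(mean_fit.predict(test), dtype=float)   # seasonal mean s(t) on test days
    sig = np.asarray(sigma_te, dtype=float)                    # fitted σ̂(t) on test days (assumed available)
    sims = np.empty((n_sims, len(test)))
    x0 = float(x_train.iloc[-1])                               # last observed deviation from the mean
    for j in range(n_sims):
        x = x0
        for i in range(len(test)):
            x = phi * x + sig[i] * np.random.normal()          # AR(1) step with seasonal volatility
            sims[j, i] = s_test[i] + x

    fig = plt.figure(figsize=(12, 4))
    plt.plot(test["Date"], test["DAT"], lw=1, label="DAT (test)")
    plt.plot(test["Date"], sims.mean(axis=0), lw=2, label="Simulated mean")
    plt.fill_between(test["Date"], np.percentile(sims, 5, axis=0),
                     np.percentile(sims, 95, axis=0), alpha=0.2, label="5–95% band")
    plt.title("Mumbai — simulated O-U paths over the test period")
    plt.xlabel("Date"); plt.ylabel("Temperature (DAT)")
    plt.legend(); fig.tight_layout(); plt.show()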

Whenever I fit a stochastic differential equation, or do anything stochastic, I find it funny to think about how the word derives from stokhos, meaning aim or target. Even funnier is how that word became stokhazesthai, meaning to aim at or to guess. That process must have involved some error and terrible aim.

    References

Alexandridis, A. K., & Zapranis, A. D. (2013). Weather Derivatives: Modeling and Pricing Weather-Related Risk. Springer. https://doi.org/10.1007/978-1-4614-6071-8

Benth, F. E., & Šaltytė Benth, J. (2007). The volatility of temperature and pricing of weather derivatives. Quantitative Finance, 7(5), 553–561. https://doi.org/10.1080/14697680601155334


Marco Hening Tallarico, Author