    Machine Learning at Scale: Managing More Than One Model in Production

    By ProfitlyAI, March 9, 2026

    Have you ever asked yourself how real machine learning products actually run in major tech companies or departments? If yes, this article is for you 🙂

    Before discussing scalability, please don't hesitate to read my first article on the basics of machine learning in production.

    In that last article, I told you that I've spent 10 years working as an AI engineer in the industry. Early in my career, I learned that a model in a notebook is just a mathematical hypothesis. It only becomes useful when its output hits a user, a product, or generates money.

    I've already shown you what "Machine Learning in Production" looks like for a single project. But today, the conversation is about Scale: managing tens, or even hundreds, of ML projects concurrently. In recent years, we have moved from the Sandbox Era into the Infrastructure Era. "Deploying a model" is now a non-negotiable skill; the real challenge is ensuring a large portfolio of models works reliably and safely.


    1. Leaving the Sandbox: The Strategy of Availability

    To understand ML at scale, you first need to leave the "Sandbox" mindset behind. In a sandbox, you have static data and one model. If it drifts, you see it, you stop it, you fix it.

    But once you transition to Scale Mode, you're no longer managing a model, you're managing a portfolio. This is where the CAP Theorem (Consistency, Availability, and Partition Tolerance) becomes your reality. In a single-model setup, you can try to balance the tradeoffs, but at scale, it's impossible to be perfect across all three. You must choose your battles, and more often than not, Availability becomes the top priority.

    Why? Because when you have 100 models running, something is always breaking. If you stopped the service every time a model drifted, your product would be offline 50% of the time.

    Since we cannot stop the service, we design models to fail "cleanly." Take the example of a recommendation system: if its model gets corrupted data, it shouldn't crash or show a "404 error." It should fall back to a safe default (like showing the "Top 10 Most Popular" items). The user stays happy, the system stays available, even though the result is suboptimal. But to do that, you need to know when to trigger that fallback. And that leads us to our biggest challenge at scale: monitoring.
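The "fail cleanly" idea above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the popular-items list, the `predict_fn` callable, and the response shape are all hypothetical stand-ins for whatever your serving stack actually uses.

```python
POPULAR_ITEMS = ["item_1", "item_2", "item_3"]  # precomputed safe default

def recommend(user_id, predict_fn):
    """Return personalized recommendations, falling back to a safe
    default instead of surfacing an error to the user."""
    try:
        items = predict_fn(user_id)
        if not items:  # degenerate output counts as a failure too
            raise ValueError("model returned no items")
        return {"items": items, "source": "model"}
    except Exception:
        # Never crash the product: serve the precomputed popular items
        # and tag the response so monitoring can track the fallback rate.
        return {"items": POPULAR_ITEMS, "source": "fallback"}

# Healthy model call:
print(recommend("u1", lambda u: ["a", "b"]))   # source: "model"
# Crashing model call:
print(recommend("u2", lambda u: 1 / 0))        # source: "fallback"
```

Tagging every response with its `source` matters: the fallback rate becomes a monitoring signal in its own right, which is exactly where the next section picks up.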


    2. The Monitoring Challenge: Why Traditional Metrics Die at Scale

    By saying that at scale it's critical that our system fail "cleanly," you might think it's easy, that we just need to check or monitor accuracy. But at scale, "accuracy" is not enough, and I'll tell you exactly why:

    • The Lack of Human Consensus: In Computer Vision, for example, monitoring is easy because humans agree on the truth (it's a dog or it's not). But in a Recommendation System or an Ad-ranking model, there is no "Gold Standard." If a user doesn't click, is the model bad? Or is the user just not in the mood?
    • The Feature Engineering Trap: Because we can't simply measure "truth" through a single metric, we over-compensate. We add hundreds of features to the model, hoping that "more data" will resolve the uncertainty.
    • The Theoretical Ceiling: We fight for 0.1% accuracy gains without knowing whether the data is too noisy to give more. We're chasing a "ceiling" we can't see.

    So let's link all of that together to understand where we're going and why it matters: because monitoring "truth" is nearly impossible at scale (Dead Zones), we can't rely on simple alerts to tell us to stop. That is exactly why we prioritize Availability and Safe Fallbacks: we assume the model may be failing without the metrics telling us, so we build a system that can survive that "fuzzy" failure.
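When ground truth is unavailable or delayed, one common proxy is to watch the shape of the model's own outputs instead. As an illustration of that idea, here is a small Population Stability Index (PSI) check between a baseline score histogram and a live one; the bin counts and the 0.25 alert threshold are conventional rules of thumb, not values from this article.

```python
import math

def psi(baseline_counts, live_counts):
    """Population Stability Index between two binned score distributions.
    A proxy signal: no ground-truth labels needed, just the shape of the
    model's outputs. A common rule of thumb: PSI > 0.25 suggests drift."""
    total_b = sum(baseline_counts)
    total_l = sum(live_counts)
    score = 0.0
    for b, l in zip(baseline_counts, live_counts):
        pb = max(b / total_b, 1e-6)  # clamp to avoid log(0)
        pl = max(l / total_l, 1e-6)
        score += (pl - pb) * math.log(pl / pb)
    return score

baseline = [100, 300, 400, 150, 50]   # score histogram at deploy time
stable   = [95, 310, 390, 160, 45]    # live traffic, same shape
shifted  = [400, 300, 200, 80, 20]    # live traffic, mass moved left

print(psi(baseline, stable))   # small: no alarm
print(psi(baseline, shifted))  # large: trigger the safe fallback
```

A signal like this is exactly the kind of "fuzzy" trigger the article describes: it can't tell you the model is wrong, only that its behavior no longer looks like the behavior you validated.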


    3. What About the Engineering Wall?

    Now that we have discussed the strategy and monitoring challenges, we are still not ready to scale, because we have not yet addressed the infrastructure side. Scaling requires engineering skills just as much as data science skills.

    We cannot talk about scaling without a solid, secure infrastructure. Because the models are complex, and because Availability is our number-one priority, we need to think critically about the architecture we set up.

    At this stage, my honest advice is to surround yourself with a team or people who are used to building large infrastructures. You don't necessarily need a massive cluster or a supercomputer, but you do need to think about these three execution fundamentals:

    • Cloud vs. Machine: A dedicated server gives you power and is easy to monitor, but it's expensive. Your choice depends entirely on Cost vs. Control.
    • The Hardware: You simply can't put every model on a GPU; you'd go bankrupt. You need a Tiered Strategy: run your simple "fallback" models on cheap CPUs, and reserve the expensive GPUs for the heavy "money-maker" models.
    • Optimization: At scale, a 1-second lag in your fallback mechanism is a failure. You aren't just writing Python anymore; you must learn to compile and optimize your code for specific chips so the "Fail Cleanly" switch happens in milliseconds.
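The tiered strategy and the millisecond fallback switch can be combined into one routing rule. The sketch below is a simplified, synchronous illustration (a real serving system would enforce the deadline with async calls or request timeouts rather than measuring after the fact); the budget value and tier names are assumptions.

```python
import time

def serve(request, heavy_model, cheap_model, budget_ms=100):
    """Latency-budget routing: try the expensive (GPU-tier) model, but if
    it blows the budget or fails, answer from the cheap CPU-tier fallback
    so the switch costs milliseconds, not seconds."""
    start = time.perf_counter()
    try:
        result = heavy_model(request)
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > budget_ms:
            # Too slow to be useful: treat it like a failure.
            raise TimeoutError(f"{elapsed_ms:.0f} ms > {budget_ms} ms budget")
        return result, "gpu_tier"
    except Exception:
        return cheap_model(request), "cpu_tier"

cheap = lambda r: "top_popular"                       # fast CPU fallback
slow  = lambda r: (time.sleep(0.2), "personalized")[1]  # simulated slow model

print(serve("req", slow, cheap, budget_ms=100))       # falls back: cpu_tier
print(serve("req", lambda r: "personalized", cheap))  # in budget: gpu_tier
```

The design choice to highlight: the fallback path must be so cheap that invoking it is never a hard decision, which is why it belongs on the CPU tier.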

    4. Beware of Label Leakage

    So, you've anticipated the failures, worked on availability, sorted out the monitoring, and built the infrastructure. You probably think you're finally ready to master scalability. Actually, not yet. There is an issue you simply can't anticipate if you have never worked in a real environment.

    Even if your engineering is perfect, Label Leakage can destroy your strategy and the systems that run multiple models.

    In a single project, you might spot leakage in a notebook. But at scale, where data comes from 50 different pipelines, leakage becomes almost invisible.

    The Churn Example: Imagine you're predicting which users will cancel their subscription. Your training data has a feature called Last_Login_Date. The model looks great, with a 99% F1 score.

    But here's what actually happened: the database team set up a trigger that "clears" the login date field the moment a user hits the "Cancel" button. Your model sees a "Null" login date and realizes, "Aha! They canceled!"

    In the real world, at the exact millisecond the model needs to make a prediction (before the user cancels), that field isn't Null yet. The model is looking at the answer from the future.

    This is a basic example just so you can understand the concept. But believe me, if you have a complex system with real-time predictions (which happens often with IoT), this is extremely hard to detect. You can only avoid it if you're aware of the problem from the start.

    My recommendations:

    • Feature Latency Tracking: Don't just track the value of the data; track when it was written vs. when the event actually occurred.
    • The Millisecond Test: Always ask: "At the exact moment of prediction, does this specific database row actually contain this value yet?"

    Of course, these are simple questions, but the best time to evaluate them is during the design phase, before you ever write a line of production code.
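Both recommendations amount to a point-in-time check. Here is a toy version of the churn example above: it flags any feature whose value was written after the moment the prediction would have been made. The audit-log shape `(name, written_at)` is a hypothetical simplification of a real feature store's metadata.

```python
from datetime import datetime, timedelta

def leaked_features(feature_log, prediction_time):
    """Flag any feature row whose value was written AFTER the moment the
    prediction would have been made. Such rows encode the answer from
    the future, i.e., label leakage."""
    return [name for name, written_at in feature_log if written_at > prediction_time]

t_pred = datetime(2026, 3, 1, 12, 0, 0)  # the moment of prediction
log = [
    ("days_since_signup", t_pred - timedelta(days=1)),     # fine: known before
    ("last_login_date", t_pred + timedelta(seconds=1)),    # leak: cleared by the
]                                                          # cancel trigger

print(leaked_features(log, t_pred))  # ['last_login_date']
```

Run against training snapshots, a check like this turns the "Millisecond Test" from a design-review question into an automated guardrail.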

    5. Finally, the Human Loop

    The final piece of the puzzle is Accountability. At scale, our metrics are fuzzy, our infrastructure is complex, and our data is leaky, so we need a "Safety Net."

    • Shadow Deployment: This is essential at scale. You deploy "Model B" but don't show its results to users. You let it run "in the shadows" for a week, comparing its predictions to the "Truth" that eventually arrives. Only if it's safe do you promote it to "Live."
    • Human-in-the-Loop: For high-stakes models, you need a small team to audit the "Safe Defaults." If your system has fallen back to "Most Popular Items" for three days, a human needs to ask why the main model hasn't recovered.

    And a quick recap before you start working with ML at scale:

    • Since we can't be perfect, we choose to stay online (Availability) and fail safely.
    • Availability is our number-one metric, since monitoring at scale is "fuzzy" and traditional metrics are unreliable.
    • We build the infrastructure (Cloud/Hardware) to make those safe failures fast.
    • We watch out for "cheating" data (Leakage) that makes our fuzzy metrics look too good to be true.
    • We use Shadow Deploys to prove the model is safe before it ever touches a customer.

    And remember, your scale is only as good as your safety net. Don't let your work be among the 87% of failed projects.


    👉 LinkedIn: Sabrine Bendimerad

    👉 Medium: https://medium.com/@sabrine.bendimerad1

    👉 Instagram: https://tinyurl.com/datailearn



