    How to Scale Your LLM usage

    By ProfitlyAI · November 29, 2025 · 8 Mins Read


    Scaling has perhaps been the most powerful phrase when it comes to Large Language Models (LLMs) since the release of ChatGPT. ChatGPT became so successful largely because of the scaled pre-training OpenAI did, making it a powerful language model.

    Following that, frontier LLM labs started scaling post-training with supervised fine-tuning and RLHF, where models got increasingly better at following instructions and performing complex tasks.

    And just when we thought LLMs were about to plateau, we started doing inference-time scaling with the release of reasoning models, where spending thinking tokens gave huge improvements to the quality of outputs.

    This infographic highlights the main contents of this article. I'll first discuss why you should scale your LLM usage, highlighting how it can lead to increased productivity. Continuing, I'll specify how you can increase your LLM usage, covering techniques like running parallel coding agents and using deep research mode in Gemini 3 Pro. Image by Gemini

    I now argue we should continue this scaling with a new scaling paradigm: usage-based scaling, where you scale how much you're using LLMs:

    • Run more coding agents in parallel
    • Always start a deep research on a topic of interest
    • Run information-fetching workflows

    If you're not firing off an agent before going to lunch, or going to sleep, you're wasting time

    In this article, I'll discuss why scaling LLM usage can lead to increased productivity, especially when working as a programmer. Furthermore, I'll discuss specific techniques you can use to scale your LLM usage, both personally and for the companies you work for. I'll keep this article high-level, aiming to inspire you to maximally utilize AI to your advantage.

    Why you should scale LLM usage

    We have already seen scaling be incredibly powerful before with:

    • pre-training
    • post-training
    • inference-time scaling

    The reason for this is that it turns out the more computing power you spend on something, the better the output quality you'll achieve. This, of course, assumes you're able to spend the compute effectively. For example, for pre-training, being able to scale compute relies on:

    • Large enough models (enough weights to train)
    • Enough data to train on

    If you scale compute without these two factors, you won't see improvements. However, if you do scale all three, you get amazing results, like the frontier LLMs we're seeing now, for example with the release of Gemini 3.
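The relationship between compute, model size, and data can be sketched as a simple power law. The constants below are made-up illustrative values (loosely Chinchilla-shaped), not fitted numbers from any real paper:

```python
# Illustrative only: a Chinchilla-style power law L(N, D) = E + A/N^alpha + B/D^beta.
# All constants here are assumed for the sketch, not fitted values.
def expected_loss(n_params: float, n_tokens: float) -> float:
    """Pre-training loss as a function of model size N and data size D."""
    E, A, B, alpha, beta = 1.7, 400.0, 410.0, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling only data while the model stays small stalls near the A/N^alpha floor...
small_model = expected_loss(1e8, 1e13)
# ...while scaling model size and data together keeps pushing the loss down.
scaled_both = expected_loss(1e11, 1e13)
print(small_model > scaled_both)  # True
```

The point of the curve is exactly the claim above: scaling one axis alone hits diminishing returns, while scaling all the ingredients together keeps improving quality.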

    I thus argue you should look to scale your own LLM usage as much as possible. This could, for example, mean firing off several agents to code in parallel, or starting a Gemini deep research on a topic you're interested in.

    Of course, the usage must still be of value. There's no point in starting a coding agent on some obscure task you have no need for. Rather, you should start a coding agent on:

    • A Linear issue you never felt you had time to sit down and do yourself
    • A quick feature that was requested in the last sales call
    • Some UI improvements that today's coding agents handle easily

    This image shows scaling laws, illustrating how performance increases with scale. I argue the same will happen when we scale our LLM usage. Image from NodeMasters.

    In a world with an abundance of resources, we should look to maximize our use of them

    My main point here is that the threshold for performing tasks has decreased significantly since the launch of LLMs. Previously, when you got a bug report, you had to sit down for two hours in deep focus, thinking about how to solve that bug.

    Today, however, that's no longer the case. Instead, you can go into Cursor, paste in the bug report, and ask Claude Sonnet 4.5 to try to fix it. You can then come back 10 minutes later, test whether the problem is fixed, and create the pull request.

    How many tokens can you spend while still doing something useful with those tokens?

    How to scale LLM usage

    I have discussed why you should scale LLM usage by running more coding agents, deep research agents, and other AI agents. However, it can be hard to imagine exactly which tasks you should fire off. Thus, in this section, I'll discuss specific agents you can fire off to scale your LLM usage.

    Parallel coding agents

    Parallel coding agents are one of the simplest ways to scale LLM usage for any programmer. Instead of working on only one problem at a time, you start two or more agents at the same time, using Cursor agents, Claude Code, or another agentic coding tool. This is often made very easy by using Git worktrees.
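As a minimal sketch of the worktree setup, here is a Python helper that builds the commands for an isolated worktree plus a background agent run. The `claude -p` invocation stands in for whatever headless agent CLI you use, and the branch and directory naming are my own assumptions:

```python
import subprocess

def worktree_plan(repo: str, branch: str, task: str) -> tuple[list[str], list[str]]:
    """Commands to (1) create an isolated worktree for `branch` and
    (2) run a coding agent inside it, so parallel agents never collide."""
    worktree = f"{repo}-{branch}"
    setup = ["git", "-C", repo, "worktree", "add", worktree, "-b", branch]
    agent = ["claude", "-p", task]  # placeholder for your agent CLI
    return setup, agent

def fire_and_forget(repo: str, branch: str, task: str) -> subprocess.Popen:
    """Create the worktree, then start the agent in the background."""
    setup, agent = worktree_plan(repo, branch, task)
    subprocess.run(setup, check=True)
    return subprocess.Popen(agent, cwd=f"{repo}-{branch}")
```

Each agent gets its own checkout and branch, so two or three bug fixes can run side by side while you keep working in your main tree.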

    For example, I often have one main task or project that I'm working on, sitting in Cursor and programming. However, sometimes a bug report comes in, and I routinely route it to Claude Code to have it search for why the problem is happening and fix it if possible. Sometimes this works out of the box; sometimes I have to help it a bit.

    However, the cost of starting this bug-fixing agent is super low (I can literally just copy the Linear issue into Cursor, which can read the issue using the Linear MCP). Similarly, I also have a script that automatically researches relevant prospects, running in the background.

    Deep research

    Deep research is a feature you can use with any of the frontier model providers, like Google Gemini, OpenAI ChatGPT, and Anthropic's Claude. I prefer Gemini 3 deep research, though there are many other solid deep research tools out there.

    Whenever I'm interested in learning more about a topic, finding information, or anything similar, I fire off a deep research agent with Gemini.

    For example, I was interested in finding some prospects given a specific ICP. I quickly pasted the ICP information into Gemini, gave it some contextual information, and had it start researching, so that it could run while I worked on my main programming project.

    After 20 minutes, I had a brief report from Gemini, which turned out to contain loads of useful information.

    Creating workflows with n8n

    Another way to scale LLM usage is to create workflows with n8n or any similar workflow-building tool. With n8n, you can build specific workflows that, for example, read Slack messages and perform some action based on those messages.

    You could, for instance, have a workflow that reads a bug-report group on Slack and automatically starts a Claude Code agent for a given bug report. Or you could create another workflow that aggregates information from various different sources and provides it to you in an easily readable format. There are essentially limitless opportunities with workflow-building tools.
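As a rough Python analogue of such an n8n workflow, the router below decides what to do with an incoming Slack message. The payload shape and the channel name are illustrative assumptions, not Slack's or n8n's exact schema; in n8n, the same logic would be a Slack Trigger node feeding a Switch node whose branches call the downstream actions:

```python
def route_slack_message(message: dict) -> str:
    """Map an incoming Slack message to a workflow action."""
    text = message.get("text", "").lower()
    if message.get("channel") == "bug-reports":
        # e.g. kick off a Claude Code agent with the report as its task
        return "start_coding_agent"
    if "digest" in text or "summary" in text:
        # e.g. aggregate information from several sources into one report
        return "aggregate_sources"
    return "ignore"
```

The value of the workflow tool is that this routing runs continuously and unattended, so agents get fired off without you even touching the keyboard.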

    More

    There are many other techniques you can use to scale your LLM usage. I've only listed the first few items that came to mind from my own work with LLMs. I recommend always keeping in mind what you can automate using AI, and how you can leverage it to become more effective. How you scale LLM usage will vary widely across companies, job titles, and many other factors.

    Conclusion

    In this article, I've discussed how to scale your LLM usage to become a more effective engineer. I argue that we've seen scaling work incredibly well in the past, and it's highly likely we can see increasingly powerful results by scaling our own usage of LLMs. This could mean firing off more coding agents in parallel, or running deep research agents while eating lunch. In general, I believe that by increasing our LLM usage, we can become increasingly productive.

    👉 Find me on socials:

    📚 Get my free Vision Language Models ebook

    💻 My webinar on Vision Language Models

    📩 Subscribe to my newsletter

    🧑‍💻 Get in touch

    🔗 LinkedIn

    🐦 X / Twitter

    ✍️ Medium


