
    How to Ensure Reliability in LLM Applications

By ProfitlyAI · July 15, 2025


LLMs have entered the world of computer science at a record pace. They are powerful models capable of effectively performing a wide variety of tasks. However, LLM outputs are stochastic, which makes them unreliable. In this article, I discuss how you can ensure reliability in your LLM applications by properly prompting the model and handling the output.

This infographic highlights the contents of this article. I will primarily discuss ensuring output consistency and handling errors. Image by ChatGPT.

You can also read my articles on Attending NVIDIA GTC Paris 2025 and Creating Powerful Embeddings for Machine Learning.


    Motivation

My motivation for this article is that I'm constantly developing new applications using LLMs. LLMs are generalized tools that can be applied to most text-dependent tasks such as classification, summarization, information extraction, and much more. Furthermore, the rise of vision language models also lets us handle images in much the same way we handle text.

I often encounter the problem that my LLM applications are inconsistent. Sometimes the LLM doesn't answer in the desired format, or I'm unable to properly parse the LLM response. This is a big problem when you're working in a production setting and are fully dependent on consistency in your application. I'll thus discuss the techniques I use to ensure reliability for my applications in a production setting.

Ensuring output consistency

    Markup tags

To ensure output consistency, I use a technique where my LLM answers in markup tags. I use a system prompt like:

prompt = f"""
Classify the text into "Cat" or "Dog"

Provide your response in <answer> </answer> tags

"""

And the model will almost always respond with:

<answer>Cat</answer>

or

<answer>Dog</answer>

You can now easily parse out the response using the following code:

def _parse_response(response: str):
    return response.split("<answer>")[1].split("</answer>")[0]
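
For example, applied to a typical model response:

label = _parse_response("<answer>Cat</answer>")
print(label)  # Cat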

The reason markup tags work so well is that this is how the model is trained to behave. When OpenAI, Qwen, Google, and others train these models, they use markup tags. The models are thus highly effective at utilizing these tags and will, in almost all cases, adhere to the expected response format.

For example, with reasoning models, which have been on the rise lately, the models first do their thinking enclosed in <think> … </think> tags, and then provide their answer to the user.


Furthermore, I also try to use as many markup tags as possible elsewhere in my prompts. For example, if I'm providing few-shot examples to my model, I'll do something like:

prompt = f"""
Classify the text into "Cat" or "Dog"

Provide your response in <answer> </answer> tags

<example>
This is an image showing a cat -> <answer>Cat</answer>
</example>
<example>
This is an image showing a dog -> <answer>Dog</answer>
</example>
"""

I do two things here that help the model perform:

1. I provide examples in <example></example> tags.
2. In my examples, I make sure to adhere to my own expected response format, using the <answer></answer> tags.

Using markup tags, you can thus ensure a high level of output consistency from your LLM.

    Output validation

Pydantic is a tool you can use to ensure and validate the output of your LLMs. You can define types and validate that the output of the model adheres to the type we expect. For example, you can follow the example below, based on this article:

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()


class Profile(BaseModel):
    name: str
    email: str
    phone: str

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Return the `name`, `email`, and `phone` of user {user} in a json object."
        },
    ]
)

Profile.model_validate_json(resp.choices[0].message.content)

As you can see, we prompt GPT to respond with a JSON object, and we then run Pydantic to ensure the response is as we expect.


I'd also like to note that sometimes it's easier to simply create your own output validation function. In the last example, the only requirements for the response object are essentially that it contains the keys name, email, and phone, and that all of these are of the string type. You can validate this in Python with a function:

def validate_output(output: dict):
    assert "name" in output and isinstance(output["name"], str)
    assert "email" in output and isinstance(output["email"], str)
    assert "phone" in output and isinstance(output["phone"], str)

With this, you do not have to install any packages, and in a lot of cases, it's easier to set up.
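
As a small usage sketch (assuming resp is the response object from the Pydantic example above, and that the model actually returned a JSON object):

import json

# Parse the raw model output and validate it with the function above.
# json.loads raises a ValueError if the response is not valid JSON, and
# validate_output raises an AssertionError if a key is missing or has the wrong type.
output = json.loads(resp.choices[0].message.content)
validate_output(output)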

Tweaking the system prompt

You can also make several other tweaks to your system prompt to ensure more reliable output. I always recommend making your prompt as structured as possible, using:

• Markup tags, as mentioned earlier
• Lists, such as the one I'm writing in here

In general, you should also always ensure clear instructions. You can use the following test to check the quality of your prompt:

If you gave the prompt to another human who had never seen the task before and had no prior knowledge of it, would that human be able to perform the task effectively?

If you can't have a human do the task, you usually can't expect an AI to do it (at least for now).

Handling errors

Errors are inevitable when dealing with LLMs. If you perform enough API calls, it's almost certain that at some point the response will not be in your required format, or some other issue will occur.

In these scenarios, it's important that you have a robust application equipped to handle such errors. I use the following techniques to handle errors:

• Retry mechanism
• Increase the temperature
• Have backup LLMs

Now, let me elaborate on each point.

    Exponential backoff retry mechanism

It's important to have a retry mechanism in place, considering the number of issues that can occur when making an API call. You might encounter issues such as rate limiting, incorrect output format, or a slow response. In these scenarios, you should make sure to wrap the LLM call in a try-catch and retry. Usually, it's also good to use exponential backoff, especially for rate-limiting errors, to ensure you wait long enough to avoid further rate-limiting issues.
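
A minimal sketch of such a retry loop (call_llm is a placeholder for whatever function makes the actual API call; in practice you would catch the provider's specific rate-limit and parsing exceptions rather than a bare Exception):

import time

def call_with_retries(call_llm, max_retries: int = 5):
    """Retry an LLM call, waiting 1s, 2s, 4s, ... between attempts."""
    for attempt in range(max_retries):
        try:
            return call_llm()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            time.sleep(2 ** attempt)  # exponential backoff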

Temperature increase

I also sometimes recommend increasing the temperature a bit. If you set the temperature to 0, you tell the model to behave deterministically. However, sometimes this can have a negative effect.

For example, suppose you have an input example where the model failed to answer in the proper output format. If you retry with a temperature of 0, you are likely to just run into the same issue. I thus recommend setting the temperature a bit higher, for example 0.1, to introduce some stochasticity into the model while still keeping its outputs relatively deterministic.

This is the same logic that a lot of agents use: a higher temperature helps them avoid getting stuck in a loop and repeating the same errors.
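
A small variation on the retry sketch above, bumping the temperature slightly on each attempt (call_llm is again a placeholder, assumed to accept a temperature argument):

def call_with_temperature_bump(call_llm, max_retries: int = 3):
    """Retry an LLM call, starting deterministic and adding a little randomness on each retry."""
    for attempt in range(max_retries):
        temperature = 0.1 * attempt  # 0.0 on the first attempt, then 0.1, 0.2, ...
        try:
            return call_llm(temperature=temperature)
        except Exception:
            if attempt == max_retries - 1:
                raise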

    Backup LLMs

Another powerful technique for dealing with errors is to have backup LLMs. I recommend using a chain of LLM providers for all your API calls. For example, you first try OpenAI; if that fails, you use Gemini; and if that fails, you can use Claude.

This ensures reliability in the event of provider-specific issues. These could be issues such as:

• The server is down (for example, if OpenAI's API is not accessible for a period of time)
• Filtering (sometimes, an LLM provider will refuse to answer your request if it believes the request violates jailbreak policies or content moderation)

In general, it's simply good practice not to be fully dependent on one provider.
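
A minimal sketch of such a provider chain (call_openai, call_gemini, and call_claude are placeholders, each assumed to take a prompt and return a string):

def call_with_fallback(prompt: str, providers) -> str:
    """Try each (name, call) pair in order and return the first successful response."""
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:
            errors.append((name, exc))
    raise RuntimeError(f"All providers failed: {errors}")

# Example chain: OpenAI first, then Gemini, then Claude
# providers = [("openai", call_openai), ("gemini", call_gemini), ("claude", call_claude)]
# answer = call_with_fallback(prompt, providers)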

    Conclusion

In this article, I've discussed how you can ensure reliability in your LLM application. LLM applications are inherently stochastic since you can't directly control the output of an LLM. It's thus important to have proper policies in place, both to minimize the errors that occur and to handle those errors when they do occur.

I've discussed the following approaches to minimize and handle errors:

• Markup tags
• Output validation
• Tweaking the system prompt
• Retry mechanism
• Increase the temperature
• Have backup LLMs

If you combine these techniques in your application, you can achieve a powerful and robust LLM application.

👉 Follow me on socials:

    🧑‍💻 Get in touch
    🌐 Personal Blog
    🔗 LinkedIn
    🐦 X / Twitter
    ✍️ Medium
    🧵 Threads


