
    Building LLM Apps That Can See, Think, and Integrate: Using o3 with Multimodal Input and Structured Output

    By ProfitlyAI | September 20, 2025 | 13 min read


    The standard “text in, text out” paradigm will only take you so far.

    Real applications that deliver actual value should be able to examine visuals, reason through complex problems, and produce results that systems can actually use.

    In this post, we’ll design this stack by bringing together three powerful capabilities: multimodal input, reasoning, and structured output.

    To illustrate this, we’ll walk through a hands-on example: building a time-series anomaly detection system for e-commerce order data using OpenAI’s o3 model. Specifically, we’ll show how to pair o3’s reasoning capability with image input and emit validated JSON, so that the downstream system can easily consume it.

    By the end, our app will:

    • See: analyze charts of e-commerce order volume time series
    • Think: identify unusual patterns
    • Integrate: output a structured anomaly report

    You’ll leave with functional code you can reuse for various use cases that go beyond just anomaly detection.

    Let’s dive in.

    Interested in the broader landscape of how LLMs are being applied to anomaly detection? Check out my previous post: Boosting Your Anomaly Detection With LLMs, where I summarized 7 emerging application patterns that you shouldn’t miss.


    1. Case Study

    In this post, we aim to build an anomaly detection solution for identifying abnormal patterns in e-commerce order time series data.

    For this case study, we generated three sets of synthetic daily order data. The datasets represent three different profiles of daily orders over roughly one month. To make the weekly seasonality obvious, we have shaded the weekends. The x-axis shows the date.

    Figure 1. Dataset 1, with the shaded areas being the weekends. (Image by author)
    Figure 2. Dataset 2, with the shaded areas being the weekends. (Image by author)
    Figure 3. Dataset 3, with the shaded areas being the weekends. (Image by author)

    Each figure contains one specific type of anomaly (can you find them?). We’ll later use these figures to test our anomaly detection solution and see if it can accurately recover these anomalies.
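    The post does not show how the synthetic datasets were generated. If you want to reproduce a similar setup, the sketch below is one hypothetical way to do it: `make_daily_orders` is not the author’s code, and the weekday/weekend levels (~127 vs ~115), the noise, and the anomaly sizes are assumptions chosen to resemble the values the model later reads off the charts. It returns rows with "date" and "orders" keys, the columns that `create_fig` expects.

```python
import random
from datetime import date, timedelta


def make_daily_orders(start: date, days: int = 30, anomaly: str = "spike",
                      seed: int = 0) -> list[dict]:
    """Hypothetical synthetic daily-order generator (not the author's data).

    Assumed levels: ~127 orders on weekdays, ~115 on weekends, +/-4 noise.
    anomaly is one of "spike", "level_shift", "seasonal_outlier".
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    mid = days // 2
    # First Saturday at or after the midpoint, used by "seasonal_outlier".
    sat = next((i for i in range(mid, days)
                if (start + timedelta(days=i)).weekday() == 5), None)
    rows = []
    for i in range(days):
        d = start + timedelta(days=i)
        weekend = d.weekday() >= 5
        orders = (115 if weekend else 127) + rng.randint(-4, 4)
        if anomaly == "spike" and i == mid:
            orders += 40  # one-day jump
        elif anomaly == "level_shift" and i >= 2 * days // 3:
            orders += 20  # sustained baseline change for the last third
        elif anomaly == "seasonal_outlier" and sat is not None and i in (sat, sat + 1):
            orders += 15  # one weekend behaves like a weekday
        rows.append({"date": d.isoformat(), "orders": orders})
    return rows


rows = make_daily_orders(date(2025, 8, 1), anomaly="spike")
print(rows[:3])
```

    Wrapping the rows as `pd.DataFrame(make_daily_orders(date(2025, 8, 1), anomaly="level_shift"))` gives a dataframe with "date" and "orders" columns that could be passed to `create_fig`.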

    2. Our Solution

    2.1 Overview

    Unlike traditional machine learning approaches that require tedious feature engineering and model training, our approach is much simpler. It works in the following steps:

    1. We prepare a figure visualizing the e-commerce order time series data.
    2. We prompt the reasoning model o3, ask it to take a closer look at the time series image we feed it, and determine whether an unusual pattern exists.
    3. The o3 model then outputs its findings in a predefined JSON format.

    And that’s it. Simple.

    Of course, to deliver this solution, we need to enable the o3 model to take image input and emit structured output. We will see how to do that shortly.

    2.2 Setting up the reasoning model

    As mentioned before, we’ll use the o3 model, OpenAI’s flagship reasoning model that can handle complex multi-step problems with state-of-the-art performance. Specifically, we’ll use the Azure OpenAI endpoint to call the model.

    Make sure you have put the endpoint, API key, and deployment name in an .env file; we can then proceed to set up the LLM client:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    
    from openai import AzureOpenAI
    from dotenv import load_dotenv
    import os
    
    load_dotenv()
    
    # Set up the LLM client (values are read from the .env file)
    endpoint = os.getenv("api_base")
    api_key = os.getenv("o3_API_KEY")
    api_version = "2025-04-01-preview"
    model_name = "o3"
    deployment = os.getenv("deployment_name")
    
    LLM_client = AzureOpenAI(
        api_key=api_key,  
        api_version=api_version,
        azure_endpoint=endpoint
    )

    We use the following instruction as the system message for the o3 model (refined by GPT-5):

    instruction = """
    
    [Role]
    You are a meticulous data analyst.
    
    [Task]
    You will be given a line chart image related to daily e-commerce orders. 
    Your task is to identify prominent anomalies in the data.
    
    [Rules]
    The anomaly types can be spike, drop, level_shift, or seasonal_outlier.
    A level_shift is a sustained baseline change (≥ 5 consecutive days), not a single point.
    A seasonal_outlier happens if a weekend/weekday behaves unlike its peers in the same category. 
    For example, weekend orders are usually lower than weekday orders.
    Read dates/values from the axes; if you can’t read them exactly, snap to the nearest tick and note the uncertainty in the explanation.
    The weekends are shaded in the figure.
    """

    In the above instruction, we clearly defined the role of the LLM, the task that the LLM should complete, and the rules the LLM should follow.

    To limit the complexity of our case study, we intentionally specified only four anomaly types that the LLM needs to identify. We also provided clear definitions of those anomaly types to remove ambiguity.

    Finally, we injected a bit of domain knowledge about e-commerce patterns, i.e., lower weekend orders are expected compared to weekdays. Incorporating domain know-how is generally considered good practice for guiding the model’s analytical process.

    Now that we have our model set up, let’s discuss how to prepare the image for the o3 model to consume.

    2.3 Image preparation

    To enable o3’s multimodal capabilities, we need to provide figures in a specific format, i.e., either as publicly accessible web URLs or as base64-encoded data URLs. Since our figures are generated locally, we’ll use the second approach.

    What’s Base64 encoding anyway? Base64 is a way to represent binary data (like our image files) using only text characters that are safe to transmit over the internet. It converts binary image data into a string of letters, numbers, and a few symbols.

    And what about a data URL? A data URL is a type of URL that embeds the file content directly in the URL string, rather than pointing to a file location.
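    To make the two ideas concrete, here is a minimal standard-library sketch that wraps a few raw bytes into a base64 data URL and splits it back apart. The helper names `to_data_url` and `from_data_url` are illustrative only, not part of any library.

```python
import base64


def to_data_url(raw: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a base64 data URL: data:<mime>;base64,<payload>."""
    payload = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def from_data_url(url: str) -> tuple[str, bytes]:
    """Split a base64 data URL back into (mime_type, raw bytes)."""
    header, payload = url.split(",", 1)
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


url = to_data_url(b"hi", "text/plain")
print(url)                 # data:text/plain;base64,aGk=
print(from_data_url(url))  # ('text/plain', b'hi')
```

    An image works the same way: only the MIME type (for example image/png) and the length of the payload change.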

    We can use the following function to handle this conversion automatically:

    import io
    import base64
    
    def fig_to_data_url(fig, fmt="png"):
        """
        Converts a Matplotlib figure to a base64 data URL without saving to disk.
    
        Args:
        -----
        fig (matplotlib.figure.Figure): The figure to convert.
        fmt (str): The format of the image ("png", "jpeg", etc.)
    
        Returns:
        --------
        str: The data URL representing the figure.
        """
    
        # Render the figure into an in-memory buffer and rewind it
        buf = io.BytesIO()
        fig.savefig(buf, format=fmt, bbox_inches="tight")
        buf.seek(0)
        
        # Base64-encode the image bytes and build the MIME type
        base64_encoded_data = base64.b64encode(buf.read()).decode("utf-8")
        mime_type = f"image/{fmt.lower()}"
        
        return f"data:{mime_type};base64,{base64_encoded_data}"

    Essentially, our function first saves the Matplotlib figure to a memory buffer. It then encodes the binary PNG data as base64 text and wraps it in the desired data URL format.
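    Before sending a figure to the API, it can be worth sanity-checking the data URL. The hypothetical helper below (standard library only, assuming PNG output as in the default of `fig_to_data_url`) decodes the payload, checks the 8-byte PNG file signature, and reports the encoded size, which is roughly 4/3 of the raw byte count because Base64 maps every 3 bytes to 4 characters.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first 8 bytes of every PNG file


def inspect_data_url(url: str) -> dict:
    """Sanity-check a base64 image data URL before sending it to a model."""
    header, payload = url.split(",", 1)
    mime_type = header.removeprefix("data:").removesuffix(";base64")
    raw = base64.b64decode(payload, validate=True)  # rejects non-Base64 characters
    return {
        "mime_type": mime_type,
        "decoded_bytes": len(raw),
        "encoded_chars": len(payload),  # about 4/3 of decoded_bytes
        "is_png": raw.startswith(PNG_SIGNATURE),
    }


# A tiny stand-in payload: the PNG signature followed by four zero bytes.
fake = "data:image/png;base64," + base64.b64encode(PNG_SIGNATURE + b"\x00" * 4).decode()
print(inspect_data_url(fake))
```

    In practice you would call it on the string returned by `fig_to_data_url(fig)`; a large `encoded_chars` value is a hint that the request payload is getting heavy.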

    Assuming we have access to the synthetic daily order data, we can use the following function to generate the plot and convert it into a proper data URL in one go:

    def create_fig(df):
        """
        Create a Matplotlib figure and convert it to a base64 data URL.
        Weekends (Sat–Sun) are shaded.
    
        Args:
        -----
        df: dataframe containing one profile of the daily order time series. 
            The dataframe has "date" and "orders" columns.
    
        Returns:
        --------
        image_url: The data URL representing the figure.
        """
    
        df = df.copy()
        df['date'] = pd.to_datetime(df['date'])
    
        fig, ax = plt.subplots(figsize=(8, 4.5))
        ax.plot(df["date"], df["orders"], linewidth=2)
        ax.set_xlabel('Date', fontsize=14)
        ax.set_ylabel('Daily Orders', fontsize=14)
    
        # Weekend shading: one span per weekend, from Saturday 00:00 to Monday 00:00
        start = df["date"].min().normalize()
        end   = df["date"].max().normalize()
        cur = start
        while cur <= end:
            if cur.weekday() == 5:  # Saturday 00:00
                span_start = cur                                      # Sat 00:00
                span_end   = cur + pd.Timedelta(days=2)               # Mon 00:00
                ax.axvspan(span_start, span_end, alpha=0.12, zorder=0)
                cur += pd.Timedelta(days=2)                           # skip Sunday 
            else:
                cur += pd.Timedelta(days=1)
    
        # Title
        title = f'Daily Orders: {df["date"].min():%b %d, %Y} - {df["date"].max():%b %d, %Y}'
        ax.set_title(title, fontsize=16)
    
        # Format x-axis dates
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d')) 
        ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1))
    
        plt.tight_layout()
    
        # Obtain the data URL
        image_url = fig_to_data_url(fig)
    
        return image_url

    Figures 1-3 are generated by the above plotting routine.
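    The weekend-shading loop in `create_fig` can also be written as a small pure function, which makes the Saturday-to-Monday span logic easy to test without plotting. This is a sketch using only `datetime`; `weekend_spans` is a hypothetical name, and unlike the loop above it also shades a lone Sunday when the data happens to start on one.

```python
from datetime import date, timedelta


def weekend_spans(first: date, last: date) -> list[tuple[date, date]]:
    """Return (span_start, span_end) pairs covering each weekend in [first, last].

    A full weekend runs from Saturday 00:00 to Monday 00:00 (two days),
    which is the interval axvspan needs to shade both Saturday and Sunday.
    """
    spans = []
    if first.weekday() == 6:  # data starts on a Sunday: shade that day alone
        spans.append((first, first + timedelta(days=1)))
    cur = first
    while cur <= last:
        if cur.weekday() == 5:  # Saturday
            spans.append((cur, cur + timedelta(days=2)))
            cur += timedelta(days=2)  # jump past Sunday
        else:
            cur += timedelta(days=1)
    return spans


print(weekend_spans(date(2025, 8, 1), date(2025, 8, 10)))
```

    Inside `create_fig`, the shading could then become `for s, e in weekend_spans(df["date"].min().date(), df["date"].max().date()): ax.axvspan(s, e, alpha=0.12, zorder=0)`.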

    2.4 Structured output

    In this section, let’s discuss how to ensure the o3 model outputs a consistent JSON format instead of free-form text. This is what’s known as “structured output,” and it’s one of the key enablers for integrating LLMs into existing automated workflows.

    To achieve that, we start by defining the schema that governs the expected output structure. We’ll be using a Pydantic model:

    from pydantic import BaseModel, Field
    from typing import Literal
    from datetime import date
    
    # The four anomaly types allowed by the system instruction
    AnomalyKind = Literal["spike", "drop", "level_shift", "seasonal_outlier"]
    
    class DateWindow(BaseModel):
        start: date = Field(description="Earliest plausible date the anomaly starts (ISO YYYY-MM-DD)")
        end: date = Field(description="Latest plausible date the anomaly ends, inclusive (ISO YYYY-MM-DD)")
    
    class AnomalyReport(BaseModel):
        when: DateWindow = Field(
            description=(
                "Minimal window that contains the anomaly. "
                "For single-point anomalies, use the interval that covers reading uncertainty, if the tick labels are unclear"
            )
        )
        y: int = Field(description="Approx value on the anomaly’s most representative day (peak/trough), rounded")
        kind: AnomalyKind = Field(description="The type of the anomaly")
        why: str = Field(description="One-sentence reason for why this window is unusual")
        date_confidence: Literal["low","medium","high"] = Field(
            default="medium", description="Confidence that the window localization is correct"
        )

    Our Pydantic schema tries to capture both the quantitative and qualitative aspects of the detected anomalies. For each field, we specify its data type (e.g., int for numerical values, Literal for a fixed set of choices, etc.).

    Also, we use the Field function to provide detailed descriptions of each key. These descriptions are especially important, as they effectively serve as inline instructions for o3, so that it understands the semantic meaning of each component.

    Now that we have covered the multimodal input and structured output, it’s time to put them together in a single LLM call.

    2.5 o3 model invocation

    To interact with o3 using multimodal input and structured output, we use the LLM_client.beta.chat.completions.parse() API. Some of the key arguments include:

    • model: the deployment name;
    • messages: the message object sent to the o3 model;
    • max_completion_tokens: an upper bound on the number of tokens the model can generate for the completion. Note that reasoning models like o3 generate reasoning tokens internally to “think through” the problem, and this limit covers those reasoning tokens as well as the visible output tokens users receive. If the budget is too small, reasoning can consume all of it and leave the final answer truncated or empty;
    • response_format: the Pydantic model that defines the expected JSON schema structure;
    • reasoning_effort: a control knob that dictates how much computational effort o3 should spend on reasoning. The available options are low, medium, and high.

    We can define a helper function to interact with the o3 model:

    def anomaly_detection(instruction, fig_path, 
                          response_format, prompt=None, 
                          deployment="o3", reasoning_effort="high"):
        # fig_path: the image URL (here, the base64 data URL returned by create_fig)
        # deployment: defaults to "o3"; pass your Azure deployment name if it differs
    
        # Compose messages: system instruction + a user turn carrying the image
        messages=[
                { "role": "system", "content": instruction},
                { "role": "user", "content": [  
                    { 
                        "type": "image_url",
                        "image_url": {
                            "url": fig_path,
                            "detail": "high"
                        }
                    },
                ]} 
        ]
    
        # Add the text prompt if it is given
        if prompt is not None:
            messages[1]["content"].append({"type": "text", "text": prompt})
    
        # Invoke the LLM API; the SDK parses the reply into response_format
        response = LLM_client.beta.chat.completions.parse(
            model=deployment,
            messages=messages,
            max_completion_tokens=4000,
            reasoning_effort=reasoning_effort,
            response_format=response_format
        )
    
        # .parsed is a response_format instance; model_dump() turns it into a dict
        return response.choices[0].message.parsed.model_dump()

    Note that the messages object accepts both text and image content. Since we’ll only use figures to prompt the model, the text prompt is optional.

    We set "detail": "high" to enable high-resolution image processing. For our case study, this is most likely necessary, as we need o3 to read fine details like axis tick labels, data point values, and subtle visual patterns. However, keep in mind that high-detail processing consumes more tokens and incurs higher API costs.
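    To get a feel for that cost, the sketch below follows the tile-based rule OpenAI documents for high-detail images on GPT-4o-class models: fit the image within 2048 x 2048, shrink it so the shortest side is 768 px, then charge a base amount plus a fixed amount per 512 px tile. The defaults of 85 base tokens and 170 tokens per tile are the GPT-4o values; per-model values (including o3’s) may differ, so treat the result as a rough estimate and check the current vision documentation. The function name is hypothetical, and it assumes images are only ever shrunk, never upscaled.

```python
import math


def estimate_high_detail_tokens(width: int, height: int,
                                base_tokens: int = 85, tile_tokens: int = 170) -> int:
    """Rough image-token estimate for "detail": "high" (tile-based rule).

    base_tokens / tile_tokens default to GPT-4o values; other models may differ.
    """
    # 1) Shrink to fit inside a 2048 x 2048 square (never upscale).
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # 2) Shrink so the shortest side is at most 768 px (never upscale).
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # 3) Count the 512 px tiles needed to cover the image and price them.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return base_tokens + tile_tokens * tiles


print(estimate_high_detail_tokens(800, 450))
```

    For example, an 8 x 4.5 inch figure saved at Matplotlib’s default 100 dpi is roughly 800 x 450 px (before bbox_inches="tight" adjusts the margins), which this estimate covers with 2 tiles, or 425 tokens at the default rates.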

    Finally, the SDK has already parsed the JSON output into an AnomalyReport instance, and calling .parsed.model_dump() turns it into a standard Python dictionary.
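    The schema guarantees field names and types, but not the cross-field rules stated in the system instruction. A small post-check on the returned dictionary can catch those before the report reaches downstream systems. The function below is a hypothetical sketch that assumes the dictionary shape produced by model_dump() (with datetime.date values for the window); the 5-day threshold mirrors the level_shift rule in the instruction.

```python
from datetime import date


def check_report(report: dict, min_shift_days: int = 5) -> list[str]:
    """Return consistency problems found in an AnomalyReport dict (empty list if none)."""
    problems = []
    start, end = report["when"]["start"], report["when"]["end"]
    if start > end:
        problems.append("window start is after window end")
    days = (end - start).days + 1  # the window end is inclusive
    if report["kind"] == "level_shift" and days < min_shift_days:
        problems.append(
            f"level_shift window covers {days} day(s); the rule asks for >= {min_shift_days}"
        )
    return problems


report = {"when": {"start": date(2025, 8, 26), "end": date(2025, 9, 2)},
          "y": 150, "kind": "level_shift", "why": "...", "date_confidence": "high"}
print(check_report(report))  # []
```

    A non-empty list could, for instance, trigger a retry with a higher reasoning_effort or route the report for human review.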

    That’s it for the implementation. Let’s see some results next.


    3. Results

    In this section, we’ll feed the previously generated figures into the o3 model and ask it to identify potential anomalies.

    3.1 Spike anomaly

    # df_spike_anomaly is the dataframe of the first set of synthetic data (Figure 1)
    spike_anomaly_url = create_fig(df_spike_anomaly)
    
    # Anomaly detection
    result = anomaly_detection(instruction,
                               spike_anomaly_url,
                               response_format=AnomalyReport,
                               reasoning_effort="medium")
    print(result)

    In the call above, spike_anomaly_url is the data URL for Figure 1. The output is shown below:

    {
      'when': {'start': datetime.date(2025, 8, 19), 'end': datetime.date(2025, 8, 21)}, 
      'y': 166, 
      'kind': 'spike', 
      'why': 'Single day orders jump to ~166, far above adjacent days that sit near 120–130.', 
      'date_confidence': 'medium'
    }

    We see that the o3 model faithfully returned the output in exactly the format we designed. Now, we can take this result and generate a visualization programmatically:

    # Create the figure
    fig, ax = plt.subplots(figsize=(8, 4.5))
    df_spike_anomaly['date'] = pd.to_datetime(df_spike_anomaly['date'])
    ax.plot(df_spike_anomaly["date"], df_spike_anomaly["orders"], linewidth=2)
    ax.set_xlabel('Date', fontsize=14)
    ax.set_ylabel('Daily Orders', fontsize=14)
    
    # Format x-axis dates
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))  
    ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1)) 
    
    # Add anomaly overlay: read the window from the structured output
    start_date = pd.to_datetime(result['when']['start'])
    end_date = pd.to_datetime(result['when']['end'])
    
    # Add shaded region
    ax.axvspan(start_date, end_date, alpha=0.3, color='red', label=f"Anomaly ({result['kind']})")
    
    # Add text annotation
    mid_date = start_date + (end_date - start_date) / 2  # Middle of anomaly window
    ax.annotate(
        result['why'], 
        xy=(mid_date, result['y']), 
        xytext=(10, 20),  # Offset from the point
        textcoords='offset points',
        bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.7),
        arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0.1'),
        fontsize=10,
        wrap=True
    )
    
    # Add legend
    ax.legend()
    
    plt.xticks(rotation=0)
    plt.tight_layout()
    plt.show()

    The generated visualization looks like this:

    Figure 4. The anomaly detection results for Figure 1. (Image by author)

    We can see that the o3 model correctly identified the spike anomaly present in this first set of synthetic data.

    Not bad, especially considering that we didn’t do any conventional model training; we simply prompted an LLM.

    3.2 Level shift anomaly

    # df_level_shift_anomaly is the dataframe of the second set of synthetic data (Figure 2)
    level_shift_anomaly_url = create_fig(df_level_shift_anomaly)
    
    # Anomaly detection
    result = anomaly_detection(instruction,
                               level_shift_anomaly_url,
                               response_format=AnomalyReport,
                               reasoning_effort="medium")
    print(result)

    The output is shown below:

    {
      'when': {'start': datetime.date(2025, 8, 26), 'end': datetime.date(2025, 9, 2)}, 
      'y': 150, 
      'kind': 'level_shift', 
      'why': 'Orders suddenly jump from the 120-135 range to ~150 on Aug 26 and remain elevated for all subsequent days, indicating a sustained baseline change.', 
      'date_confidence': 'high'
    }

    Again, we see that the model accurately identified that a “level_shift” anomaly is present in the plot:

    Figure 5. The anomaly detection results for Figure 2. (Image by author)

    3.3 Seasonality anomaly

    # df_seasonality_anomaly is the dataframe of the third set of synthetic data (Figure 3)
    seasonality_anomaly_url = create_fig(df_seasonality_anomaly)
    
    # Anomaly detection
    result = anomaly_detection(instruction,
                               seasonality_anomaly_url,
                               response_format=AnomalyReport,
                               reasoning_effort="medium")
    print(result)

    The output is shown below:

    {
      'when': {'start': datetime.date(2025, 8, 23), 'end': datetime.date(2025, 8, 24)}, 
      'y': 132, 
      'kind': 'seasonal_outlier', 
      'why': 'Weekend of Aug 23-24 shows order volumes (~130+) on par with surrounding weekdays, while other weekends consistently drop to ~115, making it an out-of-season spike.', 
      'date_confidence': 'high'
    }

    This is a tricky case. Still, our o3 model managed to handle it properly, with accurate localization and a clear explanation. Pretty impressive:

    Figure 6. The anomaly detection results for Figure 3. (Image by author)

    4. Summary

    Congratulations! We’ve successfully built an anomaly detection solution for time-series data that works entirely through visualization and prompting.

    By feeding daily order plots into the o3 reasoning model and constraining its output to a JSON schema, we enabled the LLM to identify three different anomaly types with accurate localization. All of this was achieved without training any ML model. Impressive!

    If we take a step back, we can see that the solution we built illustrates the broader pattern of combining three capabilities:

    • See: multimodal input to let the model consume figures directly.
    • Think: step-by-step reasoning capability to handle complex problems.
    • Integrate: structured output that downstream systems can easily consume (e.g., for generating visualizations).

    The combination of multimodal input + reasoning + structured output creates a versatile foundation for useful LLM applications.

    You now have the building blocks ready. What do you want to build next?


