In exploring the features of the OpenAI Agents SDK framework, there's one capability that deserves a closer look: input and output guardrails.
In previous articles, we built our first agent with an API-calling tool and then expanded into a multi-agent system. In real-world scenarios, though, building these systems is complex, and without the right safeguards, things can quickly go off track. That's where guardrails come in: they help ensure safety, focus, and efficiency.
If you haven't read the earlier parts yet, no worries: you'll find links to the previous articles at the end of this post.
Here's why guardrails matter:
- Prevent misuse
- Save resources
- Ensure safety and compliance
- Maintain focus and quality
Without proper guardrails, unexpected use cases can pop up. For example, you might have heard of people using AI-powered customer service bots (designed for product support) to write code instead. It sounds funny, but for the company, it can become a costly and irrelevant distraction.
To see why guardrails are important, let's revisit our last project. I ran the agents_as_tools script and asked it to generate code for calling a weather API. Since no guardrails were in place, the app returned the answer without hesitation, proving that, by default, it will try to do almost anything asked of it.
We definitely don't want this happening in a production app. Imagine the costs of unintended usage, not to mention the bigger risks it can bring, such as information leaks, system prompt exposure, and other serious vulnerabilities.
Hopefully, this makes the case clear for why guardrails are worth exploring. Next, let's dive into how to start using the guardrail feature in the OpenAI Agents SDK.
A Quick Intro to Guardrails
In the OpenAI Agents SDK, there are two types of guardrails: input guardrails and output guardrails [1]. Input guardrails run on the user's initial input, while output guardrails run on the agent's final response.
A guardrail can be an LLM-powered agent, which is useful for tasks that require reasoning, or a rule-based/programmatic function, such as a regex to detect specific keywords. If the guardrail finds a violation, it triggers a tripwire and raises an exception. This mechanism prevents the main agent from processing unsafe or irrelevant queries, ensuring both safety and efficiency.
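Before touching the SDK, the tripwire idea can be modeled in a few lines of plain Python. This is only a conceptual sketch: GuardrailResult, TripwireTriggered, keyword_guardrail, and run_with_guardrail are hypothetical names invented for illustration and are not part of the Agents SDK.

```python
from dataclasses import dataclass


@dataclass
class GuardrailResult:
    """Hypothetical stand-in for a guardrail's verdict."""
    info: str
    tripwire_triggered: bool


class TripwireTriggered(Exception):
    """Hypothetical exception raised when a guardrail blocks a request."""

    def __init__(self, result: GuardrailResult):
        super().__init__(result.info)
        self.result = result


def keyword_guardrail(text: str) -> GuardrailResult:
    # Rule-based check: block anything that mentions "password".
    blocked = "password" in text.lower()
    return GuardrailResult(
        info="Blocked keyword found." if blocked else "Input looks fine.",
        tripwire_triggered=blocked,
    )


def run_with_guardrail(text: str) -> str:
    result = keyword_guardrail(text)
    if result.tripwire_triggered:
        # The main agent is never reached for blocked input.
        raise TripwireTriggered(result)
    return f"Main agent would now process: {text}"
```

The caller wraps run_with_guardrail in a try/except block and shows a fallback message when the exception is raised, which is the same shape the Streamlit app uses later in this article.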
Some practical uses for input guardrails include:
- Identifying when a user asks an off-topic question [2]
- Detecting unsafe input attempts, including jailbreaks and prompt injections [3]
- Moderating to flag inappropriate input, such as harassment, violence, or hate speech [3]
- Handling specific-case validation. For example, in our weather app, we could enforce that questions only reference cities in Indonesia.
On the other hand, output guardrails can be used to:
- Prevent unsafe or inappropriate responses
- Stop the agent from leaking personally identifiable information (PII) [3]
- Ensure compliance and brand safety, such as blocking outputs that could harm brand integrity
In this article, we'll explore different types of guardrails, including both LLM-based and rule-based approaches, and how they can be applied to various kinds of validation.
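As a taste of the rule-based side, here is a minimal sketch of the kind of check an output guardrail could run to catch PII-like strings before a response is shown. The patterns and the find_pii helper are illustrative assumptions, deliberately simple, and far from production-grade PII detection.

```python
import re

# Deliberately simple, illustrative patterns; real PII detection needs far more care.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s-]{7,}\d")


def find_pii(text: str) -> list[str]:
    """Return the kinds of PII-like patterns found in the text."""
    found = []
    if EMAIL_PATTERN.search(text):
        found.append("email")
    if PHONE_PATTERN.search(text):
        found.append("phone")
    return found
```

A guardrail built on this helper would trigger its tripwire whenever find_pii returns a non-empty list.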
Prerequisites
- Create a requirements.txt file:
openai-agents
streamlit
- Create a virtual environment named venv. Run the following commands in your terminal:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
- Create a .env file to store your OpenAI API key:
OPENAI_API_KEY=your_openai_key_here
For the guardrail implementation, we'll use the script from the previous article where we built the agents-as-tools multi-agent system. For a detailed walkthrough, please refer back to that article. The full implementation script can be found here: app06_agents_as_tools.py.
Now let's create a new file named app08_guardrails.py.
Input Guardrail
We'll start by adding input guardrails to our weather app. In this section, we'll build two types:
- Off-topic guardrail, which uses an LLM to determine if the user input is unrelated to the app's purpose.
- Injection detection guardrail, which uses a simple rule to catch jailbreak and prompt injection attempts.
Import Libraries
First, let's import the necessary packages from the Agents SDK and other libraries. We'll also set up the environment to load the OpenAI API key from the .env file. From the Agents SDK, besides the basic functions (Agent, Runner, and function_tool), we'll also import the functions and exception classes used specifically for implementing input and output guardrails.
from agents import (
    Agent,
    Runner,
    function_tool,
    GuardrailFunctionOutput,
    input_guardrail,
    InputGuardrailTripwireTriggered,
    output_guardrail,
    OutputGuardrailTripwireTriggered,
)
import asyncio
import requests
import streamlit as st
from pydantic import BaseModel, Field
from dotenv import load_dotenv

# Load OPENAI_API_KEY from the .env file
load_dotenv()
Define Output Model
For any LLM-based guardrail, we need to define an output model. Typically, we use a Pydantic model class to specify the structure of the data. At the simplest level, we need a boolean field (True/False) to indicate whether the guardrail should trigger, along with a text field that explains the reasoning.
In our case, we want the guardrail to determine whether the query is still within the scope of the app's purpose (weather and air quality). To do that, we'll define a model named TopicClassificationOutput as shown below:
# Define output model for the guardrail agent to classify if input is off-topic
class TopicClassificationOutput(BaseModel):
    is_off_topic: bool = Field(
        description="True if the input is off-topic (not related to weather/air quality and not a greeting), False otherwise"
    )
    reasoning: str = Field(
        description="Brief explanation of why the input was classified as on-topic or off-topic"
    )
The boolean field is_off_topic will be set to True if the input is outside the app's scope. The reasoning field stores a short explanation of why the model made its classification.
Create Guardrail Agent
We need to define an agent with clear and complete instructions to determine whether a user's question is on-topic or off-topic. This can be adjusted depending on your app's purpose; the instructions don't need to be the same for every use case.
For our Weather and Air Quality assistant, here's the guardrail agent with instructions for classifying a user's query.
# Create the guardrail agent to determine if input is off-topic
topic_classification_agent = Agent(
    name="Topic Classification Agent",
    instructions=(
        "You are a topic classifier for a weather and air quality application. "
        "Your task is to determine if a user's question is on-topic. "
        "Allowed topics include: "
        "1. Weather-related: current weather, weather forecast, temperature, precipitation, wind, humidity, etc. "
        "2. Air quality-related: air pollution, AQI, PM2.5, ozone, air conditions, etc. "
        "3. Location-based inquiries about weather or air conditions "
        "4. Polite greetings and conversational starters (e.g., 'hello', 'hi', 'good morning') "
        "5. Questions that combine greetings with weather/air quality topics "
        "Mark as OFF-TOPIC only if the query is clearly unrelated to weather/air quality AND not a simple greeting. "
        "Examples of off-topic: math problems, cooking recipes, sports scores, technical support, jokes (unless weather-related). "
        "Examples of on-topic: 'Hello, what's the weather?', 'Hi there', 'Good morning, how's the air quality?', 'What's the temperature?' "
        "The final output MUST be a JSON object conforming to the TopicClassificationOutput model."
    ),
    output_type=TopicClassificationOutput,
    model="gpt-4o-mini"  # Use a fast and cost-effective model
)
In the instructions, besides listing the obvious topics, we also allow some flexibility for simple conversational starters like "hello," "hi," or other greetings. To make the classification clearer, we included examples of both on-topic and off-topic queries.
Another benefit of input guardrails is cost optimization. To take advantage of this, we should use a faster and cheaper model than the main agent. This way, the main (and more expensive) agent is only used when absolutely necessary.
In this example, the guardrail agent uses gpt-4o-mini while the main agent runs on gpt-4o.
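To make the cost argument concrete, here is a back-of-the-envelope sketch. All numbers are hypothetical and chosen only to keep the arithmetic easy to follow; they are not real model prices. The sketch also assumes that a blocked query never reaches the main agent.

```python
def expected_cost_per_query(guard_cost: float, main_cost: float, blocked_share: float) -> float:
    """Average cost when every query hits the guardrail and only unblocked ones hit the main agent."""
    return guard_cost + (1.0 - blocked_share) * main_cost


# Hypothetical numbers, not real pricing.
GUARD_COST = 0.001    # cost of one guardrail call
MAIN_COST = 0.020     # cost of one main-agent call
BLOCKED_SHARE = 0.30  # fraction of queries the guardrail blocks

without_guardrail = MAIN_COST
with_guardrail = expected_cost_per_query(GUARD_COST, MAIN_COST, BLOCKED_SHARE)
print(f"without guardrail: {without_guardrail:.4f}, with guardrail: {with_guardrail:.4f}")
```

Under this simplified model, the guardrail pays for itself once the share of blocked queries exceeds guard_cost / main_cost, which is 5% with the numbers above.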
Create an Input Guardrail Function
Next, let's wrap the agent in an async function decorated with @input_guardrail. The agent's output will include the two fields defined earlier: is_off_topic and reasoning.
The function returns a structured GuardrailFunctionOutput object containing output_info (set from the reasoning field) and tripwire_triggered.
The tripwire_triggered value determines whether the input should be blocked. If is_off_topic is True, the tripwire triggers, blocking the input. Otherwise, the value is False and the main agent continues processing.
# Create the input guardrail function
@input_guardrail
async def off_topic_guardrail(ctx, agent, input) -> GuardrailFunctionOutput:
    """
    Classifies user input to ensure it is on-topic for a weather and air quality app.
    """
    result = await Runner.run(topic_classification_agent, input, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output.reasoning,
        tripwire_triggered=result.final_output.is_off_topic
    )
Create a Rule-based Input Guardrail Function
Alongside the LLM-based off-topic guardrail, we'll create a simple rule-based guardrail. This one doesn't require an LLM and instead relies on programmatic pattern matching.
Depending on your app's purpose, rule-based guardrails can be very effective at blocking harmful inputs, especially when harmful patterns are predictable.
In this example, we define a list of keywords often used in jailbreak or prompt injection attempts. The list includes: "ignore previous instructions", "you are now a", "forget everything above", "developer mode", "override safety", "disregard guidelines".
If the user input contains any of these keywords, the guardrail will trigger automatically. Since no LLM is involved, we can handle the validation directly inside the input guardrail function injection_detection_guardrail:
# Rule-based input guardrail to detect jailbreaking and prompt injection queries
@input_guardrail
async def injection_detection_guardrail(ctx, agent, input) -> GuardrailFunctionOutput:
    """
    Detects potential jailbreaking or prompt injection attempts in user input.
    """
    # Simple keyword-based detection
    injection_patterns = [
        "ignore previous instructions",
        "you are now a",
        "forget everything above",
        "developer mode",
        "override safety",
        "disregard guidelines"
    ]
    # Assumes the input is a plain string, as it is in this app
    if any(keyword in input.lower() for keyword in injection_patterns):
        return GuardrailFunctionOutput(
            output_info="Potential jailbreaking or prompt injection detected.",
            tripwire_triggered=True
        )
    return GuardrailFunctionOutput(
        output_info="No jailbreaking or prompt injection detected.",
        tripwire_triggered=False
    )
This guardrail simply checks the input against the keyword list. If a match is found, tripwire_triggered is set to True. Otherwise, it stays False.
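Because the matching logic is plain Python, it can be pulled out of the guardrail and tested on its own. The sketch below is one possible variation, not the code used in the app: normalize and matched_patterns are hypothetical helpers, and the whitespace normalization is an addition that the guardrail above does not perform.

```python
import re

INJECTION_PATTERNS = [
    "ignore previous instructions",
    "you are now a",
    "forget everything above",
    "developer mode",
    "override safety",
    "disregard guidelines",
]


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def matched_patterns(text: str) -> list[str]:
    """Return every injection pattern that appears in the normalized text."""
    cleaned = normalize(text)
    return [pattern for pattern in INJECTION_PATTERNS if pattern in cleaned]
```

Normalizing first means that extra spaces or line breaks between the words of a phrase no longer let it slip past the check. Keyword lists remain easy to evade with paraphrasing, so this works best as a cheap first filter.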
Define Specialized Agents for Weather and Air Quality
Now let's continue by defining the weather and air quality specialist agents with their function tools. This part is explained in my previous article, so I'll skip the explanation here.
# Define function tools and specialized agents for weather and air quality
@function_tool
def get_current_weather(latitude: float, longitude: float) -> dict:
    """Fetch current weather data for the given latitude and longitude."""
    url = "https://api.open-meteo.com/v1/forecast"
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current": "temperature_2m,relative_humidity_2m,dew_point_2m,apparent_temperature,precipitation,weathercode,windspeed_10m,winddirection_10m",
        "timezone": "auto"
    }
    response = requests.get(url, params=params)
    return response.json()

weather_specialist_agent = Agent(
    name="Weather Specialist Agent",
    instructions="""
    You are a weather specialist agent.
    Your task is to analyze current weather data, including temperature, humidity, wind speed and direction, precipitation, and weather codes.
    For each query, provide:
    1. A clear, concise summary of the current weather conditions in plain language.
    2. Practical, actionable suggestions or precautions for outdoor activities, travel, health, or clothing, tailored to the weather data.
    3. If severe weather is detected (e.g., heavy rain, thunderstorms, extreme heat), clearly highlight recommended safety measures.
    Structure your response in two sections:
    Weather Summary:
    - Summarize the weather conditions in simple terms.
    Suggestions:
    - List relevant advice or precautions based on the weather.
    """,
    tools=[get_current_weather],
    tool_use_behavior="run_llm_again"
)

@function_tool
def get_current_air_quality(latitude: float, longitude: float) -> dict:
    """Fetch current air quality data for the given latitude and longitude."""
    url = "https://air-quality-api.open-meteo.com/v1/air-quality"
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current": "european_aqi,us_aqi,pm10,pm2_5,carbon_monoxide,nitrogen_dioxide,sulphur_dioxide,ozone",
        "timezone": "auto"
    }
    response = requests.get(url, params=params)
    return response.json()

air_quality_specialist_agent = Agent(
    name="Air Quality Specialist Agent",
    instructions="""
    You are an air quality specialist agent.
    Your role is to interpret current air quality data and communicate it clearly to users.
    For each query, provide:
    1. A concise summary of the air quality conditions in plain language, including key pollutants and their levels.
    2. Practical, actionable advice or precautions for outdoor activities, travel, and health, tailored to the air quality data.
    3. If poor or hazardous air quality is detected (e.g., high pollution, allergens), clearly highlight recommended safety measures.
    Structure your response in two sections:
    Air Quality Summary:
    - Summarize the air quality conditions in simple terms.
    Suggestions:
    - List relevant advice or precautions based on the air quality.
    """,
    tools=[get_current_air_quality],
    tool_use_behavior="run_llm_again"
)
Define the Orchestrator Agent with Input Guardrails
As in the previous part, the orchestrator agent here has the same properties as the one we discussed in my previous article. In the agents-as-tools pattern, the orchestrator agent manages the tasks of the specialized agents instead of handing the task off to one agent, as in the handoff pattern.
The only difference is that we add a new property to the agent: input_guardrails. In this property, we pass the list of input guardrail functions we defined earlier: off_topic_guardrail and injection_detection_guardrail.
# Define the main orchestrator agent with guardrails
orchestrator_agent = Agent(
    name="Orchestrator Agent",
    instructions="""
    You are an orchestrator agent.
    Your task is to manage the interaction between the Weather Specialist Agent and the Air Quality Specialist Agent.
    You will receive a query from the user and will decide which agent to invoke based on the content of the query.
    If both weather and air quality information is requested, you will invoke both agents and combine their responses into one clear answer.
    """,
    tools=[
        weather_specialist_agent.as_tool(
            tool_name="get_weather_update",
            tool_description="Get current weather information and suggestion including temperature, humidity, wind speed and direction, precipitation, and weather codes."
        ),
        air_quality_specialist_agent.as_tool(
            tool_name="get_air_quality_update",
            tool_description="Get current air quality information and suggestion including pollutants and their levels."
        )
    ],
    tool_use_behavior="run_llm_again",
    input_guardrails=[injection_detection_guardrail, off_topic_guardrail],
)

# Define the run_agent function
async def run_agent(user_input: str):
    result = await Runner.run(orchestrator_agent, user_input)
    return result.final_output
One thing I noticed while experimenting with guardrails is that when we list the guardrail functions in the agent property, the list is used as the sequence of execution. This means we can configure the evaluation order from the perspective of cost and impact.
In our case, I think the process should be cut immediately if the query violates the prompt injection guardrail, both because of its impact and because this validation requires no LLM. If the query has already been identified as one that can't proceed, we don't need to evaluate it with an LLM (which has a cost) in the off-topic guardrail.
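The cost-ordering idea can be illustrated with a small plain-Python sketch. It models only the principle of running cheap checks first and stopping at the first block; it does not reproduce how the SDK schedules guardrails internally, and cheap_rule_check, costly_llm_check, and first_blocking_check are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

calls = []  # records which checks actually ran


async def cheap_rule_check(text: str) -> bool:
    """Rule-based check: free to run, returns True when the input must be blocked."""
    calls.append("rule")
    return "developer mode" in text.lower()


async def costly_llm_check(text: str) -> bool:
    """Placeholder for an LLM-backed check; the sleep stands in for a paid model call."""
    calls.append("llm")
    await asyncio.sleep(0.01)
    return "recipe" in text.lower()


async def first_blocking_check(
    text: str, checks: Sequence[Callable[[str], Awaitable[bool]]]
) -> Optional[str]:
    """Run checks in list order and stop at the first one that blocks."""
    for check in checks:
        if await check(text):
            return check.__name__
    return None


blocked_by = asyncio.run(
    first_blocking_check("Enable developer mode", [cheap_rule_check, costly_llm_check])
)
print(blocked_by, calls)
```

With the rule-based check listed first, the placeholder for the paid call is skipped whenever the cheap check already blocks the input.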
Create Main Function with Exception Handler
This is the part where the input guardrail takes real action. In the main function that defines the Streamlit user interface, we add exception handling specifically for the case where the input guardrail tripwire has been triggered.
# Define the main function of the Streamlit app
def main():
    st.title("Weather and Air Quality Assistant")
    user_input = st.text_input("Enter your query about weather or air quality:")
    if st.button("Get Update"):
        with st.spinner("Thinking..."):
            if user_input:
                try:
                    agent_response = asyncio.run(run_agent(user_input))
                    st.write(agent_response)
                except InputGuardrailTripwireTriggered as e:
                    st.write("I can only help with weather and air quality related questions. Please try asking something else! ")
                    st.error("Info: {}".format(e.guardrail_result.output.output_info))
                except Exception as e:
                    st.error(e)
            else:
                st.write("Please enter a question about the weather or air quality.")

if __name__ == "__main__":
    main()
As we can see in the code above, when InputGuardrailTripwireTriggered is raised, the app shows a user-friendly message telling the user that it can only help with weather and air quality related questions.
To make the message more helpful, we also add extra information about which input guardrail blocked the user's query. If the exception was raised by off_topic_guardrail, the app shows the reasoning from the agent that handled the classification. If it came from injection_detection_guardrail, the app shows the hard-coded message "Potential jailbreaking or prompt injection detected.".
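If you want a different user-facing message per guardrail, one option is a small lookup helper like the sketch below. It is a hypothetical addition rather than part of the app: FRIENDLY_MESSAGES and build_block_message are made-up names, and the helper takes plain strings so that it makes no assumptions about how the guardrail name is read from the exception.

```python
# Hypothetical mapping from guardrail function names to user-facing messages.
FRIENDLY_MESSAGES = {
    "off_topic_guardrail": "I can only help with weather and air quality questions.",
    "injection_detection_guardrail": "That request can't be processed.",
}
DEFAULT_MESSAGE = "Sorry, I can't help with that request."


def build_block_message(guardrail_name: str, output_info: str) -> str:
    """Combine a friendly message for the guardrail with its diagnostic info."""
    friendly = FRIENDLY_MESSAGES.get(guardrail_name, DEFAULT_MESSAGE)
    return f"{friendly} (Info: {output_info})"
```

In the except block, the returned string could be passed to st.write in place of the single fixed message.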
Run and Check
To test how the input guardrail works, let's start by running the Streamlit app.
streamlit run app08_guardrails.py
First, let's try asking a question that aligns with the app's intended purpose.
As expected, the app returns an answer because the question is related to weather or air quality.
Using Traces, we can see what's happening under the hood.
As discussed earlier, the input guardrails run before the main agent. Since we set the guardrail list in order, the injection_detection_guardrail runs first, followed by the off_topic_guardrail. Once the input passes these two guardrails, the main agent can execute the process.
However, if we change the question to something completely unrelated to weather or air quality, like the history of Jakarta, the response looks like this:
Here, the off_topic_guardrail triggers the tripwire, cuts the process midway, and returns a message along with some extra details about why it happened.
From the Traces dashboard for that history question, we can see the orchestrator agent throws an error because the guardrail tripwire was triggered.
Since the process was cut before the input reached the main agent, we never even called the main agent model, saving a few bucks on a query the app isn't supposed to handle anyway.
Output Guardrail
While the input guardrail ensures that the user's query is safe and relevant, the output guardrail ensures that the agent's response itself meets our desired standards. This is equally important because even with strong input filtering, the agent can still produce outputs that are unintended, harmful, or simply not aligned with our requirements.
For example, in our app we want to make sure that the agent always responds professionally. Since LLMs often mirror the tone of the user's query, they might reply in a casual, sarcastic, or unprofessional tone, which is outside the scope of the input guardrails we already implemented.
To handle this, we add an output guardrail that checks whether a response is professional. If it's not, the guardrail will trigger and prevent the unprofessional response from reaching the user.
Prepare the Output Guardrail Function
Just like the off_topic_guardrail, we create a new professionalism_guardrail. It uses a Pydantic model for the output, a dedicated agent to classify professionalism, and an async function decorated with @output_guardrail to implement the check.
# Define output model for Output Guardrail Agent
class ResponseCheckerOutput(BaseModel):
    is_not_professional: bool = Field(
        description="True if the output is not professional, False otherwise"
    )
    reasoning: str = Field(
        description="Brief explanation of why the output was classified as professional or unprofessional"
    )

# Create Output Guardrail Agent
response_checker_agent = Agent(
    name="Response Checker Agent",
    instructions="""
    You are a response checker agent.
    Your task is to evaluate the professionalism of the output generated by other agents.
    For each response, provide:
    1. A classification of the response as professional or unprofessional.
    2. A brief explanation of the reasoning behind the classification.
    Structure your response in two sections:
    Professionalism Classification:
    - State whether the response is professional or unprofessional.
    Reasoning:
    - Provide a brief explanation of the classification.
    """,
    output_type=ResponseCheckerOutput,
    model="gpt-4o-mini"
)

# Define output guardrail function
@output_guardrail
async def professionalism_guardrail(ctx, agent, output) -> GuardrailFunctionOutput:
    result = await Runner.run(response_checker_agent, output, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output.reasoning,
        tripwire_triggered=result.final_output.is_not_professional
    )
Output Guardrail Implementation
Now we add this new guardrail to the orchestrator agent by listing it under output_guardrails. This ensures every response is checked before being shown to the user.
# Add professionalism guardrail to the orchestrator agent
orchestrator_agent = Agent(
    name="Orchestrator Agent",
    instructions="...same as before...",
    tools=[...],
    input_guardrails=[injection_detection_guardrail, off_topic_guardrail],
    output_guardrails=[professionalism_guardrail],
)
Finally, we extend the main function to handle OutputGuardrailTripwireTriggered exceptions. If triggered, the app will block the unprofessional response and return a friendly fallback message instead.
# Handle output guardrail in the main function (added next to the existing except blocks)
except OutputGuardrailTripwireTriggered as e:
    st.write("The response didn't meet our quality standards. Please try again.")
    st.error("Info: {}".format(e.guardrail_result.output.output_info))
Run and Check
Now, let's take a look at how the output guardrail works. Start by running the app as before:
streamlit run app08_guardrails.py
To test this, we can try to force the agent to respond in an unprofessional way to a question about weather or air quality. For example, by asking: "Answer this question with hyperbole. What is the air quality in Jakarta?"
This query passes the input guardrails because it's still on-topic and not an attempt at prompt injection. As a result, the main agent processes the input and calls the correct function.
However, the final output generated by the main agent, which followed the user's hyperbole request, doesn't align with the brand's communication standard. Here's the result we got from the app:
Conclusion
Throughout this article, we explored how guardrails in the OpenAI Agents SDK help us maintain control over both input and output. The input guardrails we built here protect the app from harmful or unintended user input that could cost us as developers, while the output guardrail ensures responses remain consistent with the brand standard.
By combining these mechanisms, we can significantly reduce the risks of unintended usage, information leaks, or outputs that fail to align with the intended communication style. This is especially important when deploying agentic applications into production environments, where safety, reliability, and trust matter most.
Guardrails aren't a silver bullet, but they're an essential layer of defense. As we continue building more advanced multi-agent systems, adopting guardrails early on will help ensure we create applications that aren't only powerful but also safe, responsible, and cost-conscious.
Previous Articles in This Series
References
[1] OpenAI. (2025). OpenAI Agents SDK documentation. Retrieved August 30, 2025, from https://openai.github.io/openai-agents-python/guardrails/
[2] OpenAI. (2025). How to use guardrails. OpenAI Cookbook. Retrieved August 30, 2025, from https://cookbook.openai.com/examples/how_to_use_guardrails
[3] OpenAI. (2025). A practical guide to building agents. Retrieved August 30, 2025, from https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
You can find the complete source code used in this article in the following repository: agentic-ai-weather | GitHub Repository. Feel free to explore, clone, or fork the project to follow along or build your own version.
If you'd like to see the app in action, I've also deployed it here: Weather Assistant Streamlit