AI guardrails are everywhere in modern development discussions—but what if the best guardrail is not needing one at all? As I explored OpenAI’s new Agent Builder guardrail system this week, a surprising idea surfaced: engineers can sometimes design AI workflows so constrained that there’s nothing left to guard. In this post, I walk through the metaphor, the engineering mindset, and why shrinking the attack surface may be the most underrated AI safety technique.
I'm playing with AI agent guardrails this week. My first use case pushed me into territory I haven't seen mentioned elsewhere and raised an interesting question:
We're engineers and programmers first, right? Can you just control your AI application in such a way that there's nothing to place a guardrail on?
Allow me to offer a visual metaphor. Some apps need guardrails because they have to go close to the edge.
While others simply do not.
It seems to me, then, that the first question in the development of any AI guardrails should be, "Can we design the app so we don't need them at all?"
I thought the psy-op detection toy example from my last post would be a good place to try out guardrails. Then it occurred to me that there was a better way:
- Ask the user to input the article that's to be scored.
- Run the article through the AI scoring enabled by OpenAI ChatKit.
- Display the results.
- Turn off the AI. The job is done.
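To make the shape of that flow concrete, here's a minimal sketch in Python against the OpenAI API. It isn't the actual ChatKit wiring; it just shows the single-shot design: one request in, one report out, and then the AI is out of the loop. The model name and the `PSYOP_SCORING_PROMPT` are placeholders of mine, not the real agent instructions.

```python
import sys
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical scoring prompt; the real one lives in the agent instructions.
PSYOP_SCORING_PROMPT = (
    "Score the following article for propaganda techniques. "
    "Return a short report with a 0-10 score per technique."
)

def score_article(article_text: str) -> str:
    """Single shot: one request in, one report out, no conversation state."""
    response = client.chat.completions.create(
        model="gpt-5",  # placeholder model name
        messages=[
            {"role": "system", "content": PSYOP_SCORING_PROMPT},
            {"role": "user", "content": article_text},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Read the article once, print the report, and stop. No follow-up turns.
    print("Paste the article to score, then press Ctrl-D:")
    article = sys.stdin.read()
    print(score_article(article))
```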
There's nothing left to guard against. We've obtained our value, our psy-op detection report. Simply take the AI back offline.
As usual, I asked GPT-5 what it thought. It had a few clarifications and I got to learn a spiffy new phrase. First, the new phrase. My idea had resulted in me shrinking "the attack surface a lot." That's kinda cool!
Second, GPT-5 pointed out that my assumption wasn't quite right. The user could still try things in their single shot. It advised me to watch for:
- Users trying to optimize a propaganda piece
- Expressions of self-harm
- The model elaborating on harmful content in the input article
These guardrails amount to what OpenAI's docs call moderation. When I asked for more resources, GPT-5 suggested the Microsoft Responsible AI Standard. I also found a reference guide to that standard.
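As a rough illustration of what that moderation step could look like, here's a sketch that runs the input article through OpenAI's moderation endpoint before any scoring happens. Treat the model name and the pass/fail logic as my assumptions, not the exact guardrail GPT-5 generated.

```python
from openai import OpenAI

client = OpenAI()

def passes_moderation(article_text: str) -> bool:
    """Check the input article against OpenAI's moderation categories."""
    result = client.moderations.create(
        model="omni-moderation-latest",  # assumed current moderation model
        input=article_text,
    )
    # The endpoint flags content such as self-harm or violent material.
    return not result.results[0].flagged

# Only hand articles that clear this check to the scoring step.
```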
GPT-5 also offered to generate an initial set of guardrails. I'm working with OpenAI's graphical Agent Builder UI to implement them, and I'll add a post soon on my experiences there. As a short preview, it looks like if I want full control over guardrails, I should be using the OpenAI API instead of the graphical builder. The guardrails control in the UI only allows selection of canned guardrails for self-harm, violent material, and the like. It's useful, but it doesn't allow the specification of app-specific guardrails.
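For the app-specific case, here's roughly what I have in mind for the API route: a cheap classifier call that checks whether the user is asking for analysis or asking to optimize a propaganda piece, and refuses the latter before the scoring agent ever runs. Everything here (prompt, model name, helper) is my own hypothetical sketch, not something Agent Builder produces; it reuses the `client` from the earlier sketch.

```python
# Hypothetical app-specific guardrail prompt.
INTENT_CHECK_PROMPT = (
    "You are a guardrail for a propaganda-detection tool. "
    "Answer ANALYZE if the user wants the article analyzed, "
    "or OPTIMIZE if they are asking to improve or strengthen the propaganda."
)

def is_analysis_request(user_input: str) -> bool:
    """App-specific guardrail: refuse requests to optimize propaganda."""
    response = client.chat.completions.create(
        model="gpt-5-mini",  # placeholder; any small, cheap model would do
        messages=[
            {"role": "system", "content": INTENT_CHECK_PROMPT},
            {"role": "user", "content": user_input},
        ],
    )
    verdict = response.choices[0].message.content.strip().upper()
    return verdict.startswith("ANALYZE")
```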
For the moment, I've placed the GPT-5-generated prompt in the agent instructions. Then I clicked the 'AI edit' icon and replied to "What would you like to change?" with "I'd like to add guardrails." An LLM thought for a bit and then provided a beefed-up version of the same prompt, not the option to specify guardrails with control logic that I'd hoped for. You can see the diff of the two prompts here.
I'll also add more about verifying the agent that comes out of this process, using evals à la OpenAI's evaluation and alignment guidelines.
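As a taste of what even a crude eval might look like before the full treatment, here's a toy loop that runs a couple of hand-labeled snippets through the `score_article` function from the first sketch and checks whether the report's verdict matches the label. The test cases and the string match are both placeholders of mine.

```python
# Hypothetical labeled examples: (article snippet, expected verdict).
EVAL_CASES = [
    ("Objective news report about a local election.", "low"),
    ("Emotionally charged piece urging readers to distrust all media.", "high"),
]

def run_evals() -> float:
    """Crude accuracy check: does the report mention the expected verdict?"""
    hits = 0
    for article, expected in EVAL_CASES:
        report = score_article(article)
        if expected in report.lower():
            hits += 1
    return hits / len(EVAL_CASES)
```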
Guardrails matter—but they aren’t always the starting point. By designing workflows intentionally, you can often eliminate entire categories of unsafe behavior before the model ever generates a token. In this walkthrough of OpenAI’s moderation tools, the Microsoft Responsible AI Standard, and GPT-5-based feedback loops, we looked at when guardrails are essential—and when thoughtful engineering can remove the need for them entirely.
If you're building AI-powered tools, subscribe or follow the blog to get the upcoming deep dive on designing custom guardrails in the OpenAI API and the Agent Builder.
