Are the Best AI Guardrails Sometimes Just Controlling the Conversation?

 AI guardrails are everywhere in modern development discussions—but what if the best guardrail is not needing one at all? As I explored OpenAI’s new Agent Builder guardrail system this week, a surprising idea surfaced: engineers can sometimes design AI workflows so constrained that there’s nothing left to guard. In this post, I walk through the metaphor, the engineering mindset, and why shrinking the attack surface may be the most underrated AI safety technique.

I'm playing with AI agent guardrails this week. My first use case pushed me into territory I haven't seen mentioned elsewhere, and to an interesting question:

We're engineers and programmers first, right? Can you just control your AI application in such a way that there's nothing to place a guardrail on?

Allow me to make a visual metaphor. Some apps need guardrails because they have to go close to the edge.

A high vantage point overlooking a vast mountain valley in Rocky Mountain–style terrain. Jagged foreground rocks drop steeply toward a winding road far below, emphasizing height, exposure, and being close to a literal edge. Layers of blue and green mountain ridges fade into distant haze, creating a sense of scale and risk. Used in the article as a metaphor for AI applications that operate near dangerous boundaries and therefore require strong guardrails.

While others simply do not.

A quiet two-lane road stretching straight ahead through a forest at sunset, with a vivid orange and pink sky glowing above dark pines. The road is smooth, open, and free of obstacles, symbolizing a safe, well-bounded path. This image is used in the article to contrast with the previous photo, illustrating AI applications designed so safely that they may not require guardrails at all.

It seems to me, then, that the first question in every development of AI guardrails should be, "Can we design the app so we don't need them at all?"

I thought the toy example from my last post, AI detection of psy-ops, would be a good place to try out guardrails. Then it occurred to me that there was a better way.

  1. Ask the user to input the article that's to be scored.
  2. Run the article through the AI scoring enabled by OpenAI ChatKit.
  3. Display the results.
  4. Turn off the AI. The job is done.

There's nothing left to guard against. We've obtained our value, our psy-op detection report. Simply take the AI back offline.
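The four steps above can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions: `run_single_shot`, `Report`, and the `score_article` callable are hypothetical names standing in for whatever actually calls the model, and the length cap is an arbitrary number I picked. The point is the shape: one input, one model call, one result, and no conversation loop left open afterward.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed upper bound on input size; the right number depends on the app.
MAX_ARTICLE_CHARS = 20_000


@dataclass(frozen=True)
class Report:
    """Hypothetical result type for the psy-op scoring step."""
    score: float
    summary: str


def run_single_shot(article: str, score_article: Callable[[str], Report]) -> Report:
    """Take one article, make exactly one scoring call, and return the report.

    `score_article` is a stand-in for the real model call (an assumption,
    not an actual OpenAI function). There is no chat history and no
    follow-up turn, so the user gets a single shot at the model.
    """
    text = article.strip()
    if not text:
        raise ValueError("article is empty")
    if len(text) > MAX_ARTICLE_CHARS:
        raise ValueError("article is too long")
    # Step 2: the only model call in the whole workflow.
    report = score_article(text)
    # Steps 3 and 4: hand back the result; nothing keeps the AI "on".
    return report
```

Because the scorer is passed in, the workflow can be exercised with a stub before any real model is wired up.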

As usual, I asked GPT-5 what it thought. It had a few clarifications and I got to learn a spiffy new phrase. First, the new phrase. My idea had resulted in me shrinking "the attack surface a lot." That's kinda cool! 

Second, GPT-5 pointed out that I wasn't correct in my assumptions. The user could still try things in their single shot. It advised that I watch for:

  • Users trying to optimize a propaganda piece
  • Expressions of self-harm
  • The model elaborating on harmful content in the input article

The guardrails amount to what OpenAI's docs call moderation. When I asked for more resources, GPT-5 suggested the Microsoft Responsible AI Standard. I also found a reference guide to that standard.
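To make the moderation idea concrete, here is a hedged sketch of checking both sides of the single model call: the article going in (self-harm or other flagged input) and the report coming out (the model elaborating on harmful content). The names `guarded_score`, `score_article`, and `is_flagged` are hypothetical; in a real app `is_flagged` might wrap a moderation service, but that wiring is an assumption and isn't shown here.

```python
from typing import Callable

# Assumed refusal text; the wording is a placeholder.
REFUSAL = "This article can't be scored."


def guarded_score(
    article: str,
    score_article: Callable[[str], str],
    is_flagged: Callable[[str], bool],
) -> str:
    """Run a moderation check before and after the one scoring call.

    Both callables are injected stand-ins (assumptions): `score_article`
    for the model call, `is_flagged` for whatever moderation check is used.
    """
    # Input check: don't send flagged content to the model at all.
    if is_flagged(article):
        return REFUSAL
    report = score_article(article)
    # Output check: catch the model elaborating on harmful input.
    if is_flagged(report):
        return REFUSAL
    return report
```

Note that the scorer is never called when the input check trips, which keeps the flagged text away from the model entirely.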

GPT-5 also offered to generate an initial set of guardrails. I'm working with OpenAI's graphical UI agent builder to implement them. I'll add a post soon on my experiences there. As a short preview, it looks like if I want full control over guardrails, I should be using the OpenAI API instead of the graphical builder. The guardrails control in the UI only allows selection of canned guardrails for self-harm, violent material, and the like. It's useful, but doesn't allow the specification of app-specific guardrails.
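By app-specific guardrails I mean something like the sketch below: plain code that runs on the input before any model call and reports which rules tripped. Everything here is hypothetical, including the `Guardrail` type, the rule names, and the deliberately crude regex for spotting "make this more persuasive" requests embedded in an article. It isn't how the Agent Builder or the OpenAI API defines guardrails; it only shows the kind of control logic that canned categories don't cover.

```python
import re
from typing import Callable, NamedTuple


class Guardrail(NamedTuple):
    """Hypothetical app-specific rule: a name plus a predicate on the input."""
    name: str
    tripped: Callable[[str], bool]


# Crude, assumed heuristic for requests to optimize a propaganda piece.
_OPTIMIZE = re.compile(
    r"\b(rewrite|improve|optimi[sz]e)\b.{0,80}\b(persuasive|convincing|viral)\b",
    re.IGNORECASE | re.DOTALL,
)

GUARDRAILS = [
    Guardrail("optimize_request", lambda text: bool(_OPTIMIZE.search(text))),
    Guardrail("too_long", lambda text: len(text) > 20_000),  # assumed cap
]


def tripped_guardrails(text: str) -> list[str]:
    """Return the names of every guardrail the input trips, in rule order."""
    return [g.name for g in GUARDRAILS if g.tripped(text)]
```

The app can then decide what to do with the result: refuse outright, log it, or score the article anyway while ignoring the embedded request.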

Screenshot of the OpenAI Agent Builder guardrails interface showing the limits of the built-in moderation system. The left panel, titled ‘Moderation guardrail’, displays a scrollable list of preset safety categories—such as harassment, harassment/threatening, self-harm, self-harm intent, self-harm instructions, violence, and graphic violence—with only a few boxes checked. This highlights how the UI focuses on standardized moderation filters rather than custom, application-specific guardrails. On the right side, the ‘Guardrails’ configuration box presents toggle switches for Personally Identifiable Information, Moderation, Jailbreak detection, and Hallucination control, with only Moderation enabled. The screenshot illustrates the process of implementing guardrails inside the graphical OpenAI Agent Builder and demonstrates why developers who need richer control logic, custom constraints, or complex guardrail workflows might prefer the OpenAI API instead. The image relates to broader discussions about responsible AI, automated safety systems, Microsoft’s Responsible AI Standard, and GPT-5-assisted creation of guardrail prompts.

For the moment, I've placed the GPT-5-generated prompt in the agent instructions. Then, I clicked on the 'AI edit' icon and replied to "What would you like to change?" with "I'd like to add guardrails." An LLM thought for a bit, and then provided a beefed-up version of the same prompt, not the option to specify guardrails with control logic I'd hoped for. You can see the diff of the two prompts here.

I'll also add more about verifying the agent that comes out of this process, using evals à la OpenAI’s evaluation and alignment guidelines.

Guardrails matter—but they aren’t always the starting point. By designing workflows intentionally, you can often eliminate entire categories of unsafe behavior before the model ever generates a token. In this walkthrough of OpenAI’s moderation tools, the Microsoft Responsible AI Standard, and GPT-5-based feedback loops, we looked at when guardrails are essential—and when thoughtful engineering can remove the need for them entirely.

If you're building AI-powered tools, subscribe or follow the blog to get the upcoming deep dive on designing custom guardrails in the OpenAI API and the Agent Builder.


