I deployed the LLM Psy-ops detection app earlier today! For those of you just hopping onboard, the WhyFiles ran an episode highlighting a simple, logical scoring method publicized by NCI for determining whether a piece of media or a news article is emotionally manipulative (think propaganda).
I was looking for a good app to practice deployment, guardrails, and evals, and this one, suggested by @somethingLethal on Reddit, seemed promising in all those regards. If you'd like to try it, you can find the app at https://projecttoucans.com/gladych_files_psy_ops .
LLMs, Simple Math, and Pricing
The Psy-op scoring instrument requires the model to sum the scores for the twenty categories. gpt-4o-mini did not sum any of the scores correctly. It got close, but that was about it. I experimented with the Python code interpreter to cure the simple math issue.
The code interpreter seemed reasonable at first. I mean, three cents per compute minute, not bad, right? Instead, though, it turned out to cost about three cents per use of the page. That didn't even come close to being affordable. When I queried GPT-5 about this further, it agreed: each page load was causing a server startup, which cost about three cents.
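To put that in perspective, here is a quick back-of-the-envelope calculation. The three cents per page use is the figure I saw above; the visit count is a made-up number purely for illustration.

```python
CENTS_PER_PAGE_USE = 3  # roughly what each use of the page cost with the code interpreter


def interpreter_cost_dollars(page_uses: int) -> float:
    """Estimated code interpreter spend, in dollars, for a number of page uses."""
    return page_uses * CENTS_PER_PAGE_USE / 100


# Hypothetical traffic figure, not a measurement from the app.
print(interpreter_cost_dollars(1000))  # 30.0
```

That is a lot of money just to add twenty small numbers.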
The real fix wound up being to switch the model to gpt-5-nano. It costs less than gpt-4o-mini across the board and does better math. (All the sums since the removal of the code interpreter and the model switch have been correct.)
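Another way to take the arithmetic off the model's plate entirely would be to have it return only the per-category scores and do the sum in ordinary code. This is not what the app does today; it's just a minimal sketch. The JSON shape, the `total_score` function, and the `category_N` names are all hypothetical placeholders, not the instrument's real category names.

```python
import json

EXPECTED_CATEGORIES = 20  # the scoring instrument has twenty categories


def total_score(model_reply: str) -> int:
    """Sum per-category scores from a JSON reply shaped like {"scores": {...}}."""
    scores = json.loads(model_reply)["scores"]
    if len(scores) != EXPECTED_CATEGORIES:
        raise ValueError(
            f"expected {EXPECTED_CATEGORIES} categories, got {len(scores)}"
        )
    # The sum happens here, in plain Python, so the model never has to add.
    return sum(int(value) for value in scores.values())


# Placeholder reply: twenty made-up categories, each scored 3.
reply = json.dumps({"scores": {f"category_{i}": 3 for i in range(1, 21)}})
print(total_score(reply))  # 60
```

The trade-off is that the model has to be coaxed into returning structured output, which is its own little project.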
Useful Features and ChatKit Privacy
As long as you're using the same browser, ChatKit will wake back up with a list of the news articles you've already analyzed. That's pretty cool, and development-time-wise it was almost free.
Guardrails and Evals
I'm hoping to use the app to get a better feel for deploying and testing real-world guardrails on an AI-enabled ChatKit (à la AgentKit) application. Hopefully by the end of the week I'll be running evals on this project. It looks like what I'll be doing is wiring the prompt into a Python CLI or a JavaScript server to run regressions on it. I'll keep you posted as the week proceeds.
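I haven't built the evals yet, but a regression check for the summing problem might look something like this rough sketch. Everything here is an assumption: `analyze` stands in for whatever function ends up calling the model with the prompt, the result shape is made up, and `fake_analyze` is a stub so the example runs without any API calls.

```python
from typing import Callable, Dict, List


def run_regressions(articles: List[str], analyze: Callable[[str], Dict]) -> List[str]:
    """Return one failure message per article whose reported total is wrong."""
    failures = []
    for text in articles:
        # Assumed result shape: {"scores": {name: int, ...}, "total": int}
        result = analyze(text)
        expected = sum(result["scores"].values())
        if result["total"] != expected:
            failures.append(
                f"{text[:40]!r}: total {result['total']} != {expected}"
            )
    return failures


def fake_analyze(text: str) -> Dict:
    """Stub standing in for the real model call; always scores 2 per category."""
    scores = {f"category_{i}": 2 for i in range(1, 21)}
    return {"scores": scores, "total": 40}


print(run_regressions(["Sample article text"], fake_analyze))  # []
```

Swapping `fake_analyze` for a function that actually calls the model would turn this into a check I could rerun every time the prompt or the model changes.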