Showing posts with the label AI Evals

Deploying a ChatKit Demo for PsyOps Detection

I deployed the LLM Psy-ops detection app earlier today! For those of you just hopping onboard, the WhyFiles ran an episode highlighting a simple, logical scoring method publicized by NCI for determining whether a piece of media or a news article is emotionally manipulative (think propaganda) or not. I was looking for a good app to practice deployment, guardrails, and evals, and this one, suggested by @somethingLethal on reddit, seemed promising in all those regards. If you'd like to try it, you can find the app at https://projecttoucans.com/gladych_files_psy_ops .

LLMs, Simple Math, and Pricing

The Psy-op scoring instrument requires that the model sum the scores for the twenty categories. gpt-4o-mini did not sum any of the scores correctly. It got close, but that was about it. I experimented with the Python code interpreter to cure the simple math issue. The code interpreter seemed reasonable at first. I mean, three cents per compute minute, not bad, right? Ins...
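One way to sidestep the model's arithmetic is to have it return only the per-category scores and do the addition in ordinary application code. The sketch below is a minimal illustration of that idea, not the app's actual implementation. The function name `total_score`, the category keys, and the 1 to 5 score range are all assumptions made for the example; only the count of twenty categories comes from the post.

```python
from typing import Mapping

# From the post: the scoring instrument has twenty categories.
EXPECTED_CATEGORIES = 20


def total_score(scores: Mapping[str, int], low: int = 1, high: int = 5) -> int:
    """Sum per-category scores in plain Python instead of asking the model to add.

    `low` and `high` are an assumed score range, not taken from the post.
    """
    if len(scores) != EXPECTED_CATEGORIES:
        raise ValueError(
            f"expected {EXPECTED_CATEGORIES} categories, got {len(scores)}"
        )
    for name, value in scores.items():
        # Reject non-integers (bool is a subclass of int, so exclude it explicitly).
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"score for {name!r} must be an int, got {value!r}")
        if not low <= value <= high:
            raise ValueError(f"score for {name!r} is outside {low}..{high}: {value}")
    return sum(scores.values())


if __name__ == "__main__":
    # Hypothetical model output: twenty categories, each scored 3.
    example = {f"category_{i}": 3 for i in range(1, EXPECTED_CATEGORIES + 1)}
    print(total_score(example))
```

Because the sum happens outside the model, it is deterministic and adds no per-call tool cost. The validation step also catches a response that is missing a category before a wrong total reaches the user.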