With all the work that's been put into making agents "correct" by construction, I gotta say, sometimes I need an LLM agent to take a chance at just being wrong. I'm working on a book project called The Gladych Files . While the book is narrative nonfiction about the history of general relativity research, it explores the liminal space inhabited by very rich fringe scientist speculators of the 1950s who funded mainstream general relativity advances, (more or less on accident.) In those spaces, you'll find Tesla, the architect of the FBI building, Timothy Leary's LSD explorations and many, many other things, institutions, and people. I've accumulated hundreds of pages of historical documents from various archives, and I'm using orchestrated agentic AI, (in the form of Gastown), to review those documents. So far, the analysis has gone well, but last week I saw something that made me look up. I'd accidentally input the same archive page twice, so i...
I started my lab book entries when I was a physics graduate student. It's kind of amusing and kind of cool how far I've come. I have the equivalent of a grad student, (aka Claude Opus 4.7), working for me now. I spent some time over the weekend setting up an OCR framework for a book research project of mine. I've been coming up to speed on evals, so I decided to run one to determine which model was the most accurate and cost effective for doing OCR on travel manifest pages. I stepped the eval along rather than automating it and talked the results through with Opus as I went. First, it turns out that Opus at low effort is the most accurate and the most cost effective choice! That was a surprise. The result has to do with Opus' ability to look at higher res images which means it needs to think less for OCR vs. Sonnet. Second, at the end of the eval, as I was preparing to write up my results it occurred to me that I could ask my grad student to do it instead. Here's...