

Showing posts with the label OCR

LLM Lab Book 2026-06-30: Using LLMs with Datasette-agent and Database Prep vs Token Usage on Claude

I'm condensing here the steps for going from a travel manifest page to human-readable findings to an SQLite database.

History Research Contextual Recap

I'm working on a history of physics research project, The Gladych Files, which explores how industrialists interested in fringe physics wound up actually funding mainstream general relativity research. As part of that research, I've been looking at the travel manifests of various industrialists and research scientists from the 1930s to the 1950s. Because there are literally thousands of passengers on their combined voyages, I'm using LLM agents orchestrated through Gas Town to coordinate the research.

At present, I am working on a bit of a mystery. Multiple sources state that Hedy Lamarr came to the United States aboard the S.S. Normandy, arriving on September 30th, 1937. That's the same ship that Tom Slick (one of the industrialists I mentioned above) took across the Atlantic. There's only one problem. Hedy is...

Gladych Files Lab Book: Document OCR vs LLM Model vs Cost, or Claude Opus is Cheaper than Sonnet for OCR!

I started keeping lab book entries when I was a physics graduate student. It's kind of amusing and kind of cool how far I've come: I now have the equivalent of a grad student (aka Claude Opus 4.7) working for me. I spent some time over the weekend setting up an OCR framework for a book research project of mine. I've been coming up to speed on evals, so I decided to run one to determine which model was the most accurate and cost-effective for doing OCR on travel manifest pages. I stepped through the eval by hand rather than automating it, and talked the results through with Opus as I went. First, it turns out that Opus at low effort is both the most accurate and the most cost-effective choice! That was a surprise. The result comes down to Opus's ability to read higher-resolution images, which means it needs to think less than Sonnet does to do OCR. Second, at the end of the eval, as I was preparing to write up my results, it occurred to me that I could ask my grad student to do it instead. Here's...
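Why a pricier model can come out cheaper is easier to see with a bit of arithmetic: per-page cost is input tokens (prompt plus image) times the input price, plus output tokens times the output price, assuming thinking tokens are billed at the output rate. The following is a minimal sketch under that assumption. The model names, token counts, and prices are all hypothetical placeholders invented for illustration, not the eval's actual numbers or any published price list. The scenario gives the larger model a higher-resolution image (more input tokens) but much less thinking (fewer output tokens).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRun:
    """Token usage for OCR of one manifest page by one model.

    All values used below are HYPOTHETICAL placeholders, not measured results.
    """
    model: str
    input_tokens: int       # prompt text plus image tokens (higher-res image -> more tokens)
    output_tokens: int      # assumed: thinking tokens plus the transcription, billed as output
    price_in_per_m: float   # hypothetical price per million input tokens
    price_out_per_m: float  # hypothetical price per million output tokens

    def cost(self) -> float:
        """Per-page cost: each token count times its per-million price."""
        return (self.input_tokens * self.price_in_per_m
                + self.output_tokens * self.price_out_per_m) / 1_000_000


def cheapest(runs: list[PageRun]) -> PageRun:
    """Return the run with the lowest per-page cost."""
    return min(runs, key=lambda r: r.cost())


# Hypothetical scenario: the higher-priced model reads a higher-res image
# (more input tokens) but thinks far less (fewer output tokens).
big = PageRun("big_model_low_effort", input_tokens=4_000, output_tokens=1_500,
              price_in_per_m=5.0, price_out_per_m=25.0)
small = PageRun("small_model", input_tokens=2_000, output_tokens=6_000,
                price_in_per_m=3.0, price_out_per_m=15.0)

for run in (big, small):
    print(f"{run.model}: {run.cost():.4f} per page")
print("cheapest:", cheapest([big, small]).model)
```

With these made-up numbers, the larger model's higher per-token prices are outweighed by its much smaller output-token count, since output tokens are the expensive side of the bill. Swapping in real token counts from an eval's usage logs, and current prices, is what would actually settle the comparison.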