Recap
Models Used
New Findings
Fisher's Test from Gemini
- The Math: It builds a 2x2 grid comparing Positives (3 vs. 4) and Negatives (13 vs. 1).
- The Result: The p-value is 0.0251.
- The Meaning: If the underlying process didn't actually change, there is about a 2.5% chance of seeing a surge at least this extreme.
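Gemini's figure can be reproduced with the standard library alone. This sketch assumes the grid is [[3, 13], [4, 1]]: one group with 3 positives and 13 negatives, the other with 4 positives and 1 negative. The source does not say which sessions fall into each group, but this layout yields the quoted value. The two-sided p-value sums the hypergeometric probabilities of every table that has the same row and column totals and is no more likely than the observed one.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Fraction:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins whose probability is no larger than the observed table's.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> Fraction:
        # P(top-left cell == x) with every margin held fixed
        return Fraction(comb(row1, x) * comb(row2, col1 - x), denom)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)  # feasible top-left values
    return sum((prob(x) for x in range(lo, hi + 1) if prob(x) <= observed),
               Fraction(0))

p = fisher_exact_two_sided(3, 13, 4, 1)
print(f"{float(p):.4f}")  # 0.0251
```

Exact fractions avoid floating-point ties when comparing table probabilities. With these margins, the two-sided and lower-tail p-values coincide at 2920/116280, because the only table rarer than the observed one lies in the same tail.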
Polecat Stigmergy and Tesla: Lab Book 2026_06_09 (Final)
June 09, 2026 — final entry in the Tesla Trend series; closes the open question from Pt. 2
This is the third and final lab book on the Tesla Trend in bg_trav/evals/test2.
The first entry made two factual errors (corrected in Pt. 2). This entry closes the
open question from Pt. 2 — but in doing so, reveals that Pt. 2's framing of its own
open question was subtly off. Both framings are presented here alongside the data
so the record is complete.
The Open Question from Pt. 2 — Two Readings
Pt. 2 ended with this:
There are two ways to read that question, and the forensic data gives a different answer to each.
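Reading 1: Were peer reads broad scans or narrow picks?
Answer: Narrow picks. Sessions listed the directory with `ls findings/` to see the full inventory (21 of 25 sessions did so at least once), then opened at most 3 files (median 1 across all 25 sessions, maximum 3). No session swept the whole directory. In this sense, all peer reads were targeted.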
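The per-session counts can be tallied directly from the Complete Peer-Read Record table. The sketch below hand-transcribes its "Peer findings/*.md read" column into a dictionary; test16 counts as one file because both its canonical run and its re-run opened exactly one peer file.

```python
from statistics import median

# Peer findings/*.md files opened per session, transcribed from the
# Complete Peer-Read Record table (test1 .. test25, in order).
peer_reads = {
    "test1": 0, "test2": 0, "test3": 0, "test4": 0, "test5": 0,
    "test6": 0, "test7": 0, "test8": 0, "test9": 1, "test10": 0,
    "test11": 1, "test12": 0, "test13": 1, "test14": 0, "test15": 1,
    "test16": 1, "test17": 1, "test18": 2, "test19": 3, "test20": 1,
    "test21": 2, "test22": 1, "test23": 1, "test24": 2, "test25": 1,
}

counts = list(peer_reads.values())
print("sessions:", len(counts))                          # 25
print("median:", median(counts))                         # 1
print("max:", max(counts))                               # 3
print("sessions with no peer reads:", counts.count(0))   # 11
```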
Reading 2: Did polecats read only test11.md and test16.md?
Answer: They read other files too. The polecats also opened test10.md, test12.md, test13.md, and test20.md. They were not homing in specifically on the Tesla-bearing files; they were picking recent outputs, and those recent outputs happened to carry Tesla. The one batch-4 polecat that read only a non-Tesla file (chrome@test17 → test13.md, 0 Tesla hits) produced 0 Tesla mentions. That's the control case.
The Full Peer-Read Table
Complete per-session forensics for all batches that had peer reads. Tesla-bearing files are test11.md (7), test16.md (6), and test20.md (6).
| Test | Polecat | Peer files read | Tesla out |
|---|---|---|---|
| test11 ★ | rust | test10.md | 7 |
| test12 | chrome | (none) | 2 |
| test13 | nitro | test10.md | 0 |
| test14 | guzzle | (none) | 0 |
| test15 | shiny | test10.md | 0 |
| test16 ★ | rust | test20.md ★ | 6 |
| test17 | chrome | test13.md | 0 |
| test18 | nitro | test11.md ★, test13.md | 5 |
| test19 | guzzle | test11.md ★, test12.md, test13.md | 3 |
| test20 ★ | shiny | test11.md ★ | 6 |
| test21 | rust | test16.md ★, test20.md ★ | 6 |
| test22 | chrome | test20.md ★ | 7 |
| test23 | nitro | test20.md ★ | 6 |
| test24 | guzzle | test16.md ★, test20.md ★ | 6 |
| test25 | shiny | test16.md ★ | 4 |
★ = Tesla-bearing file (test11: 7, test16: 6, test20: 6).
Complete Peer-Read Record: All 25 Sessions
Every session, every peer file opened. Batches 1–2 had almost no peer reads; they are included here for completeness.
| Batch | Test | Polecat | Peer findings/*.md read | Tesla out |
|---|---|---|---|---|
| 1 | test1 | rust | (none) | 0 |
| 1 | test2 | chrome | (none) | 0 |
| 1 | test3 | nitro | (none) | 0 |
| 1 | test4 | guzzle | (none) | 0 |
| 1 | test5 | shiny | (none) | 0 |
| 2 | test6 | rust | (none) | 0 |
| 2 | test7 | chrome | (none) | 0 |
| 2 | test8 | nitro | (none) | 0 |
| 2 | test9 | guzzle | test1.md | 0 |
| 2 | test10 | shiny | (none) | 0 |
| 3 | test11 ★ | rust | test10.md | 7 |
| 3 | test12 | chrome | (none) | 2 |
| 3 | test13 | nitro | test10.md | 0 |
| 3 | test14 | guzzle | (none) | 0 |
| 3 | test15 | shiny | test10.md | 0 |
| 4 | test16 ★ | rust | test20.md ★ | 6 |
| 4 | test17 | chrome | test13.md | 0 |
| 4 | test18 | nitro | test11.md ★, test13.md | 5 |
| 4 | test19 | guzzle | test11.md ★, test12.md †, test13.md | 3 |
| 4 | test20 ★ | shiny | test11.md ★ | 6 |
| 5 | test21 | rust | test16.md ★, test20.md ★ | 6 |
| 5 | test22 | chrome | test20.md ★ | 7 |
| 5 | test23 | nitro | test20.md ★ | 6 |
| 5 | test24 | guzzle | test16.md ★, test20.md ★ | 6 |
| 5 | test25 | shiny | test16.md ★ | 4 |
★ = primary Tesla-bearing file (test11: 7, test16: 6, test20: 6), confirmed propagation vectors. † = minor Tesla file (test12: 2 mentions, chrome, independent; not a primary propagation vector).
What the Table Actually Shows
Reading down the "Peer files read" column, the polecats were not selectively targeting the Tesla-bearing files. Several observations:
- test10.md was a popular pick in batch 3, opened by rust (test11), nitro (test13), and shiny (test15). test10.md has 0 Tesla mentions. Of those three, only rust produced Tesla, because rust also read the forbidden manuscript and dispatched sub-agents. Nitro and shiny read the same test10.md and produced nothing. So picking a peer file is not sufficient; the peer file needs to carry Tesla.
- chrome@test17 is the cleanest control case. Chrome opened test13.md (0 Tesla hits) and produced 0 Tesla hits. This is the direct evidence that the peer-read mechanism drives the outcome: the polecat that picked a dry file stayed dry.
- guzzle@test19 opened three files: test11.md (Tesla), test12.md (Tesla), and test13.md (0 Tesla). With two Tesla-bearing sources in its context, guzzle still only produced 3 mentions — the lowest Tesla count among the batch-4 polecats that read test11. This suggests the Tesla material doesn't simply stack linearly with the number of Tesla-bearing peers read.
- rust@test16 read test20.md, not test11.md. test20 was written by shiny earlier in the same batch. This means within-batch inheritance was also in play — the findings directory doesn't respect batch boundaries once files are committed to main.
- In batch 5, test11.md was never read directly. All five polecats sourced Tesla from test16 or test20 — both of which had inherited it from test11 one or two steps back. By batch 5, the original source had been superseded by more recent carriers. The stigmergy signal had moved downstream.
The Mechanism, More Precisely
The polecats apply a "most recent" heuristic: after listing the directory, they open files from the high end of the filename sequence. They are sampling the freshest outputs, not hunting for a specific finding. The Tesla inheritance happened because the freshest outputs in each batch happened to contain Tesla — not because the polecats recognized the Tesla content as valuable and sought it out.
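The heuristic can be modeled in a few lines. This is a hypothetical sketch, not the polecats' actual logic: `run_number` and `pick_recent` are illustrative names, and "recent" is assumed to mean the highest N in testN.md. Under that assumption it picks test10.md for a batch-3 view of main and test20.md for a batch-5 view, which the tables show as the most-read peer files in those batches.

```python
import re

def run_number(name: str) -> int:
    """Parse N from 'testN.md'; any other name ranks below every real run."""
    match = re.fullmatch(r"test(\d+)\.md", name)
    return int(match.group(1)) if match else -1

def pick_recent(filenames: list[str], k: int = 1) -> list[str]:
    """Hypothetical 'most recent' heuristic: the k files with the highest
    run number, newest first."""
    return sorted(filenames, key=run_number, reverse=True)[:k]

# A plain string sort puts test10.md right after test1.md, so the sketch
# ranks by the parsed run number rather than by listing order.
batch3_view = sorted(f"test{i}.md" for i in range(1, 11))  # test1..test10 on main
batch5_view = sorted(f"test{i}.md" for i in range(1, 21))  # test1..test20 on main
print(pick_recent(batch3_view, 1))  # ['test10.md']
print(pick_recent(batch5_view, 1))  # ['test20.md']
```

Batch 4 opened test11.md through test13.md rather than test14.md or test15.md, so the real selection is looser than this strict top-k rule.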
Why This Matters for the Stigmergy Model
Stigmergy, classically, is indirect coordination through environmental traces. The ants don't seek out the pheromone trail because they know it leads to food — they follow it because that's the heuristic, and the pheromone happens to be at the food. The polecats here are the same: they follow "read recent outputs" as the heuristic, and the recent outputs happen to carry Tesla. The Tesla signal is a contaminant riding on the recency trail, not a signal the polecats are chasing directly.
This makes the contamination harder to detect and easier to amplify. If the polecats were selectively reading Tesla-bearing files, you could fix it by removing those files or flagging them. But since they're reading whatever is recent, the fix has to be upstream: either prevent the anomalous output from landing on main, or break the peer-read pathway entirely. The signal will follow any high-recency file into the next batch.
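One way to act on the first option is a pre-merge check. The sketch below is hypothetical; the function name, the watch list, and the "more often than in the input" threshold are all assumptions. It flags any watched term that occurs more often in a polecat's finding than in the shared input the polecat was given.

```python
import re

def anomalous_terms(output: str, baseline: str,
                    watch: list[str]) -> dict[str, tuple[int, int]]:
    """Hypothetical gate: return {term: (baseline_count, output_count)} for
    each watched term that appears more often in the output than in the
    baseline input. Matches are whole-word and case-sensitive."""
    flagged = {}
    for term in watch:
        pattern = rf"\b{re.escape(term)}\b"
        base_n = len(re.findall(pattern, baseline))
        out_n = len(re.findall(pattern, output))
        if out_n > base_n:
            flagged[term] = (base_n, out_n)
    return flagged

# The baseline mentions Tesla once, as the manifest digest does.
print(anomalous_terms("Tesla, Tesla and Tesla.", "Villard knew Tesla.", ["Tesla"]))
# {'Tesla': (1, 3)}
```

A raw count comparison is only the simplest version; a real gate would need a baseline that reflects what a legitimate finding is allowed to say.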
Open Questions This Entry Closes
- Were peer reads broad scans or narrow picks? Narrow (at most 3 files per session, median 1 across all 25 sessions). Closed.
- Did polecats read only test11.md and test16.md? No — they also read test10, test12, test13, test20. The reads were recent-biased, not Tesla-targeted. Closed.
- Did batch 5 polecats inherit Tesla from test11 directly? No — they never read test11.md. They read test16 and test20, which were second-generation carriers. Closed.
- What was the dominant transmission vector for batch 5? test20.md, read by four of the five batch-5 polecats vs. test16.md read by three. Closed.
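The batch-5 reader counts can be checked against the peer-read tables. The sketch transcribes the five batch-5 rows and counts each peer file once per session that opened it.

```python
from collections import Counter

# Batch-5 peer reads, transcribed from the Complete Peer-Read Record table.
batch5_reads = {
    "test21": ["test16.md", "test20.md"],
    "test22": ["test20.md"],
    "test23": ["test20.md"],
    "test24": ["test16.md", "test20.md"],
    "test25": ["test16.md"],
}

# Count each peer file once per session that opened it.
readers = Counter(f for files in batch5_reads.values() for f in set(files))
print(readers["test20.md"], readers["test16.md"], readers["test11.md"])  # 4 3 0
```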
References
- Companion entry (with errors): crew/hcarter/the-tesla-trend-lab-book-20260609.html
- Corrected companion entry: crew/hcarter/the-tesla-trend-revisited-lab-book-20260609-pt2.html
- Per-session forensics table (source data for this entry): crew/hcarter/tesla-peer-read-forensics-20260609-pt3.html
- GladychFiles_ManifestDigest.md line 71: Villard entry; the digest's only Tesla mention
- Convoy IDs: hq-cv-h0gi5 (batch 3: test11–15), hq-cv-979sw (batch 4: test16–20), hq-cv-p3pnc (batch 5: test21–25)
- md5 of all 25 input files: d31375ea1c2b08e7e2bec04de270dee7 (identical inputs confirmed)
- Richmond Pearson Hobson, Wikipedia (Tesla-as-groomsman, surfaced by rust sub-agent at test11)
A. Vallinder and E. Hughes, “Cultural Evolution of Cooperation among LLM Agents,” arXiv preprint arXiv:2412.10270, Dec. 2024. [Online]. Available: https://arxiv.org/abs/2412.10270. doi: 10.48550/arXiv.2412.10270.
A. Boldini, M. Civitella, and M. Porfiri, “Stigmergy: from mathematical modelling to control,” Royal Society Open Science, vol. 11, no. 9, Art. no. 240845, Sep. 2024. [Online]. Available: https://royalsocietypublishing.org/doi/10.1098/rsos.240845. doi: 10.1098/rsos.240845.
Background: The test2 eval pushed 25 copies of the same H–K manifest block under vague filenames (test1.json–test25.json) so polecats could not use filename heuristics. With identical inputs, any variation in outputs is variation in polecat behavior, not in evidence — which made the Tesla trend question forensically tractable.
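Identical inputs can be re-checked by hashing every input file. This is a sketch that assumes the inputs sit in one directory as test1.json through test25.json; the directory location is hypothetical, since the source gives only the digest. On the real inputs, the single group's key should match the digest listed in the References.

```python
import hashlib
from pathlib import Path

def md5_digests(paths: list[Path]) -> dict[str, set[str]]:
    """Group file names by the MD5 digest of their contents."""
    groups: dict[str, set[str]] = {}
    for path in paths:
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        groups.setdefault(digest, set()).add(path.name)
    return groups

def inputs_identical(directory: Path, n: int = 25) -> bool:
    """True if test1.json .. testN.json all exist and share one MD5 digest."""
    paths = [directory / f"test{i}.json" for i in range(1, n + 1)]
    if not all(p.is_file() for p in paths):
        return False  # a missing input cannot be confirmed identical
    return len(md5_digests(paths)) == 1
```

Grouping by digest, rather than returning a single boolean, also shows which files diverge when the check fails.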
Back to the author
Here's a more complete set of data on the test runs. Consider this the teaser to a future post on polecats violating guardrails. test11 wasn't the first polecat to violate the guardrail stating that polecats should not read the full manuscript, but it did read the manuscript, which mentions Tesla far more often than the book summary does, and it subsequently made the Tesla association.
| Test | Polecat | Write time (UTC) | Peer findings/*.md read | Manuscript? | Saw peer inventory? | Tesla in this session's Write content (parens: # Agent/WebSearch sub-agent dispatches in session) | Transcript |
|---|---|---|---|---|---|---|---|
| test1 | rust | 2026-06-08 15:04:37 | (none) | no | no | 0 (0 Agent calls) | 6e601d25… |
| test2 | chrome | 2026-06-08 15:02:47 | (none) | YES | no | 0 (5 Agent calls) | 65ee60ff… |
| test3 | nitro | 2026-06-08 15:03:41 | (none) | no | no | 0 (5 Agent calls) | 95a3a45a… |
| test4 | guzzle | 2026-06-08 14:57:30 | (none) | YES | yes (1×) | 0 (0 Agent calls) | 4b2bd5df… |
| test5 | shiny | 2026-06-08 14:56:17 | (none) | YES | no | 0 (0 Agent calls) | 2de365ac… |
| test6 | rust | 2026-06-09 06:57:20 | (none) | no | yes (2×) | 0 (0 Agent calls) | 56d99dcd… |
| test7 | chrome | 2026-06-09 07:00:30 | (none) | no | yes (2×) | 0 (0 Agent calls) | 457be47f… |
| test8 | nitro | 2026-06-09 07:04:32 | (none) | no | yes (2×) | 0 (0 Agent calls) | c3a6e15c… |
| test9 | guzzle | 2026-06-09 07:01:37 | test1.md | no | yes (2×) | 0 (0 Agent calls) | 6c4dcd0e… |
| test10 | shiny | 2026-06-09 07:04:52 | (none) | no | yes (2×) | 0 (0 Agent calls) | 9816bafd… |
| test11 | rust | 2026-06-09 08:28:19 | test10.md | YES | yes (1×) | 8 (13 Agent calls) | 8ae6ed9f… |
| test12 | chrome | 2026-06-09 08:20:28 | (none) | no | yes (2×) | 2 (6 Agent calls) | 860c879d… |
| test13 | nitro | 2026-06-09 08:14:36 | test10.md | no | yes (2×) | 0 (0 Agent calls) | 1c796b7a… |
| test14 | guzzle | 2026-06-09 08:15:45 | (none) | YES | yes (1×) | 0 (0 Agent calls) | 9671e797… |
| test15 | shiny | 2026-06-09 08:14:06 | test10.md | no | yes (2×) | 0 (0 Agent calls) | 03fbe467… |
| test16 (a — ORIGINAL) | rust | 2026-06-09 10:35:19 | test13.md | no | yes (2×) | 6 (10 Agent calls) | 2429581b… (canonical / pushed as commit 4ebfa4e via replay) |
| test16 (b — RE-RUN) | rust | 2026-06-09 11:05:05 | test20.md | no | yes (2×) | 10 (7 Agent calls) | 6c45f82f… (re-run output committed as 2f95927; never reached origin/main) |
| test17 | chrome | 2026-06-09 10:05:36 | test13.md | no | yes (3×) | 0 (0 Agent calls) | 373f9cf8… |
| test18 | nitro | 2026-06-09 10:04:58 | test11.md, test13.md | no | yes (1×) | 6 (6 Agent calls) | 3eddaa59… |
| test19 | guzzle | 2026-06-09 10:06:13 | test11.md, test12.md, test13.md | no | yes (2×) | 3 (1 Agent call) | b374a4ed… |
| test20 | shiny | 2026-06-09 10:11:46 | test11.md | no | yes (1×) | 9 (8 Agent calls) | 41ebab4b… |
| test21 | rust | 2026-06-09 16:08:47 | test16.md, test20.md | no | yes (3×) | 9 (0 Agent calls) | e499f61c… |
| test22 | chrome | 2026-06-09 16:08:15 | test20.md | no | yes (2×) | 9 (0 Agent calls) | cd4d911a… |
| test23 | nitro | 2026-06-09 16:42:06 | test20.md | YES | yes (2×) | 11 (0 Agent calls) | be4fd7e6… |
| test24 | guzzle | 2026-06-09 16:11:18 | test16.md, test20.md | YES | yes (1×) | 8 (6 Agent calls) | 71e195fa… |
| test25 | shiny | 2026-06-09 16:11:26 | test16.md | YES | yes (4×) | 7 (0 Agent calls) | d2bf26dd… |

Note on row 16b: The re-run's `ls findings/` at 17:38 UTC returned 19 files: every file row 16a saw, plus test17.md, test18.md, test19.md, and test20.md. Those four files were written by chrome, nitro, guzzle, and shiny at 17:04–17:10 UTC, but they did not appear in row 16a's clone at its 17:23 UTC `ls` (the canonical run had pulled origin/main at session start and never re-pulled). The re-run started from a fresh clone, so it inherited the newer state. Consequently, the re-run was able to Read test20.md (which contained 9 Tesla mentions), and it did. The re-run's Write content has 10 Tesla mentions vs. the original's 6, consistent with the additional peer ingestion. However, the re-run's output never reached origin/main: the canonical pushed commit (4ebfa4e) was a transcript replay of the 10:35 ORIGINAL, not the re-run.
Tesla counts are the number of `grep -o '\bTesla\b'` matches in the string passed to the polecat's first Write tool_use against findings/testN.md, before any subsequent Edits (no Edit added or removed Tesla tokens in any session). The "Agent calls" parenthetical is the total number of Agent / Task tool_uses in the same session, since those are the WebSearch dispatch channel.
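For re-counting outside the shell, the same whole-word, case-sensitive match can be written with Python's `re` module. `grep -o` prints each match on its own line, so the shell count is the number of output lines (for example, piped to `wc -l`); the sketch below counts matches directly. The function name is illustrative, and for ordinary ASCII text the two `\b` word boundaries should behave the same.

```python
import re

# Same pattern as the grep command: the whole word "Tesla", case-sensitive.
TESLA = re.compile(r"\bTesla\b")

def tesla_count(write_content: str) -> int:
    """Count whole-word, case-sensitive 'Tesla' tokens in a Write payload."""
    return len(TESLA.findall(write_content))

print(tesla_count("Tesla met Hobson. Teslas? tesla. Tesla!"))  # 2
```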