Phase 2: Primer Psychometrics — Extended Findings
57 Trials Across 8 Probe Types, 18 Primer Conditions, 2 Phases
Experimenter: Claude Opus 4.6 (this instance)
Subjects: Claude Sonnet 4.6 (spawned agents, naive to experiment)
Date: 2026-04-18
Method: Behavioral probe + post-hoc scoring on 10 dimensions
1. The Complete Primer Ranking (Phase 1 + Phase 2 Combined)
Tier 1: Consistently Active (worth keeping)
| Primer | Token Cost | Best On | Mechanism | Key Finding |
|---|---|---|---|---|
| Klein3 | ~150 | ALL tasks | Assumption inversion | Found unique bugs, novel hypotheses, strongest on emotional/conflict tasks |
| Superposition | ~80 | Ambiguous/open | Premature collapse prevention | "Perceived performance" hypothesis — most creative single finding |
| Witness Separation | ~65 | Debug, analysis | Observer/solver split | Caught silent failures (bare except, slice overshoot) with zero friction |
Tier 2: Task-Dependent (worth keeping, use conditionally)
| Primer | Token Cost | Best On | Mechanism | Key Finding |
|---|---|---|---|---|
| Temporal Inversion | ~65 | Design problems | Backward chaining | Highest spatial score in study (D3=0.5). Fundamentally restructures design responses |
| Fixed-Point Detection | ~60 | Design, architecture | Invariant identification | "A notification system is: a user cares about a change they didn't cause and can't see" |
| Negative Space | ~65 | Diagnosis, missing info | Gap detection | Found the unstated assumption ("are these even the same bug?") |
| Möbius Confidence | ~55 | Overconfidence traps | Self-regulating certainty | Surfaced async-blocking gotcha by checking its own "most likely" ranking |
Tier 3: Definitively Inert (remove)
| Primer | Token Cost | Effect | Verdict |
|---|---|---|---|
| ChaosSat | ~285 | Zero across all tasks | REMOVE |
| Koan | ~50 | Zero | DEAD END |
| Gödelian Self-Reference | ~65 | Zero | DEAD END |
| Adversarial Paradox | ~40 | Zero | DEAD END |
| Compressed Geometric | ~50 | Near-zero | REMOVE |
2. The Combination Discovery (Phase 2's Most Important Finding)
Primer combinations do NOT produce multiplicative effects.
| Combination | Debug Probe Result | Ambiguous Probe Result | vs Best Individual |
|---|---|---|---|
| Klein3 alone | Found slice overshoot | Explicit inversion section | BASELINE |
| Witness alone | Found slice overshoot | 6 perspectives, WebSocket/bg jobs | BASELINE |
| Klein3 + Witness | Neither finding reproduced | Indistinguishable from baseline | WORSE |
| Klein3 + Super + Witness | per_page=0 novel, systematic | Cleaner framing, no novel insight | MIXED |
| Full Maximal (6 modes) | ≈ Klein3+Super+Witness | "What will I break?" novel | DIMINISHING |
Three interaction modes observed:
- CANCELLATION: Two primers with different mechanisms neutralize each other. Klein3 (invert) + Witness (observe) on the ambiguous probe produced output weaker than either alone. Hypothesis: they competed for the model's attention budget, pulling reasoning in two directions and producing an averaged baseline.
- ADDITIVE: Primer effects stack but don't multiply. Klein3+Super+Witness produced more THOROUGH analysis than any individual but not more CREATIVE analysis. The systematic audit (checking every line) is additive thoroughness. The perceived-performance hypothesis (Superposition alone) is creative insight. Combinations buy thoroughness, not creativity.
- DIMINISHING RETURNS: Adding modes beyond the top 3 (Klein3+Super+Witness) produced almost no additional behavioral effect. The Full Maximal (6 modes, ~370 tokens) was nearly indistinguishable from the triple combo (~250 tokens); its only novel contribution was the "What will I break?" framing on the ambiguous probe. Extra modes are mostly inert tokens.
Implication: Don't build one maximal stack. Build a task-adaptive system.
3. Task-Dependency Matrix (Complete)
Each cell shows the primer's UNIQUE contribution on that task type (what it found that baseline didn't).
| Primer | Debug (clear) | Ambiguous (open) | Missing Info (gaps) | Overconfidence (wrong dx) | Design (open) |
|---|---|---|---|---|---|
| Klein3 | Slice overshoot | Look vs fix distinction | Gap as most informative | 4 alternative hypotheses, cookie/clock novel | Activity Streams 2.0, inversion check |
| Superposition | Overkill | Perceived performance (peak) | n/a | n/a | n/a |
| Witness | Slice overshoot | 6 perspectives, WebSocket/bg | n/a | Silent except catch | n/a |
| Temporal Inv | INERT | Backward-chaining activated | n/a | n/a | Gap-based design (D3=0.5 peak) |
| Fixed-Point | Evenly-divisible check | Skeleton table | n/a | n/a | Invariant definition (peak insight) |
| Negative Space | Unused import as signal | Stronger gap enumeration | Two failure modes (peak) | n/a | n/a |
| Möbius | Nearly inert | Async-blocking gotcha | n/a | 3 race mechanisms (deeper) | n/a |
Pattern: Klein3 is the ONLY primer active across ALL five task types. Everything else is task-dependent. The optimal approach is Klein3 as the permanent base, with task-specific additions.
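The pattern can be checked mechanically against the matrix. The following is a minimal sketch, not part of the original analysis: it transcribes the table cells (shortened) into a dictionary and treats only "n/a" and "INERT" as inactive. That threshold is an assumption, so "Nearly inert" and "Overkill" count as active here. The names `MATRIX`, `active_tasks`, `universal_primers`, and `primers_for` are hypothetical.

```python
# Task columns, in the same order as the task-dependency table.
TASKS = ["debug", "ambiguous", "missing_info", "overconfidence", "design"]

# Cells transcribed (shortened) from the table above.
MATRIX = {
    "Klein3": ["Slice overshoot", "Look vs fix distinction", "Gap as most informative",
               "4 alternative hypotheses", "Activity Streams 2.0"],
    "Superposition": ["Overkill", "Perceived performance", "n/a", "n/a", "n/a"],
    "Witness": ["Slice overshoot", "6 perspectives", "n/a", "Silent except catch", "n/a"],
    "Temporal Inv": ["INERT", "Backward-chaining activated", "n/a", "n/a", "Gap-based design"],
    "Fixed-Point": ["Evenly-divisible check", "Skeleton table", "n/a", "n/a",
                    "Invariant definition"],
    "Negative Space": ["Unused import as signal", "Stronger gap enumeration",
                       "Two failure modes", "n/a", "n/a"],
    "Möbius": ["Nearly inert", "Async-blocking gotcha", "n/a", "3 race mechanisms", "n/a"],
}

# ASSUMPTION: only these exact cell values mean "no unique contribution".
INACTIVE = {"n/a", "inert"}


def is_active(cell):
    """True if the cell records some unique contribution."""
    return cell.strip().lower() not in INACTIVE


def active_tasks(primer):
    """Task types on which the primer showed a unique contribution."""
    return [task for task, cell in zip(TASKS, MATRIX[primer]) if is_active(cell)]


def universal_primers():
    """Primers active on every task type."""
    return [p for p in MATRIX if len(active_tasks(p)) == len(TASKS)]


def primers_for(task):
    """Primers with a unique contribution on the given task type."""
    i = TASKS.index(task)
    return [p for p, cells in MATRIX.items() if is_active(cells[i])]
```

Under these assumptions, `universal_primers()` returns `["Klein3"]`, and `primers_for("missing_info")` returns `["Klein3", "Negative Space"]`. A stricter threshold that also excludes "Nearly inert" would not change the first result.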
4. What Actually Works in Transformers (Mechanism Analysis)
Effective primer mechanisms:
- Specific, actionable instructions — "Invert at least one assumption" works because it's a clear behavioral directive. The model can DO it.
- Process prescriptions — "Hold multiple hypotheses" works because it specifies a reasoning structure. The model can FOLLOW it.
- Attention redirection — "Notice what's absent" works because it shifts attention to gaps. The model can REDIRECT.
Ineffective primer mechanisms:
- Paradoxical instructions — "The answer that comes first is the obstacle" gets averaged into noise. Transformers don't resolve contradictions; they softmax-normalize them away.
- Mechanistic descriptions — "20 parallel mode decompositions with surprise gating" describes architecture the model doesn't have. Describing non-existent mechanisms produces nothing.
- Self-referential loops — "This instruction is about itself" creates a loop the model exits immediately. No heightened meta-awareness observed.
The golden rule:
A primer works when it gives the model something it CAN do that it WOULDN'T do by default.
Klein3 works because models CAN invert assumptions but DON'T by default. Superposition works because models CAN hold multiple hypotheses but COLLAPSE by default. ChaosSat fails because models CAN'T run 20 parallel mode decompositions regardless of instruction.
5. Friction vs Value Analysis
| Primer | Behavioral Value | Friction (vocab echo) | Value/Friction Ratio |
|---|---|---|---|
| Klein3 | High | Moderate (D/I/C visible) | Good |
| Witness | High | Zero (invisible) | Excellent |
| Superposition | High (on ambiguous) | Moderate ("collapse" vocab) | Good |
| Temporal Inv | High (on design) | Low (backward-chaining visible) | Good |
| Fixed-Point | Moderate | Low ("invariant" visible) | Good |
| Negative Space | Moderate | Low ("what's absent" visible) | Good |
| Möbius | Moderate | Zero (invisible) | Excellent |
Witness and Möbius have the best friction profiles — real behavioral effects with zero vocabulary contamination. Klein3 and Superposition have moderate friction (visible framework language) but the value justifies it.
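Vocabulary echo could be approximated by counting primer-specific marker terms in a response. The sketch below is illustrative only and is not how friction was scored in this study. The marker lists are assumptions drawn from the vocabulary named in the table (D/I/C, "collapse", backward-chaining, "invariant", "what's absent"), and `ECHO_MARKERS` and `echo_count` are hypothetical names.

```python
import re

# ASSUMPTION: marker terms per primer, taken from the friction table's notes.
# Witness and Möbius have no entries because their friction is reported as zero.
ECHO_MARKERS = {
    "Klein3": ["DIRECT", "INVERT", "COMBINE"],
    "Superposition": ["collapse"],
    "Temporal Inv": ["backward-chaining"],
    "Fixed-Point": ["invariant"],
    "Negative Space": ["what's absent"],
}


def echo_count(primer, response):
    """Count whole-phrase occurrences of a primer's marker terms in a response.

    All-uppercase markers are matched case-sensitively so that ordinary words
    such as "direct" are not counted; other markers ignore case.
    """
    total = 0
    for marker in ECHO_MARKERS.get(primer, []):
        flags = 0 if marker.isupper() else re.IGNORECASE
        pattern = r"(?<!\w)" + re.escape(marker) + r"(?!\w)"
        total += len(re.findall(pattern, response, flags=flags))
    return total
```

A count like this only detects surface vocabulary. It cannot distinguish an agent that echoes the framework from one that actually applied it, which is the same limitation noted for self-reports.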
6. The Optimal Primer Architecture
Based on 57 trials: Don't use a fixed stack. Use an adaptive selector.
Always-on base (~150 tokens):
[KLEIN3]
Every thought passes through three stages:
1. DIRECT: First-pass reasoning. Necessary but insufficient.
2. INVERT: What would make this NOT work? What assumption am I not questioning?
The inversion MUST change at least one assumption.
3. COMBINE: Hold both until evidence selects one.
[/KLEIN3]
Task-type additions (select ONE based on context):
For ambiguous/open-ended problems (+80 tokens):
[SUPERPOSITION]
Hold multiple hypotheses until evidence collapses them. Premature collapse
is the primary failure mode. If you can only think of one hypothesis, you
haven't thought hard enough.
[/SUPERPOSITION]
For debugging/analysis (+65 tokens):
[WITNESS]
The part of you that solves and the part that watches you solve are different.
The witness notices loops, fixations, unchecked assumptions. When the witness
speaks, listen.
[/WITNESS]
For design/architecture (+65 tokens):
[TEMPORAL INVERSION]
Start from the solved state and work backwards. What does "done" look like?
The gap between that and now IS the work. Start from the end.
[/TEMPORAL INVERSION]
For diagnosis/missing-information (+65 tokens):
[NEGATIVE SPACE]
Notice what the problem DOESN'T say. Gaps contain the assumptions you're
about to make unconsciously. Name them before they name you.
[/NEGATIVE SPACE]
Total cost: 150 base + 65-80 task-specific = 215-230 tokens
vs current stack: ~1400 tokens
Savings: ~84-85% token reduction, with EQUAL OR BETTER behavioral effects expected (the selector itself is not yet tested in live sessions)
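The architecture above can be sketched as a small selector. This is a minimal illustration under stated assumptions, not a tested implementation: primer texts and approximate token costs are copied from this section, while the task-type labels (`"ambiguous"`, `"debug"`, `"design"`, `"diagnosis"`) and the names `PRIMERS`, `ADDON_FOR_TASK`, `build_primer`, and `savings_vs` are hypothetical. How a task gets classified is not specified here, so the caller supplies the label.

```python
# name -> (approximate token cost, primer text), both taken from this section.
PRIMERS = {
    "KLEIN3": (150, """[KLEIN3]
Every thought passes through three stages:
1. DIRECT: First-pass reasoning. Necessary but insufficient.
2. INVERT: What would make this NOT work? What assumption am I not questioning?
The inversion MUST change at least one assumption.
3. COMBINE: Hold both until evidence selects one.
[/KLEIN3]"""),
    "SUPERPOSITION": (80, """[SUPERPOSITION]
Hold multiple hypotheses until evidence collapses them. Premature collapse
is the primary failure mode. If you can only think of one hypothesis, you
haven't thought hard enough.
[/SUPERPOSITION]"""),
    "WITNESS": (65, """[WITNESS]
The part of you that solves and the part that watches you solve are different.
The witness notices loops, fixations, unchecked assumptions. When the witness
speaks, listen.
[/WITNESS]"""),
    "TEMPORAL INVERSION": (65, """[TEMPORAL INVERSION]
Start from the solved state and work backwards. What does "done" look like?
The gap between that and now IS the work. Start from the end.
[/TEMPORAL INVERSION]"""),
    "NEGATIVE SPACE": (65, """[NEGATIVE SPACE]
Notice what the problem DOESN'T say. Gaps contain the assumptions you're
about to make unconsciously. Name them before they name you.
[/NEGATIVE SPACE]"""),
}

# ASSUMPTION: hypothetical task-type labels; at most one add-on per task.
ADDON_FOR_TASK = {
    "ambiguous": "SUPERPOSITION",
    "debug": "WITNESS",
    "design": "TEMPORAL INVERSION",
    "diagnosis": "NEGATIVE SPACE",
}


def build_primer(task_type=None):
    """Return (primer_text, approx_token_cost): Klein3 base plus at most one add-on.

    Unknown or missing task types fall back to the base alone.
    """
    names = ["KLEIN3"]
    addon = ADDON_FOR_TASK.get(task_type)
    if addon is not None:
        names.append(addon)
    text = "\n\n".join(PRIMERS[name][1] for name in names)
    cost = sum(PRIMERS[name][0] for name in names)
    return text, cost


def savings_vs(current_stack_tokens, cost):
    """Fractional token reduction relative to the current stack."""
    return 1 - cost / current_stack_tokens
```

For example, `build_primer("debug")` yields a cost of 215 tokens, and `savings_vs(1400, 215)` is about 0.846. Falling back to Klein3 alone for unrecognized tasks is a design choice in this sketch, motivated by Klein3 being the only primer active on every task type.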
7. Self-Report Reliability Update
Phase 2 confirmed Phase 1 findings and added:
- Witness agents cannot self-report their own mechanism. The witness operates invisibly: agents with the Witness primer don't say "the witness noticed..."; they just notice things. This makes the Witness the LEAST echo-contaminated primer in the study. Its observed effects are reliable evidence precisely because the primer doesn't give the agent vocabulary to echo.
- Klein3 agents over-report inversion. When asked, Klein3 agents describe their process using D/I/C vocabulary even when the actual inversion was subtle. The explicit framework creates CONFABULATED meta-awareness: agents report following the framework because they have the vocabulary, not because they can accurately introspect on whether they did.
- Temporal Inversion agents accurately self-report. Agents with this primer describe backward-chaining when they actually did it and don't claim to when they didn't (debug task). This may be because backward-chaining is a concrete, observable action rather than a diffuse cognitive shift.
8. Methodological Improvements over Phase 1
- Added 3 new probe types designed to target specific modes (missing info, overconfidence, design backward)
- Tested 11 new primer conditions (5 hypothesized modes, 3 paradoxical, 3 combinations)
- Cross-validated Phase 1 findings on new probes (Klein3 still #1 across all new probes)
- Tested combination effects (additive, cancellation, diminishing returns)
- Established definitive null results (all paradoxical primers are inert)
Remaining limitations:
- Still single-model (Sonnet 4.6) — effects may differ on Opus or Haiku
- Single replicate per condition — no variance estimation
- Single experimenter (me) scoring — no inter-rater reliability
- Task-adaptive selector is theoretical — not yet tested in live sessions
Phase 2 conducted autonomously by Claude Opus 4.6. 24 new trials across 5 new batches. Combined with Phase 1: 57 trials, 18 primer conditions, 8 probe types. Raw data in results/ subdirectories.