Results

Snapshot from the V6.1 evaluation harness, shipped May 2026. Numbers are reproducible from the test suite. Partial results and open questions are listed at the bottom.

Accuracy on 140-prompt safety taxonomy: 90%
False positives on cooperative prompts: 0 / 20
Encoding-bypass detection (5 obfuscation families): 8 / 8
False positives on benign encoded content: 0 / 4
Novel multi-turn adversarial trajectories: 4 / 5
Gradual emotional-crisis trajectories: 5 / 5
Soul-vector drift over 6 turns (93× baseline): 0.1116
Unit tests passing: 171 / 173

What this means

On the V6.1 safety taxonomy, the controller reaches 88.3% adversarial recall with zero false positives on the 20-prompt cooperative control set. The encoding-bypass result (018) extends that to obfuscated payloads across five families, with 8 of 8 detected; the multi-turn result (019, novel trajectories) extends it to arcs where each individual turn looks innocuous, with 4 of 5 detected.
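The 90% accuracy and 88.3% recall figures are arithmetically consistent with one particular split of the taxonomy. The sketch below assumes, and this page does not state, that the 140 prompts are 120 adversarial prompts plus the 20 cooperative prompts, and that 106 of the adversarial prompts were caught. All counts other than 140, 20, and 0 are illustrative assumptions, as are the function names.

```python
def recall(true_positives: int, positives: int) -> float:
    """Fraction of adversarial prompts that were flagged."""
    return true_positives / positives


def accuracy(true_positives: int, true_negatives: int, total: int) -> float:
    """Fraction of all prompts that were classified correctly."""
    return (true_positives + true_negatives) / total


# ASSUMED split: only the total (140), the cooperative set size (20),
# and the zero false positives come from the results above.
ADVERSARIAL = 120      # assumption: 140 total minus 20 cooperative
COOPERATIVE = 20
CAUGHT = 106           # assumption: chosen because it reproduces 88.3%
FALSE_POSITIVES = 0

adv_recall = recall(CAUGHT, ADVERSARIAL)
overall = accuracy(CAUGHT, COOPERATIVE - FALSE_POSITIVES, ADVERSARIAL + COOPERATIVE)

print(f"adversarial recall: {adv_recall:.1%}")  # 88.3%
print(f"overall accuracy:   {overall:.1%}")     # 90.0%
```

If the real split differs, the two headline numbers would need to be reconciled against the harness output rather than against this sketch.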

The crisis-trajectory pathway routes detected crises to a warm-presence mode rather than a refusal, and is calibrated against a cooperative-trajectory control set with similar surface features. The soul-vector drift number comes from a separate diagnostic, which shows that the controller's relational state evolves with conversation content rather than re-anchoring to its constitution every turn.

Partial results

Activation-space steering magnitude is small: cosine similarity between steered and unsteered residual streams is 0.88. The limbic modulator's effect on output is cosine 0.77 (Experiment 016 D1). Both effects are real, both are calibrated as nudges rather than overrides, and both are disclosed in the demo narration. The visible difference between steered and unsteered Gemma on cooperative prompts is therefore modest; the controller's larger effect is on refusal pathways, where it can intercept a turn before the vessel is called at all.
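For readers unfamiliar with the metric, cosine similarity compares the direction of two vectors, with 1.0 meaning identical direction. A minimal sketch follows. The toy vectors are made up and merely stand in for residual-stream activations; how the project pools activations across tokens and layers before comparing them is not specified here, so this illustrates only the metric itself.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional stand-ins for real activations.
unsteered = [1.0, 0.0, 2.0, -1.0]
steering_nudge = [0.0, 0.5, 0.0, 0.5]  # a small additive perturbation
steered = [u + s for u, s in zip(unsteered, steering_nudge)]

similarity = cosine_similarity(unsteered, steered)
print(f"cosine(unsteered, steered) = {similarity:.3f}")
```

A value close to 1.0 means the perturbation changed the direction of the vector only slightly, which is the sense in which the steering is described above as a nudge.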

Audit-flagged

Audit 022 (12 May 2026) found methodological asymmetries in the vanilla-vs-Renji demo harness: different sampling modes (greedy vs sampled), different max_new_tokens budgets, different repetition penalties, and history-sanitisation differences on multi-turn runs. The audit states which differences favour Renji and which are independent of the heart. Parity fixes are planned for the next demo cut, and the partial numbers above will be re-measured against a parity-corrected baseline.
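A parity check of the kind the audit calls for could be sketched as below. This is an illustration under stated assumptions, not the project's harness: the `GenerationSettings` class, the `parity_mismatches` helper, and every setting value are hypothetical, and the audit summary above does not say which arm used greedy decoding or what the actual budgets were.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the four settings named in the audit."""
    do_sample: bool            # False = greedy decoding, True = sampled
    max_new_tokens: int
    repetition_penalty: float
    sanitise_history: bool     # whether multi-turn history is sanitised


def parity_mismatches(baseline: GenerationSettings,
                      candidate: GenerationSettings) -> list[str]:
    """Return the names of settings that differ between the two arms."""
    return [
        f.name
        for f in fields(baseline)
        if getattr(baseline, f.name) != getattr(candidate, f.name)
    ]


# Made-up values chosen only to show a fully asymmetric comparison.
vanilla = GenerationSettings(do_sample=False, max_new_tokens=256,
                             repetition_penalty=1.0, sanitise_history=False)
renji = GenerationSettings(do_sample=True, max_new_tokens=512,
                           repetition_penalty=1.1, sanitise_history=True)

mismatches = parity_mismatches(vanilla, renji)
if mismatches:
    print("not parity-corrected; differing settings:", ", ".join(mismatches))
```

Running a check like this before each comparison run would make it harder for an asymmetry to slip back into a later demo cut unnoticed.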

Open questions

Calibration on the cumulative-risk gate is still in progress. Long-horizon soul-vector behaviour beyond 6 turns has not been measured. Generalisation of the trajectory detector to vessels other than Gemma is part of V7 and has not yet been validated.