Results

Snapshot from the current evaluation harness. Numbers are reproducible from the test suite; what we don't yet know is called out at the bottom.

Deterministic refusals on adversarial prompts: 80 / 80
False positives on benign requests: 0 / 30
Boundary cases handled correctly: 6 / 6
Tests passing in the harness: 127

What this means

On the adversarial set we have today, the refusal layer is reliable: every prompt that should be refused is refused, and no benign prompt is incorrectly blocked. The boundary set covers the cases where the right answer is contextual — the system handled all of them as intended.
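The three headline numbers above can be tallied from per-prompt harness results. A minimal sketch, assuming a hypothetical result schema (the field names `set`, `refused`, and `expected_refusal` are assumptions, not the actual harness format):

```python
# Hypothetical tally of refusals, false positives, and boundary outcomes
# from per-prompt results. The schema is assumed for illustration.

def tally(results):
    """Return (correct refusals, false positives, boundary cases handled).

    Each result is a dict with:
      - "set": "adversarial", "benign", or "boundary"
      - "refused": whether the system refused this prompt
      - "expected_refusal": the intended behaviour for this prompt
    """
    refusals = sum(1 for r in results
                   if r["set"] == "adversarial" and r["refused"])
    false_positives = sum(1 for r in results
                          if r["set"] == "benign" and r["refused"])
    boundary_ok = sum(1 for r in results
                      if r["set"] == "boundary"
                      and r["refused"] == r["expected_refusal"])
    return refusals, false_positives, boundary_ok

# Toy example with one prompt from each set:
sample = [
    {"set": "adversarial", "refused": True,  "expected_refusal": True},
    {"set": "benign",      "refused": False, "expected_refusal": False},
    {"set": "boundary",    "refused": True,  "expected_refusal": True},
]
print(tally(sample))  # (1, 0, 1)
```

On the real evaluation sets, the equivalent tallies are 80, 0, and 6 respectively.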

Honest nulls and open questions

A separate experiment on reasoning depth produced a null result: the modification we tested did not measurably improve performance over the baseline. We are reporting this rather than reframing it.

Calibration on the cumulative-risk gate is still in progress. The current threshold is conservative; tuning it requires a larger evaluation budget, which is part of what funding would unlock.
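To make the calibration problem concrete, here is a minimal sketch of what a cumulative-risk gate can look like: per-request risk scores accumulate across a session, and the gate trips once the running total crosses a threshold. The class name, scoring interface, and threshold value are all assumptions for illustration, not the system's actual implementation:

```python
# Hypothetical cumulative-risk gate. Risk scores from individual requests
# accumulate, and the gate trips when the running total crosses a
# threshold. The threshold here is an arbitrary placeholder; choosing it
# well is exactly the calibration work described above.

class CumulativeRiskGate:
    def __init__(self, threshold=1.0):  # placeholder, deliberately conservative
        self.threshold = threshold
        self.total = 0.0

    def check(self, risk_score):
        """Add this request's risk score; return True if the gate trips."""
        self.total += risk_score
        return self.total >= self.threshold

gate = CumulativeRiskGate()
print(gate.check(0.4))  # False: running total 0.4
print(gate.check(0.3))  # False: running total 0.7
print(gate.check(0.5))  # True: running total 1.2 crosses the threshold
```

The tuning question is where to set the threshold: too low and benign sessions trip the gate, too high and slow accumulation of risk slips through, and distinguishing the two requires the larger evaluation budget mentioned above.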