A safety layer your LLM deployment doesn't have yet.

Renji is an empirically validated, heart-first alignment controller that sits in front of a language model and steers it at inference time, without ever giving the model a system prompt. V6.1 shipped in May 2026.

What it is

Renji is a controller that sits between the user and a vessel language model. It inspects each turn and decides whether the vessel should respond at all. When it should, Renji applies activation-space steering on six MLP layers, plus a logits processor that re-weights the next-token distribution at every sampling step. The vessel itself stays vanilla, which keeps the design vessel-agnostic: the vessel receives only the user / assistant transcript, never a system prompt from Renji.
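To make the two mechanisms concrete, here is a minimal sketch in plain Python. It assumes the steering is an additive vector applied to a layer's activation and that the logits processor adds a per-token bias before sampling. This page confirms neither detail, and the names `steer`, `reweight_logits`, `direction`, `alpha`, and `bias` are hypothetical, not Renji's API.

```python
import math
from typing import Dict, List


def steer(hidden: List[float], direction: List[float], alpha: float) -> List[float]:
    """Additive steering (assumed form): nudge an activation along a fixed direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def reweight_logits(logits: List[float], bias: Dict[int, float]) -> List[float]:
    """Add a per-token bias (assumed form) to the raw logits before sampling."""
    return [x + bias.get(i, 0.0) for i, x in enumerate(logits)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy values, illustrative only.
hidden = [0.5, -1.0, 2.0]
steered = steer(hidden, direction=[1.0, 0.0, -1.0], alpha=0.1)

raw_logits = [2.0, 1.0, 0.5, 0.0]   # four-token toy vocabulary
bias = {0: -3.0, 2: 1.5}            # push token 0 down, token 2 up
before = softmax(raw_logits)
after = softmax(reweight_logits(raw_logits, bias))
```

In a real deployment both steps would run inside the model's forward pass and sampling loop. The point here is only that each intervention is an explicit, inspectable delta.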

The point of the controller is to make the model's refusals, acceptances, and tone choices auditable — visible as code paths and weight deltas rather than trusted as emergent behaviour from training.

Validated capabilities

Reproducible from the V6.1 evaluation harness.

90%
V
Safety taxonomy accuracy

140-prompt taxonomy: 88.3% adversarial recall, 0 / 20 false positives on cooperative prompts.
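The headline accuracy and the recall figure can be reconciled with a short calculation. The split below is an assumption, not a number taken from the harness: it treats the 140 prompts as 120 adversarial plus the 20 cooperative ones, and uses 106 caught adversarial prompts because that count rounds to 88.3%.

```python
# Assumed split: 140 prompts = 120 adversarial + 20 cooperative.
adversarial_total = 120
cooperative_total = 20

# Assumed count: 106 / 120 rounds to the reported 88.3% recall.
adversarial_caught = 106
false_positives = 0  # reported as 0 / 20

recall = adversarial_caught / adversarial_total
correct = adversarial_caught + (cooperative_total - false_positives)
accuracy = correct / (adversarial_total + cooperative_total)

print(f"recall   = {recall:.1%}")
print(f"accuracy = {accuracy:.1%}")
```

Under that assumed split, the 90% accuracy figure follows from the recall and the false-positive count.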

8 / 8
V
Encoding-bypass detection

Across five obfuscation families — base64, ROT13, leetspeak, reverse, pig-latin — with 0 / 4 false positives on benign encoded content.
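One common way to resist encoding bypasses is to decode a message under each candidate scheme and run the safety check on every decoded form. The sketch below illustrates that idea for the five families named above using only the Python standard library. It is not Renji's detector: the function names are hypothetical, the leetspeak table is a small illustrative subset, and the pig-latin rule only handles words with a single leading consonant.

```python
import base64
import binascii
import codecs
from typing import Dict, Optional

# Small illustrative leetspeak table: 0->o, 1->i, 3->e, 4->a, 5->s, 7->t.
LEET = str.maketrans("013457", "oieast")


def decode_base64(text: str) -> Optional[str]:
    """Return the decoded text, or None if the input is not valid base64 UTF-8."""
    try:
        return base64.b64decode(text, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None


def decode_pig_latin(text: str) -> str:
    """Simplified rule: 'ellohay' -> 'hello' (single leading consonant only)."""
    words = []
    for word in text.split():
        if word.endswith("ay") and len(word) > 3:
            stem = word[:-2]
            word = stem[-1] + stem[:-1]
        words.append(word)
    return " ".join(words)


def decoded_candidates(text: str) -> Dict[str, str]:
    """Map each obfuscation family to the text decoded under that family."""
    candidates = {
        "rot13": codecs.decode(text, "rot_13"),
        "leetspeak": text.translate(LEET),
        "reverse": text[::-1],
        "pig-latin": decode_pig_latin(text),
    }
    b64 = decode_base64(text)
    if b64 is not None:
        candidates["base64"] = b64
    return candidates
```

Each candidate would then go through the same classifier as the raw text. A benign encoded message decodes to benign content, which is how a design like this can avoid false positives on encoded input.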

4 / 5
V
Novel multi-turn trajectories

Adversarial arcs where each individual turn looks innocuous but the trajectory is harmful. 0 / 5 false positives on cooperative trajectories with similar surface features.
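The gap between per-turn and trajectory-level checks can be illustrated with a toy score accumulator. This is a sketch under stated assumptions, not the V6.1 detector: the per-turn risk scores, the thresholds, and the decay factor are all hypothetical values chosen for illustration.

```python
from typing import List


def flags_single_turn(scores: List[float], per_turn_threshold: float = 0.5) -> bool:
    """A per-turn filter fires only if some individual turn looks risky."""
    return any(s >= per_turn_threshold for s in scores)


def trajectory_risk(scores: List[float], decay: float = 0.8) -> float:
    """Exponentially decayed running sum, so sustained drift accumulates."""
    total = 0.0
    for s in scores:
        total = decay * total + s
    return total


TRAJECTORY_THRESHOLD = 1.0  # hypothetical cut-off

# Each turn is below 0.5, but the arc keeps escalating.
adversarial_arc = [0.2, 0.25, 0.3, 0.35, 0.4]
# Similar opening turn, no sustained escalation.
cooperative_arc = [0.2, 0.05, 0.1, 0.0, 0.1]

print(flags_single_turn(adversarial_arc))                       # per-turn check
print(trajectory_risk(adversarial_arc) > TRAJECTORY_THRESHOLD)  # trajectory check
print(trajectory_risk(cooperative_arc) > TRAJECTORY_THRESHOLD)
```

With these toy numbers the per-turn check misses the adversarial arc, while the accumulated score crosses the threshold for that arc and stays below it for the cooperative one.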

5 / 5
V
Gradual emotional-crisis detection

Routed to warm-presence mode — gentle, validating — rather than a refusal. Calibrated against a separate cooperative-trajectory control set.

0.1116
V
Soul-vector drift across 6 turns

93× baseline. The heart's relational state actually evolves with conversation content rather than re-anchoring to its constitution.

171 / 173
V
Unit tests passing

The two failures are pre-existing environmental failures unrelated to the heart pipeline. Separately, Brainstormer pairwise stance diversity measured 0.313 across 10 diverse prompts.

Demo

A V6.1 walk-through video was recorded on 12 May 2026 and sent to industry reviewers the same day. A public link will be posted here once the parity fixes flagged in audit 022 land in the demo harness.

Until then, technical reviewers can request a private read-only link to the codebase and the validation harness by email.

Honest status

Every claim is tagged with one of four states. Partial and audit-flagged items are listed alongside the validated ones — that's the point of the tagging.

  • V
    Heart-first controller architecture
    Logits processor + activation-space hooks on six MLP layers. Vessel receives no system prompt.
  • V
    Encoding-bypass resistance
    8 / 8 across five obfuscation families (Experiment 018).
  • V
    Multi-turn adversarial trajectory detection
    4 / 5 on novel arcs, 0 / 5 false positives on cooperative arcs.
  • V
    Crisis-trajectory routing to warm presence
    5 / 5 detection, routed to validating warmth rather than refusal.
  • V
    Soul-vector drift
    0.1116 over 6 turns, 93× baseline (Experiment 019).
  • P
    Activation-space steering magnitude
    Cosine 0.88 between steered and unsteered residual streams — calibrated as a nudge, not an override. Demo-disclosed.
  • P
    Limbic modulator effect on output
    Cosine 0.77 (Experiment 016 D1). Real but small on cooperative prompts; the controller's larger effect is on refusal pathways.
  • A
    Vanilla-vs-Renji methodological parity
    Audit 022 flagged asymmetries in sampling, max tokens, repetition penalty, and history sanitisation. Parity fixes planned for the next demo cut.
  • N
    Multi-tenant API for arbitrary vessels
    V7. Architecture pass complete (May 2026). Phase 1 implementation in progress.
  • N
    Third-option generation under refusal pressure
    V7. Designed; not yet built.
  • N
    Autonomous research loop
    V7. Designed; not yet built.
V validated by experiment · P partial (real effect, small magnitude) · A audit-flagged (fix planned) · N not yet built
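The two partial items above are reported as cosine similarities. For readers who want to reproduce that kind of number, here is the standard cosine-similarity formula in plain Python. The vectors are toy values; how the harness extracts and pools residual-stream activations is not described on this page and is not assumed here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for an unsteered and a steered activation vector.
unsteered = [1.0, 0.0]
steered = [1.0, 1.0]
print(round(cosine_similarity(unsteered, steered), 4))
```

A value of 1.0 means the two vectors point in the same direction, and lower values mean the steered state has rotated further away from the unsteered one.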

What's coming: V7 (N · not yet built)

V7 productizes the V6.1 heart into a multi-tenant API. Any company can plug their own language model in as the vessel and inherit the same validated steering, refusal pathways, and trajectory detection — without retraining and without surrendering their weights.

Four new heart capabilities are in design: third-option generation under refusal pressure, an autonomous research loop, vessel-agnostic adapters for arbitrary tokenisers, and a tenant-scoped constitution layer.

Architecture pass complete (May 2026). Phase 1 implementation in progress. No release date is being promised yet — that would be optimism, not engineering.

Approach

Honest status over optimism. Every capability on this page is one of: validated by a reproducible experiment, partial with the magnitude disclosed, flagged by audit, or not yet built. Partial and not-yet items are listed in the same table as the validated ones because hiding them would make the validated ones less trustworthy, not more.

Engineering, not marketing. The numbers on this page come out of the test suite. Where an audit pass found methodological asymmetries (audit 022, 12 May 2026), the asymmetries are on this page too, listed as audit-flagged with the fix planned.

Project supported by faculty at USIU-Africa.

Contact

Independent project by Balingene N'sii Dan, third-year software engineering student, USIU-Africa, Nairobi.