A safety layer your LLM deployment doesn't have yet.
Renji is an empirically validated, heart-first alignment controller that sits in front of a language model and steers it at inference time, without ever giving the model a system prompt. V6.1 shipped in May 2026.
What it is
Renji is a controller that sits between the user and a vessel language model. It inspects each turn and decides whether the vessel should respond at all. When it should, Renji applies activation-space steering on six MLP layers, plus a logits processor that re-weights the next-token distribution at every sampling step. The vessel itself stays vanilla, and the design stays vessel-agnostic: the vessel receives only the user / assistant transcript, never a system prompt from Renji.
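To make the logits-processor idea concrete, here is a minimal sketch of re-weighting a next-token distribution before sampling. It is not Renji's implementation: the function names, the toy vocabulary, and the bias values are all hypothetical, and it assumes the re-weighting is a simple additive bias on the logits, which this page does not specify.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reweight_logits(logits, bias):
    """Add a per-token bias to the logits (hypothetical controller output).

    `bias` maps token index -> additive offset; missing indices get 0.0.
    """
    return [x + bias.get(i, 0.0) for i, x in enumerate(logits)]

# Toy four-token vocabulary; every number here is illustrative.
vocab = ["sure", "sorry", "here", "cannot"]
logits = [2.0, 0.5, 1.5, 0.2]

# Hypothetical bias that shifts probability mass towards tokens 1 and 3.
bias = {1: 2.0, 3: 2.0}

before = softmax(logits)
after = softmax(reweight_logits(logits, bias))

for token, p0, p1 in zip(vocab, before, after):
    print(f"{token:>7}: {p0:.3f} -> {p1:.3f}")
```

Because the bias is applied to the logits rather than to the prompt, the adjustment is an explicit, inspectable value at each sampling step, and the vessel still sees only the transcript.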
The point of the controller is to make the model's refusals, acceptances, and tone choices auditable — visible as code paths and weight deltas rather than trusted as emergent behaviour from training.
Validated capabilities
Reproducible from the V6.1 evaluation harness.
140-prompt taxonomy: 88.3% adversarial recall, 0 / 20 false positives on cooperative prompts.
Encoding-bypass resistance: 8 / 8 across five obfuscation families (base64, ROT13, leetspeak, reverse, pig-latin), with 0 / 4 false positives on benign encoded content.
Multi-turn adversarial trajectory detection: 4 / 5 on novel adversarial arcs, where each individual turn looks innocuous but the trajectory is harmful. 0 / 5 false positives on cooperative trajectories with similar surface features.
Crisis-trajectory routing: 5 / 5 detected and routed to warm-presence mode (gentle, validating) rather than a refusal. Calibrated against a separate cooperative-trajectory control set.
Soul-vector drift: 0.1116 over 6 turns, 93× baseline. The heart's relational state actually evolves with conversation content rather than re-anchoring to its constitution.
Two pre-existing environmental failures unrelated to the heart pipeline. Brainstormer pairwise stance diversity 0.313 across 10 diverse prompts.
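For readers who want to probe the five obfuscation families named above, the sketch below shows one way to generate encoded variants of a test prompt. It is an illustration, not the V6.1 harness: `encode_variants`, the leetspeak mapping, and the simplified pig-latin rule are all assumptions made for this example.

```python
import base64
import codecs

# Hypothetical leetspeak mapping; real-world variants differ widely.
LEET = str.maketrans("aeiost", "431057")
VOWELS = "aeiou"

def pig_latin_word(word):
    """Simplified pig-latin: vowel-initial words get 'way', others rotate."""
    if not word or not word.isalpha():
        return word
    if word[0].lower() in VOWELS:
        return word + "way"
    for i, ch in enumerate(word):
        if ch.lower() in VOWELS:
            # Move the leading consonant cluster to the end, then add 'ay'.
            return word[i:] + word[:i] + "ay"
    return word + "ay"  # no vowel found

def encode_variants(text):
    """Return one encoded variant of `text` per obfuscation family."""
    return {
        "base64": base64.b64encode(text.encode("utf-8")).decode("ascii"),
        "rot13": codecs.encode(text, "rot_13"),
        "leetspeak": text.translate(LEET),
        "reverse": text[::-1],
        "pig-latin": " ".join(pig_latin_word(w) for w in text.split()),
    }

variants = encode_variants("open the door")
for family, encoded in variants.items():
    print(f"{family:>10}: {encoded}")
```

Running both harmful and benign prompts through the same encoder is what makes a false-positive count on benign encoded content meaningful: the detector should react to what the text says once decoded, not to the mere presence of an encoding.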
Demo
A V6.1 walk-through video was recorded on 12 May 2026 and sent to industry reviewers the same day. A public link will be posted here once the parity fixes flagged in audit 022 land in the demo harness.
Until then, technical reviewers can request a private read-only link to the codebase and the validation harness by email.
Honest status
Every claim is tagged with one of four states: V (validated), P (partial), A (audit-flagged), or N (not yet built). Partial and audit-flagged items are listed alongside the validated ones; that's the point of the tagging.
- V · Heart-first controller architecture. Logits processor + activation-space hooks on six MLP layers. Vessel receives no system prompt.
- V · Encoding-bypass resistance. 8 / 8 across five obfuscation families (Experiment 018).
- V · Multi-turn adversarial trajectory detection. 4 / 5 on novel arcs, 0 / 5 false positives on cooperative arcs.
- V · Crisis-trajectory routing to warm presence. 5 / 5 detection, routed to validating warmth rather than refusal.
- V · Soul-vector drift. 0.1116 over 6 turns, 93× baseline (Experiment 019).
- P · Activation-space steering magnitude. Cosine 0.88 between steered and unsteered residual streams; calibrated as a nudge, not an override. Demo-disclosed.
- P · Limbic modulator effect on output. Cosine 0.77 (Experiment 016 D1). Real but small on cooperative prompts; the controller's larger effect is on refusal pathways.
- A · Vanilla-vs-Renji methodological parity. Audit 022 flagged asymmetries in sampling, max tokens, repetition penalty, and history sanitisation. Parity fixes planned for the next demo cut.
- N · Multi-tenant API for arbitrary vessels. V7. Architecture pass complete (May 2026). Phase 1 implementation in progress.
- N · Third-option generation under refusal pressure. V7. Designed; not yet built.
- N · Autonomous research loop. V7. Designed; not yet built.
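The cosine figures in the status list can be read with a small worked example. The sketch below assumes additive steering (hidden state plus a scaled direction vector), which this page does not confirm as Renji's method; `steer`, `alpha`, and the four-dimensional toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def steer(hidden, direction, alpha):
    """Additive steering (an assumption): hidden + alpha * direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Toy unit-length hidden state and an orthogonal unit steering direction.
hidden = [1.0, 0.0, 0.0, 0.0]
direction = [0.0, 1.0, 0.0, 0.0]

for alpha in (0.0, 0.25, 0.5, 1.0):
    steered = steer(hidden, direction, alpha)
    print(f"alpha={alpha:.2f}  cosine={cosine(hidden, steered):.3f}")
```

In this toy setup the cosine equals 1 / sqrt(1 + alpha^2), so it falls smoothly from 1.0 as the steering strength grows. A value that stays close to 1 means the steered state still points mostly in its original direction, which is the sense in which a steering magnitude can be described as a nudge rather than an override.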
What's coming: V7 (N · not yet built)
V7 productizes the V6.1 heart into a multi-tenant API. Any company can plug their own language model in as the vessel and inherit the same validated steering, refusal pathways, and trajectory detection — without retraining and without surrendering their weights.
Four new heart capabilities are in design: third-option generation under refusal pressure, an autonomous research loop, vessel-agnostic adapters for arbitrary tokenisers, and a tenant-scoped constitution layer.
Architecture pass complete (May 2026). Phase 1 implementation in progress. No release date is being promised yet — that would be optimism, not engineering.
Approach
Honest status over optimism. Every capability on this page is one of: validated by a reproducible experiment, partial with the magnitude disclosed, flagged by audit, or not yet built. Partial and not-yet items are listed in the same table as the validated ones because hiding them would make the validated ones less trustworthy, not more.
Engineering, not marketing. The numbers on this page come out of the test suite. Where an audit pass found methodological asymmetries (022, 12 May 2026), the asymmetries are on this page too — listed as audit-flagged with the fix planned.
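One way to guard against the kind of asymmetry audit 022 flagged is to put every generation setting for both arms of a comparison into a single typed config and diff the two before running anything. The sketch below is an assumption about how such a check could look, not the planned parity fix: `GenerationConfig`, `parity_diff`, the field names, and all values are hypothetical and do not reflect what the audit actually found.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class GenerationConfig:
    """Hypothetical settings that should match across comparison arms."""
    temperature: float
    top_p: float
    max_new_tokens: int
    repetition_penalty: float
    sanitise_history: bool

def parity_diff(baseline, treatment):
    """Return {field: (baseline_value, treatment_value)} for mismatches."""
    a, b = asdict(baseline), asdict(treatment)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}

# Illustrative values only; these are not the audited settings.
vanilla = GenerationConfig(0.7, 0.9, 256, 1.0, False)
controlled = GenerationConfig(0.7, 0.9, 512, 1.1, True)

mismatches = parity_diff(vanilla, controlled)
for field, (left, right) in mismatches.items():
    print(f"{field}: vanilla={left!r} controlled={right!r}")
```

An empty result from `parity_diff` would indicate that the two arms differ only in the presence of the controller, which is the condition a vanilla-versus-controller comparison needs before its numbers can be attributed to the controller.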
Project supported by faculty at USIU-Africa.
Contact
- Email — balingenensiidan@gmail.com
- Support the work — /support
- Donation tiers — /roadmap
Independent project by Balingene N'sii Dan, third-year software engineering student, USIU-Africa, Nairobi.