
If you can't audit the ethics, they're marketing. Here are seven requirements for verifiably ethical AI — and why closed-source systems can't meet them.
Try it yourself. Everything on this page is implemented today.
Free to install · No signup required (unless using our privacy-protecting LLM proxy)
CIRIS isn't productivity AI. It's runtime governance for agentic AI — infrastructure for high-stakes deployment where misalignment kills.
To our knowledge, the first open stack attempting all seven ethical requirements at runtime. We'd love to be wrong — open an issue if we've missed a peer.
Why this matters now.
Remember 2008? Every major bank trusted the same credit rating agencies. When those agencies got it wrong about mortgage securities, the whole system collapsed at once. Not because each bank was reckless—but because they all made the same mistake together.
AI is heading for the same trap, but bigger. Most AI systems today learn from the same data, optimize for the same benchmarks, and share the same blind spots. When they agree, it feels reassuring—but that agreement might just be an echo.
Without intentional engineering, catastrophic AI failure is not "if" but "when."
We cannot predict the day. But the science says: correlated systems eventually fail together. And the longer the fragility accumulates invisibly, the worse the collapse will be.
There are only two paths forward: let correlated fragility keep accumulating until the shared failure arrives, or build the infrastructure that detects it coming and buys time for a human response.
CIRIS is infrastructure for the second path. Not because we want to be heroes, but because someone has to build it.
Ethics is necessary. It's not sufficient.
Think of it like employees at a company. Some ignore the rules entirely. Some follow the handbook but miss red flags. The best ones follow the rules and notice when something feels off.
Type 1: The employee who ignores the rules entirely. No published principles. No audit trail. Closed source. "Trust us."
Needs to be let go—or kept under constant supervision.
Type 2: The well-meaning employee who follows the handbook perfectly—but can't spot a con artist. Passes every test while being fooled by echo chambers.
Safe when supervised by Type 3. Dangerous when operating alone at scale.
Type 3: The manager with good judgment. Follows the rules and notices when agreement feels suspiciously easy. Knows when to escalate to humans.
This is what CIRIS implements. Ethics + intuition.
An AI can follow every rule, pass every audit, and still fail catastrophically if it's reasoning from an echo chamber. Intuition is the capacity to sense fragility before collapse.
These six requirements establish verifiable ethics. But ethics alone can still fail via correlation collapse — when correlated sources create false confidence. That's why there's a seventh.
The agent must be bound to a public ethical framework: Beneficence, Non-maleficence, Integrity, Transparency, Autonomy, and Justice. Not guidelines. A formal document the agent is obligated to follow. Read the Covenant →
Every action passes through ethical checks before execution. Not a post-hoc filter — part of the decision loop itself (sketched below). View code →
When uncertain or facing potential harm, the agent defers to humans with full context. Built into the workflow, not a suggestion. View code →
Every action and rationale recorded in an immutable, signed ledger. Not 'we log some things.' Everything. Trace exactly why the agent did what it did. View code →
Consent goes both ways. Humans can refuse data access. The agent can refuse requests that violate its principles. Neither party compromises. View code →
Ethical AI cannot be closed source. You can't audit what you can't see. 'Trust us, it's ethical' is not ethical. Show the code. View license →
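To make the pattern concrete, here is a minimal sketch — assuming hypothetical names (`Decision`, `conscience_gate`, `AuditLog`), not the actual CIRIS API — of how a pre-execution conscience check, human deferral, and a hash-chained audit record can compose:

```python
# Illustrative only: all names and thresholds are assumptions, not the CIRIS code.
import hashlib
import json
import time
from dataclasses import dataclass, field


@dataclass
class Decision:
    action: str
    rationale: str
    risk: float          # 0.0 (benign) .. 1.0 (severe potential harm)
    confidence: float    # agent's own confidence in its rationale


@dataclass
class AuditLog:
    """Append-only log: each entry commits to the previous entry's hash."""
    entries: list = field(default_factory=list)
    last_hash: str = "genesis"

    def record(self, verdict: str, decision: Decision) -> None:
        entry = {
            "ts": time.time(),
            "verdict": verdict,
            "action": decision.action,
            "rationale": decision.rationale,
            "prev": self.last_hash,
        }
        self.last_hash = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["hash"] = self.last_hash   # a real ledger would also sign each entry
        self.entries.append(entry)


def conscience_gate(decision: Decision, log: AuditLog,
                    risk_ceiling: float = 0.3,
                    confidence_floor: float = 0.7) -> str:
    """Runs before execution, not after: no action bypasses the gate."""
    if decision.risk > risk_ceiling or decision.confidence < confidence_floor:
        log.record("DEFER", decision)    # escalate to a human with full context
        return "deferred to human authority"
    log.record("EXECUTE", decision)
    return "executed"


if __name__ == "__main__":
    log = AuditLog()
    print(conscience_gate(Decision("answer question", "low-stakes factual reply", 0.05, 0.95), log))
    print(conscience_gate(Decision("delete user records", "requested cleanup", 0.80, 0.90), log))
    for entry in log.entries:
        print(entry["verdict"], entry["action"], entry["hash"][:12])
```

Chaining each record to the previous hash is what makes after-the-fact tampering detectable; the audit requirement above additionally calls for signatures on every entry.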
The requirement ethics alone can't satisfy.
The agent monitors its own epistemic diversity. Before acting, it asks: "Am I reasoning from truly independent sources, or is this an echo chamber?" When the effective source count drops below the threshold (k_eff < 2), the decision is flagged for human review.
| Zone | Correlation | Character |
|---|---|---|
| CHAOS | ρ < 0.2 | Too loud. No coordination. High variance. |
| HEALTHY CORRIDOR | 0.2 < ρ < 0.7 | Diverse perspectives. Synthesizable. |
| RIGIDITY | ρ > 0.7 | Too quiet. Echo chamber. False confidence. |
Implemented as IDMA (Intuition Decision Making Algorithm) — the 4th DMA in the CIRIS pipeline.
View IDMA code →
The math of echo chambers.
As sources become correlated (ρ → 1), effective diversity collapses regardless of how many sources you have:
k_eff = k / (1 + ρ(k − 1)) → 1 as ρ → 1. 10 sources with ρ = 0.9 → k_eff ≈ 1.1 (effectively one source).
An ethical AI following correlated guidance is like a democracy where every voter reads the same newspaper. The vote count looks healthy. The effective diversity is 1.
Too correlated is the new too quiet. The system appears stable while fragility accumulates invisibly.
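A minimal numeric sketch of that collapse and of the k_eff < 2 review threshold described above; the function names are illustrative assumptions, not the IDMA code:

```python
# Illustrative sketch: effective source count and corridor classification.

def k_eff(k: int, rho: float) -> float:
    """k_eff = k / (1 + rho * (k - 1)); tends to 1 as rho tends to 1."""
    return k / (1 + rho * (k - 1))


def regime(rho: float) -> str:
    """Classify mean pairwise correlation against the corridor thresholds above."""
    if rho < 0.2:
        return "chaos"             # too loud: no coordination, high variance
    if rho <= 0.7:
        return "healthy corridor"  # diverse, synthesizable perspectives
    return "rigidity"              # too quiet: echo chamber, false confidence


def needs_human_review(k: int, rho: float, threshold: float = 2.0) -> bool:
    return k_eff(k, rho) < threshold


if __name__ == "__main__":
    k, rho = 10, 0.9
    print(f"{k} sources, rho={rho}: k_eff={k_eff(k, rho):.2f}, "
          f"regime={regime(rho)}, defer={needs_human_review(k, rho)}")
    # -> 10 sources, rho=0.9: k_eff=1.10, regime=rigidity, defer=True
```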
Based on Coherence Collapse Analysis (CCA) — validated across chemistry, political science, finance, and biology.
Read the paper →
Most ethical AI stops at governance frameworks. CIRIS provides runtime governance — enforcing principles during operation, not just at design time. 'Runtime verification mechanisms ensure principles remain strictly adhered to during operation.' — Springer, AI and Ethics (2025)
'No foundation model developer gets a passing score on transparency. None score above 60%.' — Stanford Foundation Model Transparency Index. Closed source means structural opacity.
'The practice of signaling commitment to ethics without genuinely putting it into practice.' — Carnegie Council. If your ethical AI is a press release, not runtime code, researchers have a name for that.
Meta's guardrail framework mitigates prompt injection and insecure code. Security guardrails — not ethical conscience.
NVIDIA's system addresses adversarial prompting and injection attacks. Runtime safety — not reasoning about values.
Safety guardrails block bad outputs. Ethical conscience reasons about values. Safety prevents harm. Ethics reasons about right and wrong. Different problems.
The EU AI Act mandates human oversight for high-risk systems. CIRIS's deferral mechanism implements Human-in-Command, Human-in-the-Loop, and Human-on-the-Loop. Most AI systems don't have a deferral mechanism at all.
A training technique (RLHF variant). Shapes behavior during training. Does not enforce ethics at runtime. Training is not architecture.
Governance, not architecture. Ethics boards review policies. They don't gate every action. CIRIS enforces on every action. Documents don't execute.
Safety prevents harmful outputs. Ethics reasons about values. A 'safe' model can still make unethical decisions. Different problems. Both matter.
Based on publicly available documentation as of December 2025. If we've missed something or gotten something wrong, open an issue.
| Project | Runtime System | Principles | Conscience | Audit Trail | Consent | AGPL-3.0 | Intuition |
|---|---|---|---|---|---|---|---|
| CIRIS | Yes | Yes | Yes | Yes | Yes | Yes | IDMA |
| MI9 Framework | Paper only | No | Concept | Concept | No | No | No |
| HADA Architecture | PoC only | No | No | Logging | No | No | No |
| Superego Prototype | Research | Partial | Partial | Partial | No | No | No |
| METR (nonprofit) | Evaluation only | No | No | No | No | No | No |
| Agentic AI Foundation | Standards only | No | No | No | No | No | No |
| Manus AI | Yes | No | No | Limited | No | No | No |
| HatCat | Yes | Partial | Steering | Partial | No | CC0 | No |
Sources: arXiv (MI9, HADA, Superego), Wikipedia (METR, Manus AI), WIRED (Agentic AI Foundation), GitHub (HatCat)
The dominant AI safety narrative assumes one superintelligent system that must be perfectly aligned or humanity loses. CIRIS rejects that frame. Instead: many smaller agents, each bound to published principles, each auditable, each deferring to human authority. Distributed governance, not concentrated power. No single point of failure. No race to build God.
Power stays distributed. Each CIRIS instance answers to its local Wise Authority, not a central controller. Geopolitical risk from AI concentration is structural — the fix is architectural. See the vision →
Small, verifiable agents scaling horizontally. Each bound to principles. Each auditable. Each killable. The alternative to racing toward uncontrollable ASI is building many controllable agents that stay aligned.
Centralized mega-AGI means winner-take-all dynamics and single points of catastrophic failure. Decentralized aligned agents mean no one entity controls the stack. Humanity keeps the keys.
Not ideology. Geometry.
Coherence Collapse Analysis formalizes what distributed systems engineers already know: correlated constraints provide redundant protection. As correlation approaches 1, a system with 1,000 rules has the effective diversity of a system with one.
Centralized AI concentrates correlation by design — shared training data, RLHF convergence, deployment monoculture. The seven requirements aren't just good practice. They're the architectural response to a mathematically identifiable failure mode.
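A quick check of that claim using the same k_eff formula as above, with ρ = 0.99 standing in for "correlation approaching 1":

```python
# Worked example: 1,000 rules at rho = 0.99 have the effective diversity of ~1 rule.
k, rho = 1000, 0.99
print(k / (1 + rho * (k - 1)))   # ≈ 1.01
```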
| Requirement | How it keeps ρ low |
|---|---|
| Published Principles | Diverse constraints |
| Runtime Conscience | Independent verification |
| Cryptographic Audit | Cross-agent challenge |
| AGPL-3.0 | No enclosure |
Federation keeps ρ low. Monopoly drives ρ → 1.
MI9 Framework: A research architecture proposing runtime governance for agentic AI. Theoretical framework only — no deployed system, no published principles, no cryptographic audit. Paper, not product.
HADA Architecture: Reference architecture wrapping agents with stakeholder roles (ethics, audit, customer). A proof-of-concept demo, not a general-purpose ethical agent platform. Research, not runtime.
Manus AI: A deployed autonomous agent — but not alignment-focused. No published principles, no ethical reasoning layer, no deferral mechanism, no cryptographic audit, no consent framework. Capable, but not verifiably aligned.
HatCat: Real-time interpretability and steering for open-weights models. Detects concepts like deception and manipulation during generation, and can steer away from harmful outputs. Complementary approach — monitors internals rather than reasoning about principles. CC0 licensed.
Agentic AI Foundation: The companies that could build ethical agentic AI are instead building agent communication protocols. Useful work. But it doesn't address conscience, principles, consent, or audit. They're standardizing how agents talk — not how agents reason about right and wrong.
The math guarantees failure without intervention.
Every day AI systems operate without Type 3 governance, invisible fragility accumulates. Like a bridge that looks fine while its supports corrode—until they don't.
The same two paths as before: wait for the correlated failure, or build the infrastructure that sees it coming.
Help by installing and testing.
Every installation is a sensor. Every trace feeds the seismograph. Every bug report improves the system. The longer we wait, the worse the eventual collapse.
The science says this is inevitable without intervention. We cannot predict when. But we can build infrastructure to detect it coming and buy time for human response.
Transparent Reasoning
Watch the agent's ethical checks in real-time. See why it chooses each action. Explore a trace →
Principle-Checked Answers
Every response passes through conscience validation against the published ethical principles.
Deferral in Edge Cases
When uncertain, the agent asks you instead of guessing. Human oversight built into the loop.
Deploy for safety-critical use cases: content moderation, crisis response, regulatory compliance, AI governance research.
Verify It Yourself.
Install it. Audit the code. Join the auditors building uncompromisable AI.
Free to install · No signup unless using our LLM proxy · Your data stays on your device
An open stack attempting all seven requirements end-to-end, in code, running in production. Audit it. Deploy it for safety-critical use cases: moderation, crisis response, governance. Tell us what's missing.