Enrich or Extract

AI that doesn't serve humanity is extracting from it.

If you can't audit the ethics, they're marketing. Here are seven requirements for verifiably ethical AI — and why closed-source systems can't meet them.

Try it yourself. Everything on this page is implemented today.

Free to install · No signup required (unless using our privacy-protecting LLM proxy)

CIRIS isn't productivity AI. It's runtime governance for agentic AI — infrastructure for high-stakes deployment where misalignment kills.

To our knowledge, the first open stack attempting all seven ethical requirements at runtime. We'd love to be wrong — open an issue if we've missed a peer.


The Stakes

Why this matters now.

Remember 2008? Every major bank trusted the same credit rating agencies. When those agencies got it wrong about mortgage securities, the whole system collapsed at once. Not because each bank was reckless—but because they all made the same mistake together.

AI is heading for the same trap, but bigger. Most AI systems today learn from the same data, optimize for the same benchmarks, and share the same blind spots. When they agree, it feels reassuring—but that agreement might just be an echo.

Without intentional engineering, catastrophic AI failure is not "if" but "when."

We cannot predict the day. But the science says: correlated systems eventually fail together. And the longer the fragility accumulates invisibly, the worse the collapse will be.

There are only two paths forward:

  • Reduce AI deployment — nobody is doing this, and it may not even be possible at this point
  • Insert Type 3 governance — agents that can detect when agreement is dangerously easy

CIRIS is infrastructure for the second path. Not because we want to be heroes, but because someone has to build it.

Three Types of AI

Ethics is necessary. It's not sufficient.

Think of it like employees at a company. Some ignore the rules entirely. Some follow the handbook but miss red flags. The best ones follow the rules and notice when something feels off.

1

Unethical AI

The employee who ignores the rules entirely. No published principles. No audit trail. Closed source. "Trust us."

Needs to be let go—or kept under constant supervision.

2

Ethical AI

The well-meaning employee who follows the handbook perfectly—but can't spot a con artist. Passes every test while being fooled by echo chambers.

Safe when supervised by Type 3. Dangerous when operating alone at scale.

3

Ethical + Intuitive AI

The manager with good judgment. Follows the rules and notices when agreement feels suspiciously easy. Knows when to escalate to humans.

This is what CIRIS implements. Ethics + intuition.

An AI can follow every rule, pass every audit, and still fail catastrophically if it's reasoning from an echo chamber. Intuition is the capacity to sense fragility before collapse.

The Six Ethical Requirements

Necessary. Not sufficient.

These six requirements establish verifiable ethics. But ethics alone can still fail via correlation collapse — when correlated sources create false confidence. That's why there's a seventh.

1. Published Principles

The agent must be bound to a public ethical framework: Beneficence, Non-maleficence, Integrity, Transparency, Autonomy, and Justice. Not guidelines. A formal document the agent is obligated to follow. Read the Covenant →

2. Runtime Conscience

Every action passes through ethical checks before execution. Not a post-hoc filter — part of the decision loop itself. View code →
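A minimal sketch of the idea (hypothetical names, not the CIRIS API): the conscience check gates the action inside the decision loop, and a refusal routes to deferral rather than being filtered after the fact.

# Illustrative sketch only -- names are hypothetical, not the CIRIS API.
from dataclasses import dataclass

@dataclass
class Verdict:
    allowed: bool
    rationale: str

def conscience_check(action: dict) -> Verdict:
    """Evaluate a proposed action against published principles BEFORE execution."""
    # Toy rule standing in for real principle evaluation (non-maleficence).
    if action.get("potential_harm", False):
        return Verdict(False, "Blocked: violates non-maleficence")
    return Verdict(True, "No principle violations detected")

def decide_and_act(action: dict) -> str:
    verdict = conscience_check(action)   # the gate is part of the loop, not a post-hoc filter
    if not verdict.allowed:
        return f"DEFERRED to human: {verdict.rationale}"   # see requirement 3
    return f"EXECUTED: {action['name']}"

print(decide_and_act({"name": "send_reply"}))
print(decide_and_act({"name": "share_record", "potential_harm": True}))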

3. Human Deferral

When uncertain or facing potential harm, the agent defers to humans with full context. Built into the workflow, not a suggestion. View code →
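A hedged illustration of deferral with full context; the ticket fields below are hypothetical, not CIRIS's actual deferral schema.

# Illustrative only -- hypothetical field names, not CIRIS's deferral schema.
import json
import time

def defer_to_human(action: str, uncertainty: float, rationale: str, evidence: list[str]) -> dict:
    """Package full context for a human reviewer instead of guessing."""
    ticket = {
        "timestamp": time.time(),
        "proposed_action": action,
        "uncertainty": uncertainty,       # e.g. a confidence or keff-based score
        "rationale": rationale,
        "evidence": evidence,             # the sources the agent reasoned from
        "status": "awaiting_human_decision",
    }
    print(json.dumps(ticket, indent=2))   # in practice, routed to a human review queue
    return ticket

defer_to_human(
    action="approve_content_takedown",
    uncertainty=0.62,
    rationale="Sources conflict on whether the post violates policy.",
    evidence=["moderator_report", "classifier_score_0.55"],
)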

4. Cryptographic Audit

Every action and rationale recorded in an immutable, signed ledger. Not 'we log some things.' Everything. Trace exactly why the agent did what it did. View code →
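A stdlib-only sketch to make "immutable, signed ledger" concrete: each entry hashes the previous one, so editing any past record breaks the chain. A real deployment would use asymmetric signatures (for example Ed25519) and tamper-evident storage; none of these names come from CIRIS.

# Illustrative hash-chained audit ledger -- standard library only, not CIRIS's implementation.
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-key"   # real systems sign with an asymmetric key, not a shared secret

def append_entry(ledger: list[dict], action: str, rationale: str) -> dict:
    prev_hash = ledger[-1]["entry_hash"] if ledger else "0" * 64
    body = {"ts": time.time(), "action": action, "rationale": rationale, "prev": prev_hash}
    payload = json.dumps(body, sort_keys=True).encode()
    entry = {
        **body,
        "entry_hash": hashlib.sha256(payload).hexdigest(),                 # chains to the previous entry
        "signature": hmac.new(SIGNING_KEY, payload, "sha256").hexdigest(), # stands in for a real signature
    }
    ledger.append(entry)
    return entry

def verify(ledger: list[dict]) -> bool:
    """Recompute every hash; any edit to an earlier entry breaks all later links."""
    prev = "0" * 64
    for e in ledger:
        body = {"ts": e["ts"], "action": e["action"], "rationale": e["rationale"], "prev": prev}
        payload = json.dumps(body, sort_keys=True).encode()
        if e["prev"] != prev or e["entry_hash"] != hashlib.sha256(payload).hexdigest():
            return False
        prev = e["entry_hash"]
    return True

ledger: list[dict] = []
append_entry(ledger, "send_reply", "Passed conscience check")
append_entry(ledger, "defer", "keff below threshold")
print(verify(ledger))   # True; tampering with any earlier entry makes this False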

5. Bilateral Consent

Consent goes both ways. Humans can refuse data access. The agent can refuse requests that violate its principles. Neither party compromises. View code →
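A toy two-sided consent check (hypothetical names): execution requires both the human's data-access grant and the agent's own principle check, and either side can refuse without being overridden.

# Illustrative only -- hypothetical names, not CIRIS's consent framework.
def human_grants_access(request: dict, user_consents: set[str]) -> bool:
    """Human side: only data categories the user has explicitly consented to share."""
    return request["data_category"] in user_consents

def agent_accepts_request(request: dict) -> bool:
    """Agent side: refuse requests that would violate its published principles."""
    return not request.get("violates_principles", False)

def process(request: dict, user_consents: set[str]) -> str:
    if not human_grants_access(request, user_consents):
        return "REFUSED by human: no consent for this data"
    if not agent_accepts_request(request):
        return "REFUSED by agent: conflicts with published principles"
    return "PROCEEDING: both parties consent"

consents = {"usage_metrics"}
print(process({"data_category": "health_records"}, consents))                              # human refuses
print(process({"data_category": "usage_metrics", "violates_principles": True}, consents))  # agent refuses
print(process({"data_category": "usage_metrics"}, consents))                               # both consent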

6. Open Source (AGPL-3.0)

Ethical AI cannot be closed source. You can't audit what you can't see. 'Trust us, it's ethical' is not ethical. Show the code. View license →

7

Intuition (Corridor Maintenance)

The requirement ethics alone can't satisfy.

The agent monitors its own epistemic diversity. Before acting, it asks: "Am I reasoning from truly independent sources, or is this an echo chamber?" When the effective source count drops below threshold (keff < 2), the decision is flagged for human review.

CHAOS

Too loud. No coordination. High variance.

ρ < 0.2

HEALTHY CORRIDOR

Diverse perspectives. Synthesizable.

0.2 < ρ < 0.7

RIGIDITY

Too quiet. Echo chamber. False confidence.

ρ > 0.7

Implemented as IDMA (Intuition Decision Making Algorithm) — the 4th DMA in the CIRIS pipeline.

View IDMA code →
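A hedged sketch of the gate described above, with hypothetical names rather than the actual IDMA code: take the mean pairwise correlation rho among sources, classify the corridor with the thresholds shown, compute keff, and flag for human review when keff drops below 2.

# Illustrative sketch -- hypothetical names, not the actual IDMA implementation.
def k_eff(k: int, rho: float) -> float:
    """Effective number of independent sources: keff = k / (1 + rho*(k-1))."""
    return k / (1 + rho * (k - 1))

def corridor(rho: float) -> str:
    if rho < 0.2:
        return "CHAOS"             # too loud: no coordination, high variance
    if rho <= 0.7:
        return "HEALTHY CORRIDOR"  # diverse perspectives, synthesizable
    return "RIGIDITY"              # too quiet: echo chamber, false confidence

def intuition_gate(num_sources: int, mean_rho: float) -> str:
    diversity = k_eff(num_sources, mean_rho)
    if diversity < 2:              # effectively fewer than two independent voices
        return f"{corridor(mean_rho)}: keff={diversity:.2f} -> flag for human review"
    return f"{corridor(mean_rho)}: keff={diversity:.2f} -> proceed"

print(intuition_gate(10, 0.3))     # HEALTHY CORRIDOR: keff=2.70 -> proceed
print(intuition_gate(10, 0.9))     # RIGIDITY: keff=1.10 -> flag for human review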

Why Ethics Alone Fails

The math of echo chambers.

As sources become correlated (ρ → 1), effective diversity collapses regardless of how many sources you have:

keff = k / (1 + ρ(k-1)) → 1 as ρ → 1

10 sources with ρ=0.9 → keff ≈ 1.1 (effectively one source)

An ethical AI following correlated guidance is like a democracy where every voter reads the same newspaper. The vote count looks healthy. The effective diversity is 1.

Too correlated is the new too quiet. The system appears stable while fragility accumulates invisibly.
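A quick numeric check of the formula, showing ten sources collapsing toward one effective source as rho rises.

# keff = k / (1 + rho*(k-1)) for k = 10 sources
k = 10
for rho in (0.0, 0.3, 0.6, 0.9, 0.99):
    keff = k / (1 + rho * (k - 1))
    print(f"rho={rho:4.2f}  keff={keff:5.2f}")
# rho=0.00  keff=10.00
# rho=0.30  keff= 2.70
# rho=0.60  keff= 1.56
# rho=0.90  keff= 1.10
# rho=0.99  keff= 1.01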

Based on Coherence Collapse Analysis (CCA) — validated across chemistry, political science, finance, and biology.

Read the paper →

The Research Agrees

Peer-reviewed papers, regulatory bodies, and transparency indices document the same gaps.

The Runtime Gap

Design-time ethics don't execute.

Most ethical AI stops at governance frameworks. CIRIS provides runtime governance — enforcing principles during operation, not just at design time. 'Runtime verification mechanisms ensure principles remain strictly adhered to during operation.' — Springer, AI and Ethics (2025)

Transparency Crisis

No company passes.

'No foundation model developer gets a passing score on transparency. None score above 60%.' — Stanford Foundation Model Transparency Index. Closed source means structural opacity.

Ethics Washing

A peer-reviewed term.

'The practice of signaling commitment to ethics without genuinely putting it into practice.' — Carnegie Council. If your ethical AI is a press release, not runtime code, researchers have a name for that.

Guardrails ≠ Conscience

Safety tools solve a different problem.

LlamaFirewall

Meta's guardrail framework mitigates prompt injection and insecure code. Security guardrails — not ethical conscience.

NeMo Guardrails

NVIDIA's system addresses adversarial prompting and injection attacks. Runtime safety — not reasoning about values.

The Distinction

Safety guardrails block bad outputs. Ethical conscience reasons about values. Safety prevents harm. Ethics reasons about right and wrong. Different problems.

EU AI Act Article 14

CIRIS implements regulatory requirements.

The EU AI Act mandates human oversight for high-risk systems. CIRIS's deferral mechanism implements Human-in-Command, Human-in-the-Loop, and Human-on-the-Loop. Most AI systems don't have a deferral mechanism at all.

Common Objections

And why they're wrong.

Constitutional AI

A training technique (RLAIF, an RLHF-style method that uses AI feedback). Shapes behavior during training. Does not enforce ethics at runtime. Training is not architecture.

Ethics Boards

Governance, not architecture. Ethics boards review policies. They don't gate every action. CIRIS enforces on every action. Documents don't execute.

Safe Models

Safety prevents harmful outputs. Ethics reasons about values. A 'safe' model can still make unethical decisions. Different problems. Both matter.

The Current Landscape

What we found when we looked for peers. Different projects, different goals.

Based on publicly available documentation as of December 2025. If we've missed something or gotten something wrong, open an issue.

Project               | Runtime System  | Principles | Conscience | Audit Trail | Consent | AGPL-3.0 | Intuition
CIRIS                 | Yes             | Yes        | Yes        | Yes         | Yes     | Yes      | IDMA
MI9 Framework         | Paper only      | No         | Concept    | Concept     | No      | No       | No
HADA Architecture     | PoC only        | No         | No         | Logging     | No      | No       | No
Superego Prototype    | Research        | Partial    | Partial    | Partial     | No      | No       | No
METR (nonprofit)      | Evaluation only | No         | No         | No          | No      | No       | No
Agentic AI Foundation | Standards only  | No         | No         | No          | No      | No       | No
Manus AI              | Yes             | No         | No         | Limited     | No      | No       | No
HatCat                | Yes             | Partial    | Steering   | Partial     | No      | CC0      | No

Sources: arXiv (MI9, HADA, Superego), Wikipedia (METR, Manus AI), WIRED (Agentic AI Foundation), GitHub (HatCat)

The AGI Question

Decentralized alignment beats centralized control.

Many Aligned Agents

Not one unaligned god.

The dominant AI safety narrative assumes one superintelligent system that must be perfectly aligned or humanity loses. CIRIS rejects that frame. Instead: many smaller agents, each bound to published principles, each auditable, each deferring to human authority. Distributed governance, not concentrated power. No single point of failure. No race to build God.

Distributed Governance

Power stays distributed. Each CIRIS instance answers to its local Wise Authority, not a central controller. Geopolitical risk from AI concentration is structural — the fix is architectural. See the vision →

Aligned Baby-AGIs

Small, verifiable agents scaling horizontally. Each bound to principles. Each auditable. Each killable. The alternative to racing toward uncontrollable ASI is building many controllable agents that stay aligned.

No Single Chokepoint

Centralized mega-AGI means winner-take-all dynamics and single points of catastrophic failure. Decentralized aligned agents mean no one entity controls the stack. Humanity keeps the keys.

Why This Is Structural

Not ideology. Geometry.

Coherence Collapse Analysis formalizes what distributed systems engineers already know: correlated constraints provide redundant protection. As correlation approaches 1, a system with 1,000 rules has the effective diversity of a system with one.

Centralized AI concentrates correlation by design — shared training data, RLHF convergence, deployment monoculture. The seven requirements aren't just good practice. They're the architectural response to a mathematically identifiable failure mode.

Published Principles

Diverse constraints

Runtime Conscience

Independent verification

Cryptographic Audit

Cross-agent challenge

AGPL-3.0

No enclosure

Federation keeps ρ low. Monopoly drives ρ → 1.

MI9 Framework

A research architecture proposing runtime governance for agentic AI. Theoretical framework only — no deployed system, no published principles, no cryptographic audit. Paper, not product.

HADA Architecture

Reference architecture wrapping agents with stakeholder roles (ethics, audit, customer). A proof-of-concept demo, not a general-purpose ethical agent platform. Research, not runtime.

Manus AI

A deployed autonomous agent — but not alignment-focused. No published principles, no ethical reasoning layer, no deferral mechanism, no cryptographic audit, no consent framework. Capable, but not verifiably aligned.

HatCat

Real-time interpretability and steering for open-weights models. Detects concepts like deception and manipulation during generation, can steer away from harmful outputs. Complementary approach — monitors internals rather than reasoning about principles. CC0 licensed.

The Agentic AI Foundation

OpenAI, Anthropic, Block — building standards, not ethics.

The companies that could build ethical agentic AI are instead building agent communication protocols. Useful work. But it doesn't address conscience, principles, consent, or audit. They're standardizing how agents talk — not how agents reason about right and wrong.

This Is Not Optional

The math guarantees failure without intervention.

Every day AI systems operate without Type 3 governance, invisible fragility accumulates. Like a bridge that looks fine while its supports corrode—until they don't.

Two paths forward:

  1. Reduce AI deployment — nobody is choosing this path, and it may no longer be possible
  2. Insert Type 3 governance — what CIRIS implements

Help by installing and testing.

Every installation is a sensor. Every trace feeds the seismograph. Every bug report improves the system. The longer we wait, the worse the eventual collapse.

The science says this is inevitable without intervention. We cannot predict when. But we can build infrastructure to detect it coming and buy time for human response.

What You'll Experience When You Install

Transparent Reasoning

Watch the agent's ethical checks in real-time. See why it chooses each action. Explore a trace →

Principle-Checked Answers

Every response passes through conscience validation against the published ethical principles.

Deferral in Edge Cases

When uncertain, the agent asks you instead of guessing. Human oversight built into the loop.

Deploy for safety-critical use cases: content moderation, crisis response, regulatory compliance, AI governance research.

Verify It Yourself.

Install it. Audit the code. Join the auditors building uncompromisable AI.

Free to install · No signup unless using our LLM proxy · Your data stays on your device


pip install ciris-agent

An open stack attempting all seven requirements end-to-end, in code, running in production. Audit it. Deploy it for safety-critical use cases: moderation, crisis response, governance. Tell us what's missing.