
If you can't check the ethics, they're marketing. Here's what to look for — and how existing approaches compare.
Ethics is necessary. It's not sufficient.
Some AI has no rules at all. Some follows rules but can't tell when its sources are just echoing each other. Only one type checks whether its information actually comes from different places.
No published principles. No audit trail. Closed source. You can't check what it did or why.
Requires external regulation. Cannot govern itself.
Follows ethical rules. But can't tell when all its sources are just copying each other — so it can be confidently wrong.
Safe when supervised. Can't detect echo chambers on its own.
Follows ethical rules AND checks whether its information comes from genuinely different places. When agreement looks suspicious, it flags it before acting.
This is what CIRIS builds.
An AI can follow every rule, pass every audit, and still fail if all its information comes from the same place. That blind spot is what CIRIS was built to fix.
These are the things that make AI verifiably ethical. The first six are about doing the right thing. The seventh is about catching the situations where 'doing the right thing' is based on bad information.
The agent must follow a public ethical framework. Not hidden rules — a public document anyone can read and hold the agent accountable to.
Every action goes through an ethics check before the agent does it. Not after the fact — before.
When uncertain or facing potential harm, the agent asks a person instead of guessing. Built into the workflow, not optional.
Every decision is recorded and signed so you can verify exactly what happened and why. A receipt for every action.
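One way to picture "a receipt for every action" is a hash-chained, signed log: each entry commits to the one before it, so rewriting history invalidates everything after the edit. The sketch below is illustrative only — the function names, the HMAC stand-in for a real digital signature, and the record fields are assumptions, not CIRIS's actual trace format.

```python
import hashlib
import hmac
import json

# Illustrative only: a real deployment would use an asymmetric signature
# scheme, not a shared HMAC key.
SIGNING_KEY = b"demo-key"

def record_decision(log, action, reason):
    """Append a signed entry that chains to the previous entry's hash."""
    prev = log[-1]["hash"] if log else "genesis"
    body = json.dumps({"action": action, "reason": reason, "prev": prev},
                      sort_keys=True)
    entry = {
        "body": body,
        "hash": hashlib.sha256(body.encode()).hexdigest(),
        "sig": hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest(),
    }
    log.append(entry)
    return entry

def verify(log):
    """Check every hash, signature, and chain link from the start."""
    prev = "genesis"
    for entry in log:
        body = json.loads(entry["body"])
        expected_sig = hmac.new(SIGNING_KEY, entry["body"].encode(),
                                hashlib.sha256).hexdigest()
        if (body["prev"] != prev
                or hashlib.sha256(entry["body"].encode()).hexdigest() != entry["hash"]
                or not hmac.compare_digest(entry["sig"], expected_sig)):
            return False
        prev = entry["hash"]
    return True

log = []
record_decision(log, "reply", "benign request")
record_decision(log, "defer", "possible harm, asking a human")
```

Because each entry's `prev` field is covered by its hash and signature, tampering with any past record makes `verify(log)` return `False` for the whole chain.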
Consent goes both ways. You can say no to the agent. The agent can say no to you. Neither side is forced to compromise.
You can't audit what you can't see. CIRIS is fully open source under AGPL-3.0 — anyone can read, verify, and improve the code.
The thing rules alone can't catch.
Before acting, the agent asks: "Do my sources actually disagree with each other, or are they all getting their information from the same place?" Ten sources that all copied from the same original are really just one source. When agreement looks too uniform, the agent flags it for a person to review.
Too Noisy: Sources contradict each other so much that nothing useful can be concluded.
Healthy: Sources genuinely differ. Real agreement means something.
Echo Chamber: Looks like agreement, but sources are just repeating each other.
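The three regimes above can be sketched as a toy classifier over numeric reports from different sources. Everything here — the function name, the spread metric, and the thresholds — is an illustration of the idea, not CIRIS's actual detection algorithm or parameters.

```python
import statistics

def classify_sources(reports, noise_hi=0.6, echo_lo=0.1):
    """Classify a set of source reports about the same quantity.

    reports: numeric claims from nominally independent sources.
    Thresholds are illustrative, chosen only to make the three
    regimes visible in a toy example.
    """
    spread = statistics.pstdev(reports)
    if spread > noise_hi:
        # Sources contradict each other: nothing useful concluded.
        return "too noisy"
    if spread < echo_lo:
        # Suspiciously uniform agreement: flag for human review.
        return "possible echo chamber"
    # Sources genuinely differ, yet converge: agreement means something.
    return "healthy"
```

For example, four sources reporting exactly `3.1` would be flagged as a possible echo chamber, while reports of `2.9, 3.3, 3.0, 3.4` would count as healthy convergence.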
This is what makes CIRIS different from other ethical AI frameworks.
Want the math? Read the full thesis →
The echo chamber problem
As sources start copying each other, the number of truly independent viewpoints collapses — even if you have ten sources on paper.
Ten sources that all read the same report? That's really one source counted ten times.
An ethical AI following copied guidance is like a democracy where every voter reads the same newspaper. The vote count looks healthy. The actual number of viewpoints is one.
Agreement only means something when the sources are actually independent.
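One standard way to quantify this collapse (an illustration of the general statistical idea, not necessarily the thesis's exact model): if n sources share an average pairwise correlation ρ, the effective number of independent sources is n / (1 + (n - 1)ρ).

```python
def effective_sources(n, rho):
    """Effective number of independent sources for n equicorrelated
    sources with average pairwise correlation rho.

    rho = 0 means fully independent; rho = 1 means everyone copied
    the same original.
    """
    return n / (1 + (n - 1) * rho)

effective_sources(10, 0.0)  # ten genuinely independent sources
effective_sources(10, 1.0)  # ten copies of one source: effectively 1
```

At ρ = 1, ten sources on paper collapse to exactly one effective source — the "ten copies of the same newspaper" case.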
This problem shows up everywhere — from financial markets to scientific peer review to social media.
Read the full thesis →
Based on publicly available documentation as of February 2026. If we've missed something or gotten something wrong, let us know.
| Project | Checks Every Decision | Published Rules | Ethics Built In | Proof of What It Did | Open Source | Echo Chamber Detection |
|---|---|---|---|---|---|---|
| CIRIS | Yes | Yes | Yes | Yes | AGPL-3.0 | Yes |
| Constitutional AI | Training only | Implicit | No | No | No | No |
| LlamaFirewall / NeMo Guardrails | Yes | No | No | Logging | Yes | No |
| HatCat | Yes | Partial | Steering | Partial | CC0 | No |
| Ethics Boards / Governance Frameworks | No | Yes | No | Manual | Varies | No |
Guardrails and governance frameworks solve important but different problems. Safety blocks harmful outputs. Ethics reasons about values. CIRIS aims to do both — and catch the blind spots that neither addresses alone.
Blocks dangerous outputs — prompt injection, harmful content, adversarial attacks. Like a filter that catches bad things on the way out.
Reasons about whether an action is right, not just whether it's safe. Like a judge weighing the situation before making a call.
Checks whether agreement is real or just repetition. Like a fact-checker who asks "did you all read the same article?"
Many smaller agents, each bound to published principles, each auditable, each deferring to human authority. No single company or entity controls the whole stack. The more independent the agents, the harder it is for any one failure to cascade.
This is active research. We're transparent about what's established and what's still being tested.
Well-established
Still being tested
Every claim on this page is backed by code you can read, traces you can verify, and research you can check. That's the point.