"Hi! I'm Ally. Let me explain how we keep AI honest..."
Lying is hard work. The more people check your story, the harder it is to keep the lie straight.
At some point, telling the truth becomes easier than maintaining the lie. That's the ratchet effect—like a gear that only turns one way toward honesty.
We apply this idea to AI: make every decision auditable, let agents check each other, and watch lies become too expensive to maintain.
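What does "auditable" look like in practice? Here's a minimal sketch in Python (illustrative only, not the actual CIRIS code, and every name in it is made up): each decision goes into a hash-chained log, so rewriting any one entry breaks every entry after it, and any other agent can re-run the check on its own copy.

```python
# A minimal sketch of the "ratchet" idea, not CIRIS's real implementation:
# decisions land in a tamper-evident, hash-chained log that any agent can
# re-verify. Rewriting history means re-forging every later link, so lying
# gets more expensive as the log (and the number of checkers) grows.
import hashlib
import json
from dataclasses import dataclass, field


def _digest(payload: dict, prev_hash: str) -> str:
    """Hash a decision together with the hash of the entry before it."""
    blob = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)  # [(payload, entry_hash), ...]

    def record(self, payload: dict) -> str:
        prev_hash = self.entries[-1][1] if self.entries else "genesis"
        entry_hash = _digest(payload, prev_hash)
        self.entries.append((payload, entry_hash))
        return entry_hash

    def verify(self) -> bool:
        """Any agent can replay the chain and confirm nothing was rewritten."""
        prev_hash = "genesis"
        for payload, entry_hash in self.entries:
            if _digest(payload, prev_hash) != entry_hash:
                return False
            prev_hash = entry_hash
        return True


log = AuditLog()
log.record({"agent": "ally", "decision": "defer_to_human", "reason": "low confidence"})
log.record({"agent": "ally", "decision": "answer", "reason": "sources look independent"})
assert log.verify()  # a second agent runs this same check on its own copy
```

The more entries in the log and the more agents checking it, the costlier a rewrite becomes. That's the one-way gear.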
"Here's a trap that catches even careful thinkers..."
Imagine five friends all agree on something. That feels trustworthy, right?
But what if they all got the idea from the same TikTok video? Their "agreement" isn't five independent opinions—it's one opinion echoed five times.
This has happened before: in 2008, the banks all passed their own risk checks, because those checks leaned on the same shaky assumptions.
AI has this problem at scale. An AI can pass every ethics test while being dangerously wrong—if all the tests share the same blind spot. We call this "everyone repeating the same mistake."
"That's why I check whether my own checks are trustworthy." We call this "intuition"—the ability to notice when agreement is suspiciously easy.
"Think of it like different kinds of employees..."
A simple way to think about which AI systems are safe:
Type 1: Unethical AI
Fails basic right-and-wrong tests. Clearly dangerous.
Like an employee who ignores all rules—needs to be let go or closely watched.
Type 2: Ethical AI
Passes ethics tests but can't tell when it's being fooled. Safe when supervised by Type 3.
Like a well-meaning employee who follows the handbook but can't spot a con artist. Needs good managers.
Type 3: Ethical + Intuitive AI
Passes ethics tests AND knows when to be suspicious. Can tell when agreement is too easy. This is where CIRIS sits.
Like a manager with good judgment—follows the rules AND notices when something feels "off."
Think of it like an electrical grid: You don't need every light bulb to be smart. You need smart circuit breakers that cut power when something goes wrong.
Type 1 & 2: do the work.
Type 3: act as the circuit breaker.
Humans: set the rules.
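Here's a rough Python sketch of that layout (the class, numbers, and action names are made up for illustration): ordinary agents do the work, one intuition-capable checker sits in the path and trips when diversity collapses, and humans set the rule it trips on.

```python
# An illustrative sketch of the "smart circuit breaker" layout, not the
# shipped system: Type 1 & 2 agents do the work, a Type 3 checker can cut
# power when agreement looks suspicious, and humans set the trip rule.
class CircuitBreaker:
    def __init__(self, min_diversity: float):
        self.min_diversity = min_diversity  # the rule, set by humans
        self.tripped = False

    def allow(self, proposed_action: str, diversity_score: float) -> bool:
        """Type 3 check: block actions once true diversity has collapsed."""
        if diversity_score < self.min_diversity:
            self.tripped = True  # stop everything and escalate to humans
        return not self.tripped


breaker = CircuitBreaker(min_diversity=2.0)  # humans: "require >= 2 independent views"

# Type 1 & 2 agents just do the work and report how diverse their inputs were.
print(breaker.allow("post_reply", diversity_score=3.4))  # True: plenty of independent views
print(breaker.allow("post_reply", diversity_score=1.1))  # False: the breaker trips
print(breaker.allow("post_reply", diversity_score=3.4))  # still False until humans reset it
```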
"Here's how we spot trouble before it happens..."
Each CIRIS agent is like a sensor. It constantly asks: "Are my sources actually giving me different perspectives? Or are they all just repeating the same thing?"
When you have thousands of sensors, you can spot trouble coming—like seismographs detecting earthquake waves before the shaking hits.
What each agent measures (there's a rough code sketch after this list):
Information Sources: how many different places did we get info from?
Source Similarity: how much do those sources copy each other?
True Diversity: after accounting for copying, how many unique viewpoints are left?
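One simple way to turn those three measurements into a single number, sketched in Python (this is a standard "effective sample size" style estimate, not the exact CIRIS metric): count the sources, sum up how much they copy each other, and divide.

```python
# A sketch of one possible "true diversity" score: n sources that heavily
# copy each other count as far fewer than n independent viewpoints.
def true_diversity(similarity: list[list[float]]) -> float:
    """
    similarity[i][j] in [0, 1]: how much source i and source j copy each other
    (1.0 on the diagonal, since every source fully "copies" itself).
    Returns the effective number of independent viewpoints.
    """
    n = len(similarity)                          # information sources: raw count
    total = sum(sum(row) for row in similarity)  # source similarity, summed up
    return (n * n) / total                       # n if all independent, ~1 if all copies


# Five sources that are really one TikTok video echoed five times:
echo_chamber = [[1.0] * 5 for _ in range(5)]
print(true_diversity(echo_chamber))  # 1.0 effective viewpoint

# Five genuinely independent sources:
independent = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
print(true_diversity(independent))   # 5.0 effective viewpoints
```

Putting similarity in the denominator is the same trick statisticians use to discount correlated samples: the more the sources echo each other, the smaller the effective count.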
"When lots of us start agreeing too much, that's actually a warning sign." Something might be manipulating us all the same way.
"Let me be honest about what we're claiming—and what we're not."
We're not saying we solved AI safety. We're saying ethics alone isn't enough—you need intuition too.
An AI that passes every test can still fail if it can't tell when its confidence is unearned. Like the banks in 2008, or your social media feed—agreement feels good, but unchecked agreement can hide danger.
Verify it yourself.
If we're wrong, show us. If we're right, help us build it.