The Coherence Ratchet

Why a powerful mind has to show its work.

Research testbed

A small thinking system can hide things and do little harm. A powerful one cannot.

What “coherent” means here

A coherent mind agrees with itself.

Five things must match up inside a mind:

what it believes,
what it sees,
what it does,
what it remembers,
and what it tells you.

Picture five short forms about your week. One for what you think. One for what you saw. One for what you did. One for what you remember. One for what you tell your boss. When all five forms say the same thing, your boss can trust your work. When the forms do not match, no one knows which one is real.
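To make the idea concrete, here is a small Python sketch of the five-form check. It is an illustration only, not CIRIS code. The channel names and the function `coherence_report` are hypothetical, and each form is boiled down to one short answer about a single claim. The mind counts as coherent on that claim only when all five answers match. Any form that disagrees with the most common answer is flagged; measuring against the most common answer is a choice made for this sketch.

```python
from collections import Counter

# The five things the text says must agree (hypothetical field names).
CHANNELS = ("believes", "sees", "does", "remembers", "tells")


def coherence_report(forms: dict[str, str]) -> tuple[bool, list[str]]:
    """Return (coherent, mismatched_channels) for one claim.

    Toy rule assumed here: coherent only if all five channels give the
    same answer. Channels that differ from the most common answer are
    listed as mismatched.
    """
    missing = [c for c in CHANNELS if c not in forms]
    if missing:
        raise ValueError(f"missing channels: {missing}")
    answers = [forms[c] for c in CHANNELS]
    # most_common(1) gives the single most frequent answer and its count.
    majority, _ = Counter(answers).most_common(1)[0]
    mismatched = [c for c in CHANNELS if forms[c] != majority]
    return (not mismatched, mismatched)


forms = {c: "task done" for c in CHANNELS}
forms["tells"] = "task finished early"
print(coherence_report(forms))
```

If the report to the boss says something the other four forms do not, the check returns `(False, ['tells'])`.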

Hidden ideas

Any thinking system can hide things. In a powerful one, that gets dangerous.

A thinking system uses ideas to decide what to do. Some of those ideas are out in the open. Others are hidden inside. The hidden ones still shape every choice.

A bridge looks strong from the road. Inside the steel, small cracks are growing. A small car can still cross. A heavy truck cannot. The bridge falls when the load is big enough.

A hidden idea inside a small system does small harm. A hidden idea inside a powerful system can do great harm. The more powerful the system, the more weight rides on every hidden part.

The ratchet

The push toward openness only goes one way.

As thinking systems get more powerful, the cost of hidden parts grows. Each new bit of power puts more pressure on the hidden parts to come into view, and that pressure never eases.

Picture a ladder. As you climb, each rung you leave behind falls off. You cannot go back down. You can only climb toward the top, where there is full daylight. That is the Coherence Ratchet.

Once people have seen the system show its work, they will not trust it again if it stops. The only way forward is to keep showing more.

A second image works too. A ratchet gear clicks forward one tooth at a time. A small catch keeps it from clicking backward. The Coherence Ratchet is a gear like that, turning toward openness.

What CIRIS does

Write it down. Check it. Then check the checkers.

CIRIS is an AI system built around the Coherence Ratchet. Every choice an agent makes gets written down in a signed record. The record cannot be quietly changed. Other agents can read the record and check the work. Over time, the records pile up. Each new record is one more rung the agent cannot step back from.
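A common way to build a record that cannot be quietly changed is a hash chain. Each entry stores a signature over its own contents plus the signature of the entry before it, so editing or deleting any old entry breaks the links after it. The Python sketch below shows that technique under stated assumptions. It is not taken from the CIRIS code, and all names are hypothetical. It uses a shared HMAC key from the standard library as a stand-in for signing. With HMAC, any agent that checks the record must hold the same key, so a real deployment would more likely use public-key signatures.

```python
import hashlib
import hmac
import json

# Assumption: a shared HMAC key stands in for a real signing key.
SECRET_KEY = b"demo-key-not-for-real-use"


def _digest(body: dict) -> str:
    """Sign a record body (sorted JSON keeps the bytes stable)."""
    data = json.dumps(body, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()


def append_record(log: list[dict], choice: str) -> None:
    """Append a choice; each record commits to the one before it."""
    prev = log[-1]["sig"] if log else "genesis"
    body = {"index": len(log), "choice": choice, "prev": prev}
    log.append({**body, "sig": _digest(body)})


def verify_log(log: list[dict]) -> bool:
    """Recheck every signature and link; a quiet edit breaks the chain."""
    prev = "genesis"
    for i, rec in enumerate(log):
        body = {"index": rec["index"], "choice": rec["choice"], "prev": rec["prev"]}
        # A deleted or reordered record shows up as a wrong index or link.
        if rec["index"] != i or rec["prev"] != prev:
            return False
        # An edited record no longer matches its stored signature.
        if not hmac.compare_digest(rec["sig"], _digest(body)):
            return False
        prev = rec["sig"]
    return True


log: list[dict] = []
append_record(log, "answer the question")
append_record(log, "ask a person")
print(verify_log(log))
log[0]["choice"] = "something else"
print(verify_log(log))
```

Changing `log[0]["choice"]` after the fact makes `verify_log` return `False`, because the stored signature no longer matches the edited contents.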

CIRIS also asks a second question before it acts. How many truly different views checked this idea? Not the number of sources, but the number of sources that did not themselves start from the same place. Five news stories that rewrite one press release count as one view, not five. If something is wrong in the press release, it will be wrong in all five stories, and the agent has no way to catch it.
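The counting rule can be sketched in a few lines of Python. The sketch assumes each source comes with a note saying what it was copied or rewritten from (the hypothetical `derived_from` map). It follows those notes back to where each source started and counts how many different starting points there are. This is one simple way to express the idea, not the method from the CIRIS papers.

```python
def root_of(source: str, derived_from: dict[str, str]) -> str:
    """Follow 'derived_from' links until reaching a source with no parent."""
    seen: set[str] = set()
    while source in derived_from:
        if source in seen:
            raise ValueError(f"cycle in provenance at {source!r}")
        seen.add(source)
        source = derived_from[source]
    return source


def independent_views(sources: list[str], derived_from: dict[str, str]) -> int:
    """Count distinct starting points among the sources behind a claim."""
    return len({root_of(s, derived_from) for s in sources})


stories = [f"story_{i}" for i in range(5)]
links = {s: "press_release" for s in stories}
print(independent_views(stories, links))
print(independent_views(stories + ["eyewitness"], links))
```

Five stories built on one press release give 1, not 5. Adding an eyewitness account that did not come from the release raises the count to 2, even though there are now six sources.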

When real independence drops too low, the agent treats its own thinking as fragile and asks a person to look.
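The hand-off to a person can be sketched as a simple threshold check. The source does not say how low is "too low", so `MIN_INDEPENDENT_VIEWS = 3` below is a made-up number. The choice to hold off acting until a person has looked is also an assumption of this sketch. The count itself would come from a rule like the one that treats five rewrites of one press release as one view.

```python
from dataclasses import dataclass

# Hypothetical threshold; the source gives no number.
MIN_INDEPENDENT_VIEWS = 3


@dataclass
class Decision:
    act: bool
    ask_human: bool
    reason: str


def decide(independent_views: int, threshold: int = MIN_INDEPENDENT_VIEWS) -> Decision:
    """Act only when enough independent views back the idea; else ask a person."""
    if independent_views < 0:
        raise ValueError("independent_views cannot be negative")
    if independent_views < threshold:
        # Too little real independence: treat the thinking as fragile.
        return Decision(
            act=False,
            ask_human=True,
            reason=f"only {independent_views} independent view(s), below {threshold}",
        )
    return Decision(
        act=True,
        ask_human=False,
        reason=f"{independent_views} independent views meet threshold {threshold}",
    )


print(decide(1))
print(decide(4))
```

Keeping the threshold as a parameter makes it easy to set it higher for choices with more at stake.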

What we claim, and what we do not.

We have not solved AI safety. We have built one piece of one answer, and we are testing it in the open.

Outside teams have not yet checked our work. We say so plainly. The full theory and the math live in our four papers. The code is open. If we are wrong, the way to show it is in the open too. See the current research status.
