A small thinking system can hide things and do little harm. A powerful one cannot.
What “coherent” means here
Five things must match up inside a mind:
Picture five short forms about your week. One for what you think. One for what you saw. One for what you did. One for what you remember. One for what you tell your boss. When all five forms say the same thing, your boss can trust your work. When the forms do not match, no one knows which one is real.
Hidden ideas
A thinking system uses ideas to decide what to do. Some of those ideas are out in the open. Others are hidden inside. The hidden ones still shape every choice.
A bridge looks strong from the road. Inside the steel, small cracks are growing. A small car can still cross. A heavy truck cannot. The bridge falls when the load is big enough.
A hidden idea inside a small system does small harm. A hidden idea inside a powerful system can do great harm. The more powerful the system, the more weight rides on every hidden part.
The ratchet
As thinking systems get more powerful, the cost of hidden parts grows. Each new bit of power pushes harder on the hidden parts to come into view. The push only goes one way.
Picture a ladder. As you climb, each rung you leave behind falls off. You cannot go back down. You can only climb toward the top, where there is full daylight. That is the Coherence Ratchet.
Once people have seen the system show its work, they will not trust it again if it stops. The only way forward is to keep showing more.
A second image works too. A gear in a machine clicks forward one tooth at a time. It cannot click backward. The Coherence Ratchet is a gear like that, turning toward open.
What CIRIS does
CIRIS is an AI system built around the Coherence Ratchet. Every choice an agent makes gets written down in a signed record. The record cannot be quietly changed. Other agents can read the record and check the work. Over time, the records pile up. Each new record is one more rung the agent cannot step back from.
CIRIS also asks a second question before it acts. How many truly different views checked this idea? Not the number of sources, but the number of sources that did not start from the same place themselves. Five news stories that rewrite one press release count as one view, not five. If something is wrong in the press release, it will be wrong in all five stories, and the agent has no way to catch it.
When real independence drops too low, the agent treats its own thinking as fragile and asks a person to look.
We have not solved AI safety. We have built one piece of one answer, and we are testing it in the open.
Outside teams have not yet checked our work. We say so plainly. The full theory and the math live in our four papers. The code is open. If we are wrong, the way to show it is in the open too. See the current research status.