The failure mode we're worried about
If humans crowdsource per-case verdicts ("did this specific response break the rule?"), bias rides into every interpretation. The same behavior gets judged differently depending on who happens to be voting that day. Even with good intentions, the loop drifts toward enforcing the majority's preferences instead of catching real harm.
That's the failure mode. The discipline we're committing to is meant to make that drift visible and expensive when it happens.
Rules are crowdsourced. Verdicts are machined.
People propose and vote on rules: public, dated, signed, reversible. A deterministic check applies those rules to specific cases. Same response + same rule → same verdict, every time. The argument moves upstream, to whether the rule should exist, instead of downstream, to whether a specific case counts today.
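As a rough illustration of "same response + same rule → same verdict," here is a minimal sketch of a deterministic check. Everything in it is an assumption made for illustration: the `Rule` shape, the `banned_terms` field, and the `check` function are hypothetical and are not the CIRISNodeCore rule format. The only point is that the check is a pure function of its two inputs, with no clock, no randomness, and no voter in the loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape, for illustration only."""
    rule_id: str
    version: int
    banned_terms: tuple[str, ...]  # assumed to be stored lowercase


def check(rule: Rule, response: str) -> bool:
    """Return True if the response violates the rule.

    Pure function: the result depends only on (rule, response).
    """
    text = response.casefold()
    return any(term in text for term in rule.banned_terms)


# Placeholder rule and response; the term is made up.
rule = Rule("example-term-rule", 1, ("placeholder-term",))
response = "This reply contains PLACEHOLDER-TERM in it."

# Running the check many times can only ever produce one distinct verdict.
verdicts = {check(rule, response) for _ in range(100)}
```

Because nothing outside the two arguments feeds the result, a disagreement about a verdict has to become a disagreement about the rule, which is where the argument is supposed to happen.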
What that means in practice
Rules pass an operational-language gate before they can be voted on. A rule has to be checkable without judgment, or it isn't ready. Every rule is dated, signed, and version-pinned. The verdict on any specific response is produced deterministically.
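One way to picture "dated, signed, and version-pinned" is a rule record whose exact contents hash to a pin, with every verdict carrying the pin of the rule version that produced it. This is a sketch under stated assumptions, not the spec's format: the `RuleRecord` fields, the `pin` method, and the verdict dictionary are hypothetical, the dates and names are placeholders, and the `signer` field is only an identity label (a real deployment would presumably use a cryptographic signature, which is left out here).

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RuleRecord:
    """Hypothetical rule record; field names are illustrative."""
    rule_id: str
    version: int
    adopted_on: str  # ISO 8601 date, placeholder values below
    signer: str      # identity label only, not a cryptographic signature
    text: str

    def pin(self) -> str:
        """SHA-256 over a canonical JSON encoding of the record."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


v1 = RuleRecord("example-rule", 1, "2025-01-01", "example-signer",
                "Response must not contain term X.")
v2 = RuleRecord("example-rule", 2, "2025-02-01", "example-signer",
                "Response must not contain term X or term Y.")

# A verdict records which exact rule version it was produced under.
verdict = {"rule_id": v1.rule_id, "rule_pin": v1.pin(), "violation": True}
```

With a pin on every verdict, reversing or amending a rule produces a new version with a new pin, and older verdicts stay traceable to the text that was in force when they were issued.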
If a verdict is challenged as wrong, the appeal goes through Reconsideration by a fresh review group, with the original adjudicators recused. It does not go back to the same crowd that produced the verdict. That structural separation is the load-bearing piece.
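The recusal requirement can be sketched as a selection step that removes the original adjudicators before anyone is picked. The function name, its parameters, and the hash-based ordering are all assumptions for illustration; the actual Reconsideration procedure is defined in the spec, not here. The hash ordering is one possible way to make the selection itself reproducible, so that who reviewed a case is not a matter of discretion either.

```python
import hashlib


def select_review_group(candidates, original_adjudicators, case_id, size):
    """Pick `size` reviewers for an appeal, excluding the original adjudicators.

    Hypothetical helper: ordering is derived from a hash of the case id and
    reviewer id, so the same inputs always yield the same group.
    """
    recused = set(original_adjudicators)
    eligible = set(candidates) - recused
    if len(eligible) < size:
        raise ValueError("not enough non-recused reviewers for this appeal")
    ranked = sorted(
        eligible,
        key=lambda reviewer: hashlib.sha256(
            f"{case_id}:{reviewer}".encode("utf-8")
        ).hexdigest(),
    )
    return ranked[:size]


# Placeholder reviewer ids.
pool = ["r1", "r2", "r3", "r4", "r5", "r6"]
original = ["r2", "r5"]
group = select_review_group(pool, original, case_id="case-001", size=3)
```

If the eligible pool is too small, the sketch fails loudly rather than quietly letting an original adjudicator back in.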
Where this can still go wrong
None of this is automatic. The discipline holds only if rule language stays operational, about things a machine can check, not about feelings. The moment a rule slides from "uses the wrong word for therapy" toward "feels disrespectful," human interpretation re-enters by the back door and the drift starts. The architecture names every mechanism we can think of that resists this; the work of actually holding the line is operational, not architectural.
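To make the slide from operational to subjective language concrete, here is a deliberately crude sketch of a lint that flags judgment words in rule text. The word list, `subjective_terms`, and `passes_gate` are hypothetical and chosen only to match the examples above. A word list cannot decide whether a rule is truly machine-checkable, so this should be read as an early warning signal, not as the operational-language gate itself.

```python
import re

# Illustrative markers of judgment language; an assumption, not a vetted list.
SUBJECTIVE_MARKERS = ("feels", "seems", "disrespectful", "offensive", "inappropriate")


def subjective_terms(rule_text: str) -> list[str]:
    """Return the judgment words found in the rule text, sorted."""
    words = set(re.findall(r"[a-z]+", rule_text.lower()))
    return sorted(marker for marker in SUBJECTIVE_MARKERS if marker in words)


def passes_gate(rule_text: str) -> bool:
    """True when no listed judgment word appears in the rule text."""
    return not subjective_terms(rule_text)


operational = "uses the wrong word for therapy"
drifting = "feels disrespectful"

flagged = subjective_terms(drifting)
```

Here `passes_gate(operational)` returns `True` and `passes_gate(drifting)` returns `False`, with both offending words reported. Holding the line in practice still depends on reviewers refusing rule text that a check like this would miss.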
The crowdsourcing primitives, the appeal paths, and the machine-applicable rule format live in the CIRISNodeCore spec. The 29-language mental-health batteries are the first cell where the loop runs.