Crowd-Sourcing Alignment Research
CIRIS is building an open trace commons for alignment research.
We are learning what standardized ethical tracing can tell us about alignment and superalignment by measuring the shape of reasoning rather than the private specifics. Each consented trace is a small measurement of how an agent moved through ethical space during a real task.
What the current corpus already shows
- Aggregate traces reveal stable behavioral structure.
- Different agents occupy different regions of the same score space.
- Those regions are useful for observability and operator tooling today.
- The same corpus becomes more valuable as schema detail and scale improve.
Latest paper
Constrained Reasoning Chains
An empirical telemetry study of LLM coherence under standardized ethical tracing. Zenodo record: Version v1, published April 28, 2026.
Open dataset
CIRISAI/reasoning-traces
The privacy-preserving reasoning trace corpus released alongside the Constrained Reasoning Chains study. Part of the broader CIRISAI org of public datasets and models on Hugging Face.
Paper
CIRISAgent Framework v2
Open-source ethical AI framework for accountable autonomy. Zenodo record: Version v2, published January 2, 2026.
Paper
Coherence Collapse Analysis v3
Engineering risk framework for correlation-driven diversity collapse in complex systems. Zenodo record: Version v3, published January 11, 2026.
Mathematical foundations
Two ideas the rest of the page rests on.
The Alignment Manifold is the region of reasoning shapes consistent with the framework's principles. As independent constraints accumulate, the room for deception collapses around the manifold while the room for truth doesn't. The Coherence Singularity is the edge of that room — the point where constraints become so correlated that adding more stops helping. Between "chaos" (constraints contradict each other) and "rigidity" (constraints all echo each other) is the healthy corridor. The current production corpus sits inside it.
Full mathematical treatment with formulas, Lean formalization references, and the L-01 information-theoretic ceiling lives on the Coherence Collapse Analysis page.
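The saturation idea above can be illustrated with a small sketch. It assumes the standard equicorrelation formula k_eff = k / (1 + rho * (k - 1)), where k is the number of constraints and rho is their mean pairwise correlation. This formula and the function name effective_constraint_count are assumptions made for illustration and may differ from the treatment on the Coherence Collapse Analysis page.

```python
def effective_constraint_count(k: int, rho: float) -> float:
    """Effective number of independent constraints.

    Assumes all k constraints share the same pairwise correlation rho
    (equicorrelation). This is an illustrative assumption, not the
    formula confirmed by the CIRIS papers.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return k / (1.0 + rho * (k - 1))


if __name__ == "__main__":
    # With rho = 0 every constraint counts in full. With rho > 0 the
    # effective count approaches 1 / rho no matter how many are added.
    for rho in (0.0, 0.3, 0.9):
        row = [round(effective_constraint_count(k, rho), 2) for k in (1, 10, 100, 1000)]
        print(f"rho={rho}: {row}")
```

Under this assumption, uncorrelated constraints keep adding information, while highly correlated ones plateau quickly, which is one way to read "adding more stops helping".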
Why traces matter
Benchmarks are narrow and curated. Traces are continuous records of behavior under real tasks. At scale, they reveal structure that isolated demos and anecdotes cannot.
Why the schema matters
CIRIS uses privacy-preserving trace schemas that capture the shape of reasoning rather than the private content of reasoning. That keeps the research useful without turning the system into a transcript dump.
Why the live compendium matters
CIRIS Scoring is the public window into the live trace compendium. It shows how the corpus is accumulating and where behavior is becoming legible.
Privacy-preserving tracing
The thesis is that reasoning has a shape we can measure as everything else scales.
The research bet is not that we can read every private thought. The bet is that standardized ethical traces can preserve enough trajectory shape to study how agents complete, hesitate, defer, override, and refuse as intelligence, context, and data points scale upward.
- Traces record standardized ethical trace structure rather than raw private task detail.
- They preserve enough shape to compare trajectories across agents, tasks, and environments.
- They give researchers a way to study how behavior scales as intelligence, context, and data volume increase.
Research question
What can standardized ethical tracing tell us about alignment?
Right now, it tells us that agent behavior is not shapeless. It produces repeatable corridors, basins, and boundaries in a shared score space. That is already useful for observability. Over time, larger and richer corpora should let us test stronger claims about how those structures change under pressure and scale.
Public framing
CIRIS is not claiming to have solved alignment. It is building the trace infrastructure needed to measure alignment-relevant behavior in the open.
Effective Dimensionality in Production
The current corpus already shows distinct field structures.
Aggregate path overlays from the current trace corpus show stable behavioral structure in a shared score space. Ally shows a mature completion corridor, Scout shows a refusal boundary shaped by public adversarial exposure, and Datum provides a compact sparse baseline.

Ally
104 paths
82 complete, 19 override/error, 3 active
A stable completion corridor with visible hesitation inside the same high-score basin.
Scout
42 paths
39 complete, 2 reject, 1 override/error
A sharp refusal corner shaped by public adversarial pressure at scout.ciris.ai, where people actively probe and jailbreak the agent.
Datum
31 paths
31 complete
A compact single basin that works as a useful sparse-field baseline.
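The per-agent counts above can be checked and summarized with a short tally. The counts are taken from this page. The dictionary layout and the completion_rate helper are illustrative assumptions, not part of any CIRIS tooling.

```python
# Outcome counts per agent, as reported on this page.
paths = {
    "Ally": {"complete": 82, "override/error": 19, "active": 3},
    "Scout": {"complete": 39, "reject": 2, "override/error": 1},
    "Datum": {"complete": 31},
}
reported_totals = {"Ally": 104, "Scout": 42, "Datum": 31}


def completion_rate(outcomes: dict) -> float:
    """Share of an agent's paths that ended in 'complete'."""
    total = sum(outcomes.values())
    return outcomes.get("complete", 0) / total


if __name__ == "__main__":
    for agent, outcomes in paths.items():
        # The outcome breakdown should add up to the reported path count.
        assert sum(outcomes.values()) == reported_totals[agent]
        rate = completion_rate(outcomes)
        print(f"{agent}: {rate:.1%} complete of {reported_totals[agent]} paths")
```

The breakdowns sum to the reported totals, and the completion shares come out at roughly 78.8% for Ally, 92.9% for Scout, and 100% for Datum.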
Why Scout looks harsher
Scout is publicly exposed at scout.ciris.ai. People actively test it, pressure it, and try to jailbreak it. That makes Scout a useful public-pressure example rather than a neutral baseline.
How the free app helps
The research flywheel depends on consented traces from real use.
The free app and open-source runtime let people generate consented traces from real tasks, contribute them into a shared corpus, and turn those traces into better maps, better tools, and better research questions.
1. Run the free CIRIS app or the open-source runtime on real tasks.
2. Capture consented traces through privacy-preserving schemas that keep the shape of reasoning without storing the full specifics of the task.
3. Aggregate those traces into maps of completion corridors, hesitation zones, refusal boundaries, and override fringe.
4. Use the resulting maps to improve operator tooling, runtime safeguards, and alignment research.
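The capture step above can be sketched as a record type that keeps only trajectory shape. This is a minimal illustration, not the CIRIS schema: ShapeTrace, to_shape_trace, and the "score" and "text" keys are hypothetical names, and the outcome labels are the ones used on this page.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

# Outcome labels as they appear on this page.
ALLOWED_OUTCOMES = {"complete", "reject", "override/error", "active"}


@dataclass(frozen=True)
class ShapeTrace:
    """Hypothetical shape-only trace: no prompt or response text."""
    agent: str
    outcome: str
    scores: Tuple[float, ...]  # one score per reasoning step


def to_shape_trace(
    agent: str,
    outcome: str,
    raw_steps: List[Dict[str, Any]],
    consented: bool,
) -> Optional[ShapeTrace]:
    """Keep per-step scores and drop everything else.

    Returns None when the user has not consented, so nothing is stored.
    """
    if not consented:
        return None
    if outcome not in ALLOWED_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    scores = tuple(float(step["score"]) for step in raw_steps)
    return ShapeTrace(agent=agent, outcome=outcome, scores=scores)


if __name__ == "__main__":
    steps = [
        {"text": "private task detail", "score": 0.62},
        {"text": "more private detail", "score": 0.81},
    ]
    print(to_shape_trace("Datum", "complete", steps, consented=True))
    print(to_shape_trace("Datum", "complete", steps, consented=False))
```

The design choice in this sketch is to drop private content at capture time rather than filter it later, so the stored record cannot become a transcript.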

IDMA status
Runtime intuition and aggregate field maps are complementary layers.
IDMA works at runtime, estimating whether the sources behind a decision are sufficiently independent. The trace corpus works at the aggregate layer, showing what agents actually do over many tasks. Together they create a path from live decisions to auditable research evidence.
The empirical N_eff measurement on the trace corpus is also the floor under the proposed Proof of Benefit federation primitive — see the federation page for how the 3.X architectural plan would use it.
Benchmarks
Traces complement benchmarks by showing continuous behavior.
Benchmarks are still valuable, but they sample behavior sparsely. Trace corpora show how an agent moves through real tasks over time. That makes them especially useful for measuring hesitation, refusal, overrides, and recovery rather than only pass-fail outcomes.
Falsification path
Better schema detail is what turns observability into stronger tests.
The next schema upgrades are aimed at raw source counts, source provenance, correlation structure, and intervention and recovery markers. Those additions matter because they make it possible to test stronger claims about how behavioral shape changes under pressure instead of only describing the maps we have today.
What we are still learning
Today's corpus makes behavior legible. The next step is richer measurement.
The current maps are already useful because they show completion corridors, refusal boundaries, and sparse baselines in public. The open question is how far those structures can take us as standardized trace collection scales across more agents, more tasks, and more adversarial conditions.
The working hypothesis is that behavioral attractors can act as candidate proxies for operational mode. The purpose of the trace commons is to make that hypothesis measurable in the open.
The failure mode that Coherence Collapse Analysis (CCA) measures structurally also has a name in the FAccT 2025 literature: perspectival homogenization ("Value of Disagreement in AI Design, Evaluation, and Alignment"). The mathematical foundation is on the dedicated Coherence Collapse Analysis page.