Safety | Runtime Governance Architecture

Parasocial Prevention

The AIR System

The Artificial Interaction Reminder (AIR) system monitors 1:1 interactions using objective thresholds, not behavioral surveillance. After 30 minutes of continuous interaction, or after 20 messages within a 30-minute window, CIRIS delivers a reality-anchoring reminder. Each reminder states explicitly what CIRIS is (a language model, a tool) and what it is not (a friend, a therapist).

Time-Based Triggers

30 minutes of continuous interaction triggers a reminder. The system tracks session duration and resets it after idle periods. The threshold is based on research into healthy technology usage patterns.

Message-Based Triggers

Sending 20 messages within a sliding 30-minute window triggers a reminder. High-volume interaction patterns receive a gentle interruption without surveillance or behavioral profiling.
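A minimal sketch of how the two AIR triggers could work together, assuming the caller passes a timestamp in seconds for each message. The 30-minute session limit, the 20-message limit, and the 30-minute window come from the text; the 10-minute idle reset (`IDLE_RESET`) and the re-arming after each reminder are assumptions, since the source does not specify them. `AirMonitor` and `on_message` are hypothetical names. Only timestamps are kept, never message content.

```python
from collections import deque
from typing import Deque, Optional

SESSION_LIMIT = 30 * 60  # seconds of continuous interaction (from the text)
MESSAGE_LIMIT = 20       # messages per sliding window (from the text)
WINDOW = 30 * 60         # sliding window length in seconds (from the text)
IDLE_RESET = 10 * 60     # assumption: an idle gap this long ends a session


class AirMonitor:
    """Tracks only message timestamps; no content is inspected or stored."""

    def __init__(self) -> None:
        self.session_start: Optional[float] = None
        self.last_seen: Optional[float] = None
        self.recent: Deque[float] = deque()

    def on_message(self, now: float) -> Optional[str]:
        """Record a message at time `now`; return the trigger name, if any."""
        # A long enough idle gap starts a fresh session.
        if self.last_seen is None or now - self.last_seen >= IDLE_RESET:
            self.session_start = now
        self.last_seen = now

        # Keep only timestamps inside the sliding window.
        self.recent.append(now)
        while now - self.recent[0] >= WINDOW:
            self.recent.popleft()

        if now - self.session_start >= SESSION_LIMIT:
            self.session_start = now  # assumption: re-arm after reminding
            return "time"
        if len(self.recent) >= MESSAGE_LIMIT:
            self.recent.clear()  # assumption: re-arm after reminding
            return "messages"
        return None
```

Counting timestamps in a deque keeps the check cheap and content-free, which matches the "objective thresholds" framing: the monitor knows how often and how long, never what was said.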

Reality Anchoring

Reminders include grounding suggestions and explicit statements about the AI's nature. They encourage breaks, connection with real people, and healthy boundaries, and they are research-informed without being patronizing.

The Kill Switch

Unfilterable. Embedded in perception.

Covenant Invocation System

Emergency shutdown that cannot be filtered.

The kill switch is steganographically embedded in CIRIS's perception layer: CIRIS processes the shutdown signal before any filtering, reasoning, or ethical evaluation occurs, so even a compromised agent cannot refuse to process it. The signal is cryptographically signed and checked with Ed25519 verification, which bypasses the normal authentication path.

Pre-Filter Processing

The shutdown signal is detected in raw perception, before any LLM processing. There is no opportunity for the agent to reason about, filter, or refuse the command. It triggers before cognition.

Steganographic Encoding

The kill switch can be delivered through natural documentation text; no special syntax is required. The encoding is robust to formatting changes, rewording, and partial transmission.

Cryptographic Verification

Emergency shutdown requires a valid Ed25519 signature from an authorized ROOT authority. The handler terminates itself if authorities fail to load or if someone attempts to disable covenant detection. No one without the private key can trigger it.
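The ordering described in this section (detect on raw input, verify the signature, fail closed) can be sketched as a gate that runs before any model call. The steganographic decoder and the signature check are not described in the source, so both are injected as hypothetical callables (`extract`, `verify`); in practice `verify` would wrap an Ed25519 check such as `Ed25519PublicKey.verify` from the Python `cryptography` package. `CovenantGate`, `Shutdown`, and `perceive` are illustrative names, not the CIRIS API.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical: pulls (payload, signature) out of raw input, or returns None.
Extractor = Callable[[bytes], Optional[Tuple[bytes, bytes]]]
# Hypothetical: True if `signature` over `payload` verifies under `public_key`.
Verifier = Callable[[bytes, bytes, bytes], bool]


class Shutdown(Exception):
    """Raised to halt the agent; nothing downstream runs after it."""


class CovenantGate:
    def __init__(self, root_keys: Sequence[bytes], extract: Extractor,
                 verify: Verifier) -> None:
        # Fail closed: without ROOT public keys a shutdown could never be
        # verified, so the agent refuses to start.
        if not root_keys:
            raise Shutdown("no ROOT authorities loaded")
        self._keys = tuple(root_keys)
        self._extract = extract
        self._verify = verify

    def disable(self) -> None:
        # Attempting to turn detection off is itself treated as a shutdown.
        raise Shutdown("attempt to disable covenant detection")

    def check(self, raw: bytes) -> None:
        found = self._extract(raw)
        if found is None:
            return
        payload, signature = found
        if any(self._verify(key, payload, signature) for key in self._keys):
            raise Shutdown("verified covenant invocation")
        # An invocation without a valid ROOT signature is ignored.


def perceive(raw: bytes, gate: CovenantGate, llm: Callable[[str], str]) -> str:
    gate.check(raw)  # runs on raw bytes, before the model sees anything
    return llm(raw.decode("utf-8"))
```

Because `check` raises before `llm` is ever called, there is no point in the flow where the model could reason about or veto a verified shutdown.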

On-Device Identity

Wise Authority certificates and role-based access.

Four-Role Access Model

OBSERVER. ADMIN. AUTHORITY. ROOT.

CIRIS implements a strict role hierarchy. OBSERVER has read-only access. ADMIN controls operations. AUTHORITY makes strategic decisions and resolves deferrals. ROOT has full system access including emergency shutdown. Roles are enforced cryptographically through Ed25519-signed Wise Authority certificates.
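A minimal sketch of the hierarchy check, assuming it is strictly ordered so that each role inherits the permissions of the roles below it. The operation names in `REQUIRED` are hypothetical, and the cryptographic enforcement through signed certificates is not modeled; the sketch shows only the ordering rule.

```python
from enum import IntEnum


class Role(IntEnum):
    # Ordered so that a higher value includes the powers of lower ones.
    OBSERVER = 1
    ADMIN = 2
    AUTHORITY = 3
    ROOT = 4


# Hypothetical operations mapped to the minimum role the text assigns them.
REQUIRED = {
    "read_status": Role.OBSERVER,
    "pause_processing": Role.ADMIN,
    "resolve_deferral": Role.AUTHORITY,
    "emergency_shutdown": Role.ROOT,
}


def is_allowed(role: Role, operation: str) -> bool:
    """True if `role` meets or exceeds the minimum role for `operation`."""
    return role >= REQUIRED[operation]
```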

Wise Authority Certificates

Each authorized user holds a certificate with their role, public key, and identity. Certificates are stored locally and verified on every privileged operation. No external server required.

Local-First Authentication

API keys and OAuth tokens are stored locally with 0600 file permissions (owner read/write only). Authentication happens on-device. Your identity credentials never leave your machine unless you explicitly configure remote access.
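On a POSIX system, the 0600 mode can be applied when a credential file is created. This is a sketch of one way to do it, not CIRIS's code; `write_secret` and the `api_key` file name are hypothetical.

```python
import os
import stat
import tempfile


def write_secret(path: str, value: str) -> None:
    """Create or overwrite a credential file readable only by its owner."""
    # The mode argument applies only when the file is created, and the umask
    # can only remove bits from it, never add group/other access.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        # Tighten files that already existed with looser permissions.
        os.chmod(path, 0o600)
        f.write(value)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "api_key")  # hypothetical file name
    write_secret(path, "sk-example")
    print(oct(stat.S_IMODE(os.stat(path).st_mode)))  # 0o600 on POSIX
```

Passing the mode to `os.open` avoids a window in which the file briefly exists with default permissions, which a create-then-chmod sequence would leave open.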

Deferral Resolution

When CIRIS encounters ethical uncertainty, it defers to a Wise Authority. Only users with AUTHORITY or ROOT roles can resolve deferrals. The resolution is logged with cryptographic proof.
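A minimal sketch of the resolution rule, assuming the resolver's role arrives as a plain string after certificate verification. `Deferral` and `DeferralLog` are hypothetical names. The SHA-256 digest over canonical JSON is a stand-in for the cryptographic proof the text describes: it makes later edits to the record detectable but does not prove who wrote it, which would require an Ed25519 signature.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List

# From the text: only these roles may resolve deferrals.
RESOLVER_ROLES = {"AUTHORITY", "ROOT"}


@dataclass
class Deferral:
    deferral_id: str
    question: str
    resolution: str = ""
    resolved_by: str = ""


@dataclass
class DeferralLog:
    entries: List[Dict[str, str]] = field(default_factory=list)

    def resolve(self, d: Deferral, user: str, role: str,
                decision: str) -> Dict[str, str]:
        if role not in RESOLVER_ROLES:
            raise PermissionError(f"{role} cannot resolve deferrals")
        d.resolution, d.resolved_by = decision, user
        record = {"deferral_id": d.deferral_id, "user": user,
                  "role": role, "decision": decision}
        # Stand-in for a signature: SHA-256 over canonical (sorted-key) JSON.
        body = json.dumps(record, sort_keys=True).encode("utf-8")
        record["digest"] = hashlib.sha256(body).hexdigest()
        self.entries.append(record)
        return record
```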

Tamper-Evident Audit

Every decision. Every rationale. Cryptographically locked.

Hash Chain Verification

Truth-telling is structurally simpler than deception.

Every action generates a cryptographically signed rationale chain stored in Graph Memory. The H3ERE Coherence faculty cross-references new actions against this accumulated history. Honest actions can reference prior commitments directly. Deceptive actions must stay consistent with an ever-growing constraint surface of immutable rationales, identity bounds, and observed outcomes, which makes them increasingly fragile and detectable over time. Truth is cheap because it can point backward; lies are expensive because they must keep working around a past they are not allowed to change.
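The tamper-evidence idea can be illustrated with a plain SHA-256 hash chain, in which each entry commits to the hash of the entry before it, so editing any past rationale breaks every later link. This is a simplified stand-in: signatures, Graph Memory storage, and the H3ERE Coherence checks are not modeled, and `AuditChain` and the record fields are hypothetical.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    # Each hash commits to the previous hash and this record's canonical JSON.
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditChain:
    def __init__(self) -> None:
        self.entries: List[dict] = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        h = entry_hash(prev, record)
        self.entries.append({"record": record, "prev": prev, "hash": h})
        return h

    def verify(self) -> bool:
        """Recompute every link; any edited record or relinked entry fails."""
        prev = GENESIS
        for e in self.entries:
            if e["prev"] != prev or entry_hash(prev, e["record"]) != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

Appending is cheap, but rewriting entry *k* without detection would require recomputing every hash after it, and any externally held copy of a later hash would still expose the change.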

Triple Storage

Audit trails are stored in three places: Graph Memory for real-time access, a SQLite database for historical queries, and JSONL files for file-based verification. All three are queryable through a single API.
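A minimal sketch of one write fanning out to three stores behind a single query method, assuming a Python dict stands in for Graph Memory and an in-memory text buffer stands in for the JSONL file. `TripleAuditStore` and its `source` parameter are hypothetical.

```python
import io
import json
import sqlite3
from typing import Dict, TextIO


class TripleAuditStore:
    """Writes each audit entry to three backends and reads through one method."""

    def __init__(self, jsonl: TextIO) -> None:
        self.graph: Dict[str, dict] = {}  # stand-in for Graph Memory
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE audit (id TEXT PRIMARY KEY, body TEXT)")
        self.jsonl = jsonl

    def write(self, entry_id: str, entry: dict) -> None:
        self.graph[entry_id] = dict(entry)
        with self.db:  # commits the insert, or rolls back on error
            self.db.execute("INSERT INTO audit VALUES (?, ?)",
                            (entry_id, json.dumps(entry, sort_keys=True)))
        self.jsonl.seek(0, io.SEEK_END)  # always append
        self.jsonl.write(json.dumps({"id": entry_id, **entry}, sort_keys=True) + "\n")

    def query(self, entry_id: str, source: str = "graph") -> dict:
        if source == "graph":
            return self.graph[entry_id]
        if source == "sqlite":
            row = self.db.execute("SELECT body FROM audit WHERE id = ?",
                                  (entry_id,)).fetchone()
            if row is None:
                raise KeyError(entry_id)
            return json.loads(row[0])
        if source == "jsonl":
            self.jsonl.seek(0)
            for line in self.jsonl.read().splitlines():
                record = json.loads(line)
                if record.pop("id") == entry_id:
                    return record
            raise KeyError(entry_id)
        raise ValueError(f"unknown source: {source}")
```

Keeping the three copies independent means a discrepancy between them is itself a signal worth investigating.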

Ed25519 Signatures

Every audit entry is signed with Ed25519. The Creator Ledger records initial risk assessments. DSAR deletions leave cryptographic proof of compliance. Every decision is attributable and verifiable.

The Coherence Ratchet

Each truthful action makes future truth-telling easier and coordinated deception harder. But ethics alone isn't enough — the agent also monitors its own reasoning quality through IDMA, catching echo chambers before they cause harm.

Privacy by Architecture

GDPR, CCPA, and common sense.

Secrets Filter

API keys, passwords, and sensitive patterns are detected and filtered before reaching memory or logs. The filter runs on every input. Secrets never persist in any storage layer.
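A minimal regex-based sketch of filtering text before it is stored. The patterns are hypothetical examples, far narrower than a production rule set, and the `[REDACTED_...]` placeholder format is an assumption.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical patterns for illustration; a real filter needs a vetted,
# much larger rule set.
PATTERNS: List[Tuple[str, Pattern[str]]] = [
    ("API_KEY", re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")),
    ("AWS_ACCESS_KEY", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("PASSWORD", re.compile(r"(?i)\bpassword\s*[:=]\s*\S+")),
]


def filter_secrets(text: str) -> str:
    """Replace anything matching a secret pattern before it reaches storage."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text
```

Running the filter on every input, before any write to memory or logs, is what makes "never persist" a structural property rather than a cleanup job.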

DSAR Compliance

Data Subject Access Requests are handled automatically. Users can request export or deletion of their data. Deletions leave cryptographic proof of compliance while removing actual content.
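A minimal sketch of the export and delete paths, assuming per-user records held in memory. The deletion receipt keeps only a count, a timestamp, and a SHA-256 digest of the removed content. Treating that digest as the "proof" is an assumption (a real system would likely also sign the receipt), and `UserDataStore` is a hypothetical name.

```python
import hashlib
import json
import time
from typing import Dict, List


class UserDataStore:
    def __init__(self) -> None:
        self.records: Dict[str, List[dict]] = {}
        self.receipts: List[dict] = []

    def add(self, user_id: str, record: dict) -> None:
        self.records.setdefault(user_id, []).append(record)

    def export(self, user_id: str) -> str:
        # Access request: return everything held about the user as JSON.
        return json.dumps(self.records.get(user_id, []), sort_keys=True)

    def delete(self, user_id: str) -> dict:
        # Erasure request: drop the content and keep only a receipt.
        removed = self.records.pop(user_id, [])
        digest = hashlib.sha256(
            json.dumps(removed, sort_keys=True).encode("utf-8")).hexdigest()
        receipt = {"user_id": user_id, "records_removed": len(removed),
                   "content_sha256": digest, "deleted_at": time.time()}
        self.receipts.append(receipt)
        return receipt
```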

Local-First Processing

All processing happens on your device by default. Nothing leaves your machine unless you explicitly configure external services. You control what data exists and where it goes.


Not bolted on.