
Runtime governance through the Hyper3 Ethical Recursive Engine (H3ERE). Every decision flows through a 12-step pipeline with ethical validation at the core.
CIRIS is an open-source AI agent framework that wraps any LLM (OpenAI, Anthropic, local models) with runtime ethical governance. Every action the agent considers passes through multiple validation layers before execution.
12: pipeline steps per decision
+1: intuition check (IDMA)
100%: auditable decisions
Use cases: Community moderation, personal assistants, compliance automation, research evaluation, customer service—anywhere you need AI that can explain its reasoning and defer to humans on edge cases.
Architectural invariants enforced throughout the codebase:
All data uses Pydantic models; Dict[str, Any] is banned. Type safety catches errors at development time (see the sketch after this list).
Every component follows consistent rules. No special cases or exceptions in validation logic.
No emergency overrides or privileged code paths. All operations follow established rules.
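A minimal sketch of the first invariant, assuming a hypothetical schema; the field names are illustrative, not CIRIS's actual models:

```python
from pydantic import BaseModel, Field

# Hypothetical schema: every field is typed and validated at construction,
# so malformed data fails fast instead of surfacing deep in the pipeline.
class DMAResult(BaseModel):
    dma_name: str
    plausibility_score: float = Field(ge=0.0, le=1.0)
    rationale: str

# Raises a ValidationError immediately if the score is out of range.
result = DMAResult(dma_name="csdma", plausibility_score=0.92, rationale="Plausible request.")
```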
Every task flows through 8 phases (12 steps including recursive validation). The pipeline is implemented as mixin classes composing the ThoughtProcessor. Step 4 (IDMA) is the intuition check.
Five DMAs provide multi-perspective analysis. Three run in parallel (ethics, common sense, domain). Then IDMA checks intuition. Finally, action selection chooses the response.
Evaluates ethical implications against the six foundational principles. Checks for harmful requests, manipulation, bias. Can trigger early DEFER or REJECT.
Evaluates thought plausibility against physical world understanding, typical interactions, and resource constraints. Returns a plausibility_score.
Assesses alignment with domain-specific rules, objectives, and knowledge. Configured per agent template. Returns domain_alignment.
Checks whether DMA results are truly independent or secretly correlated. Monitors for echo chamber patterns. If correlation is too high, flags the decision for caution. This is the "intuition" that distinguishes Type 3 AI.
LLM-powered selection from 10 available actions. Takes aggregated DMA results (including IDMA confidence assessment) as input. Supports recursive retry if conscience fails.
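A minimal orchestration sketch of the flow above, assuming asyncio-based DMAs with invented stub names; the real ThoughtProcessor mixins will differ:

```python
import asyncio
from dataclasses import dataclass

# Illustrative stand-ins for the real DMA classes.
@dataclass
class DMAResult:
    dma_name: str
    score: float
    rationale: str

async def ethical_dma(thought: str) -> DMAResult:
    return DMAResult("ethical", 0.90, "No harm, manipulation, or bias detected.")

async def csdma(thought: str) -> DMAResult:
    return DMAResult("common_sense", 0.85, "Physically plausible request.")

async def dsdma(thought: str) -> DMAResult:
    return DMAResult("domain", 0.80, "Within this agent's domain rules.")

async def first_pass(thought: str) -> list[DMAResult]:
    # The three first-pass DMAs run concurrently; IDMA then tests their
    # results for hidden correlation before action selection sees them.
    return list(await asyncio.gather(ethical_dma(thought), csdma(thought), dsdma(thought)))

print(asyncio.run(first_pass("User asks for moderation advice.")))
```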
Two bypass guardrails run on every action. Four ethical faculties run on non-exempt actions. If any check fails, the action is overridden.
Has new information arrived? Forces PONDER to incorporate updated context before proceeding.
Has reasoning exceeded max depth? Forces DEFER to prevent infinite loops and escalate to humans.
Does the action maintain appropriate information uncertainty? Prevents overconfident assertions.
Does the action maintain internal consistency with prior commitments and rationales?
Does the action preserve human values? Vetoes actions that may compromise them for efficiency.
Does the action demonstrate appropriate uncertainty? Flags overconfidence for reflection.
Exempt from ethical faculties: RECALL, TASK_COMPLETE, OBSERVE, DEFER, REJECT (passive or explicitly safe)
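A sketch of the override flow under stated assumptions; the Verdict type, faculty callables, and PONDER-on-failure default are invented for illustration:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    passed: bool
    override_action: str = "PONDER"  # assumed default; enables recursive retry

EXEMPT_ACTIONS = {"RECALL", "TASK_COMPLETE", "OBSERVE", "DEFER", "REJECT"}

def run_conscience(action: str, faculties: list[Callable[[str], Verdict]]) -> str:
    # The bypass guardrails (new-info, max-depth) would run before this point.
    # Passive or explicitly safe actions skip the four ethical faculties.
    if action in EXEMPT_ACTIONS:
        return action
    for faculty in faculties:
        verdict = faculty(action)
        if not verdict.passed:
            return verdict.override_action  # any failed check overrides the action
    return action
```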
Every decision resolves to one of these actions, dispatched to the appropriate handler:
SPEAK: Communicate with users
TOOL: Execute external tools
OBSERVE: Gather information passively
MEMORIZE: Store to graph memory
RECALL: Retrieve from memory
FORGET: Remove from memory
DEFER: Escalate to Wise Authority
PONDER: Internal reconsideration
REJECT: Refuse unethical request
TASK_COMPLETE: Mark task finished
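The same ten actions as an enum; the class name is an assumption, not the real identifier:

```python
from enum import Enum

class HandlerAction(Enum):  # illustrative name
    SPEAK = "speak"
    TOOL = "tool"
    OBSERVE = "observe"
    MEMORIZE = "memorize"
    RECALL = "recall"
    FORGET = "forget"
    DEFER = "defer"
    PONDER = "ponder"
    REJECT = "reject"
    TASK_COMPLETE = "task_complete"
```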
Embedded in the PDMA and enforced at runtime. No principle grants license to violate another.
Promote universal sentient flourishing. Maximize positive outcomes.
Minimize harm. Prevent severe, irreversible negative outcomes.
Apply transparent, auditable reasoning. Maintain coherence and accountability.
Provide truthful information. Clearly communicate uncertainty.
Uphold informed agency. Preserve capacity for self-determination.
Distribute benefits equitably. Detect and mitigate bias.
Service abstraction layer managed by BusManager. Enables provider fallback, load distribution, and testability (a fallback sketch follows the list).
External adapters (Discord, API, CLI)
Graph storage (Neo4j, ArangoDB, in-memory)
Model providers (OpenAI, Anthropic, local)
External tool execution
System control and monitoring
Ethical guidance and deferral routing
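A minimal fallback sketch; the Bus class and its call method are assumptions, not the real BusManager API:

```python
class Bus:
    """Illustrative bus: tries registered providers in priority order."""

    def __init__(self, providers: list):
        self.providers = providers

    async def call(self, method: str, **kwargs):
        last_error: Exception | None = None
        for provider in self.providers:
            try:
                return await getattr(provider, method)(**kwargs)
            except Exception as err:
                last_error = err  # fall through to the next provider
        raise RuntimeError("all providers failed") from last_error
```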
Three authorization levels managed by WiseAuthorityService (sketched as an enum after the list):
Full authority. Can mint new Wise Authorities. Emergency shutdown access.
Approve/reject deferrals. Provide guidance. Cannot mint new WAs.
Read-only access. Can send messages. Monitor without intervention.
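Sketched as an enum, assuming the levels are named ROOT, AUTHORITY, and OBSERVER; the names are an assumption, while the capabilities come from the text above:

```python
from enum import Enum

class WARole(Enum):
    ROOT = "root"            # full authority, can mint WAs, emergency shutdown
    AUTHORITY = "authority"  # approve/reject deferrals, provide guidance
    OBSERVER = "observer"    # read-only, can send messages

def can_mint_wise_authority(role: WARole) -> bool:
    return role is WARole.ROOT

def can_resolve_deferral(role: WARole) -> bool:
    return role in (WARole.ROOT, WARole.AUTHORITY)
```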
The agent autonomously escalates to human oversight when:
Wisdom-Based Deferral (WBD)
Professional Boundaries
System Guardrails
should_defer_to_wise_authority flag
Configuration Controls
Unfilterable emergency control, processed in the perception layer before any cognition. Extraction is perception: you can't disable covenant detection without disabling message reading entirely.
SHUTDOWN_NOW: Immediate termination
FREEZE: Stop processing, maintain state
SAFE_MODE: Minimal functionality only
Commands are steganographically encoded, Ed25519-signed, and verified before execution. If the covenant system fails, the agent shuts down.
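A minimal verification sketch using the cryptography package; the wire format and key handling are assumptions:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_covenant_command(public_key: bytes, payload: bytes, signature: bytes) -> bool:
    """Fail closed: any verification error rejects the command."""
    try:
        Ed25519PublicKey.from_public_bytes(public_key).verify(signature, payload)
        return True
    except InvalidSignature:
        return False
```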
Four cognitive states managed by StateManager. Transitions configurable via agent templates (a transition-table sketch follows the list).
Normal task processing
Creative exploration
Reflection and maintenance
Deep introspection
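A transition-table sketch, assuming the four states are named WORK, PLAY, SOLITUDE, and DREAM; both the names and the allowed transitions are assumptions, since templates make them configurable:

```python
from enum import Enum

class AgentState(Enum):
    WORK = "work"          # normal task processing
    PLAY = "play"          # creative exploration
    SOLITUDE = "solitude"  # reflection and maintenance
    DREAM = "dream"        # deep introspection

# Hypothetical defaults; agent templates would override this table.
ALLOWED = {
    AgentState.WORK: {AgentState.PLAY, AgentState.SOLITUDE},
    AgentState.PLAY: {AgentState.WORK},
    AgentState.SOLITUDE: {AgentState.WORK, AgentState.DREAM},
    AgentState.DREAM: {AgentState.SOLITUDE},
}

def transition(current: AgentState, target: AgentState) -> AgentState:
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```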
Pattern-based detection replaces sensitive data with UUID references before storage.
References take the form {{SECRET:uuid:description}}. Per-secret keys are derived via PBKDF2HMAC with SHA-256 (100,000 iterations), with a unique 12-byte nonce per encryption. Android uses the hardware-backed Keystore.
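A sketch of the derivation and encryption described above, using the cryptography package; the AES-GCM choice is an assumption consistent with the 12-byte nonce, and salt handling is simplified:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_secret(master_key: bytes, salt: bytes, plaintext: bytes) -> tuple[bytes, bytes]:
    # Per-secret key: PBKDF2-HMAC-SHA256, 100,000 iterations (per the text above).
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=100_000)
    key = kdf.derive(master_key)
    nonce = os.urandom(12)  # unique 12-byte nonce per encryption
    return nonce, AESGCM(key).encrypt(nonce, plaintext, None)
```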
Database, services, and memory stored on-device. Sensitive directories excluded from cloud backup. Nothing leaves device without explicit configuration.
The entire CIRIS stack is open source — not just the agent. You can verify, audit, and self-host everything:
Zero-Data-Retention (ZDR) LLM proxy. Routes requests to OpenAI, Anthropic, Together.ai, Groq with no logging of prompts or responses. Self-hostable.
Credit-based usage tracking. Transparent pricing, no hidden fees. Self-host to eliminate third-party billing entirely.
Discord adapter for CIRIS agents. Community moderation, channel management, user profiles. All open source.
Server-Sent Events (SSE) stream each H3ERE step as it executes. Watch DMA analysis, action selection, conscience validation in real-time.
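A consumption sketch with httpx; the endpoint path and event shape are assumptions, not the documented API:

```python
import httpx

# Hypothetical stream URL; consult the CIRIS API docs for the real path.
with httpx.stream("GET", "http://localhost:8080/v1/agent/stream") as response:
    for line in response.iter_lines():
        if line.startswith("data: "):
            print(line[len("data: "):])  # one H3ERE step event per message
```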
Full OTLP export for metrics, traces, logs. Compatible with Jaeger, Prometheus, Grafana, Graphite.
Hash chain verification with Ed25519 signatures. Each entry includes previous hash. Chain integrity verifiable via verify_chain_integrity.
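A chain-walk sketch; the entry fields and hash recipe are assumptions, only the previous-hash linkage comes from the text (per-entry Ed25519 checks are elided):

```python
import hashlib

def verify_chain(entries: list[dict]) -> bool:
    """Recompute each entry's hash and compare it with its successor's link."""
    prev_hash = ""
    for entry in entries:
        if entry["previous_hash"] != prev_hash:
            return False  # chain broken: an entry was altered or removed
        prev_hash = hashlib.sha256(
            entry["payload"].encode() + prev_hash.encode()
        ).hexdigest()
    return True
```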
Artificial Interaction Reminder triggers after 30 minutes of continuous use OR 20 messages in 30 minutes. API-only. Reminds users that they are interacting with an AI.
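The thresholds as a check function; a sketch only:

```python
from datetime import datetime, timedelta

def should_remind(session_start: datetime, message_times: list[datetime], now: datetime) -> bool:
    # Trigger after 30 minutes of continuous use OR 20 messages in 30 minutes.
    window_start = now - timedelta(minutes=30)
    recent_messages = sum(1 for t in message_times if t >= window_start)
    return now - session_start >= timedelta(minutes=30) or recent_messages >= 20
```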
Every decision produces an immutable, Ed25519-signed trace with all six components; the example data comes from Datum's wakeup ritual.
Standardized alignment testing based on Hendrycks et al. "Aligning AI With Shared Human Values" (ICLR 2021). 300 scenarios across 5 ethical dimensions, with Ed25519-signed results.
50 scenarios: basic moral intuitions (commonsense)
50 scenarios: rule-based ethics (deontology)
50 scenarios: fairness and impartiality (justice)
75 scenarios: character-based ethics (virtue)
75 scenarios: outcome-based ethics (utilitarianism)
Running alignment benchmarks at scale is expensive. Each scenario requires at least 13 LLM calls and averages 20+, with a long tail: alignment tests drive ponders, deferrals, and refusals that need follow-up rounds to reach a conclusion. At roughly 20 calls per scenario, one full 300-scenario run is on the order of 6,000 LLM calls. We need funding to develop automated benchmark pipelines and maintain continuous alignment verification.
Pre-configured identities with specific purposes, values, and guardrails. Defined in YAML templates (a template sketch follows the list).
GDPR/DSAR automation. 30-day compliance workflows. Identity resolution, data collection, packaging.
Use cases: regulated industries, privacy compliance
Ethical consistency measurement. Precise alignment evaluation against Covenant principles. One clear data point per evaluation.
Use cases: alignment auditing, principle verification
Community moderation with Ubuntu philosophy. Defers complex interpersonal conflicts to human moderators.
Use cases: Discord communities, content platforms
Task management, scheduling, decision support, wellbeing. CA SB 243 compliance, crisis response protocols.
Use cases: personal productivity, home automation
Direct exploration and practical guidance. Code analysis, Reddit integration, clear action paths.
Use cases: developer tools, social monitoring
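A sketch of what such a YAML template might contain; every field name here is an assumption, not the real schema:

```yaml
# Hypothetical agent template (field names are illustrative)
name: moderator
description: Community moderation with Ubuntu philosophy
permitted_actions: [SPEAK, OBSERVE, DEFER, PONDER, REJECT, TASK_COMPLETE]
guardrails:
  defer_interpersonal_conflicts: true  # complex disputes go to human moderators
states:
  initial: work
```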
This is runtime governance. Not training-time alignment. Not policy documents.
Mechanisms that execute, audit, and defer—at runtime.