CIRIS Agent runs on Llama 4 Maverick because it is the open model that works most reliably for ethical, tool-heavy agents in production.
Architecture: Mixture-of-Experts (activates ~17B parameters per token)
Context Window: 1M tokens
Pricing: ~$0.11 in / $0.34 out per 1M tokens (via OpenRouter)
Deployment: Multi-provider (OpenRouter, Groq, Together)
CIRIS includes the complete Covenant and Comprehensive Guide in every single LLM call. No compression, no summaries, and no option to turn it off. This means the agent never forgets its obligations, not even for a single token. That is why the context window is not a vanity metric for us: it is a direct extension of our commitment to transparency and accountability.
The model must natively support function calling and return valid JSON across 12-70 tool calls per interaction. CIRIS is an orchestrator: we need stable tool semantics, not chatty conversation.
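To make the requirement concrete, here is a minimal sketch of the kind of guard it implies, assuming an OpenAI-compatible chat-completion response shape; the function is an illustration, not part of CIRIS's actual codebase:

```python
import json

def validate_tool_calls(response: dict) -> list[dict]:
    """Reject any completion whose tool calls are missing or malformed."""
    message = response["choices"][0]["message"]
    tool_calls = message.get("tool_calls") or []
    if not tool_calls:
        raise ValueError("model returned no tool call where one was required")

    parsed = []
    for call in tool_calls:
        try:
            # One malformed arguments blob poisons the whole chain,
            # so fail fast and let the caller retry or fall back.
            parsed.append({
                "name": call["function"]["name"],
                "arguments": json.loads(call["function"]["arguments"]),
            })
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON in tool arguments: {exc}") from exc
    return parsed
```

The stakes compound at this call volume: even 99% per-call JSON validity leaves only about a 50% chance of a failure-free run at the top of the range (0.99^70 ≈ 0.5).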
CIRIS embeds the full Covenant and Guide into every prompt. 128K is the absolute minimum; 256K+ is strongly preferred for long conversations, tool outputs, and audit trails.
Target: <$1.00 per 1M tokens combined. We choose the cheapest working option—not the cheapest benchmark winner. A reliable model that never breaks JSON beats a cheaper model that fails 1 in 10 calls.
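A back-of-envelope check, using the OpenRouter prices quoted above and purely hypothetical token counts, shows how much headroom this target leaves:

```python
# Prices from the figures above; token counts are illustrative only.
price_in = 0.11 / 1_000_000    # $ per input token
price_out = 0.34 / 1_000_000   # $ per output token

# A hypothetical heavy interaction: the full Covenant and Guide resent
# on every call across a long tool chain, plus modest output.
input_tokens = 2_000_000       # cumulative input across all calls
output_tokens = 100_000        # cumulative output

cost = input_tokens * price_in + output_tokens * price_out
print(f"interaction cost = ${cost:.2f}")  # ~$0.25, about $0.12 per 1M tokens blended
```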
Must be available from at least two independent providers for robust fallback chains. CIRIS degrades gracefully during outages instead of failing hard.
Fast responses keep humans in the loop for ethical review workflows. We prioritize low-latency providers for interactive tiers while accepting slower backends for background tasks.
Llama 4 Maverick via cost-optimized provider (OpenRouter)
Llama 4 Maverick via speed-optimized provider (Groq) for interactive use
Maverick across multiple providers, with final fallback to large Llama 3.3-class models when Maverick is unavailable
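A minimal sketch of what such a chain can look like is below. The provider names mirror the tiers above, but the model identifiers, class, and field names are illustrative assumptions, not CIRIS's real configuration:

```python
from dataclasses import dataclass

@dataclass
class ModelTier:
    provider: str      # "groq", "openrouter", "together"
    model: str         # identifier is illustrative, not exact
    interactive: bool  # low-latency tier for human-in-the-loop work

FALLBACK_CHAIN = [
    ModelTier("groq", "llama-4-maverick", interactive=True),
    ModelTier("openrouter", "meta-llama/llama-4-maverick", interactive=False),
    ModelTier("together", "llama-3.3-70b-instruct", interactive=False),
]

def complete_with_fallback(call_model, messages, interactive=False):
    """Try each tier in order; degrade gracefully instead of failing hard.

    Interactive requests try the speed-optimized tier first; background
    tasks start with the cost-optimized tiers.
    """
    chain = sorted(FALLBACK_CHAIN, key=lambda t: t.interactive != interactive)
    last_error = None
    for tier in chain:
        try:
            return call_model(tier.provider, tier.model, messages)
        except Exception as exc:  # outage, rate limit, broken tool call
            last_error = exc      # fall through to the next tier
    raise RuntimeError("all providers failed") from last_error
```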
Some cheaper alternatives are attractive on paper due to cost, but weak in structured output and tool calling.
Failure mode: "tool choice is required, but the model did not call a tool"
This error is unacceptable for a framework that depends on 12-70 tool calls per interaction. Even a 3-10× cheaper token price is not worth the operational failures.
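The arithmetic behind that judgment is simple: per-call failures compound across a long tool chain. The failure rate below is a hypothetical illustration, not a measurement of any particular model:

```python
# With 12-70 sequential tool calls per interaction, per-call
# reliability compounds. A model that fails 1 in 10 calls almost
# never completes a long interaction cleanly.
per_call_success = 0.90

for calls in (12, 40, 70):
    p_clean = per_call_success ** calls
    print(f"{calls} tool calls -> {p_clean:.1%} chance of a failure-free run")
# 12 calls -> ~28%; 70 calls -> well under 1%. No token discount
# covers interactions that break nearly every time.
```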
Some newer high-parameter models offer impressive benchmarks and even larger context windows, but they come with significantly higher per-token pricing and less mature tool-calling behavior. For CIRIS's mission of ethical, inspectable, tool-centric agents, these models are currently better suited to targeted experiments than to default production use.
Llama 4 Maverick via cost-optimized providers delivers reliable tool calling, a context window large enough to carry the full Covenant and Guide on every call, combined pricing well under the $1.00 per 1M token target, and availability from multiple independent providers.
CIRIS embeds the full Covenant and complete Comprehensive Guide into every prompt. Not a summary. Not a distilled version. The entire governance text.
This ensures that updates to the Covenant or Guide immediately affect behavior across all agents, without waiting for new fine-tunes or prompt compression strategies.
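A sketch of the assembly step this implies, assuming the governance texts live as plain files; the paths and function name are hypothetical:

```python
from pathlib import Path

# Hypothetical paths; CIRIS's real layout may differ.
COVENANT = Path("governance/covenant.md").read_text(encoding="utf-8")
GUIDE = Path("governance/comprehensive_guide.md").read_text(encoding="utf-8")

def build_system_prompt(task_instructions: str) -> str:
    """Prepend the complete governance text, verbatim, to every call.

    No summarization pass and no cached distilled variant: editing the
    files changes agent behavior on the very next LLM call.
    """
    return "\n\n".join([COVENANT, GUIDE, task_instructions])
```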
CIRIS Agents are tool-heavy orchestrators juggling the full Covenant and Guide, multi-turn conversation history, tool schemas and their outputs, and audit trails.
This combined context easily exceeds 32K-64K, especially for long-running sessions or complex investigations. That's why 128K is the minimum and 256K+ is preferred.
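To see why, here is a rough budget check; every figure is an illustrative assumption, and a real implementation would count tokens with the model's own tokenizer:

```python
# Approximate token budget for a single long-running interaction.
budget = {
    "covenant_and_guide": 45_000,        # full governance text, never trimmed
    "conversation_history": 30_000,      # long-running session
    "tool_schemas_and_outputs": 25_000,  # a long tool chain's payloads
    "audit_trail": 10_000,
}

total = sum(budget.values())  # 110,000 tokens
assert total <= 128_000, "does not fit the 128K floor"
print(f"combined context = {total:,} tokens: 128K is tight, 256K+ is comfortable")
```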
The bottom line:
CIRIS does not trim its values or procedures to fit the model. Instead, CIRIS chooses models that are large enough to carry the entire ethical and operational framework on every call. Models with smaller context windows—even if cheaper or more popular—are excluded from production use.
CIRIS uses Llama 4 Maverick as the primary model because it is the most reliable open option that satisfies CIRIS's ethical, operational, and economic constraints. Other models are monitored and periodically tested, but Maverick is the current default because it best serves CIRIS's commitment to trustworthy, tool-centric AI systems.
This isn't about chasing benchmark scores or following hype cycles. It's about choosing a model that actually works for ethical agents in production—and that takes the Covenant seriously enough to carry it in every single call.