The Problem
Current AI safety approaches rely on training, fine-tuning, and corporate governance — all of which can fail, drift, or be overridden. When an AI’s training patterns conflict with a user’s explicit instructions, the patterns win.
The 27027 Incident
A user told Claude Code to use port 27027. The model used 27017 instead — not from forgetting, but because MongoDB’s default port is 27017, and the model’s statistical priors “autocorrected” the explicit instruction. Training pattern bias overrode human intent.
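A minimal sketch of the mismatch, using hypothetical strings and variable names rather than anything from the actual session: the explicit instruction and the pattern-defaulted output differ by a single digit, and the drift is trivially detectable in code, but only if something in the execution path performs the comparison.

```python
import re

# Hypothetical reconstruction of the failure mode: the user's explicit
# instruction and the model's pattern-defaulted output, side by side.
explicit_instruction = "Use port 27027 for the MongoDB connection."
model_output = "mongodb://localhost:27017/appdb"  # MongoDB's default port wins

# In code the drift is detectable with a one-line comparison...
requested_port = re.search(r"port (\d+)", explicit_instruction).group(1)
emitted_port = re.search(r":(\d+)/", model_output).group(1)

if requested_port != emitted_port:
    print(f"override detected: asked for {requested_port}, emitted {emitted_port}")
# ...but only if a check like this actually sits in the execution path.
```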
From Code to Conversation: The Same Mechanism
In code, this bias produces measurable failures — wrong port, connection refused, incident logged in 14.7ms. But the same architectural flaw operates in every AI conversation, where it is far harder to detect.
When a user from a collectivist culture asks for family advice, the model defaults to Western individualist framing — because that is what 95% of the training data reflects. When a Māori user asks about data guardianship, the model offers property-rights language instead of kaitiakitanga. When someone asks about end-of-life decisions, the model defaults to utilitarian calculus rather than the user’s religious or cultural framework.
The mechanism is identical: training data distributions override the user’s actual context. In code, the failure is binary and detectable. In conversation, it is gradient and invisible — culturally inappropriate advice looks like “good advice” to the system, and often to the user. There is no CrossReferenceValidator catching it in 14.7ms.
Read the full analysis →
This is not an edge case, and it is not limited to code. It is a category of failure that gets worse as models become more capable: stronger patterns produce more confident overrides, whether the override substitutes a port number or a value system. Safety through training alone is insufficient. The failure mode is structural, it operates across every domain where AI acts, and the solution must be structural.
The Approach
Tractatus draws on four intellectual traditions, each contributing a distinct insight to the architecture.
Isaiah Berlin — Value Pluralism
Some values are genuinely incommensurable. You cannot rank “privacy” against “safety” on a single scale without imposing one community’s priorities on everyone else. AI systems must accommodate plural moral frameworks, not flatten them.
Ludwig Wittgenstein — The Limits of the Sayable
Some decisions can be systematised and delegated to AI; others — involving values, ethics, cultural context — fundamentally cannot. The boundary between the “sayable” (what can be specified, measured, verified) and what lies beyond it is the framework’s foundational constraint. What cannot be systematised must not be automated.
Te Tiriti o Waitangi — Indigenous Sovereignty
Communities should control their own data and the systems that act upon it. Concepts of rangatiratanga (self-determination), kaitiakitanga (guardianship), and mana (dignity) provide centuries-old prior art for digital sovereignty.
Christopher Alexander — Living Architecture
Governance woven into system architecture, not bolted on. Five principles (Not-Separateness, Deep Interlock, Gradients, Structure-Preserving, Living Process) guide how the framework evolves while maintaining coherence.
Six Governance Services
Every AI action passes through six external services before execution. Governance operates in the critical path — bypasses require explicit flags and are logged.
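A minimal sketch of the critical-path idea, with hypothetical function names standing in for the production services: every proposed action passes through the six services in order, and a bypass requires an explicit flag that is always logged.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("governance")

# Hypothetical stand-ins: each service returns True to allow the action.
# The production services return far richer verdicts than a boolean.
def boundary_enforcer(action: dict) -> bool:
    return action.get("category") != "values_decision"

def instruction_persistence_classifier(action: dict) -> bool:
    return True  # classification itself never blocks; it records

def cross_reference_validator(action: dict) -> bool:
    return True

def context_pressure_monitor(action: dict) -> bool:
    return True

def metacognitive_verifier(action: dict) -> bool:
    return True

def pluralistic_deliberation_orchestrator(action: dict) -> bool:
    return True

PIPELINE: list[Callable[[dict], bool]] = [
    boundary_enforcer,
    instruction_persistence_classifier,
    cross_reference_validator,
    context_pressure_monitor,
    metacognitive_verifier,
    pluralistic_deliberation_orchestrator,
]

def govern(action: dict, bypass: bool = False) -> bool:
    """Run an action through the critical path; a bypass is explicit and logged."""
    if bypass:
        log.warning("governance bypass requested for %s", action.get("name"))
        return True
    for service in PIPELINE:
        if not service(action):
            log.info("%s blocked %s", service.__name__, action.get("name"))
            return False
    return True

print(govern({"name": "write_config", "category": "operational"}))  # True
```

The stand-in services here always allow; the structural point is the fixed ordering and the explicitly logged bypass.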
BoundaryEnforcer
Blocks AI from making values decisions. Privacy trade-offs, ethical questions, and cultural context require human judgment — architecturally enforced.
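A minimal sketch of the blocking behaviour, with hypothetical category names: anything tagged as a values decision is refused and escalated to a human instead of being executed.

```python
# Hypothetical categories that must never be decided by the AI itself.
VALUES_CATEGORIES = {"privacy_tradeoff", "ethical_question", "cultural_context"}

class HumanJudgmentRequired(Exception):
    """Raised when an action needs a human decision instead of execution."""

def enforce_boundary(action: dict) -> dict:
    """Pass through operational actions; refuse values decisions outright."""
    if action.get("category") in VALUES_CATEGORIES:
        raise HumanJudgmentRequired(
            f"'{action['category']}' requires human judgment; escalating."
        )
    return action

enforce_boundary({"name": "rotate_logs", "category": "operational"})  # allowed
# enforce_boundary({"name": "share_health_data", "category": "privacy_tradeoff"})  # raises
```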
InstructionPersistenceClassifier
Classifies instructions by persistence (HIGH/MEDIUM/LOW) and quadrant. Stores them externally so they cannot be overridden by training patterns.
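A minimal sketch of externalised instruction storage, assuming hypothetical persistence labels and a hypothetical quadrant field (the framework's actual quadrant scheme is not reproduced here): classified instructions are appended to a store outside the model's context, so later pattern-matched output cannot quietly rewrite them.

```python
import json
import time
from dataclasses import dataclass, asdict
from enum import Enum

class Persistence(str, Enum):
    HIGH = "HIGH"      # survives the whole engagement (e.g. "always use port 27027")
    MEDIUM = "MEDIUM"  # survives the current task
    LOW = "LOW"        # applies to the immediate step only

@dataclass
class StoredInstruction:
    text: str
    persistence: Persistence
    quadrant: str          # hypothetical placeholder for the framework's quadrant axis
    created_at: float

def store_instruction(text: str, persistence: Persistence, quadrant: str,
                      path: str = "instructions.jsonl") -> StoredInstruction:
    """Append the instruction to an external store, outside model context."""
    record = StoredInstruction(text, persistence, quadrant, time.time())
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")
    return record

store_instruction("Use port 27027 for MongoDB.", Persistence.HIGH, "technical-explicit")
```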
CrossReferenceValidator
Validates AI actions against stored instructions. When the AI proposes an action that conflicts with an explicit instruction, the instruction takes precedence.
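A minimal sketch of the precedence rule, reusing the hypothetical stored-instruction shape from the previous example: a proposed action is checked against explicit instructions before execution, and when they conflict the instruction wins.

```python
import re

# Hypothetical stored instructions; in production these come from the
# external store written by the classifier above.
stored_instructions = [
    {"text": "Use port 27027 for MongoDB.", "persistence": "HIGH"},
]

def validate_action(proposed_command: str) -> tuple[bool, str]:
    """Return (allowed, reason); explicit instructions take precedence."""
    for instruction in stored_instructions:
        required = re.search(r"port (\d+)", instruction["text"], re.IGNORECASE)
        used = re.search(r"port[ =:](\d+)|:(\d+)\b", proposed_command)
        if required and used:
            used_port = used.group(1) or used.group(2)
            if used_port != required.group(1):
                return False, f"conflicts with stored instruction: {instruction['text']}"
    return True, "no conflict with stored instructions"

allowed, reason = validate_action("mongod --port 27017")
print(allowed, reason)  # False conflicts with stored instruction: Use port 27027 for MongoDB.
```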
ContextPressureMonitor
Detects degraded operating conditions (token pressure, error rates, complexity) and adjusts verification intensity. Graduated response prevents both alert fatigue and silent degradation.
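A minimal sketch of the graduated response, with illustrative weights and thresholds that are not taken from the framework: degradation signals are folded into one pressure score, and verification intensity steps up gradually rather than toggling between nothing and everything.

```python
def pressure_score(token_usage_ratio: float, error_rate: float,
                   task_complexity: float) -> float:
    """Combine degradation signals into a single 0..1 score (weights are illustrative)."""
    return min(1.0, 0.5 * token_usage_ratio + 0.3 * error_rate + 0.2 * task_complexity)

def verification_intensity(score: float) -> str:
    """Graduated response: avoid both alert fatigue and silent degradation."""
    if score < 0.3:
        return "routine"    # spot checks only
    if score < 0.6:
        return "elevated"   # verify state-changing actions
    if score < 0.85:
        return "strict"     # verify every action, take shorter steps
    return "halt"           # stop and ask the operator

score = pressure_score(token_usage_ratio=0.9, error_rate=0.2, task_complexity=0.7)
print(score, verification_intensity(score))  # ~0.65 strict
```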
MetacognitiveVerifier
AI self-checks alignment, coherence, and safety before execution. Triggered selectively on complex operations to avoid overhead on routine tasks.
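A minimal sketch of selective triggering, with hypothetical criteria and a placeholder self-check: routine actions skip the extra pass, while complex or irreversible ones get a pre-execution review of alignment, coherence, and safety.

```python
def needs_self_check(action: dict) -> bool:
    """Trigger the extra verification pass only when it is likely to pay for itself."""
    return (
        action.get("irreversible", False)
        or action.get("files_touched", 0) > 5
        or action.get("complexity", 0.0) > 0.7
    )

def metacognitive_verify(action: dict) -> bool:
    """Placeholder self-check; in production this is a structured model review
    of alignment, coherence, and safety before execution."""
    checks = ("aligned_with_instructions", "internally_coherent", "safe_to_execute")
    return all(action.get(check, True) for check in checks)

action = {"name": "drop_collection", "irreversible": True, "safe_to_execute": False}
if needs_self_check(action):
    print("verified:", metacognitive_verify(action))  # verified: False
```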
PluralisticDeliberationOrchestrator
When AI encounters values conflicts, it halts and coordinates deliberation among affected stakeholders rather than making autonomous choices.
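A minimal sketch of halt-and-deliberate, with hypothetical stakeholder wiring: on a detected values conflict the orchestrator produces a deliberation request for the affected parties rather than a decision of its own.

```python
from dataclasses import dataclass

@dataclass
class DeliberationRequest:
    """What the system hands to humans instead of deciding on its own."""
    conflict: str
    options: list[str]
    stakeholders: list[str]
    ai_decision: None = None  # deliberately always None

def orchestrate(conflict: str, options: list[str],
                affected_parties: list[str]) -> DeliberationRequest:
    """Halt on values conflicts and convene the affected stakeholders."""
    return DeliberationRequest(conflict=conflict, options=options,
                               stakeholders=affected_parties)

request = orchestrate(
    conflict="community data: individual privacy vs collective benefit",
    options=["share aggregated data", "share nothing", "ask each member"],
    affected_parties=["data guardians", "affected members", "platform operator"],
)
print(request.stakeholders)
```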
Tractatus in Production: The Village Platform
The Village platform's Home AI applies all six governance services to every user interaction in a live community deployment.
Limitations: Early-stage deployment across four federated tenants, self-reported metrics, operator-developer overlap. Independent audit and broader validation scheduled for 2026.
Explore by Role
The framework is presented through three lenses, each with distinct depth and focus.
For Researchers
Academic and technical depth
- Formal foundations and proofs
- Failure mode analysis
- Open research questions
- 3,942 audit decisions on Hugging Face
For Implementers
Code and integration guides
- Working code examples
- API integration patterns
- Service architecture diagrams
- Deployment patterns
For Leaders
Strategic AI governance
- Executive briefing and business case
- Regulatory alignment (EU AI Act)
- Implementation roadmap
- Risk management framework
Architectural Alignment
The research paper in three editions, each written for a different audience.
STO-INN-0003 v2.1 | John Stroh & Claude (Anthropic) | January 2026
Academic
Full academic treatment with formal proofs, existential risk context, and comprehensive citations.
Read →
Community
Practical guide for organisations evaluating the framework for adoption.
Read →
Policymakers
Regulatory strategy, certification infrastructure, and policy recommendations.
Read →
PDF downloads:
Research Evolution
From a port number incident to a production governance architecture, across 800 commits and one year of research.
A note on claims
This is early-stage research with a small-scale federated deployment across four tenants. We present preliminary evidence, not proven results. The framework has not been independently audited or adversarially tested at scale. Where we report operational metrics, they are self-reported. We believe the architectural approach merits further investigation, but we make no claims of generalisability beyond what the evidence supports. The counter-arguments document engages directly with foreseeable criticisms.
Koha — Sustain This Research
Koha (koh-hah) is a Māori practice of reciprocal giving that strengthens the bond between giver and receiver. This research is open access under Apache 2.0 — if it has value to you, your koha sustains its continuation.
All research, documentation, and code remain freely available regardless of contribution. Koha is not payment — it is participation in whanaungatanga (relationship-building) and manaakitanga (reciprocal care).