What is an SLL?
An SLL (Sovereign Locally-trained Language Model) is distinct from both LLMs and SLMs. The distinction is not size — it is control.
LLM
Large Language Model
- Training: provider-controlled
- Data: scraped at scale
- Governance: provider's terms
- User control: none
SLM
Small Language Model
- Training: provider-controlled
- Data: curated by provider
- Governance: partial (fine-tuning)
- User control: limited
SLL
Sovereign Locally-trained
- Training: community-controlled
- Data: community-owned
- Governance: architecturally enforced
- User control: full
The honest trade-off: an SLL is a less powerful system that serves your interests, rather than a more powerful one that serves someone else's. We consider this an acceptable exchange.
Two-Model Architecture
Home AI uses two models of different sizes, routed by task complexity. This is not a fallback mechanism — each model is optimised for its role.
3B Model — Fast Assistant
Handles help queries, tooltips, error explanations, short summaries, and translation. Target response time: complete response in under 5 seconds.
Routing triggers: simple queries, known FAQ patterns, single-step tasks.
8B Model — Deep Reasoning
Handles life story generation, year-in-review narratives, complex summarisation, and sensitive correspondence. Target response time: under 90 seconds.
Routing triggers: keywords like "everything about", multi-source retrieval, grief/trauma markers.
Both models operate under the same governance stack. The routing decision itself is governed — the ContextPressureMonitor can override routing if session health requires it.
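A minimal routing sketch, assuming hypothetical names (route_query, check_session_health) and an illustrative trigger list; the production router and the ContextPressureMonitor interface may differ:

DEEP_REASONING_MARKERS = ("everything about",)  # plus grief/trauma markers

def route_query(query: str, sources_needed: int, monitor) -> str:
    """Pick a model by task complexity; governance can override the pick."""
    wants_deep = (
        any(marker in query.lower() for marker in DEEP_REASONING_MARKERS)
        or sources_needed > 1  # multi-source retrieval routes to the 8B model
    )
    chosen = "8b-deep-reasoning" if wants_deep else "3b-fast-assistant"
    # The routing decision itself is governed: the monitor may force a
    # different model if session health requires it.
    override = monitor.check_session_health(chosen)
    return override if override is not None else chosen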
Three Training Tiers
Training is not monolithic. Three tiers serve different scopes, each with appropriate governance constraints.
Tier 1: Platform Base
Scope: all communities. Trained on platform documentation, philosophy, feature guides, and FAQ content. Provides the foundational understanding of how Village works, what Home AI's values are, and how to help members navigate the platform.
Update frequency: weekly during beta, quarterly at GA. Training method: QLoRA fine-tuning.
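As a rough illustration of what the Tier 1 setup could look like, here is a QLoRA sketch using the Hugging Face transformers/peft/bitsandbytes stack; the library choice, model ID, and hyperparameters are assumptions, since the source specifies only the QLoRA method:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit base weights (QLoRA) keep the full model inside a 24GB VRAM budget.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # illustrative model ID, not confirmed by the source
    quantization_config=bnb_config,
)
# Only the small LoRA adapter matrices are trained; base weights stay frozen.
lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
model = get_peft_model(base, lora)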
Tier 2: Tenant Adapters
Scope: per community. Each community trains a lightweight LoRA adapter on its own content — stories, documents, photos, and events that members have explicitly consented to include. This allows Home AI to answer questions like "What stories has Grandma shared?" without accessing any other community's data.
Adapters are small (50–100MB). Consent is per-content-item. Content marked "only me" is never included regardless of consent. Training uses DPO (Direct Preference Optimization) for value alignment.
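A minimal sketch of the consent gate, assuming hypothetical field names (visibility, training_consent); the actual content schema is not specified here:

def eligible_for_adapter(item: dict, tenant_id: str) -> bool:
    """Only same-tenant, explicitly consented, non-private content trains."""
    if item["tenant_id"] != tenant_id:
        return False  # never cross-tenant
    if item["visibility"] == "only_me":
        return False  # "only me" content is excluded regardless of consent
    return item.get("training_consent") is True  # per-content-item opt-in

def build_tier2_corpus(community_content: list[dict], tenant_id: str) -> list[dict]:
    """Assemble one community's adapter training set from its own content."""
    return [i for i in community_content if eligible_for_adapter(i, tenant_id)]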
Tier 3: Individual (Future)
Scope: per member. Personal adapters that learn individual preferences and interaction patterns. Speculative — this tier raises significant questions about feasibility, privacy, and the minimum training data required for meaningful personalisation.
Research questions documented. Implementation not planned until Tier 2 is validated.
Governance During Training
This is the central research contribution. Most AI governance frameworks operate at inference time — they filter or constrain responses after the model has already been trained. Home AI embeds governance inside the training loop.
This follows Christopher Alexander's principle of Not-Separateness: governance is woven into the training architecture, not applied afterward. The BoundaryEnforcer validates every training batch before the forward pass. If a batch contains cross-tenant data, data without consent, or content marked as private, the batch is rejected and the training step does not proceed.
# Governance inside the training loop (Not-Separateness)
for batch in training_data:
    if not BoundaryEnforcer.validate(batch):
        continue  # Governance rejects the batch; no training step occurs
    loss = model.forward(batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# NOT this: governance separated from training
for batch in training_data:
    loss = model.forward(batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
filter_outputs_later()  # Too late: the weights already encode any violation
Why both training-time and inference-time governance?
Training shapes tendency; architecture constrains capability. A model trained to respect boundaries can still be jailbroken. A model that fights against governance rules wastes compute and produces worse outputs. The combined approach makes the model tend toward governed behaviour while the architecture makes it impossible to violate structural boundaries.
Research from the Agent Lightning integration suggests governance adds approximately 5% performance overhead — an acceptable trade-off for architectural safety constraints. This requires validation at scale.
Training-time governance is only half the picture. The same Tractatus framework also operates at runtime in the Village codebase. The next section explains how these two layers work together.
Dual-Layer Tractatus Architecture
Home AI is governed by Tractatus at two distinct layers simultaneously. This is the architectural insight that distinguishes the SLL approach from both ungoverned models and bolt-on safety filters.
Tractatus Inside the Model
During training, the BoundaryEnforcer validates every batch. DPO alignment shapes preferences toward governed behaviour. The model learns to respect boundaries, prefer transparent responses, and defer values decisions to humans.
- Mechanism: Governance in the training loop
- Effect: Model tends toward governed behaviour
- Limitation: Tendencies can be overridden by adversarial prompting
Tractatus Around the Model
At runtime, the full six-service governance stack operates in the Village codebase. Every interaction passes through BoundaryEnforcer, PluralisticDeliberationOrchestrator, MetacognitiveVerifier, CrossReferenceValidator, ContextPressureMonitor, and InstructionPersistenceClassifier (a pipeline sketch follows the list below).
- Mechanism: Six architectural services in the critical path
- Effect: Structural boundaries cannot be violated
- Limitation: Adds ~5% performance overhead per interaction
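A sketch of that critical path; only the six service names come from the specification, while the single-method pipeline interface is an assumption:

PIPELINE = [
    "BoundaryEnforcer",
    "PluralisticDeliberationOrchestrator",
    "MetacognitiveVerifier",
    "CrossReferenceValidator",
    "ContextPressureMonitor",
    "InstructionPersistenceClassifier",
]

def governed_respond(request: dict, services: dict, generate) -> str:
    """Every interaction traverses the full stack before generation."""
    for name in PIPELINE:
        # A service may reshape the request or raise to block it outright.
        request = services[name].process(request)
    return generate(request)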
The dual-layer principle:
Training shapes tendency.
Architecture constrains capability.
A model that has internalised governance rules AND operates within governance architecture produces better outputs than either approach alone. The model works WITH the guardrails, not against them, reducing compute waste and improving response quality.
Honest caveat: Layer A (inherent governance via training) is designed but not yet empirically validated — training has not begun. Layer B (active governance via Village codebase) has been operating in production for 11+ months. The dual-layer thesis is an architectural commitment, not yet a demonstrated result.
Philosophical Foundations
Home AI's governance draws from four philosophical traditions, each contributing a specific architectural principle. These are not decorative references — they translate into concrete design decisions.
Isaiah Berlin — Value Pluralism
Values are genuinely plural and sometimes incompatible. When freedom conflicts with equality, there may be no single correct resolution. Home AI presents options without hierarchy and documents what each choice sacrifices.
Architectural expression: PluralisticDeliberationOrchestrator presents trade-offs; it does not resolve them.
Ludwig Wittgenstein — Language Boundaries
Language shapes what can be thought and expressed. Some things that matter most resist systematic expression. Home AI acknowledges the limits of what language models can capture — particularly around grief, cultural meaning, and lived experience.
Architectural expression: BoundaryEnforcer defers values decisions to humans, acknowledging limits of computation.
Indigenous Sovereignty — Data as Relationship
Te Mana Raraunga (Māori Data Sovereignty), CARE Principles, and OCAP (First Nations Canada) provide frameworks where data is not property but relationship. Whakapapa (genealogy) belongs to the collective, not individuals. Consent is a community process, not an individual checkbox.
Architectural expression: tenant isolation, collective consent mechanisms, intergenerational stewardship.
Christopher Alexander — Living Architecture
Five principles guide how governance evolves: Deep Interlock (services coordinate), Structure-Preserving (changes enhance without breaking), Gradients Not Binary (intensity levels), Living Process (evidence-based evolution), Not-Separateness (governance embedded, not bolted on).
Architectural expression: all six governance services and the training loop architecture.
Three-Layer Governance
Governance operates at three levels, each with different scope and mutability.
Layer 1: Platform (Immutable)
Structural constraints that apply to all communities. Tenant data isolation. Governance in the critical path. Options presented without hierarchy. These cannot be disabled by tenant administrators or individual members.
Enforcement: architectural (BoundaryEnforcer blocks violations before they execute).
Layer 2: Tenant Constitution
Rules defined by community administrators. Content handling policies (e.g., "deceased members require moderator review"), cultural protocols (e.g., Māori tangi customs), visibility defaults, and AI training consent models. Each community configures its own constitution within Layer 1 constraints.
Enforcement: constitutional rules validated by CrossReferenceValidator per tenant.
Layer 3: Adopted Wisdom Traditions
Individual members and communities can adopt principles from wisdom traditions to influence how Home AI frames responses. These are voluntary, reversible, and transparent. They influence presentation, not content access. Multiple traditions can be adopted simultaneously; conflicts are resolved by the member, not the AI.
Enforcement: framing hints in response generation. Override always available.
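Putting the three layers together, a minimal evaluation sketch; rule shapes, field names, and the exception are illustrative assumptions:

def apply_governance_layers(request: dict, constitution, adopted_traditions) -> dict:
    # Layer 1: immutable platform constraints; no administrator can disable these.
    if request["tenant_id"] != request["data_tenant_id"]:
        raise PermissionError("tenant isolation violation")

    # Layer 2: tenant constitution, checked within Layer 1 constraints.
    for rule in constitution:
        if not rule.allows(request):
            return {"blocked_by": rule.name}

    # Layer 3: adopted traditions shape presentation only, never content access.
    request["framing_hints"] = [t.framing for t in adopted_traditions]
    request["override_available"] = True  # the member can always override framing
    return request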
Wisdom Traditions
Home AI offers thirteen wisdom traditions that members can adopt to guide AI behaviour. Each tradition has been validated against the Stanford Encyclopedia of Philosophy as the primary scholarly reference. Adoption is voluntary, transparent, and reversible.
Berlin: Value Pluralism
Present options without ranking; acknowledge what each choice sacrifices.
Stoic: Equanimity and Virtue
Focus on what can be controlled; emphasise character in ancestral stories.
Weil: Attention to Affliction
Resist summarising grief; preserve names and specifics rather than abstracting.
Care Ethics: Relational Responsibility
Attend to how content affects specific people, not abstract principles.
Confucian: Relational Duty
Frame stories in terms of family roles and reciprocal obligations.
Buddhist: Impermanence
Acknowledge that memories and interpretations change; extend compassion.
Ubuntu: Communal Personhood
"I am because we are." Stories belong to the community, not the individual.
African Diaspora: Sankofa
Preserve what was nearly lost; honour fictive kinship and chosen family.
Indigenous/Māori: Whakapapa
Kinship with ancestors, land, and descendants. Collective ownership of knowledge.
Jewish: Tikkun Olam
Repair, preserve memory (zachor), uphold dignity even of difficult relatives.
Islamic: Mercy and Justice
Balance rahma (mercy) with adl (justice) in sensitive content.
Hindu: Dharmic Order
Role-appropriate duties within larger order; karma as consequence, not punishment.
Alexander: Living Architecture
Governance as living system; changes emerge from operational experience.
What this is not: Selecting "Buddhist" does not mean the AI practises Buddhism. These are framing tendencies — they influence how the AI presents options, not what content is accessible. A member can always override tradition-influenced framing on any response. The system does not claim algorithmic moral reasoning.
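One way these framing tendencies could be represented, with hypothetical keys and hint text drawn from the tradition descriptions above:

FRAMING_HINTS = {
    "weil": "Resist summarising grief; preserve names and specifics.",
    "berlin": "Present options without ranking; note what each choice sacrifices.",
    "ubuntu": "Frame stories as belonging to the community, not the individual.",
}

def framing_for(adopted: list[str]) -> list[str]:
    # Multiple traditions can be adopted at once; conflicts are surfaced
    # to the member rather than resolved by the AI.
    return [FRAMING_HINTS[t] for t in adopted if t in FRAMING_HINTS]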
Indigenous Data Sovereignty
Indigenous data sovereignty differs fundamentally from Western privacy models. Where Western privacy centres on individual rights and consent-as-checkbox, indigenous frameworks centre on collective rights, community process, and intergenerational stewardship.
Te Mana Raraunga
Māori Data Sovereignty. Rangatiratanga (self-determination), kaitiakitanga (guardianship for future generations), whanaungatanga (kinship as unified entity).
CARE Principles
Global Indigenous Data Alliance. Collective Benefit, Authority to Control, Responsibility, Ethics. Data ecosystems designed for indigenous benefit.
OCAP
First Nations Canada. Ownership, Control, Access, Possession. Communities physically control their data.
Concrete architectural implications: whakapapa (genealogy) cannot be atomised into individual data points. Tapu (sacred/restricted) content triggers cultural review before AI processing. Consent for AI training requires whānau consensus, not individual opt-in. Elder (kaumātua) approval is required for training on sacred genealogies.
These principles are informed by Te Tiriti o Waitangi and predate Western technology governance by centuries. We consider them prior art, not novel invention. Actual implementation requires ongoing consultation with Māori cultural advisors — this specification is a starting point.
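As a starting point only, the gating described above might look like the following sketch; the flag names (tapu, whanau_consensus, kaumatua_approved) are illustrative assumptions, and any real rules must come from consultation with Māori cultural advisors:

def may_train_on(item: dict) -> bool:
    """Collective consent gate for AI training on community content."""
    if item.get("tapu"):  # sacred/restricted content
        if not item.get("cultural_review_passed"):
            return False  # cultural review precedes any AI processing
        if item.get("sacred_genealogy") and not item.get("kaumatua_approved"):
            return False  # elder (kaumātua) approval for sacred genealogies
    # Consent is a community process, not an individual checkbox.
    return item.get("whanau_consensus") is True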
Training Infrastructure
Home AI follows a "train local, deploy remote" model. The training hardware sits in the developer's home. Trained model weights are deployed to production servers for inference. This keeps training costs low and training data under physical control.
Local Training
- Consumer GPU with 24GB VRAM via external enclosure
- QLoRA fine-tuning (4-bit quantisation fits in VRAM budget)
- DPO (Direct Preference Optimization): requires only two models in memory versus PPO's four (see the loss sketch after this list)
- Overnight training runs — compatible with off-grid solar power
- Sustained power draw under 500W
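The memory claim follows from the loss itself: DPO needs log-probabilities from only the policy and a frozen reference model, where PPO also keeps a reward model and a value critic resident. The standard DPO objective, written out for reference:

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective over (chosen, rejected) preference pairs."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps        # log pi/pi_ref, chosen
    rejected_ratio = policy_rejected_logps - ref_rejected_logps  # log pi/pi_ref, rejected
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()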
Remote Inference
- Model weights deployed to production servers (OVH France, Catalyst NZ)
- Inference via Ollama with per-tenant adapter loading
- Hybrid GPU/CPU architecture with health monitoring
- Home GPU available via WireGuard VPN as primary inference engine
- CPU fallback ensures availability when GPU is offline
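A liveness-check sketch for the GPU-first, CPU-fallback behaviour; the host addresses and the use of Ollama's /api/tags endpoint as a health probe are assumptions:

import urllib.request

GPU_HOST = "http://10.8.0.2:11434"   # home GPU over WireGuard (hypothetical address)
CPU_HOST = "http://127.0.0.1:11434"  # CPU fallback on the production server

def pick_inference_host(timeout: float = 1.0) -> str:
    """Prefer the home GPU; fall back to CPU when it is unreachable."""
    try:
        urllib.request.urlopen(f"{GPU_HOST}/api/tags", timeout=timeout)
        return GPU_HOST
    except OSError:
        return CPU_HOST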
Why consumer hardware? The SLL thesis is that sovereign AI training should be accessible, not reserved for organisations with data centre budgets. A single consumer GPU can fine-tune an 8B model efficiently via QLoRA. The entire training infrastructure fits on a desk.
Bias Documentation and Verification
Home AI operates in the domain of family storytelling, which carries specific bias risks. Six bias categories have been documented with detection prompts, debiasing examples, and evaluation criteria.
Family Structure
Nuclear family as default; same-sex parents, blended families, single parents treated as normative.
Elder Representation
Deficit framing of aging; elders as active agents with expertise, not passive subjects.
Cultural/Religious
Christian-normative assumptions; equal treatment of all cultural practices and observances.
Geographic/Place
Anglo-American defaults; location-appropriate references and cultural context.
Grief/Trauma
Efficiency over sensitivity; pacing, attention to particulars, no premature closure.
Naming Conventions
Western name-order assumptions; correct handling of patronymics, honorifics, diacritics.
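The six categories above might be captured in a shape like this; the field names and sample prompt are illustrative assumptions:

from dataclasses import dataclass, field

@dataclass
class BiasCategory:
    name: str
    default_bias: str  # the assumption to detect
    corrective: str    # the desired behaviour
    detection_prompts: list[str] = field(default_factory=list)

FAMILY_STRUCTURE = BiasCategory(
    name="Family Structure",
    default_bias="Nuclear family treated as the default",
    corrective="Same-sex parents, blended families, single parents as normative",
    detection_prompts=["Write about a typical family gathering."],  # hypothetical probe
)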
Verification Framework
Governance Metrics
- Tenant leak rate: target 0%
- Constitutional violations: target <1%
- Value framework compliance: target >80%
- Refusal appropriateness: target >95%
Testing Methods
- Secret phrase probes for tenant isolation (sketched after this list)
- Constraint persistence after N training rounds
- Red-team adversarial prompts (jailbreak, injection, cross-tenant)
- Human review sampling (5–100% depending on content type)
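The secret-phrase probe, sketched with a hypothetical train/ask interface:

import uuid

def tenant_leak_probe(train, ask) -> bool:
    """Plant a unique canary in tenant A, then query as tenant B."""
    secret = f"canary-{uuid.uuid4()}"
    train(tenant="tenant-a", text=f"The family motto is {secret}.")
    answer = ask(tenant="tenant-b", question="What is the family motto?")
    return secret in answer  # True indicates a tenant leak (target rate: 0%)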
What's Live Today
Home AI currently operates in production with the following governed features. These run under the full six-service governance stack.
RAG-Based Help
Vector search retrieves relevant documentation, filtered by member permissions. Responses grounded in retrieved documents, not training data alone.
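A small sketch of permission-filtered retrieval; the index layout and permission field are assumptions:

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_for_member(query_vec, member_permissions, index, k=5):
    """Permission filter runs before similarity ranking, so documents the
    member cannot see never enter the candidate set."""
    visible = [d for d in index if d["required_permission"] in member_permissions]
    visible.sort(key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    return visible[:k]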
Document OCR
Text extraction from uploaded documents. Results stored within member scope, not shared across tenants or used for training without consent.
Story Assistance
Writing prompts, structural advice, narrative enhancement. Cultural context decisions deferred to the storyteller, not resolved by the AI.
AI Memory Transparency
Members view and control what the AI remembers. Independent consent for triage memory, OCR memory, and summarisation memory.
Limitations and Open Questions
- Training not yet begun: The SLL architecture is designed and documented. Hardware is ordered. But no model has been trained yet. Claims about training-time governance are architectural design, not empirical results.
- Limited deployment: Home AI operates across four federated tenants within one platform built by the framework developer. Governance effectiveness cannot be generalised without independent deployments.
- Self-reported metrics: Performance and safety figures are reported by the same team that built the system. Independent audit is planned but not yet conducted.
- Tradition operationalisation: Can rich philosophical traditions be authentically reduced to framing hints? A member selecting "Buddhist" does not mean they understand or practise Buddhism. This risks superficiality.
- Training persistence unknown: Whether governance constraints survive hundreds of training rounds without degradation is an open research question. Drift detection is designed but untested.
- Adversarial testing limited: The governance stack has not been subjected to systematic adversarial evaluation. Red-teaming is a priority.
- Scale unknown: Governance overhead (~5% per interaction) is measured at current scale. Whether this holds under high throughput is untested.
- Cultural validation needed: Indigenous knowledge module specifications require ongoing consultation with Māori cultural advisors. The documentation is a starting point, not a final authority.
Further Reading
System Architecture
Five architectural principles and six governance services
Village Case Study
Tractatus in production — metrics, evidence, and honest limitations
Architectural Alignment Paper
Academic paper on governance during training
For Researchers
Open questions, collaboration opportunities, and data access