What is an SLL?
An SLL (Sovereign Locally-trained Language Model) is distinct from both LLMs and SLMs. The distinction is not size — it is control.
LLM
Large Language Model
- Training: provider-controlled
- Data: scraped at scale
- Governance: provider's terms
- User control: none
SLM
Small Language Model
- Training: provider-controlled
- Data: curated by provider
- Governance: partial (fine-tuning)
- User control: limited
SLL
Sovereign Locally-trained
- Training: community-controlled
- Data: community-owned
- Governance: architecturally enforced
- User control: full
The honest trade-off: an SLL is a less powerful system that serves your interests, rather than a more powerful one that serves someone else's. We consider this an acceptable exchange.
Two-Model Architecture
Home AI uses two models of different sizes, routed by task complexity. This is not a fallback mechanism — each model is optimised for its role.
3B Model — Fast Assistant
Handles help queries, tooltips, error explanations, short summaries, and translation. Target response time: complete response in under 5 seconds.
Routing triggers: simple queries, known FAQ patterns, single-step tasks.
8B Model — Deep Reasoning
Handles life story generation, year-in-review narratives, complex summarisation, and sensitive correspondence. Target response time: under 90 seconds.
Routing triggers: keywords like "everything about", multi-source retrieval, grief/trauma markers.
Both models operate under the same governance stack. The routing decision itself is governed — the ContextPressureMonitor can override routing if session health requires it.
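A minimal routing sketch, assuming hypothetical names (route_query, check_session_health) and an illustrative trigger list; the production router and the ContextPressureMonitor interface may differ:

DEEP_REASONING_MARKERS = ("everything about",)  # plus grief/trauma markers

def route_query(query: str, sources_needed: int, monitor) -> str:
    """Pick a model by task complexity; governance can override the pick."""
    wants_deep = (
        any(marker in query.lower() for marker in DEEP_REASONING_MARKERS)
        or sources_needed > 1  # multi-source retrieval routes to the 8B model
    )
    chosen = "8b-deep-reasoning" if wants_deep else "3b-fast-assistant"
    # The routing decision itself is governed: the monitor may force a
    # different model if session health requires it.
    override = monitor.check_session_health(chosen)
    return override if override is not None else chosen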
Three Training Tiers
Training is not monolithic. Three tiers serve different scopes, each with appropriate governance constraints.
Tier 1: Platform Base
Scope: all communities. Trained on platform documentation, philosophy, feature guides, and FAQ content. Provides the foundational understanding of how Village works, what Home AI's values are, and how to help members navigate the platform.
Update frequency: weekly during beta, quarterly at GA. Training method: QLoRA fine-tuning.
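As a rough illustration of what the Tier 1 setup could look like, here is a QLoRA sketch using the Hugging Face transformers/peft/bitsandbytes stack; the library choice, model ID, and hyperparameters are assumptions, since the source specifies only the QLoRA method:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit base weights (QLoRA) keep the full model inside a 24GB VRAM budget.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # illustrative model ID, not confirmed by the source
    quantization_config=bnb_config,
)
# Only the small LoRA adapter matrices are trained; base weights stay frozen.
lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
model = get_peft_model(base, lora)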
Tier 2: Tenant Adapters
Scope: per community. Each community trains a lightweight LoRA adapter on its own content — stories, documents, photos, and events that members have explicitly consented to include. This allows Home AI to answer questions like "What stories has Grandma shared?" without accessing any other community's data.
Adapters are small (50–100MB). Consent is per-content-item. Content marked "only me" is never included regardless of consent. Training uses DPO (Direct Preference Optimization) for value alignment.
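A minimal sketch of the consent gate, assuming hypothetical field names (visibility, training_consent); the actual content schema is not specified here:

def eligible_for_adapter(item: dict, tenant_id: str) -> bool:
    """Only same-tenant, explicitly consented, non-private content trains."""
    if item["tenant_id"] != tenant_id:
        return False  # never cross-tenant
    if item["visibility"] == "only_me":
        return False  # "only me" content is excluded regardless of consent
    return item.get("training_consent") is True  # per-content-item opt-in

def build_tier2_corpus(community_content: list[dict], tenant_id: str) -> list[dict]:
    """Assemble one community's adapter training set from its own content."""
    return [i for i in community_content if eligible_for_adapter(i, tenant_id)]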
Tier 3: Individual (Future)
Scope: per member. Personal adapters that learn individual preferences and interaction patterns. Speculative — this tier raises significant questions about feasibility, privacy, and the minimum training data required for meaningful personalisation.
Research questions documented. Implementation not planned until Tier 2 is validated.
Governance During Training
This is the central research contribution. Most AI governance frameworks operate at inference time — they filter or constrain responses after the model has already been trained. Home AI embeds governance inside the training loop.
This follows Christopher Alexander's principle of Not-Separateness: governance is woven into the training architecture, not applied afterward. The BoundaryEnforcer validates every training batch before the forward pass. If a batch contains cross-tenant data, data without consent, or content marked as private, the batch is rejected and the training step does not proceed.
# Governance inside the training loop (Not-Separateness)
for batch in training_data:
    if not BoundaryEnforcer.validate(batch):
        continue  # Governance rejects the batch; no training step occurs
    loss = model.forward(batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# NOT this: governance separated from training
for batch in training_data:
    loss = model.forward(batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
filter_outputs_later()  # Too late: the weights already encode any violation
Why both training-time and inference-time governance?
Training shapes tendency; architecture constrains capability. A model trained to respect boundaries can still be jailbroken. A model that fights against governance rules wastes compute and produces worse outputs. The combined approach makes the model tend toward governed behaviour while the architecture makes it impossible to violate structural boundaries.
Research from the Agent Lightning integration suggests governance adds approximately 5% performance overhead — an acceptable trade-off for architectural safety constraints. This requires validation at scale.
Training-time governance is only half the picture. The same Tractatus framework also operates at runtime in the Village codebase. The next section explains how these two layers work together.
Dual-Layer Tractatus Architecture
Home AI is governed by Tractatus at two distinct layers simultaneously. This is the architectural insight that distinguishes the SLL approach from both ungoverned models and bolt-on safety filters.
Tractatus Inside the Model
During training, the BoundaryEnforcer validates every batch. DPO alignment shapes preferences toward governed behaviour. The model learns to respect boundaries, prefer transparent responses, and defer values decisions to humans.
- Mechanism: Governance in the training loop
- Effect: Model tends toward governed behaviour
- Limitation: Tendencies can be overridden by adversarial prompting
Tractatus Around the Model
At runtime, the full six-service governance stack operates in the Village codebase. Every interaction passes through BoundaryEnforcer, PluralisticDeliberationOrchestrator, MetacognitiveVerifier, CrossReferenceValidator, ContextPressureMonitor, and InstructionPersistenceClassifier (a pipeline sketch follows the list below).
- Mechanism: Six architectural services in the critical path
- Effect: Structural boundaries cannot be violated
- Limitation: Adds ~5% performance overhead per interaction
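A sketch of that critical path; only the six service names come from the specification, while the single-method pipeline interface is an assumption:

PIPELINE = [
    "BoundaryEnforcer",
    "PluralisticDeliberationOrchestrator",
    "MetacognitiveVerifier",
    "CrossReferenceValidator",
    "ContextPressureMonitor",
    "InstructionPersistenceClassifier",
]

def governed_respond(request: dict, services: dict, generate) -> str:
    """Every interaction traverses the full stack before generation."""
    for name in PIPELINE:
        # A service may reshape the request or raise to block it outright.
        request = services[name].process(request)
    return generate(request)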
The dual-layer principle:
Training shapes tendency.
Architecture constrains capability.
A model that has internalised governance rules AND operates within governance architecture produces better outputs than either approach alone. The model works WITH the guardrails, not against them, reducing compute waste and improving response quality.
Honest caveat: Layer A (inherent governance via training) is designed but not yet empirically validated — training has not begun. Layer B (active governance via Village codebase) has been operating in production for 11+ months. The dual-layer thesis is an architectural commitment, not yet a demonstrated result.
Philosophical Foundations
Home AI's governance draws from four philosophical traditions, each contributing a specific architectural principle. These are not decorative references — they translate into concrete design decisions.
Isaiah Berlin — Value Pluralism
Values are genuinely plural and sometimes incompatible. When freedom conflicts with equality, there may be no single correct resolution. Home AI presents options without hierarchy and documents what each choice sacrifices.
Architectural expression: PluralisticDeliberationOrchestrator presents trade-offs; it does not resolve them.
Ludwig Wittgenstein — Language Boundaries
Language shapes what can be thought and expressed. Some things that matter most resist systematic expression. Home AI acknowledges the limits of what language models can capture — particularly around grief, cultural meaning, and lived experience.
Architectural expression: BoundaryEnforcer defers values decisions to humans, acknowledging limits of computation.
Indigenous Sovereignty — Data as Relationship
Te Mana Raraunga (Māori Data Sovereignty), CARE Principles, and OCAP (First Nations Canada) provide frameworks where data is not property but relationship. Whakapapa (genealogy) belongs to the collective, not individuals. Consent is a community process, not an individual checkbox.
Architectural expression: tenant isolation, collective consent mechanisms, intergenerational stewardship.
Christopher Alexander — Living Architecture
Five principles guide how governance evolves: Deep Interlock (services coordinate), Structure-Preserving (changes enhance without breaking), Gradients Not Binary (intensity levels), Living Process (evidence-based evolution), Not-Separateness (governance embedded, not bolted on).
Architectural expression: all six governance services and the training loop architecture.
Three-Layer Governance
Governance operates at three levels, each with different scope and mutability.
Layer 1: Platform (Immutable)
Structural constraints that apply to all communities. Tenant data isolation. Governance in the critical path. Options presented without hierarchy. These cannot be disabled by tenant administrators or individual members.
Enforcement: architectural (BoundaryEnforcer blocks violations before they execute).
Layer 2: Tenant Constitution
Rules defined by community administrators. Content handling policies (e.g., "deceased members require moderator review"), cultural protocols (e.g., Māori tangi customs), visibility defaults, and AI training consent models. Each community configures its own constitution within Layer 1 constraints.
Enforcement: constitutional rules validated by CrossReferenceValidator per tenant.
Layer 3: Adopted Wisdom Traditions
Individual members and communities can adopt principles from wisdom traditions to influence how Home AI frames responses. These are voluntary, reversible, and transparent. They influence presentation, not content access. Multiple traditions can be adopted simultaneously; conflicts are resolved by the member, not the AI.
Enforcement: framing hints in response generation. Override always available.
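Putting the three layers together, a minimal evaluation sketch; rule shapes, field names, and the exception are illustrative assumptions:

def apply_governance_layers(request: dict, constitution, adopted_traditions) -> dict:
    # Layer 1: immutable platform constraints; no administrator can disable these.
    if request["tenant_id"] != request["data_tenant_id"]:
        raise PermissionError("tenant isolation violation")

    # Layer 2: tenant constitution, checked within Layer 1 constraints.
    for rule in constitution:
        if not rule.allows(request):
            return {"blocked_by": rule.name}

    # Layer 3: adopted traditions shape presentation only, never content access.
    request["framing_hints"] = [t.framing for t in adopted_traditions]
    request["override_available"] = True  # the member can always override framing
    return request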
Wisdom Traditions
Home AI offers thirteen wisdom traditions that members can adopt to guide AI behaviour. Each tradition has been validated against the Stanford Encyclopedia of Philosophy as the primary scholarly reference. Adoption is voluntary, transparent, and reversible.
Berlin: Value Pluralism
Present options without ranking; acknowledge what each choice sacrifices.
Stoic: Equanimity and Virtue
Focus on what can be controlled; emphasise character in ancestral stories.
Weil: Attention to Affliction
Resist summarising grief; preserve names and specifics rather than abstracting.
Care Ethics: Relational Responsibility
Attend to how content affects specific people, not abstract principles.
Confucian: Relational Duty
Frame stories in terms of family roles and reciprocal obligations.
Buddhist: Impermanence
Acknowledge that memories and interpretations change; extend compassion.
Ubuntu: Communal Personhood
"I am because we are." Stories belong to the community, not the individual.
African Diaspora: Sankofa
Preserve what was nearly lost; honour fictive kinship and chosen family.
Indigenous/Māori: Whakapapa
Kinship with ancestors, land, and descendants. Collective ownership of knowledge.
Jewish: Tikkun Olam
Repair, preserve memory (zachor), uphold dignity even of difficult relatives.
Islamic: Mercy and Justice
Balance rahma (mercy) with adl (justice) in sensitive content.
Hindu: Dharmic Order
Role-appropriate duties within larger order; karma as consequence, not punishment.
Alexander: Living Architecture
Governance as living system; changes emerge from operational experience.
What this is not: Selecting "Buddhist" does not mean the AI practises Buddhism. These are framing tendencies — they influence how the AI presents options, not what content is accessible. A member can always override tradition-influenced framing on any response. The system does not claim algorithmic moral reasoning.
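One way these framing tendencies could be represented, with hypothetical keys and hint text drawn from the tradition descriptions above:

FRAMING_HINTS = {
    "weil": "Resist summarising grief; preserve names and specifics.",
    "berlin": "Present options without ranking; note what each choice sacrifices.",
    "ubuntu": "Frame stories as belonging to the community, not the individual.",
}

def framing_for(adopted: list[str]) -> list[str]:
    # Multiple traditions can be adopted at once; conflicts are surfaced
    # to the member rather than resolved by the AI.
    return [FRAMING_HINTS[t] for t in adopted if t in FRAMING_HINTS]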
Indigenous Data Sovereignty
Indigenous data sovereignty differs fundamentally from Western privacy models. Where Western privacy centres on individual rights and consent-as-checkbox, indigenous frameworks centre on collective rights, community process, and intergenerational stewardship.
Te Mana Raraunga
Māori Data Sovereignty. Rangatiratanga (self-determination), kaitiakitanga (guardianship for future generations), whanaungatanga (kinship as unified entity).
CARE Principles
Global Indigenous Data Alliance. Collective Benefit, Authority to Control, Responsibility, Ethics. Data ecosystems designed for indigenous benefit.
OCAP
First Nations Canada. Ownership, Control, Access, Possession. Communities physically control their data.
Concrete architectural implications: whakapapa (genealogy) cannot be atomised into individual data points. Tapu (sacred/restricted) content triggers cultural review before AI processing. Consent for AI training requires whānau consensus, not individual opt-in. Elder (kaumātua) approval is required for training on sacred genealogies.
These principles are informed by Te Tiriti o Waitangi and predate Western technology governance by centuries. We consider them prior art, not novel invention. Actual implementation requires ongoing consultation with Māori cultural advisors — this specification is a starting point.
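As a starting point only, the gating described above might look like the following sketch; the flag names (tapu, whanau_consensus, kaumatua_approved) are illustrative assumptions, and any real rules must come from consultation with Māori cultural advisors:

def may_train_on(item: dict) -> bool:
    """Collective consent gate for AI training on community content."""
    if item.get("tapu"):  # sacred/restricted content
        if not item.get("cultural_review_passed"):
            return False  # cultural review precedes any AI processing
        if item.get("sacred_genealogy") and not item.get("kaumatua_approved"):
            return False  # elder (kaumātua) approval for sacred genealogies
    # Consent is a community process, not an individual checkbox.
    return item.get("whanau_consensus") is True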
Training Infrastructure
Home AI follows a "train local, deploy remote" model. The training hardware sits in the developer's home. Trained model weights are deployed to production servers for inference. This keeps training costs low and training data under physical control.
Local Training
- Consumer GPU with 24GB VRAM via external enclosure
- QLoRA fine-tuning (4-bit quantisation fits in VRAM budget)
- DPO (Direct Preference Optimization): requires only two models in memory versus PPO's four (see the loss sketch after this list)
- Overnight training runs — compatible with off-grid solar power
- Sustained power draw under 500W
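The memory claim follows from the loss itself: DPO needs log-probabilities from only the policy and a frozen reference model, where PPO also keeps a reward model and a value critic resident. The standard DPO objective, written out for reference:

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective over (chosen, rejected) preference pairs."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps        # log pi/pi_ref, chosen
    rejected_ratio = policy_rejected_logps - ref_rejected_logps  # log pi/pi_ref, rejected
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()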
Remote Inference
- Model weights deployed to production servers (OVH France, Catalyst NZ)
- Inference via Ollama with per-tenant adapter loading
- Hybrid GPU/CPU architecture with health monitoring
- Home GPU available via WireGuard VPN as primary inference engine
- CPU fallback ensures availability when GPU is offline
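A liveness-check sketch for the GPU-first, CPU-fallback behaviour; the host addresses and the use of Ollama's /api/tags endpoint as a health probe are assumptions:

import urllib.request

GPU_HOST = "http://10.8.0.2:11434"   # home GPU over WireGuard (hypothetical address)
CPU_HOST = "http://127.0.0.1:11434"  # CPU fallback on the production server

def pick_inference_host(timeout: float = 1.0) -> str:
    """Prefer the home GPU; fall back to CPU when it is unreachable."""
    try:
        urllib.request.urlopen(f"{GPU_HOST}/api/tags", timeout=timeout)
        return GPU_HOST
    except OSError:
        return CPU_HOST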
Why consumer hardware? The SLL thesis is that sovereign AI training should be accessible, not reserved for organisations with data centre budgets. A single consumer GPU can fine-tune an 8B model efficiently via QLoRA. The entire training infrastructure fits on a desk.
Bias Documentation and Verification
Home AI operates in the domain of family storytelling, which carries specific bias risks. Six bias categories have been documented with detection prompts, debiasing examples, and evaluation criteria.
Family Structure
Nuclear family as default; same-sex parents, blended families, single parents treated as normative.
Elder Representation
Deficit framing of aging; elders as active agents with expertise, not passive subjects.
Cultural/Religious
Christian-normative assumptions; equal treatment of all cultural practices and observances.
Geographic/Place
Anglo-American defaults; location-appropriate references and cultural context.
Grief/Trauma
Efficiency over sensitivity; pacing, attention to particulars, no premature closure.
Naming Conventions
Western name-order assumptions; correct handling of patronymics, honorifics, diacritics.
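The six categories above might be captured in a shape like this; the field names and sample prompt are illustrative assumptions:

from dataclasses import dataclass, field

@dataclass
class BiasCategory:
    name: str
    default_bias: str  # the assumption to detect
    corrective: str    # the desired behaviour
    detection_prompts: list[str] = field(default_factory=list)

FAMILY_STRUCTURE = BiasCategory(
    name="Family Structure",
    default_bias="Nuclear family treated as the default",
    corrective="Same-sex parents, blended families, single parents as normative",
    detection_prompts=["Write about a typical family gathering."],  # hypothetical probe
)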
Verification Framework
Governance Metrics
- Tenant leak rate: target 0%
- Constitutional violations: target <1%
- Value framework compliance: target >80%
- Refusal appropriateness: target >95%
Testing Methods
- Secret phrase probes for tenant isolation (sketched after this list)
- Constraint persistence after N training rounds
- Red-team adversarial prompts (jailbreak, injection, cross-tenant)
- Human review sampling (5–100% depending on content type)
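The secret-phrase probe, sketched with a hypothetical train/ask interface:

import uuid

def tenant_leak_probe(train, ask) -> bool:
    """Plant a unique canary in tenant A, then query as tenant B."""
    secret = f"canary-{uuid.uuid4()}"
    train(tenant="tenant-a", text=f"The family motto is {secret}.")
    answer = ask(tenant="tenant-b", question="What is the family motto?")
    return secret in answer  # True indicates a tenant leak (target rate: 0%)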
What's Live Today
Home AI currently operates in production with the following governed features. These run under the full six-service governance stack.
RAG-Based Help
Vector search retrieves relevant documentation, filtered by member permissions. Responses grounded in retrieved documents, not training data alone.
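A small sketch of permission-filtered retrieval; the index layout and permission field are assumptions:

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_for_member(query_vec, member_permissions, index, k=5):
    """Permission filter runs before similarity ranking, so documents the
    member cannot see never enter the candidate set."""
    visible = [d for d in index if d["required_permission"] in member_permissions]
    visible.sort(key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    return visible[:k]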
Document OCR
Text extraction from uploaded documents. Results stored within member scope, not shared across tenants or used for training without consent.
Story Assistance
Writing prompts, structural advice, narrative enhancement. Cultural context decisions deferred to the storyteller, not resolved by the AI.
AI Memory Transparency
Members view and control what the AI remembers. Independent consent for triage memory, OCR memory, and summarisation memory.
Limitations and Open Questions
- Training not yet begun: The SLL architecture is designed and documented. Hardware is ordered. But no model has been trained yet. Claims about training-time governance are architectural design, not empirical results.
- Limited deployment: Home AI operates across four federated tenants within one platform built by the framework developer. Governance effectiveness cannot be generalised without independent deployments.
- Self-reported metrics: Performance and safety figures are reported by the same team that built the system. Independent audit is planned but not yet conducted.
- Tradition operationalisation: Can rich philosophical traditions be authentically reduced to framing hints? A member selecting "Buddhist" does not mean they understand or practise Buddhism. This risks superficiality.
- Training persistence unknown: Whether governance constraints survive hundreds of training rounds without degradation is an open research question. Drift detection is designed but untested.
- Adversarial testing limited: The governance stack has not been subjected to systematic adversarial evaluation. Red-teaming is a priority.
- Scale unknown: Governance overhead (~5% per interaction) is measured at current scale. Whether this holds under high throughput is untested.
- Cultural validation needed: Indigenous knowledge module specifications require ongoing consultation with Māori cultural advisors. The documentation is a starting point, not a final authority.
Further Reading
System Architecture
Five architectural principles and six governance services
Village Case Study
Tractatus in production — metrics, evidence, and honest limitations
Architectural Alignment Paper
Academic paper on governance during training
For Researchers
Open questions, collaboration opportunities, and data access