aka On-Premise Agent Stack, Data-Residency Agent Architecture, Sovereign AI
Run the entire agent stack (model weights, inference, tool layer, vector stores, logs) inside a jurisdictional and operational boundary the operator controls, so no request, prompt, or output crosses into a third-party API.
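A minimal sketch of what "inside the boundary" means for the inference hop, assuming an OpenAI-compatible server (such as vLLM or llama.cpp) hosted on operator infrastructure; the hostname, model name, and key below are placeholders, not part of the pattern's prescription.

```python
from openai import OpenAI

# All inference traffic stays on operator-controlled infrastructure:
# the client is pointed at an in-boundary endpoint, never a vendor API.
client = OpenAI(
    base_url="http://inference.boundary.internal:8000/v1",  # hypothetical in-boundary host
    api_key="local-only",  # dummy value; no third-party credential is involved
)

response = client.chat.completions.create(
    model="open-weights-model",  # placeholder for a permissively licensed model
    messages=[{"role": "user", "content": "Summarise case file 4711."}],
)
print(response.choices[0].message.content)
```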
Public administrations, regulated industries (banking, defense, health), and critical-infrastructure operators, wherever data egress to a foreign-cloud LLM provider is forbidden by policy or law (e.g. EU AI Act high-risk systems, BSI C5, NIS2, sectoral data-protection regimes).
Hosted-API agents leak prompts, tool inputs, and outputs to a third party; for regulated workloads this is a non-starter regardless of contractual assurances.
Choose models with permissive weights or commercial sovereign licensing. Run inference on-prem or in a jurisdictionally controlled cloud region with the operator holding the keys. Place all auxiliary services (vector store, tool gateway, audit log, evaluation harness) inside the same boundary. Document the boundary as part of the system's compliance posture (model card, data-flow diagram). Treat the boundary as load-bearing: any new tool or model call has to be reviewed for boundary impact before merge.
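One way to make the boundary load-bearing in code rather than only in review: route every tool-layer call through a gateway that refuses any destination not on an explicit in-boundary allowlist. A minimal sketch; the host names and the requests-based transport are illustrative assumptions, not the operator's actual inventory.

```python
from urllib.parse import urlparse

import requests

# Hosts that sit inside the controlled boundary; anything else is egress.
# Entries are placeholders for the operator's real service inventory.
IN_BOUNDARY_HOSTS = {
    "inference.boundary.internal",
    "vectorstore.boundary.internal",
    "tools.boundary.internal",
    "audit.boundary.internal",
}


class BoundaryViolation(RuntimeError):
    """Raised when a tool call would leave the controlled boundary."""


def guarded_request(method: str, url: str, **kwargs) -> requests.Response:
    """Proxy every tool-layer HTTP call through a boundary check."""
    host = urlparse(url).hostname or ""
    if host not in IN_BOUNDARY_HOSTS:
        # Fail closed and leave an audit trail instead of silently egressing.
        raise BoundaryViolation(f"blocked egress to {host!r} ({url})")
    return requests.request(method, url, timeout=30, **kwargs)


# Allowed in-boundary lookup vs. rejected third-party call:
guarded_request("GET", "http://vectorstore.boundary.internal/collections")
# guarded_request("POST", "https://api.example-vendor.com/v1/chat")  # raises BoundaryViolation
```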
Boundary { Inference + Tools + Memory + Logs + Eval } -- only public artefacts (UI responses) leave.
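The structure line above can also be kept as a reviewable artefact: a small declarative manifest of in-boundary components plus a check that only the public UI surface may emit anything. A sketch under the assumption that such a manifest lives in the repo and runs as a CI step; component names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    public_egress: bool  # may this component emit data beyond the boundary?


# Declarative mirror of: Boundary { Inference + Tools + Memory + Logs + Eval }
BOUNDARY = [
    Component("inference", public_egress=False),
    Component("tool-gateway", public_egress=False),
    Component("vector-store", public_egress=False),
    Component("audit-log", public_egress=False),
    Component("eval-harness", public_egress=False),
    Component("ui", public_egress=True),  # only public artefacts (UI responses) leave
]


def check_boundary(components: list[Component]) -> None:
    """Fail the boundary-impact review if anything other than the UI can egress."""
    offenders = [c.name for c in components if c.public_egress and c.name != "ui"]
    assert not offenders, f"unexpected egress from {offenders}"


check_boundary(BOUNDARY)
```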
complements → session-isolation
complements → model-card
uses → lineage-tracking
complements → secrets-handling
complements → constitutional-charter
complements → open-weight-cascade