aka On-Premise Agent Stack, Data-Residency Agent Architecture, Sovereign AI
Run the entire agent stack (model weights, inference, tool layer, vector stores, logs) inside a jurisdictional and operational boundary the operator controls, so no request, prompt, or output crosses into a third-party API.
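A minimal sketch of what "inside the boundary" means for the inference hop, assuming an OpenAI-compatible server (such as vLLM or llama.cpp) hosted on operator infrastructure; the hostname, model name, and key below are placeholders, not part of the pattern's prescription.

```python
from openai import OpenAI

# All inference traffic stays on operator-controlled infrastructure:
# the client is pointed at an in-boundary endpoint, never a vendor API.
client = OpenAI(
    base_url="http://inference.boundary.internal:8000/v1",  # hypothetical in-boundary host
    api_key="local-only",  # dummy value; no third-party credential is involved
)

response = client.chat.completions.create(
    model="open-weights-model",  # placeholder for a permissively licensed model
    messages=[{"role": "user", "content": "Summarise case file 4711."}],
)
print(response.choices[0].message.content)
```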
Public administrations, regulated industries (banking, defense, health), and critical-infrastructure operators, wherever data egress to a foreign-cloud LLM provider is forbidden by policy or law (e.g. EU AI Act high-risk systems, BSI C5, NIS2, sectoral data-protection regimes).
Hosted-API agents leak prompts, tool inputs, and outputs to a third party; for regulated workloads this is a non-starter regardless of contractual assurances.
Choose models with permissive weights or commercial sovereign licensing. Run inference on-prem or in a jurisdictionally controlled cloud region with the operator holding the keys. Place all auxiliary services (vector store, tool gateway, audit log, evaluation harness) inside the same boundary. Document the boundary as part of the system's compliance posture (model card, data-flow diagram). Treat the boundary as load-bearing: any new tool or model call has to be reviewed for boundary impact before merge.
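One way to make the boundary load-bearing in code rather than only in review: route every tool-layer call through a gateway that refuses any destination not on an explicit in-boundary allowlist. A minimal sketch; the host names and the requests-based transport are illustrative assumptions, not the operator's actual inventory.

```python
from urllib.parse import urlparse

import requests

# Hosts that sit inside the controlled boundary; anything else is egress.
# Entries are placeholders for the operator's real service inventory.
IN_BOUNDARY_HOSTS = {
    "inference.boundary.internal",
    "vectorstore.boundary.internal",
    "tools.boundary.internal",
    "audit.boundary.internal",
}


class BoundaryViolation(RuntimeError):
    """Raised when a tool call would leave the controlled boundary."""


def guarded_request(method: str, url: str, **kwargs) -> requests.Response:
    """Proxy every tool-layer HTTP call through a boundary check."""
    host = urlparse(url).hostname or ""
    if host not in IN_BOUNDARY_HOSTS:
        # Fail closed and leave an audit trail instead of silently egressing.
        raise BoundaryViolation(f"blocked egress to {host!r} ({url})")
    return requests.request(method, url, timeout=30, **kwargs)


# Allowed in-boundary lookup vs. rejected third-party call:
guarded_request("GET", "http://vectorstore.boundary.internal/collections")
# guarded_request("POST", "https://api.example-vendor.com/v1/chat")  # raises BoundaryViolation
```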
Boundary { Inference + Tools + Memory + Logs + Eval } -- only public artefacts (UI responses) leave.
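The structure line above can also be kept as a reviewable artefact: a small declarative manifest of in-boundary components plus a check that only the public UI surface may emit anything. A sketch under the assumption that such a manifest lives in the repo and runs as a CI step; component names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    public_egress: bool  # may this component emit data beyond the boundary?


# Declarative mirror of: Boundary { Inference + Tools + Memory + Logs + Eval }
BOUNDARY = [
    Component("inference", public_egress=False),
    Component("tool-gateway", public_egress=False),
    Component("vector-store", public_egress=False),
    Component("audit-log", public_egress=False),
    Component("eval-harness", public_egress=False),
    Component("ui", public_egress=True),  # only public artefacts (UI responses) leave
]


def check_boundary(components: list[Component]) -> None:
    """Fail the boundary-impact review if anything other than the UI can egress."""
    offenders = [c.name for c in components if c.public_egress and c.name != "ui"]
    assert not offenders, f"unexpected egress from {offenders}"


check_boundary(BOUNDARY)
```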
complements → session-isolation
complements → model-card
uses → lineage-tracking
complements → secrets-handling
complements → constitutional-charter
complements → open-weight-cascade