Context Window Packing

aka Context Compression, Token Budget Management, Fit in Context, Token Cost Reduction

category: memory · status: mature

Intent

Choose, on every turn, what goes into the context window under a fixed token budget.

Context

Agents whose candidate context (system prompt + history + retrieved chunks + tools + state) can exceed the model's context window.

Problem

Naive concatenation overflows the window; naive truncation drops critical state.

Forces

Solution

Define a packing policy. Reserve N tokens for the system prompt, tool definitions, and the response. Allocate the remainder across history (compressed), retrieved chunks (top-k after reranking), and current state. When a component exceeds its share, apply eviction (drop the oldest items), summarisation (compress them), or selection (keep the most relevant items). Audit the token count before every call.
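A minimal sketch of such a policy, under stated assumptions: `count_tokens` is a whitespace stand-in for the model's real tokenizer, and `Budget`, `PackedContext`, `pack`, and `chunk_share` are hypothetical names introduced only for illustration. It shows selection over pre-reranked chunks and eviction over history; summarisation would need a model call and is left out.

```python
from dataclasses import dataclass
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: whitespace split stands in for the model's real tokenizer.
    return len(text.split())


@dataclass
class Budget:
    window: int    # model context window, in tokens
    reserved: int  # N: system prompt + tool definitions + response


@dataclass
class PackedContext:
    state: str
    history: List[str]
    chunks: List[str]
    tokens: int    # tokens used out of (window - reserved)


def pack(budget: Budget, state: str, history: List[str],
         chunks: List[str], chunk_share: float = 0.5) -> PackedContext:
    """Pack state, history (oldest first) and chunks (best first) into the budget."""
    available = budget.window - budget.reserved
    state_cost = count_tokens(state)
    if state_cost > available:
        raise ValueError("current state alone exceeds the available budget")
    remaining = available - state_cost  # current state is always kept

    # Selection: take reranked chunks in order until their share is used up.
    chunk_budget = int(remaining * chunk_share)
    kept_chunks: List[str] = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > chunk_budget:
            break
        kept_chunks.append(chunk)
        used += cost
    remaining -= used

    # Eviction: walk history newest to oldest; everything older than the
    # first turn that does not fit is dropped.
    kept_history: List[str] = []
    for turn in reversed(history):
        cost = count_tokens(turn)
        if cost > remaining:
            break
        kept_history.append(turn)
        remaining -= cost
    kept_history.reverse()  # restore chronological order

    return PackedContext(state, kept_history, kept_chunks, available - remaining)
```

Giving chunks a fixed share and letting history take whatever is left is one possible split; a real policy might weight the components differently or summarise evicted turns instead of dropping them.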

Constrains

Total tokens passed to the model must not exceed the window minus the reserved response budget.
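The constraint can be checked as a simple inequality just before each call: the sum of all component token counts must be at most the window minus the response reserve. The sketch below is one way to write that audit; `audit` and `ContextOverflow` are hypothetical names, and the per-component counts are assumed to come from the model's own tokenizer.

```python
from typing import Dict


class ContextOverflow(Exception):
    """Raised when the assembled prompt would not leave room for the response."""


def audit(parts: Dict[str, int], window: int, response_reserve: int) -> int:
    """Return the unused headroom in tokens, or raise if the limit is exceeded.

    parts maps a component name (e.g. "system", "history") to its token count.
    """
    limit = window - response_reserve
    total = sum(parts.values())
    if total > limit:
        largest = max(parts, key=lambda name: parts[name])
        raise ContextOverflow(
            f"{total} tokens exceeds limit {limit}; largest component: {largest}"
        )
    return limit - total
```

Reporting the largest component in the error gives the packing policy a first candidate for eviction or summarisation.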

Consequences

Benefits

Liabilities

Known Uses

Related Patterns

References