aka Context Compression, Token Budget Management, Fit in Context, Token Cost Reduction
Choose what fits in the context window each turn given a fixed token budget.
Agents whose available context (system prompt + history + retrieved chunks + tools + state) exceeds the model's window.
Naive concatenation overflows; naive truncation loses critical state.
Define a packing policy. Reserve N tokens for system + tools + response. Allocate the rest across history (compressed), retrieved chunks (top-k after rerank), and current state. Use eviction (drop oldest), summarisation (compress), or selection (relevance-rank) policies. Audit token counts before each call.
uses → episodic-summariesalternative-to → memgpt-pagingcomplements → dynamic-scaffoldingused-by → todo-list-driven-agentused-by → reasoning-trace-carry-forward