KV Caches again
Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache… reminded me how important KV caches are. A few months ago I was thinking a lot about them, about the potential of bridging the inference infrastructure with a query engine that could optimize plan execution around KV cache reuse. Then, with agents, I assumed that optimizing the KV cache wouldn't matter as much, but I was very wrong. Anthropic and OpenAI are doing a lot of work to optimize the cache for their workloads, and agents are in many respects a workload of their own. This paper takes a smart approach: it optimizes the KV cache around a specific characteristic of the agentic workload, the back and forth with tool calling. Context engineering at the systems layer!
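
The rough idea, as I read it, sketched below with hypothetical names (this is my illustration of the general technique, not the paper's actual scheduler or API): when an agent turn ends in a tool call, the follow-up request will share the same prefix, so keep that session's KV blocks resident for roughly the expected tool latency instead of evicting them right away.

```python
# Minimal sketch of tool-call-aware KV cache retention. All names and the
# latency numbers are hypothetical; the real system's policies will differ.
import time
from dataclasses import dataclass


@dataclass
class CacheEntry:
    session_id: str
    num_blocks: int           # GPU KV blocks held by this session's prefix
    expires_at: float = 0.0   # pin deadline; 0 means not pinned


class ToolAwareKVCacheManager:
    def __init__(self, tool_latency_estimates: dict[str, float]):
        # e.g. {"web_search": 2.0, "code_exec": 8.0} seconds (assumed/profiled)
        self.tool_latency_estimates = tool_latency_estimates
        self.entries: dict[str, CacheEntry] = {}

    def on_turn_finished(self, session_id: str, num_blocks: int, tool_name: str | None):
        """A decode just finished. If it ended in a tool call, pin the
        session's KV blocks until the tool is expected to return."""
        entry = self.entries.setdefault(session_id, CacheEntry(session_id, num_blocks))
        entry.num_blocks = num_blocks
        if tool_name is not None:
            ttl = self.tool_latency_estimates.get(tool_name, 5.0)
            entry.expires_at = time.monotonic() + ttl
        else:
            entry.expires_at = 0.0  # no pending tool call: evictable now

    def evictable(self) -> list[CacheEntry]:
        """Entries the scheduler may reclaim under memory pressure: anything
        whose pin deadline has passed (or that was never pinned)."""
        now = time.monotonic()
        return [e for e in self.entries.values() if e.expires_at <= now]

    def on_turn_resumed(self, session_id: str) -> bool:
        """Tool result came back; report whether the prefix cache is still warm."""
        entry = self.entries.get(session_id)
        if entry is None:
            return False
        entry.expires_at = 0.0  # will be re-pinned when this turn finishes
        return True
```

The interesting design question is the eviction policy under memory pressure: a naive scheduler treats the idle session as free memory the moment the tool call starts, while a tool-aware one trades a short period of pinned memory for a guaranteed prefix hit when the agent comes back.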