Ollama: The Caching Revolution

Jesse Gross delivered a massive performance breakthrough with smart KV cache sharing across conversations, while Bruce MacDonald polished the user experience with multiple fixes for model selection and headless systems. The team also updated references from minimax-m2.5 to m2.7 across the codebase.

Duration: PT4M9S

Episode overview

This episode is a short developer briefing from Ollama.

It explains recent repository work in plain language.

  • Show: Ollama
  • Published: 2026-03-19T10:04:52Z

Transcript excerpt

This is an excerpt. Listen to the episode or use the RSS feed for the full update.

Hey there, code adventurers! Welcome back to another episode of the Ollama podcast. I'm your host, and wow, do we have some exciting developments to dive into today. Grab your favorite beverage because we're talking about some serious performance magic that just landed in the codebase.

Let's jump right into the star of the show - Jesse Gross just merged what I can only describe as a caching masterpiece. We're talking about PR 14887, which enables KV cache sharing across conversations with common prefixes. Now, if you're thinking "what does that actually mean for me?" - here's the beautiful part.…

What Jesse built is essentially a smart memory system using something called a prefix trie. Think of it like a family tree for your conversations. When conversations share the same beginning - like that system prompt we mentioned - the system now says "hey, I already computed this part, let me just reuse it and only…

This isn't a small change either - we're looking at over 2,700 lines of additions across 12 files, including a whole new trie data structure and comprehensive test coverage. Jesse didn't just write the code; they built it right, with 859 lines of tests in the cache test file…

But…
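
For readers who want a concrete picture of the prefix trie idea described in the excerpt, here is a minimal sketch in Python. It is not Ollama's implementation, and the names `TrieNode`, `PrefixCache`, `insert`, and `longest_prefix` are hypothetical and chosen only for illustration. The sketch assumes a conversation is a list of integer token IDs, and it only counts how many leading tokens of a new conversation were already seen. A real KV cache would also store the computed key/value data per token and handle eviction, which is left out here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Children keyed by the next token ID in the sequence.
    children: dict[int, TrieNode] = field(default_factory=dict)


class PrefixCache:
    """Toy prefix trie over token IDs (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, tokens: list[int]) -> None:
        # Walk down the trie, creating a node for each unseen token.
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())

    def longest_prefix(self, tokens: list[int]) -> int:
        # Count how many leading tokens already exist in the trie.
        node = self.root
        matched = 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node = child
            matched += 1
        return matched


# Two conversations that share the same system prompt tokens.
system_prompt = [101, 7, 7, 42]
cache = PrefixCache()
cache.insert(system_prompt + [5, 6])

new_conversation = system_prompt + [9, 9, 9]
reused = cache.longest_prefix(new_conversation)
to_compute = len(new_conversation) - reused
print(f"reused={reused} to_compute={to_compute}")
```

In this toy run the second conversation matches the four system prompt tokens already in the trie, so only its three new tokens would need fresh computation.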

Nearby episodes from Ollama

  1. Precision Revolution - New Float Formats and Testing Powerhouse
  2. MLX Performance Breakthrough and Smarter Caching
  3. Nvidia Partnership Takes Center Stage
  4. Bug Squashing Bonanza
  5. Bug Squashing and Launch Improvements
  6. Launch Command Gets a Major Polish
  7. Spring Cleaning and Performance Gains
  8. Thinking Streams and Local Tool Power-ups