Ollama: Simplifying the Sampling Story
Patrick Devine merged a significant refactor that streamlines how Ollama's MLX runner handles text generation sampling. The change replaces a complex chain of sampling interfaces with a single, stateful sampler that's much easier to work with and maintain.
Duration: PT4M5S
Episode overview
This episode is a short developer briefing from Ollama.
It explains recent repository work in plain language.
- Show: Ollama
- Published: 2026-03-08T10:03:36Z
- Audio duration: PT4M5S
Transcript excerpt
This excerpt keeps the crawler page concise. Listen to the episode or use the RSS feed for the full update.
Hey everyone, and welcome back to another episode of the Ollama podcast! I'm your host, and wow, do we have an interesting story about code simplification today. You know those moments when someone looks at a complex system and says "there's got to be a better way"? Well, that's exactly what happened yesterday, and…
Let's dive right into our main story. Patrick Devine just merged a fantastic refactor that tackles something called the sampling system in our MLX runner. Now, if you're not familiar with sampling in AI models, think of it like this: when an AI generates text, it doesn't just pick the most obvious next word every…
The old system used what's called a "chain of interfaces". Imagine having separate little workers, each handling one aspect of sampling. One worker handled TopP sampling, another handled TopK, another managed penalties for repetition, and so on. While this worked, it created a lot of complexity. You had to…
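To make the "chain of interfaces" idea concrete, here is a minimal Go sketch of what such a design could look like. It is an illustration only: the `Stage` interface, the `TopK` and `TopP` types, and `RunChain` are hypothetical names, not the actual Ollama code, and the sketch works on plain probabilities without renormalizing between stages.

```go
package main

import (
	"fmt"
	"sort"
)

// Token pairs a vocabulary ID with its probability.
type Token struct {
	ID   int
	Prob float64
}

// Stage is a hypothetical one-job interface, standing in for the
// separate sampling interfaces the episode describes.
type Stage interface {
	Apply(tokens []Token) []Token
}

// TopK keeps only the K most probable tokens.
type TopK struct{ K int }

func (s TopK) Apply(tokens []Token) []Token {
	sorted := sortedByProb(tokens)
	if s.K > 0 && s.K < len(sorted) {
		sorted = sorted[:s.K]
	}
	return sorted
}

// TopP keeps the shortest high-probability prefix whose mass reaches P.
type TopP struct{ P float64 }

func (s TopP) Apply(tokens []Token) []Token {
	sorted := sortedByProb(tokens)
	sum := 0.0
	for i, t := range sorted {
		sum += t.Prob
		if sum >= s.P {
			return sorted[:i+1]
		}
	}
	return sorted
}

// sortedByProb returns a copy ordered from most to least probable.
func sortedByProb(tokens []Token) []Token {
	out := append([]Token(nil), tokens...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Prob > out[j].Prob })
	return out
}

// RunChain passes the candidates through each stage in order.
func RunChain(stages []Stage, tokens []Token) []Token {
	for _, s := range stages {
		tokens = s.Apply(tokens)
	}
	return tokens
}

func main() {
	tokens := []Token{{0, 0.1}, {1, 0.5}, {2, 0.3}, {3, 0.1}}
	kept := RunChain([]Stage{TopK{K: 3}, TopP{P: 0.7}}, tokens)
	fmt.Println(kept) // tokens 1 and 2 survive both stages
}
```

Each stage is small on its own, but the caller has to assemble the stages in the right order, and any stage that needs memory (such as a repetition penalty) has to carry its own state.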
Patrick's solution is beautifully simple. Instead of all these separate interfaces, he collapsed everything into a single, stateful Sampler struct. Think of it like replacing a relay team with one really capable runner who can handle the whole race. This…
Th…
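As a rough picture of the single, stateful sampler described in the excerpt, here is a minimal Go sketch. Everything in it is an assumption made for illustration: the field names, the `NewSampler` constructor, and the `Sample` method are hypothetical, and the repetition penalty is simplified to scaling probabilities rather than adjusting logits. It is not the merged Ollama implementation.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// Sampler is a hypothetical single stateful sampler. Field names are
// illustrative and are not taken from the Ollama source.
type Sampler struct {
	TopK          int     // keep the K most probable tokens (0 disables)
	TopP          float64 // keep tokens until this share of mass is covered (0 disables)
	RepeatPenalty float64 // divide the probability of already-seen tokens (<= 1 disables)

	history []int      // tokens emitted so far: the sampler's state
	rng     *rand.Rand // random source for the final draw
}

// NewSampler builds a sampler with a seeded random source.
func NewSampler(topK int, topP, penalty float64, seed int64) *Sampler {
	return &Sampler{TopK: topK, TopP: topP, RepeatPenalty: penalty,
		rng: rand.New(rand.NewSource(seed))}
}

type candidate struct {
	id int
	w  float64
}

// Sample runs every step in one place and records the chosen token.
// It returns -1 when probs is empty.
func (s *Sampler) Sample(probs []float64) int {
	if len(probs) == 0 {
		return -1
	}
	cands := make([]candidate, len(probs))
	for i, p := range probs {
		cands[i] = candidate{i, p}
	}
	// Repetition penalty (simplified): shrink tokens already in history.
	if s.RepeatPenalty > 1 {
		for _, id := range s.history {
			if id >= 0 && id < len(cands) {
				cands[id].w = probs[id] / s.RepeatPenalty
			}
		}
	}
	sort.SliceStable(cands, func(i, j int) bool { return cands[i].w > cands[j].w })
	// Top-K: keep the K heaviest candidates.
	if s.TopK > 0 && s.TopK < len(cands) {
		cands = cands[:s.TopK]
	}
	total := 0.0
	for _, c := range cands {
		total += c.w
	}
	// Top-P: keep the shortest prefix covering TopP of the remaining mass.
	if s.TopP > 0 && s.TopP < 1 {
		sum := 0.0
		for i, c := range cands {
			sum += c.w
			if sum >= s.TopP*total {
				cands, total = cands[:i+1], sum
				break
			}
		}
	}
	// Weighted random draw over the surviving candidates.
	r := s.rng.Float64() * total
	chosen := cands[len(cands)-1].id
	for _, c := range cands {
		r -= c.w
		if r < 0 {
			chosen = c.id
			break
		}
	}
	s.history = append(s.history, chosen)
	return chosen
}

func main() {
	s := NewSampler(1, 0, 4.0, 42) // TopK=1 makes the draw deterministic
	probs := []float64{0.1, 0.6, 0.3}
	for i := 0; i < 3; i++ {
		fmt.Println(s.Sample(probs)) // prints 1, then 2, then 1
	}
}
```

Because the sampler remembers its own history, the caller only has to call `Sample` once per step. In the usage above the same probabilities produce different tokens on successive calls, since the penalty pushes down whatever was already emitted.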