Ollama: MLX Performance Breakthrough and Smarter Caching
The Ollama team shipped a major MLX update that brings a 6.4x prefill speedup through new CUDA kernels, plus smarter caching logic for transformer models. Daniel Hiltgen led the MLX update, while Jesse Gross improved cache performance with better partial matching.
Duration: PT4M8S
Episode overview
This episode is a short developer briefing from Ollama.
It explains recent repository work in plain language.
- Show: Ollama
- Published: 2026-03-24T10:04:16Z
Transcript excerpt
This excerpt keeps the crawler page concise. Listen to the episode or use the RSS feed for the full update.
Hey there, code crafters! Welcome back to another episode of the Ollama podcast. I'm your host, and wow, do we have some exciting performance stories to dive into today. Grab your favorite beverage because we're talking about some seriously impressive speed improvements that are going to make your day.
So picture this - you're running a model and suddenly it's over six times faster. Not 6 percent, not 60 percent, but 6.4 times faster! That's exactly what happened with the massive MLX update that Daniel Hiltgen just merged. This isn't just a small tweak - we're talking about a complete refresh of the MLX…
Here's the story behind this update. Daniel pulled in the latest MLX changes from March 16th, but the real magic happened when they added the CUDA Fast Gated Delta kernel. When they tested it on a Qwen 3.5 model with an RTX 5090, the prefill speed jumped from 529 tokens per second to over 3,300 tokens per second.…
But Daniel didn't stop at the performance improvements. They also cleaned up some technical debt that had been lurking in the codebase. You know how it goes - sometimes when you're moving fast, little vendoring bugs creep in. This update caught and fixed those issues,…
Now,…
…
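As a quick check on the figures quoted in the excerpt, the short Python sketch below recomputes the speedup from the two prefill numbers. The variable names are illustrative, and 3,300 is treated only as a lower bound, because the excerpt says "over 3,300" without giving the exact measurement.

```python
# Figures quoted in the transcript excerpt (Qwen 3.5 on an RTX 5090, prefill only).
baseline_tps = 529       # tokens per second before the update
floor_after_tps = 3300   # "over 3,300": a lower bound, not the exact measurement
reported_speedup = 6.4   # headline figure from the episode


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is than `before`."""
    return after / before


# Speedup if the new throughput were exactly the quoted lower bound.
lower_bound = speedup(baseline_tps, floor_after_tps)

# Throughput the headline 6.4x figure would imply for the same baseline.
implied_after_tps = baseline_tps * reported_speedup

print(f"Speedup at exactly 3,300 tok/s: {lower_bound:.2f}x")
print(f"Throughput implied by 6.4x: {implied_after_tps:.0f} tok/s")
```

The lower bound comes out at roughly 6.24x, and a 6.4x speedup over 529 tokens per second implies about 3,386 tokens per second, which is consistent with the "over 3,300" wording.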
Nearby episodes from Ollama
- Fixing the Inconsistencies That Matter
- Smart Caching and Better User Experience
- VS Code Integration Takes Center Stage
- Precision Revolution - New Float Formats and Testing Powerhouse
- Nvidia Partnership Takes Center Stage
- Bug Squashing Bonanza
- The Caching Revolution
- Bug Squashing and Launch Improvements