Ollama: MLX Performance Boost and Model Updates
Six pull requests were merged, including MLX runner optimizations that deliver a 1.5% generation throughput improvement and better concurrent processing. Model recommendations were updated to feature kimi-k2.6.
Duration: PT1M55S
Episode overview
This episode is a short developer briefing from Ollama.
It explains recent repository work in plain language.
- Show: Ollama
- Published: 2026-04-21T00:00:00Z
Transcript excerpt
Good morning, this is your Ollama development briefing for April 21st, 2026.
Jesse Gross merged MLX Sampler Improvements, adding logprobs support to the MLX runner and optimizing the sampling process. The changes avoid multiple sorts when both top-P and top-K filters are active, delivering a 1.5% generation throughput improvement with gemma4 models.
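The idea of sorting once when both filters are active can be sketched as follows. This is a minimal illustration in plain Go, not the MLX runner's actual code: the `candidate` type and the `filterTopKTopP` function are hypothetical names, and renormalizing over the top-K survivors before applying top-P is an assumption about the filter order.

```go
package main

import (
	"fmt"
	"sort"
)

// candidate pairs a token id with its probability (hypothetical type).
type candidate struct {
	id   int
	prob float64
}

// filterTopKTopP sorts the candidates once by descending probability,
// then applies top-K and top-P against that same sorted slice.
func filterTopKTopP(probs []float64, k int, p float64) []candidate {
	cands := make([]candidate, len(probs))
	for i, pr := range probs {
		cands[i] = candidate{id: i, prob: pr}
	}

	// The only sort in the function.
	sort.SliceStable(cands, func(a, b int) bool {
		return cands[a].prob > cands[b].prob
	})

	// Top-K: keep the first k entries of the sorted slice.
	if k > 0 && k < len(cands) {
		cands = cands[:k]
	}

	// Top-P: walk the already sorted survivors and stop once the
	// renormalized cumulative probability reaches p.
	var total float64
	for _, c := range cands {
		total += c.prob
	}
	if total <= 0 {
		return cands
	}
	var cum float64
	for i, c := range cands {
		cum += c.prob / total
		if cum >= p {
			return cands[:i+1]
		}
	}
	return cands
}

func main() {
	probs := []float64{0.05, 0.4, 0.1, 0.3, 0.15}
	for _, c := range filterTopKTopP(probs, 3, 0.8) {
		fmt.Println(c.id, c.prob)
	}
}
```

Because top-K leaves the slice in sorted order, the top-P pass needs only a linear scan rather than a second sort.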
Gross also merged tokenization improvements that move prompt processing out of the GPU goroutine into individual request handlers. This allows CPU tokenization to happen concurrently while the GPU handles current requests, improving overall pipeline efficiency.
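The pattern described here, CPU work in per-request handlers feeding a single GPU worker, might look roughly like the sketch below. Everything in it is an assumption for illustration: `tokenize` is a whitespace splitter standing in for a real tokenizer, and `runPipeline` and `job` are hypothetical names rather than anything from the Ollama codebase.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// job carries an already tokenized prompt to the GPU worker.
type job struct {
	id     int
	tokens []string
}

// tokenize is a stand-in for a CPU-bound tokenizer (hypothetical).
func tokenize(prompt string) []string {
	return strings.Fields(prompt)
}

// runPipeline tokenizes each prompt in its own handler goroutine and
// forwards the result to one worker goroutine that plays the GPU role.
// It returns the token count recorded per request id.
func runPipeline(prompts []string) map[int]int {
	jobs := make(chan job)
	results := make(map[int]int)
	done := make(chan struct{})

	// Single "GPU" goroutine: it only ever sees tokenized input.
	go func() {
		defer close(done)
		for j := range jobs {
			results[j.id] = len(j.tokens)
		}
	}()

	// Request handlers: tokenization runs here, concurrently, so the
	// worker above is not stalled by CPU-side prompt processing.
	var wg sync.WaitGroup
	for i, p := range prompts {
		wg.Add(1)
		go func(id int, prompt string) {
			defer wg.Done()
			jobs <- job{id: id, tokens: tokenize(prompt)}
		}(i, p)
	}

	wg.Wait()
	close(jobs)
	<-done // results is safe to read once the worker has exited
	return results
}

func main() {
	counts := runPipeline([]string{"hello world", "why is the sky blue"})
	fmt.Println(counts[0], counts[1])
}
```

Only the worker goroutine writes to `results`, and the caller reads it after the worker exits, so the sketch needs no mutex.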
Parth Sareen merged a server fix enabling format constraints for gemma4 models when thinking mode is disabled. This resolves an issue that was blocking users who rely on format constraints.
Michael Verrilli merged capability detection fixes for the interactive TUI mode. The terminal interface was missing multimodal detection, causing image and audio files to be treated as unknown commands instead of valid attachments.
Matteo Celani merged a model picker fix that resolves stale model displays when switching between chats. The issue occurred when streaming messages stored model objects instead of…