Ollama: MLX Sampling Performance Enhancement

Jesse Gross merged a significant pull request implementing batch sampling across multiple sequences in the MLX runner, along with optimizations to use fixed-size ring buffers for tracking sampler history.

2026-04-25T00:00:00Z

Duration: PT2M1S

Episode overview

This episode is a short developer briefing from Ollama.

It explains recent repository work in plain language.

Transcript excerpt

This excerpt keeps the crawler page concise. Listen to the episode or use the RSS feed for the full update.

Good morning, this is your Ollama developer briefing for April 25th, 2026.

Jesse Gross merged pull request 15736, implementing batch sampling across multiple sequences in the MLX runner. This substantial change adds 788 lines and modifies 212 across eight files in the MLX runner codebase. The new system allows sequences to be registered and removed, with each sampling call processing…

The implementation includes a sequence registration system in which each Sample call can process any subset of the registered slots, sampling one token per row and appending the result to that slot's ring-buffer history. Performance is unchanged for single sequences, which are currently the only case exposed to users.
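The registration-and-subset pattern described above can be illustrated with a small sketch. This is not the code from the pull request: the class name BatchSampler, its methods, and the history_size parameter are hypothetical, it is written in Python rather than as part of the MLX runner, it uses greedy argmax in place of whatever sampling strategy the runner applies, and it loops over rows where a real implementation would presumably work on a batched tensor.

```python
from collections import deque


class BatchSampler:
    """Hypothetical sketch: slots are registered, then sampled in any subset."""

    def __init__(self, history_size=4):
        self._history_size = history_size
        self._slots = {}  # slot id -> bounded history of sampled token ids

    def add(self, slot):
        """Register a sequence slot with an empty, fixed-capacity history."""
        if slot in self._slots:
            raise ValueError(f"slot {slot} is already registered")
        self._slots[slot] = deque(maxlen=self._history_size)

    def remove(self, slot):
        """Unregister a slot and drop its history."""
        del self._slots[slot]

    def sample(self, slots, logits):
        """Sample one token per row; row i of logits belongs to slots[i]."""
        if len(slots) != len(logits):
            raise ValueError("need exactly one logits row per slot")
        tokens = []
        for slot, row in zip(slots, logits):
            history = self._slots[slot]  # KeyError if the slot is unregistered
            # Greedy argmax stands in for the real sampling strategy.
            token = max(range(len(row)), key=row.__getitem__)
            history.append(token)  # oldest entry is dropped once full
            tokens.append(token)
        return tokens

    def history(self, slot):
        """Return the slot's recent tokens, oldest first."""
        return list(self._slots[slot])


sampler = BatchSampler(history_size=2)
for slot in (0, 1, 2):
    sampler.add(slot)

# Only slots 0 and 2 take part in this step; slot 1 is left untouched.
print(sampler.sample([0, 2], [[0.1, 0.9, 0.0], [2.0, 1.0, 0.5]]))  # [1, 0]
print(sampler.history(1))  # []
```

In this sketch a single-sequence caller simply registers one slot and passes a one-row batch, which is how a batched interface can leave the single-sequence path unchanged from the caller's point of view.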

Additionally, Gross committed two performance optimizations. The first refactors the sampler to use a fixed-size ring buffer for tracking history instead of concatenating and slicing tensors on each decode step. This eliminates graph shape changes and reduces memory allocation overhead. The second commit contains…
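To make the ring-buffer idea concrete, here is a minimal sketch of a fixed-size history. The name TokenHistory and its methods are hypothetical, and the sketch uses a plain Python list where the runner would hold a tensor. The point it illustrates is that storage is allocated once and then overwritten in place, so its size never changes between decode steps, in contrast to concatenating a new token and slicing off the oldest on every step.

```python
class TokenHistory:
    """Hypothetical fixed-capacity ring buffer of recent token ids."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = [0] * capacity  # allocated once, never resized
        self._next = 0              # index the next append will overwrite
        self._count = 0             # number of valid entries, at most capacity

    def append(self, token):
        """Store a token, overwriting the oldest entry once the buffer is full."""
        self._buf[self._next] = token
        self._next = (self._next + 1) % len(self._buf)
        self._count = min(self._count + 1, len(self._buf))

    def tokens(self):
        """Return the stored tokens from oldest to newest."""
        if self._count < len(self._buf):
            return self._buf[:self._count]
        # When full, the oldest entry sits at the next write position.
        return self._buf[self._next:] + self._buf[:self._next]


history = TokenHistory(capacity=3)
for token in [1, 2, 3, 4, 5]:
    history.append(token)
print(history.tokens())  # [3, 4, 5]
```

The ordered view returned by tokens() is only there for readability; a penalty that only needs to know which tokens appeared recently could read the buffer without reordering it.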

The changes primarily affect the sample package within the MLX runner, with modifications to the core sampling logic and operations, plus expanded test coverage.

What's next: The team…

Nearby episodes from Ollama

Launch Experience Improvements and Model Recommendations 2026-04-29T00:00:00Z
Multi-Sequence Batching and New Model Support 2026-04-28T00:00:00Z
Tokenizer Bug Fix for BPE Processing 2026-04-27T00:00:00Z
Weekly Recap - MLX Performance & Launch Integrations 2026-04-27T00:00:00Z
OpenAI Reasoning Integration 2026-04-24T00:00:00Z
Launch System Improvements and Integration Fixes 2026-04-23T00:00:00Z
Launch System Overhaul and Documentation Updates 2026-04-22T00:00:00Z
MLX Performance Boost and Model Updates 2026-04-21T00:00:00Z