Ollama: MLX Sampling Performance Enhancement

Jesse Gross merged a significant pull request implementing batch sampling across multiple sequences in the MLX runner, along with optimizations that track sampler history in fixed-size ring buffers.

Duration: PT2M1S

Episode overview

This episode is a short developer briefing from Ollama.

It explains recent repository work in plain language.

  • Show: Ollama
  • Published: 2026-04-25T00:00:00Z

Transcript excerpt

This excerpt keeps the crawler page concise. Listen to the episode or use the RSS feed for the full update.

Good morning, this is your Ollama developer briefing for April 25th, 2026.

Jesse Gross merged pull request 15736, implementing batch sampling across multiple sequences in the MLX runner. This substantial change adds 788 lines and modifies 212 across eight files in the MLX runner codebase. The new system allows sequences to be registered and removed, with each sampling call processing…

The implementation includes a sequence registration system in which Sample calls can process any subset of registered slots, sampling one token per row and appending the result to each slot's ring-buffer history. Performance is unchanged for single-sequence use, which is currently the only mode exposed to users.
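
To make the registration idea concrete, here is a minimal sketch in plain Go. It is not the code from the pull request: the names Batch, Register, Remove, Sample, and History are hypothetical, plain slices stand in for MLX tensors, and greedy argmax stands in for the real sampling logic. It only illustrates how one call can cover a subset of registered slots, one token per row.

```go
package main

import (
	"errors"
	"fmt"
)

// All names here are hypothetical and chosen for illustration only.

// slot holds per-sequence state: a fixed-size ring of recent tokens.
type slot struct {
	history []int // ring storage, allocated once
	next    int   // index where the next token is written
	count   int   // number of valid tokens, at most len(history)
}

// Batch tracks registered sequences by ID.
type Batch struct {
	slots   map[int]*slot
	histCap int
}

// NewBatch creates a registry; histCap is clamped to at least 1.
func NewBatch(histCap int) *Batch {
	if histCap < 1 {
		histCap = 1
	}
	return &Batch{slots: make(map[int]*slot), histCap: histCap}
}

// Register adds a sequence with an empty history.
func (b *Batch) Register(id int) {
	b.slots[id] = &slot{history: make([]int, b.histCap)}
}

// Remove drops a sequence and its history.
func (b *Batch) Remove(id int) { delete(b.slots, id) }

// Sample picks one token per row. ids[i] names the slot that logits[i]
// belongs to, so a call may cover any subset of the registered slots.
// Assumption: greedy argmax replaces the real sampling transforms.
func (b *Batch) Sample(ids []int, logits [][]float32) ([]int, error) {
	if len(ids) != len(logits) {
		return nil, errors.New("ids and logits must have the same number of rows")
	}
	out := make([]int, len(ids))
	for i, id := range ids {
		s, ok := b.slots[id]
		if !ok {
			return nil, fmt.Errorf("sequence %d is not registered", id)
		}
		if len(logits[i]) == 0 {
			return nil, fmt.Errorf("row %d has no logits", i)
		}
		best := 0
		for t, v := range logits[i] {
			if v > logits[i][best] {
				best = t
			}
		}
		out[i] = best
		// Append to this slot's ring, overwriting the oldest entry when full.
		s.history[s.next] = best
		s.next = (s.next + 1) % len(s.history)
		if s.count < len(s.history) {
			s.count++
		}
	}
	return out, nil
}

// History returns a slot's tokens, oldest first, or nil if unregistered.
func (b *Batch) History(id int) []int {
	s, ok := b.slots[id]
	if !ok {
		return nil
	}
	out := make([]int, 0, s.count)
	start := (s.next - s.count + len(s.history)) % len(s.history)
	for i := 0; i < s.count; i++ {
		out = append(out, s.history[(start+i)%len(s.history)])
	}
	return out
}
```

In this sketch, registering sequences 1, 2, and 3 and then calling Sample with IDs 1 and 3 updates only those two histories and leaves sequence 2 untouched. A single registered sequence is simply the one-row case of the same call.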

Additionally, Gross committed two performance optimizations. The first refactors the sampler to use a fixed-size ring buffer for tracking history instead of concatenating and slicing tensors on each decode step. This eliminates graph shape changes and reduces memory allocation overhead. The second commit contains…
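
The difference between the two history strategies can be sketched in plain Go. This is an illustration under stated assumptions, not the runner's code: appendAndTrim, Ring, Push, and Tokens are hypothetical names, and integer slices stand in for the tensors the sampler actually uses.

```go
package main

// Hypothetical names throughout; integer slices stand in for tensors.

// appendAndTrim models the concatenate-and-slice pattern: every step
// allocates a new slice, and its length changes until the window fills.
func appendAndTrim(history []int, token, window int) []int {
	next := make([]int, 0, len(history)+1)
	next = append(next, history...)
	next = append(next, token)
	if len(next) > window {
		next = next[len(next)-window:]
	}
	return next
}

// Ring is a fixed-size history. Its storage is allocated once and its
// length never changes, which mirrors keeping a constant tensor shape.
type Ring struct {
	buf   []int
	head  int // next write position
	count int // number of valid entries, at most len(buf)
}

// NewRing allocates the storage; window is clamped to at least 1.
func NewRing(window int) *Ring {
	if window < 1 {
		window = 1
	}
	return &Ring{buf: make([]int, window)}
}

// Push writes in place, overwriting the oldest entry once the ring is full.
func (r *Ring) Push(token int) {
	r.buf[r.head] = token
	r.head = (r.head + 1) % len(r.buf)
	if r.count < len(r.buf) {
		r.count++
	}
}

// Tokens returns the stored tokens, oldest first.
func (r *Ring) Tokens() []int {
	out := make([]int, 0, r.count)
	start := (r.head - r.count + len(r.buf)) % len(r.buf)
	for i := 0; i < r.count; i++ {
		out = append(out, r.buf[(start+i)%len(r.buf)])
	}
	return out
}
```

Both approaches keep the same last-N window of tokens. The ring does it with a single in-place write per decode step, so the buffer size stays constant from the first token onward.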

The changes primarily affect the sample package within the MLX runner, with modifications to core sampling logic and operations, plus comprehensive additions to test coverage.

What's next: The team…
