Ollama: Multi-Sequence Batching and New Model Support

Ollama merged four major pull requests on April 28th, including foundational work for multi-sequence batching in the MLX runner and support for new model architectures. The team also fixed a critical desktop app issue that was terminating active CLI sessions.

Duration: PT2M

Episode overview

This episode is a short developer briefing from Ollama.

It explains recent repository work in plain language.

  • Show: Ollama
  • Published: 2026-04-28T00:00:00Z
  • Audio duration: PT2M

Transcript excerpt

Good morning. This is your Ollama developer briefing for April 28th, 2026.

Jesse Gross merged model support for batching, a significant 2,750-line change that prepares the MLX runner for multi-sequence processing. The update introduces a new Batch struct to wrap model inputs, switches to per-row RoPE positioning, and decouples models from attention cache storage layouts. While the runner…
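To make the idea of per-row RoPE positioning concrete, here is a minimal Python sketch. It is not the MLX runner's code: the Batch fields, the function names, and the interleaved pairing of dimensions are all assumptions made for illustration. The point it shows is that each row in a batch carries its own positions, so sequences of different lengths can be rotated independently.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    """Hypothetical input wrapper: one row per sequence (not Ollama's actual struct)."""
    tokens: List[List[int]]     # tokens[row][i] is the i-th token id of that sequence
    positions: List[List[int]]  # positions[row][i] is the absolute position of that token


def apply_rope(vec: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate consecutive (even, odd) pairs of vec by position-dependent angles."""
    head_dim = len(vec)
    out = list(vec)
    for i in range(head_dim // 2):
        theta = position / (base ** (2 * i / head_dim))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * c - y * s
        out[2 * i + 1] = x * s + y * c
    return out


def rope_batch(batch: Batch, queries: List[List[List[float]]]) -> List[List[List[float]]]:
    """Apply RoPE row by row, using each row's own positions rather than a shared offset."""
    return [
        [apply_rope(q, p) for q, p in zip(row_q, row_p)]
        for row_q, row_p in zip(queries, batch.positions)
    ]


# Two sequences in one batch: the first starts at position 0, the second resumes at 5.
batch = Batch(tokens=[[1, 2], [3]], positions=[[0, 1], [5]])
queries = [[[1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0]], [[1.0, 0.0, 1.0, 0.0]]]
rotated = rope_batch(batch, queries)
```

With a single shared position offset, the second row could not resume at position 5 while the first starts at 0. Carrying positions per row removes that restriction.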

Eva Ho merged a fix for a bug in which starting the desktop app killed active ollama launch sessions. The previous cleanup logic terminated every ollama process when the desktop app started, inadvertently stopping foreground CLI workflows. The fix narrows the cleanup to target only ollama serve processes on macOS and Windows.
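The narrowing described here can be sketched in a few lines of Python. The briefing does not say how the real fix identifies processes, so the Proc type, the argv-based matching rule, and the function names below are assumptions for illustration only.

```python
import posixpath
from dataclasses import dataclass
from typing import List


@dataclass
class Proc:
    """Hypothetical snapshot of a running process."""
    pid: int
    argv: List[str]


def is_ollama_serve(proc: Proc) -> bool:
    """True only for an ollama binary whose first argument is 'serve'."""
    if len(proc.argv) < 2:
        return False
    # Normalize Windows separators so basename works for both platforms.
    exe = posixpath.basename(proc.argv[0].replace("\\", "/")).lower()
    if exe.endswith(".exe"):
        exe = exe[: -len(".exe")]
    return exe == "ollama" and proc.argv[1] == "serve"


def cleanup_targets(procs: List[Proc]) -> List[int]:
    """Return the pids the desktop app may stop; foreground CLI sessions are left alone."""
    return [p.pid for p in procs if is_ollama_serve(p)]


procs = [
    Proc(101, ["/usr/local/bin/ollama", "serve"]),
    Proc(102, ["/usr/local/bin/ollama", "launch"]),
    Proc(103, ["C:\\Program Files\\Ollama\\ollama.exe", "run", "some-model"]),
]
targets = cleanup_targets(procs)
```

Under the old behavior as described, all three processes would have been candidates. With the narrower rule, only the background server is.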

Daniel Hiltgen merged NVIDIA TensorRT Model Optimizer import support, adding 1,570 lines of code to handle FP8 safetensors imports and mixed-precision tensor processing. The changes enable better quantization workflows for models optimized with NVIDIA's tools.
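For listeners unfamiliar with FP8, the sketch below decodes single bytes in the E4M3 layout (1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 7) and applies a per-tensor scale. It is not the import code from the pull request. That the imported tensors use E4M3 and a single scale per tensor is an assumption, and the function names are hypothetical.

```python
import math
from typing import List


def decode_e4m3(byte: int) -> float:
    """Decode one FP8 E4M3 byte (the finite-only variant with no infinities)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan  # the only NaN encoding in this variant
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias = -6.
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


def dequantize(raw: bytes, scale: float) -> List[float]:
    """Expand FP8 bytes to floats, assuming one scale factor for the whole tensor."""
    return [decode_e4m3(b) * scale for b in raw]


values = dequantize(bytes([0x38, 0x40, 0xB8]), scale=0.5)
```

Because each weight occupies one byte, an importer has to track which tensors are FP8 and which remain in a wider format, which is presumably what the mixed-precision handling addresses.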

Hiltgen also merged new model support, the largest change at over 11,000 lines. This adds support for Laguna and Nemotron 3 Nano Omni models, includes FP8 safetensors conversion capabilities, and introduces a new Poolside integration…

Nearby episodes from Ollama

  1. MLX Threading Fixes and Claude App Integration
  2. Model Recommendations and Windows Gateway Fix
  3. Metal GPU Stability and Gemma4 Updates
  4. Launch Experience Improvements and Model Recommendations
  5. Tokenizer Bug Fix for BPE Processing
  6. Weekly Recap - MLX Performance & Launch Integrations
  7. MLX Sampling Performance Enhancement
  8. OpenAI Reasoning Integration