Ollama: Multi-Sequence Batching and New Model Support

Ollama merged four major pull requests on April 28th, including foundational work for multi-sequence batching in the MLX runner and support for new model architectures. The team also fixed a critical desktop app issue that was terminating active CLI sessions.

2026-04-28T00:00:00Z

Duration: PT2M

Episode overview

This episode is a short developer briefing from Ollama.

It explains recent repository work in plain language.

Show: Ollama
Published: 2026-04-28T00:00:00Z
Audio duration: PT2M

Transcript excerpt

This excerpt keeps the crawler page concise. Listen to the episode or use the RSS feed for the full update.

Good morning. This is your Ollama developer briefing for April 28th, 2026.

Jesse Gross merged model support for batching, a significant 2,750-line change that prepares the MLX runner for multi-sequence processing. The update introduces a new Batch struct to wrap model inputs, switches to per-row RoPE positioning, and decouples models from attention cache storage layouts. While the runner…
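
As a rough illustration of what per-row RoPE positioning means, here is a minimal Go sketch. It is not the code from the merged change: the Batch fields, NewBatch, and RopeAngles are hypothetical names chosen for this example. The idea it shows is that when several sequences share one batch, each row carries its own position, so the rotary angle is computed from that row's position rather than from a single offset shared by the whole batch.

```go
package main

import (
	"fmt"
	"math"
)

// Batch is a hypothetical wrapper around model inputs. The field names are
// illustrative and are not taken from the Ollama source.
type Batch struct {
	Tokens    []int32 // one token per row, flattened across sequences
	SeqIDs    []int   // which sequence each row belongs to
	Positions []int   // position of each row within its own sequence
}

// NewBatch flattens several sequences into one batch. offsets[i] is the number
// of tokens sequence i has already processed, so its new rows continue from there.
func NewBatch(seqs [][]int32, offsets []int) Batch {
	var b Batch
	for id, toks := range seqs {
		for j, t := range toks {
			b.Tokens = append(b.Tokens, t)
			b.SeqIDs = append(b.SeqIDs, id)
			b.Positions = append(b.Positions, offsets[id]+j)
		}
	}
	return b
}

// RopeAngles returns angles[row][i] = positions[row] * base^(-2i/dim),
// the standard rotary embedding angle, evaluated separately for every row.
func RopeAngles(positions []int, dim int, base float64) [][]float64 {
	angles := make([][]float64, len(positions))
	for r, pos := range positions {
		angles[r] = make([]float64, dim/2)
		for i := range angles[r] {
			freq := math.Pow(base, -2*float64(i)/float64(dim))
			angles[r][i] = float64(pos) * freq
		}
	}
	return angles
}

func main() {
	// Sequence 0 starts fresh; sequence 1 already has 40 tokens of context.
	b := NewBatch([][]int32{{11, 12, 13}, {21}}, []int{0, 40})
	fmt.Println(b.Positions) // [0 1 2 40]

	angles := RopeAngles(b.Positions, 4, 10000)
	fmt.Println(angles[3][0]) // 40
}
```

In this sketch the last row belongs to a different sequence than the first three, and its angle is derived from position 40 even though it sits at index 3 in the batch.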

Eva Ho merged a fix for a bug in which starting the desktop app killed active ollama launch sessions. The previous cleanup logic terminated all ollama processes when the desktop app started, inadvertently stopping foreground CLI workflows. The fix narrows the cleanup to target only ollama serve processes on macOS and Windows.
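
The narrowing described here can be sketched as a filter over a process list. This is a simplified Go illustration under stated assumptions, not the desktop app's actual cleanup code: the Process type, isOllamaServe, and cleanupTargets are hypothetical, and a real implementation would obtain the process list from the operating system.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Process is a hypothetical snapshot of a running process.
type Process struct {
	PID  int
	Args []string // Args[0] is the executable path, the rest are its arguments
}

// isOllamaServe reports whether p looks like "ollama serve".
// Any other ollama subcommand (for example "launch") is left alone.
func isOllamaServe(p Process) bool {
	if len(p.Args) < 2 {
		return false
	}
	// Normalize Windows separators so path.Base works for both styles.
	exe := path.Base(strings.ReplaceAll(p.Args[0], `\`, "/"))
	exe = strings.TrimSuffix(strings.ToLower(exe), ".exe")
	return exe == "ollama" && p.Args[1] == "serve"
}

// cleanupTargets returns only the PIDs that the narrowed cleanup would stop.
func cleanupTargets(procs []Process) []int {
	var pids []int
	for _, p := range procs {
		if isOllamaServe(p) {
			pids = append(pids, p.PID)
		}
	}
	return pids
}

func main() {
	procs := []Process{
		{PID: 101, Args: []string{"/usr/local/bin/ollama", "serve"}},
		{PID: 102, Args: []string{"ollama", "launch"}},
		{PID: 103, Args: []string{`C:\Program Files\Ollama\ollama.exe`, "serve"}},
	}
	fmt.Println(cleanupTargets(procs)) // [101 103]
}
```

Matching on the subcommand rather than on the executable name alone is what keeps the foreground session (PID 102 in this example) running.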

Daniel Hiltgen merged NVIDIA TensorRT Model Optimizer import support, adding 1,570 lines of code to handle FP8 safetensors imports and mixed-precision tensor processing. The changes enable better quantization workflows for models optimized with NVIDIA's tools.
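
For readers unfamiliar with FP8 weights, the following Go sketch shows one way an FP8 value can be decoded and rescaled. It assumes the E4M3 variant (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities) and a single per-tensor scale; the briefing does not say which FP8 layout or scaling scheme the import path uses, and DecodeE4M3 and Dequantize are hypothetical helpers, not Ollama functions.

```go
package main

import (
	"fmt"
	"math"
)

// DecodeE4M3 converts one FP8 E4M3 byte to float32.
// Layout assumed here: sign(1) | exponent(4, bias 7) | mantissa(3).
func DecodeE4M3(b byte) float32 {
	sign := float32(1)
	if b&0x80 != 0 {
		sign = -1
	}
	exp := int(b>>3) & 0x0F
	man := int(b & 0x07)

	switch {
	case exp == 0x0F && man == 0x07:
		// The all-ones pattern is NaN; this variant has no infinities.
		return float32(math.NaN())
	case exp == 0:
		// Subnormal: no implicit leading 1, fixed exponent of -6.
		return sign * float32(math.Ldexp(float64(man)/8, -6))
	default:
		// Normal: implicit leading 1, exponent stored with bias 7.
		return sign * float32(math.Ldexp(1+float64(man)/8, exp-7))
	}
}

// Dequantize decodes raw FP8 bytes and multiplies by a per-tensor scale.
// The per-tensor scale is an assumption made for this sketch.
func Dequantize(raw []byte, scale float32) []float32 {
	out := make([]float32, len(raw))
	for i, b := range raw {
		out[i] = DecodeE4M3(b) * scale
	}
	return out
}

func main() {
	// 0x38 = 1.0, 0xB8 = -1.0, 0x7E = 448 (largest finite value), 0x00 = 0
	fmt.Println(Dequantize([]byte{0x38, 0xB8, 0x7E, 0x00}, 0.5)) // [0.5 -0.5 224 0]
}
```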

Hiltgen also merged new model support, the largest change at over 11,000 lines. This adds support for Laguna and Nemotron 3 Nano Omni models, includes FP8 safetensors conversion capabilities, and introduces a new Poolside integration…

Ad…

Nearby episodes from Ollama

MLX Threading Fixes and Claude App Integration 2026-05-03T00:00:00Z
Model Recommendations and Windows Gateway Fix 2026-05-01T00:00:00Z
Metal GPU Stability and Gemma4 Updates 2026-04-30T00:00:00Z
Launch Experience Improvements and Model Recommendations 2026-04-29T00:00:00Z
Tokenizer Bug Fix for BPE Processing 2026-04-27T00:00:00Z
Weekly Recap - MLX Performance & Launch Integrations 2026-04-27T00:00:00Z
MLX Sampling Performance Enhancement 2026-04-25T00:00:00Z
OpenAI Reasoning Integration 2026-04-24T00:00:00Z