Ollama: Weekly Recap - MLX Performance & Codex Integration
Sixteen pull requests were merged this week, focusing on MLX runner improvements, speculative decoding, and a new Codex App integration. Major infrastructure updates include optimized release builds and hardened update flows.
Duration: PT2M30S
Episode overview
This episode is a short developer briefing from Ollama.
It explains recent repository work in plain language.
- Show: Ollama
- Published: 2026-05-17T10:00:53Z
Transcript excerpt
This is an abridged excerpt. Listen to the episode or use the RSS feed for the full update.
Good morning. This is your Ollama weekly recap for May 10th through 17th, 2026.
Sixteen PRs were merged this week, along with 16 additional commits, with a significant focus on MLX performance and new integrations.
Starting with new features: the team shipped a Codex App integration as a launch option, joining Claude Code, Hermes, and OpenClaw in the top integrations list. This includes full install, open, and configuration handling. The OpenCode launch integration now supports image modalities for vision-capable models,…
On the performance front, major MLX runner improvements landed. The team added DFlash speculative decoding with support for Qwen 3.6 models, draft model recurrent cache playback, and enhanced RoPE/YaRN implementations. The MLX sampler received a complete rework, replacing the transform chain with an explicit…
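For readers new to speculative decoding, here is a minimal sketch of the general greedy form of the technique: a cheap draft model proposes several tokens, and the target model keeps the longest prefix it agrees with. This is not DFlash or Ollama's MLX runner code; the recap does not describe DFlash's drafting scheme, and the names `speculative_step`, `draft`, and `target` are hypothetical. Models are stood in for by plain functions that return the next token for a given prefix.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken, k: int) -> List[int]:
    """Return the tokens accepted in one greedy speculative-decoding step."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed position. A real runner scores all
    #    positions in one batched forward pass; here it is called per position.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First disagreement: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target's next token is a free bonus.
    accepted.append(target(ctx))
    return accepted


if __name__ == "__main__":
    target = lambda ctx: ctx[-1] + 1                              # counts upward
    draft = lambda ctx: 99 if ctx[-1] == 2 else ctx[-1] + 1       # errs after 2
    print(speculative_step([0], draft, target, 4))
```

With these toy models the draft proposes `[1, 2, 99, 100]`; the target accepts `1` and `2`, rejects `99`, and substitutes its own `3`. The design point is that the accepted tokens are always exactly what the target alone would have produced greedily, so the speedup comes from verifying several draft tokens per target pass without changing the output.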
Infrastructure updates include optimized release builds that should significantly reduce build times. The team switched ninja to load-average-based job limiting instead of a fixed parallelism level, adjusted compression settings (Windows 7-Zip from level 9 to 7, Linux zstd from level 22 to 19), and separated MLX into its own archive files.…
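The difference between fixed parallelism and load targeting can be illustrated with a simplified scheduling rule. Ninja exposes the idea through its `-l N` option, which stops starting new jobs while the system load average is above N; the sketch below is a hypothetical model of that gate, not ninja's source and not Ollama's build configuration. The function name `can_start_job` and its parameters are assumptions for illustration.

```python
import os


def can_start_job(running: int, load_avg: float, max_jobs: int, max_load: float) -> bool:
    """Simplified model of a load-aware build scheduler's launch decision."""
    # Hard cap on concurrent jobs, like a fixed -j value.
    if running >= max_jobs:
        return False
    # Always allow one job so the build cannot stall on a busy machine.
    if running == 0:
        return True
    # Load targeting: hold back new jobs while the machine is already saturated.
    return load_avg < max_load


if __name__ == "__main__":
    # os.getloadavg() is Unix-only, so guard it for other platforms.
    if hasattr(os, "getloadavg"):
        one_minute_load = os.getloadavg()[0]
        print(can_start_job(running=4, load_avg=one_minute_load,
                            max_jobs=64, max_load=float(os.cpu_count() or 1)))
```

Under this model, a generous job cap combined with a load ceiling lets a build use spare capacity on a large machine without oversubscribing a smaller or shared one, which a single fixed parallelism number cannot do.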
Several critical fixes were implemented. MLX status timeouts…