Ollama: Major Architecture Overhaul Removes CGO Dependencies
Ollama has completed a major refactoring that removes its CGO engines and switches exclusively to llama-server for GGML models, along with a fix for MLX development-mode search paths. The changes span over 1,100 files and streamline the inference architecture.
Duration: PT1M55S
Episode overview
This episode is a short developer briefing from Ollama.
It explains recent repository work in plain language.
- Show: Ollama
- Published: 2026-05-30T10:00:31Z
- Audio duration: PT1M55S
Transcript excerpt
This excerpt keeps the crawler page concise. Listen to the episode or use the RSS feed for the full update.
Good morning, I'm your host with the Ollama developer briefing for Saturday, May 30th, 2026.
Daniel Hiltgen merged a massive architectural overhaul that removes CGO engines and adopts llama-server exclusively for GGML models. This change eliminates the vendored GGML and llama.cpp backend, the CGO runner, and Go model implementations. The refactoring spans 1,100 files with over 28,000 additions and 430,000…
Hiltgen also merged a smaller fix addressing MLX development mode search paths that were broken during the llama-server transition. This update corrects library resolution code to match the new superbuild structure.
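To illustrate the kind of problem a search-path fix addresses, here is a minimal sketch in Python. It is not Ollama's resolution code: the `resolve_library` helper and any directory names passed to it are hypothetical. The only point is that a resolver walks an ordered list of candidate directories, so the list has to match where the build actually places its output.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_library(name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing file called `name` in `search_dirs`, or None.

    Hypothetical helper for illustration only; the directories are whatever
    the caller believes the build system writes its libraries into.
    """
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    # No candidate directory contains the file: the search list is stale
    # or the library was never built.
    return None
```

If a build reorganization moves the output directory and the search list is not updated, a resolver like this returns `None` even though the library exists on disk.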
The architectural changes include significant improvements to GPU discovery, which now uses dynamic library loading instead of parsing llama-server output, making hardware detection more reliable. The update also introduces better batch sizing for performance optimization and enhanced Vulkan support with Windows…
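The general idea of probing hardware support through the dynamic loader, rather than scraping text output, can be sketched as follows. This is an illustration in Python using the standard `ctypes` module, not Ollama's implementation. The `first_loadable` helper is hypothetical, and any library names passed to it are the caller's assumption about what a given driver installs.

```python
import ctypes
from typing import Iterable, Optional


def first_loadable(names: Iterable[str]) -> Optional[str]:
    """Return the first library name the dynamic loader can open, else None.

    Hypothetical helper: a successful load only shows that the library is
    installed and loadable. Real discovery would go on to call the
    library's own device enumeration functions.
    """
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            # The loader could not find or open this library; try the next.
            continue
        return name
    return None
```

A caller might pass platform-specific driver library names and treat `None` as "backend unavailable". Unlike output parsing, this approach does not break when log wording changes.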
Notable technical additions include compatibility layers for Ollama-format GGUF files in llama-server, support for multiple new model architectures including Gemma4, Qwen3.5, and Mistral3, and improved multi-GPU filtering capabilities.
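For readers unfamiliar with GGUF, the sketch below reads the fixed-size header at the start of a file. It assumes the publicly documented layout for GGUF version 2 and later: a 4-byte `GGUF` magic, then a little-endian uint32 version, a uint64 tensor count, and a uint64 metadata key-value count. The `read_gguf_header` name is hypothetical, and the sketch says nothing about how the compatibility layer treats Ollama-format files.

```python
import struct
from typing import BinaryIO, NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(stream: BinaryIO) -> GGUFHeader:
    """Parse the 24-byte GGUF header (layout assumed from the GGUF v2+ spec)."""
    raw = stream.read(24)
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    # "<IQQ": little-endian uint32 version, uint64 tensor count,
    # uint64 metadata key-value count (20 bytes after the magic).
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:])
    return GGUFHeader(version, tensor_count, kv_count)
```

The architecture name and other model details live in the metadata key-value section that follows this header, which the sketch does not parse.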
What's next: Teams should…
Nearby episodes from Ollama
- LLaMA Server Integration Hardening
- Integration Platform Expansion
- Model Integration Updates
- Weekly Recap - Infrastructure Modernization
- MLX Model Display Fixes and Template Parser Cleanup
- Weekly Recap - Performance Optimization & Launch System Improvements
- DFlash Speculative Decoding Rollback
- Model Inventory Refactoring