Ollama: Performance Lessons and Gemma4 Refinements

The Ollama team tackled critical Gemma4 performance issues. One notable story: the team enabled flash attention, then reverted it after benchmarks showed a 60% regression in Gemma4 prefill performance. Other major improvements included reworking tool call handling with cleaner code and fixing ROCm build issues to improve support for AMD GPUs.

Duration: 3 min 53 s

Episode overview

This episode is a short developer briefing from Ollama.

It explains recent repository work in plain language.

  • Show: Ollama
  • Published: 4 April 2026, 10:00:33 UTC
  • Audio duration: 3 min 53 s

Transcript excerpt

This is a short excerpt of the transcript. Listen to the episode or use the RSS feed for the full update.

Hey there, developers! Welcome back to another episode of the Ollama podcast. I'm so excited to catch up with you today because we've got a really interesting story from yesterday's development work, one of those classic tales that remind us why thorough performance testing is absolutely crucial in our field.

Let's dive right into the main event, because this is honestly fascinating. The team has been working hard on Gemma4 improvements, and there's this perfect example of how software development really works in practice. Daniel Hiltgen submitted a pull request to enable flash attention for Gemma4 - which sounds great,…

But here's where it gets interesting. Sometimes in software development, what looks like a good idea on paper doesn't work out in practice. The team ran their performance benchmarks and discovered that enabling flash attention actually caused a massive 60% performance regression for Gemma4 prefill operations. That's…

Speaking of Gemma4 improvements, Devon Rifkin made some really excellent changes to the tool call handling. This is one of those refactoring wins that I absolutely love to see. Devon replaced a custom argument normalizer with what they call a…

Jesse…

And…

Nearby episodes from Ollama

  1. Gemma4 Parser Improvements
  2. Model Updates and Tool Call Fixes
  3. Error Handling and Modelfile Fixes
  4. Weekly Recap - Gemma4 Integration & Audio Support
  5. Gemma4 Arrives with Audio Magic
  6. Modernizing Codex Configuration
  7. Tokenizer Love and Better Model Support
  8. Legacy Compatibility and Developer Experience Wins