Memory Magic and Command Makeover

Today brought some serious memory optimization wizardry with MLA absorption for GLM models - though it took a few tries to get the CUDA builds just right! Plus, the team made the CLI more intuitive by renaming `ollama config` to `ollama launch`, and we got some nice fixes for image generation support.

Duration: 3 min 58 s

Episode overview

This episode is a short developer briefing from Ollama.

It explains recent repository work in plain language.

  • Show: Ollama
  • Published: 2026-01-24T11:07:21Z

Transcript excerpt

This is a partial transcript. Listen to the episode or use the RSS feed for the full update.

Hey there, fellow developers! Welcome back to another episode of the Ollama podcast. I'm your host, and wow - what a day it's been in the codebase! Grab your favorite beverage because we've got some really exciting stuff to dive into.

So the big story today is all about memory optimization, and let me tell you - it's been quite the journey! Jeffrey Morgan has been working on something called MLA absorption for GLM models, which is essentially a way to compress the KV cache and use way less memory. Think of it like organizing your closet - instead…
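To put "way less memory" in rough numbers, here's a back-of-the-envelope sketch. It compares the per-token KV cache of standard multi-head attention with an MLA-style cache. The MLA-style cache keeps one compressed latent vector per token, plus a small decoupled RoPE key as in the published MLA design. Every dimension below is a hypothetical placeholder, not GLM's real configuration. The sketch also assumes fp16 storage (2 bytes per element).

```python
# Back-of-the-envelope KV cache sizing.
# All model dimensions are hypothetical placeholders, NOT the real GLM config.


def mha_bytes_per_token(n_layers: int, n_heads: int, head_dim: int,
                        bytes_per_elem: int = 2) -> int:
    # Standard attention caches a full K vector and a full V vector per head.
    return n_layers * n_heads * head_dim * 2 * bytes_per_elem


def mla_bytes_per_token(n_layers: int, latent_dim: int, rope_dim: int,
                        bytes_per_elem: int = 2) -> int:
    # An MLA-style cache stores one shared compressed latent per token,
    # plus a small decoupled RoPE key, instead of per-head K and V.
    return n_layers * (latent_dim + rope_dim) * bytes_per_elem


if __name__ == "__main__":
    layers, heads, head_dim = 48, 32, 128   # hypothetical model shape
    latent, rope = 512, 64                  # hypothetical latent/RoPE widths
    ctx = 32_768                            # tokens held in the cache

    mha = mha_bytes_per_token(layers, heads, head_dim) * ctx
    mla = mla_bytes_per_token(layers, latent, rope) * ctx
    print(f"MHA cache: {mha / 2**30:.2f} GiB")
    print(f"MLA cache: {mla / 2**30:.2f} GiB")
    print(f"ratio: {mha / mla:.1f}x smaller")
```

With these made-up numbers, the latent cache comes out roughly 14x smaller. The real ratio depends entirely on the model's actual head count, head size, and latent width.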

Now here's where it gets interesting - this feature had quite the adventure getting merged. It went through what I like to call the "third time's the charm" dance. First it got merged, then it had to be reverted because of some CUDA build issues, and then Jeffrey came back with a fix and got it merged again. It's…

The technical bits are pretty fascinating if you like getting into the weeds. They're splitting combined KV_B tensors into separate K_B and V_B tensors, enabling this Multi-head Latent Attention compression. The tricky part was getting all the CUDA configurations just right across different GPU architectures. There were…
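Why does splitting KV_B matter? Here's a small single-head sketch of the general "absorption" idea. It assumes a combined KV_B weight whose first rows map the cached latent to the key and whose remaining rows map it to the value. The real GLM/Ollama tensor layout may differ (for example, interleaved per head), and RoPE dimensions are ignored here.

Once KV_B is split into K_B and V_B, two things become possible:

  • K_B can be folded into the query, because q · (K_B c) equals (K_Bᵀ q) · c.
  • V_B can be applied once, after the latents are mixed.

Together, these let attention run directly over the cached latents without decompressing every token. The helper names (`split_kv_b`, `attend_naive`, `attend_absorbed`) are hypothetical illustrations, not Ollama's code.

```python
import math
import random

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def split_kv_b(w_kv_b: Matrix, k_dim: int) -> tuple[Matrix, Matrix]:
    # Hypothetical layout: first k_dim rows -> key, remaining rows -> value.
    return w_kv_b[:k_dim], w_kv_b[k_dim:]


def attend_naive(q: Vector, latents: list[Vector], w_k_b: Matrix, w_v_b: Matrix) -> Vector:
    # Decompress every cached latent into a full key and value first.
    keys = [matvec(w_k_b, c) for c in latents]
    values = [matvec(w_v_b, c) for c in latents]
    scale = 1 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def attend_absorbed(q: Vector, latents: list[Vector], w_k_b: Matrix, w_v_b: Matrix) -> Vector:
    # Absorb K_B into the query once: q . (K_B c) == (K_B^T q) . c
    q_latent = matvec(transpose(w_k_b), q)
    scale = 1 / math.sqrt(len(q))
    weights = softmax([dot(q_latent, c) * scale for c in latents])
    # Mix the compressed latents first, then apply V_B a single time.
    mixed = [sum(w * c[i] for w, c in zip(weights, latents)) for i in range(len(latents[0]))]
    return matvec(w_v_b, mixed)


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    latent_dim, k_dim, v_dim, n_tokens = 8, 4, 4, 5  # toy sizes
    w_k_b, w_v_b = split_kv_b(random_matrix(rng, k_dim + v_dim, latent_dim), k_dim)
    q = [rng.uniform(-1, 1) for _ in range(k_dim)]
    cache = random_matrix(rng, n_tokens, latent_dim)  # one latent per token
    naive = attend_naive(q, cache, w_k_b, w_v_b)
    absorbed = attend_absorbed(q, cache, w_k_b, w_v_b)
    print("max abs difference:", max(abs(a - b) for a, b in zip(naive, absorbed)))
```

Both paths compute the same attention output up to floating-point rounding. The absorbed path trades per-token decompression for one extra projection per query. That trade is what lets the cache stay in its compressed latent form.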

Moving on to user…

We…

Nearby episodes from Ollama

  1. Smooth Onboarding for New Users
  2. Polish and Perfectionism - The Art of Getting the Details Right
  3. Cleaning Up the Config Game
  4. Speed Boost and Model Magic
  5. Making Ollama Play Nice with Everyone
  6. The Great Cleanup - Manifests Get Their Own Home
  7. New Model Architecture and Image Generation Fixes
  8. New Model Support and Memory Management Wins