Ollama Now Runs Faster on Macs Thanks to Apple's MLX Framework

Tuesday March 31, 2026 3:22 am PDT by Tim Hardwick

Ollama, the popular app for running AI models locally on a computer, has released an update that takes advantage of Apple's own machine learning framework, MLX. The result is a hefty speed boost on Macs with Apple silicon.

According to Ollama, the new version processes prompts around 1.6 times faster (prefill speed) and nearly doubles the speed at which it generates responses (decode speed). Macs with M5-series chips are said to see the largest improvements, thanks to Apple's new GPU Neural Accelerators.
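For readers who want to check the prefill and decode figures on their own machine, here is a minimal sketch in Python. It assumes a local Ollama server on its default port (11434) and reads the timing fields that the `/api/generate` endpoint reports (`prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, with durations in nanoseconds). The helper names `tokens_per_second`, `speeds`, and `measure` are illustrative and not part of Ollama.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption: unchanged port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(count: int, duration_ns: int) -> float:
    """Convert a token count and a duration in nanoseconds to tokens per second."""
    return count / (duration_ns / 1e9) if duration_ns > 0 else 0.0


def speeds(stats: dict) -> dict:
    """Derive prefill and decode speed from the timing fields of a generate response."""
    return {
        # Prefill: how fast the prompt was processed.
        "prefill_tps": tokens_per_second(
            stats.get("prompt_eval_count", 0), stats.get("prompt_eval_duration", 0)
        ),
        # Decode: how fast the response tokens were generated.
        "decode_tps": tokens_per_second(
            stats.get("eval_count", 0), stats.get("eval_duration", 0)
        ),
    }


def measure(model: str, prompt: str) -> dict:
    """Send one non-streaming request to the local server and return both speeds."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return speeds(json.load(resp))
```

Calling `measure` with the tag of a locally installed model and a fixed prompt, once before and once after updating, gives a rough comparison. A single run is noisy, so averaging several runs with the same prompt is the safer approach.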

The update also includes smarter memory management, which should make AI-powered coding tools and chat assistants feel noticeably more responsive during extended use.

Ollama says the new performance boost should especially benefit macOS users who run personal assistants like OpenClaw or coding agents like Claude Code, OpenCode, or Codex.

The preview release is available to download as Ollama 0.19 – just make sure you have a Mac with more than 32GB of unified memory to run it. Support is currently limited to Alibaba's Qwen3.5, but Ollama says support for more AI models is planned.

Top Rated Comments

neilpmas

14 weeks ago

This is going to be some serious cash flow incoming for Apple this year.

I think this could be a major business for Apple - it’s way cheaper for a small business to buy a powerful Mac and run qwen 3.5 than pay for an enterprise license for a frontier model - and you don’t need to worry about privacy issues.

Score: 11 Votes

RemedyRabbit

14 weeks ago

On device is definitely gonna be the future.

I can’t help but wonder if Apple looked ahead and foresaw this when developing the M series, or if they’ve lucked into it.

Score: 10 Votes

Justin Cymbal

14 weeks ago

M-Series chips at work😎

Score: 7 Votes

JMalone

14 weeks ago

As someone who downloads and experiments with everything possible…

There is a lot of delusion in this thread. Local language models below 100 billion parameters are quite useless. Even 100 billion parameters is considered on the weak side. Fun to play with for a while, but boredom and frustration set in quickly.

So what happens is they want the next model…and then the next one…and then the next one…falsely believing their 16GB or 32GB machine will one day have the holy grail of small and powerful local language model.

But it doesn’t happen. The models keep growing and aside from being memory hungry the most important thing that makes them useable is memory bandwidth.

The top 5 language models in the world are all over a trillion parameters, and what makes them useful and responsive is that they respond quickly and run on GPUs with over a terabyte of bandwidth.

Score: 6 Votes

Kirkster

14 weeks ago

They are so far behind LM Studio. And only support for one model?

Score: 6 Votes

Takeo Apple

14 weeks ago

This is going to be some serious cash flow incoming for Apple this year.

Score: 6 Votes
