MacRumors

Ollama Now Runs Faster on Macs Thanks to Apple's MLX Framework

Ollama, the popular app for running AI models locally on a computer, has released an update that takes advantage of Apple's own machine learning framework, MLX. The result is a hefty speed boost on Macs with Apple silicon.

According to Ollama, the new version processes prompts around 1.6 times faster (prefill speed) and nearly doubles the speed at which it generates responses (decode speed). Macs with M5-series chips are said to see the largest improvements, thanks to Apple's new GPU Neural Accelerators.
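
For readers who want to check the two figures on their own machine, here is a minimal sketch of how prefill and decode speed can be computed. It assumes the timing fields returned by Ollama's `/api/generate` endpoint (`prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, with durations in nanoseconds). The helper names and the sample numbers are hypothetical and are not benchmark results.

```python
def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Convert a token count and a duration in nanoseconds to tokens/second."""
    if duration_ns <= 0:
        return 0.0
    return token_count / (duration_ns / 1e9)


def summarize_speeds(stats: dict) -> dict:
    """Derive prefill and decode speed from a response's timing fields.

    Assumes the field names used by Ollama's generate API response.
    """
    return {
        # Prefill: how quickly the prompt tokens were processed.
        "prefill_tps": tokens_per_second(
            stats.get("prompt_eval_count", 0),
            stats.get("prompt_eval_duration", 0),
        ),
        # Decode: how quickly the response tokens were generated.
        "decode_tps": tokens_per_second(
            stats.get("eval_count", 0),
            stats.get("eval_duration", 0),
        ),
    }


# Made-up sample values for illustration only.
sample = {
    "prompt_eval_count": 1000,
    "prompt_eval_duration": 2_000_000_000,  # 2 seconds, in nanoseconds
    "eval_count": 300,
    "eval_duration": 6_000_000_000,  # 6 seconds, in nanoseconds
}

if __name__ == "__main__":
    print(summarize_speeds(sample))
```

Running the same prompt against the old and new versions and comparing the two rates would show whether the claimed ratios hold for a given Mac and model.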

The update also includes smarter memory management, which should make AI-powered coding tools and chat assistants feel noticeably more responsive during extended use.

Ollama says the new performance boost should especially benefit macOS users who run personal assistants like OpenClaw or coding agents like Claude Code, OpenCode, or Codex.

The preview release is available to download as Ollama 0.19 – just make sure you have a Mac with more than 32GB of unified memory to run it. Support is currently limited to Alibaba's Qwen3.5, but Ollama says support for more AI models is planned.

Top Rated Comments

7 weeks ago

This is going to be some serious cash flow incoming for Apple in this year.
I think this could be a major business for Apple - it’s way cheaper for a small business to buy a powerful Mac and run qwen 3.5 than pay for an enterprise license for a frontier model - and you don’t need to worry about privacy issues.
Score: 11 Votes
7 weeks ago
On device is definitely gonna be the future.

I can’t help but wonder if Apple looked ahead and foresaw this when developing the M series, or if they’ve lucked into it.
Score: 10 Votes
Justin Cymbal
7 weeks ago
M-Series chips at work😎
Score: 7 Votes
7 weeks ago
As someone who downloads and experiments with everything possible…

There is a lot of delusion in this thread. Local language models below 100 billion parameters are quite useless. Even 100 billion parameters is considered the weak side. Fun to play with for a while but boredom and frustration sets in quickly.

So what happens is they want the next model…and then the next one…and then the next one…falsely believing their 16GB or 32GB machine will one day have the holy grail of small and powerful local language model.

But it doesn’t happen. The models keep growing and aside from being memory hungry the most important thing that makes them useable is memory bandwidth.

The top 5 language models in the world are all over a trillion parameters and what makes them useful and responsive is that they respond quickly and have GPU with over a terabyte of bandwidth.
Score: 6 Votes
Kirkster
7 weeks ago
They are so far behind LM Studio. And only support for one model?
Score: 6 Votes
7 weeks ago
This is going to be some serious cash flow incoming for Apple in this year.
Score: 6 Votes