The headline from Apple’s developer conference was a reborn Siri. The more interesting story sits underneath it: the AI models Apple built to run the thing, one of which is far too big to fit in an iPhone’s memory, yet runs on the device anyway.
In a technical post published alongside WWDC, Apple detailed the third generation of its Apple Foundation Models, a family of five models it describes as “custom-built in collaboration with Google.”
Two run on-device: AFM 3 Core, a 3-billion-parameter model for everyday tasks, and AFM 3 Core Advanced, its most powerful on-device model. Three more run in the cloud: AFM 3 Cloud, a server workhorse; ADM 3 Cloud, an image model behind Image Playground and Genmoji; and AFM 3 Cloud Pro, the heavyweight built for agentic tool use and complex reasoning.
The clever engineering is in Core Advanced.
It is a 20-billion-parameter, natively multimodal model, the kind of size that normally lives in a data centre, not a phone. Apple’s trick is to keep the entire model in flash storage rather than the much smaller pool of working memory. Using a technique its researchers call Instruction-Following Pruning, the model makes routing decisions once per prompt, loading only a small set of “expert” parameters into memory, between 1 and 4 billion at a time, while keeping a core of shared experts always on.
That lets Apple scale the model “far beyond traditional DRAM limits,” it says, and powers the more expressive voices and sharper dictation in this year’s software.
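To make the idea concrete, here is a toy sketch of per-prompt expert loading. It is not Apple's implementation: the expert sizes, the 3-billion routed budget, the 1-billion shared core and the hard-coded router scores are all assumptions chosen so the totals match the figures quoted above. Every class and function name is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Expert:
    name: str
    params_b: float  # size in billions of parameters (illustrative)

@dataclass
class PromptRoutedModel:
    flash: dict            # every routed expert, keyed by name, lives in "flash"
    shared: list           # shared experts that stay in memory at all times
    budget_b: float = 3.0  # assumed cap on routed parameters held in memory
    resident: list = field(default_factory=list)

    def route(self, scores):
        """Choose experts once for the whole prompt: best score first, within budget."""
        chosen, used = [], 0.0
        for name in sorted(scores, key=scores.get, reverse=True):
            size = self.flash[name].params_b
            if used + size <= self.budget_b:
                chosen.append(name)
                used += size
        self.resident = chosen  # stands in for loading from flash and evicting the rest
        return chosen

    def memory_footprint_b(self):
        """Parameters in working memory: routed experts plus the shared core."""
        routed = sum(self.flash[n].params_b for n in self.resident)
        return routed + sum(e.params_b for e in self.shared)

# Assumed layout: 19 routed experts of 1B each plus a 1B shared core = 20B total.
flash = {f"expert_{i}": Expert(f"expert_{i}", 1.0) for i in range(19)}
model = PromptRoutedModel(flash=flash, shared=[Expert("shared_core", 1.0)])

# Hypothetical router output for one prompt; a real router would compute these.
scores = {name: 0.01 * i for i, name in enumerate(flash)}
scores.update({"expert_2": 0.9, "expert_7": 0.8, "expert_11": 0.7})

chosen = model.route(scores)
total_b = sum(e.params_b for e in flash.values()) + sum(e.params_b for e in model.shared)
in_memory_b = model.memory_footprint_b()
```

Under these assumptions the full model is 20 billion parameters, but only 4 billion sit in working memory for the prompt. Because the choice is made once per prompt rather than once per token, the slow read from flash happens up front instead of in the middle of generation.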
The cloud models lean on Apple’s Private Cloud Compute, which the company says keeps user data from being stored or shared with anyone, including Apple. For the top-end Cloud Pro model, Apple worked with Google and Nvidia to extend that privacy architecture onto Nvidia GPUs in Google Cloud.
That Google partnership is the detail worth untangling. Coverage of the keynote variously suggested Apple’s models were “distilled from Gemini” or contained no Google technology at all.
The technical post lands in between: the AFM family is Apple’s own, “custom-built in collaboration with Google,” and trained on Google’s cloud TPUs, while the heaviest reasoning behind the new Siri reportedly draws on a large custom Google model. In short, the models are Apple’s; the muscle and much of the infrastructure are Google’s.
For developers, the more consequential change is the Foundation Models framework.
Apps can tap the on-device model directly, and this year Apple added a model-abstraction layer that lets developers swap in third-party models such as Anthropic’s Claude or Google’s Gemini without rewriting their code. Separately, iOS 27 will let users set a rival assistant as their default. It is an unusually open stance for Apple, even if Apple Intelligence itself is still not coming to the EU on the same timeline.
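What "without rewriting their code" means in practice is easiest to see in miniature. The sketch below is a generic illustration of a model-abstraction layer, not Apple's Foundation Models API: the `LanguageModel` interface, both stub classes and the `summarise` function are hypothetical names invented for this example.

```python
from typing import Protocol

class LanguageModel(Protocol):
    """Hypothetical common interface that every model backend satisfies."""
    def respond(self, prompt: str) -> str: ...

class OnDeviceStub:
    """Stand-in for a local model; returns a tagged echo instead of real output."""
    def respond(self, prompt: str) -> str:
        return f"[on-device] {prompt}"

class ThirdPartyStub:
    """Stand-in for a hosted third-party model behind the same interface."""
    def respond(self, prompt: str) -> str:
        return f"[third-party] {prompt}"

def summarise(model: LanguageModel, text: str) -> str:
    # App code depends only on the interface, so the backend can be swapped freely.
    return model.respond(f"Summarise: {text}")

local_result = summarise(OnDeviceStub(), "WWDC notes")
hosted_result = summarise(ThirdPartyStub(), "WWDC notes")
```

The app-level function never changes; only the object passed into it does. That is the general design such a layer relies on, whatever form Apple's actual interface takes.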
The usual caveat applies to the numbers. Apple’s post is studded with flattering comparisons: AFM 3 Cloud was preferred over last year’s model on 64.7 per cent of prompts, and the expressive voices scored 4.15 on a 5-point opinion scale against 3.87 for the old system. But these are Apple’s own human evaluations, not independent benchmarks, and the models are still in beta.
A fuller technical report is promised later this summer.
Still, after two years of being mocked for an assistant that did not work, this is Apple’s clearest argument that the plumbing is finally real: a small, private model for the everyday, bigger ones boxed inside its own cloud for the hard stuff, and Google’s frontier muscle where Apple still cannot compete alone.
Whether it holds up outside Apple’s own charts is the test that comes next.


