Perplexity splits AI inference between PCs and cloud to cut costs

TL;DR

Perplexity AI announced a platform at Computex that dynamically routes AI inference between PCs and cloud servers in real time, acting as an “air-traffic controller” for AI tasks. The chip-agnostic system targets the cost crisis of centralised inference as Perplexity’s revenue hits $500 million.

Perplexity AI has developed a platform that dynamically splits AI workloads between personal computers and cloud servers, deciding in real time which tasks can run locally on a PC’s processor and which need the power of data centre hardware. CEO Aravind Srinivas announced the system at Computex in Taipei on Tuesday, describing it as an “air-traffic controller for AI tasks” designed to reduce the cost of inference, the process of running trained AI models to generate responses.

“You don’t want all your compute centralised in servers and everything running through the largest models,” Srinivas said in a Bloomberg Television interview. “You’re already reading reports of how people are freaking out about their cost. Some people are spending half a billion dollars per month. What you actually want is efficient value per watt per user.”

How it works

The system evaluates each AI task and routes it to the most efficient compute layer. Simple operations that modern PC processors can handle, such as summarisation, formatting, or lightweight classification, run locally without touching the cloud. More complex tasks that require large model inference, such as multi-step reasoning or retrieval-augmented generation across large datasets, get routed to cloud servers. The routing decision happens in real time, invisible to the user.
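
Perplexity has not published its routing logic. As a rough illustration only, here is how a task-level router of this kind might choose between a local and a cloud target. The task categories mirror the examples above. The `Task` and `DeviceProfile` types, the `route` function, and the token and memory thresholds are all hypothetical assumptions, not Perplexity's API.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


# Hypothetical task categories, taken from the examples in the article.
LIGHTWEIGHT_KINDS = {"summarise", "format", "classify"}
HEAVY_KINDS = {"multi_step_reasoning", "rag_large_corpus"}

# Assumed thresholds, chosen only for illustration.
MAX_LOCAL_TOKENS = 4_000
MIN_LOCAL_MEMORY_GB = 8.0


@dataclass
class Task:
    kind: str          # e.g. "summarise" or "multi_step_reasoning"
    input_tokens: int  # rough size of the prompt


@dataclass
class DeviceProfile:
    has_npu_or_gpu: bool  # whether the PC has an accelerator usable for inference
    free_memory_gb: float


def route(task: Task, device: DeviceProfile) -> Target:
    """Decide where a task runs under these hypothetical rules."""
    # Heavy workloads always go to the data centre.
    if task.kind in HEAVY_KINDS:
        return Target.CLOUD
    # Unknown task types are sent to the cloud to be safe.
    if task.kind not in LIGHTWEIGHT_KINDS:
        return Target.CLOUD
    # Lightweight tasks stay local only if the hardware and input size allow it.
    device_ok = device.has_npu_or_gpu and device.free_memory_gb >= MIN_LOCAL_MEMORY_GB
    if device_ok and task.input_tokens <= MAX_LOCAL_TOKENS:
        return Target.LOCAL
    return Target.CLOUD


laptop = DeviceProfile(has_npu_or_gpu=True, free_memory_gb=16.0)
print(route(Task("summarise", 1_200), laptop))           # Target.LOCAL
print(route(Task("multi_step_reasoning", 300), laptop))  # Target.CLOUD
```

A real router would likely estimate difficulty from the prompt itself rather than rely on a fixed label, but the shape of the decision (task demands checked against device capability, with the cloud as the default) is the same.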

The practical effect is that Perplexity can serve more users at lower cost by offloading a portion of inference work to the billions of PCs already in circulation. As AI inference demand strains data centre capacity and drives utilities to plan $1.4 trillion in grid upgrades, distributing compute to the edge is both an economic and infrastructure necessity.
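
The saving from offloading can be approximated with a simple blended-cost model. Here `local_share` is the fraction of queries handled on the user's PC. The per-query cloud cost is a made-up placeholder, since the article gives no per-query figures. The model also assumes, by default, that local inference costs the provider nothing because the user's hardware and electricity absorb it.

```python
def blended_cost_per_query(cloud_cost: float, local_share: float, local_cost: float = 0.0) -> float:
    """Average provider cost per query when a share of queries runs locally.

    cloud_cost:  provider cost of one cloud-served query (hypothetical figure)
    local_share: fraction of queries routed to the user's PC, between 0 and 1
    local_cost:  provider cost of one locally served query (assumed 0 by default)
    """
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    return (1.0 - local_share) * cloud_cost + local_share * local_cost


# Hypothetical example: $0.01 per cloud query, 30% of queries handled locally.
baseline = blended_cost_per_query(0.01, 0.0)
hybrid = blended_cost_per_query(0.01, 0.3)
print(f"saving: {1 - hybrid / baseline:.0%}")  # saving: 30%
```

Because the model is linear, savings scale directly with the share of work that can safely be moved off the data centre, which is why the accuracy of the routing decision matters so much.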

Srinivas made the announcement alongside Intel CEO Lip-Bu Tan, whose company leads the market for PC processors and has a commercial interest in making PCs a meaningful AI compute layer. However, Srinivas said the platform is “chip agnostic” and works with Nvidia processors as well. Nvidia highlighted the same edge-inference trend at Computex with its new RTX Spark platform for AI-powered laptops and desktops.

The cost problem

Srinivas’s reference to companies “spending half a billion dollars per month” on AI compute is not hyperbole. OpenAI’s infrastructure costs have been widely reported at that scale, and Anthropic’s projected $10.9 billion in Q2 revenue comes with substantial compute expenses that compress margins. The energy and cost burden of centralised AI inference is one of the defining constraints of the current AI boom.

Perplexity’s approach inverts the assumption that AI inference must happen in the cloud. By treating the PC as a first-class compute node rather than a thin client, the company can reduce its own server costs while potentially delivering faster responses for tasks that run locally. The tradeoff is complexity: the routing system must accurately assess task difficulty in milliseconds, and the quality of local inference depends on the user’s hardware capabilities.
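
One common way to manage this tradeoff is a cascade: try the local model first and escalate to the cloud when the local answer looks unreliable or too slow. The article does not say Perplexity uses this pattern. In the sketch, `fake_local` and `fake_cloud` are hypothetical stand-ins for real model calls, and the confidence threshold and time budget are illustrative assumptions.

```python
import time
from typing import Callable, Tuple

# A model call returns (answer, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]


def cascade(prompt: str, run_local: ModelFn, run_cloud: ModelFn,
            min_confidence: float = 0.7, budget_s: float = 0.5) -> Tuple[str, str]:
    """Try local inference first; escalate to the cloud if it is unsure or slow.

    Returns (answer, where_it_ran).
    """
    start = time.monotonic()
    try:
        answer, confidence = run_local(prompt)
    except (MemoryError, RuntimeError):
        # Treat local failures (e.g. insufficient hardware) as a signal to escalate.
        return run_cloud(prompt)[0], "cloud"
    # The budget is checked after the call returns; a production system
    # would enforce it with a timeout instead.
    elapsed = time.monotonic() - start
    if confidence >= min_confidence and elapsed <= budget_s:
        return answer, "local"
    return run_cloud(prompt)[0], "cloud"


def fake_local(prompt: str) -> Tuple[str, float]:
    # Hypothetical small on-device model: confident only on short inputs.
    return "short summary", 0.9 if len(prompt) < 50 else 0.4


def fake_cloud(prompt: str) -> Tuple[str, float]:
    # Hypothetical large hosted model.
    return "detailed answer", 0.95


print(cascade("summarise this note", fake_local, fake_cloud))  # ('short summary', 'local')
print(cascade("x" * 200, fake_local, fake_cloud))               # ('detailed answer', 'cloud')
```

The catch is that escalated queries pay for both attempts, so a cascade only saves money when most local attempts succeed.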

Revenue efficiency

Perplexity’s financial trajectory underscores why cost efficiency matters. Srinivas posted on X in April that the company’s revenue grew fivefold, from $100 million to $500 million, while headcount increased just 34%. In other words, revenue grew by 400% against a 34% rise in staff, so revenue per employee rose roughly 3.7x. That gap reflects both the leverage of AI-native business models and Perplexity’s position as an aggregator that routes queries across multiple AI providers rather than training its own frontier models.
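
The figures in Srinivas’s post can be checked with a few lines of arithmetic. Only the $100 million and $500 million revenue figures and the 34% headcount growth come from the post; the variable names are illustrative.

```python
revenue_before = 100e6   # USD, as reported
revenue_after = 500e6    # USD, as reported
headcount_growth = 0.34  # 34% more employees, as reported

revenue_multiple = revenue_after / revenue_before                   # 5.0, i.e. fivefold
revenue_growth_pct = (revenue_multiple - 1) * 100                   # 400% growth
growth_ratio = revenue_growth_pct / (headcount_growth * 100)        # % revenue growth per % headcount growth
per_employee_multiple = revenue_multiple / (1 + headcount_growth)   # change in revenue per employee

print(f"revenue x{revenue_multiple:.1f}, +{revenue_growth_pct:.0f}%")
print(f"growth ratio {growth_ratio:.1f}, revenue per employee x{per_employee_multiple:.2f}")
# revenue x5.0, +400%
# growth ratio 11.8, revenue per employee x3.73
```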

“Every time any of the AI gets better, our unified system also gets better because we route across all of them,” Srinivas said. The AI-native growth rates that are drawing capital away from traditional SaaS companies are partly enabled by this kind of architectural efficiency, where the product improves as its underlying providers improve, without proportional cost increases.
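
The idea of routing across providers can be illustrated with a small registry that picks a model per query by quality and price. The provider names, scores, and prices below are invented placeholders, and the selection rule (highest quality within a price cap) is an assumption for illustration, not a description of Perplexity’s system.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Provider:
    name: str
    quality: float          # hypothetical benchmark score, higher is better
    price_per_query: float  # hypothetical USD cost per query


def pick_provider(providers: Dict[str, Provider], max_price: float) -> Optional[Provider]:
    """Return the highest-quality provider whose price fits the cap, or None."""
    affordable = [p for p in providers.values() if p.price_per_query <= max_price]
    if not affordable:
        return None
    return max(affordable, key=lambda p: p.quality)


registry = {
    "model_a": Provider("model_a", quality=0.80, price_per_query=0.004),
    "model_b": Provider("model_b", quality=0.85, price_per_query=0.009),
}
print(pick_provider(registry, max_price=0.01).name)  # model_b

# When a provider ships a better model, only its registry entry changes;
# the routing code picks it up without modification.
registry["model_a"].quality = 0.90
print(pick_provider(registry, max_price=0.01).name)  # model_a
```

The routing layer stays the same while the pool it chooses from improves, which is how an aggregator can get better without a proportional rise in its own costs.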

The hybrid compute platform extends that logic to hardware. If Perplexity can use the compute already sitting on users’ desks to handle a meaningful share of inference work, it reduces marginal cost per query and improves response latency for lightweight tasks. As AI moves deeper into enterprise workflows, the question of who pays for the compute (the cloud provider, the AI company, or the user’s own hardware) will become a critical competitive variable.
