TL;DR
Nebius, the Dutch neocloud that split from Yandex in 2024, agreed to acquire Eigen AI for approximately $643 million, valuing the 20-person startup founded by MIT alumni at roughly $32 million per employee. Eigen’s inference optimisation technology maximises the tokens generated per Nvidia GPU, which directly lowers the cost of serving AI models. The deal strengthens Nebius’s Token Factory inference platform as the neocloud market expands rapidly, with CoreWeave signing infrastructure deals worth tens of billions and FluidStack in talks to raise $1 billion.
Nebius Group, the Dutch cloud computing company that split from Russian internet company Yandex in 2024, has agreed to acquire Eigen AI for approximately $643 million in stock and cash. The deal, announced on 1 May, is for a 20-person startup founded by alumni of MIT’s HAN Lab. In a market where the largest AI companies are valued in the hundreds of billions and the most prominent acquisitions involve thousands of engineers, $643 million for 20 people requires explanation. The explanation is inference. Eigen AI’s technology maximises the number of tokens, the basic units of data in large language models, that each Nvidia chip can generate when running AI models. “This is like the Olympic sport of the current market: who can extract more tokens for the same price?” said Roman Chernin, Nebius co-founder and chief business officer. The Eigen team members, he said, are “like Olympic runners in this discipline.” The discipline, it turns out, is worth roughly $32 million per person.
The economics
The AI industry’s most expensive problem is not training models. It is running them. Training a frontier model is a one-time capital expenditure, measured in hundreds of millions of dollars, that produces a set of weights. Inference, the process of running those weights to generate responses for users, is a recurring operational cost that scales with every query, every API call, and every token produced. For companies that sell AI as a service, inference is the dominant cost line. Every percentage point of efficiency gained in inference, every additional token squeezed from the same Nvidia GPU, translates directly into lower costs or higher margins. Eigen AI specialises in exactly this: optimising the performance of open-source models from OpenAI, Alibaba, Meta, and Nvidia so that each chip produces more output for the same input of electricity and silicon.
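The link between throughput and cost can be made concrete with a little arithmetic. The following is a minimal sketch in Python, not a model of Nebius's or Eigen AI's actual pricing: the function name, the hourly GPU price, and both throughput figures are hypothetical values chosen for illustration.

```python
def cost_per_million_tokens(gpu_hour_usd: float, tokens_per_second: float) -> float:
    """Serving cost in USD per one million generated tokens on a single GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_usd / tokens_per_hour * 1_000_000


# Hypothetical figures for illustration only, not Nebius or Eigen AI numbers.
GPU_HOUR_USD = 3.00          # assumed rental price of one GPU for one hour
BASELINE_TOKENS_PER_S = 1000  # assumed throughput before optimisation
OPTIMISED_TOKENS_PER_S = 1500 # assumed throughput after optimisation (+50%)

baseline = cost_per_million_tokens(GPU_HOUR_USD, BASELINE_TOKENS_PER_S)
optimised = cost_per_million_tokens(GPU_HOUR_USD, OPTIMISED_TOKENS_PER_S)

print(f"baseline:  ${baseline:.3f} per million tokens")
print(f"optimised: ${optimised:.3f} per million tokens")
print(f"saving:    {1 - optimised / baseline:.0%}")
```

Because the hardware cost per hour is fixed, cost per token falls in inverse proportion to throughput: under these assumed numbers, a 50 percent gain in tokens per second cuts the cost of each token by a third.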
The technique that made Eigen AI’s founders notable in the field is activation-aware weight quantisation, a method for compressing AI models from higher-precision to lower-precision numerical formats without significant loss in output quality. Co-founder Wei-Chen Wang received the MLSys 2024 Best Paper Award for this work. In practice, quantisation can, for example, allow a model that would normally require four GPUs to run on two, or a model running on one GPU to generate tokens about twice as fast. For a cloud provider like Nebius, which raised $700 million from Nvidia and Accel to build out its GPU fleet, the ability to extract more value from each chip changes the unit economics of the entire business.
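To illustrate what compressing weights to a lower-precision format means, here is a minimal sketch of plain round-to-nearest 4-bit quantisation of one group of weights, using only the Python standard library. It is not the activation-aware method described above, which, as its name indicates, also takes activation statistics into account, and it is not Eigen AI's production stack. The function names, the group size of 128, and the randomly generated weights are all assumptions made for the example.

```python
import random


def quantise_group(weights: list[float], bits: int = 4) -> tuple[list[int], float]:
    """Symmetric round-to-nearest quantisation of one group of weights.

    Returns the integer codes and the single scale shared by the group.
    """
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed 4-bit codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantise_group(codes: list[int], scale: float) -> list[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Hypothetical weights: 128 small random values standing in for one group.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(128)]

codes, scale = quantise_group(weights, bits=4)
restored = dequantise_group(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

bits_fp16 = 16 * len(weights)        # original storage at 16 bits per weight
bits_int4 = 4 * len(weights) + 16    # 4-bit codes plus one 16-bit scale
print(f"max abs error: {max_error:.5f} (half a step is {scale / 2:.5f})")
print(f"storage: {bits_fp16} bits -> {bits_int4} bits "
      f"({bits_fp16 / bits_int4:.2f}x smaller)")
```

Each weight is stored as a small integer plus a scale shared by its group, so the rounding error per weight is at most half a quantisation step while storage shrinks by nearly four times. That smaller footprint is what lets a model fit on fewer GPUs or move less data per generated token.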
The acquirer
Nebius occupies a specific position in the AI infrastructure market. It is one of a group of companies called “neoclouds,” cloud providers that rent AI computing capacity to enterprises rather than building consumer products. The established hyperscalers, AWS, Microsoft Azure, and Google Cloud, dominate the cloud market overall, but the neoclouds have carved out a niche by offering AI-optimised infrastructure with lower overhead and faster deployment. Nebius has been tripling its Nvidia GPU capacity at its data centre in Finland, deploying Nvidia’s H200 chips, and launched a data centre in Paris as part of a $1 billion European investment plan. In November, it unveiled Token Factory, a managed inference product that competes with startups like Fireworks and Baseten as well as the hyperscalers’ own inference offerings.
The acquisition of Eigen AI is intended to make Token Factory the most efficient inference platform on the market. With Eigen’s optimisation layer integrated into Token Factory, Nebius can offer customers lower per-token prices or higher throughput from the same hardware, a competitive advantage in a market where pricing is transparent and switching costs are low. The neocloud market is expanding rapidly, with companies like CoreWeave signing infrastructure deals worth tens of billions. FluidStack, another neocloud, is in talks to raise $1 billion at an $18 billion valuation. The competitive dynamics are clear: whoever can offer the most tokens per dollar of GPU time wins.
The strategy
The Eigen deal is Nebius’s second acquisition in three months, following its February purchase of Tavily, an AI agent search company, for $275 million. Chernin said the company is looking at other deal opportunities. The pattern suggests a strategy of acquiring small, technically excellent teams whose capabilities would take years to build internally. Eigen AI brings 20 people and a production-grade optimisation stack. Tavily brought search infrastructure for AI agents. Both acquisitions add capabilities that move Nebius up the stack, from renting raw GPU capacity toward providing higher-value services that interact directly with customers.
“We don’t want to be the infrastructure and someone above us works with the real customers,” Chernin said. This is the neocloud dilemma in a sentence. Renting GPU capacity is profitable but commoditised. The margins improve as you move closer to the application layer: from raw compute, to managed inference, to optimised model serving, to fine-tuning pipelines, to enterprise-grade endpoints. Eigen AI’s technology operates at the intersection of compute and model serving, which is precisely where the value in AI infrastructure is migrating. The $643 million price tag, roughly $32 million per employee, reflects a market in which the scarcest resource is not chips or capital but the people who know how to make chips produce more tokens for less money. With data centre capacity in short supply, Nebius is reserving some of its computing power for Token Factory rather than selling it in multiyear bulk deals, charging premium prices for short-notice inference contracts. The economics only work if each GPU generates as many tokens as possible. That is what Nebius just bought.


