Summary: Google made Ironwood, its seventh-generation TPU, generally available at Cloud Next 2026 while previewing its eighth-generation architecture: TPU 8t (Sunfish), a Broadcom-designed training chip, and TPU 8i (Zebrafish), a MediaTek-designed inference chip, both targeting TSMC's 2nm node in late 2027. Ironwood delivers 4.6 petaFLOPS per chip and 42.5 exaFLOPS in a 9,216-chip superpod. The v8 split marks the first time Google has purpose-built separate training and inference chips; Anthropic, whose deal has expanded to 3.5 gigawatts of compute in 2027, is the anchor customer for both generations.
Google made its seventh-generation Tensor Processing Unit, Ironwood, generally available to cloud customers on Tuesday at Google Cloud Next in Las Vegas, positioning the chip as "the first Google TPU for the age of inference" and the centrepiece of what may be the most aggressive infrastructure investment in the company's history. Ironwood delivers 4.6 petaFLOPS of peak FP8 compute per chip, roughly four times the performance of its predecessor Trillium, with 192 gigabytes of HBM3e memory and 7.37 terabytes per second of memory bandwidth. A single Ironwood superpod links 9,216 chips into a unified system delivering 42.5 exaFLOPS, more than 24 times the peak compute of El Capitan, currently the world's most powerful supercomputer, though that comparison sets Ironwood's FP8 throughput against El Capitan's double-precision benchmark.
The numbers position Ironwood as a direct competitor to Nvidia's Blackwell B200 on raw specifications. Both chips deliver roughly 4.5 to 4.6 petaFLOPS of FP8 compute with 192 gigabytes of HBM. Nvidia leads on single-device interconnect bandwidth (14.4 terabits per second via NVLink against Ironwood's 9.6 terabits over ICI) and supports FP4 precision, which can roughly double inference throughput for quantised models; Ironwood lacks FP4. Google's advantage lies at cluster scale: the superpod architecture, energy efficiency of roughly twice the performance per watt of Trillium and 2.8 times that of Nvidia's H100, and the economics of running inference on custom silicon designed for the task rather than on general-purpose GPUs adapted to it.
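The FP4 point is worth making concrete. Token-by-token decoding is typically bound by memory bandwidth rather than compute, because every generated token streams the active weights from HBM; halving the bytes per weight therefore roughly halves the traffic. A back-of-envelope sketch, with the model size as an illustrative assumption rather than a vendor figure:

```python
# Decode roofline: why FP4 support matters for inference throughput.
# Per-token decode is usually bandwidth-bound, so halving bytes per weight
# roughly doubles the tokens/second ceiling. Model size is an assumption.

HBM_BW_TBPS = 7.37   # Ironwood HBM bandwidth, TB/s (from the spec above)
PARAMS_B = 70        # assumed dense model size, billions of parameters

def decode_tokens_per_s(bytes_per_param: float) -> float:
    """Upper bound on single-chip decode rate when bandwidth-bound."""
    bytes_per_token = PARAMS_B * 1e9 * bytes_per_param
    return HBM_BW_TBPS * 1e12 / bytes_per_token

for label, bpp in [("FP8 (1 byte/weight)", 1.0), ("FP4 (0.5 byte/weight)", 0.5)]:
    print(f"{label}: ~{decode_tokens_per_s(bpp):.0f} tokens/s ceiling")
# FP8: ~105 tokens/s, FP4: ~211 tokens/s -- the doubling the FP4 claim refers to
```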
Why inference, why now
The emphasis on inference rather than training marks a strategic shift. Training a frontier model is a one-time cost, incurred over weeks or months of cluster time. Inference, running that model in response to every query from every user, is an ongoing operational cost that scales with demand and never stops. Google says it must double its AI serving capacity every six months to meet demand across Gemini, Search, YouTube, and Gmail. At that growth rate, the cost of inference becomes the single largest variable in the economics of AI, and the company that builds the cheapest, most efficient inference hardware captures the margin that would otherwise flow to Nvidia.
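Compounded, that claim is striking. Doubling every six months means quadrupling every year:

```python
# What "double serving capacity every six months" implies over a roadmap.

def capacity_multiple(years: float, doubling_months: float = 6.0) -> float:
    return 2 ** (years * 12 / doubling_months)

for y in (1, 2, 3, 5):
    print(f"{y} year(s): {capacity_multiple(y):,.0f}x today's capacity")
# 1 year: 4x, 2 years: 16x, 3 years: 64x, 5 years: 1,024x
```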
Ironwood is Google's answer. It is purpose-built for the workloads that dominate production AI: large language model inference, mixture-of-experts architectures, diffusion models, and reinforcement learning. Its 192 gigabytes of HBM3e per chip allow it to hold larger model shards in memory, reducing the need to distribute a single model across multiple chips. Its 256-by-256 matrix multiply array, which performs 65,536 multiply-accumulate operations per cycle, is optimised for the dense linear algebra that accounts for most of the compute in transformer inference. Google is also making its internal Pathways distributed runtime available to cloud customers for the first time, enabling multi-host inference with dynamic scaling across Ironwood pods.
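For a sense of what that kind of serving looks like in practice, here is a minimal JAX sketch of the underlying idea: shard a weight matrix across devices so each chip holds one slice of the model and the compiler inserts the necessary collectives. This is generic JAX SPMD on simulated CPU devices, not the Pathways API itself, and the shapes and mesh layout are illustrative:

```python
# Tensor-parallel matmul: split a weight matrix column-wise across a device
# mesh so no single chip needs to hold the whole layer.
import os
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"  # fake 8 devices

import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
from jax.experimental import mesh_utils

mesh = Mesh(mesh_utils.create_device_mesh((8,)), axis_names=("model",))

x = jnp.ones((16, 4096))                 # activations, replicated
w = jnp.ones((4096, 8192))               # weights, to be sharded
w_sharded = jax.device_put(w, NamedSharding(mesh, P(None, "model")))

@jax.jit
def forward(x, w):
    return x @ w  # XLA inserts the collectives implied by the sharding

y = forward(x, w_sharded)
print(y.shape, y.sharding)  # (16, 8192), output sharded along "model"
```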
Two chips for the eighth generation
Alongside Ironwood’s general availability, Google previewed its eighth-generation TPU architecture, and for the first time, it is splitting the line in two. TPU 8t, codenamed Sunfish, is a training accelerator designed with Broadcom. It features two compute dies, one I/O chiplet, and eight stacks of 12-high HBM3e, an upgrade from Ironwood’s eight-high stacks that delivers roughly 30% higher memory bandwidth. TPU 8i, codenamed Zebrafish, is an inference accelerator designed with MediaTek. It uses a single compute die, one I/O die, and six stacks of HBM3e, a simpler configuration engineered to deliver inference at 20 to 30% lower cost than the training variant. Both chips will be fabricated on TSMC’s 2-nanometre process node and are targeted for late 2027.
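The stack heights translate directly into capacity. A quick check, assuming the same 3-gigabyte (24-gigabit) HBM3e die throughout, which is consistent with Ironwood's published 192 gigabytes but is otherwise an assumption, as is Zebrafish's stack height:

```python
# Implied HBM capacity per chip. Die density and Zebrafish stack height are
# assumptions for illustration, not figures from the announcement.

GB_PER_DIE = 3  # 24 Gbit HBM3e die (assumption)

def hbm_capacity(stacks: int, high: int) -> int:
    return stacks * high * GB_PER_DIE

print("Ironwood  (8 stacks, 8-high): ", hbm_capacity(8, 8), "GB")   # 192 GB, matches spec
print("Sunfish   (8 stacks, 12-high):", hbm_capacity(8, 12), "GB")  # 288 GB implied
print("Zebrafish (6 stacks, 8-high): ", hbm_capacity(6, 8), "GB")   # 144 GB if 8-high
```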
The bifurcation is the most significant architectural decision in Google’s TPU history. Every previous generation was a unified chip used for both training and inference. Splitting the two acknowledges a reality the industry has been approaching for years: the workloads are fundamentally different. Training demands maximum compute density and memory bandwidth to process trillions of parameters across weeks of continuous operation. Inference demands cost efficiency and low latency to serve billions of queries per day without the economics collapsing. Designing one chip that is optimal for both has always been a compromise. Google has decided to stop compromising.
The decision also locks in the multi-supplier strategy. Broadcom handles the high-performance training silicon under a relationship that has been described as a $46 billion AI contract. MediaTek handles cost-optimised inference, having already proved its ability to deliver I/O modules for Ironwood at 20 to 30% lower cost than alternatives. MediaTek has reportedly requested a sevenfold increase in CoWoS advanced packaging capacity from TSMC to fulfil Google’s orders. Anthropic, whose deal has expanded to 3.5 gigawatts of compute coming online in 2027, will be among the first customers for both variants.
The supply chain behind the chips
The eighth-generation roadmap makes explicit what Ironwood's general availability already implied: Google is building the AI industry's most diversified custom chip supply chain. Broadcom designs Ironwood and the TPU 8t training chip under an agreement running through 2031. MediaTek designs the TPU 8i inference chip and cost-optimised variants including TPU v7e and v8e. Google is in talks with Marvell to develop a memory processing unit and another inference-focused TPU, which would make Marvell the third design partner. Intel provides CPUs and is collaborating on custom IPU development. The multi-supplier approach gives Google negotiating leverage, supply chain redundancy, and the ability to assign each partner the workload profile that matches its strengths.
Google projects 4.3 million TPU shipments in 2026, rising to 10 million in 2027 and more than 35 million in 2028. The capital expenditure to support this is enormous. Google has committed $175 billion to $185 billion in infrastructure spending for 2026, nearly doubling the $91.4 billion it spent in 2025, itself a figure that was revised upward twice during the year. Roughly 60% goes to servers and 40% to data centres and networking equipment. Add Microsoft, Meta, and Amazon, and big tech's total AI infrastructure spending approaches $700 billion this year.
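Broken out, the midpoint of that capex range and the projected shipment ramp look like this (the 60/40 split and the shipment figures come straight from the projections above):

```python
# Capex split at the midpoint of the stated range, plus shipment growth.

capex_low, capex_high = 175e9, 185e9
capex_mid = (capex_low + capex_high) / 2
print(f"Servers (~60%):         ${capex_mid * 0.6 / 1e9:.0f}B")
print(f"DC + networking (~40%): ${capex_mid * 0.4 / 1e9:.0f}B")

shipments = {2026: 4.3e6, 2027: 10e6, 2028: 35e6}
years = sorted(shipments)
for a, b in zip(years, years[1:]):
    print(f"{a} -> {b}: {shipments[b] / shipments[a]:.1f}x")
# 2026 -> 2027: 2.3x, 2027 -> 2028: 3.5x
```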
The customer that matters most
Anthropic is Ironwood's marquee customer and increasingly Google's most important cloud client. The company will have access to up to one million TPU chips and more than a gigawatt of capacity in 2026. The first phase covers 400,000 Ironwood units, worth an estimated $10 billion in finished racks from Broadcom. The remaining 600,000 units are rented through Google Cloud Platform. The partnership has already expanded: Anthropic's deal now covers 3.5 gigawatts of compute coming online in 2027, positioning it as the anchor customer for the eighth-generation TPUs as well. Anthropic's run-rate revenue has surpassed $30 billion, up from roughly $9 billion at the end of 2025, and it says it was "compelled by the impressive price-performance gains." It maintains a diversified compute strategy that also includes Amazon Trainium and Nvidia GPUs. That Anthropic is exploring custom chips of its own while simultaneously committing to a multi-gigawatt Google TPU deployment illustrates both the scale of inference demand and the degree to which AI hardware economics have become a competitive variable in their own right.
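Dividing the reported rack value by the unit count gives a rough sense of the rack-level unit economics; note this bundles networking, power delivery, and assembly with the chip itself, so it is not a die price:

```python
# Implied rack-level unit economics of the first Anthropic phase.

rack_value_usd = 10e9
units = 400_000
print(f"~${rack_value_usd / units:,.0f} per deployed Ironwood chip at rack level")
# ~$25,000
```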
The Nvidia question
Ironwood reaches general availability roughly one year after Nvidia’s Blackwell architecture. In that year, every major cloud provider has accelerated its custom silicon programme. Amazon values its custom chip business at $50 billion and has hinted at selling Trainium externally. Microsoft’s Maia 200, announced in January, claims three times the FP4 performance of Amazon’s Trainium 3. Meta continues developing MTIA. The custom ASIC market for AI is growing at 44.6% annually, compared with 16.1% for GPUs. Analysts project that custom chips could account for 45% of the AI chip market by 2028, and that Nvidia’s share of the inference market specifically could fall from more than 90% to between 20 and 30%.
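Those two growth rates can be run forward as a toy projection. The 2025 starting split is an assumption for illustration; only the growth rates come from the analyst figures above:

```python
# Toy projection of the ASIC/GPU revenue mix at the quoted growth rates.
# The 2025 starting split is assumed, not sourced.

asic, gpu = 20.0, 80.0   # assumed 2025 revenue split, arbitrary units
for year in range(2025, 2029):
    share = asic / (asic + gpu) * 100
    print(f"{year}: custom ASICs ~{share:.0f}% of combined market")
    asic *= 1.446
    gpu *= 1.161
# A 20% share in 2025 grows to roughly a third by 2028 at these rates;
# reaching the projected 45% implies a larger starting base or faster growth.
```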
Nvidia’s response has been to lock in the ecosystem through NVLink Fusion, a strategy that makes its interconnect the default standard for custom silicon, including chips designed to compete with Nvidia’s own GPUs. Jensen Huang has brushed off the custom chip threat with characteristic confidence: “Not that easy building something better.” He is not wrong that Nvidia’s CUDA software ecosystem, developer tools, and interconnect standards create switching costs that raw hardware specifications do not capture. But the direction of the market is clear. The hyperscalers are building their own chips not because they think they can beat Nvidia on every metric but because they have concluded that purpose-built inference silicon, optimised for their specific workloads and deployed at their specific scale, produces better economics than buying Nvidia GPUs at Nvidia’s margins.
Google Cloud holds roughly 11% of the cloud infrastructure market, behind AWS at 31% and Azure at 25%, but exited 2025 with the fastest growth rate among the three and its first sustained period of profitability. Ironwood does not change Google's position in the cloud market overnight, and Nvidia's Rubin architecture, also targeting late 2027, may reclaim advantages in memory and networking that shift total-cost-of-ownership calculations back in its favour. But the direction Google has committed to is now irreversible: a roadmap stretching from Ironwood shipping today through two purpose-built eighth-generation chips at 2 nanometres in 2027, backed by as much as $185 billion in annual infrastructure spending, a four-partner supply chain, and an anchor customer in Anthropic scaling to 3.5 gigawatts. The chip race is, in the end, a margin race. Google is betting that the margin belongs to the company that builds the silicon, not the company that buys it, and it is now building two chips where it used to build one.


