Amazon and Cerebras Carve Up the AI Chip — and Nvidia’s Lunch

Valued at $23.1 billion, Cerebras is a startup aiming to take on Nvidia by building a fundamentally different kind of AI chip

The AI infrastructure war entered a new phase on Friday when Amazon Web Services and chip startup Cerebras Systems announced a deal to jointly power a new class of AI inference services in the cloud. The two companies said they would combine their computing chips in a service aimed at speeding up chatbots, coding tools, and other artificial intelligence applications.

The mechanics matter here. Amazon and Cerebras will split AI processing tasks: Amazon’s Trainium3 chips handle the “prefill” step, translating user requests into AI tokens, while Cerebras chips handle the “decode” step, producing AI responses. Cerebras CEO Andrew Feldman described it to Reuters as a “divide and conquer strategy.” The arrangement is designed to attack one of the most expensive and technically frustrating bottlenecks in modern AI: the memory wall, the limit on how quickly data can move between memory and processor, which slows large language models as they generate answers token by token.
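For readers who want to see the split in concrete terms, here is a toy Python sketch. It is an illustration under stated assumptions, not the AWS or Cerebras implementation: the names KVCache, prefill, decode and decode_tokens_per_second are hypothetical, and the "model" is a stand-in arithmetic function rather than a real neural network. It shows why the two stages behave differently: prefill sees the whole prompt at once, while decode must produce tokens one at a time, re-reading its state at every step.

```python
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Hypothetical per-request state handed from prefill to decode."""
    entries: list = field(default_factory=list)


def prefill(prompt_tokens):
    # Prefill receives the entire prompt up front, so real hardware can
    # process every position in parallel. Here we simply record the tokens.
    return KVCache(entries=list(prompt_tokens))


def decode(cache, max_new_tokens):
    # Decode emits one token per step. Each step reads all prior state,
    # which is why memory speed, not raw compute, tends to set the pace.
    generated = []
    for _ in range(max_new_tokens):
        # Stand-in for a model forward pass (assumption, not a real model).
        next_token = sum(cache.entries) % 50_000
        cache.entries.append(next_token)
        generated.append(next_token)
    return generated


def decode_tokens_per_second(model_bytes, bandwidth_bytes_per_s):
    # Simplified upper bound for a single request: assume each generated
    # token requires streaming the model's weights from memory once.
    return bandwidth_bytes_per_s / model_bytes


if __name__ == "__main__":
    cache = prefill([1, 2, 3])          # stage one, on one kind of chip
    print(decode(cache, 2))             # stage two, on another kind of chip
    print(decode_tokens_per_second(100e9, 1e12))
```

Under this simplification, a hypothetical 100 GB model served from memory that delivers 1 TB per second could produce at most 10 tokens per second for a single request, however fast the processor is. Those figures are invented for illustration; they show only why faster memory access is the lever that matters for the decode stage.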

Valued at $23.1 billion, Cerebras is a startup aiming to take on Nvidia by building a fundamentally different kind of AI chip, one that does not rely on expensive high-bandwidth memory as Nvidia’s flagship chips do. The company’s wafer-scale architecture turns an entire silicon wafer into a single processor, cutting the data-shuffling overhead that hobbles conventional GPU designs during inference. Earlier this year, Cerebras also signed a $10 billion deal to supply chips to OpenAI.

The strategic implications reach well beyond a single product launch. For AWS, the partnership is a direct hedge against its dependence on Nvidia’s Blackwell hardware, which remains expensive and supply-constrained. By building an inference stack that blends its own Trainium silicon with Cerebras’s decoding architecture, Amazon gains both a cost advantage and a differentiated product story for enterprise customers. Feldman framed the customer pitch in expansive terms: “Every customer large or small is on AWS, from individual developers to the largest banks in the world,” he said, adding that the deal would make accessing Cerebras hardware as easy as a single click.

This “divide and conquer” approach is similar to what Nvidia is expected to announce with Groq, a startup it acquired for $17 billion. That parallel is telling: the dominant GPU maker is preparing its own version of disaggregated inference precisely because the industry has concluded that no single chip architecture handles both stages of AI processing optimally. Because Amazon moved first, and with a partner valued at $23.1 billion, Nvidia’s GTC keynote, scheduled for March 16, will now be watched partly as a response to this deal rather than simply as a product showcase.

For the broader market, the takeaway is a structural shift in how AI infrastructure is being built. The era of routing every workload through a single type of expensive GPU is giving way to specialised, tiered compute architectures. The AWS-Cerebras service is expected to come online in the second half of 2026.