Groq Raising $650M to Fund Its Post-Nvidia Second Act as an Inference Neocloud

After December's $20 billion technology-licensing deal with Nvidia, Groq is rebuilding around its own chips and a hosted inference business — and existing investors are backstopping the round.

Groq is raising $650 million from existing investors to fund its pivot to an AI inference 'neocloud' business, according to reporting in Axios and TechCrunch this week. The round is led by Disruptive and Infinitum, who have agreed to fill any portion of the round that current shareholders decline. Adam Winter takes over as CEO and Matt Eng as CFO, both Groq veterans. The fundraise is effectively the launch capital for Groq 2.0.

The pivot follows December's $20 billion technology-licensing agreement with Nvidia — a deal that handed Nvidia rights to Groq's homegrown chip and systems IP and moved a chunk of Groq's senior engineering team to Nvidia. That transaction returned cash to shareholders but left Groq itself smaller and without its original mission of selling LPU-based hardware to enterprises. The neocloud strategy keeps Groq in the game by selling tokens-per-second rather than tin: customers don't buy the chip, they buy inference capacity on Groq's hosted infrastructure.

Groq is not the only player making this bet. Cerebras, SambaNova, and a wave of GPU-as-a-service providers are all positioning around the same thesis: that inference, not training, is where AI spending is headed, and that customers will tolerate a non-CUDA stack if the latency and price-per-token math works. Groq's specific edge has always been latency on language-model inference, where its LPU architecture posts numbers that GPU clusters struggle to match. The question now is whether that latency advantage translates into the kind of margin a venture-backed cloud business needs.

A takeaway for learners: the AI hardware story is bifurcating. The training side still belongs to Nvidia. The inference side is becoming a real competitive market, with multiple architectures and many billing models. If you're building anything serious on top of a foundation model, it is worth running the same prompt against three or four inference providers and measuring tokens-per-second and dollars-per-million-tokens yourself. The 'best' provider depends on your workload, not the press release.
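
For readers who want to try that comparison, here is a minimal Python sketch of the measurement itself. It is an illustration under stated assumptions, not any provider's API: `Usage`, `Prices`, `Result`, and `measure` are hypothetical names, `call` stands in for whatever client function you write per provider, and the prices are whatever each provider publishes for the model you test. Timing is wall-clock around the whole request, so it includes network latency and time to first token.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Usage:
    """Token counts reported back by a provider for one request (hypothetical shape)."""
    input_tokens: int
    output_tokens: int


@dataclass
class Prices:
    """List prices in USD per million tokens; fill in from each provider's pricing page."""
    usd_per_million_input: float
    usd_per_million_output: float


@dataclass
class Result:
    provider: str
    tokens_per_second: float        # output tokens divided by wall-clock seconds
    usd_per_million_tokens: float   # blended cost across input and output tokens


def measure(
    provider: str,
    call: Callable[[str], Usage],   # assumption: your own wrapper around one provider
    prompt: str,
    prices: Prices,
    clock: Callable[[], float] = time.perf_counter,
) -> Result:
    """Time a single request and convert it into the two numbers worth comparing."""
    start = clock()
    usage = call(prompt)
    elapsed = clock() - start

    cost_usd = (
        usage.input_tokens * prices.usd_per_million_input
        + usage.output_tokens * prices.usd_per_million_output
    ) / 1_000_000
    total_tokens = usage.input_tokens + usage.output_tokens

    return Result(
        provider=provider,
        tokens_per_second=usage.output_tokens / elapsed if elapsed > 0 else float("inf"),
        usd_per_million_tokens=cost_usd / total_tokens * 1_000_000 if total_tokens else 0.0,
    )
```

In practice you would call `measure` several times per provider with the same prompt and compare medians, since a single request says little about tail latency or rate limiting.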