While GPUs are in high demand, they still need high-performance memory chips for AI apps. The market is tight for both - for now.
As the adoption of generative artificial intelligence (genAI) continues to soar, the infrastructure needed to support that growth is running into a supply-and-demand bottleneck.
Sixty-six percent of enterprises worldwide said they would be investing in genAI over the next 18 months, according to IDC research. Among organizations indicating genAI will see increased IT spending in 2024, infrastructure will account for 46% of the total spend. The problem: a key piece of hardware needed to build out that AI infrastructure is in short supply.
The breakneck pace of AI adoption over the past two years has strained the industry's ability to supply the special high-performance chips needed to run the process-intensive operations of genAI and AI in general. Most of the focus on processor shortages has been on the exploding demand for Nvidia GPUs and alternatives from various chip designers such as AMD, Intel, and the hyperscale datacenter operators, according to Benjamin Lee, a professor in the Department of Computer and Information Science at the University of Pennsylvania.
"There has been much less attention focused on exploding demand for high-bandwidth memory chips, which are fabricated in Korea-based foundries run by SK Hynix," Lee said.
Last week, SK Hynix said its high-bandwidth memory (HBM) products, which are needed in combination with high-performance GPUs to handle AI processing requirements, are almost fully booked through 2025 because of high demand. The price of HBMs has also recently increased by 5% to 10%, driven by significant premiums and increased capacity needs for AI chips, according to market research firm TrendForce.
HBM chips are expected to account for more than 20% of the total DRAM market value starting in 2024, potentially exceeding 30% by 2025, according to TrendForce Senior Research Vice President Avril Wu. "Not all major suppliers have passed customer qualifications for [high-performance HBM], leading buyers to accept higher prices to secure stable and quality supplies," Wu said in a research report.
Why GPUs need high-bandwidth memory
Without HBM chips, a data center server's memory system would be unable to keep up with a high-performance processor, such as a GPU, according to Lee. HBMs are what supply GPUs with the data they process. "Anyone who purchases a GPU for AI computation will also need high-bandwidth memory," Lee said.
"In other words, high-performance GPUs would be poorly utilized and often sit idle waiting for data transfers. In summary, high demand for SK Hynix memory chips is caused by high demand for Nvidia GPU chips and, to a lesser extent, associated with demand for alternative AI chips such as those from AMD, Intel, and others," he said.
"HBM is relatively new and picking up a strong momentum because of what HBM offers - more bandwidth and capacity," said Gartner analyst Gaurav Gupta. "It is different than what Nvidia and Intel sell. Other than SK Hynix, the situation for HBM is similar for other memory players. For Nvidia, I believe there are constraints, but more associated with packaging capacity for their chips with foundries."
While SK Hynix is reaching its supply limits, Samsung and Micron are ramping up HBM production and should be able to support the demand as the market becomes more distributed, according to Lee.
The current HBM shortages stem primarily from packaging capacity at TSMC, the exclusive supplier of the chip-on-wafer-on-substrate (CoWoS) technology. According to Lee, TSMC is more than doubling its SoIC capacity and boosting CoWoS capacity by more than 60%. "I expect the shortages to ease by the end of this year," he said.
At the same time, more packaging and foundry suppliers are coming online and qualifying their technology to support NVIDIA, AMD, Broadcom, Amazon, and others using TSMC's chip packaging technology, according to Lee.
Nvidia, whose production represents about 70% of the global supply of AI server chips, is expected to generate $40 billion in revenue from GPU sales this year, according to Bloomberg analysts. By comparison, competitors Intel and AMD are expected to generate $500 million and $3.5 billion, respectively. But all three are ramping production as quickly as possible.
Nvidia is tackling the GPU supply shortage by increasing its CoWoS and HBM production capacities, according to TrendForce. "This proactive approach is expected to cut the current average delivery time of 40 weeks in half by the second quarter [of 2024], as new capacities start to come online," TrendForce said in its report. "This expansion aims to alleviate the supply chain bottlenecks that have hindered AI server availability due to GPU shortages."
Shane Rau, IDC's research vice president for computing semiconductors, said that while demand for AI chip capacity is very high, markets are adapting. "In the case of server-class GPUs, they're increasing supply of wafers, packaging, and memories. The increased supply is key because, due to their performance and programmability, server-class GPUs will remain the platform of choice for training and running large AI models."
Chipmakers scramble to meet the demand for AI
Global spending on AI-focused chips is expected to hit $53 billion this year - and to more than double over the next four years, according to Gartner Research. So it's no surprise that chipmakers are rolling out new processors as quickly as they can.
Intel has announced plans for chips aimed at powering AI functions with its Gaudi 3 accelerators, and has said its Xeon 6 processors, which can run retrieval-augmented generation (RAG) workloads, will also be key. Gaudi 3 was purpose-built for training and running the massive large language models (LLMs) that underpin genAI in data centers.
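The RAG pattern Intel is pitching Xeon 6 for is conceptually simple: retrieve relevant local documents, then pack them into the model's prompt so the LLM answers from proprietary data rather than from its training set alone. The sketch below is a toy illustration only; the word-overlap scoring and the document list are assumptions (real systems use vector embeddings and an actual model call).

```python
# Minimal sketch of the retrieval-augmented generation (RAG) pattern:
# fetch the most relevant local documents, then build a grounded prompt.
# The word-overlap scoring is a toy stand-in for embedding search.

def retrieve(query, docs, k=2):
    """Rank documents by naive word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, docs):
    """Prepend retrieved context so the model answers from local data."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "HBM stacks DRAM dies vertically for high bandwidth.",
    "CoWoS is a TSMC packaging technology.",
    "GPUs pair with HBM for AI workloads.",
]
print(build_prompt("What packaging does TSMC use?", docs))
```

Because retrieval narrows the work to a handful of relevant passages, the generation step can run on a smaller model, which is why RAG is often cited as a workload suited to CPUs with AI acceleration rather than top-end GPUs.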
Meanwhile, AMD, in its most recent earnings call, touted its MI300 GPU for AI data center workloads, which has gained good market traction, according to IDC Group Vice President Mario Morales. Morales added that the research firm is tracking more than 80 semiconductor vendors developing specialized chips for AI.
On the software side of the equation, LLM creators are also developing smaller models tailored for specific tasks; they require fewer processing resources and rely on local, proprietary data - unlike the massive, amorphous algorithms that boast hundreds of billions or even more than a trillion parameters.
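The resource gap between those model sizes comes down to simple arithmetic: memory needed just to hold the weights scales with parameter count and numeric precision. The tiers and byte-per-parameter figures below are illustrative assumptions based on common numeric formats, not measurements of any specific model.

```python
# Back-of-the-envelope memory needed just to hold model weights.
# Bytes per parameter follow common numeric formats; model tiers
# are hypothetical examples, not specific products.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_gb(params, precision="fp16"):
    """Gigabytes of memory required to store the model's weights."""
    return params * BYTES_PER_PARAM[precision] / 1e9

for name, n in [("7B task-tuned model", 7e9),
                ("70B general model", 70e9),
                ("1T frontier-scale model", 1e12)]:
    print(f"{name}: {weight_gb(n):,.0f} GB in fp16, "
          f"{weight_gb(n, 'int4'):,.0f} GB in int4")
```

A 7B-parameter model in fp16 needs roughly 14 GB for weights, versus about 2 TB for a trillion-parameter model, which is why the smaller, task-specific models can run on far more modest hardware.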
Intel's strategy going forward is similar: it wants to enable genAI on every type of computing device, from laptops to smartphones. Intel's Xeon 6 processors will include some versions with onboard neural processing units (NPUs, or "AI accelerators") for use in workstations, PCs, and edge devices. Intel also claims its Xeon 6 processors will be good enough to run smaller, more customized LLMs.
Even so, without HBMs, those processors would likely struggle to keep up with genAI's high performance demands.