Cerebras CEO's Insights on the AI Chip Landscape
This article summarizes an interview with Andrew Feldman, CEO and co-founder of Cerebras, by Harry Stebbings of 20VC. The interview covers Cerebras' founding, AI chip design philosophies, market strategies, and perspectives on the broader AI landscape.
The Genesis of Cerebras
Identifying the AI Opportunity
Founded in 2015, Cerebras recognized early that the demands AI software places on underlying processors differ fundamentally from those of traditional computing, and foresaw the need for a hardware system built specifically for AI.
Focus on Memory Bandwidth and Communication
Cerebras understood that memory bandwidth and communication architecture would be critical bottlenecks in AI development. They built their company with these challenges in mind, aiming to overcome the limitations of existing hardware.
Cerebras' Chip Design Philosophy
Addressing Data Transfer Bottlenecks
Cerebras focused on the challenge of frequent data transfers in AI computation. AI workloads, though built on simple matrix multiplications, require massive data movement between compute and memory units and across GPUs.
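To see why data movement, rather than arithmetic, is the limiting factor, compare the arithmetic intensity of the matrix-vector products that dominate single-stream generation with what an accelerator can sustain. A minimal sketch in Python; the FLOP and bandwidth figures are illustrative assumptions, not numbers from the interview:

```python
# Arithmetic intensity of a matrix-vector product (batch-1 text generation):
# each weight is read from memory once and used in ~2 FLOPs (multiply + add).
def matvec_intensity(bytes_per_param: float = 2.0) -> float:
    """FLOPs performed per byte of weight data moved (FP16 weights assumed)."""
    flops_per_param = 2.0                  # one multiply-accumulate per weight
    return flops_per_param / bytes_per_param

# Illustrative accelerator "balance point": peak FLOP/s per byte/s of bandwidth.
peak_flops = 1.0e15                        # ~1 PFLOP/s dense FP16 (assumed)
mem_bandwidth = 3.0e12                     # ~3 TB/s HBM (assumed)

print(f"matvec intensity : {matvec_intensity():.1f} FLOPs/byte")
print(f"chip balance pt. : {peak_flops / mem_bandwidth:.0f} FLOPs/byte")
# ~1 FLOP/byte demanded vs ~333 FLOPs/byte available: batch-1 generation
# is memory-bound, so the chip spends most of its time waiting on data.
```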
Balancing Training, Fine-tuning, and Inference
Cerebras decided to address model training, fine-tuning, and inference. While training and fine-tuning share similar computational needs, inference, especially generative inference, places heavy demands on memory bandwidth. For instance, generating a single word with a 70-billion-parameter model can require moving 140GB of data.
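The 140GB figure follows directly from model size: each generated token requires streaming every parameter from memory once. A quick back-of-the-envelope check (the 16-bit weight precision is our assumption, not stated in the interview):

```python
params = 70e9            # 70-billion-parameter model
bytes_per_param = 2      # FP16/BF16 weights (assumed precision)
print(f"{params * bytes_per_param / 1e9:.0f} GB per generated token")  # 140 GB
```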
Wafer-Scale Integration and SRAM
Traditional GPUs built around HBM are designed for relatively infrequent memory interactions, in contrast with AI's need for constant data access. SRAM is far faster, but its capacity is normally the limiting factor. Cerebras uses wafer-scale integration to achieve both high speed and high capacity with SRAM; reaching comparable capacity with traditional chips would require thousands of devices, which would be too complex and expensive.
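The bandwidth gap is what makes this trade worthwhile. A rough sketch of the per-token throughput ceiling, memory bandwidth divided by bytes moved per token; both bandwidth figures are illustrative assumptions rather than numbers cited in the interview:

```python
bytes_per_token = 140e9          # 70B params x 2 bytes, streamed once per token

bandwidths = {
    "HBM GPU (assumed ~3.35 TB/s)": 3.35e12,
    "wafer-scale SRAM (assumed ~21 PB/s)": 21e15,
}
for name, bw in bandwidths.items():
    # Upper bound: each token needs one full pass over the weights.
    print(f"{name}: <= {bw / bytes_per_token:,.0f} tokens/s per stream")
```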
Cerebras' Competitive Advantages
Advantages Over NVIDIA
While NVIDIA has found success with HBM, Cerebras argues that its wafer-scale integration delivers superior inference efficiency. Third-party benchmarks back its claims of the fastest inference speeds across a range of models.
Addressing Yield Challenges with Redundancy
Cerebras takes a novel approach: the wafer is laid out as an array of identical units with redundant rows and columns, and defective units are bypassed in favor of the spares. This innovation makes Cerebras the first company in 70 years to deliver a complete working wafer.
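To see why redundancy matters at wafer scale, note that without spares a single defective unit anywhere would scrap the whole wafer. A toy row-redundancy model in Python; the grid size, defect rate, and spare count are all illustrative, not Cerebras's actual parameters:

```python
from math import comb

def wafer_yield(rows: int, cols: int, p_defect: float, spare_rows: int) -> float:
    """P(wafer usable) if any row containing a defect can be swapped for a spare."""
    q = 1 - (1 - p_defect) ** cols       # P(a given row holds >= 1 defective unit)
    return sum(comb(rows, k) * q**k * (1 - q)**(rows - k)
               for k in range(spare_rows + 1))

rows, cols, p = 1000, 900, 1e-5          # ~900k units, illustrative defect rate
print(f"no redundancy : {(1 - p) ** (rows * cols):.1e}")   # every unit must work
print(f"15 spare rows : {wafer_yield(rows, cols, p, 15):.2f}")
# Without spares the yield is ~1e-4; a handful of spare rows lifts it near 1.
```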
Prioritizing Speed in Specific Scenarios
Cerebras pursues a differentiated strategy: prioritizing cost control for batch processing while chasing extreme speed for interactive scenarios, where even millisecond-level delays hurt user engagement.
The Future of AI: A Shift Towards Inference
Exponential Growth in Inference
Feldman predicts a significant shift in resource allocation from training to inference over the next five years. The inference market is experiencing "triple growth": the number of AI users, how often they use AI, and the compute required per call are all increasing.
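Because the three factors multiply, modest growth on each axis compounds quickly. A small illustration with made-up growth rates:

```python
# Hypothetical annual growth on each axis (illustrative numbers only).
users, frequency, compute_per_call = 1.5, 1.4, 1.3

demand = 1.0
for year in range(1, 6):
    demand *= users * frequency * compute_per_call
    print(f"year {year}: {demand:,.1f}x baseline inference demand")
# No single axis doubles, yet demand compounds ~2.7x/year (~150x over 5 years).
```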
Challenges in AI Infrastructure
- Energy Consumption: AI is energy-intensive, straining power and water resources.
- Uneven Energy Distribution: Power-rich areas may lack suitable data center locations.
- Regulatory Hurdles: Local regulations impede national-scale data center construction.
Mitigating Inference Costs
- Data Center Optimization: Data centers should focus on the essential combination of electrical connectivity, construction engineering, and design optimization.
- Algorithm Optimization: Algorithm improvements can greatly enhance efficiency. Current GPU utilization for inference is low, leaving significant room for gains.
The Role of Synthetic Data
Synthetic data helps fill training gaps, especially in high-risk scenarios. When advances in compute, algorithms, and data are combined, AI will become faster, cheaper, and more widespread, opening up new applications.
Investment Perspective
Valuing Companies in the AI Sector
- Hardware: Difficult to invest in.
- Models: Valuable despite competition, provided a company shows a sustained ability to improve.
- Software: Fiercely competitive; staying power is the trait that matters most.
NVIDIA's Market Share
Feldman believes NVIDIA's market share could fall to 50-60% over the next five years. While NVIDIA dominates training with CUDA, the situation is different in inference, where he believes users can switch platforms. Feldman sees a structural weakness in NVIDIA's reliance on off-chip GPU memory, an architecture he considers unsuitable for inference.
Cerebras' Competitive Positioning and Future Plans
- Technology Advantage: Cerebras' high gross margin shows its technology creates unique value.
- Going Public: The company is actively preparing for an IPO to improve its competitive position.
China's AI Development
The Impact of US Technological Restrictions
The US government aims to slow China's AI hardware development by restricting its access to technology. Software, however, is harder to restrict because it is less tangible and less easily regulated. Additionally, China's data resources and software talent are substantial assets. Feldman notes that the US may underestimate China's AI capabilities at its own peril.