[AI] Inference Demand Will Grow a Hundredfold | Cerebras CEO Andrew Feldman on 20VC | Design Philosophy | Wafer-Scale Integration | Differentiation Strategy | Infrastructure | AI Investment | IPO | NVIDIA

Quick Abstract

Delve into the world of AI chips with insights from Cerebras CEO Andrew Feldman's interview on 20VC. Discover the cutting-edge technology and strategic choices shaping the future of AI hardware, explore Cerebras' unique approach to overcoming AI's data-transfer bottlenecks, and hear why NVIDIA's market position may shift.

Quick Takeaways:

  • Cerebras was founded on the insight that AI requires tailored hardware.
  • AI's core challenge is a massive volume of simple calculations that demand constant data transfer between memory and compute.
  • Cerebras uses wafer-scale integration to achieve high-speed, high-capacity SRAM, sidestepping the limits of HBM.
  • Their architecture is uniquely suited for demanding inference tasks.
  • They prioritize cost for batch tasks and speed for interactive scenarios.
  • The interview highlights the coming shift from training-heavy to inference-driven AI.
  • He notes that the number of AI users, their usage frequency, and the compute per call are all growing.
  • Feldman acknowledges that Nvidia holds a dominant position in the AI market.
  • He believes that Nvidia's AI market share may decrease in the coming years.

Learn about AI's escalating energy demands, the role of synthetic data, and investment opportunities. Also, get Feldman's take on China's AI development.

Cerebras CEO's Insights on the AI Chip Landscape

This article summarizes an interview with Andrew Feldman, CEO and co-founder of Cerebras, by Harry Stebbings of 20VC. The interview covers Cerebras' founding, AI chip design philosophies, market strategies, and perspectives on the broader AI landscape.

The Genesis of Cerebras

Identifying the AI Opportunity

Founded in 2015, Cerebras recognized early on that the demands AI software places on underlying processors are fundamentally different from those of traditional computing. The founders foresaw the need for a hardware system tailored specifically to AI.

Focus on Memory Bandwidth and Communication

Cerebras understood that memory bandwidth and communication architecture would be critical bottlenecks in AI development. They built their company with these challenges in mind, aiming to overcome the limitations of existing hardware.

Cerebras' Chip Design Philosophy

Addressing Data Transfer Bottlenecks

Cerebras focused on the challenge of frequent data transfers in AI computation. Although AI operations reduce to simple matrix multiplications, they require massive data movement between memory and compute, and across GPUs.

Balancing Training, Fine-tuning, and Inference

Cerebras decided to address model training, fine-tuning, and inference. While training and fine-tuning share similar computational needs, inference, especially generative inference, places enormous demands on memory bandwidth: generating a single word with a 70-billion-parameter model can require moving 140 GB of data, as the arithmetic sketch below shows.
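
A minimal back-of-the-envelope sketch of that figure, assuming 16-bit weights and that every weight is read once per generated token (both assumptions are ours, not stated in the interview):

```python
# Bytes moved per generated token for a dense LLM, assuming each of the
# model's weights is read once per token and stored at 16-bit precision.
params = 70e9          # 70B-parameter model, as in the interview
bytes_per_param = 2    # FP16/BF16

bytes_per_token = params * bytes_per_param
print(f"~{bytes_per_token / 1e9:.0f} GB moved per token")  # -> ~140 GB
```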

Wafer-Scale Integration and SRAM

Traditional GPUs rely on HBM, which offers large capacity but supports comparatively infrequent memory interactions, in contrast with AI's need for constant data access. SRAM is far faster, but its capacity is normally too small. Cerebras uses wafer-scale integration to achieve both high speed and high capacity with SRAM; building comparable SRAM capacity out of conventional chips would require thousands of them and be too complex and expensive. The sketch below illustrates why this bandwidth gap matters for generation speed.
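
A rough illustration of how bandwidth caps single-stream generation speed. The bandwidth figures are illustrative assumptions (roughly an HBM-based datacenter GPU versus Cerebras' publicly quoted wafer-scale SRAM bandwidth), not verified specifications:

```python
# Upper bound on single-user decode speed: tokens/s <= bandwidth / bytes-per-token.
# Ignores batching, KV-cache traffic, and compute limits; illustrative only.
bytes_per_token = 140e9  # 70B params x 2 bytes, from the sketch above

for name, bandwidth_bytes_per_s in [("HBM GPU (~3 TB/s)", 3e12),
                                    ("wafer-scale SRAM (~21 PB/s)", 21e15)]:
    ceiling = bandwidth_bytes_per_s / bytes_per_token
    print(f"{name}: ~{ceiling:,.0f} tokens/s ceiling")
```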

Cerebras' Competitive Advantages

Advantages Over NVIDIA

While NVIDIA has been successful with HBM, Cerebras argues that its wafer-scale integration delivers superior inference efficiency. Third-party benchmarks support its claims of the fastest inference speeds across a variety of models.

Addressing Yield Challenges with Redundancy

Cerebras takes a distinctive approach to wafer-scale yield: the design is an array of identical units with redundant rows and columns, so defective units are simply bypassed in favor of the spares. This innovation made Cerebras the first company in the roughly 70-year history of the chip industry to ship a working whole-wafer processor. A sketch of the general redundancy idea follows.
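
The exact mechanism is Cerebras' own; the following is only a hypothetical sketch of row-level redundancy in general, with all names and numbers invented for illustration:

```python
# Hypothetical sketch of row redundancy: a grid of compute tiles is fabricated
# with spare rows, and any row containing a defective tile is mapped out so the
# exposed logical array remains fully functional.
def build_row_map(defective_rows, physical_rows, logical_rows):
    """Map each logical row to a working physical row, skipping defects."""
    working = [r for r in range(physical_rows) if r not in defective_rows]
    if len(working) < logical_rows:
        raise RuntimeError("not enough spare rows to cover the defects")
    return {logical: working[logical] for logical in range(logical_rows)}

# 10 physical rows expose 8 logical rows (2 spares); rows 2 and 5 are defective.
row_map = build_row_map(defective_rows={2, 5}, physical_rows=10, logical_rows=8)
print(row_map)  # {0: 0, 1: 1, 2: 3, 3: 4, 4: 6, 5: 7, 6: 8, 7: 9}
```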

Prioritizing Speed in Specific Scenarios

Cerebras pursues a differentiated strategy: it prioritizes cost control for batch processing while chasing extreme speed for interactive scenarios, where even millisecond-level delays hurt user engagement.

The Future of AI: A Shift Towards Inference

Exponential Growth in Inference

Feldman predicts a significant shift in resource allocation from training to inference over the next five years. The inference market is growing along three compounding axes: the number of AI users, how often each user calls AI, and the compute each call consumes. The toy calculation below shows how these factors multiply.
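
A toy illustration of that compounding, with growth factors that are purely hypothetical:

```python
# Three independent growth factors multiply rather than add, which is why
# inference demand can explode. The rates below are hypothetical examples.
users_growth = 2.0       # twice as many AI users
frequency_growth = 2.0   # each user calls AI twice as often
compute_growth = 3.0     # each call consumes 3x the compute

total = users_growth * frequency_growth * compute_growth
print(f"total inference demand grows {total:.0f}x")  # -> 12x
```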

Challenges in AI Infrastructure

  • Energy Consumption: AI is energy-intensive, straining power and water resources.
  • Uneven Energy Distribution: Power-rich areas may lack suitable data center locations.
  • Regulatory Hurdles: Local regulations impede national-scale data center construction.

Mitigating Inference Costs

  • Data Center Optimization: Data centers should focus on the essential combination of electrical connectivity, construction engineering, and design optimization.
  • Algorithm Optimization: Algorithm improvements can greatly enhance efficiency. Current GPU utilization for inference is low, leaving significant room for gains; see the sketch after this list.
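
To make "low utilization" concrete, here is a rough illustration using assumed numbers (roughly 1 PFLOP/s of peak 16-bit compute and ~20 tokens/s of single-stream decode on a 70B model; neither figure is from the interview):

```python
# Model-FLOPs utilization during unbatched decode, to illustrate the headroom.
# A dense transformer needs roughly 2 FLOPs per parameter per generated token.
params = 70e9          # 70B-parameter model
tokens_per_s = 20      # assumed single-stream decode speed
peak_flops = 1e15      # assumed ~1 PFLOP/s peak 16-bit throughput

achieved = 2 * params * tokens_per_s
print(f"utilization: {achieved / peak_flops:.1%}")  # -> ~0.3%
```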

The Role of Synthetic Data

Synthetic data helps fill training gaps, especially in high-risk scenarios. When advances in compute, algorithms, and data are combined, AI will become faster, cheaper, and more widespread, opening up new applications.

Investment Perspective

Valuing Companies in the AI Sector

  • Hardware: Difficult to invest in.
  • Models: Valuable despite competition, provided companies show an ability to keep improving.
  • Software: Fiercely competitive; staying power is the trait that creates value.

NVIDIA's Market Share

Feldman believes NVIDIA's market share could fall to 50-60% over the next five years. While NVIDIA dominates training thanks to CUDA, the situation is different in inference, where he believes users can switch platforms. He sees a structural weakness in NVIDIA's reliance on off-chip memory, an architecture he considers ill-suited to inference.

Cerebras' Competitive Positioning and Future Plans

  • Technology Advantage: Cerebras' high gross margin shows its technology creates unique value.
  • Going Public: The company is actively preparing for an IPO to improve its competitive position.

China's AI Development

The Impact of US Technological Restrictions

The US government aims to slow China's AI hardware development by restricting access to technology, but restricting software is harder because it is less tangible and not as easily regulated. Additionally, China's substantial data resources and software talent are real assets. Feldman notes that the US may underestimate China's AI capabilities at its own peril.
