Google Cloud Next 2025: New TPU, AI Model Updates & A2A Protocol

Summary

Quick Abstract

Google is vying to reclaim its AI dominance with the unveiling of its groundbreaking seventh-generation TPU, Ironwood, directly challenging NVIDIA's Blackwell B200. This summary highlights key announcements from the Google Cloud Next conference, covering the new TPU, Vertex AI platform updates (Lyria, Veo 2, Chirp 3, Imagen 3), the Agent2Agent (A2A) protocol, and Gemini Code Assist.

Quick Takeaways:

  • Ironwood TPU: Google's most powerful and scalable AI accelerator, boasting a 3600x performance leap over its first-generation TPU and an FP8 compute of 42.5 Exaflops.
  • Vertex AI Updates: Introducing Lyria, a text-to-music model; Veo 2, a comprehensive video creation tool; Chirp 3 with enhanced voice customization; and Imagen 3, improving image generation quality.
  • Agent2Agent Protocol: A new open protocol enabling seamless agent collaboration across various platforms, fostering a dynamic multi-agent ecosystem.
  • Gemini Code Assist: Enhances developer efficiency with AI agents capable of complex coding tasks and code translation, now also integrated into Android Studio.

Google Cloud Next 2025: Google Aims to Reclaim AI Throne

Google held its annual Google Cloud Next conference in Las Vegas on April 10th, showcasing significant advancements in its AI capabilities. The event highlighted a range of developments, including the debut of its latest TPU, upgraded AI models, a new Agent-to-Agent (A2A) protocol, and enhancements to its code assistance tools. These innovations signal Google's ambition to reshape the AI landscape. This article will provide an overview of the key announcements made at the conference.

Seventh-Generation TPU: Ironwood

Ironwood: Challenging NVIDIA's Blackwell B200

The most prominent announcement was the unveiling of Google's seventh-generation TPU, Ironwood. This chip is designed to compete directly with NVIDIA's Blackwell B200. Google positions Ironwood as its most powerful and scalable custom AI accelerator to date, specifically optimized for inference.

Performance and Specifications

Ironwood boasts impressive performance improvements compared to previous generations:

  • Its inference performance is reportedly 3600 times faster and 29 times more efficient than the first-generation TPU from 2018.
  • It features 192GB of HBM memory, six times more than the sixth-generation TPU Trillium, and also six times more than the TPU v4.
  • HBM bandwidth has increased to 7.2 Tbps, 4.5 times that of Trillium.
  • The chip-to-chip interconnect (ICI) bi-directional bandwidth is now 1.2 Tbps, 1.5 times greater than Trillium.

These improvements in memory capacity and bandwidth allow Ironwood to handle larger models and datasets while reducing data transfer bottlenecks, ultimately boosting performance.

Scalability and Compute Power

For Google Cloud customers, Ironwood is available in two configurations: 256 chips and 9216 chips. Each individual chip has a peak FP8 compute power of 4614 TFLOPs. A pod consisting of 9216 chips reaches 42.5 Exaflops in FP8 precision.

Google stated that this compute power exceeds that of the world's largest supercomputer, El Capitan, by over 24 times. However, this comparison is based on El Capitan's FP64 precision performance (1.74 exaFLOPS) versus Ironwood's FP8 performance. When both are converted to FP8, El Capitan's theoretical peak performance is closer to 87 exaFLOPS, still exceeding Ironwood. Even so, 42.5 Exaflops of FP8 compute power is a considerable figure for large-scale inference tasks.
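The figures above are easy to sanity-check with a few lines of arithmetic. The sketch below reproduces the pod-level FP8 total from the per-chip number and the ratio against El Capitan's FP64 peak, using only the values quoted in this article:

```python
# Sanity-check the compute figures quoted above.

chips_per_pod = 9216
fp8_per_chip_tflops = 4614          # peak FP8 TFLOPs per Ironwood chip

# 9216 chips x 4614 TFLOPs, converted from TFLOPs to exaFLOPs
pod_exaflops = chips_per_pod * fp8_per_chip_tflops / 1e6
print(f"Pod FP8 compute: {pod_exaflops:.1f} exaFLOPS")   # ~42.5

# Google's "over 24x" claim compares this FP8 figure against
# El Capitan's FP64 peak of 1.74 exaFLOPS.
el_capitan_fp64_exaflops = 1.74
ratio = pod_exaflops / el_capitan_fp64_exaflops
print(f"Ironwood FP8 vs El Capitan FP64: {ratio:.1f}x")  # ~24.4
```

As the output shows, the 24x headline only holds for the mixed-precision comparison the article describes.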

Enhanced Features

Ironwood is also equipped with an enhanced version of SparseCore, a dedicated accelerator for advanced ranking and recommendation tasks. This expands Ironwood's applications beyond traditional AI, making it suitable for fields like finance and science. The Pathways ML runtime, developed by Google DeepMind, is designed to work seamlessly with Ironwood to enable efficient distributed computing across multiple TPU chips. Google has also integrated new GKE inference capabilities and vLLM support, allowing PyTorch code optimized for GPUs to be easily transferred and run on TPUs.

Power Efficiency

Ironwood prioritizes power efficiency, achieving a two-fold improvement compared to the sixth-generation TPU Trillium and a 29-fold increase compared to the first-generation TPU. Google utilizes advanced liquid cooling solutions and optimized chip design to maintain performance even under heavy AI workloads.

Competitive Analysis

OpenAI researchers have compared Ironwood's performance with NVIDIA's GB200, suggesting that the two are comparable, with Ironwood potentially having a slight edge in power efficiency. Google's VP and General Manager of Cloud AI, Amin Vahdat, stated that Ironwood is designed to support the next phase of generative AI and its demands for compute and communication, as AI agents transition to proactively retrieving and generating data for collaborative insights.

Vertex AI Platform Updates

Google's Vertex AI platform now supports all modalities, including video, image, voice, and music. The conference introduced four significant updates to the platform:

  1. Lyria (Text-to-Music Model): Lyria enables users to generate complete music tracks from text prompts for production use. Businesses can create custom soundtracks aligned with their brand for marketing campaigns, product launches, or immersive experiences. Creators can use Lyria to accelerate content creation workflows and reduce licensing costs.
  2. Veo 2 (Video Generation Model): Veo 2 has been upgraded with new features for video creation, editing, and visual effects. Enhancements include video restoration capabilities for clean edits, the removal of unwanted objects, image expansion to adapt content for different platforms, and the ability to apply complex cinematic techniques without specialized expertise. Veo 2 also has an interpolation function to create transitions between different videos.
  3. Chirp 3 (Speech Generation Model): Chirp 3 offers high-definition voices in over 35 languages and eight speaker options. New features include Instant Custom Voice (generating realistic customized voices from 10-second audio clips) and Transcription with Diarization (separating and identifying individual speakers in multi-person recordings).
  4. Imagen 3 (Text-to-Image Model): Imagen 3 produces images with improved details, enhanced lighting, and fewer artifacts. Significant improvements have been made to its image inpainting capabilities, particularly for object removal.

Agent2Agent (A2A) Protocol

As AI agents become more prevalent, the need for interoperability between them grows. Google has introduced the Agent2Agent (A2A) protocol, an open standard enabling agents to collaborate across isolated data systems and applications. Over 50 partners support the new A2A protocol. A2A is designed to facilitate interaction between agents regardless of their underlying frameworks or vendors.

  • For example, in a large e-commerce company utilizing various platforms (Atlassian, Box, Salesforce, Workday), A2A allows agents on these platforms to communicate and automate data interactions securely.

Google followed five key principles when designing the protocol:

  1. Focus on enabling agents to collaborate in their natural, unstructured modes.
  2. Building on existing and popular standards (HTTP, SSE, JSON-RPC).
  3. Supporting enterprise-grade authentication and authorization, on par with OpenAPI.
  4. Offering flexibility to support scenarios from quick tasks to in-depth research.
  5. Supporting various modalities, including audio, image, and video streams.
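Because the protocol builds on plain JSON-RPC over HTTP (principle 2 above), an A2A request is just a JSON-RPC 2.0 envelope. The sketch below illustrates the general shape; the method name and params structure are illustrative assumptions, not the official schema:

```python
import json

# A hypothetical A2A task submission wrapped in a standard JSON-RPC 2.0
# envelope. The method name and params below are illustrative assumptions,
# not the official A2A schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",  # assumed method name
    "params": {
        "task": {
            "id": "task-001",
            "message": {
                "role": "user",
                "parts": [{"type": "text", "text": "Summarize Q1 sales"}],
            },
        }
    },
}

# Serialized, this is what would travel over HTTP between two agents.
payload = json.dumps(request)
print(payload)
```

Leaning on JSON-RPC and HTTP means any web-capable framework can speak A2A without a bespoke transport layer.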

How A2A Works

A2A facilitates communication between client agents and remote agents. The client agent initiates tasks, and the remote agent executes them, providing information or performing actions. Key aspects of the protocol include:

  • Agent Cards: Agents advertise their capabilities using JSON-formatted "Agent Cards."
  • Task Management: Communication revolves around completing tasks, with a defined "Task" object and lifecycle.
  • Collaboration: Agents can exchange messages containing context, replies, artifacts, and user instructions.
  • User Experience Negotiation: Messages include "parts" specifying content types, enabling agents to negotiate optimal formats and UI capabilities.
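Putting the first of these together, an Agent Card is simply a JSON document an agent publishes so that client agents can discover it. The sketch below shows one plausible shape; the field names and endpoint are illustrative assumptions based on the description above, not the published spec:

```python
import json

# A minimal "Agent Card" sketch: a JSON document an agent publishes to
# advertise its capabilities. Field names and the URL are illustrative
# assumptions, not the official A2A card schema.
agent_card = {
    "name": "InventoryAgent",
    "description": "Answers stock-level questions for the product catalog",
    "url": "https://agents.example.com/inventory",  # hypothetical endpoint
    "version": "1.0.0",
    "capabilities": {"streaming": True},
    "skills": [
        {
            "id": "check-stock",
            "name": "Check stock",
            "description": "Look up current inventory for a SKU",
        }
    ],
}

# A client agent would fetch and parse a card like this before
# deciding whether to delegate a task to the remote agent.
print(json.dumps(agent_card, indent=2))
```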

Comparison with MCP

Google compared A2A with Anthropic's Model Context Protocol (MCP). MCP primarily manages tools and resources, connecting agents to APIs and resources through structured inputs and outputs. A2A focuses on agent-to-agent collaboration, making the two protocols complementary.

Gemini Code Assist

Google's AI coding assistant, Gemini Code Assist, can now deploy new AI agents capable of performing complex programming tasks through multiple steps.

  • For example, it can create applications from Google Docs product specifications or translate code between languages.
  • Code Assist is now available in Android Studio, expanding its reach.

Conclusion

Google's Cloud Next conference showcased significant advancements in its AI offerings. From the powerful Ironwood TPU and the full-modality Vertex AI platform to the new A2A protocol and Gemini Code Assist, Google is demonstrating its commitment to innovation. Google CEO Sundar Pichai noted that Gemini 2.5 Pro is now available to all users in AI Studio, Vertex AI, and the Gemini app, and the growing user base across these tools reflects Google's AI momentum. As OpenAI prepares for its own series of announcements, Google is expected to continue its AI development.
