Google Cloud Next 2025: Google Aims to Reclaim the AI Throne
Google held its annual Cloud Next conference in Las Vegas on April 10th, showcasing significant advances in its AI capabilities. The event highlighted a range of developments, including the debut of its latest TPU, upgraded AI models, a new Agent2Agent (A2A) protocol, and enhancements to its code assistance tools. These innovations signal Google's ambition to reshape the AI landscape. This article provides an overview of the key announcements from the conference.
Seventh-Generation TPU: Ironwood
Ironwood: Challenging NVIDIA's Blackwell B200
The most prominent announcement was the unveiling of Google's seventh-generation TPU, Ironwood. This chip is designed to compete directly with NVIDIA's Blackwell B200. Google positions Ironwood as its most powerful and scalable custom AI accelerator to date, specifically optimized for inference.
Performance and Specifications
Ironwood delivers substantial performance gains over previous generations. Google reports that its peak inference performance is roughly 3,600 times that of its first Cloud TPU from 2018, while being 29 times more power efficient.
- Each chip carries 192 GB of HBM, six times the capacity of the sixth-generation TPU, Trillium, and likewise about six times that of TPU v4.
- HBM bandwidth rises to 7.2 TB/s per chip, 4.5 times that of Trillium.
- Bi-directional chip-to-chip interconnect (ICI) bandwidth reaches 1.2 TB/s, 1.5 times that of Trillium.
These improvements in memory capacity and bandwidth allow Ironwood to handle larger models and datasets while reducing data transfer bottlenecks, ultimately boosting performance.
Scalability and Compute Power
For Google Cloud customers, Ironwood is available in two configurations: 256 chips or 9,216 chips. Each chip delivers a peak of 4,614 TFLOPS of FP8 compute, and a full 9,216-chip pod reaches 42.5 exaFLOPS at FP8 precision.
Google stated that this compute power exceeds that of the world's largest supercomputer, El Capitan, by more than 24 times. The comparison is not apples-to-apples, however: it sets El Capitan's FP64 figure (1.74 exaFLOPS) against Ironwood's FP8 figure. Expressed in FP8, El Capitan's theoretical peak would be closer to 87 exaFLOPS, which still exceeds Ironwood. Even so, 42.5 exaFLOPS of FP8 compute is a considerable figure for large-scale inference workloads.
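As a quick sanity check, the pod-level number follows directly from the per-chip spec, and the 24x claim uses El Capitan's FP64 result as the denominator. A minimal sketch, using only the values reported above:

```python
# Back-of-envelope check of the reported Ironwood pod figures.
per_chip_fp8_tflops = 4_614          # peak FP8 per Ironwood chip (as reported)
pod_chips = 9_216                    # chips in the largest pod configuration

pod_exaflops = per_chip_fp8_tflops * 1e12 * pod_chips / 1e18
print(f"Pod FP8 peak: {pod_exaflops:.1f} exaFLOPS")              # ~42.5

el_capitan_fp64_exaflops = 1.74      # El Capitan FP64 figure used in Google's comparison
print(f"Ratio vs. El Capitan (FP64): {pod_exaflops / el_capitan_fp64_exaflops:.0f}x")  # ~24x
```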
Enhanced Features
Ironwood is also equipped with an enhanced version of SparseCore, a dedicated accelerator for advanced ranking and recommendation tasks. This expands Ironwood's applications beyond traditional AI, making it suitable for fields like finance and science. The Pathways ML runtime, developed by Google DeepMind, is designed to work seamlessly with Ironwood to enable efficient distributed computing across multiple TPU chips. Google has also integrated new GKE inference capabilities and vLLM support, allowing PyTorch code optimized for GPUs to be easily transferred and run on TPUs.
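To illustrate the PyTorch-to-TPU path mentioned above, here is a minimal sketch using the general-purpose PyTorch/XLA bridge (torch_xla). It assumes a Cloud TPU VM with torch and torch_xla installed, and it is not specific to Ironwood or to the new vLLM integration:

```python
# Minimal sketch: running an ordinary PyTorch module on a TPU via torch_xla.
# Assumes a Cloud TPU VM with the torch and torch_xla packages installed.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                         # resolves to the attached TPU device
model = torch.nn.Linear(1024, 1024).to(device)   # same module that would run on a GPU
x = torch.randn(8, 1024, device=device)

y = model(x)
xm.mark_step()                                   # flush the lazily built XLA graph
print(y.shape)
```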
Power Efficiency
Ironwood prioritizes power efficiency, achieving a two-fold improvement compared to the sixth-generation TPU Trillium and a 29-fold increase compared to the first-generation TPU. Google utilizes advanced liquid cooling solutions and optimized chip design to maintain performance even under heavy AI workloads.
Competitive Analysis
OpenAI researchers have compared Ironwood's performance with NVIDIA's GB200, suggesting that the two are comparable, with Ironwood potentially holding a slight edge in power efficiency. Amin Vahdat, Google's VP and General Manager of Cloud AI, said Ironwood is designed to support the next phase of generative AI and its enormous compute and communication demands, as AI moves toward agents that proactively retrieve and generate data to deliver insights collaboratively.
Vertex AI Platform Updates
Google's Vertex AI platform now offers generative models across every major modality, including video, image, speech, and music. The conference introduced four significant updates to the platform:
- Lyria (Text-to-Music Model): Lyria enables users to generate complete music tracks from text prompts for production use. Businesses can create custom soundtracks aligned with their brand for marketing campaigns, product launches, or immersive experiences. Creators can use Lyria to accelerate content creation workflows and reduce licensing costs.
- Veo 2 (Video Generation Model): Veo 2 has been upgraded with new features for video creation, editing, and visual effects. Enhancements include video restoration for clean edits, removal of unwanted objects, image expansion to adapt content for different platforms, and the ability to apply complex cinematic techniques without specialized expertise. Veo 2 also offers an interpolation function that creates smooth transitions between separate video clips.
- Chirp 3 (Speech Generation Model): Chirp 3 offers high-definition voices in over 35 languages and eight speaker options. New features include Instant Custom Voice (generating realistic customized voices from 10-second audio clips) and Transcription with Diarization (separating and identifying individual speakers in multi-person recordings).
- Imagen 3 (Text-to-Image Model): Imagen 3 produces images with improved details, enhanced lighting, and fewer artifacts. Significant improvements have been made to its image inpainting capabilities, particularly for object removal.
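For a sense of how these models are consumed programmatically, below is a minimal sketch that calls an Imagen model through the Vertex AI Python SDK. The project ID, region, and model ID are placeholders and should be checked against the current Vertex AI documentation:

```python
# Minimal sketch: generating an image with an Imagen model via the Vertex AI SDK.
# Project, region, and model ID below are assumptions, not confirmed values.
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

vertexai.init(project="my-gcp-project", location="us-central1")

model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")  # verify current model ID
images = model.generate_images(
    prompt="A product shot of a ceramic mug on a wooden desk, soft morning light",
    number_of_images=1,
)
images[0].save(location="mug.png")  # write the generated image to disk
```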
Agent2Agent (A2A) Protocol
As AI agents become more prevalent, the need for interoperability between them grows. Google has introduced the Agent2Agent (A2A) protocol, an open standard enabling agents to collaborate across isolated data systems and applications. Over 50 partners support the new A2A protocol. A2A is designed to facilitate interaction between agents regardless of their underlying frameworks or vendors.
- For example, in a large e-commerce company utilizing various platforms (Atlassian, Box, Salesforce, Workday), A2A allows agents on these platforms to communicate and automate data interactions securely.
Google followed five key principles when designing the protocol:
- Focusing on enabling agents to collaborate in their natural, unstructured modes.
- Building on existing and popular standards (HTTP, SSE, JSON-RPC).
- Supporting enterprise-grade authentication and authorization, on par with OpenAPI's authentication schemes.
- Offering flexibility to support scenarios from quick tasks to in-depth research.
- Supporting various modalities, including audio, image, and video streams.
How A2A Works
A2A facilitates communication between client agents and remote agents. The client agent initiates tasks, and the remote agent executes them, providing information or performing actions. Key aspects of the protocol include:
- Agent Cards: Agents advertise their capabilities using JSON-formatted "Agent Cards."
- Task Management: Communication revolves around completing tasks, with a defined "Task" object and lifecycle.
- Collaboration: Agents can exchange messages containing context, replies, artifacts, and user instructions.
- User Experience Negotiation: Messages include "parts" specifying content types, enabling agents to negotiate optimal formats and UI capabilities.
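To make these moving parts concrete, here is an illustrative sketch of an Agent Card and a task-initiation request, expressed as Python dictionaries. The field and method names approximate the published A2A specification and may not match it exactly; the endpoint and skill are invented for the example:

```python
# Illustrative only: a simplified A2A Agent Card and a JSON-RPC task request.
# Field and method names approximate the A2A spec; the agent itself is fictional.
agent_card = {
    "name": "inventory-agent",
    "description": "Answers stock-level questions for an e-commerce catalog",
    "url": "https://agents.example.com/inventory",    # where the remote agent is served
    "capabilities": {"streaming": True},              # e.g. SSE for long-running tasks
    "skills": [
        {"id": "check_stock", "description": "Look up current stock for a SKU"},
    ],
}

# A client agent opens a task with the remote agent over JSON-RPC (HTTP POST).
task_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",                           # task-creation method in early A2A drafts
    "params": {
        "id": "task-123",                             # client-chosen task identifier
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": "Is SKU 42 in stock?"}],
        },
    },
}
```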
Comparison with MCP
Google also positioned A2A relative to Anthropic's Model Context Protocol (MCP). MCP primarily manages tools and resources, connecting agents to APIs and data through structured inputs and outputs, while A2A focuses on agent-to-agent collaboration, making the two protocols complementary.
Gemini Code Assist
Google's AI coding assistant, Gemini Code Assist, can now deploy AI agents that carry out complex, multi-step programming tasks.
- For example, it can create applications from Google Docs product specifications or translate code between languages.
- Code Assist is now available in Android Studio, expanding its reach.
Conclusion
Google's Cloud Next conference showcased significant advancements in its AI offerings. From the powerful Ironwood TPU and the full-modality Vertex AI platform to the new A2A protocol and Gemini Code Assist, Google is demonstrating its commitment to innovation. Google CEO Sundar Pichai noted that Gemini 2.5 Pro is now available to all users in AI Studio, Vertex AI, and the Gemini app, and growing adoption across these tools underscores Google's momentum in AI. As OpenAI prepares its own series of announcements, Google is expected to keep pushing its AI development forward.