Google Cloud Next 2025: New TPU, AI Model Updates & A2A Protocol

Summary

Quick Abstract

Google is vying to reclaim its AI dominance with the unveiling of its groundbreaking seventh-generation TPU, Ironwood, directly challenging NVIDIA's Blackwell B200. This summary highlights key announcements from the Google Cloud Next conference, covering the new TPU, Vertex AI platform updates (Lyria, Veo 2, Chirp 3, Imagen 3), the Agent2Agent (A2A) protocol, and Gemini Code Assist.

Quick Takeaways:

  • Ironwood TPU: Google's most powerful and scalable AI accelerator, boasting a 3600x performance leap over its first-generation TPU and an FP8 compute of 42.5 Exaflops.
  • Vertex AI Updates: Introducing Lyria, a text-to-music model; Veo 2, a comprehensive video creation tool; Chirp 3 with enhanced voice customization; and Imagen 3, improving image generation quality.
  • Agent2Agent Protocol: A new open protocol enabling seamless agent collaboration across various platforms, fostering a dynamic multi-agent ecosystem.
  • Gemini Code Assist: Enhances developer efficiency with AI agents capable of complex coding tasks and code translation, now also integrated into Android Studio.

Google Cloud Next 2025: Google Aims to Reclaim AI Throne

Google held its annual Google Cloud Next conference in Las Vegas on April 10th, showcasing significant advancements in its AI capabilities. The event highlighted a range of developments, including the debut of its latest TPU, upgraded AI models, a new Agent-to-Agent (A2A) protocol, and enhancements to its code assistance tools. These innovations signal Google's ambition to reshape the AI landscape. This article will provide an overview of the key announcements made at the conference.

Seventh-Generation TPU: Ironwood

Ironwood: Challenging NVIDIA's Blackwell B200

The most prominent announcement was the unveiling of Google's seventh-generation TPU, Ironwood. This chip is designed to compete directly with NVIDIA's Blackwell B200. Google positions Ironwood as its most powerful and scalable custom AI accelerator to date, specifically optimized for inference.

Performance and Specifications

Ironwood boasts impressive performance improvements compared to previous generations:

  • Its inference performance is reportedly 3600 times faster and 29 times more efficient than the first-generation TPU from 2018.
  • It features 192GB of HBM memory, six times more than the sixth-generation TPU Trillium, and also six times more than the TPU v4.
  • HBM bandwidth has increased to 7.2 Tbps, 4.5 times that of Trillium.
  • The chip-to-chip interconnect (ICI) bi-directional bandwidth is now 1.2 Tbps, 1.5 times greater than Trillium.

These improvements in memory capacity and bandwidth allow Ironwood to handle larger models and datasets while reducing data transfer bottlenecks, ultimately boosting performance.

Scalability and Compute Power

For Google Cloud customers, Ironwood is available in two configurations: 256 chips and 9216 chips. Each individual chip has a peak FP8 compute power of 4614 TFLOPs. A pod consisting of 9216 chips reaches 42.5 Exaflops in FP8 precision.

Google stated that this compute power exceeds that of the world's largest supercomputer, El Capitan, by over 24 times. However, this comparison is based on El Capitan's FP64 precision performance (1.74 exaFLOPS) versus Ironwood's FP8 performance. When both are converted to FP8, El Capitan's theoretical peak performance is closer to 87 exaFLOPS, still exceeding Ironwood. Even so, 42.5 Exaflops of FP8 compute power is a considerable figure for large-scale inference tasks.
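The figures above are easy to sanity-check with a few lines of arithmetic. The sketch below reproduces the pod-level FP8 total from the per-chip number and the ratio against El Capitan's FP64 peak, using only the values quoted in this article:

```python
# Sanity-check the compute figures quoted above.

chips_per_pod = 9216
fp8_per_chip_tflops = 4614          # peak FP8 TFLOPs per Ironwood chip

# 9216 chips x 4614 TFLOPs, converted from TFLOPs to exaFLOPs
pod_exaflops = chips_per_pod * fp8_per_chip_tflops / 1e6
print(f"Pod FP8 compute: {pod_exaflops:.1f} exaFLOPS")   # ~42.5

# Google's "over 24x" claim compares this FP8 figure against
# El Capitan's FP64 peak of 1.74 exaFLOPS.
el_capitan_fp64_exaflops = 1.74
ratio = pod_exaflops / el_capitan_fp64_exaflops
print(f"Ironwood FP8 vs El Capitan FP64: {ratio:.1f}x")  # ~24.4
```

As the output shows, the 24x headline only holds for the mixed-precision comparison the article describes.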

Enhanced Features

Ironwood is also equipped with an enhanced version of SparseCore, a dedicated accelerator for advanced ranking and recommendation tasks. This expands Ironwood's applications beyond traditional AI, making it suitable for fields like finance and science. The Pathways ML runtime, developed by Google DeepMind, is designed to work seamlessly with Ironwood to enable efficient distributed computing across multiple TPU chips. Google has also integrated new GKE inference capabilities and vLLM support, allowing PyTorch code optimized for GPUs to be easily transferred and run on TPUs.

Power Efficiency

Ironwood prioritizes power efficiency, achieving a two-fold improvement compared to the sixth-generation TPU Trillium and a 29-fold increase compared to the first-generation TPU. Google utilizes advanced liquid cooling solutions and optimized chip design to maintain performance even under heavy AI workloads.

Competitive Analysis

OpenAI researchers have compared Ironwood's performance with NVIDIA's GB200, suggesting that the two are comparable, with Ironwood potentially having a slight edge in power efficiency. Google's VP and General Manager of Cloud AI, Amin Vahdat, stated that Ironwood is designed to support the next phase of generative AI and its demands for compute and communication, as AI agents transition to proactively retrieving and generating data for collaborative insights.

Vertex AI Platform Updates

Google's Vertex AI platform now supports all modalities, including video, image, voice, and music. The conference introduced four significant updates to the platform:

  1. Lyria (Text-to-Music Model): Lyria enables users to generate complete music tracks from text prompts for production use. Businesses can create custom soundtracks aligned with their brand for marketing campaigns, product launches, or immersive experiences. Creators can use Lyria to accelerate content creation workflows and reduce licensing costs.
  2. Veo 2 (Video Generation Model): Veo 2 has been upgraded with new features for video creation, editing, and visual effects. Enhancements include video restoration capabilities for clean edits, the removal of unwanted objects, image expansion to adapt content for different platforms, and the ability to apply complex cinematic techniques without specialized expertise. Veo 2 also has an interpolation function to create transitions between different videos.
  3. Chirp 3 (Speech Generation Model): Chirp 3 offers high-definition voices in over 35 languages and eight speaker options. New features include Instant Custom Voice (generating realistic customized voices from 10-second audio clips) and Transcription with Diarization (separating and identifying individual speakers in multi-person recordings).
  4. Imagen 3 (Text-to-Image Model): Imagen 3 produces images with improved details, enhanced lighting, and fewer artifacts. Significant improvements have been made to its image inpainting capabilities, particularly for object removal.

Agent2Agent (A2A) Protocol

As AI agents become more prevalent, the need for interoperability between them grows. Google has introduced the Agent2Agent (A2A) protocol, an open standard enabling agents to collaborate across isolated data systems and applications. Over 50 partners support the new A2A protocol. A2A is designed to facilitate interaction between agents regardless of their underlying frameworks or vendors.

  • For example, in a large e-commerce company utilizing various platforms (Atlassian, Box, Salesforce, Workday), A2A allows agents on these platforms to communicate and automate data interactions securely.

Google followed five key principles when designing the protocol:

  1. Focus on enabling agents to collaborate in their natural, unstructured modes.
  2. Building on existing and popular standards (HTTP, SSE, JSON-RPC).
  3. Supporting enterprise-grade authentication and authorization, on par with OpenAPI.
  4. Offering flexibility to support scenarios from quick tasks to in-depth research.
  5. Supporting various modalities, including audio, image, and video streams.
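Because the protocol builds on plain JSON-RPC over HTTP (principle 2 above), an A2A request is just a JSON-RPC 2.0 envelope. The sketch below illustrates the general shape; the method name and params structure are illustrative assumptions, not the official schema:

```python
import json

# A hypothetical A2A task submission wrapped in a standard JSON-RPC 2.0
# envelope. The method name and params below are illustrative assumptions,
# not the official A2A schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",  # assumed method name
    "params": {
        "task": {
            "id": "task-001",
            "message": {
                "role": "user",
                "parts": [{"type": "text", "text": "Summarize Q1 sales"}],
            },
        }
    },
}

# Serialized, this is what would travel over HTTP between two agents.
payload = json.dumps(request)
print(payload)
```

Leaning on JSON-RPC and HTTP means any web-capable framework can speak A2A without a bespoke transport layer.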

How A2A Works

A2A facilitates communication between client agents and remote agents. The client agent initiates tasks, and the remote agent executes them, providing information or performing actions. Key aspects of the protocol include:

  • Agent Cards: Agents advertise their capabilities using JSON-formatted "Agent Cards."
  • Task Management: Communication revolves around completing tasks, with a defined "Task" object and lifecycle.
  • Collaboration: Agents can exchange messages containing context, replies, artifacts, and user instructions.
  • User Experience Negotiation: Messages include "parts" specifying content types, enabling agents to negotiate optimal formats and UI capabilities.
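Putting the first of these together, an Agent Card is simply a JSON document an agent publishes so that client agents can discover it. The sketch below shows one plausible shape; the field names and endpoint are illustrative assumptions based on the description above, not the published spec:

```python
import json

# A minimal "Agent Card" sketch: a JSON document an agent publishes to
# advertise its capabilities. Field names and the URL are illustrative
# assumptions, not the official A2A card schema.
agent_card = {
    "name": "InventoryAgent",
    "description": "Answers stock-level questions for the product catalog",
    "url": "https://agents.example.com/inventory",  # hypothetical endpoint
    "version": "1.0.0",
    "capabilities": {"streaming": True},
    "skills": [
        {
            "id": "check-stock",
            "name": "Check stock",
            "description": "Look up current inventory for a SKU",
        }
    ],
}

# A client agent would fetch and parse a card like this before
# deciding whether to delegate a task to the remote agent.
print(json.dumps(agent_card, indent=2))
```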

Comparison with MCP

Google compared A2A with Anthropic's Model Context Protocol (MCP). MCP primarily manages tools and resources, connecting agents to APIs and resources through structured inputs and outputs. A2A focuses on agent-to-agent collaboration, making the two protocols complementary.

Gemini Code Assist

Google's AI coding assistant, Gemini Code Assist, can now deploy new AI agents capable of performing complex programming tasks through multiple steps.

  • For example, it can create applications from Google Docs product specifications or translate code between languages.
  • Code Assist is now available in Android Studio, expanding its reach.

Conclusion

Google's Cloud Next conference showcased significant advancements in its AI offerings. From the powerful Ironwood TPU and the full-modality Vertex AI platform to the new A2A protocol and Gemini Code Assist, Google is demonstrating its commitment to innovation. Google CEO Sundar Pichai noted that Gemini 2.5 Pro is now available to all users in AI Studio, Vertex AI, and the Gemini app, and the growing user base across these tools reflects Google's AI momentum. As OpenAI prepares for its own series of announcements, Google is expected to continue its AI development.
