The 2025 Google I/O conference showcased a wide range of AI advancements, enough material for extensive coverage across the AI industry. This article gives an overview of the key announcements made at the conference, including updates to Gemini, AI Search, agents, and AI video.
Content Creation: AI-Powered Tools
Google heavily emphasized AI's role in content creation, with significant updates to its video and image tools. Each tool is capable enough to warrant a closer look.
AI Video Advancements with Veo 3
Google's Veo 2 model already demonstrated impressive video generation capabilities. The newly released Veo 3 model improves on it, generating higher-quality videos that adhere more closely to physical laws. More significantly, Veo 3 lets users add sound effects, background sounds, and character dialogue to videos. With a simple prompt, characters can speak, and their mouth movements are synchronized automatically.
Image Generation and Text Integration
The new image generation model produces images with richer detail, finer text rendering, and realistic background effects. It also incorporates text layout functionality, enabling the creation of posters, invitations, and other visually appealing designs. Google's generative tools extend to music creation as well.
Flow: An AI Movie Creation Tool
Google introduced Flow, an AI movie creation tool that builds on the capabilities described above. Users can upload existing images or generate new ones using built-in models, then turn these images into AI videos with text prompts. Flow maintains character and scene consistency throughout the production process and allows direct editing after generation. Flow is currently available to Ultra members in the US for development.
SynthID: Identifying AI-Generated Content
The rise of AI-generated video and images raises the issue of content identification. To address this, Google is investing in SynthID, an invisible digital watermark that can be embedded in AI-generated text, images, audio, and video. Currently, 1.1 billion pieces of content have been watermarked. A SynthID detection tool lets users verify whether content contains this watermark. Google believes AI content should be marked, not stopped, and hopes this technology will be widely adopted.
Gemini Model Upgrades
Google announced upgrades to the Gemini model family, including Gemini 2.5 Pro and Gemini 2.5 Flash.
Gemini 2.5 Pro
Gemini 2.5 Pro excels in various areas, particularly programming. It has achieved top rankings on the LMArena leaderboard and even completed the game Pokémon Blue.
Gemini 2.5 Flash
As a lightweight model, Gemini 2.5 Flash has been improved in reasoning, coding, and long-context processing while remaining cost-effective. Compared to the Pro version, it uses 22% fewer tokens for the same performance.
Practical Feature Updates for Gemini
- Text-to-Speech: Gemini now features a text-to-speech function that lets it speak in a realistic human voice, including whispers. It supports 24 languages with seamless switching between them. This functionality is available through the Gemini API.
- Canvas: Canvas can transform user input into various formats, such as web pages, infographics, and blog posts.
- Thinking Process Visualization: The model's thinking process can now be organized into a clear format with keywords and key information, helping users understand the model's reasoning.
- Budget Function: This feature lets users limit the number of tokens the Gemini model uses, controlling costs and returning results faster.
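As a rough illustration of the budget feature, the sketch below builds a JSON request body that caps the tokens the model may spend on thinking. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) follow the Gemini API REST reference as best understood here and should be checked against the current documentation; the helper name `build_request`, the prompt, and the budget value are hypothetical. No network call is made.

```python
import json


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a request body with a cap on thinking tokens.

    Assumption: the budget is passed as
    generationConfig.thinkingConfig.thinkingBudget.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


# Hypothetical prompt and budget, chosen only for illustration.
body = build_request("Summarize the keynote in three bullets.", 1024)
print(json.dumps(body, indent=2))
```

A smaller budget generally trades reasoning depth for lower cost and latency, so the value is worth tuning per task rather than fixing globally.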
AI Programming Tool: Jules
Google unveiled Jules, an AI programming tool designed for professional developers. It connects to GitHub, automatically pulls code and submits changes, and offers five free uses per day. Jules is currently in global testing.
AI Search: A Personalized and Intelligent Search Experience
Google introduced AI Mode, a rebuilt search experience powered by the Gemini 2.5 series. It is designed to solve complex search problems by:
- Breaking down complex problems into smaller parts.
- Searching deeply across the web and combining real-time information.
- Generating answers in multiple formats, including documents, pictures, links, and maps.
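The decompose-then-search pattern in the steps above can be sketched in a few lines. This is only an illustration of the general idea, not Google's implementation: `decompose` is a naive hypothetical splitter, and `search` is a stand-in that returns canned text instead of calling a real search backend.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(question: str) -> list[str]:
    """Hypothetical splitter: one sub-query per ' and '-joined clause."""
    parts = (part.strip(" ?") for part in question.split(" and "))
    return [part for part in parts if part]


def search(sub_query: str) -> dict:
    """Stand-in for a real search backend; returns a canned snippet."""
    return {"query": sub_query, "snippet": f"top result for '{sub_query}'"}


def answer(question: str) -> list[dict]:
    """Run all sub-queries concurrently and collect results in order."""
    sub_queries = decompose(question)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Executor.map preserves the order of its inputs.
        return list(pool.map(search, sub_queries))


results = answer("best hiking trails near Seattle and weather this weekend?")
for result in results:
    print(result["query"], "->", result["snippet"])
```

A production system would replace the splitter with a model-generated query plan and merge the results into a single multi-format answer; the sketch only shows the fan-out and collection steps.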
AI Mode differs from traditional search by providing a personalized experience that takes past searches into account and, with user authorization, integrates with applications like Gmail. It can organize search results into a research report with intuitive charts, and it has initial agent capabilities.
Agent Capabilities in AI Search
For example, when asked to book a flight ticket, AI Search will automatically analyze and compare information from various platforms to find the best option and complete the reservation.
Visual Capabilities and Real-Time Interaction
AI Search lets users point their camera at the real world and ask questions about it. The AI can understand what it sees and interact in real time. During a demonstration, the AI corrected a user who deliberately misidentified a garbage truck as a factory car and a street lamp as a building.
Hardware Innovations
The conference highlighted three notable hardware products:
- Google Beam (with HP): A video call device that uses six cameras to capture movement and create a 3D video experience.
- AI Glasses: Featuring a built-in camera, microphone, and speaker, these glasses can recognize what the user sees and provide real-time translation.
- Android XR Device (with Samsung): An Android-based extended reality device.
Ultra Membership
Google introduced Ultra Membership, priced at $249.99 per month, which includes access to most of the products announced at the conference. That is about $50 more than OpenAI's Pro membership, though Google offers a half-price rate of $124.99 per month for the first three months.
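To make the pricing concrete, the short calculation below totals the cost of a subscription over a given number of months. It assumes a regular price of $249.99 per month (the amount the $124.99 half-price figure implies) and works in integer cents to avoid floating-point rounding; the constant and function names are illustrative only.

```python
REGULAR_CENTS = 24_999  # assumption: $249.99 per month at the regular rate
PROMO_CENTS = 12_499    # $124.99 per month during the introductory offer
PROMO_MONTHS = 3        # the discount covers the first three months


def total_cost_cents(months: int) -> int:
    """Total subscription cost in cents for the first `months` months."""
    if months < 0:
        raise ValueError("months must be non-negative")
    promo = min(months, PROMO_MONTHS)
    return promo * PROMO_CENTS + (months - promo) * REGULAR_CENTS


first_year = total_cost_cents(12)
print(f"First year: ${first_year / 100:,.2f}")  # prints "First year: $2,624.88"
```

Under these assumptions the discount saves $375 over the first three months, after which the full monthly rate applies.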