直男山禾: Google I/O 2025: AI Search, Video, & More! (Everything Announced)

Dive into the groundbreaking 2025 Google I/O! This recap summarizes the major AI announcements, from advanced image and video generation tools to revolutionary updates in search and hardware. Discover how Google is pushing the boundaries of AI in content creation, personalized experiences, and accessibility.

Quick Takeaways:

AI Video Creation: The Veo 3 model generates matching sound effects and character dialogue, streamlining movie production.
Image Generation: New models create realistic, layered images with advanced text integration, ideal for posters and invitations.
Gemini Updates: 2.5 Pro excels in programming, while Flash offers cost-effective reasoning. New text-to-speech features with nuanced tones were also revealed.
AI Search: Redesigned search experience powered by Gemini 2.5, providing personalized results, report generation, and agent capabilities for tasks like flight booking.
Hardware Innovations: Google Beam for 3D video calls, AI glasses for real-time translation, and a Samsung-partnered Android XR device were unveiled.

Learn about SynthID, Google's approach to watermarking AI-generated content, and more!

The 2025 Google I/O conference showcased so many AI advancements that AI companies will have plenty of material to cover for some time. This article gives an overview of the key announcements made at the conference, including updates to Gemini, AI Search, Agents, and AI Video.

Content Creation: AI-Powered Tools

Google heavily emphasized AI's role in content creation. The conference featured major updates to its video and image tools, each of which is capable enough to deserve a closer look.

AI Video Advancements with Veo 3

Google's Veo 2 model already demonstrated impressive video generation capabilities. The newly released Veo 3 model improves on it, generating higher-quality videos that adhere more closely to physical laws. More significantly, Veo 3 lets users easily add sound effects, background sounds, and character dialogue to videos. With a simple prompt, characters can speak, and their mouth movements are synchronized automatically.

Image Generation and Text Integration

The new image generation model produces richer layering, finer text rendering, and realistic background effects. It also handles text layout, enabling the creation of posters, invitations, and other visually appealing designs. Google's generative tools extend to music creation as well.

Flow: An AI Movie Creation Tool

Google introduced Flow, an AI movie creation tool that builds on the video, image, and audio capabilities described above. Users can upload existing images or generate new ones with the built-in models, then turn those images into AI videos with text prompts. Flow maintains character and scene consistency throughout production and allows direct editing after generation. Flow is currently available to Ultra members in the US.

SynthID: Identifying AI-Generated Content

The rise of AI-generated video and images raises the issue of content identification. To address this, Google is investing in SynthID, a hidden digital watermark that can be embedded in AI-generated text, images, audio, and video. Currently, 1.1 billion pieces of content have been watermarked. The SynthID tool allows users to verify whether content contains this watermark. Google believes AI content should be marked, not stopped, and hopes this technology will be widely adopted.

Gemini Model Upgrades

Google announced upgrades to the Gemini model family, including Gemini 2.5 Pro and Flash.

Gemini 2.5 Pro

Gemini 2.5 Pro excels in various areas, particularly programming. It has achieved top rankings on LMArena and even successfully completed the game Pokémon Blue.

Gemini 2.5 Flash

As a lightweight model, Gemini 2.5 Flash has been improved in reasoning, coding, and long-context processing while remaining cost-effective. Compared to the Pro version, it uses 22% fewer tokens for the same performance.
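To make the 22% figure concrete, here is a minimal arithmetic sketch. The baseline token count is a hypothetical value chosen for illustration, and the function name is not part of any Google tooling; only the 22% saving comes from the announcement as reported above.

```python
def tokens_with_saving(baseline_tokens: int, saving: float = 0.22) -> int:
    """Return the token count after applying a fractional saving to a baseline."""
    if not 0.0 <= saving <= 1.0:
        raise ValueError("saving must be between 0 and 1")
    return round(baseline_tokens * (1.0 - saving))


# Hypothetical task that needs 10,000 tokens at the baseline.
baseline = 10_000
reduced = tokens_with_saving(baseline)
print(f"{baseline} tokens at baseline, {reduced} tokens with a 22% saving")
```

Under these assumptions, a task that needs 10,000 tokens at the baseline would need 7,800 tokens at the stated saving. Actual savings will vary by task.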

Practical Feature Updates for Gemini

Text-to-Speech: Gemini now features a text-to-speech function that lets it speak in a realistic human voice, including whispers. It supports 24 languages with seamless switching between them. This functionality is available through the Gemini API.
Canvas: Canvas can transform user input into various formats, such as web pages, infographics, and blog posts.
Thinking Process Visualization: The model's thinking process can now be organized into a clear format with keywords and key information, allowing users to understand the model's reasoning.
Budget Function: This feature lets users cap the number of tokens the Gemini model spends on thinking, which keeps costs under control and speeds up responses.
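As a rough illustration of the budget idea, the sketch below builds a request body with a capped thinking budget. It assumes the Gemini API's REST-style field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`); these should be checked against the current API documentation before use. The helper name `build_request` is hypothetical, and nothing is sent over the network.

```python
import json


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a generateContent-style request body with a capped thinking budget.

    Field names are assumptions based on the Gemini API's REST format.
    """
    if thinking_budget < 0:
        raise ValueError("thinking_budget must be non-negative")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Upper bound on tokens the model may spend on thinking.
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


body = build_request("Summarize the Google I/O 2025 keynote in three bullets.", 1024)
print(json.dumps(body, indent=2))
```

A smaller budget trades some reasoning depth for lower cost and latency, so the right value depends on how hard the task is.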

AI Programming Tool: Jules

Google unveiled Jules, an AI programming tool designed for professional developers. It connects to GitHub, pulls code from repositories and submits changes automatically, and offers five free tasks per day. Jules is currently in testing worldwide.

AI Search: A Personalized and Intelligent Search Experience

Google introduced AI Mode, a rebuilt search experience powered by the Gemini 2.5 series. It is designed to solve complex search problems by:

Breaking down complex problems into smaller parts.
Deep-diving into the web to combine real-time information.
Generating documents, pictures, links, maps, and other multi-format answers.

AI Mode differs from traditional search by providing a personalized experience that considers past search records and, with user authorization, integrates with applications like Gmail. It can organize search results into a research report with intuitive charts and has initial agent capabilities.

Agent Capabilities in AI Search

For example, when asked to book a flight ticket, AI Search will automatically analyze and compare information from various platforms to find the best option and complete the reservation.

Visual Capabilities and Real-Time Interaction

AI Search lets users point their camera at the real world and ask questions about it. The AI can understand what it sees and interact in real time. During a demonstration, the AI corrected a user who deliberately misidentified a garbage truck as a convertible and a street lamp as a building.

Hardware Innovations

The conference highlighted three notable hardware products:

Google Beam (with HP): A video call device that uses six cameras to capture movements and create a 3D video experience.
AI Glasses: Featuring a built-in camera, microphone, and speaker, these glasses can detect what the user sees and provide real-time translation.
Android XR Device (with Samsung): An Android-based extended reality device.

Ultra Membership

Google introduced Ultra Membership, priced at $249.99 per month, which includes access to most of the products announced at the conference. This is about $50 more per month than OpenAI's Pro membership, but Google offers a half-price rate of $124.99 per month for the first three months.
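The pricing comparison can be worked through with a short calculation. This is a sketch under stated assumptions: the list price is taken as $249.99 per month (the figure whose half is the $124.99 promotional rate), and OpenAI's Pro membership is taken as $200 per month, as implied by the "$50 more" comparison. The function name is illustrative.

```python
from decimal import Decimal

ULTRA_MONTHLY = Decimal("249.99")       # assumed list price per month
PROMO_MONTHLY = Decimal("124.99")       # half-price rate from the announcement
PROMO_MONTHS = 3                        # promotional period in months
OPENAI_PRO_MONTHLY = Decimal("200.00")  # assumption implied by "$50 more"


def first_year_cost(list_price: Decimal, promo_price: Decimal,
                    promo_months: int, months: int = 12) -> Decimal:
    """Total cost over `months`, with the first `promo_months` at the promo price."""
    promo = min(promo_months, months)
    return promo_price * promo + list_price * (months - promo)


ultra_year = first_year_cost(ULTRA_MONTHLY, PROMO_MONTHLY, PROMO_MONTHS)
pro_year = OPENAI_PRO_MONTHLY * 12

print(f"Monthly gap at list price: ${ULTRA_MONTHLY - OPENAI_PRO_MONTHLY}")
print(f"Ultra, first year with promo: ${ultra_year}")
print(f"OpenAI Pro, first year: ${pro_year}")
```

Under these assumptions, the first year of Ultra comes to $2,624.88 against $2,400.00 for OpenAI Pro, so the three-month discount narrows the gap but does not close it.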
