E191 | 小而美的机会来了,聊聊这轮AI Agent进化新范式

AI Agents: Small, Beautiful Opportunities & the NEW Paradigm Shift

Summary

Quick Abstract

Dive into the world of AI Agents! This summary recaps the explosive growth of AI Agents in early 2025, exploring key drivers like improved model coding and reinforcement learning. We'll examine insights from industry experts about what makes current AI Agents different from earlier iterations and what core capabilities the top AI agents possess.

Quick Takeaways:

  • Models have become substantially better at writing code, thanks to advances such as Anthropic's Claude 3.5 Sonnet.

  • Reinforcement Fine-Tuning (RFT) improves model performance on specific tasks, even with limited data.

  • The Model Context Protocol (MCP) lets AI communicate and interact with websites and services more effectively.

  • Early AI agents such as OpenAI's Operator and Deep Research show that AI can operate a browser on its own and carry out complex research tasks.

  • The key factors for AI Agent success are now understanding the environment and context, even above the need for data.

We'll also touch on the challenges of commercializing AI Agents, why evaluation metrics are critical to their long-term viability, and what opportunities this AI boom opens up for smaller businesses.

Introduction

Hi, everyone. Welcome to Silicon Valley 101 (硅谷101). I'm Hongjun. AI agents have made rapid progress in 2025. In this episode, we review some of the progress from the first half of the year, explore why AI agent development has accelerated, and discuss what makes an AI agent intelligent and where agents are being applied. We also hear from two guests: Tao Fangbo, the founder of Mindverse, and Clanto (Hou Taiyu), an AI entrepreneur and student of applied psychology at New York University.

Progress of AI Agents in the First Half of 2025

  • January: OpenAI launched Operator, an AI agent that can use a browser on its own.

  • February: OpenAI launched Deep Research, an agent that carries out complex research tasks.

  • March: Manus, billed as China's first general-purpose AI agent, went viral.

  • May: Manus received $75 million in financing from Benchmark.

  • May 6th: OpenAI reportedly agreed to acquire Windsurf for about $3 billion. Anysphere, the parent company of another programming tool, Cursor, also raised $900 million at a valuation of about $9 billion.

Reasons for the Acceleration of AI Agent Development

  • Better code-writing ability in models: For example, Claude 3.5 Sonnet, released by Anthropic last year, was a clear step forward in code generation. This drove the development of a batch of code-writing AI agents, such as Windsurf and Cursor.

  • Arrival of reinforcement fine-tuning (RFT): This technique improves model performance on specific tasks even when training data is limited, which speeds up agent development.

  • Introduction of the Model Context Protocol (MCP): At the end of November 2024, Anthropic proposed MCP, a protocol through which traditional information services can be exposed in a form that AI can communicate with. More and more websites and services are adopting MCP, and industry-level infrastructure is being built around it.
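
To make "a service that AI can communicate with" a little more concrete, here is a minimal sketch of what a tool-call message could look like. It assumes MCP's JSON-RPC 2.0 framing and its tools/call method; the tool name search_flights and its arguments are hypothetical, and no real server or SDK is involved.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking a server to run one named tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(1, "search_flights", {"origin": "SFO", "destination": "JFK"})
wire_message = json.dumps(request)  # the text that would be sent to the server
print(wire_message)
```

The point of the shared message shape is that an agent does not need custom integration code per website: any service that answers this kind of request can be used the same way.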

Intelligence of AI Agents

  • Machine learning perspective: In a reinforcement learning setting, an agent relies on feedback from the environment to learn an action policy on its own in order to achieve its goal. AlphaGo is a typical example of such an agent.

  • Everyday-language perspective: An agent is more like a person who can complete a task independently. It is driven by a foundation model or a reasoning model, has its own memory system, and has an interface to the user.
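
The machine learning view above can be illustrated with a toy example. This is a minimal sketch, not how AlphaGo works: it assumes a two-action environment with fixed rewards and an epsilon-greedy strategy, and every name in it (train_bandit_agent, the reward values) is made up for illustration.

```python
import random


def train_bandit_agent(rewards, steps=500, epsilon=0.1, seed=0):
    """Learn the value of each action purely from environment feedback."""
    rng = random.Random(seed)
    values = [0.0] * len(rewards)  # the agent's current estimate per action
    counts = [0] * len(rewards)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))  # explore a random action
        else:
            action = max(range(len(rewards)), key=lambda a: values[a])  # exploit
        reward = rewards[action]  # feedback from the (toy) environment
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        values[action] += (reward - values[action]) / counts[action]
    return values


values = train_bandit_agent(rewards=[0.2, 1.0])
best_action = max(range(len(values)), key=lambda a: values[a])
print(best_action, values)
```

The agent is never told which action is better; it discovers that by acting and observing rewards, which is the core of the reinforcement learning definition.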

Application Scenarios of AI Agents

  • Programming: Companies need to train programming agents to work in a specific programming environment, which includes using IDEs, test tools, and deployment tools, and visiting GitHub and programming communities.

  • Law and medicine: Experts in these fields have their own tools and methodologies for completing tasks. AI agents can be trained on those methodologies and tools.

  • Military: Palantir, a defense-focused company, works in a very distinctive environment that includes operating weapons. AI agents can be trained to operate in this environment.

OpenAI's Operator

  • Function: Operator is an AI agent that can operate a computer to do things such as booking hotels and tickets.

  • Technology: It combines a new generation of reinforcement fine-tuning with a large model.

  • Experience: It opens a browser on the server side, and the user can follow and complete operations through that browser. It can break a task down, try various websites, and bring the results back to continue reasoning. However, it is very slow, and its accuracy is not yet high enough.

Deep Research

  • Function: It is an AI agent that can complete complex research tasks.

  • Technology: It also combines a new generation of reinforcement fine-tuning with a large model.

  • Experience: It searches first, and if the information sources conflict, it has to search again or resolve the conflict. It interleaves its thinking with actions in the environment, then takes the feedback from the environment back and continues to think.
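
The think, act, and observe cycle described for Operator and Deep Research can be sketched as a simple loop. This is an illustrative sketch under stated assumptions, not OpenAI's implementation: stub_think stands in for a reasoning model, the search tool returns canned results, and all names and values are hypothetical.

```python
def run_agent(task, think, tools, max_steps=5):
    """Alternate between thinking and acting until the model decides to finish."""
    observations = []
    for _ in range(max_steps):
        decision = think(task, observations)
        if decision["action"] == "finish":
            return decision["answer"]
        result = tools[decision["action"]](decision["input"])  # act in the environment
        observations.append(result)  # feed the result back into the next thought
    return None  # no answer within the step budget


def stub_think(task, observations):
    """Stand-in for a reasoning model: search, and search again if sources conflict."""
    claims = [claim for obs in observations for claim in obs]
    if not claims:
        return {"action": "search", "input": task}
    top = max(set(claims), key=claims.count)
    if claims.count(top) * 2 > len(claims):  # a strict majority of sources agree
        return {"action": "finish", "answer": top}
    return {"action": "search", "input": task + " (additional source)"}


# Canned search results: the first two sources disagree, the third breaks the tie.
canned = iter([["2023", "2024"], ["2024"]])
tools = {"search": lambda query: next(canned)}
answer = run_agent("In which year did the event happen?", stub_think, tools)
print(answer)
```

Because every action is a round trip through a real environment, a loop like this also suggests why such agents can feel slow: each extra search or retry adds another full step.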

Manus

  • Function: It is a general-purpose AI agent that can complete various tasks, such as building web pages, doing research, and publishing articles.

  • Technology: It is based on the Claude 3.7 Sonnet model.

  • Experience: It has a good Notion-style UI, and its memory function differs from OpenAI's. It can remember a user's instructions and repeatedly aligns with them before performing any task. However, the quality of the information it produces is somewhat low, so it is better suited to work that does not require depth.

Evaluation of AI Agents

  • Importance: Evaluation is more important than training for AI agents. It is the only tool for checking whether each change to a product actually improved its results.

  • Types: There are three types of evaluation: human evaluation, code-based evaluation, and LLM-based evaluation. Each type has its own advantages and limitations.

  • Framework: AI agents should have their own evaluation operations. Any response generation and execution should go through an evaluation mechanism. A systematic evaluation framework should be built from the beginning.
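
The three evaluation types above can be combined in a small pipeline. The sketch below is one possible arrangement, not a standard framework: the judge is a stub standing in for an LLM call, the escalation rule (send anything that fails an automated check to a human queue) is an assumption, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalResult:
    method: str
    passed: bool


def code_based_eval(response: str, required: List[str]) -> EvalResult:
    """Deterministic check: every required term must appear in the response."""
    return EvalResult("code", all(term in response for term in required))


def llm_based_eval(response: str, judge: Callable[[str], float],
                   threshold: float = 0.7) -> EvalResult:
    """Score the response with a judge model (stubbed here) and apply a threshold."""
    return EvalResult("llm", judge(response) >= threshold)


def evaluate(response: str, required: List[str], judge: Callable[[str], float],
             human_queue: List[str]) -> List[EvalResult]:
    """Run the automated checks; escalate to human review if any of them fails."""
    results = [code_based_eval(response, required), llm_based_eval(response, judge)]
    if not all(r.passed for r in results):
        human_queue.append(response)
    return results


def stub_judge(response: str) -> float:
    """Placeholder for an LLM judge: longer answers score higher."""
    return 0.9 if len(response) > 20 else 0.3


human_queue: List[str] = []
good = evaluate("Your flight to JFK is booked for Monday.", ["JFK", "Monday"],
                stub_judge, human_queue)
bad = evaluate("Done.", ["JFK", "Monday"], stub_judge, human_queue)
print(good, bad, human_queue)
```

Code-based checks are cheap and repeatable but narrow, LLM-based checks cover fuzzier qualities at the cost of reliability, and human review is the most trusted but the most expensive, which is why it is reserved here for the failures.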

Future of AI Agents

  • Opportunities and challenges: Although AI agents have great potential, there are still many challenges, such as data barriers, model limitations, and user experience. However, there are also many opportunities, such as the development of new technologies and the emergence of new application scenarios.

  • Positioning and strategy: AI agent products need to have a clear positioning and strategy. They need to focus on specific vertical areas and provide better user experience and value.

  • Network effect: AI agent products need to build a strong network effect to attract more users and developers. They need to create a platform that can connect various scenarios and application data.

Conclusion

In conclusion, AI agents have made rapid progress in 2025, and many opportunities and challenges lie ahead. We need to keep exploring and innovating to make AI agents more intelligent, more useful, and more accessible. Thank you for listening to Silicon Valley 101 (硅谷101).
