This article explores the future trajectory of Artificial Intelligence (AI) based on the insights of Shunyu Yao, a researcher at OpenAI, a graduate of Tsinghua University's Yao Class, and a PhD graduate of Princeton University. Yao, known for his groundbreaking work on language agents, including Tree of Thoughts (ToT), ReAct, and the CoALA architecture, recently published a blog post titled "The Second Half," offering a perspective on the future direction of AI. This article delves into his ideas.
AI's "Halftime": A Review of the First Phase
We are currently at a unique stage in AI development, which Yao describes as "halftime." The first decades of AI focused heavily on developing new training methods and models, yielding significant advances, including fundamental innovations in search, deep reinforcement learning, and reasoning.
The Shift from Solving to Defining Problems
Deep reinforcement learning, once plagued by generalization problems, has finally produced methods that work across diverse tasks. As a result, the focus of AI development is shifting from merely solving problems to defining them. Evaluation is now paramount: existing training methodologies must be re-examined, progress must be assessed more scientifically, and AI development must be viewed from a more product-oriented perspective.
The Importance of Foundational Training Methods
The impactful AI papers of the first phase, such as those introducing the Transformer architecture, AlexNet, and GPT-3, centered on foundational breakthroughs in training methods rather than on benchmarks. While benchmarks like ImageNet are important, the papers detailing method innovations have received far more citations, reflecting how widely applicable and valuable these methods are across the AI landscape. The Transformer architecture, originally developed for machine translation, has since been successfully adapted to computer vision, reinforcement learning, and natural language tasks well beyond translation. This emphasis on method innovation propelled AI forward across many domains. Yet the continuous accumulation of these innovations has driven AI to an inflection point, triggering a fundamental shift in development focus.
The AI "Recipe": Language Pre-training, Scale, Reasoning, and Action
Yao proposes an AI "recipe" comprising language pre-training, scale, and reasoning and action. He draws a parallel to reinforcement learning to explain why these ingredients amount to a recipe.
Reinforcement Learning: More Than Just Algorithms
Reinforcement learning, often considered the "ultimate form" of AI, theoretically guarantees an agent's success in games, and in practice it powers systems like AlphaGo. Reinforcement learning has three components: the algorithm, the environment, and prior knowledge. Historically, researchers emphasized algorithms (REINFORCE, DQN, etc.) while treating the environment and prior knowledge as fixed or simplified factors.
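To make the "algorithm" component concrete, here is a minimal REINFORCE sketch (an illustrative toy, not code from Yao's work; it assumes the gymnasium and PyTorch packages and the classic CartPole task). Note how little of the code is the algorithm itself: the environment and any prior knowledge live entirely outside it.

```python
import torch
import torch.nn as nn
import gymnasium as gym

env = gym.make("CartPole-v1")                 # the environment, fixed and given
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(200):
    obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns, computed backwards over the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.insert(0, g)
    returns = torch.as_tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # REINFORCE update: raise the log-probability of actions
    # in proportion to the return that followed them.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```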
The Importance of Environment and Prior Knowledge
With the advent of deep reinforcement learning, the significance of the environment has become more apparent: an algorithm's performance depends heavily on the environment it was developed and tested in, and neglecting this can produce algorithms that excel in simple simulations but fail in real-world applications. OpenAI's early plan of building Gym, World of Bits, and Universe to turn the entire internet into a giant game environment did not fully deliver the expected results. While OpenAI achieved significant successes, such as solving Dota 2 and robotic hand control with reinforcement learning, it never cracked computer use or web navigation, and agents trained in one domain transferred poorly to others.
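For readers unfamiliar with these systems, the sketch below shows the reset/step contract that Gym popularized (written against the maintained gymnasium fork; World of Bits and Universe wrapped web pages and games behind a similar interface). A random agent, with no algorithm and no prior knowledge, isolates the "environment" component.

```python
# The Gym-style environment contract: reset() starts an episode,
# step(action) advances it and returns feedback.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
episode_return, done = 0.0, False
while not done:
    action = env.action_space.sample()  # random agent: no prior knowledge at all
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    done = terminated or truncated
print(f"episode return: {episode_return}")
```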
The emergence of GPT-2 and GPT-3 revealed the missing key ingredient: prior knowledge. Powerful language pre-training distills general knowledge and language understanding into models, and these models, once fine-tuned, can become web agents like WebGPT or chatbots like ChatGPT.
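As a rough illustration of that fine-tuning step (a sketch under assumptions, not the WebGPT or ChatGPT training pipeline; demonstrations.txt is a hypothetical file of task demonstrations), one could adapt a pre-trained model with the Hugging Face Trainer:

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token      # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")  # the pre-trained prior

# Hypothetical dataset of demonstrations (e.g. dialogue or browsing transcripts).
dataset = load_dataset("text", data_files={"train": "demonstrations.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-agent", num_train_epochs=1),
    train_dataset=tokenized,
    # mlm=False gives standard next-token-prediction labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```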
Reasoning as a Key Element for Generalization
While language pre-training provides a solid foundation for chatbots, it struggles in domains like computer control or video games. These domains have different data distributions compared to internet text, limiting the effectiveness of supervised fine-tuning or reinforcement learning.
In 2019, Yao attempted to use GPT-2 to solve text-based games and found that the agent needed millions of reinforcement learning steps to reach a modest level of play, and that the experience it gained transferred poorly to new games. Humans, by contrast, can play a new game reasonably well with no prior experience, because they can think abstractly. This ability to reason is crucial for handling novel situations.

Reasoning can be viewed as a special kind of action: it operates in the open-ended, unbounded space of thought. The key insight is that language enables agents to generalize through reasoning. Once the right prior knowledge and a suitable reinforcement learning environment are in place, the learning algorithm itself can be simple. Building on this understanding, researchers have developed reasoning models such as the o-series and R1, as well as agents that can operate computers, paving the way for further advances.
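The idea of reasoning as an action can be sketched as a simple agent loop in the style of ReAct (purely schematic; the llm and env objects here are hypothetical stand-ins, not a real API). A "Think" step changes only the agent's own context, while any other action acts on the environment and returns an observation:

```python
def react_loop(llm, env, task, max_steps=20):
    context = [f"Task: {task}"]
    for _ in range(max_steps):
        step = llm.generate("\n".join(context))   # e.g. "Think: ..." or "Act: ..."
        context.append(step)
        if step.startswith("Think:"):
            continue                              # reasoning: no environment feedback
        observation = env.execute(step)           # acting: environment returns feedback
        context.append(f"Observation: {observation}")
        if env.done:
            break
    return context
```

The crucial asymmetry is that the thought space is unbounded: the environment constrains what the agent can do, but not what it can think.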
Rethinking Evaluation: The Key to AI's Next Phase
The conventional mode of AI development, improving benchmark scores through new training methods, is reaching its limits. The "recipe" described above has made benchmark improvement standardized and industrialized, reducing the need for innovative ideas: a method painstakingly tailored to a specific task might improve its score by 5%, while the next o-series model might improve it by 30% without targeting that task at all. As the recipe scales, benchmarks are being solved faster and faster.
Therefore, in the next phase of AI development, it is necessary to fundamentally rethink evaluation methods, not just create more difficult benchmarks. We need to question current assumptions and create entirely new evaluation systems, forcing the invention of methods that surpass the existing "recipe".
The Importance of Real-World Utility
AI's success in games like chess and Go, its academic performance surpassing most humans, and its achievements in Olympiad-level competitions have not translated into significant changes in the economy or GDP. Yao terms this the utility problem and considers it a crucial challenge for AI development.
This problem arises from discrepancies between existing evaluation setups and real-world conditions. Two assumptions stand out:
- Interaction with humans: Evaluations typically require autonomous operation, where an agent receives a task input and completes it independently for a reward. Real-world agents, however, need to interact with humans throughout the process.
- Independent and identically distributed (IID) tasks: Evaluations often assume tasks are IID, with each task processed independently and the scores averaged. In reality, tasks are solved sequentially, and experience gained on one task informs the next: a software engineer's familiarity with a codebase grows over time, letting them solve later problems more effectively. Today's agents have no such accumulation (see the sketch after this list).
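A toy sketch of the second assumption (all names hypothetical, for illustration only): IID evaluation resets the agent for every task, while a sequential setup lets one agent carry experience forward, the way an engineer accumulates familiarity with a codebase.

```python
def evaluate_iid(make_agent, tasks):
    # Fresh agent per task: no experience carries over; scores are averaged.
    return sum(make_agent().solve(task) for task in tasks) / len(tasks)

def evaluate_sequential(make_agent, tasks):
    # One agent across all tasks: what it learns on task i can help on task i+1.
    agent, scores = make_agent(), []
    for task in tasks:
        scores.append(agent.solve(task))
        agent.memory.append(task)  # hypothetical memory of past tasks
    return sum(scores) / len(scores)
```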
Generic methods built for the current assumptions may not remain as effective under new ones. Therefore, in the "second half" of AI, we need to develop new evaluation setups or tasks that reflect real-world utility, then solve them with generic methods or enhance those methods with novel components, and repeat this cycle.