The Illusion of Stalled AI Progress
From the outside, especially as companies have become more secretive, it may seem like AI progress has come to a halt. There appears to be little effort to disrupt the current AI stack. However, behind the scenes, a multitude of new breakthrough technologies are emerging. These technologies have the potential to render current Large Language Models (LLMs) obsolete, introducing AI models that are far more capable, intelligent, faster, and reliable.
Infinite Lifespan AI: A Google Breakthrough
The Limitations of Transformer-Based Models
The state-of-the-art approach to building artificial general intelligence (AGI) today is the transformer-based large language model. While transformers are highly effective at exploiting massive amounts of parallel compute, they have a well-known flaw: they limit the AI's effective lifespan at inference time. In essence, transformers are great for building the digital brain, but once that brain has to operate in the real world, it does not live for long. On the consumer side, this shows up as the limited context window: starting a new chat session is like booting up a fresh copy of the AI, and valuable information is lost every time.
The New Architecture: A Game Changer
A new architecture from Google is set to change this. It allows for the creation of an AI with an infinite lifespan and an infinite context window. Jacob Buckman, the CEO of Manifest AI, has predicted that transformers will not remain the dominant architecture for much longer: by the end of 2025, every hyperscaler will be working on a subquadratic foundation model, and by the end of the following year, almost no one will still be using transformer models.
How the New Architecture Works
Transformers are next-token predictors: they take a span of text and predict the token that comes next. To keep a conversation going, the model is fine-tuned on user/assistant turn-based conversations, with a special token marking the end of each assistant turn. This approach runs into trouble as the conversation grows, because self-attention compares every token with every other token, so the cost of processing the ever-longer transcript grows quadratically.
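To make this concrete, here is a minimal sketch of how a chat is flattened into one token stream and why the pairwise comparisons behind self-attention grow quadratically with its length. The end-of-turn marker is a hypothetical placeholder, not any real model's special token.

```python
# A minimal sketch (not any specific vendor's template) of how a turn-based chat
# is flattened into one stream for a next-token predictor.
END_OF_TURN = "<|end_of_turn|>"  # hypothetical marker, for illustration only

def flatten_chat(turns):
    """Join user/assistant turns into the single string the model actually sees."""
    parts = []
    for role, text in turns:
        parts.append(f"{role}: {text}{END_OF_TURN}")
    return "\n".join(parts)

chat = [
    ("user", "Summarize attention complexity."),
    ("assistant", "Self-attention compares every token with every other token."),
    ("user", "So what happens as the chat grows?"),
]
prompt = flatten_chat(chat)

# Self-attention compares every token pair, so compute grows roughly with n^2.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> ~{n * n:,} pairwise comparisons")
```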
The solution lies in subquadratic architectures that build memory into the main system. Google's Titans, for example, showed that a model can be more selective about what it remembers by using a surprise mechanism. By learning which information is worth holding on to, the model outperformed transformers on longer context windows while also being more efficient.
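The sketch below is a heavily simplified, illustrative take on that surprise idea: a single linear memory matrix is updated at test time by the gradient of its own recall error, with momentum and a forgetting gate. The real Titans memory is a deep network, and the constants here are arbitrary assumptions.

```python
import numpy as np

# Simplified Titans-style "surprise" memory update with a single linear memory M.
d = 64
rng = np.random.default_rng(0)
M = np.zeros((d, d))          # long-term memory: maps keys to values
S = np.zeros((d, d))          # running (momentum) surprise
lr, momentum, forget = 0.1, 0.9, 0.01   # illustrative constants

def update_memory(M, S, k, v):
    """One memory step: surprise = gradient of the recall error for this token."""
    err = M @ k - v                       # how wrong the memory currently is
    grad = np.outer(err, k)               # gradient of 0.5 * ||M k - v||^2 w.r.t. M
    S_new = momentum * S - lr * grad      # accumulate past + momentary surprise
    M_new = (1 - forget) * M + S_new      # decay old content, write surprising content
    return M_new, S_new

# Surprising (novel) tokens produce large gradients and get written strongly;
# tokens the memory already predicts well barely change it.
for _ in range(100):
    k, v = rng.standard_normal(d), rng.standard_normal(d)
    M, S = update_memory(M, S, k, v)
```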
The Turing Test and the Next Generation of AI
The Flaws in Current AI Models
Despite claims that the Turing test has been passed, AI has never truly cleared it: there are still simple tests that expose the flaws in current models. Yann LeCun, one of the godfathers of modern AI, has moved on from LLMs and is working on a new architecture that addresses two fundamental problems with today's models.
The Problem of Over-Memorization
The first problem is that current AI models, like ChatGPT, appear to know everything because they have effectively memorized the entire internet. This stands in stark contrast to humans, who have a limited capacity for storing information. LeCun believes we may be wasting too many parameters on exact words, leaving insufficient space for patterns and higher-level abstractions. Interpretability studies support this idea, suggesting that current models are underparameterized and forced to cram as much information as possible into each parameter.
The Problem of Thinking and Planning
The second problem is thinking and planning. Current reasoning models are a simplified imitation of actual thinking and planning: they are fine-tuned to output in a specific format, but underneath they are still just generating and re-reading large amounts of text. LeCun's new architecture aims to give the model the ability to think in the space of ideas rather than talking out loud to itself.
Joint Embedding Predictive Architecture (JEPA)
LeCun's architecture, JEPA, pushes models to predict features and semantic representations rather than detailed outputs. In I-JEPA, for example, the model predicts the abstract representation of a missing image patch instead of its raw pixels. This lets the model work with vectors and manipulate ideas efficiently, only producing words when it is ready to communicate.
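A rough sketch of that training signal follows, with plain linear maps standing in for the context encoder, target encoder, and predictor. The real models are vision transformers, and the target encoder is an exponential moving average of the context encoder; the point here is only that the loss lives in embedding space, not pixel space.

```python
import numpy as np

# I-JEPA-style objective sketch: predict the *representation* of a hidden patch.
rng = np.random.default_rng(0)
patch_dim, embed_dim, n_patches = 256, 64, 16

W_context = rng.standard_normal((embed_dim, patch_dim)) * 0.05   # context encoder
W_target = W_context.copy()                                       # target (EMA) encoder
W_pred = rng.standard_normal((embed_dim, embed_dim)) * 0.05       # predictor

patches = rng.standard_normal((n_patches, patch_dim))  # one image as flat patches
masked = [3, 7, 11]                                     # patches hidden from the context
visible = [i for i in range(n_patches) if i not in masked]

context_repr = (W_context @ patches[visible].T).T.mean(axis=0)    # pooled context

for i in masked:
    target = W_target @ patches[i]          # abstract representation of the patch
    predicted = W_pred @ context_repr       # predict the representation, not pixels
    loss = np.mean((predicted - target) ** 2)
    print(f"patch {i:2d}: latent-space loss = {loss:.3f}")
```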
Synthetic Data: The Next Frontier
The Importance of Synthetic Data
As AI models become more capable and open-ended, they require more data. However, the internet is running out of useful data. Synthetic data offers a solution. While one way to use synthetic data is to generate a dataset for training, there are more efficient and effective methods.
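As a toy illustration of the "generate a dataset for training" route, the snippet below programmatically creates arithmetic problems whose answers can be checked, keeping only verified examples. Real pipelines usually pair a strong generator model with a verifier or filter; everything here is a stand-in.

```python
import random

# Toy synthetic-data pipeline: generate problems, verify them, keep what passes.
def make_example(rng):
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return {"prompt": f"What is {a} * {b}?", "target": str(a * b)}

def verified(example):
    # Trivially true here; with model-generated answers this re-check matters.
    a, b = [int(tok) for tok in example["prompt"].split() if tok.strip("?*").isdigit()]
    return int(example["target"]) == a * b

rng = random.Random(0)
dataset = [ex for ex in (make_example(rng) for _ in range(1000)) if verified(ex)]
print(len(dataset), "verified synthetic training examples")
```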
Self-Play and Self-Improvement
AlphaGo and AlphaZero demonstrated the power of self-play, a form of synthetic data generation. Self-play environments have significant upsides, but there are misconceptions about self-improving AI. Defining the correct goals and rules for a general system to achieve recursive self-improvement is a complex task.
Absolute Zero: A Promising Approach
Absolute Zero, a recent approach from researchers in China, is a promising way to train reasoning models. Instead of relying on external data, the model simultaneously learns, through self-play, to propose tasks that maximize learnability and to solve them effectively. The system gets better not only at solving problems but also at proposing them, outperforming models of its own size, even those trained for specific tasks.
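The toy loop below illustrates the proposer/solver dynamic rather than the actual Absolute Zero system: a stand-in "solver" succeeds with a probability tied to task difficulty, and the proposer is rewarded for tasks of intermediate difficulty, i.e. those that are neither trivial nor impossible and therefore most learnable.

```python
import random

# Toy proposer/solver self-play loop; no real model or reward scheme from the
# paper is used, only the shape of the interaction.
rng = random.Random(0)

def propose_task():
    return {"difficulty": rng.random()}       # proposer picks a difficulty level

def solve(task, skill):
    return rng.random() < max(0.0, 1.0 - task["difficulty"] / max(skill, 1e-6))

skill = 0.3                                   # crude proxy for solver ability
for step in range(5):
    tasks = [propose_task() for _ in range(64)]
    results = [(t, solve(t, skill)) for t in tasks]
    solve_rate = sum(ok for _, ok in results) / len(results)
    # Learnability reward: highest when tasks are neither trivial nor impossible.
    proposer_reward = 1.0 - abs(solve_rate - 0.5) * 2
    skill += 0.1 * solve_rate                 # solver improves by practicing
    print(f"step {step}: solve_rate={solve_rate:.2f}, proposer_reward={proposer_reward:.2f}")
```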
The World Model: Google's Vision for AGI
The Difference in AGI Definitions
There is a difference in how OpenAI and Google DeepMind view AGI. OpenAI defines AGI as a highly autonomous system that outperforms humans at most economically valuable work, while Demis Hassabis of Google DeepMind defines AGI as a system capable of doing the range of things that the human brain can do.
The Brain as an Architecture
The human brain is a general-purpose computing device that takes in various sensory inputs and outputs actions. It can learn to make sense of new sensors without the need for drivers or patches. This is the inspiration behind Google's world model, an AI that is not dominated by one form of input but learns from all modalities to build a rich representation of the world.
Multi-Modal AI and the World Model
Multi-modal AI models are a step towards a true world model, but a true world model is more ambitious and powerful. Google DeepMind's Gemini aims to achieve this ultimate goal, combining various technologies to create an AI that can understand and interact with the world in a more comprehensive way.
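As a very loose sketch of the multi-modal idea, each modality can be projected into one shared latent space and fused into a single "world state" vector. The random linear maps below are stand-in encoders; nothing here reflects Gemini's actual design.

```python
import numpy as np

# Stand-in multi-modal fusion: per-modality encoders into a shared latent space.
rng = np.random.default_rng(0)
latent_dim = 128

encoders = {
    "text":  rng.standard_normal((latent_dim, 512)) * 0.02,
    "image": rng.standard_normal((latent_dim, 1024)) * 0.02,
    "audio": rng.standard_normal((latent_dim, 256)) * 0.02,
}

observations = {
    "text":  rng.standard_normal(512),    # placeholder token features
    "image": rng.standard_normal(1024),   # placeholder pixel/patch features
    "audio": rng.standard_normal(256),    # placeholder waveform features
}

# Fuse by averaging the per-modality embeddings into one shared representation.
world_state = np.mean([encoders[m] @ observations[m] for m in observations], axis=0)
print("shared world-state vector:", world_state.shape)
```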
Conclusion
While the AI we see in the news is mature and embedded in real products, the most interesting developments are happening in AI research. Several clear leaps are already in sight, and these behind-the-scenes technologies have the potential to transform the future of AI.