AI Revolution: 4 Secret Technologies Making LLMs Obsolete!

Summary

Quick Abstract

Is AI progress stalling? Don't believe the hype! While publicly visible advancements in AI might seem incremental, groundbreaking technologies are secretly revolutionizing the field. This summary unveils four leaked technologies from major labs, poised to make current Large Language Models (LLMs) obsolete. Discover innovations that promise more capable, intelligent, faster, and reliable AI models. Understand the fundamental flaws of today's AI and how upcoming tech could trigger a paradigm shift.

Quick Takeaways:

  • Infinite Lifespan AI: Google's new architecture allows AI instances to learn indefinitely with an infinite context window, overcoming transformer limitations.

  • Moving Beyond Memorization: Yann LeCun is pioneering new AI architectures focused on conceptual understanding rather than rote memorization, addressing a key flaw in current LLMs.

  • Thinking in Vector Space: Explore models capable of reasoning and planning in the space of ideas (vectors), enabling a more efficient and accurate thought process.

  • World Models & Multi-modality: Demis Hassabis introduces the concept of "world models" – AIs that learn from diverse inputs (video, image, text, sound) to build a rich, input-independent representation of the world, mirroring how the human brain operates.

The Illusion of Stalled AI Progress

From the outside, especially as companies have become more secretive, it may seem like AI progress has come to a halt. There appears to be little effort to disrupt the current AI stack. However, behind the scenes, a multitude of new breakthrough technologies are emerging. These technologies have the potential to render current Large Language Models (LLMs) obsolete, introducing AI models that are far more capable, intelligent, faster, and reliable.

Infinite Lifespan AI: A Google Breakthrough

The Limitations of Transformer-Based Models

The state-of-the-art approach to building artificial general intelligence (AGI) today is the transformer-based large language model. While transformers are highly effective at exploiting massive amounts of parallel compute during training, they have a well-known weakness: a short effective lifespan at inference time. In essence, transformers are great at building the digital brain, but once that brain has to operate in the real world, it cannot keep learning for long. On the consumer side this shows up as the limited context window: starting a new chat session is like running a fresh copy of the AI, and valuable information is lost each time.

The New Architecture: A Game Changer

A new architecture from Google is set to change this by allowing the creation of an AI with an infinite lifespan and an infinite context window. Jacob Buckman, the CEO of Manifest AI, has predicted that transformers will not remain the dominant architecture for much longer: by the end of 2025 every hyperscaler will be working on a subquadratic foundation model, and by the end of the following year almost no one will be using transformer models.

How the New Architecture Works

Transformers are next-token predictors: they take a given amount of text and predict the token that comes next. To keep a conversation going, the model is fine-tuned on user/assistant turn-based conversations and emits a special end-of-turn token when the assistant is done. This approach breaks down as the conversation grows, because the cost of attention grows quadratically with the amount of text the model has to process.
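To make the quadratic cost concrete, here is a minimal NumPy sketch of single-head attention. The weights are random, the dimensions are toy-sized, and the causal mask is omitted; none of this comes from any particular model, it only illustrates that every new token must be scored against every earlier token, so the score matrix grows with the square of the conversation length.

```python
import numpy as np

def self_attention(x):
    """Single-head attention (no causal mask, for brevity) over x of shape (seq_len, d_model)."""
    seq_len, d_model = x.shape
    rng = np.random.default_rng(0)
    # In a real model these projections are learned; here they are random stand-ins.
    Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    # The score matrix is seq_len x seq_len: compute and memory grow quadratically.
    scores = Q @ K.T / np.sqrt(d_model)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

for seq_len in (128, 256, 512):
    x = np.random.default_rng(1).standard_normal((seq_len, 64))
    _ = self_attention(x)
    # Doubling the context quadruples the number of attention scores.
    print(f"{seq_len} tokens -> {seq_len * seq_len} attention scores")
```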

The solution lies in subquadratic architectures that include memory as part of the main system. Google's Titans, for example, showed that a model can be more selective about what it remembers by using a surprise mechanism. Because its memory is trained to hold information more effectively, Titans outperformed transformers on longer context windows while being more efficient.
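Titans' actual update rule is gradient-based and far more involved; the snippet below is only a loose conceptual sketch of a surprise-gated, fixed-size memory (the class name, gating function, and dimensions are invented for illustration). The point it demonstrates is that memory stays a constant size while the stream of tokens grows, and surprising inputs are written more strongly than expected ones.

```python
import numpy as np

class SurpriseMemory:
    """Toy fixed-size memory that writes harder when the input is surprising."""

    def __init__(self, dim, base_lr=0.5):
        self.M = np.zeros(dim)        # compressed memory state (fixed size)
        self.base_lr = base_lr

    def update(self, token_vec):
        # "Surprise" = how far the new token is from what memory already expects.
        surprise = np.linalg.norm(token_vec - self.M)
        gate = np.tanh(self.base_lr * surprise)   # large surprise -> strong write
        self.M = (1 - gate) * self.M + gate * token_vec
        return surprise

rng = np.random.default_rng(0)
mem = SurpriseMemory(dim=8)
familiar = rng.standard_normal(8)
for _ in range(5):
    s = mem.update(familiar + 0.01 * rng.standard_normal(8))
    print(f"familiar token, surprise={s:.3f}")
novel = 5 * rng.standard_normal(8)
print(f"novel token,    surprise={mem.update(novel):.3f}")
```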

The Turing Test and the Next Generation of AI

The Flaws in Current AI Models

Despite claims that the Turing test has been passed, AI has never truly achieved this. There is a simple test that can expose the flaws in current AI models. Yann LeCun, one of the godfathers of modern AI, has moved on from LLMs and is working on a new architecture that addresses two fundamental problems with current AI models.

The Problem of Over-Memorization

The first problem is that current AI models, like ChatGPT, seem to know everything because they have effectively memorized the entire internet. This is in stark contrast to humans, who have a limited capacity to store information. LeCun believes that we may be wasting too many parameters on exact words, leaving insufficient space to store patterns and higher-level abstractions. Interpretability studies of current AI models support this idea, showing that they are underparameterized and forced to cram as much information as possible into each parameter.

The Problem of Thinking and Planning

The second problem is thinking and planning. Current reasoning models are a simplified imitation of actual thinking and planning: they are fine-tuned to output in a specific format, but in reality they are just generating and re-reading large amounts of text. LeCun's new architecture aims to give the model the ability to think in the space of ideas, rather than just talking out loud to itself.

Joint Embedding Predictive Architecture (JEPA)

LeCun's architecture, JEPA, pushes models to predict features and semantic representations rather than detailed outputs. For example, in I-JEPA, the image version of the architecture, the model predicts the abstract representation of a missing patch instead of its raw pixels. This approach lets the model work with vectors and manipulate ideas more efficiently, only outputting words when it is ready to communicate.
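The NumPy sketch below illustrates where the training signal lives in this setup. Random projection matrices stand in for the trained context encoder, target encoder, and predictor, and all names and dimensions are illustrative rather than taken from the JEPA papers; what matters is that the loss is computed between predicted and actual feature vectors, not between pixels.

```python
import numpy as np

rng = np.random.default_rng(0)
d_patch, d_embed = 16 * 16, 64   # toy sizes: a flattened 16x16 patch, a 64-d embedding

# Stand-ins for learned networks: a context encoder, a target encoder,
# and a predictor that maps context features to the missing patch's features.
W_context = rng.standard_normal((d_patch, d_embed)) * 0.05
W_target  = rng.standard_normal((d_patch, d_embed)) * 0.05
W_pred    = rng.standard_normal((d_embed, d_embed)) * 0.05

visible_patch = rng.standard_normal(d_patch)   # patch the model can see
masked_patch  = rng.standard_normal(d_patch)   # patch hidden from the model

context_feat   = visible_patch @ W_context     # encode what is visible
predicted_feat = context_feat @ W_pred         # guess the hidden patch's *features*
target_feat    = masked_patch @ W_target       # features of the hidden patch

# The training signal lives in embedding space, not pixel space:
# the model never has to reproduce every pixel, only the abstract content.
latent_loss = np.mean((predicted_feat - target_feat) ** 2)
print(f"latent-space prediction loss: {latent_loss:.4f}")
```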

Synthetic Data: The Next Frontier

The Importance of Synthetic Data

As AI models become more capable and open-ended, they require more data. However, the internet is running out of useful data. Synthetic data offers a solution. While one way to use synthetic data is to generate a dataset for training, there are more efficient and effective methods.

Self-Play and Self-Improvement

AlphaGo and AlphaZero demonstrated the power of self-play, a form of synthetic data generation. Self-play environments have significant upsides, but there are misconceptions about self-improving AI. Defining the correct goals and rules for a general system to achieve recursive self-improvement is a complex task.

Absolute Zero: A Promising Approach

Absolute Zero from China is a promising approach to training reasoning models. Instead of relying on external data, the model simultaneously learns to define tasks that maximize learnability and solve them effectively through self-play. This system not only gets better at solving problems but also at proposing problems, outperforming models of its own size, even those trained for specific tasks.
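The toy loop below is not the Absolute Zero implementation; it is a hand-rolled sketch (class and variable names invented, with trivial arithmetic tasks standing in for real reasoning problems) of the proposer/solver dynamic described above: the same system proposes tasks near the edge of its current ability, attempts to solve them, and uses the outcome to adjust both roles.

```python
import random

class SelfPlayLearner:
    """Toy proposer/solver pair sharing one skill level (purely illustrative)."""

    def __init__(self):
        self.skill = 1               # how hard a task the solver can currently handle
        self.target_difficulty = 1   # how hard a task the proposer currently aims for

    def propose_task(self):
        # Proposer picks a task near the edge of current ability (most learnable).
        difficulty = self.target_difficulty
        a = random.randint(1, 10 ** difficulty)
        b = random.randint(1, 10 ** difficulty)
        return (a, b), difficulty

    def solve(self, task, difficulty):
        a, b = task
        # Toy "solver": succeeds only when the task is within its skill level.
        return a + b if difficulty <= self.skill else None

    def step(self):
        task, difficulty = self.propose_task()
        solved = self.solve(task, difficulty) is not None
        if solved:
            self.skill += 1                  # solving the task improved the solver
            self.target_difficulty += 1      # so the proposer raises the bar
        else:
            self.target_difficulty = max(1, self.target_difficulty - 1)  # too hard: back off
        return difficulty, solved

learner = SelfPlayLearner()
for step in range(6):
    difficulty, solved = learner.step()
    print(f"step {step}: proposed difficulty {difficulty}, solved={solved}")
```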

The World Model: Google's Vision for AGI

The Difference in AGI Definitions

There is a difference in how OpenAI and Google DeepMind view AGI. OpenAI defines AGI as a highly autonomous system that outperforms humans at most economically valuable work, while Demis Hassabis of Google DeepMind defines AGI as a system capable of doing the range of things that the human brain can do.

The Brain as an Architecture

The human brain is a general-purpose computing device that takes in various sensory inputs and outputs actions. It can learn to make sense of new sensors without the need for drivers or patches. This is the inspiration behind Google's world model, an AI that is not dominated by one form of input but learns from all modalities to build a rich representation of the world.
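As a rough conceptual sketch of this "input-independent representation" idea (random projections and made-up modality sizes, not Google's actual architecture), the snippet below gives each modality its own encoder but has every encoder land in one shared vector space, so the rest of the system can reason over the fused state without caring which sensor the signal came from.

```python
import numpy as np

rng = np.random.default_rng(0)
d_shared = 32   # dimensionality of the shared world representation (arbitrary choice)

# One encoder per modality (random matrices standing in for learned networks),
# all projecting into the same shared space.
encoders = {
    "text":  rng.standard_normal((300, d_shared)) * 0.05,   # e.g. a word-vector input
    "image": rng.standard_normal((1024, d_shared)) * 0.05,  # e.g. flattened patch features
    "audio": rng.standard_normal((128, d_shared)) * 0.05,   # e.g. a spectrogram frame
}

def encode(modality, signal):
    # Whatever the sensor, the output lands in the same shared space.
    return signal @ encoders[modality]

world_state = np.zeros(d_shared)
for modality, dim in (("text", 300), ("image", 1024), ("audio", 128)):
    signal = rng.standard_normal(dim)          # a fake observation from this sensor
    world_state += encode(modality, signal)    # fuse it into one representation

print("shared world-state vector shape:", world_state.shape)
```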

Multi-Modal AI and the World Model

Multi-modal AI models are a step towards a true world model, but a true world model is more ambitious and powerful. Google DeepMind's Gemini aims to achieve this ultimate goal, combining various technologies to create an AI that can understand and interact with the world in a more comprehensive way.

Conclusion

While the AI we see in the news is mature and already used in real products, the most interesting developments are happening in AI research. Several obvious leaps are in sight, and these behind-the-scenes technologies have the potential to transform the future of AI.