DeepSeek R1: 8B vs 671B - Logic Test & Performance Comparison

Summary

Quick Abstract

Uncover the performance differences between DeepSeek R1's full 671B-parameter version and the distilled Qwen3 8B version. We dive into a logic puzzle involving a skyscraper elevator, exploring each model's reasoning process, optimization strategies, and ability to find the shortest path, and see how the models handle trap floors and complex rules.

Quick Takeaways:

  • The 671B model initially struggles but eventually finds a six-step optimal solution, showcasing sophisticated reasoning and optimization.

  • The 8B model quickly produces a valid but non-optimal ten-step solution.

  • When prompted, the 8B model can optimize its answer toward fewer steps.

  • The 8B model can use a previously found, non-optimal solution to guide its search toward better ones.

  • The full model displays inherent optimization and self-reflection, even in validation stages.

  • The distilled model demonstrates impressive reasoning for its size but can get trapped in local minima.

DeepSeek R1: Performance Comparison Between Full and Distilled Versions

This article explores the performance differences between the full DeepSeek R1 0528 model and its distilled Qwen3 8-billion-parameter version, using a logic-based test to assess their reasoning capabilities.

Testing the Full DeepSeek R1 (671B Parameters)

We begin with the full DeepSeek R1 0528 model, with its 671 billion parameters. The test is run via OpenRouter, a platform that provides access to a wide range of AI models.

The Skyscraper Logic Test

The test involves navigating an elevator in a skyscraper from floor zero to floor 30. Five buttons (A-E) control the elevator, each with a unique function. The challenge is to find the shortest sequence of button presses that reaches floor 30 while avoiding "trap floors" that cause setbacks. The test is designed to probe the LLM's logical and causal reasoning and to rule out trivial solutions.
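
The video does not spell out the exact button effects or trap floors, so here is a minimal sketch of how such a puzzle could be modeled in Python. The button deltas, trap floors, and reset-to-ground penalty are all illustrative assumptions, not the puzzle's real rules:

```python
# Hypothetical model of the skyscraper puzzle. The exact button effects and
# trap floors are not given in the video, so the values below are
# illustrative placeholders, not the real rules.
BUTTON_EFFECTS = {"A": 7, "B": 4, "C": -3, "D": 2, "E": -1}
TRAP_FLOORS = {13, 21}        # assumed: landing here sends the elevator back down
TOP, START, GOAL = 30, 0, 30

def press(floor: int, button: str) -> int | None:
    """Apply one button press; return the new floor, or None if the move is illegal."""
    new_floor = floor + BUTTON_EFFECTS[button]
    if not 0 <= new_floor <= TOP:
        return None           # the elevator cannot leave the building
    if new_floor in TRAP_FLOORS:
        return START          # the assumed trap penalty: reset to floor zero
    return new_floor
```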

Reasoning Process Analysis

The full model begins by working out each button's function. It then explores candidate solutions, considering the consequences of each action, and identifies the task as a shortest-path problem suitable for reinforcement-learning-style optimization.

The model systematically lists the button effects and attempts different sequences, learning from failures and trap floors. It effectively performs a breadth-first search (BFS), carrying out the logic and deduction entirely in language rather than in executed code. It also explores alternative strategies, including working backward from the end goal (floor 30).
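
Under the toy rules sketched above, the BFS the model emulates in language takes only a few lines of actual code. Because BFS explores an unweighted state space level by level, the first path to reach the goal is guaranteed to use the fewest button presses (with the placeholder rules the answer will of course differ from the video's six-step solution):

```python
from collections import deque

def shortest_sequence(start: int = START, goal: int = GOAL) -> list[str] | None:
    """BFS over floors: the first path to reach the goal is the shortest."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        floor, path = queue.popleft()
        if floor == goal:
            return path
        for button in BUTTON_EFFECTS:
            nxt = press(floor, button)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [button]))
    return None               # goal unreachable under these rules

print(shortest_sequence())    # a 5-press path under the toy rules: ['A', 'A', 'D', 'A', 'A']
```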

The model applies something akin to a reinforcement-learning reward function, weighing each action by its cost and benefit. It evaluates many permutations and learns through trial and error.
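
The video does not show what this implicit cost/benefit evaluation looks like, but a minimal sketch of such a reward, built on the same toy model as above, might be the following (all numeric penalty and cost values are assumptions):

```python
def reward(floor: int, button: str, goal: int = GOAL) -> float:
    """Toy reward in the spirit of the cost/benefit evaluation described
    above: progress toward the goal is rewarded, every press carries a
    small cost, and illegal moves or trap resets are penalised. All
    numeric values are illustrative assumptions."""
    new_floor = floor + BUTTON_EFFECTS[button]
    if not 0 <= new_floor <= TOP:
        return -10.0                                  # illegal move
    if new_floor in TRAP_FLOORS:
        return -5.0                                   # trap-floor setback
    progress = abs(goal - floor) - abs(goal - new_floor)
    return progress - 1.0                             # progress minus per-press cost
```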

Optimization and Language Switching

The model optimizes from the outset, in contrast to older systems that first find a working but inefficient solution and only then refine it. During intense reasoning it occasionally switches into another language, presumably one dominant in its training data. This behavior mimics how humans sometimes revert to their native language for complex calculations.

Outcome and Validation

The reasoning run was interrupted once by a stream-processing error and had to be restarted. After finding a six-step solution, the model was asked to validate it step by step; the validation itself triggered another internal optimization pass, showing how deeply the reasoning habit is ingrained. The validated solution proved optimal, avoiding all trap floors while minimizing presses, and the model gave a clear explanation of why the sequence is the shortest path.
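
Under the same toy rules, the step-by-step validation the model was asked to perform amounts to replaying the candidate sequence press by press and checking the constraints at each stop:

```python
def validate(sequence: list[str], start: int = START, goal: int = GOAL) -> bool:
    """Replay a candidate button sequence press by press, checking that the
    elevator stays inside the building, never lands on a trap floor, and
    ends at the goal, mirroring the step-by-step check described above."""
    floor = start
    for button in sequence:
        floor += BUTTON_EFFECTS[button]
        if not 0 <= floor <= TOP or floor in TRAP_FLOORS:
            return False      # out of bounds or landed on a trap
    return floor == goal

assert validate(['A', 'A', 'D', 'A', 'A'])   # the BFS answer under the toy rules
```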

Testing the Distilled DeepSeek R1 (8B Parameters)

Next, the distilled Qwen3 8-billion-parameter version of DeepSeek R1 is subjected to the same skyscraper logic test.

Faster Processing, Different Solution

The 8B model processes the prompt significantly faster thanks to its smaller size. It takes floor zero as the starting point, tries to simplify the problem, and notices that button A can reach several useful floors. After briefly trying different button combinations, it settles on a ten-press solution.

Suboptimal Solution and Analysis

The distilled model's ten-press solution is valid but falls well short of the optimal six presses, highlighting a key performance gap relative to the full model.

Post-Analysis and Optimization Attempt

The 8B model's own reasoning trace is then fed back to it for analysis, with the goal of uncovering different solution patterns. The model immediately tries to optimize again, revealing an inherent drive toward optimization, and this time finds a seven-step solution.
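
One way to read this behavior: a known ten-step solution gives an upper bound, so the search only has to consider strictly shorter sequences. A depth-limited sketch of that idea, under the same toy rules as before, could look like this:

```python
def improve(best: list[str], start: int = START, goal: int = GOAL) -> list[str]:
    """Use an already-known (possibly suboptimal) solution as an upper
    bound: depth-limited DFS only explores sequences strictly shorter than
    the best one found so far, in the spirit of the 8B model refining its
    ten-step answer into a seven-step one."""
    def dfs(floor: int, path: list[str], limit: int) -> list[str] | None:
        if floor == goal:
            return path
        if len(path) >= limit:
            return None                   # cannot beat the known bound
        for button in BUTTON_EFFECTS:
            nxt = press(floor, button)
            if nxt is not None:
                found = dfs(nxt, path + [button], limit)
                if found is not None:
                    return found
        return None

    shorter = dfs(start, [], len(best) - 1)
    return shorter if shorter is not None else best
```

Calling improve repeatedly tightens the bound each time a shorter sequence is found, until no further improvement exists.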

Limitation of Distilled Model

When prompted to find a six-step solution, the distilled model reports, after further examination, that it cannot find one. This illustrates a limitation of the smaller model: while it can produce a valid solution, its reduced capacity keeps it from reaching the optimal one, and its analysis and reasoning patterns are noticeably more basic than the full model's. For problems like this, a dedicated mathematical solver would likely be more reliable.

Conclusion

The full DeepSeek R1 model demonstrates superior causal reasoning and optimization capabilities compared to its distilled 8-billion-parameter counterpart. The distilled model can still find a valid solution, but not necessarily the most efficient one, confirming that distillation reduces the model's capacity for handling higher-complexity reasoning.
