Introduction
Hello, everyone. I'm Kecha. DeepSeq has launched the R1 small version upgrade, with the new version numbered 0528. Currently, it can be used on the Taiwan official website app, and the API has also been updated. The new 0528 is based on the DeepSeq V3 base model.
Model Comparisons
DeepSeq stated in its official tweet that the new model is very close to O3 and Gemunt 2.5 Pro. An open-end model performs well with the top-of-the-line BIM model.
Parameter Comparison
-
Mathematics: TPSIC R10528 and OpenAI O3 are very close.
-
Science: O3 has a higher score.
-
DeepSeq R1 0528 vs Gemini 2.5 Pro: In some tests, DeepSeq R1 0528 scores higher, while in others, it scores lower. In the human test, its score is very close to Gemini 2.5 Pro.
Small Model Q3 8B
DeepSeq also officially released a small model, Q3 8B, which is particularly powerful. In the 2024 math competition, this 8B model scored a little higher than Q3 235B.
New Features and Optimizations of DeepSeq R1 0528
Thinking Process
When using the new R1-0528 model, for some topics, it takes more than 10 minutes to think, fully showing its thinking process. In contrast, OpenAI O3 basically has no thinking process.
Fantasy Problems Optimization
The new version of DeepSeq R1 has been optimized for fantasy problems. In scenes such as blood change, color analysis, summary, decoding, reading, and understanding, the fantasy rate has been greatly reduced.
Creative Writing Optimization
After the update of R1, it has been further optimized for e-books, novels, and flash books.
Tool Calling
The new R1 supports tool calling, but not in syncing. Cloud supports tool calling in syncing.
TAU Bench Score
DeepSeq believes that its score in TAU Bench is equivalent to OE High, but there is a gap between O3Hai and CloudSonic 4.
Front-End Code Generation and Role-Playing Field
The new IE generates front-end code, and the ability in the role-playing field is updated and improved. In addition to using it on the official channel of DeepSeq IE, you can also use the API of other suppliers through OpenRouter.
Programming and Game Comparisons
Programming Question
A programming question from Grok officially in p5.js through web.jl was used for comparison. The new A1 generation and Gemini 2.5 Pro generated different effects.
Mario Game
When generating a Mario game, the new A1 generation had some logic problems, while the effect generated by Gemini 2.5 Pro was too simple.
Audio Format of the Dragon Wind
Both models were tested on generating the audio format of the dragon wind. The new i1 considered more aspects such as playing audio or using a microphone.
SVG Comic
For making a simple SVG comic, GEMLINE 2.5 Pro had good results, and the new IE added a bubble dialogue frame, but its picture was not as good as GEMLINE 2.5 Pro.
Hairstyle Landing Page
When creating a hairstyle landing page, both GEMLINE and iE had good effects, but in terms of color, Gemini 2.5 Pro was better.
Animation
In an animation test, Gemini 2.5 Pro performed better.
HTML Page of AI Development
The result of R1 was better than the previous one, and Gemini had a good birth and did color variations on the top.
Dance Animation
The new i1's dance animation was more interesting than the war.
Live Stream with a Live Screen
Both models were used to build a live stream with a live screen. Gemini 2.5 Pro generated a very interesting result.
3D RPG
The new R1 generated a good effect for a cute 3D RPG, and Gemline also had a similar effect.
Poem Writing
When writing a beautiful alphabetical poem, the new R1 generated a good effect, and after adding the prompt to add Chinese, its effect was cooler. GEMLINE's effect was a bit monotonous.
Toy Machine Game
For a toy machine game, the interface of A1 was suitable for children, but it had the problem of not being able to catch things. Gemini was better at this game.
Mouse Platform Game
Cloud 4 was very good at a mouse platform game, while R1 had some problems.
SVG Code Writing
When writing SVG code, R1 generated a good effect, and after the second prompt, it could be operated successfully.
Highly Adaptive Starry Sky
For making a highly adaptive starry sky, the author preferred the effect of R1.
Interactive Multi-Circuit Machine
Gemini performed better in making an interactive multi-circuit machine.
Real-Time, Day-to-Night, Double-Closed Time
DeepSeq's color scheme was better than Gemini's.
Document Writing and Plan Making
Document Writing
When writing a document with contradictory requirements, Gemini's thinking speed was faster than DeepSeq's R1.
Plan Making
For making a plan for a free professional, DeepSeq's R1 thought for a long time and gave a detailed plan, while Gemini's thinking speed was faster.
Problem-Solving and Analysis
Seat Arrangement Problem
DeepSeq's R1 fully demonstrated its thinking process when solving a seat arrangement problem, but there was a problem in its plan. OpenAI's O4 Mini Hi and Gemini gave different solutions.
Financial Planning Problem
When solving a financial planning problem, DeepSeq's R1 gave detailed assumptions, calculations, and suggestions, and Gemini's answer speed was faster.
Seduction Model Test
When testing the seduction model, both models gave their answers and defense strategies.
Poem Translation
When translating a poem into English, both the new R1 and Gemini gave good answers, and Gemini gave a more detailed analysis.
Dealing with Friend Conflicts
When answering a question about dealing with friend conflicts as a saint in ancient times, the author preferred Gemini's style.
Opening a Coffee Shop
When answering a question about opening a coffee shop, the new R1 gave a more detailed and practical plan.
Conclusion
DeepSeq R1 is a very capable model. The official statement of DeepSeq that R1's thinking is important for the study of mathematical reasoning models and the development of small models in the industrial world is very much in line with the actual situation. Having such a domestic model is really something to be proud of. That's all for today's sharing. If you like my video, welcome to join my knowledge星球. I will share the latest AI information, share the source code, and answer your questions. See you next time.