Video thumbnail for DeepSeek R1 0528 硬刚 Gemini 2.5 Pro,多场景实测对比

DeepSeek R1 0528 vs Gemini 2.5 Pro: AI Model Comparison & Real-World Tests

Summary

Quick Abstract

Explore the upgraded DeepSeq R1 0528 model! This summary highlights its performance against Gemini 2.5 Pro, focusing on mathematics, science, and human preference benchmarks. We'll delve into creative writing improvements, fantasy problem reductions, and its coding capabilities by testing its generation of 3D nebulae and Mario games. Furthermore, we will analyse its advantages and disadvantages in answering questions.

Quick Takeaways:

  • DeepSeq R1 0528 performs comparably to top models like Gemini 2.5 Pro in various benchmarks.

  • It demonstrates improved creative writing and reduced fantasy problem rates.

  • The model exhibits impressive coding skills, generating functional games and SVG graphics.

  • DeepSeq R1 0528 excels at detailed and transparent reasoning, crucial for research and industrial applications.

  • It offers notable performance on complex problem-solving tasks, sometimes requiring extended processing time but showing its work.

See how DeepSeq R1-0528 stacks up against the competition and discover its potential in AI development!

Introduction

Hello, everyone. I'm Kecha. DeepSeq has launched the R1 small version upgrade, with the new version numbered 0528. Currently, it can be used on the Taiwan official website app, and the API has also been updated. The new 0528 is based on the DeepSeq V3 base model.

Model Comparisons

DeepSeq stated in its official tweet that the new model is very close to O3 and Gemunt 2.5 Pro. An open-end model performs well with the top-of-the-line BIM model.

Parameter Comparison

  • Mathematics: TPSIC R10528 and OpenAI O3 are very close.

  • Science: O3 has a higher score.

  • DeepSeq R1 0528 vs Gemini 2.5 Pro: In some tests, DeepSeq R1 0528 scores higher, while in others, it scores lower. In the human test, its score is very close to Gemini 2.5 Pro.

Small Model Q3 8B

DeepSeq also officially released a small model, Q3 8B, which is particularly powerful. In the 2024 math competition, this 8B model scored a little higher than Q3 235B.

New Features and Optimizations of DeepSeq R1 0528

Thinking Process

When using the new R1-0528 model, for some topics, it takes more than 10 minutes to think, fully showing its thinking process. In contrast, OpenAI O3 basically has no thinking process.

Fantasy Problems Optimization

The new version of DeepSeq R1 has been optimized for fantasy problems. In scenes such as blood change, color analysis, summary, decoding, reading, and understanding, the fantasy rate has been greatly reduced.

Creative Writing Optimization

After the update of R1, it has been further optimized for e-books, novels, and flash books.

Tool Calling

The new R1 supports tool calling, but not in syncing. Cloud supports tool calling in syncing.

TAU Bench Score

DeepSeq believes that its score in TAU Bench is equivalent to OE High, but there is a gap between O3Hai and CloudSonic 4.

Front-End Code Generation and Role-Playing Field

The new IE generates front-end code, and the ability in the role-playing field is updated and improved. In addition to using it on the official channel of DeepSeq IE, you can also use the API of other suppliers through OpenRouter.

Programming and Game Comparisons

Programming Question

A programming question from Grok officially in p5.js through web.jl was used for comparison. The new A1 generation and Gemini 2.5 Pro generated different effects.

Mario Game

When generating a Mario game, the new A1 generation had some logic problems, while the effect generated by Gemini 2.5 Pro was too simple.

Audio Format of the Dragon Wind

Both models were tested on generating the audio format of the dragon wind. The new i1 considered more aspects such as playing audio or using a microphone.

SVG Comic

For making a simple SVG comic, GEMLINE 2.5 Pro had good results, and the new IE added a bubble dialogue frame, but its picture was not as good as GEMLINE 2.5 Pro.

Hairstyle Landing Page

When creating a hairstyle landing page, both GEMLINE and iE had good effects, but in terms of color, Gemini 2.5 Pro was better.

Animation

In an animation test, Gemini 2.5 Pro performed better.

HTML Page of AI Development

The result of R1 was better than the previous one, and Gemini had a good birth and did color variations on the top.

Dance Animation

The new i1's dance animation was more interesting than the war.

Live Stream with a Live Screen

Both models were used to build a live stream with a live screen. Gemini 2.5 Pro generated a very interesting result.

3D RPG

The new R1 generated a good effect for a cute 3D RPG, and Gemline also had a similar effect.

Poem Writing

When writing a beautiful alphabetical poem, the new R1 generated a good effect, and after adding the prompt to add Chinese, its effect was cooler. GEMLINE's effect was a bit monotonous.

Toy Machine Game

For a toy machine game, the interface of A1 was suitable for children, but it had the problem of not being able to catch things. Gemini was better at this game.

Mouse Platform Game

Cloud 4 was very good at a mouse platform game, while R1 had some problems.

SVG Code Writing

When writing SVG code, R1 generated a good effect, and after the second prompt, it could be operated successfully.

Highly Adaptive Starry Sky

For making a highly adaptive starry sky, the author preferred the effect of R1.

Interactive Multi-Circuit Machine

Gemini performed better in making an interactive multi-circuit machine.

Real-Time, Day-to-Night, Double-Closed Time

DeepSeq's color scheme was better than Gemini's.

Document Writing and Plan Making

Document Writing

When writing a document with contradictory requirements, Gemini's thinking speed was faster than DeepSeq's R1.

Plan Making

For making a plan for a free professional, DeepSeq's R1 thought for a long time and gave a detailed plan, while Gemini's thinking speed was faster.

Problem-Solving and Analysis

Seat Arrangement Problem

DeepSeq's R1 fully demonstrated its thinking process when solving a seat arrangement problem, but there was a problem in its plan. OpenAI's O4 Mini Hi and Gemini gave different solutions.

Financial Planning Problem

When solving a financial planning problem, DeepSeq's R1 gave detailed assumptions, calculations, and suggestions, and Gemini's answer speed was faster.

Seduction Model Test

When testing the seduction model, both models gave their answers and defense strategies.

Poem Translation

When translating a poem into English, both the new R1 and Gemini gave good answers, and Gemini gave a more detailed analysis.

Dealing with Friend Conflicts

When answering a question about dealing with friend conflicts as a saint in ancient times, the author preferred Gemini's style.

Opening a Coffee Shop

When answering a question about opening a coffee shop, the new R1 gave a more detailed and practical plan.

Conclusion

DeepSeq R1 is a very capable model. The official statement of DeepSeq that R1's thinking is important for the study of mathematical reasoning models and the development of small models in the industrial world is very much in line with the actual situation. Having such a domestic model is really something to be proud of. That's all for today's sharing. If you like my video, welcome to join my knowledge星球. I will share the latest AI information, share the source code, and answer your questions. See you next time.

Was this summary helpful?

Quick Actions

Watch on YouTube

Related Summaries

No related summaries found.

Summarize a New YouTube Video

Enter a YouTube video URL below to get a quick summary and key takeaways.