Comparison of OpenAI o3 and DeepSeek R1 scores on math benchmarks:

This post was edited on 2025-02-01 22:47:23 by regular user 大觀園的賈探春

ChatGPT's answer:

Here's the comparison of math-related benchmark performance between OpenAI o3 and DeepSeek R1, excluding any benchmarks where one of the models’ numbers is unknown:

  • AIME 2024:
    • OpenAI o3: 96.7% accuracy
    • DeepSeek R1: 79.8%
  • Graduate-Level Google-Proof Q&A (GPQA) Benchmark:
    • OpenAI o3: 87.7%
    • DeepSeek R1: 71.5%

The MATH-500 benchmark data is only available for DeepSeek R1, so no comparison is made for that benchmark.
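
As a rough illustration of the gaps implied by the figures quoted above, here is a minimal Python sketch. The scores are copied as-is from the reply; the `scores` dictionary and the `gaps` helper are hypothetical names introduced only for this example.

```python
# Scores as quoted in the reply above (percent accuracy).
# The dictionary layout is an assumption made for this illustration.
scores = {
    "AIME 2024": {"OpenAI o3": 96.7, "DeepSeek R1": 79.8},
    "GPQA": {"OpenAI o3": 87.7, "DeepSeek R1": 71.5},
}


def gaps(scores):
    """Return the o3 minus R1 gap, in percentage points, per benchmark."""
    return {
        name: round(s["OpenAI o3"] - s["DeepSeek R1"], 1)
        for name, s in scores.items()
    }


for name, gap in gaps(scores).items():
    print(f"{name}: o3 leads by {gap} percentage points")
```

On the quoted numbers this gives a lead of 16.9 points on AIME 2024 and 16.2 points on GPQA.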

 
