Comparison of OpenAI o3 and DeepSeek R1 scores on math benchmarks:

ChatGPT's answer:

Here's the comparison of math-related benchmark performance between OpenAI o3 and DeepSeek R1, excluding any benchmarks where one of the models’ numbers is unknown:

  • AIME 2024:

    • OpenAI o3: 96.7%
    • DeepSeek R1: 79.8%
  • Graduate-Level Google-Proof Q&A (GPQA Diamond) benchmark:

    • OpenAI o3: 87.7%
    • DeepSeek R1: 71.5%

The MATH-500 benchmark data is only available for DeepSeek R1, so no comparison is made for that benchmark.
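The rule the answer follows (compare a benchmark only when both models have a reported score) can be sketched in a few lines of Python. This is an illustrative sketch, not anything ChatGPT or either vendor published: the names `SCORES` and `comparable_gaps` are hypothetical, the AIME 2024 and GPQA figures are the ones quoted above, and the DeepSeek R1 MATH-500 value (97.3) is DeepSeek's own reported figure, which the answer above does not quote. It only serves to show that MATH-500 is skipped because the o3 score is unknown.

```python
from typing import Dict, Optional

# Scores in percent. None marks a score that is unknown for that model.
SCORES: Dict[str, Dict[str, Optional[float]]] = {
    "AIME 2024": {"OpenAI o3": 96.7, "DeepSeek R1": 79.8},
    "GPQA": {"OpenAI o3": 87.7, "DeepSeek R1": 71.5},
    # Assumption: 97.3 is DeepSeek's reported MATH-500 score (not quoted above).
    "MATH-500": {"OpenAI o3": None, "DeepSeek R1": 97.3},
}


def comparable_gaps(
    scores: Dict[str, Dict[str, Optional[float]]], a: str, b: str
) -> Dict[str, float]:
    """Return {benchmark: score_a - score_b} in percentage points,
    skipping any benchmark where either model's score is unknown."""
    gaps: Dict[str, float] = {}
    for bench, by_model in scores.items():
        sa, sb = by_model.get(a), by_model.get(b)
        if sa is None or sb is None:
            continue  # one side is unknown, so no fair comparison
        gaps[bench] = round(sa - sb, 1)  # round away float noise
    return gaps


if __name__ == "__main__":
    for bench, gap in comparable_gaps(SCORES, "OpenAI o3", "DeepSeek R1").items():
        print(f"{bench}: o3 leads by {gap} points")
```

Run as a script, it prints a 16.9-point gap for AIME 2024 and a 16.2-point gap for GPQA, and omits MATH-500.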
 
