OpenAI o3 vs. DeepSeek R1: comparing their math benchmark scores

Posted: 2025-02-02 00:40:34

ChatGPT's answer:

Here's the comparison of math-related benchmark performance between OpenAI o3 and DeepSeek R1, excluding any benchmarks where one of the models’ numbers is unknown:

  • AIME 2024:
    • OpenAI o3: 96.7%
    • DeepSeek R1: 79.8%
  • Graduate-Level Google-Proof Q&A (GPQA) Benchmark:
    • OpenAI o3: 87.7%
    • DeepSeek R1: 71.5%

The MATH-500 benchmark data is only available for DeepSeek R1, so no comparison is made for that benchmark.
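
To make the size of the gap explicit, here is a minimal Python sketch that subtracts the two models' scores on each benchmark listed above. The numbers are copied from the answer as quoted; the names `scores`, `score_gaps`, and `gaps` are made up for this illustration and are not from any library.

```python
# Scores (percent accuracy) copied from the comparison above.
scores = {
    "AIME 2024": {"OpenAI o3": 96.7, "DeepSeek R1": 79.8},
    "GPQA": {"OpenAI o3": 87.7, "DeepSeek R1": 71.5},
}


def score_gaps(table):
    """Return o3 minus R1, in percentage points, for each benchmark."""
    return {
        # Round to one decimal to hide floating-point noise in the subtraction.
        name: round(row["OpenAI o3"] - row["DeepSeek R1"], 1)
        for name, row in table.items()
    }


gaps = score_gaps(scores)
for name, gap in gaps.items():
    print(f"{name}: o3 leads by {gap} percentage points")
```

On these figures o3 leads by 16.9 points on AIME 2024 and 16.2 points on GPQA. The differences are in percentage points, not relative percent, and they say nothing about MATH-500, which is left out for the reason given above.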