ChatGPT's answer:
Here's the comparison of math-related benchmark performance between OpenAI o3 and DeepSeek R1, excluding any benchmarks where one of the models’ numbers is unknown:
- AIME 2024:
  - OpenAI o3: 96.7% accuracy
  - DeepSeek R1: 79.8%
- Graduate-Level Google-Proof Q&A (GPQA) Benchmark:
  - OpenAI o3: 87.7%
  - DeepSeek R1: 71.5%
MATH-500 is left out because a score is available only for DeepSeek R1.
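To make the size of the gap explicit, here is a minimal Python sketch that computes o3's lead over R1 on each benchmark in percentage points. The scores are copied from the list above; the `scores` dictionary and the `point_gap` helper are illustrative names, not part of any benchmark tooling.

```python
# Scores copied from the comparison above (percent accuracy).
scores = {
    "AIME 2024": {"OpenAI o3": 96.7, "DeepSeek R1": 79.8},
    "GPQA": {"OpenAI o3": 87.7, "DeepSeek R1": 71.5},
}


def point_gap(benchmark_scores, a="OpenAI o3", b="DeepSeek R1"):
    """Return a's lead over b in percentage points, rounded to one decimal.

    Hypothetical helper written for this illustration only.
    """
    return round(benchmark_scores[a] - benchmark_scores[b], 1)


# Map each benchmark name to o3's lead over R1.
gaps = {name: point_gap(s) for name, s in scores.items()}

for name, gap in gaps.items():
    print(f"{name}: OpenAI o3 leads by {gap} percentage points")
```

With these figures, the lead works out to 16.9 percentage points on AIME 2024 and 16.2 on GPQA.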