DeepSeek R1 has, at most, caught up with OpenAI o1-1217

Posted: 2025-01-27 19:04:14
Benchmark                       DeepSeek-R1 (%)   OpenAI o1-1217 (%)   Verdict
AIME 2024 (Pass@1)              79.8              79.2                 DeepSeek-R1 wins (better math problem-solving)
Codeforces (Percentile)         96.3              96.6                 OpenAI o1-1217 wins (better competitive coding)
GPQA Diamond (Pass@1)           71.5              75.7                 OpenAI o1-1217 wins (better general QA performance)
MATH-500 (Pass@1)               97.3              96.4                 DeepSeek-R1 wins (stronger math reasoning)
MMLU (Pass@1)                   90.8              91.8                 OpenAI o1-1217 wins (better general knowledge understanding)
SWE-bench Verified (Resolved)   49.2              48.9                 DeepSeek-R1 wins (better software engineering task handling)
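
To make the size of the gaps in the table above easier to see, here is a minimal Python sketch that computes the per-benchmark margin (DeepSeek-R1 minus OpenAI o1-1217, in percentage points) and tallies the wins. The scores are copied from the table; the names `scores` and `summarize` are made up for this illustration and do not come from either vendor.

```python
# Scores copied from the table above: (DeepSeek-R1, OpenAI o1-1217).
scores = {
    "AIME 2024 (Pass@1)": (79.8, 79.2),
    "Codeforces (Percentile)": (96.3, 96.6),
    "GPQA Diamond (Pass@1)": (71.5, 75.7),
    "MATH-500 (Pass@1)": (97.3, 96.4),
    "MMLU (Pass@1)": (90.8, 91.8),
    "SWE-bench Verified (Resolved)": (49.2, 48.9),
}


def summarize(scores):
    """Return (margins, r1_wins, o1_wins).

    A margin is the DeepSeek-R1 score minus the o1-1217 score, in points,
    rounded to one decimal to hide floating-point noise.
    """
    margins = {name: round(r1 - o1, 1) for name, (r1, o1) in scores.items()}
    r1_wins = sum(1 for m in margins.values() if m > 0)
    o1_wins = sum(1 for m in margins.values() if m < 0)
    return margins, r1_wins, o1_wins


margins, r1_wins, o1_wins = summarize(scores)
for name, margin in margins.items():
    print(f"{name:32s} {margin:+.1f}")
print(f"DeepSeek-R1 wins: {r1_wins}, OpenAI o1-1217 wins: {o1_wins}")
```

On these numbers each model leads on three of the six benchmarks, and the largest gap is 4.2 points (GPQA Diamond, in favor of o1-1217); every other margin is 1.0 point or less.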