Model Performance Comparison
Claude 3.7 Sonnet vs GPT-O3
Metric          Claude 3.7 Sonnet   GPT-O3
GPQA Diamond    84.8%               79.7%
AIME Math       80.0%               87.3%
SWE-bench       70.3%               64.5%
Output Cost     $15/M tokens        $75/M tokens