AI Model Comparison
Performance Metrics
Metric             OpenAI o3-mini   Claude 3.7 Sonnet
Reasoning (GPQA)   78.0%            68.0% / 84.8%*
Math (MATH 500)    97.9%            96.2%
Context window     200K tokens      200K tokens
*84.8% in extended thinking mode