Benchmark

1 post tagged with this.

Anthropic shipped Claude Opus 4.7 today with real benchmark data against GPT-5.4 and Gemini 3.1 Pro: 87.6% on SWE-bench Verified and 80.6% on document reasoning, plus 3.75MP vision and a new xhigh effort level.
