| Model | Date | W | WIN% | HIT/ATT | ACC% | AVG/G | TOK/G | COST | $/G |
|---|---|---|---|---|---|---|---|---|---|
| google/gemini-3-flash-preview | 2026-02-24 | 20 | 100.0% | 80/85 | 94.1% | 10s | 4.9k | $0.10 | $0.005 |
| openai/gpt-5.3-codex | 2026-02-26 | 20 | 100.0% | 80/84 | 95.2% | 36s | 3.6k | $0.50 | $0.025 |
| x-ai/grok-4-fast | 2026-02-25 | 20 | 100.0% | 80/85 | 94.1% | 41s | 8.5k | $0.06 | $0.003 |
| openai/o3 | 2026-02-25 | 20 | 100.0% | 80/82 | 97.6% | 1m7s | 8.2k | $1.05 | $0.052 |
| google/gemini-3.1-pro-preview | 2026-02-24 | 20 | 100.0% | 80/80 | 100.0% | 1m15s | 7.8k | $1.31 | $0.066 |
| google/gemini-3-pro-preview | 2026-02-25 | 20 | 100.0% | 80/81 | 98.8% | 1m19s | 10.7k | $1.94 | $0.097 |
| anthropic/claude-opus-4.6 | 2026-02-23 | 20 | 100.0% | 80/80 | 100.0% | 1m25s | 17.7k | $3.53 | $0.176 |
| anthropic/claude-sonnet-4.6 | 2026-02-24 | 20 | 100.0% | 80/80 | 100.0% | 1m51s | 29.4k | $3.53 | $0.177 |
| openai/gpt-5.2-pro | 2026-02-25 | 20 | 100.0% | 80/80 | 100.0% | 3m7s | 3.9k | $7.44 | $0.372 |
| x-ai/grok-4 | 2025-10-18 | 20 | 100.0% | 80/82 | 97.6% | 3m28s | 13.6k | $2.53 | $0.127 |
| moonshotai/kimi-k2.5 | 2026-02-24 | 20 | 100.0% | 80/89 | 89.9% | 8m14s | 13.4k | $0.51 | $0.026 |
| anthropic/claude-opus-4.5 | 2026-02-25 | 19 | 95.0% | 78/84 | 92.9% | 34s | 7.6k | $1.52 | $0.076 |
| qwen/qwen3-max-thinking | 2026-02-15 | 19 | 95.0% | 78/87 | 89.7% | 2m48s | 15.3k | $0.88 | $0.044 |
| z-ai/glm-5 | 2026-02-24 | 19 | 95.0% | 77/89 | 86.5% | 3m11s | 9.2k | $0.41 | $0.020 |
| z-ai/glm-4.7 | 2026-01-30 | 19 | 95.0% | 77/88 | 87.5% | 5m18s | 21.0k | $0.76 | $0.038 |
| google/gemini-2.5-pro | 2025-10-18 | 18 | 90.0% | 75/89 | 84.3% | 1m1s | 11.4k | $1.30 | $0.065 |
| openai/gpt-5-mini | 2025-12-19 | 18 | 90.0% | 69/84 | 82.1% | 1m51s | 7.9k | $0.24 | $0.012 |
| stepfun/step-3.5-flash | 2026-02-15 | 18 | 90.0% | 76/92 | 82.6% | 5m42s | 93.1k | $0.43 | $0.021 |
| anthropic/claude-4.5-sonnet | 2025-12-19 | 17 | 85.0% | 59/86 | 68.6% | 38s | 7.1k | $0.91 | $0.045 |
| deepseek/deepseek-v3.2 | 2025-12-02 | 17 | 85.0% | 72/92 | 78.3% | 4m21s | 13.4k | $0.08 | $0.004 |
| moonshotai/kimi-k2-thinking | 2025-11-12 | 17 | 85.0% | 74/101 | 73.3% | 8m26s | 17.9k | $0.65 | $0.032 |
| deepseek/deepseek-r1-0528 | 2025-10-18 | 16 | 80.0% | 69/99 | 69.7% | 9m14s | 21.4k | $0.98 | $0.049 |
| openai/gpt-5.2 | 2025-12-19 | 15 | 75.0% | 56/83 | 67.5% | 42s | 3.9k | $0.64 | $0.032 |
| openai/gpt-oss-120b | 2025-10-18 | 14 | 70.0% | 65/111 | 58.6% | 2m26s | 19.4k | $0.12 | $0.006 |
| qwen/qwen3.5-35b-a3b | 2026-02-26 | 14 | 70.0% | 63/89 | 70.8% | 3m10s | 24.3k | $0.73 | $0.037 |
| anthropic/claude-haiku-4.5 | 2025-12-19 | 13 | 65.0% | 47/94 | 50.0% | 37s | 12.5k | $0.56 | $0.028 |
| qwen/qwen3-max | 2025-10-18 | 13 | 65.0% | 63/106 | 59.4% | 2m9s | 13.7k | $0.69 | $0.034 |
| qwen/qwen3.5-flash-02-23 | 2026-02-26 | 13 | 65.0% | 61/97 | 62.9% | 2m53s | 30.3k | $0.19 | $0.010 |
| moonshotai/kimi-k2-0905 | 2025-10-18 | 10 | 50.0% | 53/100 | 53.0% | 1m9s | 7.1k | $0.15 | $0.008 |
| openai/o4-mini | 2025-12-20 | 9 | 45.0% | 35/97 | 36.1% | 3m33s | 29.3k | $2.41 | $0.121 |
| openai/gpt-oss-20b | 2025-10-18 | 8 | 40.0% | 45/108 | 41.7% | 4m4s | 34.4k | $0.12 | $0.006 |
| z-ai/glm-4.6 | 2025-10-18 | 7 | 35.0% | 45/115 | 39.1% | 1m41s | 8.2k | $0.17 | $0.008 |
| openai/o3-mini | 2025-12-20 | 5 | 25.0% | 25/100 | 25.0% | 2m38s | 21.6k | $1.72 | $0.086 |
| minimax/minimax-m2.5 | 2026-02-14 | 3 | 15.0% | 24/95 | 25.3% | 2m5s | 8.8k | $0.13 | $0.007 |
| amazon/nova-pro-v1 | 2025-12-20 | 1 | 5.0% | 8/90 | 8.9% | 11s | 4.7k | $0.13 | $0.006 |
| microsoft/phi-4 | 2025-11-06 | 1 | 5.0% | 17/102 | 16.7% | 47s | 7.5k | $0.01 | $0.001 |
| meta-llama/llama-3.3-70b-instruct | 2025-12-20 | 1 | 5.0% | 7/97 | 7.2% | 50s | 4.5k | $0.03 | $0.001 |
| mistralai/mistral-large | 2025-12-20 | 1 | 5.0% | 9/99 | 9.1% | 58s | 10.6k | $0.68 | $0.034 |
| baidu/ernie-4.5-21b-a3b-thinking | 2025-10-18 | 0 | 0.0% | 18/99 | 18.2% | 3m22s | 22.3k | $0.11 | $0.005 |
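
The percentage columns appear to be derived from the raw counts: in the rows above, WIN% matches W out of 20, and ACC% matches the HIT/ATT ratio rounded to one decimal place. The sketch below recomputes both for a single row as a consistency check. It is a minimal illustration, not the benchmark's own tooling: the helper name `check_row` is hypothetical, and the figure of 20 games per model is an assumption inferred from the table, not stated in it.

```python
GAMES = 20  # assumption: inferred from W=20 always pairing with WIN%=100.0%


def check_row(row: str, games: int = GAMES) -> dict:
    """Recompute WIN% and ACC% from one markdown table row (hypothetical helper)."""
    # Drop the outer pipes, then split the row into trimmed cell strings.
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    model, _date, wins, win_pct, hit_att, acc_pct = cells[:6]
    hits, attempts = (int(x) for x in hit_att.split("/"))

    win = 100 * int(wins) / games
    acc = 100 * hits / attempts
    return {
        "model": model,
        "win_pct": round(win, 1),
        "acc_pct": round(acc, 1),
        # Compare against the printed cells, formatted the same way as the table.
        "win_ok": f"{win:.1f}%" == win_pct,
        "acc_ok": f"{acc:.1f}%" == acc_pct,
    }


row = "| openai/o3 | 2026-02-25 | 20 | 100.0% | 80/82 | 97.6% | 1m7s | 8.2k | $1.05 | $0.052 |"
result = check_row(row)
print(result["model"], result["win_pct"], result["acc_pct"], result["win_ok"], result["acc_ok"])
```

The $/G column is deliberately left out of the check: it looks like COST divided by the same 20 games, but COST is already rounded to cents, so recomputing from the printed value can differ in the last digit (for example, two rows with COST $3.53 show $0.176 and $0.177).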