Connections Evaluation Box Score

Latest runs for 22 models (>=11 puzzles, >40 guesses, sorted by solve rate)

Model | Ver | Date | GP | W | PCT | ATT | HIT | MISS | ERR | AVG | TIME | TIME/G | TOK | TOK/G | COST | $/G |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
grok4-fast | 2.0.1 | 2025-10-03 | 20 | 20 | 1.000 | 83 | 80 | 3 | 0 | .963 | 57m9s | 2m51s | 161.3k | 8.1k | $0.06 | $0.003 |
gpt5-mini | 2.0.1 | 2025-10-03 | 11 | 11 | 1.000 | 47 | 44 | 3 | 0 | .936 | 13m2s | 1m11s | 90.3k | 8.2k | $0.13 | $0.012 |
o3 | 2.0.1 | 2025-10-03 | 11 | 11 | 1.000 | 44 | 44 | 0 | 0 | 1.000 | 26m43s | 2m25s | 111.4k | 10.1k | $0.76 | $0.069 |
gemini | 1.0.0 | 2025-08-06 | 11 | 11 | 1.000 | 49 | 44 | 5 | 1 | .897 | 16m0s | 1m27s | 91.6k | 8.3k | $0.83 | $0.075 |
gpt5 | 2.0.1 | 2025-10-03 | 11 | 11 | 1.000 | 45 | 44 | 1 | 0 | .977 | 34m7s | 3m6s | 129.5k | 11.8k | $1.13 | $0.103 |
o4-mini | 1.0.0 | 2025-08-01 | 11 | 10 | .909 | 50 | 42 | 8 | 0 | .840 | 23m48s | 2m9s | 121.2k | 11.0k | $0.52 | $0.047 |
sonnet-4.5 | 2.0.1 | 2025-10-02 | 11 | 9 | .818 | 47 | 38 | 9 | 0 | .808 | 8m59s | 49s | 82.5k | 7.5k | $0.54 | $0.049 |
gpt-oss-120b | 1.0.0 | 2025-08-06 | 11 | 8 | .727 | 56 | 36 | 20 | 6 | .642 | 11m41s | 1m3s | 123.5k | 11.2k | $0.06 | $0.005 |
gemini-flash | 2.0.1 | 2025-10-02 | 11 | 8 | .727 | 49 | 32 | 17 | 0 | .653 | 4m58s | 27s | 170.0k | 15.5k | $0.17 | $0.016 |
grok3-mini | 1.0.0 | 2025-08-07 | 11 | 6 | .545 | 54 | 26 | 28 | 0 | .481 | 8m17s | 45s | 72.3k | 6.6k | $0.05 | $0.005 |
opus-4 | 1.0.0 | 2025-08-01 | 11 | 5 | .454 | 58 | 29 | 29 | 13 | .500 | 4m40s | 25s | 38.8k | 3.5k | $0.71 | $0.064 |
gpt5-nano | 1.0.0 | 2025-08-08 | 11 | 4 | .363 | 53 | 22 | 31 | 3 | .415 | 24m2s | 2m11s | 225.1k | 20.5k | $0.09 | $0.008 |
sonnet-4 | 1.0.0 | 2025-08-01 | 11 | 4 | .363 | 51 | 21 | 30 | 15 | .411 | 2m17s | 12s | 36.4k | 3.3k | $0.13 | $0.012 |
gpt4.1 | 1.0.0 | 2025-08-01 | 11 | 3 | .272 | 54 | 17 | 37 | 0 | .314 | 57s | 5s | 13.6k | 1.2k | $0.03 | $0.003 |
opus-4.1 | 1.0.0 | 2025-08-06 | 11 | 3 | .272 | 47 | 26 | 21 | 21 | .553 | 2m54s | 15s | 37.9k | 3.4k | $0.70 | $0.064 |
horizon | 1.0.0 | 2025-08-01 | 11 | 2 | .181 | 49 | 13 | 36 | 3 | .265 | 50s | 4s | 14.0k | 1.3k | $0.00 | $0.000 |
grok3 | 1.0.0 | 2025-08-01 | 11 | 2 | .181 | 51 | 12 | 39 | 0 | .235 | 54s | 5s | 12.7k | 1.2k | $0.03 | $0.002 |
deepseek-v3.1 | 1.0.0 | 2025-08-21 | 11 | 1 | .090 | 43 | 8 | 35 | 7 | .186 | 1m40s | 9s | 13.3k | 1.2k | $0.00 | $0.000 |
gpt-oss-20b | 1.0.0 | 2025-08-06 | 11 | 1 | .090 | 51 | 18 | 33 | 17 | .352 | 20m46s | 1m53s | 246.8k | 22.4k | $0.05 | $0.005 |
qwen3 | 1.0.0 | 2025-08-08 | 11 | 0 | .000 | 44 | 0 | 44 | 0 | .000 | 1m2s | 5s | 10.9k | 1.0k | $0.00 | $0.000 |
gpt4.1-mini | 1.0.0 | 2025-08-08 | 11 | 0 | .000 | 48 | 4 | 44 | 0 | .083 | 46s | 4s | 11.7k | 1.1k | $0.01 | $0.001 |
gpt4-turbo | 1.0.0 | 2025-08-01 | 11 | 0 | .000 | 46 | 2 | 44 | 1 | .043 | 1m0s | 5s | 11.8k | 1.1k | $0.14 | $0.012 |
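
The rate columns in the table appear to follow directly from the count columns. A minimal Python sketch of that arithmetic is below. It assumes, based only on the numbers shown, that PCT is W divided by GP, AVG is HIT divided by ATT, MISS is ATT minus HIT, and that rates are truncated (not rounded) to three decimals with the leading zero dropped. The helper names `rate` and `derive` are hypothetical and are not taken from the evaluation tool itself.

```python
def rate(numerator: int, denominator: int) -> str:
    """Format a ratio box-score style: three decimals, truncated, no leading zero.

    Assumption: truncation is inferred from rows such as 5/11 -> .454.
    """
    if denominator == 0:
        return ".000"
    # Integer arithmetic avoids floating-point surprises when truncating.
    thousandths = numerator * 1000 // denominator
    whole, frac = divmod(thousandths, 1000)
    return f"{whole}.{frac:03d}" if whole else f".{frac:03d}"


def derive(gp: int, w: int, att: int, hit: int) -> dict:
    """Recompute the derived columns for one row from its raw counts."""
    return {
        "PCT": rate(w, gp),     # solve rate: puzzles won / puzzles played
        "MISS": att - hit,      # incorrect guesses
        "AVG": rate(hit, att),  # guess accuracy: correct guesses / all guesses
    }


# Raw counts copied from the opus-4 and grok4-fast rows of the table.
print(derive(gp=11, w=5, att=58, hit=29))
print(derive(gp=20, w=20, att=83, hit=80))
```

Under these assumptions the two calls give `.454` / `29` / `.500` for opus-4 and `1.000` / `3` / `.963` for grok4-fast, which matches the printed rows.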