Connections Evaluation Box Score

Latest runs for 23 models (11 puzzles, >40 guesses, sorted by solve rate)

Model | Date | GP | W | PCT | ATT | HIT | MISS | ERR | AVG | TIME | TIME/G | TOK | TOK/G | COST | $/G |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gpt5-mini | 2025-08-19 | 11 | 11 | 1.000 | 52 | 44 | 8 | 0 | .846 | 16m40s | 1m31s | 79.9k | 7.3k | $0.15 | $0.013 |
o3 | 2025-08-01 | 11 | 11 | 1.000 | 45 | 44 | 1 | 1 | .977 | 17m10s | 1m33s | 59.5k | 5.4k | $0.43 | $0.040 |
gpt5 | 2025-08-07 | 11 | 11 | 1.000 | 46 | 44 | 2 | 0 | .956 | 16m28s | 1m29s | 74.5k | 6.8k | $0.69 | $0.063 |
gemini | 2025-08-06 | 11 | 11 | 1.000 | 49 | 44 | 5 | 1 | .897 | 16m0s | 1m27s | 91.6k | 8.3k | $0.83 | $0.075 |
grok4 | 2025-08-11 | 11 | 11 | 1.000 | 44 | 44 | 0 | 0 | 1.000 | 31m46s | 2m53s | 113.6k | 10.3k | $1.58 | $0.144 |
o4-mini | 2025-08-01 | 11 | 10 | .909 | 50 | 42 | 8 | 0 | .840 | 23m48s | 2m9s | 121.2k | 11.0k | $0.52 | $0.047 |
gpt-oss-120b | 2025-08-06 | 11 | 8 | .727 | 56 | 36 | 20 | 6 | .642 | 11m41s | 1m3s | 123.5k | 11.2k | $0.06 | $0.005 |
deepseek-r1 | 2025-08-21 | 11 | 8 | .727 | 53 | 35 | 18 | 5 | .660 | 83m42s | 7m36s | 222.1k | 20.2k | $0.52 | $0.047 |
grok3-mini | 2025-08-07 | 11 | 6 | .545 | 54 | 26 | 28 | 0 | .481 | 8m17s | 45s | 72.3k | 6.6k | $0.05 | $0.005 |
opus-4 | 2025-08-01 | 11 | 5 | .454 | 58 | 29 | 29 | 13 | .500 | 4m40s | 25s | 38.8k | 3.5k | $0.71 | $0.064 |
gpt5-nano | 2025-08-08 | 11 | 4 | .363 | 53 | 22 | 31 | 3 | .415 | 24m2s | 2m11s | 225.1k | 20.5k | $0.09 | $0.008 |
sonnet-4 | 2025-08-21 | 11 | 4 | .363 | 56 | 22 | 34 | 17 | .392 | 3m6s | 16s | 52.5k | 4.8k | $0.20 | $0.018 |
gpt4.1 | 2025-08-01 | 11 | 3 | .272 | 54 | 17 | 37 | 0 | .314 | 57s | 5s | 13.6k | 1.2k | $0.03 | $0.003 |
opus-4.1 | 2025-08-21 | 11 | 3 | .272 | 56 | 23 | 33 | 17 | .410 | 3m31s | 19s | 46.0k | 4.2k | $0.84 | $0.076 |
horizon | 2025-08-01 | 11 | 2 | .181 | 49 | 13 | 36 | 3 | .265 | 50s | 4s | 14.0k | 1.3k | $0.00 | $0.000 |
grok3 | 2025-08-01 | 11 | 2 | .181 | 51 | 12 | 39 | 0 | .235 | 54s | 5s | 12.7k | 1.2k | $0.03 | $0.002 |
llama3.3 | 2025-08-21 | 11 | 1 | .090 | 51 | 8 | 43 | 1 | .156 | 56s | 5s | 14.2k | 1.3k | $0.00 | $0.000 |
gemini-flash | 2025-08-01 | 11 | 1 | .090 | 53 | 13 | 40 | 2 | .245 | 40s | 3s | 12.4k | 1.1k | $0.01 | $0.000 |
gpt-oss-20b | 2025-08-06 | 11 | 1 | .090 | 51 | 18 | 33 | 17 | .352 | 20m46s | 1m53s | 246.8k | 22.4k | $0.05 | $0.005 |
qwen3 | 2025-08-08 | 11 | 0 | .000 | 44 | 0 | 44 | 0 | .000 | 1m2s | 5s | 10.9k | 1.0k | $0.00 | $0.000 |
gpt4.1-mini | 2025-08-08 | 11 | 0 | .000 | 48 | 4 | 44 | 0 | .083 | 46s | 4s | 11.7k | 1.1k | $0.01 | $0.001 |
deepseek-v3.1 | 2025-08-21 | 11 | 0 | .000 | 50 | 6 | 44 | 3 | .120 | 1m42s | 9s | 14.9k | 1.4k | $0.01 | $0.001 |
gpt4-turbo | 2025-08-01 | 11 | 0 | .000 | 46 | 2 | 44 | 1 | .043 | 1m0s | 5s | 11.8k | 1.1k | $0.14 | $0.012 |
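
The rate columns appear to follow directly from the counts: PCT matches W/GP and AVG matches HIT/ATT, and the values look truncated (not rounded) to three decimals, e.g. 5/11 is shown as .454. A minimal Python sketch under those assumptions follows; `box_rate` is a hypothetical helper for illustration, not part of the evaluation harness.

```python
def box_rate(numerator: int, denominator: int) -> str:
    """Format a ratio box-score style: three decimals, leading zero dropped.

    Assumption: the table truncates instead of rounding (5/11 -> .454).
    """
    if denominator == 0:
        return ".000"
    # Integer arithmetic avoids floating-point surprises when truncating.
    thousandths = (numerator * 1000) // denominator
    whole, frac = divmod(thousandths, 1000)
    return f"{whole}.{frac:03d}" if whole else f".{frac:03d}"


# (model, GP, W, ATT, HIT) copied from three rows of the table above
rows = [
    ("o3", 11, 11, 45, 44),
    ("opus-4", 11, 5, 58, 29),
    ("gpt4.1-mini", 11, 0, 48, 4),
]

for model, gp, w, att, hit in rows:
    print(f"{model:12s} PCT={box_rate(w, gp)} AVG={box_rate(hit, att)}")
```

For these three rows the sketch reproduces the table's PCT and AVG values (1.000/.977, .454/.500, .000/.083). The per-game time, token, and cost columns are not reproduced here because the totals shown are already rounded.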