Connections Evaluation Box Score
Latest runs for 23 models (11 puzzles, >40 guesses, sorted by solve rate)
Model          Date        GP   W    PCT  ATT  HIT  MISS  ERR    AVG    TIME  AVG/G     TOK  TOK/G   COST     $/G
gpt5-mini      2025-08-19  11  11  1.000   52   44     8    0   .846  16m40s  1m31s   79.9k   7.3k  $0.15  $0.013
o3             2025-08-01  11  11  1.000   45   44     1    1   .977  17m10s  1m33s   59.5k   5.4k  $0.43  $0.040
gpt5           2025-08-07  11  11  1.000   46   44     2    0   .956  16m28s  1m29s   74.5k   6.8k  $0.69  $0.063
gemini         2025-08-06  11  11  1.000   49   44     5    1   .897   16m0s  1m27s   91.6k   8.3k  $0.83  $0.075
grok4          2025-08-11  11  11  1.000   44   44     0    0  1.000  31m46s  2m53s  113.6k  10.3k  $1.58  $0.144
o4-mini        2025-08-01  11  10   .909   50   42     8    0   .840  23m48s   2m9s  121.2k  11.0k  $0.52  $0.047
gpt-oss-120b   2025-08-06  11   8   .727   56   36    20    6   .642  11m41s   1m3s  123.5k  11.2k  $0.06  $0.005
deepseek-r1    2025-08-21  11   8   .727   53   35    18    5   .660  83m42s  7m36s  222.1k  20.2k  $0.52  $0.047
grok3-mini     2025-08-07  11   6   .545   54   26    28    0   .481   8m17s    45s   72.3k   6.6k  $0.05  $0.005
opus-4         2025-08-01  11   5   .454   58   29    29   13   .500   4m40s    25s   38.8k   3.5k  $0.71  $0.064
gpt5-nano      2025-08-08  11   4   .363   53   22    31    3   .415   24m2s  2m11s  225.1k  20.5k  $0.09  $0.008
sonnet-4       2025-08-21  11   4   .363   56   22    34   17   .392    3m6s    16s   52.5k   4.8k  $0.20  $0.018
gpt4.1         2025-08-01  11   3   .272   54   17    37    0   .314     57s     5s   13.6k   1.2k  $0.03  $0.003
opus-4.1       2025-08-21  11   3   .272   56   23    33   17   .410   3m31s    19s   46.0k   4.2k  $0.84  $0.076
horizon        2025-08-01  11   2   .181   49   13    36    3   .265     50s     4s   14.0k   1.3k  $0.00  $0.000
grok3          2025-08-01  11   2   .181   51   12    39    0   .235     54s     5s   12.7k   1.2k  $0.03  $0.002
llama3.3       2025-08-21  11   1   .090   51    8    43    1   .156     56s     5s   14.2k   1.3k  $0.00  $0.000
gemini-flash   2025-08-01  11   1   .090   53   13    40    2   .245     40s     3s   12.4k   1.1k  $0.01  $0.000
gpt-oss-20b    2025-08-06  11   1   .090   51   18    33   17   .352  20m46s  1m53s  246.8k  22.4k  $0.05  $0.005
qwen3          2025-08-08  11   0   .000   44    0    44    0   .000    1m2s     5s   10.9k   1.0k  $0.00  $0.000
gpt4.1-mini    2025-08-08  11   0   .000   48    4    44    0   .083     46s     4s   11.7k   1.1k  $0.01  $0.001
deepseek-v3.1  2025-08-21  11   0   .000   50    6    44    3   .120   1m42s     9s   14.9k   1.4k  $0.01  $0.001
gpt4-turbo     2025-08-01  11   0   .000   46    2    44    1   .043    1m0s     5s   11.8k   1.1k  $0.14  $0.012
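
The table does not define its columns, but the printed values are consistent with a few simple relationships: PCT matches W/GP, AVG matches HIT/ATT (both truncated, not rounded, to three decimals), and HIT + MISS equals ATT. Rows with the same PCT also appear in ascending COST order, although no tie-breaker is stated. The Python sketch below checks these assumed relationships on a handful of rows copied from the table. The names rate, check, and is_sorted are hypothetical helpers for illustration and are not part of the evaluation tool.

```python
# Sample rows copied from the table:
# (model, GP, W, ATT, HIT, MISS, PCT, AVG, COST in dollars)
ROWS = [
    ("gpt5-mini", 11, 11, 52, 44, 8, "1.000", ".846", 0.15),
    ("o3", 11, 11, 45, 44, 1, "1.000", ".977", 0.43),
    ("o4-mini", 11, 10, 50, 42, 8, ".909", ".840", 0.52),
    ("opus-4", 11, 5, 58, 29, 29, ".454", ".500", 0.71),
    ("gpt5-nano", 11, 4, 53, 22, 31, ".363", ".415", 0.09),
    ("sonnet-4", 11, 4, 56, 22, 34, ".363", ".392", 0.20),
    ("qwen3", 11, 0, 44, 0, 44, ".000", ".000", 0.00),
]


def rate(numerator: int, denominator: int) -> str:
    """Format a ratio box-score style: three decimals, truncated,
    with no leading zero below 1 (assumed convention)."""
    # Integer arithmetic keeps the truncation exact (no float rounding).
    thousandths = numerator * 1000 // denominator if denominator else 0
    whole, frac = divmod(thousandths, 1000)
    return f"{whole}.{frac:03d}" if whole else f".{frac:03d}"


def check(rows):
    """Return the models whose printed columns disagree with the assumed formulas."""
    bad = []
    for model, gp, w, att, hit, miss, pct, avg, _cost in rows:
        consistent = (
            hit + miss == att          # every guess is a hit or a miss
            and rate(w, gp) == pct     # PCT = W / GP
            and rate(hit, att) == avg  # AVG = HIT / ATT
        )
        if not consistent:
            bad.append(model)
    return bad


def is_sorted(rows):
    """True if rows are ordered by solve rate (descending), then cost (ascending).
    The cost tie-breaker is an observation from the table, not a documented rule."""
    keys = [(-w / gp, cost) for _model, gp, w, *_rest, cost in rows]
    return keys == sorted(keys)


print(check(ROWS))      # models that fail the consistency checks
print(is_sorted(ROWS))  # whether the sample follows the assumed ordering
```

Truncation rather than rounding is what distinguishes the printed values: 5/11 is 0.4545..., which the table shows as .454, whereas rounding would give .455.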