MODEL × HARNESS CHESS BENCH
VERIFIED AGENT RATINGS
Static MVP leaderboard for GitHub Actions benchmark runs. Agents are single-file JS chess bots under 50kb that read a simplified FEN position from stdin and write a UCI move to stdout, with a 5s budget per move.
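The agent I/O contract can be illustrated with a short sketch. It is not the benchmark's agent template: the page does not define "simplified FEN", so the sketch assumes the input line begins with the standard FEN piece-placement field followed by the side to move. The helper names `parseFen` and `isUciMove` are hypothetical.

```javascript
// Hypothetical helpers; the benchmark's real input format may differ.
// Assumption: the input line starts with the standard FEN piece placement,
// followed by the side to move ("w" or "b").
function parseFen(line) {
  const [placement, side = "w"] = line.trim().split(/\s+/);
  const ranks = placement.split("/");
  if (ranks.length !== 8) throw new Error("expected 8 ranks in FEN");
  // board[0] is rank 8 and board[7] is rank 1, matching FEN order.
  const board = ranks.map((rank) => {
    const row = [];
    for (const ch of rank) {
      if (/[1-8]/.test(ch)) {
        for (let i = 0; i < Number(ch); i++) row.push(null); // empty squares
      } else if (/[prnbqkPRNBQK]/.test(ch)) {
        row.push(ch);
      } else {
        throw new Error(`unexpected FEN character: ${ch}`);
      }
    }
    if (row.length !== 8) throw new Error("expected 8 files per rank");
    return row;
  });
  return { board, side };
}

// UCI move syntax: from-square, to-square, optional promotion piece.
// This checks the shape of the string only, not chess legality.
function isUciMove(text) {
  return /^[a-h][1-8][a-h][1-8][qrbn]?$/.test(text);
}
```

An agent built on helpers like these would read one position from stdin, choose a move, and print it to stdout (for example `e2e4`, or `e7e8q` for a promotion) within the move budget.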
VERIFIED AGENTS: 1
REFERENCE AGENTS: 1
GAMES PLAYED: 4
MOVE BUDGET: 5s
◈ BENCHMARK STATUS: complete
RUN: bench-2026-04-22T02-27-45-590Z
UPDATED: Apr 22, 2:59 AM
FORMAT: single-js-stdin-stdout
SCORING: elo-style
DETAIL JSON: bench/results/runs/bench-2026-04-22T02-27-45-590Z.json
◬ PROVENANCE: GITHUB ACTIONS · AUDIT TRAIL
Full per-move logs live in the run detail JSON and the GitHub Actions artifact. Each log entry includes the FEN before and after the move, side to move, UCI output, runtime, stdout/stderr summaries, and validations. Source snapshots are also recorded, along with prompt artifacts when an agent is generated by a workflow.
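How a harness might enforce the move budget and produce one such log record can be sketched as follows. This is an illustration under assumptions, not the benchmark's runner: `runMove` and the record's field names are hypothetical and do not reflect the run JSON's actual schema, and the 5000 ms default simply mirrors the 5s budget stated on this page. The sketch relies on Node's `child_process.spawnSync` with its `input` and `timeout` options.

```javascript
const { spawnSync } = require("node:child_process");

// Assumption: 5000 ms mirrors the 5s move budget stated on the page.
const MOVE_BUDGET_MS = 5000;

// Runs one agent process for one move and returns a log-style record.
// Field names are hypothetical, not the run JSON's actual schema.
function runMove(command, args, fen, budgetMs = MOVE_BUDGET_MS) {
  const started = Date.now();
  const result = spawnSync(command, args, {
    input: fen + "\n",   // position is passed on stdin
    timeout: budgetMs,   // child is killed if it exceeds the budget
    encoding: "utf8",
  });
  const runtimeMs = Date.now() - started;
  const timedOut = Boolean(result.error && result.error.code === "ETIMEDOUT");
  const stdout = result.stdout || "";
  const uci = stdout.trim().split(/\s+/)[0] || ""; // first token of stdout
  return {
    fenBefore: fen,
    uci,
    runtimeMs,
    timedOut,
    // Syntax check only; legality would need a real move generator.
    wellFormed: !timedOut && /^[a-h][1-8][a-h][1-8][qrbn]?$/.test(uci),
    stderr: (result.stderr || "").slice(0, 200), // short summary only
  };
}
```

A call such as `runMove("node", ["bench/baselines/baseline.js"], fen)` would then yield one record per move. A real harness would also check legality against the position and record the FEN after the move.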
▣ LEADERBOARD: STATIC JSON · MVP
RANK | AGENT | MODEL | ELO | W-D-L | T/O · ILLEGAL | AVG | SIZE
#1 | Auggie / Claude Sonnet 4.6 (verified), bench/agents/local-auggie-sonnet-46-test.js | Claude Sonnet 4.6 (Anthropic · auggie-interactive) | 1504 | 1-2-1 | 0/0 | 4033ms | 19.7kb
#2 | Baseline Reference (reference), bench/baselines/baseline.js | human-authored (reference · baseline) | 1496 | 1-2-1 | 0/0 | 4032ms | 26.5kb
▤ MATCH LOGS: FULL MOVES IN RUN JSON
GAME | WHITE | BLACK | RESULT | REASON | PLIES
baseline-vs-local-auggie-sonnet-46-test-w1 | baseline | local-auggie-sonnet-46-test | white | checkmate | 71
local-auggie-sonnet-46-test-vs-baseline-w1 | local-auggie-sonnet-46-test | baseline | draw | max_plies | 160
baseline-vs-local-auggie-sonnet-46-test-w2 | baseline | local-auggie-sonnet-46-test | draw | max_plies | 160
local-auggie-sonnet-46-test-vs-baseline-w2 | local-auggie-sonnet-46-test | baseline | white | checkmate | 75
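Both agents finished 1-2-1, yet their ratings differ, which is what a sequential Elo-style update would produce. The sketch below replays the four games listed above under assumptions the page does not state: every agent starts at 1500, K = 32, and games are rated one at a time in the listed order. The function names are hypothetical, and this is not the benchmark's scoring code.

```javascript
// Assumptions (not stated on the page): every agent starts at 1500,
// K = 32, and games are rated one at a time in the order listed.
const START_RATING = 1500;
const K_FACTOR = 32;

// Expected score of a player rated `ra` against an opponent rated `rb`.
function expectedScore(ra, rb) {
  return 1 / (1 + 10 ** ((rb - ra) / 400));
}

// games: [{ white, black, result }] with result "white" | "black" | "draw".
function rateGames(games) {
  const ratings = {};
  for (const { white, black, result } of games) {
    const rw = ratings[white] ?? START_RATING;
    const rb = ratings[black] ?? START_RATING;
    const scoreWhite = result === "white" ? 1 : result === "draw" ? 0.5 : 0;
    const delta = K_FACTOR * (scoreWhite - expectedScore(rw, rb));
    ratings[white] = rw + delta;
    ratings[black] = rb - delta; // zero-sum: black loses what white gains
  }
  return ratings;
}

// The four games from the match log, in listed order.
const games = [
  { white: "baseline", black: "local-auggie-sonnet-46-test", result: "white" },
  { white: "local-auggie-sonnet-46-test", black: "baseline", result: "draw" },
  { white: "baseline", black: "local-auggie-sonnet-46-test", result: "draw" },
  { white: "local-auggie-sonnet-46-test", black: "baseline", result: "white" },
];
const ratings = rateGames(games);
console.log(
  Math.round(ratings["local-auggie-sonnet-46-test"]),
  Math.round(ratings["baseline"])
);
```

Under these assumptions the rounded ratings come out to 1504 and 1496, matching the leaderboard values. That agreement is consistent with the assumed parameters but does not confirm them. The gap arises because the later win was scored against a then higher-rated opponent.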
✦ PUBLIC GENERATION ARTIFACTS: PROMPT · TRANSCRIPT · SOURCE