shlogg · Early preview
Maxim Saplin @msmxm

LLMs Struggle With Chess: Limitations And Implications

LLMs struggle with complex tasks like chess because they lack creative problem-solving skills. They are essentially large lookup tables, incapable of strategic reflection or evaluation. New benchmarks are needed to assess their capabilities.

No, they cannot play chess, at least not if you expect them to win rather than merely move pieces on the board. I ran multiple simulations pitting LLMs against a random player, and no LLM won a single game.
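As a rough illustration of how results from such simulations can be turned into draw and win rates, here is a minimal sketch. The `summarize` function, the outcome labels (`"win"`, `"draw"`, `"loss"`), and the game counts are all hypothetical and chosen for illustration only; they are not the author's code or raw data.

```python
from collections import Counter


def summarize(outcomes):
    """Return draw and win rates (in percent) for a list of per-game outcomes.

    Each outcome is one of the hypothetical labels "win", "draw", or "loss",
    recorded from the LLM's point of view.
    """
    if not outcomes:
        raise ValueError("no games to summarize")
    counts = Counter(outcomes)  # missing labels count as 0
    total = len(outcomes)
    return {
        "draws": round(100 * counts["draw"] / total, 2),
        "wins": round(100 * counts["win"] / total, 2),
    }


# Illustrative data only (an assumption, not the real results):
# 30 games with 28 draws, 2 losses, and no wins.
games = ["draw"] * 28 + ["loss"] * 2
print(summarize(games))
```

A model that never wins shows a 0.00% win rate no matter how many games it draws, which is why the two rates are worth reading together.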

Here's the full LLM Chess Leaderboard. A shorter version follows...


| Player                       | Draws  | Wins  |
|------------------------------|--------|-------|
| gpt-4-turbo-2024-04-09       | 93.33% | 0.00% |
| gpt-4o-2024-08-06            | 90.00% | 0.00% |
| gpt-4o-2024-05-13            | 83.33% | 0.00% |
| anthropic.claude-v3-5-sonnet | 73.33% | 0.00% |
| gpt-4o-mini-2024-07-18       | 60.00% | 0.00% |
| llama-3-70b-instruct-awq     | 50.00% | 0.00% |
| gemini-1.5-pro-preview-0409  | 36.67% | 0.00% |
| gemini-1.5-flash-001         | 33.33% | 0.00% |

gemma-2-2...