LLMs Struggle With Chess: Limitations And Implications
LLMs struggle with complex tasks such as chess because they lack creative problem-solving skills. They're essentially large look-up tables, incapable of strategic reflection or evaluation. New benchmarks are needed to assess their capabilities.
No, they cannot play chess, at least not if you expect them to win rather than merely move pieces around the board. I ran multiple simulations pitting LLMs against a random player, and not a single game ended in a win for the LLM. Here's the full LLM Chess Leaderboard. A shorter version follows, sorted by draw rate:

Player                          Draws    Wins
gpt-4-turbo-2024-04-09          93.33%   0.00%
gpt-4o-2024-08-06               90.00%   0.00%
gpt-4o-2024-05-13               83.33%   0.00%
anthropic.claude-v3-5-sonnet    73.33%   0.00%
gpt-4o-mini-2024-07-18          60.00%   0.00%
llama-3-70b-instruct-awq        50.00%   0.00%
gemini-1.5-pro-preview-0409     36.67%   0.00%
gemini-1.5-flash-001            33.33%   0.00%
gemma-2-2...
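To make the setup concrete, here is a minimal sketch (not the author's actual harness) of the two pieces the experiment implies: a random opponent that picks a legal move uniformly at random, and a tally that turns per-game outcomes into a draw/win leaderboard sorted by draw rate. The names `random_player_move` and `build_leaderboard`, the UCI move strings, and the outcome labels "win", "draw", "loss" are all hypothetical. Generating the list of legal moves is assumed to be handled by a separate chess rules engine.

```python
import random
from collections import Counter
from dataclasses import dataclass


def random_player_move(legal_moves: list[str], rng: random.Random) -> str:
    """Baseline opponent: choose one legal move uniformly at random.

    Assumption: `legal_moves` (e.g. UCI strings like "e2e4") is produced
    by a separate chess rules engine; this sketch does not generate them.
    """
    return rng.choice(legal_moves)


@dataclass
class LeaderboardRow:
    player: str
    draws_pct: float
    wins_pct: float


def build_leaderboard(results: dict[str, list[str]]) -> list[LeaderboardRow]:
    """Convert per-game outcomes into percentages.

    Hypothetical format: each outcome is "win", "draw" or "loss", seen from
    the LLM's side. Rows are sorted by draw rate (descending), with win rate
    as a tie-breaker. Players with no recorded games are skipped.
    """
    rows = []
    for player, outcomes in results.items():
        n = len(outcomes)
        if n == 0:
            continue  # avoid dividing by zero
        counts = Counter(outcomes)
        rows.append(
            LeaderboardRow(
                player=player,
                draws_pct=round(100 * counts["draw"] / n, 2),
                wins_pct=round(100 * counts["win"] / n, 2),
            )
        )
    rows.sort(key=lambda r: (r.draws_pct, r.wins_pct), reverse=True)
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    print("random move:", random_player_move(["e2e4", "d2d4", "g1f3"], rng))

    # Made-up outcomes for illustration only, not the leaderboard's data.
    games = {
        "model-a": ["draw"] * 9 + ["loss"],
        "model-b": ["draw"] * 5 + ["loss"] * 5,
    }
    for row in build_leaderboard(games):
        print(f"{row.player:<10} {row.draws_pct:6.2f}% {row.wins_pct:6.2f}%")
```

Ranking on draws rather than wins mirrors the leaderboard: when no model wins a single game, the draw rate is the only column that separates them.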