LLMs Struggle With Code Efficiency: GPT-4 Scores 56.5%
LLMs struggle to write efficient code: top models score below 57% on time- and space-complexity tasks. GPT-4 achieves the highest overall score, 56.5%. BigO(Bench) evaluates LLMs' ability to generate code that meets a specified time or space complexity.
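What does "code that meets a specified complexity" mean? The same task can usually be solved at several time/space targets, and a complexity-constrained prompt asks the model to hit one of them. The sketch below uses duplicate detection as an illustrative task. It is not a BigO(Bench) problem, and the function names are hypothetical.

```python
def has_duplicate_quadratic(nums: list[int]) -> bool:
    """O(n^2) time, O(1) extra space: compare every pair."""
    n = len(nums)
    for i in range(n):
        for j in range(i + 1, n):
            if nums[i] == nums[j]:
                return True
    return False


def has_duplicate_sorted(nums: list[int]) -> bool:
    """O(n log n) time: sort a copy, then compare neighbours.

    The sorted copy costs O(n) extra space; sorting in place would avoid
    that but would mutate the caller's list.
    """
    ordered = sorted(nums)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


def has_duplicate_linear(nums: list[int]) -> bool:
    """O(n) expected time, O(n) extra space: remember values already seen."""
    seen: set[int] = set()
    for x in nums:
        if x in seen:
            return True
        seen.add(x)
    return False


if __name__ == "__main__":
    for check in (has_duplicate_quadratic, has_duplicate_sorted, has_duplicate_linear):
        # Each call prints the function name followed by "True False".
        print(check.__name__, check([3, 1, 4, 1, 5]), check([2, 7, 1, 8]))
```

All three return the same answers and differ only in cost. A benchmark of this kind scores whether the model produces the version that matches the requested bound, not just a correct one.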
This is a Plain English Papers summary of a research paper called LLMs Struggle to Write Efficient Code: Top AI Models Score Below 57% on Time & Space Complexity Tasks.

Overview
- BigO(Bench) evaluates LLMs' ability to generate code with a specified time/space complexity
- Tests 7 top coding LLMs, including GPT-4, Claude, and Gemini
- Includes 100 problems across 5 complexity classes
- Models struggle with complexity control but show promise with good prompting
- Performance varies widely by complexity class...
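The summary does not say how BigO(Bench) checks whether generated code actually falls in the requested complexity class. One common, crude sanity check is empirical: time the code at growing input sizes and fit the slope of log(time) against log(n). If time grows roughly like c·n^k, the slope is about k. This is a minimal sketch under that assumption, not the benchmark's method. The toy functions `linear_work` and `quadratic_work` and the helpers `measure` and `loglog_slope` are all hypothetical.

```python
import math
import time
from typing import Callable


def linear_work(data: list[int]) -> int:
    """Toy O(n) function: one pass over the input."""
    return sum(data)


def quadratic_work(data: list[int]) -> int:
    """Toy O(n^2) function: touches every ordered pair of elements."""
    total = 0
    for x in data:
        for y in data:
            total += x ^ y
    return total


def measure(fn: Callable[[list[int]], object], n: int, repeats: int = 3) -> float:
    """Best-of-`repeats` wall-clock time (seconds) for fn on an input of size n."""
    data = list(range(n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def loglog_slope(sizes: list[int], times: list[float]) -> float:
    """Least-squares slope of log(time) against log(n).

    If time ~ c * n^k, the slope is about k: ~1 hints at O(n), ~2 at O(n^2).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(max(t, 1e-9)) for t in times]  # guard against log(0) on very fast runs
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    for fn, sizes in [
        (linear_work, [20_000, 40_000, 80_000, 160_000]),
        (quadratic_work, [100, 200, 400, 800]),
    ]:
        times = [measure(fn, n) for n in sizes]
        print(f"{fn.__name__}: estimated exponent {loglog_slope(sizes, times):.2f}")
```

Timing noise makes the estimate approximate, and a log-log slope cannot reliably separate O(n) from O(n log n). Treat it as a rough signal rather than a proof of complexity.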