LLMs Struggle With Code Efficiency: GPT-4 Scores 56.5%


LLMs struggle to write efficient code: the top models score below 57% on time and space complexity tasks, with GPT-4 achieving the highest overall score at 56.5%. BigO(Bench) evaluates LLMs' ability to generate code that meets a specified time or space complexity.

This is a Plain English Papers summary of a research paper called LLMs Struggle to Write Efficient Code: Top AI Models Score Below 57% on Time & Space Complexity Tasks. If you like this kind of analysis, you should join AImodels.fyi or follow us on Twitter.

  
  
Overview

BigO(Bench) evaluates LLMs' ability to generate code with specific time/space complexity
Tests 7 top coding LLMs including GPT-4, Claude, and Gemini
Includes 100 problems across 5 complexity classes
Models struggle with complexity control but show promise with good prompting
Performance varies widely by complexity class...
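
To make "code with specific time/space complexity" concrete, here is a minimal illustrative sketch. It is not taken from BigO(Bench); the problem and the function names are hypothetical and chosen only to show how the same task can be solved in two different complexity classes, which is the kind of control the benchmark asks models to exercise.

```python
def has_duplicate_quadratic(values):
    """Roughly O(n^2) time, O(1) extra space: compare every pair of elements."""
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if values[i] == values[j]:
                return True
    return False


def has_duplicate_linear(values):
    """Roughly O(n) expected time, O(n) extra space: remember seen values in a set."""
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    # Both functions give the same answer; they differ only in cost.
    print(has_duplicate_quadratic(data), has_duplicate_linear(data))
```

The two functions return the same results, so a correctness-only test cannot tell them apart. A prompt that asks for "O(n) time" or "O(1) extra space" is satisfied by only one of them.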
