Human Forecasters Outperform Top LLM On Benchmark Test
Expert human forecasters outperformed the top-performing LLM in a statistically significant way (p = 0.01) on ForecastBench, a new dynamic benchmark for evaluating the forecasting capabilities of ML systems.
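As a rough illustration of how one might compare human and LLM forecasters on the same questions, here is a minimal sketch using Brier scores (squared error of a probability forecast) and a paired t-statistic. The data and the exact scoring rule are assumptions for illustration, not the paper's actual methodology or results.

```python
import math

def brier_score(prob, outcome):
    """Squared error of a probability forecast against a 0/1 outcome (lower is better)."""
    return (prob - outcome) ** 2

# Hypothetical data: probability forecasts from humans and an LLM on the same 8 questions.
outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
human = [0.9, 0.2, 0.8, 0.7, 0.1, 0.3, 0.85, 0.15]
llm = [0.7, 0.4, 0.6, 0.6, 0.3, 0.5, 0.6, 0.4]

human_scores = [brier_score(p, o) for p, o in zip(human, outcomes)]
llm_scores = [brier_score(p, o) for p, o in zip(llm, outcomes)]

# Paired differences (LLM minus human); positive means humans scored better.
diffs = [l - h for l, h in zip(llm_scores, human_scores)]
n = len(diffs)
mean_diff = sum(diffs) / n
var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
t_stat = mean_diff / math.sqrt(var_diff / n)

print(f"mean Brier (human) = {sum(human_scores) / n:.3f}")
print(f"mean Brier (LLM)   = {sum(llm_scores) / n:.3f}")
print(f"paired t-statistic = {t_stat:.2f}")
```

In practice one would compare the t-statistic against the appropriate t-distribution (n - 1 degrees of freedom) to obtain a p-value; a library such as SciPy's `scipy.stats.ttest_rel` handles this directly.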
This is a Plain English Papers summary of a research paper called Benchmark Tests Superior Forecasting Skills of Humans over AI. If you like these kinds of analyses, you should join AImodels.fyi or follow me on Twitter.

Overview

Forecasts of future events are essential for making informed decisions. Machine learning (ML) systems have the potential to generate forecasts at scale. However, there is no standard way to evaluate the accuracy of ML forecasting systems.

Plain English Explanation

ForecastBench is a new benchmark that aims to address this gap. It is a dynamic test t...