AI Models Trust Text Over Images 98% Of Time, Even When Wrong

Mar 11, 2025

Vision-language models prioritize text over images 98% of the time, even when the text is wrong. GPT-4V shows "blind faith" in textual descriptions, which affects model confidence.

This is a Plain English Papers summary of a research paper called Study Reveals AI Models Trust Text Over Images 98% of Time, Even When Wrong.

  
  
Overview

Vision-language models (VLMs) often prioritize text over visual information
Models show "blind faith" in textual descriptions even when they contradict the images
GPT-4V shows 98% text influence on decisions when text and images conflict
Textual certainty and agreement with prior text impact model confidence
Major VLMs (GPT-4V, Claude, Gemini) evalua...
