Open-Source LLMs Outperform GPT-4 In Non-English Languages
New benchmark MMLU-ProX tests LLMs in 9 languages, revealing performance gaps. Open-source models like Llama-3 outperform proprietary GPT-4 in some non-English tests.
This is a Plain English Papers summary of a research paper called "New AI Test Shows Open-Source Models Beat GPT-4 in Foreign Languages."

Overview
- MMLU-ProX is a new benchmark for testing large language models (LLMs) across multiple languages.
- It covers 57 subjects and 9 languages: English, Chinese, French, German, Japanese, Korean, Portuguese, Spanish, and Arabic.
- It is built upon MMLU-Pro but extends it to non-English languages.
- It reveals significant performance gaps in LLMs across different langua...
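A cross-language performance gap comes down to simple arithmetic: compute a model's accuracy separately for each language, then compare each score against a reference language. The Python sketch below illustrates that idea under stated assumptions. The records are made-up placeholders, not MMLU-ProX data or results. The helper names `accuracy_by_language` and `gap_vs_reference` are hypothetical and not part of any released benchmark code. English is assumed to be the reference language.

```python
from collections import defaultdict

# Hypothetical toy records: (language, predicted option, gold option).
# These are illustrative placeholders, NOT MMLU-ProX data or results.
records = [
    ("en", "A", "A"), ("en", "C", "C"), ("en", "B", "D"), ("en", "J", "J"),
    ("ar", "A", "A"), ("ar", "B", "C"), ("ar", "D", "D"), ("ar", "E", "F"),
]


def accuracy_by_language(rows):
    """Return {language: fraction of questions answered correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, pred, gold in rows:
        total[lang] += 1
        correct[lang] += int(pred == gold)
    return {lang: correct[lang] / total[lang] for lang in total}


def gap_vs_reference(acc, reference="en"):
    """Each language's accuracy minus the reference language's (negative = worse).

    Assumes English ("en") is the reference; this is an illustrative choice.
    """
    base = acc[reference]
    return {lang: a - base for lang, a in acc.items()}


acc = accuracy_by_language(records)
gaps = gap_vs_reference(acc)
for lang in sorted(acc):
    print(f"{lang}: accuracy={acc[lang]:.2f}, gap vs en={gaps[lang]:+.2f}")
# prints:
# ar: accuracy=0.50, gap vs en=-0.25
# en: accuracy=0.75, gap vs en=+0.00
```

A negative gap means the model answers fewer questions correctly in that language than in English. Comparing these per-language scores across several models is what lets a benchmark show one model ahead of another in specific non-English languages.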