Crowdsourced benchmarking platform that hosts a Chatbot Arena and leaderboards to evaluate and compare foundation models across tasks.
LMArena is an open platform for crowdsourced AI benchmarking: users interact with chatbots side by side and vote for the better response, producing leaderboards that rank foundation models across multiple categories (text, code, vision, search, text-to-image, webdev). The project publishes datasets of human preference judgments, open evaluation toolkits (e.g., Arena-Hard-Auto), and repositories for automated benchmarking and prompt-to-leaderboard tooling. LMArena combines human-preference data with automated evaluation pipelines (including GPT-based judges and ensemble judging) to produce leaderboards that correlate strongly with human rankings, along with actionable evaluation metrics for model developers and researchers. Its value lies in public, reproducible benchmarks, large human-preference datasets, and tooling that estimates how a model will perform in LMArena-style head-to-head comparisons before deployment.
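To make the core idea concrete, here is a minimal sketch of how pairwise preference votes ("battles") can be aggregated into an Elo-style leaderboard. The names (`battles`, `compute_elo`) and the sequential Elo update are illustrative assumptions, not LMArena's API; Arena-style leaderboards in practice fit a Bradley-Terry model over all votes rather than updating ratings one battle at a time.

```python
# Illustrative sketch: turning pairwise "battle" votes into an Elo-style
# leaderboard. Names and data here are hypothetical, not LMArena's pipeline.
from collections import defaultdict

# Each battle: (model_a, model_b, winner) where winner is "a", "b", or "tie".
battles = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-z", "tie"),
    ("model-y", "model-z", "b"),
    ("model-x", "model-y", "a"),
]

def compute_elo(battles, k=32, base_rating=1000):
    """Sequentially update Elo ratings from pairwise preference votes."""
    ratings = defaultdict(lambda: float(base_rating))
    for model_a, model_b, winner in battles:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model_a against model_b under the Elo model.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        ratings[model_a] = ra + k * (score_a - expected_a)
        ratings[model_b] = rb + k * ((1.0 - score_a) - (1.0 - expected_a))
    return dict(ratings)

leaderboard = sorted(compute_elo(battles).items(), key=lambda kv: -kv[1])
for rank, (model, rating) in enumerate(leaderboard, start=1):
    print(f"{rank}. {model}: {rating:.0f}")
```

Because sequential Elo is sensitive to the order of battles, production leaderboards of this kind typically fit Bradley-Terry coefficients over the full vote set and report bootstrapped confidence intervals; the sketch above is only meant to show how head-to-head human preferences become a ranking.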