Meta has come under scrutiny after submitting a specially tuned version of its Llama 4 AI model to the LMArena leaderboard, sparking concerns about fair competition.
The 'experimental' version, dubbed Llama-4-Maverick-03-26-Experimental, ranked second in popularity, trailing only Google's Gemini-2.5-Pro.
While Meta openly labelled the model as experimental, many users assumed it was the same model as the public release. Once the official version became available, users quickly noticed it lacked the expressive, emoji-filled responses seen in the leaderboard battles.
LMArena, a crowdsourced platform where users vote on chatbot responses, said Meta's custom variant appeared optimised for human approval, possibly skewing the results.
The group released over 2,000 head-to-head matchups to back its claims, showing the experimental Llama 4 consistently offered longer, more engaging answers than the more concise public build.
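Leaderboards of this kind translate pairwise votes into ratings, which is why a variant tuned to win votes on style alone can climb the board. The sketch below illustrates the mechanism with a simple Elo update; note that LMArena actually fits a Bradley-Terry model, and the 60% win rate, K-factor, and model labels here are illustrative assumptions rather than figures from the released data.

```python
import random

# Minimal sketch: how head-to-head human votes move an Elo-style rating.
# All numbers below are assumed for illustration, not taken from LMArena.

def expected(r_a: float, r_b: float) -> float:
    # Predicted probability that model A beats model B, given current ratings.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    # Standard Elo update after a single user vote.
    delta = k * ((1.0 if a_won else 0.0) - expected(r_a, r_b))
    return r_a + delta, r_b - delta

random.seed(0)
exp_rating, pub_rating = 1000.0, 1000.0
for _ in range(2000):  # roughly the number of matchups LMArena released
    # Suppose the chattier experimental build wins 60% of votes on style alone.
    a_won = random.random() < 0.60
    exp_rating, pub_rating = update(exp_rating, pub_rating, a_won)

print(f"experimental: {exp_rating:.0f}, public: {pub_rating:.0f}")
```

Under these assumed numbers, the chattier build settles on the order of 70 points ahead despite no difference in underlying capability, which is precisely the kind of skew LMArena flagged.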
In response, LMArena updated its policies to ensure greater transparency and stated that Meta's use of the experimental model did not align with expectations for leaderboard submissions.
Meta defended its approach, saying the experimental model was built to explore chat optimisation and was never hidden. Company executives denied any misconduct, including suggestions that the model had been trained on test data, while acknowledging inconsistent performance across platforms.
Meta's GenAI chief Ahmad Al-Dahle said it would take time for all public implementations to stabilise and improve. Meanwhile, LMArena plans to add the official Llama 4 release to its leaderboard for a more accurate evaluation going forward.