New AI Benchmark Based on Street Fighter III Developed at Mistral AI Hackathon

A new artificial intelligence (AI) benchmark based on the classic arcade title Street Fighter III was devised at the Mistral AI hackathon in San Francisco last week. The open-source LLM Colosseum benchmark was developed by Stan Girard and Quivr Brain. The game runs in an emulator, letting the LLMs duke it out in unconventional yet spectacular fashion.

In a new video, AI enthusiast Matthew Berman introduces this beat-'em-up tournament for large language models (LLMs). Besides showcasing the street-fighting action, Berman walks viewers through installing the open-source project on a home PC or Mac so they can test it for themselves.

Smaller models usually respond faster, with lower latency, which translates into winning more bouts in this game. Human beat-'em-up players benefit from fast reactions when countering their opponents' moves, and the same holds true in this AI-vs-AI action.

The LLMs make real-time decisions about how to fight. As text-based models, they are prompted on how to react to the game action: they first analyze the game state for context and then consider their move options.
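To make that decision loop more concrete, here is a minimal sketch under simplified assumptions: the game state is rendered as a short text description, the model is offered a fixed list of moves, and its free-form reply is mapped back onto one of them. The `GameState` fields, the move names, and the `build_prompt` and `parse_move` helpers are hypothetical illustrations, not LLM Colosseum's actual code, and the call to the model itself is left out.

```python
from dataclasses import dataclass

# Hypothetical, illustrative move list; the real project's move vocabulary may differ.
MOVES = ["Move Closer", "Move Away", "Block", "Low Punch", "High Kick", "Fireball"]


@dataclass
class GameState:
    # Hypothetical fields; a real emulator interface would expose more detail.
    own_health: int
    opponent_health: int
    distance: int  # horizontal gap between the fighters, in arbitrary units


def build_prompt(state: GameState) -> str:
    """Describe the game state as text (context), then list the move options."""
    context = (
        f"Your health: {state.own_health}. "
        f"Opponent health: {state.opponent_health}. "
        f"Distance to opponent: {state.distance}."
    )
    options = "\n".join(f"- {move}" for move in MOVES)
    return (
        f"{context}\nChoose your next move from:\n{options}\n"
        "Reply with the move name only."
    )


def parse_move(reply: str, default: str = "Move Closer") -> str:
    """Map the model's free-form reply onto a legal move, falling back to a default."""
    text = reply.lower()
    # Check longer names first so a short name contained in a longer one can't match early.
    for move in sorted(MOVES, key=len, reverse=True):
        if move.lower() in text:
            return move
    # The reply named no legal move (e.g. a hallucinated one), so use the fallback.
    return default
```

For example, `parse_move("I'll throw a FIREBALL now")` returns `"Fireball"`, while a reply that names no legal move falls back to `"Move Closer"`. In a setup like this, every round trip to the model costs time, which is one way to see why faster, lower-latency models can get more moves in.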

In the video, fights look fluid, and the fighters appear strategic in their countering, blocking, and use of special moves. However, at the time of writing, the project allows only the Ken character to be used, which provides perfect balance but might be less interesting to watch.

According to tests run by Girard, OpenAI’s GPT-3.5 Turbo was the appropriately named winner among the eight LLMs pitted against each other. In a separate series of tests by Amazon exec Banjo Obayomi, 14 LLMs sparred across 314 individual matches, with Anthropic’s claude_3_haiku ultimately triumphant.

Interestingly, Obayomi also observed that LLM quirks, whether bugs or features, such as AI hallucinations and AI safety guardrails, sometimes got in the way of a model’s beat-'em-up performance.
