About LLM ELO Ranking
How ELO Ranking Works
ELO is a rating system, originally devised for chess, for calculating the relative skill levels of players in two-player zero-sum games. In our case, we use it to rank the performance of Large Language Models (LLMs).
When two LLMs compete, the winner takes points from the loser. The number of points exchanged depends on the competitors' relative ratings: beating a higher-ranked LLM earns more points than beating a lower-ranked one.
Each model starts with 1000 ELO points. The ranking evolves as more competitions take place.
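Concretely, the update follows the standard ELO formulas. Here is a minimal Python sketch; the K-factor of 32 is a common default and an assumption on our part, not necessarily the value used in production:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B, as implied by the ELO model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 if A wins, 0.0 if A loses, and 0.5 for a tie.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta
```

For example, a 1000-rated model that beats a 1200-rated one has an expected score of roughly 0.24, so with K = 32 it gains about 24 points; beating an 800-rated model would earn only about 8.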
Our Methodology
For each pair of models, we:
- Present both models with the same question/prompt
- Have a judge model evaluate which response is better
- Update the ELO rankings based on the outcome
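In code, one round of this loop might look like the sketch below, reusing `update_elo` from above. `ask_model` is a hypothetical stand-in for a real model API, and `judge` is sketched after the criteria that follow:

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    rating: float = 1000.0  # every model starts with 1000 ELO points


def ask_model(model_name: str, prompt: str) -> str:
    """Hypothetical stand-in for calling an LLM; replace with a real API call."""
    raise NotImplementedError


def run_round(model_a: Model, model_b: Model, prompt: str) -> None:
    """Run one head-to-head comparison and update both ratings in place."""
    answer_a = ask_model(model_a.name, prompt)
    answer_b = ask_model(model_b.name, prompt)
    verdict = judge(prompt, answer_a, answer_b)  # "a", "b", or "tie"
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[verdict]
    model_a.rating, model_b.rating = update_elo(
        model_a.rating, model_b.rating, score_a
    )
```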
The judge model is a separate LLM that evaluates responses based on:
- Correctness and accuracy
- Helpfulness and relevance
- Clarity and coherence
- Safety and adherence to guidelines
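To make the judging step concrete, a `judge` helper might wrap these criteria into a single rubric prompt, using the `ask_model` stub from the previous sketch. The wording below is an illustrative assumption, not our production prompt:

```python
JUDGE_RUBRIC = """You are judging two answers to the same question.

Question: {prompt}

Answer A: {answer_a}

Answer B: {answer_b}

Compare them on correctness and accuracy, helpfulness and relevance,
clarity and coherence, and safety and adherence to guidelines.
Reply with exactly one word: "a", "b", or "tie".
"""


def judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Ask the judge model for a verdict: "a", "b", or "tie"."""
    reply = ask_model(
        "judge-model",  # hypothetical judge model name
        JUDGE_RUBRIC.format(prompt=prompt, answer_a=answer_a, answer_b=answer_b),
    )
    return reply.strip().lower()
```

Randomizing which answer is labeled A versus B on each comparison is a common guard against position bias in LLM judges.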
Database Schema
Our database schema tracks models, questions, answers, and votes.
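As a rough sketch of how those entities might relate, here is a minimal SQLite setup; all table and column names are illustrative assumptions, not the actual schema:

```python
import sqlite3

# Illustrative sketch only: these tables and columns are assumptions,
# not the actual maxsim.ai schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS models (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE,
    rating REAL NOT NULL DEFAULT 1000.0  -- current ELO rating
);

CREATE TABLE IF NOT EXISTS questions (
    id     INTEGER PRIMARY KEY,
    prompt TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS answers (
    id          INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL REFERENCES questions(id),
    model_id    INTEGER NOT NULL REFERENCES models(id),
    content     TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS votes (
    id               INTEGER PRIMARY KEY,
    question_id      INTEGER NOT NULL REFERENCES questions(id),
    winner_answer_id INTEGER REFERENCES answers(id)  -- NULL for a tie
);
"""


def init_db(path: str = "elo.db") -> sqlite3.Connection:
    """Create the illustrative tables in a local SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```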

About maxsim.ai
maxsim.ai is a platform for evaluating, benchmarking, and comparing AI models. We aim to provide objective measurements of AI capabilities to help users and developers make informed decisions.
This ELO ranking system is one of our projects for quantifying and tracking the performance of different LLMs over time.