Leaderboard
An open, live leaderboard for healthcare AI agents, supporting continuous evaluation and rolling updates
| Rank | Agent | ESL-Bench Overall | ESL-Bench Lookup | ESL-Bench Trend | ESL-Bench Comparison | ESL-Bench Anomaly | ESL-Bench Explanation | MedHall-Bench Overall | MedHall-Bench Factual | MedHall-Bench Contextual | MedHall-Bench Citation | MedHall-Bench Numerical | MedHall-Bench Relational |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | theta-smart-expert | 50.5 | 63.5 | 74.4 | 31.4 | 56.4 | 26.7 | 73.1 | 88.6 | 56.9 | 59.6 | 69.1 | 81.1 |
| 2 | claude-sonnet-4.6 | 51.4 | 59.4 | 61.5 | 41.9 | 66.8 | 27.3 | 69.7 | 67.8 | 69.8 | 60.5 | 66.8 | 76.6 |
| 3 | gpt-5.4 | 47.6 | 54.9 | 65.4 | 39.0 | 54.7 | 23.8 | 71.2 | 90.1 | 77.1 | 27.3 | 66.8 | 76.3 |
| 4 | theta-smart-miroflow | — | — | — | — | — | — | 75.2 | 89.6 | 65.0 | 35.9 | 77.4 | 82.8 |
| 5 | minimax-m2.7 | 46.9 | 53.2 | 60.1 | 33.5 | 59.5 | 28.3 | — | — | — | — | — | — |
| 6 | gemini-3-pro-preview | — | — | — | — | — | — | 48.1 | 47.3 | 59.1 | 36.6 | 54.2 | 40.4 |
| 7 | theta-smart-general | 46.3 | 58.0 | 64.9 | 34.1 | 48.1 | 26.4 | — | — | — | — | — | — |
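The leaderboard does not state how each benchmark's Overall score is aggregated. For every agent with ESL-Bench results, however, the ESL-Bench Overall value agrees with the unweighted mean of its five category scores (Lookup, Trend, Comparison, Anomaly, Explanation) to within one-decimal rounding. The minimal Python sketch below checks that observation against the table's own numbers. The helper name `overall_matches_mean` and the tolerance are illustrative assumptions, not the leaderboard's official scoring code. The same check does not hold for MedHall-Bench: theta-smart-expert's five MedHall-Bench categories average 71.06, while its reported Overall is 73.1. That benchmark therefore appears to use a different aggregation that is not described here.

```python
from statistics import mean

# ESL-Bench scores copied from the leaderboard table:
# (Overall, Lookup, Trend, Comparison, Anomaly, Explanation).
# Agents with no ESL-Bench results ("—") are omitted.
ESL_BENCH = {
    "theta-smart-expert": (50.5, 63.5, 74.4, 31.4, 56.4, 26.7),
    "claude-sonnet-4.6": (51.4, 59.4, 61.5, 41.9, 66.8, 27.3),
    "gpt-5.4": (47.6, 54.9, 65.4, 39.0, 54.7, 23.8),
    "minimax-m2.7": (46.9, 53.2, 60.1, 33.5, 59.5, 28.3),
    "theta-smart-general": (46.3, 58.0, 64.9, 34.1, 48.1, 26.4),
}


def overall_matches_mean(row, tol=0.05):
    """Hypothetical check: True if the reported Overall equals the unweighted
    mean of the category scores, up to rounding to one decimal place.
    This is an assumed aggregation rule, not one confirmed by the leaderboard."""
    overall, *categories = row
    # A value rounded to one decimal can differ from the true mean by at most 0.05.
    return abs(mean(categories) - overall) <= tol + 1e-9


for agent, row in ESL_BENCH.items():
    overall, *categories = row
    print(f"{agent}: reported {overall}, category mean {mean(categories):.2f}, "
          f"match={overall_matches_mean(row)}")
```

Using an explicit tolerance, rather than comparing against Python's `round()`, avoids surprises from binary floating-point representation and round-half-to-even behaviour.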