Evaluation Framework

The evaluation framework is fully open-source on GitHub, so developers can reproduce evaluation results independently.

Quick Start

Installation & Running

```bash
# Clone the repository
git clone https://github.com/healthmemoryarena/hma-benchmark.git  # coming soon
cd hma-benchmark

# Install dependencies
uv sync

# Run the benchmark
python -m benchmark.basic_runner healthbench sample \
  --target-type llm_api \
  --target-model gpt-4o
```

API Example

Python

```python
import asyncio

from evaluator.core.orchestrator import do_single_test
from evaluator.core.schema import TestCase

async def main() -> None:
    test_case = TestCase(...)  # fill in fields per the TestCase schema
    # do_single_test is a coroutine, so it must be awaited inside an event loop
    result = await do_single_test(test_case)
    print(f"Score: {result.score}, Pass: {result.passed}")

asyncio.run(main())
```
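Because `do_single_test` is a coroutine, a whole suite of test cases can be scored concurrently with `asyncio.gather`. A minimal self-contained sketch of that pattern, using a hypothetical `evaluate` stand-in and `Result` dataclass in place of the real orchestrator (whose actual signatures live in `evaluator.core`):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Result:
    score: float
    passed: bool

# Hypothetical stand-in for do_single_test; the real evaluator would
# perform network calls to the target model and the judge.
async def evaluate(case_id: str) -> Result:
    await asyncio.sleep(0)  # yield control, as a real network call would
    score = 0.9 if case_id != "c2" else 0.4
    return Result(score=score, passed=score >= 0.5)

async def run_suite(case_ids: list[str]) -> list[Result]:
    # Schedule every evaluation at once; gather preserves input order.
    return await asyncio.gather(*(evaluate(c) for c in case_ids))

results = asyncio.run(run_suite(["c1", "c2", "c3"]))
print([r.passed for r in results])  # → [True, False, True]
```

Order preservation matters here: `asyncio.gather` returns results in the same order as its arguments, so scores can be zipped back to their test cases directly.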