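Evaluation Framework
The evaluation framework is fully open source and published on GitHub, so developers can reproduce the evaluation results themselves.
Quick Start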
Installation and Usage
```bash
# Clone the repository
git clone https://github.com/healthmemoryarena/hma-benchmark.git # coming soon
cd hma-benchmark
# Install dependencies
uv sync
# Run the evaluation
python -m benchmark.basic_runner healthbench sample \
  --target-type llm_api \
  --target-model gpt-4o
```
API Example
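Python
```python
import asyncio

from evaluator.core.orchestrator import do_single_test
from evaluator.core.schema import TestCase


async def main() -> None:
    # Build a TestCase and run the evaluation
    test_case = TestCase(...)  # placeholder: the fields are not shown here
    result = await do_single_test(test_case)
    print(f"Score: {result.score}, Pass: {result.passed}")


# do_single_test is a coroutine, so it has to run inside an event loop
asyncio.run(main())
```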
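The single-case call above extends naturally to a batch. The following is a minimal sketch, not the framework's own batch runner: `StubResult` and `stub_single_test` are hypothetical stand-ins for the result object and for `do_single_test`, so the snippet runs without the package installed, and the 0.5 pass threshold is an assumption made only for illustration. It shows one way to run several cases concurrently and summarize the `score` and `passed` fields.

```python
import asyncio
from dataclasses import dataclass
from statistics import mean


@dataclass
class StubResult:
    """Hypothetical stand-in exposing the two fields used above."""
    score: float
    passed: bool


async def stub_single_test(raw_score: float) -> StubResult:
    """Hypothetical replacement for do_single_test; no model is called."""
    await asyncio.sleep(0)  # yield control, as a real async call would
    # Assumed pass threshold of 0.5, for illustration only
    return StubResult(score=raw_score, passed=raw_score >= 0.5)


async def run_batch(raw_scores: list[float]) -> list[StubResult]:
    # gather() runs the coroutines concurrently and preserves input order
    return await asyncio.gather(*(stub_single_test(s) for s in raw_scores))


def summarize(results: list[StubResult]) -> dict[str, float]:
    """Return the mean score and the fraction of passing cases."""
    return {
        "mean_score": mean(r.score for r in results),
        "pass_rate": sum(r.passed for r in results) / len(results),
    }


if __name__ == "__main__":
    batch = asyncio.run(run_batch([0.25, 0.75, 1.0, 0.0]))
    print(summarize(batch))
```

To use this against the real framework, `stub_single_test` would presumably be replaced with `do_single_test` and the float inputs with `TestCase` objects, provided the returned results expose `score` and `passed` as in the example above.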