guohaoli2000 · apache-2.0
HarnessBench-CN-v0.1
A Chinese-first benchmark for evaluating robot harness decision behavior in scenarios involving permissions, memory operations, context handling, failure recovery, and refusal/abort decisions.
Downloads: 27
Likes: 1
Technical Profile
- Modalities: language
- Task Types: decision-making, tool-use, permission_gating, memory_operations, context_handling, failure_recovery, refusal_abort
- Data Format: JSONL
- License: apache-2.0
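Because the data format is JSONL (one JSON object per line), a first look at the benchmark could be a tally of records per task type. The sketch below is a minimal illustration under stated assumptions: the `task_type` and `id` keys and the sample records are hypothetical, since the listing does not describe the record schema, and the function name `count_by_field` is introduced here only for the example.

```python
import io
import json
from collections import Counter


def count_by_field(lines, field="task_type"):
    """Count JSONL records per value of `field`.

    `field` defaults to "task_type", a hypothetical key; the real
    schema of the dataset is not documented in this listing.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        counts[record.get(field, "unknown")] += 1
    return counts


# Made-up records for illustration only; not taken from the dataset.
sample = io.StringIO(
    '{"id": "ex-1", "task_type": "permission_gating"}\n'
    '{"id": "ex-2", "task_type": "refusal_abort"}\n'
    "\n"
    '{"id": "ex-3", "task_type": "permission_gating"}\n'
)
counts = count_by_field(sample)
```

With a real file, the same function can be passed an open file handle, for example `open(path, encoding="utf-8")`, after replacing `field` with whatever key the records actually use.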