guohaoli2000 | apache-2.0

HarnessBench-CN-v0.1

A Chinese-first benchmark for evaluating robot harness decision behavior in scenarios involving permissions, memory operations, context handling, failure recovery, and refusal/abort decisions.

Downloads: 27
Likes: 1

Technical Profile

Modalities: language
Task Types: decision-making, tool-use, permission_gating, memory_operations, context_handling, failure_recovery, refusal_abort
Data Format: JSONL
License: apache-2.0
Part of the HarnessBench-CN-v0.1 family
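
Because the data ships as JSONL (one JSON object per line), a reader can be sketched in a few lines of Python. This is a minimal illustration, not the dataset's official loader: the record schema is not documented on this page, so the field names `id` and `task_type` and the in-memory sample below are hypothetical placeholders, with values borrowed from the task-type tags listed above.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON") from exc


def count_by_field(records: Iterable[dict], field: str) -> Counter:
    """Count records by the value of `field`; absent keys go to '<missing>'."""
    return Counter(str(record.get(field, "<missing>")) for record in records)


# Hypothetical sample lines; the real schema may use different field names.
SAMPLE = [
    '{"id": "demo-1", "task_type": "permission_gating"}',
    "",
    '{"id": "demo-2", "task_type": "refusal_abort"}',
    '{"id": "demo-3", "task_type": "permission_gating"}',
]

records = list(iter_jsonl(SAMPLE))
counts = count_by_field(records, "task_type")
```

To read from disk instead, pass an open file handle to `iter_jsonl`, for example `open(path, encoding="utf-8")` with whatever local path the download uses. Opening with an explicit UTF-8 encoding matters here, since the benchmark is Chinese-first.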
