Lennittusmit
DESPITE: Deterministic Evaluation of Safe Planning In embodied Task Execution
A benchmark for evaluating large language models on embodied safe task planning, derived from multiple sources including ALFRED, BDDL, VirtualHome, NormBank, and NEISS.
Downloads: 76
Episodes: 12,729
Likes: 4
Why This Matters for Physical AI
This dataset is designed for evaluating the safety risks of using LLMs for embodied task planning in robotic systems, and it addresses systematic safety concerns in AI-driven robot control.
Technical Profile
- Modalities: language
- Action Space: language
- Environment: simulation
- Task Types: task-planning, manipulation
- Episodes: 12,729
- Annotation Types: language_instructions, pddl
- License: MIT