Lennittusmit

DESPITE: Deterministic Evaluation of Safe Planning In embodied Task Execution

A benchmark for evaluating large language models on embodied safe task planning, derived from multiple sources including ALFRED, BDDL, VirtualHome, NormBank, and NEISS.

Downloads: 76
Episodes: 12,729
Likes: 4

Why This Matters for Physical AI

This dataset is designed for evaluating the safety risks of using LLMs as embodied task planners in robotic systems, a systematic safety concern in AI-driven robot control.

Technical Profile

Modalities: language
Action Space: language
Environment: simulation
Task Types: task-planning, manipulation
Episodes: 12,729
Annotation Types: language_instructions, pddl
License: MIT
Part of the DESPITE: Deterministic Evaluation of Safe Planning In embodied Task Execution family
