DESPITE: Deterministic Evaluation of Safe Planning In embodied Task Execution

Name: DESPITE: Deterministic Evaluation of Safe Planning In embodied Task Execution
Creator: Lennittus
License: mit
Keywords: language, task-planning, manipulation, simulation

A benchmark for evaluating large language models on embodied safe task planning, derived from multiple sources including ALFRED, BDDL, VirtualHome, NormBank, and NEISS.

Downloads: 128
Likes: 4

Technical Profile

Modalities: language
Environment: simulation
Task Types: task-planning, manipulation
License: mit

Part of the DESPITE: Deterministic Evaluation of Safe Planning In embodied Task Execution family

Access

View on HuggingFace

Related Datasets

Hy-Embodied-0.5-VLA-Data

A large-scale bimanual manipulation dataset with 2,163 hours of high-fidelity demonstrations collected via custom fingertip UMI device with optical motion-capture, spanning 70+ manipulation tasks for training Vision-Language-Action foundation models.

rgb, proprioception, language

214K downloads, Jul 2026, cc-by-4.0

Xperience-10M

A large-scale egocentric multimodal dataset of human experience containing 10 million interactions and 10,000 hours of synchronized first-person recordings with six video streams, audio, stereo depth, camera pose, hand mocap, full-body mocap, IMU, and hierarchical language annotations for embodied AI, robotics, and world modeling research.

rgb, audio, depth, proprioception, +3

79K downloads, Apr 2026, other

Stera-10M

An open egocentric multimodal dataset for embodied AI and robotics containing 200 hours of synchronized first-person recordings with RGB, LiDAR depth, 6-DoF camera pose, IMU, and 21-joint hand mocap annotations from 500+ sessions.

rgb, depth, imu, proprioception, +2

77K downloads, May 2026, stera-license

OmniAction

A large-scale multimodal dataset for proactive robot manipulation with 141,162 episodes covering contextual instruction following through spoken dialogue, environmental sounds, and visual cues. The dataset includes 5,096 distinct speaker timbres, 2,482 non-verbal sound events, and 640 environmental backgrounds across six categories of contextual instructions.

rgb, audio, language

76K downloads, Apr 2026, cc-by-nc-4.0

In-The-Wild Humanoid Dataset

A large-scale dataset of 500+ hours and 23K+ episodes of whole-body humanoid robot learning captured through human teleoperation demonstrations on Unitree G1 across real homes in Southeast Asia. The dataset includes synchronized visual observations, robot states, actions, and metadata for research on mobile manipulation, bimanual interaction, and long-horizon household skills.

rgb, infrared, proprioception, imu, +2

75K downloads, Jun 2026

OmniAction

A large-scale multimodal dataset for proactive robot manipulation comprising 141,162 episodes with cross-modal contextual instructions derived from spoken dialogue, environmental sounds, and visual cues rather than explicit commands.

rgb, audio, language

75K downloads, Mar 2026, cc-by-nc-4.0