Guardian FailCoT OOD Benchmarks

Name: Guardian FailCoT OOD Benchmarks
Creator: paulpacaud
Published: 2026-01-01
License: apache-2.0
Keywords: rgb, language, manipulation, failure_detection, visual_question_answering, lab, home

Three real-world failure-detection benchmarks (UR5-Fail, RoboFail, RoboVQA) for evaluating vision-language models on cross-environment robotic manipulation failure reasoning and out-of-distribution generalization.

Downloads: 0
Episodes: 650

Why This Matters for Physical AI

Provides 650 episodes of out-of-distribution real-robot data for evaluating how well vision-language models detect and reason about manipulation failures. The benchmarks span several embodiments (UR5, mobile manipulator, humanoid) and both lab and home environments, which supports research on robust cross-environment manipulation.

Technical Profile

Modalities: rgb, language
Robot Embodiments: UR5, mobile_manipulator, humanoid
Environment: lab, home
Task Types: manipulation, failure_detection, visual_question_answering
Episodes: 650
Data Format: JSONL
Annotation Types: language_instructions, reward_labels, failure_modes, failure_reasons
License: apache-2.0
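
Since the data format is JSONL, records can be read one JSON object per line. The following is a minimal sketch of parsing such records and tallying labels. The field names (`instruction`, `reward`, `failure_mode`) and the sample records are hypothetical and chosen only to mirror the annotation types listed above; the actual schema should be checked against the dataset files.

```python
import json
from collections import Counter


def parse_jsonl(lines):
    """Parse an iterable of text lines, one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def summarize(records, label_key="reward", mode_key="failure_mode"):
    """Count label values and failure modes.

    label_key and mode_key are hypothetical field names, not the
    confirmed schema of this dataset.
    """
    labels = Counter(r.get(label_key) for r in records)
    # Skip records whose failure mode is missing or null (e.g. successes).
    modes = Counter(r[mode_key] for r in records if r.get(mode_key))
    return labels, modes


# Made-up sample lines for illustration only; not taken from the dataset.
sample = [
    '{"instruction": "pick up the cup", "reward": 0, "failure_mode": "grasp_failure"}',
    "",
    '{"instruction": "open the drawer", "reward": 1, "failure_mode": null}',
]

records = parse_jsonl(sample)
labels, modes = summarize(records)
print(labels, modes)
```

To read from disk instead, an open file handle can be passed directly, for example `parse_jsonl(open(path, encoding="utf-8"))`, because a text file iterates line by line.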

Part of the Guardian family

Related Datasets

LingBot-Depth Dataset

Self-curated RGB-D dataset for training masked depth modeling approaches, containing real-world indoor scenes, VLA robot manipulation tasks, and simulated data across multiple camera types and robot platforms.

rgb, depth

389K downloads, Apr 2026, CC BY-NC-SA 4.0

Xperience-10M

A large-scale egocentric multimodal dataset of human experience containing 10 million interactions and 10,000 hours of synchronized first-person recordings with six video streams, audio, stereo depth, camera pose, hand mocap, full-body mocap, IMU, and hierarchical language annotations for embodied AI, robotics, and world modeling research.

rgb, audio, depth, proprioception, +3 more

184K downloads, Apr 2026, other

Egocentric-100K

The largest dataset of manual labor with 100,405 hours of egocentric video from head-mounted fisheye cameras, featuring state-of-the-art hand visibility and active manipulation density.

rgb, video

75K downloads, Feb 2026, apache-2.0

OmniAction

A large-scale multimodal dataset for proactive robot manipulation comprising 141,162 episodes across 112 skills and 748 objects, enriched with audio, visual, and contextual instruction data for cross-modal intention recognition.

rgb, audio, language

48K downloads, Mar 2026, cc-by-nc-4.0

OmniAction

A large-scale multimodal dataset for proactive robot manipulation comprising 141,162 episodes with cross-modal contextual instructions derived from spoken dialogue, environmental sounds, and visual cues rather than explicit commands.

rgb, audio, language

47K downloads, Mar 2026, cc-by-nc-4.0

Open-H-Embodiment

A multi-embodiment community-driven dataset of paired kinematics and video for training AI autonomy models in surgical robotics and ultrasound applications. The dataset includes tabletop exercises, clinical procedures, and simulations of healthcare robotics tasks.

rgb, proprioception

47K downloads, Mar 2026, cc-by-4.0