EO-Data-1.5M
A large-scale interleaved vision-text-action dataset with 1.5M samples derived from 2.1M robot episodes, emphasizing temporal dynamics and causal dependencies among vision, language, and action modalities for embodied AI. It combines human-annotated and VLM-generated annotations across 17 subsets covering manipulation and embodied reasoning tasks.
Why This Matters for Physical AI
EO-Data-1.5M enables training of generalist robot policies: it is the first large-scale dataset to capture interleaved temporal dynamics and causal relationships among vision, language, and action, capabilities essential for embodied AI systems that must reason about both space and time.
Technical Profile
- Modalities
- rgb, language, action
- Robot Embodiments
- AgiBot, WidowX, Franka
- Environment
- lab
- Task Types
- manipulation, task_planning, affordance_assessment, failure_detection, trajectory_prediction, object_grounding, spatial_reasoning
- Samples
- 1,500,000
- Data Format
- parquet
- Annotation Types
- language_instructions, visual_question_answering, video_captioning, trajectory_annotations, action_labels, affordance_labels, failure_labels, spatial_annotations
- License
- apache-2.0