IPEC-COMMUNITY · apache-2.0

EO-Data-1.5M

A large-scale interleaved vision-text-action dataset with 1.5M samples derived from 2.1M robot episodes, emphasizing temporal dynamics and causal dependencies among the vision, language, and action modalities for embodied AI. It combines human-written and VLM-generated annotations across 17 subsets covering manipulation and embodied reasoning tasks.

Downloads: 13K
Episodes: 1,500,000
Likes: 19

Why This Matters for Physical AI

EO-Data-1.5M supports training generalist robot policies: it is the first large-scale dataset to capture interleaved temporal dynamics and causal relationships between vision, language, and action, which are essential for embodied AI systems that reason about both space and time.

Technical Profile

Modalities
rgb, language, action
Robot Embodiments
AgiBot, WidowX, Franka
Environment
lab
Task Types
manipulation, task_planning, affordance_assessment, failure_detection, trajectory_prediction, object_grounding, spatial_reasoning
Episodes
1,500,000
Data Format
parquet
Annotation Types
language_instructions, visual_question_answering, video_captioning, trajectory_annotations, action_labels, affordance_labels, failure_labels, spatial_annotations
License
apache-2.0
Part of the EO-Robotics family

