EO-Data-1.5M
A large-scale interleaved vision-text-action dataset with 1.5M samples derived from 2.1M robot episodes, emphasizing temporal dynamics and causal dependencies among vision, language, and action modalities for embodied AI. It combines human-annotated and VLM-generated annotations across 17 subsets covering manipulation and embodied reasoning tasks.
Why This Matters for Physical AI
EO-Data-1.5M enables training of generalist robot policies: it is the first large-scale dataset to capture interleaved temporal dynamics and causal relationships among vision, language, and action, capabilities essential for embodied AI systems that must reason about both space and time.
Technical Profile
- Modalities
- rgb, language, action
- Robot Embodiments
- AgiBot, WidowX, Franka
- Environment
- lab
- Task Types
- manipulation, task_planning, affordance_assessment, failure_detection, trajectory_prediction, object_grounding, spatial_reasoning
- Samples
- 1,500,000
- Data Format
- parquet
- Annotation Types
- language_instructions, visual_question_answering, video_captioning, trajectory_annotations, action_labels, affordance_labels, failure_labels, spatial_annotations
- License
- apache-2.0