OpenMOSS-Team · 2025 · cc-by-nc-4.0

OmniAction

A large-scale multimodal dataset for proactive robot manipulation comprising 141,162 episodes across 112 skills and 748 objects, enriched with audio, visual, and contextual instruction data for cross-modal intention recognition.

Downloads: 42K
Episodes: 141,162
Likes: 252

Why This Matters for Physical AI

This dataset advances physical AI by enabling robots to recognize human intentions from multimodal contextual cues (speech, sounds, visuals) rather than from explicit instructions, bringing human-robot interaction closer to the demands of real-world collaborative scenarios.

Technical Profile

Modalities
rgb, audio, language
Environment
simulation, lab
Task Types
manipulation, pick_and_place
Episodes
141162
Data Format
RLDS
Annotation Types
language_instructions, action_labels
License
cc-by-nc-4.0
Part of the OmniAction family
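Because the profile lists RLDS as the data format, the sketch below shows how an RLDS-style episode is laid out and traversed: an episode record holds a "steps" sequence, and each step carries an observation, an action, and the boundary flags is_first / is_last / is_terminal. It is a plain-Python stand-in that runs without TensorFlow; in practice RLDS data is usually read through TensorFlow Datasets. The observation keys ("image", "audio"), the "language_instruction" field, the "episode_id" metadata, and the helpers make_step and iter_transitions are hypothetical and may not match OmniAction's actual feature names.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical in-memory stand-in for one RLDS episode. Feature names below are
# ASSUMED for illustration, not taken from the OmniAction release.
Step = Dict[str, Any]
Episode = Dict[str, Any]


def make_step(obs: Dict[str, Any], action: List[float], instruction: str,
              is_first: bool, is_last: bool) -> Step:
    """Build one step dict following a simplified RLDS step layout."""
    return {
        "observation": obs,
        "action": action,
        "language_instruction": instruction,  # assumed field name
        "is_first": is_first,
        "is_last": is_last,
        "is_terminal": is_last,  # assume each toy episode ends in a true terminal state
    }


def iter_transitions(episode: Episode) -> Iterator[Tuple[Dict[str, Any], List[float], str]]:
    """Yield (observation, action, instruction) triples after checking episode boundaries."""
    steps = episode["steps"]
    if not steps:
        return
    if not steps[0]["is_first"] or not steps[-1]["is_last"]:
        raise ValueError("malformed episode: missing is_first/is_last flags")
    for step in steps:
        yield step["observation"], step["action"], step["language_instruction"]


# Toy three-step episode with placeholder image and audio data.
instruction = "pick up the cup"  # hypothetical instruction text
episode: Episode = {
    "steps": [
        make_step({"image": [0] * 4, "audio": [0.0] * 8}, [0.1, 0.0, 0.0], instruction,
                  is_first=(i == 0), is_last=(i == 2))
        for i in range(3)
    ],
    "episode_metadata": {"episode_id": 0},  # hypothetical metadata field
}

for obs, action, text in iter_transitions(episode):
    print(sorted(obs), action, text)
```

Checking is_first and is_last before yielding catches truncated or mis-split episodes early, which matters when steps from many episodes are later flattened into a single training stream.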

Need custom rgb data?

Claru builds purpose-built datasets for simulation applications with dense human annotations and quality assurance.


Related Datasets