OpenMOSS-Team | 2025 | cc-by-nc-4.0
OmniAction
A large-scale multimodal dataset for proactive robot manipulation comprising 141,162 episodes across 112 skills and 748 objects, enriched with audio, visual, and contextual instruction data for cross-modal intention recognition.
Downloads: 42K
Episodes: 141,162
Likes: 252
Why This Matters for Physical AI
This dataset advances physical AI by enabling robots to infer human intentions from multimodal contextual cues (speech, sounds, and visuals) rather than relying on explicit instructions, supporting more natural human-robot interaction in real-world collaborative scenarios.
Technical Profile
- Modalities: rgb, audio, language
- Environment: simulation, lab
- Task Types: manipulation, pick_and_place
- Episodes: 141,162
- Data Format: RLDS
- Annotation Types: language_instructions, action_labels
- License: cc-by-nc-4.0
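To make the profile above more concrete, here is a minimal sketch of how an RLDS-style episode is typically organized: an episode is a sequence of steps, and each step carries an observation, an action, and the boundary flags `is_first`, `is_last`, and `is_terminal`. The observation keys (`rgb`, `audio`, `language_instruction`), the 7-dimensional action, and the helper names `make_dummy_episode` and `validate_episode` are assumptions chosen for illustration; the dataset's actual schema may differ and should be checked against its own documentation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Step:
    """One timestep in an RLDS-style episode."""
    observation: Dict[str, Any]
    action: List[float]
    is_first: bool = False
    is_last: bool = False
    is_terminal: bool = False


@dataclass
class Episode:
    """An episode is an ordered list of steps."""
    steps: List[Step] = field(default_factory=list)


# Hypothetical observation keys mirroring the listed modalities.
EXPECTED_KEYS = {"rgb", "audio", "language_instruction"}


def make_dummy_episode(num_steps: int = 3) -> Episode:
    """Build a placeholder episode; all values are synthetic."""
    steps = []
    for t in range(num_steps):
        steps.append(
            Step(
                observation={
                    "rgb": [[0, 0, 0]],          # stand-in for an image array
                    "audio": [0.0],              # stand-in for a waveform chunk
                    "language_instruction": "placeholder contextual utterance",
                },
                action=[0.0] * 7,                # assumed action dimensionality
                is_first=(t == 0),
                is_last=(t == num_steps - 1),
                is_terminal=(t == num_steps - 1),
            )
        )
    return Episode(steps=steps)


def validate_episode(ep: Episode) -> bool:
    """Check boundary flags and that every step has the expected keys."""
    if not ep.steps:
        return False
    flags_ok = ep.steps[0].is_first and ep.steps[-1].is_last
    interior_ok = all(not s.is_first for s in ep.steps[1:]) and all(
        not s.is_last for s in ep.steps[:-1]
    )
    keys_ok = all(EXPECTED_KEYS <= s.observation.keys() for s in ep.steps)
    return flags_ok and interior_ok and keys_ok


if __name__ == "__main__":
    episode = make_dummy_episode(3)
    print(len(episode.steps), validate_episode(episode))
```

A check like `validate_episode` can be useful as a first sanity pass after loading any RLDS-formatted data, before the per-step observations are fed into a training pipeline.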
Community Signals
Top 5% by downloads
Academic Citations: 20
HuggingFace Discussions: 5
Access