rad1d1m123 · 2025 · cc-by-nc-4.0
OmniAction
A large-scale multimodal dataset for proactive robot manipulation comprising 141,162 episodes with cross-modal contextual instructions derived from spoken dialogue, environmental sounds, and visual cues rather than explicit commands.
Downloads: 45K
Episodes: 141,162
Why This Matters for Physical AI
This dataset targets proactive intention recognition: inferring what a user wants from multimodal contextual cues. That capability lets robots respond to user intentions without explicit instructions, which is essential for natural human-robot collaboration in real-world settings.
Technical Profile
- Modalities: rgb, audio, language
- Environment: simulation, lab
- Task Types: manipulation, instruction_following
- Episodes: 141,162
- Data Format: RLDS
- Annotation Types: language_instructions, action_labels
- License: cc-by-nc-4.0
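The profile above lists RLDS as the data format. As a minimal sketch of what that implies, the Python below builds a placeholder episode and checks it for the step-level fields RLDS defines (`observation`, `action`, `reward`, `discount`, `is_first`, `is_last`, `is_terminal`). The observation keys `rgb`, `audio`, and `language_instruction`, the 7-element action, and both helper functions are assumptions made for illustration; the actual OmniAction schema is not given here and may differ.

```python
from typing import Any, Dict, List

# Step-level fields defined by the RLDS format.
RLDS_STEP_KEYS = {"observation", "action", "reward", "discount",
                  "is_first", "is_last", "is_terminal"}

# Hypothetical observation keys, one per listed modality (rgb, audio, language).
ASSUMED_OBSERVATION_KEYS = {"rgb", "audio", "language_instruction"}


def make_dummy_episode(num_steps: int = 3) -> Dict[str, Any]:
    """Build a placeholder episode; every value is a stand-in, not real data."""
    steps: List[Dict[str, Any]] = []
    for t in range(num_steps):
        steps.append({
            "observation": {
                "rgb": [[0, 0, 0]],           # stand-in for an image array
                "audio": [0.0],               # stand-in for a waveform chunk
                "language_instruction": "",   # stand-in for contextual text
            },
            "action": [0.0] * 7,              # hypothetical action label
            "reward": 0.0,
            "discount": 1.0,
            "is_first": t == 0,
            "is_last": t == num_steps - 1,
            "is_terminal": t == num_steps - 1,
        })
    return {"steps": steps}


def check_episode(episode: Dict[str, Any]) -> List[str]:
    """Return a list of structural problems; an empty list means none found."""
    steps = episode.get("steps", [])
    if not steps:
        return ["episode has no steps"]
    problems: List[str] = []
    for i, step in enumerate(steps):
        missing = RLDS_STEP_KEYS - set(step)
        if missing:
            problems.append(f"step {i}: missing {sorted(missing)}")
            continue
        missing_obs = ASSUMED_OBSERVATION_KEYS - set(step["observation"])
        if missing_obs:
            problems.append(f"step {i}: missing observation {sorted(missing_obs)}")
    if steps[0].get("is_first") is not True:
        problems.append("first step is not flagged is_first")
    if steps[-1].get("is_last") is not True:
        problems.append("last step is not flagged is_last")
    return problems
```

A check like this could run on the first few episodes after download to confirm the real field names before writing a training pipeline around them.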
Community Signals
Top 5% by downloads
Academic Citations: 20