rad1d1m123 | 2025 | cc-by-nc-4.0

OmniAction

A large-scale multimodal dataset for proactive robot manipulation comprising 141,162 episodes with cross-modal contextual instructions derived from spoken dialogue, environmental sounds, and visual cues rather than explicit commands.

Downloads: 45K
Episodes: 141,162

Why This Matters for Physical AI

This dataset targets proactive intention recognition from multimodal contextual cues: a robot must infer what a user wants, and act on it, without being given an explicit instruction. That capability is essential for natural human-robot collaboration in real-world settings.

Technical Profile

Modalities: rgb, audio, language
Environment: simulation, lab
Task Types: manipulation, instruction_following
Episodes: 141,162
Data Format: RLDS
Annotation Types: language_instructions, action_labels
License: cc-by-nc-4.0
Part of the OmniAction family
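The profile lists RLDS as the data format. In RLDS, each episode holds an ordered sequence of steps, and each step carries an observation, an action, and boundary flags such as is_first and is_last. The sketch below is a minimal, hypothetical illustration of walking that episode/step structure to collect (instruction, observation, action) triples. It uses plain Python dicts in place of tf.data records so it runs without TensorFlow. The per-step "language_instruction" key, the "image"/"audio" observation keys, the 7-dimensional action, and the helper names iter_transitions and make_toy_episode are all assumptions for illustration, not confirmed OmniAction field names.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Plain-Python stand-in for RLDS records (assumption for illustration).
# In real RLDS files an episode's "steps" field is a nested tf.data dataset.
Step = Dict[str, Any]
Episode = Dict[str, Any]


def iter_transitions(
    episodes: Iterable[Episode],
) -> Iterator[Tuple[str, Dict[str, Any], List[float]]]:
    """Yield (language_instruction, observation, action) for every step.

    Assumes standard RLDS boundary flags plus a per-step
    "language_instruction" field; OmniAction's exact keys are unconfirmed.
    """
    for episode in episodes:
        steps: List[Step] = list(episode["steps"])
        # RLDS marks episode boundaries with is_first / is_last flags.
        if not steps or not steps[0]["is_first"] or not steps[-1]["is_last"]:
            raise ValueError("malformed episode: missing is_first/is_last boundary")
        for step in steps:
            yield step["language_instruction"], step["observation"], step["action"]


def make_toy_episode(instruction: str, num_steps: int) -> Episode:
    """Build a hypothetical episode; observation keys and action size are placeholders."""
    steps = []
    for t in range(num_steps):
        steps.append({
            "is_first": t == 0,
            "is_last": t == num_steps - 1,
            "is_terminal": t == num_steps - 1,
            "observation": {"image": f"frame_{t}", "audio": f"clip_{t}"},
            "action": [0.0] * 7,
            "language_instruction": instruction,
        })
    return {"steps": steps}


# Usage: two toy episodes with 3 and 2 steps yield 5 transitions in total.
episodes = [
    make_toy_episode("hypothetical contextual instruction", 3),
    make_toy_episode("another hypothetical instruction", 2),
]
transitions = list(iter_transitions(episodes))
print(len(transitions))  # 5
```

With the real files, RLDS datasets are typically opened through TensorFlow Datasets (for example with tfds.builder_from_directory(...).as_dataset(split=...)), and each episode's steps are then iterated in the same way. Check the dataset's own feature spec before relying on any key used above.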
