hosam12kalad2025 · cc-by-nc-4.0

OmniAction

A large-scale multimodal dataset for proactive robot manipulation, with 141,162 episodes of contextual instruction following in which intent is conveyed through spoken dialogue, environmental sounds, and visual cues.

Downloads: 11K
Episodes: 141,162

Why This Matters for Physical AI

This dataset supports training multimodal robot systems that infer user intentions from contextual cues rather than explicit instructions, a capability needed for real-world human-robot collaboration.

Technical Profile

Modalities
rgb, audio, language
Environment
simulation, lab
Task Types
manipulation
Episodes
141,162
Data Format
RLDS
Annotation Types
language_instructions, action_labels
License
cc-by-nc-4.0
Part of the OmniAction family
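Because the data is distributed in RLDS, each episode is a sequence of steps carrying boundary flags (is_first, is_last, is_terminal) alongside observations and actions. The sketch below builds a toy episode in that shape with plain Python dicts, so it runs without TensorFlow or the real files. It is an illustration under assumptions, not the confirmed OmniAction schema: the observation keys "image" and "audio", the per-step "language_instruction" field, and the 7-dimensional action are assumed, and the helpers make_toy_episode, instruction_action_pairs, and validate_episode are hypothetical names.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical RLDS-style layout. The step keys observation, action,
# is_first, is_last and is_terminal follow the generic RLDS convention;
# "image", "audio", "language_instruction" and the 7-D action are ASSUMED.
Step = Dict[str, Any]
Episode = Dict[str, Any]


def make_toy_episode(num_steps: int, instruction: str) -> Episode:
    """Build a tiny placeholder episode filled with dummy values."""
    steps: List[Step] = []
    for t in range(num_steps):
        steps.append({
            "observation": {
                "image": [[0] * 3 for _ in range(4)],  # stand-in for an RGB frame
                "audio": [0.0] * 8,                   # stand-in for an audio chunk
            },
            "action": [0.0] * 7,                      # assumed 7-D action vector
            "language_instruction": instruction,
            "is_first": t == 0,
            "is_last": t == num_steps - 1,
            "is_terminal": t == num_steps - 1,
        })
    return {"steps": steps}


def instruction_action_pairs(episode: Episode) -> Iterator[Tuple[str, List[float]]]:
    """Yield (language_instruction, action) for every step of one episode."""
    for step in episode["steps"]:
        yield step["language_instruction"], step["action"]


def validate_episode(episode: Episode) -> bool:
    """Check that only the first step is_first and only the last step is_last."""
    steps = episode["steps"]
    if not steps:
        return False
    firsts = [s["is_first"] for s in steps]
    lasts = [s["is_last"] for s in steps]
    return firsts[0] and not any(firsts[1:]) and lasts[-1] and not any(lasts[:-1])


if __name__ == "__main__":
    ep = make_toy_episode(3, "hand me the cup")
    print(len(list(instruction_action_pairs(ep))))  # 3
    print(validate_episode(ep))                     # True
```

Checking boundary flags before training is a cheap way to catch episodes that were split or concatenated incorrectly; with the real dataset the same checks would apply to steps read through an RLDS loader instead of toy dicts.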
