Whoisjutanlee
cc-by-nc-4.0

OmniAction

A large-scale multimodal dataset for proactive robot manipulation with 141,162 episodes covering contextual instruction following through spoken dialogue, environmental sounds, and visual cues. The dataset includes 5,096 distinct speaker timbres, 2,482 non-verbal sound events, and 640 environmental backgrounds across six categories of contextual instructions.

Downloads: 6

Technical Profile

Modalities
rgb, audio, language
Environment
simulation, lab
Task Types
manipulation, proactive_assistance
Data Format
RLDS
License
cc-by-nc-4.0
Part of the OmniAction family
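
Since the data format is listed as RLDS, the following is a minimal sketch of how one episode might be laid out in memory. The step-level fields (observation, action, reward, discount, is_first, is_last, is_terminal) follow the general RLDS convention. The observation keys (rgb, audio, language), the array shapes, and the 7-dimensional action are assumptions made for illustration and are not taken from the OmniAction release. The helper names make_step, make_episode, and validate_episode are hypothetical.

```python
from typing import Any, Dict, List


def make_step(instruction: str, is_first: bool, is_last: bool) -> Dict[str, Any]:
    """Build one RLDS-style step filled with placeholder data."""
    return {
        "observation": {
            # Hypothetical keys and shapes, not confirmed by the dataset card.
            "rgb": [[[0, 0, 0] for _ in range(2)] for _ in range(2)],  # 2x2 RGB image
            "audio": [0.0] * 4,                                       # short waveform
            "language": instruction,                                  # contextual cue
        },
        "action": [0.0] * 7,     # assumed 7-dimensional robot action
        "reward": 0.0,
        "discount": 1.0,
        "is_first": is_first,
        "is_last": is_last,
        "is_terminal": is_last,  # assumed: the episode ends in a terminal state
    }


def make_episode(n_steps: int, instruction: str) -> Dict[str, List[Dict[str, Any]]]:
    """Assemble an episode as an ordered list of steps."""
    steps = [
        make_step(instruction, is_first=(i == 0), is_last=(i == n_steps - 1))
        for i in range(n_steps)
    ]
    return {"steps": steps}


def validate_episode(episode: Dict[str, List[Dict[str, Any]]]) -> bool:
    """Check that only the first and last steps carry the boundary flags."""
    steps = episode["steps"]
    if not steps:
        return False
    first_ok = steps[0]["is_first"] and not any(s["is_first"] for s in steps[1:])
    last_ok = steps[-1]["is_last"] and not any(s["is_last"] for s in steps[:-1])
    return first_ok and last_ok


episode = make_episode(3, "placeholder instruction")
print(len(episode["steps"]), validate_episode(episode))
```

The field names actually used in the released files should be checked against the dataset itself before relying on this layout.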
