Whoisjutanlee · cc-by-nc-4.0
OmniAction
A large-scale multimodal dataset for proactive robot manipulation with 141,162 episodes covering contextual instruction following through spoken dialogue, environmental sounds, and visual cues. The dataset includes 5,096 distinct speaker timbres, 2,482 non-verbal sound events, and 640 environmental backgrounds across six categories of contextual instructions.
Downloads: 6
Technical Profile
- Modalities
- rgb, audio, language
- Environment
- simulation, lab
- Task Types
- manipulation, proactive_assistance
- Data Format
- RLDS
- License
- cc-by-nc-4.0
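
The RLDS format listed above stores data as episodes, each holding a sequence of steps. The sketch below builds a tiny in-memory episode in that shape using plain Python, so it runs without downloading anything. The step-level fields (`is_first`, `is_last`, `is_terminal`, `observation`, `action`, `reward`, `discount`) follow the general RLDS convention. The observation keys (`rgb`, `audio`, `language`), the 7-element action, and the helper names are assumptions made for illustration; the dataset's actual feature names and shapes may differ.

```python
from typing import Any, Dict, Iterator, List

Step = Dict[str, Any]
Episode = Dict[str, Any]


def make_dummy_episode(num_steps: int = 3) -> Episode:
    """Build a placeholder episode laid out like an RLDS episode."""
    steps: List[Step] = []
    for t in range(num_steps):
        steps.append({
            # Step-level fields from the general RLDS convention.
            "is_first": t == 0,
            "is_last": t == num_steps - 1,
            "is_terminal": t == num_steps - 1,
            "reward": 0.0,
            "discount": 1.0,
            # Hypothetical action vector; the real action space is not stated.
            "action": [0.0] * 7,
            # Hypothetical observation keys mirroring the listed modalities.
            "observation": {
                "rgb": [[0, 0, 0]],      # placeholder for an image frame
                "audio": [0.0] * 4,      # placeholder for an audio chunk
                "language": "",          # placeholder for instruction text
            },
        })
    return {"steps": steps}


def iter_observations(episode: Episode) -> Iterator[Dict[str, Any]]:
    """Yield the per-step observation dict of an episode, in order."""
    for step in episode["steps"]:
        yield step["observation"]


episode = make_dummy_episode(num_steps=3)
modalities = sorted(next(iter_observations(episode)).keys())
```

With the real dataset, the same episode-then-steps traversal would apply, but the keys should be read from the dataset's own feature specification rather than taken from this sketch.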