cagataydev | 2026 | cc-by-4.0
VLM Robotics Voice Commands (Audio)
A dataset of 9,999 audio recordings of natural speech commands for controlling robots, covering tasks like pick & place, navigation, manipulation, and multi-step instructions across 10 command categories.
Downloads: 66
Episodes: 9999
Hours: 6.8
Why This Matters for Physical AI
This dataset enables training of omni-modal Vision-Language Models that can understand natural spoken commands for robot control, bridging the gap between human communication and embodied AI systems.
Technical Profile
- Modalities: audio
- Action Space: language
- Task Types: pick_and_place, manipulation, navigation, observation, multi-step_tasks, spatial_commands, safety, household_chores, conversational_feedback
- Episodes: 9999
- Total Hours: 6.8
- Data Format: WAV
- Annotation Types: language_instructions, category_labels, difficulty_labels
- License: cc-by-4.0
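As a rough sanity check on the figures in the profile, the sketch below measures the duration of a single WAV clip with Python's standard `wave` module and derives the average clip length implied by the listed totals. The function name `wav_duration_seconds` is hypothetical, and the dataset's actual file layout and sample rate are not stated here, so treat this as an illustration under those assumptions rather than an official loader.

```python
import wave


def wav_duration_seconds(source):
    """Return the duration in seconds of a PCM WAV file.

    `source` may be a file path or a binary file-like object.
    (Hypothetical helper; not part of any dataset tooling.)
    """
    with wave.open(source, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


# Totals as listed in the Technical Profile.
EPISODES = 9999
TOTAL_HOURS = 6.8

# Average clip length implied by those totals, in seconds.
mean_clip_seconds = TOTAL_HOURS * 3600 / EPISODES
```

With the listed totals, `mean_clip_seconds` comes out to roughly 2.45 seconds per recording, which is consistent with short spoken commands.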