cagataydev · 2026 · cc-by-4.0

VLM Robotics Voice Commands (Audio)

A dataset of 9,999 audio recordings of natural speech commands for controlling robots, covering tasks like pick & place, navigation, manipulation, and multi-step instructions across 10 command categories.

Downloads: 66
Episodes: 9999
Hours: 6.8

Why This Matters for Physical AI

This dataset enables training of omni-modal Vision-Language Models that can understand natural spoken commands for robot control, bridging the gap between human communication and embodied AI systems.

Technical Profile

Modalities: audio
Action Space: language
Task Types: pick_and_place, manipulation, navigation, observation, multi-step_tasks, spatial_commands, safety, household_chores, conversational_feedback
Episodes: 9999
Total Hours: 6.8
Data Format: WAV
Annotation Types: language_instructions, category_labels, difficulty_labels
License: cc-by-4.0
Part of the VLM Robotics Voice Commands (Audio) family
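
As a rough sanity check on the profile above, the stated totals (9999 episodes, 6.8 hours) imply an average clip of about 2.4 seconds. The sketch below computes that figure and shows one way to measure the duration of a WAV clip with Python's standard `wave` module. The helper names, the 16 kHz mono 16-bit format, and the synthetic silent clip are all assumptions for illustration; the card does not state the sample rate, channel count, or file layout of the actual recordings.

```python
import io
import wave

# Figures stated on the dataset card.
EPISODES = 9999
TOTAL_HOURS = 6.8


def mean_clip_seconds(total_hours: float, episodes: int) -> float:
    """Average clip length implied by the card's totals."""
    return total_hours * 3600 / episodes


def wav_duration_seconds(source) -> float:
    """Duration of a PCM WAV file, given a path or a binary file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Hypothetical stand-in for a dataset clip: mono, 16-bit silence.

    The 16 kHz rate is an assumption, not a property documented on the card.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(f"Implied mean clip length: {mean_clip_seconds(TOTAL_HOURS, EPISODES):.2f} s")
    print(f"Synthetic clip length: {wav_duration_seconds(make_silent_wav(2.5)):.2f} s")
```

With the real files downloaded, `wav_duration_seconds` could be pointed at each clip's path and the results summed to compare against the stated 6.8 hours.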
