cagataydev | 2026 | cc-by-4.0

VLM Robotics Voice Commands (Text)

50,000 curated natural language commands for Vision-Language-Model robot control, covering 10 categories of embodied interaction including pick-and-place, manipulation, navigation, and human-robot interaction.

Downloads: 26
Episodes: 50,000

Why This Matters for Physical AI

This dataset enables training Vision-Language Models to understand diverse natural language commands for robot control, supporting research in embodied AI and human-robot interaction through text-based instruction understanding.

Technical Profile

Modalities: language
Action Space: language
Task Types: pick_and_place, manipulation, navigation, multistep, observation, household
Episodes: 50,000
Annotation Types: language_instructions, task_category_labels, difficulty_labels
License: cc-by-4.0
Part of the VLM Robotics Voice Commands (Text) family
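
To make the annotation types above concrete, here is a minimal Python sketch of how one text command with its task category and difficulty label might be represented and filtered. The field names (instruction, task_category, difficulty), the difficulty values, and the sample rows are all assumptions made up for illustration; the dataset's actual column names and contents are not stated on this page and may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandRecord:
    # Hypothetical field names; the real dataset columns may be named differently.
    instruction: str    # corresponds to the language_instructions annotation
    task_category: str  # corresponds to the task_category_labels annotation
    difficulty: str     # corresponds to the difficulty_labels annotation (values assumed)


# Made-up rows for illustration only; these are NOT taken from the dataset.
SAMPLE = [
    CommandRecord("Pick up the red cup and place it on the shelf.", "pick_and_place", "easy"),
    CommandRecord("Go to the kitchen and stop next to the fridge.", "navigation", "easy"),
    CommandRecord("Open the drawer, take the spoon, then close the drawer.", "multistep", "hard"),
]


def filter_by_category(records, category):
    """Return only the records whose task category matches `category`."""
    return [r for r in records if r.task_category == category]


def category_counts(records):
    """Count how many records fall into each task category."""
    return Counter(r.task_category for r in records)


if __name__ == "__main__":
    for record in filter_by_category(SAMPLE, "navigation"):
        print(record.instruction)
    print(dict(category_counts(SAMPLE)))
```

Because the data is language-only, a flat record like this is likely enough for text-based instruction understanding; pairing commands with images or robot actions would need additional fields that this page does not describe.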
