keplerccc · 2025 · apache-2.0

Robo2VLM-Reasoning

A visual question-answering dataset derived from large-scale, in-the-wild robot manipulation data, augmented with reasoning traces generated by Gemini-2.5-Pro that support the correct multiple-choice answers.

Downloads: 150
Episodes: 5150

Why This Matters for Physical AI

This dataset supports training vision-language models to reason about robot manipulation tasks. It pairs real robot imagery with multiple-choice questions and reasoning traces, so a model can learn both to answer correctly and to explain its answer.

Technical Profile

Modalities: rgb, language
Task Types: manipulation, visual-question-answering
Episodes: 5150
Annotation Types: language_instructions, reasoning
License: apache-2.0
Part of the Robo2VLM family
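
To make the annotation types above concrete, here is a minimal sketch of how one multiple-choice question with a reasoning trace might be turned into a prompt/target pair for fine-tuning. The record layout (`VQAExample` and its fields `question`, `choices`, `answer_index`, `reasoning`) and the helper `format_example` are hypothetical names chosen for illustration; the dataset's actual column names and format may differ, and the image is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


@dataclass
class VQAExample:
    """Hypothetical record layout; the real dataset's fields may differ."""
    question: str
    choices: List[str]
    answer_index: int  # position of the correct choice in `choices`
    reasoning: str     # reasoning trace that supports the correct answer


def format_example(ex: VQAExample) -> Dict[str, str]:
    """Build a text prompt and a target that ends with the answer letter."""
    if not 0 <= ex.answer_index < len(ex.choices):
        raise ValueError("answer_index is out of range for choices")
    if len(ex.choices) > len(LETTERS):
        raise ValueError("too many choices to label with single letters")

    # Label each option A, B, C, ... in the order given.
    options = "\n".join(
        f"{LETTERS[i]}. {choice}" for i, choice in enumerate(ex.choices)
    )
    prompt = (
        f"{ex.question}\n{options}\n"
        "Answer with the letter of the correct option."
    )
    # Reasoning first, final answer last, so the answer is easy to parse.
    target = f"{ex.reasoning}\nAnswer: {LETTERS[ex.answer_index]}"
    return {"prompt": prompt, "target": target}


# Illustrative values only, not taken from the dataset.
example = VQAExample(
    question="Is the gripper closed?",
    choices=["Yes", "No"],
    answer_index=1,
    reasoning="The two fingers are visibly apart.",
)
pair = format_example(example)
```

Placing the reasoning before the final answer line is one common design choice, since it keeps the answer letter in a fixed, easily parsed position at the end of the target.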
