memmelma

PEEK VQA

A dataset of 2M image-QA pairs for fine-tuning PEEK VLM, a vision language model for robotics that predicts trajectory paths and task-relevant masking points for robot manipulation. Answers are generated from Open X-Embodiment datasets using point-based representations normalized to [0,1]².
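Because the answers use points normalized to [0,1]², a consumer usually has to map them back to pixel coordinates before drawing a trajectory or mask on an image. The sketch below shows one way to do that. It is an illustration only: the function name `to_pixels`, the `(x, y)` tuple layout, and the convention that x is horizontal and y is vertical with the origin at the top-left are assumptions, not details confirmed by the dataset card.

```python
from typing import List, Tuple

# Assumed layout: each point is (x, y) with both values in [0, 1].
Point = Tuple[float, float]


def to_pixels(points: List[Point], width: int, height: int) -> List[Tuple[int, int]]:
    """Map normalized (x, y) points to integer pixel coordinates.

    Assumes x runs left to right and y runs top to bottom (hypothetical
    convention; check the dataset's own documentation).
    """
    pixels = []
    for x, y in points:
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError(f"point ({x}, {y}) is outside [0, 1]^2")
        # Clamp so that a coordinate of exactly 1.0 stays inside the image.
        px = min(int(x * width), width - 1)
        py = min(int(y * height), height - 1)
        pixels.append((px, py))
    return pixels


# Example: three points on a 640x480 image.
trajectory = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
print(to_pixels(trajectory, width=640, height=480))
```

Keeping points normalized makes the answers independent of image resolution, so the same annotation can be reused if images are resized during fine-tuning.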

Downloads: 738

Technical Profile

Modalities
rgb, language
Robot Embodiments
Franka Panda, UR5, Jaco, Hydra, EDAN, Fanuc, Stretch
Environment
lab, kitchen
Task Types
manipulation, pick_and_place
Data Format
JSON
Part of the PEEK VQA family

Community Signals

Top 50% by downloads
