memmelma
PEEK VQA
A dataset of 2M image-QA pairs for fine-tuning PEEK VLM, a vision language model for robotics that predicts trajectory paths and task-relevant masking points for robot manipulation. Answers are generated from Open X-Embodiment datasets using point-based representations normalized to [0,1]².
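As a rough illustration of what "point-based representations normalized to [0,1]²" can mean, the sketch below maps pixel coordinates into the unit square and back. The function names are hypothetical, and dividing by image width and height is an assumption; the dataset's actual convention (for example, axis order or dividing by width minus one) is not stated here and may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalize_points(pixels: Sequence[Point], width: int, height: int) -> List[Point]:
    """Map (x, y) pixel coordinates into the unit square [0, 1]^2.

    Assumption: x is divided by the image width and y by the image height.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return [(x / width, y / height) for x, y in pixels]


def denormalize_points(points: Sequence[Point], width: int, height: int) -> List[Point]:
    """Inverse mapping: scale unit-square points back to pixel coordinates."""
    return [(u * width, v * height) for u, v in points]


# Hypothetical trajectory on a 640x480 image, given in pixels.
trajectory_px = [(64, 48), (320, 240), (640, 480)]
normalized = normalize_points(trajectory_px, width=640, height=480)
restored = denormalize_points(normalized, width=640, height=480)
```

Because the normalized points do not depend on resolution, the same answer can be rescaled to any image size with `denormalize_points`.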
Downloads: 738
Technical Profile
- Modalities: rgb, language
- Robot Embodiments: Franka Panda, UR5, Jaco, Hydra, EDAN, Fanuc, Stretch
- Environment: lab, kitchen
- Task Types: manipulation, pick_and_place
- Data Format: JSON
Community Signals
Top 50% by downloads
Access