ghazishazan · 2025 · cc-by-nc-4.0

VPoS-Bench: Video Pointing and Segmentation Benchmark

A challenging out-of-distribution benchmark for evaluating spatio-temporal pointing and reasoning capabilities of video-language models across five real-world application domains with fine-grained point-level and segmentation annotations.

Downloads: 509
Likes: 2

Why This Matters for Physical AI

VPoS-Bench evaluates how well video-language models understand and localize objects in robotic manipulation videos through spatio-temporal grounding. This capability is essential for embodied AI systems that must follow visual instructions.

Technical Profile

Modalities
rgb, language, segmentation
Robot Embodiments
robotic_manipulator
Environment
lab, simulation
Task Types
manipulation, cell_tracking, object_tracking, pointing
Data Format
JSON
Annotation Types
language_instructions, point_annotations, segmentation
License
cc-by-nc-4.0
Part of the VideoMolmo family
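To illustrate how point-level and segmentation annotations can be combined when scoring a pointing model, here is a minimal sketch. It assumes a hypothetical JSON record layout (the field names `instruction`, `frames`, `frame_index`, and `mask` are illustrative, not the published VPoS-Bench schema) and a hypothetical hit-rate metric in which a predicted point counts as correct if it lands on a foreground pixel of the ground-truth mask. Consult the benchmark's own documentation for its actual format and metrics.

```python
import json

# Hypothetical annotation record: field names are illustrative assumptions,
# NOT the official VPoS-Bench schema. Masks are binary, indexed as mask[y][x].
SAMPLE = """
{
  "instruction": "point to the gripper",
  "frames": [
    {"frame_index": 0,
     "mask": [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]}
  ]
}
"""


def point_in_mask(point, mask):
    """Return True if the (x, y) point falls on a foreground pixel of the mask."""
    x, y = point
    if y < 0 or y >= len(mask) or x < 0 or x >= len(mask[0]):
        return False  # points outside the frame are treated as misses
    return mask[y][x] == 1


def point_hit_rate(record, predictions):
    """Hypothetical metric: fraction of frames whose predicted point hits the mask.

    `predictions` maps frame_index -> (x, y); frames without a prediction count as misses.
    """
    frames = record["frames"]
    if not frames:
        return 0.0
    hits = sum(
        1
        for frame in frames
        if frame["frame_index"] in predictions
        and point_in_mask(predictions[frame["frame_index"]], frame["mask"])
    )
    return hits / len(frames)


record = json.loads(SAMPLE)
print(point_hit_rate(record, {0: (3, 2)}))  # inside the mask -> 1.0
print(point_hit_rate(record, {0: (0, 0)}))  # background pixel -> 0.0
```

Scoring points against masks rather than against a single ground-truth point tolerates any prediction that lands anywhere on the target object, which is one common way to evaluate pointing when segmentation labels are available.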
