ghazishazan · 2025 · cc-by-nc-4.0
VPoS-Bench: Video Pointing and Segmentation Benchmark
A challenging out-of-distribution benchmark for evaluating spatio-temporal pointing and reasoning capabilities of video-language models across five real-world application domains with fine-grained point-level and segmentation annotations.
Downloads: 509
Likes: 2
Why This Matters for Physical AI
VPoS-Bench enables evaluation of video-language models' ability to understand and localize objects in robotic manipulation tasks through spatio-temporal grounding, which is essential for developing embodied AI systems that can follow visual instructions.
Technical Profile
- Modalities: rgb, language, segmentation
- Robot Embodiments: robotic_manipulator
- Environment: lab, simulation
- Task Types: manipulation, cell_tracking, object_tracking, pointing
- Data Format: JSON
- Annotation Types: language_instructions, point_annotations, segmentation
- License: cc-by-nc-4.0
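Because the benchmark pairs point annotations with segmentation masks, one natural way to score a pointing prediction is to check whether the predicted point lands inside the ground-truth mask. The sketch below illustrates that idea only. This card does not state the benchmark's official metric or its JSON schema, so the function names `point_hits_mask` and `pointing_accuracy`, the `(x, y)` pixel convention, and the nested-list mask layout are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed layout: a binary mask as rows of 0/1, indexed mask[y][x].
Mask = List[List[int]]
Point = Tuple[int, int]  # assumed (x, y) pixel coordinates


def point_hits_mask(point: Point, mask: Mask) -> bool:
    """Return True if the point lies on a foreground pixel of the mask."""
    x, y = point
    # Points outside the frame (or an empty mask) count as misses.
    if y < 0 or y >= len(mask) or x < 0 or x >= len(mask[0]):
        return False
    return mask[y][x] == 1


def pointing_accuracy(points: Sequence[Point], masks: Sequence[Mask]) -> float:
    """Fraction of predicted points that fall inside their paired mask."""
    if len(points) != len(masks):
        raise ValueError("each predicted point needs exactly one mask")
    if not points:
        return 0.0
    hits = sum(point_hits_mask(p, m) for p, m in zip(points, masks))
    return hits / len(points)


if __name__ == "__main__":
    # Hypothetical 3x3 frame with the object in the lower-right corner.
    mask = [
        [0, 0, 0],
        [0, 1, 1],
        [0, 1, 1],
    ]
    print(pointing_accuracy([(1, 1), (0, 0)], [mask, mask]))
```

For video, the same check could be applied per frame and averaged over time, again as an assumption rather than a description of the benchmark's actual protocol.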
Community Signals
Top 25% by downloads