nvidia | 2025 | nvidia-noncommercial-license

R4D-Bench

A region-level 4D video question answering benchmark with 1,419 region-prompted multiple-choice VQA pairs built from dynamic real-world videos. It challenges models to track specific regions in a video, reason about their depth, and understand how they change over time.

Downloads: 26
Episodes: 1,419
Likes: 4

Why This Matters for Physical AI

This benchmark tests whether models understand 4D spatiotemporal dynamics at the region level, a capability critical for physical AI tasks that require precise depth perception, motion tracking, and reasoning about object kinematics in real-world scenarios.

Technical Profile

Modalities: rgb, language
Environment: outdoor, urban
Task Types: visual_question_answering, 3d_grounding, spatial_reasoning, motion_understanding
Episodes: 1,419
Data Format: json
Annotation Types: language_instructions, bounding_boxes, action_labels
License: nvidia-noncommercial-license
Part of the R4D-Bench family
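
The profile above lists JSON as the data format and multiple-choice VQA with bounding-box annotations as the task, so a scoring loop can be sketched in a few lines. What follows is only a minimal sketch under stated assumptions: the field names (`video`, `region`, `question`, `choices`, `answer`), the box layout, and the two sample records are hypothetical and invented for illustration. They are not the published R4D-Bench schema, so check the actual files before adapting this.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical records: field names and values are made up for illustration
# and do not come from the R4D-Bench release.
SAMPLE_JSON = """
[
  {"video": "clip_a.mp4", "region": [120, 80, 260, 210],
   "question": "Is the marked region moving toward or away from the camera?",
   "choices": ["Toward", "Away", "Stationary", "Cannot tell"], "answer": 1},
  {"video": "clip_b.mp4", "region": [40, 60, 180, 300],
   "question": "Which region is closest to the camera at the end of the clip?",
   "choices": ["Region 1", "Region 2", "Region 3", "Region 4"], "answer": 2}
]
"""


@dataclass
class RegionQA:
    video: str
    region: List[int]   # assumed [x_min, y_min, x_max, y_max] in pixels
    question: str
    choices: List[str]
    answer: int         # assumed index into `choices`


def load_records(text: str) -> List[RegionQA]:
    """Parse a JSON array of question records into typed objects."""
    return [RegionQA(**item) for item in json.loads(text)]


def accuracy(records: List[RegionQA], predictions: List[int]) -> float:
    """Fraction of records whose predicted choice index matches the answer."""
    if len(records) != len(predictions):
        raise ValueError("need exactly one prediction per record")
    if not records:
        return 0.0
    correct = sum(1 for rec, pred in zip(records, predictions) if rec.answer == pred)
    return correct / len(records)


if __name__ == "__main__":
    records = load_records(SAMPLE_JSON)
    # Predictions would normally come from a model; these are placeholders.
    print(accuracy(records, [1, 0]))
```

Plain accuracy over choice indices is used here because the benchmark is described as multiple-choice; any per-task breakdown would depend on fields the real files may or may not contain.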
