nvidia | 2025 | nvidia-noncommercial-license
R4D-Bench
A region-level 4D video question answering benchmark with 1,419 region-prompted multiple-choice VQA pairs built from dynamic real-world videos. It challenges models to track specific regions in a video, reason about their depth, and understand their temporal dynamics.
Downloads: 26
Episodes: 1419
Likes: 4
Why This Matters for Physical AI
This dataset advances physical AI by enabling models to understand 4D spatial-temporal dynamics at the region level, critical for tasks requiring precise depth perception, motion tracking, and reasoning about object kinematics in real-world scenarios.
Technical Profile
- Modalities: rgb, language
- Environment: outdoor, urban
- Task Types: visual_question_answering, 3d_grounding, spatial_reasoning, motion_understanding
- Episodes: 1419
- Data Format: json
- Annotation Types: language_instructions, bounding_boxes, action_labels
- License: nvidia-noncommercial-license
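Since the profile lists a JSON data format with multiple-choice questions and bounding-box annotations, the following is a minimal sketch of how such records might be loaded and scored. The record layout is an assumption made for illustration only: the field names `id`, `question`, `region_bbox`, `choices`, and `answer`, the `[x_min, y_min, x_max, y_max]` box convention, and the sample questions are all hypothetical and are not taken from the dataset's documentation.

```python
import json

# Hypothetical records: every field name and value here is an assumption
# for illustration, not the dataset's documented schema.
SAMPLE_JSON = """
[
  {"id": "q1",
   "question": "Is the marked region moving toward the camera?",
   "region_bbox": [10, 20, 110, 220],
   "choices": ["A", "B", "C", "D"],
   "answer": "B"},
  {"id": "q2",
   "question": "Which direction does the marked region move?",
   "region_bbox": [50, 60, 150, 260],
   "choices": ["A", "B", "C", "D"],
   "answer": "C"}
]
"""


def load_records(text):
    """Parse a JSON array of question records into a list of dicts."""
    return json.loads(text)


def accuracy(records, predictions):
    """Return the fraction of records whose predicted choice matches the answer.

    `predictions` maps a record id to the predicted choice letter.
    Records with no prediction count as incorrect.
    """
    if not records:
        return 0.0
    correct = sum(1 for r in records if predictions.get(r["id"]) == r["answer"])
    return correct / len(records)


records = load_records(SAMPLE_JSON)
predictions = {"q1": "B", "q2": "A"}  # hypothetical model outputs
score = accuracy(records, predictions)
```

Counting unanswered questions as incorrect keeps the denominator fixed at the number of records, so scores stay comparable across models that skip different questions.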