bhatvineetMIT

BOPASK-Test

A human-verified evaluation benchmark for spatial-reasoning visual question answering on robotic grasping and pose estimation tasks. It contains 934 question-answer pairs across core (HANDAL, HOPE, YCB-V) and lab (home/in-the-wild) test sets, with RGB images, depth maps, and segmentation masks.

Downloads: 11

Technical Profile

Modalities
rgb, depth, segmentation
Environment
lab, home
Task Types
grasping, pose_estimation, visual_question_answering, spatial_reasoning, trajectory_prediction, depth_estimation, object_rearrangement
Data Format
JSON
License
MIT
Part of the BOPASK-Test family
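Since the annotations ship as JSON, loading and slicing them by task type is straightforward. The sketch below is illustrative only: the field names (`image_id`, `testset`, `task_type`, `question`, `answer`) and the inline sample records are assumptions, not the documented BOPASK-Test schema.

```python
import json

# Hypothetical records mimicking what one BOPASK-Test annotation entry
# might look like; the real schema may differ.
sample_annotations = json.loads("""
[
  {"image_id": "hope_0001", "testset": "core",
   "task_type": "grasping",
   "question": "Which object is closest to the gripper?",
   "answer": "mustard bottle"},
  {"image_id": "lab_0042", "testset": "lab",
   "task_type": "pose_estimation",
   "question": "Is the mug upright?",
   "answer": "yes"}
]
""")

def filter_by_task(pairs, task_type):
    """Return only the question-answer pairs for one task type."""
    return [p for p in pairs if p["task_type"] == task_type]

grasping_pairs = filter_by_task(sample_annotations, "grasping")
print(len(grasping_pairs))  # count of grasping questions in this sample
```

In practice you would replace the inline string with `json.load(open("annotations.json"))` pointing at the downloaded file.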

Access

