Multi-View Manipulation Dataset
Synchronized multi-camera robot manipulation recordings from 3-5 calibrated viewpoints, with 3D annotations for training spatial manipulation policies.
Comparison with Public Datasets
How Claru's dataset compares to publicly available alternatives.
| Dataset | Clips | Hours | Modalities | Environments | Annotations |
|---|---|---|---|---|---|
| RLBench | 100K | ~50 | RGB-D (sim) | Simulated | Actions, keypoints |
| ManiSkill2 | 200K | ~100 | RGB-D (sim) | Simulated | Actions, rewards |
| Claru Multi-View | 40K+ | 250+ | RGB, depth, point clouds | 20+ real setups | 3D poses, actions, camera params |
Use Cases
3D-Aware Policies
Policies that reason about full 3D geometry for robust grasping. Example models: PerAct, Act3D, 3D Diffusion Policy.
Neural Radiance Fields
Scene representations from multi-view captures for robotics. Example models: NeRF-RL, Ditto, F3RM.
Point Cloud Manipulation
Direct 3D processing for manipulation planning. Example models: PointNet++, Contact-GraspNet, VoxPoser.
How Claru Delivers This Data
Claru's multi-view rigs combine 3-5 RGB-D cameras with precise intrinsic and extrinsic calibration, enabling registered point cloud reconstruction across viewpoints. This bridges the gap between simulation-heavy 3D datasets and real-world training needs.
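To illustrate what registered reconstruction means in practice, the sketch below back-projects depth maps from two calibrated cameras into a single world-frame point cloud. The intrinsics, extrinsics, and depth values are toy placeholders, not actual Claru calibration data.

```python
# Sketch: fuse depth maps from multiple calibrated cameras into one
# world-frame point cloud. All camera parameters here are illustrative.
import numpy as np

def depth_to_world(depth, K, T_world_cam):
    """Back-project a depth map (meters) to world-frame 3D points.

    K           : 3x3 intrinsic matrix
    T_world_cam : 4x4 camera-to-world extrinsic transform
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0                                 # skip missing depth readings
    # Pixel coordinates -> camera-frame points via the inverse intrinsics
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    cam = (np.linalg.inv(K) @ pix) * z
    cam_h = np.vstack([cam, np.ones(h * w)])      # homogeneous coordinates
    world = (T_world_cam @ cam_h)[:3].T
    return world[valid]

# Two toy cameras: identity pose, and a 0.5 m translation along x
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
T0 = np.eye(4)
T1 = np.eye(4)
T1[0, 3] = 0.5
depth = np.full((480, 640), 1.0)                  # flat plane at 1 m depth

# Because both views share one world frame, the clouds register directly
cloud = np.vstack([depth_to_world(depth, K, T0),
                   depth_to_world(depth, K, T1)])
print(cloud.shape)                                # one merged point cloud
```

With real calibration, the per-camera transforms come from the dataset's extrinsic parameters rather than being constructed by hand.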
Frequently Asked Questions
How many cameras are used per recording?
3 to 5 calibrated RGB-D cameras, with full extrinsic and intrinsic parameters per frame.
What 3D annotations are included?
6-DoF object poses, 3D bounding boxes, registered point clouds, surface normals, and camera-to-world transformations.
Can the data be used for neural 3D reconstruction?
Yes. Multi-view images with known camera parameters are directly usable for NeRF, 3D Gaussian Splatting, and other neural 3D methods.
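As one concrete route, known intrinsics and extrinsics can be packaged into an instant-ngp-style `transforms.json` for NeRF training. The field names below follow the instant-ngp convention; the file paths, focal length, and poses are placeholders, not Claru's actual schema.

```python
# Sketch: build an instant-ngp-style transforms.json from per-view camera
# parameters. All values and paths here are hypothetical placeholders.
import json
import math
import numpy as np

fx, width = 600.0, 640              # example intrinsics
frames = []
for i in range(3):                  # one entry per synchronized viewpoint
    T_world_cam = np.eye(4)
    T_world_cam[0, 3] = 0.3 * i     # toy camera-to-world extrinsics
    frames.append({
        "file_path": f"images/cam{i}_frame0000.png",
        "transform_matrix": T_world_cam.tolist(),
    })

transforms = {
    # Horizontal field of view derived from focal length and image width
    "camera_angle_x": 2 * math.atan(width / (2 * fx)),
    "frames": frames,
}
with open("transforms.json", "w") as f:
    json.dump(transforms, f, indent=2)
```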
Request a Sample Pack
Get a curated sample of multi-view manipulation data with full annotations to evaluate for your project.