Physical AI Datasets

The most comprehensive directory of open-source datasets for training robot manipulation policies, VLA models, world models, and embodied AI systems. Search by modality, robot platform, environment, and task type.

421 open-source datasets

Claru collection · Not open-source, available on request

Purpose-Built Datasets by Claru

These are not open-source datasets. Claru collects custom physical AI training data with dense human annotations. Contact us to request samples or discuss collection.

Egocentric Kitchen Video Dataset

First-person video of real kitchen activities — cooking, cleaning, organizing — captured across diverse home and commercial kitchen layouts with dense manipulation annotations for training robotic kitchen assistants and embodied AI systems.

rgb · depth · 120,000+ clips

Egocentric Warehouse Video Dataset

First-person video of real warehouse operations — picking, packing, sorting, and navigation — captured across diverse fulfillment center layouts with logistics-specific annotations for training warehouse robotics and AMR systems.

rgb · depth · 85K+ clips

Egocentric Outdoor Urban Video Dataset

First-person video of urban pedestrian environments — sidewalks, crosswalks, plazas — captured across 30+ cities with navigation annotations for training delivery robots and outdoor autonomous systems.

rgb · depth · imu · 95K+ clips

Egocentric Retail Video Dataset

First-person video of real retail environments — grocery stores, pharmacies, department stores — with product interaction annotations for training retail automation AI.

rgb · depth · 70K+ clips

Egocentric Office Video Dataset

First-person video of real office environments — desks, meeting rooms, corridors — with workplace activity annotations for training telepresence robots and office automation AI.

rgb · depth · 60K+ clips

Teleoperation Kitchen Dataset

Robot teleoperation data from real kitchen environments — synchronized camera-action-force triplets for training cooking robot manipulation policies.

rgb · depth · force-torque · 45K+ clips

Teleoperation Warehouse Dataset

Robot teleoperation data from real warehouse environments — pick-and-place trajectories with force sensing for training logistics manipulation policies.

rgb · depth · force-torque · 55K+ clips

Teleoperation Tabletop Dataset

Robot teleoperation data for tabletop manipulation — sorting, stacking, tool use — with synchronized camera-action-force triplets for training general-purpose policies.

rgb · depth · force-torque · 80K+ clips

Multi-View Manipulation Dataset

Synchronized multi-camera robot manipulation recordings — 3-5 calibrated viewpoints — with 3D annotations for training spatial manipulation policies.

rgb · depth · point-cloud · 40K+ clips

RGB-D Kitchen Dataset

Paired RGB and depth video from real kitchen environments with registered depth maps and 3D annotations for training depth-aware kitchen robots.

rgb · depth · 50K+ clips

RGB-D Manipulation Dataset

Paired RGB-D recordings of robot manipulation with 3D grasp annotations and force measurements for training depth-aware grasping policies.

rgb · depth · force-torque · 65K+ clips

Game Environment Dataset

High-fidelity video from game engines with pixel-perfect ground truth for pre-training vision models, world models, and sim-to-real transfer.

rgb · depth · point-cloud · 66K+ clips

Synthetic Manipulation Dataset

Procedurally generated manipulation trajectories from physics simulators with perfect state information for scalable robot policy pre-training.

rgb · depth · point-cloud · 200K+ clips

Dashcam Urban Dataset

Forward-facing dashcam video from urban driving environments with traffic annotations for training autonomous driving perception and world models.

rgb · imu · 100K+ clips

Aerial Agricultural Dataset

Drone-captured agricultural imagery with crop health annotations for training agricultural robotics and precision farming AI.

rgb · thermal · 30K+ clips

Egocentric Construction Video Dataset

First-person construction site video for training construction robotics and safety monitoring AI. 55K+ clips across 20+ site types with PPE detection, tool usage, and structural progress annotations.

rgb · depth · imu · 55K+ clips

Egocentric Restaurant Video Dataset

First-person restaurant environment video for training food service robots and hospitality automation. 45K+ clips across 15+ restaurant types with food handling, plating, and service workflow annotations.

rgb · depth · 45K+ clips

Egocentric Healthcare Video Dataset

First-person healthcare environment video for training medical assistance robots and clinical workflow AI. 35K+ clips from 10+ clinical settings with instrument tracking, procedure phase, and sterile field annotations.

rgb · depth · 35K+ clips

Multi-View Assembly Dataset

Synchronized multi-camera recordings of assembly tasks for training 3D-aware manipulation policies. 30K+ trajectories across 15+ assembly configurations with part tracking, insertion state, and 3D pose annotations.

rgb · depth · point-cloud · 30K+ clips

Thermal Industrial Dataset

Paired thermal-RGB imaging from industrial environments for training predictive maintenance robots and safety monitoring systems. 25K+ clips across 10+ facility types with thermal anomaly and equipment health annotations.

rgb · thermal · 25K+ clips

Egocentric Workshop Video Dataset

First-person workshop and maker-space video for training tool-use robots and craft manipulation AI. 40K+ clips across 20+ workshop types with tool grasp, material transformation, and assembly sequence annotations.

rgb · depth · imu · 40K+ clips

Point Cloud Indoor Dataset

Dense indoor point cloud scans with semantic annotations for training 3D scene understanding and indoor navigation. 15K+ scans across 500+ rooms with per-point semantic labels, instance segmentation, and room layout annotations.

rgb · depth · point-cloud · 15K+ clips

Multi-Sensor Warehouse Dataset

Synchronized RGB, depth, LiDAR, and IMU data from warehouse environments for training autonomous mobile robots and pick-pack-ship automation. 50K+ clips across 25+ warehouse configurations.

rgb · depth · lidar · 50K+ clips

Egocentric Agricultural Video Dataset

First-person video from agricultural settings for training harvesting robots and crop monitoring AI. 35K+ clips across 12+ farm types with dense manipulation and crop-state annotations.

rgb · depth · imu · 35K+ clips

Stereo Outdoor Dataset

Calibrated stereo camera pairs from outdoor environments for training depth estimation and terrain-aware navigation. 40K+ clips across 15+ terrain types with disparity maps, traversability labels, and obstacle annotations.

rgb · stereo · imu · 40K+ clips

Egocentric Lab Video Dataset

First-person video of real laboratory workflows — pipetting, centrifuging, microscopy, sample handling — captured across diverse wet labs, dry labs, and cleanrooms with dense manipulation annotations for training robotic lab assistants and scientific automation systems.

rgb · depth · 85,000+ clips

Egocentric Outdoor Sports Video Dataset

First-person video of real outdoor sports activities — cycling, climbing, skiing, running, kayaking — captured with wearable cameras across diverse terrain and weather conditions with dense action and body pose annotations.

rgb · imu · 95,000+ clips

Egocentric Assembly Line Video Dataset

First-person video of real manufacturing assembly tasks — part insertion, fastening, wiring, inspection — captured across diverse production facilities with step-level process annotations for training industrial cobots and quality monitoring AI.

rgb · depth · 70,000+ clips

Urban LiDAR Point Cloud Dataset

Dense LiDAR scans of real urban environments — streets, intersections, parking structures, pedestrian zones — captured across diverse cities with 3D bounding boxes, semantic segmentation, and lane-level annotations for training autonomous navigation and urban mapping systems.

lidar · rgb · 60,000+ clips

Warehouse LiDAR Point Cloud Dataset

Dense LiDAR scans of real warehouse and logistics facilities — aisles, shelving units, pallet racks, loading docks — with 3D annotations for shelving geometry, pallet positions, obstacle detection, and navigable paths for training autonomous mobile robots.

lidar · rgb · 45,000+ clips

Force-Torque Manipulation Dataset

Force-torque sensor recordings from real manipulation tasks — grasping, insertion, polishing, assembly — paired with synchronized RGB video for training contact-aware robot policies.

force-torque · 50K+ clips

Event Camera Manipulation Dataset

High-temporal-resolution event camera recordings of manipulation tasks — fast grasping, dynamic catching, tool use — for training reactive robot policies that require microsecond-level visual feedback.

event-camera · 40K+ clips

Multi-View Kitchen Dataset

Synchronized multi-camera recordings of real kitchen activities from 4-8 viewpoints — enabling 3D reconstruction, novel view synthesis, and multi-perspective manipulation training for kitchen robotics.

rgb · 65K+ clips

Thermal Outdoor Dataset

Thermal infrared video of outdoor environments — pedestrians, vehicles, wildlife, terrain — captured across day/night and seasonal conditions for training robust perception systems that operate beyond the visible spectrum.

thermal · 55K+ clips

Stereo Manipulation Dataset

Calibrated stereo camera recordings of manipulation tasks — pick-and-place, assembly, tool use — providing dense depth estimation ground truth for training depth-aware robot policies.

stereo · 60K+ clips

Outdoor Point Cloud Dataset

Dense 3D point clouds of outdoor environments — parks, construction sites, agricultural fields, forests — from LiDAR scanning and photogrammetry for training outdoor navigation and terrain analysis models.

point-cloud · 35K+ clips

Highway Dashcam Dataset

Continuous dashcam video from highway and freeway driving across diverse road conditions — multi-lane traffic, merging, construction zones, weather events — with lane-level annotations for training highway driving assistance and autonomous highway systems.

rgb · 110K+ clips

Aerial Inspection Dataset

Drone-captured video of infrastructure inspection — bridges, powerlines, solar farms, building facades, cell towers — with defect annotations and structural element labels for training automated aerial inspection AI.

rgb · 45K+ clips

Underwater Inspection Dataset

Underwater video from ROVs and divers inspecting subsea infrastructure — pipelines, offshore platforms, ship hulls, port structures — with corrosion, biofouling, and structural damage annotations.

rgb · 30K+ clips

Synthetic Household Dataset

Photorealistic synthetic renders of household environments — living rooms, bedrooms, bathrooms, garages — with perfect ground truth annotations for pre-training household robot perception before real-world fine-tuning.

rgb · 200K+ clips

Need Custom Physical AI Data?

Claru builds custom datasets for any robotics application. Tell us your model architecture, target environment, and data requirements.