Robotics Training Datasets

Purpose-built datasets for training robot manipulation policies, VLA models, world models, and embodied AI systems. Each dataset ships with dense annotations and can be delivered in your preferred format.

40 datasets available

Egocentric Kitchen Video Dataset

First-person video of real kitchen activities — cooking, cleaning, organizing — captured across diverse home and commercial kitchen layouts with dense manipulation annotations for training robotic kitchen assistants and embodied AI systems.

rgb · depth · 120,000+ clips

Egocentric Warehouse Video Dataset

First-person video of real warehouse operations — picking, packing, sorting, and navigation — captured across diverse fulfillment center layouts with logistics-specific annotations for training warehouse robotics and AMR systems.

rgb · depth · 85K+ clips

Egocentric Outdoor Urban Video Dataset

First-person video of urban pedestrian environments — sidewalks, crosswalks, plazas — captured across 30+ cities with navigation annotations for training delivery robots and outdoor autonomous systems.

rgb · depth · imu · 95K+ clips

Egocentric Retail Video Dataset

First-person video of real retail environments — grocery stores, pharmacies, department stores — with product interaction annotations for training retail automation AI.

rgb · depth · 70K+ clips

Egocentric Office Video Dataset

First-person video of real office environments — desks, meeting rooms, corridors — with workplace activity annotations for training telepresence robots and office automation AI.

rgb · depth · 60K+ clips

Teleoperation Kitchen Dataset

Robot teleoperation data from real kitchen environments — synchronized camera-action-force triplets for training cooking robot manipulation policies.

rgb · depth · force-torque · 45K+ clips

Teleoperation Warehouse Dataset

Robot teleoperation data from real warehouse environments — pick-and-place trajectories with force sensing for training logistics manipulation policies.

rgb · depth · force-torque · 55K+ clips

Teleoperation Tabletop Dataset

Robot teleoperation data for tabletop manipulation — sorting, stacking, tool use — with synchronized camera-action-force triplets for training general-purpose policies.

rgb · depth · force-torque · 80K+ clips
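As a rough sketch of how one synchronized camera-action-force sample from a teleoperation dataset like these might be represented in code — all field names, shapes, and units below are illustrative assumptions, not the actual delivery schema:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TeleopSample:
    """One synchronized timestep from a hypothetical teleoperation episode."""
    rgb: np.ndarray       # (H, W, 3) uint8 camera frame
    depth: np.ndarray     # (H, W) float32 depth in meters
    action: np.ndarray    # (7,) commanded end-effector pose + gripper state
    wrench: np.ndarray    # (6,) force-torque reading: fx, fy, fz, tx, ty, tz
    timestamp: float      # seconds since episode start

# Construct a dummy sample and sanity-check the expected shapes.
sample = TeleopSample(
    rgb=np.zeros((480, 640, 3), dtype=np.uint8),
    depth=np.zeros((480, 640), dtype=np.float32),
    action=np.zeros(7),
    wrench=np.zeros(6),
    timestamp=0.0,
)
assert sample.rgb.shape == (480, 640, 3)
assert sample.wrench.shape == (6,)
```

A policy-training loader would typically iterate episodes of such samples and batch the aligned (observation, action) pairs.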

Multi-View Manipulation Dataset

Synchronized multi-camera robot manipulation recordings — 3-5 calibrated viewpoints — with 3D annotations for training spatial manipulation policies.

rgb · depth · point-cloud · 40K+ clips
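Calibrated multi-view data is typically consumed by projecting 3D annotations into each camera with the standard pinhole model. A minimal sketch — the intrinsics and extrinsics here are made-up values, not this dataset's calibration:

```python
import numpy as np

def project_point(K: np.ndarray, R: np.ndarray, t: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Project a 3D world point X to pixel coordinates using intrinsics K
    and extrinsics (R, t), via the standard pinhole camera model."""
    X_cam = R @ X + t        # world -> camera coordinates
    uvw = K @ X_cam          # camera -> homogeneous pixel coordinates
    return uvw[:2] / uvw[2]  # perspective divide

# Illustrative calibration: 500 px focal length, principal point at (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)

# A point 2 m straight ahead lands on the principal point.
print(project_point(K, R, t, np.array([0.0, 0.0, 2.0])))  # [320. 240.]
```

The same routine, run per viewpoint, lets you overlay a single 3D annotation on all synchronized views.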

RGB-D Kitchen Dataset

Paired RGB and depth video from real kitchen environments with registered depth maps and 3D annotations for training depth-aware kitchen robots.

rgb · depth · 50K+ clips

RGB-D Manipulation Dataset

Paired RGB-D recordings of robot manipulation with 3D grasp annotations and force measurements for training depth-aware grasping policies.

rgb · depth · force-torque · 65K+ clips

Game Environment Dataset

High-fidelity video from game engines with pixel-perfect ground truth for pre-training vision models, world models, and sim-to-real transfer.

rgb · depth · point-cloud · 66K+ clips

Synthetic Manipulation Dataset

Procedurally generated manipulation trajectories from physics simulators with perfect state information for scalable robot policy pre-training.

rgb · depth · point-cloud · 200K+ clips

Dashcam Urban Dataset

Forward-facing dashcam video from urban driving environments with traffic annotations for training autonomous driving perception and world models.

rgb · imu · 100K+ clips

Aerial Agricultural Dataset

Drone-captured agricultural imagery with crop health annotations for training agricultural robotics and precision farming AI.

rgb · thermal · 30K+ clips

Egocentric Construction Video Dataset

First-person construction site video for training construction robotics and safety monitoring AI. 55K+ clips across 20+ site types with PPE detection, tool usage, and structural progress annotations.

rgb · depth · imu · 55K+ clips

Egocentric Restaurant Video Dataset

First-person restaurant environment video for training food service robots and hospitality automation. 45K+ clips across 15+ restaurant types with food handling, plating, and service workflow annotations.

rgb · depth · 45K+ clips

Egocentric Healthcare Video Dataset

First-person healthcare environment video for training medical assistance robots and clinical workflow AI. 35K+ clips from 10+ clinical settings with instrument tracking, procedure phase, and sterile field annotations.

rgb · depth · 35K+ clips

Multi-View Assembly Dataset

Synchronized multi-camera recordings of assembly tasks for training 3D-aware manipulation policies. 30K+ trajectories across 15+ assembly configurations with part tracking, insertion state, and 3D pose annotations.

rgb · depth · point-cloud · 30K+ clips

Thermal Industrial Dataset

Paired thermal-RGB imaging from industrial environments for training predictive maintenance robots and safety monitoring systems. 25K+ clips across 10+ facility types with thermal anomaly and equipment health annotations.

rgb · thermal · 25K+ clips

Egocentric Workshop Video Dataset

First-person workshop and maker-space video for training tool-use robots and craft manipulation AI. 40K+ clips across 20+ workshop types with tool grasp, material transformation, and assembly sequence annotations.

rgb · depth · imu · 40K+ clips

Point Cloud Indoor Dataset

Dense indoor point cloud scans with semantic annotations for training 3D scene understanding and indoor navigation. 15K+ scans across 500+ rooms with per-point semantic labels, instance segmentation, and room layout annotations.

rgb · depth · point-cloud · 15K+ scans
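Per-point semantic labels are typically delivered as an array parallel to the point cloud, so selecting one class is a boolean mask. A toy sketch — the label IDs and class names here are invented for illustration, not this dataset's label map:

```python
import numpy as np

# A toy scan: N points with xyz coordinates and a parallel per-point label array.
points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.5],
                   [0.2, 1.0, 2.0],
                   [3.0, 1.0, 0.1]])
labels = np.array([0, 2, 2, 1])  # e.g. 0 = floor, 1 = wall, 2 = furniture (made up)

# Boolean mask selects every point of the "furniture" class.
furniture = points[labels == 2]
print(furniture.shape)  # (2, 3)
```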

Multi-Sensor Warehouse Dataset

Synchronized RGB, depth, LiDAR, and IMU data from warehouse environments for training autonomous mobile robots and pick-pack-ship automation. 50K+ clips across 25+ warehouse configurations.

rgb · depth · lidar · 50K+ clips

Egocentric Agricultural Video Dataset

First-person video from agricultural settings for training harvesting robots and crop monitoring AI. 35K+ clips across 12+ farm types with dense manipulation and crop-state annotations.

rgb · depth · imu · 35K+ clips

Stereo Outdoor Dataset

Calibrated stereo camera pairs from outdoor environments for training depth estimation and terrain-aware navigation. 40K+ clips across 15+ terrain types with disparity maps, traversability labels, and obstacle annotations.

rgb · stereo · imu · 40K+ clips
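Disparity maps like these convert to metric depth through the standard rectified-stereo relation Z = f·B/d (focal length times baseline over disparity). A small sketch with made-up calibration values, not this dataset's rig parameters:

```python
import numpy as np

def disparity_to_depth(disparity: np.ndarray, focal_px: float, baseline_m: float) -> np.ndarray:
    """Convert a disparity map (pixels) to depth (meters) for a rectified
    stereo pair: Z = f * B / d. Zero disparity maps to infinite depth."""
    with np.errstate(divide="ignore"):
        return np.where(disparity > 0, focal_px * baseline_m / disparity, np.inf)

# Illustrative rig: 700 px focal length, 12 cm baseline.
disp = np.array([35.0, 70.0, 0.0])
print(disparity_to_depth(disp, focal_px=700.0, baseline_m=0.12))  # [2.4 1.2 inf]
```

Larger disparity means closer range, which is why nearby obstacles are the easiest case for stereo and far terrain the hardest.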

Egocentric Lab Video Dataset

First-person video of real laboratory workflows — pipetting, centrifuging, microscopy, sample handling — captured across diverse wet labs, dry labs, and cleanrooms with dense manipulation annotations for training robotic lab assistants and scientific automation systems.

rgb · depth · 85,000+ clips

Egocentric Outdoor Sports Video Dataset

First-person video of real outdoor sports activities — cycling, climbing, skiing, running, kayaking — captured with wearable cameras across diverse terrain and weather conditions with dense action and body pose annotations.

rgb · imu · 95,000+ clips

Egocentric Assembly Line Video Dataset

First-person video of real manufacturing assembly tasks — part insertion, fastening, wiring, inspection — captured across diverse production facilities with step-level process annotations for training industrial cobots and quality monitoring AI.

rgb · depth · 70,000+ clips

Urban LiDAR Point Cloud Dataset

Dense LiDAR scans of real urban environments — streets, intersections, parking structures, pedestrian zones — captured across diverse cities with 3D bounding boxes, semantic segmentation, and lane-level annotations for training autonomous navigation and urban mapping systems.

lidar · rgb · 60,000+ clips

Warehouse LiDAR Point Cloud Dataset

Dense LiDAR scans of real warehouse and logistics facilities — aisles, shelving units, pallet racks, loading docks — with 3D annotations for shelving geometry, pallet positions, obstacle detection, and navigable paths for training autonomous mobile robots.

lidar · rgb · 45,000+ clips

Force-Torque Manipulation Dataset

Force-torque sensor recordings from real manipulation tasks — grasping, insertion, polishing, assembly — paired with synchronized RGB video for training contact-aware robot policies.

force-torque · 50K+ clips

Event Camera Manipulation Dataset

High-temporal-resolution event camera recordings of manipulation tasks — fast grasping, dynamic catching, tool use — for training reactive robot policies that require microsecond-level visual feedback.

event-camera · 40K+ clips

Multi-View Kitchen Dataset

Synchronized multi-camera recordings of real kitchen activities from 4-8 viewpoints — enabling 3D reconstruction, novel view synthesis, and multi-perspective manipulation training for kitchen robotics.

rgb · 65K+ clips

Thermal Outdoor Dataset

Thermal infrared video of outdoor environments — pedestrians, vehicles, wildlife, terrain — captured across day/night and seasonal conditions for training robust perception systems that operate beyond the visible spectrum.

thermal · 55K+ clips

Stereo Manipulation Dataset

Calibrated stereo camera recordings of manipulation tasks — pick-and-place, assembly, tool use — providing dense depth estimation ground truth for training depth-aware robot policies.

stereo · 60K+ clips

Outdoor Point Cloud Dataset

Dense 3D point clouds of outdoor environments — parks, construction sites, agricultural fields, forests — from LiDAR scanning and photogrammetry for training outdoor navigation and terrain analysis models.

point-cloud · 35K+ clips

Highway Dashcam Dataset

Continuous dashcam video from highway and freeway driving across diverse road conditions — multi-lane traffic, merging, construction zones, weather events — with lane-level annotations for training highway driving assistance and autonomous highway systems.

rgb · 110K+ clips

Aerial Inspection Dataset

Drone-captured video of infrastructure inspection — bridges, powerlines, solar farms, building facades, cell towers — with defect annotations and structural element labels for training automated aerial inspection AI.

rgb · 45K+ clips

Underwater Inspection Dataset

Underwater video from ROVs and divers inspecting subsea infrastructure — pipelines, offshore platforms, ship hulls, port structures — with corrosion, biofouling, and structural damage annotations.

rgb · 30K+ clips

Synthetic Household Dataset

Photorealistic synthetic renders of household environments — living rooms, bedrooms, bathrooms, garages — with perfect ground truth annotations for pre-training household robot perception before real-world fine-tuning.

rgb · 200K+ clips

Need a Custom Dataset?

Claru builds custom datasets for any robotics application. Tell us your model architecture, target environment, and data requirements.