Egocentric Workshop Video Dataset

First-person workshop and maker-space video for training tool-use robots and craft manipulation AI. 40K+ clips across 20+ workshop types with tool grasp, material transformation, and assembly sequence annotations.

Dataset at a Glance

40K+
Video clips
280+
Hours captured
20+
Workshop environments
11+
Annotation layers

Why Egocentric Workshop Data Matters for Robotics

Tool use is one of the defining capabilities that separates general-purpose manipulation robots from simple pick-and-place systems. A robot that can wield a screwdriver, operate a drill press, apply a clamp, or sand a surface unlocks an enormous range of practical applications in manufacturing, maintenance, repair, and craft production. Training these capabilities requires demonstration data that captures the precise grip forces, approach angles, and visual feedback loops that skilled tool users have internalized through years of practice.

Workshop environments encompass the broadest diversity of tool-material interactions found in any single domain: woodworking shops use hand planes, chisels, routers, and table saws; metal shops involve lathes, mills, welders, and grinders; electronics workshops require soldering irons, oscilloscopes, and precision hand tools; automotive shops combine pneumatic tools, hydraulic lifts, and diagnostic equipment. Each workshop type demands different manipulation strategies, force profiles, and safety protocols.

Claru's egocentric workshop dataset captures skilled craftspeople performing genuine tasks across 20+ workshop types, including woodworking, metalworking, automotive repair, electronics assembly, 3D printing labs, CNC shops, welding shops, upholstery studios, bicycle repair, plumbing workshops, HVAC service, appliance repair, jewelry making, ceramics studios, glass blowing, leather working, and general maker-spaces. This breadth is critical because tool-use skills transfer between domains -- a robot that learns proper screwdriver technique in electronics assembly can generalize to furniture assembly if the training data captures sufficient tool diversity.

Research from CoRL 2023 and RSS 2024 shows that tool-use policies trained on diverse egocentric demonstration data exhibit 40-60% better cross-task generalization compared to policies trained in simulation or on single-workshop data, because real-world tool use involves material-dependent feedback (the resistance of hardwood vs. softwood, the yield point of different metals) that simulation cannot faithfully reproduce.

Sensor Configuration and Collection Methodology

Collection uses head-mounted GoPro HERO12 cameras with protective housings rated for workshop environments (impact-resistant polycarbonate with anti-fog ventilation). Depth from co-mounted Intel RealSense D455 provides aligned RGB-D at the 0.2-1.5m working distance typical for bench and machine work. For environments with excessive vibration (machine shops during cutting operations), a secondary body-mounted IMU (Xsens MTi-630) provides supplementary motion data for vibration rejection in post-processing.
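Fusing the body-mounted IMU stream with the video requires aligning each frame to the nearest IMU sample in time. A minimal sketch of nearest-neighbor timestamp alignment is below; the 30 fps and 400 Hz rates are illustrative assumptions, not the dataset's actual capture rates.

```python
from bisect import bisect_left

def nearest_imu_sample(frame_ts: float, imu_ts: list) -> int:
    """Return the index of the IMU sample closest in time to a video frame.

    Assumes imu_ts is sorted ascending, as produced by a fixed-rate IMU log.
    """
    i = bisect_left(imu_ts, frame_ts)
    if i == 0:
        return 0
    if i == len(imu_ts):
        return len(imu_ts) - 1
    # pick whichever neighbor is closer to the frame timestamp
    return i if imu_ts[i] - frame_ts < frame_ts - imu_ts[i - 1] else i - 1

# toy example: 30 fps video frames against a 400 Hz IMU stream (assumed rates)
imu_ts = [k / 400.0 for k in range(4000)]   # 10 s of IMU timestamps
frame_ts = [k / 30.0 for k in range(300)]   # 10 s of frame timestamps
aligned = [nearest_imu_sample(t, imu_ts) for t in frame_ts]
```

With both clocks on a shared timebase, the worst-case alignment error is half the IMU sample period, which is ample for vibration-rejection post-processing.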

Collectors are experienced craftspeople, technicians, and tradespeople performing genuine workshop tasks on real projects. Sessions last 45-90 minutes and cover complete task sequences: a full woodworking joint (measuring, marking, cutting, fitting, gluing, clamping), a complete electronic circuit assembly (component placement, soldering, inspection, testing), or an automotive brake job (disassembly, inspection, component swap, reassembly, bleeding, testing). This ensures the data captures the full decision chain, including the inspection and verification steps that distinguish skilled work from novice attempts.

Workshop metadata includes facility type, primary materials being worked, tool inventory with photographs, project type and complexity level, and ambient conditions (temperature, humidity, dust level, ventilation type). For machine tool operations, spindle speed, feed rate, and material grade are recorded when available. This metadata enables researchers to correlate manipulation strategies with material properties and machine settings.
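To make the metadata fields above concrete, here is an illustrative session record with a minimal completeness check. All field names and values are hypothetical examples, not Claru's actual delivery schema.

```python
# Illustrative only: field names here are hypothetical, not the actual schema.
session_metadata = {
    "facility_type": "woodworking_shop",
    "primary_materials": ["white_oak", "baltic_birch_ply"],
    "project_type": "cabinet_carcass",
    "complexity_level": 3,  # assumed 1 (simple) .. 5 (master-level) scale
    "ambient": {"temp_c": 19.5, "humidity_pct": 44,
                "dust_level": "moderate", "ventilation": "dust_collection"},
    # machine-tool settings are recorded when available; None otherwise
    "machine_settings": {"spindle_rpm": 18000, "feed_rate_mm_min": 2500,
                         "material_grade": None},
}

REQUIRED = {"facility_type", "primary_materials", "project_type", "ambient"}

def missing_fields(meta: dict) -> set:
    """Return required top-level keys absent from a metadata record."""
    return REQUIRED - meta.keys()
```

A check like this is useful when correlating manipulation strategies with material properties, since incomplete records would silently bias the correlation.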

The dataset captures the full range of workshop conditions: the fine dust of woodworking shops, the metallic particle haze of grinding operations, the bright arc light of welding (through auto-darkening filters), the steam of heat-treating, the close-range precision of electronics work under magnification, and the oil-film visual distortion common in automotive and machining environments. These challenging visual conditions are exactly what workshop robots must handle reliably.

Comparison with Public Datasets

How Claru's egocentric workshop dataset compares to publicly available alternatives for tool-use robotics and manufacturing AI.

| Dataset | Clips | Hours | Modalities | Environments | Annotations |
| --- | --- | --- | --- | --- | --- |
| Assembly101 (CVPR 2022) | ~4K sequences | ~513 | RGB (multi-view) | Toy assembly (single) | Fine-grained assembly actions |
| Ego4D (Hand-Object, 2022) | ~10K segments | ~50 (subset) | RGB | Mixed activities | Hand-object contacts, state changes |
| IndustReal (ICRA 2023) | ~1K | ~10 | RGB-D | Industrial assembly jigs | Insertion, threading tasks |
| Claru Egocentric Workshop | 40K+ | 280+ | RGB-D, IMU | 20+ workshop types | Tool grasps, material state, assembly, safety, techniques |

Annotation Pipeline and Quality Assurance

Stage one automated pre-labeling applies DINOv2 for tool and material segmentation, SAM2 for instance-level masks of tools, workpieces, and fasteners, and a custom tool-state classifier that detects whether tools are idle, being gripped, in active use, or being returned to storage. DepthAnything V2 supplements hardware depth for reflective metal surfaces and transparent materials (acrylic, glass) where stereo matching fails.
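Per-frame tool-state predictions from a classifier like the one described above typically flicker between states on single frames. A common cleanup step, sketched here under the assumption of four states matching the taxonomy above, is sliding-window majority-vote smoothing:

```python
from collections import Counter

STATES = ("idle", "gripped", "in_use", "returning")

def smooth_states(per_frame: list, window: int = 5) -> list:
    """Majority-vote smoothing over a sliding window to suppress
    single-frame classifier flicker (window assumed odd)."""
    half = window // 2
    out = []
    for i in range(len(per_frame)):
        lo, hi = max(0, i - half), min(len(per_frame), i + half + 1)
        out.append(Counter(per_frame[lo:hi]).most_common(1)[0][0])
    return out

# toy prediction stream with two spurious single-frame flips
raw = ["idle", "idle", "gripped", "idle", "idle", "gripped", "gripped",
       "in_use", "gripped", "in_use", "in_use", "in_use"]
smoothed = smooth_states(raw)
```

The isolated "gripped" at frame 2 is voted away, while sustained state runs survive, which makes the downstream human annotation pass faster to verify.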

Stage two human annotation is performed by annotators with workshop experience. They add: tool-use action taxonomy (110+ verbs covering measuring, marking, cutting, shaping, joining, finishing, and assembly operations across all workshop types), grasp type classification using the Feix taxonomy adapted for tool use (power grasp, precision pinch, hook grip, lateral pinch, and tool-specific grips like trigger pull and torque wrench hold), material state tracking (stock, measured, marked, cut, shaped, joined, finished), and safety event annotations (eye protection, hearing protection, guard usage, lockout-tagout).
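The human-added layers above can be pictured as one record per tool-use event. The record layout below is a hypothetical sketch that mirrors the described layers (action verb, tool category, Feix-derived grasp, material state, safety flags); it is not Claru's actual delivery schema.

```python
from dataclasses import dataclass, field

@dataclass
class ToolUseEvent:
    """Hypothetical per-event annotation record mirroring the layers above."""
    clip_id: str
    start_s: float
    end_s: float
    action_verb: str            # from the 110+ verb taxonomy, e.g. "crosscut"
    tool_category: str          # from the 90+ tool categories, e.g. "backsaw"
    grasp_type: str             # Feix-derived label, e.g. "power_grasp"
    material_state: str         # state of the workpiece during the event
    safety_flags: list = field(default_factory=list)  # e.g. ["eye_protection"]

ev = ToolUseEvent("clip_0001", 12.4, 19.8, "crosscut", "backsaw",
                  "power_grasp", "marked", ["eye_protection"])
```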

Stage three QA achieves 95%+ agreement on action boundaries, 94%+ on tool identification (critical because workshop robots must select the correct tool), and 96%+ on safety event detection. Tool identification is particularly challenging because many workshop tools look similar in the egocentric view (different screwdriver types, various plier configurations, multiple wrench sizes) -- annotators with trade experience are essential for accurate labeling.
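Agreement on action boundaries is usually scored by matching annotators' temporal segments at an intersection-over-union threshold. A minimal sketch, assuming a one-way matching at IoU ≥ 0.5 (the exact matching rule Claru uses is not specified here):

```python
def temporal_iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two (start, end) segments in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def boundary_agreement(ann1: list, ann2: list, iou_thresh: float = 0.5) -> float:
    """Fraction of annotator-1 segments matched by some annotator-2 segment
    at the IoU threshold -- a simple one-way agreement measure."""
    matched = sum(any(temporal_iou(s, t) >= iou_thresh for t in ann2)
                  for s in ann1)
    return matched / len(ann1) if ann1 else 1.0

ann1 = [(0.0, 4.0), (5.0, 9.0), (10.0, 12.0)]
ann2 = [(0.2, 4.1), (5.5, 9.2), (13.0, 15.0)]
```

Here two of annotator 1's three segments find a counterpart, so the score is 2/3; production QA would symmetrize the measure and track it per action class.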

The complete taxonomy covers 110+ workshop action verbs (measure, scribe, crosscut, rip, dado, rabbet, mortise, tenon, braze, sweat, crimp, chase, ream), 90+ tool categories across all workshop types, 40+ material types with state attributes, grasp classifications per the Feix taxonomy, and 12 workshop safety compliance categories. This depth enables training models that can select appropriate tools, plan multi-step fabrication sequences, and execute skilled manipulation techniques.

Use Cases

Tool-Use Manipulation Policies

Training robot arms to wield hand and power tools with appropriate grip, force, and technique. Egocentric demonstrations from skilled craftspeople capture the force modulation and visual feedback patterns that define competent tool use. Example architectures: RT-2, Diffusion Policy, ACT, pi0.

Assembly Sequence Planning

Learning multi-step assembly and fabrication sequences from expert demonstrations. Models observe how skilled workers decompose complex builds into ordered operations, select appropriate tools for each step, and verify quality at each stage. Applications in manufacturing co-bots and automated assembly systems.
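Once a model has extracted precedence constraints from demonstrations ("glue before clamp", "cut before dry-fit"), ordering the steps is a topological sort. A sketch using Kahn's algorithm, with an illustrative woodworking-joint step list assumed for the example:

```python
from collections import deque

def plan_sequence(steps: list, before: list) -> list:
    """Order assembly steps respecting precedence pairs (a, b) meaning
    'a must precede b' -- Kahn's topological sort; raises on a cycle."""
    indeg = {s: 0 for s in steps}
    succ = {s: [] for s in steps}
    for a, b in before:
        succ[a].append(b)
        indeg[b] += 1
    ready = deque(s for s in steps if indeg[s] == 0)
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        for t in succ[s]:
            indeg[t] -= 1
            if indeg[t] == 0:
                ready.append(t)
    if len(order) != len(steps):
        raise ValueError("precedence constraints contain a cycle")
    return order

# illustrative joint-making steps and constraints learned from demonstrations
steps = ["measure", "mark", "cut", "dry_fit", "glue", "clamp"]
before = [("measure", "mark"), ("mark", "cut"), ("cut", "dry_fit"),
          ("dry_fit", "glue"), ("glue", "clamp")]
plan = plan_sequence(steps, before)
```

The cycle check matters in practice: contradictory constraints extracted from noisy demonstrations should fail loudly rather than produce an unexecutable plan.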

Material State Understanding

Training models to recognize material transformations during workshop processes: raw stock through measured, marked, cut, shaped, joined, and finished states. Critical for robots that must assess work progress, detect defects, and determine when a process step is complete.
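The state chain above can be checked mechanically: a predicted sequence of material states should never move backwards. A simplified sketch, assuming the linear chain from the taxonomy (real processes may branch or rework):

```python
# Linear state chain from the material-state layer; a simplification,
# since real workshop processes can branch or loop back for rework.
STATE_ORDER = ["stock", "measured", "marked", "cut", "shaped", "joined", "finished"]
RANK = {s: i for i, s in enumerate(STATE_ORDER)}

def valid_progression(observed: list) -> bool:
    """True if observed material states never move backwards in the chain.

    Repeats are allowed -- a state can persist across many clips.
    """
    ranks = [RANK[s] for s in observed]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))
```

A monotonicity check like this is a cheap sanity filter on model predictions before they drive a "process step complete" decision.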

Key References

  1. Sener et al. "Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities." CVPR 2022.
  2. Grauman et al. "Ego4D: Around the World in 3,000 Hours of Egocentric Video." CVPR 2022.
  3. Fang et al. "AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains." IEEE T-RO 2023.
  4. Physical Intelligence. "pi0: A Vision-Language-Action Flow Model for General Robot Control." arXiv 2024.

How Claru Delivers This Data

Claru's collector network includes skilled craftspeople and technicians across 20+ workshop types, from traditional woodworking shops to modern FabLabs and CNC facilities. This breadth enables collection of the tool diversity and material variety that tool-use robots need for cross-domain generalization -- a single large-scale collection campaign can capture data spanning hand tools, power tools, and machine tools across multiple material families.

Custom campaigns can target specific workshop types (woodworking, metalworking, electronics, automotive), tool categories (hand tools only, or power tool focus), material types, or skill levels (apprentice through master craftsperson). Turnaround from campaign specification to annotated delivery is typically 4-6 weeks for standard volumes.

Data is delivered in your preferred format with full sensor calibration data, tool inventories, and material specifications. Grasp type annotations are available as a separate layer for teams working specifically on grasp planning. All format conversions (RLDS, HDF5, WebDataset, LeRobot, custom schemas) are handled at no additional cost.

Frequently Asked Questions

Which workshop types are covered?

20+ types including woodworking, metalworking, automotive repair, electronics assembly, 3D printing labs, CNC shops, welding, upholstery, bicycle repair, plumbing, HVAC, appliance repair, jewelry making, ceramics, glass, leather working, and general maker-spaces.

Which tool categories are annotated?

90+ tool categories spanning hand tools (chisels, planes, screwdrivers, wrenches, pliers), power tools (drills, saws, grinders, sanders, routers), machine tools (lathes, mills, drill presses), and specialty tools unique to specific trades. Each tool instance includes grasp type classification.

Are grasp types annotated?

Yes. Every tool-use instance is annotated with grasp type following the Feix taxonomy adapted for tool use: power grasp, precision pinch, hook grip, lateral pinch, and tool-specific grips. This enables training grasp planning models alongside manipulation policies.

Can collections be stratified by skill level?

Yes. Collections can target specific skill levels from apprentice to master craftsperson. Skill-stratified data is valuable for learning both competent technique (from experts) and common error patterns (from apprentices) that robots should recognize and avoid.

Does the dataset cover machine tool operations?

Yes. The dataset includes lathe, mill, drill press, table saw, band saw, and CNC operations. Machine tool clips include spindle speed and feed rate metadata when available, and capture the setup, operation, and workpiece inspection phases of machine operations.

Request a Sample Pack

Get a curated sample of egocentric workshop video with tool-use and material transformation annotations to evaluate for your robotics or manufacturing project.