Custom Manipulation Trajectory Data Collection for Robotics
Open manipulation datasets cover broad task distributions but rarely match the specific embodiment, environment, and action-space representation your policy requires. Claru builds custom trajectory datasets from scratch — capturing the exact manipulation behaviors, sensor configurations, and annotation formats that production robotics systems need to generalize beyond the lab.
What Makes Manipulation Trajectory Data So Hard to Collect?
Manipulation trajectory data pairs observation streams (RGB, depth, proprioception) with timestamped action sequences (joint velocities, end-effector poses, gripper states) at control-loop frequency. Collecting this data at scale requires synchronized multi-modal capture, calibrated hardware, and structured annotation of task boundaries, contact events, and success criteria. AgiBot World demonstrated the infrastructure cost: 1 million trajectories across 217 tasks required a 4,000-square-meter facility, 100 robots, and a dedicated engineering team to maintain temporal alignment between camera feeds and joint-state logs [1]. Most robotics labs lack this infrastructure entirely. The result is a field where the largest open datasets still cover just 22 robot embodiments [3], and labs training policies for new hardware or new tasks face a cold-start problem that no amount of pre-training on mismatched data solves.
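The temporal-alignment requirement above can be made concrete: each camera frame must be paired with the nearest joint-state sample on a shared clock, and frames whose nearest sample is too far away must be dropped. A minimal sketch, with illustrative function names and tolerances (not AgiBot's actual pipeline):

```python
import numpy as np

def align_streams(frame_ts, joint_ts, joint_states, max_skew_s=0.005):
    """Pair each camera frame with the nearest joint-state sample.

    frame_ts:     (N,) frame timestamps in seconds, on the shared clock
    joint_ts:     (M,) joint-state timestamps in seconds, sorted ascending
    joint_states: (M, D) joint positions/velocities logged at each joint_ts
    Frames whose nearest sample is further than max_skew_s are dropped.
    """
    # Insertion points of each frame timestamp into the joint-state timeline.
    idx = np.clip(np.searchsorted(joint_ts, frame_ts), 1, len(joint_ts) - 1)
    left, right = joint_ts[idx - 1], joint_ts[idx]
    # Pick whichever neighbor (before or after) is closer in time.
    nearest = np.where(frame_ts - left < right - frame_ts, idx - 1, idx)
    skew = np.abs(joint_ts[nearest] - frame_ts)
    keep = skew <= max_skew_s
    return frame_ts[keep], joint_states[nearest[keep]], skew[keep]
```

For a 30 Hz camera paired against 100 Hz joint logs, this keeps every frame and bounds the observation-action skew at half the joint-state period.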
Why Does Embodiment Mismatch Degrade Policy Transfer?
DROID collected 76,000 trajectories over 350 hours of interaction, but every trajectory used a single robot: the Franka Emika Panda [2]. Policies trained on DROID inherit Franka-specific kinematics, gripper geometry, and control-frequency assumptions that do not transfer to other arms without significant fine-tuning. Open X-Embodiment aggregated data from 22 different robots and showed that cross-embodiment transfer is possible in principle — but the dataset's quality variability across contributing labs meant that models trained on the full mixture often underperformed models trained on smaller, higher-quality subsets [3]. AgiBot World's GO-1 model achieved a 30% improvement over models trained on Open X-Embodiment data, attributing the gap primarily to consistent capture quality across their controlled facility [1]. The pattern is clear: trajectory data must match the target embodiment and maintain consistent quality to produce reliable policies.
How Do Task Coverage Gaps Limit Real-World Deployment?
Production manipulation systems encounter task distributions that open datasets were not designed to cover. A warehouse pick-and-place robot handles thousands of SKU geometries; a kitchen assistant robot navigates deformable objects, liquids, and articulated containers. DROID's 76,000 trajectories span tabletop manipulation with rigid objects — a narrow slice of real-world interaction [2]. AgiBot World covers 217 tasks but within a controlled facility that does not replicate the visual and physical variability of deployment environments [1]. Generalist AI (GEN-0) claims 270,000 hours of robotic interaction data generated at 10,000 hours per week, but these figures are company-reported and not peer-reviewed, making independent verification impossible [4]. Labs building production systems need trajectory data that matches their specific task distribution, not a generic benchmark.
How Do Open Manipulation Datasets Compare to Custom Collection?
The table below compares the four most cited manipulation trajectory sources against Claru's custom collection approach. Scale alone does not determine utility — embodiment match, task coverage, and annotation consistency are the variables that predict policy performance.
| Source | Scale | Embodiment coverage | Key limitation |
|---|---|---|---|
| AgiBot World | 1M+ trajectories, 217 tasks | 100 robots in a single 4,000 sqm facility | Controlled facility does not replicate deployment variability |
| DROID | 76,000 trajectories, 350 hours | Single embodiment (Franka Emika Panda) | Franka-specific kinematics and control assumptions |
| Open X-Embodiment | 1M+ trajectories, 60+ datasets | 22 robot embodiments | Quality variability across contributing labs |
| GEN-0 (Generalist AI) | 270,000 hours (company-reported) | Not independently verified | Figures are not peer-reviewed |
| Claru Custom Collection | Scoped per engagement, delivered in weekly batches | Matched to the client's target hardware | Requires a 1-2 week calibration phase |
Egocentric Video Data Collection for Robotics and World Modeling
We built the capture and ingestion platform from the ground up rather than adapting an off-the-shelf tool, and launched three parallel pipelines within days of engagement, each optimized for different environments and interaction types. The first pipeline deployed GoPro and DJI wearable cameras for high-fidelity, wide-angle egocentric capture of manipulation tasks, cooking, and locomotion — producing 219,000+ clips. The second pipeline used smartphone cameras for rapid, high-volume capture of everyday activities across diverse indoor and outdoor environments — producing 155,000+ clips.
Game-Based Data Capture for Real-World Simulation
We designed and built a custom capture application from scratch. The system performs simultaneous screen recording at native resolution and raw input logging, capturing every keystroke, mouse movement, and controller input as structured data with microsecond-precision timestamps. Frame-level alignment between the video and control streams is maintained via a shared monotonic clock, with periodic sync markers to detect and correct any drift.
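The shared-monotonic-clock scheme described above can be sketched as follows. `SyncClock` and its method names are hypothetical; a production system would fit drift with a least-squares regression over many sync markers rather than the two endpoints used here:

```python
import time

class SyncClock:
    """Stamp events against one monotonic reference clock, and use periodic
    sync markers to estimate and correct drift of a device clock (e.g. a
    capture device's internal timestamps) against that reference."""

    def __init__(self):
        self.t0 = time.monotonic_ns()
        self.markers = []  # (reference_ns, device_ns) pairs

    def now_ns(self):
        """Nanoseconds elapsed on the shared reference timeline."""
        return time.monotonic_ns() - self.t0

    def add_marker(self, device_ns):
        """Record a sync marker pairing the device clock with the reference."""
        self.markers.append((self.now_ns(), device_ns))

    def drift_ppm(self):
        """Device-clock drift in parts per million, from first/last markers."""
        (r0, d0), (r1, d1) = self.markers[0], self.markers[-1]
        return ((d1 - d0) - (r1 - r0)) / (r1 - r0) * 1e6

    def correct(self, device_ns):
        """Map a device timestamp onto the reference timeline by rescaling."""
        (r0, d0), (r1, d1) = self.markers[0], self.markers[-1]
        return r0 + (device_ns - d0) * (r1 - r0) / (d1 - d0)
```

With markers recorded every few seconds, even a clock drifting tens of parts per million stays within frame-level alignment over a multi-hour session.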
Frequently Asked Questions
What action-space representations does Claru support?
Claru supports joint-velocity, end-effector pose (6-DOF position + orientation), and raw control input representations. The specific action space is configured per engagement based on the client's policy architecture. For imitation learning pipelines that consume observation-action pairs, Claru delivers per-frame action labels with microsecond-precision timestamps aligned to the video stream.
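For the end-effector pose representation, a per-frame observation-action record might look like the following sketch. Field names and units are assumptions for illustration, not Claru's actual delivery schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class ActionFrame:
    """One per-frame action label, aligned to a video frame by timestamp.
    Field names are illustrative, not a vendor schema."""
    timestamp_us: int      # microseconds on the shared capture clock
    ee_position: tuple     # (x, y, z) in metres, robot base frame
    ee_orientation: tuple  # unit quaternion (qx, qy, qz, qw)
    gripper_width: float   # metres, 0.0 = fully closed

def to_record(frame: ActionFrame) -> dict:
    """Serialize for an imitation-learning dataset (e.g. one JSONL row)."""
    return asdict(frame)
```

A training loader then joins these records to frames by `timestamp_us`, yielding the observation-action pairs an imitation-learning policy consumes.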
How does the cost of custom collection compare to free open datasets?
Open datasets are free to download but carry hidden costs: fine-tuning to compensate for embodiment mismatch, re-annotating inconsistent labels, and filtering quality-variable subsets. AgiBot World's facility required 100 robots and 4,000 square meters of dedicated space. Claru's distributed collection model avoids facility overhead entirely, and the 1-2 week calibration phase per engagement means production data collection begins within days, not months.
Can Claru collect data with our existing hardware?
Yes. Claru's capture pipelines are hardware-agnostic at the observation level — GoPro, DJI, smartphone, and custom camera rigs are all supported. For proprioceptive data (joint states, torques), Claru integrates with the client's teleoperation interface or deploys its synchronized capture system, which operates at the OS input layer rather than hooking into specific robot firmware.
What collection throughput can clients expect?
Throughput depends on task complexity and annotation requirements. In the egocentric video engagement, Claru produced 386,000 clips across three parallel pipelines with approximately 500 global contributors. The game-based capture engagement produced 10,000 hours of synchronized data. Weekly delivery batches mean collection scales continuously rather than in discrete project phases.
How is data quality assured?
Every submission passes automated validation (resolution, duration, orientation, file integrity) at upload time, followed by human QA review within 24 hours. Inter-annotator agreement is tracked via real-time dashboards, and submissions falling below quality thresholds trigger specific remediation instructions to contributors. The structured activity taxonomy is enforced at the UI level, preventing free-text label drift across the contributor pool.
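The automated checks named above amount to a simple gate function run at upload time. The thresholds below are illustrative placeholders, not Claru's production acceptance criteria:

```python
import hashlib
from pathlib import Path

# Illustrative thresholds; real acceptance criteria are set per engagement.
MIN_WIDTH, MIN_HEIGHT = 1280, 720
MIN_DURATION_S, MAX_DURATION_S = 2.0, 600.0

def validate_submission(width, height, duration_s, path, expected_sha256):
    """Automated upload checks: resolution, orientation (landscape),
    duration, and file integrity via SHA-256 checksum."""
    errors = []
    if width < MIN_WIDTH or height < MIN_HEIGHT:
        errors.append(f"resolution {width}x{height} below {MIN_WIDTH}x{MIN_HEIGHT}")
    if width < height:
        errors.append("portrait orientation; landscape required")
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        errors.append(f"duration {duration_s}s outside [{MIN_DURATION_S}, {MAX_DURATION_S}]")
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    if digest != expected_sha256:
        errors.append("checksum mismatch: file corrupted in transit")
    return errors  # empty list means the clip passes to human QA
```

Failing submissions return specific error strings, which is what enables the targeted remediation instructions sent back to contributors.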
Your next hire isn't a vendor.
It's a data team.
Tell us what you're training. We'll scope the dataset.
References
- [1] AgiBot Team. “AgiBot World: A Unified Platform for Scalable and Diverse Robot Learning.” arXiv, 2025. 1M+ trajectories across 217 tasks in a 4,000 sqm facility; GO-1 model achieves a 30% improvement over Open X-Embodiment-trained baselines.
- [2] Khazatsky et al. “DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset.” arXiv, 2024. 76,000 trajectories over 350 hours of interaction data collected across multiple institutions, but limited to a single robot embodiment (Franka Emika Panda).
- [3] Open X-Embodiment Collaboration. “Open X-Embodiment: Robotic Learning Datasets and RT-X Models.” arXiv, 2023. 1M+ trajectories from 22 robot embodiments across 60+ datasets; quality variability across contributing labs means full-mixture models can underperform curated subsets.
- [4] Generalist AI. “GEN-0: Building a General-Purpose Robot.” Company publication, 2024. Claims 270,000 hours of robotic interaction data generated at 10,000 hours per week; figures are company-reported and not peer-reviewed.