Custom Manipulation Trajectory Data Collection for Robotics

Open manipulation datasets cover broad task distributions but rarely match the specific embodiment, environment, and action-space representation your policy requires. Claru builds custom trajectory datasets from scratch — capturing the exact manipulation behaviors, sensor configurations, and annotation formats that production robotics systems need to generalize beyond the lab.

What Makes Manipulation Trajectory Data So Hard to Collect?

Manipulation trajectory data pairs observation streams (RGB, depth, proprioception) with timestamped action sequences (joint velocities, end-effector poses, gripper states) at control-loop frequency. Collecting this data at scale requires synchronized multi-modal capture, calibrated hardware, and structured annotation of task boundaries, contact events, and success criteria. AgiBot World demonstrated the infrastructure cost: 1 million trajectories across 217 tasks required a 4,000-square-meter facility, 100 robots, and a dedicated engineering team to maintain temporal alignment between camera feeds and joint-state logs [agibot-2025]. Most robotics labs lack this infrastructure entirely. The result is a field where even the largest open datasets span only 22 robot embodiments [oxe-2023], and labs training policies for new hardware or new tasks face a cold-start problem that no amount of pre-training on mismatched data solves.
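
To make the format concrete, one timestep of such a trajectory can be sketched as a small record pairing observations with an action. The field names, units, and the 15 Hz default below are illustrative assumptions for the sketch, not a schema from any of the datasets cited here.

```python
from dataclasses import dataclass

@dataclass
class TrajectoryStep:
    """One observation-action timestep (illustrative schema, not a real dataset's)."""
    t_us: int                     # microsecond timestamp on a shared monotonic clock
    rgb: bytes                    # encoded RGB frame
    depth: bytes                  # encoded depth frame
    joint_positions: list[float]  # proprioception: one value per joint
    gripper_open: bool            # binary gripper state
    action: list[float]           # e.g. commanded joint velocities or a pose target

def steps_per_episode(duration_s: float, control_hz: float = 15.0) -> int:
    """Trajectory length scales with control-loop frequency, not wall-clock time alone."""
    return int(duration_s * control_hz)
```

A 60-second episode at 15 Hz already yields 900 synchronized records, which is why maintaining temporal alignment between camera feeds and joint-state logs dominates the engineering cost at scale.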


Why Does Embodiment Mismatch Degrade Policy Transfer?

DROID collected 76,000 trajectories over 350 hours of interaction, but every trajectory used a single robot: the Franka Emika Panda [droid-2024]. Policies trained on DROID inherit Franka-specific kinematics, gripper geometry, and control-frequency assumptions that do not transfer to other arms without significant fine-tuning. Open X-Embodiment aggregated data from 22 different robots and showed that cross-embodiment transfer is possible in principle — but the dataset's quality variability across contributing labs meant that models trained on the full mixture often underperformed models trained on smaller, higher-quality subsets [oxe-2023]. AgiBot World's GO-1 model achieved a 30% improvement over models trained on Open X-Embodiment data, attributing the gap primarily to consistent capture quality across their controlled facility [agibot-2025]. The pattern is clear: trajectory data must match the target embodiment and maintain consistent quality to produce reliable policies.


How Do Task Coverage Gaps Limit Real-World Deployment?

Production manipulation systems encounter task distributions that open datasets were not designed to cover. A warehouse pick-and-place robot handles thousands of SKU geometries; a kitchen assistant robot navigates deformable objects, liquids, and articulated containers. DROID's 76,000 trajectories span tabletop manipulation with rigid objects — a narrow slice of real-world interaction [droid-2024]. AgiBot World covers 217 tasks but within a controlled facility that does not replicate the visual and physical variability of deployment environments [agibot-2025]. Generalist AI (GEN-0) claims 270,000 hours of robotic interaction data generated at 10,000 hours per week, but these figures are company-reported and not peer-reviewed, making independent verification impossible [gen0-2024]. Labs building production systems need trajectory data that matches their specific task distribution, not a generic benchmark.


How Do Open Manipulation Datasets Compare to Custom Collection?

The comparison below sets three widely cited manipulation trajectory datasets against Claru's custom collection approach. Scale alone does not determine utility; embodiment match, task coverage, and annotation consistency are the variables that predict policy performance.

AgiBot World

Scale: 1M+ trajectories, 217 tasks
Tasks: Tabletop manipulation, mobile manipulation, bimanual tasks
Environments: 4,000 sqm controlled facility, 100 robots
Limitations: Single facility limits environmental diversity; 5 embodiment types; not publicly available for all tasks

DROID

Scale: 76K trajectories, 350 hours
Tasks: Tabletop manipulation (rigid objects, limited deformable)
Environments: Multiple labs, but Franka Panda only
Limitations: Single embodiment (Franka); rigid-object bias; no mobile or bimanual tasks

Open X-Embodiment

Scale: 1M+ trajectories, 22 robots
Tasks: Broad but inconsistent; aggregated from 60+ contributing datasets
Environments: Heterogeneous lab settings across contributing institutions
Limitations: Quality variability across labs; inconsistent annotation formats; models trained on the full mixture often underperform curated subsets

Claru Custom Collection

Scale: 386K+ clips (egocentric) + 10,000+ hours (synchronized gameplay)
Tasks: Configured per engagement; task taxonomy co-designed with the research team
Environments: ~500 global contributors; real-world indoor and outdoor settings
Limitations: Requires a 1-2 week calibration phase per new engagement; not a public benchmark

Egocentric Video Data Collection for Robotics and World Modeling

- 386K+ total first-person video clips captured
- 219K GoPro & DJI wearable capture clips
- 155K smartphone capture clips
- ~500 global contributors across 3 pipelines

We built a purpose-built capture and ingestion platform — not adapted from an off-the-shelf tool — and launched three parallel pipelines within days of engagement, each optimized for different environments and interaction types. The first pipeline deployed GoPro and DJI wearable cameras for high-fidelity, wide-angle egocentric capture of manipulation tasks, cooking, and locomotion — producing 219,000+ clips. The second pipeline used smartphone cameras for rapid, high-volume capture of everyday activities across diverse indoor and outdoor environments — producing 155,000+ clips.

Read Full Case Study

Game-Based Data Capture for Real-World Simulation

- 10,000+ hours of synchronized gameplay data
- <16 ms video-to-input temporal alignment error
- Custom capture solution built from scratch
- 0 data loss incidents across all sessions

We designed and built a custom capture application from scratch. The system performs simultaneous screen recording at native resolution and raw input logging, capturing every keystroke, mouse movement, and controller input as structured data with microsecond-precision timestamps. Frame-level alignment between the video and control streams is maintained via a shared monotonic clock, with periodic sync markers to detect and correct any drift.
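
The shared-clock pairing described above can be illustrated with a short sketch. The helper below is a simplified assumption of how a <16 ms alignment budget might be verified, not Claru's actual implementation.

```python
import time

def now_us() -> int:
    """Microsecond timestamp from a monotonic clock shared by both streams."""
    return time.monotonic_ns() // 1_000

def max_pairing_error_us(frame_ts: list[int], input_ts: list[int]) -> int:
    """Worst-case gap between an input event and its nearest video frame.
    With 60 fps video (one frame every ~16,667 us), this stays under half
    a frame, comfortably inside a 16 ms alignment budget."""
    worst = 0
    for t in input_ts:
        nearest = min(frame_ts, key=lambda f: abs(f - t))
        worst = max(worst, abs(nearest - t))
    return worst

frames = [i * 16_667 for i in range(60)]   # one second of 60 fps frame timestamps
events = [5_000, 123_456, 900_000]         # three logged input events
assert max_pairing_error_us(frames, events) < 16_000
```

Because both streams read the same monotonic clock, wall-clock adjustments (NTP, DST) cannot introduce drift; the periodic sync markers mentioned above guard against hardware-level skew instead.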

Read Full Case Study
Same-day QA turnaround

Frequently Asked Questions

What action-space representations does Claru support?

Claru supports joint-velocity, end-effector pose (6-DOF position + orientation), and raw control input representations. The specific action space is configured per engagement based on the client's policy architecture. For imitation learning pipelines that consume observation-action pairs, Claru delivers per-frame action labels with microsecond-precision timestamps aligned to the video stream.
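
As an illustration of the end-effector pose representation described in this answer, a per-frame label might pack position, orientation, and gripper state into a single timestamped vector. The layout and names below are assumptions for the sketch, not Claru's delivery format.

```python
def pose_action_label(t_us: int,
                      xyz: tuple[float, float, float],
                      rpy: tuple[float, float, float],
                      gripper_open: bool) -> dict:
    """Pack a 6-DOF end-effector pose plus gripper state into one per-frame
    action label, timestamped for alignment with the video stream."""
    return {
        "t_us": t_us,                                   # microsecond timestamp
        "action": [*xyz, *rpy, 1.0 if gripper_open else 0.0],
    }

label = pose_action_label(1_000_000, (0.42, -0.10, 0.25), (0.0, 1.57, 0.0), True)
assert len(label["action"]) == 7
```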

How does the cost of custom collection compare to open datasets?

Open datasets are free to download but carry hidden costs: fine-tuning to compensate for embodiment mismatch, re-annotating inconsistent labels, and filtering quality-variable subsets. AgiBot World's facility required 100 robots and 4,000 square meters of dedicated space. Claru's distributed collection model avoids facility overhead entirely, and the 1-2 week calibration phase per engagement means production data collection begins within days, not months.

Can Claru capture data with our existing hardware?

Yes. Claru's capture pipelines are hardware-agnostic at the observation level — GoPro, DJI, smartphone, and custom camera rigs are all supported. For proprioceptive data (joint states, torques), Claru integrates with the client's teleoperation interface or deploys its synchronized capture system, which operates at the OS input layer rather than hooking into specific robot firmware.

What collection throughput can we expect?

Throughput depends on task complexity and annotation requirements. In the egocentric video engagement, Claru produced 386,000 clips across three parallel pipelines with approximately 500 global contributors. The game-based capture engagement produced 10,000 hours of synchronized data. Weekly delivery batches mean collection scales continuously rather than in discrete project phases.

How is data quality enforced during collection?

Every submission passes automated validation (resolution, duration, orientation, file integrity) at upload time, followed by human QA review within 24 hours. Inter-annotator agreement is tracked via real-time dashboards, and submissions falling below quality thresholds trigger specific remediation instructions to contributors. The structured activity taxonomy is enforced at the UI level, preventing free-text label drift across the contributor pool.
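
The upload-time checks described here can be sketched as a simple validator. The thresholds below (minimum resolution, clip-length bounds) are assumed for illustration and are not Claru's production rules.

```python
from dataclasses import dataclass

@dataclass
class Submission:
    width: int
    height: int
    duration_s: float
    orientation: str   # "landscape" or "portrait"
    checksum_ok: bool  # file integrity verified against the uploaded hash

def validate(sub: Submission) -> list[str]:
    """Return failure reasons; an empty list means the clip passes intake."""
    errors = []
    if sub.width < 1280 or sub.height < 720:            # assumed minimum resolution
        errors.append("resolution below 1280x720")
    if not 2.0 <= sub.duration_s <= 600.0:              # assumed clip-length bounds
        errors.append("duration out of range")
    if sub.orientation not in ("landscape", "portrait"):
        errors.append("unknown orientation")
    if not sub.checksum_ok:
        errors.append("file integrity check failed")
    return errors
```

Failing clips would be routed back to the contributor with the specific reasons attached, matching the remediation loop described above.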


Your next hire isn't a vendor. It's a data team.

Tell us what you're training. We'll scope the dataset.


Or email us directly at [email protected]


References

[agibot-2025] AgiBot Team. "AgiBot World: A Unified Platform for Scalable and Diverse Robot Learning." arXiv, 2025. 1M+ trajectories across 217 tasks in a 4,000 sqm facility; the GO-1 model achieves a 30% improvement over Open X-Embodiment-trained baselines.
[droid-2024] Khazatsky et al. "DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset." arXiv, 2024. 76,000 trajectories over 350 hours of interaction data collected across multiple institutions, but limited to a single robot embodiment (Franka Emika Panda).
[oxe-2023] Open X-Embodiment Collaboration. "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." arXiv, 2023. 1M+ trajectories from 22 robot embodiments across 60+ datasets; quality variability across contributing labs means models trained on the full mixture often underperform curated subsets.
[gen0-2024] Generalist AI. "GEN-0: Building a General-Purpose Robot." Company publication, 2024. Claims 270,000 hours of robotic interaction data generated at 10,000 hours per week; figures are company-reported and not peer-reviewed.