Locomotion Training Data
Locomotion datasets for legged robots — motion capture references, terrain traversal recordings, and proprioceptive data for training agile quadruped and humanoid walking, running, and climbing policies.
Data Requirements
- Sensors: motion capture (optical/IMU), joint encoders (1000 Hz), IMU (200 Hz), foot contact sensing, terrain heightmap (LiDAR/depth)
- Scale: 1-10 hours of MoCap references plus 10K-100K terrain traversal segments
- Sampling rates: 120-500 Hz motion capture, 500-1000 Hz joint encoders, 200 Hz IMU, 30 Hz visual
How Claru Supports This Task
Claru provides locomotion data collection across two pipelines. For reference motion capture, we partner with biomechanics laboratories equipped with Vicon or OptiTrack systems to capture animal and human gaits at 120-500 Hz with full-body marker configurations. Captured motions include walking, trotting, running, turning, stair climbing, and recovery behaviors in BVH, FBX, or retargetable formats. For real-world terrain data, we deploy instrumented legged platforms across multi-terrain test courses featuring calibrated slopes (5-25 degrees), stairs at 4 standard heights, gravel, sand, and natural outdoor terrain. Recordings include 500+ Hz proprioception, 200 Hz IMU, 30 Hz synchronized video, and LiDAR terrain mapping. Delivered datasets include gait phase labels, contact event timestamps at millisecond precision, terrain classification, and traversability scores. We support ANYmal, Spot, Unitree quadrupeds, and humanoid platforms including Digit, with format conversion for IsaacGym, Legged Gym, or custom training pipelines.
What Is Locomotion in Robotics and Why Does Data Matter?
Legged locomotion is one of the most dynamic control problems in robotics. A quadruped walking at a moderate pace makes and breaks ground contacts at 2-4 Hz per leg, with each contact lasting 200-400 ms. During each stance phase, the leg must support body weight (50-100 kg for production quadrupeds like Spot or ANYmal), propel the body forward, and adjust to terrain geometry — all while maintaining lateral stability and coordinating with the other three legs. A humanoid biped faces an even harder balance problem: with only two legs, the support polygon is narrow and the center of mass is high, requiring continuous adjustment of hip, knee, and ankle torques at 200+ Hz to prevent falling.
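The timing numbers above imply how much of each stride a foot actually spends supporting the body. A minimal sketch, using illustrative values drawn from the 2-4 Hz and 200-400 ms ranges quoted in the text:

```python
# Rough stance-timing arithmetic for the contact rates quoted above.
# Input values are illustrative, not measured from any specific robot.

def stance_duty_factor(step_freq_hz: float, stance_ms: float) -> float:
    """Fraction of each stride a foot spends on the ground."""
    stride_s = 1.0 / step_freq_hz          # duration of one full stride per leg
    return (stance_ms / 1000.0) / stride_s

# A leg stepping at 3 Hz with 250 ms stance is grounded 75% of the time,
# leaving only ~83 ms of swing to reposition the foot for the next contact.
print(stance_duty_factor(3.0, 250.0))  # 0.75
```

The narrow swing window is why contact-timing errors of even tens of milliseconds destabilize a gait.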
Reinforcement learning in simulation has achieved remarkable locomotion results. Miki et al. (2022) trained ANYmal to traverse stairs, rubble, gaps, and steep slopes using RL in IsaacGym with privileged terrain information during training and only proprioceptive sensing at deployment. The policy learned to probe terrain with cautious steps, brace for expected impacts, and recover from stumbles — all from simulation alone. Similarly, Rudin et al. (2022) showed that a massively parallel RL training pipeline in IsaacGym can produce robust quadruped locomotion policies in under 20 minutes of GPU training time.
However, simulation-trained locomotion policies exhibit characteristic weaknesses that real-world data addresses. Sim-to-real transferred gaits are often energy-inefficient — consuming 30-50% more power than equivalent animal gaits — because simulation rewards optimize task completion rather than biological optimality criteria. Peng et al. (2020) demonstrated that imitation of real animal motion capture data produces gaits that are both more natural-looking and 15-30% more energy-efficient than pure RL. The motion capture references provide a strong prior on what 'good' locomotion looks like, constraining the RL solution space to gaits that resemble evolved biological strategies.
Real-world terrain data is also essential for the perception pipeline that feeds locomotion policies. Exteroceptive locomotion — where the robot uses cameras or LiDAR to see upcoming terrain before stepping on it — requires terrain perception models trained on real-world data. Simulation environments model terrain geometry accurately but cannot replicate the full visual complexity of outdoor surfaces: mud textures, wet rock reflections, grass blade occlusions, snow-covered steps, and the countless surface appearances that a robust terrain classifier must handle. Agarwal et al. (2023) showed that egocentric terrain perception for locomotion benefits strongly from real-world visual data, with outdoor success rates improving from 71% to 89% when real terrain images augmented the simulation-trained perception model.
Data Requirements by Locomotion Approach
Locomotion learning spans from pure simulation to demonstration-driven approaches. Each has distinct data needs.
| Approach | Real Data Needed | Primary Data Type | Sim Data Role | Strengths |
|---|---|---|---|---|
| Pure sim RL + sim-to-real transfer | 0 demos (system ID calibration only) | Motor parameters + terrain friction measurements | 100% — all learning in simulation | No real robot demos needed; fast iteration |
| Motion capture imitation + RL | 1-10 hours of animal/human MoCap | Optical MoCap at 120-500 Hz | RL fine-tuning in simulation against MoCap reference | Natural, energy-efficient gaits; smooth style |
| Learned terrain perception | 10K-100K real terrain images with labels | Egocentric RGB-D + terrain traversability labels | Pretraining on simulated terrain textures | Handles real visual complexity (mud, snow, glass) |
| Skill learning from demonstrations | 100-1K demos of specific skills (jumping, spinning) | Full-body proprioception at 500+ Hz + IMU + video | Optional — RL refinement of learned skills | Learns complex acrobatic behaviors directly |
| World model + model-predictive control | 10K-50K transition tuples from real terrain | Proprioception + exteroception + contact flags | Pre-training the world model dynamics | Adapts to novel terrain online; sample-efficient |
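For the world-model row above, each training example is one transition tuple. The sketch below shows a hypothetical record schema for such a tuple; the field names and dimensions are illustrative assumptions (a 12-joint quadruped with a 15x15 local heightmap), not a published format.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical schema for one world-model transition tuple, matching the
# "proprioception + exteroception + contact flags" data type in the table.
# All field names and dimensions are invented for illustration.

@dataclass
class TerrainTransition:
    joint_pos: np.ndarray       # (12,) joint angles, rad (quadruped: 3 per leg)
    joint_vel: np.ndarray       # (12,) joint velocities, rad/s
    imu: np.ndarray             # (6,) body angular velocity + linear acceleration
    heightmap: np.ndarray       # (15, 15) local terrain elevation, m
    contacts: np.ndarray        # (4,) boolean foot-contact flags
    action: np.ndarray          # (12,) commanded joint targets
    next_joint_pos: np.ndarray  # (12,) joint angles after one control step

sample = TerrainTransition(
    joint_pos=np.zeros(12), joint_vel=np.zeros(12), imu=np.zeros(6),
    heightmap=np.zeros((15, 15)), contacts=np.zeros(4, dtype=bool),
    action=np.zeros(12), next_joint_pos=np.zeros(12),
)
```

A 10K-50K collection of such tuples is what the world model regresses: predict `next_joint_pos` (and contact outcomes) from the current state and action.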
State of the Art in Learned Locomotion
ANYmal perceptive locomotion (Miki et al., 2022) demonstrated that a single RL policy can navigate stairs, rubble, gaps, and steep slopes using only proprioceptive and exteroceptive sensing — no terrain map or pre-planned path. The policy was trained entirely in IsaacGym simulation using a teacher-student framework: a privileged teacher had access to ground-truth terrain heightmaps, while the student policy learned to estimate terrain from proprioceptive history and onboard depth sensing. Deployed on the 50 kg ANYmal C robot, the policy achieved 92% success on a challenging outdoor obstacle course including 20-degree slopes, 15 cm rubble, and 25 cm gaps between stepping stones.
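The teacher-student framework described above can be sketched as a simple distillation objective: the privileged teacher maps (proprioception + ground-truth heightmap) to actions, and the student regresses those actions from proprioceptive history alone. The linear "networks" and dimensions below are illustrative stand-ins, not the architecture from the paper.

```python
import numpy as np

# Minimal sketch of privileged teacher-student distillation, assuming
# invented dimensions: 48-D proprioception, 187-D heightmap, 12-D actions,
# and a 30-step proprioceptive history for the student.

rng = np.random.default_rng(0)
PROPRIO, HEIGHTMAP, ACTION, HISTORY = 48, 187, 12, 30

W_teacher = rng.normal(size=(PROPRIO + HEIGHTMAP, ACTION)) * 0.01  # frozen
W_student = rng.normal(size=(PROPRIO * HISTORY, ACTION)) * 0.01    # trained

def distillation_loss(proprio, heightmap, proprio_history):
    # Teacher sees privileged terrain; student only sees proprioceptive history.
    target = np.concatenate([proprio, heightmap], axis=-1) @ W_teacher
    pred = proprio_history.reshape(len(proprio_history), -1) @ W_student
    return float(np.mean((pred - target) ** 2))  # MSE the student minimizes

loss = distillation_loss(rng.normal(size=(8, PROPRIO)),
                         rng.normal(size=(8, HEIGHTMAP)),
                         rng.normal(size=(8, HISTORY, PROPRIO)))
```

At deployment only the student runs, which is why the real robot needs no terrain heightmap sensor.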
Peng et al. (2021) introduced AMP (Adversarial Motion Priors) for learning animal-like locomotion from motion capture. By training a discriminator to distinguish the robot's movements from real animal motion data (dog motion capture), AMP produces gaits that match animal dynamics without explicit reward engineering. Trained on 30 minutes of real dog motion capture, AMP-style quadruped policies walk, trot, canter, and pace with natural-looking transitions between gaits. Energy efficiency improved by 23% compared to pure reward-based RL, demonstrating the concrete value of real motion data as a learning prior.
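The adversarial-prior idea can be sketched as a style reward derived from a discriminator's score. The logistic discriminator and feature dimension below are illustrative stand-ins for the learned network in the AMP papers; the `-log(1 - D)` form is the reward shaping those papers use.

```python
import numpy as np

# Sketch of an AMP-style style reward, assuming an invented 24-D transition
# feature vector and a pre-trained logistic discriminator (weights here are
# random placeholders for illustration).

rng = np.random.default_rng(1)
FEAT = 24
w = rng.normal(size=FEAT) * 0.1  # discriminator weights, trained elsewhere

def style_reward(transition: np.ndarray) -> float:
    d = 1.0 / (1.0 + np.exp(-transition @ w))       # P(transition is from MoCap)
    return float(-np.log(np.clip(1.0 - d, 1e-6, 1.0)))  # high when "animal-like"

def total_reward(task_r: float, transition: np.ndarray, w_style: float = 0.5) -> float:
    # Task reward (velocity tracking, etc.) blended with the motion prior.
    return (1.0 - w_style) * task_r + w_style * style_reward(transition)
```

Because the prior is a learned discriminator rather than per-joint tracking targets, the policy is free to deviate from the reference when the task demands it.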
For humanoid locomotion, the field accelerated dramatically in 2023-2024. Radosavovic et al. (2024) demonstrated real-world humanoid walking on Digit using a policy trained in simulation with a combination of RL and reference motion data. The policy achieved stable walking at 1.2 m/s on flat ground and 0.8 m/s on grass — speeds that require precise timing of the double-support to single-support transitions that define human walking. The training used 50 hours of human walking motion capture as a style reference, producing gaits rated as more natural by human evaluators than any prior sim-to-real humanoid policy.
Parkour-level locomotion was demonstrated by Zhuang et al. (2023) and Cheng et al. (2024), who trained quadruped robots to jump over 60 cm gaps, vault onto boxes, and execute backflips. These extreme locomotion skills required both simulation training and real-world fine-tuning data: the simulation provided initial policy learning in safe conditions, while 500-1,000 real-world trials of each parkour skill corrected for the sim-to-real dynamics gap that is particularly severe during high-impact maneuvers. The real-world trial data was essential — policies trained only in simulation failed 40% of jumps that required precise timing of ground reaction forces.
Collection Methodology for Locomotion Data
Locomotion data collection encompasses two distinct modalities: reference motion data (what the gait should look like) and real-world interaction data (what happens when the robot actually walks on real terrain). Reference motion data comes from optical motion capture of animals or humans performing target gaits. A standard setup uses 8-12 OptiTrack or Vicon cameras tracking 30-50 reflective markers attached to the animal or human subject, recording at 120-500 Hz. For quadruped references, trained dogs wearing custom marker suits perform walks, trots, canters, and specific behaviors (turning, standing up, lying down). For humanoid references, human subjects walk on instrumented treadmills and outdoor terrain courses.
Real-world terrain interaction data captures the robot's proprioceptive experience during actual terrain traversal. This requires the robot to be instrumented for full-body state recording: joint position encoders at 1000 Hz, joint torque sensors or motor current measurements at 500-1000 Hz, IMU data at 200-400 Hz, and foot contact switches or force plates at 500+ Hz. The high frequency requirements are non-negotiable for locomotion — contact events during walking create force transients that last 10-50 ms and are invisible at lower sample rates. Missing these events means the policy cannot learn proper contact timing and force control.
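The sample-rate requirement follows from simple arithmetic: a contact transient shorter than a few sampling periods is effectively invisible. A minimal sketch, using the 10-50 ms transient durations quoted above:

```python
# Why 500-1000 Hz proprioception is non-negotiable: count how many samples
# land inside a contact-force transient at different logging rates.
# Transient durations are the 10-50 ms range quoted in the text.

def samples_in_transient(transient_ms: float, rate_hz: float) -> int:
    """Number of samples captured during a transient of the given duration."""
    return int(transient_ms / 1000.0 * rate_hz)

print(samples_in_transient(20.0, 1000.0))  # 20 samples: transient shape recoverable
print(samples_in_transient(20.0, 200.0))   # 4 samples: peak force likely missed
print(samples_in_transient(10.0, 100.0))   # 1 sample: the event may vanish entirely
```

At 100 Hz a 10 ms impact spike can fall entirely between samples, which is the aliasing failure mode described in the FAQ below.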
Terrain diversity is the primary quality axis for locomotion datasets. A production-grade locomotion dataset should cover at minimum 8 terrain types: flat hard floor, carpet, short grass, tall grass or brush, gravel/loose stone, sand, stairs (up and down, multiple step heights), and slopes (5-degree increments from 5 to 25 degrees). Each terrain type needs 30-60 minutes of continuous traversal data covering multiple speeds and direction changes. For exteroceptive (camera-based) locomotion, each terrain type also needs egocentric RGB-D imagery at 30 Hz paired with terrain type labels and traversability scores.
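The minimum-coverage checklist above lends itself to an automated audit. The sketch below checks a recording log against the 8 terrain types and the 30-minute floor from the text; the log format and terrain keys are invented for illustration.

```python
# Hypothetical coverage audit against the minimum terrain checklist above.
# Terrain names and the 30-minute floor come from the text; the recording
# log (terrain -> minutes collected) is an invented example format.

REQUIRED_TERRAINS = {"flat_hard", "carpet", "short_grass", "tall_grass",
                     "gravel", "sand", "stairs", "slope"}
MIN_MINUTES = 30.0

def coverage_gaps(recording_log: dict) -> dict:
    """Return terrain types still short of the minimum, with minutes missing."""
    return {t: MIN_MINUTES - recording_log.get(t, 0.0)
            for t in REQUIRED_TERRAINS
            if recording_log.get(t, 0.0) < MIN_MINUTES}

log = {"flat_hard": 62.0, "gravel": 45.0, "stairs": 18.0}
gaps = coverage_gaps(log)
print(gaps)  # stairs short by 12 min; five terrain types entirely unrecorded
```

Running this per collection day keeps the dataset from skewing toward the easy terrains (flat floor, short grass) that robots traverse fastest.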
Claru collects locomotion data using instrumented legged robot platforms deployed across purpose-built multi-terrain test courses and natural outdoor environments. Each recording includes full proprioceptive state at 500+ Hz, synchronized multi-view video at 30 Hz, and LiDAR terrain mapping for ground-truth elevation. Our outdoor courses include calibrated slope sections, stair sets at 4 standard heights (15, 18, 20, 25 cm), gravel beds, sand pits, and natural grass terrain. For motion capture references, we partner with biomechanics laboratories equipped with Vicon systems for precise animal and human gait capture.
Key Datasets for Locomotion
Public locomotion datasets include motion capture libraries, simulation benchmarks, and real-world deployment recordings.
| Dataset | Year | Scale | Platform | Terrain Types | Modalities |
|---|---|---|---|---|---|
| ANYmal in the Wild (Miki et al.) | 2022 | 100+ hours sim; 50+ real outdoor trials | ANYmal C (quadruped) | Stairs, rubble, slopes, gaps, forest | Proprioception + depth perception |
| AMP Dog MoCap (Peng et al.) | 2020 | 30 min of real dog motion capture | A1 quadruped (deployment) | Flat (MoCap studio) | Full-body MoCap at 120 Hz |
| Legged Gym / IsaacGym Locomotion | 2022 | Billions of sim steps; procedural terrain | ANYmal, A1, Cassie (sim) | 18 procedural terrain types | Simulated proprioception + height maps |
| CMU Motion Capture Database | 2003-ongoing | 2,500+ human motion sequences | Human subjects (Vicon) | Indoor flat ground | Optical MoCap at 120 Hz + video |
| Extreme Parkour (Cheng et al.) | 2024 | Sim + 500-1K real trials per skill | Unitree A1 (quadruped) | Flat, gaps, boxes, climbing walls | Proprioception + egocentric depth |
How Claru Supports Locomotion Data Needs
Claru provides locomotion data collection across two distinct pipelines: reference motion capture and real-world terrain interaction recording. For motion capture references, we partner with biomechanics laboratories equipped with Vicon or OptiTrack systems to capture animal gaits (dogs, cats, horses) and human locomotion at 120-500 Hz with 50-marker full-body configurations. Captured motions span walking, trotting, running, turning, stair climbing, and recovery behaviors, delivered in BVH, FBX, or custom retargetable formats.
For real-world terrain data, we deploy instrumented legged platforms across multi-terrain test courses featuring calibrated slopes (5-25 degrees), stair sets at 4 standard heights, gravel beds, sand pits, grass terrain, and indoor hard surfaces. Each traversal recording includes 500+ Hz proprioception (joint positions, velocities, torques), 200 Hz IMU data, 30 Hz synchronized video, and LiDAR terrain mapping. Terrain type labels and traversability scores are annotated per trajectory segment for training terrain-aware perception models.
Claru delivers locomotion datasets with gait phase annotations (stance, swing, double-support, flight), contact event timestamps at millisecond precision, terrain type classification per segment, center-of-mass trajectory estimates, and per-step foot placement coordinates. For clients training exteroceptive locomotion policies, we provide paired egocentric RGB-D imagery with terrain labels and traversability ground truth. Our data supports training pipelines for ANYmal, Spot, Unitree platforms, and humanoid robots including Digit, Atlas, and custom systems, with format conversion to match the target training framework.
References
- [1] Miki et al. “Learning Robust Perceptive Locomotion for Quadrupedal Robots in the Wild.” Science Robotics, 2022.
- [2] Peng et al. “AMP: Adversarial Motion Priors for Stylized Physics-Based Character Animation.” SIGGRAPH, 2021.
- [3] Peng et al. “Learning Agile Robotic Locomotion Skills by Imitating Animals.” RSS, 2020.
- [4] Rudin et al. “Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning.” CoRL, 2022.
- [5] Agarwal et al. “Legged Locomotion in Challenging Terrains using Egocentric Vision.” CoRL, 2023.
- [6] Cheng et al. “Extreme Parkour with Legged Robots.” ICRA, 2024.
Frequently Asked Questions
How much real data is needed when simulation-only training already works?
Simulation-only policies achieve 85-90% of optimal performance on standard terrains, but real data remains critical for three use cases: (1) terrain perception model training, where real visual complexity far exceeds simulation; (2) natural gait style transfer via motion capture references, which improves energy efficiency by 15-30%; and (3) high-impact skill refinement (jumping, parkour) where sim-to-real dynamics gaps cause 40% failure rates on precision timing. A 90/10 simulation/real split is cost-effective for most applications.
What are the options for motion capture outdoors, beyond a studio setup?
Outdoor MoCap options include GPS-RTK combined with body-mounted IMUs (centimeter accuracy, unlimited area), portable optical tracking (OptiTrack Active, 10x10m outdoor area, sub-millimeter accuracy), and markerless computer vision systems (Theia3D, no markers but lower accuracy). For large-scale outdoor terrain traversal, IMU-based motion estimation with periodic SLAM ground truth provides the best coverage-to-accuracy tradeoff. Claru uses a multi-sensor fusion approach combining IMU integration with visual-inertial odometry for continuous outdoor tracking.
How many terrain types should a locomotion dataset cover?
A minimum of 8 terrain types for general quadruped deployment: flat hard floor, carpet, grass, gravel, sand, stairs (multiple heights), slopes (5-25 degrees), and curbs or single steps. Each terrain type needs 30-60 minutes of traversal at multiple speeds. For humanoid deployment, add cobblestone, wet surfaces, and transitions between terrain types. Terrain transitions (grass to gravel, flat to stairs) are particularly important and often under-represented in datasets.
What sensor frequencies does locomotion data require?
Joint-level proprioception at 200 Hz minimum, with 500-1000 Hz strongly preferred for capturing contact force transients that last 10-50 ms. IMU data at 200 Hz for body orientation and angular velocity. Visual data at 30 Hz (sufficient for terrain perception). Foot contact data should match proprioception frequency. Lower proprioceptive rates introduce aliasing of contact dynamics that prevents the policy from learning proper foot placement timing and force control.
How does quadruped data collection differ from biped data collection?
Quadruped data is simpler: 4 legs with 3-4 joints each, inherently more stable (4-point support polygon), and gait transitions are gradual. Biped data requires higher precision: 2 legs with 6+ joints each, the narrow support polygon demands precise center-of-mass tracking, and transitions between double-support and single-support phases must be captured at high temporal resolution. Biped datasets also need upper-body data (arms and torso) because arm swing is essential for balance. Expect 2-3x more annotation effort per hour of biped data compared to quadruped.
Get a Custom Quote for Locomotion Data
Describe your robot platform, target terrain environments, and locomotion skills, and we will design a data collection plan covering motion capture references and real-world terrain recordings.