Closing the Sim-to-Real Gap with Real-World Data Collection
Simulation trains fast but deploys brittle. The gap between rendered physics and physical reality still causes 30-50% performance drops when policies transfer to hardware. Closing that gap requires structured real-world data collected at the exact distribution your simulator cannot reproduce. Claru operates the collection infrastructure that bridges simulation and deployment.
The Domain Gap Is a Data Problem, Not a Compute Problem
The sim-to-real gap refers to the performance degradation that occurs when a policy trained in simulation is deployed on physical hardware. Despite advances in photorealistic rendering and physics engines, simulated environments systematically differ from reality in ways that matter for control: contact dynamics, surface friction coefficients, lighting variation, sensor noise profiles, and object deformation under force. NVIDIA Isaac Sim achieves visual fidelity within 5% of real camera output, yet policies trained exclusively in Isaac Sim still exhibit 30-50% task success rate drops on physical robots due to dynamics mismatches that no renderer can solve [1]. The fundamental issue is distributional: simulation generates data from an approximation of the real world, and the approximation error compounds across long-horizon manipulation tasks where small force errors accumulate into large trajectory deviations.
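The compounding claim is easy to verify with a toy calculation (a sketch of the mechanism, not anything from the cited work): a constant per-step force error integrates twice, once into velocity and once into position, so the resulting trajectory deviation grows roughly quadratically with horizon length.

```python
# Toy model (our illustration, not from the cited work): a 1-D point mass
# integrated under a constant per-step force error. A small dynamics
# mismatch compounds because the force error integrates twice:
# once into velocity, then again into position.
def rollout(force_error: float, steps: int, dt: float = 0.01) -> float:
    """Return the accumulated position error after `steps` integration steps."""
    vel_err, pos_err = 0.0, 0.0
    for _ in range(steps):
        vel_err += force_error * dt  # force error -> velocity error
        pos_err += vel_err * dt      # velocity error -> position error
    return pos_err

short_horizon = rollout(force_error=0.5, steps=100)   # ~1 s of control
long_horizon = rollout(force_error=0.5, steps=1000)   # ~10 s of control
# The 10x longer horizon yields roughly a 100x larger position error.
```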
Why Domain Randomization Alone Falls Short
Domain randomization — varying textures, lighting, object masses, and friction parameters randomly during training — has been the standard mitigation since 2017. The approach works for perception-heavy tasks (object detection, pose estimation) but degrades for contact-rich manipulation where the randomization range must cover the true physical parameters without being so wide that the policy learns overly conservative behaviors. ABB and NVIDIA's HyperReality system demonstrated 99% correlation between simulated and real sensor readings with 0.5mm positioning accuracy, but achieved this only by constraining the simulation to a narrow, well-calibrated domain — industrial robotic cells with known geometry and materials [2]. Generalizing that calibration to diverse household or workplace environments remains unsolved. The Sim2Real-VLA architecture at ICLR 2026 showed that vision-language-action models trained exclusively on synthetic data can achieve zero-shot real-world transfer, but the paper's own ablation revealed that adding even 500 real-world demonstrations improved manipulation success rates by 18 percentage points over the synthetic-only baseline [3].
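In practice, domain randomization amounts to resampling physics and rendering parameters at the start of every training episode. A minimal sketch (the parameter names and ranges are our assumptions, not any particular simulator's API):

```python
import random

# Minimal sketch of domain randomization (parameter names and ranges are
# assumptions for illustration): resample physics and rendering parameters
# at the start of every training episode.
def sample_physics_params(rng: random.Random) -> dict:
    return {
        "friction": rng.uniform(0.4, 1.2),        # surface friction coefficient
        "object_mass_kg": rng.uniform(0.05, 2.0),
        "light_intensity": rng.uniform(0.3, 1.0),
        "sensor_noise_std": rng.uniform(0.0, 0.02),
    }

episode_params = sample_physics_params(random.Random(0))
```

The tension described above lives entirely in those bounds: too narrow and the true hardware falls outside the training distribution; too wide and the policy learns to hedge against physics it will never encounter.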
What Real-World Data Actually Fixes
Real-world data addresses three specific failure modes that simulation cannot resolve internally. First, contact dynamics: the force profiles generated when a gripper contacts a deformable object (fabric, food, paper) differ from simulated rigid-body or soft-body approximations in ways that depend on material batch, temperature, and humidity — variables that simulation randomizes uniformly but reality distributes non-uniformly. Second, perceptual distribution shift: real kitchens, workshops, and warehouses have lighting, clutter, and occlusion patterns that domain randomization under-represents because the randomization is typically parameterized by engineers who unconsciously bias toward well-lit, moderately cluttered scenes. Third, embodiment-specific dynamics: every physical robot has manufacturing tolerances, joint backlash, and cable routing that create systematic biases absent from its URDF model. Targeted real-world data collection addresses all three by sampling directly from the deployment distribution rather than approximating it.
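One of these embodiment-specific effects, joint backlash, is simple enough to model directly (an illustrative toy with assumed parameters; real backlash is messier): the joint output only moves once the commanded motion takes up a mechanical dead-band, so every direction reversal produces a flat spot that an idealized URDF model omits.

```python
# Toy backlash model (illustrative; the dead-band width is an assumption):
# the output angle trails the command inside a +/- deadband/2 zone, so each
# direction reversal produces a lag absent from the idealized model.
def joint_with_backlash(commanded: list[float], deadband: float) -> list[float]:
    out, pos = [], 0.0
    for c in commanded:
        if c > pos + deadband / 2:    # command presses the upper edge
            pos = c - deadband / 2
        elif c < pos - deadband / 2:  # command presses the lower edge
            pos = c + deadband / 2
        out.append(pos)               # otherwise: inside dead-band, no motion
    return out

# A ramp up then back down: the reversal after 0.2 rad shows up as a lag.
angles = joint_with_backlash([0.0, 0.1, 0.2, 0.1], deadband=0.05)
```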
How Do Current Approaches to Sim-to-Real Transfer Compare?
Four strategies dominate sim-to-real transfer today. Each involves a different data requirement and cost structure. Simulation-only training is cheapest per sample but most brittle at deployment. Full real-world collection is most robust but scales slowly. The hybrid approaches in between vary in how much real data they require and how effectively they use it.
- Simulation-Only (Isaac Sim / MuJoCo)
- Sim + Fine-Tuning (Sim2Real-VLA approach)
- Real-World Only (DROID / Open X-Embodiment)
- Claru Hybrid Collection
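The hybrid strategies in the middle of this spectrum share a common pattern: pre-train on cheap simulated batches, then fine-tune on a schedule that spreads a small number of real-world batches evenly among simulated ones. A minimal sketch (our construction, not a published recipe):

```python
# Sketch of the hybrid pattern (our construction, not a published recipe):
# during fine-tuning, spread a small number of real-world batches evenly
# among the cheaper simulated batches.
def make_fine_tune_schedule(n_updates: int, n_real: int) -> list[str]:
    """Return a per-update batch source, 'real' or 'sim'."""
    assert 0 <= n_real <= n_updates
    schedule = ["sim"] * n_updates
    if n_real > 0:
        stride = n_updates / n_real
        for k in range(n_real):
            schedule[int(k * stride)] = "real"  # evenly spaced real batches
    return schedule

# 10 fine-tuning updates, 3 of them driven by real demonstrations.
schedule = make_fine_tune_schedule(10, 3)
```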
Game-Based Data Capture for Real-World Simulation
We designed and built a custom capture application from scratch. The system performs simultaneous screen recording at native resolution and raw input logging, capturing every keystroke, mouse movement, and controller input as structured data with microsecond-precision timestamps. Frame-level alignment between the video and control streams is maintained via a shared monotonic clock, with periodic sync markers to detect and correct any drift.
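The alignment scheme described above can be sketched as follows (names and layout are illustrative; the production application is custom-built): input events and video frames are stamped from one monotonic clock, so the two streams can be aligned frame-by-frame after capture.

```python
import time
from dataclasses import dataclass

# Sketch of shared-clock timestamping (names are illustrative): every input
# event carries a microsecond reading of the same monotonic clock used to
# stamp video frames, so the streams can be aligned after capture.
@dataclass
class InputEvent:
    t_us: int      # microseconds on the shared monotonic clock
    device: str    # "keyboard", "mouse", or "controller"
    payload: dict  # e.g. {"key": "W", "down": True}

def now_us() -> int:
    """Shared monotonic clock in microseconds (immune to wall-clock jumps)."""
    return time.monotonic_ns() // 1_000

def frame_index_for(event_t_us: int, frame_t0_us: int, fps: float) -> int:
    """Map an input event onto the video frame active when it occurred."""
    return int((event_t_us - frame_t0_us) * fps / 1_000_000)

event = InputEvent(t_us=now_us(), device="keyboard", payload={"key": "W", "down": True})
```

Periodic sync markers then reduce to logging a known event into both streams at the same `now_us()` reading and checking that the recovered frame index matches.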
Egocentric Video Data Collection for Robotics and World Modeling
We built a purpose-built capture and ingestion platform — not adapted from an off-the-shelf tool — and launched three parallel pipelines within days of engagement, each optimized for different environments and interaction types. The first pipeline deployed GoPro and DJI wearable cameras for high-fidelity, wide-angle egocentric capture of manipulation tasks, cooking, and locomotion — producing 219,000+ clips. The second pipeline used smartphone cameras for rapid, high-volume capture of everyday activities across diverse indoor and outdoor environments — producing 155,000+ clips.
Frequently Asked Questions
How much real-world data does it take to close the sim-to-real gap?

Between 500 and 5,000 real-world demonstrations typically close the gap for a specific task domain. The Sim2Real-VLA study showed that 500 real demonstrations improved manipulation success by 18 percentage points over synthetic-only training. The exact volume depends on task complexity, environment diversity, and how well the simulator approximates the target domain. Claru sizes collection volume based on an initial calibration phase that measures transfer error reduction per added demonstration batch.
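That calibration phase can be sketched as a stopping rule over the measured success curve (our illustration; the internal procedure is not public): keep collecting batches until the marginal improvement flattens.

```python
# Sketch of a collection stopping rule (our illustration): measure real-world
# task success after each added batch of demonstrations and stop once the
# marginal gain drops below a threshold.
def batches_needed(success_by_batch: list[float], min_gain: float = 0.01) -> int:
    """Return the batch count at which marginal improvement falls below min_gain."""
    for i in range(1, len(success_by_batch)):
        if success_by_batch[i] - success_by_batch[i - 1] < min_gain:
            return i
    return len(success_by_batch)

# Hypothetical success rates measured after each 500-demo batch:
curve = [0.40, 0.58, 0.66, 0.70, 0.705, 0.707]
# batches_needed(curve) -> 4: gains flatten after the fourth batch.
```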
What kinds of real-world data deliver the most value for sim-to-real transfer?

Contact-rich manipulation data delivers the highest marginal value because contact dynamics are the hardest for simulators to model accurately. Specifically, data showing force profiles while grasping deformable objects, tool-surface interactions, and multi-step assembly sequences addresses the failure modes where simulation diverges most from reality. Claru prioritizes these interaction types through its activity-specific capture pipeline, which produced 12,000+ precisely labeled manipulation clips in a single engagement.
Can simulation-only training ever work without real-world data?

Yes, for narrow domains. The Sim2Real-VLA architecture demonstrated zero-shot sim-to-real transfer using vision-language-action models trained exclusively on synthetic data. ABB and NVIDIA's HyperReality system achieved 99% sim-real correlation in calibrated industrial cells. However, both approaches constrain the deployment environment — zero-shot transfer degrades rapidly as environment diversity increases. For general-purpose robotics targeting varied households or workplaces, real-world data remains necessary to cover the distribution that simulation cannot anticipate.
How do research teams integrate Claru's data into their training pipelines?

Claru delivers data in formats compatible with standard robotics training frameworks — per-frame image sequences paired with structured metadata (timestamps, activity labels, environment descriptors). Research teams typically use Claru's real-world data in three ways: as a fine-tuning set after simulation pre-training, as a validation set to measure sim-to-real transfer error, or as a calibration set to tune domain randomization parameters. The weekly delivery cadence means teams can begin fine-tuning runs during collection rather than waiting for a complete dataset.
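A loader for that kind of delivery might look like the following (the directory layout and field names here are assumptions for illustration, not a documented delivery schema):

```python
import json
from pathlib import Path

# Illustrative loader (directory layout and field names are assumptions, not
# a documented delivery schema): one episode folder holds per-frame images
# plus a metadata JSON carrying timestamps, an activity label, and an
# environment descriptor.
def load_episode(episode_dir: Path) -> dict:
    meta = json.loads((episode_dir / "metadata.json").read_text())
    frames = sorted(episode_dir.glob("frames/*.jpg"))
    # Invariant assumed here: one timestamp per frame.
    assert len(frames) == len(meta["timestamps"]), "frame/metadata mismatch"
    return {
        "frames": frames,                  # ordered per-frame image paths
        "timestamps": meta["timestamps"],  # capture times, one per frame
        "activity": meta["activity_label"],
        "environment": meta["environment"],
    }
```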
Your next hire isn't a vendor.
It's a data team.
Tell us what you're training. We'll scope the dataset.
References
- [1]NVIDIA Developer Blog. “Closing the Sim-to-Real Gap with NVIDIA Isaac Sim.” NVIDIA Developer, 2025. Isaac Sim achieves photorealistic rendering within 5% of real camera output, but policies still exhibit 30-50% success rate drops on physical hardware due to dynamics mismatches. Link
- [2]ABB & NVIDIA. “ABB Robotics Partners with NVIDIA to Deliver Industrial-Grade Physical AI at Scale (RobotStudio HyperReality).” ABB Technology Review, 2025. RobotStudio HyperReality achieves 99% sim-to-real correlation using NVIDIA Omniverse with ABB's virtual controller firmware; Absolute Accuracy technology reduces positioning errors from 8-15mm to 0.5mm. Available H2 2026. Link
- [3] Anonymous (under review). “Sim2Real-VLA: Bridging the Sim-to-Real Gap with Vision-Language-Action Models.” ICLR 2026. Vision-language-action models trained exclusively on synthetic data achieve zero-shot real-world transfer; adding 500 real demonstrations improves manipulation success by 18 percentage points. Link
- [4] Khazatsky et al. “DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset.” arXiv, 2024. 76,000 demonstration episodes across 564 scenes and 86 tasks, showing that dataset scale and diversity improve cross-embodiment transfer. Link
- [5] Open X-Embodiment Collaboration. “Open X-Embodiment: Robotic Learning Datasets and RT-X Models.” arXiv, 2024. 1M+ episodes across 22 robot embodiments demonstrate that cross-embodiment data improves transfer, but lab-environment bias limits real-world deployment performance. Link