How to Set Up a Domain Randomization Pipeline

A practitioner's guide to building a domain randomization pipeline for sim-to-real robot learning — from selecting randomization axes and configuring simulators through tuning distribution ranges, validating transfer quality, and combining synthetic data with real-world demonstrations.

Difficulty: Advanced
Time: 2-4 weeks

Prerequisites

  • NVIDIA GPU (RTX 3090+ for Isaac Sim, any CUDA GPU for MuJoCo)
  • Simulator installation (Isaac Sim 2023.1+ or MuJoCo 3.0+)
  • Robot URDF or MJCF model
  • 3D object models (OBJ/USD) for target objects
  • Python 3.9+ with PyTorch
Step 1: Identify the Randomization Axes for Your Task

Domain randomization operates across three categories: visual randomization (what the camera sees), physics randomization (how objects interact), and dynamics randomization (how the robot behaves). For each category, identify which specific parameters affect your task's sim-to-real gap.

Visual randomization axes: (1) Lighting — intensity (0.3x-3x ambient), color temperature (3000-7000K), direction (hemisphere sampling), number of lights (1-5), shadows on/off. (2) Textures — table surface (uniform color, wood, metal, fabric, random procedural), object textures, wall/background textures. (3) Camera — position (+-3-5 cm per axis from nominal), orientation (+-3-5 degrees), field of view (+-5 degrees), focal length, white balance, noise level. (4) Distractors — random objects placed in the scene that are not task-relevant, varying in count (0-10) and appearance.

Physics randomization axes: (1) Friction — surface-surface friction coefficient (0.3-1.5), object-table friction, gripper-object friction. (2) Mass — object mass (0.5x-2x nominal), center of mass offset (+-1 cm). (3) Restitution — bounciness coefficient (0.0-0.5). (4) Contact stiffness and damping (affects how objects feel when grasped).

Dynamics randomization axes: (1) Actuator gains — P and D gains of joint controllers (0.8x-1.2x). (2) Motor delay — control latency (0-20 ms). (3) Joint friction — per-joint Coulomb friction (0.5x-2x). (4) Observation noise — Gaussian noise added to joint position readings (0-0.005 rad) and force readings (0-0.5 N).

Document all axes in a randomization specification with the planned range for each parameter and the sampling distribution (uniform is the default; log-uniform for parameters spanning orders of magnitude like friction).
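The specification above can be captured directly in code. A minimal sketch, assuming a dict-based spec (the parameter names and ranges are illustrative, taken from the axes listed above — this is not a simulator API):

```python
import math
import random

# Hypothetical randomization spec: parameter -> (distribution, low, high).
# Ranges follow the axes listed above; names are illustrative.
RANDOMIZATION_SPEC = {
    "light_intensity_scale": ("uniform", 0.3, 3.0),
    "color_temperature_k":   ("uniform", 3000, 7000),
    "friction_coefficient":  ("log_uniform", 0.3, 1.5),
    "object_mass_scale":     ("uniform", 0.5, 2.0),
    "control_latency_ms":    ("uniform", 0.0, 20.0),
}

def sample_params(spec, rng=random):
    """Draw one value per parameter according to its distribution."""
    sample = {}
    for name, (dist, low, high) in spec.items():
        if dist == "uniform":
            sample[name] = rng.uniform(low, high)
        elif dist == "log_uniform":
            # Uniform in log-space: appropriate for parameters spanning
            # orders of magnitude, such as friction or contact stiffness.
            sample[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            raise ValueError(f"unknown distribution: {dist}")
    return sample
```

Sampling once per episode from this spec yields that episode's randomization values, and the same dict can be serialized to YAML or JSON as the versioned specification document.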

Deliverable: Randomization specification document

Tip: Start with visual randomization only — it is the easiest to implement, has the largest impact on perception-based sim-to-real transfer, and does not affect training stability. Add physics and dynamics randomization incrementally after visual DR is validated.

Tip: Document the rationale for each randomization range. This prevents future team members from arbitrarily narrowing ranges that were set based on specific real-world measurements.

Step 2: Configure the Simulator and Randomization APIs

Set up the simulation environment with your robot model, task objects, and the randomization framework. The two main simulator ecosystems are:

NVIDIA Isaac Sim + Replicator: Isaac Sim provides the rendering engine (RTX ray tracing for photorealistic images), robot model loading (URDF/USD), and physics (PhysX). Replicator is the built-in randomization API. Create a Replicator graph that specifies which parameters to randomize and their distributions. Example: rep.create.light(position=rep.distribution.uniform((-2,-2,1),(2,2,3)), intensity=rep.distribution.uniform(500,5000)). Replicator can randomize textures, object poses, camera parameters, and lighting in a single graph definition. Run the graph at the start of each episode to generate a fresh randomized scene.

MuJoCo + custom randomization: MuJoCo provides fast physics simulation (10,000+ steps/second on CPU, 1M+ with GPU parallelization via MuJoCo XLA). Randomization is applied by modifying the MJCF model XML before each episode: change friction, mass, joint damping, and actuator parameters. For visual randomization, MuJoCo's native renderer is basic (no ray tracing), so either: (a) use MuJoCo for physics and render with a separate engine (Blender, Isaac Sim) for perception training, or (b) apply visual randomization as image augmentation (color jitter, random crops, Gaussian noise) after rendering. The augmentation approach is simpler but less physically accurate than rendered randomization.
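The MJCF-modification approach can be sketched with the standard library's XML tools — the rewrite itself needs no MuJoCo installation. The model fragment below is a minimal illustration, not a real robot model; the ranges follow the physics axes listed earlier, and the attribute names match the MJCF schema:

```python
import random
import xml.etree.ElementTree as ET

# Minimal MJCF fragment; a real model would come from your robot's MJCF file.
MJCF = """
<mujoco>
  <worldbody>
    <body name="object">
      <geom name="object_geom" type="box" size="0.02 0.02 0.02"
            friction="0.8 0.005 0.0001" mass="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""

def randomize_mjcf(xml_string, rng=random):
    """Return a new MJCF string with randomized friction and mass.

    Sliding friction is drawn from 0.3-1.5 and mass is scaled by
    0.5x-2x nominal, matching the physics axes above.
    """
    root = ET.fromstring(xml_string)
    for geom in root.iter("geom"):
        fric = geom.get("friction", "1 0.005 0.0001").split()
        fric[0] = f"{rng.uniform(0.3, 1.5):.4f}"  # sliding friction term
        geom.set("friction", " ".join(fric))
        nominal_mass = float(geom.get("mass", "0.1"))
        geom.set("mass", f"{nominal_mass * rng.uniform(0.5, 2.0):.5f}")
    return ET.tostring(root, encoding="unicode")
```

Regenerating the model string before each episode and reloading it into the simulator applies a fresh physics draw per episode.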

For either simulator, create a randomization configuration file (YAML or JSON) that lists all randomized parameters with their distribution type and range. This config file is versioned alongside the codebase so that randomization settings are reproducible.

Tools: NVIDIA Isaac Sim + Replicator, MuJoCo, Blender (for USD asset creation)

Tip: Use Isaac Sim's Synthetic Data Generation (SDG) extensions for automated ground-truth labeling — every randomized frame comes with perfect segmentation masks, depth maps, and bounding boxes at zero annotation cost.

Tip: Test the randomization pipeline by generating 100 samples and visually inspecting them before large-scale generation.

Step 3: Generate and Validate Synthetic Training Data

With the randomization pipeline configured, generate a large synthetic dataset. For perception pretraining (object detection, segmentation), generate 50,000-500,000 randomized images. For policy pretraining (manipulation, navigation), generate 10,000-100,000 randomized episodes.

During generation, log the randomization parameters used for each image or episode alongside the data. This metadata enables post-hoc analysis: if the real-world transfer fails for shiny objects, you can check whether your texture randomization included sufficient metallic/specular materials. Store the data in a format compatible with your training pipeline — images as .png with JSON annotations for perception, HDF5 or RLDS episodes for policy training.

Validate the synthetic data before training: (1) Visual plausibility — render 100 random images and verify they look reasonable (no objects floating in mid-air, no cameras inside walls, no extreme lighting that makes everything black or white). (2) Physics plausibility — for policy data, verify that the physics randomization does not produce impossible scenarios (objects with negative mass, friction > 2.0 causing numerical instability). (3) Distribution coverage — histogram each randomized parameter and verify the distribution matches the specification (uniform parameters should be flat, not peaked). (4) Label correctness — for perception data, overlay the generated labels (bounding boxes, masks) on the images and verify alignment.
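Check (3), distribution coverage, is easy to automate. A minimal flatness check for a uniformly sampled parameter, run over the logged randomization values (the bin count and tolerance are arbitrary choices, not established thresholds):

```python
from collections import Counter

def check_uniform_coverage(samples, low, high, bins=10, tolerance=0.5):
    """Flag a randomized parameter whose logged samples are not roughly flat.

    Buckets samples into equal-width bins and checks that no bin deviates
    from the expected uniform count by more than `tolerance` (as a fraction
    of the expected count). A peaked histogram indicates a sampling bug or
    a clipped range.
    """
    counts = Counter(
        min(int((s - low) / (high - low) * bins), bins - 1) for s in samples
    )
    expected = len(samples) / bins
    return all(
        abs(counts.get(b, 0) - expected) <= tolerance * expected
        for b in range(bins)
    )
```

Running this per parameter over the generation metadata catches the "uniform parameter that is actually peaked" failure before any training compute is spent.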

Common generation pitfalls: Objects spawning inside each other (add collision checking during scene generation), cameras placed inside objects (add camera frustum validation), and lighting so extreme that images are black (clamp minimum scene brightness). Fix these issues before large-scale generation.
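Collision checking during scene generation can be as simple as rejection sampling. A sketch that treats objects as discs on the table plane — the workspace bounds and minimum spacing are illustrative, and a real pipeline would use the simulator's own collision queries:

```python
import math
import random

def spawn_objects(n, workspace=((-0.3, 0.3), (-0.3, 0.3)), min_dist=0.06,
                  max_attempts=1000, rng=random):
    """Sample n object positions on the table with no pairwise overlap.

    Rejects any candidate closer than min_dist to an already-placed
    object; raises if the workspace is too crowded to fit n objects.
    """
    placed = []
    for _ in range(max_attempts):
        if len(placed) == n:
            break
        x = rng.uniform(*workspace[0])
        y = rng.uniform(*workspace[1])
        if all(math.hypot(x - px, y - py) >= min_dist for px, py in placed):
            placed.append((x, y))
    if len(placed) < n:
        raise RuntimeError("workspace too crowded for requested object count")
    return placed
```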

Tools: NVIDIA Replicator, Python (for metadata logging and validation), wandb (for visualizing generated samples)

Tip: Generate 1,000 samples first and train a small model as a sanity check before generating the full dataset — this catches data format issues and gross randomization errors at 0.2% of the cost.

Tip: Validate that randomization parameters are logged alongside each generated sample for post-hoc analysis.

Step 4: Train with Randomized Data and Evaluate Sim-to-Real Transfer

Train your model on the domain-randomized synthetic data and evaluate transfer to the real world. The evaluation protocol must be carefully designed to isolate the effect of domain randomization from other variables.

For perception models (detectors, segmenters): train on 100% synthetic data with domain randomization, then evaluate on a real-world test set (100-500 real images labeled by hand). Compare against: (a) a model trained on the same number of real images (upper bound), (b) a model trained on synthetic data without domain randomization (lower bound). The randomized model should close 50-80% of the gap between (a) and (b).
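The "gap closed" comparison reduces to one line of arithmetic; a small helper makes the three baselines explicit (the accuracy figures in the test are made-up illustrations):

```python
def gap_closed_fraction(real_baseline, no_dr_baseline, dr_score):
    """Fraction of the sim-to-real gap closed by domain randomization.

    real_baseline: accuracy of a model trained on real images (upper bound)
    no_dr_baseline: accuracy of a sim model without DR (lower bound)
    dr_score: accuracy of the DR-trained model on the same real test set
    """
    gap = real_baseline - no_dr_baseline
    if gap <= 0:
        raise ValueError("upper bound must exceed lower bound")
    return (dr_score - no_dr_baseline) / gap
```

Per the guideline above, a healthy DR pipeline should land this fraction in the 0.5-0.8 range.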

For manipulation policies: pretrain on domain-randomized simulation data, then evaluate success rate on 50-100 real-world trials per task. If sim-to-real transfer is below 60% success, the randomization ranges are likely too narrow (real conditions fall outside the training distribution) or there is a systematic gap the randomization does not cover (e.g., sensor delay, robot dynamics mismatch). Diagnose by replaying the real-world observations through the policy and inspecting where it makes incorrect predictions — if the policy fails at the perception stage (wrong object detection), widen visual randomization; if it fails at the control stage (correct detection but wrong action), widen dynamics randomization.

For policies that need higher real-world performance, add a fine-tuning stage: take the sim-pretrained policy and fine-tune on 500-2,000 real-world demonstrations. This hybrid (sim pretrain + real fine-tune) consistently outperforms either approach alone. The sim pretraining provides a strong initialization that the real data refines. Track the fine-tuning data efficiency: how many real demonstrations are needed to reach 90% success? Compare to training from scratch on real data to quantify the benefit of sim pretraining.

Tools: PyTorch, wandb (experiment tracking), robot hardware (for real-world evaluation)

Tip: When evaluating sim-to-real transfer, test in the exact real-world conditions you plan to deploy in — not a clean lab setup. The whole point of domain randomization is to handle real-world variability, so evaluate on that variability.

Tip: Record video of every real-world evaluation trial. Failure mode analysis requires seeing what actually happened.

Step 5: Tune Randomization Ranges Based on Real-World Failures

Domain randomization is not a one-shot setup — it requires iterative tuning based on real-world deployment feedback. Analyze real-world failures to identify which randomization axes need adjustment.

Build a failure analysis pipeline: (1) Record all real-world evaluation trials with full sensor data. (2) For each failure, classify the failure mode (perception failure, planning failure, control failure). (3) For perception failures, compare the real observation to the distribution of synthetic observations — is the real image out-of-distribution in color, texture, lighting, or clutter? If so, widen the corresponding randomization axis. (4) For control failures, compare the real dynamics (measured forces, achieved velocities) to the range of dynamics in simulation — is the real friction/mass/delay outside the randomized range? If so, widen the dynamics randomization.
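Step (3)'s out-of-distribution comparison can start as a simple range check against the randomization values logged during generation. This is a crude first pass under the assumption that per-parameter coverage is what matters; a real pipeline would also compare full image statistics:

```python
def out_of_distribution(real_value, synthetic_values):
    """Check whether a measured real-world quantity falls outside the
    range covered by the logged synthetic randomization values.

    Returns 0.0 if the value is covered; otherwise the signed amount by
    which it falls below the synthetic minimum (negative) or above the
    synthetic maximum (positive). A nonzero result suggests widening
    that randomization axis in the given direction.
    """
    lo, hi = min(synthetic_values), max(synthetic_values)
    if real_value < lo:
        return real_value - lo
    if real_value > hi:
        return real_value - hi
    return 0.0
```

For example, a measured real-world friction of 1.8 against logged synthetic friction samples spanning 0.3-1.5 returns +0.3, pointing at the friction upper bound as the range to widen.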

Common iteration patterns: (a) Lighting too narrow — real-world shadows and highlights are more extreme than initial randomization. Widen lighting intensity range and add hard shadows. (b) Object textures too synthetic — procedural textures do not capture real-world wear, printing, and material variation. Add photorealistic texture assets from 3D asset libraries (Polyhaven, AmbientCG). (c) Camera position too narrow — real cameras drift over time due to vibration and temperature changes. Widen camera pose randomization to +-5 cm and +-5 degrees. (d) Friction too narrow — real gripper-object friction varies dramatically with object material and surface condition. Widen friction range to 0.2-1.5.

After each iteration, retrain and re-evaluate. Track the sim-to-real gap (performance difference between simulation and real world) across iterations — it should decrease monotonically. If it plateaus, the remaining gap is likely due to factors that domain randomization cannot address (deformable object physics, fluid dynamics, thermal effects), and real-world data fine-tuning is needed for further improvement.

Tools: Failure analysis scripts, wandb (for tracking iterations), 3D asset libraries (Polyhaven, AmbientCG)

Tip: Maintain a 'randomization changelog' that records every range adjustment with the justification — this prevents oscillation (widening, then narrowing, then widening the same parameter) and provides institutional knowledge for future projects.

Tip: After each iteration, verify that the sim-to-real gap decreases monotonically. If it does not, the latest randomization change may have introduced a regression.

Step 6: Integrate with Real-World Data for Production Deployment

For production deployment, combine domain-randomized synthetic data with real-world data in a structured training pipeline. The integration follows a standard two-phase approach:

Phase 1 — Sim pretraining: Train the model on 50,000-500,000 domain-randomized synthetic episodes (or images for perception). Use aggressive augmentation and randomization in this phase since the goal is to build general representations. Train until validation loss on a held-out synthetic set plateaus.

Phase 2 — Real fine-tuning: Starting from the Phase 1 checkpoint, fine-tune on real-world data with a reduced learning rate (typically 0.1x-0.01x the pretraining rate). The real data set is smaller (500-5,000 episodes) but high-quality. Use light augmentation in this phase (color jitter, random crop) — heavy augmentation is unnecessary because the real data distribution is what we want to match. Fine-tune until validation performance on a held-out real-world test set peaks.

Data mixing: Some practitioners mix synthetic and real data during fine-tuning rather than training sequentially. Use a sampling ratio of 3:1 to 5:1 (synthetic:real) in each batch to prevent the small real dataset from being overwhelmed. Gradually increase the real data proportion over the course of fine-tuning (curriculum approach).
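The mixing scheme can be sketched as a batch assembler. The pool contents and the real fraction in the test are illustrative; raising `real_fraction` across fine-tuning steps implements the curriculum described above:

```python
import random

def build_batch(synthetic_pool, real_pool, batch_size, real_fraction,
                rng=random):
    """Assemble one fine-tuning batch mixing synthetic and real samples.

    real_fraction is the target share of real data in the batch; 0.2-0.25
    corresponds to the 4:1-3:1 synthetic:real ratios above. Increase it
    over the course of fine-tuning for a curriculum.
    """
    n_real = max(1, round(batch_size * real_fraction))
    n_syn = batch_size - n_real
    batch = [rng.choice(real_pool) for _ in range(n_real)]
    batch += [rng.choice(synthetic_pool) for _ in range(n_syn)]
    rng.shuffle(batch)  # avoid ordered sim/real runs within a batch
    return batch
```

In a PyTorch pipeline the same effect is usually achieved with a `WeightedRandomSampler` or per-source `DataLoader`s; this sketch just makes the ratio arithmetic explicit.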

Deployment validation: Before deploying, run a final evaluation on 100+ real-world trials covering the full task diversity. Track not just success rate but also the distribution of failure modes — if the failure distribution shifts after fine-tuning (fewer perception failures, more dynamics failures), this indicates the real data improved perception but more dynamics randomization or dynamics-specific real data is needed. Document the full training recipe (sim dataset, randomization config, real dataset, hyperparameters) for reproducibility.

Tools: PyTorch (multi-source DataLoader), wandb, robot hardware

Tip: Keep the sim-pretrained checkpoint as a fallback — if fine-tuning degrades performance on some task variants (overfitting to the small real dataset), you can restart from the sim checkpoint with different fine-tuning hyperparameters.

Tip: Test the full training recipe end-to-end before committing to a large real-data collection campaign.

Tools & Technologies

NVIDIA Isaac Sim, NVIDIA Replicator, MuJoCo, PyTorch, Blender (for asset creation), Python, wandb (for experiment tracking)

References

  1. Tobin et al. "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World." IROS 2017.
  2. Sadeghi & Levine. "CAD2RL: Real Single-Image Flight without a Single Real Image." RSS 2017.
  3. OpenAI et al. "Solving Rubik's Cube with a Robot Hand." arXiv:1910.07113, 2019.

How Claru Can Help

Claru provides sim-to-real data pipelines that pair domain-randomized synthetic data with matched real-world demonstrations. We configure randomization pipelines in Isaac Sim and MuJoCo tuned to your specific task and target environment, generate large-scale synthetic datasets with verified distribution coverage, and collect the real-world fine-tuning data needed to close the sim-to-real gap. Our iterative tuning process uses real-world failure analysis to adjust randomization ranges until the combined pipeline meets deployment success rate targets. We deliver the full package: synthetic dataset, randomization configuration, real-world dataset, and documented training recipe.

Why Domain Randomization Is Essential for Sim-to-Real Transfer

Domain randomization is the technique of training robot policies in simulation with deliberately varied visual appearance, physics parameters, and dynamics so that the real world appears as just another variation the policy has already encountered. Introduced by Tobin et al. (2017) at OpenAI for object detection and extended by Sadeghi and Levine (2017) for drone navigation, domain randomization has become the standard approach for sim-to-real transfer. The core idea is simple: if the policy trains on textures ranging from checkerboard patterns to photorealistic wood grain, real-world tabletop textures fall within the training distribution. If physics parameters vary from 0.5x to 2x the nominal values, real-world physics (which are uncertain) fall within the trained range.

The alternative to domain randomization — system identification, where you carefully measure and replicate every real-world parameter in simulation — is fragile. Perfect system identification is impossible for real-world environments where lighting changes with time of day, surface friction depends on humidity, and object masses vary across individual items. Domain randomization embraces this uncertainty by training on a distribution of parameters wide enough that reality is covered. The cost is that the policy must be robust to a wider range of conditions than it will actually encounter, which may slightly reduce peak performance in any single condition but dramatically improves reliability across the full range of real-world conditions.

Automatic Domain Randomization: Learning the Optimal Ranges

Manual tuning of randomization ranges is time-consuming and suboptimal. Automatic Domain Randomization (ADR), introduced by OpenAI for the Dactyl project, automates range tuning by progressively expanding randomization ranges as the policy improves. The algorithm starts with narrow ranges (close to the nominal simulation parameters) and widens each range incrementally whenever the policy's success rate exceeds a threshold (typically 80%). If success rate drops below a lower threshold (typically 50%), the range is narrowed. This creates an adaptive curriculum where the policy is always trained at the edge of its capability.

Implementing ADR requires: (1) a performance metric that can be computed in simulation (task success rate), (2) a schedule for range updates (every 100-1,000 episodes), and (3) independent tracking of each randomization axis so they can expand at different rates. Not all axes need ADR — visual randomization parameters (textures, lighting) can usually be set to wide ranges without destabilizing training, while physics parameters (friction, mass) need tighter control. Use ADR for physics and dynamics parameters where the optimal range is unknown, and manual wide ranges for visual parameters. Track all range updates in a log file alongside the training checkpoint for reproducibility.
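The ADR loop described above reduces to a simple per-axis update rule. A sketch using the thresholds mentioned (the step size and one-sided widening of the upper bound are simplifying assumptions, not OpenAI's exact algorithm):

```python
def adr_update(ranges, axis, success_rate,
               widen_thresh=0.8, narrow_thresh=0.5, step=0.1):
    """One Automatic Domain Randomization update for a single axis.

    ranges: dict axis -> [low, high]. Expands the upper bound by a
    fraction of the current span when the policy succeeds above
    widen_thresh, and contracts it below narrow_thresh, keeping the
    policy trained at the edge of its capability.
    """
    low, high = ranges[axis]
    span = high - low
    if success_rate > widen_thresh:
        ranges[axis] = [low, high + step * span]            # policy is comfortable
    elif success_rate < narrow_thresh:
        ranges[axis] = [low, max(low, high - step * span)]  # back off
    return ranges
```

Running this every few hundred episodes per physics axis, and logging every update alongside the checkpoint, gives the adaptive curriculum and the reproducibility trail described above.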

Frequently Asked Questions

How much domain randomization is enough?

The right amount of randomization is the minimum range that makes real-world conditions fall within the training distribution. Under-randomization causes sim-to-real failure (the real world looks like an out-of-distribution input). Over-randomization wastes training compute and can degrade performance by forcing the policy to be robust to conditions it will never encounter. Start with published ranges from successful sim-to-real papers (e.g., lighting intensity 0.5x-2x, friction 0.3-1.0, camera position +-5cm), train, test in the real world, and iteratively narrow or widen ranges based on where failures occur.

Which simulator should I use for domain randomization?

NVIDIA Isaac Sim is the current best option for visual domain randomization — it provides physically-based rendering (PBR) with ray tracing, built-in randomization APIs (Replicator), and large asset libraries. MuJoCo is better for physics-focused randomization (contact dynamics, actuator modeling) and runs 10-100x faster for non-visual policies. For a full pipeline, use Isaac Sim for perception pretraining (visual randomization) and MuJoCo for policy optimization (fast physics). PyBullet is a free alternative but has weaker rendering and less accurate contact physics.

Can domain randomization alone replace real-world data?

For perception-only tasks (object detection, segmentation), domain randomization can achieve competitive performance with zero real-world images — Tobin et al. demonstrated sim-only training for object detection that transferred directly. For contact-rich manipulation (grasping, insertion, assembly), domain randomization alone typically achieves 60-80% of real-world-trained performance due to the sim-to-real gap in contact dynamics. The practical approach is sim-pretrained with domain randomization, then fine-tuned with 500-2,000 real-world demonstrations. This hybrid reduces the real-data requirement by 5-10x compared to training from scratch on real data.

How do I choose physics randomization ranges?

Start with system identification: measure the real-world value of each parameter (object mass, surface friction, joint damping) and set the randomization center to the measured value. Then set the range to plus or minus 30-50% for well-characterized parameters (mass, dimensions) and plus or minus 100% for poorly characterized parameters (friction, restitution). The key constraint is that simulation must remain stable across the full range — if MuJoCo produces NaN values or exploding physics at extreme parameter settings, narrow the range. Validate by running 1,000 randomized episodes and checking that the physics failure rate is below 1%. Published ranges from successful sim-to-real papers provide good starting points: OpenAI's Rubik's Cube hand used friction 0.5-1.5, mass 0.7x-1.3x, and actuator gains 0.75x-1.5x.

Should I randomize all axes together or one at a time?

Randomize all axes simultaneously during training — the real world presents all sources of variation at once, so training on individual randomization axes does not prepare the policy for the combined effect. However, during debugging and hyperparameter tuning, ablate one axis at a time to understand which randomization axes have the most impact. A common pattern is: train with all randomization, evaluate on real hardware, identify the primary failure mode (visual, physics, or dynamics), then run an ablation where you remove that specific randomization axis and measure the performance drop. If removing visual randomization causes a 30% drop but removing physics randomization causes only 5%, focus your iteration effort on visual randomization range tuning.

Need Sim-to-Real Data Services?

Claru provides sim-to-real data pipelines: domain-randomized simulation data plus matched real-world demonstrations for fine-tuning. Tell us your task and target platform.