Open X-Embodiment Alternative: Targeted Commercial Data for Production Robotics
Open X-Embodiment aggregated 1M+ demonstrations from 22 robot embodiments into the largest cross-robot dataset ever assembled. But mixed licensing, inconsistent quality, and heterogeneous action spaces create real challenges for production teams. Compare OXE with Claru's uniform, commercially licensed data.
Open X-Embodiment Profile
- Developer: Google DeepMind + 21 academic institutions
- Released: 2023
- Scale: 1M+ demonstrations from 22 robot embodiments, 527+ skills, 160K+ task configurations
- License: Mixed (varies per constituent dataset -- some non-commercial)
How Claru Helps Teams Beyond Open X-Embodiment
Open X-Embodiment is the single most important dataset for pretraining generalist robot policies, and its cross-embodiment diversity enables the kind of broad manipulation priors that single-robot datasets cannot provide. However, the transition from generalist pretraining to production deployment reveals OXE's structural limitations: mixed licensing creates legal risk, inconsistent quality introduces training noise, and the data inevitably mismatches your specific robot platform and deployment environment.
Claru addresses each of these gaps. We provide commercially licensed, quality-controlled demonstrations collected on your exact robot, in your actual facility, with standardized action spaces and full multi-modal sensor coverage. Teams that pretrain on OXE and fine-tune on Claru data consistently achieve higher success rates than either source alone, because the combination captures OXE's breadth for general manipulation reasoning while grounding policy behavior in the real-world specifics of the deployment domain. We deliver in RLDS format for seamless integration with OXE-based training pipelines, and our ongoing collection service means your fine-tuning dataset grows alongside your deployment requirements rather than remaining static.
What Is Open X-Embodiment?
Open X-Embodiment (OXE) is a collaborative dataset and model initiative led by Google DeepMind in partnership with 21 academic institutions worldwide. Published in 2023 and presented at ICRA 2024, it represents the most ambitious effort to date to unify robot learning data across diverse embodiments. The dataset aggregates over 1 million robot demonstrations from 22 distinct robot platforms -- including the Google Robot, WidowX Bridge, Franka, Kuka, xArm, Tiago, and others -- spanning 527+ skills across 160,000+ task configurations.
The core motivation behind OXE was to enable generalist robot policies that transfer across embodiments. The project demonstrated this with RT-1-X and RT-2-X, showing that policies trained on the combined OXE dataset outperformed those trained on any single constituent dataset on most platforms. Each contributing dataset was converted to the RLDS (Reinforcement Learning Datasets) format hosted on TensorFlow Datasets, establishing RLDS as a de facto standard for robot learning data interchange.
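To make the format concrete, here is a minimal sketch of an RLDS-style episode built from plain Python dicts and NumPy arrays. The top-level field names (`steps`, `observation`, `action`, `is_first`/`is_last`/`is_terminal`) follow the RLDS convention; the specific observation keys, the language field, and the metadata entry are illustrative, since these vary per OXE subset.

```python
import numpy as np

def make_rlds_episode(num_steps: int = 3) -> dict:
    """Build a toy episode in the RLDS layout: a dict of episode metadata
    plus a sequence of steps, each holding observation, action, reward,
    and boundary flags (is_first / is_last / is_terminal)."""
    steps = []
    for t in range(num_steps):
        steps.append({
            "observation": {
                # Observation keys differ per OXE subset; an RGB image and
                # proprioceptive state are the most common fields.
                "image": np.zeros((64, 64, 3), dtype=np.uint8),
                "state": np.zeros(7, dtype=np.float32),
            },
            "action": np.zeros(7, dtype=np.float32),  # e.g. 6-DoF delta + gripper
            "reward": 0.0,
            "is_first": t == 0,
            "is_last": t == num_steps - 1,
            "is_terminal": False,
            "language_instruction": "pick up the red block",
        })
    return {"steps": steps, "episode_metadata": {"file_path": "episode_000.tfrecord"}}

episode = make_rlds_episode()
print(len(episode["steps"]))  # → 3
```

Because every constituent dataset exposes this same step structure, a single training loop can iterate over any OXE subset without per-dataset parsing code.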
OXE's constituent datasets vary enormously in character. Bridge Data V2 contributes ~60,000 demonstrations of kitchen manipulation with a WidowX arm. RT-1 contributes ~130,000 demonstrations from Google's mobile manipulators. The Kuka subset contributes large-scale industrial-arm grasping data. Some datasets include language annotations; others do not. Some record at 5 Hz; others at 30 Hz. Action spaces range from 2-DoF end-effector deltas to 7-DoF joint velocities. This heterogeneity is both OXE's strength (diversity) and its challenge (inconsistency).
The dataset is released under mixed licensing that varies per constituent dataset. Some subsets like Bridge V2 use permissive licenses, while others carry non-commercial restrictions. OXE has become the standard pretraining corpus for generalist robot policies including Octo, OpenVLA, and CrossFormer, and its RLDS format has been widely adopted by the robot learning community.
Open X-Embodiment vs. Claru: Side-by-Side Comparison
A detailed comparison across the dimensions that matter when moving from research pretraining to production deployment.
| Dimension | Open X-Embodiment | Claru |
|---|---|---|
| Data Source | Aggregated from 21 institutions (mixed real + sim) | Collected on your specific robot in your environment |
| Scale | 1M+ demos across 22 embodiments | 1K to 1M+ demos, scoped to your deployment |
| Quality Consistency | Varies widely across constituent datasets | Uniform production QC with >90% annotator agreement |
| Action Space | Heterogeneous (2-DoF to 7-DoF, deltas vs. absolutes) | Standardized to match your robot's control interface |
| Language Annotations | Partial -- some subsets have none | 100% coverage with validated natural language |
| Sensor Modalities | Primarily RGB; depth/force/tactile sparse | RGB + depth + force/torque + proprioception + tactile |
| License | Mixed per subset -- some non-commercial | Single commercial license with IP assignment |
| Robot Specificity | 22 platforms (none may match yours) | Data on your exact platform and end-effector |
| Environment Match | Research labs across 21 institutions | Your actual deployment environment |
| Expansion | Static aggregation with periodic additions | Continuous collection on your timeline |
Key Limitations of OXE for Production Use
OXE's most pressing production challenge is licensing. Because the dataset aggregates contributions from 21 institutions, each subset carries its own license. Several subsets restrict commercial use, and the licensing status of some contributions is ambiguous. Production teams must audit every subset they train on to confirm commercial rights -- a legal burden that grows with each new constituent added to the collection. There is no single 'OXE commercial license' that covers the full dataset.
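A typical first pass at that audit is a simple allow-list over subset licenses. The sketch below is illustrative only: the subset names mirror OXE naming, but the license strings and the two `lab_dataset_*` entries are hypothetical placeholders, not a statement of any subset's actual terms -- always check each subset's published license.

```python
# Per-subset license audit sketch. License strings are ILLUSTRATIVE
# placeholders, not the subsets' actual terms.
SUBSET_LICENSES = {
    "bridge": "CC-BY-4.0",             # permissive example
    "fractal20220817_data": "Apache-2.0",
    "lab_dataset_a": "CC-BY-NC-4.0",   # non-commercial: hypothetical entry
    "lab_dataset_b": "unknown",        # ambiguous licensing: hypothetical entry
}

NON_COMMERCIAL_MARKERS = ("NC", "unknown")

def commercially_usable(subsets: dict) -> list:
    """Return subsets whose license string carries no non-commercial marker."""
    return sorted(
        name for name, lic in subsets.items()
        if not any(marker in lic for marker in NON_COMMERCIAL_MARKERS)
    )

print(commercially_usable(SUBSET_LICENSES))  # → ['bridge', 'fractal20220817_data']
```

Note that an allow-list like this only filters on declared terms; ambiguous entries still require a human legal review before training for commercial deployment.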
Quality consistency is a systemic issue. OXE's constituent datasets were collected by different labs, with different teleoperation systems, different data quality standards, and different annotation practices. Some datasets contain carefully curated demonstrations with language annotations; others include noisy trajectories or failed attempts with no language labels. The RT-1 subset is known for high quality, while some smaller contributions have minimal curation. Training on the full mix without filtering often introduces noise that degrades performance on specific deployment domains.
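A minimal filtering pass along those lines might drop episodes that lack a language annotation or are flagged as failures. The `success` and `language_instruction` fields below are assumed metadata for the sketch; real OXE subsets expose this information in dataset-specific ways, and some do not record it at all.

```python
def filter_episodes(episodes):
    """Keep only episodes that are marked successful and carry a
    non-empty language annotation -- two of the most common quality
    filters applied before fine-tuning on heterogeneous subsets."""
    kept = []
    for ep in episodes:
        if not ep.get("success", False):          # drop failed attempts
            continue
        if not ep.get("language_instruction", "").strip():  # drop unannotated
            continue
        kept.append(ep)
    return kept

episodes = [
    {"success": True,  "language_instruction": "stack the cups"},
    {"success": False, "language_instruction": "open the drawer"},  # failed attempt
    {"success": True,  "language_instruction": ""},                 # no annotation
]
print(len(filter_episodes(episodes)))  # → 1
```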
Action space heterogeneity creates significant engineering challenges. OXE datasets use different action representations -- some record end-effector position deltas, others record joint velocities, and some use absolute poses. Action dimensions vary from 2-DoF (planar pushing) to 7-DoF (full arm) to 8-DoF (arm + gripper). Normalizing these into a common space for cross-embodiment training is a non-trivial preprocessing step that every user must implement, and the normalization choices themselves affect policy performance.
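One common normalization recipe is to zero-pad every action to a shared dimensionality and then standardize per dimension with each dataset's own statistics. The sketch below assumes that recipe; quantile-based scaling, which some pipelines prefer for outlier robustness, would change the numbers but not the structure.

```python
import numpy as np

def normalize_actions(actions: np.ndarray, target_dim: int = 7) -> np.ndarray:
    """Zero-pad (N, D) actions to a common target_dim, then standardize
    each dimension with this dataset's own mean and std."""
    n, d = actions.shape
    padded = np.zeros((n, target_dim), dtype=np.float32)
    padded[:, :d] = actions
    mean = padded.mean(axis=0)
    std = padded.std(axis=0)
    std[std < 1e-6] = 1.0          # leave padded / constant dims untouched
    return (padded - mean) / std

# 2-DoF planar-pushing deltas lifted into a 7-DoF space
planar = np.array([[0.1, 0.0], [0.2, 0.1], [0.0, -0.1]], dtype=np.float32)
out = normalize_actions(planar)
print(out.shape)  # → (3, 7)
```

Even this simple version forces a design decision -- per-dataset versus global statistics -- and, as noted above, that choice measurably affects downstream policy performance.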
Temporal resolution varies from 3 Hz to 30 Hz across subsets, meaning that a 'step' in one dataset represents a fundamentally different time quantum than in another. This temporal inconsistency complicates training of action-chunking architectures that assume a fixed control frequency.
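A simple way to put subsets on a common control frequency is to interpolate each trajectory onto a fixed-rate time grid, as sketched below. Linear interpolation is one choice among several, and it glosses over rotation components, which need special handling (e.g. slerp on quaternions).

```python
import numpy as np

def resample_to_frequency(traj: np.ndarray, src_hz: float, target_hz: float) -> np.ndarray:
    """Resample a (T, D) trajectory recorded at src_hz onto a target_hz
    grid by linearly interpolating each dimension over time."""
    T = traj.shape[0]
    src_t = np.arange(T) / src_hz                # original timestamps
    duration = src_t[-1]
    n_out = int(np.floor(duration * target_hz)) + 1
    tgt_t = np.arange(n_out) / target_hz         # target timestamps
    return np.stack(
        [np.interp(tgt_t, src_t, traj[:, d]) for d in range(traj.shape[1])],
        axis=1,
    )

# 30 Hz trajectory, 31 steps = 1.0 s, resampled onto a 5 Hz grid
traj_30hz = np.linspace(0.0, 1.0, 31).reshape(-1, 1)
traj_5hz = resample_to_frequency(traj_30hz, src_hz=30.0, target_hz=5.0)
print(traj_5hz.shape)  # → (6, 1)
```

Whether to resample observations alongside actions, and which frequency to standardize on, are further choices each OXE user must make for themselves.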
Finally, OXE's environments are overwhelmingly research labs. The visual backgrounds, lighting conditions, and scene compositions reflect academic settings, not the production environments (factories, warehouses, commercial kitchens, retail floors) where robots actually deploy. Policies pretrained on OXE still require substantial environment-specific data to generalize to real deployment conditions.
When to Use OXE vs. Commercial Data
OXE is the clear choice for pretraining generalist robot policies. Its cross-embodiment diversity teaches policies to extract embodiment-agnostic representations -- spatial reasoning, object affordances, and manipulation primitives that transfer across robot morphologies. This is precisely why Octo, OpenVLA, and similar foundation models use OXE as their pretraining corpus. If you are building or fine-tuning a generalist policy, starting from an OXE-pretrained checkpoint gives you a significant head start over training from scratch.
OXE is also valuable for cross-embodiment research. If your research question involves transfer learning across robot platforms, few-shot adaptation to new embodiments, or the relationship between pretraining diversity and downstream performance, OXE provides the multi-embodiment data you need for rigorous experiments.
Switch to commercial data when you need production reliability on a specific deployment. OXE's broad diversity is a liability when your goal is high success rates on a narrow task set. A policy trained on 1M demonstrations from 22 robots does not necessarily outperform one fine-tuned on 10K high-quality demonstrations from your specific robot doing your specific tasks. Claru provides the targeted, high-quality, domain-specific data that transforms a generalist pretrained model into a production-ready policy.
The predominant industry pattern is OXE for pretraining, then commercial fine-tuning data for deployment. This combination captures OXE's breadth for general capability while using Claru's depth for deployment-specific performance.
How Claru Complements OXE
Claru provides the fine-tuning data layer that makes OXE-pretrained policies production-ready. Where OXE gives you breadth across 22 embodiments, Claru gives you depth on your single deployment target. Our demonstrations are collected by trained teleoperators on your physical robot, in your facility, manipulating your actual objects -- eliminating the domain gap between research-lab data and production conditions.
Unlike OXE's heterogeneous quality, every Claru demonstration passes a multi-stage quality pipeline. Automated checks flag kinematic anomalies, excessive force events, and out-of-workspace excursions. Human reviewers verify task completion and validate language annotations against the demonstrated behavior. The result is a uniformly high-quality dataset where every trajectory represents a successful, well-executed demonstration of the target task.
Claru data comes with a single, clear commercial license that covers all delivered demonstrations. There is no need to audit subsets or navigate mixed licensing terms. You receive full IP rights to the data, suitable for training models deployed in commercial products.
We deliver in RLDS format (matching OXE's standard) as well as HDF5, zarr, and LeRobot. The action space, observation space, and control frequency are standardized to your robot's specifications from the start -- no normalization pipeline required. Claru data can be mixed directly with OXE subsets for co-training or used independently for fine-tuning on a pretrained checkpoint.
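Co-training with a mixture weight typically reduces to weighted sampling across sources. The sketch below shows the shape of that loop with toy episode lists and an arbitrary weight; real pipelines implement this at the dataset-iterator level, and the right weight is an empirical tuning question.

```python
import random

def mixed_batch(oxe_episodes, claru_episodes, claru_weight=0.5, n=4, seed=0):
    """Sample n episodes, drawing from the target-domain source with
    probability claru_weight and from the broad pretraining mix otherwise.
    The weight here is illustrative, not a recommendation."""
    rng = random.Random(seed)
    batch = []
    for _ in range(n):
        source = claru_episodes if rng.random() < claru_weight else oxe_episodes
        batch.append(rng.choice(source))
    return batch

oxe = ["oxe_ep_%d" % i for i in range(100)]
claru = ["claru_ep_%d" % i for i in range(10)]
batch = mixed_batch(oxe, claru, claru_weight=0.3, n=4)
print(len(batch))  # → 4
```

Upweighting the small target-domain subset relative to its raw size is the usual reason to sample by source first rather than pooling all episodes uniformly.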
References
- [1] Open X-Embodiment Collaboration. “Open X-Embodiment: Robotic Learning Datasets and RT-X Models.” ICRA 2024.
- [2] Octo Model Team. “Octo: An Open-Source Generalist Robot Policy.” RSS 2024.
- [3] Kim et al. “OpenVLA: An Open-Source Vision-Language-Action Model.” CoRL 2024.
- [4] Brohan et al. “RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control.” CoRL 2023.
- [5] Walke et al. “BridgeData V2: A Dataset for Robot Learning at Scale.” CoRL 2023.
Frequently Asked Questions
Is OXE good enough on its own for training production policies?
Yes, OXE is currently the best available pretraining corpus for generalist robot policies. Models like Octo and OpenVLA demonstrate that OXE pretraining provides strong manipulation priors. However, fine-tuning on domain-specific data is essential for production deployment. Claru provides the targeted fine-tuning data that converts a generalist OXE-pretrained model into a reliable production policy.
Can we use OXE data commercially?
OXE aggregates datasets with different licenses from 21 institutions. Some subsets (Bridge V2, Fractal) have permissive licenses suitable for commercial use, while others carry non-commercial restrictions. There is no single commercial license covering the full dataset. Production teams must audit every subset they use. Claru provides all data under a single commercial license with clear IP assignment.
How consistent is data quality across OXE's subsets?
Quality varies significantly. RT-1 data from Google is meticulously curated with consistent annotation. Bridge V2 is large-scale but collected by diverse operators. Some smaller contributions have minimal curation or incomplete annotations. When fine-tuning on OXE subsets, careful data selection and filtering are critical -- training on everything indiscriminately often hurts performance on specific domains.
Is Claru data compatible with OXE-based training pipelines?
Yes. Claru delivers data in RLDS format, matching OXE's standard. Our data can be added directly to your OXE training mix as an additional high-quality subset representing your target domain, or used independently for fine-tuning an OXE-pretrained checkpoint. We standardize action spaces and observation formats to be compatible with common OXE training pipelines.
How much fine-tuning data do we need on top of OXE pretraining?
Research suggests that a few hundred to a few thousand high-quality demonstrations on your target robot and task can dramatically boost performance over OXE pretraining alone. The Octo paper showed significant gains with as few as 50-200 domain-specific demonstrations. Claru typically recommends starting with 500-2,000 demonstrations per task and scaling based on measured performance.
Go From Generalist to Production-Ready
Complement your OXE pretraining with targeted, commercially licensed data collected on your robot and in your environment. Get a custom data collection plan from our team.