LIBERO Alternative: Real-World Training Data for Production Robotics

LIBERO pioneered benchmarking for lifelong robot learning with 130 simulated manipulation tasks. But simulation-only data hits a wall when you need to deploy in the real world. Compare LIBERO with Claru's production-grade, real-world data collection service.

LIBERO Profile

Institution

UT Austin

Year

2023

Scale

130 tasks across 5 task suites, ~6,500 expert demonstrations in simulation

License

MIT License

Modalities
128x128 RGB images (agentview + wrist camera)
7-DoF joint positions and velocities
Gripper state
End-effector pose
Template language instructions

How Claru Helps Teams Beyond LIBERO

LIBERO has established itself as the standard benchmark for lifelong robot learning, and its structured task suites provide a rigorous framework for comparing continual learning algorithms. However, the transition from benchmarking to deployment reveals LIBERO's core limitation: it is entirely simulated. Every demonstration lives inside robosuite's physics engine, with idealized rendering, perfect sensor models, and deterministic dynamics. Policies that achieve high success rates on LIBERO's evaluation suite routinely struggle on physical hardware, where visual textures differ, contact dynamics are stochastic, and sensor noise is ever-present.

Claru directly addresses this gap by collecting teleoperated demonstrations on your actual robot hardware, in the environments where your system will operate. Our data captures the visual complexity, physical dynamics, and sensor characteristics that simulation cannot replicate. Teams that have validated their algorithm on LIBERO can use Claru's real-world data for the critical fine-tuning phase that takes a research prototype to a deployable product.

We deliver data in RLDS, HDF5, or LeRobot format with standardized observation and action schemas, making it straightforward to integrate Claru demonstrations into the same training pipeline you used for LIBERO. The result is a policy that inherits broad manipulation priors from simulation while grounding its behavior in the real-world data it needs to succeed in production.
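As a concrete illustration of how a shared observation/action schema lets simulation and real-world demonstrations flow through one pipeline, here is a minimal Python sketch that merges episode lists from two sources after checking they expose the same observation keys. The episode structure and key names are hypothetical, chosen for exposition.

```python
# Minimal sketch: merge demonstration episodes from multiple sources that are
# expected to share one observation schema. Episode layout here is hypothetical.
def merge_sources(*sources):
    """Concatenate episode lists after checking that all sources agree on obs keys."""
    merged = []
    reference_keys = None
    for episodes in sources:
        for ep in episodes:
            keys = set(ep["obs"][0])  # keys of the first timestep's observation
            if reference_keys is None:
                reference_keys = keys
            elif keys != reference_keys:
                raise ValueError(f"schema mismatch: {keys ^ reference_keys}")
            merged.append(ep)
    return merged

# Toy episodes standing in for simulation and real-world demonstrations.
sim_eps = [{"obs": [{"rgb": 0, "joint_pos": 0}], "actions": [0]}]
real_eps = [{"obs": [{"rgb": 1, "joint_pos": 1}], "actions": [1]}]
dataset = merge_sources(sim_eps, real_eps)
```

Because the schema check happens at merge time, a mismatched delivery fails loudly before training starts rather than silently degrading the policy.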

What Is LIBERO?

LIBERO (LIfelong learning BEnchmark on RObot manipulation tasks) is a benchmark suite published by Bo Liu, Yifeng Zhu, Chongkai Gao, and colleagues at UT Austin in 2023. It was designed specifically to evaluate lifelong and continual learning algorithms for robotic manipulation. The benchmark organizes 130 procedurally generated tasks into five task suites -- LIBERO-Spatial, LIBERO-Object, LIBERO-Goal, LIBERO-Long, and LIBERO-100 -- each isolating a different axis of task variation so researchers can measure how well policies transfer knowledge across sequentially presented tasks.

All demonstrations in LIBERO are collected in the BDDL-based simulation environment built on the robosuite framework, using a Franka Emika Panda arm with a parallel-jaw gripper. Each task suite contains carefully controlled variations: LIBERO-Spatial changes object layouts while holding objects and goals fixed, LIBERO-Object swaps in novel objects while preserving spatial relationships, and LIBERO-Goal introduces new manipulation objectives. LIBERO-Long chains multiple subtasks into extended sequences, while LIBERO-100 provides a larger-scale evaluation with 100 diverse tasks. Each task comes with 50 expert demonstrations, yielding approximately 6,500 total demonstrations across the full benchmark.

The dataset records 128x128 pixel agentview and eye-in-hand RGB images, 7-DoF joint positions and velocities, gripper state, and end-effector pose at each timestep. Language instructions accompany every task (e.g., 'pick up the red mug and place it on the shelf'). LIBERO is released under the MIT License, making it one of the most permissively licensed robotics benchmarks available. It has become a standard evaluation platform for continual learning methods, with adoption by research groups studying policy architectures like Diffusion Policy, ACT, and various transformer-based approaches.
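The per-timestep record described above can be sketched as a small schema. The key names and the action encoding below are illustrative assumptions for exposition, not LIBERO's actual on-disk layout.

```python
# Illustrative observation schema for one timestep, as {key: (shape, description)}.
# Names are assumptions, not LIBERO's actual HDF5 keys.
OBS_SCHEMA = {
    "agentview_rgb":   ((128, 128, 3), "third-person RGB camera"),
    "eye_in_hand_rgb": ((128, 128, 3), "wrist-mounted RGB camera"),
    "joint_pos":       ((7,),          "7-DoF joint positions"),
    "joint_vel":       ((7,),          "7-DoF joint velocities"),
    "gripper_state":   ((2,),          "parallel-jaw finger state"),
    "ee_pose":         ((7,),          "end-effector position + quaternion"),
}
ACTION_SHAPE = (7,)  # e.g. a delta end-effector pose plus gripper command (assumed)

def flat_dim(schema):
    """Total flattened observation dimension for a single timestep."""
    total = 0
    for shape, _ in schema.values():
        n = 1
        for d in shape:
            n *= d
        total += n
    return total
```

A schema like this makes it easy to sanity-check that a policy's input layer matches the data: two 128x128x3 images dominate the flattened dimension, with the proprioceptive channels contributing only a few dozen values.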

Since its release, LIBERO has been cited extensively in lifelong learning and multitask robotics research. Its structured task decomposition makes it especially useful for ablation studies. However, its value is fundamentally as a benchmark for algorithmic comparison rather than as a source of training data for real-world deployment.

LIBERO at a Glance

Manipulation Tasks: 130
Task Suites: 5
Expert Demonstrations: ~6,500
Demos per Task: 50
Image Resolution: 128x128
Robots: 1 (simulated Franka Panda)

LIBERO vs. Claru: Side-by-Side Comparison

A detailed comparison across the dimensions that matter most when moving from research benchmarking to production deployment.

| Dimension | LIBERO | Claru |
| --- | --- | --- |
| Data Source | Simulation only (robosuite/BDDL) | Real-world teleoperated demonstrations |
| Scale | ~6,500 demos across 130 tasks | 1K to 1M+ demos, scaled to your needs |
| Robot Platform | Simulated Franka Panda only | Any physical robot you deploy |
| Image Resolution | 128x128 RGB (agentview + wrist) | Up to 4K RGB, configurable multi-view |
| Sensor Modalities | RGB images, joint state, gripper state | RGB + depth + force/torque + proprioception + tactile |
| Task Customization | Fixed 130 tasks, procedurally generated | Custom tasks designed for your deployment scenario |
| Environment Diversity | Single simulated tabletop scene | Real kitchens, warehouses, labs, factories |
| Language Annotations | Template-based task instructions | Free-form natural language with multi-annotator agreement |
| License | MIT License | Commercial license with IP assignment |
| Ongoing Collection | Static benchmark (version-based updates) | Continuous collection and iterative expansion |

Key Limitations of LIBERO for Production Use

The most fundamental limitation of LIBERO for production robotics is the sim-to-real gap. Every demonstration in the dataset is generated in simulation with perfect physics, deterministic lighting, and idealized object geometries. Policies trained exclusively on LIBERO data routinely fail when transferred to physical robots due to visual domain shift, unmodeled contact dynamics, and sensor noise that the simulation does not capture. While sim-to-real transfer techniques like domain randomization can narrow this gap, they cannot close it entirely for contact-rich manipulation.

LIBERO's image resolution of 128x128 pixels is well below what production vision systems require. Modern visuomotor policies increasingly rely on high-resolution inputs (640x480 or above) to perceive fine-grained object features like texture, deformation, and small-part geometry. The low resolution was a practical choice for benchmark speed but limits the visual complexity of tasks that can be meaningfully evaluated.

The dataset is restricted to a single robot morphology -- the Franka Emika Panda with a parallel-jaw gripper. Teams deploying different arm configurations, dexterous hands, mobile manipulators, or dual-arm systems cannot directly benefit from LIBERO data. The simulated Franka also lacks the kinematic imprecision and joint compliance that characterize real hardware.

Task diversity, while impressive for a benchmark, is concentrated on tabletop manipulation in a single simulated kitchen-like scene. Real production environments span warehouses, retail shelves, hospital rooms, and manufacturing lines -- none of which are represented. The procedurally generated task variations control for specific factors (spatial, object, goal) but do not capture the unconstrained variability of real-world deployment.

Finally, LIBERO contains no force/torque, tactile, or depth data. For contact-rich tasks like insertion, packing, or tool use, proprioceptive and haptic feedback is critical for reliable execution. The absence of these modalities makes LIBERO unsuitable as training data for policies that must handle compliant or force-sensitive manipulation.

When to Use LIBERO vs. Commercial Data

LIBERO is the right choice when your goal is algorithmic benchmarking. If you are publishing a paper on continual learning, multitask policy architectures, or knowledge transfer across sequential task distributions, LIBERO's controlled task suites provide exactly the structure needed for clean experimental comparisons. The five-suite decomposition lets you isolate whether your method improves on spatial generalization, object generalization, or long-horizon reasoning -- distinctions that would be confounded in unstructured real-world data.

LIBERO is also useful for rapid prototyping. Because it runs entirely in simulation, you can iterate on policy architectures and training recipes without hardware overhead. Teams that are still in the algorithm development phase, before they have committed to a specific robot platform or deployment domain, can use LIBERO to validate ideas cheaply.

Switch to Claru when you have a specific deployment target. If you know which robot you are shipping, which environment it will operate in, and which tasks it must execute, you need data that matches those exact conditions. Simulation data cannot capture the visual texture of your specific warehouse shelving, the mass distribution of your specific product SKUs, or the backlash in your specific actuators. Claru collects teleoperated demonstrations on your hardware, in your space, with the sensor suite you will use in production.

The strongest approach for many teams is a combined pipeline: pretrain on simulation data (LIBERO or similar) for broad manipulation priors, then fine-tune on Claru's real-world demonstrations for deployment-specific performance. This combination leverages LIBERO's task diversity for general capability while relying on real data to close the sim-to-real gap.
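One way to realize this combined pipeline is to mix the two sources at a fixed ratio when sampling fine-tuning batches, so the policy keeps its simulation priors while real-world examples dominate. A minimal sketch, assuming list-of-episode datasets; the 0.75 real-data fraction is an illustrative hyperparameter, not a recommendation.

```python
import random

# Sketch of mixed-source batch sampling for the fine-tuning phase: each draw
# picks the real-world pool with probability `real_fraction`, else the sim pool.
def sample_batch(sim_episodes, real_episodes, batch_size, real_fraction=0.75, rng=random):
    batch = []
    for _ in range(batch_size):
        pool = real_episodes if rng.random() < real_fraction else sim_episodes
        batch.append(rng.choice(pool))
    return batch

# Toy datasets standing in for LIBERO-style sim demos and real-world demos.
sim = [("sim", i) for i in range(100)]
real = [("real", i) for i in range(100)]
batch = sample_batch(sim, real, batch_size=32, rng=random.Random(0))
```

In practice the mixing ratio is itself worth tuning: too little simulation data and the policy forgets its broad priors, too little real data and the sim-to-real gap persists.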

How Claru Complements LIBERO

Claru's data collection service directly addresses LIBERO's core gaps. Where LIBERO provides simulated demonstrations on a single robot in a single scene, Claru deploys trained teleoperators to collect demonstrations on your physical robot in your actual deployment environment. This eliminates the sim-to-real transfer problem entirely -- the training data is real data from the start.

For teams that have built initial policies using LIBERO, Claru provides the fine-tuning data layer that bridges the gap to production. Our demonstrations are collected at the resolution, frame rate, and sensor configuration your policy expects. We support multi-view setups with calibrated extrinsics, synchronized force/torque and tactile streams, and high-frequency proprioceptive logging -- modalities that simply do not exist in LIBERO.

Claru also scales beyond LIBERO's fixed 50-demo-per-task ceiling. Production policies often require hundreds or thousands of demonstrations per task variant to achieve the reliability needed for deployment. We can collect 10,000+ demonstrations for a single task across environmental variations (different lighting, different object instances, different operator styles) that stress-test your policy's robustness in ways simulation cannot.

Data is delivered in your preferred format -- RLDS, HDF5, zarr, or LeRobot -- with standardized schemas that plug directly into training pipelines. Every demonstration passes our multi-stage quality control process: automated trajectory validation, visual inspection for failure modes, and inter-annotator agreement checks on language annotations. The result is data that is not only real but production-grade.
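Automated trajectory validation of the kind mentioned above can be sketched as a simple per-timestep check that rejects episodes containing NaNs, out-of-limit joint values, or velocity spikes. The joint limits and step threshold below are illustrative placeholders, not Claru's actual thresholds.

```python
import math

# Hedged sketch of automated trajectory validation: scan a demonstration's joint
# trajectory and reject it on NaNs, limit violations, or large per-step jumps.
def validate_trajectory(joint_positions, lower=-2.9, upper=2.9, max_step=0.3):
    """joint_positions: list of per-timestep joint vectors (lists of floats)."""
    prev = None
    for t, q in enumerate(joint_positions):
        for j, qj in enumerate(q):
            if math.isnan(qj):
                return False, f"NaN at t={t}, joint {j}"
            if not (lower <= qj <= upper):
                return False, f"limit violation at t={t}, joint {j}"
            if prev is not None and abs(qj - prev[j]) > max_step:
                return False, f"velocity spike at t={t}, joint {j}"
        prev = q
    return True, "ok"

ok, reason = validate_trajectory([[0.0, 0.1], [0.05, 0.15], [0.1, 0.2]])
```

Checks like these catch teleoperation glitches (dropped packets, operator slips) before a bad demonstration ever reaches the training set.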

References

  1. Liu et al. "LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning." NeurIPS 2023.
  2. Chi et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." RSS 2023.
  3. Zhao et al. "Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware." RSS 2023.
  4. Zhu et al. "robosuite: A Modular Simulation Framework and Benchmark for Robot Learning." arXiv, 2020.

Frequently Asked Questions

Is LIBERO suitable for training production robot policies?

LIBERO is designed as a research benchmark for evaluating continual and lifelong learning algorithms, not as a training data source for production systems. Its ~6,500 simulated demonstrations at 128x128 resolution lack the visual fidelity, sensor diversity, and environmental realism needed for reliable real-world deployment. It is best used for algorithmic comparison and prototyping, with real-world data added for production fine-tuning.

Can LIBERO be used commercially?

Yes. LIBERO is released under the MIT License, which permits commercial use without restriction. However, the practical limitation is that simulation-only data typically requires substantial additional real-world data to achieve production-level performance, which is where a service like Claru comes in.

How large is the sim-to-real gap for policies trained on LIBERO?

Policies trained exclusively on LIBERO's simulated data experience significant performance degradation when deployed on physical robots. The gap manifests in visual perception (synthetic vs. real textures and lighting), dynamics (idealized vs. noisy contact physics), and sensor characteristics (clean simulation vs. noisy real sensors). Domain randomization helps but does not fully close this gap for contact-rich tasks.

How should teams combine LIBERO with real-world data?

The most effective approach is a pretrain-then-fine-tune pipeline. Use LIBERO (or similar simulation benchmarks) during pretraining to learn broad manipulation primitives and spatial reasoning. Then fine-tune on real-world demonstrations from Claru that match your exact robot, environment, and task requirements. This typically outperforms either data source alone.

Does Claru replicate LIBERO's task suite on real hardware?

Claru does not replicate LIBERO's specific task suite, because those tasks are defined within a simulation environment. Instead, Claru collects demonstrations for the real-world tasks you actually need -- pick-and-place, packing, assembly, tool use, or any custom manipulation task on your specific robot platform and in your actual deployment environment.

Close the Sim-to-Real Gap

Get real-world demonstrations on your robot, in your environment, at the scale your policy needs. Talk to our team about complementing your LIBERO research with production-grade training data.