Real-World Data for LIBERO
LIBERO evaluates lifelong robot learning — whether policies acquire new skills without forgetting old ones. Real-world data tests this in authentic, visually changing conditions.
LIBERO at a Glance
LIBERO Task Suites
Each suite isolates a different generalization axis, revealing which aspects of manipulation knowledge transfer and which cause interference.
| Suite | Tasks | Generalization Axis | What It Measures |
|---|---|---|---|
| LIBERO-Spatial | 10 | Spatial arrangement | Can the policy adapt when objects are in new locations? |
| LIBERO-Object | 10 | Object categories | Can the policy generalize grasping to new object types? |
| LIBERO-Goal | 10 | Task specification | Can the policy understand new goals with familiar objects? |
| LIBERO-Long | 10 | Temporal complexity | Can the policy handle multi-step task sequences? |
| LIBERO-90 | 90 | Comprehensive | Full-scale continual learning across diverse tasks |
LIBERO vs. Related Benchmarks
| Feature | LIBERO | CALVIN | Meta-World | RLBench |
|---|---|---|---|---|
| Primary evaluation | Continual learning (forgetting + transfer) | Sequential task chaining | Multi-task / meta-learning | Multi-task success rate |
| Task count | 130 | 34 | 50 | 100 |
| Language conditioning | Templated language goals | Free-form natural language | Task ID only | Task name only |
| Sequential protocol | Sequential task suites (continual) | Sequential task chains (single episode) | Multi-task simultaneous | Multi-task simultaneous |
| Physics engine | MuJoCo (robosuite) | PyBullet | MuJoCo | CoppeliaSim |
Benchmark Profile
LIBERO (LIfelong BEnchmark for RObot learning) evaluates lifelong and continual learning for robot manipulation. Created by Liu et al. at UT Austin and NVIDIA and published at NeurIPS 2023, it tests whether robot policies can learn new tasks without forgetting previously learned ones, using 130 language-annotated manipulation tasks organized into 5 evaluation suites built on the robosuite/MuJoCo framework.
The Sim-to-Real Gap
LIBERO uses robosuite environments with MuJoCo physics, sharing that framework's simplified contact modeling. The continual learning evaluation assumes sequential task presentation in fixed order, which does not match real deployment where tasks arrive unpredictably and concurrently. All LIBERO tasks share the same visual renderer, which means policies may learn simulation-specific visual shortcuts for retaining knowledge across tasks rather than robust physical understanding.
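The sequential protocol described above can be sketched as a loop that trains on each suite in a fixed order and then re-evaluates every suite seen so far, producing the success matrix from which continual-learning metrics are computed. This is a minimal illustration, not LIBERO's actual code; `train_on` and `evaluate` are hypothetical stand-ins for a policy-training routine and a rollout-based success measurement.

```python
from typing import Callable, List


def sequential_eval(policy,
                    suites: List[str],
                    train_on: Callable,
                    evaluate: Callable) -> List[List[float]]:
    """Train on each suite in fixed order; after every training phase,
    re-evaluate all suites seen so far.

    Returns a lower-triangular success matrix R where R[i][j] is the
    success rate on suite j after training phase i.
    """
    R: List[List[float]] = []
    for i, suite in enumerate(suites):
        train_on(policy, suite)  # acquire the new tasks
        # Re-test every suite learned so far to expose forgetting.
        R.append([evaluate(policy, s) for s in suites[: i + 1]])
    return R
```

Reading down any column of `R` shows how performance on one suite evolves as later suites are learned; a decline down a column is forgetting.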
Real-World Data Needed
- Sequential real-world task demonstrations showing progressive skill acquisition across changing environments.
- Data from evolving environments where new objects, tools, and tasks appear over time while visual and physical conditions change.
- Visual diversity across task suites to prevent policies from learning renderer-specific features.
- Data that captures both the learning of new skills and the maintenance of old ones in authentic conditions.
Complementary Claru Datasets
Egocentric Activity Dataset
Real human activity across 100+ diverse environments naturally demonstrates continual learning — skills acquired in one kitchen transfer and adapt to others, with authentic visual and physical variation.
Manipulation Trajectory Dataset
Diverse manipulation data across many task types and environments provides the variety and volume needed for evaluating real-world lifelong learning robustness.
Custom Sequential Task Collection
Purpose-collected data with tasks introduced progressively in real environments mirrors LIBERO's continual learning evaluation with authentic visual and physical variation between task phases.
Bridging the Gap: Technical Analysis
LIBERO addresses a fundamental challenge for deployed robots: they must learn new tasks over their lifetime without forgetting how to do old ones. This continual learning problem is well-studied in computer vision but underexplored in robotics, where each new task involves both visual perception changes and motor skill adaptation.
The five task suites cleverly isolate different generalization axes. LIBERO-Spatial tests whether a policy can adapt when objects move to new locations. LIBERO-Object tests generalization when new object categories appear. LIBERO-Goal tests understanding of novel task specifications with familiar objects. LIBERO-Long tests multi-step sequencing. LIBERO-90 combines 90 diverse tasks for full-scale continual learning evaluation. This structured evaluation reveals which aspects of manipulation knowledge transfer naturally and which must be explicitly preserved through continual learning mechanisms.
However, LIBERO's simulation environment means all tasks share the same visual renderer, physics engine, and workspace geometry. A policy might develop continual learning strategies that exploit simulation-specific invariances — texture consistency, lighting uniformity, deterministic physics — rather than building robust physical understanding. Real-world continual learning is harder because visual and physical conditions change unpredictably between task acquisition phases.
The catastrophic forgetting problem is more severe in the real world because visual domain shifts between environments compound the task-level forgetting. A robot that learns to pick up cups in a bright lab and then learns drawer manipulation in a dim workshop may forget cup picking not because of task interference but because the visual features have shifted. LIBERO cannot measure this coupled visual-task forgetting because its visual domain is constant.
Real-world data for lifelong learning must capture the natural progression of skill acquisition across changing environments. Claru's dataset collection across 100+ locations naturally provides this — different environments present different manipulation challenges, visual conditions, and object sets, mirroring the evolving deployment contexts that real robots face throughout their operational lifetime.
Key Papers
- [1] Liu et al. "LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning." NeurIPS 2023.
- [2] Kirkpatrick et al. "Overcoming Catastrophic Forgetting in Neural Networks." PNAS, 2017.
- [3] Mandlekar et al. "What Matters in Learning from Offline Human Demonstrations for Robot Manipulation." CoRL 2021.
- [4] Zhu et al. "robosuite: A Modular Simulation Framework and Benchmark for Robot Learning." arXiv:2009.12293, 2020.
Frequently Asked Questions
What does lifelong learning mean, and how does LIBERO evaluate it?
Lifelong learning means a robot acquires new manipulation skills over time without forgetting previously learned ones. LIBERO evaluates this by sequentially presenting task suites and measuring forward transfer (does old knowledge help with new tasks?), backward transfer (does new learning harm old skills?), and average success across all accumulated tasks.
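Given a success matrix R, where row i holds the success rate on every task after training phase i, these transfer metrics can be computed with the standard GEM-style continual-learning definitions. This is a sketch of those common definitions, not LIBERO's exact published metrics, and it omits the untrained-policy baseline term usually subtracted in forward transfer.

```python
def continual_metrics(R):
    """Compute average accuracy, backward transfer, and forward transfer
    from a T x T success matrix R, where R[i][j] is the success rate on
    task j after training phase i."""
    T = len(R)
    # Average final success across all tasks after the last phase.
    acc = sum(R[-1]) / T
    # Backward transfer: how final performance on earlier tasks compares
    # with performance right after learning them; negative means forgetting.
    bwt = sum(R[-1][j] - R[j][j] for j in range(T - 1)) / (T - 1)
    # Forward transfer: zero-shot success on task j after phase j - 1,
    # i.e. before training on task j (baseline term omitted here).
    fwt = sum(R[j - 1][j] for j in range(1, T)) / (T - 1)
    return acc, bwt, fwt
```

For example, a policy whose final row is [0.5, 0.8, 0.9] after scoring 0.8 and 0.9 on the first two tasks when they were learned shows negative backward transfer, i.e. forgetting.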
Why doesn't strong LIBERO performance guarantee real-world continual learning?
LIBERO's simulation uses consistent visual rendering and physics across all tasks, so policies may learn simulation-specific shortcuts for retaining knowledge. Real-world continual learning involves changing visual conditions, new physical environments, and unpredictable task arrival — coupled domain shifts that LIBERO's constant simulation environment cannot measure.
What is catastrophic forgetting in robot manipulation?
Catastrophic forgetting occurs when training a neural network on new tasks causes it to lose performance on previously learned tasks. In manipulation, this means a robot that learns drawer opening and then learns stacking might suddenly fail at drawer opening. LIBERO's backward transfer metric directly quantifies this forgetting across its 130 tasks.
Why does environment diversity matter for lifelong learning?
Diverse environments force policies to learn robust representations that transfer across conditions rather than memorizing environment-specific features. Training on manipulation in many different real kitchens builds visual features that generalize when the robot encounters a new kitchen — exactly the cross-environment transfer that lifelong learning requires for deployment.
How do the five LIBERO task suites differ?
Each suite isolates one generalization axis. LIBERO-Spatial varies object locations. LIBERO-Object introduces new object categories. LIBERO-Goal changes task specifications with familiar objects. LIBERO-Long tests multi-step sequences. LIBERO-90 provides comprehensive evaluation with 90 diverse tasks. Together, they reveal which generalization dimensions cause the most forgetting and the most transfer.
Get Sequential Task Learning Data
Discuss diverse, progressively structured data for lifelong robot learning research with authentic environmental variation.