Real-World Data for LIBERO

LIBERO evaluates lifelong robot learning — whether policies acquire new skills without forgetting old ones. Real-world data tests this in authentic, visually changing conditions.

LIBERO at a Glance

Tasks: 130
Task Suites: 5
Physics Engine: MuJoCo
Focus: Lifelong learning
Control Frequency: 20 Hz
Published: NeurIPS 2023

LIBERO Task Suites

Each suite isolates a different generalization axis, revealing which aspects of manipulation knowledge transfer and which cause interference.

| Suite | Tasks | Generalization Axis | What It Measures |
| --- | --- | --- | --- |
| LIBERO-Spatial | 10 | Spatial arrangement | Can the policy adapt when objects are in new locations? |
| LIBERO-Object | 10 | Object categories | Can the policy generalize grasping to new object types? |
| LIBERO-Goal | 10 | Task specification | Can the policy understand new goals with familiar objects? |
| LIBERO-Long | 10 | Temporal complexity | Can the policy handle multi-step task sequences? |
| LIBERO-90 | 90 | Comprehensive | Full-scale continual learning across diverse tasks |
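The suite structure above can be captured as a small lookup table. This is an illustrative sketch in Python; the dictionary layout is our own and not part of the LIBERO API:

```python
# Suite registry mirroring the table above: name -> (task count, generalization axis)
LIBERO_SUITES = {
    "LIBERO-Spatial": (10, "spatial arrangement"),
    "LIBERO-Object":  (10, "object categories"),
    "LIBERO-Goal":    (10, "task specification"),
    "LIBERO-Long":    (10, "temporal complexity"),
    "LIBERO-90":      (90, "comprehensive"),
}

# The four focused suites plus LIBERO-90 account for all 130 tasks.
total_tasks = sum(count for count, _axis in LIBERO_SUITES.values())
print(total_tasks)  # 130
```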

LIBERO vs. Related Benchmarks

| Feature | LIBERO | CALVIN | Meta-World | RLBench |
| --- | --- | --- | --- | --- |
| Primary evaluation | Continual learning (forgetting + transfer) | Sequential task chaining | Multi-task / meta-learning | Multi-task success rate |
| Task count | 130 | 34 | 50 | 100 |
| Language conditioning | Templated language goals | Free-form natural language | Task ID only | Task name only |
| Sequential protocol | Sequential task suites (continual) | Sequential task chains (single episode) | Multi-task simultaneous | Multi-task simultaneous |
| Physics engine | MuJoCo (robosuite) | PyBullet | MuJoCo | CoppeliaSim |

Benchmark Profile

LIBERO (LIfelong BEnchmark for RObot learning) evaluates lifelong and continual learning for robot manipulation. Created by Liu et al. at UT Austin and NVIDIA and published at NeurIPS 2023, it tests whether robot policies can learn new tasks without forgetting previously learned ones. The benchmark comprises 130 language-annotated manipulation tasks organized into 5 evaluation suites, built on the robosuite/MuJoCo framework.

Task Set
130 language-annotated manipulation tasks organized into 5 suites that each isolate a different generalization axis: LIBERO-Spatial (10 tasks testing spatial reasoning), LIBERO-Object (10 tasks testing object category generalization), LIBERO-Goal (10 tasks testing goal understanding with the same objects), LIBERO-Long (10 long-horizon multi-step tasks), and LIBERO-90 (90 tasks for comprehensive continual learning evaluation). Each suite is designed so that tasks share some structure but differ on the target axis.
Observation Space
RGB images from agentview camera (128x128) and eye-in-hand wrist camera (128x128), proprioceptive state including 7 joint positions, 7 joint velocities, and gripper aperture, and natural language task descriptions specifying the manipulation goal.
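The observation modalities above can be sketched as a dictionary of arrays. The key names here are illustrative placeholders, not the exact keys used by LIBERO/robosuite:

```python
import numpy as np

# Hypothetical observation matching the modalities described above;
# key names are our own, not the exact LIBERO/robosuite observation keys.
obs = {
    "agentview_rgb":    np.zeros((128, 128, 3), dtype=np.uint8),  # third-person camera
    "eye_in_hand_rgb":  np.zeros((128, 128, 3), dtype=np.uint8),  # wrist camera
    "joint_pos":        np.zeros(7, dtype=np.float32),            # 7 Franka joint positions
    "joint_vel":        np.zeros(7, dtype=np.float32),            # 7 joint velocities
    "gripper_aperture": np.float32(0.0),                          # gripper opening width
    "language":         "put the bowl on the plate",              # task description
}
```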
Action Space
7-dimensional end-effector delta actions (3D position delta, 3D orientation delta, and a gripper open/close command), issued at 20 Hz to a simulated Franka Panda arm via robosuite's operational-space (OSC) controller.
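A minimal sketch of composing such an action vector. The component ordering and the [-1, 1] clip range are assumptions for illustration; actual limits come from the robosuite controller configuration:

```python
import numpy as np

def make_action(dpos, drot, gripper_open):
    """Build a 7-dim delta action: [dx, dy, dz, droll, dpitch, dyaw, gripper].

    Ordering and normalization are illustrative assumptions, not the
    exact robosuite OSC convention.
    """
    gripper = 1.0 if gripper_open else -1.0
    action = np.concatenate([dpos, drot, [gripper]])
    return np.clip(action, -1.0, 1.0)  # assumed normalized action range

# Nudge the end-effector forward/down, rotate slightly about yaw, close gripper.
a = make_action(dpos=[0.05, 0.0, -0.02], drot=[0.0, 0.0, 0.1], gripper_open=False)
print(a.shape)  # (7,)
```

At the stated 20 Hz control frequency, each such action is applied for 50 ms of simulated time.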
Evaluation Protocol
Forward transfer (FWT): performance on new tasks after sequential training. Backward transfer (BWT): performance retention on old tasks after learning new ones. Average success rate (ASR) across all learned tasks after full training sequence. The benchmark evaluates catastrophic forgetting by measuring BWT, and positive transfer by measuring whether learning task N helps task N+1.
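These metrics can be sketched from a success-rate matrix R, where R[i, j] is performance on task j after training through task i. This is one common continual-learning formulation; LIBERO's official implementation may differ in details (it also reports quantities such as AUC and negative backward transfer):

```python
import numpy as np

def continual_metrics(R):
    """Compute ASR, BWT, FWT from a T x T success matrix.

    R[i, j] = success rate on task j after finishing training on task i.
    One common formulation, not necessarily LIBERO's exact definitions.
    """
    T = R.shape[0]
    asr = R[-1].mean()                                          # final avg success
    bwt = np.mean([R[-1, j] - R[j, j] for j in range(T - 1)])   # retention of old tasks
    fwt = np.mean([R[j - 1, j] for j in range(1, T)])           # zero-shot help from prior tasks
    return asr, bwt, fwt

# Toy 3-task example: diagonal = success right after learning each task.
R = np.array([[0.9, 0.1, 0.0],
              [0.7, 0.8, 0.2],
              [0.6, 0.5, 0.9]])
asr, bwt, fwt = continual_metrics(R)
print(round(asr, 2), round(bwt, 2))  # 0.67 -0.3 (negative BWT indicates forgetting)
```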

The Sim-to-Real Gap

LIBERO uses robosuite environments with MuJoCo physics, sharing that framework's simplified contact modeling. The continual learning evaluation assumes sequential task presentation in fixed order, which does not match real deployment where tasks arrive unpredictably and concurrently. All LIBERO tasks share the same visual renderer, which means policies may learn simulation-specific visual shortcuts for retaining knowledge across tasks rather than robust physical understanding.

Real-World Data Needed

Sequential real-world task demonstrations showing progressive skill acquisition across changing environments. Data from evolving environments where new objects, tools, and tasks appear over time while visual and physical conditions change. Visual diversity across task suites to prevent policies from learning renderer-specific features. Data that captures both the learning of new skills and the maintenance of old ones in authentic conditions.

Complementary Claru Datasets

Egocentric Activity Dataset

Real human activity across 100+ diverse environments naturally demonstrates continual learning — skills acquired in one kitchen transfer and adapt to others, with authentic visual and physical variation.

Manipulation Trajectory Dataset

Diverse manipulation data across many task types and environments provides the variety and volume needed for evaluating real-world lifelong learning robustness.

Custom Sequential Task Collection

Purpose-collected data with tasks introduced progressively in real environments mirrors LIBERO's continual learning evaluation with authentic visual and physical variation between task phases.

Bridging the Gap: Technical Analysis

LIBERO addresses a fundamental challenge for deployed robots: they must learn new tasks over their lifetime without forgetting how to do old ones. This continual learning problem is well-studied in computer vision but underexplored in robotics, where each new task involves both visual perception changes and motor skill adaptation.

The five task suites isolate different generalization axes. LIBERO-Spatial tests whether a policy can adapt when objects move to new locations. LIBERO-Object tests generalization when new object categories appear. LIBERO-Goal tests understanding of novel task specifications with familiar objects. LIBERO-Long tests multi-step sequencing, and LIBERO-90 scales the evaluation to 90 diverse tasks. This structured evaluation reveals which aspects of manipulation knowledge transfer naturally and which must be explicitly preserved through continual learning mechanisms.

However, LIBERO's simulation environment means all tasks share the same visual renderer, physics engine, and workspace geometry. A policy might develop continual learning strategies that exploit simulation-specific invariances — texture consistency, lighting uniformity, deterministic physics — rather than building robust physical understanding. Real-world continual learning is harder because visual and physical conditions change unpredictably between task acquisition phases.

The catastrophic forgetting problem is more severe in the real world because visual domain shifts between environments compound the task-level forgetting. A robot that learns to pick up cups in a bright lab and then learns drawer manipulation in a dim workshop may forget cup picking not because of task interference but because the visual features have shifted. LIBERO cannot measure this coupled visual-task forgetting because its visual domain is constant.

Real-world data for lifelong learning must capture the natural progression of skill acquisition across changing environments. Claru's dataset collection across 100+ locations naturally provides this — different environments present different manipulation challenges, visual conditions, and object sets, mirroring the evolving deployment contexts that real robots face throughout their operational lifetime.

Key Papers

  1. Liu et al. "LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning." NeurIPS 2023.
  2. Kirkpatrick et al. "Overcoming Catastrophic Forgetting in Neural Networks." PNAS, 2017.
  3. Mandlekar et al. "What Matters in Learning from Offline Human Demonstrations for Robot Manipulation." CoRL 2021.
  4. Zhu et al. "robosuite: A Modular Simulation Framework and Benchmark for Robot Learning." arXiv:2009.12293, 2020.

Frequently Asked Questions

What does lifelong learning mean for a robot, and how does LIBERO evaluate it?

Lifelong learning means a robot acquires new manipulation skills over time without forgetting previously learned ones. LIBERO evaluates this by sequentially presenting task suites and measuring forward transfer (does old knowledge help with new tasks?), backward transfer (does new learning harm old skills?), and average success across all accumulated tasks.

Why does LIBERO need complementary real-world data?

LIBERO's simulation uses consistent visual rendering and physics across all tasks, so policies may learn simulation-specific shortcuts for retaining knowledge. Real-world continual learning involves changing visual conditions, new physical environments, and unpredictable task arrival — coupled domain shifts that LIBERO's constant simulation environment cannot measure.

What is catastrophic forgetting?

Catastrophic forgetting occurs when training a neural network on new tasks causes it to lose performance on previously learned tasks. In manipulation, this means a robot that learns drawer opening and then learns stacking might suddenly fail at drawer opening. LIBERO's backward transfer metric directly quantifies this forgetting across its 130 tasks.

How does environment diversity help lifelong learning?

Diverse environments force policies to learn robust representations that transfer across conditions rather than memorizing environment-specific features. Training on manipulation in many different real kitchens builds visual features that generalize when the robot encounters a new kitchen — exactly the cross-environment transfer that lifelong learning requires for deployment.

How do the five task suites differ?

Each suite isolates one generalization axis. LIBERO-Spatial varies object locations. LIBERO-Object introduces new object categories. LIBERO-Goal changes task specifications with familiar objects. LIBERO-Long tests multi-step sequences. LIBERO-90 provides comprehensive evaluation with 90 diverse tasks. Together, they reveal which generalization dimensions cause the most forgetting and the most transfer.

Get Sequential Task Learning Data

Discuss diverse, progressively structured data for lifelong robot learning research with authentic environmental variation.