Robotics Benchmarks

Most robotics benchmarks measure performance in simulation. Real-world data reveals whether those scores transfer to physical robots. Explore how purpose-collected data bridges the sim-to-real gap for each benchmark.

RLBench

RLBench is a large-scale benchmark and learning environment built on CoppeliaSim (formerly V-REP) and PyRep. Created by Stephen James et al. at Imperial College London in 2020, it provides 100 carefully design...

CALVIN

CALVIN (Composing Actions from Language and Vision) is a benchmark for evaluating language-conditioned multi-step manipulation. Created by Oier Mees et al. at the University of Freiburg in 2022, it te...

ManiSkill

ManiSkill is a GPU-parallelized simulation benchmark from the SAPIEN team at UC San Diego. Now in its third iteration (ManiSkill3, 2024), it provides high-fidelity object manipulation tasks using the ...

Colosseum

Colosseum is a benchmark designed to evaluate the robustness of vision-language-action (VLA) models under systematic environmental perturbations. Created by Pumacay et al. and presented at RSS 2024, i...

Habitat

Habitat is an embodied AI simulation platform from Meta FAIR that evaluates navigation and rearrangement in photorealistic 3D indoor environments. Habitat 2.0 and 3.0 introduced articulated object int...

Meta-World

Meta-World is a multi-task benchmark for meta-reinforcement learning and multi-task learning, providing 50 manipulation tasks with a simulated Sawyer robot arm. Created by researchers at UC Berkeley, ...

robosuite

robosuite is a modular simulation framework and benchmark for robot manipulation built on MuJoCo. Developed by the Stanford Vision and Learning Lab (SVL), it provides standardized manipulation environ...

LIBERO

LIBERO (LIfelong BEnchmark for RObot learning) evaluates lifelong and continual learning for robot manipulation. Created by Liu et al. at UT Austin and NVIDIA, published at NeurIPS 2023, it tests whet...

SAPIEN

SAPIEN is a physics simulation platform and benchmark for interactive 3D environments. Built by UC San Diego and Stanford researchers, it provides articulated object simulation with PartNet-Mobility a...

VLABench

VLABench evaluates vision-language-action models on their ability to ground natural language instructions in physical manipulation. It tests VLA models on compositional language understanding — can th...

RoboCasa

RoboCasa is a large-scale simulation benchmark for household robot manipulation. Built on robosuite, it provides photorealistic kitchen environments with over 150 object categories and 2,500 3D assets...

SimplerEnv

SimplerEnv is an evaluation framework designed to bridge simulation and real-world robot evaluation. Created by researchers at UC San Diego and UC Berkeley, it provides simulated replicas of real eval...

Real Robot Challenge

The Real Robot Challenge (RRC) is a benchmark that provides remote access to real TriFinger robot platforms for evaluation. Organized by the Max Planck Institute for Intelligent Systems, it allows researchers worldwid...

DexArt

DexArt is a benchmark for dexterous manipulation of articulated objects using multi-finger robotic hands. Presented at CVPR 2023 by Bao et al., it evaluates policies on tasks requiring coordinated fin...

FurnitureBench

FurnitureBench is a real-world furniture assembly benchmark created by Heo et al. at CMU and KAIST, presented at RSS 2023. It uses real IKEA-style 3D-printed furniture kits with standardized assembly ...

robosuite (Benchmark)

robosuite is a modular simulation framework and benchmark for robot manipulation built on MuJoCo. This benchmark page covers robosuite as a standardized evaluation framework, distinct from the robosui...

DexArt

DexArt is a benchmark for dexterous manipulation of articulated objects using multi-finger robot hands. Created by researchers at UC San Diego and Tsinghua University, it focuses on manipulating objec...

ADROIT

ADROIT (Autonomous Dexterous RObot In-hand manipulation Tasks) is a benchmark for dexterous manipulation using the Shadow Hand — a 24-DOF anthropomorphic robot hand. Developed at the University of Was...

DeepMind Control Suite

The DeepMind Control Suite (dm_control) is a set of continuous control tasks built on MuJoCo, providing standardized benchmarks for reinforcement learning in locomotion, manipulation, and balance. Cre...

FurnitureBench

FurnitureBench is a real-world furniture assembly benchmark that provides standardized tasks, hardware setup, and evaluation protocols for contact-rich, long-horizon manipulation. Created by researche...

Bi-DexHands

Bi-DexHands is a benchmark for bimanual dexterous manipulation using two multi-finger robot hands. Developed at Peking University (PKU), it provides standardized tasks requiring coordination between two dexterous hands i...

ARNOLD

ARNOLD (A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic 3D Scenes) evaluates an agent's ability to follow natural language instructions to manipulate objects in pho...

PartNet-Mobility

PartNet-Mobility is a large-scale dataset and benchmark of articulated 3D objects with part-level annotations and mobility information. Built on SAPIEN, it provides over 2,000 articulated object model...

Bridge the Sim-to-Real Gap

Talk to our team about purpose-built real-world data that validates and improves simulation-trained robot policies.