Robotics Benchmarks
Most robotics benchmarks measure performance in simulation; a few, such as FurnitureBench and the Real Robot Challenge, evaluate on physical hardware. Real-world data reveals whether simulation scores transfer to physical robots. Explore how purpose-collected data bridges the sim-to-real gap for each benchmark.
RLBench
RLBench is a large-scale benchmark and learning environment built on CoppeliaSim (formerly V-REP) and PyRep. Created by Stephen James et al. at Imperial College London in 2020, it provides 100 carefully design...
CALVIN
CALVIN (Composing Actions from Language and Vision) is a benchmark for evaluating language-conditioned multi-step manipulation. Created by Oier Mees et al. at the University of Freiburg in 2022, it te...
ManiSkill
ManiSkill is a GPU-parallelized simulation benchmark from the SAPIEN team at UC San Diego. Now in its third iteration (ManiSkill3, 2024), it provides high-fidelity object manipulation tasks using the ...
Colosseum
Colosseum is a benchmark designed to evaluate the robustness of vision-language-action (VLA) models under systematic environmental perturbations. Created by Pumacay et al. and presented at RSS 2024, i...
Habitat
Habitat is an embodied AI simulation platform from Meta FAIR that evaluates navigation and rearrangement in photorealistic 3D indoor environments. Habitat 2.0 and 3.0 introduced articulated object int...
Meta-World
Meta-World is a multi-task benchmark for meta-reinforcement learning and multi-task learning, providing 50 manipulation tasks with a simulated Sawyer robot arm. Created by researchers at UC Berkeley, ...
robosuite
robosuite is a modular simulation framework and benchmark for robot manipulation built on MuJoCo. Developed by the Stanford Vision and Learning Lab (SVL), it provides standardized manipulation environ...
LIBERO
LIBERO (LIfelong BEnchmark for RObot learning) evaluates lifelong and continual learning for robot manipulation. Created by Liu et al. at UT Austin and NVIDIA and published at NeurIPS 2023, it tests whet...
SAPIEN
SAPIEN is a physics simulation platform and benchmark for interactive 3D environments. Built by UC San Diego and Stanford researchers, it provides articulated object simulation with PartNet-Mobility a...
VLABench
VLABench evaluates vision-language-action models on their ability to ground natural language instructions in physical manipulation. It tests VLA models on compositional language understanding — can th...
RoboCasa
RoboCasa is a large-scale simulation benchmark for household robot manipulation. Built on robosuite, it provides photorealistic kitchen environments with over 150 object categories and 2,500 3D assets...
SimplerEnv
SimplerEnv is an evaluation framework designed to bridge simulation and real-world robot evaluation. Created by researchers at UC San Diego and UC Berkeley, it provides simulated replicas of real eval...
Real Robot Challenge
The Real Robot Challenge (RRC) is a benchmark that provides remote access to real TriFinger robot platforms for evaluation. Organized by the Max Planck Institute for Intelligent Systems, it allows researchers worldwid...
DexArt
DexArt is a benchmark for dexterous manipulation of articulated objects using multi-finger robotic hands. Presented at CVPR 2023 by Bao et al., it evaluates policies on tasks requiring coordinated fin...
FurnitureBench
FurnitureBench is a real-world furniture assembly benchmark created by Heo et al. at CMU and KAIST, presented at RSS 2023. It uses real IKEA-style 3D-printed furniture kits with standardized assembly ...
robosuite (Benchmark)
robosuite is a modular simulation framework and benchmark for robot manipulation built on MuJoCo. This benchmark page covers robosuite as a standardized evaluation framework, distinct from the robosui...
DexArt
DexArt is a benchmark for dexterous manipulation of articulated objects using multi-finger robot hands. Created by researchers at UC San Diego and Tsinghua University, it focuses on manipulating objec...
ADROIT
ADROIT (Autonomous Dexterous RObot In-hand manipulation Tasks) is a benchmark for dexterous manipulation using the Shadow Hand — a 24-DOF anthropomorphic robot hand. Developed at the University of Was...
DeepMind Control Suite
The DeepMind Control Suite (dm_control) is a set of continuous control tasks built on MuJoCo, providing standardized benchmarks for reinforcement learning in locomotion, manipulation, and balance. Cre...
FurnitureBench
FurnitureBench is a real-world furniture assembly benchmark that provides standardized tasks, hardware setup, and evaluation protocols for contact-rich, long-horizon manipulation. Created by researche...
Bi-DexHands
Bi-DexHands is a benchmark for bimanual dexterous manipulation using two multi-finger robot hands. Developed at Peking University (PKU), it provides standardized tasks requiring coordination between two dexterous hands i...
ARNOLD
ARNOLD (A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic 3D Scenes) evaluates an agent's ability to follow natural language instructions to manipulate objects in pho...
PartNet-Mobility
PartNet-Mobility is a large-scale dataset and benchmark of articulated 3D objects with part-level annotations and mobility information. Built on SAPIEN, it provides over 2,000 articulated object model...
Bridge the Sim-to-Real Gap
Talk to our team about purpose-built real-world data that validates and improves simulation-trained robot policies.