Real-World Data for FurnitureBench

FurnitureBench standardizes real-world furniture assembly evaluation with reproducible 3D-printed kits. Diverse real-world assembly demonstrations help train policies for the precision insertion and force control that simulation models poorly.

FurnitureBench at a Glance

Metric	Value
Assembly Tasks	3
Steps per Task	4-11
Evaluation	Real HW
Robot	Franka Panda
Control Freq	10 Hz
Released	2023

Assembly Task Breakdown

Each FurnitureBench task requires ordered subtask completion. Failure at any step prevents subsequent steps.

Task	Assembly Steps	Key Skill Tested	Difficulty
One-Leg Table	4 (pick leg, pre-align, insert, verify)	Precision peg insertion	Medium
Round Table	8 (multi-leg + tabletop attachment)	Sequential precision + spatial planning	Hard
Lamp Assembly	11 (base, pole, shade, bulb, wiring)	Delicate component handling + multi-stage insertion	Very Hard

FurnitureBench vs. Related Assembly Benchmarks

Feature	FurnitureBench	IKEA Furniture (Lee)	Factory (Isaac)	RLBench Assembly
Real hardware eval	Yes (standardized kits)	Simulation only	Simulation only	Simulation only
Reproducible	3D-printed STL files	N/A (sim only)	N/A (sim only)	N/A (sim only)
Force feedback	6-axis F/T sensor	Simulated contact	Simulated contact	Simulated contact
Task complexity	4-11 ordered steps	Multi-step assembly	Single insertion	1-3 steps

Benchmark Profile

FurnitureBench is a real-world furniture assembly benchmark created by Heo et al. at CMU and KAIST, presented at RSS 2023. It uses real IKEA-style 3D-printed furniture kits with standardized assembly sequences, providing both a physical evaluation protocol with reproducible hardware and a matched simulation environment in Isaac Gym for training. It is one of the few benchmarks with standardized real-world evaluation that any lab can replicate.

Task Set

3 furniture assembly tasks of increasing difficulty: one-leg table (4 assembly steps), round table (8 assembly steps), and lamp assembly (11 assembly steps). Each requires multi-step manipulation including part identification, grasping, alignment under tight tolerances, insertion, and fastening. Tasks use 3D-printed parts with standardized dimensions and color-coded components.

Observation Space

RGB images from front camera (640x480) and wrist-mounted camera, 7-DOF robot joint positions and velocities, end-effector 6-DOF pose, binary gripper state, and wrist-mounted force/torque measurements (6-axis).
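As a rough illustration of how the low-dimensional part of this observation might be packed for a policy, the sketch below flattens the proprioceptive and force/torque signals into a single vector. The class and field names are hypothetical and are not taken from the FurnitureBench codebase. Camera images are left out because the wrist camera resolution is not stated here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LowDimObservation:
    """Hypothetical container; names are illustrative, not FurnitureBench's API."""

    joint_positions: Sequence[float]   # 7 joint positions
    joint_velocities: Sequence[float]  # 7 joint velocities
    ee_pose: Sequence[float]           # 6-DOF end-effector pose
    gripper_closed: bool               # binary gripper state
    wrench: Sequence[float]            # 6-axis force/torque reading

    def to_vector(self) -> List[float]:
        """Check field sizes and concatenate everything into one flat list."""
        expected = {
            "joint_positions": 7,
            "joint_velocities": 7,
            "ee_pose": 6,
            "wrench": 6,
        }
        for name, size in expected.items():
            actual = len(getattr(self, name))
            if actual != size:
                raise ValueError(f"{name} has {actual} values, expected {size}")
        return [
            *self.joint_positions,
            *self.joint_velocities,
            *self.ee_pose,
            float(self.gripper_closed),
            *self.wrench,
        ]
```

Under these assumptions the flattened state has 7 + 7 + 6 + 1 + 6 = 27 entries, which a policy would consume alongside the two image streams.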

Action Space

7-dimensional actions on a Franka Panda arm: an end-effector delta pose (3D position + 3D orientation) plus gripper open/close, executed at a 10 Hz control frequency. An operational space control (OSC) impedance controller converts these commands into low-level joint commands.
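A minimal sketch of assembling one such 7-dimensional action is shown below. The function name, the per-step clipping limits, and the +1/-1 gripper encoding are assumptions made for illustration. The orientation delta is treated as three generic rotation components because the parameterization is not specified here.

```python
from typing import List, Sequence

CONTROL_HZ = 10                # control frequency stated in the text
CONTROL_DT = 1.0 / CONTROL_HZ  # seconds between consecutive actions


def _clip(value: float, limit: float) -> float:
    """Clamp value to the symmetric range [-limit, limit]."""
    return max(-limit, min(limit, value))


def make_action(
    delta_pos: Sequence[float],
    delta_rot: Sequence[float],
    close_gripper: bool,
    max_delta_pos: float = 0.02,  # hypothetical per-step limit (meters)
    max_delta_rot: float = 0.1,   # hypothetical per-step limit (radians)
) -> List[float]:
    """Build [dx, dy, dz, rx, ry, rz, gripper] with clipped deltas."""
    if len(delta_pos) != 3 or len(delta_rot) != 3:
        raise ValueError("delta_pos and delta_rot must each have 3 components")
    pos = [_clip(v, max_delta_pos) for v in delta_pos]
    rot = [_clip(v, max_delta_rot) for v in delta_rot]
    gripper = 1.0 if close_gripper else -1.0  # assumed encoding
    return pos + rot + [gripper]
```

Clipping each delta keeps a single 0.1 s step small, which is one common way to keep a learned policy from commanding abrupt motions near contact. The real limits would depend on the controller configuration.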

Evaluation Protocol

Success rate on real hardware using standardized 3D-printed furniture kits. Progress is measured by completion of each ordered subtask (pick part, transport, pre-align, insert, fasten) and by overall assembly completion. Each task has a defined subtask sequence: success on step N requires completion of steps 1 through N-1. Evaluation uses 10-20 trials per task with randomized initial part placement.
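The ordering rule can be made concrete with a small scoring sketch. It assumes each trial is logged as a list of per-step booleans, which is a hypothetical format and not the benchmark's own logging schema. A step only counts if every earlier step also succeeded.

```python
from typing import Dict, List, Sequence


def completed_prefix(step_results: Sequence[bool]) -> int:
    """Count leading successes; anything after the first failure is ignored."""
    count = 0
    for ok in step_results:
        if not ok:
            break
        count += 1
    return count


def summarize_trials(trials: Sequence[Sequence[bool]], num_steps: int) -> Dict[str, object]:
    """Return per-step completion rates and the full-assembly success rate."""
    if not trials:
        raise ValueError("at least one trial is required")
    prefixes = [completed_prefix(t[:num_steps]) for t in trials]
    n = len(trials)
    # Fraction of trials that reached step k with all earlier steps done.
    per_step: List[float] = [
        sum(p >= k for p in prefixes) / n for k in range(1, num_steps + 1)
    ]
    return {"per_step": per_step, "success_rate": per_step[-1]}
```

For example, four one-leg-table trials logged as `[T, T, T, T]`, `[T, T, F, T]`, `[T, F, F, F]`, and `[T, T, T, T]` give per-step rates of 1.0, 0.75, 0.5, 0.5 and an overall success rate of 0.5. The stray success on step 4 of the second trial is not counted because step 3 failed.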

The Sim-to-Real Gap

FurnitureBench provides both a simulation environment (Isaac Gym) and a real-hardware evaluation, enabling direct sim-to-real comparison. The main gap is insertion physics: peg-hole alignment with sub-millimeter tolerances requires force-sensitive control, and simulation models the underlying contact imprecisely. Real 3D-printed parts have slight compliance, layer-line texture, and manufacturing variation, so printed copies of the same part are not identical. Visual appearance also differs between the simulation renderer and real parts under lab lighting.

Real-World Data Needed

Assembly demonstrations on real furniture kits with 6-axis force/torque feedback during insertion phases. Diverse assembly strategies showing different approach angles, grip choices, and recovery from misalignment. Data from assembly in varied lighting conditions and workspace configurations. Multi-modal demonstrations (teleoperation and human hand) to capture both robot-executable and human-expert strategies.

Complementary Claru Datasets

Manipulation Trajectory Dataset

Contact-rich manipulation data with force measurements provides pretraining for the precise insertion and alignment skills that FurnitureBench's peg-in-hole assembly demands.

Custom Assembly Task Collection

Purpose-collected assembly demonstrations with diverse furniture kits and assembly strategies provide direct training data for multi-step assembly sequencing and error recovery.

Egocentric Activity Dataset

Human furniture assembly video from 100+ environments provides visual pretraining for understanding multi-step assembly sequences with natural error recovery and strategy adaptation.

Bridging the Gap: Technical Analysis

FurnitureBench is distinctive because it provides standardized physical evaluation — real 3D-printed furniture kits that any lab can reproduce from published STL files. This eliminates the hardware variability that makes comparing real-world results across labs difficult. Any researcher with a Franka Panda and a 3D printer can reproduce the exact evaluation conditions.

The assembly tasks test manipulation skills that most benchmarks ignore: precise alignment under sub-millimeter tolerances, multi-step sequencing with strict ordering dependencies (you cannot attach the tabletop before the legs), and force-sensitive insertion where visual observation alone is insufficient. These skills are directly relevant to manufacturing robotics and industrial assembly.

The insertion phase presents the hardest sim-to-real challenge. Peg-in-hole insertion with tight tolerances requires compliant, force-sensitive control — the robot must feel when the peg contacts the hole edge and adjust its approach angle and force vector. Isaac Gym models contact with simplified penalty-based methods that miss the stick-slip friction, part compliance, and alignment sensitivity of real insertion. A policy that learns to 'jam and push' in simulation will damage parts on real hardware.
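To illustrate what "feeling" contact could look like in data, the sketch below scans a sequence of 6-axis wrench samples for the first contact and flags a jamming-like condition. The thresholds are hypothetical, and the sketch assumes the insertion axis is aligned with the sensor's z axis. It is not FurnitureBench's controller logic.

```python
import math
from typing import Optional, Sequence

Wrench = Sequence[float]  # (fx, fy, fz, tx, ty, tz)


def force_magnitude(wrench: Wrench) -> float:
    """Euclidean norm of the force part of a wrench."""
    fx, fy, fz = wrench[:3]
    return math.sqrt(fx * fx + fy * fy + fz * fz)


def first_contact_index(
    wrenches: Sequence[Wrench], threshold_n: float = 2.0
) -> Optional[int]:
    """Index of the first sample whose force exceeds threshold_n, else None."""
    for i, wrench in enumerate(wrenches):
        if force_magnitude(wrench) > threshold_n:
            return i
    return None


def is_jamming(
    wrench: Wrench, axial_limit_n: float = 15.0, lateral_ratio: float = 0.5
) -> bool:
    """Flag high axial force combined with a large sideways component.

    Assumes insertion along sensor z; both thresholds are illustrative.
    """
    fx, fy, fz = wrench[:3]
    lateral = math.hypot(fx, fy)
    axial = abs(fz)
    return axial > axial_limit_n and lateral > lateral_ratio * axial
```

A compliant controller would typically back off or re-align when a check like `is_jamming` fires, rather than pushing harder. Real demonstrations with force logs are what would let such thresholds be set from data instead of guessed.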

The multi-step nature compounds the challenge. The lamp assembly requires 11 sequential steps, and failure at any step prevents completion. Assuming independent steps with equal reliability, if each step succeeds 85% of the time, the probability of completing all 4 steps of the one-leg table drops to ~52% (0.85^4). For the 11-step lamp, even 95% per-step success yields only ~57% full assembly completion (0.95^11).
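The compounding arithmetic can be checked with a few lines. The sketch assumes every step succeeds independently with the same probability, which is a simplification. Real steps differ in difficulty and failures may be correlated.

```python
def full_task_success(per_step_success: float, num_steps: int) -> float:
    """Probability of finishing all steps, assuming independent, equal steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    return per_step_success ** num_steps


if __name__ == "__main__":
    for task, steps, p in [
        ("One-Leg Table", 4, 0.85),
        ("Lamp Assembly", 11, 0.95),
    ]:
        print(f"{task}: {full_task_success(p, steps):.1%} at {p:.0%} per step")
```

Under this model the one-leg table comes out near 52% and the lamp near 57%, matching the figures quoted in the text.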

Real-world assembly data with force measurements during insertion provides the training signal simulation cannot generate. Demonstrations showing how force profiles change during successful versus failed insertions, and how experienced assemblers adapt their approach angle based on tactile feedback, provide the compliant control strategies that are absent from simulation-only training.

Key Papers

[1] Heo et al. “FurnitureBench: Reproducible Real-World Benchmark for Long-Horizon Complex Manipulation.” RSS 2023.
[2] Lee et al. “IKEA Furniture Assembly Environment for Long-Horizon Complex Manipulation Tasks.” ICRA 2021.
[3] Chi et al. “Diffusion Policy: Visuomotor Policy Learning via Action Diffusion.” RSS 2023.
[4] Zhao et al. “Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware.” RSS 2023.

Frequently Asked Questions

FurnitureBench provides standardized physical furniture kits (3D-printed from published STL files) that any lab can reproduce. This enables direct comparison of real-world results across institutions — something most benchmarks cannot offer because of hardware and object variability. It is one of very few benchmarks with reproducible real-world evaluation.

Assembly tests manipulation skills most benchmarks ignore: precision alignment under sub-millimeter tolerances, multi-step sequencing with strict ordering dependencies, and force-sensitive insertion. These skills are directly relevant to manufacturing robotics, and the long-horizon nature (up to 11 steps) exposes compounding errors that single-step benchmarks miss.

Insertion phases require compliant, force-sensitive control — detecting when a peg contacts the hole edge and adjusting approach angle and force vector. Simulation models contact with penalty-based methods that produce qualitatively wrong force profiles. Real force measurements during insertion provide the ground truth for learning compliant control strategies that avoid jamming or part damage.

Assembly steps must complete in order — step N requires steps 1 through N-1. Success rates multiply across steps. Even 90% per-step reliability yields only ~31% success on the 11-step lamp assembly. This compounding math means small improvements in per-step precision have outsized effects on full-task completion, making FurnitureBench a sensitive test of manipulation reliability.
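Turning the question around shows how demanding the per-step requirement is. The sketch below inverts the same independent, equal-reliability model (an assumption, not a measured property of the benchmark) to find the per-step success needed for a target full-task rate.

```python
def required_per_step_success(target_full_success: float, num_steps: int) -> float:
    """Per-step success needed so that p ** num_steps equals the target."""
    if not 0.0 < target_full_success <= 1.0:
        raise ValueError("target_full_success must be in (0, 1]")
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    return target_full_success ** (1.0 / num_steps)


if __name__ == "__main__":
    steps = 11  # lamp assembly
    for target in (0.5, 0.8, 0.9):
        p = required_per_step_success(target, steps)
        print(f"{target:.0%} full-task success needs {p:.1%} per step")
```

Under this model, a 50% success rate on the 11-step lamp already requires roughly 94% reliability on every step, which is why moving per-step reliability from 90% to 95% nearly doubles full-task completion (~31% to ~57%).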

Published results show significant sim-to-real drops, particularly on insertion steps. Simulation-only approaches achieve reasonable transport performance (moving parts to the workspace) but fail on precision insertion where contact dynamics, part compliance, and manufacturing variation are critical. Hybrid approaches combining simulation pre-training with real-world fine-tuning on force-annotated demonstrations perform best.