Warehouse Robotics Data: Training Data for Picking, Packing, and Palletizing

Warehouse automation is the largest commercial deployment market for manipulation robots, yet most training data comes from controlled lab setups that cannot replicate the SKU variety, clutter conditions, and throughput demands of real fulfillment centers. The gap between lab-trained pick rates and production requirements costs operators millions in failed deployments.

Why Do Lab-Trained Warehouse Robots Underperform in Production?

Warehouse manipulation operates under constraints that research environments rarely model. Production picking systems must handle thousands of unique SKUs with continuous inventory rotation, maintain 99%+ pick success rates to match human performance, and sustain throughput of 600-1,000 picks per hour across multi-hour shifts. AnyGrasp demonstrated a generalizable grasp detection framework achieving strong benchmark results, but acknowledged performance degradation on novel materials, transparent surfaces, and extreme object geometries not present in training data. In production warehouse environments, these edge cases are not edge cases at all: reflective packaging, transparent bottles, deformable bags, and irregularly shaped items represent a substantial fraction of typical e-commerce inventory. The consistent pattern across commercial warehouse robot deployments is a 20-30 percentage point gap between benchmark pick rates and production pick rates, driven primarily by training data that does not represent operational conditions. [1]
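
As a back-of-envelope illustration of why that gap matters, here is a minimal Python sketch of sustained throughput when each failed pick incurs exception-handling time. The 4.5 s cycle time and 30 s exception cost are assumed values chosen for illustration; only the success-rate gap and the picks-per-hour scale echo the figures above.

```python
# Illustrative model, not a production calculator: each failed pick adds
# exception-handling time (re-grasp, operator assist), so a lower success
# rate both wastes attempts and slows the attempt rate itself.

def sustained_picks_per_hour(cycle_s: float,
                             success_rate: float,
                             exception_s: float) -> float:
    """Successful picks per hour, assuming each failure costs exception_s."""
    mean_attempt_s = cycle_s + (1.0 - success_rate) * exception_s
    attempts_per_hour = 3600.0 / mean_attempt_s
    return attempts_per_hour * success_rate

# Assumed constants: 4.5 s nominal cycle (~800 attempts/hr), 30 s per exception.
lab = sustained_picks_per_hour(cycle_s=4.5, success_rate=0.98, exception_s=30.0)
prod = sustained_picks_per_hour(cycle_s=4.5, success_rate=0.72, exception_s=30.0)
print(f"benchmark-like success (98%): {lab:.0f} picks/hr")   # ~692
print(f"production SKU mix (72%):     {prod:.0f} picks/hr")  # ~201
```

Under these assumptions, a 26-point drop in success rate cuts sustained throughput by more than two thirds, which is how a percentage-point gap on paper becomes a missed throughput target on the floor.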

What Makes Warehouse Data Different from Lab Manipulation Data?

Lab manipulation datasets like DROID capture careful demonstrations of isolated pick-and-place tasks on clean surfaces. Production warehouse environments present compound challenges: bins containing 50+ mixed items in random orientations, conveyor belts moving at fixed speeds requiring real-time grasp planning, variable lighting across warehouse zones and shifts, and packaging that changes with supplier substitutions. Open X-Embodiment aggregated over 1 million trajectories, but the data is dominated by single-object tabletop manipulation in research settings. The tasks, object distributions, and environmental conditions are fundamentally different from production warehouse operations, where a robot must pick a small cosmetic tube wedged between two heavy boxes in a dimly lit bin while maintaining throughput targets. [2][3]

How Does SKU Diversity Create a Long-Tail Data Problem?

A typical e-commerce fulfillment center handles 50,000-500,000 unique SKUs. Each SKU has distinct geometry, weight, packaging material, and grasp affordances. GraspNet-1Billion provides 1 billion grasp poses but across only 88 objects, none of which are commercial products in commercial packaging. The long-tail distribution means that common items are well-represented in any training set, but the thousands of infrequent SKUs that collectively represent 20-40% of pick volume are underrepresented or absent. Production pick failures disproportionately cluster on these tail items, and no amount of synthetic data generation from 3D scans can replace the material properties, packaging variability, and damage conditions that only real-world capture provides. [4]
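
The arithmetic of that tail is easy to reproduce. The sketch below assumes a Zipf-like popularity curve; the 0.9 exponent, the 100K-SKU catalog size, and the top-5% cutoff are all hypothetical parameters chosen for illustration.

```python
import numpy as np

# Hypothetical long-tail model: pick volume follows a Zipf-like curve.
n_skus = 100_000
ranks = np.arange(1, n_skus + 1, dtype=np.float64)
pick_share = ranks ** -0.9          # exponent is an assumed parameter
pick_share /= pick_share.sum()

head = int(0.05 * n_skus)           # the frequently picked top 5% of SKUs
tail_volume = pick_share[head:].sum()
print(f"tail SKUs (95% of catalog) drive {tail_volume:.0%} of pick volume")
# Under these assumptions the tail lands near the 20-40% range cited above,
# even though each individual tail SKU is picked rarely.
```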

How Do Open Datasets Compare for Warehouse Robot Training?

The comparison below sets datasets relevant to warehouse robotics against Claru's custom collection. Production warehouse deployment requires data with commercial product diversity, operational environment conditions, and task coverage beyond isolated pick-and-place.

Open X-Embodiment

Scale: 1M+ trajectories, 22 robot platforms
Tasks: Short-horizon manipulation; primarily single-object pick-and-place
Environments: Research labs; controlled tabletop setups
Limitations: No warehouse environments; no commercial products; single-object tasks only; no throughput-constrained operations

DROID

Scale: 76K trajectories, 564 scenes
Tasks: Tabletop manipulation with Franka robots
Environments: 13 institutions; lab environments
Limitations: No logistics or warehouse tasks; fixed-base robot only; no conveyor or bin-picking scenarios

GraspNet-1Billion

Scale: 1 billion grasp poses, 88 objects
Tasks: 6-DoF grasp detection on curated objects
Environments: Lab tabletop with controlled lighting
Limitations: 88 objects only; no commercial packaging; no clutter density variation; controlled conditions

Amazon Picking Challenge Data

Scale: Limited release; task-specific bins with known objects
Tasks: Bin picking of known retail products
Environments: Competition setup simulating warehouse shelf
Limitations: Small object set; known items only; single lighting condition; competition-specific constraints

Claru Custom

Scale: 386K+ video clips, ~500 contributors, configurable per warehouse
Tasks: Configurable: bin picking, conveyor sorting, packing, palletizing, kitting, quality inspection
Environments: Real fulfillment centers, distribution hubs, and warehouse floors with production lighting and conditions
Limitations: Requires engagement lead time (days to launch, 1-2 week calibration); not a public benchmark

Frequently Asked Questions

What warehouse manipulation tasks does Claru cover?

Claru covers the full spectrum of warehouse manipulation tasks: bin picking (single and multi-item), conveyor sortation, order packing, palletizing and depalletizing, kitting and assembly, and quality inspection. Task coverage is configured per engagement to match the specific operational workflows of the target deployment.

Can data be collected in live production warehouses?

Yes. Claru deploys data collection directly into production warehouse environments. Contributors use standard smartphones and wearable cameras to capture first-person footage during actual operations, ensuring data reflects the real lighting conditions, bin configurations, product distributions, and operational pressures of the target deployment site.

How does Claru cover long-tail SKUs?

Claru captures demonstrations across the client's actual product catalog rather than a curated subset. The annotation pipeline includes product-level metadata (category, packaging type, weight class, fragility) alongside grasp annotations. This ensures training data covers the full SKU distribution, including the long-tail items where production pick failures concentrate.
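
As a purely hypothetical sketch of what such a product-aware record might look like (these field names are illustrative assumptions, not Claru's actual schema), a per-clip structure could pair grasp events with SKU metadata:

```python
from dataclasses import dataclass, field

@dataclass
class SkuMetadata:
    sku_id: str
    category: str            # e.g. "cosmetics"
    packaging: str           # e.g. "transparent bottle", "poly bag"
    weight_class: str        # e.g. "under_500g"
    fragility: str           # e.g. "low", "medium", "high"

@dataclass
class GraspEvent:
    timestamp_s: float       # offset into the clip
    grasp_pose: list[float]  # e.g. a 6-DoF pose [x, y, z, roll, pitch, yaw]
    success: bool            # did the pick complete?

@dataclass
class ClipRecord:
    clip_id: str
    sku: SkuMetadata
    grasps: list[GraspEvent] = field(default_factory=list)
```

Keying every clip to SKU-level metadata is what makes it possible to audit coverage across the long tail, rather than only counting total clips.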

How fast can collection scale?

Collection throughput scales with the number of deployed contributors and the target warehouse's operational volume. Claru's collection infrastructure has demonstrated rapid scaling to approximately 500 contributors globally. For warehouse engagements, data collection is integrated into existing shift operations to capture production-representative picking speeds and operational patterns without disrupting throughput.

Your next hire isn't a vendor. It's a data team.

Tell us what you're training. We'll scope the dataset.

Or email us directly at [email protected]

References

  [1] Fang et al. "AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains." IEEE T-RO, 2023. Generalizable 6-DoF grasp detection with noted performance degradation on novel materials and transparent surfaces absent from training data.
  [2] Khazatsky et al. "DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset." arXiv, 2024. 76,000 robot manipulation trajectories demonstrating the value of diverse collection but limited to lab tabletop setups.
  [3] O'Brien et al. "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." arXiv, 2024. 1M+ trajectories from 22 platforms, dominated by single-object tabletop manipulation in research settings.
  [4] Fang et al. "GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping." CVPR, 2020. 1 billion grasp poses across 88 objects; insufficient object diversity for commercial warehouse applications.