Kitchen Manipulation Data: Training Robots for Food Prep and Household Tasks
Kitchen environments are the most common deployment target for household robots, yet they present some of the hardest manipulation challenges: deformable food items, transparent containers, wet surfaces, and multi-step task sequences requiring tool use. Public datasets either simulate kitchens synthetically or capture a narrow range of tasks in a few lab kitchens, leaving a critical training data gap for real-world kitchen autonomy.
Why Is Kitchen Manipulation One of the Hardest Data Problems in Robotics?
Kitchen environments concentrate nearly every manipulation challenge into a single domain. Objects are deformable (dough, vegetables), transparent (glasses, bottles), reflective (pots, utensils), and small (spices, garnishes). Tasks are inherently multi-step and require tool use: cutting requires grasping a knife, stabilizing the food item, and applying controlled force along a trajectory. RoboCasa established a large-scale simulation benchmark spanning 150+ kitchen layouts with 2,500+ object instances and demonstrated that environment diversity is critical for policy generalization [1]. However, the authors noted that simulated kitchens cannot capture the material properties that determine manipulation success: the friction of a wet cutting board, the deformability of bread under a knife, or the compliance of a garbage bag being pulled from its roll. BEHAVIOR-1K defined 1,000 everyday activities with kitchen tasks comprising the largest single category, highlighting that kitchen manipulation is central to household robot deployment but remains far from solved [4].
What Gaps Exist in Current Kitchen Manipulation Datasets?
Existing kitchen datasets fall into two categories: simulation-based and lab-based, each with structural limitations. RoboCasa provides scale and diversity in simulation but cannot model contact-rich interactions with deformable food items. EPIC-KITCHENS offers 100 hours of real kitchen video from 45 kitchens but provides egocentric observation without robot action labels, making it suitable for activity recognition but not direct policy learning [2]. BridgeData V2 includes real robot manipulation demonstrations, but kitchen-relevant tasks are a small fraction of its 60,000+ trajectories, and all data comes from a single robot platform in lab settings rather than household kitchens [3]. CALVIN provides language-conditioned manipulation benchmarks but operates in a simplified tabletop environment without realistic kitchen objects. The fundamental issue is that no existing dataset combines real kitchen environments, diverse food and tool manipulation, and the action-labeled demonstrations that learned policies require.
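To make the coverage gap concrete, one quick check is to keyword-filter a general manipulation dataset's language instructions and count kitchen-relevant episodes. The sketch below assumes an RLDS-style episode format with a per-step language instruction; the TFDS dataset name, field paths, and keyword list are illustrative assumptions, not BridgeData V2's documented schema.

```python
# Minimal sketch: estimate how much of a general manipulation dataset is
# kitchen-relevant by keyword-matching per-episode language instructions.
# Dataset name and field paths are assumptions, not a documented schema.
import tensorflow_datasets as tfds

KITCHEN_TERMS = ("pot", "pan", "knife", "stove", "bowl", "spoon",
                 "cut", "stir", "pour", "carrot", "cucumber")

ds = tfds.load("bridge_dataset", split="train")  # assumed dataset name

total = kitchen = 0
for episode in ds:
    total += 1
    # RLDS-style episodes nest steps as an inner dataset; take the first step.
    first_step = next(iter(episode["steps"]))
    instruction = first_step["language_instruction"].numpy().decode().lower()
    if any(term in instruction for term in KITCHEN_TERMS):
        kitchen += 1

print(f"kitchen-relevant episodes: {kitchen}/{total}")
```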
Why Does Kitchen Environment Diversity Matter for Policy Transfer?
Kitchen layouts, appliances, and tool sets vary enormously across households, cultures, and geographies. A policy trained in a single kitchen with a specific knife set, cutting board, and stove configuration will fail when deployed in a kitchen with different spatial arrangements, tool sizes, or appliance interfaces. RoboCasa showed that training across 150+ unique kitchen layouts produced policies that generalized significantly better than single-layout training, but this result was in simulation, where layout variation is free [1]. In the real world, capturing kitchen diversity requires visiting many actual kitchens across different housing types, countries, and socioeconomic contexts. This is precisely the environment diversity gap that distinguishes lab-collected data from data that produces deployable policies.
How Do Open Kitchen Datasets Compare to Custom Collection?
The table below compares datasets relevant to kitchen manipulation against Claru custom collection. The critical gaps in open data are real-world kitchen diversity, food manipulation coverage, and action-labeled demonstrations.

| Dataset | Real kitchen diversity | Food manipulation coverage | Action-labeled demonstrations |
| --- | --- | --- | --- |
| RoboCasa (Sim) | Simulated only (150+ layouts) | Limited; no deformable food physics | Yes, simulation only |
| EPIC-KITCHENS | 45 real kitchens | Broad human cooking activity | No; egocentric video without robot actions |
| BridgeData V2 | Lab setups, not household kitchens | Small fraction of 60,000+ trajectories | Yes, real robot |
| BEHAVIOR-1K | Simulated only | Activity definitions, not physical capture | No; benchmark tasks, not demonstrations |
| Claru Custom | Real household kitchens across countries | Ingredient prep, cooking, tool use, and cleaning | Yes, configured per engagement |
Frequently Asked Questions
What kitchen manipulation tasks does Claru cover?
Claru captures the full range of kitchen manipulation tasks: ingredient preparation (washing, peeling, chopping, slicing), cooking operations (stirring, flipping, pouring, seasoning), tool use (knives, spatulas, whisks, tongs), cleaning (wiping, scrubbing, loading the dishwasher), and multi-step meal preparation sequences. Task coverage is configured per engagement based on the target deployment scenario.
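As a rough illustration of per-engagement scoping, the categories above could be encoded as a simple coverage spec. The keys, task names, and helper below are hypothetical, not Claru's actual schema.

```python
# Hypothetical task-coverage spec mirroring the categories above; keys,
# task names, and the helper are illustrative, not Claru's actual schema.
KITCHEN_TASKS = {
    "ingredient_preparation": ["washing", "peeling", "chopping", "slicing"],
    "cooking_operations": ["stirring", "flipping", "pouring", "seasoning"],
    "tool_use": ["knife", "spatula", "whisk", "tongs"],
    "cleaning": ["wiping", "scrubbing", "loading_dishwasher"],
    "multi_step": ["meal_preparation_sequence"],
}

def scope_engagement(target: set[str]) -> dict[str, list[str]]:
    """Keep only the tasks a given deployment scenario calls for."""
    return {cat: [t for t in tasks if t in target]
            for cat, tasks in KITCHEN_TASKS.items()}

# Example: a meal-prep robot that only needs prep and cooking skills.
print(scope_engagement({"chopping", "slicing", "stirring", "pouring"}))
```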
How many kitchens does Claru collect in?
Kitchen count scales with engagement scope. Claru's global contributor network of approximately 500 people spans multiple countries and housing types, from compact apartments to large family homes to professional kitchens. Each kitchen contributes a unique spatial layout, appliance set, and tool inventory that improves policy generalization across deployment environments.
Does the data include deformable food manipulation?
Yes. Real-kitchen capture inherently includes deformable food manipulation that simulation cannot model: kneading dough, slicing ripe fruit, handling raw meat, wrapping tortillas, and assembling sandwiches. Annotations include object state transitions (e.g., whole to sliced, raw to cooked) and grasp adaptation labels that capture how human manipulation strategies change with a food item's compliance and fragility.
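To make those labels concrete, here is one way such annotation records might be structured; the layout and field names are a hypothetical sketch, not a published Claru format.

```python
# Hypothetical annotation records for object state transitions and grasp
# adaptation; all field names are illustrative, not a published format.
from dataclasses import dataclass, field

@dataclass
class StateTransition:
    object_id: str                 # e.g. "onion_01"
    from_state: str                # e.g. "whole" or "raw"
    to_state: str                  # e.g. "sliced" or "cooked"
    frame_range: tuple[int, int]   # video frames spanning the transition

@dataclass
class GraspAdaptation:
    object_id: str
    compliance: str                # e.g. "rigid", "deformable", "fragile"
    strategy: str                  # e.g. "pinch", "power", "two_hand_stabilize"
    frame_range: tuple[int, int]

@dataclass
class DemoAnnotation:
    demo_id: str
    transitions: list[StateTransition] = field(default_factory=list)
    grasps: list[GraspAdaptation] = field(default_factory=list)
```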
Can demonstrations be paired with language instructions for VLA training?
Kitchen demonstrations can be paired with natural language instructions at the task level (e.g., 'chop the onion into small pieces') and the step level (e.g., 'hold the onion steady with your left hand, then make vertical cuts'). Language annotation depth is configured per engagement. The structured activity taxonomy provides hierarchical task descriptions that can be converted to language instructions for VLA architectures.
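A minimal sketch of how a hierarchical taxonomy could be flattened into (task, step) instruction pairs for language-conditioned training; the nested-dict structure and the third step shown are assumptions, not Claru's actual taxonomy format.

```python
# Minimal sketch, assuming the taxonomy maps task-level instructions to
# ordered step-level instructions; structure and the final step are
# assumptions, not Claru's actual taxonomy format.
TAXONOMY = {
    "chop the onion into small pieces": [
        "hold the onion steady with your left hand",
        "make vertical cuts",
        "rotate the onion and cut crosswise",
    ],
}

def instruction_pairs(taxonomy: dict[str, list[str]]):
    """Yield (task_instruction, step_instruction) pairs for VLA conditioning."""
    for task, steps in taxonomy.items():
        for step in steps:
            yield task, step

for task, step in instruction_pairs(TAXONOMY):
    print(f"task: {task!r} | step: {step!r}")
```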
Your next hire isn't a vendor.
It's a data team.
Tell us what you're training. We'll scope the dataset.
References
- [1] Nasiriany et al. “RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots.” arXiv, 2024. Large-scale kitchen simulation benchmark with 150+ layouts and 2,500+ objects; showed that environment diversity is critical for policy generalization but acknowledged that simulation cannot model deformable food physics.
- [2] Damen et al. “Scaling Egocentric Vision: The EPIC-KITCHENS Dataset.” ECCV, 2018. 100 hours of egocentric kitchen video from 45 kitchens revealing significant cross-cultural variation in cooking activities, tools, and kitchen layouts.
- [3] Walke et al. “BridgeData V2: A Dataset for Robot Learning at Scale.” CoRL, 2023. 60,000+ real robot manipulation trajectories across 24 environments demonstrating that diverse real-world demonstrations improve policy generalization.
- [4] Li et al. “BEHAVIOR-1K: A Human-Centered Benchmark for Embodied AI with 1,000 Everyday Activities.” CoRL, 2023. Defined 1,000 everyday household activities with kitchen tasks as the largest single category, establishing that kitchen manipulation is central to household robot deployment.