Kitchen Manipulation Data: Training Robots for Food Prep and Household Tasks
Kitchen environments are the most common deployment target for household robots, yet they present some of the hardest manipulation challenges: deformable food items, transparent containers, wet surfaces, and multi-step task sequences requiring tool use. Public datasets either simulate kitchens synthetically or capture a narrow range of tasks in a few lab kitchens, leaving a critical training data gap for real-world kitchen autonomy.
Why Is Kitchen Manipulation One of the Hardest Data Problems in Robotics?
Kitchen environments concentrate nearly every manipulation challenge into a single domain. Objects are deformable (dough, vegetables), transparent (glasses, bottles), reflective (pots, utensils), and small (spices, garnishes). Tasks are inherently multi-step and require tool use: cutting requires grasping a knife, stabilizing the food item, and applying controlled force along a trajectory. RoboCasa established a large-scale simulation benchmark spanning 150+ kitchen layouts with 2,500+ object instances and demonstrated that environment diversity is critical for policy generalization [1]. However, the authors noted that simulated kitchens cannot capture the material properties that determine manipulation success: the friction of a wet cutting board, the deformability of bread under a knife, or the compliance of a garbage bag being pulled from its roll. BEHAVIOR-1K defined 1,000 everyday activities with kitchen tasks comprising the largest single category, highlighting that kitchen manipulation is central to household robot deployment but remains far from solved [4].
What Gaps Exist in Current Kitchen Manipulation Datasets?
Existing kitchen datasets fall into two categories: simulation-based and lab-based, each with structural limitations. RoboCasa provides scale and diversity in simulation but cannot model contact-rich interactions with deformable food items. EPIC-KITCHENS offers 100 hours of real kitchen video from 45 kitchens but provides egocentric observation without robot action labels, making it suitable for activity recognition but not direct policy learning [2]. BridgeData V2 includes real robot manipulation demonstrations, but kitchen-relevant tasks are a small fraction of its 60,000+ trajectories, and all data comes from a single robot platform in lab settings rather than household kitchens [3]. CALVIN provides language-conditioned manipulation benchmarks but operates in a simplified tabletop environment without realistic kitchen objects. The fundamental issue is that no existing dataset combines real kitchen environments, diverse food and tool manipulation, and the action-labeled demonstrations that learned policies require.
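To make the coverage gap concrete, one quick check is to keyword-filter a general manipulation dataset's language instructions and count kitchen-relevant episodes. The sketch below assumes an RLDS-style episode format with a per-step language instruction; the TFDS dataset name, field paths, and keyword list are illustrative assumptions, not BridgeData V2's documented schema.

```python
# Minimal sketch: estimate how much of a general manipulation dataset is
# kitchen-relevant by keyword-matching per-episode language instructions.
# Dataset name and field paths are assumptions, not a documented schema.
import tensorflow_datasets as tfds

KITCHEN_TERMS = ("pot", "pan", "knife", "stove", "bowl", "spoon",
                 "cut", "stir", "pour", "carrot", "cucumber")

ds = tfds.load("bridge_dataset", split="train")  # assumed dataset name

total = kitchen = 0
for episode in ds:
    total += 1
    # RLDS-style episodes nest steps as an inner dataset; take the first step.
    first_step = next(iter(episode["steps"]))
    instruction = first_step["language_instruction"].numpy().decode().lower()
    if any(term in instruction for term in KITCHEN_TERMS):
        kitchen += 1

print(f"kitchen-relevant episodes: {kitchen}/{total}")
```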
Why Does Kitchen Environment Diversity Matter for Policy Transfer?
Kitchen layouts, appliances, and tool sets vary enormously across households, cultures, and geographies. A policy trained in a single kitchen with a specific knife set, cutting board, and stove configuration will fail when deployed in a kitchen with different spatial arrangements, tool sizes, or appliance interfaces. RoboCasa showed that training across 150+ unique kitchen layouts produced policies that generalized significantly better than single-layout training, but this result was in simulation, where layout variation is free [1]. In the real world, capturing kitchen diversity requires visiting many actual kitchens across different housing types, countries, and socioeconomic contexts. This is precisely the environment diversity gap that distinguishes lab-collected data from data that produces deployable policies.
How Do Open Kitchen Datasets Compare to Custom Collection?
The table below compares datasets relevant to kitchen manipulation against Claru custom collection. The critical gaps in open data are real-world kitchen diversity, food manipulation coverage, and action-labeled demonstrations.

| Dataset | Real kitchen diversity | Food manipulation coverage | Action-labeled demonstrations |
| --- | --- | --- | --- |
| RoboCasa (Sim) | Simulated only (150+ layouts) | Limited; no deformable food physics | Yes, simulation only |
| EPIC-KITCHENS | 45 real kitchens | Broad human cooking activity | No; egocentric video without robot actions |
| BridgeData V2 | Lab setups, not household kitchens | Small fraction of 60,000+ trajectories | Yes, real robot |
| BEHAVIOR-1K | Simulated only | Activity definitions, not physical capture | No; benchmark tasks, not demonstrations |
| Claru Custom | Real household kitchens across countries | Ingredient prep, cooking, tool use, and cleaning | Yes, configured per engagement |
Frequently Asked Questions
What kitchen manipulation tasks does Claru cover?
Claru captures the full range of kitchen manipulation tasks: ingredient preparation (washing, peeling, chopping, slicing), cooking operations (stirring, flipping, pouring, seasoning), tool use (knives, spatulas, whisks, tongs), cleaning (wiping, scrubbing, loading the dishwasher), and multi-step meal preparation sequences. Task coverage is configured per engagement based on the target deployment scenario.
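As a rough illustration of per-engagement scoping, the categories above could be encoded as a simple coverage spec. The keys, task names, and helper below are hypothetical, not Claru's actual schema.

```python
# Hypothetical task-coverage spec mirroring the categories above; keys,
# task names, and the helper are illustrative, not Claru's actual schema.
KITCHEN_TASKS = {
    "ingredient_preparation": ["washing", "peeling", "chopping", "slicing"],
    "cooking_operations": ["stirring", "flipping", "pouring", "seasoning"],
    "tool_use": ["knife", "spatula", "whisk", "tongs"],
    "cleaning": ["wiping", "scrubbing", "loading_dishwasher"],
    "multi_step": ["meal_preparation_sequence"],
}

def scope_engagement(target: set[str]) -> dict[str, list[str]]:
    """Keep only the tasks a given deployment scenario calls for."""
    return {cat: [t for t in tasks if t in target]
            for cat, tasks in KITCHEN_TASKS.items()}

# Example: a meal-prep robot that only needs prep and cooking skills.
print(scope_engagement({"chopping", "slicing", "stirring", "pouring"}))
```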
How many kitchens does Claru collect in?
Kitchen count scales with engagement scope. Claru's global contributor network of approximately 500 people spans multiple countries and housing types, from compact apartments to large family homes to professional kitchens. Each kitchen contributes a unique spatial layout, appliance set, and tool inventory that improves policy generalization across deployment environments.
Does the data include deformable food manipulation?
Yes. Real-kitchen capture inherently includes deformable food manipulation that simulation cannot model: kneading dough, slicing ripe fruit, handling raw meat, wrapping tortillas, and assembling sandwiches. Annotations include object state transitions (e.g., whole to sliced, raw to cooked) and grasp adaptation labels that capture how human manipulation strategies change with a food item's compliance and fragility.
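To make those labels concrete, here is one way such annotation records might be structured; the layout and field names are a hypothetical sketch, not a published Claru format.

```python
# Hypothetical annotation records for object state transitions and grasp
# adaptation; all field names are illustrative, not a published format.
from dataclasses import dataclass, field

@dataclass
class StateTransition:
    object_id: str                 # e.g. "onion_01"
    from_state: str                # e.g. "whole" or "raw"
    to_state: str                  # e.g. "sliced" or "cooked"
    frame_range: tuple[int, int]   # video frames spanning the transition

@dataclass
class GraspAdaptation:
    object_id: str
    compliance: str                # e.g. "rigid", "deformable", "fragile"
    strategy: str                  # e.g. "pinch", "power", "two_hand_stabilize"
    frame_range: tuple[int, int]

@dataclass
class DemoAnnotation:
    demo_id: str
    transitions: list[StateTransition] = field(default_factory=list)
    grasps: list[GraspAdaptation] = field(default_factory=list)
```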
Can demonstrations be paired with language instructions for VLA training?
Kitchen demonstrations can be paired with natural language instructions at the task level (e.g., 'chop the onion into small pieces') and the step level (e.g., 'hold the onion steady with your left hand, then make vertical cuts'). Language annotation depth is configured per engagement. The structured activity taxonomy provides hierarchical task descriptions that can be converted to language instructions for VLA architectures.
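A minimal sketch of how a hierarchical taxonomy could be flattened into (task, step) instruction pairs for language-conditioned training; the nested-dict structure and the third step shown are assumptions, not Claru's actual taxonomy format.

```python
# Minimal sketch, assuming the taxonomy maps task-level instructions to
# ordered step-level instructions; structure and the final step are
# assumptions, not Claru's actual taxonomy format.
TAXONOMY = {
    "chop the onion into small pieces": [
        "hold the onion steady with your left hand",
        "make vertical cuts",
        "rotate the onion and cut crosswise",
    ],
}

def instruction_pairs(taxonomy: dict[str, list[str]]):
    """Yield (task_instruction, step_instruction) pairs for VLA conditioning."""
    for task, steps in taxonomy.items():
        for step in steps:
            yield task, step

for task, step in instruction_pairs(TAXONOMY):
    print(f"task: {task!r} | step: {step!r}")
```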
Your next hire isn't a vendor.
It's a data team.
Tell us what you're training. We'll scope the dataset.
References
- [1] Nasiriany et al. “RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots.” arXiv, 2024. Large-scale kitchen simulation benchmark with 150+ layouts and 2,500+ objects; showed that environment diversity is critical for policy generalization but acknowledged that simulation cannot model deformable food physics.
- [2] Damen et al. “Scaling Egocentric Vision: The EPIC-KITCHENS Dataset.” ECCV, 2018. 100 hours of egocentric kitchen video from 45 kitchens revealing significant cross-cultural variation in cooking activities, tools, and kitchen layouts.
- [3] Walke et al. “BridgeData V2: A Dataset for Robot Learning at Scale.” CoRL, 2023. 60,000+ real robot manipulation trajectories across 24 environments demonstrating that diverse real-world demonstrations improve policy generalization.
- [4] Li et al. “BEHAVIOR-1K: A Human-Centered Benchmark for Embodied AI with 1,000 Everyday Activities.” CoRL, 2023. Defined 1,000 everyday household activities with kitchen tasks as the largest single category, establishing that kitchen manipulation is central to household robot deployment.