EPIC-KITCHENS Alternative: Commercial Data for Production Kitchen Robotics

EPIC-KITCHENS-100 provides 100 hours of egocentric kitchen video with 89,977 action segments across 45 kitchens -- the gold standard for kitchen activity understanding research. But its CC BY-NC 4.0 license, lack of robot action labels, and single-domain scope create barriers for production kitchen robot teams. Compare EPIC-KITCHENS with Claru's commercially licensed, robot-ready manipulation data.

EPIC-KITCHENS Profile

Institution

University of Bristol

Year

2018

Scale

100 hours of egocentric cooking video, 89,977 action segments, 20M frames across 45 kitchens

License

CC BY-NC 4.0 (non-commercial -- explicitly prohibits commercial use)

Modalities
- Egocentric RGB video
- Action segment annotations (verb-noun pairs with temporal boundaries)
- Object bounding box annotations
- VISOR pixel-level segmentation masks (separate release)

How Claru Helps Teams Beyond EPIC-KITCHENS

EPIC-KITCHENS is the gold standard for kitchen activity understanding research, and its fine-grained verb-noun annotation taxonomy has shaped how the field thinks about kitchen manipulation. However, the gap between understanding kitchen activities in video and controlling a robot to perform those activities is substantial. EPIC-KITCHENS' CC BY-NC license blocks commercial use entirely, its passive observation lacks the action labels and sensor modalities that robot policy training demands, and its home-only kitchen environments may not match commercial deployment settings. Claru bridges these gaps with commercially licensed robot demonstrations collected in your specific kitchen environment, with the full sensor stack and continuous action labels that modern imitation learning architectures require. Teams that pretrain visual encoders on EPIC-KITCHENS and fine-tune policies on Claru data get the best of both worlds: deep kitchen scene understanding grounded in robot-specific control data for production-ready manipulation policies.

What Is EPIC-KITCHENS?

EPIC-KITCHENS is a large-scale egocentric video dataset focused on kitchen activities, created at the University of Bristol and led by Dima Damen. The original EPIC-KITCHENS-55 was published at ECCV 2018 (arXiv 1804.02748), featuring 55 hours of unscripted cooking video recorded by 32 participants in their own kitchens across 4 cities. Participants belonging to 10 different nationalities captured highly diverse kitchen habits and cooking styles using head-mounted cameras, resulting in the first large-scale collection of fine-grained kitchen activity from the first-person perspective.

EPIC-KITCHENS-100, published in the International Journal of Computer Vision in 2022 (arXiv 2006.13256), nearly doubled the dataset to 100 hours of video containing 20 million frames, 89,977 action segments, and 700 variable-length videos across 45 kitchen environments. The extended version introduced a refined annotation pipeline that produces 54% more actions per minute and 128% more action segments than the original, resulting in denser and more complete fine-grained activity annotations. The dataset was re-collected from 16 returning participants, enabling a unique 'test of time' evaluation -- whether models trained on 2018 footage generalize to new recordings from the same kitchens two years later.

The annotation taxonomy is structured around verb-noun pairs describing fine-grained actions. The dataset uses 97 verb classes (e.g., 'cut', 'wash', 'put', 'take', 'open') and 300 noun classes (e.g., 'knife', 'pan', 'onion', 'water', 'lid'), generating thousands of unique action combinations. Annotations include action segment temporal boundaries with start and end times, enabling tasks like action recognition, action anticipation, action detection, and the EPIC-KITCHENS VISOR benchmark (arXiv 2209.13064) for video segmentation and object relations. The dataset is split roughly 75/10/15 into train, validation, and test sets.
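To make the annotation structure concrete, the segments described above can be parsed into (verb, noun, start, end) tuples. The sample rows and column names below are illustrative stand-ins modeled on the public annotation CSVs, not verbatim dataset contents -- verify against the actual release files before relying on them:

```python
import csv
import io

# Hypothetical rows mirroring the EPIC-KITCHENS-100 annotation layout
# (column names are an assumption for illustration).
SAMPLE = """narration_id,video_id,start_timestamp,stop_timestamp,verb,noun
P01_01_0,P01_01,00:00:01.089,00:00:03.200,open,fridge
P01_01_1,P01_01,00:00:03.500,00:00:05.100,take,onion
"""

def to_seconds(ts: str) -> float:
    """Convert an HH:MM:SS.fff timestamp to seconds."""
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

def load_segments(text: str):
    """Yield (verb, noun, start_s, end_s, duration_s) action segments."""
    for row in csv.DictReader(io.StringIO(text)):
        start = to_seconds(row["start_timestamp"])
        end = to_seconds(row["stop_timestamp"])
        yield (row["verb"], row["noun"], start, end, end - start)

segments = list(load_segments(SAMPLE))
```

Each segment carries a semantic label and temporal extent -- useful for recognition and anticipation benchmarks, but note that nothing in the record describes motion or contact.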

EPIC-KITCHENS has become a foundational benchmark for egocentric vision research and has influenced kitchen robot development. Its fine-grained action vocabulary provides a semantic framework for understanding cooking procedures, and its unscripted nature captures the real variability of kitchen work. However, the dataset was designed for video understanding research -- it contains passive observation of human cooking, not robot manipulation demonstrations, and this fundamental distinction limits its direct applicability to robot policy training.

EPIC-KITCHENS-100 at a Glance

100h
Hours of Kitchen Video
89,977
Action Segments
20M
Frames
45
Kitchen Environments
97
Verb Classes
300
Noun Classes

EPIC-KITCHENS vs. Claru: Side-by-Side Comparison

A detailed comparison across the dimensions that matter when building production kitchen robots.

License
EPIC-KITCHENS: CC BY-NC 4.0 -- explicitly prohibits commercial use
Claru: Single commercial license with IP assignment

Scale
EPIC-KITCHENS: 100 hours, 89,977 action segments, 45 kitchens
Claru: Customizable: 1K to 1M+ demonstrations scoped to your tasks

Data Type
EPIC-KITCHENS: Passive human observation -- no robot interaction or control signals
Claru: Active robot demonstrations with teleoperation action labels

Action Labels
EPIC-KITCHENS: Verb-noun pairs and temporal segments (semantic, not robot-compatible)
Claru: Joint positions, end-effector deltas, gripper states at control frequency

Sensor Modalities
EPIC-KITCHENS: Egocentric RGB video only (no depth, force, or proprioception)
Claru: RGB + depth + force/torque + proprioception + tactile

Environment Scope
EPIC-KITCHENS: Residential kitchens only -- single domain
Claru: Any environment: commercial kitchens, restaurants, warehouses, homes

Object Diversity
EPIC-KITCHENS: 300 noun classes from 45 home kitchens
Claru: Custom object sets matching your deployment -- including industrial kitchen equipment

Manipulation Detail
EPIC-KITCHENS: Activity-level annotations (what action), no grasp or contact data
Claru: Grasp types, contact events, force profiles, object state transitions

Quality Control
EPIC-KITCHENS: Multi-stage crowd annotation pipeline with temporal refinement
Claru: Production QC with >90% inter-annotator agreement and automated checks

Updates
EPIC-KITCHENS: Version-based releases (EPIC-55, EPIC-100, VISOR)
Claru: Continuous collection on your timeline and evolving requirements

Key Limitations of EPIC-KITCHENS for Production Use

The CC BY-NC 4.0 license is the most immediate barrier for commercial teams. This license explicitly prohibits using EPIC-KITCHENS data in commercial products, including training models that are deployed in commercial kitchen robots. There is no commercial licensing option available from the University of Bristol. Teams building production kitchen robots must either find alternative data sources or risk license violation -- a non-starter for any company with legal review processes.

EPIC-KITCHENS contains zero robot data. Every frame shows human hands manipulating kitchen objects from a head-mounted camera perspective. While this is valuable for understanding kitchen activities and object affordances, it cannot directly train robot manipulation policies. The dataset lacks joint positions, end-effector velocities, gripper states, force/torque measurements, and proprioceptive data -- the fundamental information needed for imitation learning. A policy trained solely on EPIC-KITCHENS video would understand what actions look like but would have no signal for how to execute them on a physical robot arm.
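To make the missing-signal point concrete, here is a minimal sketch of the per-timestep record an imitation learning pipeline expects -- none of these fields exist in EPIC-KITCHENS. Field names, units, and rates are illustrative assumptions, not a published schema:

```python
from dataclasses import dataclass

# Illustrative per-timestep robot demonstration record (hypothetical
# field names -- the kind of label a video-only dataset cannot provide).
@dataclass
class ControlStep:
    timestamp_s: float
    joint_positions: list       # radians, one entry per joint
    ee_delta: tuple             # (dx, dy, dz) end-effector translation, meters
    gripper_open: float         # 0.0 fully closed .. 1.0 fully open
    wrench: tuple = (0.0,) * 6  # force/torque (Fx, Fy, Fz, Tx, Ty, Tz)

# A 10 Hz trajectory snippet: continuous control at every tick, versus a
# single verb-noun label ("take onion") covering the whole segment.
traj = [
    ControlStep(t / 10.0, [0.1 * t] * 7, (0.01, 0.0, -0.005), 1.0 - 0.1 * t)
    for t in range(5)
]
```

The contrast is the point: one EPIC-KITCHENS segment maps to dozens or hundreds of records like these, and it is the records -- not the segment label -- that a behavior cloning loss is computed against.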

The single-domain scope is both a strength and a limitation. EPIC-KITCHENS captures extraordinary depth in residential kitchen activities, but the 45 kitchen environments are all home kitchens from a relatively small number of participants. Commercial kitchen robots often deploy in restaurants, institutional kitchens, food processing facilities, and catering operations -- environments that differ substantially from home kitchens in layout, equipment, scale, and the types of objects present. Models trained exclusively on home kitchen data may not generalize well to these commercial settings.

The annotation taxonomy, while rich for activity recognition, is misaligned with robot control requirements. Verb-noun pairs like 'cut onion' or 'wash pan' describe semantic actions, but robot policies need continuous control signals at 10-50 Hz. There is no information about grip force, approach trajectory, contact dynamics, or the precise motor commands that distinguish a successful grasp from a failed one. EPIC-KITCHENS tells you what happened; it cannot tell you the low-level control details of how it happened.

When to Use EPIC-KITCHENS vs. Commercial Data

EPIC-KITCHENS is the premier benchmark for kitchen activity understanding research. If your work involves action recognition, action anticipation, temporal action detection, or egocentric video understanding in kitchen settings, EPIC-KITCHENS provides the most rigorous and widely-used evaluation framework. The EPIC-KITCHENS Challenges -- hosted annually at CVPR and ECCV workshops -- have driven significant progress in these areas, and publishing results on EPIC-KITCHENS benchmarks is the standard way to demonstrate progress in egocentric activity understanding.

EPIC-KITCHENS is also useful for pretraining visual representations that understand kitchen objects and activities. Its 97 verb classes and 300 noun classes provide a rich semantic vocabulary for kitchen scenes, and models pretrained on EPIC-KITCHENS data learn visual features that encode object affordances specific to cooking and food preparation. These pretrained representations can serve as initialization for downstream robot learning tasks in kitchen environments.

Switch to commercial data when you need to train robot control policies for kitchen manipulation. Visual pretraining on EPIC-KITCHENS can give your model an understanding of kitchen scenes, but the policy that maps visual observations to motor commands requires demonstrations from your actual robot performing your actual tasks in your actual kitchen environment. This is where Claru's robot-specific data becomes essential.

The most effective pipeline for kitchen robot development is: pretrain visual features on EPIC-KITCHENS (or a combination of EPIC-KITCHENS and Ego4D), then fine-tune on commercially licensed robot demonstrations from Claru for policy training. This captures EPIC-KITCHENS' deep kitchen activity understanding while grounding motor control in robot-specific demonstrations collected under conditions that match production deployment.
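The two-stage pipeline above can be sketched in miniature: a frozen encoder standing in for EPIC-KITCHENS pretraining, and a policy head fit by behavior cloning on robot demonstrations. Everything here is a toy stand-in (random-projection encoder, linear head, synthetic data), a sketch of the structure rather than a production recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 stand-in: a "pretrained" visual encoder, frozen. In practice
# this would be a network pretrained on EPIC-KITCHENS frames.
W_enc = rng.normal(size=(64, 16))

def encode(obs):
    """Map flattened 'image' observations (N, 64) to frozen features."""
    return np.tanh(obs @ W_enc)

# Stage 2: behavior cloning of a policy head on robot demonstrations
# (observation -> 7-DoF action). Only the head is trained here; the
# demonstrations are synthetic for the sake of a runnable example.
obs = rng.normal(size=(256, 64))
true_head = rng.normal(size=(16, 7))
actions = encode(obs) @ true_head      # stand-in teleoperation action labels

head = np.zeros((16, 7))
lr = 0.1
for _ in range(200):                   # plain gradient descent on MSE loss
    feats = encode(obs)                # encoder stays frozen throughout
    err = feats @ head - actions
    head -= lr * feats.T @ err / len(obs)

mse = float(np.mean((encode(obs) @ head - actions) ** 2))
```

The design choice this illustrates: freezing the pretrained encoder keeps the kitchen-scene representation intact while the comparatively small policy head adapts to robot-specific action labels.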

How Claru Complements EPIC-KITCHENS

Claru provides the robot manipulation data that EPIC-KITCHENS lacks, specifically designed for kitchen robot deployment. Where EPIC-KITCHENS shows human hands cooking from a head-mounted perspective, Claru provides teleoperated robot demonstrations with synchronized action labels, proprioceptive states, and multi-modal sensor streams. Our kitchen data covers the tasks that kitchen robots actually perform: pick-and-place of ingredients, tool use (spatulas, tongs, ladles), pouring, scooping, stirring, and plating -- each captured with the control signals needed for imitation learning.

Unlike EPIC-KITCHENS' home-only kitchen environments, Claru collects data across the full range of kitchen settings where robots deploy. This includes commercial restaurant kitchens with industrial equipment, institutional food service facilities, automated food preparation lines, and residential kitchens. Data is collected in your specific deployment environment, eliminating the domain gap between training and production. The objects, lighting conditions, counter heights, and equipment layouts in your training data match what the robot will encounter when deployed.

Every Claru demonstration includes the complete sensor stack for contact-rich kitchen manipulation: RGB and depth imagery, 6-axis force/torque measurements (critical for tasks like cutting, stirring, and pouring that require precise force control), full proprioceptive state, and gripper contact detection. These modalities are synchronized at your robot's control frequency, providing the rich sensory context that kitchen manipulation policies need. EPIC-KITCHENS' monocular RGB video alone cannot provide this information.
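Synchronizing multi-rate sensor streams to a single control clock typically comes down to a timestamp-alignment step. A minimal sketch, with rates chosen purely for illustration (e.g. 30 Hz camera frames against a 100 Hz control loop):

```python
import bisect

def align(sensor_ts, control_ts):
    """For each control tick, return the index of the most recent
    sensor sample at or before that tick (clamped to 0)."""
    return [max(bisect.bisect_right(sensor_ts, t) - 1, 0) for t in control_ts]

rgb_ts = [i / 30.0 for i in range(30)]     # 1 s of 30 Hz frame timestamps
ctrl_ts = [i / 100.0 for i in range(100)]  # 1 s of 100 Hz control ticks
idx = align(rgb_ts, ctrl_ts)               # frame index to pair with each tick
```

Hold-last-sample alignment like this is one common policy; interpolation is another, and the right choice depends on the modality (positions interpolate sensibly, discrete events do not).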

All Claru kitchen data ships under a single commercial license with full IP assignment. There is no non-commercial restriction, no application process, and no ambiguity about commercial use rights. Data is delivered in RLDS, HDF5, zarr, and LeRobot formats, ready for immediate integration with your training pipeline. Our quality standards ensure greater than 90% inter-annotator agreement on task labels, with automated kinematic validation to flag anomalous trajectories before they enter your training set.

References

  1. Damen et al. "Scaling Egocentric Vision: The EPIC-KITCHENS Dataset." ECCV 2018. arXiv:1804.02748.
  2. Damen et al. "Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100." IJCV, 2022. arXiv:2006.13256.
  3. Darkhalil et al. "EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations." NeurIPS 2022. arXiv:2209.13064.
  4. Grauman et al. "Ego4D: Around the World in 3,000 Hours of Egocentric Video." CVPR 2022.

Frequently Asked Questions

Can we use EPIC-KITCHENS data commercially?

No. EPIC-KITCHENS is released under the CC BY-NC 4.0 license, which explicitly prohibits commercial use. Training a model on EPIC-KITCHENS data and deploying it in a commercial kitchen robot would violate this license. There is no commercial licensing option available from the University of Bristol. Teams building production kitchen robots need commercially licensed data from the start. Claru provides kitchen manipulation data under a standard commercial license with full IP assignment, covering all delivered demonstrations for use in commercial products.

Can EPIC-KITCHENS directly train robot manipulation policies?

EPIC-KITCHENS provides excellent activity recognition benchmarks for kitchen scenes but cannot directly train manipulation policies. The dataset contains passive observation of human cooking from head-mounted cameras -- it has no robot action labels (joint positions, end-effector velocities, gripper states), no proprioceptive data, no force/torque measurements, and no depth imagery. These are all essential for imitation learning. EPIC-KITCHENS can pretrain visual encoders that understand kitchen objects and activities, but the policy head that maps observations to robot actions requires robot-specific demonstration data.

What did EPIC-KITCHENS-100 change from the original EPIC-KITCHENS-55?

EPIC-KITCHENS-100 nearly doubled the dataset from 55 to 100 hours of video, growing from 39,596 to 89,977 action segments across 45 kitchens (up from 32). The annotation pipeline was completely redesigned, producing 54% more actions per minute and 128% more action segments for denser, more complete coverage. Sixteen participants from the original collection returned to record new footage, enabling a unique temporal generalization benchmark: whether models trained on 2018 data can generalize to 2020 recordings from the same kitchens. The extended dataset also added the VISOR benchmark for pixel-level video segmentation and object relation understanding.

How should teams combine EPIC-KITCHENS with robot demonstration data?

The most effective approach is to use EPIC-KITCHENS for visual pretraining only, not policy training. Pretrain your visual encoder on EPIC-KITCHENS to learn rich representations of kitchen objects, utensils, and activities. Then fine-tune on commercially licensed robot demonstration data from Claru for the policy training stage. This gives your model EPIC-KITCHENS' deep understanding of kitchen scenes while providing the action labels, proprioceptive data, and force/torque measurements needed for manipulation control. Claru collects demonstrations in your specific kitchen environment to minimize the domain gap between training and deployment.

Does EPIC-KITCHENS include commercial kitchen environments?

No. All 45 kitchens in EPIC-KITCHENS-100 are residential home kitchens belonging to the study participants. The dataset does not include commercial restaurant kitchens, institutional food service facilities, food processing plants, or other commercial environments where kitchen robots typically deploy. These commercial settings differ substantially from home kitchens in equipment, layout, scale, lighting, and object types. Claru collects data in your specific deployment environment -- whether that is a commercial kitchen, food preparation facility, or any other setting -- ensuring that training data matches production conditions.

Build Kitchen Robots That Work in Production

Go beyond EPIC-KITCHENS' passive observation with robot-ready manipulation data collected in your kitchen environment. Get a custom data collection plan from our team.