Egocentric Restaurant Video Dataset
First-person restaurant environment video for training food service robots and hospitality automation. 45K+ clips across 15+ restaurant types with food handling, plating, and service workflow annotations.
Why Egocentric Restaurant Data Matters for Robotics
Food service is one of the fastest-growing markets for robotic automation, driven by labor shortages and the repetitive nature of commercial kitchen and front-of-house tasks. Robots like Bear Robotics Servi, Miso Robotics Flippy, and Richtech Robotics Adam are already deployed in restaurants, but current systems are limited to narrow tasks because they lack the training data needed for generalized food handling, plating, and service workflows. The manipulation complexity of food preparation -- soft deformable ingredients, precise portioning, aesthetic plating, and strict hygiene protocols -- demands training data from the perspective of experienced food service professionals.
Academic kitchen datasets like EPIC-KITCHENS focus on home cooking in residential settings. While valuable for general activity recognition research, they miss the commercial-scale dynamics that restaurant robots encounter: simultaneous ticket management, high-volume batch preparation, strict food safety timing (temperature danger zones), commercial equipment operation (convection ovens, salamanders, immersion circulators), and the coordination between kitchen and front-of-house that defines restaurant service flow.
Claru's egocentric restaurant dataset captures the full spectrum of commercial food service operations across 15+ restaurant types, including fast-casual, fine dining, quick-service restaurant (QSR) chains, hotel banquet kitchens, cafe and bakery operations, food trucks, ghost kitchens, and institutional cafeterias. Each environment type presents distinct manipulation challenges -- a sushi restaurant requires entirely different dexterity and plating precision than a burger line, and both differ from the batch-scale operations of a hospital cafeteria.
The egocentric perspective captures critical visual cues that food service professionals rely on: the color and texture changes that indicate cooking doneness, the subtle resistance feedback visible in knife technique, the spatial awareness needed to navigate crowded commercial kitchens, and the precise hand-eye coordination of garnishing and plating. These visual patterns are invisible in overhead surveillance footage but are exactly what food service robots need to learn.
Sensor Configuration and Collection Methodology
Collection uses chest-mounted GoPro HERO12 cameras in food-safe silicone housings, positioned to capture the hand workspace at typical counter height (36 inches, about 91 cm), where most food preparation occurs. A co-mounted Intel RealSense D455 provides depth aligned with the RGB stream at the 0.3-1.2 m working distance typical of food prep surfaces. All camera housings use NSF-certified food-safe materials and can be sanitized between sessions with standard restaurant cleaning protocols.
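Because the GoPro and the RealSense record on separate devices, their frames have to be paired before RGB-D samples can be formed. A minimal nearest-timestamp matcher is sketched below. It assumes both timestamp lists have already been mapped onto a common clock and are sorted; the function name and the 10 ms skew tolerance are illustrative choices, not a documented Claru spec.

```python
from bisect import bisect_left


def pair_frames(rgb_ts, depth_ts, max_skew=0.010):
    """Pair each RGB frame with the nearest depth frame in time.

    rgb_ts, depth_ts: sorted timestamps in seconds on a shared clock
    (assumption: clock offset between devices is already corrected).
    Pairs whose time difference exceeds max_skew are dropped rather than
    matched to a stale depth frame.
    Returns a list of (rgb_index, depth_index) tuples.
    """
    pairs = []
    for i, t in enumerate(rgb_ts):
        j = bisect_left(depth_ts, t)
        # The nearest depth frame is either the one at/after t or the one before it.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(depth_ts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(depth_ts[k] - t))
        if abs(depth_ts[best] - t) <= max_skew:
            pairs.append((i, best))
    return pairs
```

For example, with RGB at 60 fps and depth at 30 fps starting together, `pair_frames([0.0, 0.0167, 0.0333, 0.05], [0.0, 0.0333])` keeps the pairs `(0, 0)` and `(2, 1)` and drops the two RGB frames that fall between depth frames.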
Collectors are active food service professionals -- line cooks, prep cooks, baristas, pastry chefs, servers, bartenders, and kitchen managers -- performing genuine tasks during actual service periods. Sessions capture both prep work (mise en place, batch cooking, dough preparation) and live service (ticket execution, plating, expediting, table service). Collection occurs during real business hours to capture authentic time pressure, multi-ticket management, and the environmental conditions (steam, grease, heat) of operating commercial kitchens.
Metadata recorded for every session includes restaurant type, station position (grill, saute, fry, prep, pastry, bar, expo), menu category being prepared, time of day, service volume level (slow/steady/rush), and an equipment manifest. Kitchen layout sketches document station positions and traffic flow patterns. For front-of-house clips, table layouts and service sequences are logged.
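A session record carrying the fields listed above could be loaded and validated as follows. The class names, field names, enum values, and the `from_dict` loader are hypothetical, chosen to mirror the metadata described in this section rather than to reproduce Claru's actual delivery schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Station(Enum):
    GRILL = "grill"
    SAUTE = "saute"
    FRY = "fry"
    PREP = "prep"
    PASTRY = "pastry"
    BAR = "bar"
    EXPO = "expo"


class ServiceVolume(Enum):
    SLOW = "slow"
    STEADY = "steady"
    RUSH = "rush"


@dataclass
class SessionMetadata:
    restaurant_type: str            # e.g. "qsr", "fine_dining" (hypothetical labels)
    station: Station
    menu_category: str
    time_of_day: str                # "HH:MM", local time (assumed format)
    service_volume: ServiceVolume
    equipment: list[str] = field(default_factory=list)  # equipment manifest

    @classmethod
    def from_dict(cls, raw: dict) -> "SessionMetadata":
        # Enum(value) raises ValueError on an unknown station or volume label,
        # and a missing required key raises KeyError, so malformed records
        # fail at load time instead of propagating into training data.
        return cls(
            restaurant_type=raw["restaurant_type"],
            station=Station(raw["station"]),
            menu_category=raw["menu_category"],
            time_of_day=raw["time_of_day"],
            service_volume=ServiceVolume(raw["service_volume"]),
            equipment=list(raw.get("equipment", [])),
        )
```

Using enums for station and service volume keeps the controlled vocabularies in one place, so filtering a corpus by "all rush-hour grill sessions" is a simple equality check.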
The dataset spans the full environmental range of commercial food service: the intense heat and steam of grill and saute stations, the cold precision of pastry and garde manger, the rapid motion blur of rush-hour line work, the variable lighting from commercial hood ventilation systems, and the cluttered visual complexity of active commercial kitchens where dozens of ingredients, tools, and plates occupy every surface simultaneously.
Comparison with Public Datasets
How Claru's egocentric restaurant dataset compares to publicly available alternatives for food service robotics.
| Dataset | Clips | Hours | Modalities | Environments | Annotations |
|---|---|---|---|---|---|
| EPIC-KITCHENS-100 (IJCV 2022) | ~90K segments | ~100 | RGB | 45 home kitchens | Verb-noun actions |
| YouCook2 (AAAI 2018) | ~2K videos | ~176 | RGB (YouTube) | Home cooking | Recipe steps, descriptions |
| 50 Salads (2013) | 50 videos | ~4 | RGB-D, accelerometer | Lab kitchen | Fine-grained actions |
| Claru Egocentric Restaurant | 45K+ | 300+ | RGB-D | 15+ restaurant types | Food state, plating, service flow, tools, hygiene, hand-object |
Annotation Pipeline and Quality Assurance
In stage one, automated pre-labeling applies DINOv2 features for ingredient and tool segmentation, SAM2 for instance-level food item and utensil masks, and a custom food-state classifier trained to distinguish raw, cooking, and finished states for common commercial ingredients. Automated thermal zone estimation uses visual cues (steam, color change, surface texture) to flag potential food safety timing events.
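Per-frame classifier outputs usually have to be converted into temporal segments before annotators can review them. The sketch below shows one simple approach, run-length grouping with a minimum-duration filter that absorbs single-frame flicker. The function name and the `min_frames` parameter are illustrative assumptions, not a description of Claru's pipeline.

```python
from itertools import groupby


def frames_to_segments(labels, fps, min_frames=1):
    """Collapse per-frame food-state labels into (state, start_s, end_s) segments.

    labels: per-frame labels such as "raw", "cooking", "finished".
    Runs shorter than min_frames are merged into the preceding segment
    (except at the very start of the clip, where there is nothing to merge into).
    """
    segments = []  # each entry: [state, start_frame, end_frame_exclusive]
    frame = 0
    for state, run in groupby(labels):
        n = len(list(run))
        if segments and (n < min_frames or segments[-1][0] == state):
            # Absorb a short flicker run, or re-join a run of the previous state.
            segments[-1][2] += n
        else:
            segments.append([state, frame, frame + n])
        frame += n
    return [(s, start / fps, end / fps) for s, start, end in segments]
```

For example, with `min_frames=2`, a one-frame "cooking" blip inside a run of "raw" frames is absorbed into the surrounding "raw" segment instead of producing two spurious state changes.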
In stage two, annotators with commercial food service experience (ServSafe certified or equivalent) add domain-specific labels: a food preparation action taxonomy (100+ verbs covering cutting techniques, cooking methods, plating movements, and service actions), ingredient state tracking (raw, marinating, cooking, resting, plated, garnished), equipment operation phases, food safety compliance indicators (handwashing, glove changes, temperature checks, cross-contamination risks), and service workflow stages (ticket receive, prep, cook, plate, expo check, serve).
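Controlled vocabularies like these lend themselves to automated consistency checks on each annotated ticket. The sketch below uses the ingredient states and workflow stages named above; the snake_case label strings, the tuple-based record layout, and the rule that a ticket should not move backwards through the workflow are assumptions for illustration. A real kitchen can legitimately refire a dish, so backwards steps are flagged for review rather than rejected.

```python
# Label vocabularies taken from the stage-two description; snake_case spellings are assumed.
WORKFLOW_STAGES = ["ticket_receive", "prep", "cook", "plate", "expo_check", "serve"]
INGREDIENT_STATES = {"raw", "marinating", "cooking", "resting", "plated", "garnished"}
_STAGE_RANK = {stage: i for i, stage in enumerate(WORKFLOW_STAGES)}


def validate_ticket(stage_events, ingredient_states):
    """Return human-readable problems for one ticket (empty list if clean).

    stage_events: list of (timestamp_s, stage) tuples.
    ingredient_states: list of (timestamp_s, ingredient, state) tuples.
    """
    problems = []
    last_rank = -1
    # Stable sort by timestamp only, so ties keep their annotated order.
    for t, stage in sorted(stage_events, key=lambda e: e[0]):
        rank = _STAGE_RANK.get(stage)
        if rank is None:
            problems.append(f"{t:.1f}s: unknown stage {stage!r}")
            continue
        # Skipping stages is allowed (a cold dish has no cook step);
        # moving backwards is flagged as a possible refire or mislabel.
        if rank < last_rank:
            problems.append(f"{t:.1f}s: {stage!r} after {WORKFLOW_STAGES[last_rank]!r}")
        last_rank = max(last_rank, rank)
    for t, ingredient, state in ingredient_states:
        if state not in INGREDIENT_STATES:
            problems.append(f"{t:.1f}s: {ingredient} has unknown state {state!r}")
    return problems
```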
Stage three, quality assurance, targets 95%+ inter-annotator agreement on action boundaries, 93%+ IoU on food item segmentation, and 97%+ agreement on food safety event detection; food safety annotations are treated as safety-critical, similar to healthcare data. Clips from fine dining environments receive additional plating aesthetics annotations (garnish placement, sauce distribution, portion geometry) from annotators with culinary arts training.
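The section does not specify exactly how these scores are computed. As a minimal sketch, assuming segmentation quality is pixelwise IoU averaged over a batch and boundary agreement is the fraction of reference boundaries matched one-to-one within a fixed tolerance, a batch-level check might look like this. The function names and the 0.5 s tolerance are hypothetical; the 0.93 and 0.95 thresholds come from the targets above.

```python
def mask_iou(a, b):
    """Intersection-over-union of two binary masks given as nested lists of 0/1."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("masks must have the same shape")
    inter = sum(1 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb) if pa and pb)
    union = sum(1 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb) if pa or pb)
    # Two empty masks agree perfectly; this also avoids division by zero.
    return 1.0 if union == 0 else inter / union


def boundary_agreement(ref, other, tol_s=0.5):
    """Fraction of reference boundaries (seconds) matched by a boundary in
    `other` within tol_s; each boundary in `other` is used at most once."""
    if not ref:
        return 1.0
    unused = sorted(other)
    matched = 0
    for t in sorted(ref):
        best = min(unused, key=lambda u: abs(u - t), default=None)
        if best is not None and abs(best - t) <= tol_s:
            matched += 1
            unused.remove(best)
    return matched / len(ref)


def meets_targets(ious, boundary_score, min_iou=0.93, min_boundary=0.95):
    """Check a batch against the stated segmentation and boundary targets."""
    if not ious:
        raise ValueError("need at least one IoU score")
    mean_iou = sum(ious) / len(ious)
    return mean_iou >= min_iou and boundary_score >= min_boundary
```

Greedy nearest matching is the simplest one-to-one scheme; an optimal assignment (for example, Hungarian matching) would give slightly higher scores when boundaries are densely packed.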
The complete taxonomy covers 100+ food service action verbs (julienne, brunoise, saute, deglaze, plate, garnish, bus, reset), 70+ ingredient categories with state attributes, 30+ kitchen tool and equipment types, 15 food safety compliance checkpoints, and 8 service workflow phases. This annotation depth enables training models that understand not just food manipulation mechanics, but the quality standards and safety protocols that define professional food service.
Use Cases
Kitchen Manipulation Policies
Training robot arms for food preparation tasks: cutting, portioning, mixing, cooking, and plating. Egocentric demonstrations from professional cooks capture the force modulation, timing cues, and visual quality checks that define skilled food preparation. Example systems: Miso Robotics Flippy, Dexai Robotics Alfred.
Food Service Workflow Automation
Optimizing commercial kitchen operations through automated task sequencing, ticket management, and prep scheduling. Models learn the multi-task coordination patterns of experienced kitchen teams during different service volume levels.
Food Safety Monitoring
Real-time detection of food safety compliance events: handwashing frequency, glove changes, temperature checks, cross-contamination risks, and time-temperature abuse. The egocentric perspective enables compliance monitoring at the level of each individual food handler in a commercial operation.
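As an illustration of the kind of rule such a monitor might apply, the sketch below flags glove donning that is not preceded by a recent handwash (food codes generally call for washing hands before putting on gloves to work with food). The `SafetyEvent` type, the event-kind strings, and the 120-second window are hypothetical parameters, not values from the dataset specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyEvent:
    t: float   # seconds from clip start
    kind: str  # hypothetical labels: "handwash", "glove_on", "glove_off", "temp_check"


def gloves_without_handwash(events, window_s=120.0):
    """Return timestamps of glove_on events with no handwash in the
    preceding window_s seconds (window length is an assumed parameter)."""
    flagged = []
    last_wash = None
    for ev in sorted(events, key=lambda e: e.t):
        if ev.kind == "handwash":
            last_wash = ev.t
        elif ev.kind == "glove_on":
            if last_wash is None or ev.t - last_wash > window_s:
                flagged.append(ev.t)
    return flagged
```

Tracking only the most recent handwash keeps the check to a single pass over the event stream; a stricter variant could also require a fresh handwash after every `glove_off`.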
Key References
- [1] Damen et al. “Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100.” IJCV, 2022.
- [2] Grauman et al. “Ego4D: Around the World in 3,000 Hours of Egocentric Video.” CVPR, 2022.
- [3] Zhao et al. “Learning to Cook: Manipulation of Deformable Food Objects.” ICRA, 2023.
- [4] Octo Model Team. “Octo: An Open-Source Generalist Robot Policy.” RSS, 2024.
How Claru Delivers This Data
Claru's collector network includes active food service professionals across 15+ restaurant types in major metropolitan markets. Collection captures the full diversity of commercial food service -- from high-volume QSR lines processing hundreds of orders per hour to fine dining kitchens where a single plate may involve 15 minutes of precision plating. This breadth is essential for training food service robots that must adapt to different operational tempos and quality standards.
Custom campaigns can target specific restaurant types (QSR, fast-casual, fine dining), kitchen stations (grill, saute, pastry, bar), food categories (Asian cuisine knife work, Italian pasta preparation, bakery operations), or front-of-house service tasks. Turnaround from campaign specification to annotated delivery is typically 4-6 weeks.
Data is delivered in your preferred format with all sensor streams time-synchronized. Food safety annotations are available as separate layers that can be included or excluded based on your use case. Annotation exports support RLDS, HDF5, WebDataset, LeRobot, and custom schemas.
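WebDataset, one of the export formats listed above, stores each sample as a group of files inside a tar shard that share a key (the file name up to its first dot). That convention is what makes an optional annotation layer easy to include or drop. The sketch below uses only Python's standard library and assumes a flat shard with no subdirectories; the layer names (`actions.json`, `safety.json`) and the helper functions are hypothetical, not Claru's actual delivery layout.

```python
import io
import tarfile


def write_shard(path, samples):
    """Write samples to a WebDataset-style tar shard.

    samples: iterable of (key, {extension: bytes}) pairs. Files sharing a key,
    e.g. "clip000001.actions.json" and "clip000001.safety.json", form one sample.
    """
    with tarfile.open(path, "w") as tar:
        for key, files in samples:
            for ext, payload in files.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def read_shard(path, include=None):
    """Group tar members back into {key: {extension: bytes}}.

    include: optional set of extensions to keep, e.g. {"actions.json"}
    to exclude the food-safety layer.
    """
    samples = {}
    with tarfile.open(path, "r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Key is everything before the first dot; the rest is the extension.
            key, _, ext = member.name.partition(".")
            if include is not None and ext not in include:
                continue
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return samples
```

Keeping the safety labels in their own per-sample file means excluding them is a read-time filter, with no need to rewrite or re-shard the dataset.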
Frequently Asked Questions
What restaurant types does the dataset cover?
15+ types, including fast-casual, fine dining, QSR chains, hotel banquet kitchens, cafes, bakeries, food trucks, ghost kitchens, institutional cafeterias, pizzerias, sushi restaurants, taco shops, and barbecue operations. Each type presents distinct manipulation challenges and operational patterns.
How does this dataset differ from EPIC-KITCHENS?
EPIC-KITCHENS captures home cooking in residential kitchens. Claru's dataset captures commercial food service with professional-grade equipment, high-volume operations, multi-ticket coordination, food safety protocols, and the time pressure of live restaurant service. The manipulation skills, equipment, and operational dynamics are fundamentally different.
Does the dataset include food safety annotations?
Yes. Every clip includes food safety compliance annotations: handwashing events, glove changes, temperature checks, cross-contamination risks, and time-temperature tracking for perishable ingredients. These annotations are aligned with ServSafe and HACCP standards.
Can I request data from specific stations or service contexts?
Yes. Custom campaigns can target specific stations (grill, saute, prep, pastry, bar, expo), cuisine types, or service contexts (prep vs. rush hour vs. closing procedures). Contact us with your requirements for scoping.
Is the data collected during live service?
Yes. Collection occurs during real business hours with active ticket flow. This captures authentic time pressure, multi-task coordination, environmental conditions (steam, heat, grease), and the realistic pacing of commercial food service operations.
Request a Sample Pack
Get a curated sample of egocentric restaurant video with food handling and service workflow annotations to evaluate for your food service robotics project.