Egocentric Retail Video Dataset

First-person video of real retail environments — grocery stores, pharmacies, department stores — with product interaction annotations for training retail automation AI.

Dataset at a Glance

- 70K+ video clips
- 500+ hours recorded
- 25+ store types
- 6+ annotation layers

Comparison with Public Datasets

How Claru's dataset compares to publicly available alternatives.

| Dataset | Clips | Hours | Modalities | Environments | Annotations |
|---|---|---|---|---|---|
| Metrabs Retail | 5K | 15 | RGB-D | Lab store | Pose, shelves |
| EgoProceL | 62 | 8 | RGB | Mixed | Procedure steps |
| Claru Retail | 70K+ | 500+ | RGB, Depth | 25+ store types | Products, shelves, paths, hands, navigation |

Use Cases

Shelf Monitoring Robots

Autonomous shelf scanning and out-of-stock detection using mobile platforms. Example systems: Simbe Tally, Badger Technologies, BossaNova.

Shopping Assistant Robots

Customer interaction and in-store navigation assistance. Example systems: Fellow Robots, SoftBank Pepper, LG CLOi.

Visual Retail Analytics

Understanding customer behavior and product interaction patterns. Example systems: RetailNext, Trax, Standard AI.

Key References

  1. Ragusa et al., "The MECCANO Dataset: Understanding Human-Object Interactions from Egocentric Videos." WACV 2021.
  2. Grauman et al., "Ego4D: Around the World in 3,000 Hours of Egocentric Video." CVPR 2022.
  3. Grauman et al., "Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives." CVPR 2024.

How Claru Delivers This Data

Claru collectors capture first-person video in real retail environments. Unlike ceiling-mounted CCTV datasets, Claru's egocentric perspective matches how shelf-scanning robots perceive their environment. Product-level interaction annotations enable training models that understand shopping behavior at a granular level.

Frequently Asked Questions

What types of retail environments are included?

Grocery stores, pharmacies, convenience stores, department stores, home improvement stores, and specialty retail, ranging from 2,000 to 100,000+ square feet.

How are product interactions annotated?

Every hand-product contact event is labeled with timestamps, product category, interaction type (pick up, examine, return), and shelf location.
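An interaction event of this kind could be represented as a simple record. The sketch below is purely illustrative: the field names and values are assumptions for demonstration, not Claru's published annotation schema.

```python
# Hypothetical hand-product interaction record.
# Field names are illustrative, NOT Claru's actual annotation format.
from dataclasses import dataclass


@dataclass
class InteractionEvent:
    clip_id: str            # which video clip the event belongs to
    start_s: float          # timestamp of first hand-product contact (seconds)
    end_s: float            # timestamp when contact ends (seconds)
    product_category: str   # e.g. "beverages/soda"
    interaction_type: str   # one of "pick_up", "examine", "return"
    shelf_location: str     # aisle/bay/shelf identifier

    def duration(self) -> float:
        return self.end_s - self.start_s


event = InteractionEvent(
    clip_id="clip_000123",
    start_s=12.4,
    end_s=15.9,
    product_category="beverages/soda",
    interaction_type="examine",
    shelf_location="aisle_07/bay_3/shelf_2",
)
print(f"{event.interaction_type} lasting {event.duration():.1f}s")
```

A structured record like this makes it straightforward to filter events by interaction type or aggregate contact durations per shelf location.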

Can the data be used to train out-of-stock detection?

Yes. Shelf-state annotations label visible products, positions, and gaps, training models to detect out-of-stock conditions and planogram deviations.
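A consumer of such shelf-state labels might flag out-of-stock conditions and planogram deviations by comparing observed slots against an expected layout. This is a minimal sketch under assumed data shapes (slot-to-product mappings); it does not reflect Claru's actual annotation files.

```python
# Minimal sketch: derive out-of-stock and planogram-deviation issues from
# shelf-state labels. Data shapes are hypothetical, not Claru's schema.

def find_shelf_issues(planogram, shelf_state):
    """Compare expected vs. observed slot contents.

    planogram:   {slot_id: expected product id}
    shelf_state: {slot_id: observed product id, or None for a labeled gap}
    Returns a list of (slot_id, issue_type, product_id) tuples.
    """
    issues = []
    for slot, expected in planogram.items():
        observed = shelf_state.get(slot)
        if observed is None:
            issues.append((slot, "out_of_stock", expected))
        elif observed != expected:
            issues.append((slot, "planogram_deviation", observed))
    return issues


planogram = {"A1": "cola_12oz", "A2": "cola_12oz", "A3": "lemon_soda"}
shelf_state = {"A1": "cola_12oz", "A2": None, "A3": "cola_12oz"}
print(find_shelf_issues(planogram, shelf_state))
```

Running this reports slot A2 as out of stock and slot A3 as a planogram deviation, the two conditions the shelf-state layer is designed to surface.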

Request a Sample Pack

Get a curated sample of egocentric retail video data with full annotations to evaluate for your project.