Egocentric Outdoor Urban Video Dataset

First-person video of urban pedestrian environments — sidewalks, crosswalks, plazas — captured across 30+ cities with navigation annotations for training delivery robots and outdoor autonomous systems.

Dataset at a Glance

Video clips: 95K+
Hours recorded: 700+
Environments: 30+ cities
Annotation layers: 6+
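The clip count and recorded hours together imply a rough average clip length. The sketch below works that out, treating 95K+ and 700+ as exact values. That is an assumption, since both figures are lower bounds, so the result is only an estimate.

```python
# Headline figures from the summary above, treated as point values (assumption).
CLIPS = 95_000
HOURS = 700

def mean_clip_seconds(clips: int, hours: float) -> float:
    """Average clip duration in seconds, given a clip count and total hours."""
    return hours * 3600 / clips

avg_seconds = mean_clip_seconds(CLIPS, HOURS)
print(f"Estimated mean clip length: {avg_seconds:.1f} s")  # prints 26.5 s
```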

Comparison with Public Datasets

How Claru's dataset compares to publicly available alternatives.

Dataset | Clips | Hours | Modalities | Environments | Annotations
Cityscapes | 25K | ~50 | RGB, Stereo | 50 cities (vehicle) | Semantic segmentation
nuScenes | 40K | 5.5 | RGB, LiDAR, Radar | 2 cities (vehicle) | 3D boxes, maps
Claru Urban | 95K+ | 700+ | RGB, Depth, IMU | 30+ cities (pedestrian) | Pedestrians, surfaces, obstacles, weather

Use Cases

Sidewalk Delivery Robots

Navigating pedestrian environments with dynamic foot traffic and urban obstacles. Example companies: Serve Robotics, Nuro, Coco.

Legged Robot Navigation

Outdoor locomotion and path planning in unstructured urban terrain. Example platforms: Boston Dynamics Spot, ANYbotics ANYmal, Ghost Robotics.

Urban Scene Understanding

Scene parsing for identifying sidewalks, road surfaces, curb cuts, and construction zones. Example models: SegFormer, Mask2Former, OneFormer.

Key References

  [1] Cordts et al. "The Cityscapes Dataset for Semantic Urban Scene Understanding." CVPR 2016.
  [2] Caesar et al. "nuScenes: A Multimodal Dataset for Autonomous Driving." CVPR 2020.
  [3] Shah et al. "GNM: A General Navigation Model to Drive Any Robot." ICRA 2023.

How Claru Delivers This Data

Claru's collector network spans 100+ cities, capturing genuine pedestrian-perspective urban navigation. Unlike vehicle-mounted datasets, Claru's data shows the world from sidewalk height — the perspective delivery robots and legged robots actually operate from.

Frequently Asked Questions

How does this differ from autonomous driving datasets?

Driving datasets are captured from vehicle height with vehicle-centric annotations. Claru's urban dataset is captured from pedestrian or robot height (1 to 1.5 m), showing curb details, ground textures, and leg-level obstacles that vehicle datasets miss.

What weather and lighting conditions are covered?

The dataset spans clear, overcast, rainy, and snowy conditions across all four seasons, with night captures under artificial lighting. Each clip carries weather metadata for filtering.
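As an illustration of filtering on per-clip weather metadata, here is a minimal sketch. The `Clip` record, its field names (`weather`, `is_night`), and the label strings are all hypothetical. The actual metadata schema is not described here and may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

@dataclass(frozen=True)
class Clip:
    # Hypothetical record; field names and label values are assumptions.
    clip_id: str
    weather: str      # e.g. "clear", "overcast", "rain", "snow"
    is_night: bool

def filter_clips(
    clips: Iterable[Clip],
    weather: Optional[Set[str]] = None,
    night: Optional[bool] = None,
) -> List[Clip]:
    """Keep clips whose weather is in `weather` and whose night flag matches.

    A criterion left as None is not applied.
    """
    selected = []
    for clip in clips:
        if weather is not None and clip.weather not in weather:
            continue
        if night is not None and clip.is_night != night:
            continue
        selected.append(clip)
    return selected

# Toy records standing in for real metadata.
clips = [
    Clip("a", "clear", False),
    Clip("b", "rain", True),
    Clip("c", "snow", False),
    Clip("d", "rain", False),
]
rainy_night = filter_clips(clips, weather={"rain"}, night=True)
wet_or_snowy = filter_clips(clips, weather={"rain", "snow"})
```

Keeping each criterion optional lets the same function build a rain-only evaluation split or a night-only training subset without separate code paths.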

Do the clips include GPS data?

Yes. Every outdoor clip includes GPS traces at 1 Hz for spatial indexing and geographic diversity verification.
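One way a 1 Hz GPS trace could be used for spatial indexing is sketched below. It assumes a trace is a list of `(latitude, longitude)` pairs in decimal degrees, one per second. That trace format, the function names, and the 0.01-degree grid size are illustrative assumptions, not the dataset's documented layout.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def path_length_m(trace: List[Tuple[float, float]]) -> float:
    """Total distance walked along a trace of (lat, lon) fixes."""
    return sum(haversine_m(*p, *q) for p, q in zip(trace, trace[1:]))

def mean_speed_mps(trace: List[Tuple[float, float]]) -> float:
    """Mean speed, assuming consecutive fixes are exactly 1 s apart (1 Hz)."""
    if len(trace) < 2:
        return 0.0
    return path_length_m(trace) / (len(trace) - 1)

def grid_cell(lat: float, lon: float, cell_deg: float = 0.01) -> Tuple[int, int]:
    """Bucket a point into a coarse lat/lon grid cell for spatial lookup."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
```

Grouping clips by the grid cell of their first fix gives a simple geographic index, and counting distinct cells per city is one rough check on geographic diversity.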

Request a Sample Pack

Get a curated sample of egocentric outdoor urban video data with full annotations to evaluate for your project.