
Xperience-10M

A large-scale egocentric multimodal dataset of human experience containing 10 million interactions and 10,000 hours of synchronized first-person recordings with six video streams, audio, stereo depth, camera pose, hand mocap, full-body mocap, IMU, and hierarchical language annotations for embodied AI, robotics, and world modeling research.

Downloads: 2.3M
Episodes: 10,000,000
Hours: 10,000
Likes: 161

Why This Matters for Physical AI

Xperience-10M provides the largest structured multimodal egocentric dataset with synchronized 3D/4D annotations essential for training embodied AI systems that understand motion, geometry, and interaction from human experience at scale.

Technical Profile

Modalities
rgb, audio, depth, proprioception, language, imu, point_cloud
Environment
lab
Task Types
egocentric action recognition, task prediction, action captioning, human-object interaction, depth estimation, hand pose estimation, body motion estimation, imitation learning
Episodes
10,000,000
Total Hours
10,000
Data Format
HDF5
Annotation Types
language_instructions, action_labels, segmentation, camera_pose, mocap, hierarchical_captions
License
other
Part of the Xperience-10M family
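Since episodes are distributed as HDF5, a typical workflow is to open one file per episode and read its synchronized streams as arrays. The sketch below is a minimal example using `h5py`; the group names (`rgb`, `depth`, `imu`, `camera_pose`) and the `caption` attribute are hypothetical placeholders, not the dataset's documented schema, so adapt them to the actual file layout.

```python
# Hypothetical sketch of reading an Xperience-10M-style episode from HDF5.
# All dataset/attribute names here are assumptions for illustration only.
import os
import tempfile

import h5py
import numpy as np


def write_dummy_episode(path):
    """Create a tiny synthetic episode mimicking the assumed layout."""
    with h5py.File(path, "w") as f:
        # Video frames as (T, H, W, C); a real episode would hold six streams.
        f.create_dataset("rgb", data=np.zeros((4, 8, 8, 3), dtype=np.uint8))
        # Per-frame depth maps as (T, H, W).
        f.create_dataset("depth", data=np.zeros((4, 8, 8), dtype=np.float32))
        # IMU samples (accel + gyro) at a higher rate than video.
        f.create_dataset("imu", data=np.zeros((40, 6), dtype=np.float32))
        # One 4x4 camera-to-world pose per frame.
        f.create_dataset("camera_pose", data=np.zeros((4, 4, 4), dtype=np.float32))
        # Language annotation stored as a file attribute.
        f.attrs["caption"] = "hypothetical hierarchical caption"


def load_episode(path):
    """Load every dataset in the file into memory as numpy arrays."""
    out = {}
    with h5py.File(path, "r") as f:
        for key in f.keys():
            out[key] = f[key][()]  # read full dataset into memory
        out["caption"] = f.attrs["caption"]
    return out


path = os.path.join(tempfile.mkdtemp(), "episode_000.h5")
write_dummy_episode(path)
ep = load_episode(path)
```

For 10M episodes, reading whole files into memory will not scale; in practice you would slice datasets lazily (`f["rgb"][i]`) inside a dataloader rather than calling `[()]`.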
