X-Humanoid · 2026 · apache-2.0

RoboGene-Dataset

A large-scale robotic task dataset generated by a diversity-driven agentic framework, specifically designed for Vision-Language-Action (VLA) model pre-training on humanoid robots. It addresses limited scene variety and insufficient physical grounding in existing datasets to enhance generalization capabilities.

Downloads: 7K
Likes: 1

Why This Matters for Physical AI

RoboGene provides a diversity-driven, physically grounded pre-training dataset intended to improve VLA model generalization for humanoid robots in complex real-world scenarios, with a reported 5x higher success rate on unseen tasks than existing foundation models.

Technical Profile

Modalities
rgb, proprioception, language
Robot Embodiments
humanoid
Environment
kitchen, laboratory, medical, office, retail, education, industrial, simulation
Task Types
manipulation, pick_and_place, assembly, tool_usage, dual_arm_coordination
Data Format
LeRobot
Annotation Types
language_instructions
License
apache-2.0
Part of the RoboGene-Dataset family

Community Signals

Top 10% by downloads
HuggingFace Discussions: 1


Related Datasets