BeingBeyond | MIT

Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos

The first dexterous Vision-Language-Action model pretrained from large-scale human videos via explicit hand motion modeling. Includes pretrained VLA models (1B, 8B, 14B parameters) and a post-training dataset for robot alignment.

Downloads: 569
Likes: 1

Technical Profile

Modalities
rgb, language
Task Types
manipulation, dexterous_manipulation
Data Format
zarr
License
MIT
Part of the Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos family

Community Signals

Top 50% by downloads
