BeingBeyond
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
Described by its authors as the first dexterous Vision-Language-Action (VLA) model pretrained on large-scale human videos, using explicit hand-motion modeling. The release includes pretrained VLA models at 1B, 8B, and 14B parameters, plus a post-training dataset for aligning the models to robots.
Downloads: 569
Likes: 1
Technical Profile
- Modalities: RGB, language
- Task Types: manipulation, dexterous_manipulation
- Data Format: zarr
- License: MIT
Community Signals
Top 50% by downloads