BeingBeyond · 2025 · MIT
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
The first dexterous Vision-Language-Action model pretrained from large-scale human videos via explicit hand motion modeling. Includes pretrained VLA models (1B, 8B, 14B parameters) and a post-training dataset for robot alignment.
Downloads: 188
Likes: 1
Why This Matters for Physical AI
Being-H0 advances dexterous vision-language-action modeling by pretraining on large-scale human videos with explicit hand motion modeling, enabling robots to understand and replicate complex human hand manipulation tasks.
Technical Profile
- Modalities: rgb, language
- Action Space: motion_tokens
- Task Types: manipulation, dexterous_manipulation
- Data Format: zarr
- Annotation Types: language_instructions, hand_motion
- License: MIT
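The action space is listed as `motion_tokens`, i.e. continuous hand motion discretized into a token vocabulary that a language-model backbone can predict. As a rough illustration only (Being-H0's actual motion tokenizer is not described on this card), per-dimension hand-pose deltas can be mapped to token ids by uniform binning:

```python
import numpy as np

def tokenize_motion(deltas, low=-1.0, high=1.0, bins=256):
    """Toy uniform quantizer: map continuous motion deltas in [low, high]
    to integer token ids in [0, bins - 1]. A stand-in for a learned tokenizer."""
    deltas = np.clip(deltas, low, high)
    ids = ((deltas - low) / (high - low) * bins).astype(int)
    return np.minimum(ids, bins - 1)  # the value `high` maps to the top bin

def detokenize_motion(ids, low=-1.0, high=1.0, bins=256):
    """Lossy inverse: map token ids back to the center of their bin."""
    return low + (np.asarray(ids) + 0.5) * (high - low) / bins
```

A round trip loses at most half a bin width per dimension, which is the usual trade-off between vocabulary size and motion fidelity in token-based action spaces.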
Community Signals
Top 50% by downloads
HuggingFace Discussions: 1
Access
Need custom rgb data?
Claru builds purpose-built datasets for any environment, with dense human annotations and quality assurance.
Request a Sample Pack