cagataydevmit
snub-pretrain-mix
A manifest dataset containing pointers to, and mixing ratios for, all pretraining sources used by the snub-v0 and snub-v1 models. The dataset consists of YAML and JSON metadata that points to upstream datasets; it contains no raw data.
Downloads: 0
Why This Matters for Physical AI
This meta-dataset enables reproducible pretraining of robotics foundation models by providing an exact manifest of the upstream dataset sources, versions, and mixing ratios needed to train world foundation models such as snub.
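As a rough illustration of how a manifest of sources, versions, and mixing ratios might be consumed, the sketch below parses a small JSON manifest, normalizes the ratios into sampling probabilities, and draws source names in proportion to them. The schema (`sources`, `name`, `version`, `ratio`), the source names, and the helper functions are all hypothetical; the actual files in snub-pretrain-mix may be laid out differently.

```python
import json
import random

# Hypothetical manifest. The keys and the source names are illustrative
# assumptions, not the real schema of snub-pretrain-mix.
MANIFEST_JSON = """
{
  "sources": [
    {"name": "example/source-a", "version": "v1", "ratio": 3.0},
    {"name": "example/source-b", "version": "v2", "ratio": 1.0}
  ]
}
"""


def load_mixture(manifest_text):
    """Return {"name@version": probability}, with probabilities summing to 1."""
    sources = json.loads(manifest_text)["sources"]
    total = sum(s["ratio"] for s in sources)
    if total <= 0:
        raise ValueError("mixing ratios must sum to a positive number")
    return {f'{s["name"]}@{s["version"]}': s["ratio"] / total for s in sources}


def sample_sources(mixture, n, seed=0):
    """Draw n source identifiers in proportion to their mixing probability."""
    rng = random.Random(seed)  # fixed seed keeps the draw order reproducible
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


mixture = load_mixture(MANIFEST_JSON)
draws = sample_sources(mixture, 8)
print(mixture)
print(draws)
```

Pinning a version next to each source name and seeding the sampler are what make such a mix repeatable: the same manifest and seed yield the same sequence of source draws.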
Technical Profile
- Data Format: YAML, JSON
- License: MIT