cagataydev

snub-pretrain-mix

A manifest dataset containing pointers to, and mixing ratios for, all pretraining sources used by the snub-v0 and snub-v1 models. The dataset consists of YAML and JSON metadata that points to upstream datasets; it contains no raw data.
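As a rough illustration of how a manifest of this kind might be consumed, the sketch below parses a small JSON manifest and normalizes its mixing ratios into sampling weights. The schema (`sources`, `name`, `revision`, `ratio`), the entries, and the function `load_mixing_weights` are hypothetical and are not taken from the actual snub-pretrain-mix files.

```python
import json

# Hypothetical manifest; the field names and source entries are illustrative
# assumptions, not the actual snub-pretrain-mix schema.
MANIFEST_JSON = """
{
  "sources": [
    {"name": "source_a", "revision": "rev-1", "ratio": 3.0},
    {"name": "source_b", "revision": "rev-2", "ratio": 1.0}
  ]
}
"""


def load_mixing_weights(manifest_text):
    """Parse a manifest and return {source name: normalized sampling weight}."""
    manifest = json.loads(manifest_text)
    sources = manifest["sources"]
    total = sum(s["ratio"] for s in sources)
    if total <= 0:
        raise ValueError("mixing ratios must sum to a positive value")
    # Dividing by the total makes the weights sum to 1.
    return {s["name"]: s["ratio"] / total for s in sources}


weights = load_mixing_weights(MANIFEST_JSON)
print(weights)  # {'source_a': 0.75, 'source_b': 0.25}
```

Normalizing at load time means the manifest can store ratios in whatever unit is convenient, such as relative counts, as long as they are non-negative.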

Downloads: 0

Why This Matters for Physical AI

This meta-dataset supports reproducible pretraining of robotics foundation models: it provides an exact manifest of the upstream dataset sources, versions, and mixing ratios needed to train world foundation models such as snub.
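One way to see why pinned versions and fixed ratios help reproducibility is a seeded sampling schedule: given the same weights and the same seed, the order in which sources are drawn is identical across runs. The sketch below assumes hypothetical source identifiers of the form `name@revision` and a hypothetical helper `sample_schedule`; it is not the snub training pipeline.

```python
import random


def sample_schedule(weights, num_draws, seed):
    """Return a deterministic list of source names drawn according to weights."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    names = sorted(weights)    # fixed order, independent of dict insertion order
    probs = [weights[n] for n in names]
    return rng.choices(names, weights=probs, k=num_draws)


# Hypothetical pinned sources ("name@revision") with normalized mixing ratios.
weights = {"source_a@rev-1": 0.75, "source_b@rev-2": 0.25}

first = sample_schedule(weights, 1000, seed=0)
second = sample_schedule(weights, 1000, seed=0)
print(first == second)  # True: same manifest and seed give the same schedule
```

Sorting the source names before sampling keeps the schedule stable even if two copies of the manifest list the same sources in a different order.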

Technical Profile

Data Format: YAML, JSON
License: MIT
Part of the snub family
