Training Data for Skild AI
Skild AI is building a universal robot brain. Here is how massive, diverse real-world data trains a foundation model for any robot doing any task.
About Skild AI
Skild AI is building a general-purpose foundation model for robots, aiming to create a single scalable model that enables any robot to perform any task. Founded by CMU robotics professors Deepak Pathak and Abhinav Gupta, Skild emphasizes learning from massive, diverse data rather than hand-engineered control.
Known Data Requirements
Skild's mission to build a universal robot brain demands the most diverse dataset in robotics. Their model must learn from data spanning different embodiments, environments, and tasks simultaneously. Founded by researchers who pioneered large-scale robot learning at CMU, Skild understands that the quality and diversity of training data is the primary determinant of model capability.
Massive multi-task manipulation data
Source: Skild AI research publications and founding mission
Hundreds of distinct manipulation tasks captured across diverse environments and object categories to train a generalist manipulation policy.
Real-world environment diversity
Source: Sim-to-real focus in Skild's research approach
Data from hundreds of distinct real-world environments to calibrate simulation parameters and validate sim-to-real transfer across diverse conditions.
Cross-modal sensor data
Source: Foundation model architecture requiring multiple input modalities
Synchronized multi-modal recordings — RGB, depth, tactile, proprioception, language — to train multi-modal fusion in the foundation model.
Locomotion data across robot morphologies
Source: Skild's demonstrated cross-embodiment locomotion capabilities
Walking, running, and traversal recordings from bipedal humanoids, quadrupeds, and wheeled platforms across diverse terrain types to train universal locomotion controllers.
Failure and recovery demonstration data
Source: Robustness requirements for general-purpose deployment
Recordings of manipulation and locomotion failures paired with successful recovery strategies — dropped objects, slipped grasps, trip recovery — to train models that handle real-world uncertainty rather than only learning from successful demonstrations.
How Claru Data Addresses These Needs
| Lab Need | Claru Offering | Rationale |
|---|---|---|
| Massive multi-task manipulation data | Manipulation Trajectory Dataset + Custom Multi-Task Collection | Claru's existing manipulation data spans diverse tasks, supplemented by coordinated collection campaigns targeting specific task categories that fill gaps in Skild's training distribution. |
| Real-world environment diversity | Egocentric Activity Dataset (100+ cities) + Custom Collection | Claru's global presence across 100+ cities provides unmatched environmental diversity. Purpose-collected data from these locations offers real-world conditions that calibrate sim-to-real models. |
| Cross-modal sensor data | Multi-Modal Custom Collection Campaigns | Claru can equip collectors with standardized multi-sensor packages (cameras, depth sensors, IMUs) to produce synchronized multi-modal recordings at scale across distributed locations. |
| Failure and recovery demonstration data | Custom Failure-Recovery Collection Campaigns | Claru can design collection protocols that intentionally capture manipulation failures and subsequent recovery attempts, providing the failure-mode training data that typical success-only datasets lack. |
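The synchronized multi-modal collection described in the table above hinges on timestamp alignment across sensors running at different rates. The sketch below shows one common approach, nearest-timestamp pairing; the sensor rates and timestamps are invented for illustration and do not reflect any actual Claru or Skild pipeline.

```python
# Hypothetical sketch: aligning two sensor streams by nearest timestamp,
# the kind of synchronization a multi-modal collection pipeline needs.
# The rates (30 Hz RGB, 20 Hz depth) are assumptions for illustration.
import bisect

def nearest(timestamps, t):
    """Index of the timestamp closest to t (timestamps sorted ascending)."""
    i = bisect.bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

rgb_ts = [round(0.033 * k, 3) for k in range(30)]    # ~30 Hz camera
depth_ts = [round(0.050 * k, 3) for k in range(20)]  # ~20 Hz depth sensor

# Pair each RGB frame index with the index of its closest depth frame.
pairs = [(k, nearest(depth_ts, t)) for k, t in enumerate(rgb_ts)]
```

In practice, hardware-triggered capture or a shared clock replaces post-hoc pairing where available, but nearest-timestamp alignment remains the fallback for heterogeneous consumer sensor packages.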
Technical Data Analysis
Skild AI emerges from the CMU robotics tradition that has consistently pushed the boundaries of robot learning at scale. Co-founders Deepak Pathak and Abhinav Gupta bring complementary expertise — Pathak in self-supervised learning and curiosity-driven exploration, Gupta in embodied intelligence and visual learning — that converges on a data-centric approach to robot intelligence.
Skild's technical strategy bets heavily on the scaling hypothesis: that a sufficiently large and diverse training dataset, combined with an appropriately scaled model architecture, will produce emergent robot capabilities analogous to what was observed in language models. This bet makes data the strategic resource — more diverse, higher-quality data directly translates to more capable robot policies.
The sim-to-real pipeline is central to Skild's approach. They use simulation to generate large quantities of synthetic manipulation data, then calibrate simulation parameters using real-world data to improve transfer fidelity. This calibration step requires real-world recordings from diverse environments — different lighting conditions, surface materials, object properties, and workspace configurations. The broader the real-world calibration set, the more accurately simulation can approximate reality.
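The calibration step above can be sketched in miniature: pick the simulation parameter whose rollout best matches a real-world recording. The toy sliding-block dynamics, candidate grid, and measured distance below are all invented stand-ins, not Skild's actual system identification method.

```python
# Hypothetical sketch: calibrating one simulation parameter (sliding
# friction mu) against a real-world measurement, as in a sim-to-real
# pipeline. The dynamics model and "real" recording are toy stand-ins.

def simulate_slide(mu, v0=2.0, dt=0.01, g=9.81):
    """Distance a block slides before stopping under friction mu."""
    v, x = v0, 0.0
    while v > 0:
        v -= mu * g * dt          # friction decelerates the block
        x += max(v, 0.0) * dt     # integrate position until it stops
    return x

def calibrate(real_distance, candidates):
    """Friction value whose simulated slide best matches the recording."""
    return min(candidates, key=lambda mu: abs(simulate_slide(mu) - real_distance))

# Pretend a real-world recording measured the block sliding 0.5 m.
real_distance = 0.5
candidates = [0.1 + 0.05 * i for i in range(15)]  # mu in [0.1, 0.8]
best_mu = calibrate(real_distance, candidates)
```

Real pipelines calibrate many parameters jointly (friction, restitution, actuator latency) with gradient-based or Bayesian optimizers, but the structure is the same: minimize the gap between simulated and recorded behavior, which is why broader real-world recordings yield better-grounded simulators.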
The multi-task dimension is equally data-hungry. A general-purpose robot brain must handle hundreds or thousands of distinct tasks — from picking up small objects to opening doors to operating tools. Each task category needs sufficient demonstrations to learn the relevant manipulation primitives. Covering this task space requires a systematic data collection effort that maps the space of useful robot behaviors and ensures adequate coverage of each task category.
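Ensuring adequate coverage of each task category is itself a bookkeeping problem. A minimal sketch, with invented task names, counts, and a per-category target:

```python
# Hypothetical sketch: auditing demonstration coverage across a task
# taxonomy and flagging under-covered categories. All names and counts
# are invented for illustration.
from collections import Counter

MIN_DEMOS = 500  # assumed per-category target, not a published figure

demo_log = (["pick_small_object"] * 800 + ["open_door"] * 650
            + ["operate_tool"] * 120 + ["pour_liquid"] * 90)

counts = Counter(demo_log)
# How many more demonstrations each under-covered category needs.
gaps = {task: MIN_DEMOS - n for task, n in counts.items() if n < MIN_DEMOS}
```

A coverage audit like this is what turns "collect more data" into a concrete campaign spec: the `gaps` map directly parameterizes which tasks a targeted collection effort should prioritize.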
Skild's emphasis on cross-embodiment learning — their demos show the same model controlling humanoids, quadrupeds, and drones — creates another data axis. Each embodiment introduces unique kinematic constraints, sensor configurations, and contact dynamics. The model must learn to abstract across these differences, which requires sufficient data from each embodiment type to learn the commonalities and differences.
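The shared-task / embodiment-specific split described above can be caricatured as a shared encoding feeding per-embodiment action heads. Everything below — embodiment names, action dimensions, the stand-in "encoder" — is an illustrative assumption; a real system uses learned networks, not lookup tables.

```python
# Hypothetical sketch: one policy interface over multiple embodiments,
# with a shared task representation and embodiment-specific output shapes.
# Action dimensionalities are invented for illustration.

EMBODIMENTS = {
    "humanoid": 29,
    "quadruped": 12,
    "drone": 4,
}

def shared_task_encoding(instruction: str) -> list:
    """Stand-in for a learned, embodiment-agnostic task embedding."""
    return [float(ord(c) % 7) for c in instruction[:8]]

def act(instruction: str, embodiment: str) -> list:
    """Map the shared encoding to an embodiment-specific action vector."""
    z = shared_task_encoding(instruction)
    dim = EMBODIMENTS[embodiment]
    # Placeholder "head": broadcast a summary of z to the action dimension.
    return [sum(z) / len(z)] * dim

action = act("open the door", "quadruped")  # 12-dim action vector
```

The data implication falls out of this structure: the shared encoder only learns embodiment-invariant task structure if each embodiment contributes enough examples of overlapping tasks.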
The failure-recovery dimension is an underappreciated data need. Most robot learning datasets contain only successful demonstrations, producing models that perform well in ideal conditions but fail catastrophically when anything goes wrong. For a general-purpose deployment, robots must handle dropped objects, slipped grasps, unexpected collisions, and environmental changes. Training robust recovery behaviors requires data that includes failures and the strategies used to recover from them.
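Capturing failures usefully requires a schema that keeps each failure segment paired with its recovery rather than filtering episodes down to successes. A minimal sketch, with field names that are illustrative, not a real dataset spec:

```python
# Hypothetical sketch: an episode schema that preserves failure segments
# paired with their recovery, instead of keeping successes only.
# Labels and field names are invented for illustration.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Segment:
    label: str                          # "nominal", "failure", or "recovery"
    failure_mode: Optional[str] = None  # e.g. "slipped_grasp", "dropped_object"

@dataclass
class Episode:
    task: str
    segments: List[Segment] = field(default_factory=list)

    def has_recovery_pair(self) -> bool:
        """True if the episode contains both a failure and a recovery."""
        labels = [s.label for s in self.segments]
        return "failure" in labels and "recovery" in labels

ep = Episode(task="pick_mug", segments=[
    Segment("nominal"),
    Segment("failure", failure_mode="slipped_grasp"),
    Segment("recovery"),
])
```

Filtering a corpus to episodes where `has_recovery_pair()` holds is exactly the selection step a failure-recovery training set needs, and it is why success-only pipelines discard this signal by construction.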
Key Research & References
- [1] Pathak et al., "Curiosity-driven Exploration by Self-Supervised Prediction," ICML 2017.
- [2] Gupta et al., "Embodied Intelligence via Learning and Evolution," Nature Communications, 2022.
- [3] Open X-Embodiment Collaboration, "Open X-Embodiment: Robotic Learning Datasets and RT-X Models," ICRA 2024.
- [4] Pathak et al., "Self-Supervised Exploration via Disagreement," ICML 2019.
- [5] Agrawal et al., "Learning to Poke by Poking: Experiential Learning of Intuitive Physics," NeurIPS 2016.
- [6] Nair et al., "R3M: A Universal Visual Representation for Robot Manipulation," CoRL 2022.
Frequently Asked Questions
Why is training data the strategic bottleneck for Skild AI?
Skild bets on the scaling hypothesis — that a sufficiently large and diverse dataset with an appropriately scaled model will produce emergent robot capabilities, similar to what happened with language models. This makes training data the strategic bottleneck: more diverse, higher-quality data directly translates to more capable robot policies.
Why does a simulation-heavy approach still need real-world data?
Skild uses simulation for data generation but calibrates simulation parameters using real-world recordings. Without diverse real-world calibration data, simulated environments drift from physical reality — surfaces are too smooth, objects too rigid, lighting too uniform. Real-world data from many environments keeps simulation grounded in physical truth.
How much task diversity does a general-purpose robot brain need?
A truly general-purpose model must handle hundreds or thousands of distinct manipulation and navigation tasks. Each task category needs sufficient demonstrations to learn the relevant primitives. This requires systematic data collection that maps the full space of useful robot behaviors and ensures adequate coverage across task types.
Why does failure and recovery data matter?
Most robot datasets contain only successful demonstrations, producing models that work in ideal conditions but fail when anything goes wrong. For commercial deployment, robots must handle dropped objects, slipped grasps, and unexpected obstacles. Training failure recovery requires data that captures failures and the strategies used to recover from them — a data type almost entirely absent from existing datasets.
How does cross-embodiment learning work?
Skild trains a single model on data from multiple robot morphologies — humanoids, quadrupeds, drones. The model learns abstract representations of tasks (what to do) that are shared across embodiments, while also learning embodiment-specific control (how to do it on each platform). This requires sufficient data from each morphology to learn both the shared and specific components.
Power the Universal Robot Brain
Discuss massive-scale, diverse training data for Skild AI's foundation model.