Training Data for CMU Robotics Institute

CMU's Robotics Institute is the world's largest academic robotics center and birthplace of Skild AI. Here is how diverse real-world data supports research that shapes the entire field.

About CMU Robotics Institute

Carnegie Mellon's Robotics Institute is the largest academic robotics research center in the world. Founded in 1979, the RI houses over 500 faculty and researchers across labs led by Deepak Pathak (robot learning at scale), Abhinav Gupta (embodied AI), Oliver Kroemer (manipulation), and many others. CMU has produced foundational work on cross-embodiment learning, curiosity-driven exploration, visual navigation, and manipulation from human demonstrations.

- Large-scale robot learning from diverse data
- Cross-embodiment transfer and generalization
- Visual navigation in the real world
- Manipulation from human demonstrations
- Curiosity-driven exploration and self-supervised learning

CMU Robotics at a Glance

- 1979: Robotics Institute founded
- Largest academic robotics center in the world
- 500+ faculty and researchers
- Skild AI: notable spinout ($14B)
- ViNT: navigation foundation model
- Open X-Embodiment: cross-embodiment pioneer

Known Data Requirements

CMU Robotics Institute's emphasis on real-world robot deployment — not just simulation — creates persistent demand for diverse manipulation data, navigation recordings from authentic environments, and cross-embodiment demonstrations that span the variety of robot platforms used across CMU's many labs. The institute's outsized influence on the field means that data collected for CMU research propagates to companies and labs worldwide.

Visual navigation data from diverse real environments

Source: ViNT paper (Shah et al., CoRL 2023) and NoMaD research

Navigation trajectories with visual observations from diverse indoor and outdoor environments — not just university corridors but homes, retail spaces, parks, and industrial facilities — to extend generalization beyond the environments accessible to a few university labs.
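
As a concrete sketch of what one such trajectory record could look like, the schema below pairs per-frame visual observations with odometry and carries environment metadata for diversity tracking. All field names here are illustrative assumptions, not a ViNT or Claru format:

```python
from dataclasses import dataclass, field

@dataclass
class NavigationFrame:
    """One timestep of a visual navigation trajectory (illustrative fields)."""
    timestamp_s: float                 # seconds since trajectory start
    rgb_path: str                      # path to the stored camera frame
    position_xy: tuple[float, float]   # planar odometry position, meters
    heading_rad: float                 # yaw, radians

@dataclass
class NavigationTrajectory:
    environment_type: str              # "home", "retail", "park", "industrial", ...
    region: str                        # coarse geographic tag for diversity audits
    frames: list[NavigationFrame] = field(default_factory=list)
```

Tagging each trajectory with environment type and region is what makes it possible to measure, and then deliberately fill, gaps in geographic coverage.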

Manipulation demonstrations across multiple platforms

Source: CMU's cross-embodiment research and multi-lab robot fleet

Manipulation recordings from the diverse robot platforms used across CMU labs, formatted for cross-embodiment learning and policy training, the line of research that inspired the Open X-Embodiment project and Skild AI.
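
Open X-Embodiment handles platform heterogeneity by shipping each dataset with its own specification. In that spirit, a hypothetical episode format that carries embodiment metadata alongside the data might look like the following; the fields are assumptions for illustration, not the actual Open X-Embodiment schema:

```python
from dataclasses import dataclass

@dataclass
class EmbodimentSpec:
    """Metadata that lets one pipeline ingest recordings from many robots."""
    platform: str            # e.g. "franka_panda", "xarm7", "stretch"
    action_dim: int          # size of the platform's native action vector
    control_mode: str        # "joint_velocity", "ee_delta_pose", ...
    camera_views: list[str]  # which viewpoints were recorded

@dataclass
class ManipulationEpisode:
    spec: EmbodimentSpec
    observations: list[dict]      # per-step camera frames plus proprioception
    actions: list[list[float]]    # each entry of length spec.action_dim
    instruction: str              # natural-language task description
```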

Human demonstration video for imitation learning

Source: Gupta and Pathak labs' work on learning from human video

Third-person and egocentric video of humans performing tasks that robots should learn, with temporal and spatial annotations for extracting manipulation primitives and understanding task structure.
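
To make "temporal and spatial annotations" concrete, a minimal hypothetical label schema might segment each clip into manipulation primitives and localize hands and objects per frame. The field names and primitive vocabulary below are assumptions, not the labs' actual annotation format:

```python
from dataclasses import dataclass

@dataclass
class TemporalSegment:
    """A labeled sub-action within a human demonstration clip."""
    start_s: float
    end_s: float
    primitive: str     # e.g. "reach", "grasp", "pour", "place"

@dataclass
class SpatialAnnotation:
    """Per-frame hand/object geometry for grounding the primitives."""
    frame_idx: int
    hand_bbox: tuple[int, int, int, int]     # x, y, w, h in pixels
    object_bbox: tuple[int, int, int, int]
    in_contact: bool                          # whether hand touches the object
```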

Multi-task manipulation with object diversity

Source: Kroemer Manipulation Lab's generalization research

Manipulation recordings spanning hundreds of distinct object categories and surface types to train policies that generalize across the long tail of real-world objects and workspace configurations.

Outdoor and unstructured terrain locomotion data

Source: CMU quadruped and legged robot research programs

Real-world locomotion recordings on grass, gravel, slopes, stairs, and construction sites with full kinematic and IMU measurements for training robust locomotion controllers that transfer from simulation.
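
A single sample from such a recording might bundle joint-level kinematics, inertial measurements, and a terrain label, as in this illustrative sketch (field names are assumptions, not a CMU log format):

```python
from dataclasses import dataclass

@dataclass
class LocomotionSample:
    """One control-rate sample from a legged-robot log (illustrative fields)."""
    timestamp_s: float
    joint_positions: list[float]            # radians, one per actuated joint
    joint_velocities: list[float]           # rad/s
    imu_accel: tuple[float, float, float]   # m/s^2, body frame
    imu_gyro: tuple[float, float, float]    # rad/s, body frame
    terrain: str                            # "grass", "gravel", "stairs", ...
```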

How Claru Data Addresses These Needs

Lab Need: Visual navigation data from diverse real environments
Claru Offering: Egocentric Activity Dataset + Custom Navigation Collection
Rationale: Claru's egocentric data captures navigation through real environments across 100+ cities. Targeted collection with navigation-specific sensor packages extends coverage to environments underrepresented in academic datasets, providing the geographic diversity ViNT needs to generalize globally.

Lab Need: Manipulation demonstrations across multiple platforms
Claru Offering: Manipulation Trajectory Dataset + Custom Multi-Platform Collection
Rationale: Claru's manipulation data spans diverse interaction types. Custom collection using platform-specific recording formats can fill coverage gaps for specific robot embodiments used across CMU's many labs.

Lab Need: Human demonstration video for imitation learning
Claru Offering: Egocentric Activity Dataset (~386K clips)
Rationale: Claru's 386K-clip egocentric dataset provides extensive human demonstration video with activity labels and temporal annotations, directly usable for learning manipulation primitives from human video, a core research direction for the Gupta and Pathak labs.

Lab Need: Multi-task manipulation with object diversity
Claru Offering: Custom Multi-Object Manipulation Collection
Rationale: Claru collectors operating in their own homes and local environments naturally produce object and surface diversity that exceeds what any single university lab can achieve, covering cultural and regional object variation.

Technical Data Analysis

CMU's Robotics Institute has produced many of the key researchers and ideas driving the current wave of robot learning. Skild AI — founded by CMU professors Deepak Pathak and Abhinav Gupta in 2023 and now valued at over $14 billion after raising $1.4 billion in January 2026 — is a direct product of CMU research on cross-embodiment learning and scalable robot data. The ViNT and NoMaD navigation models, cross-embodiment transfer learning, and curiosity-driven exploration all have deep CMU roots. This academic influence means that data collected for CMU research shapes the training distributions used by companies and labs worldwide.

The ViNT visual navigation model illustrates the data challenge clearly. ViNT was trained on navigation data from diverse real-world environments and showed unprecedented generalization — but its training data was limited to environments accessible to a few university labs primarily in the United States. Expanding ViNT-quality navigation to truly global environmental diversity requires data collection at a geographic scale that academic institutions cannot achieve on their own. Homes in Tokyo look different from homes in Lagos, which look different from homes in Pittsburgh. Visual navigation policies must handle this variation to be broadly useful.

CMU's manipulation research similarly benefits from environmental diversity. Oliver Kroemer's manipulation lab develops methods that generalize across objects and settings, but generalization is fundamentally limited by the diversity of training data. More objects, more surfaces, more lighting conditions, more workspace configurations — each dimension of diversity improves robustness. The gap between laboratory demonstrations and real-world deployment conditions remains the primary obstacle, and it is fundamentally a data diversity problem.

The cross-embodiment research championed by Pathak and Gupta creates a meta-level data need: manipulation data collected on as many different robot platforms as possible. This research direction, which directly inspired the Open X-Embodiment project and Skild AI's universal robot brain, requires coordination across multiple institutions and robot types. The principle is that a model trained on data from many different robots learns embodiment-agnostic task representations — understanding what needs to happen separately from the specific kinematic structure doing it. Claru's ability to collect data using standardized protocols on diverse hardware platforms supports this coordination challenge directly.
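
As a toy sketch of that principle (not Skild's or CMU's actual architecture), consider a policy with a shared observation trunk and small per-platform action heads. Because the trunk sees only observations, gradients from every robot's data shape a representation that cannot encode any single robot's kinematics:

```python
import torch
import torch.nn as nn

class CrossEmbodimentPolicy(nn.Module):
    """Shared task trunk with embodiment-specific action heads (toy example)."""

    def __init__(self, obs_dim: int, action_dims: dict[str, int]):
        super().__init__()
        # Embodiment-agnostic representation: shared across all platforms.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        # Only these small output layers know about a specific robot.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(256, dim) for name, dim in action_dims.items()}
        )

    def forward(self, obs: torch.Tensor, platform: str) -> torch.Tensor:
        return self.heads[platform](self.trunk(obs))

# One model, with training batches drawn from different robots:
policy = CrossEmbodimentPolicy(obs_dim=64, action_dims={"franka": 7, "xarm": 6})
action = policy(torch.randn(8, 64), platform="franka")  # shape: (8, 7)
```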

Key Research & References

  1. Shah et al. "ViNT: A Foundation Model for Visual Navigation." CoRL 2023.
  2. Pathak et al. "Curiosity-driven Exploration by Self-Supervised Prediction." ICML 2017.
  3. Gupta et al. "Embodied Intelligence via Learning and Evolution." Nature Communications, 2021.
  4. Shah et al. "NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration." arXiv:2310.07896, 2023.
  5. Open X-Embodiment Collaboration. "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." ICRA 2024.
  6. Bahl et al. "Human-to-Robot Imitation in the Wild." RSS 2022.

Frequently Asked Questions

Why does geographic diversity matter for CMU's navigation research?

CMU models like ViNT show strong generalization but are limited by training data from a few university environments. Real-world environments vary dramatically across regions — architecture, object types, lighting, layout conventions. Geographically diverse data produces models that work globally, not just in Pittsburgh.

What is cross-embodiment data?

Cross-embodiment data is manipulation and navigation recordings from multiple different robot platforms. Training on this diversity forces models to learn embodiment-agnostic task representations — understanding what to do separately from the specific hardware doing it. CMU pioneered this research direction with work that inspired Open X-Embodiment and Skild AI.

How does data collected for CMU research reach the wider industry?

CMU has produced founders of Skild AI (valued at $14 billion), Aurora Innovation, and many other robotics companies. Research benchmarks and datasets created at CMU become industry standards. High-quality data created for CMU research propagates through the entire robotics ecosystem via published papers, open-source code, and researcher career paths.

What is the connection between CMU and Skild AI?

Skild AI was co-founded in 2023 by CMU professors Deepak Pathak and Abhinav Gupta, who brought complementary expertise in curiosity-driven learning and embodied intelligence. Their CMU research on scalable robot learning and cross-embodiment transfer directly informed Skild's approach to building a universal robot foundation model.

Can robots learn manipulation from human video?

CMU researchers have shown that visual representations learned from watching human activities transfer to robot manipulation tasks. Robots can learn task structure, object affordances, and spatial relationships from human video before being fine-tuned on robot-specific data. High-quality, annotated human activity video provides much stronger training signal than raw internet video.

Data for World-Class Robot Learning Research

Discuss diverse, scalable training data for CMU Robotics Institute's research programs.