Training Data for 1X Technologies
1X Technologies believes data is the bottleneck for android intelligence. Here is how purpose-collected real-world data accelerates their path to general-purpose robots.
About 1X Technologies
1X Technologies (formerly Halodi Robotics) builds androids designed for everyday work. Founded in 2014 in Moss, Norway, the company has relocated its AI headquarters to Sunnyvale, California and raised over $125 million from investors including OpenAI, EQT Ventures, Samsung NEXT, and Tiger Global. Their EVE wheeled humanoid is deployed in commercial security and logistics, while the NEO bipedal humanoid — priced at $20,000 for early access — targets consumer households with deliveries beginning in 2026. VP of AI Eric Jang, formerly of Google Brain, has been one of the field's most vocal advocates for the thesis that robot intelligence is fundamentally a data scaling problem.
Core Data Requirements
Teleoperation Demonstrations
Thousands of hours of teleoperated task demonstrations across diverse home environments with full multi-modal sensor recordings.
Household Activity Video
First-person and third-person video of domestic tasks — cleaning, cooking, organizing — for World Model pretraining and task understanding.
Environmental Diversity
Object manipulation data collected across hundreds of different homes and commercial spaces to train generalizable consumer-grade policies.
Language-Grounded Demonstrations
Task demonstrations paired with diverse natural language descriptions for Redwood AI's instruction-following capabilities.
Known Data Requirements
1X's approach to robot intelligence centers on massive data collection — they have explicitly stated that data volume is the primary bottleneck for android intelligence. VP of AI Eric Jang's widely-read essay 'All You Need Is Data' argues that sufficiently diverse demonstration data, combined with modern neural network architectures, can produce android intelligence without hand-engineered control. Their NEO humanoid needs diverse demonstrations of household tasks, egocentric video for visual pretraining and world model training, and multi-environment recordings to achieve the generalization required for consumer deployment across millions of unique homes.
Large-scale teleoperation demonstrations across diverse homes
Source: 1X VP of AI Eric Jang's 'All You Need Is Data' essay and public statements, 2024
Thousands of hours of teleoperated task demonstrations across diverse household and commercial environments, with full sensor recordings including RGB, depth, proprioception, and force feedback. Must span hundreds of distinct physical environments to match the diversity of consumer homes where NEO will deploy.
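One way to picture such a multi-modal recording is as a per-timestep record bundling every sensor stream with the teleoperator's command. The sketch below is purely illustrative — the class names, field names, and dimensions are assumptions, not 1X's actual data format:

```python
from dataclasses import dataclass, field

@dataclass
class DemoStep:
    """One timestep of a teleoperated demonstration. Field names,
    shapes, and units are illustrative, not 1X's actual schema."""
    rgb: bytes            # encoded camera frame
    depth: bytes          # encoded depth map (meters)
    proprio: list[float]  # joint positions and velocities
    force: list[float]    # wrist force-torque readings
    action: list[float]   # teleoperator command at this step

@dataclass
class DemoEpisode:
    env_id: str  # which home or commercial site the data came from
    task: str    # e.g. "wipe_counter"
    steps: list[DemoStep] = field(default_factory=list)

# Build a tiny synthetic episode to show the structure.
ep = DemoEpisode(env_id="home_0042", task="wipe_counter")
for _ in range(3):
    ep.steps.append(DemoStep(
        rgb=b"", depth=b"",
        proprio=[0.0] * 14, force=[0.0] * 6, action=[0.0] * 14,
    ))
print(len(ep.steps), ep.env_id)
```

Tagging every episode with its environment is what makes it possible to measure — and deliberately widen — the spread of homes represented in the training set.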
Household activity video for World Model pretraining
Source: NEO product positioning and 1X World Model architecture description
First-person and third-person video of humans performing common household tasks — cleaning, organizing, cooking, laundry, setting tables — to train the World Model that serves as 1X's learned physics simulator. The World Model predicts future visual states from current observations and proposed actions, requiring video that captures how objects and environments change during manipulation.
Multi-environment object interaction data for generalization
Source: 1X research publications and Eric Jang's data scaling presentations
Object manipulation recordings captured across many different physical environments — varying kitchen layouts, living rooms, bedrooms, offices — to train policies that generalize beyond the training distribution. The consumer deployment target means NEO must handle objects and environments it has never seen in training data.
Language-grounded demonstrations for Redwood AI
Source: 1X Redwood AI announcement and NEO product demonstrations
Manipulation and navigation demonstrations paired with natural language instructions for training the Redwood AI vision-language model that controls NEO in real time. Instructions must span the full range of household commands a consumer might issue, from simple ('pick up that cup') to compositional ('put the dishes from the counter into the dishwasher').
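A hedged sketch of how one demonstration can be paired with several instruction phrasings, as a vision-language-action training corpus typically is (the structure and names here are illustrative, not 1X's pipeline):

```python
# One recorded demonstration; the trajectory contents are omitted.
demo = {"task": "cup_to_sink", "trajectory": [...]}
instructions = [
    "pick up that cup",
    "grab the mug and put it in the sink",
    "clear the cup off the counter",
]
# Each (instruction, trajectory) pair becomes one training example,
# so instruction diversity multiplies effective language coverage
# without collecting any additional robot data.
pairs = [(text, demo["trajectory"]) for text in instructions]
print(len(pairs))
```

The design point is that language diversity is cheap relative to demonstration collection: annotators can write many phrasings per recorded trajectory.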
Long-horizon task recordings spanning multi-step activities
Source: NEO domestic assistance target applications
Complete recordings of multi-step household activities (e.g., preparing a meal from start to cleanup, doing a full load of laundry, organizing a room) that capture task dependencies, state tracking, and error recovery. Consumer household tasks are inherently long-horizon, requiring policies that maintain context across minutes of execution.
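Long-horizon recordings are most useful when annotated with subtask boundaries, so a policy can learn dependencies and error recovery explicitly. The annotation layout below is an assumption for illustration, not 1X's actual schema:

```python
# A multi-step laundry episode segmented into subtasks, including an
# error-recovery segment. Frame indices and names are illustrative.
episode = {
    "activity": "do_laundry",
    "segments": [
        {"subtask": "gather_clothes",       "start": 0,    "end": 1800},
        {"subtask": "load_washer",          "start": 1800, "end": 3200},
        {"subtask": "recover_dropped_sock", "start": 3200, "end": 3500},
        {"subtask": "start_cycle",          "start": 3500, "end": 3700},
    ],
}
total = episode["segments"][-1]["end"] - episode["segments"][0]["start"]
print(total)  # total frames: minutes of continuous execution
```

Segment labels like `recover_dropped_sock` are exactly the kind of naturally occurring error-recovery data that scripted single-site collection tends to miss.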
How Claru Data Addresses These Needs
| Lab Need | Claru Offering | Rationale |
|---|---|---|
| Large-scale teleoperation demonstrations across diverse homes | Manipulation Trajectory Dataset + Custom Multi-Home Collection | Claru's existing manipulation trajectories plus custom teleoperation campaigns leverage a distributed collector network that operates in their own homes — producing data with authentic residential diversity across 100+ cities. This provides parallel data collection across dozens of environments simultaneously, matching 1X's data scaling philosophy. |
| Household activity video for World Model pretraining | Egocentric Activity Dataset (~386K clips) | Claru's egocentric dataset contains roughly 386K clips of human activities in real environments with temporal annotations and activity labels. The dataset captures how objects, scenes, and environments change during manipulation — exactly the visual dynamics that 1X's World Model needs to predict. This provides a substantial pretraining corpus without requiring robot-specific data collection. |
| Multi-environment object interaction data for generalization | Cross-environment Data Collection Campaigns | Claru's presence in 100+ cities means object interaction data can be collected across diverse home layouts, furniture styles, kitchen configurations, and cultural contexts. Each collector's home is a unique environment, producing the distributional breadth that consumer deployment demands — different countertops, different appliances, different household objects, different lighting. |
| Language-grounded demonstrations for Redwood AI | Custom Language-Paired Data Collection | Claru's annotation pipeline pairs demonstrations with diverse natural language instructions written by human annotators. Multiple phrasings per task provide the instruction diversity that Redwood AI's language backbone needs for robust understanding of natural consumer commands. |
| Long-horizon task recordings spanning multi-step activities | Custom Long-Horizon Activity Collection | Claru can coordinate collection campaigns where collectors perform complete multi-step household activities — full cooking sequences, complete laundry cycles, room organization sessions — in their own homes. These recordings capture authentic task dependencies, temporal structure, and the natural variation of real domestic workflows. |
Technical Data Analysis
1X Technologies has been more explicit than any other humanoid company about the centrality of data to their approach. VP of AI Eric Jang — formerly a research scientist at Google Brain where he worked on robotic grasping and reinforcement learning — has publicly argued that the path to android intelligence is fundamentally a data problem. In his widely-read essay 'All You Need Is Data,' Jang argued that given sufficient demonstrations of human behavior, neural networks can learn the visuomotor policies needed for general-purpose robots. This philosophy drives their massive teleoperation infrastructure and creates an insatiable demand for diverse, high-quality demonstration data.
The technical architecture behind 1X's approach relies on two key components. The World Model is a learned simulator that predicts future visual states from current observations and proposed actions. Rather than hand-building physics engines to predict how the world behaves, 1X trains neural networks on video data to learn these predictions implicitly. This approach requires massive quantities of video showing physical interactions in diverse environments — the exact kind of data that purpose-built collection campaigns produce more efficiently than web scraping. Redwood AI is a vision-language model that processes camera feeds and natural language instructions to generate robot control signals in real time. Together, these systems allow NEO to understand verbal commands, visually perceive its environment, predict the consequences of actions, and execute physical tasks.
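The world-model interface described above — predict the next (latent) observation from the current one plus a proposed action — can be sketched with a toy linear model. Real systems use large video networks trained on the kind of data discussed here; this pure-Python toy only shows the interface, and every name in it is illustrative:

```python
import random

random.seed(0)
LATENT, ACTION = 8, 4
# Random weights standing in for a trained network's parameters.
W_obs = [[random.gauss(0, 0.1) for _ in range(LATENT)] for _ in range(LATENT)]
W_act = [[random.gauss(0, 0.1) for _ in range(ACTION)] for _ in range(LATENT)]

def predict_next(z, a):
    """Toy dynamics: z_next = W_obs @ z + W_act @ a."""
    return [
        sum(W_obs[i][j] * z[j] for j in range(LATENT))
        + sum(W_act[i][k] * a[k] for k in range(ACTION))
        for i in range(LATENT)
    ]

# Roll out a few imagined steps under a candidate action sequence --
# how a learned simulator can score actions before executing them.
z = [0.0] * LATENT
for a in ([1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0]):
    z = predict_next(z, a)
print(len(z))
```

The rollout loop is the key idea: because the model is differentiable and fast, candidate action sequences can be evaluated "in imagination" rather than on hardware.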
The end-to-end imitation learning pipeline maps directly from visual observations to motor commands without decomposing robot behavior into separate perception, planning, and control modules. This architectural choice means the network must implicitly learn physics, object properties, spatial relationships, and task structure entirely from raw demonstrations — demanding orders of magnitude more training data than modular systems where each component can be trained on different data types.
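The core training signal of such a pipeline is behavioral cloning: regress the policy's output onto the teleoperator's recorded action. A minimal pure-Python sketch, with a linear policy standing in for the real network (all names and sizes are illustrative):

```python
OBS, ACT, LR = 6, 3, 0.1
W = [[0.0] * OBS for _ in range(ACT)]  # policy weights, zero-initialized

def policy(obs):
    """Map observation features directly to a motor command."""
    return [sum(W[i][j] * obs[j] for j in range(OBS)) for i in range(ACT)]

def bc_step(obs, demo_action):
    """One gradient step on 0.5 * ||policy(obs) - demo_action||^2."""
    pred = policy(obs)
    err = [p - d for p, d in zip(pred, demo_action)]
    for i in range(ACT):
        for j in range(OBS):
            W[i][j] -= LR * err[i] * obs[j]
    return sum(e * e for e in err)

obs = [1.0, 0.5, -0.2, 0.0, 0.3, -1.0]
target = [0.2, -0.1, 0.4]
losses = [bc_step(obs, target) for _ in range(50)]
print(losses[0] > losses[-1])  # loss shrinks on the demonstration
```

Because the same loss must implicitly teach perception, physics, and task structure at once, the number and diversity of (observation, action) pairs becomes the dominant lever on final performance.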
NEO's target domain — consumer household assistance — is the most data-hungry deployment context imaginable. Homes vary enormously in layout, furniture, object placement, lighting, and cultural norms. A robot that can fold laundry in one apartment needs demonstrations across different clothing types, folding surfaces, laundry room configurations, and home environments to generalize reliably. The long tail of household variation — regional appliance differences, cultural norms around food preparation, idiosyncratic organization systems — is essentially infinite. No amount of data from a single teleoperation studio can cover this variation.
The consumer price point creates additional pressure. At $20,000, NEO must work reliably out of the box in each customer's unique home environment. Unlike industrial robots that can be professionally calibrated for a specific workspace, consumer robots face immediate deployment in unknown environments. This requires either exhaustive pretraining data that covers the full distribution of home environments, or extremely efficient adaptation mechanisms that can customize behavior with minimal local data. 1X is betting on the former approach — and that bet requires data at unprecedented scale and diversity.
As 1X scales from hundreds to thousands of deployed NEO units, each robot operating in a unique home becomes both a deployment endpoint and a potential data collection node. But bootstrapping this fleet requires initial training data from diverse homes before deployment — a chicken-and-egg problem that external data collection partnerships solve directly.
Key Research & References
- [1] Jang, E. "All You Need Is Data." 1X Technologies Blog, 2024.
- [2] Open X-Embodiment Collaboration. "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." ICRA, 2024.
- [3] Chi, C., et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." RSS, 2023.
- [4] Black, K., et al. "π0: A Vision-Language-Action Flow Model for General Robot Control." arXiv:2410.24164, 2024.
- [5] Brohan, A., et al. "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control." CoRL, 2023.
Frequently Asked Questions
Why does 1X need so much training data?
VP of AI Eric Jang argues that robot intelligence is fundamentally a data scaling problem, not an algorithm problem. Their end-to-end imitation learning approach requires the network to implicitly learn physics, object properties, and task structure from raw demonstrations — which demands orders of magnitude more data than modular systems. More diverse demonstrations from more environments directly translate to better generalization. The scaling laws that govern language models apply similarly to robot policies: performance improves predictably with more diverse training data.
What kinds of data does the NEO humanoid need?
NEO needs teleoperated demonstrations of common household tasks (cleaning, cooking, organizing) collected across many different homes with varying layouts, furniture, and object placement. It also needs egocentric video of humans performing these tasks for World Model pretraining, language-paired demonstrations for Redwood AI instruction following, and long-horizon activity recordings that capture multi-step task dependencies.
What is 1X's World Model and what data does it require?
The World Model is a learned physics simulator — a neural network that predicts future visual states from current observations and proposed actions. Instead of hand-coding physics rules, it learns physical dynamics from video data. The World Model needs massive quantities of video showing how objects and environments change during physical interactions: pushing objects, opening containers, pouring liquids, folding fabric. More diverse video from more environments produces more accurate physical predictions.
Why does consumer deployment raise the bar for data diversity?
The consumer price point means NEO must work reliably in unknown home environments without professional calibration. Unlike industrial robots that operate in controlled, pre-mapped workspaces, NEO faces immediate deployment in homes it has never seen. This requires pretraining data diverse enough to cover the full distribution of consumer home environments — or extremely efficient adaptation mechanisms. 1X is betting on the data-scaling approach, which demands training data from hundreds to thousands of distinct residential environments.
How does Claru's collector network address these requirements?
Claru operates a global network of 10,000+ data collectors across 100+ cities who can perform standardized data collection in their own homes and local environments. This provides the environmental diversity and collection throughput that single-site teleoperation studios cannot achieve. Each collector's home is a unique data collection environment — different furniture, appliances, layouts, lighting — matching 1X's philosophy that more data from more environments is the path to consumer-grade robot generalization.
Scale 1X's Data Collection Pipeline
Discuss how Claru's global collector network can accelerate android intelligence through purpose-built demonstration data.