Training Data for Sanctuary AI
Sanctuary AI's Phoenix has the most dexterous hands in commercial humanoid robotics. Here is how real-world demonstration data teaches those hands to work.
About Sanctuary AI
Sanctuary AI is building Phoenix, a general-purpose humanoid robot with industry-leading dexterous hands. The company's Carbon AI system aims to replicate human-like intelligence for manipulation-heavy tasks, with a focus on retail, logistics, and manufacturing applications.
Known Data Requirements
Sanctuary AI's emphasis on dexterous hands — Phoenix has the most human-like hand design among commercial humanoids — creates acute demand for fine-grained manipulation data. Their Carbon AI system needs demonstrations of precision grasping, in-hand manipulation, and bimanual coordination at a scale that only distributed data collection can provide.
- **Fine-grained dexterous manipulation** (source: Sanctuary AI's emphasis on human-like hand dexterity and Carbon AI). Precision grasping, in-hand manipulation, and finger-level coordination demonstrations with multi-camera and tactile sensor recordings.
- **Retail and commercial task demonstrations** (source: Sanctuary AI's deployment targets in retail and logistics). Task demonstrations for retail scenarios — shelf stocking, product sorting, package handling — captured in real store environments.
- **Bimanual coordination sequences** (source: Phoenix's dual-arm design and task requirements). Two-handed task demonstrations where both hands coordinate — opening containers, folding items, assembling products — with synchronized multi-modal recordings.
- **Tool use and object manipulation with articulated grippers** (source: Phoenix's articulated hand design, which enables tool use unlike simplified grippers). Demonstrations of using everyday tools — scissors, screwdrivers, tongs, pens — where the high-DOF hand must adapt grip configuration to tool geometry and apply task-appropriate force profiles.
- **Teleoperation demonstration data with haptic feedback** (source: Sanctuary AI's teleoperation pipeline for Carbon AI training). High-fidelity teleoperation recordings with synchronized haptic feedback, capturing the teleoperator's force intentions alongside the robot's executed motions for learning compliant manipulation policies.
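Teleoperation recordings like those above combine streams sampled at different rates: operator haptic commands arrive faster than logged robot states, and the two must be time-aligned before a policy can learn from them. The sketch below shows one minimal alignment strategy — pairing each robot state with the most recent operator force command by timestamp. The function name and the sample rates are illustrative, not Sanctuary's actual pipeline; production systems would also interpolate and correct for clock offsets between devices.

```python
import bisect

def align_streams(op_times, op_forces, robot_times, robot_states):
    """Pair each robot state with the most recent operator force command.

    A minimal sketch of timestamp alignment for teleoperation logs.
    Assumes both timestamp lists are sorted ascending.
    """
    paired = []
    for t, state in zip(robot_times, robot_states):
        # Index of the last operator command issued at or before time t
        i = bisect.bisect_right(op_times, t) - 1
        if i >= 0:
            paired.append((t, op_forces[i], state))
    return paired

# Illustrative rates: operator commands at 100 Hz, robot state at 50 Hz
op_t = [0.00, 0.01, 0.02, 0.03]
op_f = [0.1, 0.2, 0.3, 0.4]
rb_t = [0.005, 0.025]
rb_s = ["s0", "s1"]
print(align_streams(op_t, op_f, rb_t, rb_s))
# → [(0.005, 0.1, 's0'), (0.025, 0.3, 's1')]
```

Nearest-previous-sample pairing keeps causality intact: a robot state is never matched with a force command issued after it.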
How Claru Data Addresses These Needs
| Lab Need | Claru Offering | Rationale |
|---|---|---|
| Fine-grained dexterous manipulation | Custom Dexterous Manipulation Collection | Claru can coordinate collection campaigns focused on precision grasping and in-hand manipulation tasks using multi-camera setups that capture the finger-level detail needed for dexterous policy learning. |
| Retail and commercial task demonstrations | Egocentric Activity Dataset + Custom Retail Collection | Claru's existing activity video covers retail-adjacent scenarios, with targeted collection campaigns in partner retail environments for domain-specific training data. |
| Bimanual coordination sequences | Manipulation Trajectory Dataset with bimanual annotations | Claru's manipulation data includes multi-arm coordination recordings with temporal synchronization — critical for learning bimanual policies where timing and force distribution between hands matters. |
| Tool use and object manipulation with articulated grippers | Egocentric Activity Dataset + Custom Tool Use Collection | Claru's egocentric dataset captures real humans using everyday tools from a first-person perspective. Targeted tool-use collection campaigns can produce the high-detail, multi-angle recordings that dexterous tool manipulation policies require. |
Technical Data Analysis
Sanctuary AI has made a deliberate bet on dexterity. While other humanoid companies use simplified grippers or two-finger claws, Phoenix features highly articulated hands designed to match human dexterity. This design decision unlocks a broader range of manipulation tasks but creates a proportionally larger data requirement — dexterous manipulation in a 20+ DOF hand space requires far more demonstrations than simple parallel-jaw grasping.
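The dimensionality gap described above can be made concrete by comparing what one timestep of demonstration data must record for each hand type. The field names and sensor counts below are illustrative assumptions, not Sanctuary's actual log format — the point is simply that a dexterous demonstration frame carries an action vector tens of times wider than a parallel-jaw frame.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GripperFrame:
    """One timestep of a parallel-jaw demonstration: a single aperture value."""
    timestamp: float
    aperture: float                # 1 DOF: jaw opening in metres

@dataclass
class DexterousFrame:
    """One timestep of a dexterous-hand demonstration (fields are illustrative)."""
    timestamp: float
    joint_angles: List[float]      # 20+ DOF: one angle per finger joint, radians
    joint_torques: List[float]     # commanded torque per joint
    tactile: List[float]           # fingertip pressure readings
    camera_ids: List[str]          # multi-camera views captured at this step

    def action_dim(self) -> int:
        # Width of the action vector a policy must predict each step
        return len(self.joint_angles) + len(self.joint_torques)

frame = DexterousFrame(0.0, [0.1] * 20, [0.0] * 20, [0.0] * 5, ["wrist", "overhead"])
print(frame.action_dim())  # → 40, versus 1 for the gripper
```

Covering a 40-dimensional action space with demonstrations is what drives the "orders of magnitude more data" requirement: each added joint multiplies the space of grasps and in-hand motions the policy must have seen.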
The Carbon AI control system is designed to learn from human demonstrations at scale. Sanctuary operates a teleoperation pipeline where human operators control Phoenix remotely, generating training data with every shift. However, the diversity of demonstrations is limited by the number and variety of teleoperation environments. A single teleoperation studio produces data from one physical setting — the same table, the same objects, the same lighting.
Claru's distributed collection network addresses this diversity gap directly. By coordinating collectors across 100+ locations to perform standardized manipulation tasks with local objects and environments, Claru can provide the environmental variety that single-site teleoperation cannot. Each collector location contributes unique surface textures, object geometries, lighting conditions, and workspace configurations.
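One way to see how distributed collection buys diversity is to tag every recording with environmental metadata and measure how many distinct conditions the corpus spans. The schema and attribute names below are a hypothetical sketch, not Claru's actual format; a real pipeline would track many more attributes (camera placement, ambient noise, workspace height, and so on).

```python
from dataclasses import dataclass

@dataclass
class CollectionSite:
    """Hypothetical metadata record for one distributed-collection location."""
    site_id: str
    lighting: str        # e.g. "fluorescent", "daylight"
    surface: str         # e.g. "laminate", "steel shelving"
    object_set: tuple    # local objects used for the standardized task

def diversity_summary(sites):
    """Count distinct values per environmental attribute across sites."""
    return {
        "lighting": len({s.lighting for s in sites}),
        "surface": len({s.surface for s in sites}),
    }

sites = [
    CollectionSite("site-001", "fluorescent", "laminate", ("mug", "box")),
    CollectionSite("site-002", "daylight", "steel shelving", ("bottle", "can")),
]
print(diversity_summary(sites))  # → {'lighting': 2, 'surface': 2}
```

A single teleoperation studio would score 1 on every attribute; a 100-site network pushes each count up, which is precisely the diversity gap the paragraph above describes.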
The retail deployment context adds another dimension. Shelf stocking, product sorting, and package handling in real stores involve objects with diverse geometries, materials, and weights — from fragile glass bottles to heavy canned goods. Training data must capture this product diversity across different store layouts and shelf configurations. Claru's ability to collect data in actual retail environments provides the authentic visual and physical context that laboratory mockups cannot replicate.
Phoenix's articulated hands also enable tool use — a capability that most humanoids with simple grippers cannot attempt. Using scissors, operating a screwdriver, holding tongs, or writing with a pen requires the hand to adapt its grip configuration to the tool geometry and apply force profiles specific to each tool's mechanics. Training this capability requires demonstrations of diverse tool use tasks, a data type that barely exists in current robot learning datasets.
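At its simplest, the grip-adaptation problem above can be framed as mapping tool geometry to a grasp preshape before fine control takes over. The lookup below is a deliberately toy sketch — a learned policy infers this mapping from demonstrations rather than from a table, and the grasp names are borrowed loosely from common grasp taxonomies, not from Sanctuary's system.

```python
# Hypothetical mapping from tool identity to a grasp preshape.
GRASP_FOR_TOOL = {
    "scissors": "two-finger scissor grip",
    "screwdriver": "power cylindrical grasp",
    "tongs": "precision pinch",
    "pen": "tripod grasp",
}

def select_grasp(tool: str, fallback: str = "power grasp") -> str:
    """Pick a grasp preshape for a named tool; unseen tools get a fallback."""
    return GRASP_FOR_TOOL.get(tool, fallback)

print(select_grasp("pen"))     # → tripod grasp
print(select_grasp("hammer"))  # → power grasp (fallback for unseen tools)
```

The hard part, and the reason demonstration data is needed, is everything this table omits: the per-tool force profile and the continuous finger adjustments after contact.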
The 7th generation Phoenix robot represents Sanctuary's latest hardware iteration, with improved actuators, lighter weight, and enhanced sensory feedback. Each hardware generation shifts the optimal training data distribution slightly, as new actuator capabilities enable previously impossible manipulation strategies. This creates an ongoing need for fresh training data that exploits the latest hardware capabilities.
Key Research & References
- [1] Sanctuary AI. “Carbon: A General-Purpose AI Control System for Humanoid Robots.” Company Technical Overview, 2024.
- [2] Shaw et al. “Learning Dexterous Manipulation from Human Demonstrations.” CoRL 2023.
- [3] Chen et al. “Visual Dexterity: In-Hand Reorientation of Novel Objects.” ICRA 2023.
- [4] Qi et al. “General In-Hand Object Rotation with Vision and Touch.” CoRL 2023.
- [5] Arunachalam et al. “Holo-Dex: Teaching Dexterity with Immersive Mixed Reality.” ICRA 2023.
- [6] Mandikal and Grauman. “DexVIP: Learning Dexterous Grasping with Human Hand Pose Priors.” CoRL 2022.
Frequently Asked Questions

**Why do Phoenix's dexterous hands require so much training data?**

Phoenix's 20+ DOF hands create a high-dimensional action space for manipulation. Unlike simple grippers with 1-2 DOF, dexterous hands need demonstrations that cover finger-level coordination, in-hand manipulation, precision grasping, and force distribution — requiring orders of magnitude more demonstration data to train reliable policies.

**How does Sanctuary AI currently collect manipulation data?**

Sanctuary operates a teleoperation pipeline where human operators remotely control Phoenix, generating training data during each session. However, this single-site approach limits environmental diversity. Distributed data collection across many locations provides the variety needed for policies that generalize beyond the teleoperation studio.

**What training data does retail deployment require?**

Phoenix needs demonstrations of shelf stocking, product sorting, and package handling in real store environments with diverse products (varying shapes, weights, materials, fragility). Training data must cover different store layouts, shelf configurations, and product categories to enable reliable retail deployment.

**Can Phoenix use tools?**

Yes. Phoenix's 20+ DOF articulated hands can adapt grip configuration to different tool geometries — holding scissors, operating screwdrivers, using tongs. Most competing humanoids with simplified grippers cannot perform tool use. Training this capability requires demonstrations of diverse tool interactions, a data type that barely exists in current robot learning datasets.

**Does training on real teleoperation data avoid generalization problems?**

Only partly. Robots trained through teleoperation learn directly from real demonstrations — avoiding the sim-to-real gap but inheriting the environmental biases of the teleoperation studio. If all training data comes from one room with one set of objects, the policy overfits to those conditions. Diverse real-world data from many environments is essential to break this overfitting.
Train Phoenix's Dexterous Hands
Discuss fine-grained manipulation data for Sanctuary AI's humanoid robot platform.