Training Data for Toyota Research Institute

TRI pioneered diffusion policy for robot manipulation. Here is how real-world household data trains the next generation of assistive robots.

About Toyota Research Institute

Toyota Research Institute (TRI) applies AI to robotics, autonomous driving, and materials science. Its robotics program focuses on assistive robots for aging populations, household manipulation, and diffusion policy-based robot learning, with a distinctive emphasis on contact-rich manipulation and deformable object handling.

Diffusion policy for robot manipulation
Household assistive robotics
Contact-rich and deformable manipulation
Large behavior models for robots
Bimanual manipulation with ALOHA systems

TRI Robotics at a Glance

Initial Investment: $1B
Founded: 2015
Parent Company: Toyota
Key Innovation: Diffusion
Target Domain: Home

Known Data Requirements

TRI's diffusion policy research and assistive robotics program require extensive household manipulation data — particularly for contact-rich tasks involving deformable objects like cloth, food, and flexible packaging. Their vision of robots assisting aging populations demands training data from real home environments with authentic objects and task configurations.

Household manipulation with deformable objects

Source: TRI diffusion policy papers (Chi et al., 2023) and assistive robotics program

Manipulation demonstrations involving cloth folding, food preparation, flexible packaging, and other deformable object interactions in real kitchen and living environments.

Contact-rich manipulation recordings

Source: TRI's emphasis on tasks requiring complex contact dynamics

Multi-modal recordings of tasks like wiping surfaces, loading dishwashers, and stacking irregular objects where contact mechanics drive task success.

Assistive interaction data with elderly populations

Source: Toyota's strategic focus on aging society solutions

Data on assistance tasks relevant to elderly individuals — reaching high shelves, opening containers, organizing medications — in real home settings.

Bimanual ALOHA-style demonstration data

Source: TRI's adoption and extension of the ALOHA bimanual teleoperation system

Two-armed teleoperation demonstrations for precise bimanual tasks — pouring liquids, serving meals, folding laundry — using ALOHA-style low-cost hardware for scalable data collection.

Kitchen activity sequences with state annotations

Source: TRI's household robot program targeting cooking assistance

Complete kitchen task recordings — meal preparation, cleaning, organizing — annotated with object states (raw/cooked, clean/dirty, open/closed) and task phase boundaries for training state-aware manipulation policies.
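One way to picture the state and phase annotations described above is as a small record type. The sketch below is illustrative only: the class names, field names, and state labels are hypothetical and are not a published TRI or Claru schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PhaseSegment:
    """One task phase with the object states that hold during it (hypothetical schema)."""
    label: str
    start_s: float
    end_s: float
    object_states: Dict[str, str] = field(default_factory=dict)


@dataclass
class KitchenEpisode:
    """A recorded kitchen task split into consecutive, time-ordered phases."""
    episode_id: str
    phases: List[PhaseSegment]

    def phase_boundaries(self) -> List[float]:
        # A boundary is the time at which one phase ends and the next begins.
        return [p.end_s for p in self.phases[:-1]]

    def state_at(self, t: float, obj: str) -> Optional[str]:
        # Look up the annotated state of an object at time t (seconds).
        for p in self.phases:
            if p.start_s <= t < p.end_s:
                return p.object_states.get(obj)
        return None


# Made-up example values, for illustration only.
episode = KitchenEpisode(
    episode_id="demo-0001",
    phases=[
        PhaseSegment("open_dishwasher", 0.0, 4.5,
                     {"dishwasher": "closed", "plate": "dirty"}),
        PhaseSegment("load_plate", 4.5, 12.0,
                     {"dishwasher": "open", "plate": "dirty"}),
    ],
)
```

Keeping object states attached to phases, rather than to individual frames, keeps annotation effort proportional to the number of state changes instead of the length of the recording.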

How Claru Data Addresses These Needs

Lab Need	Claru Offering	Rationale
Household manipulation with deformable objects	Custom Household Manipulation Collection	Claru can collect household manipulation data in real homes across its 100+ city network — with authentic kitchens, real cloth, actual food items — providing the environmental and object diversity that lab settings cannot match.
Contact-rich manipulation recordings	Manipulation Trajectory Dataset with force annotations	Claru's manipulation data includes contact-rich interactions with multi-modal recordings suitable for training contact-aware policies like diffusion policy.
Assistive interaction data with elderly populations	Egocentric Activity Dataset + Custom Assistive Task Collection	Claru's egocentric video captures daily activities including assistance-relevant tasks, with the option for targeted collection of elderly-assistance scenarios in real homes.
Kitchen activity sequences with state annotations	Egocentric Activity Dataset + Custom Kitchen Collection	Claru's egocentric dataset captures real cooking and kitchen activities from first-person perspective. Targeted kitchen collection campaigns with state annotations produce the phase-labeled data diffusion policies need for long-horizon meal preparation tasks.

Technical Data Analysis

TRI combines Toyota's resources and strategic interest in aging-society solutions with a clear deployment target: household assistive robots. Its diffusion policy work (Chi et al., 2023) has become one of the most influential approaches in robot learning, and continued development of this framework drives specific data requirements.

Diffusion policy excels at contact-rich manipulation precisely because it can model multi-modal action distributions — when there are multiple valid ways to fold a cloth or load a dishwasher, the diffusion model captures this distribution rather than averaging across modes. But this capability demands training data that contains diverse solutions to the same task. For cloth folding alone, TRI needs demonstrations showing different folding strategies, cloth types, surface conditions, and starting configurations.
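The mode-averaging problem can be illustrated with a toy example. The sketch below assumes a made-up one-dimensional task in which demonstrators pass an obstacle either on the left (action near -1) or on the right (action near +1). A policy trained with mean-squared error converges toward the average action, which here steers into the obstacle. Sampling from the empirical demonstration distribution stands in for a generative policy; it is not a diffusion model, and all names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstrations: half pass the obstacle on the left, half on the right.
demos = np.concatenate([
    rng.normal(-1.0, 0.05, size=500),
    rng.normal(+1.0, 0.05, size=500),
])

# The constant prediction that minimizes mean-squared error is the sample mean,
# which falls between the two modes.
mse_action = demos.mean()

# Stand-in for a generative policy: sample actions from the demonstration distribution,
# so each sample commits to one mode.
sampled_actions = rng.choice(demos, size=5)


def is_valid(action: float, clearance: float = 0.5) -> bool:
    # An action is valid only if it clears the obstacle centered at 0.
    return abs(action) > clearance


print("MSE action valid:", is_valid(mse_action))
print("Sampled actions valid:", [is_valid(a) for a in sampled_actions])
```

The same reasoning is why single-strategy datasets limit what a multi-modal policy can learn: if every demonstration passes on the left, there is no second mode to capture.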

The deformable object challenge is particularly data-hungry. Deformable objects — cloth, food items, flexible packaging, cables — have effectively infinite state spaces that cannot be exhaustively explored in simulation. While simulators can model simple cloth physics, the interaction between real fabric, real surfaces, and real grippers involves friction, material compliance, and draping dynamics that differ dramatically from simulation. Real-world demonstrations of deformable manipulation are irreplaceable.

TRI's assistive robotics vision adds a demographic dimension to the data requirement. Robots that assist elderly individuals must understand home environments as configured by actual residents — not standardized laboratory kitchens. Medicine bottles in bathroom cabinets, groceries in varying refrigerator layouts, clothing in diverse closet configurations. Claru's ability to collect data in real homes across many locations provides the environmental authenticity that TRI's assistive vision demands.

The ALOHA bimanual system, originally developed at Stanford (Zhao et al., 2023) and adopted extensively by TRI, has become a widely used platform for scalable bimanual data collection. ALOHA's low cost (~$20K per setup) makes it practical to deploy multiple collection stations, but data diversity is still limited by the environments where stations are placed. TRI's adoption of ALOHA-style systems fits naturally with Claru's distributed collection model: low-cost teleoperation rigs deployed across Claru's collector network can generate bimanual demonstration data from diverse real-world settings.

The Track2Act work (Bharadhwaj et al., 2024) demonstrates learning manipulation from internet videos by predicting point tracks, the visual correspondences that show how objects move during manipulation. This approach could leverage Claru's large-scale egocentric video as a pretraining resource, since first-person cooking and household videos naturally contain the dense manipulation observations that point-track models need.

Key Research & References

[1] Chi et al. “Diffusion Policy: Visuomotor Policy Learning via Action Diffusion.” RSS 2023.
[2] Zhao et al. “Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware.” RSS 2023.
[3] Ha et al. “Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition.” CoRL 2023.
[4] Ren et al. “Diffusion Policy Policy Optimization.” arXiv, 2024.
[5] Aldaco et al. (ALOHA 2 Team). “ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation.” arXiv, 2024.
[6] Bharadhwaj et al. “Track2Act: Predicting Point Tracks from Internet Videos Enables Diverse Zero-Shot Robot Manipulation.” arXiv 2405.01527, 2024.

Frequently Asked Questions

Diffusion policy (Chi et al., 2023) learns robot manipulation by modeling the distribution of successful actions using a denoising diffusion process. It excels at contact-rich tasks with multiple valid solutions. It needs diverse real-world demonstrations showing different strategies for the same task — something only achievable through data collected in varied physical environments.
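The denoising idea can be sketched with the standard DDPM forward-noising step and noise-prediction loss. This is a minimal illustration, not the authors' implementation: the linear noise schedule, the 100-step count, and the 16-step, 7-dimensional action chunk are assumptions chosen for the example, and the network that would predict the noise is omitted.

```python
import numpy as np


def make_alpha_bar(num_steps: int = 100,
                   beta_start: float = 1e-4,
                   beta_end: float = 0.02) -> np.ndarray:
    # Cumulative product of (1 - beta_t) for an assumed linear noise schedule.
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_actions(actions: np.ndarray, t: int,
                  alpha_bar: np.ndarray, rng: np.random.Generator):
    # Forward process: blend the clean action chunk with Gaussian noise at step t.
    eps = rng.standard_normal(actions.shape)
    noisy = np.sqrt(alpha_bar[t]) * actions + np.sqrt(1.0 - alpha_bar[t]) * eps
    return noisy, eps


def denoising_loss(predicted_eps: np.ndarray, eps: np.ndarray) -> float:
    # Training target: mean-squared error between predicted and true noise.
    return float(np.mean((predicted_eps - eps) ** 2))


rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Hypothetical demonstration chunk: 16 timesteps of a 7-dimensional action.
clean_actions = rng.uniform(-1.0, 1.0, size=(16, 7))
noisy_actions, true_eps = noise_actions(clean_actions, t=50,
                                        alpha_bar=alpha_bar, rng=rng)

# A trained network would predict true_eps from (noisy_actions, t, observation);
# an all-zero guess is used here only to show how the loss is computed.
print(denoising_loss(np.zeros_like(true_eps), true_eps))
```

At inference time the learned noise predictor is applied repeatedly, starting from pure noise, to produce an action chunk. Because each run starts from a different noise sample, repeated runs can land on different valid strategies.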

Deformable objects (cloth, food, cables) have effectively infinite state spaces. Simulators model simple deformable physics but miss the friction, material compliance, and draping dynamics of real materials. Real-world demonstrations of deformable manipulation are irreplaceable because the gap between simulated and real deformable object physics remains large.

Assistive robots must work in real homes as configured by actual residents — not standardized lab environments. This means training data from real kitchens, bathrooms, and living spaces with authentic object placements and configurations. Data diversity across many homes and resident preferences is essential for robust assistance.

ALOHA is a low-cost (~$20K) bimanual teleoperation system that enables human operators to perform two-handed manipulation tasks while recording full kinematic data. TRI uses ALOHA stations to generate bimanual demonstration data at scale. The platform's low cost makes it practical to deploy across many collection sites, scaling data diversity alongside volume.

Japan faces a severe demographic challenge: by 2040, roughly 35% of the population is projected to be 65 or older. Toyota views assistive robots as a business necessity, not just a research experiment. TRI's robotics research aims to create robots that help elderly individuals with daily tasks in their own homes, making real-world household data essential for deployment.