Building the Data Infrastructure for Physical AI
Claru builds the training data behind frontier AI — egocentric video, robotics trajectories, world-model footage, and expert human judgment at scale. We are the training-data platform of Reka AI.
The Real-World Data Gap
Large language models learned from internet text, and image models from web-scraped photos. But robots that need to pick up a coffee mug, navigate a warehouse, or fold laundry cannot learn from internet data. They need egocentric video of real humans performing real tasks in real environments, captured with depth, pose, and action labels at millisecond precision.
This data does not exist on the internet, and it cannot be synthesized in simulation without a crippling domain gap. Large-scale real-world datasets such as Ego4D and Open X-Embodiment have demonstrated their value, but every frontier robotics lab we have spoken to cites the same bottleneck: not compute, not algorithms, but data.
Claru closes that gap. We operate the capture infrastructure, enrichment pipelines, and annotation workforce to deliver training-ready datasets for robotics and physical AI — from brief to first delivery in days, not months.
Part of Reka AI
Claru is the training-data platform of Reka AI. We are the team that built the data infrastructure behind Reka's vision models and Moonvalley's generative video models. Reka and Moonvalley joined forces to advance models and infrastructure for physical AI.
Reka is a frontier lab building natively multimodal models for the physical AI era: foundation models that perceive and act in the real world, trained on egocentric and physical-world data, together with the inference and video-processing infrastructure that runs them at scale. Inside that stack, Claru is the real-world data layer. Our capture infrastructure, multi-model enrichment pipelines, and expert annotation workforce turn raw footage into training-ready egocentric video, robotics trajectories, and world-model footage built to the standard physical-AI models demand. Being part of Reka means our data pipeline is shaped by the people training the models that consume it.
Claru's data infrastructure is backed by the same investors as Reka AI.
Our Team & Network
Our core engineering and research team includes ex-FAANG engineers and researchers with deep expertise in computer vision, robotics, and AI infrastructure. We have built large-scale data pipelines, trained production ML models, and shipped AI products used by millions. Our enrichment pipeline uses foundation models including Depth Anything V2, ViTPose, and SAM 3 to add six enrichment layers to every clip in our embodied AI datasets.
Beyond the core team, Claru operates a global data collection network: 10,000+ trained collectors across 100+ cities on 6 continents. Every collector is equipped with wearable cameras and follows structured capture protocols designed for each project. This network gives us the geographic and environmental diversity that physical AI models need to generalize beyond controlled lab settings.
Engineering
Full-stack AI infrastructure: capture pipelines, multi-model enrichment (depth, pose, segmentation), vector search, and delivery systems built for petabyte-scale datasets.
Research
Deep expertise in computer vision, embodied AI, and robot learning. We design annotation schemas and quality metrics in collaboration with each client's ML team.
Operations
Global data collection at scale. Collector recruitment, training, quality assurance, and logistics across 100+ cities. Brief to first delivery in days.
Talk to Our Team
Whether you are building a robotics foundation model, training VLAs, or need custom real-world datasets — we would like to hear about it.