Liquid Pouring Training Data

Liquid handling datasets for robotics — precision pouring, volume estimation, container transfer, and dispensing tasks with fill-level annotations and pour dynamics for kitchen and laboratory applications.

Data Requirements

Modality

RGB-D multi-view + high-precision weight (100 Hz) + proprioception

Volume Range

1K-10K pouring episodes across container-liquid combinations

Temporal Resolution

30 Hz video, 100 Hz weight sensing, 50 Hz proprioception

Key Annotations
- Fill level trajectory (from weight series)
- Pour rate and tilt angle profile
- Spill detection and volume
- Container and liquid type metadata
- Pour quality score (accuracy to target)
Compatible Models
- Diffusion Policy
- ACT/ALOHA
- Neural pouring controllers
- Viscosity-conditioned policies
Environment Types
- Kitchen
- Laboratory
- Bar/beverage station
- Industrial dispensing line

How Claru Supports This Task

Claru's pouring data collection stations feature calibrated overhead and side-view RGB-D cameras, high-precision load cells (0.1g resolution, 100 Hz), and libraries of 15+ container geometries per station. Our liquid library covers water, cooking oils, dairy, juices, syrups, and other viscous liquids for multi-property training data. Each episode includes full weight time series, pour dynamics annotations (onset, peak rate, cessation), spill detection labels, and container-liquid metadata. Our factorial collection protocol ensures balanced representation across container pairs, liquid types, and fill levels. Datasets are delivered in RLDS, HDF5, or custom formats with full sensor calibration, weight calibration certificates, and stratified train/val/test splits. A typical 3,000-episode pouring dataset covering 10 container pairs and 5 liquid types ships in 2-3 weeks.
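
For teams ingesting these deliveries, here is a minimal sketch of one plausible per-episode HDF5 layout. The group and dataset names are hypothetical placeholders for illustration, not Claru's actual delivery schema.

```python
# Illustrative per-episode HDF5 layout for multimodal pouring data.
# All group/dataset names are hypothetical, not Claru's actual schema.
import h5py

def write_episode(path, rgb, depth, weight_g, proprio, meta):
    """rgb: (T_v, H, W, 3) uint8 at 30 Hz; depth: (T_v, H, W) uint16;
    weight_g: (T_w,) float grams at 100 Hz; proprio: (T_p, D) at 50 Hz."""
    with h5py.File(path, "w") as f:
        obs = f.create_group("observations")
        obs.create_dataset("rgb", data=rgb, compression="gzip")
        obs.create_dataset("depth", data=depth, compression="gzip")
        obs.create_dataset("weight_g", data=weight_g)
        obs.create_dataset("proprio", data=proprio)
        meta_grp = f.create_group("metadata")
        for key in ("container_source", "container_target",
                    "liquid_type", "target_fill_ml"):
            meta_grp.attrs[key] = meta[key]
```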

What Is Robotic Pouring and Why Is It Uniquely Challenging?

Liquid pouring is one of the most deceptively difficult manipulation tasks for robots. The challenge lies in the physics: liquid is a continuous medium whose behavior during pouring depends on viscosity, surface tension, container geometry, pour angle, and pour rate — all interacting nonlinearly. A small change in wrist angle can turn a controlled stream into a splashing torrent. Humans exploit decades of motor learning to pour water into a glass without conscious thought, but robots must learn this from data because the underlying fluid dynamics are computationally intractable to simulate at the speed needed for real-time control.

The data requirements for pouring are distinct from other manipulation tasks. Vision alone cannot determine fill level accurately for opaque containers, so weight sensing (via load cells under the target container) becomes a critical modality. The temporal dynamics matter enormously: pour rate must be controlled precisely, with different strategies for fast initial filling versus careful final approach to the target volume. Schenck and Fox (2017) showed that learned pouring policies trained on real data achieve 15-20 mL accuracy on 200 mL target fills, while simulation-trained policies exhibit 40-60 mL errors due to the sim-to-real gap in fluid dynamics.
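
Because the modalities arrive at different rates (30 Hz video, 100 Hz weight, 50 Hz proprioception), training pipelines typically resample onto a common clock. A minimal sketch, assuming timestamped arrays for both streams:

```python
import numpy as np

def weight_at_frames(weight_t, weight_g, frame_t):
    """Interpolate the 100 Hz weight series onto 30 Hz frame timestamps
    so each video frame carries a synchronized fill-level estimate."""
    return np.interp(frame_t, weight_t, weight_g)

# Example: a 10 s episode with an idealized fill curve.
weight_t = np.arange(0.0, 10.0, 0.01)                    # 100 Hz
weight_g = np.clip(25.0 * (weight_t - 2.0), 0.0, 180.0)  # grams in target
frame_t = np.arange(0.0, 10.0, 1.0 / 30.0)               # 30 Hz
per_frame_g = weight_at_frames(weight_t, weight_g, frame_t)
```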

Commercial applications span kitchen robotics (Moley, Samsung Bot Chef), laboratory automation (liquid handling for biotech), bartending systems (Makr Shakr), and industrial dispensing. Each domain has different precision requirements: kitchen pouring tolerates +/-10% volume error, lab pipetting demands +/-1% accuracy, and industrial dispensing operates in continuous-flow regimes. Training data must be collected to the precision requirements of the target domain, particularly in weight-sensor calibration and temporal resolution.

The physics of pouring creates unique data quality requirements. Unlike rigid object manipulation where actions are deterministic given the same initial conditions, pouring has inherent stochasticity: the same wrist angle can produce different flow rates depending on the liquid level in the source container (hydrostatic pressure changes as the container empties), the surface tension at the spout lip, and even ambient vibrations. This means a policy must learn not just an open-loop pour trajectory but a closed-loop control strategy that monitors flow rate (via weight change) and adjusts tilt angle in real time. Training data must capture this feedback loop by recording the operator's corrective adjustments throughout the pour — not just the final successful trajectory.
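
As a minimal sketch of that feedback loop, assuming a wrist-tilt command interface and a low-noise weight signal (the gain and step limit are illustrative placeholders, not a production controller):

```python
import numpy as np

def tilt_step(weight_g_prev, weight_g_now, dt, tilt_deg,
              target_flow_gps, kp=0.05, max_step_deg=0.5):
    """One control step of a minimal closed-loop pour sketch: estimate
    flow rate from the weight derivative and nudge the tilt angle toward
    the target flow. Gains and limits are illustrative placeholders."""
    flow_gps = (weight_g_now - weight_g_prev) / dt   # g/s into target
    error = target_flow_gps - flow_gps               # >0 means pour faster
    step = float(np.clip(kp * error, -max_step_deg, max_step_deg))
    return tilt_deg + step
```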

Pouring Data at a Glance

- 1K-10K: pouring episodes needed
- 100 Hz: weight sensing rate
- 15-20 mL: best real-data accuracy (200 mL target)
- 3-5x: sim-to-real error multiplier

Data Requirements by Pouring Application

Different pouring domains require different precision levels, modalities, and data volumes.

| Application | Volume Accuracy | Key Modalities | Data Volume | Critical Challenge |
| --- | --- | --- | --- | --- |
| Kitchen pouring | +/- 10% | RGB + weight | 1K-3K episodes | Container diversity |
| Bartending | +/- 5% | RGB + weight + flow sensor | 2K-5K episodes | Liquid viscosity variation |
| Lab liquid handling | +/- 1% | Weight + vision + proprioception | 5K-10K episodes | Sub-mL precision |
| Industrial dispensing | +/- 2% | Flow rate + weight | 1K-5K episodes | Continuous flow control |

State of the Art in Learned Pouring

Early robotic pouring systems used analytical models of fluid flow, but these failed to generalize across container shapes and liquid types. The shift to data-driven approaches began with Schenck and Fox (2017), who trained neural network pouring controllers from 1,000+ real episodes and demonstrated that real-world data captures fluid dynamics effects (splashing, meniscus formation, drip behavior) that even sophisticated fluid simulators miss. Their system achieved a 92% success rate for pouring water to target fill levels in transparent containers.

More recent work uses Diffusion Policy for pouring tasks. Chi et al. (2023) showed that action diffusion naturally handles the multimodality of pouring — there are valid fast-pour and slow-pour strategies for the same target volume. With just 100 demonstrations, Diffusion Policy achieves comparable performance to specialized pouring controllers trained on 10x more data. The ALOHA system (Zhao et al., 2023) demonstrated bimanual pouring (holding a pot in one hand, a cup in the other) from 50 demonstrations, though success rates drop significantly for novel container combinations.

The emerging challenge is zero-shot generalization to unseen liquids. A policy trained on water fails when encountering honey (high viscosity), olive oil (low surface tension), or carbonated beverages (gas bubbles). Pan et al. (2024) address this by training viscosity-conditioned policies from a dataset of 20+ liquid types, where viscosity is estimated from pour stream visual features. This approach requires collecting demonstrations across a diverse liquid library — a data collection challenge that favors distributed collection networks over single-lab setups.
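
One plausible way to wire such conditioning is sketched below as a FiLM-style feature modulation. This illustrates the general idea only; it is not the architecture of Pan et al. (2024).

```python
import torch
import torch.nn as nn

class ViscosityConditionedHead(nn.Module):
    """FiLM-style conditioning sketch: scale/shift visual features with an
    estimated log-viscosity before predicting actions. Illustrative only;
    not the architecture of Pan et al. (2024)."""
    def __init__(self, feat_dim=256, action_dim=7):
        super().__init__()
        self.film = nn.Linear(1, 2 * feat_dim)   # produces scale and shift
        self.head = nn.Linear(feat_dim, action_dim)

    def forward(self, visual_feat, log_viscosity):
        # visual_feat: (B, feat_dim); log_viscosity: (B, 1)
        scale, shift = self.film(log_viscosity).chunk(2, dim=-1)
        return self.head(visual_feat * (1.0 + scale) + shift)
```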

Spill prevention and recovery are the practical priorities for deployment. The most common failure mode is overshoot — continuing to pour after the target volume is reached because of liquid in flight between source and target containers. The delay between tilting the source container back to neutral and the last drop leaving the spout can be 0.5-2 seconds depending on liquid viscosity and spout geometry. Training data should capture the pour cessation strategy: how far in advance of the target volume the operator begins tilting back, and how this lead time varies by liquid type and pour rate. Explicitly annotating the anticipation offset in demonstrations enables policies to learn proactive rather than reactive pour cessation.
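
A rough way to extract that anticipation offset from logged signals, assuming synchronized weight and tilt-angle arrays (the flow threshold is illustrative and depends on load-cell noise and spout geometry):

```python
import numpy as np

def anticipation_offset_s(t, weight_g, tilt_deg, flow_eps_gps=0.5):
    """Estimate pour-cessation lead time: the gap between the start of the
    tilt-back maneuver and the last liquid arriving in the target container.
    Thresholds are illustrative and need tuning per station."""
    flow = np.gradient(weight_g, t)          # g/s into target container
    tilt_rate = np.gradient(tilt_deg, t)
    pouring = np.where(flow > flow_eps_gps)[0]
    if len(pouring) == 0:
        return None
    idx_last_drop = pouring[-1]
    # Tilt-back starts after the last moment the wrist was still tilting
    # forward (or holding) before the final drop.
    still_forward = np.where(tilt_rate[:idx_last_drop] >= 0.0)[0]
    if len(still_forward) == 0:
        return None
    idx_tilt_back = still_forward[-1] + 1
    return float(t[idx_last_drop] - t[idx_tilt_back])
```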

Collection Methodology for Pouring Data

A pouring data collection station requires: calibrated RGB-D cameras (overhead for fill-level view, side-view for pour stream), a high-precision load cell under the target container (100 Hz, 0.1g resolution), robot proprioception at 50+ Hz, and optionally a flow rate sensor for continuous dispensing tasks. The source and target container library should include at least 10 container geometries (cups, glasses, bottles, pitchers, bowls) with varying spout designs.

Each episode follows a structured protocol: weigh the source container (establishing initial liquid volume), execute the pour via teleoperation, and weigh both containers after completion. The weight differential provides ground truth fill level. Pour dynamics annotations — pour onset time, peak flow rate, tilt angle trajectory, and pour cessation — are extracted post-hoc from the weight time series and robot joint trajectories. Spill detection uses a combination of weight discrepancy (total weight loss exceeds target transfer) and visual detection of liquid outside the target container.
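
A minimal sketch of this post-hoc extraction, assuming timestamped weight arrays; the flow threshold is a placeholder to be tuned against load-cell noise:

```python
import numpy as np

def pour_dynamics(t, target_weight_g, flow_eps_gps=0.5):
    """Post-hoc annotation sketch: pour onset, cessation, and peak flow
    rate from the target-container weight series."""
    flow = np.gradient(target_weight_g, t)    # g/s
    active = np.where(flow > flow_eps_gps)[0]
    if len(active) == 0:
        return None
    return {
        "onset_s": float(t[active[0]]),
        "cessation_s": float(t[active[-1]]),
        "peak_flow_gps": float(flow.max()),
        "transferred_g": float(target_weight_g[-1] - target_weight_g[0]),
    }

def spill_g(source_delta_g, target_delta_g):
    """Mass-balance spill estimate: liquid that left the source container
    (source_delta_g is negative) but never reached the target."""
    return max(0.0, -source_delta_g - target_delta_g)
```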

Liquid diversity is the second critical axis after container diversity. Claru's pouring collection protocol requires a minimum of 5 liquid types per campaign: water, cooking oil, milk/cream, juice (whose high sugar content changes surface tension), and a thick liquid like honey or syrup. Each liquid-container combination gets a minimum of 50 episodes at 3+ target fill levels (25%, 50%, 75% full). This produces approximately 1,500 episodes from a single station in 5 days, with full weight-series and visual annotations.
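
The factorial structure is straightforward to enumerate in code. The sketch below uses placeholder container-pair IDs and per-cell counts sized roughly to the single-station example above; actual campaigns are sized per contract.

```python
import itertools

# Illustrative factorial collection plan; IDs and per-cell counts are
# placeholders, not Claru's fixed protocol.
container_pairs = [f"pair_{i:02d}" for i in range(6)]
liquids = ["water", "cooking_oil", "milk", "juice", "honey"]
fill_levels = [0.25, 0.50, 0.75]
episodes_per_cell = 17   # ~50 per liquid-container combination

plan = [
    {"container_pair": c, "liquid": l, "target_fill": f,
     "episodes": episodes_per_cell}
    for c, l, f in itertools.product(container_pairs, liquids, fill_levels)
]
total = sum(cell["episodes"] for cell in plan)
print(f"{len(plan)} cells, {total} episodes")   # 90 cells, 1530 episodes
```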

Pour cessation annotation is a critical quality layer specific to pouring data. Each episode is annotated with the pour cessation lead time — the time between the operator beginning to tilt the container back and the last liquid leaving the spout. This metric varies by liquid viscosity (0.3 s for water, 1.5 s for honey) and is essential for training policies that stop pouring proactively rather than reactively. Automated cessation timing is extracted from the weight derivative curve (the inflection point marks when the operator began the stop maneuver) and validated by human annotation for accuracy.

Key Datasets for Pouring Research

| Dataset | Year | Episodes | Liquids | Modalities | Availability |
| --- | --- | --- | --- | --- | --- |
| Schenck & Fox | 2017 | 1,000 | Water | RGB + weight | Public |
| MIT Pouring | 2019 | 600 | 3 types | RGB-D + weight + F/T | Public |
| ALOHA Kitchen | 2023 | ~50 pour demos | Water | RGB + proprioception | Public |
| Multi-Viscosity Pour | 2024 | 5,000 | 20+ types | RGB + weight + viscosity | Limited release |
| Claru Custom | 2026 | 1K-10K+ | Configurable | RGB-D + weight + full annotations | Built to spec |

References

1. Schenck & Fox. "Visual Closed-Loop Control for Pouring Liquids." ICRA 2017.
2. Chi et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." RSS 2023.
3. Zhao et al. "Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware." RSS 2023.
4. Pan et al. "Learning Viscosity-Conditioned Pouring Policies from Visual Observations." CoRL 2024.

Frequently Asked Questions

How many demonstrations does a pouring policy need?

For a fixed container pair (e.g., pitcher to glass) with water, 100-200 demonstrations achieve 90%+ success with Diffusion Policy. For generalization across 10+ container combinations, expect 1,000-3,000 episodes. For multi-liquid generalization (water, oil, honey), add 200-500 episodes per liquid type per container pair. Start with a single container pair and scale based on deployment scope.
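
As a rough planning aid, the heuristic below encodes the midpoints of the ranges above; it is a sizing sketch, not a performance guarantee.

```python
def episode_budget(container_pairs, liquid_types,
                   per_pair=150, per_extra_liquid=350):
    """Rough budget from the ranges above, using midpoints (100-200 ->
    150 per pair; 200-500 -> 350 per additional liquid per pair)."""
    return container_pairs * (per_pair
                              + max(0, liquid_types - 1) * per_extra_liquid)

print(episode_budget(1, 1))    # fixed pair, water only: 150
print(episode_budget(10, 3))   # 10 pairs, 3 liquids: 8500
```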

Why is weight sensing necessary in addition to vision?

Vision cannot reliably determine fill level for opaque containers or liquids matching the container color. Weight provides ground truth fill level at 100 Hz resolution, capturing the pour dynamics that vision misses: flow rate acceleration during tilt, the moment of pour cessation, and subtle drip behavior. The weight time series also enables automated annotation of pour quality metrics without manual labeling.

Can simulation replace real pouring data?

Fluid simulation (SPH, MPM) can generate physically plausible pouring motions, but the sim-to-real gap is severe for control-relevant dynamics. Real-data policies achieve 15-20 mL accuracy on 200 mL targets, while sim-only policies show 40-60 mL errors — a 3-5x degradation. The gap widens for viscous liquids and complex container geometries. Use simulation for pretraining (10K+ episodes) then fine-tune on 500-2,000 real episodes for production accuracy.

How should container and liquid diversity be structured?

Use a factorial design: minimum 10 source-target container pairs times 5 liquid types times 3 target fill levels. Each combination gets 50+ episodes. This produces a balanced dataset where the policy learns to disentangle container geometry effects from liquid property effects. Include transparent, translucent, and opaque containers to stress-test vision-based fill estimation.

What failure modes should a pouring dataset capture?

Five primary failure modes: (1) overshoot — pouring past the target volume due to flow inertia, (2) spillage — liquid missing the target container, (3) drip trail — residual drops after pour cessation, (4) splash — liquid rebounds out of target at high flow rates, (5) incomplete pour — stopping short of target due to conservative control. A quality dataset includes all five failure modes with labels, enabling the policy to learn corrective strategies for each.
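
A mass-balance labeling sketch for three of these modes is shown below; the thresholds are illustrative, and splash and drip trail cannot be separated from weight alone, so they need the side-view camera to label.

```python
def label_failures(target_ml, delivered_ml, spill_ml,
                   tol_ml=10.0, spill_eps_ml=2.0):
    """Rule-of-thumb episode labels from mass balance. Thresholds are
    illustrative; splash and drip-trail require visual checks."""
    labels = []
    if spill_ml > spill_eps_ml:
        labels.append("spillage")
    if delivered_ml > target_ml + tol_ml:
        labels.append("overshoot")
    elif delivered_ml < target_ml - tol_ml:
        labels.append("incomplete_pour")
    return labels or ["success"]
```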

How does the source container's fill level affect the data?

The fill level of the source container significantly affects pour dynamics. A full container requires a smaller tilt angle to initiate flow (higher hydrostatic pressure), while a nearly empty container requires aggressive tilting and produces more erratic flow. Demonstrations should cover the full range of source fill levels: full (90%+), half (40-60%), and nearly empty (under 20%). Each level has different optimal tilt strategies and pour cessation timing. Annotate each episode with the source container initial and final fill levels to enable fill-level-conditioned policy training. Budget at least 30% of demonstrations at low fill levels, as these are the hardest for policies to handle.

Get a Custom Quote for Liquid Pouring Data

Tell us your target liquid types, container geometries, and accuracy requirements. We will design a collection plan with the right sensor configuration and episode count.