Pushing Task Training Data
Pushing datasets for non-prehensile robotic manipulation — planar object sliding, goal-directed push rearrangement, and contact-dynamics demonstrations with friction modeling for training push prediction and planning policies.
Data Requirements
- Modalities: RGB-D + object pose tracking + push parameters + contact force (optional)
- Volume: 10K-500K push interactions or 1K-10K multi-step demonstrations
- Rates: 30-60 Hz object tracking, 100 Hz pusher position, per-push annotations
How Claru Supports This Task
Claru provides pushing data collection for both systematic push physics research and goal-directed manipulation applications. Our stations feature overhead RGB-D cameras with AprilTag fiducial tracking at sub-millimeter accuracy, calibrated planar surfaces with characterized friction coefficients, and robot arms with fingertip force sensors for push contact measurement. We support systematic parameter-sweep collection (100-500 pushes per object per hour) and teleoperated goal-directed demonstrations for singulation, sorting, and arrangement tasks. Deliverables include pre/post-push object poses, push parameter annotations, contact force profiles, and per-push state labels — formatted for push prediction networks, Diffusion Policy, RL environments, or custom architectures. Our throughput enables scaling from research datasets (10K-50K pushes) to production-scale corpora (200K+ interactions) on real client objects.
What Is Robotic Pushing and Why Does Data Matter?
Pushing — moving objects by applying forces through sustained contact without grasping — is one of the most fundamental yet underappreciated manipulation skills. Humans push objects constantly: sliding a plate across a table, nudging a box into alignment, sweeping crumbs into a dustpan. For robots, pushing is essential when objects are too large, too flat, or too heavy to grasp, and as a pre-manipulation strategy to singulate cluttered objects or reposition items into graspable configurations. Pushing is also the simplest form of contact-rich manipulation, making it an ideal testbed for learning physics-based manipulation policies.
The physics of pushing are deceptively complex. When a finger pushes an object on a surface, the resulting motion depends on the push point relative to the center of friction, the friction coefficient between the object and surface, the object's mass distribution, and the contact geometry between the finger and object. Lynch and Mason (1996) formalized the mechanics of planar pushing, showing that the push-to-slide transition depends on the ratio of pushing force to normal load and the eccentricity of the push point. A push applied through the center of friction produces pure translation, while eccentric pushes produce coupled translation and rotation — a behavior that is intuitive for humans but requires explicit physics reasoning for robots.
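The coupled translation-rotation behavior described above can be sketched with the ellipsoid limit-surface approximation from Lynch and Mason's quasi-static analysis. This is a minimal illustration, not a calibrated model: the friction limits `f_max` and `tau_max` are placeholder values that would normally come from surface characterization.

```python
import numpy as np

def quasi_static_twist(f, r, f_max=1.0, tau_max=0.05):
    """Approximate twist direction (vx, vy, omega) for a quasi-static point push.

    Ellipsoid limit-surface model: the object's twist is normal to the
    limit surface at the applied wrench. f is the push force (N); r is
    the contact point relative to the center of friction (m). f_max and
    tau_max are illustrative friction limits, not measured values.
    """
    fx, fy = f
    rx, ry = r
    tau = rx * fy - ry * fx  # torque about the center of friction
    # Gradient of (fx^2 + fy^2)/f_max^2 + tau^2/tau_max^2 gives the twist direction
    twist = np.array([fx / f_max**2, fy / f_max**2, tau / tau_max**2])
    return twist / np.linalg.norm(twist)

# Push through the center of friction: zero torque, pure translation (omega = 0)
print(quasi_static_twist(f=(0.0, 1.0), r=(0.0, 0.0)))
# Eccentric push: nonzero torque, coupled translation and rotation (omega != 0)
print(quasi_static_twist(f=(0.0, 1.0), r=(0.05, 0.0)))
```

The sign and magnitude of the third component show why eccentric pushes rotate the object: the farther the contact point sits from the center of friction, the larger the torque term relative to the force terms.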
The MIT Push Dataset (Yu et al., 2016) was the seminal contribution that enabled data-driven push prediction. Containing 250,000 push interactions across 11 object shapes with systematic variation of push parameters (location, angle, velocity), this dataset showed that neural network push predictors trained on real data outperform analytical models by 40% on held-out objects because they implicitly learn material-specific friction behaviors that analytical models must approximate. This result established that real-world push data is essential for accurate push prediction.
Modern applications of push manipulation extend well beyond research benchmarks. In warehouse automation, push-based singulation separates touching objects in cluttered bins to enable individual grasping — Amazon reports that singulation pushes are required for 30-40% of bin picks in typical e-commerce inventory. In manufacturing, push-based fixture loading aligns parts against reference surfaces with sub-millimeter precision. In household robotics, pushing is the primary strategy for clearing surfaces, arranging objects, and operating sliding mechanisms (drawers, doors). Building robust push policies for these diverse applications requires demonstration data that captures the full range of object-surface friction interactions.
Data Requirements by Push Learning Approach
Push manipulation learning ranges from forward-model prediction to goal-conditioned policies. Each approach requires different data structures.
| Approach | Data Volume | Key Modalities | Physics Model | Strengths |
|---|---|---|---|---|
| Forward push prediction (neural) | 50K-500K push interactions | Object pose + push parameters + outcome pose | Learned from data | Accurate single-step prediction; material-aware |
| Model-based push planning | 10K-100K pushes for model fitting | Object shape + friction coefficients + push outcomes | Quasi-static or learned dynamics | Long-horizon planning; interpretable |
| Goal-conditioned push RL | 500K-2M episodes (mostly sim) | RGB-D + goal image/pose + reward signal | Simulator (MuJoCo, PyBullet) | Goal-directed; handles multi-step pushes |
| Behavioral cloning for push manipulation | 1K-10K demonstrated push sequences | RGB + proprioception + push trajectory | Implicit in demonstrations | Captures human strategies; multi-object sequences |
| Sim-to-Real push transfer | 1M+ sim + 5K-20K real for calibration | Sim contact dynamics + real friction calibration | Calibrated simulator | Scalable; diverse object coverage |
State of the Art in Learned Push Manipulation
The MIT Push Dataset (Yu et al., 2016) established the foundational benchmark for data-driven push prediction. Training a neural network on 250,000 push interactions to predict the 3-DoF outcome (Δx, Δy, Δθ), the resulting model achieves 2.3 mm average position error and 1.7-degree average rotation error on held-out pushes — 40% better than the quasi-static analytical model. Importantly, the learned model generalizes to novel objects with only 50-100 calibration pushes per new object, compared to thousands of pushes needed to fit analytical model parameters.
PushNet (Li et al., 2018) extended push prediction to visual observations, training an encoder-decoder network that takes an overhead RGB image and push parameters as input and predicts the post-push object configuration. On the MIT Push Dataset benchmark, PushNet achieves 3.1 mm position error from images alone — only 35% worse than models with ground-truth pose access. This demonstrated that visual push prediction is viable for real-world deployment where precise object pose estimation may not be available.
For multi-step push manipulation, Diffusion Policy (Chi et al., 2023) achieved breakthrough results on the PushT benchmark — a task requiring pushing a T-shaped block to a target configuration. Diffusion Policy achieves 88.2% success rate compared to 72.3% for BC-RNN and 65.1% for IBC, establishing the state of the art for goal-conditioned push manipulation. The advantage comes from modeling the multimodal action distribution inherent in pushing — there are many valid push sequences to reach the same goal, and Diffusion Policy generates diverse, high-quality solutions while other methods collapse to suboptimal averages.
Recent work on foundation models for pushing has shown promising generalization. RT-2 (Brohan et al., 2023) demonstrates zero-shot push manipulation for novel objects when instructed in natural language ('push the can to the left'), achieving 68% success on unseen objects compared to 45% for RT-1. The key insight is that internet-scale pretraining provides implicit physics understanding — the VLM has seen millions of images of objects being pushed and can reason about likely outcomes. However, precise quantitative push prediction still requires task-specific data, as foundation models trade precision for breadth.
Collection Methodology for Pushing Data
Push data collection requires precise tracking of both the pusher (robot finger or tool) and the object before, during, and after each push interaction. The standard setup uses an overhead camera system (RGB or RGB-D) with fiducial markers (AprilTags) on the objects for sub-millimeter pose tracking at 30-60 Hz. The pusher position is recorded from robot joint encoders at 100+ Hz. For systematic push data collection, the workspace should be a flat, uniform surface with known friction properties — anodized aluminum or laminate surfaces provide consistent friction coefficients across the workspace.
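One record per push interaction captures everything the setup above produces. The sketch below is an illustrative schema, not a fixed standard — field names and units are assumptions chosen to match the modalities described in this section.

```python
from dataclasses import dataclass, field

@dataclass
class PushRecord:
    """One push interaction; field names are illustrative, not a fixed schema."""
    object_id: str
    pre_pose: tuple            # (x, y, theta) from AprilTag tracking, pre-push
    post_pose: tuple           # object pose after settling
    push_point: tuple          # contact point on the object boundary (object frame)
    push_angle_deg: float      # angle relative to the object surface normal
    push_speed_mm_s: float
    pusher_trajectory: list = field(default_factory=list)  # (t, x, y) at 100+ Hz
    contact_forces: list = field(default_factory=list)     # optional force profile

# Example record for a single 15-degree, 50 mm/s push
rec = PushRecord("obj_01", (0.0, 0.0, 0.0), (0.012, 0.003, 0.04),
                 (0.03, 0.0), 15.0, 50.0)
```

Keeping pre/post poses and the full pusher trajectory in one record makes the same data usable for forward prediction (pose pairs) and trajectory-conditioned policies (full traces).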
Systematic push data collection varies push parameters across a grid: push location on the object boundary (8-16 evenly spaced points), push angle relative to the object surface normal (0, 15, 30, 45 degrees), and push velocity (10-100 mm/s in 3-5 steps). For each parameter combination, record the full push interaction: pre-push object pose, push start position, push trajectory (position over time), and post-push object pose after settling. This systematic approach produces 100-500 pushes per object in 1-2 hours, with complete coverage of the push parameter space.
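The parameter sweep above is a simple Cartesian product, which can be enumerated directly. A minimal sketch, using illustrative default values drawn from the ranges just described:

```python
from itertools import product

def push_parameter_grid(n_locations=12, angles_deg=(0, 15, 30, 45),
                        speeds_mm_s=(10, 40, 70, 100)):
    """Enumerate a systematic push parameter grid.

    Locations are expressed as fractions of the object perimeter;
    angles and speeds follow the sweep described in the text. All
    defaults are illustrative, not a prescribed protocol.
    """
    locations = [i / n_locations for i in range(n_locations)]
    return list(product(locations, angles_deg, speeds_mm_s))

grid = push_parameter_grid()
print(len(grid))  # 12 locations x 4 angles x 4 speeds = 192 pushes per object
```

At 100-500 pushes per object per hour, one such grid covers an object's push parameter space in well under a collection session.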
For goal-directed push manipulation demonstrations, teleoperation is preferred over systematic collection because it captures human push planning strategies. Operators are presented with a current object configuration and a target configuration (shown as an overlay or separate display) and must push objects to achieve the goal. Record the full push sequence (typically 3-15 pushes for multi-object arrangements) with per-push annotations: push start/end positions, contact duration, object pose change, and distance to goal after each push. Include both successful and unsuccessful demonstrations — failed push sequences where the operator overshoots or creates unrecoverable configurations provide valuable negative signal.
For multi-object push manipulation (singulation, sorting, arrangement), the workspace should contain 5-15 objects in randomized configurations. Operators push objects to achieve specified goals: separate touching objects (singulation), sort objects by category into designated zones, or arrange objects into a target pattern. Annotate per-push: which object was pushed, whether any other objects were displaced by the push (cascading effects), and the post-push state of all objects. Multi-object push data is particularly valuable because it teaches policies about object-object interactions during pushing — a pushed object can collide with and displace neighboring objects, creating complex chain reactions.
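Annotating cascading effects reduces to comparing every object's pre- and post-push pose against a motion threshold. A minimal sketch, with tolerances that are assumptions to be tuned to the tracker's noise floor:

```python
import math

def displaced_objects(pre_poses, post_poses, pos_tol=0.002, rot_tol=0.02):
    """Flag objects whose pose changed beyond tolerance during a push.

    pre_poses/post_poses: {object_id: (x, y, theta)} in meters/radians.
    The 2 mm / ~1 degree tolerances are illustrative defaults; set them
    just above the pose tracker's noise floor in practice.
    """
    moved = []
    for oid, (x0, y0, t0) in pre_poses.items():
        x1, y1, t1 = post_poses[oid]
        if math.hypot(x1 - x0, y1 - y0) > pos_tol or abs(t1 - t0) > rot_tol:
            moved.append(oid)
    return moved

pre = {"a": (0.0, 0.0, 0.0), "b": (0.1, 0.0, 0.0), "c": (0.2, 0.0, 0.0)}
post = {"a": (0.03, 0.0, 0.1), "b": (0.1, 0.005, 0.0), "c": (0.2, 0.0, 0.0)}
print(displaced_objects(pre, post))  # ['a', 'b'] — 'b' was displaced by the cascade
```

Here "a" was the pushed object and "b" moved as a side effect; recording both lets the annotation distinguish the intended push target from cascade displacements.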
Key Datasets for Robotic Pushing
Pushing datasets range from systematically collected single-object interactions to goal-directed multi-step manipulation demonstrations.
| Dataset | Year | Scale | Objects | Key Features | Limitations |
|---|---|---|---|---|---|
| MIT Push Dataset (Yu et al.) | 2016 | 250K push interactions | 11 planar objects (varied shapes) | Systematic parameter variation; sub-mm tracking | Single-push only; no multi-step sequences |
| Omnipush (Bauza et al.) | 2019 | 250K pushes, 250 objects | 3D-printed shapes with varied COM | Large object diversity; controlled friction | Synthetic objects; single surface type |
| PushT (Chi et al.) | 2023 | 200 demonstrations per variant | T-shaped block | Goal-conditioned; multi-step; benchmark for Diffusion Policy | Single object type; 2D workspace only |
| Planar manipulation (Zhou et al.) | 2018 | 100K+ simulated push episodes | Convex 2D shapes in simulation | Long-horizon planning; multi-object | Sim-only; limited real-world validation |
| DROID push subtasks | 2024 | Subset of 76K total episodes | Real household objects | Multi-site collection; diverse environments | Pushing is only a subset; no push-specific annotations |
How Claru Supports Pushing Data Needs
Claru provides pushing data collection for both systematic push physics research and goal-directed manipulation applications. Our push collection stations feature overhead RGB-D cameras with AprilTag fiducial tracking at sub-millimeter accuracy, calibrated planar surfaces with characterized friction coefficients, and robot arms instrumented with fingertip force sensors for measuring push contact forces. We support both systematic parameter-sweep collection (100-500 pushes per object per hour) and teleoperated goal-directed push demonstrations.
We collect pushing data on client-supplied objects across diverse shapes, materials, and weight distributions. For systematic datasets, we vary push location, angle, and velocity across a configurable parameter grid with automated collection protocols. For goal-directed push manipulation, our operators demonstrate multi-step push sequences for singulation, sorting, and arrangement tasks with per-push annotations including contact points, displacement vectors, and goal-distance metrics.
Claru delivers pushing datasets formatted for forward push prediction models, goal-conditioned RL environments, Diffusion Policy training, or custom architectures. Standard deliverables include pre/post-push object poses with sub-millimeter accuracy, push parameter annotations (location, angle, velocity, duration), contact force profiles, and for multi-step demonstrations, per-push state annotations and goal achievement labels. Our collection throughput enables rapid scaling from research-scale datasets (10K-50K pushes) to production-scale corpora (200K+ interactions).
References
- [1] Yu et al. "More Than a Million Ways to Be Pushed: A High-Fidelity Experimental Dataset of Planar Pushing." IROS 2016.
- [2] Bauza et al. "Omnipush: Accurate, Diverse, Real-World Dataset of Pushing Dynamics with RGB-D Video." IROS 2019.
- [3] Chi et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." RSS 2023.
- [4] Li et al. "Push-Net: Deep Planar Pushing for Objects with Unknown Physical Properties." RSS 2018.
- [5] Lynch and Mason. "Stable Pushing: Mechanics, Controllability, and Planning." IJRR 1996.
Frequently Asked Questions
How much pushing data do I need?
For single-object push prediction, the MIT Push Dataset showed that neural models achieve good accuracy with 50,000-100,000 pushes per surface type, though marginal improvements continue up to 250,000. For a new object on a known surface, 50-100 calibration pushes suffice to adapt a pretrained model. For goal-conditioned multi-step push policies, 200-1,000 demonstrated push sequences per task type are needed for behavioral cloning with Diffusion Policy.
Can I train push models in simulation instead of collecting real data?
Simulation is effective for learning push strategies but inaccurate for precise push outcome prediction. The sim-to-real gap for push dynamics is primarily due to friction modeling — real friction is velocity-dependent, anisotropic, and varies with surface wear, while simulators typically use simple Coulomb friction. Simulation-only models achieve 5-8 mm position error versus 2-3 mm for real-data models. Use simulation for RL policy training with coarse physics, then calibrate with 5,000-20,000 real pushes for deployment precision.
What surface properties should I control during collection?
Surface friction is the dominant factor. Collect data on 2-3 surface materials representing your deployment environment (e.g., stainless steel, laminate, rubber mat). Each surface produces different push outcomes for the same push parameters. Also control for surface cleanliness (dust and oil change friction), temperature (rubber friction varies with temperature), and surface wear (new vs. used surfaces). Document surface properties in dataset metadata for reproducibility.
How do I capture rotational push behavior in my dataset?
Rotation is the most challenging aspect of push prediction. A push applied away from the center of friction induces coupled translation-rotation that depends on the object's mass distribution and contact geometry. To capture rotational behavior, vary push points around the full object perimeter (not just one side) and include both centered pushes (pure translation) and eccentric pushes (rotation-inducing). Annotate the center of friction location for each object, which can be estimated from 10-20 calibration pushes at systematically varied push points.
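One way to estimate the center of friction from calibration pushes: near the COF, the induced rotation varies roughly linearly with the push point's offset along the push line, so a line fit to (offset, rotation) pairs gives the zero-rotation point. This linearization is an assumption for illustration, not the method from any of the cited papers.

```python
import numpy as np

def estimate_cof_offset(push_offsets, rotations):
    """Estimate the center-of-friction offset along the push line.

    Assumes rotation is locally linear in the push point offset:
    dtheta ~ a * (d - d_cof). Fit a line to calibration pushes and
    return its zero crossing. Inputs and the linear model are
    illustrative simplifications.
    """
    a, b = np.polyfit(push_offsets, rotations, 1)
    return -b / a  # offset at which a push produces zero rotation

# Synthetic calibration pushes with the true COF offset at d = 0.01 m
d = np.array([-0.04, -0.02, 0.0, 0.02, 0.04])
theta = 2.5 * (d - 0.01) + np.random.default_rng(0).normal(0, 0.002, 5)
print(round(estimate_cof_offset(d, theta), 3))  # close to 0.01
```

With 10-20 pushes instead of the 5 shown here, the fit averages out tracking noise and gives a usable per-object COF annotation.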
Does pushing data transfer across different robot platforms?
Push outcome data (object motion given push parameters) transfers well across platforms because it depends on object-surface physics, not the robot. The key is to parameterize pushes in the object frame (push point, angle, velocity, duration) rather than robot joint space. Push policies that map RGB observations to push parameters also transfer reasonably (70-80% of single-platform performance) because the visual reasoning is robot-agnostic. The main transfer gap is in push execution precision — different robots have different position control accuracy.
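The object-frame parameterization is a 2D rigid-body transform: subtract the object's position and rotate by the negative of its orientation. A minimal sketch with illustrative names:

```python
import math

def push_to_object_frame(push_start, push_dir, object_pose):
    """Re-express a push (world frame) in the pushed object's frame.

    object_pose: (x, y, theta) of the object in the world frame.
    Storing pushes this way makes the data robot-agnostic, since no
    robot joint coordinates appear. Names are illustrative.
    """
    ox, oy, oth = object_pose
    c, s = math.cos(-oth), math.sin(-oth)
    dx, dy = push_start[0] - ox, push_start[1] - oy
    start_obj = (c * dx - s * dy, s * dx + c * dy)  # translate, then rotate
    dir_obj = (c * push_dir[0] - s * push_dir[1],
               s * push_dir[0] + c * push_dir[1])   # directions only rotate
    return start_obj, dir_obj

# Object rotated 90 degrees: a world +x push becomes -y in the object frame
start, direction = push_to_object_frame((1.0, 0.0), (1.0, 0.0),
                                        (0.0, 0.0, math.pi / 2))
```

Any robot that can realize the same object-frame contact point and velocity can replay the push, regardless of its kinematics.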
Get a Custom Quote for Pushing Task Data
Tell us about your push manipulation requirements — object types, surface conditions, and goal specifications — and we will design a data collection plan for your specific application.