Pushing Task Training Data
Pushing datasets for non-prehensile robotic manipulation — planar object sliding, goal-directed push rearrangement, and contact-dynamics demonstrations with friction modeling for training push prediction and planning policies.
Data Requirements
- Modalities: RGB-D + object pose tracking + push parameters + contact force (optional)
- Volume: 10K-500K push interactions or 1K-10K multi-step demonstrations
- Rates: 30-60 Hz object tracking, 100 Hz pusher position, per-push annotations
How Claru Supports This Task
Claru provides pushing data collection for both systematic push physics research and goal-directed manipulation applications. Our stations feature overhead RGB-D cameras with AprilTag fiducial tracking at sub-millimeter accuracy, calibrated planar surfaces with characterized friction coefficients, and robot arms with fingertip force sensors for push contact measurement. We support systematic parameter-sweep collection (100-500 pushes per object per hour) and teleoperated goal-directed demonstrations for singulation, sorting, and arrangement tasks. Deliverables include pre/post-push object poses, push parameter annotations, contact force profiles, and per-push state labels — formatted for push prediction networks, Diffusion Policy, RL environments, or custom architectures. Our throughput enables scaling from research datasets (10K-50K pushes) to production-scale corpora (200K+ interactions) on real client objects.
What Is Robotic Pushing and Why Does Data Matter?
Pushing — moving objects by applying forces through sustained contact without grasping — is one of the most fundamental yet underappreciated manipulation skills. Humans push objects constantly: sliding a plate across a table, nudging a box into alignment, sweeping crumbs into a dustpan. For robots, pushing is essential when objects are too large, too flat, or too heavy to grasp, and as a pre-manipulation strategy to singulate cluttered objects or reposition items into graspable configurations. Pushing is also the simplest form of contact-rich manipulation, making it an ideal testbed for learning physics-based manipulation policies.
The physics of pushing are deceptively complex. When a finger pushes an object on a surface, the resulting motion depends on the push point relative to the center of friction, the friction coefficient between the object and surface, the object's mass distribution, and the contact geometry between the finger and object. Lynch and Mason (1996) formalized the mechanics of planar pushing, showing that the push-to-slide transition depends on the ratio of pushing force to normal load and the eccentricity of the push point. A push applied through the center of friction produces pure translation, while eccentric pushes produce coupled translation and rotation — a behavior that is intuitive for humans but requires explicit physics reasoning for robots.
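The coupled translation-rotation behavior described above can be sketched with the ellipsoid limit-surface approximation from Lynch and Mason's quasi-static analysis. This is a minimal illustration, not a calibrated model: the friction limits `f_max` and `tau_max` are placeholder values that would normally come from surface characterization.

```python
import numpy as np

def quasi_static_twist(f, r, f_max=1.0, tau_max=0.05):
    """Approximate twist direction (vx, vy, omega) for a quasi-static point push.

    Ellipsoid limit-surface model: the object's twist is normal to the
    limit surface at the applied wrench. f is the push force (N); r is
    the contact point relative to the center of friction (m). f_max and
    tau_max are illustrative friction limits, not measured values.
    """
    fx, fy = f
    rx, ry = r
    tau = rx * fy - ry * fx  # torque about the center of friction
    # Gradient of (fx^2 + fy^2)/f_max^2 + tau^2/tau_max^2 gives the twist direction
    twist = np.array([fx / f_max**2, fy / f_max**2, tau / tau_max**2])
    return twist / np.linalg.norm(twist)

# Push through the center of friction: zero torque, pure translation (omega = 0)
print(quasi_static_twist(f=(0.0, 1.0), r=(0.0, 0.0)))
# Eccentric push: nonzero torque, coupled translation and rotation (omega != 0)
print(quasi_static_twist(f=(0.0, 1.0), r=(0.05, 0.0)))
```

The sign and magnitude of the third component show why eccentric pushes rotate the object: the farther the contact point sits from the center of friction, the larger the torque term relative to the force terms.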
The MIT Push Dataset (Yu et al., 2016) was the seminal contribution that enabled data-driven push prediction. Containing 250,000 push interactions across 11 object shapes with systematic variation of push parameters (location, angle, velocity), this dataset showed that neural network push predictors trained on real data outperform analytical models by 40% on held-out objects because they implicitly learn material-specific friction behaviors that analytical models must approximate. This result established that real-world push data is essential for accurate push prediction.
Modern applications of push manipulation extend well beyond research benchmarks. In warehouse automation, push-based singulation separates touching objects in cluttered bins to enable individual grasping — Amazon reports that singulation pushes are required for 30-40% of bin picks in typical e-commerce inventory. In manufacturing, push-based fixture loading aligns parts against reference surfaces with sub-millimeter precision. In household robotics, pushing is the primary strategy for clearing surfaces, arranging objects, and operating sliding mechanisms (drawers, doors). Building robust push policies for these diverse applications requires demonstration data that captures the full range of object-surface friction interactions.
Data Requirements by Push Learning Approach
Push manipulation learning ranges from forward-model prediction to goal-conditioned policies. Each approach requires different data structures.
| Approach | Data Volume | Key Modalities | Physics Model | Strengths |
|---|---|---|---|---|
| Forward push prediction (neural) | 50K-500K push interactions | Object pose + push parameters + outcome pose | Learned from data | Accurate single-step prediction; material-aware |
| Model-based push planning | 10K-100K pushes for model fitting | Object shape + friction coefficients + push outcomes | Quasi-static or learned dynamics | Long-horizon planning; interpretable |
| Goal-conditioned push RL | 500K-2M episodes (mostly sim) | RGB-D + goal image/pose + reward signal | Simulator (MuJoCo, PyBullet) | Goal-directed; handles multi-step pushes |
| Behavioral cloning for push manipulation | 1K-10K demonstrated push sequences | RGB + proprioception + push trajectory | Implicit in demonstrations | Captures human strategies; multi-object sequences |
| Sim-to-Real push transfer | 1M+ sim + 5K-20K real for calibration | Sim contact dynamics + real friction calibration | Calibrated simulator | Scalable; diverse object coverage |
State of the Art in Learned Push Manipulation
The MIT Push Dataset (Yu et al., 2016) established the foundational benchmark for data-driven push prediction. Training a neural network on 250,000 push interactions to predict the 3-DoF outcome (Δx, Δy, Δθ), the resulting model achieves 2.3 mm average position error and 1.7-degree average rotation error on held-out pushes — 40% better than the quasi-static analytical model. Importantly, the learned model generalizes to novel objects with only 50-100 calibration pushes per new object, compared to thousands of pushes needed to fit analytical model parameters.
PushNet (Li et al., 2018) extended push prediction to visual observations, training an encoder-decoder network that takes an overhead RGB image and push parameters as input and predicts the post-push object configuration. On the MIT Push Dataset benchmark, PushNet achieves 3.1 mm position error from images alone — only 35% worse than models with ground-truth pose access. This demonstrated that visual push prediction is viable for real-world deployment where precise object pose estimation may not be available.
For multi-step push manipulation, Diffusion Policy (Chi et al., 2023) achieved breakthrough results on the PushT benchmark — a task requiring pushing a T-shaped block to a target configuration. Diffusion Policy achieves 88.2% success rate compared to 72.3% for BC-RNN and 65.1% for IBC, establishing the state of the art for goal-conditioned push manipulation. The advantage comes from modeling the multimodal action distribution inherent in pushing — there are many valid push sequences to reach the same goal, and Diffusion Policy generates diverse, high-quality solutions while other methods collapse to suboptimal averages.
Recent work on foundation models for pushing has shown promising generalization. RT-2 (Brohan et al., 2023) demonstrates zero-shot push manipulation for novel objects when instructed in natural language ('push the can to the left'), achieving 68% success on unseen objects compared to 45% for RT-1. The key insight is that internet-scale pretraining provides implicit physics understanding — the VLM has seen millions of images of objects being pushed and can reason about likely outcomes. However, precise quantitative push prediction still requires task-specific data, as foundation models trade precision for breadth.
Collection Methodology for Pushing Data
Push data collection requires precise tracking of both the pusher (robot finger or tool) and the object before, during, and after each push interaction. The standard setup uses an overhead camera system (RGB or RGB-D) with fiducial markers (AprilTags) on the objects for sub-millimeter pose tracking at 30-60 Hz. The pusher position is recorded from robot joint encoders at 100+ Hz. For systematic push data collection, the workspace should be a flat, uniform surface with known friction properties — anodized aluminum or laminate surfaces provide consistent friction coefficients across the workspace.
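One record per push interaction captures everything the setup above produces. The sketch below is an illustrative schema, not a fixed standard — field names and units are assumptions chosen to match the modalities described in this section.

```python
from dataclasses import dataclass, field

@dataclass
class PushRecord:
    """One push interaction; field names are illustrative, not a fixed schema."""
    object_id: str
    pre_pose: tuple            # (x, y, theta) from AprilTag tracking, pre-push
    post_pose: tuple           # object pose after settling
    push_point: tuple          # contact point on the object boundary (object frame)
    push_angle_deg: float      # angle relative to the object surface normal
    push_speed_mm_s: float
    pusher_trajectory: list = field(default_factory=list)  # (t, x, y) at 100+ Hz
    contact_forces: list = field(default_factory=list)     # optional force profile

# Example record for a single 15-degree, 50 mm/s push
rec = PushRecord("obj_01", (0.0, 0.0, 0.0), (0.012, 0.003, 0.04),
                 (0.03, 0.0), 15.0, 50.0)
```

Keeping pre/post poses and the full pusher trajectory in one record makes the same data usable for forward prediction (pose pairs) and trajectory-conditioned policies (full traces).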
Systematic push data collection varies push parameters across a grid: push location on the object boundary (8-16 evenly spaced points), push angle relative to the object surface normal (0, 15, 30, 45 degrees), and push velocity (10-100 mm/s in 3-5 steps). For each parameter combination, record the full push interaction: pre-push object pose, push start position, push trajectory (position over time), and post-push object pose after settling. This systematic approach produces 100-500 pushes per object in 1-2 hours, with complete coverage of the push parameter space.
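The parameter sweep above is a simple Cartesian product, which can be enumerated directly. A minimal sketch, using illustrative default values drawn from the ranges just described:

```python
from itertools import product

def push_parameter_grid(n_locations=12, angles_deg=(0, 15, 30, 45),
                        speeds_mm_s=(10, 40, 70, 100)):
    """Enumerate a systematic push parameter grid.

    Locations are expressed as fractions of the object perimeter;
    angles and speeds follow the sweep described in the text. All
    defaults are illustrative, not a prescribed protocol.
    """
    locations = [i / n_locations for i in range(n_locations)]
    return list(product(locations, angles_deg, speeds_mm_s))

grid = push_parameter_grid()
print(len(grid))  # 12 locations x 4 angles x 4 speeds = 192 pushes per object
```

At 100-500 pushes per object per hour, one such grid covers an object's push parameter space in well under a collection session.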
For goal-directed push manipulation demonstrations, teleoperation is preferred over systematic collection because it captures human push planning strategies. Operators are presented with a current object configuration and a target configuration (shown as an overlay or separate display) and must push objects to achieve the goal. Record the full push sequence (typically 3-15 pushes for multi-object arrangements) with per-push annotations: push start/end positions, contact duration, object pose change, and distance to goal after each push. Include both successful and unsuccessful demonstrations — failed push sequences where the operator overshoots or creates unrecoverable configurations provide valuable negative signal.
For multi-object push manipulation (singulation, sorting, arrangement), the workspace should contain 5-15 objects in randomized configurations. Operators push objects to achieve specified goals: separate touching objects (singulation), sort objects by category into designated zones, or arrange objects into a target pattern. Annotate per-push: which object was pushed, whether any other objects were displaced by the push (cascading effects), and the post-push state of all objects. Multi-object push data is particularly valuable because it teaches policies about object-object interactions during pushing — a pushed object can collide with and displace neighboring objects, creating complex chain reactions.
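Annotating cascading effects reduces to comparing every object's pre- and post-push pose against a motion threshold. A minimal sketch, with tolerances that are assumptions to be tuned to the tracker's noise floor:

```python
import math

def displaced_objects(pre_poses, post_poses, pos_tol=0.002, rot_tol=0.02):
    """Flag objects whose pose changed beyond tolerance during a push.

    pre_poses/post_poses: {object_id: (x, y, theta)} in meters/radians.
    The 2 mm / ~1 degree tolerances are illustrative defaults; set them
    just above the pose tracker's noise floor in practice.
    """
    moved = []
    for oid, (x0, y0, t0) in pre_poses.items():
        x1, y1, t1 = post_poses[oid]
        if math.hypot(x1 - x0, y1 - y0) > pos_tol or abs(t1 - t0) > rot_tol:
            moved.append(oid)
    return moved

pre = {"a": (0.0, 0.0, 0.0), "b": (0.1, 0.0, 0.0), "c": (0.2, 0.0, 0.0)}
post = {"a": (0.03, 0.0, 0.1), "b": (0.1, 0.005, 0.0), "c": (0.2, 0.0, 0.0)}
print(displaced_objects(pre, post))  # ['a', 'b'] — 'b' was displaced by the cascade
```

Here "a" was the pushed object and "b" moved as a side effect; recording both lets the annotation distinguish the intended push target from cascade displacements.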
Key Datasets for Robotic Pushing
Pushing datasets range from systematically collected single-object interactions to goal-directed multi-step manipulation demonstrations.
| Dataset | Year | Scale | Objects | Key Features | Limitations |
|---|---|---|---|---|---|
| MIT Push Dataset (Yu et al.) | 2016 | 250K push interactions | 11 planar objects (varied shapes) | Systematic parameter variation; sub-mm tracking | Single-push only; no multi-step sequences |
| Omnipush (Bauza et al.) | 2019 | 250K pushes, 250 objects | 3D-printed shapes with varied COM | Large object diversity; controlled friction | Synthetic objects; single surface type |
| PushT (Chi et al.) | 2023 | 200 demonstrations per variant | T-shaped block | Goal-conditioned; multi-step; benchmark for Diffusion Policy | Single object type; 2D workspace only |
| Planar manipulation (Zhou et al.) | 2018 | 100K+ simulated push episodes | Convex 2D shapes in simulation | Long-horizon planning; multi-object | Sim-only; limited real-world validation |
| DROID push subtasks | 2024 | Subset of 76K total episodes | Real household objects | Multi-site collection; diverse environments | Pushing is only a subset; no push-specific annotations |
How Claru Supports Pushing Data Needs
Claru provides pushing data collection for both systematic push physics research and goal-directed manipulation applications. Our push collection stations feature overhead RGB-D cameras with AprilTag fiducial tracking at sub-millimeter accuracy, calibrated planar surfaces with characterized friction coefficients, and robot arms instrumented with fingertip force sensors for measuring push contact forces. We support both systematic parameter-sweep collection (100-500 pushes per object per hour) and teleoperated goal-directed push demonstrations.
We collect pushing data on client-supplied objects across diverse shapes, materials, and weight distributions. For systematic datasets, we vary push location, angle, and velocity across a configurable parameter grid with automated collection protocols. For goal-directed push manipulation, our operators demonstrate multi-step push sequences for singulation, sorting, and arrangement tasks with per-push annotations including contact points, displacement vectors, and goal-distance metrics.
Claru delivers pushing datasets formatted for forward push prediction models, goal-conditioned RL environments, Diffusion Policy training, or custom architectures. Standard deliverables include pre/post-push object poses with sub-millimeter accuracy, push parameter annotations (location, angle, velocity, duration), contact force profiles, and for multi-step demonstrations, per-push state annotations and goal achievement labels. Our collection throughput enables rapid scaling from research-scale datasets (10K-50K pushes) to production-scale corpora (200K+ interactions).
References
- [1] Yu et al. "More Than a Million Ways to Be Pushed: A High-Fidelity Experimental Dataset of Planar Pushing." IROS 2016.
- [2] Bauza et al. "Omnipush: Accurate, Diverse, Real-World Dataset of Pushing Dynamics with RGB-D Video." IROS 2019.
- [3] Chi et al. "Diffusion Policy: Visuomotor Policy Learning via Action Diffusion." RSS 2023.
- [4] Li et al. "Push-Net: Deep Planar Pushing for Objects with Unknown Physical Properties." RSS 2018.
- [5] Lynch and Mason. "Stable Pushing: Mechanics, Controllability, and Planning." IJRR 1996.
Frequently Asked Questions
How much pushing data do I need?
For single-object push prediction, the MIT Push Dataset showed that neural models achieve good accuracy with 50,000-100,000 pushes per surface type, though marginal improvements continue up to 250,000. For a new object on a known surface, 50-100 calibration pushes suffice to adapt a pretrained model. For goal-conditioned multi-step push policies, 200-1,000 demonstrated push sequences per task type are needed for behavioral cloning with Diffusion Policy.
Can I train push models in simulation instead of collecting real data?
Simulation is effective for learning push strategies but inaccurate for precise push outcome prediction. The sim-to-real gap for push dynamics is primarily due to friction modeling — real friction is velocity-dependent, anisotropic, and varies with surface wear, while simulators typically use simple Coulomb friction. Simulation-only models achieve 5-8 mm position error versus 2-3 mm for real-data models. Use simulation for RL policy training with coarse physics, then calibrate with 5,000-20,000 real pushes for deployment precision.
What surface properties should I control during collection?
Surface friction is the dominant factor. Collect data on 2-3 surface materials representing your deployment environment (e.g., stainless steel, laminate, rubber mat). Each surface produces different push outcomes for the same push parameters. Also control for surface cleanliness (dust and oil change friction), temperature (rubber friction varies with temperature), and surface wear (new vs. used surfaces). Document surface properties in dataset metadata for reproducibility.
How do I capture rotational push behavior in my dataset?
Rotation is the most challenging aspect of push prediction. A push applied away from the center of friction induces coupled translation-rotation that depends on the object's mass distribution and contact geometry. To capture rotational behavior, vary push points around the full object perimeter (not just one side) and include both centered pushes (pure translation) and eccentric pushes (rotation-inducing). Annotate the center of friction location for each object, which can be estimated from 10-20 calibration pushes at systematically varied push points.
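One way to estimate the center of friction from calibration pushes: near the COF, the induced rotation varies roughly linearly with the push point's offset along the push line, so a line fit to (offset, rotation) pairs gives the zero-rotation point. This linearization is an assumption for illustration, not the method from any of the cited papers.

```python
import numpy as np

def estimate_cof_offset(push_offsets, rotations):
    """Estimate the center-of-friction offset along the push line.

    Assumes rotation is locally linear in the push point offset:
    dtheta ~ a * (d - d_cof). Fit a line to calibration pushes and
    return its zero crossing. Inputs and the linear model are
    illustrative simplifications.
    """
    a, b = np.polyfit(push_offsets, rotations, 1)
    return -b / a  # offset at which a push produces zero rotation

# Synthetic calibration pushes with the true COF offset at d = 0.01 m
d = np.array([-0.04, -0.02, 0.0, 0.02, 0.04])
theta = 2.5 * (d - 0.01) + np.random.default_rng(0).normal(0, 0.002, 5)
print(round(estimate_cof_offset(d, theta), 3))  # close to 0.01
```

With 10-20 pushes instead of the 5 shown here, the fit averages out tracking noise and gives a usable per-object COF annotation.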
Does pushing data transfer across different robot platforms?
Push outcome data (object motion given push parameters) transfers well across platforms because it depends on object-surface physics, not the robot. The key is to parameterize pushes in the object frame (push point, angle, velocity, duration) rather than robot joint space. Push policies that map RGB observations to push parameters also transfer reasonably (70-80% of single-platform performance) because the visual reasoning is robot-agnostic. The main transfer gap is in push execution precision — different robots have different position control accuracy.
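The object-frame parameterization is a 2D rigid-body transform: subtract the object's position and rotate by the negative of its orientation. A minimal sketch with illustrative names:

```python
import math

def push_to_object_frame(push_start, push_dir, object_pose):
    """Re-express a push (world frame) in the pushed object's frame.

    object_pose: (x, y, theta) of the object in the world frame.
    Storing pushes this way makes the data robot-agnostic, since no
    robot joint coordinates appear. Names are illustrative.
    """
    ox, oy, oth = object_pose
    c, s = math.cos(-oth), math.sin(-oth)
    dx, dy = push_start[0] - ox, push_start[1] - oy
    start_obj = (c * dx - s * dy, s * dx + c * dy)  # translate, then rotate
    dir_obj = (c * push_dir[0] - s * push_dir[1],
               s * push_dir[0] + c * push_dir[1])   # directions only rotate
    return start_obj, dir_obj

# Object rotated 90 degrees: a world +x push becomes -y in the object frame
start, direction = push_to_object_frame((1.0, 0.0), (1.0, 0.0),
                                        (0.0, 0.0, math.pi / 2))
```

Any robot that can realize the same object-frame contact point and velocity can replay the push, regardless of its kinematics.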
Get a Custom Quote for Pushing Task Data
Tell us about your push manipulation requirements — object types, surface conditions, and goal specifications — and we will design a data collection plan for your specific application.