FYP Stage 2 — VLA Pre-training Subset

Name: FYP Stage 2 — VLA Pre-training Subset
Creator: shreethar
License: other
Keywords: rgb, language, trajectory_prediction, affordance_detection, task_planning, visual_question_answering, image_captioning, failure_analysis, simulation

A compact ~90 GB multi-source dataset for Vision-Language-Action (VLA) pre-training, aggregating 8 upstream robotics sources with pre-materialized 448×448 images and inline storage for single-call dataset loading.
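The single-call loading mentioned above might look roughly like the sketch below, which uses the HuggingFace `datasets` library. The repository ID and the `train` split name are not given on this page, so both are left as caller-supplied assumptions; `load_vla_subset` and `raw_rgb_bytes` are hypothetical helper names introduced only for illustration. Streaming is suggested here so that the full ~90 GB is not downloaded before the first sample can be inspected.

```python
def load_vla_subset(repo_id: str, split: str = "train", streaming: bool = True):
    """Load the dataset with one call to `datasets.load_dataset`.

    `repo_id` must be supplied by the caller; this page does not state it.
    The split name "train" is an assumption, not confirmed by the source.
    """
    # Imported lazily so the helper below works without `datasets` installed.
    from datasets import load_dataset  # pip install datasets

    # streaming=True yields samples lazily instead of downloading everything.
    return load_dataset(repo_id, split=split, streaming=streaming)


def raw_rgb_bytes(width: int = 448, height: int = 448, channels: int = 3) -> int:
    """Uncompressed size in bytes of one image, assuming 8 bits per channel."""
    return width * height * channels


# Example usage (requires network access and the real repository ID):
#   ds = load_vla_subset("<creator>/<dataset-name>")
#   first = next(iter(ds))
#   print(first.keys())
```

Under the 8-bit RGB assumption, one decoded 448×448 image occupies 602,112 bytes in memory, which can help when choosing a batch size or a shuffle buffer size for a streamed dataset.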

Downloads: 230

Technical Profile

Modalities: rgb, language
Environment: simulation
Task Types: trajectory_prediction, affordance_detection, task_planning, visual_question_answering, image_captioning, failure_analysis
Data Format: HuggingFace Datasets
License: other

Part of the FYP Stage 2 — VLA Pre-training Subset family

Related Datasets

OmniAction

A large-scale multimodal dataset for proactive robot manipulation with 141,162 episodes covering contextual instruction following through spoken dialogue, environmental sounds, and visual cues. The dataset includes 5,096 distinct speaker timbres, 2,482 non-verbal sound events, and 640 environmental backgrounds across six categories of contextual instructions.

rgb, audio, language

104K downloads | Apr 2026 | cc-by-nc-4.0

OmniAction

A large-scale multimodal dataset for proactive robot manipulation comprising 141,162 episodes with cross-modal contextual instructions derived from spoken dialogue, environmental sounds, and visual cues rather than explicit commands.

rgb, audio, language

98K downloads | Mar 2026 | cc-by-nc-4.0

Xperience-10M

A large-scale egocentric multimodal dataset of human experience containing 10 million interactions and 10,000 hours of synchronized first-person recordings with six video streams, audio, stereo depth, camera pose, hand mocap, full-body mocap, IMU, and hierarchical language annotations for embodied AI, robotics, and world modeling research.

rgb, audio, depth, proprioception, +3 more

89K downloads | Apr 2026 | other

ABC-130k

The largest open-source robot teleoperation dataset containing bimanual manipulation trajectories collected on two-arm YAM stations with 130,822 episodes across 3,555 hours of data.

rgb, proprioception

87K downloads | Jun 2026 | apache-2.0

OmniAction

A large-scale multimodal dataset for proactive robot manipulation with 141,162 episodes covering contextual instruction following through spoken dialogue, environmental sounds, and visual cues.

rgb, audio, language

86K downloads | Apr 2026 | cc-by-nc-4.0

Hy-Embodied-0.5-VLA-Data

A large-scale bimanual manipulation dataset with 2,163 hours of high-fidelity demonstrations collected via a custom fingertip UMI device with optical motion capture, spanning 70+ manipulation tasks for training Vision-Language-Action foundation models.

rgb, proprioception, language

76K downloads | Jun 2026 | cc-by-4.0