
WoW-1 Benchmark Samples

Name: WoW-1 Benchmark Samples
Creator: X-Humanoid
License: MIT
Keywords: language, manipulation, object_manipulation, action_generation

Official evaluation dataset for the World-Omniscient World Model project. It contains 612 natural-language prompts describing real-world robot interaction tasks and is designed to assess the physical consistency and causal reasoning of generative world models for robotics and embodied AI.

Downloads: 576
Likes: 1

Technical Profile

Modalities: language
Task Types: manipulation, object_manipulation, action_generation
Data Format: JSON / Parquet
License: MIT
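Because the card lists JSON / Parquet as the data format and states a size of 612 prompts, a quick sanity check after downloading is to load the prompts and compare the count. The sketch below is a minimal, stdlib-only example under stated assumptions: the field name "prompt" is hypothetical (the card does not document the schema), and the file is assumed to be either a single JSON array of records or JSON Lines. Check the actual files before relying on it.

```python
import json
from pathlib import Path

EXPECTED_PROMPTS = 612  # size stated on the dataset card


def load_prompts(path, field="prompt"):
    """Load prompt strings from a JSON array file or a JSON Lines file.

    Assumption: every record is a JSON object holding its text under `field`.
    The default name "prompt" is hypothetical; verify it against the real schema.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    if text.startswith("["):
        # Whole file is one JSON array of records.
        records = json.loads(text)
    else:
        # JSON Lines: one record per non-empty line.
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return [record[field] for record in records]


def check_count(prompts, expected=EXPECTED_PROMPTS):
    """Return True if the number of loaded prompts matches the expected size."""
    return len(prompts) == expected
```

For the Parquet variant, `pandas.read_parquet(path)` (which needs pyarrow or fastparquet installed) returns a DataFrame from which the prompt column can be taken with `.tolist()`, again assuming a column with that name exists.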

Part of the WoW-1 Benchmark Samples family

Community Signals

Top 50% by downloads

Access

Available on Hugging Face.


Related Datasets

OmniAction

A large-scale multimodal dataset for proactive robot manipulation comprising 141,162 episodes across 112 skills and 748 objects, enriched with audio, visual, and contextual instruction data for cross-modal intention recognition.

rgb, audio, language

145K downloads · Mar 2026 · cc-by-nc-4.0

Xperience-10M

A large-scale egocentric multimodal dataset of human experience containing 10 million interactions and 10,000 hours of synchronized first-person recordings with six video streams, audio, stereo depth, camera pose, hand mocap, full-body mocap, IMU, and hierarchical language annotations for embodied AI, robotics, and world modeling research.

rgb, audio, depth, proprioception, +3 more

127K downloads · Apr 2026 · other

OmniAction

A large-scale multimodal dataset for proactive robot manipulation comprising 141,162 episodes with cross-modal contextual instructions derived from spoken dialogue, environmental sounds, and visual cues rather than explicit commands.

rgb, audio, language

71K downloads · Mar 2026 · cc-by-nc-4.0

OmniAction

A large-scale multimodal dataset for proactive robot manipulation with 141,162 episodes covering contextual instruction following through spoken dialogue, environmental sounds, and visual cues.

rgb, audio, language

48K downloads · Apr 2026 · cc-by-nc-4.0

OmniAction

A large-scale multimodal dataset for proactive robot manipulation with 141,162 episodes covering contextual instruction following through spoken dialogue, environmental sounds, and visual cues. The dataset includes 5,096 distinct speaker timbres, 2,482 non-verbal sound events, and 640 environmental backgrounds across six categories of contextual instructions.

rgb, audio, language

32K downloads · Apr 2026 · cc-by-nc-4.0

ManiTwin-100K: Manipulation-Ready Digital Object Twins

A large-scale dataset of 100K manipulation-ready digital object twins with simulation-ready 3D meshes, physical properties, functional point annotations, grasp configurations, and language descriptions validated through physics-based simulation.

3d_mesh, language

22K downloads · Apr 2026 · cc-by-nc-4.0