oier-mees | 2025 | mit
FuSe
FuSe contains 26,866 trajectories collected on a WidowX robot with visual, tactile, sound, and action data across multiple environments, annotated with natural language instructions.
Downloads: 2K
Episodes: 26,866
Likes: 4
Why This Matters for Physical AI
FuSe pairs heterogeneous sensor modalities (vision, touch, sound) with natural language instructions, which supports training generalizable robot policies through language grounding and makes it a useful resource for multi-modal learning in physical AI.
Technical Profile
- Modalities: rgb, tactile, audio, proprioception
- Robot Embodiments: WidowX
- Environment: lab
- Task Types: manipulation
- Episodes: 26,866
- Annotation Types: language_instructions
- License: MIT
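To make the profile above concrete, here is a minimal Python sketch of how one language-annotated trajectory with these modalities might be represented in memory. It is not the dataset's actual schema or loader: the class names `Step` and `Episode`, the helper `missing_modalities`, every field name and shape, and the example instruction are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Modalities as listed in the technical profile.
MODALITIES = ("rgb", "tactile", "audio", "proprioception")


@dataclass
class Step:
    """One timestep. All field names and shapes are hypothetical."""
    rgb: Sequence[float]             # flattened camera frame (placeholder)
    tactile: Sequence[float]         # flattened tactile reading (placeholder)
    audio: Sequence[float]           # audio samples for this step (placeholder)
    proprioception: Sequence[float]  # robot state (placeholder)
    action: Sequence[float]          # commanded action (placeholder)


@dataclass
class Episode:
    """A trajectory paired with its natural language instruction."""
    language_instruction: str
    steps: List[Step]


def missing_modalities(episode: Episode) -> List[str]:
    """Return the modalities that are empty in at least one step."""
    return [
        name for name in MODALITIES
        if any(len(getattr(step, name)) == 0 for step in episode.steps)
    ]


# Made-up example: a single step whose audio reading is empty.
example = Episode(
    language_instruction="pick up the object",
    steps=[Step(rgb=[0.0], tactile=[0.0], audio=[],
                proprioception=[0.0], action=[0.0])],
)
print(missing_modalities(example))
```

A check like `missing_modalities` can be useful before training a multi-modal policy, since a step with an empty sensor stream usually needs to be filtered or padded first.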
Community Signals
Top 25% by downloads
HuggingFace Discussions: 1