huggingworld
cc-by-4.0
TheoremQA and Multi-Domain Reasoning Dataset
A multi-domain text retrieval and reasoning dataset spanning biology, earth science, economics, psychology, robotics, and other fields, with reasoning annotations from multiple LLM sources.
Downloads: 5
Why This Matters for Physical AI
While primarily a language and reasoning dataset, the robotics subset may support training language models for robotic task understanding and planning.
Technical Profile
- Modalities
  - language
- Task Types
  - text-retrieval
- Annotation Types
  - language_instructions
  - reasoning
- License
  - cc-by-4.0
Access