huggingworld · cc-by-4.0

TheoremQA and Multi-Domain Reasoning Dataset

A multi-domain text retrieval and reasoning dataset spanning biology, earth science, economics, psychology, robotics, and other fields, with reasoning annotations from multiple LLM sources.

Downloads: 5

Why This Matters for Physical AI

While primarily a language and reasoning dataset, the robotics subset may support training language models for robotic task understanding and planning.
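As a rough illustration of how a single domain such as robotics might be pulled out of a multi-domain collection, here is a minimal sketch in plain Python. The field names `domain`, `query`, and `reasoning`, the placeholder records, and the helper `select_domain` are all assumptions made for this example; the dataset's actual schema is not documented here and may differ.

```python
from typing import Dict, Iterable, List

# Placeholder records. The keys "domain", "query", and "reasoning" are
# hypothetical and only stand in for whatever schema the dataset really uses.
SAMPLE_RECORDS: List[Dict[str, str]] = [
    {"domain": "biology", "query": "placeholder biology query", "reasoning": "placeholder"},
    {"domain": "robotics", "query": "placeholder robotics query", "reasoning": "placeholder"},
    {"domain": "economics", "query": "placeholder economics query", "reasoning": "placeholder"},
]


def select_domain(records: Iterable[Dict[str, str]], domain: str) -> List[Dict[str, str]]:
    """Return only the records whose (assumed) 'domain' field matches."""
    return [record for record in records if record.get("domain") == domain]


robotics_subset = select_domain(SAMPLE_RECORDS, "robotics")
```

The same filter would apply to any other domain label, provided the real records expose a comparable field.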

Technical Profile

Modalities: language
Task Types: text-retrieval
Annotation Types: language_instructions, reasoning
License: cc-by-4.0
Part of the TheoremQA family

Need custom language data?

Claru builds purpose-built datasets for any environment or application, with dense human annotations and quality assurance.
