Ximeng0831 · 2026 · apache-2.0
Image-Text-Point Cloud Triplets Dataset (CTP)
A multimodal dataset of aligned image, natural language, and 3D LiDAR point cloud triplets curated from nuScenes and KITTI for contrastive tensor pre-training and unified multimodal representation learning.
Downloads: 44
Why This Matters for Physical AI
This dataset enables multimodal representation learning by aligning visual, geometric, and linguistic modalities, which is foundational for embodied AI systems that must understand and interact with 3D environments using diverse sensory inputs.
Technical Profile
- Modalities
- rgb, point_cloud, language
- Environment
- autonomous-driving
- Task Types
- zero-shot-image-classification
- Data Format
- jsonl
- Annotation Types
- language_instructions, bounding_boxes
- License
- apache-2.0
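Since the profile lists jsonl as the data format, the following is a minimal sketch of how one might iterate over triplet records stored one JSON object per line. The field names `image_path`, `point_cloud_path`, and `caption`, as well as the `Triplet` class and `iter_triplets` helper, are hypothetical and chosen only for illustration; the dataset's actual schema is not stated here and should be checked against its own documentation.

```python
import json
from dataclasses import dataclass
from typing import Iterator

# Hypothetical field names; the real CTP schema may differ.
REQUIRED_FIELDS = {"image_path", "point_cloud_path", "caption"}


@dataclass
class Triplet:
    """One aligned image / point cloud / text record (assumed layout)."""
    image_path: str
    point_cloud_path: str
    caption: str


def iter_triplets(jsonl_path: str) -> Iterator[Triplet]:
    """Yield one Triplet per non-empty line of a JSON Lines file."""
    with open(jsonl_path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            missing = REQUIRED_FIELDS - record.keys()
            if missing:
                raise ValueError(
                    f"line {line_no}: missing fields {sorted(missing)}"
                )
            yield Triplet(
                image_path=record["image_path"],
                point_cloud_path=record["point_cloud_path"],
                caption=record["caption"],
            )
```

Reading lazily with a generator keeps memory use flat regardless of file size, and failing on the first malformed record makes a schema mismatch visible early rather than during training.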