Ximeng0831 · 2026 · apache-2.0

Image-Text-Point Cloud Triplets Dataset (CTP)

A multimodal dataset of aligned image, natural language, and 3D LiDAR point cloud triplets curated from nuScenes and KITTI for contrastive tensor pre-training and unified multimodal representation learning.

Downloads: 44

Why This Matters for Physical AI

This dataset aligns RGB images, LiDAR point clouds, and text, so a model can learn a single shared representation across visual, geometric, and linguistic inputs. That alignment is foundational for embodied AI systems that must understand and interact with 3D environments using several kinds of sensor data.

Technical Profile

Modalities: rgb, point_cloud, language
Environment: autonomous-driving
Task Types: zero-shot-image-classification
Data Format: jsonl
Annotation Types: language_instructions, bounding_boxes
License: apache-2.0
Part of the Image-Text-Point Cloud Triplets Dataset (CTP) family
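
Since the listing gives the data format as jsonl, a minimal sketch of reading triplet records might look like the following. The field names `image`, `text`, and `point_cloud` are assumptions made for illustration, not the dataset's documented schema, and the sample lines are made up; check the actual files before relying on any of these keys.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Triplet:
    image_path: str        # hypothetical field: path to the RGB image
    text: str              # hypothetical field: natural-language description
    point_cloud_path: str  # hypothetical field: path to the LiDAR point cloud


def read_triplets(lines: Iterable[str]) -> Iterator[Triplet]:
    """Parse JSONL text (one JSON object per line) into Triplet records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield Triplet(
            image_path=record["image"],
            text=record["text"],
            point_cloud_path=record["point_cloud"],
        )


# Made-up sample standing in for a few lines of a triplet file.
sample = [
    '{"image": "img/0001.jpg", "text": "a car", "point_cloud": "pc/0001.bin"}',
    '',
    '{"image": "img/0002.jpg", "text": "a pedestrian", "point_cloud": "pc/0002.bin"}',
]
triplets = list(read_triplets(sample))
print(len(triplets), triplets[0].text)
```

Because `read_triplets` accepts any iterable of strings, an open file handle can be passed in directly, which keeps memory use flat for large files.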
