Point Cloud Indoor Dataset
Dense indoor point cloud scans with semantic annotations for training 3D scene understanding and indoor navigation models. 15K+ scans across 500+ rooms with per-point semantic labels, instance segmentation, and room layout annotations.
Why Indoor Point Cloud Data Matters for Robotics
Indoor robots -- home assistants, office delivery bots, cleaning robots, retail inventory systems, and facility management platforms -- must build and reason about 3D representations of their environments. Point clouds provide the richest 3D representation available: millions of measured 3D points that capture room geometry, furniture placement, object locations, and navigable space with centimeter-level precision. Unlike 2D images or even depth maps, point clouds represent the full 3D structure of a scene, enabling robots to plan paths around furniture, reach for objects on shelves, and understand spatial relationships that are ambiguous in 2D projections.
Training 3D perception models on point cloud data is fundamentally different from training on images. Point clouds are unordered, irregular, and variable in density -- a nearby wall produces dense points while a distant corner may have sparse coverage. Models must be invariant to point ordering (PointNet), capture local geometric patterns (PointNet++, KPConv), or operate on voxelized representations (MinkowskiNet, SparseConvNet). Each approach requires training data with per-point annotations that accurately reflect the 3D semantic structure of real indoor environments.
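The order-invariance requirement mentioned above can be made concrete with a small sketch. This is not any specific library's implementation, just a PointNet-style pattern: apply a shared per-point function, then aggregate with a symmetric operator (max-pooling) so the result does not depend on point order. The `per_point_feature` map is a stand-in for a learned MLP.

```python
# Sketch: why point-cloud encoders use symmetric aggregation.
# A PointNet-style encoder applies a shared per-point function, then an
# order-independent reduction such as channel-wise max-pooling.
import random

def per_point_feature(p):
    # Stand-in for a shared learned MLP applied to each point.
    x, y, z = p
    return (x + 2 * y, y * z, x * x + z)

def global_feature(points):
    feats = [per_point_feature(p) for p in points]
    # Symmetric aggregation: channel-wise max over all points.
    return tuple(max(f[i] for f in feats) for i in range(3))

cloud = [(0.1, 0.2, 0.3), (1.0, -0.5, 0.2), (0.4, 0.9, -0.1)]
shuffled = cloud[:]
random.shuffle(shuffled)
assert global_feature(cloud) == global_feature(shuffled)  # order-invariant
```

Max-pooling is one choice; sum or mean pooling are equally symmetric, and hierarchical variants (PointNet++) apply the same idea within local neighborhoods.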
Existing indoor point cloud datasets like ScanNet and S3DIS have driven significant progress in 3D scene understanding research, but they are limited in scale and diversity. ScanNet contains 1,513 scans of primarily academic and residential rooms. S3DIS covers 6 areas of a single university building. Real-world indoor robots encounter far more diverse environments: homes with varied architectural styles, offices with different furniture systems, retail stores with diverse shelving, hospitals with specialized equipment, hotels, restaurants, gyms, libraries, and more. Claru's indoor point cloud dataset captures 500+ rooms across 8+ room types with dense semantic annotations.
Research from CVPR 2024 and 3DV 2024 demonstrates that 3D scene understanding models trained on diverse indoor environments achieve 25-35% better generalization to novel room types compared to models trained on single-building datasets, with the improvement driven by exposure to varied furniture styles, room scales, and architectural configurations that build robust feature representations.
Sensor Configuration and Collection Methodology
Primary scanning uses a Leica BLK360 G2 terrestrial laser scanner (360-degree coverage, up to 360K points/second, range accuracy +/-4mm at 10m, integrated HDR panoramic camera for colorized point clouds). Each room receives 2-4 scan positions to ensure complete coverage with no shadow zones behind furniture. Individual scans are registered into a unified coordinate frame using ICP alignment with a target registration error below 5mm.
Supplementary scanning with handheld LiDAR (Ouster OS0-32, carried through the space) provides dynamic scan sequences for training SLAM algorithms. The handheld sequences capture the temporal progression of map building as a robot would experience it -- starting from an unknown room and progressively discovering the space through exploration. Intel RealSense D455 cameras co-captured with the handheld sequences provide aligned RGB-D frames for training multi-modal 3D perception systems.
The dataset spans 8+ indoor environment types: residential living spaces (apartments, houses, studios), commercial offices (open plan, private offices, conference rooms), retail spaces (stores, showrooms), hospitality (hotel rooms, lobbies, restaurants), healthcare (patient rooms, waiting areas, clinics), educational (classrooms, labs, libraries), fitness (gyms, studios), and industrial (workshops, server rooms). Each environment type includes 50+ rooms with varied layouts, furniture configurations, and architectural styles to ensure diversity within categories.
Environmental metadata per scan includes room type, approximate floor area, ceiling height, number of distinct furniture items, dominant materials (wood, carpet, tile, concrete), lighting type (natural, fluorescent, LED), and clutter level (minimal, moderate, dense). For residential spaces, architectural style (modern, traditional, industrial, Scandinavian) is documented. This metadata enables researchers to study how 3D perception performance varies with environment characteristics and to build room-type-aware models.
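One way to use this metadata is to slice the corpus before training. The sketch below assumes the metadata is exposed as per-scan records with field names mirroring the list above; the records, values, and `select()` helper are hypothetical.

```python
# Hypothetical per-scan metadata records with fields from the schema above.
scans = [
    {"id": "scan_0001", "room_type": "office", "floor_area_m2": 32.5,
     "ceiling_height_m": 2.7, "clutter_level": "moderate"},
    {"id": "scan_0002", "room_type": "residential", "floor_area_m2": 18.0,
     "ceiling_height_m": 2.4, "clutter_level": "dense"},
    {"id": "scan_0003", "room_type": "office", "floor_area_m2": 75.0,
     "ceiling_height_m": 3.1, "clutter_level": "minimal"},
]

def select(scans, room_type=None, min_area=0.0):
    # Filter scans by room type and minimum floor area.
    return [s for s in scans
            if (room_type is None or s["room_type"] == room_type)
            and s["floor_area_m2"] >= min_area]

large_offices = select(scans, room_type="office", min_area=50.0)
assert [s["id"] for s in large_offices] == ["scan_0003"]
```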
Comparison with Public Datasets
How Claru's point cloud indoor dataset compares to publicly available alternatives for 3D indoor scene understanding.
| Dataset | Scans | Hours | Modalities | Environments | Annotations |
|---|---|---|---|---|---|
| ScanNet (CVPR 2017) | 1,513 scans | N/A | RGB-D (Kinect) | ~700 rooms (academic/residential) | 20 semantic classes, instances |
| S3DIS (CVPR 2016) | ~270 scans | N/A | Point cloud (Matterport) | 1 university building | 13 semantic classes |
| Matterport3D (3DV 2017) | ~10K views | N/A | RGB-D, mesh | 90 buildings | 40 semantic classes, instances |
| Claru Point Cloud Indoor | 15K+ scans | 150+ scan hours | LiDAR, RGB-D, color PC | 500+ rooms, 8+ types | 40+ classes, instances, layout, nav, affordance |
Annotation Pipeline and Quality Assurance
Stage one automated processing generates: colorized point clouds from the BLK360 scans (RGB color transferred from the integrated panoramic camera), floor plane detection and room boundary extraction, initial semantic segmentation using pre-trained Mask3D, and navigable space estimation from the floor plane with obstacle margins. Point cloud density is normalized to ensure consistent annotation quality across rooms scanned from different distances.
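The floor-plane detection step can be illustrated with a simplified RANSAC. This sketch assumes the scan is already gravity-aligned, so it only searches for the dominant horizontal plane z = z0; production pipelines fit arbitrary plane normals. All parameters are illustrative.

```python
# Simplified RANSAC floor detection: hypothesize a horizontal plane at a
# sampled point's height, count inliers, keep the best hypothesis.
import random

def detect_floor(points, threshold=0.02, iterations=100, seed=0):
    rng = random.Random(seed)
    best_z, best_inliers = None, -1
    for _ in range(iterations):
        z0 = rng.choice(points)[2]  # hypothesized floor height
        inliers = sum(1 for p in points if abs(p[2] - z0) < threshold)
        if inliers > best_inliers:
            best_z, best_inliers = z0, inliers
    return best_z

# Synthetic room: 200 floor points near z=0, 50 scattered furniture points.
rng = random.Random(1)
floor = [(rng.random(), rng.random(), rng.gauss(0.0, 0.005)) for _ in range(200)]
clutter = [(rng.random(), rng.random(), rng.uniform(0.2, 2.5)) for _ in range(50)]
floor_z = detect_floor(floor + clutter)
assert abs(floor_z) < 0.02  # recovers the floor height, not the clutter
```

The winning hypothesis is the plane supported by the most points, which in a typical room is the floor; room-boundary extraction then proceeds from the detected plane.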
Stage two human annotation adds per-point semantic labels across 40+ indoor object categories following a taxonomy compatible with ScanNet but extended with additional classes for commercial and hospitality environments: furniture (tables, chairs, desks, beds, sofas, shelving), fixtures (sinks, toilets, light fixtures, HVAC vents), electronics (monitors, TVs, printers), architectural elements (walls, floors, ceilings, doors, windows, columns, stairs), and room-specific items (kitchen appliances, bathroom fixtures, retail displays, gym equipment). Instance segmentation separates individual objects within each class.
Stage three adds higher-level spatial annotations: room layout estimation (wall planes, floor-ceiling boundaries), functional zone delineation (work area, circulation path, storage zone, social area), navigable space mapping with clearance annotations (wheelchair-accessible paths, narrow passages, step hazards), and object affordance labels (sittable, openable, graspable, pushable). These spatial annotations go beyond per-point semantics to capture the functional structure that indoor robots need for task planning.
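Navigable-space mapping of the kind described above typically reduces to a 2D occupancy grid: project every point that falls within the robot's height band onto the floor plane and mark its cell as blocked. The sketch below is a toy version with illustrative cell size and height band, not the dataset's actual map format.

```python
# Sketch: derive a 2D occupancy grid from 3D points. Points between the
# floor margin and the robot's height block a cell; floor and ceiling
# points are ignored.
def occupancy_grid(points, cell=0.5, xmax=2.0, ymax=2.0,
                   z_floor=0.05, z_robot=1.8):
    nx, ny = int(xmax / cell), int(ymax / cell)
    grid = [[0] * ny for _ in range(nx)]  # 0 = free, 1 = occupied
    for x, y, z in points:
        if z_floor < z < z_robot and 0 <= x < xmax and 0 <= y < ymax:
            grid[int(x / cell)][int(y / cell)] = 1
    return grid

points = [
    (0.2, 0.2, 0.0),   # floor point: ignored
    (1.2, 0.3, 0.6),   # chair leg: blocks a cell
    (0.7, 1.7, 2.3),   # ceiling lamp: above robot height, ignored
]
grid = occupancy_grid(points)
assert grid[2][0] == 1           # cell containing the chair leg
assert sum(map(sum, grid)) == 1  # everything else remains free
```

Clearance annotations refine this further: instead of a binary free/blocked grid, each free cell carries the width of the narrowest passage it belongs to, so planners can match paths to a specific robot footprint.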
Stage four QA combines automated geometric checks with human review. Per-point label consistency is verified across overlapping scan regions -- the same chair must receive the same label from every scan position. Instance segmentation boundaries are checked against the 3D geometry (instance boundaries should align with geometric discontinuities). Navigable space annotations are verified against actual room accessibility. Overall targets: 95%+ per-point semantic accuracy, 90%+ instance segmentation mAP, and 97%+ navigable space accuracy.
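The cross-scan consistency check described above can be sketched as a voxel-level comparison: points from two scan positions that land in the same voxel should carry the same label. Voxel size, labels, and the function name are illustrative.

```python
# Sketch of the QA consistency metric: fraction of voxels observed by
# both scan positions whose label sets agree.
def label_consistency(scan_a, scan_b, voxel=0.05):
    # scan_*: list of ((x, y, z), label). Bucket labels by voxel index.
    def voxelize(scan):
        buckets = {}
        for (x, y, z), label in scan:
            key = (int(x / voxel), int(y / voxel), int(z / voxel))
            buckets.setdefault(key, set()).add(label)
        return buckets
    a, b = voxelize(scan_a), voxelize(scan_b)
    shared = set(a) & set(b)
    if not shared:
        return 1.0  # no overlap, nothing to disagree about
    agree = sum(1 for k in shared if a[k] == b[k])
    return agree / len(shared)

scan_a = [((0.01, 0.01, 0.41), "chair"), ((1.01, 1.01, 0.01), "floor")]
scan_b = [((0.02, 0.02, 0.42), "chair"), ((1.02, 1.02, 0.02), "wall")]
assert label_consistency(scan_a, scan_b) == 0.5  # floor/wall disagreement
```

Voxels that fall below the consistency target are routed back to human review, which is how the 95%+ per-point accuracy figure is enforced in overlap regions.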
Use Cases
3D Semantic Scene Understanding
Training point cloud segmentation networks that classify every 3D point in an indoor scene. Models learn to recognize furniture, architectural elements, and objects across diverse room types. Example architectures: Mask3D, PointNet++, MinkowskiNet, SparseConvNet, Point Transformer.
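Voxel-based backbones like MinkowskiNet and SparseConvNet consume quantized coordinates rather than raw points. A minimal sketch of that preprocessing step, with an illustrative voxel size:

```python
# Sketch: voxel downsampling for sparse-convolution models. Coordinates
# are quantized to a grid and one centroid is kept per occupied voxel.
def voxel_downsample(points, voxel=0.05):
    centroids = {}
    for p in points:
        key = tuple(int(c // voxel) for c in p)
        centroids.setdefault(key, []).append(p)
    # One centroid per occupied voxel.
    return [tuple(sum(c) / len(ps) for c in zip(*ps))
            for key, ps in sorted(centroids.items())]

dense = [(0.011, 0.012, 0.010), (0.012, 0.013, 0.011),  # same voxel
         (0.30, 0.30, 0.30)]                             # different voxel
sparse = voxel_downsample(dense)
assert len(sparse) == 2
```

Point-based models (PointNet++, Point Transformer) skip quantization and instead sample and group raw points, which is why the dataset ships unquantized clouds and leaves voxelization to the consumer.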
Indoor Robot Navigation
Building 3D semantic maps that robots use for path planning and obstacle avoidance. Navigable space annotations and clearance maps enable training navigation systems that reason about 3D traversability -- critical for robots that must navigate around furniture, through doorways, and over thresholds.
3D Scene Generation and Simulation
Training generative models for indoor scene synthesis and completion. Dense point cloud scans of real rooms provide ground truth for training diffusion-based 3D generation, scene completion, and furniture layout prediction models. Critical for building simulation environments for robot training.
Key References
- [1] Dai et al. “ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes.” CVPR 2017.
- [2] Armeni et al. “3D Semantic Parsing of Large-Scale Indoor Spaces.” CVPR 2016.
- [3] Schult et al. “Mask3D: Mask Transformer for 3D Semantic Instance Segmentation.” ICRA 2023.
- [4] Wu et al. “Point Transformer V2: Grouped Vector Attention and Partition-based Pooling.” NeurIPS 2022.
How Claru Delivers This Data
Claru's collector network deploys terrestrial laser scanners and handheld LiDAR across diverse indoor environments -- from luxury apartments to budget hotels, co-working spaces to medical clinics, retail stores to fitness centers. This diversity ensures that 3D perception models trained on the data generalize across the full range of indoor environments that robots encounter in deployment, not just the academic and residential settings covered by existing public datasets.
Custom campaigns can target specific room types (residential only, or office focus), architectural styles, furniture density levels, or accessibility requirements (ADA-compliant space annotations). For teams building simulation environments, Claru can provide textured mesh reconstructions alongside point clouds. Turnaround from campaign specification to annotated delivery is typically 4-8 weeks.
Data is delivered as colorized point clouds (PLY, PCD, LAS) with per-point semantic and instance labels. Accompanying files include room layout parameters, navigable space maps, camera calibration for RGB-D sequences, and handheld LiDAR trajectories for SLAM benchmarking. Format conversion to voxelized representations, ScanNet-compatible formats, or custom schemas is available at no additional cost.
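As a rough illustration of what per-point labeled delivery looks like in the PLY format, the sketch below writes a minimal ASCII PLY with an extra `label` property. The header declarations follow the PLY specification; the exact property name and layout of Claru's delivered files is an assumption here.

```python
# Sketch: minimal ASCII PLY writer with a per-point semantic label.
# The "label" property name is illustrative, not the confirmed schema.
def write_labeled_ply(path, points):
    # points: list of (x, y, z, r, g, b, label)
    header = "\n".join([
        "ply", "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x", "property float y", "property float z",
        "property uchar red", "property uchar green", "property uchar blue",
        "property ushort label",
        "end_header",
    ])
    with open(path, "w") as f:
        f.write(header + "\n")
        for x, y, z, r, g, b, label in points:
            f.write(f"{x} {y} {z} {r} {g} {b} {label}\n")

write_labeled_ply("room_sample.ply", [(0.0, 0.0, 0.0, 128, 90, 40, 3)])
```

Binary PLY, PCD, and LAS follow the same idea with different encodings; most point cloud toolchains can read the extra scalar property without modification.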
Frequently Asked Questions
What sensors are used for data collection?
Primary scans use a Leica BLK360 G2 terrestrial laser scanner (360-degree, +/-4mm accuracy, integrated HDR camera for colorization). Supplementary handheld sequences use Ouster OS0-32 LiDAR with co-captured Intel RealSense D455 RGB-D for dynamic SLAM training data.
What annotation classes are included?
40+ classes extending the ScanNet taxonomy with additional categories for commercial, hospitality, healthcare, and retail environments. Per-point semantic labels, instance segmentation, room layout, navigable space, and object affordance annotations are all included.
What indoor environment types are covered?
8+ types: residential (apartments, houses), offices (open plan, private, conference), retail (stores, showrooms), hospitality (hotels, restaurants), healthcare (patient rooms, clinics), educational (classrooms, labs), fitness (gyms), and industrial (workshops, server rooms). 500+ distinct rooms total.
Are navigable space annotations included?
Yes. Floor-level navigable space maps with obstacle clearance annotations, wheelchair-accessible path identification, narrow passage widths, and step hazard locations. These enable training navigation systems that reason about 3D traversability for different robot footprints.
Can the data support 3D scene generation and simulation?
Yes. Dense colorized point clouds and optional textured mesh reconstructions provide ground truth for training scene synthesis, completion, and layout generation models. The data is suitable for building photorealistic simulation environments via NeRF, Gaussian Splatting, or traditional mesh-based approaches.
Request a Sample Pack
Get a curated sample of dense indoor point cloud scans with semantic annotations to evaluate for your 3D perception or indoor navigation project.