Robotics Data Formats & Standards

Every robotics framework expects data in a specific format. This guide covers the schema, compatibility, and conversion paths for 15 formats used in robot learning. Claru delivers data in any of these formats.


RLDS (Reinforcement Learning Datasets)

RLDS (Reinforcement Learning Datasets) is a format from Google for storing episodic reinforcement learning and robotics datasets. Understand its schema, which models use it, and how Claru delivers data in RLDS format.

.tfrecord

HDF5 (Hierarchical Data Format 5)

HDF5 is one of the most widely used file formats for robotics datasets. Learn its structure, compression options, and how Claru delivers robot training data in HDF5.

.hdf5, .h5

WebDataset (Tar-based Shards)

WebDataset uses tar archives for efficient sequential I/O in large-scale training. Understand the shard format, streaming capability, and Claru's WebDataset delivery.

.tar, .tar.gz
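As a rough illustration of the shard idea, the sketch below packs samples into a tar archive and regroups them on read, relying on the WebDataset convention that files sharing a basename belong to one sample. It uses only the Python standard library rather than the `webdataset` package, and the helper names (`write_shard`, `read_shard`), the sample keys, and the payloads are hypothetical. Splitting at the first dot is a simplification that assumes keys contain no directories or dots.

```python
import io
import tarfile
from collections import defaultdict


def write_shard(samples):
    """Pack samples into an in-memory, uncompressed tar shard.

    `samples` maps a sample key to a dict of {extension: bytes}.
    Each file is stored as "<key>.<extension>".
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, files in samples.items():
            for ext, payload in files.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_shard(shard_bytes):
    """Read a shard sequentially and group members by shared basename."""
    grouped = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Simplification: the key is everything before the first dot.
            key, _, ext = member.name.partition(".")
            grouped[key][ext] = tar.extractfile(member).read()
    return dict(grouped)


# Hypothetical two-step episode: one image and one JSON record per step.
episode = {
    "ep000_t0000": {"jpg": b"<image bytes>", "json": b'{"action": [0.1, 0.0]}'},
    "ep000_t0001": {"jpg": b"<image bytes>", "json": b'{"action": [0.2, 0.1]}'},
}
shard = write_shard(episode)
restored = read_shard(shard)
```

Because members of one sample sit next to each other in the archive, a reader can emit complete samples while streaming the tar front to back, which is what makes the layout suitable for sequential I/O.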

TFDS (TensorFlow Datasets)

TFDS provides structured dataset pipelines for TensorFlow. Learn how robotics datasets are structured in TFDS and how Claru delivers TFDS-compatible data.

.tfrecord

LeRobot Format

LeRobot is Hugging Face's open-source robotics framework with its own dataset format. Learn the schema, ecosystem, and how Claru delivers LeRobot-compatible data.

.parquet, .mp4

Zarr (Chunked Array Storage)

Zarr provides chunked, compressed N-dimensional array storage ideal for large robotics datasets. Understand its structure and cloud-native capabilities.

.zarr, .zip

ROS Bag

ROS bags record timestamped sensor data from the Robot Operating System (ROS). Learn the ROS 1 and ROS 2 bag formats and how Claru converts between ROS and ML formats.

.bag, .db3, .mcap

MCAP (Modular Container for Autonomous Platforms)

MCAP is a modern, high-performance container format for heterogeneous robotics data. Understand its advantages over ROS bags and how Claru uses MCAP.

.mcap

nuScenes Format

The nuScenes format is one of the most widely adopted formats for autonomous driving datasets. Learn its schema, annotation structure, and how Claru delivers compatible data.

.json, .pcd, .bin, .jpg

KITTI Format

The KITTI format is one of the most widely supported data formats in autonomous driving and 3D vision. Learn its file structure and how Claru delivers KITTI-compatible data.

.bin, .txt, .png
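To make the file structure a little more concrete, here is a minimal sketch of the KITTI LiDAR `.bin` layout, in which each point is four consecutive 32-bit floats (x, y, z, reflectance) with no header. It uses only the Python standard library. The helper names are hypothetical, and little-endian byte order is an assumption that matches common x86 tooling.

```python
import struct

# One point: x, y, z, reflectance as little-endian float32 (16 bytes total).
POINT = struct.Struct("<4f")


def write_velodyne_bin(points):
    """Serialize an iterable of (x, y, z, reflectance) tuples to raw bytes."""
    return b"".join(POINT.pack(*p) for p in points)


def read_velodyne_bin(raw):
    """Parse raw .bin bytes into a list of (x, y, z, reflectance) tuples."""
    if len(raw) % POINT.size:
        raise ValueError("byte length is not a multiple of 16; not a point file?")
    return list(POINT.iter_unpack(raw))


# Hypothetical points, chosen to be exactly representable in float32.
points = [(1.5, -2.0, 0.25, 0.5), (10.0, 4.0, -1.75, 0.125)]
raw = write_velodyne_bin(points)
decoded = read_velodyne_bin(raw)
```

In practice the same bytes are usually loaded as a float32 array and reshaped to N rows of 4 columns; the standard-library version above is only meant to show the layout.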

BOP Format (Benchmark for 6D Object Pose)

BOP is the standard format for 6D object pose estimation benchmarks. Learn its structure and how Claru delivers pose estimation training data in BOP format.

.json, .png, .ply

COCO Format (Common Objects in Context)

The COCO format is one of the most widely used annotation formats for object detection and segmentation. Learn its structure and how Claru delivers robotics annotations in COCO format.

.json, .jpg, .png
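For orientation, the sketch below shows the three top-level lists of a COCO detection file (`images`, `categories`, `annotations`) and converts each `bbox`, stored as `[x, y, width, height]`, into corner coordinates. The file name, category names, and the `boxes_by_image` helper are hypothetical, and only a subset of the usual fields is included.

```python
import json

# A tiny, hypothetical COCO-style detection dataset with one image.
coco = {
    "images": [
        {"id": 1, "file_name": "frame_000001.jpg", "width": 640, "height": 480},
    ],
    "categories": [
        {"id": 1, "name": "gripper"},
        {"id": 2, "name": "mug"},
    ],
    "annotations": [
        {
            "id": 10,
            "image_id": 1,
            "category_id": 2,
            "bbox": [100.0, 120.0, 50.0, 80.0],  # x, y, width, height
            "area": 4000.0,
            "iscrowd": 0,
        },
    ],
}


def boxes_by_image(dataset):
    """Map image_id to a list of (category_name, (x_min, y_min, x_max, y_max))."""
    names = {c["id"]: c["name"] for c in dataset["categories"]}
    out = {}
    for ann in dataset["annotations"]:
        x, y, w, h = ann["bbox"]
        corners = (x, y, x + w, y + h)
        out.setdefault(ann["image_id"], []).append((names[ann["category_id"]], corners))
    return out


# Round-trip through JSON text, as if the annotations were read from disk.
loaded = json.loads(json.dumps(coco))
result = boxes_by_image(loaded)
```

Annotations refer to images and categories by integer id rather than by nesting, so a loader typically builds lookup tables like `names` before iterating.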

Open3D Format

Open3D reads and writes standard point cloud formats and provides processing tools for 3D robotics data. Understand PCD, PLY, and Open3D's tensor dataset format.

.pcd, .ply, .xyz, .obj

Apache Arrow / Parquet

Apache Arrow and Parquet provide columnar data storage for efficient analytics and ML training. Learn how robotics tabular data is stored in Arrow format.

.parquet, .arrow, .feather

Protocol Buffers for Robotics

Protocol Buffers (protobuf) provide efficient binary serialization for robotics data schemas. Learn how protobuf is used in robotics pipelines and MCAP containers.

.pb, .proto, .binpb, .mcap

Need Data in a Specific Format?

Claru handles all format conversion as part of the delivery pipeline. Tell us your framework and we will deliver data ready to load.