JackLiu0406 | apache-2.0

DA3-XVLA Training & Eval Pipeline

A custom Vision-Language-Action (VLA) architecture that combines the Florence-2 VLM with a frozen Depth Anything 3 backbone, a Perceiver resampler, and a DiT-style flow-matching action decoder. It includes full training and evaluation pipelines for RoboPRO/Roboreal and RoboTwin 2.0 tasks.
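
As a rough illustration of what a flow-matching action decoder does at inference time, the sketch below integrates a velocity field from Gaussian noise (t = 0) to an action vector (t = 1) with plain Euler steps. It is a generic sketch, not this pipeline's code: the listing does not state the sampler, step count, or action dimensionality, so `toy_velocity`, `sample_action`, `num_steps`, and the 3-dimensional target are all hypothetical. In a real model the velocity would come from the learned DiT-style network conditioned on vision and language features; here a closed-form straight-line velocity stands in for it.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for the learned decoder network.

    Returns the velocity that moves point x at time t toward `target`
    along a straight line, arriving at t = 1.
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_action(target, num_steps=10, seed=0):
    """Euler-integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1."""
    rng = random.Random(seed)
    # Start from Gaussian noise with the same dimensionality as the action.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so 1 - t never hits zero
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Assumed 3-dimensional action purely for demonstration.
    print(sample_action([0.1, -0.2, 0.3], num_steps=10))
```

With the toy velocity the final Euler step lands on the target up to floating-point error, which makes the loop easy to sanity-check before swapping in a trained network.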

Downloads: 39

Technical Profile

Modalities: rgb, depth
Environment: simulation
Task Types: manipulation, pick_and_place
Data Format: HDF5
License: apache-2.0
Part of the DA3-XVLA Training & Eval Pipeline family

