JackLiu0406
apache-2.0
DA3-XVLA Training & Eval Pipeline
A custom Vision-Language-Action (VLA) architecture that combines the Florence-2 VLM with a frozen Depth Anything 3 backbone, a Perceiver resampler, and a DiT-style flow-matching action decoder. Includes full training and evaluation pipelines for RoboPRO/Roboreal and RoboTwin 2.0 tasks.
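To make the "flow-matching action decoder" in the description concrete, here is a minimal sketch of the general technique in pure Python. It is not taken from this repository: the straight-line path from noise (t=0) to action (t=1), the Euler sampler, and every name below (`interpolate`, `fm_loss`, `euler_sample`, `make_oracle`) are illustrative assumptions. In the actual pipeline the velocity predictor would presumably be the DiT-style network conditioned on vision-language and depth features; here it is replaced by a closed-form oracle so the example runs on its own.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to action x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the straight path: a constant velocity x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def fm_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    xt = interpolate(x0, x1, t)
    v_pred = predict_velocity(xt, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)

def euler_sample(predict_velocity, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = predict_velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def make_oracle(action):
    """Hypothetical stand-in for a trained decoder (assumption, not a model).

    For a single known target action, the velocity (action - x) / (1 - t)
    equals x1 - x0 everywhere on the straight path, so the loss is zero.
    Only valid for t < 1, which the Euler loop above guarantees.
    """
    def predict_velocity(x, t):
        return [(a - xi) / (1.0 - t) for a, xi in zip(action, x)]
    return predict_velocity

if __name__ == "__main__":
    random.seed(0)
    action = [0.5, -0.2, 0.1]                     # toy 3-DoF action
    noise = [random.gauss(0.0, 1.0) for _ in action]
    oracle = make_oracle(action)
    print("loss at t=0.3:", fm_loss(oracle, noise, action, 0.3))
    print("sampled action:", euler_sample(oracle, noise, num_steps=10))
```

Training would minimise `fm_loss` over random noise samples, demonstration actions, and times t; inference would call `euler_sample` starting from fresh noise. The number of integration steps is a speed/accuracy trade-off and is chosen arbitrarily here.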
Downloads: 39
Technical Profile
- Modalities: rgb, depth
- Environment: simulation
- Task Types: manipulation, pick_and_place
- Data Format: HDF5
- License: apache-2.0