IffYuan · 2025 · other

Embodied-R1-3B-v1

A 3B-parameter vision-language model for general robotic manipulation. It uses a pointing mechanism and reinforced fine-tuning to bridge perception and action, with strong zero-shot generalization.

Downloads: 152

Why This Matters for Physical AI

This model addresses a central problem in embodied AI: connecting what a robot sees to what it does. It reasons over a visual scene and a language instruction and grounds its answer as points in the image, and its zero-shot generalization across diverse tasks and environments is what makes it relevant to general-purpose robotic manipulation.

Technical Profile

Modalities: rgb, language
Action Space: pointing
Task Types: manipulation, visual_target_grounding, referring_region_grounding, open_form_grounding
Annotation Types: language_instructions
License: other
Part of the Embodied-R1 family
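The card lists the action space as pointing: instead of emitting motor commands, the model answers with image-space points that mark where to act. As a minimal sketch of how such an output could feed a downstream controller, the Python below parses bracketed pixel coordinates out of a text response and back-projects one point into a 3D camera-frame target with the standard pinhole camera model. The response format, the `parse_points` and `back_project` helpers, the `Intrinsics` type, and the example intrinsics and depth values are hypothetical assumptions for illustration, not the official Embodied-R1 interface.

```python
import re
from dataclasses import dataclass

# Hypothetical response format: a bracketed list of pixel coordinates such as
# "[[412, 230], [398, 251]]". The real Embodied-R1 output format may differ.
POINT_PATTERN = re.compile(r"\[\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\]")


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical calibration values)."""
    fx: float
    fy: float
    cx: float
    cy: float


def parse_points(text: str) -> list[tuple[float, float]]:
    """Extract every (u, v) pixel pair found in a free-text model response."""
    return [(float(u), float(v)) for u, v in POINT_PATTERN.findall(text)]


def back_project(u: float, v: float, depth: float, K: Intrinsics) -> tuple[float, float, float]:
    """Lift pixel (u, v) at metric depth to an (x, y, z) point in the camera frame."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    x = (u - K.cx) * depth / K.fx
    y = (v - K.cy) * depth / K.fy
    return (x, y, depth)


if __name__ == "__main__":
    K = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)  # hypothetical intrinsics
    response = "Target located at [[412, 230], [398, 251]]"  # hypothetical model output
    u, v = parse_points(response)[0]
    # Depth would normally be read from an aligned depth image at (u, v).
    print(back_project(u, v, depth=0.5, K=K))
```

This split keeps the model's output in image space while depth sensing and camera calibration handle the geometry. A real pipeline would also transform the resulting point from the camera frame into the robot base frame before planning a grasp or placement.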

Community Signals

Top 50% by downloads
