IffYuan | 2025 | other
Embodied-R1-3B-v1
A 3B-parameter vision-language model for general robotic manipulation. It uses a pointing mechanism and reinforced fine-tuning to bridge perception and action, and it shows strong zero-shot generalization.
Downloads: 152
Why This Matters for Physical AI
This model lets a robot interpret a visual scene and produce grounded actions, expressed as points in the image, through vision-language reasoning. Its zero-shot generalization across diverse tasks and environments is a key requirement for robotic manipulation.
Technical Profile
- Modalities
- rgb, language
- Action Space
- pointing
- Task Types
- manipulation, visual_target_grounding, referring_region_grounding, open_form_grounding
- Annotation Types
- language_instructions
- License
- other
Community Signals
Top 50% by downloads