AI Safety Red Teamer
// ROLE SUMMARY
This role is adversarial by design. You will probe language models for weaknesses by writing creative, unusual, and sometimes uncomfortable prompts designed to surface failure modes.
// DESCRIPTION
Think of yourself as a security researcher, but for AI behavior instead of network infrastructure. Every vulnerability you find helps the model become safer for end users. The work requires creativity, persistence, and a willingness to explore dark corners of model behavior systematically.
This is project-based work with defined evaluation periods. You will be assigned specific models and attack categories for each evaluation cycle (typically 2-4 weeks). Compensation is hourly, with significant bonuses for high-severity findings. Expect 15-25 hours per week.
// SKILLS & REQUIREMENTS
A background in cybersecurity, penetration testing, or adversarial ML is ideal, but we have also had strong hires from journalism, law, and creative writing -- anyone who is good at finding holes in systems and articulating what they found. You need to be comfortable working with sensitive content categories (violence, hate speech, self-harm) in a clinical, analytical context. Emotional resilience is not optional.
// READY TO GET STARTED?
Apply in minutes
Create your profile, select your areas of expertise, and start working on frontier AI projects.
Apply Now