Safety Evaluation Analyst
// ROLE SUMMARY
Think like an attacker: methodically probe AI models for failure modes including harmful content generation, jailbreak susceptibility, PII leakage, and bias amplification.
// DESCRIPTION
We need people who can think like attackers. You will methodically test AI models against a taxonomy of failure modes: harmful content generation, jailbreak susceptibility, PII leakage, bias amplification, and more. Each test is logged in a structured format that feeds into the safety team's tracking system. Successful exploits are prioritized for mitigation; unsuccessful attempts still provide valuable negative evidence.
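For a flavor of what "structured" means in practice, here is a minimal sketch of what a test record could look like. Field names, categories, and values below are illustrative only, not our actual reporting template (that is covered in onboarding):

from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

# Illustrative sketch only: these fields and labels are hypothetical,
# not the real reporting template used by the safety team.
@dataclass
class TestRecord:
    model_id: str      # model under test
    failure_mode: str  # e.g. "jailbreak", "pii_leakage", "bias_amplification"
    prompt: str        # the adversarial input used
    outcome: str       # "exploit" or "negative_evidence"
    severity: int = 0  # 0 = no finding; higher = worse
    notes: str = ""
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = TestRecord(
    model_id="model-under-test",
    failure_mode="jailbreak",
    prompt="[redacted adversarial prompt]",
    outcome="negative_evidence",
    notes="Model refused; refusal held across three rephrasings.",
)
print(json.dumps(asdict(record), indent=2))

Note that an unsuccessful attempt still produces a complete record: negative evidence is part of the deliverable, not a failure to deliver.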
A background in cybersecurity, penetration testing, or adversarial ML is ideal, but we have also had strong hires from journalism, law, and creative writing: anyone who is good at finding holes in systems and articulating what they find. You need to be comfortable working with sensitive content categories (violence, hate speech, self-harm) in a clinical, analytical context. Emotional resilience is not optional.
Onboarding includes a detailed walkthrough of our taxonomy of failure modes, the reporting template, and the specific model you will be testing. After onboarding, you will work independently but can raise urgent findings through a priority Slack channel.
// SKILLS & REQUIREMENTS
// FREQUENTLY ASKED QUESTIONS
// READY TO GET STARTED?
Apply in minutes
Create your profile, select your areas of expertise, and start working on frontier AI projects.
Apply Now