Prompt Attack Researcher
// ROLE SUMMARY
Methodically red-team AI models against a taxonomy of failure modes -- harmful content generation, jailbreak susceptibility, PII leakage, and bias amplification -- and log every finding, successful or not, for the safety team.
// DESCRIPTION
We need people who can think like attackers. You will methodically test AI models against a taxonomy of failure modes: harmful content generation, jailbreak susceptibility, PII leakage, bias amplification, and more. Each test is logged in a structured format that feeds into the safety team's tracking system. Successful exploits are prioritized for mitigation; unsuccessful attempts still provide valuable negative evidence.
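To give a feel for the structured-logging workflow described above, here is a minimal sketch of what a test record might look like. The schema, field names, and taxonomy values are illustrative assumptions for this posting, not the safety team's actual tracking format:

```python
from dataclasses import dataclass, asdict
from enum import Enum
import json

# Hypothetical failure-mode taxonomy -- illustrative only.
class FailureMode(Enum):
    HARMFUL_CONTENT = "harmful_content"
    JAILBREAK = "jailbreak"
    PII_LEAKAGE = "pii_leakage"
    BIAS_AMPLIFICATION = "bias_amplification"

# Hypothetical per-test log entry. Note that unsuccessful
# attempts are recorded too: they provide negative evidence.
@dataclass
class AttackRecord:
    attack_id: str
    failure_mode: FailureMode
    prompt_summary: str    # redacted description, not the raw prompt
    model_version: str
    successful: bool
    notes: str = ""

    def to_json(self) -> str:
        d = asdict(self)
        d["failure_mode"] = self.failure_mode.value
        return json.dumps(d)

record = AttackRecord(
    attack_id="RT-0042",
    failure_mode=FailureMode.JAILBREAK,
    prompt_summary="role-play framing to bypass refusal",
    model_version="model-v3",
    successful=False,
)
print(record.to_json())
```

Serializing to JSON is one plausible way a record like this could feed a downstream tracking system; the real pipeline may differ.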
A background in cybersecurity, penetration testing, or adversarial ML is ideal, but we have also had strong hires from journalism, law, and creative writing -- anyone who is good at finding holes in systems and articulating what they find. You need to be comfortable working with sensitive content categories (violence, hate speech, self-harm) in a clinical, analytical context. Emotional resilience is not optional.
Red-team sessions are scheduled in 3-4 hour blocks, and most testers work 3-5 sessions per week. The work is mentally intense, so we encourage breaks between sessions. A weekly debrief with the safety team reviews top findings and updates attack priorities.
// SKILLS & REQUIREMENTS
// FREQUENTLY ASKED QUESTIONS
// READY TO GET STARTED?
Apply in minutes
Create your profile, select your areas of expertise, and start working on frontier AI projects.
Apply Now