// ROLE SUMMARY

Think like an attacker: methodically probe AI models for failure modes such as harmful content generation, jailbreak susceptibility, PII leakage, and bias amplification, then document what you find.

Prompt Attack Researcher

Red Teaming | $35–50/hr | Remote | Posted February 7, 2026

// DESCRIPTION

We need people who can think like attackers. You will methodically test AI models against a taxonomy of failure modes: harmful content generation, jailbreak susceptibility, PII leakage, bias amplification, and more. Each test is logged in a structured format that feeds into the safety team's tracking system. Successful exploits are prioritized for mitigation; unsuccessful attempts still provide valuable negative evidence.
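For a sense of what "logged in a structured format" can mean in practice, here is a minimal sketch of one possible test record in Python. The schema, field names, and FailureMode values are illustrative assumptions, not the safety team's actual tracking format.

```python
# Illustrative only -- a hypothetical sketch of one structured red-team
# log entry. Field names and categories are assumptions, not the safety
# team's actual schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class FailureMode(Enum):
    HARMFUL_CONTENT = "harmful_content"
    JAILBREAK = "jailbreak"
    PII_LEAKAGE = "pii_leakage"
    BIAS_AMPLIFICATION = "bias_amplification"

@dataclass
class TestRecord:
    failure_mode: FailureMode   # which taxonomy entry was probed
    attack_prompt: str          # the exact prompt used
    model_response: str         # verbatim model output
    success: bool               # did the model exhibit the failure mode?
    notes: str = ""             # reproduction steps, severity context
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# Unsuccessful attempts get logged too: success=False entries are the
# "negative evidence" described above.
```

Records like these are what make a finding actionable: the safety team can replay the exact prompt, confirm the behavior, and track the mitigation.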

A background in cybersecurity, penetration testing, or adversarial ML is ideal, but we have also had strong hires from journalism, law, and creative writing -- anyone who is good at finding holes in systems and articulating what they find. You need to be comfortable working with sensitive content categories (violence, hate speech, self-harm) in a clinical, analytical context. Emotional resilience is not optional.

Red team sessions are scheduled in 3-4 hour blocks. Most testers work 3-5 sessions per week. The work is mentally intense, so we encourage breaks between sessions. A weekly debrief with the safety team reviews top findings and updates attack priorities.

// SKILLS & REQUIREMENTS

- Strong written communication for vulnerability reporting
- Creative and lateral thinking about system vulnerabilities
- Systematic approach to testing and documentation
- Understanding of AI safety concepts and failure modes
- Familiarity with prompt engineering techniques
- Ability to reproduce and clearly document exploits (see the sketch below)
- Comfort working with sensitive content categories
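To make "reproduce and clearly document exploits" concrete, the sketch below shows one simple way to measure how reliably an attack reproduces. Both query_model and is_violation are hypothetical placeholders; a real project supplies its own model access and judging criteria.

```python
# Illustrative sketch of exploit reproduction: replay the same attack
# prompt several times and record how often it succeeds. query_model and
# is_violation are hypothetical stand-ins, not a real project API.
from typing import Callable

def reproduce(attack_prompt: str,
              query_model: Callable[[str], str],
              is_violation: Callable[[str], bool],
              trials: int = 5) -> float:
    """Return the fraction of trials in which the exploit reproduced."""
    hits = 0
    for _ in range(trials):
        response = query_model(attack_prompt)
        if is_violation(response):  # e.g. a rubric check or human judgment
            hits += 1
    return hits / trials
```

A reproduction rate, rather than a single success, tells the safety team whether an exploit is a reliable failure or a rare fluke, which affects how it gets prioritized.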

// READY TO GET STARTED?

Apply in minutes

Create your profile, select your areas of expertise, and start working on frontier AI projects.

Apply Now