// ROLE SUMMARY

Rate AI model responses across a wide range of topics on helpfulness, truthfulness, and harmlessness. Your judgments directly shape how the model behaves in production.

Language Model Feedback Analyst

RLHF · $40–55/hr · Remote · Posted January 21, 2026

// DESCRIPTION

This role is about making judgment calls on AI output quality. You will see prompts across a wide range of topics -- math, coding, creative writing, factual questions, safety scenarios -- and rate the corresponding model responses on dimensions like helpfulness, truthfulness, and harmlessness. Some comparisons are obvious; many are genuinely difficult and require you to weigh tradeoffs between competing virtues. Your ratings directly shape how the model behaves in production.

Ideal candidates have a graduate-level education or equivalent professional experience in a field that requires careful argumentation and evidence evaluation. We have had especially strong results from people with backgrounds in academic research, technical writing, legal analysis, and scientific peer review. Familiarity with LLM failure modes -- hallucination, sycophancy, refusal errors -- is valuable.

Annotators work in focused sessions of 3-6 hours at a time, scheduling their own shifts within project windows. Weekly volume targets are typically 20-30 hours but can scale up during surge periods. A weekly calibration meeting aligns the team on rubric updates and tricky edge cases.

// SKILLS & REQUIREMENTS

- Strong analytical and critical thinking skills
- Excellent written English communication
- Good judgment on safety and sensitivity issues
- Experience with RLHF or preference labeling pipelines
- Familiarity with LLM capabilities and failure modes

// READY TO GET STARTED?

Apply in minutes

Create your profile, select your areas of expertise, and start working on frontier AI projects.

Apply Now