Programming Task Evaluator
// ROLE SUMMARY
You will evaluate code generated by AI models. Each task shows you a programming prompt and one or more candidate solutions.
// DESCRIPTION
Your job is to assess each solution for correctness, efficiency, readability, and adherence to best practices, then rank the solutions and write a brief justification. Languages vary by project but commonly include Python, JavaScript/TypeScript, Java, C++, and SQL. Some tasks also ask you to identify bugs, suggest fixes, or rate the quality of inline comments and documentation.
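To make that concrete, here is a hypothetical task of the kind you would evaluate. The prompt, function names, and both candidates are invented for illustration; real tasks vary by project.

    # Invented prompt: "Return the first duplicated value in a list,
    # or None if all values are unique." Two candidates to compare and rank.

    def first_duplicate_a(values):
        # Candidate A: correct, but O(n^2) -- rescans the prefix for every element.
        for i, v in enumerate(values):
            if v in values[:i]:
                return v
        return None

    def first_duplicate_b(values):
        # Candidate B: also correct, and O(n) -- tracks seen values in a set.
        seen = set()
        for v in values:
            if v in seen:
                return v
            seen.add(v)
        return None

    # Both return 3 for [1, 2, 3, 3, 1], so correctness alone cannot separate
    # them; Candidate B ranks higher on efficiency and clarity of intent.

A brief justification for this pair might read: "Both candidates are correct; B is preferred for its linear time complexity and clearer expression of intent."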
Work is fully asynchronous. You choose which tasks to pick up based on language and difficulty filters. A weekly sync call with the project lead covers rubric updates and edge-case discussions. The minimum commitment is 10 hours per week.
// SKILLS & REQUIREMENTS
We look for developers who care about code quality -- not just whether it runs, but whether it is maintainable, efficient, and clear. Experience with code review tools (GitHub PRs, Gerrit, Crucible) is a plus. Strong knowledge of software testing principles also helps, since some evaluations require you to reason about edge cases and test coverage (see the sketch below).
// READY TO GET STARTED?
Apply in minutes
Create your profile, select your areas of expertise, and start working on frontier AI projects.