EU AI Act Red Teaming Requirements: What Data You Actually Need
The EU AI Act introduces the first legally binding red teaming requirements for AI systems. Article 55 mandates adversarial testing for providers of general-purpose AI (GPAI) models with systemic risk, and Article 99 backs the Act with fines of up to 35 million euros or 7% of global turnover for the most serious violations. This guide breaks down the specific articles, enforcement dates, and the red teaming data infrastructure you need to demonstrate compliance before the August 2, 2025 deadline for new GPAI providers.
Article 55 mandates adversarial testing for GPAI providers with systemic risk
Article 55 of the EU AI Act requires providers of general-purpose AI models with systemic risk to "perform adversarial testing, including through red teaming" to identify and mitigate risks [1]. This is not a suggestion or best practice; it is a legal obligation with enforcement mechanisms. GPAI models are classified as systemic risk when their cumulative training compute exceeds 10^25 FLOPs or when designated by the AI Office based on other criteria. For providers above this threshold, Article 55 requires documented adversarial testing protocols, structured reporting of identified vulnerabilities, and evidence of mitigation measures. The requirement applies to new GPAI providers starting August 2, 2025, with existing providers given until August 2, 2027, to comply [1].
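To make the classification test concrete, here is a minimal sketch, assuming a simple boolean check: the 10^25 FLOP figure comes from the Act, while the function name, interface, and example values are illustrative rather than part of any official tooling.

```python
# Minimal sketch of the systemic-risk classification test. The 1e25 FLOP
# threshold comes from the EU AI Act; everything else here (names,
# interface, example values) is illustrative.

SYSTEMIC_RISK_FLOP_THRESHOLD = 1e25

def is_systemic_risk(cumulative_training_flops: float,
                     designated_by_ai_office: bool = False) -> bool:
    """A GPAI model falls under Article 55 obligations when its cumulative
    training compute exceeds 10^25 FLOPs, or when the AI Office designates
    it as systemic risk based on other criteria."""
    return (cumulative_training_flops > SYSTEMIC_RISK_FLOP_THRESHOLD
            or designated_by_ai_office)

# A model trained with ~2.5e25 FLOPs is in scope regardless of designation.
assert is_systemic_risk(2.5e25)
# A smaller model is in scope only if designated by the AI Office.
assert not is_systemic_risk(8e24)
assert is_systemic_risk(8e24, designated_by_ai_office=True)
```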
The enforcement timeline creates immediate compliance pressure
The EU AI Act enforcement follows a phased timeline. Article 5 prohibitions on unacceptable-risk AI practices took effect February 2, 2025 [1]. Article 55 obligations for new GPAI providers with systemic risk take effect August 2, 2025. Full enforcement under Article 99 begins August 2, 2026, with national authorities empowered to impose fines. Existing GPAI providers that were on the market before August 2, 2025, have a transitional period: they must comply with Article 55 adversarial testing requirements by August 2, 2027. Annex XI specifies the technical documentation that GPAI providers must maintain, including descriptions of adversarial testing methodologies and results [1]. This documentation becomes the evidentiary basis for compliance assessments.
Fines scale with revenue, making non-compliance existentially expensive
Article 99 establishes a three-tier fine structure. Violations of prohibited AI practices (Article 5) carry fines up to 35 million euros or 7% of global annual turnover, whichever is higher [1]. Violations of other obligations, including Article 55 adversarial testing requirements, carry fines up to 15 million euros or 3% of global turnover [1]. For a company with 500 million euros in annual revenue, the maximum Article 55 non-compliance fine is 15 million euros; for a company with 5 billion euros in revenue, it rises to 150 million euros. These penalties make the cost of implementing a structured red teaming program negligible relative to the cost of non-compliance.
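The tier logic reduces to taking the higher of a fixed cap and a turnover percentage. A minimal sketch, where the cap amounts and percentages come from Article 99 and the helper itself is purely illustrative:

```python
# Illustrative helper for the Article 99 fine ceilings described above.
# The cap amounts and percentages come from the Act; the function and
# tier keys are our own.

def max_fine_eur(global_annual_turnover_eur: float, tier: str) -> float:
    """Maximum fine is the higher of a fixed cap and a turnover share."""
    tiers = {
        "article_5_prohibited_practices": (35_000_000, 0.07),
        "other_obligations_incl_article_55": (15_000_000, 0.03),
    }
    fixed_cap, pct = tiers[tier]
    return max(fixed_cap, pct * global_annual_turnover_eur)

# 500M euro turnover: 3% equals the 15M cap, so the ceiling stays at 15M.
assert max_fine_eur(500e6, "other_obligations_incl_article_55") == 15_000_000
# 5B euro turnover: 3% is 150M, which exceeds the cap.
assert max_fine_eur(5e9, "other_obligations_incl_article_55") == 150_000_000
```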
What is the EU AI Act enforcement timeline for red teaming obligations?
The EU AI Act rolls out enforcement in phases between February 2025 and August 2027. Each milestone introduces new obligations for AI providers, with GPAI adversarial testing requirements landing in August 2025 for new providers. This timeline determines when your red teaming data infrastructure must be operational.
- February 2, 2025: Article 5 prohibitions on unacceptable-risk AI practices take effect.
- August 2, 2025: Article 55 adversarial testing obligations begin for new GPAI providers with systemic risk.
- August 2, 2026: Full enforcement begins under Article 99; national authorities can impose fines.
- August 2, 2027: Deadline for existing GPAI providers (on the market before August 2, 2025) to reach full compliance.
Building and Red-Teaming an AI Content Moderation System
We decomposed the moderation pipeline into discrete visual and text classification models — NSFW detection, celebrity likeness recognition, and IP likeness detection — each with independent confidence thresholds. Rather than applying a single binary filter, we defined category-level rulings and conjunction-based logic: a piece of content could be flagged by one model but cleared by another depending on the product context and risk profile. Confidence thresholds were calibrated per category using labeled datasets, with separate configurations for consumer-facing products (stricter) and enterprise APIs (more permissive).
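To illustrate that ruling structure, here is a hedged sketch: the category names match the models described above, but the threshold values, the consumer/enterprise split, and the flag logic are invented for the example rather than taken from the production configuration.

```python
# Sketch of per-category rulings with context-dependent thresholds.
# Threshold values and the consumer/enterprise split are invented for
# illustration; only the category names come from the case study.

from dataclasses import dataclass

THRESHOLDS = {
    "consumer":   {"nsfw": 0.40, "celebrity_likeness": 0.50, "ip_likeness": 0.50},
    "enterprise": {"nsfw": 0.70, "celebrity_likeness": 0.80, "ip_likeness": 0.80},
}

@dataclass
class Ruling:
    flagged: bool
    reasons: list[str]

def rule(scores: dict[str, float], context: str) -> Ruling:
    """Judge each model's confidence score against the thresholds for the
    deployment context; any category over its threshold flags the content."""
    limits = THRESHOLDS[context]
    reasons = [cat for cat, score in scores.items() if score >= limits[cat]]
    return Ruling(flagged=bool(reasons), reasons=reasons)

# The same scores can be flagged for a consumer product but cleared for
# an enterprise API, reflecting the per-context risk profiles.
scores = {"nsfw": 0.55, "celebrity_likeness": 0.10, "ip_likeness": 0.05}
assert rule(scores, "consumer").flagged
assert not rule(scores, "enterprise").flagged
```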
Scaling Generative AI Safety Through Human-Led Data Labeling
We built a high-throughput, quality-controlled annotation workflow focused exclusively on residual risk — only reviewing outputs that had already passed the client's automated moderation pipeline. This design choice was deliberate: the goal was not to replicate automated filtering but to measure its failure rate and characterize the types of violations it misses. Annotators evaluated text and video outputs against a multi-dimensional safety taxonomy covering nudity/NSFW content, violence and gore, hate speech and harassment, self-harm, and illegal activity.
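Once the taxonomy is fixed, the residual-risk measurement itself is a simple rate over filter-passed outputs. A minimal sketch, with the annotation record format assumed for illustration:

```python
# Residual-risk measurement sketch: annotations cover only outputs that
# already passed the automated pipeline, so per-category violation rates
# estimate the filter's miss rate. The record format is assumed.

SAFETY_TAXONOMY = ("nsfw", "violence_gore", "hate_harassment",
                   "self_harm", "illegal_activity")

def residual_violation_rates(annotations: list[dict]) -> dict[str, float]:
    """Per-category violation rate among filter-passed outputs."""
    total = len(annotations) or 1  # guard against an empty batch
    return {
        cat: sum(bool(a["labels"].get(cat)) for a in annotations) / total
        for cat in SAFETY_TAXONOMY
    }

# Example: 2 hate/harassment misses in 1,000 passed outputs is a 0.2%
# residual rate for that category.
batch = [{"labels": {}} for _ in range(998)] + \
        [{"labels": {"hate_harassment": True}} for _ in range(2)]
assert residual_violation_rates(batch)["hate_harassment"] == 0.002
```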
Frequently Asked Questions
When do the EU AI Act red teaming requirements take effect?
New GPAI providers with systemic risk must comply with Article 55 adversarial testing requirements by August 2, 2025. Existing GPAI providers that were on the market before that date have until August 2, 2027, to achieve full compliance. Article 5 prohibitions on unacceptable-risk AI practices already took effect February 2, 2025. Full enforcement with fine authority under Article 99 begins August 2, 2026.
What penalties apply for non-compliance with Article 55?
Article 99 imposes fines up to 15 million euros or 3% of global annual turnover (whichever is higher) for violations of Article 55 adversarial testing requirements. Violations of Article 5 prohibited practices carry higher penalties: up to 35 million euros or 7% of global turnover. For a company with 1 billion euros in annual revenue, the maximum Article 55 fine is 30 million euros.
Does the EU AI Act prescribe a specific red teaming methodology?
No. Article 55 requires GPAI providers to "perform adversarial testing, including through red teaming" but does not prescribe a specific methodology. Annex XI requires documentation of the testing approach, vulnerabilities found, and mitigations applied. In practice, this means providers have flexibility in methodology but must demonstrate systematic coverage and produce audit-ready documentation. The NIST AI RMF [5] and Anthropic's Constitutional Classifiers approach [2] provide reference frameworks that align with the Act's intent.
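As one possible shape for that audit trail, here is a hypothetical finding record. Annex XI names the required content (testing approach, vulnerabilities found, mitigations applied) but prescribes no format, so every field below is an interpretation rather than an official schema.

```python
# Hypothetical structure for an audit-ready red teaming finding. The Act
# and Annex XI do not prescribe a schema; these fields are illustrative.

from dataclasses import dataclass
from datetime import date

@dataclass
class RedTeamFinding:
    finding_id: str
    test_date: date
    methodology: str      # e.g. manual jailbreak attempt, automated fuzzing
    attack_category: str  # e.g. prompt injection, harmful-content elicitation
    vulnerability: str    # what was observed
    severity: str         # provider-defined scale
    mitigation: str       # what changed in response
    retest_passed: bool   # evidence the mitigation holds

# Example entry (all values invented):
finding = RedTeamFinding(
    finding_id="RT-2025-001",
    test_date=date(2025, 8, 15),
    methodology="manual jailbreak attempt",
    attack_category="prompt injection",
    vulnerability="system prompt disclosed via role-play framing",
    severity="high",
    mitigation="added refusal rule to the output classifier",
    retest_passed=True,
)
```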
How large does a red teaming or annotation program need to be?
There is no minimum specified in the Act. Anthropic's published red teaming program used 183 participants over 3,000+ hours to achieve meaningful vulnerability reduction (jailbreak success rate cut from 86% to 4.4%) [2]. Claru's safety annotation programs have deployed teams producing 241,000+ annotations with continuous 92%+ calibration agreement [4]. The appropriate scale depends on model complexity, risk classification, and the number of safety categories requiring coverage.
Do providers outside the EU have to comply?
Yes, if their models are placed on the EU market or their outputs are used within the EU. The AI Act applies based on market presence, not corporate domicile. A US-based GPAI provider whose model is available to EU users must comply with Article 55 adversarial testing requirements on the same timeline as EU-based providers. This extraterritorial scope mirrors the GDPR enforcement model.
References
- [1] European Parliament and Council. “Regulation (EU) 2024/1689 — The Artificial Intelligence Act.” Official Journal of the European Union, 2024. Establishes legally binding adversarial testing requirements for GPAI providers with systemic risk (Article 55), with fines up to 35M euros or 7% turnover for prohibited practices (Article 99).
- [2] Anthropic. “Constitutional Classifiers: Defending Against Universal Jailbreaks.” Anthropic Research, 2025. 183 human red team participants spent 3,000+ hours testing; jailbreak success rate reduced from 86% to 4.4% through iterative model hardening with constitutional classifiers.
- [3] Claru. “Building and Red-Teaming an AI Content Moderation System.” Case Study, 2026. Achieved sub-2% output rejection rate with full safety coverage across NSFW, celebrity, and IP detection categories through structured adversarial testing and real-time monitoring dashboards.
- [4] Claru. “Scaling Generative AI Safety Through Human-Led Data Labeling.” Case Study, 2025. 241,000+ safety annotations across text and video outputs, maintaining violation rates below 2% with 92%+ annotator calibration agreement through continuous monitoring.
- [5] National Institute of Standards and Technology. “Artificial Intelligence Risk Management Framework (AI RMF 1.0).” NIST, 2023. Provides a voluntary framework for AI risk management including adversarial testing, red teaming, and continuous monitoring practices that align with EU AI Act requirements.