Large-Scale Image Annotation for Fashion AI
Challenge: Fashion AI models trained on inconsistently labeled data develop silent failure modes — misclassifying garment subcategories, confusing product attributes, or hallucinating descriptions that diverge from visual content.
Solution: We deployed a distributed workforce of 1,000+ annotators and QA reviewers, organized into specialized teams by product category — apparel, accessories, beauty, and household.
Result: The annotated dataset materially improved the client's visual product recognition accuracy and SKU conditioning performance.
Fashion AI models trained on inconsistently labeled data develop silent failure modes — misclassifying garment subcategories, confusing product attributes, or hallucinating descriptions that diverge from visual content. The client's existing annotation pipeline produced acceptable per-item accuracy but lacked the structural consistency needed for SKU-level conditioning: annotators applied labels from memory rather than a governed taxonomy, descriptions varied in style and specificity, and QA was sample-based rather than systematic. Scaling from tens of thousands to millions of images under these conditions would compound inconsistency into a dataset-level quality problem that downstream models would inherit as systematic bias.
We deployed a distributed workforce of 1,000+ annotators and QA reviewers, organized into specialized teams by product category — apparel, accessories, beauty, and household. Each team received category-specific training materials with visual exemplars, edge-case galleries, and decision trees for ambiguous classifications. The structured taxonomy covered product type, material, pattern, color, fit, occasion, and style attributes, with controlled vocabularies enforced at the annotation interface level.
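Enforcement of controlled vocabularies at the interface level can be sketched as a pre-submit check against the governed taxonomy. The field names and vocabulary entries below are illustrative (drawn partly from the sample record later on this page), not the client's actual taxonomy:

```python
# Hypothetical controlled vocabularies; the real taxonomy is client-specific
# and far larger (product type, material, pattern, color, fit, occasion, style).
CONTROLLED_VOCAB = {
    "category": {"Apparel", "Accessories", "Beauty", "Household", "Jewelry"},
    "label": {"productImage", "lifestyleImage"},
    "shot_type": {"Close Up (CU)", "Medium Shot (MS)", "Full Shot (FS)"},
}

# Hypothetical required-field set for a single annotation record.
REQUIRED_FIELDS = {"annotation_id", "category", "subcategory", "short_caption", "label"}

def validate_annotation(record: dict) -> list[str]:
    """Return a list of structural errors; an empty list means the record passes."""
    errors = [f"missing required field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    for field, vocab in CONTROLLED_VOCAB.items():
        # Only closed-vocabulary fields are checked; free-text fields pass through.
        if field in record and record[field] not in vocab:
            errors.append(f"taxonomy violation: {field}={record[field]!r}")
    return errors
```

Rejecting a record at submit time, rather than in downstream QA, is what keeps annotators on the governed taxonomy instead of labeling from memory.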
Annotators labeled both product images (white-background, single-item) and lifestyle images (styled, multi-item, environmental context). For each item, they applied structured product labels and authored both short descriptions (1-2 sentences, factual attributes) and long descriptions (3-5 sentences, contextual styling and use-case information). Description templates enforced consistent structure while allowing natural language variation.
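The stated length bounds (1–2 sentences for short descriptions, 3–5 for long) lend themselves to a simple automated check. This sketch assumes a naive terminal-punctuation splitter, which is adequate for templated copy but not arbitrary prose:

```python
import re

# Sentence bounds from the description guidelines above; the field names
# ("short_description", "long_description") are illustrative.
BOUNDS = {"short_description": (1, 2), "long_description": (3, 5)}

def sentence_count(text: str) -> int:
    # Naive split on terminal punctuation; abbreviations like "Dr." would
    # overcount, so templated descriptions should avoid them.
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])

def check_description(kind: str, text: str) -> bool:
    """True if the description's sentence count falls within its bound."""
    lo, hi = BOUNDS[kind]
    return lo <= sentence_count(text) <= hi
```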
Quality control operated at three layers: automated validation caught structural errors (missing required fields, taxonomy violations, description length bounds), peer review compared annotations for the same product across annotators to flag inconsistencies, and expert auditors performed targeted deep-dives on categories with historically high disagreement rates. Inter-annotator agreement was tracked weekly using Cohen's kappa, with categories falling below 0.80 triggering retraining interventions.
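The weekly Cohen's kappa tracking described above reduces to a small computation per category: observed agreement corrected for the agreement expected from each annotator's label frequencies alone. A self-contained sketch (the example labels are hypothetical; only the 0.80 retraining threshold comes from the text):

```python
from collections import Counter

def cohens_kappa(labels_a: list, labels_b: list) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's marginals.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in counts_a.keys() | counts_b.keys()
    )
    if expected == 1.0:  # degenerate case: both annotators used one label
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical labels for one fine-grained subcategory pair.
a = ["blazer", "blazer", "sport_coat", "blazer", "sport_coat"]
b = ["blazer", "sport_coat", "sport_coat", "blazer", "sport_coat"]
kappa = cohens_kappa(a, b)
needs_retraining = kappa < 0.80  # retraining threshold from the QA process
```

Here observed agreement is 0.8 but kappa is only about 0.62, illustrating why raw agreement alone overstates consistency on skewed label distributions.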
The annotated dataset materially improved the client's visual product recognition accuracy and SKU conditioning performance. Models trained on the Claru-annotated data showed measurable gains in attribute-level classification accuracy compared to the prior dataset, with the largest improvements in fine-grained subcategories (e.g., distinguishing blazer from sport coat, or crossbody from messenger bag) where annotation consistency matters most. The structured descriptions also enabled a new text-conditioned generation feature that was not feasible with the previous unstructured label set.
Representative record from the annotation pipeline.
{
  "annotation_id": "8e44ea82-4a86-4a68-b788-0dc0d4fd570c",
  "brand_name": "Unforgettable",
  "category": "Jewelry",
  "subcategory": "Necklace",
  "short_caption": "A woman with deep brown skin and short curly black hair faces forward. She is wearing two white gold necklaces layered on her neck.",
  "label": "lifestyleImage",
  "shot_classification": "Plain Background",
  "shot_type": "Close Up (CU)",
  "bounding_box": {
    "x1": 172.48,
    "x2": 1263.8,
    "y1": 457.85,
    "y2": 1351.61
  }
}