Large-Scale Image Annotation for Fashion AI
Challenge: Fashion AI models trained on inconsistently labeled data develop silent failure modes — misclassifying garment subcategories, confusing product attributes, or hallucinating descriptions that diverge from visual content.
Solution: We deployed a distributed workforce of 1,000+ annotators and QA reviewers, organized into specialized teams by product category — apparel, accessories, beauty, and household.
Result: The annotated dataset materially improved the client's visual product recognition accuracy and SKU conditioning performance.
Fashion AI models trained on inconsistently labeled data develop silent failure modes — misclassifying garment subcategories, confusing product attributes, or hallucinating descriptions that diverge from visual content. The client's existing annotation pipeline produced acceptable per-item accuracy but lacked the structural consistency needed for SKU-level conditioning: annotators applied labels from memory rather than a governed taxonomy, descriptions varied in style and specificity, and QA was sample-based rather than systematic. Scaling from tens of thousands to millions of images under these conditions would compound inconsistency into a dataset-level quality problem that downstream models would inherit as systematic bias.
We deployed a distributed workforce of 1,000+ annotators and QA reviewers, organized into specialized teams by product category — apparel, accessories, beauty, and household. Each team received category-specific training materials with visual exemplars, edge-case galleries, and decision trees for ambiguous classifications. The structured taxonomy covered product type, material, pattern, color, fit, occasion, and style attributes, with controlled vocabularies enforced at the annotation interface level.
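Enforcement of controlled vocabularies at the interface level can be sketched as a pre-submit check against the governed taxonomy. The field names and vocabulary entries below are illustrative (drawn partly from the sample record later on this page), not the client's actual taxonomy:

```python
# Hypothetical controlled vocabularies; the real taxonomy is client-specific
# and far larger (product type, material, pattern, color, fit, occasion, style).
CONTROLLED_VOCAB = {
    "category": {"Apparel", "Accessories", "Beauty", "Household", "Jewelry"},
    "label": {"productImage", "lifestyleImage"},
    "shot_type": {"Close Up (CU)", "Medium Shot (MS)", "Full Shot (FS)"},
}

# Hypothetical required-field set for a single annotation record.
REQUIRED_FIELDS = {"annotation_id", "category", "subcategory", "short_caption", "label"}

def validate_annotation(record: dict) -> list[str]:
    """Return a list of structural errors; an empty list means the record passes."""
    errors = [f"missing required field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    for field, vocab in CONTROLLED_VOCAB.items():
        # Only closed-vocabulary fields are checked; free-text fields pass through.
        if field in record and record[field] not in vocab:
            errors.append(f"taxonomy violation: {field}={record[field]!r}")
    return errors
```

Rejecting a record at submit time, rather than in downstream QA, is what keeps annotators on the governed taxonomy instead of labeling from memory.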
Annotators labeled both product images (white-background, single-item) and lifestyle images (styled, multi-item, environmental context). For each item, they applied structured product labels and authored both short descriptions (1-2 sentences, factual attributes) and long descriptions (3-5 sentences, contextual styling and use-case information). Description templates enforced consistent structure while allowing natural language variation.
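The stated length bounds (1–2 sentences for short descriptions, 3–5 for long) lend themselves to a simple automated check. This sketch assumes a naive terminal-punctuation splitter, which is adequate for templated copy but not arbitrary prose:

```python
import re

# Sentence bounds from the description guidelines above; the field names
# ("short_description", "long_description") are illustrative.
BOUNDS = {"short_description": (1, 2), "long_description": (3, 5)}

def sentence_count(text: str) -> int:
    # Naive split on terminal punctuation; abbreviations like "Dr." would
    # overcount, so templated descriptions should avoid them.
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])

def check_description(kind: str, text: str) -> bool:
    """True if the description's sentence count falls within its bound."""
    lo, hi = BOUNDS[kind]
    return lo <= sentence_count(text) <= hi
```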
Quality control operated at three layers: automated validation caught structural errors (missing required fields, taxonomy violations, description length bounds), peer review compared annotations for the same product across annotators to flag inconsistencies, and expert auditors performed targeted deep-dives on categories with historically high disagreement rates. Inter-annotator agreement was tracked weekly using Cohen's kappa, with categories falling below 0.80 triggering retraining interventions.
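The weekly Cohen's kappa tracking described above reduces to a small computation per category: observed agreement corrected for the agreement expected from each annotator's label frequencies alone. A self-contained sketch (the example labels are hypothetical; only the 0.80 retraining threshold comes from the text):

```python
from collections import Counter

def cohens_kappa(labels_a: list, labels_b: list) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's marginals.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in counts_a.keys() | counts_b.keys()
    )
    if expected == 1.0:  # degenerate case: both annotators used one label
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical labels for one fine-grained subcategory pair.
a = ["blazer", "blazer", "sport_coat", "blazer", "sport_coat"]
b = ["blazer", "sport_coat", "sport_coat", "blazer", "sport_coat"]
kappa = cohens_kappa(a, b)
needs_retraining = kappa < 0.80  # retraining threshold from the QA process
```

Here observed agreement is 0.8 but kappa is only about 0.62, illustrating why raw agreement alone overstates consistency on skewed label distributions.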
The annotated dataset materially improved the client's visual product recognition accuracy and SKU conditioning performance. Models trained on the Claru-annotated data showed measurable gains in attribute-level classification accuracy compared to the prior dataset, with the largest improvements in fine-grained subcategories (e.g., distinguishing blazer from sport coat, or crossbody from messenger bag) where annotation consistency matters most. The structured descriptions also enabled a new text-conditioned generation feature that was not feasible with the previous unstructured label set.
Representative record from the annotation pipeline.
{
  "annotation_id": "8e44ea82-4a86-4a68-b788-0dc0d4fd570c",
  "brand_name": "Unforgettable",
  "category": "Jewelry",
  "subcategory": "Necklace",
  "short_caption": "A woman with deep brown skin and short curly black hair faces forward. She is wearing two white gold necklaces layered on her neck.",
  "label": "lifestyleImage",
  "shot_classification": "Plain Background",
  "shot_type": "Close Up (CU)",
  "bounding_box": {
    "x1": 172.48,
    "x2": 1263.8,
    "y1": 457.85,
    "y2": 1351.61
  }
}