What is Semi-Supervised Learning, and when is it used?
Semi-supervised learning sits between supervised and unsupervised learning: the model trains on a small labeled set together with a much larger unlabeled set.
Why it exists: labeling data is expensive (medical scans, fraud cases, legal documents), but unlabeled data is abundant.
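How it works: the model first fits an initial hypothesis on the labeled data, then uses the unlabeled data to refine its decision boundaries, typically through self-training with pseudo-labels (training on its own confident predictions) or consistency regularization (requiring stable predictions when an input is perturbed).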
Common techniques: pseudo-labeling, label propagation, and augmentation-based deep learning methods such as FixMatch and MixMatch.
When to use: labels are scarce or expensive but raw data is plentiful, as in speech recognition, image classification with few annotated examples, and medical diagnosis.
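The "how it works" loop above can be made concrete with a small self-training (pseudo-labeling) sketch. Everything here is an assumption chosen for illustration: the nearest-centroid classifier, the margin-based confidence score, the 0.8 threshold, and the toy 2-D points. Real systems usually wrap a stronger model in the same loop.

```python
import math

def centroids(points, labels):
    """Compute the mean point of each class."""
    sums, counts = {}, {}
    for (x, y), c in zip(points, labels):
        sx, sy = sums.get(c, (0.0, 0.0))
        sums[c] = (sx + x, sy + y)
        counts[c] = counts.get(c, 0) + 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in sums}

def predict(cents, p):
    """Return (label, confidence) for point p.

    Confidence is a hypothetical margin score: 1.0 when p sits on the
    nearest centroid, 0.5 when the two closest centroids are equally far.
    """
    dists = sorted((math.dist(p, m), c) for c, m in cents.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    if d1 + d2 == 0:
        return label, 0.5
    return label, 1.0 - d1 / (d1 + d2)

def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Repeatedly fit on labeled data, then adopt confident pseudo-labels."""
    points = [p for p, _ in labeled]
    labels = [c for _, c in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(points, labels)
        confident, remaining = [], []
        for p in pool:
            label, conf = predict(cents, p)
            if conf >= threshold:
                confident.append((p, label))
            else:
                remaining.append(p)
        if not confident:
            break  # nothing new to learn from; stop early
        for p, label in confident:
            points.append(p)
            labels.append(label)
        pool = remaining
    return centroids(points, labels), pool

# Two labeled examples, five unlabeled ones (toy data).
labeled = [((0.0, 0.0), "a"), ((4.0, 4.0), "b")]
unlabeled = [(0.5, 0.2), (0.1, 0.6), (3.8, 4.1), (4.3, 3.7), (2.3, 1.9)]

model, leftover = self_train(labeled, unlabeled)
print(model)
print(leftover)
```

The point near the midpoint between the two classes never clears the threshold, so it stays unlabeled rather than being forced into a class. That threshold is the main design choice: set it too low and the model reinforces its own mistakes.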
Anchor your answer in a real-world constraint: 'labeling is expensive.' Semi-supervised learning shines when you have a small labeled set and lots of unlabeled data, which is common in medical imaging, fraud detection, and document classification.
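Of the techniques named above, label propagation is the easiest to show without a neural network. The following is a minimal sketch of the graph-based idea, assuming a fully connected graph with Gaussian similarity weights, clamped seed labels, and made-up 1-D data; the function name, `sigma`, and iteration count are illustrative choices, not a reference implementation.

```python
import math

def label_propagation(points, seeds, sigma=1.0, iters=100):
    """Spread seed labels to unlabeled points over a similarity graph.

    points: list of floats (1-D positions, toy data)
    seeds:  {index: class_label} for the few labeled points
    Returns one predicted label per point.
    """
    classes = sorted(set(seeds.values()))
    n = len(points)
    # Gaussian similarity between every pair of points (no self-loops).
    w = [[math.exp(-((points[i] - points[j]) ** 2) / (2 * sigma ** 2))
          if i != j else 0.0
          for j in range(n)]
         for i in range(n)]
    # Seeds start with all mass on their class; unlabeled points start at zero.
    f = [[1.0 if seeds.get(i) == c else 0.0 for c in classes] for i in range(n)]
    for _ in range(iters):
        new_f = []
        for i in range(n):
            if i in seeds:
                new_f.append(f[i])  # clamp: known labels never change
                continue
            total = sum(w[i])
            # Each unlabeled point takes the weighted average of its neighbors.
            new_f.append([sum(w[i][j] * f[j][k] for j in range(n)) / total
                          for k in range(len(classes))])
        f = new_f
    return [classes[max(range(len(classes)), key=lambda k: row[k])] for row in f]

# Two clusters on a line, with one labeled point at each far end.
points = [0.0, 0.5, 1.0, 1.5, 5.0, 5.5, 6.0, 6.5]
seeds = {0: "neg", 7: "pos"}

predicted = label_propagation(points, seeds)
print(predicted)
```

With only two labels, every point in the left cluster ends up "neg" and every point in the right cluster ends up "pos", because label mass flows mainly along strong (short-distance) edges. This illustrates the assumption that makes unlabeled data useful at all: nearby points tend to share a label.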