What is Semi-Supervised Learning, and when is it used?

Question

Accepted Answer

Semi-supervised learning sits between supervised and unsupervised learning: the model trains on a small labeled set combined with a large unlabeled set.

Why it exists: labeling data is expensive (medical scans, fraud cases, legal documents), while unlabeled data is abundant.

How it works: the model first fits the labeled data to form an initial hypothesis, then uses the unlabeled data to refine its decision boundaries, for example through self-training (pseudo-labeling) or consistency regularization.

Common techniques: pseudo-labeling, label propagation, FixMatch, MixMatch.

When to use: when labels are scarce or expensive but raw data is plentiful, as in speech recognition, image classification with few annotated examples, medical diagnosis and imaging, fraud detection, and document classification.

Anchor your answer in a real-world constraint: "labeling is expensive."
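
The "how it works" loop described above can be sketched in a few lines. This is a minimal illustration of self-training with pseudo-labels, not a reference implementation: the nearest-centroid classifier, the softmax-over-distances confidence score, the 0.9 threshold, and the function names (`centroids`, `predict_with_confidence`, `self_train`) are all assumptions chosen to keep the example dependency-free.

```python
import math

def centroids(points, labels):
    """Compute the mean point of each class (a toy nearest-centroid 'model')."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}

def predict_with_confidence(cents, point):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {lab: math.exp(-math.dist(point, c)) for lab, c in cents.items()}
    total = sum(weights.values())
    label = max(weights, key=weights.get)
    return label, weights[label] / total

def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.9, max_rounds=10):
    """Repeatedly fit on labeled data, then adopt confident predictions as pseudo-labels."""
    x, y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        cents = centroids(x, y)          # 1. fit on everything labeled so far
        still_unlabeled, added = [], 0
        for p in pool:                   # 2. score each unlabeled point
            lab, conf = predict_with_confidence(cents, p)
            if conf >= threshold:        # 3. keep only confident pseudo-labels
                x.append(p)
                y.append(lab)
                added += 1
            else:
                still_unlabeled.append(p)
        pool = still_unlabeled
        if added == 0:                   # nothing new learned, so stop
            break
    return centroids(x, y), pool

# Hypothetical toy data: one labeled point per class, three unlabeled points.
model, leftover = self_train(
    labeled_x=[(0.0, 0.0), (5.0, 5.0)],
    labeled_y=[0, 1],
    unlabeled_x=[(0.2, 0.1), (4.8, 5.1), (2.5, 2.5)],
)
print(model, leftover)
```

The confidence threshold is the key design choice: points near the boundary (here, the midpoint between the two classes) stay unlabeled rather than reinforcing a possibly wrong guess. Methods such as FixMatch build on the same idea with a neural network and data augmentation.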