What is a self join and when do you use it?
A self join is a regular JOIN where a table is joined to itself using table aliases. The two “copies” are treated as different tables for the duration of the query.
Hierarchical data — employee → manager, category → parent category.
Comparing rows in the same table — finding pairs, duplicates, sequence gaps.
Aliases are mandatory — same table needs different names to disambiguate.
For deep hierarchies — use a recursive CTE (MySQL 8+) instead of chained self joins.
-- Employee/Manager hierarchy
SELECT
e.name AS employee,
e.title,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
-- Shows each employee with their manager's name
-- CEO has NULL manager (LEFT JOIN keeps them)
-- Find employees earning more than their manager
SELECT e.name, e.salary AS emp_salary, m.name AS manager, m.salary AS mgr_salary
FROM employees e
INNER JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;
-- Find duplicate emails
SELECT a.email, a.id, b.id
FROM users a
INNER JOIN users b ON a.email = b.email AND a.id < b.id;
-- a.id < b.id prevents matching a row with itself and avoids duplicate pairsTwo aliases (e and m) treat the employees table as two separate tables. e.manager_id = m.id links each employee to their manager. LEFT JOIN keeps employees without managers (CEO).
The salary comparison finds employees earning more than their boss. The duplicate finder uses a.id < b.id to avoid self-matches and duplicate pairs.
Self join = same table, two aliases. The employee/manager hierarchy is THE classic example. 'Find employees earning more than their manager' is a top interview query.
The a.id < b.id trick for finding duplicates without self-matches is essential.