Hiprup

What is a self join and when do you use it?

A self join is a regular JOIN where a table is joined to itself using table aliases. The two “copies” are treated as different tables for the duration of the query.

  • Hierarchical data — employee → manager, category → parent category.

  • Comparing rows in the same table — finding pairs, duplicates, sequence gaps.

  • Aliases are mandatory — same table needs different names to disambiguate.

  • For deep hierarchies — use a recursive CTE (MySQL 8+) instead of chained self joins.

-- Employee/Manager hierarchy
SELECT 
  e.name AS employee,
  e.title,
  m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
-- Shows each employee with their manager's name
-- CEO has NULL manager (LEFT JOIN keeps them)

-- Find employees earning more than their manager
SELECT e.name, e.salary AS emp_salary, m.name AS manager, m.salary AS mgr_salary
FROM employees e
INNER JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

-- Find duplicate emails
SELECT a.email, a.id, b.id
FROM users a
INNER JOIN users b ON a.email = b.email AND a.id < b.id;
-- a.id < b.id prevents matching a row with itself and avoids duplicate pairs

Two aliases (e and m) treat the employees table as two separate tables. e.manager_id = m.id links each employee to their manager. LEFT JOIN keeps employees without managers (CEO).

The salary comparison finds employees earning more than their boss. The duplicate finder uses a.id < b.id to avoid self-matches and duplicate pairs.

Self join = same table, two aliases. The employee/manager hierarchy is THE classic example. 'Find employees earning more than their manager' is a top interview query.

The a.id < b.id trick for finding duplicates without self-matches is essential.

What is a self join and when do you use it? | Hiprup