What is the Aggregation Framework in MongoDB?
The Aggregation Framework is MongoDB's data processing pipeline. Documents flow through a sequence of stages, each transforming the input before passing it on.
Common stages —
$match(filter),$project(reshape),$group(aggregate),$sort,$limit,$lookup(join).$group — the equivalent of SQL
GROUP BY; supports accumulators like$sum,$avg,$max,$push.$lookup — left-outer join to another collection.
Order matters — put
$matchand$projectearly to reduce data flowing through later stages; use indexes when possible.Modern preference over Map-Reduce — faster, easier to read, more features.
// Group by department, calculate average salary
db.employees.aggregate([
{ $match: { status: 'active' } }, // Filter active employees
{ $group: {
_id: '$department', // Group by department
avgSalary: { $avg: '$salary' }, // Average salary
count: { $sum: 1 }, // Count per group
maxSalary: { $max: '$salary' } // Max salary
}},
{ $sort: { avgSalary: -1 } }, // Sort by avg salary desc
{ $limit: 5 } // Top 5 departments
]);
// $lookup — JOIN with another collection
db.orders.aggregate([
{ $lookup: {
from: 'users', // Collection to join
localField: 'userId', // Field in orders
foreignField: '_id', // Field in users
as: 'user' // Output array field
}},
{ $unwind: '$user' }, // Flatten the array to object
{ $project: {
orderId: 1,
total: 1,
'user.name': 1,
'user.email': 1
}}
]);
// $bucket — histogram
db.products.aggregate([
{ $bucket: {
groupBy: '$price',
boundaries: [0, 25, 50, 100, 500],
default: 'expensive',
output: { count: { $sum: 1 }, avgPrice: { $avg: '$price' } }
}}
]);Pipeline stages execute in order: $match filters first (efficient — reduces documents for subsequent stages), $group aggregates by department with $avg/$sum/$max, $sort orders results, $limit takes top 5. $lookup performs a left outer join with the users collection. $unwind converts the user array (from $lookup) to a single object. $bucket creates price range histograms.
Know the major stages: $match (filter first for performance), $group (aggregation), $lookup (JOIN), $unwind (flatten arrays), $project (reshape), $sort, $limit. Pipeline order matters — $match early reduces work for later stages. $lookup + $unwind is the JOIN pattern.
Aggregation replaced Map-Reduce.