Hiprup

What is the Aggregation Framework in MongoDB?

The Aggregation Framework is MongoDB's data processing pipeline. Documents flow through a sequence of stages, each transforming the input before passing it on.
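To make the flow concrete, here is a minimal plain-JavaScript sketch that mimics how a pipeline threads documents through its stages. It is an illustration only, not how MongoDB executes pipelines internally; the `runPipeline` helper, the stage functions, and the sample documents are all hypothetical.

```javascript
// Plain-JavaScript illustration of the pipeline idea (hypothetical helper,
// not MongoDB's implementation): each stage receives the previous stage's
// output and returns a new array of documents.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical sample data
const employees = [
  { name: 'Ana', status: 'active', salary: 120 },
  { name: 'Ben', status: 'inactive', salary: 90 },
  { name: 'Cho', status: 'active', salary: 100 },
];

const result = runPipeline(employees, [
  // Like $match: keep only active employees
  (docs) => docs.filter((d) => d.status === 'active'),
  // Like $project: keep only the fields we care about
  (docs) => docs.map((d) => ({ name: d.name, salary: d.salary })),
  // Like $sort followed by $limit: highest salary first, then take one
  (docs) => [...docs].sort((a, b) => b.salary - a.salary).slice(0, 1),
]);

console.log(result); // [ { name: 'Ana', salary: 120 } ]
```

Each function sees only what the previous one returned, which is why a filter placed first shrinks the work for everything after it.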

  • Common stages: $match (filter), $project (reshape), $group (aggregate), $sort, $limit, $lookup (join).

  • $group — the equivalent of SQL GROUP BY; supports accumulators like $sum, $avg, $max, $push.

  • $lookup — left-outer join to another collection.

  • Order matters: put $match and $project early to reduce the data flowing through later stages. A $match or $sort at the start of the pipeline can use an index.

  • Modern preference over Map-Reduce — faster, easier to read, more features.
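As a rough picture of what the $group accumulators compute, the sketch below emulates $sum, $avg, $max, and $push in plain JavaScript. The `groupBy` helper and the sample documents are hypothetical, and this is not MongoDB's implementation; in particular, MongoDB does not guarantee the order of $group output, whereas this sketch happens to keep insertion order.

```javascript
// Plain-JavaScript emulation of a $group stage (illustration only).
// groupBy and the sample documents are hypothetical.
function groupBy(docs, keyField) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    if (!groups.has(key)) {
      groups.set(key, { _id: key, count: 0, total: 0, maxSalary: -Infinity, names: [] });
    }
    const g = groups.get(key);
    g.count += 1;                                    // like { $sum: 1 }
    g.total += doc.salary;                           // running total for the average
    g.maxSalary = Math.max(g.maxSalary, doc.salary); // like { $max: '$salary' }
    g.names.push(doc.name);                          // like { $push: '$name' }
  }
  // Replace the internal running total with the final average
  return [...groups.values()].map(({ total, ...g }) => ({
    ...g,
    avgSalary: total / g.count,                      // like { $avg: '$salary' }
  }));
}

const staff = [
  { name: 'Ana', department: 'eng', salary: 120 },
  { name: 'Ben', department: 'ops', salary: 90 },
  { name: 'Cho', department: 'eng', salary: 100 },
];

const grouped = groupBy(staff, 'department');
console.log(grouped);
```

With this sample data the `eng` group ends up with a count of 2, an average salary of 110, a maximum of 120, and the names `['Ana', 'Cho']`.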

// Group by department, calculate average salary
db.employees.aggregate([
  { $match: { status: 'active' } },          // Filter active employees
  { $group: {
    _id: '$department',                       // Group by department
    avgSalary: { $avg: '$salary' },           // Average salary
    count: { $sum: 1 },                       // Count per group
    maxSalary: { $max: '$salary' }            // Max salary
  }},
  { $sort: { avgSalary: -1 } },              // Sort by avg salary desc
  { $limit: 5 }                               // Top 5 departments
]);

// $lookup — JOIN with another collection
db.orders.aggregate([
  { $lookup: {
    from: 'users',                            // Collection to join
    localField: 'userId',                     // Field in orders
    foreignField: '_id',                      // Field in users
    as: 'user'                                // Output array field
  }},
  { $unwind: '$user' },                       // Flatten the array to object
  { $project: {
    orderId: 1,
    total: 1,
    'user.name': 1,
    'user.email': 1
  }}
]);

// $bucket — histogram
db.products.aggregate([
  { $bucket: {
    groupBy: '$price',
    boundaries: [0, 25, 50, 100, 500],
    default: 'expensive',
    output: { count: { $sum: 1 }, avgPrice: { $avg: '$price' } }
  }}
]);

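Pipeline stages execute in order. In the first example, $match filters first, which is efficient because it reduces the documents that later stages must process; $group then aggregates by department with $avg, $sum, and $max; $sort orders the results; and $limit keeps the top 5. In the second example, $lookup performs a left outer join with the users collection and stores the matches in the user array, and $unwind then emits one document per array element, so user becomes a single object. In the third, $bucket groups products into the price ranges defined by boundaries, producing a histogram; prices outside every range fall into the default bucket.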
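The bucket boundaries are easy to misread, so here is a small plain-JavaScript sketch of how a value is assigned to a bucket: each range includes its lower bound and excludes its upper bound, and the bucket's _id is the lower bound. The `bucketId` helper and the sample prices are hypothetical; this only mirrors the assignment rule and is not MongoDB's implementation.

```javascript
// Plain-JavaScript illustration of how $bucket assigns a value to a bucket
// (hypothetical helper, not MongoDB's implementation).
function bucketId(value, boundaries, defaultId) {
  for (let i = 0; i < boundaries.length - 1; i++) {
    // Lower bound inclusive, upper bound exclusive
    if (value >= boundaries[i] && value < boundaries[i + 1]) {
      return boundaries[i]; // the bucket's _id is its lower boundary
    }
  }
  return defaultId; // anything outside every range
}

const boundaries = [0, 25, 50, 100, 500];
const prices = [10, 25, 99.99, 100, 500, 1200]; // hypothetical sample prices
const ids = prices.map((p) => bucketId(p, boundaries, 'expensive'));

console.log(ids); // [ 0, 25, 50, 100, 'expensive', 'expensive' ]
```

Note that a price of exactly 500 already lands in the default bucket, because the last boundary is exclusive. A negative price would land there too, so the label 'expensive' is only accurate if prices are never below 0.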

Know the major stages: $match (filter first for performance), $group (aggregation), $lookup (JOIN), $unwind (flatten arrays), $project (reshape), $sort, $limit. Pipeline order matters — $match early reduces work for later stages. $lookup + $unwind is the JOIN pattern.
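One caveat with the $lookup + $unwind pattern: in its short form, $unwind drops documents whose array is empty, so an order with no matching user disappears and the result behaves like an inner join. The sketch below, which assumes the same orders and users collections as the example above, uses the document form of $unwind with preserveNullAndEmptyArrays to keep those orders. The `ordersWithUsers` variable name is hypothetical, and the pipeline is defined as plain data so it can be inspected without a database.

```javascript
// Pipeline definition as plain data. Collection and field names follow the
// orders/users example; the variable name is hypothetical.
const ordersWithUsers = [
  { $lookup: {
    from: 'users',          // Collection to join
    localField: 'userId',   // Field in orders
    foreignField: '_id',    // Field in users
    as: 'user'              // Output array field
  }},
  { $unwind: {
    path: '$user',
    preserveNullAndEmptyArrays: true  // Keep orders that matched no user
  }},
  { $project: { orderId: 1, total: 1, 'user.name': 1, 'user.email': 1 } }
];

// In mongosh this would be run as:
//   db.orders.aggregate(ordersWithUsers);
console.log(JSON.stringify(ordersWithUsers, null, 2));
```

Orders without a matching user then come through with no user field, which preserves the left-outer behavior of $lookup.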

The Aggregation Framework replaced Map-Reduce, which has been deprecated since MongoDB 5.0.
