MongoDB Aggregation Operations

You are not going to always return records from MongoDB but sometimes you need to perform some computations such as sum, count, average, etc. Here you need the aggregation operations which are used to process data that return the computed results. Therefore aggregation actually groups data from multiple documents and performs a variety of operations on the grouped data to return the single result.

Aggregate operations retrieve a single value after performing a calculation on a set of values. Often aggregate functions are accompanied by group clause in the find() method.

There are three ways to perform aggregation in MongoDB – the aggregation pipeline, the map-reduce function, and single purpose aggregation methods.

Prerequisites

MongoDB 4.4.0, MongoDB Setup

Aggregation Examples

Count

Returns a count of a number of non-NULL values of a given expression. If it does not find any matching row, it returns 0.

As a first step I want to insert some sample data into publisher collection into MongoDB. Execute below insert() statements in the MongoDB shell to insert these data.

db.publisher.insert({"pubId":"P001","pubName":"Pub 1","pubCity":"Mumbai","country":"India","noOfOffices":10});
db.publisher.insert({"pubId":"P002","pubName":"Pub 2","pubCity":"Kolkata","country":"India","noOfOffices":7});
db.publisher.insert({"pubId":"P003","pubName":"Pub 3","pubCity":"New York","country":"USA","noOfOffices":5});
db.publisher.insert({"pubId":"P004","pubName":"Pub 4","pubCity":"London","country":"UK","noOfOffices":4});
db.publisher.insert({"pubId":"P005","pubName":"Pub 5","pubCity":"Cambridge","country":"USA","noOfOffices":3});
db.publisher.insert({"pubId":"P006","pubName":"Pub 6","pubCity":"New Delhi","country":"India","noOfOffices":9});

To count the documents in the publisher collection you can use command db.publisher.count(); in Mongo shell. You will get result 6. To achieve the same result you can also execute the command db.publisher.find().length();.

Now let’s say you want to return the number of publishers for each country. Here you need to use the aggregate function. The following query will give you exactly the same result.

db.publisher.aggregate([{ $group : {_id : '$country', count : {$sum : 1}} }]);

The result you will get from the above query is.

mongodb aggregation operations

The _id is the object by which you want to accumulate the values. The each 1 object for this _id type will be added to the count. The group is used to group by specific field. Here I am grouping by cateId.

Sum

Sum function returns the sum of an expression. It returns null when the return set has no row.

I will first insert some sample data into a collection called book.

db.book.insert({"cateId":"C001","name":"Information Technology","price":75.00,"quantity":15});
db.book.insert({"cateId":"C001","name":"Computer Science","price":70.00,"quantity":12});
db.book.insert({"cateId":"C002","name":"Physics","price":60.00,"quantity":11});
db.book.insert({"cateId":"C001","name":"Mechanis","price":50.00,"quantity":10});
db.book.insert({"cateId":"C003","name":"English","price":30.00,"quantity":15});
db.book.insert({"cateId":"C002","name":"Mathematics","price":80.00,"quantity":9});

To fetch total quantity for each category (cateId) for the books use below query. Here I am grouping books by cateId and summing up quantity from each cateId.

db.book.aggregate([{ $group : {_id : "$cateId", count: {$sum: "$quantity"}} }]);

The above query will give you the following results.

{ "_id" : "C002", "count" : 20 }
{ "_id" : "C001", "count" : 37 }
{ "_id" : "C003", "count" : 15 }

Let’s say you want to calculate total amount for each cateId for all quantities. So here is the query and output.

> db.book.aggregate([{ $group : {_id : "$cateId", count: {$sum: "$quantity"}, total : {$sum : {$multiply: [ "$price", "$quantity" ]}} } }]);

{ "_id" : "C002", "count" : 20, "total" : 1380 }
{ "_id" : "C001", "count" : 37, "total" : 2465 }
{ "_id" : "C003", "count" : 15, "total" : 450 }

Average

Now I will show you how to work with average function in MongoDB. Let’s say we want to calculate average amount for each category irrespective of the total quantity.

I am going to use the same dataset, as I have used for Sum operation, for calculating average amount for each category. Here is the below query that does the job.

db.book.aggregate([{$group : {_id : "$cateId", count: {$sum: "$quantity"}, total: {$sum : {$multiply: [ "$price", "$quantity" ]}}, avgAmount: { $avg: { $multiply: [ "$price", "$quantity" ] }}, avgQuantity: { $avg: "$quantity" } }}]);

The above query will produce below output.

{ "_id" : "C002", "count" : 20, "total" : 1380, "avgAmount" : 690, "avgQuantity" : 10 }
{ "_id" : "C003", "count" : 15, "total" : 450, "avgAmount" : 450, "avgQuantity" : 15 }
{ "_id" : "C001", "count" : 37, "total" : 2465, "avgAmount" : 821.6666666666666, "avgQuantity" : 12.333333333333334 }

Now if you look at the output and probably you will be surprised that avgAmount should be 69 instead of 690 because total quantity (count) is 20. But it actually does the average on total number of documents for each category. It will clear your idea if you look at the avgQuantity value. So it gives average for each group.

Match

Match is similar to like in WHERE clause of the SQL query. It is used to match the documents. Let’s say you want to retrieve documents for category C002. here is the query and output:

> db.book.aggregate([{$match:{cateId:"C002"}}]);

{ "_id" : ObjectId("5f8adb2b3fc578f9d0a566b4"), "cateId" : "C002", "name" : "Physics", "price" : 60, "quantity" : 11 }
{ "_id" : ObjectId("5f8adb2c3fc578f9d0a566b7"), "cateId" : "C002", "name" : "Mathematics", "price" : 80, "quantity" : 9 }

Project

Project is used to select only the specified fields from the documents. You can also read projection example with find() in MongoDB. Let’s say you want to select only name and quantity of books. So I am putting 1 for name and quantity for displaying. By default _id field always appears. So if you want to exclude _id field then you need to use "_id": 0.

> db.book.aggregate([{$match:{cateId:"C002"}}, {$project:{"name":1, "quantity":1}}]);

{ "_id" : ObjectId("5f8adb2b3fc578f9d0a566b4"), "name" : "Physics", "quantity" : 11 }
{ "_id" : ObjectId("5f8adb2c3fc578f9d0a566b7"), "name" : "Mathematics", "quantity" : 9 }

Min

It finds minimum value of a field in each group. So here I am finding minimum total amount for each group (cateId).

> db.book.aggregate([{$group : {_id : "$cateId", count: {$sum: "$quantity"}, total: {$sum : {$multiply: [ "$price", "$quantity" ]}}, minAmount: { $min: { $multiply: [ "$price", "$quantity" ] }} }}]);

{ "_id" : "C002", "count" : 20, "total" : 1380, "minAmount" : 660 }
{ "_id" : "C003", "count" : 15, "total" : 450, "minAmount" : 450 }
{ "_id" : "C001", "count" : 37, "total" : 2465, "minAmount" : 500 }

Max

Finds maximum value of a field in each group. The below query will find the maximum total amount from each group (cateId). It does just opposite to the above query.

> db.book.aggregate([{$group : {_id : "$cateId", count: {$sum: "$quantity"}, total: {$sum : {$multiply: [ "$price", "$quantity" ]}}, maxAmount: { $max: { $multiply: [ "$price", "$quantity" ] }} }}]);

{ "_id" : "C002", "count" : 20, "total" : 1380, "maxAmount" : 720 }
{ "_id" : "C001", "count" : 37, "total" : 2465, "maxAmount" : 1125 }
{ "_id" : "C003", "count" : 15, "total" : 450, "maxAmount" : 450 }

First and Last

Getting specific field’s value from first and last document of each group. The following query does exactly same.

> db.book.aggregate([{$group:{"_id":"$cateId", "first":{$first:"$name"}, "last":{$last:"$name"}}}]);

{ "_id" : "C002", "first" : "Physics", "last" : "Mathematics" }
{ "_id" : "C003", "first" : "English", "last" : "English" }
{ "_id" : "C001", "first" : "Information Technology", "last" : "Mechanis" }

Min and Max

The following query shows how to find minimum and maximum total amount for each group (cateId).

> db.book.aggregate([{$group:{"_id":"$cateId", count: {$sum: "$quantity"}, "maxAmount":{$max:{ $multiply: [ "$price", "$quantity" ] }}, "minAmount":{$min: { $multiply: [ "$price", "$quantity" ] }}}}]);

{ "_id" : "C002", "count" : 20, "maxAmount" : 720, "minAmount" : 660 }
{ "_id" : "C003", "count" : 15, "maxAmount" : 450, "minAmount" : 450 }
{ "_id" : "C001", "count" : 37, "maxAmount" : 1125, "minAmount" : 500 }

Push

Push adds a field’s value form each document in group to an array used to project data in array format. For example you want to aggregate all quantities into an array. So below example can help you.

> db.book.aggregate([{$group:{"_id":"cateId", "arrPush":{$push:"$quantity"}}}]);

{ "_id" : "cateId", "arrPush" : [ 15, 12, 11, 10, 15, 9 ] }

AddToSet

AddToSet is similar to the push but it does not allow duplicates. The following example removes 15 from the array.

> db.book.aggregate([{$group:{"_id":"cateId", "arrAddToSet":{$addToSet:"$quantity"}}}]);

{ "_id" : "cateId", "arrAddToSet" : [ 10, 11, 9, 15, 12 ] }

Sort

You may want to sort your documents based on a particular field. Let’s say you want to sort books by name for cateId C001. You can see the example below:

> db.book.aggregate([{$match:{cateId:"C001"}}, {$project:{"name":1, "quantity":1}}, {$sort: {name:1}}]);

{ "_id" : ObjectId("5f8adb2b3fc578f9d0a566b3"), "name" : "Computer Science", "quantity" : 12 }
{ "_id" : ObjectId("5f8adb2b3fc578f9d0a566b2"), "name" : "Information Technology", "quantity" : 15 }
{ "_id" : ObjectId("5f8adb2b3fc578f9d0a566b5"), "name" : "Mechanis", "quantity" : 10 }

You may also want to sort by quantity, look at the below example:

> db.book.aggregate([{$match:{cateId:"C001"}}, {$project:{"name":1, "quantity":1}}, {$sort: {quantity:1}}]);

{ "_id" : ObjectId("5f8adb2b3fc578f9d0a566b5"), "name" : "Mechanis", "quantity" : 10 }
{ "_id" : ObjectId("5f8adb2b3fc578f9d0a566b3"), "name" : "Computer Science", "quantity" : 12 }
{ "_id" : ObjectId("5f8adb2b3fc578f9d0a566b2"), "name" : "Information Technology", "quantity" : 15 }

Skip

Skip skips over the number of documents specified for this parameter. For example you may want to skip first two document while sorting as shown in the above example. Therefore it fetches only one document.

> db.book.aggregate([{$match:{cateId:"C001"}}, {$project:{"name":1, "quantity":1}}, {$sort: {quantity:1}}, {$skip: 2}]);

{ "_id" : ObjectId("5f8adb2b3fc578f9d0a566b2"), "name" : "Information Technology", "quantity" : 15 }

Limit

You can limit the number of documents returned from MongoDB. Let’s say you want to return only two documents or rows. The below example may help you:

> db.book.aggregate([{$match:{cateId:"C001"}}, {$project:{"name":1, "quantity":1}}, {$sort: {quantity:1}}, {$limit: 2}]);

{ "_id" : ObjectId("5f8adb2b3fc578f9d0a566b5"), "name" : "Mechanis", "quantity" : 10 }
{ "_id" : ObjectId("5f8adb2b3fc578f9d0a566b3"), "name" : "Computer Science", "quantity" : 12 }

Query

Problem

How to write a query to get all categories (cateId) where average amount is greater than or equal to 500 of categories are having quantity less than or equal to 12?

Solution

In order to do that you need to write a query to match categories that have a quatity that is less than or equal to 12. Then add the aggregate stage to group the categories by the cateId. Then add an accumulator with a field named e.g. avgAmount to find the average amount per cateId using the $avg accumulator and below the existing $match and $group aggregates add another $match aggregate so that you’re only retrieving results with an avgAmount that is greather than or equal to 500.

db.book.aggregate([
	{"$match": {"quantity": {"$lte": 12}}},
	{"$group": {"_id": "$cateId",
		avgAmount: { $avg: { $multiply: [ "$price", "$quantity" ] }}
		}
	},
	{"$match": {"avgAmount": {"$gte": 500}}}
]);

You will get the following output on Mongo shell:

{ "_id" : "C002", "avgAmount" : 690 }
{ "_id" : "C001", "avgAmount" : 670 }

That’s all about how to work with aggregation operations in MongoDB.

Leave a Reply

Your email address will not be published. Required fields are marked *