MongoDB—聚合

时间:2020-09-09 19:22:32   收藏:0   阅读:69

Aggregation

Aggregation Pipeline

db.collection.aggregate

格式

db.collection.aggregate(pipeline, options)

Returns

示例

文档

db.orders.insertMany([
{ _id: 1, cust_id: "abc1", ord_date: ISODate("2012-11-02T17:04:11.102Z"), status: "A", amount: 50 },
{ _id: 2, cust_id: "xyz1", ord_date: ISODate("2013-10-01T17:04:11.102Z"), status: "A", amount: 100 },
{ _id: 3, cust_id: "xyz1", ord_date: ISODate("2013-10-12T17:04:11.102Z"), status: "D", amount: 25 },
{ _id: 4, cust_id: "xyz1", ord_date: ISODate("2013-10-11T17:04:11.102Z"), status: "D", amount: 125 },
{ _id: 5, cust_id: "abc1", ord_date: ISODate("2013-11-12T17:04:11.102Z"), status: "A", amount: 25 }
])

Pipeline Expressions

管道表达式

https://docs.mongodb.com/v4.2/core/aggregation-pipeline/#pipeline-expressions

for update

https://docs.mongodb.com/v4.2/tutorial/update-documents-with-aggregation-pipeline/

for Sharded Collections

https://docs.mongodb.com/v4.2/core/aggregation-pipeline-sharded-collections/#aggregation-pipeline-sharded-collection

vs Map-Reduce

限制

优化

聚合命令对单个集合进行操作,逻辑上将整个集合传递给聚合管道,为了优化操作,应尽可能比表面扫描整个集合

Stages

$group

格式

db.collection.aggregate([
{
  $group:
    {
      _id: <expression>, 
      <field>: { <accumulator> : <expression> },
      ...
    }
 }
])

示例

文档

db.sales.insertMany([
  { "_id" : 1, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("2"), "date" : ISODate("2014-03-01T08:00:00Z") },
  { "_id" : 2, "item" : "jkl", "price" : NumberDecimal("20"), "quantity" : NumberInt("1"), "date" : ISODate("2014-03-01T09:00:00Z") },
  { "_id" : 3, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" : NumberInt( "10"), "date" : ISODate("2014-03-15T09:00:00Z") },
  { "_id" : 4, "item" : "xyz", "price" : NumberDecimal("5"), "quantity" :  NumberInt("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") },
  { "_id" : 5, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") },
  { "_id" : 6, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") },
  { "_id" : 7, "item" : "def", "price" : NumberDecimal("7.5"), "quantity": NumberInt("10") , "date" : ISODate("2015-09-10T08:43:00Z") },
  { "_id" : 8, "item" : "abc", "price" : NumberDecimal("10"), "quantity" : NumberInt("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") },
])

$project

格式

{ $project: { <specification(s)> } }

示例

$sort

格式

{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }

优化

$skip

格式

{ $skip: <positive integer> }

$limit

格式

{ $limit: <positive integer> }

$match

格式

{ $match: { <query> } }

限制

  1. $match 查询语法与读取操作查询语法相同;例如:$match不接受原始聚合表达式。要在$match中包含聚合表达式,请使用$expr查询表达式

    { $match: { $expr: { <aggregation expression> } } }
    
  2. 要在 $match 中使用 $text,则必须作为管道的第一阶段

示例

$lookup

相等查询
{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

示例

不相关子查询与多条件查询
{
   $lookup:
     {
       from: <collection to join>,
       let: { <var_1>: <expression>, …, <var_n>: <expression> },
       pipeline: [ <pipeline to execute on the collection to join> ],
       as: <output array field>
     }
}

示例

$count

格式

{ $count: <string> }

示例

$unwind

格式

{ $unwind: <field path> }
{
  $unwind:
    {
      path: <field path>,
      includeArrayIndex: <string>,
      preserveNullAndEmptyArrays: <boolean>
    }
}

示例

?

$out

格式

{ $out: "<output-collection>" }

示例

db.books.insert([
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 },
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
])

聚合

db.books.aggregate( [
                      { $group : { _id : "$author", books: { $push: "$title" } } },
                      { $out : "authors" }
                  ] )

结果在当前数据库中新增 authors 集合

{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }

$merge

$addFields

格式

{ $addFields: { <newField>: <expression>, ... } }

Operators

## 仅 Group ##

$push

格式

{ $push: <expression> }

$addToSet

格式

{ $addToSet: <expression> }

$sum

格式

$group 阶段

{ $sum: <expression> }

其他阶段

{ $sum: <expression> }
{ $sum: [ <expression1>, <expression2> ... ]  }

$avg

格式

$group 阶段

{ $avg: <expression> }

其他阶段

{ $avg: <expression> }
{ $avg: [ <expression1>, <expression2> ... ]  }

$first

格式

{ $first: <expression> }

$last

格式

{ $last: <expression> }

$max

格式

$group 阶段

{ $max: <expression> }

其他阶段

{ $max: <expression> }
{ $max: [ <expression1>, <expression2> ... ]  }

$min

格式

$group 阶段

{ $min: <expression> }

其他阶段

{ $min: <expression> }
{ $min: [ <expression1>, <expression2> ... ]  }

## 条件判断 ##

$switch

格式

$switch: {
   branches: [
      { case: <expression>, then: <expression> },
      { case: <expression>, then: <expression> },
      ...
   ],
   default: <expression>
}

示例

文档

db.grades.insert([
{ "_id" : 1, "name" : "Susan Wilkes", "scores" : [ 87, 86, 78 ] },
{ "_id" : 2, "name" : "Bob Hanna", "scores" : [ 71, 64, 81 ] },
{ "_id" : 3, "name" : "James Torrelio", "scores" : [ 91, 84, 97 ] }
])

聚合

db.grades.aggregate( [
  {
    $project:
      {
        "name" : 1,
        "summary" :
        {
          $switch:
            {
              branches: [
                {
                  case: { $gte : [ { $avg : "$scores" }, 90 ] },
                  then: "Doing great!"
                },
                {
                  case: { $and : [ { $gte : [ { $avg : "$scores" }, 80 ] },
                                   { $lt : [ { $avg : "$scores" }, 90 ] } ] },
                  then: "Doing pretty well."
                },
                {
                  case: { $lt : [ { $avg : "$scores" }, 80 ] },
                  then: "Needs improvement."
                }
              ],
              default: "No scores found."
            }
         }
      }
   }
] )

## 数组 ##

$size

格式

{ $size: <expression> }

## 字符串 ##

$substr

3.4 版本中被废弃,现在是 substrBytes 的别名

格式

{ $substr: [ <string>, <start>, <length> ] }

$indexOfBytes

格式

{ $indexOfBytes: [ <string expression>, <substring expression>, <start>, <end> ] }

示例

db.inventory.insert([
{ "_id" : 1, "item" : "foo" },
{ "_id" : 2, "item" : "fóofoo" },
{ "_id" : 3, "item" : "the foo bar" },
{ "_id" : 4, "item" : "hello world fóo" },
{ "_id" : 5, "item" : null },
{ "_id" : 6, "amount" : 3 }
])

聚合

db.inventory.aggregate(
   [
     {
       $project:
          {
            byteLocation: { $indexOfBytes: [ "$item", "foo" ] },
          }
      }
   ]
)

$strLenBytes

格式

{ $strLenBytes: <string expression> }

示例

文档

{ "_id" : 1, "name" : "apple" }
{ "_id" : 2, "name" : "banana" }
{ "_id" : 3, "name" : "éclair" }
{ "_id" : 4, "name" : "hamburger" }
{ "_id" : 5, "name" : "jalape?o" }
{ "_id" : 6, "name" : "pizza" }
{ "_id" : 7, "name" : "tacos" }
{ "_id" : 8, "name" : "寿司" }

聚合

db.food.aggregate(
  [
    {
      $project: {
        "name": 1,
        "length": { $strLenBytes: "$name" }
      }
    }
  ]
)

$strcasecmp

格式

{ $strcasecmp: [ <expression1>, <expression2> ] }

$toLower

格式

{ $toLower: <expression> }

示例

db.inventory.aggregate(
   [
     {
       $project:
         {
           item: { $toLower: "$item" },
           description: { $toLower: "$description" }
         }
     }
   ]
)

$toUpper

格式

{ $toUpper: <expression> }

$concat

格式

{ $concat: [ <expression1>, <expression2>, ... ] }

$split

格式

{ $split: [ <string expression>, <delimiter> ] }
Example Results
{ $split: [ "June-15-2013", "-" ] } [ "June", "15", "2013" ]
{ $split: [ "banana split", "a" ] } [ "b", "n", "n", " split" ]
{ $split: [ "Hello World", " " ] } [ "Hello", "World" ]
{ $split: [ "astronomical", "astro" ] } [ "", "nomical" ]
{ $split: [ "pea green boat", "owl" ] } [ "pea green boat" ]
{ $split: [ "headphone jack", 7 ] } Errors with message: ...
{ $split: [ "headphone jack", /jack/ ] } Errors with message: ...

"$split requires an expression that evaluates to a string as a second argument, found: regex"

## 比较运算 ##

$gt

格式

{ $gt: [ <expression1>, <expression2> ] }

$gte

格式

{ $gte: [ <expression1>, <expression2> ] }

$lt

格式

{ $lt: [ <expression1>, <expression2> ] }

$lte

格式

{ $lte: [ <expression1>, <expression2> ] }

## 算术运算 ##

$add

格式

{ $add: [ <expression1>, <expression2>, ... ] }

$subtract

格式

{ $subtract: [ <expression1>, <expression2> ] }

示例

文档

db.sales.insertMany([
   { "_id" : 1, "item" : "abc", "price" : 10, "fee" : 2, "discount" : 5, "date" : ISODate("2014-03-01T08:00:00Z") },
   { "_id" : 2, "item" : "jkl", "price" : 20, "fee" : 1, "discount" : 2, "date" : ISODate("2014-03-01T09:00:00Z") }
])

聚合

db.sales.aggregate( [ { $project: { item: 1, dateDifference: { $subtract: [ "$date", 5 * 60 * 1000 ] } } } ] )

$multiply

格式

{ $multiply: [ <expression1>, <expression2>, ... ] }

$divide

格式

{ $divide: [ <expression1>, <expression2> ] }

$mod

格式

{ $mod: [ <expression1>, <expression2> ] }

## 时间日期 ##

$year

格式

{ $year: <dateExpression> }

示例

Example Result
{ $year: new Date("2016-01-01") } 2016
{ $year: { date: new Date("Jan 7, 2003") } } 2003
{ $year: { date: new Date("August 14, 2011"), timezone: "America/Chicago" } } 2011
{ $year: ISODate("1998-11-07T00:00:00Z") } 1998
{ $year: { date: ISODate("1998-11-07T00:00:00Z"), timezone: "-0400" } } 1998
{ $year: "March 28, 1976" } error
{ $year: Date("2016-01-01") } error
{ $year: "2009-04-09" } error

示例

{
  "_id" : 1,
  "date" : ISODate("2014-01-01T08:15:39.736Z")
}

聚合

db.sales.aggregate(
   [
     {
       $project:
         {

           year: { $year: "$date" },

           month: { $month: "$date" },
           day: { $dayOfMonth: "$date" },
           hour: { $hour: "$date" },
           minutes: { $minute: "$date" },
           seconds: { $second: "$date" },
           milliseconds: { $millisecond: "$date" },
           dayOfYear: { $dayOfYear: "$date" },
           dayOfWeek: { $dayOfWeek: "$date" },
           week: { $week: "$date" }
         }
     }
   ]
)

$month

格式

{ $month: <dateExpression> }

示例

Example Result
{ $month: new Date("2016-01-01") } 1
{ $month: { date: new Date("Nov 7, 2003") } } 11
{ $month: ISODate("2000-01-01T00:00:00Z") } 1
{ $month: { date: new Date("August 14, 2011"), timezone: "America/Chicago" } } 8
{ $month: { date: ISODate("2000-01-01T00:00:00Z"), timezone: "-0500" } } 12
{ $month: "March 28, 1976" } error
{ $month: { date: Date("2016-01-01"), timezone: "-0500" } } error
{ $month: "2009-04-09" } error

$week

格式

{ $week: <dateExpression> }

示例

Example Result
{ $week: new Date("Jan 1, 2016") } 0
{ $week: { date: new Date("2016-01-04") } } 1
{ $week: { date: new Date("August 14, 2011"), timezone: "America/Chicago" } } 33
{ $week: ISODate("1998-11-01T00:00:00Z") } 44
{ $week: { date: ISODate("1998-11-01T00:00:00Z"), timezone: "-0500" } } 43
{ $week: "March 28, 1976" } error
{ $week: Date("2016-01-01") } error
{ $week: "2009-04-09" } error

NOTE

$hour cannot take a string as an argument.

$hour

格式

{ $hour: <dateExpression> }

$minute

格式

{ $minute: <dateExpression> }

$second

格式

{ $second: <dateExpression> }

$millisecond

格式

{ $millisecond: <dateExpression> }

$dayOfYear

格式

{ $dayOfYear: <dateExpression> }

$dayOfMonth

格式

{ $dayOfMonth: <dateExpression> }

$dayOfWeek

格式

{ $dayOfWeek: <dateExpression> }

map-reduce

格式

db.collection.mapReduce(
                         <map>,								   // 类似 $group
                         <reduce>,
                         {
                           out: <collection>,		  // 类似 $out
                           query: <document>,	// 类似 $match
                           sort: <document>,       // 类似 $sort
                           limit: <number>,			  // 类似 $limit
                           finalize: <function>,     // 类似 $project
                           scope: <document>,
                           jsMode: <boolean>,
                           verbose: <boolean>,
                           bypassDocumentValidation: <boolean>
                         }
                       )

JS Function

map-reduce 操作和 $where 操作符表达式不能访问某些全局函数或属性,例如db

以下函数和属性可以访问

Available Properties Available Functions
args assert() isNumber() print()
MaxKey BinData() isObject() printjson()
MinKey DBPointer() ISODate() printjsononeline()
DBRef() isString() sleep()
doassert() Map() Timestamp()
emit() MD5() tojson()
gc() NumberInt() tojsononeline()
HexData() NumberLong() tojsonObject()
hex_md5() ObjectId() UUID()
version()

示例

普通文档

计算每个消费者的总价

以 cust_id字段 分组,计算每组中 price字段的总和

文档

db.orders.insertMany([
   { _id: 1, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-01"), price: 25, items: [ { sku: "oranges", qty: 5, price: 2.5 }, { sku: "apples", qty: 5, price: 2.5 } ], status: "A" },
   { _id: 2, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-08"), price: 70, items: [ { sku: "oranges", qty: 8, price: 2.5 }, { sku: "chocolates", qty: 5, price: 10 } ], status: "A" },
   { _id: 3, cust_id: "Busby Bee", ord_date: new Date("2020-03-08"), price: 50, items: [ { sku: "oranges", qty: 10, price: 2.5 }, { sku: "pears", qty: 10, price: 2.5 } ], status: "A" },
   { _id: 4, cust_id: "Busby Bee", ord_date: new Date("2020-03-18"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
   { _id: 5, cust_id: "Busby Bee", ord_date: new Date("2020-03-19"), price: 50, items: [ { sku: "chocolates", qty: 5, price: 10 } ], status: "A"},
   { _id: 6, cust_id: "Cam Elot", ord_date: new Date("2020-03-19"), price: 35, items: [ { sku: "carrots", qty: 10, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" },
   { _id: 7, cust_id: "Cam Elot", ord_date: new Date("2020-03-20"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
   { _id: 8, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 75, items: [ { sku: "chocolates", qty: 5, price: 10 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" },
   { _id: 9, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 55, items: [ { sku: "carrots", qty: 5, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 }, { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
   { _id: 10, cust_id: "Don Quis", ord_date: new Date("2020-03-23"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" }
])

嵌套文档

计算2020-03-01之后,每个sku(单品)的订单数,总销量,以及每个订单平均销量

以 sku 字段分组,计算每组中 qty 字段的总和,以及每组中_id字段不同的文档的数量,最后计算平均销量

评论(0
© 2014 mamicode.com 版权所有 京ICP备13008772号-2  联系我们:gaon5@hotmail.com
迷上了代码!