Counting array items across documents

I have 2 collections: words and phrases. Each word document has an array of phrase ids, and each phrase can be active or inactive.

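For example:

words:
{ "word" => "hello", "phrases" => [1,2] }
{ "word" => "table", "phrases" => [2] }

phrases:
{ "id" => 1, "phrase" => "hello world!", "active" => 1 }
{ "id" => 2, "phrase" => "hello, I have already bought a new table", "active" => 0 }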

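I need to count the active phrases for each word.

In PHP I currently do it like this:
1. Get all words.
2. For each word, get the count of its phrases that match the condition ["active" => 1].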
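As a rough sketch of what those two steps compute, here is the same logic in plain JavaScript over the sample data, with no MongoDB involved. The helper `countActivePhrases` is a hypothetical stand-in for the per-word count query; it is not part of any driver.

```javascript
// Sample data from the question, held in memory (no database involved).
const words = [
  { word: "hello", phrases: [1, 2] },
  { word: "table", phrases: [2] },
];
const phrases = [
  { id: 1, phrase: "hello world!", active: 1 },
  { id: 2, phrase: "hello, I have already bought a new table", active: 0 },
];

// Hypothetical stand-in for one "count" request per word:
// counts phrases whose id is in `ids` and whose active flag is 1.
function countActivePhrases(ids) {
  return phrases.filter((p) => ids.includes(p.id) && p.active === 1).length;
}

// Step 1: get all words. Step 2: one count per word.
const perWordCounts = words.map((w) => ({
  word: w.word,
  active_count: countActivePhrases(w.phrases),
}));

console.log(perWordCounts);
```

Against a real database, step 2 means one extra request per word, which is the cost the question is trying to avoid.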

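Question: how can I get the words together with their active phrase counts in a single request? I tried to use MapReduce, but I still need to make a request for each word to get its active phrase count.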

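UPD: My test collections contain 92,000 phrases and 23,000 words.

I have tested both variants: a PHP loop over the words in which I fetch the phrase count for each word, and the aggregation function in Mongo.

However, I changed the aggregation pipeline as shown below because of phrases_data. It is an array, so I can't use $match on it directly. I use $unwind after $lookup.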

    [ '$unwind' => '$5' ],
    [
        '$lookup' => [
            'from' => 'phrases_926ee3bc9fa72b029e028ec90e282072ea0721d1',
            'localField' => '5',
            'foreignField' => '0',
            'as' => 'phrases_data'
        ]
    ],
    [ '$unwind' => '$phrases_data' ],
    [ '$match' => [ 'phrases_data.3' => 77 ] ], // phrases_data.3 => 77 is the equivalent of phrases_data.active => 1
    [
        '$group' => [
            '_id' => [ 'word' => '$1', 'id' => '$0' ],
            'active_count' => [ '$sum' => 1 ]
        ]
    ],
    [ '$match' => [ 'active_count' => [ '$gt' => 0 ] ] ],
    [
        '$sort' => [
            'active_count' => -1
        ]
    ]

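The problem is that the $group stage takes 80% of the processing time. It is much slower than the PHP loop. Here are the results for my test set:

1. PHP loop (get words -> get phrase count for each word): 10 seconds
2. Aggregation function: 20 seconds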


2017-05-06 Дмитрий Бережнов

What are your mongo server version and PHP mongo driver version? – Veeram

Mongo 3.2. The PHP mongo driver seems to be v1, I'm not sure –

db.words.aggregate([ 
    { "$unwind" : "$phrases"}, 
    { 
     "$lookup": { 
      "from": "phrases", 
      "localField": "phrases", 
      "foreignField": "id", 
      "as": "phrases_data" 
     } 
    }, 
    { "$match" : { "phrases_data.active" : 1} }, 
    { "$group" : { 
     "_id" : "$word", 
     "active_count" : { $sum : 1 } 
     } 
    } 
]);

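You can use the aggregation pipeline above:

1. Unwind the phrases array of each document in the words collection into separate documents.
2. Do a lookup (join) against the phrases collection using the unwound phrase id.
3. Filter the joined phrases with $match, keeping only the active ones.
4. Finally, group by word and count the phrases using $sum: 1.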
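The four steps can be traced by hand on the question's sample data. The following is only an in-memory imitation of the stages in plain JavaScript, not how the server executes them; the variable names are made up for illustration.

```javascript
// Sample data from the question, held in memory (no database involved).
const words = [
  { word: "hello", phrases: [1, 2] },
  { word: "table", phrases: [2] },
];
const phrases = [
  { id: 1, phrase: "hello world!", active: 1 },
  { id: 2, phrase: "hello, I have already bought a new table", active: 0 },
];

// 1. $unwind: one document per (word, phrase id) pair.
const unwound = words.flatMap((w) =>
  w.phrases.map((id) => ({ word: w.word, phrases: id }))
);

// 2. $lookup: attach the matching phrase documents as an array.
const joined = unwound.map((d) => ({
  ...d,
  phrases_data: phrases.filter((p) => p.id === d.phrases),
}));

// 3. $match: keep documents where a joined phrase has active == 1.
const matched = joined.filter((d) =>
  d.phrases_data.some((p) => p.active === 1)
);

// 4. $group: count the remaining documents per word.
const counts = new Map();
for (const d of matched) {
  counts.set(d.word, (counts.get(d.word) || 0) + 1);
}
const grouped = [...counts].map(([word, active_count]) => ({
  _id: word,
  active_count,
}));

console.log(grouped);
```

Note that a word with no active phrases ("table" in the sample) is dropped at step 3, so it does not appear in the grouped output at all.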


2017-05-06 15:03:22

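Thanks! I know about the aggregation framework and functions like `$unwind`, but I was worried it would be slow. If there are no better options, I will try it. –

The main reference you have from the words collection is in an array, so unwinding is the best way. –

You can use the aggregation pipeline below in 3.4.

You don't need to $unwind the array of ids in version 3.3.4 and later.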

https://stackoverflow.com/a/36647133/2683814

The query below joins words with the phrases collection, then uses $filter + $size to count the active rows.

<?php

    // Connect using the low-level mongodb extension classes.
    $manager = new MongoDB\Driver\Manager("mongodb://localhost:27017");

    $pipeline = [
        // Join each word with the phrases whose id appears in its "phrases" array.
        [
            '$lookup' => [
                'from' => 'phrases',
                'localField' => 'phrases',
                'foreignField' => 'id',
                'as' => 'phrases',
            ],
        ],
        // Replace the joined array with the number of active phrases in it.
        [
            '$addFields' => [
                'phrases' => [
                    '$size' => [
                        '$filter' => [
                            'input' => '$phrases',
                            'as' => 'phrase',
                            'cond' => ['$eq' => ['$$phrase.active', 1]],
                        ],
                    ],
                ],
                '_id' => 0,
            ],
        ],
    ];

    $command = new \MongoDB\Driver\Command([
        'aggregate' => 'words',
        'pipeline' => $pipeline,
        'cursor' => new \stdClass, // request a cursor; MongoDB 3.6 and later require this option
    ]);

    $cursor = $manager->executeCommand('test', $command);

    foreach ($cursor as $key => $document) {
        var_dump($document);
    }
?>
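To make the $filter + $size idea concrete, here is a small in-memory imitation in plain JavaScript using the question's sample data. It only mirrors the logic of the stages and makes no claim about how the server runs them; the variable names are illustrative.

```javascript
// Sample data from the question, held in memory (no database involved).
const words = [
  { word: "hello", phrases: [1, 2] },
  { word: "table", phrases: [2] },
];
const phrases = [
  { id: 1, phrase: "hello world!", active: 1 },
  { id: 2, phrase: "hello, I have already bought a new table", active: 0 },
];

const counted = words.map((w) => {
  // $lookup with an array localField: every phrase whose id is in the array.
  const joined = phrases.filter((p) => w.phrases.includes(p.id));
  // $filter: keep only the active phrases.
  const active = joined.filter((p) => p.active === 1);
  // $size: the "phrases" field becomes the count of active phrases.
  return { word: w.word, phrases: active.length };
});

console.log(counted);
```

Unlike the $unwind/$group variant, every word stays in the output here, including words whose active count is 0.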


2017-05-06 15:37:38 Veeram

I can't use it, I have mongo 3.2 –
