ElasticSearch: total occurrences of each distinct value across the entire data

I'm very new to ElasticSearch (version 2.3.3), and my data has the following format:

{ 
    "title": "Doc 1 title", 
    "year": "14", 
    "month": "06", 
    "sentences": [ 
     { 
      "id": 1, 
      "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit", 
      "class": "Introduction", 
      "synth": "intr" 
     }, 
     { 
      "id": 2, 
      "text": "Donec molestie pulvinar odio, ultricies dictum mi porttitor sit amet.", 
      "class": "Introduction", 
      "synth": "abstr" 
     }, 
     { 
      "id": 3, 
      "text": "Aliquam id tristique diam. Suspendisse convallis convallis est ut condimentum.", 
      "class": "Main_Content", 
      "synth": "body" 
     }, 
     { 
      "id": 4, 
      "text": "Nunc ornare eros at pretium faucibus. Praesent congue cursus aliquet.", 
      "class": "Main_Content", 
      "synth": "body" 
     }, 
     { 
      "id": 5, 
      "text": "Integer pellentesque quam ut nulla dignissim hendrerit.", 
      "class": "Future_Work", 
      "synth": "ftr" 
     }, 
     { 
      "id": 6, 
      "text": "Pellentesque faucibus vehicula diam.", 
      "class": "Bibliography", 
      "synth": "bio" 
     } 
    ] 
}

There are many documents like this: doc1, doc2, ..., doc700.

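I want to build a query that returns, for the whole batch of documents, the total number of occurrences of each distinct "class", broken down and sorted by year.

So the result would look something like the following: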

[ 
    { 
     "year" : "14", 
     "count" : [ 
      { "Introduction" : 1357 }, 
      { "Main_Content" : 1021 }, 
      { "Future_Work" : 490 }, 
      { "Bibliography" : 241 } 
     ] 
    }, 
    { 
     "year" : "15", 
     "count" : [ 
      { "Introduction" : 972 }, 
      { "Main_Content" : 712 }, 
      { "Future_Work" : 335 }, 
      { "Bibliography" : 81 } 
     ] 
    } 
]
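The following client-side Python sketch (not an Elasticsearch query) shows exactly what "occurrences" means here. It assumes documents shaped like the sample above and counts every sentence's class, so two Introduction sentences in one document count as 2. It then groups the totals by year, sorted by year. The function name count_classes_by_year is hypothetical. For a few hundred documents, the same approach can also be used to check whatever a server-side aggregation returns.

```python
from collections import Counter, defaultdict


def count_classes_by_year(docs):
    """Count every sentence's "class" per document "year" (hypothetical helper).

    Returns a list of {"year": ..., "count": [{class: n}, ...]} entries.
    """
    per_year = defaultdict(Counter)
    for doc in docs:
        # One sentence = one occurrence; repeats inside a document all count.
        for sentence in doc.get("sentences", []):
            per_year[doc["year"]][sentence["class"]] += 1
    # String sort; fine for two-digit years such as "14" and "15".
    return [
        {"year": year,
         "count": [{cls: n} for cls, n in per_year[year].most_common()]}
        for year in sorted(per_year)
    ]
```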

Is what I posted achievable? Or would it be easier to do this separately for each "class"?

Thanks a lot.

2016-06-09 Mayhem

This can be done with a nested aggregation. If your existing mapping does not map sentences as nested, you could use a mapping like the following:

{ 
    "mappings": { 
     "book": { 
      "properties": { 
      "title": { 
       "type": "string" 
      }, 
      "month": { 
       "type": "string" 
      }, 
      "year": { 
       "type": "string" 
      }, 
      "sentences": { 
       "type": "nested", 
        "properties": { 
         "synth": { 
          "type": "string" 
         }, 
         "id": { 
          "type": "long" 
         }, 
         "text": { 
          "type": "string" 
         }, 
         "class": { 
          "type": "string" 
         } 
        } 
       } 
      } 
     } 
    } 
}

Then run the following query:

{ 
    "size": 0, 
    "aggs": { 
     "years": { 
      "terms": { 
       "field": "year" 
      }, 
      "aggs" : { 
       "sentences" : { 
        "nested" : { 
         "path" : "sentences" 
        }, 
        "aggs" : { 
         "classes" : { "terms" : { "field" : "sentences.class" } } 
        } 
       } 
      } 
     } 
    } 
}

Here is the result on some sample data:

"aggregations": { 
    "years": { 
     "doc_count_error_upper_bound": 0, 
     "sum_other_doc_count": 0, 
     "buckets": [ 
     { 
      "key": "14", 
      "doc_count": 2, 
      "sentences": { 
       "doc_count": 12, 
       "classes": { 
        "doc_count_error_upper_bound": 0, 
        "sum_other_doc_count": 0, 
        "buckets": [ 
        { 
         "key": "introduction", 
         "doc_count": 4 
        }, 
        { 
         "key": "main_content", 
         "doc_count": 4 
        }, 
        { 
         "key": "bibliography", 
         "doc_count": 2 
        }, 
        { 
         "key": "future_work", 
         "doc_count": 2 
        } 
        ] 
       } 
      } 
     }, 
     { 
      "key": "15", 
      "doc_count": 1, 
      "sentences": { 
       "doc_count": 5, 
       "classes": { 
        "doc_count_error_upper_bound": 0, 
        "sum_other_doc_count": 0, 
        "buckets": [ 
        { 
         "key": "main_content", 
         "doc_count": 2 
        }, 
        { 
         "key": "bibliography", 
         "doc_count": 1 
        }, 
        { 
         "key": "future_work", 
         "doc_count": 1 
        }, 
        { 
         "key": "introduction", 
         "doc_count": 1 
        } 
        ] 
       } 
      } 
     } 
     ] 
    } 
}

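Don't be confused by the name doc_count here: these numbers are the actual occurrences of your "class" values across the main documents. Each sentence is stored as a nested document associated with its main document, so a doc_count inside the nested aggregation counts sentences, not main documents.

Hope it helps.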
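To reshape this response into the per-year format from the question, a minimal Python sketch can walk the years → sentences → classes buckets. It assumes the aggregation names used in the query above (years, sentences, classes) and the response layout shown in the sample. The function name classes_per_year is hypothetical.

The sample shows two details worth handling. First, the year buckets come back ordered by doc_count rather than by year, so the sketch sorts them. Second, the class keys are lowercased (main_content), because ES 2.x string fields are analyzed by default. Mapping class as not_analyzed would keep the original casing.

```python
def classes_per_year(aggregations):
    """Flatten the nested-aggregation buckets (hypothetical helper).

    `aggregations` is the parsed "aggregations" object of the search
    response; returns [{"year": ..., "count": [{class: n}, ...]}, ...].
    """
    result = []
    for year_bucket in aggregations["years"]["buckets"]:
        class_buckets = year_bucket["sentences"]["classes"]["buckets"]
        result.append({
            "year": year_bucket["key"],
            # Inside the nested agg, doc_count = number of sentence occurrences.
            "count": [{b["key"]: b["doc_count"]} for b in class_buckets],
        })
    # terms buckets are ordered by doc_count by default; sort by year instead
    # (string order, fine for two-digit years).
    return sorted(result, key=lambda entry: entry["year"])
```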

2016-06-10 07:13:45

I tried the mapping you suggested, but when I run the query I get { "type": "aggregation_execution_exception", "reason": "[nested] nested path [sentences] is not nested" } – Mayhem

Can you verify that the mapping was created correctly for the index? –

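Yes, when I run -XGET /index/_mapping/type I get the mapping back. I should say, though, that the data in my question is not the complete data. I tried to map only the parts that matter for the query; the rest (which I left out) is irrelevant to the query and contains no fields to search or count. – Mayhem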
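The error "nested path [sentences] is not nested" means that sentences is not mapped as nested in the index being queried. One likely cause, which is an assumption and is not confirmed in this thread, is that the index already existed, or documents were already indexed, before the nested mapping was applied. An existing object field cannot be changed to nested in place, so the usual route is to create a new index with the mapping and reindex.

The Python sketch below checks the parsed JSON body returned by GET /index/_mapping/type. The helper name is_nested_field is hypothetical, and it only inspects a top-level field.

```python
def is_nested_field(mapping_response, index, doc_type, path="sentences"):
    """Return True if `path` is mapped with "type": "nested" (hypothetical helper).

    `mapping_response` is the parsed body of GET /<index>/_mapping/<type>,
    laid out as {index: {"mappings": {doc_type: {"properties": {...}}}}}.
    """
    props = mapping_response[index]["mappings"][doc_type]["properties"]
    field = props.get(path)
    # A dynamically mapped array of objects ends up as a plain object field,
    # which the nested aggregation rejects.
    return field is not None and field.get("type") == "nested"
```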

You can nest aggs inside one another and use a terms aggregation to split the results into buckets and count them the way you want. For example:

POST index/type/_search 
{ 
    "size": 0, 
    "aggs": { 
    "agg1": { 
     "terms": { 
     "field": "year" 
     }, 
     "aggs": { 
     "agg2": { 
      "terms": { 
      "field": "sentences.class" 
      }   
     } 
     } 
    } 
    } 
}

I haven't tried this with a nested array of objects, but it should still work. More useful information can be found here:

https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html

2016-06-09 20:32:35 pythonHelpRequired

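Thanks for the quick response. Yes, this query ALMOST works the way I want, but the second aggregation only returns doc_count rather than the value I need. The real problem is counting the occurrences of a specific class across all documents that belong to a given year. – Mayhem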
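The gap described in this comment comes from how a plain (non-nested) array of objects is indexed. The values of sentences.class are flattened into one multi-valued field on the parent document. A terms bucket's doc_count is the number of parent documents containing that value, so a document with two Introduction sentences is counted once.

The Python sketch below simulates both numbers for one year. The function name docs_vs_occurrences is hypothetical, and nothing in it calls Elasticsearch. For two documents shaped like the question's sample, it reports 2 documents but 4 occurrences for Introduction. The figure of 4 matches the introduction count the nested aggregation in the other answer returns for year 14.

```python
from collections import Counter


def docs_vs_occurrences(docs, year):
    """Compare document counts with occurrence counts for one year (hypothetical helper).

    Returns (docs_with_class, occurrences) as two Counters keyed by class.
    """
    docs_with_class = Counter()  # what a terms agg on a flattened field reports
    occurrences = Counter()      # what the question actually asks for
    for doc in docs:
        if doc["year"] != year:
            continue
        classes = [s["class"] for s in doc["sentences"]]
        occurrences.update(classes)
        # A terms bucket counts each document once per distinct value,
        # however often that value repeats inside the document.
        docs_with_class.update(set(classes))
    return docs_with_class, occurrences
```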