Elasticsearch aggregation on email domain names

I am new to Elasticsearch, and I am trying to count the distinct occurrences of a substring of a field.

My mail-log index stores the email recipients, and I want to count how many messages in the index belong to each distinct domain.

For example, if my index holds 3 mail logs from the addresses [email protected], [email protected] and [email protected], I would like to see 2 mails for the b.com domain and 1 mail for the e.com domain.

Source

2016-08-01 user2604150

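You need a pattern_capture token filter that captures only what comes after the @. Also, so as not to interfere with the original analysis of the text, I suggest adding a sub-field to the original email field and working only with that sub-field for this specific aggregation: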

PUT /test
{
  "settings": {
    "analysis": {
      "filter": {
        "email_domains": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "@(.+)"
          ]
        }
      },
      "analyzer": {
        "email": {
          "tokenizer": "uax_url_email",
          "filter": [
            "email_domains",
            "lowercase",
            "unique"
          ]
        }
      }
    }
  },
  "mappings": {
    "emails": {
      "properties": {
        "email": {
          "type": "string",
          "fields": {
            "domain": {
              "type": "string",
              "analyzer": "email"
            }
          }
        }
      }
    }
  }
}
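
To make the analysis chain above easier to follow, here is a minimal Python sketch of roughly what the email analyzer does to one field value. It is only an approximation and not how Elasticsearch implements it: the whitespace/comma split is a crude stand-in for the uax_url_email tokenizer, and the function name email_domain_tokens is hypothetical.

```python
import re

# Same regular expression as the "email_domains" pattern_capture filter.
DOMAIN_PATTERN = re.compile(r"@(.+)")


def email_domain_tokens(text):
    """Approximate the tokens the custom "email" analyzer emits for one value."""
    # Assumption: splitting on whitespace, commas and semicolons is a rough
    # stand-in for the uax_url_email tokenizer, which keeps each address whole.
    raw_tokens = [t for t in re.split(r"[\s,;]+", text) if t]

    tokens = []
    for token in raw_tokens:
        match = DOMAIN_PATTERN.search(token)
        # pattern_capture emits the captured group; when nothing matches,
        # the original token passes through unchanged.
        captured = match.group(1) if match else token
        captured = captured.lower()      # "lowercase" filter
        if captured not in tokens:       # "unique" filter
            tokens.append(captured)
    return tokens
```

With this sketch, a value holding two addresses on the same domain yields a single token, which is the effect of the unique filter at the end of the chain.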

Index some test data:

POST /test/emails/_bulk 
{"index":{"_id":"1"}} 
{"email": "[email protected]"} 
{"index":{"_id":"2"}} 
{"email": "[email protected], [email protected]"} 
{"index":{"_id":"3"}} 
{"email": "[email protected]"} 
{"index":{"_id":"4"}} 
{"email": "[email protected]"} 
{"index":{"_id":"5"}} 
{"email": "[email protected]"}

For your specific use case, a simple terms aggregation like the following should do:

GET /test/emails/_search
{
  "size": 0,
  "aggs": {
    "by_domain": {
      "terms": {
        "field": "email.domain",
        "size": 10
      }
    }
  }
}

The result looks like this:

"aggregations": {
  "by_domain": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "outlook.com",
        "doc_count": 3
      },
      {
        "key": "gmail.com",
        "doc_count": 2
      },
      {
        "key": "yahoo.com",
        "doc_count": 1
      }
    ]
  }
}
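
Note that each bucket's doc_count is a number of documents, not a number of addresses: a document that lists several recipients on the same domain is counted once for that domain. The following Python sketch mimics that counting on in-memory data. The function name count_domains and the sample addresses in the usage note are hypothetical, and splitting on commas is an assumption about how multiple recipients are stored.

```python
from collections import Counter


def count_domains(docs):
    """Count, per domain, how many documents contain at least one address on it."""
    counts = Counter()
    for doc in docs:
        # Assumption: several recipients in one field are comma-separated.
        addresses = [a.strip() for a in doc["email"].split(",")]
        # A set ensures each document contributes at most once per domain,
        # like a terms bucket's doc_count.
        domains = {a.split("@", 1)[1].lower() for a in addresses if "@" in a}
        counts.update(domains)
    # Highest counts first, as in the default ordering of a terms aggregation.
    return counts.most_common()
```

For instance, count_domains([{"email": "a@b.com"}, {"email": "c@b.com"}, {"email": "d@e.com"}]) returns [("b.com", 2), ("e.com", 1)], which matches the counts asked for in the question.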

Source

2016-08-01 21:00:00
