2017-06-16 50 views

Elasticsearch nested More Like This query

Is it possible to run a More Like This query (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html) against text inside a nested datatype (https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html)?

The documents I want to query look like this (I have no control over the format, because the data is owned by another party):

{ 
    "communicationType": "Email", 
    "timestamp": 1497633308917, 
    "textFields": [ 
    { 
     "field": "Subject", 
     "text": "This is the subject of the email" 
    }, 
    { 
     "field": "To", 
     "text": "[email protected]" 
    }, 
    { 
     "field": "Body", 
     "text": "This is the body of the email" 
    } 
    ] 
} 

I want to run a More Like This query against the email body. Previously, the documents looked like this:

{
    "communicationType": "Email",
    "timestamp": 1497633308917,
    "textFields": {
        "subject": "This is the subject of the email",
        "to": "[email protected]",
        "body": "This is the body of the email"
    }
}

and I was able to run a More Like This query against the email body like this:

{
    "query": {
        "bool": {
            "must": {
                "more_like_this": {
                    "fields": ["textFields.body"],
                    "like": "This is a similar body of an email",
                    "min_term_freq": 1
                }
            },
            "filter": [
                { "term": { "communicationType": "Email" } },
                { "range": { "timestamp": { "gte": 1497633300000 } } }
            ]
        }
    }
}

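But that data source has now been deprecated, and I need to run an equivalent query against the new data source, which stores the email body inside a nested datatype. I only want to compare text against the "text" field of entries whose "field" is "Body".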

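Is this possible? If so, what would the query look like? And is there a major performance penalty for querying the nested datatype compared with the earlier query on non-nested documents? Even after the timestamp and communication type filters are applied, each query will still have tens of millions of documents to compare for similar text, so performance matters.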

Answer


Actually, this turned out to be simple: put the More Like This query inside a nested query:

{
    "query": {
        "bool": {
            "must": {
                "nested": {
                    "path": "textFields",
                    "query": {
                        "bool": {
                            "must": {
                                "more_like_this": {
                                    "fields": ["textFields.text"],
                                    "like": "This is a similar body of an email",
                                    "min_term_freq": 1
                                }
                            },
                            "filter": {
                                "term": { "textFields.field": "Body" }
                            }
                        }
                    }
                }
            },
            "filter": [
                { "term": { "communicationType": "Email" } },
                { "range": { "timestamp": { "gte": 1497633300000 } } }
            ]
        }
    },
    "min_score": 2
}
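
If you build this request from application code, a small helper along these lines may help. It is only a sketch under some assumptions that are not stated in the question: that "textFields" is mapped as a nested type (the nested query needs that), and that "textFields.field" is a keyword field, since the term filter on "Body" is an exact match and would likely miss on an analyzed text field. The names ASSUMED_MAPPING and build_nested_mlt_query are made up for illustration, and the mapping is written in the typeless form used by newer Elasticsearch versions (older versions expect a mapping type level). It uses the "like" parameter, as in the question's original query.

```python
import json
from typing import Any, Dict

# Hypothetical mapping, not taken from the question: `textFields` must be
# `nested` for the nested query to work, and `textFields.field` is assumed
# to be a `keyword` so that the term filter on "Body" matches exactly.
ASSUMED_MAPPING: Dict[str, Any] = {
    "mappings": {
        "properties": {
            "communicationType": {"type": "keyword"},
            "timestamp": {"type": "date"},
            "textFields": {
                "type": "nested",
                "properties": {
                    "field": {"type": "keyword"},
                    "text": {"type": "text"},
                },
            },
        }
    }
}


def build_nested_mlt_query(
    like: str,
    field_label: str = "Body",
    communication_type: str = "Email",
    since_ms: int = 1497633300000,
    min_score: float = 2.0,
) -> Dict[str, Any]:
    """Build the search body: More Like This inside a nested query,
    with the document-level filters kept outside the nested clause."""
    nested_clause = {
        "nested": {
            "path": "textFields",
            "query": {
                "bool": {
                    # Similarity is scored only on the nested `text` field.
                    "must": {
                        "more_like_this": {
                            "fields": ["textFields.text"],
                            "like": like,
                            "min_term_freq": 1,
                        }
                    },
                    # Restrict to nested entries labelled e.g. "Body".
                    "filter": {"term": {"textFields.field": field_label}},
                }
            },
        }
    }
    return {
        "query": {
            "bool": {
                "must": nested_clause,
                # Filters on top-level fields do not affect scoring.
                "filter": [
                    {"term": {"communicationType": communication_type}},
                    {"range": {"timestamp": {"gte": since_ms}}},
                ],
            }
        },
        "min_score": min_score,
    }


if __name__ == "__main__":
    body = build_nested_mlt_query("This is a similar body of an email")
    print(json.dumps(body, indent=4))
```

The resulting dictionary can be serialized and sent as the body of a search request with whatever client you use. Because the "Body" term filter sits inside the nested clause, next to more_like_this, both conditions have to match on the same nested entry, which is what keeps the Subject and To text out of the comparison.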