2013-07-01

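Elasticsearch: wrong counts in terms facet

I don't know whether this is a bug or whether I'm missing something, but the terms facet is returning the wrong counts for terms.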

I have a field that uses str_tag_analyzer.

I want to build a tag cloud from this field: the top 20 tags along with their counts (how many times each one appears).

The terms facet looks like the right tool for this. My understanding is that the size parameter in a terms facet query controls how many tags are returned.

When I run the terms facet query with different sizes, I get unexpected results. Here are some of my queries and their results.

Query 1

curl -XGET 'http://server:9200/stage_profiles/wrapper_0/_search?pretty=1' -d ' 
{ 
query : { 
    "nested" : { 
    "query" : { 
     "field" : { 
     "gsid" : 222 
     } 
    }, 
    "path" : "medals" 
    } 
}, from: 0, size: 0 
, 
facets: { 
"tags" : { "terms" : {"field" : "field_val_t", size: 1} } 
} 
}' 


{ 
    "took" : 1, 
    "timed_out" : false, 
    "_shards" : { 
    "total" : 3, 
    "successful" : 3, 
    "failed" : 0 
    }, 
    "hits" : { 
    "total" : 189, 
    "max_score" : 1.0, 
    "hits" : [ ] 
    }, 
    "facets" : { 
    "tags" : { 
     "_type" : "terms", 
     "missing" : 57, 
     "total" : 331, 
     "other" : 316, 
     "terms" : [ { 
     "term" : "hyderabad", 
     "count" : 15 
     } ] 
    } 
    } 
} 

Query 2

curl -XGET 'http://server:9200/stage_profiles/wrapper_0/_search?pretty=1' -d ' 
{ 
query : { 
    "nested" : { 
    "query" : { 
     "field" : { 
     "gsid" : 222 
     } 
    }, 
    "path" : "medals" 
    } 
}, from: 0, size: 0 
, 
facets: { 
"tags" : { "terms" : {"field" : "field_val_t", size: 3} } 
} 
}' 


{ 
    "took" : 1, 
    "timed_out" : false, 
    "_shards" : { 
    "total" : 3, 
    "successful" : 3, 
    "failed" : 0 
    }, 
    "hits" : { 
    "total" : 189, 
    "max_score" : 1.0, 
    "hits" : [ ] 
    }, 
    "facets" : { 
    "tags" : { 
     "_type" : "terms", 
     "missing" : 57, 
     "total" : 331, 
     "other" : 282, 
     "terms" : [ { 
     "term" : "playing", 
     "count" : 20 
     }, { 
     "term" : "hyderabad", 
     "count" : 15 
     }, { 
     "term" : "pune", 
     "count" : 14 
     } ] 
    } 
    } 
} 

Query 3

curl -XGET 'http://server:9200/stage_profiles/wrapper_0/_search?pretty=1' -d ' 
{ 
query : { 
    "nested" : { 
    "query" : { 
     "field" : { 
     "gsid" : 222 
     } 
    }, 
    "path" : "medals" 
    } 
}, from: 0, size: 0 
, 
facets: { 
"tags" : { "terms" : {"field" : "field_val_t", size: 10} } 
} 
}' 
{ 
    "took" : 1, 
    "timed_out" : false, 
    "_shards" : { 
    "total" : 3, 
    "successful" : 3, 
    "failed" : 0 
    }, 
    "hits" : { 
    "total" : 189, 
    "max_score" : 1.0, 
    "hits" : [ ] 
    }, 
    "facets" : { 
    "tags" : { 
     "_type" : "terms", 
     "missing" : 57, 
     "total" : 331, 
     "other" : 198, 
     "terms" : [ { 
     "term" : "playing", 
     "count" : 20 
     }, { 
     "term" : "hyderabad", 
     "count" : 19 
     }, { 
     "term" : "bangalore", 
     "count" : 18 
     }, { 
     "term" : "pune", 
     "count" : 16 
     }, { 
     "term" : "chennai", 
     "count" : 16 
     }, { 
     "term" : "games", 
     "count" : 13 
     }, { 
     "term" : "testing", 
     "count" : 11 
     }, { 
     "term" : "cricket", 
     "count" : 9 
     }, { 
     "term" : "singing", 
     "count" : 6 
     }, { 
     "term" : "movies", 
     "count" : 5 
     } ] 
    } 
    } 
} 

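I have the following concerns:
1. The first query returns a tag with a count of 15, but there is another tag with a count of 20 (visible in queries 2 and 3). It should therefore have returned the "playing" tag with a count of 20.
2. The second query returns a count of 15 for the "hyderabad" tag, but the third query returns a count of 19 for the same tag.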
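The numbers in the three responses can be cross-checked with a short Python sketch. It copies the facet values from the responses above (the dictionary layout and variable names are just for illustration) and shows that in every response `other` equals `total` minus the sum of the returned counts, while two terms change count between query 2 and query 3.

```python
# Facet values copied from the three responses above, keyed by query number.
responses = {
    1: {"total": 331, "other": 316, "terms": {"hyderabad": 15}},
    2: {"total": 331, "other": 282,
        "terms": {"playing": 20, "hyderabad": 15, "pune": 14}},
    3: {"total": 331, "other": 198,
        "terms": {"playing": 20, "hyderabad": 19, "bangalore": 18,
                  "pune": 16, "chennai": 16, "games": 13, "testing": 11,
                  "cricket": 9, "singing": 6, "movies": 5}},
}

# In each response, "other" plus the returned counts adds up to "total".
for number, facet in responses.items():
    returned = sum(facet["terms"].values())
    print(number, returned, facet["other"], returned + facet["other"])

# Terms whose count differs between query 2 (size 3) and query 3 (size 10).
changed = {
    term: (count, responses[3]["terms"][term])
    for term, count in responses[2]["terms"].items()
    if responses[3]["terms"].get(term) != count
}
print(changed)
```

So each response is internally consistent; it is the per-term counts themselves that shift as size grows.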

If you need any other information, such as the mapping or the data in ES, please let me know. Thanks.

Answers


This is a known issue. The workaround is to use a single shard, or to request more terms than you intend to display.
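To see how this can happen: as I understand it, each shard reports only its own top `size` terms and the reported counts are then summed, so a term that misses a shard's local top list loses that shard's contribution. The Python sketch below simulates that merge. The per-shard counts are hypothetical, invented so that the merged results match the numbers in the question; the real distribution across the three shards is not known, and `terms_facet` is an illustrative function, not Elasticsearch code.

```python
from collections import Counter

# HYPOTHETICAL per-shard counts, chosen only to reproduce the question's numbers.
SHARDS = [
    Counter({"hyderabad": 8, "playing": 7, "pune": 6, "bangalore": 5, "chennai": 5}),
    Counter({"hyderabad": 7, "playing": 6, "chennai": 5, "bangalore": 4, "pune": 2}),
    Counter({"bangalore": 9, "pune": 8, "playing": 7, "chennai": 6, "hyderabad": 4}),
]

def terms_facet(shards, size, shard_size=None):
    """Sum each shard's local top terms, then keep the global top `size`."""
    per_shard = size if shard_size is None else shard_size
    merged = Counter()
    for shard in shards:
        # A shard reports only its own top `per_shard` terms; the rest are dropped.
        merged.update(dict(shard.most_common(per_shard)))
    return merged.most_common(size)

print(terms_facet(SHARDS, 1))                # "playing" is never a local top-1 term
print(terms_facet(SHARDS, 3))                # "hyderabad" loses one shard's count
print(terms_facet(SHARDS, 3, shard_size=5))  # asking each shard for more terms
```

With these made-up shards, size 1 yields hyderabad with 15 and size 3 yields playing 20, hyderabad 15, pune 14, as in the question. Asking every shard for all five terms gives playing 20, hyderabad 19, bangalore 18, which is why requesting more terms than you display helps.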


As of version 0.90.6, you can also use ['shard_size'](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-terms-facet.html#_accuracy_control). – Sonson123
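A minimal Python sketch of what such a request body could look like, reusing the nested query and field name from the question. The helper name `build_tag_request` and the `shard_size` value of 100 are illustrative assumptions, not recommended settings; the idea is only that each shard is asked for more candidate terms than the final `size`.

```python
import json

def build_tag_request(size, shard_size=None):
    """Build a terms facet request body (hypothetical helper for illustration)."""
    terms = {"field": "field_val_t", "size": size}
    if shard_size is not None:
        # Only meaningful on versions that support shard_size (0.90.6 and later).
        terms["shard_size"] = shard_size
    return {
        "query": {"nested": {"path": "medals", "query": {"field": {"gsid": 222}}}},
        "from": 0,
        "size": 0,  # no hits needed, only the facet
        "facets": {"tags": {"terms": terms}},
    }

# Top 20 tags, with each shard asked for 100 candidates (assumed value).
body = build_tag_request(size=20, shard_size=100)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent with `curl -d` as in the question. On older versions, calling the helper without `shard_size` and with a larger `size`, then trimming the list on the client, corresponds to the workaround in the answer.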


This is not the best way to do it. Using a single shard may hurt performance. – eliasah