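Impala query performance degradation with GROUP BY and LIKE

We are testing Apache Impala and noticed that combining GROUP BY with LIKE is very slow, while the queries on their own run much faster. Here are two examples: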
-- run times: 1.37s, 1.08s, 1.35s
SELECT * FROM hive.default.pcopy1B where
(lower("by") like '%part%' and lower("by") like '%and%' and lower("by") like '%the%')
or (lower(title) like '%part%' and lower(title) like '%and%' and lower(title) like '%the%')
or (lower(url) like '%part%' and lower(url) like '%and%' and lower(url) like '%the%')
or (lower(text) like '%part%' and lower(text) like '%and%' and lower(text) like '%the%')
limit 100;
-- run times: 156.64s, 155.63s
select "by", type, ranking, count(*) from pcopy where
(lower("by") like '%part%' and lower("by") like '%and%' and lower("by") like '%the%')
or (lower(title) like '%part%' and lower(title) like '%and%' and lower(title) like '%the%')
or (lower(url) like '%part%' and lower(url) like '%and%' and lower(url) like '%the%')
or (lower(text) like '%part%' and lower(text) like '%and%' and lower(text) like '%the%')
group by "by", type, ranking
order by 4 desc limit 10;
Could someone please explain why this happens, and whether there are any workarounds?
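These two queries look very different to me. The first one only selects records and needs nothing more than a cursor; the second must retrieve all matching records and then run a GROUP BY and a SORT over them. If a very large number of records match, that could explain the difference in timing. Or am I missing something? – LSerni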
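LSerni's point can be illustrated with a deliberately simplified Python model. It assumes a single-threaded, pull-based iterator pipeline; Impala's real executor is distributed and scans in parallel, so this only shows the principle, not Impala's actual behavior. With LIMIT and no aggregation, the scan can stop as soon as enough matching rows have been produced, whereas GROUP BY followed by ORDER BY ... LIMIT has to read and filter every row before it can emit anything. The synthetic table and the names `make_rows`, `matches`, `Scan`, `filter_limit` and `group_sort_limit` are hypothetical.

```python
from collections import Counter
from itertools import islice


def make_rows(n):
    """Hypothetical synthetic table: every 10th row matches the filter."""
    for i in range(n):
        yield {"by": f"user{i % 7}", "text": "Part and the" if i % 10 == 0 else "x"}


def matches(row):
    """Rough stand-in for lower(text) like '%part%' and ... '%and%' and ... '%the%'."""
    t = row["text"].lower()
    return "part" in t and "and" in t and "the" in t


class Scan:
    """Wraps a row source and counts how many rows are actually read."""

    def __init__(self, rows):
        self.rows = rows
        self.read = 0

    def __iter__(self):
        for row in self.rows:
            self.read += 1
            yield row


def filter_limit(n_rows, limit):
    """Like query 1: filter + LIMIT, so reading stops once `limit` matches are found."""
    scan = Scan(make_rows(n_rows))
    result = list(islice((r for r in scan if matches(r)), limit))
    return result, scan.read


def group_sort_limit(n_rows, limit):
    """Like query 2: every row must be read and filtered before grouping and sorting."""
    scan = Scan(make_rows(n_rows))
    counts = Counter(r["by"] for r in scan if matches(r))
    top = counts.most_common(limit)  # sort by count descending, keep the top `limit`
    return top, scan.read


if __name__ == "__main__":
    _, fast = filter_limit(200_000, 100)
    _, full = group_sort_limit(200_000, 10)
    print(f"filter + LIMIT read {fast} rows; GROUP BY + ORDER BY read {full} rows")
```

In this model the first pipeline stops after reading about a thousand rows, while the second always reads the whole table, so its cost grows with the table size rather than with the LIMIT.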