This may be a faster, though less reliable, solution. First, some test data:
t=# create table t (i int);
CREATE TABLE
t=# insert into t select generate_series(1,9,1);
INSERT 0 9
t=# insert into t select generate_series(1,999999,1);
INSERT 0 999999
t=# insert into t select generate_series(1,9999999,1);
INSERT 0 9999999
Now the query:
t=# select i,count(*) from t group by i having count(*) > 1 order by 2 desc,1 limit 1;
i | count
---+-------
1 | 3
(1 row)
Time: 7538.476 ms
Now check against the statistics:
t=# analyze t;
ANALYZE
Time: 1079.465 ms
t=# with fr as (select most_common_vals::text::text[] from pg_stats where tablename = 't' and attname='i')
select count(1),i from t join fr on true where i::text = any(most_common_vals) group by i;
count | i
-------+--------
2 | 94933
2 | 196651
2 | 242894
2 | 313829
2 | 501027
2 | 757714
2 | 778442
2 | 896602
2 | 929918
2 | 979650
2 | 999259
(11 rows)
Time: 3584.582 ms
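Finally, if you only need to know whether any non-unique value exists, check just the single most frequent value (once statistics have been collected on the table):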
t=# select count(1),i from t where i::text = (select (most_common_vals::text::text[])[1] from pg_stats where tablename = 't' and attname='i') group by i;
count | i
-------+------
2 | 1540
(1 row)
Time: 1871.907 ms
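Update: pg_stats is refreshed only when statistics are collected on the table (by ANALYZE or autovacuum), not when the data is modified. So you may be reading out-of-date aggregate statistics about the data distribution. In my sample: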
t=# delete from t where i = 1540;
DELETE 2
Time: 941.684 ms
t=# select count(1),i from t where i::text = (select (most_common_vals::text::text[])[1] from pg_stats where tablename = 't' and attname='i') group by i;
count | i
-------+---
(0 rows)
Time: 1876.136 ms
t=# analyze t;
ANALYZE
Time: 77.108 ms
t=# select count(1),i from t where i::text = (select (most_common_vals::text::text[])[1] from pg_stats where tablename = 't' and attname='i') group by i;
count | i
-------+-------
2 | 41377
(1 row)
Time: 1878.260 ms
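The transcript above shows the check missing a duplicate until ANALYZE is rerun. One way to notice that situation is to ask PostgreSQL how many rows changed since the last analyze. This is a minimal sketch, not part of the original answer: the helper name rows_modified_since_analyze is made up for illustration, cur is assumed to be any DB-API cursor connected to PostgreSQL, and the table name t is taken from the transcript. The counter it reads is an estimate maintained by the statistics system, so treat it as a hint rather than an exact figure.

```python
def rows_modified_since_analyze(cur):
    """Return the estimated number of rows of table t changed since its
    last ANALYZE, or None if the table has no entry in the view.

    cur is assumed to be a DB-API cursor on a PostgreSQL connection.
    """
    # pg_stat_user_tables.n_mod_since_analyze is an estimate, not an exact count.
    cur.execute(
        "select n_mod_since_analyze from pg_stat_user_tables where relname = 't'"
    )
    row = cur.fetchone()
    return None if row is None else row[0]
```

A non-zero result suggests running analyze t before trusting most_common_vals.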
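Of course, if you rely on more than just the single most frequent value, the chance of a miss decreases, but again, this method depends on the statistics being fresh.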
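Since the statistics-based check can miss duplicates, one option is to use it only as a fast path and fall back to the full GROUP BY when it finds nothing. The sketch below is an assumption layered on top of the answer, not the author's method: find_duplicate and n_candidates are hypothetical names, cur is assumed to be a psycopg2-style DB-API cursor (with %s placeholders), and t and i are the table and column from the transcript.

```python
def find_duplicate(cur, n_candidates=10):
    """Return (value, count) for a duplicated value of t.i, or None.

    cur is assumed to be a DB-API cursor on a PostgreSQL connection
    that accepts %s placeholders (as psycopg2 does).
    """
    # Fast path: recount only the first n_candidates values that pg_stats
    # lists as most common. The counts come from the table itself, so any
    # row returned here is a real duplicate even if the statistics are stale.
    cur.execute(
        """
        select i, count(*)
        from t
        where i::text = any (
            select unnest((most_common_vals::text::text[])[1:%s])
            from pg_stats
            where tablename = 't' and attname = 'i'
        )
        group by i
        having count(*) > 1
        order by 2 desc, 1
        limit 1
        """,
        (n_candidates,),
    )
    row = cur.fetchone()
    if row is not None:
        return row

    # Fallback (an addition in this sketch): the statistics gave no hit,
    # which does not prove uniqueness, so run the full aggregate.
    cur.execute(
        "select i, count(*) from t group by i "
        "having count(*) > 1 order by 2 desc, 1 limit 1"
    )
    return cur.fetchone()
```

With psycopg2 this would be called as find_duplicate(conn.cursor()). The fast path pays off only when duplicates exist and the statistics happen to list one of them; a table with no duplicates still costs the full scan.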
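Does your code not work (it looks like it should)? What is the problem?
The query takes 2 minutes on a 10-million-row dataset, and I need it to be faster.
With your query, the data is processed on the database side, not in psycopg2.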