Does a PostgreSQL self-join ignore indexes?

I have the following table in PostgreSQL 8.4.12:

          Table "public.ratings"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 userid | character varying(128) |
 item   | character varying(128) |
 score  | integer                |
Indexes: 
    "ratings_item" btree (item) 
    "ratings_ui" btree (userid, item) 
    "ratings_userid" btree (userid)

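I want to run a self-join to find all items rated by the users who rated a particular item. To keep it simple, I'll use a query that gets the number of ratings for each presumably similar item, for example: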

select r2.item,sum(1) 
from ratings r1 
left join ratings r2 using (userid) 
where r1.item='an3.php' 
group by r2.item

The query works, but on my table with 36 million records it takes forever. When I EXPLAIN the statement, I get the following:

GroupAggregate (cost=8102958.42..8247621.18 rows=16978 width=17)
  -> Sort (cost=8102958.42..8151108.60 rows=19260072 width=17)
       Sort Key: r2.item
       -> Hash Left Join (cost=1458652.29..4192647.43 rows=19260072 width=17)
            Hash Cond: ((r1.userid)::text = (r2.userid)::text)
            -> Bitmap Heap Scan on ratings r1 (cost=868.20..77197.24 rows=24509 width=22)
                 Recheck Cond: ((item)::text = 'an3.php'::text)
                 -> Bitmap Index Scan on ratings_item (cost=0.00..862.07 rows=24509 width=0)
                      Index Cond: ((item)::text = 'an3.php'::text)
            -> Hash (cost=711028.93..711028.93 rows=36763293 width=39)
                 -> Seq Scan on ratings r2 (cost=0.00..711028.93 rows=36763293 width=39)

From past experience, I believe the "Seq Scan on ratings r2" is the culprit.

On the other hand, if I search for an item that does not exist:

select r2.item,sum(1) from ratings r1 left join ratings r2 using (userid) 
where r1.item='targetitem' group by r2.item;

this seems to work fine (i.e. it returns no results, and does so immediately):

GroupAggregate (cost=2235887.19..2248234.70 rows=16978 width=17)
  -> Sort (cost=2235887.19..2239932.29 rows=1618038 width=17)
       Sort Key: r2.item
       -> Nested Loop Left Join (cost=0.00..1969469.94 rows=1618038 width=17)
            -> Index Scan using ratings_item on ratings r1 (cost=0.00..8317.74 rows=2059 width=22)
                 Index Cond: ((item)::text = 'targetitem'::text)
            -> Index Scan using ratings_userid on ratings r2 (cost=0.00..947.24 rows=419 width=39)
                 Index Cond: ((r1.userid)::text = (r2.userid)::text)

The same table and query work fine in MySQL, but I can't migrate my recommender system to another database.

Am I doing something wrong, or is this something about Postgres? Is there a workaround?

Source

2015-06-10 TOM MARRACCI

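How many rows match an3.php? Has VACUUM ANALYZE been run? If a lot of rows match, the index might not be used.

There could be millions of possible results. MySQL is able to return something for each query within a few seconds. I'll run VACUUM ANALYZE ratings and try again.

VACUUM ANALYZE ratings has finished, but the planner still wants to do a sequential scan on ratings r2.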

To answer the (rhetorical) question in the title: No.

I see a number of problems here, starting with the first line.

Postgres 8.4 reached EOL last year. Nobody should be using it any more; it is too old. Upgrade to a current version if at all possible.

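Apart from that, you should at least be on the latest minor release. 8.4.12 was released on 2012-06-04 and is missing two years of bug and security fixes. 8.4.22 is the last release of the dead version.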
Read the versioning policy of the project.

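Next, varchar(128) is very inefficient as PK / FK, especially for tables with millions of rows. It is needlessly big and expensive to process. Use integer or bigint instead. Or UUID if you really need the bigger number space (which I doubt).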
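As a rough sketch of what integer keys could look like: the tables users, items and ratings_int below are hypothetical names that do not appear in the question, and the layout is only one possible way to map the original strings to surrogate keys.

```sql
-- Hypothetical lookup tables; all names here are assumptions for illustration.
CREATE TABLE users (
    user_id serial PRIMARY KEY,
    userid  varchar(128) NOT NULL UNIQUE  -- original string identifier
);

CREATE TABLE items (
    item_id serial PRIMARY KEY,
    item    varchar(128) NOT NULL UNIQUE  -- original string identifier
);

-- Ratings then carry two 4-byte integers instead of two long strings.
CREATE TABLE ratings_int (
    user_id integer NOT NULL REFERENCES users,
    item_id integer NOT NULL REFERENCES items,
    score   integer,
    PRIMARY KEY (user_id, item_id)
);
```

With this layout the strings are stored once per user or item, and the join columns and their indexes become much smaller.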

Next, I don't see a UNIQUE or PRIMARY KEY constraint on (userid, item) (which would obsolete an additional index on the same). Either your table definition is lacking, or your query is wrong, or your question is broken.
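If each user is supposed to rate an item at most once, the constraint could be added roughly as follows. This is a sketch under that assumption; the constraint name ratings_pkey is made up, and a primary key also requires that neither column contains NULLs.

```sql
-- Check for duplicate (userid, item) pairs first; the constraint cannot be
-- created while any exist.
SELECT userid, item, count(*) AS ct
FROM   ratings
GROUP  BY userid, item
HAVING count(*) > 1;

-- If the query above returns no rows, declare the key.
-- The name ratings_pkey is an assumption.
ALTER TABLE ratings ADD CONSTRAINT ratings_pkey PRIMARY KEY (userid, item);

-- The primary key creates its own index on (userid, item),
-- so the separate index on the same columns becomes redundant.
DROP INDEX ratings_ui;
```

Once the pair is unique, the GROUP BY in a subquery that selects userid for a single item is no longer needed.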

Try this rewritten query:

SELECT r2.item, count(*) AS ct 
FROM (
    SELECT userid 
    FROM ratings 
    WHERE item = 'an3.php' 
    GROUP BY 1 -- should not be necessary, but constraint is missing 
    ) r1 
JOIN ratings r2 USING (userid) 
GROUP BY 1;

In modern Postgres, you'd want two indexes for best performance: one on (item, userid) and one on (userid, item).
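A minimal sketch of the index setup, assuming the table from the question. The index name ratings_item_userid is an assumption; an index on (userid, item) already exists there as ratings_ui, so only the other one would be new.

```sql
-- New composite index for the lookup by item (name is an assumption).
CREATE INDEX ratings_item_userid ON ratings (item, userid);

-- (userid, item) is already present in the question as ratings_ui:
-- CREATE INDEX ratings_ui ON ratings (userid, item);

-- Refresh planner statistics after changing indexes.
ANALYZE ratings;
```

The first index serves the filter on item and returns userid from the same index entry; the second serves the join back on userid and returns item.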

Is a composite index also good for queries on the first field?

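In Postgres 9.2+ you might even get index-only scans out of this. I am not sure how to get the best out of the outdated version. Either way, varchar(128) is an expensive data type for indexes, too.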
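One way to check whether index-only scans are actually chosen, sketched for Postgres 9.2 or later (the parenthesized EXPLAIN options do not exist in 8.4):

```sql
-- Index-only scans depend on an up-to-date visibility map, which VACUUM maintains.
VACUUM ANALYZE ratings;

-- Runs the query and prints the actual plan with timing and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT r2.item, count(*) AS ct
FROM  (
    SELECT userid
    FROM   ratings
    WHERE  item = 'an3.php'
    GROUP  BY 1
    ) r1
JOIN   ratings r2 USING (userid)
GROUP  BY 1;
```

In the output, look for "Index Only Scan" nodes and a low "Heap Fetches" count. Whether the planner picks them depends on the data distribution and cost settings, so the result may differ from table to table.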

Source

2015-06-11 01:52:22
