Is Postgres GROUP BY on integer-type columns faster than on character-type columns?

I have four tables, defined as follows:

create table web_content_3 (content integer, hits bigint, bytes bigint, appid varchar(32) ); 
create table web_content_4 (content character varying (128), hits bigint, bytes bigint, appid varchar(32) ); 
create table web_content_5 (content character varying (128), hits bigint, bytes bigint, appid integer); 
create table web_content_6 (content integer, hits bigint, bytes bigint, appid integer);

I ran the same GROUP BY query against each table, with about 2 million records in each:

SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid from web_content_{3,4,5,6} GROUP BY content,appid;

The results are:

Table Name    | content   | appid     | Time Taken [ms]
=========================================================
web_content_3 | integer   | character |  27277.931
web_content_4 | character | character | 151219.388
web_content_5 | character | integer   | 127252.023
web_content_6 | integer   | integer   |   5412.096

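Here the web_content_6 query takes only about 5 seconds, far less than the other three combinations. From these numbers we can say that the integer, integer combination is faster, but the question is why?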

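I also have the EXPLAIN results, but they do not really give me any explanation for the drastic difference between the web_content_4 and web_content_6 queries.

Here they are.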

test=# EXPLAIN ANALYSE SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid from web_content_4 GROUP BY content,appid; 
                   QUERY PLAN                
-------------------------------------------------------------------------------------------------------------------------------------- 
GroupAggregate (cost=482173.36..507552.31 rows=17680 width=63) (actual time=138099.612..151565.655 rows=17680 loops=1) 
    -> Sort (cost=482173.36..487196.11 rows=2009100 width=63) (actual time=138099.202..149256.707 rows=2009100 loops=1) 
     Sort Key: content, appid 
     Sort Method: external merge Disk: 152488kB 
     -> Seq Scan on web_content_4 (cost=0.00..45218.00 rows=2009100 width=63) (actual time=0.010..349.144 rows=2009100 loops=1) 
Total runtime: 151613.569 ms 
(6 rows) 

Time: 151614.106 ms 

test=# EXPLAIN ANALYSE SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid from web_content_6 GROUP BY content,appid; 
                   QUERY PLAN                
-------------------------------------------------------------------------------------------------------------------------------------- 
GroupAggregate (cost=368814.36..394194.51 rows=17760 width=24) (actual time=3282.333..5840.953 rows=17760 loops=1) 
    -> Sort (cost=368814.36..373837.11 rows=2009100 width=24) (actual time=3282.176..3946.025 rows=2009100 loops=1) 
     Sort Key: content, appid 
     Sort Method: external merge Disk: 74632kB 
     -> Seq Scan on web_content_6 (cost=0.00..34864.00 rows=2009100 width=24) (actual time=0.011..297.235 rows=2009100 loops=1) 
Total runtime: 6172.960 ms

2014-03-13 mayurpatel

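Because of the comparison. Comparing integers is faster than comparing strings. – StanislavL

Possibly, in the case of strings, it compares character by character, so the sort takes longer as well. You can see that in the explain plan too. –

Are there any indexes on these tables? –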

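Gordon Linoff is of course right. Spilling to disk is expensive.

If you can spare the memory, you can tell PostgreSQL to use more of it for sorting and similar operations. I built a table, populated it with random data, and analyzed it before running this query.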
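The answer does not show how its test table was filled. A minimal sketch of one way to do it, assuming md5 hashes of random numbers are an acceptable stand-in for the real content and appid strings (the value ranges and the 2,000,000 row count are assumptions, chosen to match the row counts in the plans):

```sql
-- Hypothetical setup; the answer's actual data generation is not shown.
CREATE TABLE web_content_4 (
    content character varying (128),
    hits    bigint,
    bytes   bigint,
    appid   varchar(32)
);

-- md5() returns a 32-character hex string, so it fits both text columns.
INSERT INTO web_content_4 (content, hits, bytes, appid)
SELECT md5(random()::text),
       (random() * 1000)::bigint,
       (random() * 1000000)::bigint,
       md5(random()::text)
FROM generate_series(1, 2000000);

-- Refresh planner statistics before timing anything.
ANALYZE web_content_4;
```

Data generated this way has almost no repeated (content, appid) pairs, so the aggregate returns nearly as many rows as it reads. The query the answer ran against its own table follows.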

EXPLAIN ANALYSE 
SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid 
from web_content_4 
GROUP BY content,appid; 

"GroupAggregate (cost=364323.43..398360.86 rows=903791 width=96) (actual time=25059.086..29789.234 rows=1998067 loops=1)" 
" -> Sort (cost=364323.43..369323.34 rows=1999961 width=96) (actual time=25057.540..27907.143 rows=2000000 loops=1)" 
"  Sort Key: content, appid" 
"  Sort Method: external merge Disk: 216016kB" 
"  -> Seq Scan on web_content_4 (cost=0.00..52472.61 rows=1999961 width=96) (actual time=0.010..475.187 rows=2000000 loops=1)" 
"Total runtime: 30012.427 ms"

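I got the same execution plan. In my case, this query does an external merge sort that needs about 216MB of disk. By setting the value of work_mem, I can tell PostgreSQL to give this query more memory.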

set work_mem = '250MB'; 
EXPLAIN ANALYSE 
SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid 
from web_content_4 
GROUP BY content,appid; 

"HashAggregate (cost=72472.22..81510.13 rows=903791 width=96) (actual time=3196.777..4505.290 rows=1998067 loops=1)" 
" -> Seq Scan on web_content_4 (cost=0.00..52472.61 rows=1999961 width=96) (actual time=0.019..437.252 rows=2000000 loops=1)" 
"Total runtime: 4726.401 ms"

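Now PostgreSQL uses a hash aggregate, and the execution time dropped by a factor of 6, from 30 seconds to 5 seconds. (Setting work_mem this way affects only my current connection.)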
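If a session-wide change is more than is wanted, the setting can be scoped to one transaction instead. A sketch, assuming the same table and the 250MB figure used above; SET LOCAL reverts automatically when the transaction ends:

```sql
SHOW work_mem;                   -- value before the change

BEGIN;
SET LOCAL work_mem = '250MB';    -- applies only inside this transaction
EXPLAIN ANALYSE
SELECT content, sum(hits) as hits, sum(bytes) as bytes, appid
from web_content_4
GROUP BY content,appid;
COMMIT;

SHOW work_mem;                   -- back to the earlier value
```

After a plain SET, as in the answer, RESET work_mem; returns the session to its default.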

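I did not test web_content_6, because replacing text with integers usually means you need a few joins to recover the text. So I'm not sure we would be comparing apples to apples there.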
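To make the point about joins concrete, here is a sketch of what recovering the text might look like. The content_lookup table and its columns are hypothetical; nothing like it appears in the question:

```sql
-- Hypothetical lookup table mapping integer ids back to the original text.
CREATE TABLE content_lookup (
    content_id   integer PRIMARY KEY,
    content_text character varying (128) NOT NULL
);

-- Aggregate on the integer keys first, then join to get the text back.
SELECT l.content_text AS content, t.hits, t.bytes, t.appid
FROM (
    SELECT content, sum(hits) AS hits, sum(bytes) AS bytes, appid
    FROM web_content_6
    GROUP BY content, appid
) AS t
JOIN content_lookup AS l ON l.content_id = t.content;
```

A fair comparison with web_content_4 would time this whole statement, not just the inner aggregate. An integer appid would need a second lookup of the same kind.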

2014-03-13 14:57:13

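The performance of this aggregation is driven by the speed of the sort. All other things being equal, larger data takes more time to sort than smaller data. The "fast" case sorts 74 MB; the "slow" one, 152 MB.

That would account for some difference in performance, but in most cases not a 30x difference. Where you would see a huge difference is when the smaller data fits in memory and the larger data does not. Spilling to disk is expensive.

One suspicion is that the data in web_content_6 is already sorted, or nearly sorted, on (content, appid). That could shorten the time the sort needs. If you compare the actual times with the "costs" of the two sorts, you will see that the "fast" version runs much faster than expected (assuming the costs are comparable).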
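The gap between data size and run time can be checked directly from the two plans in the question. The figures below are copied from the question's EXPLAIN ANALYSE output (sort disk usage in kB and total runtime in ms); the column aliases are only illustrative:

```sql
SELECT round(152488.0 / 74632, 2)      AS sort_disk_ratio,  -- web_content_4 vs web_content_6
       round(151613.569 / 6172.960, 1) AS runtime_ratio;
```

The sorted data is about twice as large, while the runtime is roughly 25 times longer, so size alone does not account for the slowdown.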
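One way to probe the "already sorted" suspicion is the correlation column in the pg_stats view, which estimates how closely the physical row order follows each column's sort order. A sketch, assuming the table names from the question:

```sql
-- Statistics must be current for the estimate to mean anything.
ANALYZE web_content_6;

SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'web_content_6'
  AND attname IN ('content', 'appid');
```

A correlation close to 1 or -1 for content would be consistent with the rows being stored nearly in sorted order. The statistic is per column, so it is only a hint about the combined (content, appid) ordering.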

2014-03-13 14:07:20
