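Moving from Influx to Postgres, need tips

I use Influx to store our time series data. It's cool when it works, but after about a month it stopped working and I couldn't figure out why. (Similar to this issue: https://github.com/influxdb/influxdb/issues/1386)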

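Maybe Influx will be great one day, but for now I need something more stable. I'm thinking about Postgres. Our data comes from many sensors, and each sensor has a sensor ID. So I'm thinking of structuring our data like this:

(pk), sensorId (string), time (timestamp), value (float)

Influx is built for time series data, so it probably has some built-in optimizations. Do I need to do optimizations myself to make Postgres efficient? More specifically, I have these questions: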

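1. Influx has the notion of a 'series', and creating a new series is cheap. So I had a separate series for each sensor. Should I create a separate Postgres table for each sensor?
2. How should I set up indexes to make queries fast? A typical query is: select all data for sensor123 for the last 3 days.
3. Should I use timestamp or integer for the time column?
4. How do I set a retention policy? E.g. delete data that is older than one week.
5. Will Postgres scale horizontally? Can I set up EC2 clusters for data replication and load balancing?
6. Can I downsample in Postgres? I have read in some articles that I can use date_trunc. But it seems that I can't date_trunc to a specific interval, e.g. 25 seconds.
7. Are there any other caveats I missed?

Thanks in advance!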

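Update: Storing the time column as a big integer is faster than storing it as a timestamp. Am I doing something wrong?

Storing it as timestamp: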

postgres=# explain analyze select * from test where sensorid='sensor_0'; 

Bitmap Heap Scan on test (cost=3180.54..42349.98 rows=75352 width=25) (actual time=10.864..19.604 rows=51840 loops=1) 
    Recheck Cond: ((sensorid)::text = 'sensor_0'::text) 
    Heap Blocks: exact=382 
    -> Bitmap Index Scan on sensorindex (cost=0.00..3161.70 rows=75352 width=0) (actual time=10.794..10.794 rows=51840 loops=1) 
     Index Cond: ((sensorid)::text = 'sensor_0'::text) 
Planning time: 0.118 ms 
Execution time: 22.984 ms 

postgres=# explain analyze select * from test where sensorid='sensor_0' and addedtime > to_timestamp(1430939804); 

Bitmap Heap Scan on test (cost=2258.04..43170.41 rows=50486 width=25) (actual time=22.375..27.412 rows=34833 loops=1) 
    Recheck Cond: (((sensorid)::text = 'sensor_0'::text) AND (addedtime > '2015-05-06 15:16:44-04'::timestamp with time zone)) 
    Heap Blocks: exact=257 
    -> Bitmap Index Scan on sensorindex (cost=0.00..2245.42 rows=50486 width=0) (actual time=22.313..22.313 rows=34833 loops=1) 
     Index Cond: (((sensorid)::text = 'sensor_0'::text) AND (addedtime > '2015-05-06 15:16:44-04'::timestamp with time zone)) 
Planning time: 0.362 ms 
Execution time: 29.290 ms

Storing it as big integer:

postgres=# explain analyze select * from test where sensorid='sensor_0'; 


Bitmap Heap Scan on test (cost=3620.92..42810.47 rows=85724 width=25) (actual time=12.450..19.615 rows=51840 loops=1) 
    Recheck Cond: ((sensorid)::text = 'sensor_0'::text) 
    Heap Blocks: exact=382 
    -> Bitmap Index Scan on sensorindex (cost=0.00..3599.49 rows=85724 width=0) (actual time=12.359..12.359 rows=51840 loops=1) 
     Index Cond: ((sensorid)::text = 'sensor_0'::text) 
Planning time: 0.130 ms 
Execution time: 22.331 ms 

postgres=# explain analyze select * from test where sensorid='sensor_0' and addedtime > 1430939804472; 


Bitmap Heap Scan on test (cost=2346.57..43260.12 rows=52489 width=25) (actual time=10.113..14.780 rows=31839 loops=1) 
    Recheck Cond: (((sensorid)::text = 'sensor_0'::text) AND (addedtime > 1430939804472::bigint)) 
    Heap Blocks: exact=235 
    -> Bitmap Index Scan on sensorindex (cost=0.00..2333.45 rows=52489 width=0) (actual time=10.059..10.059 rows=31839 loops=1) 
     Index Cond: (((sensorid)::text = 'sensor_0'::text) AND (addedtime > 1430939804472::bigint)) 
Planning time: 0.154 ms 
Execution time: 16.589 ms

2015-05-03 user1657624

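Your question is **way too broad**: it touches on multiple issues instead of following the SO practice of asking one specific programming question and stating what you have tried yourself. I suggest you edit this post down to a specific question and post the other questions separately on the appropriate forums (e.g. Q.5 belongs on dba.stackexchange). – Patrick

With only a single run for each version, 16 ms vs. 29 ms is no proof that "*integer is faster than timestamp*". The (small) difference is very likely caused by caching or other things going on in the system (you should e.g. repeat the statements using 'explain (analyze, verbose, buffers)'). –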
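For reference, the suggested form of the statement would look roughly like this, reusing the table and column names from the question. This is only a sketch of the syntax; the idea is to run it several times for each variant and compare the "Buffers: shared hit=... read=..." lines as well as the timings, since a run that reads pages from disk is not comparable to one served from cache.

```sql
-- Same query as in the question, with buffer usage included in the plan output.
-- Repeat several times per variant (timestamp vs. bigint) before comparing.
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT *
FROM test
WHERE sensorid = 'sensor_0'
  AND addedtime > to_timestamp(1430939804);
```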

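I repeated the statements many times, and integer is always faster than timestamp. However, if I don't call to_timestamp(1430939804) in the query and convert the value beforehand instead, it is as fast as integer. Maybe to_timestamp is called multiple times and is not optimized? – user1657624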

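You shouldn't create a table for each sensor. Instead, you can add a field to your table that identifies which series a row belongs to. You can also have another table that describes additional attributes of each series. If a data point could belong to multiple series, you'd need a different structure altogether.

For the query you describe in Q2, an index on your recorded_at column should work (time is a SQL reserved keyword, so it is best avoided as a column name).

You should use TIMESTAMP WITH TIME ZONE as the data type for your time column.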
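A minimal sketch of that single-table layout follows. The table name `readings`, the column names, and the index name are illustrative and not taken from the asker's schema. The composite index on (sensor_id, recorded_at) is the variant the asker reports testing in the comments, rather than the single-column index suggested above; which one serves better depends on the actual query mix.

```sql
-- Hypothetical table: one row per data point, all sensors share the table.
CREATE TABLE readings (
    id          bigserial        PRIMARY KEY,
    sensor_id   text             NOT NULL,  -- identifies the series
    recorded_at timestamptz      NOT NULL,  -- TIMESTAMP WITH TIME ZONE
    value       double precision NOT NULL
);

-- Composite index: equality match on sensor_id, range scan on recorded_at.
CREATE INDEX readings_sensor_time_idx
    ON readings (sensor_id, recorded_at);

-- Typical query from Q2: all data for sensor123 over the last 3 days.
SELECT recorded_at, value
FROM readings
WHERE sensor_id = 'sensor123'
  AND recorded_at > now() - interval '3 days'
ORDER BY recorded_at;
```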

Retention is up to you.
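One simple way to do it yourself, sketched under the assumption of a table called `readings` with a timestamptz column `recorded_at` (both names are hypothetical), is a scheduled DELETE run from cron or a similar job runner:

```sql
-- Remove data points older than one week.
-- Intended to be run periodically by an external scheduler.
DELETE FROM readings
WHERE recorded_at < now() - interval '1 week';
```

On a large table this leaves dead rows for vacuum to clean up, so dropping whole time-based partitions may be cheaper if the data volume is high.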

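Postgres has various options for sharding and replication. That's a big topic.

I'm not sure I understand your objective in #6, but I'm sure you can figure something out.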
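If the goal in #6 is to average values over fixed 25-second buckets, one possible approach is to floor the epoch value to a multiple of 25 and group on that. This is a sketch only; the table `readings` and its columns `sensor_id`, `recorded_at` (timestamptz) and `value` are assumed names, and avg is just one choice of aggregate.

```sql
-- Downsample one sensor to 25-second buckets.
-- extract(epoch ...) gives seconds since 1970-01-01 UTC;
-- floor(x / 25) * 25 rounds that down to the start of its bucket.
SELECT sensor_id,
       to_timestamp(floor(extract(epoch FROM recorded_at) / 25) * 25) AS bucket_start,
       avg(value) AS avg_value,
       count(*)   AS points
FROM readings
WHERE sensor_id = 'sensor123'
GROUP BY sensor_id, bucket_start
ORDER BY bucket_start;
```

Buckets are aligned to the Unix epoch, so bucket boundaries fall at the same instants for every sensor.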

2015-05-03 23:55:18 Bill

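Thanks, Bill. If I have hundreds of sensors each sending a data point to Postgres every 5 seconds, do you see potential performance issues? (That's about 8 million points per day.) What if I need to select one series out of hundreds, each containing hundreds of thousands of points? I will run some tests, but I value your opinion as well. – user1657624

You might want to look at something like this: http://zaiste.net/2014/07/table_inheritance_and_partitioning_with_postgresql/ You could partition by time. Then, to remove old data, you just drop the appropriate child table. I think you'll find that Postgres will surprise you performance-wise. – Bill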
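A rough sketch of the inheritance-based partitioning idea, with hypothetical table names and a one-week partition chosen purely for illustration. Routing inserts to the right child (by trigger or in the application) is left out here. Newer Postgres releases (10 and later) also offer declarative partitioning, which may be preferable where available.

```sql
-- Parent table (hypothetical names); it holds no rows itself.
CREATE TABLE readings (
    sensor_id   text             NOT NULL,
    recorded_at timestamptz      NOT NULL,
    value       double precision NOT NULL
);

-- One child table per week. The CHECK constraint lets the planner
-- skip children that cannot contain matching rows.
CREATE TABLE readings_20150427 (
    CHECK (recorded_at >= '2015-04-27 00:00:00+00'
       AND recorded_at <  '2015-05-04 00:00:00+00')
) INHERITS (readings);

CREATE INDEX readings_20150427_idx
    ON readings_20150427 (sensor_id, recorded_at);

-- Insert directly into the child that covers the timestamp ...
INSERT INTO readings_20150427
VALUES ('sensor123', '2015-05-01 12:00:00+00', 21.5);

-- ... query through the parent, which also scans its children ...
SELECT count(*) FROM readings
WHERE recorded_at >= '2015-05-01 00:00:00+00';

-- ... and expire a whole week by dropping its child table.
DROP TABLE readings_20150427;
```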

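I tested with 100 sensors, adding 50k points for each, and performance is quite reasonable. The only part that is a bit slow is bulk inserting into the database. I created an index on (sensorId, recorded_at), then used COPY to insert the points, and it took 2 minutes to add them all. Is that normal? Another thing is that storing recorded_at as a big integer is faster than storing it as a timestamp. I updated the original question with the psql output. – user1657624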
