Redshift distribution by child columns

My situation

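I have some tables in my Redshift cluster that all break down to order_id, shipment_id, or shipment_item_id, depending on how granular the table is. order_id has a one-to-many relationship to shipment_id, and shipment_id has a one-to-many relationship to shipment_item_id.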

My question

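I distribute on order_id, so all shipment_id and shipment_item_id records should be on the same node across the tables, since they are grouped by order_id. My question is: when I have to join on shipment_id or shipment_item_id, will Redshift know that these records are on the same nodes, or will it still broadcast the tables because they aren't being joined on order_id?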

Example tables

unified_order
+----------+-------------+------------------+
| order_id | shipment_id | shipment_item_id |
+----------+-------------+------------------+
|        1 |           1 |                1 |
|        1 |           1 |                2 |
|        1 |           1 |                3 |
|        1 |           2 |                4 |
|        1 |           2 |                5 |
|        1 |           3 |                6 |
|        2 |           4 |                7 |
|        2 |           4 |                8 |
|        3 |           5 |                9 |
|        3 |           5 |               10 |
|        4 |           6 |               11 |
|        5 |           7 |               12 |
|        5 |           7 |               13 |
+----------+-------------+------------------+

shipment_details
+-------------+-----------+--------------+
| shipment_id | ship_day  | ship_details |
+-------------+-----------+--------------+
|           1 | 1/1/2017  | stuff        |
|           2 | 5/1/2017  | other stuff  |
|           3 | 6/14/2017 | more stuff   |
|           4 | 5/13/2017 | less stuff   |
|           5 | 6/19/2017 | that stuff   |
|           6 | 7/31/2017 | what stuff   |
|           7 | 2/5/2017  | things       |
+-------------+-----------+--------------+

Distribution

distribution_by_node
+------+----------+-------------+------------------+
| node | order_id | shipment_id | shipment_item_id |
+------+----------+-------------+------------------+
|    1 |        1 |           1 |                1 |
|    1 |        1 |           1 |                2 |
|    1 |        1 |           1 |                3 |
|    1 |        1 |           2 |                4 |
|    1 |        1 |           2 |                5 |
|    1 |        1 |           3 |                6 |
|    1 |        5 |           7 |               12 |
|    1 |        5 |           7 |               13 |
|    2 |        2 |           4 |                7 |
|    2 |        2 |           4 |                8 |
|    3 |        3 |           5 |                9 |
|    3 |        3 |           5 |               10 |
|    4 |        4 |           6 |               11 |
+------+----------+-------------+------------------+
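
To make the layout above concrete, here is a small Python sketch that places each unified_order row on a node by its order_id and then checks whether every shipment_id ends up on a single node. The placement rule `(key - 1) % 4 + 1` is only a toy stand-in chosen to reproduce the table above; it is not Redshift's actual hash function, and the names `node_for` and `nodes_per_shipment` are hypothetical.

```python
from collections import defaultdict

# (order_id, shipment_id, shipment_item_id) rows from the unified_order example.
UNIFIED_ORDER = [
    (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 4), (1, 2, 5), (1, 3, 6),
    (2, 4, 7), (2, 4, 8), (3, 5, 9), (3, 5, 10), (4, 6, 11),
    (5, 7, 12), (5, 7, 13),
]

NODE_COUNT = 4


def node_for(key):
    """Toy placement rule (an assumption, not Redshift's real hash)."""
    return (key - 1) % NODE_COUNT + 1


def nodes_per_shipment(rows):
    """Map each shipment_id to the set of nodes holding its rows
    when the rows are distributed on order_id."""
    placement = defaultdict(set)
    for order_id, shipment_id, _item_id in rows:
        placement[shipment_id].add(node_for(order_id))
    return dict(placement)


placement = nodes_per_shipment(UNIFIED_ORDER)
colocated = all(len(nodes) == 1 for nodes in placement.values())
print(placement)
print("every shipment_id on one node:", colocated)
```

Within unified_order the child keys are co-located, as the question assumes. The shipment_details table has no order_id column, though. If it were distributed on shipment_id under the same toy rule, shipment 4 would land on node 4 while its order (order_id 2) sits on node 2, which is the case the question is really asking about.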

Source

2017-09-28 Andrew O' Brien

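Given your query example, I don't see 'order_id' in the 'shipment_details' table. How is it going to be distributed without 'order_id'? It seems this column, and the relationship between orders and shipments, is kept only in the 'unified_order' table. – AlexYes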

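The Amazon Redshift documentation does not go into detail about how information is shared between nodes, but it is doubtful that it "broadcasts tables".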

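Rather, information is probably sent between nodes as needed: only the relevant columns would be shared, and possibly only a sub-range of the data.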

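Rather than worrying too much about the internal implementation, you should test various DISTKEY and SORTKEY strategies against your actual queries to determine performance.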

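Follow the recommendations in Choose the Best Distribution Style to minimize the amount of data that needs to be sent between nodes, and consult Amazon Redshift Best Practices for Designing Queries to improve your queries.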

Source

2017-09-28 04:14:24

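Thanks for the reply, John. If a table is not joined on its distribution column, Redshift will [broadcast](http://docs.aws.amazon.com/redshift/latest/dg/c_data_redistribution.html) it (see ds_bcast_inner). I'm not as familiar with query construction in Redshift as I am in SQL Server, but I believe it also applies the select and where clauses after the join: [Logical Processing Order of the SELECT statement](https://docs.microsoft.com/en-us/sql/t-sql/queries/select-transact-sql). I plan on testing this, but with Redshift we need to structure our data to support our queries. –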

You can EXPLAIN your query to see how data will be distributed (or not) during execution. This document shows how to read the query plan: Evaluating the Query Plan
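
As a rough aid for reading such plans, the Python sketch below scans EXPLAIN output text for the redistribution labels that Redshift attaches to join steps (DS_DIST_NONE, DS_BCAST_INNER, and so on, as described in the Redshift documentation on evaluating query plans). The function names and the sample plan lines are hypothetical, simplified illustrations, not real output from a cluster.

```python
import re

# Join redistribution labels documented for Redshift query plans.
DS_LABELS = (
    "DS_DIST_ALL_NONE", "DS_DIST_ALL_INNER", "DS_DIST_NONE",
    "DS_DIST_INNER", "DS_DIST_OUTER", "DS_DIST_BOTH", "DS_BCAST_INNER",
)

# Labels that mean no rows have to move between nodes for the join.
NO_MOVEMENT = {"DS_DIST_NONE", "DS_DIST_ALL_NONE"}

_LABEL_RE = re.compile(r"\b(" + "|".join(DS_LABELS) + r")\b", re.IGNORECASE)


def redistribution_steps(plan_text):
    """Return the DS_* labels found in EXPLAIN output, in order of appearance."""
    return [m.group(1).upper() for m in _LABEL_RE.finditer(plan_text)]


def moves_data(plan_text):
    """True if any join step redistributes or broadcasts rows."""
    return any(label not in NO_MOVEMENT
               for label in redistribution_steps(plan_text))


# Hypothetical, simplified plan lines for illustration only.
sample_plan = """
XN Hash Join DS_BCAST_INNER
  -> XN Seq Scan on unified_order
  -> XN Seq Scan on shipment_details
"""

print(redistribution_steps(sample_plan))
print(moves_data(sample_plan))
```

A plan whose joins all show DS_DIST_NONE or DS_DIST_ALL_NONE needs no data movement for those joins; any other label suggests the join is not using co-located data and the distribution keys may be worth revisiting.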

Source

2017-09-28 20:20:13 AlexYes

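Yes, I will test this and review the execution plan, but populating these tables with new distribution keys will take a while, so I thought I'd ask. –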
