2015-03-03 · 43 views · 4 votes

Moving billions of records from one table to another in Teradata

I have a Teradata view that receives 1 billion records per day, and I need to process one year of data, i.e. roughly 365 billion records. The data is partitioned by date, with one partition per day.

I need an INSERT-SELECT with three ID columns (the data will be grouped by these) and two measure columns (which need the SUM aggregate function).

The query looks like the following:

Insert into table1 
Select 
    col1, col2, col3, SUM(col4), SUM(col5) 
FROM 
    table2 
WHERE coldate between 'date1' and 'date2' 
GROUP BY 
    col1, col2, col3; 

The problem is that even when I run it for a single day (and I need to run it for a whole year), the query keeps executing and does not finish within 20 minutes.

What should I do? Should I use MLOAD, INSERT-SELECT, or something else?

Please advise as soon as possible. Thanks.

Explain SELECT 
    ORIGINATING_NUMBER_VAL, 
    SUM(ACTIVITY_DURATION_MEAS), 
    SUM(Upload_Data_Volume), 
    SUM(Download_Data_Volume) 
FROM 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES 
WHERE 
    CAST(Activity_Start_Dttm as DATE) between '2014-12-01' AND '2014-12-31' 
GROUP BY 
    ORIGINATING_NUMBER_VAL; 

    1) First, we lock DP_TAB.NETWORK_ACTIVITY_DATA_RES in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES for access, we lock 
    DP_TAB.NETWORK_ACTIVITY_DATA_BLC_2013 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES for access, we lock 
    DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ4_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES for access, we lock 
    DP_TAB.NETWORK_ACTIVITY_DATA_RES_BLC in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES for access, we lock 
    DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ2_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES for access, we lock 
    DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ1_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES for access, and we lock 
    DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ3_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES for access. 
    2) Next, we do an all-AMPs RETRIEVE step from 31 partitions of 
    DP_TAB.NETWORK_ACTIVITY_DATA_RES in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES with a condition of (
    "(DP_TAB.NETWORK_ACTIVITY_DATA_RES in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >= 
    TIMESTAMP '2014-12-01 00:00:00') AND 
    ((DP_TAB.NETWORK_ACTIVITY_DATA_RES in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >= 
    TIMESTAMP '3015-02-09 00:00:00') AND 
    (DP_TAB.NETWORK_ACTIVITY_DATA_RES in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm < 
    TIMESTAMP '2015-01-01 00:00:00'))") into Spool 1 (all_amps), which 
    is built locally on the AMPs. The input table will not be cached 
    in memory, but it is eligible for synchronized scanning. The size 
    of Spool 1 is estimated with low confidence to be 1 row (70 bytes). 
    The estimated time for this step is 37.22 seconds. 
    3) We do an all-AMPs RETRIEVE step from 31 partitions of 
    DP_TAB.NETWORK_ACTIVITY_DATA_RES_BLC in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES with a condition of (
    "(DP_TAB.NETWORK_ACTIVITY_DATA_RES_BLC in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >= 
    TIMESTAMP '2014-12-01 00:00:00') AND 
    ((DP_TAB.NETWORK_ACTIVITY_DATA_RES_BLC in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm < 
    TIMESTAMP '2015-01-01 00:00:00') AND 
    ((DP_TAB.NETWORK_ACTIVITY_DATA_RES_BLC in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >= 
    TIMESTAMP '2014-10-13 00:00:00') AND 
    (DP_TAB.NETWORK_ACTIVITY_DATA_RES_BLC in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm < 
    TIMESTAMP '3015-02-10 00:00:00')))") into Spool 1 (all_amps), 
    which is built locally on the AMPs. The input table will not be 
    cached in memory, but it is eligible for synchronized scanning. 
    The result spool file will not be cached in memory. The size of 
    Spool 1 is estimated with low confidence to be 22,856,337,679 rows 
    (1,599,943,637,530 bytes). The estimated time for this step is 1 
    hour and 52 minutes. 
    4) We do an all-AMPs RETRIEVE step from 0 partitions of 
    DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ1_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES by way of an all-rows scan 
    with a condition of ("(DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ1_14 in 
    view dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >= 
    TIMESTAMP '2014-12-01 00:00:00') AND 
    ((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ1_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm< 
    TIMESTAMP '2015-01-01 00:00:00') AND 
    ((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ1_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm < 
    TIMESTAMP '2014-04-01 00:00:00') AND 
    (DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ1_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >= 
    TIMESTAMP '2014-01-01 00:00:00')))") into Spool 1 (all_amps), 
    which is built locally on the AMPs. The input table will not be 
    cached in memory, but it is eligible for synchronized scanning. 
    The size of Spool 1 is estimated with low confidence to be 
    22,856,337,680 rows (1,599,943,637,600 bytes). The estimated time 
    for this step is 0.01 seconds. 
    5) We do an all-AMPs RETRIEVE step from 0 partitions of 
    DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ2_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES by way of an all-rows scan 
    with a condition of ("(DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ2_14 in 
    view dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >= 
    TIMESTAMP '2014-12-01 00:00:00') AND 
    ((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ2_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm< 
    TIMESTAMP '2015-01-01 00:00:00') AND 
    ((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ2_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm < 
    TIMESTAMP '2014-07-01 00:00:00') AND 
    (DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ2_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >= 
    TIMESTAMP '2014-04-01 00:00:00')))") into Spool 1 (all_amps), 
    which is built locally on the AMPs. The input table will not be 
    cached in memory, but it is eligible for synchronized scanning. 
    The size of Spool 1 is estimated with low confidence to be 
    22,856,337,681 rows (1,599,943,637,670 bytes). The estimated time 
    for this step is 0.01 seconds. 
    6) We do an all-AMPs RETRIEVE step from 0 partitions of 
    DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ3_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES by way of an all-rows scan 
    with a condition of ("(DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ3_14 in 
    view dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >= 
    TIMESTAMP '2014-12-01 00:00:00') AND 
    ((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ3_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm< 
    TIMESTAMP '2014-10-01 00:00:00') AND 
    ((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ3_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >= 
    TIMESTAMP '2014-07-01 00:00:00') AND 
    (DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ3_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm < 
    TIMESTAMP '2015-01-01 00:00:00')))") into Spool 1 (all_amps), 
    which is built locally on the AMPs. The input table will not be 
    cached in memory, but it is eligible for synchronized scanning. 
    The size of Spool 1 is estimated with low confidence to be 
    22,856,337,682 rows (1,599,943,637,740 bytes). The estimated time 
    for this step is 0.01 seconds. 
    7) We do an all-AMPs RETRIEVE step from 0 partitions of 
    DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ4_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES by way of an all-rows scan 
    with a condition of ("(DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ4_14 in 
    view dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >= 
    TIMESTAMP '2014-12-01 00:00:00') AND 
    ((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ4_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm< 
    TIMESTAMP '2015-01-01 00:00:00') AND 
    ((DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ4_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm < 
    TIMESTAMP '2014-10-13 00:00:00') AND 
    (DP_TAB.NETWORK_ACTIVITY_DATA_BLCQ4_14 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >= 
    TIMESTAMP '2014-10-01 00:00:00')))") into Spool 1 (all_amps), 
    which is built locally on the AMPs. The input table will not be 
    cached in memory, but it is eligible for synchronized scanning. 
    The size of Spool 1 is estimated with low confidence to be 
    22,856,337,683 rows (1,599,943,637,810 bytes). The estimated time 
    for this step is 0.01 seconds. 
    8) We do an all-AMPs RETRIEVE step from 0 partitions of 
    DP_TAB.NETWORK_ACTIVITY_DATA_BLC_2013 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES by way of an all-rows scan 
    with a condition of ("(DP_TAB.NETWORK_ACTIVITY_DATA_BLC_2013 in 
    view dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm >= 
    TIMESTAMP '2014-12-01 00:00:00') AND 
    ((DP_TAB.NETWORK_ACTIVITY_DATA_BLC_2013 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm< 
    TIMESTAMP '2014-01-01 00:00:00') AND 
    (DP_TAB.NETWORK_ACTIVITY_DATA_BLC_2013 in view 
    dp_tab_view.NETWORK_ACTIVITY_DATA_RES.Activity_Start_Dttm < 
    TIMESTAMP '2015-01-01 00:00:00'))") into Spool 1 (all_amps), which 
    is built locally on the AMPs. The input table will not be cached 
    in memory, but it is eligible for synchronized scanning. The size 
    of Spool 1 is estimated with low confidence to be 22,856,337,684 
    rows (1,599,943,637,880 bytes). The estimated time for this step 
    is 0.01 seconds. 
    9) We do an all-AMPs SUM step to aggregate from Spool 1 (Last Use) by 
    way of an all-rows scan with a condition of (
    "((CAST((NETWORK_ACTIVITY_DATA_RES.ACTIVITY_START_DTTM) AS 
    DATE))>= DATE '2014-12-01') AND 
    ((CAST((NETWORK_ACTIVITY_DATA_RES.ACTIVITY_START_DTTM) AS DATE))<= 
    DATE '2014-12-31')") , grouping by field1 (ORIGINATING_NUMBER_VAL). 
    Aggregate Intermediate Results are computed globally, then placed 
    in Spool 4. The aggregate spool file will not be cached in memory. 
    The size of Spool 4 is estimated with low confidence to be 
    17,142,253,263 rows (1,628,514,059,985 bytes). The estimated time 
    for this step is 6 hours and 28 minutes. 
10) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of 
    an all-rows scan into Spool 2 (group_amps), which is built locally 
    on the AMPs. The result spool file will not be cached in memory. 
    The size of Spool 2 is estimated with low confidence to be 
    17,142,253,263 rows (1,165,673,221,884 bytes). The estimated time 
    for this step is 21 minutes and 27 seconds. 
11) Finally, we send out an END TRANSACTION step to all AMPs involved 
    in processing the request. 
    -> The contents of Spool 2 are sent back to the user as the result of 
    statement 1. The total estimated time is 8 hours and 42 minutes. 

What are the indexes and the plan? It looks like a 'range'-type query, which defeats any composite index. Grouping billions of unindexed records is both a memory- and time-consuming task. – Matt 2015-03-03 13:21:30


What is the PI of the target table? Does it match the source table's? If it doesn't, it is possible you have a skew problem in the redistribution step of the query plan. Partitions with 1 billion rows are fairly deep, but depending on the system configuration, not unmanageable. Were your statistics collected following the recommended practices for PPI tables? – 2015-03-03 16:26:16


Thanks for the reply. Here is the execution plan for the SELECT query (considering one month); the total estimated time is 8+ hours. Please advise. https://onedrive.live.com/?cid=73d6f5250a5bffa7&id=73D6F5250A5BFFA7!256&ithint=file,txt&authkey=!ABNlAtlSDyGDaLI – 2015-03-04 06:09:57

Answer


As @JNevill suggested, it is always a good idea to create the target table as MULTISET. Beyond that, there is not much you can do, as the plan looks reasonable.
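A minimal sketch of such a MULTISET target table. The table name, column types, and the NO FALLBACK option are assumptions for illustration; only the column names come from the query above:

```sql
-- Hypothetical DDL; adjust names, types, and options to your site standards.
-- MULTISET skips the duplicate-row check on insert, which matters at this
-- row volume. The PI matches the GROUP BY column so the aggregate rows
-- land without an extra redistribution.
CREATE MULTISET TABLE target_db.network_activity_summary, NO FALLBACK
(
    ORIGINATING_NUMBER_VAL   VARCHAR(20)    NOT NULL,
    ACTIVITY_DURATION_MEAS   DECIMAL(18,0),
    Upload_Data_Volume       DECIMAL(18,0),
    Download_Data_Volume     DECIMAL(18,0)
)
PRIMARY INDEX (ORIGINATING_NUMBER_VAL);
```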

Since your source table appears to be partitioned by day (We do an all-AMPs RETRIEVE step from 31 partitions of), you could run a series of smaller daily queries, one per daily partition. It would not be any faster overall, but:

  • you get your results incrementally,
  • if a failure occurs, you do not have to restart the work after the query has already been running for hours,
  • you will have a much better ETA, because you quickly obtain real execution times; the numbers in EXPLAIN can differ greatly from the actual times.
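One such daily slice might look like the sketch below. The view and column names follow the EXPLAIN output above; the target table name is an assumption. Note the half-open timestamp range directly on the partitioning column, with no CAST, which gives the optimizer the best chance at single-partition elimination:

```sql
-- Hypothetical daily slice; repeat with the range advanced one day at a
-- time (e.g. generated by a BTEQ script or a stored procedure loop).
INSERT INTO target_db.network_activity_summary
SELECT
    ORIGINATING_NUMBER_VAL,
    SUM(ACTIVITY_DURATION_MEAS),
    SUM(Upload_Data_Volume),
    SUM(Download_Data_Volume)
FROM dp_tab_view.NETWORK_ACTIVITY_DATA_RES
WHERE Activity_Start_Dttm >= TIMESTAMP '2014-12-01 00:00:00'
  AND Activity_Start_Dttm <  TIMESTAMP '2014-12-02 00:00:00'
GROUP BY ORIGINATING_NUMBER_VAL;
```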