TSQL insert then delete duplicates to improve query performance

I am using the two queries below to first insert data from a transaction dump into a refined transactions table, and then to delete possible duplicates.
INSERT INTO [dbo].[Transactions_Refined]
SELECT
    Client_ID,
    Customer_ID,
    Transaction_ID,
    SUM(TRY_PARSE(value_sold AS numeric(18,2))) AS value_sold,
    SUM(TRY_PARSE(quantity AS numeric(18,4))) AS quantity,
    subclass,
    article,
    TRY_PARSE(Transaction_Date AS datetime) AS Transaction_Date,
    Store_ID
FROM [dbo].[Transaction_Dump]
GROUP BY
    Client_ID, Customer_ID, Transaction_ID,
    TRY_PARSE(Transaction_Date AS datetime),
    subclass, article, Store_ID;
WITH cte AS
(
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY Client_ID, Customer_ID, Transaction_ID,
                                        value_sold, quantity, subclass, article
                           ORDER BY Client_ID, Customer_ID, Transaction_ID,
                                    value_sold, quantity, subclass, article) AS [rn]
    FROM [dbo].[Transactions_Refined]
    WHERE Client_ID IN (SELECT DISTINCT [Client_ID]
                        FROM [dbo].[Transaction_Dump])
)
DELETE cte
WHERE [rn] > 1;
I would like to speed this process up. Any ideas? I was considering using an outer join.
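One way to pursue the outer-join idea is to avoid inserting the duplicates in the first place, instead of deleting them afterwards. The sketch below is an assumption about your setup, not a drop-in replacement: it stages the aggregated dump into a temp table (`#staged` is a name invented here), then appends only rows that have no match in `Transactions_Refined` via a `LEFT JOIN` anti-join. Note that if any of the joined columns can be NULL, the equality comparisons will not match and such rows would always be inserted.

```sql
-- Stage the aggregated dump rows first (same SELECT as the original INSERT).
SELECT
    Client_ID, Customer_ID, Transaction_ID,
    SUM(TRY_PARSE(value_sold AS numeric(18,2))) AS value_sold,
    SUM(TRY_PARSE(quantity AS numeric(18,4))) AS quantity,
    subclass, article,
    TRY_PARSE(Transaction_Date AS datetime) AS Transaction_Date,
    Store_ID
INTO #staged
FROM [dbo].[Transaction_Dump]
GROUP BY
    Client_ID, Customer_ID, Transaction_ID,
    TRY_PARSE(Transaction_Date AS datetime),
    subclass, article, Store_ID;

-- Append only rows that do not already exist in the target table.
INSERT INTO [dbo].[Transactions_Refined]
SELECT s.*
FROM #staged AS s
LEFT JOIN [dbo].[Transactions_Refined] AS r
       ON  r.Client_ID      = s.Client_ID
       AND r.Customer_ID    = s.Customer_ID
       AND r.Transaction_ID = s.Transaction_ID
       AND r.value_sold     = s.value_sold
       AND r.quantity       = s.quantity
       AND r.subclass       = s.subclass
       AND r.article        = s.article
WHERE r.Transaction_ID IS NULL;  -- no existing match: row is new
```

With this shape the ROW_NUMBER delete pass is only needed if duplicates can already exist inside `Transactions_Refined` before the load.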
Is Transaction_Dump a table you load from some text file? Is that why you use TRY_PARSE? How many records does it usually contain? How many records does Transactions_Refined contain? Is it a cumulative table that you append to and that keeps growing? I see you use ROW_NUMBER in order to delete duplicate keys, but don't you lose information that way? –
Replace the `IN (SELECT DISTINCT ...)` with `EXISTS` –
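The suggestion above applied to the delete query might look like the following sketch. It assumes the ordering among duplicate rows does not matter (the original `ORDER BY` repeats the partition columns, which are constant within each partition, so `ORDER BY (SELECT NULL)` is used here as the usual "no meaningful order" idiom):

```sql
WITH cte AS
(
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY Client_ID, Customer_ID, Transaction_ID,
                                        value_sold, quantity, subclass, article
                           ORDER BY (SELECT NULL)) AS [rn]
    FROM [dbo].[Transactions_Refined] AS r
    WHERE EXISTS (SELECT 1
                  FROM [dbo].[Transaction_Dump] AS d
                  WHERE d.Client_ID = r.Client_ID)  -- semi-join instead of IN DISTINCT
)
DELETE cte
WHERE [rn] > 1;
```

`EXISTS` lets the optimizer stop probing `Transaction_Dump` at the first matching row per client, whereas `SELECT DISTINCT` may force a deduplication step first.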
@tomislav_t Yes, the data comes from a text file, and I parse it because I need to convert it into properly formatted data. I am not using SSIS; I am using Azure Data Factory. And yes, you are right, I am appending data. No, I am not losing any information I need. –