2017-10-06

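Speeding up a table-valued function

I created a function that generates a random address, but every call takes a long time (roughly 10-20 seconds). I have to run it against more than 900,000 records, and based on the time this function takes, that would be about 120 days. Here is the function: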

CREATE function dbo.fn_GetAddress2 (@state NVARCHAR(20)) 
returns @NewAddress TABLE 
( 
    Address1 NVARCHAR(MAX), 
    Address2 NVARCHAR(MAX), 
    City  NVARCHAR(MAX), 
    Postcode NVARCHAR(MAX) 
) 
AS 
BEGIN 
    DECLARE @Address1 NVARCHAR(MAX) 
    DECLARE @Address2 NVARCHAR(MAX) 
    DECLARE @City  NVARCHAR(MAX) 
    DECLARE @Postcode NVARCHAR(MAX) 
    DECLARE @StreetPID NVARCHAR(MAX) 
    DECLARE @newID1  NVARCHAR(36) 

    SELECT @StreetPID = 
     (SELECT TOP 1 g.street_locality_pid AS StreetPID 
      FROM [GNAF].dbo.Street_Locality g 
       INNER JOIN [GNAF].dbo.Address_Detail aD ON g.street_locality_pid = aD.street_locality_pid 
      WHERE g.street_name IS NOT NULL AND g.state != @state AND aD.flat_number IS NOT NULL 
      ORDER BY (SELECT new_id FROM getNewID)) 

    SELECT @Address1 = 
     (SELECT TOP 1 CAST(aD.flat_number AS VARCHAR(20)) + ' ' + g.Street_name + ' ' + g.street_type_code AS Address1 
      FROM [GNAF].dbo.Street_Locality g 
       INNER JOIN [GNAF].dbo.Address_Detail aD ON g.street_locality_pid = aD.street_locality_pid 
      WHERE g.street_name IS NOT NULL AND g.state != @state AND aD.flat_number IS NOT NULL 
        AND g.street_locality_pid = @StreetPID 
      ORDER BY (SELECT new_id FROM getNewID)) 


    SELECT @postcode = 
     (SELECT TOP 1 aD.postcode AS postcode 
      FROM [GNAF].dbo.Street_Locality g 
       INNER JOIN [GNAF].dbo.Address_Detail aD ON g.street_locality_pid = aD.street_locality_pid 
      WHERE g.street_name IS NOT NULL AND g.state != @state AND aD.flat_number IS NOT NULL 
        AND g.street_locality_pid = @StreetPID 
      ORDER BY (SELECT new_id FROM getNewID)) 

    SELECT @City = 
     (SELECT TOP 1 l.locality_name AS city 
      FROM [GNAF].dbo.Street_Locality g 
       INNER JOIN [GNAF].dbo.Address_Detail aD ON g.street_locality_pid = aD.street_locality_pid 
       INNER JOIN [GNAF].dbo.Locality l ON aD.locality_pid = l.locality_pid 
      WHERE g.street_name IS NOT NULL AND g.state != @state AND aD.flat_number IS NOT NULL 
        AND g.street_locality_pid = @StreetPID 
      ORDER BY (SELECT new_id FROM getNewID)) 

    IF @Address1 IS NOT NULL 
    BEGIN 
     INSERT @NewAddress 
     SELECT @Address1, @Address2, @city, @postcode; 
    END; 
    Return; 
END 
GO 

The [GNAF] database is huge: it contains every address in Australia. Functions and NEWID() are completely new to me.

I've tried several different approaches, including a CTE:

SET @State = 'NSW' 
;WITH CTE AS (
    SELECT TOP 1 CAST(aD.flat_number AS VARCHAR(20)) + ' ' + g.Street_name + ' ' + g.street_type_code AS Address1 
      , aD.postcode AS postcode 
    FROM [GNAF].dbo.Street_Locality g 
     INNER JOIN [GNAF].dbo.Address_Detail aD ON g.street_locality_pid = aD.street_locality_pid 
    WHERE g.street_name IS NOT NULL AND g.state != @state AND aD.flat_number IS NOT NULL 
    ORDER BY (SELECT new_id FROM getNewID) 
) 
SELECT @Address1 = (SELECT Address1 FROM CTE) 
     ,@postcode = (SELECT postcode FROM CTE) 
SELECT @Address1 
     , @postcode 

This is just as slow. Any help would be greatly appreciated.
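For readers new to the pattern: `getNewID` is never defined in this thread. A common workaround, and only an assumption about what the object looks like here, is a one-column view wrapping `NEWID()`, because SQL Server rejects a direct `NEWID()` call inside a multi-statement function such as `dbo.fn_GetAddress2`. A minimal sketch:

```sql
-- Assumed definition: the thread never shows getNewID, so this is a guess
-- at the usual workaround, not the asker's actual object.
CREATE VIEW dbo.getNewID
AS
SELECT NEWID() AS new_id;   -- a fresh GUID each time the view is evaluated
GO
```

Sorting by that value is what makes each `TOP 1` pick a random row, and it is also why each of the four lookups in the function has to process the whole joined result before it can return a single row.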


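A couple of things: you may get a better execution plan if you `CROSS JOIN` to `getNewID` and order by its `new_id` column instead of doing it the way you currently do. Also, assuming I understand what you are trying to do, you don't need all those variables, the CTE, or the table variable; you can do everything in a single SELECT statement. For example: `CREATE FUNCTION dbo.blah (@state NVARCHAR(20)) RETURNS TABLE AS RETURN (SELECT TOP 1 <all my columns> FROM <my tables and joins> CROSS JOIN getNewID AS n WHERE <my conditions> ORDER BY n.new_id);` – ZLK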


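Thanks. What I've noticed is that the query takes exactly the same time whether it returns 1 record or 1,000,000. So what I'm going to do now is build a dynamic SQL query that counts the number of unique states, and then I'll just join the results to the actual data using ROW_NUMBER. I'm just struggling to pass a variable into the CTE query as dynamic SQL. –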


Aaaand I just remembered that you can use TOP (@variable) in SQL 2005+ –
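ZLK's single-statement suggestion, filled in with the tables and filters from the question, might look roughly like this. The function name `dbo.fn_GetAddressInline` is made up for illustration, `getNewID` is assumed to expose a `new_id` column as in the question, and the code has not been run against GNAF, so treat it as a sketch rather than a tested fix:

```sql
-- Hypothetical inline version of dbo.fn_GetAddress2 (name is illustrative).
-- All columns come from ONE randomly chosen row instead of four separate
-- TOP 1 lookups, so the big join is sorted once per call rather than four times.
CREATE FUNCTION dbo.fn_GetAddressInline (@state NVARCHAR(20))
RETURNS TABLE
AS
RETURN
(
    SELECT TOP (1)
        CAST(aD.flat_number AS VARCHAR(20)) + ' ' + g.street_name + ' ' + g.street_type_code AS Address1,
        l.locality_name AS City,
        aD.postcode     AS Postcode
    FROM [GNAF].dbo.Street_Locality g
        INNER JOIN [GNAF].dbo.Address_Detail aD ON g.street_locality_pid = aD.street_locality_pid
        INNER JOIN [GNAF].dbo.Locality l        ON aD.locality_pid = l.locality_pid
        CROSS JOIN getNewID AS n                -- assumed view wrapping NEWID()
    WHERE g.street_name IS NOT NULL
      AND g.state != @state
      AND aD.flat_number IS NOT NULL
    ORDER BY n.new_id
);
GO

-- Example call
SELECT Address1, City, Postcode
FROM dbo.fn_GetAddressInline(N'NSW');
```

Whether `n.new_id` is evaluated once per row, which the random ordering depends on, is up to the plan the optimizer picks, so the output is worth checking before relying on it. Even in this form each call still sorts the full join, which matches the observation in the comments that returning 1 row costs about as much as returning 1,000,000.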

Answers

1

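Here is something that should work for you. Please note: rather than going through the entire address table, I simply created 5 new tables, one for each part of the address, and populated them with data from an address table. I used 2,000 rows for each one except the state table. You can use more or fewer; just make sure you change the modulo values in the function to match the number of rows in each table.

In any case, it's fast... I'll post the SET STATISTICS IO, TIME numbers for generating 10,000, 100,000 and 1,000,000 rows.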

USE tempdb; 
GO 
-- Populate a series of individual tables one for each part of the address... 
CREATE TABLE dbo.a1 (ID INT NOT NULL IDENTITY (1,1) PRIMARY KEY CLUSTERED, Address1 VARCHAR(60)); 
INSERT dbo.a1 (Address1) 
SELECT TOP 2000 b.PhysAddr1 FROM Xyz.dbo.ContactBranch b WHERE b.PhysAddr1 LIKE '[0-Z ][0-Z ][0-Z ][0-Z ][0-Z ]%'; 

CREATE TABLE dbo.a2 (ID INT NOT NULL IDENTITY (1,1) PRIMARY KEY CLUSTERED, Address2 VARCHAR(50)); 
INSERT dbo.a2 (Address2) 
SELECT TOP 2000 ISNULL(b.PhysAddr2, '') FROM Xyz.dbo.ContactBranch b; 

CREATE TABLE dbo.cty (ID INT NOT NULL IDENTITY (1,1) PRIMARY KEY CLUSTERED, City VARCHAR(50)); 
INSERT dbo.cty (City) 
SELECT TOP 2000 b.PhysCity FROM Xyz.dbo.ContactBranch b WHERE b.PhysCity LIKE '[0-Z ][0-Z ][0-Z ][0-Z ][0-Z ]%'; 

CREATE TABLE dbo.st (ID INT NOT NULL IDENTITY (1,1) PRIMARY KEY CLUSTERED, State CHAR(2)); 
INSERT dbo.st (State) 
SELECT s.Description FROM Xyz.dbo.LK_States s WHERE s.Description LIKE '[a-Z][a-Z]'; 

CREATE TABLE dbo.zip (ID INT NOT NULL IDENTITY (1,1) PRIMARY KEY CLUSTERED, Zip VARCHAR(5)); 
INSERT dbo.zip (Zip) 
SELECT TOP 2000 LEFT(b.PhysZip10, 5) FROM Xyz.dbo.ContactBranch b WHERE b.PhysZip10 LIKE '[0-Z ][0-Z ][0-Z ][0-Z ][0-Z ]%'; 

/* DROP TABLE dbo.a1; DROP TABLE dbo.a2; DROP TABLE dbo.cty; DROP TABLE dbo.st; DROP TABLE dbo.zip; */ 
/* 
(2000 rows affected) 
(2000 rows affected) 
(2000 rows affected) 
(52 rows affected) 
(2000 rows affected) 
*/ 
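Since the modulo values in the function have to match the row counts of these tables, a quick check along the following lines shows the numbers to plug in. It is a sketch against the five tables created above and assumes the IDENTITY values run from 1 to the row count without gaps, which should be the case for a freshly created table filled by one INSERT:

```sql
USE tempdb;
GO
-- RowCnt is the value to use after the % operator for each table.
-- If MaxID differs from RowCnt, the IDs have gaps and some lookups would miss.
SELECT 'a1' AS TableName, COUNT(*) AS RowCnt, MAX(ID) AS MaxID FROM dbo.a1
UNION ALL SELECT 'a2',  COUNT(*), MAX(ID) FROM dbo.a2
UNION ALL SELECT 'cty', COUNT(*), MAX(ID) FROM dbo.cty
UNION ALL SELECT 'st',  COUNT(*), MAX(ID) FROM dbo.st
UNION ALL SELECT 'zip', COUNT(*), MAX(ID) FROM dbo.zip;
```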

The function code...

SET QUOTED_IDENTIFIER ON 
GO 
SET ANSI_NULLS ON 
GO 
CREATE FUNCTION dbo.tfn_AddressGenerator 
/* =================================================================== 
10/06/2017 JL, Created: to generate random addresses. 
    The general premise is based on the "Ben-Gan" or inline Tally table. 
=================================================================== */ 
--===== Define I/O parameters 
(
    @State CHAR(2), 
    @NumToCreate INT 
) 
RETURNS TABLE WITH SCHEMABINDING AS 
RETURN 

    WITH 
     cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)), -- 10 rows 
     cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),        -- 100 rows 
     cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),        -- 10,000 rows 
     cte_Tally (n) AS (
      SELECT TOP (@NumToCreate) 
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) 
      FROM 
       cte_n3 a CROSS JOIN cte_n3 b             -- 100,000,000 rows 
      ) 
    SELECT 
     a1.Address1, 
     a2.Address2, 
     c.City, 
     State = IIF(s1.State = @State, s2.State, s1.State), 
     z.Zip 
    FROM 
     cte_Tally t 
     CROSS APPLY (VALUES (
      ABS(CHECKSUM(t.n)) % 2000 + 1, ABS(CHECKSUM(t.n)) % 1528 + 1, 
      ABS(CHECKSUM(t.n)) % 2000 + 1, ABS(CHECKSUM(t.n)) % 52 + 1, 
      ABS(CHECKSUM(t.n)) % 52 + 1, ABS(CHECKSUM(t.n)) % 2000 + 1 
      )) x (Add1, Add2, City, State1, State2, Zip) 
     CROSS APPLY (SELECT TOP 1 dbo.a1.Address1 FROM dbo.a1 WHERE x.Add1 = dbo.a1.ID) a1 
     CROSS APPLY (SELECT TOP 1 dbo.a2.Address2 FROM dbo.a2 WHERE x.Add2 = dbo.a2.ID) a2 
     CROSS APPLY (SELECT TOP 1 dbo.cty.City FROM dbo.cty WHERE x.City = dbo.cty.ID) c 
     CROSS APPLY (SELECT TOP 1 dbo.st.State FROM dbo.st WHERE x.State1 = dbo.st.ID) s1 
     CROSS APPLY (SELECT TOP 1 dbo.st.State FROM dbo.st WHERE x.State2 = dbo.st.ID) s2 
     CROSS APPLY (SELECT TOP 1 dbo.zip.Zip  FROM dbo.zip WHERE x.Zip = dbo.zip.ID) z; 
GO 
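One detail worth noticing in the function above: `State1` and `State2` are computed from the identical expression, so `s1` and `s2` always point at the same row and the `IIF` can never swap in a different state when `s1.State` equals `@State`. A possible adjustment, shown here as a self-contained sketch with a four-row stand-in for `dbo.st` (the table variable, the value list and the alias names `i1`/`i2` are illustrative, not part of the original answer), is to derive the second index from the first so the two always differ:

```sql
DECLARE @State CHAR(2) = 'FL';

-- Tiny stand-in for dbo.st so the example runs on its own.
DECLARE @States TABLE (ID INT IDENTITY(1,1) PRIMARY KEY, State CHAR(2));
INSERT @States (State) VALUES ('AL'), ('FL'), ('GA'), ('TX');

SELECT
    t.n,
    State = IIF(s1.State = @State, s2.State, s1.State)   -- falls back to s2 when s1 is the excluded state
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8)) t (n)
    -- First index: same formula as the function, with 4 rows instead of 52.
    CROSS APPLY (VALUES (ABS(CHECKSUM(t.n)) % 4 + 1)) i1 (State1)
    -- Second index: the next ID, wrapping around, so it never equals State1.
    CROSS APPLY (VALUES (i1.State1 % 4 + 1)) i2 (State2)
    INNER JOIN @States s1 ON s1.ID = i1.State1
    INNER JOIN @States s2 ON s2.ID = i2.State2;
```

Because the two indexes are always different, at most one of `s1` and `s2` can be the excluded state, so the result should never contain `@State`. In the real function the same idea would mean replacing the second `% 52 + 1` expression with one based on the first.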

Actual execution of the function...

SELECT ag.Address1, ag.Address2, ag.City,ag.State, ag.Zip 
FROM dbo.tfn_AddressGenerator('FL',10000) ag; 

Sample output...

Address1                    Address2    City             State Zip
--------------------------- ----------- ---------------- ----- -----
111 CONGRESSIONAL BLVD                  ATLANTA          AL    30042
414 Eagle Rock Ave # 100    STE 400     MARIETTA         AR    70816
414 Eagle Rock Ave Ste 107  Suite 300   NORCROSS         AZ    72116
3931 HIGHWAY 78 W STE B200              SAVANNAH         CA    31702
4728 Joseph Eli Dr          STE 6       STONE MOUNTAIN   CO    30338
29620 IH10 West                         DULUTH           CT    63026
4666 El Camino Real                     ATLANTA          DC    60555
3700 Thomas Rd Ste 215      STE 100     ATLANTA          DE    32241
3700 Thomas Rd Ste 215      STE B-2190  ALPHARETTA       FL    36117
2615 East West Connector                ALPHARETTA       GA    35201

Results for 10,000 rows...

SQL Server parse and compile time: 
    CPU time = 0 ms, elapsed time = 0 ms. 

(10000 rows affected) 
Table 'zip'. Scan count 0, logical reads 20000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'st'. Scan count 0, logical reads 40000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'cty'. Scan count 0, logical reads 20000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'a2'. Scan count 0, logical reads 20000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'a1'. Scan count 0, logical reads 20000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times: 
    CPU time = 94 ms, elapsed time = 93 ms. 

Results for 100,000 rows...

SQL Server parse and compile time: 
    CPU time = 0 ms, elapsed time = 0 ms. 

(100000 rows affected) 
Table 'zip'. Scan count 0, logical reads 200000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'st'. Scan count 0, logical reads 400000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'cty'. Scan count 0, logical reads 200000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'a2'. Scan count 0, logical reads 200000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'a1'. Scan count 0, logical reads 200000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times: 
    CPU time = 907 ms, elapsed time = 948 ms. 

Results for 1,000,000 rows...

SQL Server parse and compile time: 
    CPU time = 0 ms, elapsed time = 1 ms. 
SQL Server parse and compile time: 
    CPU time = 31 ms, elapsed time = 51 ms. 

(1000000 rows affected) 
Table 'a1'. Scan count 0, logical reads 4000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'a2'. Scan count 0, logical reads 3056, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'cty'. Scan count 0, logical reads 4000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'st'. Scan count 0, logical reads 208, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'zip'. Scan count 0, logical reads 4000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times: 
    CPU time = 10921 ms, elapsed time = 15743 ms. 

100K rows in under a second and 1 million rows in about 15 seconds...
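The figures above are what SQL Server reports when STATISTICS IO and TIME are switched on for the session. A minimal way to reproduce the measurement, assuming the function and the five lookup tables from this answer exist in tempdb:

```sql
USE tempdb;
GO
SET STATISTICS IO, TIME ON;    -- report logical reads and CPU/elapsed time per statement

SELECT ag.Address1, ag.Address2, ag.City, ag.State, ag.Zip
FROM dbo.tfn_AddressGenerator('FL', 100000) ag;

SET STATISTICS IO, TIME OFF;
```

Elapsed time includes sending the rows to the client, so the numbers will vary with the machine and with how the results are displayed.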

0

I figured out that the easiest way is simply to run it once for each state, with a variable row count. Here is the code:

DECLARE @states TABLE (name NVARCHAR(50)); 

INSERT INTO @states (name) 
SELECT DISTINCT 
    State 
FROM anon_AddressChange 


DECLARE @count INT 
DECLARE @i INT 
SET @i = 0 
SET @count = (SELECT COUNT(*) FROM @states) 

while @i < @count 
BEGIN 

    DECLARE @state NVARCHAR(MAX) 
    SET @State = (SELECT top 1 name from @states order by name) 

    DECLARE @amount INT 
    SET @amount = (SELECT count(*) FROM anon_addresschange where state = @state) 





    ;WITH CTE AS (
     SELECT TOP (@amount) CAST(aD.flat_number AS VARCHAR(20)) + ' ' + g.Street_name + ' ' + g.street_type_code AS Address1 
       , aD.postcode AS postcode 
       , l.locality_name AS city 

     FROM [GNAF].dbo.Street_Locality g 
      INNER JOIN [GNAF].dbo.Address_Detail aD ON g.street_locality_pid = aD.street_locality_pid 
      INNER JOIN [GNAF].dbo.Locality l ON aD.locality_pid = l.locality_pid 
     WHERE g.street_name IS NOT NULL AND g.state = @state AND aD.flat_number IS NOT NULL 
      AND g.state NOT IN ('OT', 'NT' ,'TAS' ,'VIC' ,'ACT') 
     ORDER BY (SELECT new_id FROM getNewID) 
    ) 
    UPDATE anon_addresschange SET 
     newStreet1  = UPPER(LEFT(a.Address1,1))+LOWER(SUBSTRING(a.Address1,2,LEN(a.Address1))) 
     ,newCity  = UPPER(LEFT(a.city,1))+LOWER(SUBSTRING(a.city,2,LEN(a.city))) 
     ,newPostcode = a.postcode 
     ,newState  = @state 
     ,newCountry  = 'Australia' 
    FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY CAST(GETDATE() AS TIMESTAMP)) AS RowNumber from cte) a 
    CROSS APPLY (
    SELECT *, ROW_NUMBER() OVER (ORDER BY CAST(GETDATE() AS TIMESTAMP)) AS RowNumber FROM anon_AddressChange 
    WHERE state = @state) b 
    WHERE a.Rownumber = b.Rownumber 
     AND anon_addresschange.personID = b.personID 


    SET @i = @i + 1 
    delete from @states WHERE NAME IN (SELECT TOP 1 name FROM @states order by name) 
END 
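The core of the loop body is pairing two unrelated row sets by row number: the rows to anonymise on one side and an equally sized random sample of real addresses on the other. That pairing can be sketched in isolation as below. The table variables, their columns and the sample values are illustrative only, and a plain join on the row number stands in for the `CROSS APPLY ... WHERE a.Rownumber = b.Rownumber` used above:

```sql
-- Stand-ins for anon_AddressChange and the random GNAF sample.
DECLARE @People TABLE (personID INT PRIMARY KEY, newCity NVARCHAR(50) NULL);
DECLARE @Pool   TABLE (city NVARCHAR(50));

INSERT @People (personID) VALUES (10), (20), (30);
INSERT @Pool (city) VALUES (N'SYDNEY'), (N'NEWCASTLE'), (N'WOLLONGONG');

WITH p AS (   -- number the rows to update
    SELECT personID, ROW_NUMBER() OVER (ORDER BY personID) AS rn
    FROM @People
),
a AS (        -- number the replacement values in random order
    SELECT city, ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn
    FROM @Pool
)
UPDATE t
SET newCity = UPPER(LEFT(a.city, 1)) + LOWER(SUBSTRING(a.city, 2, LEN(a.city)))
FROM @People t
    INNER JOIN p ON p.personID = t.personID   -- back to the base row by key
    INNER JOIN a ON a.rn = p.rn;              -- pair row n with random value n

SELECT personID, newCity FROM @People;
```

Each person ends up with exactly one city from the pool, in an order that changes from run to run. The random sort happens once per batch of rows rather than once per row, which is where the speed-up over the original function comes from.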

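What I really needed was to use this in an update/insert statement.

It took 2 seconds to run for 1,003 records, so 1,000,000 records should take only about 33 minutes.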
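That estimate is a straight linear extrapolation from the 1,003-row run, and it can be checked with a one-liner. It assumes the runtime keeps scaling linearly with the number of records, which may not hold at that size:

```sql
-- 2 seconds per 1,003 records, scaled to 1,000,000 records, in minutes.
SELECT CAST(1000000 / 1003.0 * 2 / 60 AS DECIMAL(10, 1)) AS EstimatedMinutes;
```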