2011-11-07 77 views
2

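Populating a distinct list from duplicate data in SQL Server

I need to collect a distinct list of employees from an XML file that contains a sales log for each employee. Unfortunately, the data in the XML file is not entirely "consistent". The file is structured like this: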

<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345" 
     CustomerName="Bob" SaleNumber="..." /> 
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345" 
     CustomerName="Pat" SaleNumber="..." /> 
<Sale EmployeeId="67890" EmployeeName=""  EmployeeManagerId="12345" 
     CustomerName="Sally" SaleNumber="..." /> 
<Sale EmployeeId="67890" EmployeeName=""  EmployeeManagerId="12345" 
     CustomerName="Sue" SaleNumber="..." /> 
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId=""  
     CustomerName="Jack" SaleNumber="..." /> 
<Sale EmployeeId="58203" EmployeeName="Fred" EmployeeManagerId=""  
     CustomerName="Bill" SaleNumber="..." /> 

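The XML file is uploaded to a web application, which passes its contents (as XML) to a stored procedure in SQL Server for processing. Because of the size of the file (up to 30,000 elements), I want to do as little processing as possible in the web application.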
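
For readers who want a starting point, here is a minimal sketch of a stored procedure that accepts the uploaded XML and shreds it into rows once, so that later steps work on a plain temp table. The procedure name `dbo.ImportSaleLog`, the parameter `@sales`, and the temp table `#Sale` are made up for illustration and do not come from the original post.

```sql
-- Hypothetical procedure and parameter names; not from the original post.
create procedure dbo.ImportSaleLog
    @sales xml
as
begin
    set nocount on;

    -- Shred the XML once into a temp table.
    -- An empty attribute read as int is expected to come back as 0, not NULL.
    select T.N.value('@EmployeeId', 'int')             as EmployeeId,
           T.N.value('@EmployeeName', 'nvarchar(100)') as EmployeeName,
           T.N.value('@EmployeeManagerId', 'int')      as EmployeeManagerId
    into #Sale
    from @sales.nodes('/Sale') as T(N);

    -- Placeholder for the real work: report how many elements were read.
    select count(*) as SaleCount from #Sale;
end
```

The web application would then only need to pass the file contents as the `@sales` argument, leaving all of the de-duplication to set-based queries inside the procedure.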

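The best solution I have come up with so far is to create a temporary table with one row for each distinct EmployeeId and ManagerId value. Then, for each row in that table, I iterate over the XML elements with a matching EmployeeId until I find an entry whose name is not null (and then repeat the process for ManagerId).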

So for each unique employee ID, I make two passes over the results to see whether I can find the employee's name and manager ID.

Once the file has been processed, I would like the Employee table to look like this:

+---------+------+-----------+
| Id (PK) | Name | ManagerId |
+---------+------+-----------+
| 12345   | NULL | NULL      |
| 67890   | John | 12345     |
| 58203   | Fred | NULL      |
+---------+------+-----------+

Is there a more efficient (and less procedural) way to do this?

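What is your end goal? To push this data into a SQL database? What language are you using to process the XML? – arb

In the end, every EmployeeId and ManagerId in the file should have an entry in the Employee table of our SQL database. If possible, I would also like to fill in as much information about each employee (their name or ManagerId) as I can, but I cannot rely on that information being provided. –

I have updated the question to specify that the XML processing is done in a stored procedure in SQL Server, so the language used is T-SQL. –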

Answers

2
declare @xml xml = ' 
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345" 
     CustomerName="Bob" SaleNumber="..." /> 
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345" 
     CustomerName="Pat" SaleNumber="..." /> 
<Sale EmployeeId="67890" EmployeeName=""  EmployeeManagerId="12345" 
     CustomerName="Sally" SaleNumber="..." /> 
<Sale EmployeeId="67890" EmployeeName=""  EmployeeManagerId="12345" 
     CustomerName="Sue" SaleNumber="..." /> 
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId=""  
     CustomerName="Jack" SaleNumber="..." /> 
<Sale EmployeeId="58203" EmployeeName="Fred" EmployeeManagerId=""  
     CustomerName="Bill" SaleNumber="..." />' 

-- E1: one row per Sale element. An empty attribute read as int comes back as 0.
;with E1 as
(
    select T.N.value('@EmployeeId', 'int') as Id,
           T.N.value('@EmployeeName', 'nvarchar(100)') as Name,
           T.N.value('@EmployeeManagerId', 'int') as ManagerID
    from @xml.nodes('/Sale') as T(N)
),
-- E2: group on Id to get only one row per employee. max() prefers a
-- non-empty name and a non-zero manager id; nullif() turns a leftover 0 into NULL.
E2 as
(
    select Id, max(Name) as Name, nullif(max(ManagerID), 0) as ManagerID
    from E1
    group by Id
),
-- M: all distinct manager ids (0 means the attribute was empty)
M as
(
    select distinct T.N.value('@EmployeeManagerId', 'int') as Id
    from @xml.nodes('/Sale') as T(N)
    where T.N.value('@EmployeeManagerId', 'int') <> 0
)
-- All unique employees
select Id, Name, ManagerID
from E2
union
-- Add managers, looking up name and manager id in E2 where available.
-- union (not union all) avoids a duplicate row when a manager also has sales.
select M.Id, E2.Name, E2.ManagerID
from M
    left outer join E2
    on M.Id = E2.Id
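
The query above depends on an empty attribute coming back as 0 when it is read as `int`, which is why it compares against 0 and wraps the result in `nullif`. A small check like the following, using a one-element sample made up for illustration, lets you confirm that on your own server before relying on it.

```sql
-- Illustrative one-element sample; only the empty manager id matters here.
declare @x xml = '<Sale EmployeeId="67890" EmployeeManagerId="" />';

-- RawManagerId shows what value() returns for the empty attribute;
-- ManagerId shows the same value after the nullif(..., 0) guard.
select @x.value('(/Sale/@EmployeeManagerId)[1]', 'int')            as RawManagerId,
       nullif(@x.value('(/Sale/@EmployeeManagerId)[1]', 'int'), 0) as ManagerId;
```

If `RawManagerId` is not 0 on your server, the `<> 0` filter and the `nullif` call would need a different guard. Note also that this approach treats a genuine id of 0 as missing.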
3

This gets the result, but it may need some cleanup if your real data differs from the sample.

DECLARE @T TABLE (x XML)
INSERT INTO @T (x)
VALUES ('<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345"  CustomerName="Bob" SaleNumber="..." />'),
       ('<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345"  CustomerName="Pat" SaleNumber="..." />'),
       ('<Sale EmployeeId="67890" EmployeeName=""  EmployeeManagerId="12345"  CustomerName="Sally" SaleNumber="..." />'),
       ('<Sale EmployeeId="67890" EmployeeName=""  EmployeeManagerId="12345"  CustomerName="Sue" SaleNumber="..." />'),
       ('<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId=""    CustomerName="Jack" SaleNumber="..." />'),
       ('<Sale EmployeeId="58203" EmployeeName="Fred" EmployeeManagerId=""    CustomerName="Bill" SaleNumber="..." />')

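-- c: distinct (ID, NAME, ManagerID) combinations from rows that carry a name.
-- Employees who never appear with a name are dropped by the WHERE clause.
;WITH c
AS (
SELECT DISTINCT ID = x.value('(/Sale/@EmployeeId)[1]', 'int')
     , NAME = x.value('(/Sale/@EmployeeName)[1]', 'nvarchar(100)')
     , ManagerID = x.value('(/Sale/@EmployeeManagerId)[1]', 'int')
FROM @T
WHERE x.value('(/Sale/@EmployeeName)[1]', 'nvarchar(100)') <> ''
)
-- One row per employee; an empty manager id (read as 0) becomes NULL.
SELECT ID, NAME, ManagerID = MIN(NULLIF(ManagerID, 0))
FROM c
GROUP BY ID, NAME
UNION
-- Managers that do not appear as employees get a row with no details.
SELECT ManagerID, NULL, NULL
FROM c
WHERE ManagerID NOT IN (SELECT ID FROM c)
    AND ManagerID <> 0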
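
Since the stated goal is for every id in the file to end up in the Employee table, here is a rough sketch of how the derived rows might be loaded. It uses table variables so it can be run on its own: `@Employee` stands in for the real Employee table (only its three columns are known from the question) and `@Result` stands in for the output of one of the queries above. Both names, and the pre-existing row for 67890, are assumptions for illustration.

```sql
-- Hypothetical stand-in for the real Employee table.
declare @Employee table (Id int primary key, Name nvarchar(100) null, ManagerId int null);
insert into @Employee (Id, Name, ManagerId)
values (67890, null, null);   -- assumed: already known, details missing

-- Stand-in for the rows produced by one of the queries above.
declare @Result table (Id int primary key, Name nvarchar(100) null, ManagerId int null);
insert into @Result (Id, Name, ManagerId)
values (12345, null, null), (67890, N'John', 12345), (58203, N'Fred', null);

-- Fill in details that are still missing on existing rows.
update e
set    Name      = coalesce(e.Name, r.Name),
       ManagerId = coalesce(e.ManagerId, r.ManagerId)
from   @Employee as e
       join @Result as r on r.Id = e.Id;

-- Add ids that have no row yet.
insert into @Employee (Id, Name, ManagerId)
select r.Id, r.Name, r.ManagerId
from   @Result as r
where  not exists (select 1 from @Employee as e where e.Id = r.Id);

select Id, Name, ManagerId from @Employee order by Id;
```

With the sample rows this leaves the three employees from the question's expected table. Against a real table with a foreign key on ManagerId, the order of the insert and the update may matter, so treat this as a sketch rather than a finished import.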