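SQL query to split a comma-separated column into many-to-many relationships

I was given a 3 GB CSV file that I need to import into SQL Server 2012.

I now have 5 million rows of data in a staging table that looks like this (simplified).

Staging table: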

+-------------------+------------+---------------+------------+
| Name              | Thumbnail  | Tags          | Categories |
+-------------------+------------+---------------+------------+
| History           | thumb1.jpg | history,essay | history    |
| Nutricion Lecture | thumb2.jpg | food,essay    | health     |
+-------------------+------------+---------------+------------+

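The problem is with the Tags and Categories columns of my staging table.

How can I transfer the information from my staging table to my actual tables, creating a unique record for each tag and category, and also create the required many-to-many relationships?

Each tag needs to be checked against the existing tags, either to create a new record or to fetch the Id of the existing tag.

Programs: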

+----+-----------+------------+
| id | Program   | Thumbnail  |
+----+-----------+------------+
|  1 | History   | thumb1.jpg |
|  2 | Nutricion | thumb2.jpg |
+----+-----------+------------+

Tags:

+----+---------+
| Id | Tag     |
+----+---------+
|  1 | history |
|  2 | essay   |
|  3 | food    |
+----+---------+

(The Categories table is omitted because it looks the same as Tags.)

The many-to-many relationships:

Programs_Tags:

+---------+-----+
| program | tag |
+---------+-----+
|       1 |   1 |
|       1 |   2 |
|       2 |   2 |
+---------+-----+

Programs_Categories:

+---------+----------+
| program | category |
+---------+----------+
|       1 |        1 |
|       2 |        2 |
+---------+----------+

I assume that doing this in pure SQL is faster than writing a tool for it.

Source

2014-04-20 Fred Fickleberry III

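I'm not sure this is faster in SQL. However, here is one approach.

First, create the five tables you need for this:

Programs
Tags
Categories
ProgramTags
ProgramCategories

Give each the appropriate structure, including an identity id column.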
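As a rough sketch, the five tables could be declared as below. The column names, sizes, and constraints are assumptions for illustration (the question only shows id, name, and thumbnail columns), so adjust them to the real data.

```sql
-- Hypothetical schema: all column names and lengths are assumptions.
create table Programs (
    ProgramId int identity(1,1) primary key,
    Program   nvarchar(255) not null,
    Thumbnail nvarchar(255) null
);

create table Tags (
    TagId int identity(1,1) primary key,
    Tag   nvarchar(100) not null unique   -- one row per distinct tag
);

create table Categories (
    CategoryId int identity(1,1) primary key,
    Category   nvarchar(100) not null unique
);

-- Junction tables: the composite primary key prevents duplicate links.
create table ProgramTags (
    ProgramId int not null references Programs (ProgramId),
    TagId     int not null references Tags (TagId),
    primary key (ProgramId, TagId)
);

create table ProgramCategories (
    ProgramId  int not null references Programs (ProgramId),
    CategoryId int not null references Categories (CategoryId),
    primary key (ProgramId, CategoryId)
);
```

The unique constraints on Tag and Category give SQL Server an index to use when checking whether a tag already exists.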

Then load the data into Programs. This is easy, just an appropriate select.
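A minimal sketch of that load, assuming the staging columns are called Name and Thumbnail as in the question and that Programs has matching Program and Thumbnail columns (both assumptions):

```sql
-- Copy one row per staging row; ProgramId is generated by the identity column.
insert into Programs (Program, Thumbnail)
select s.Name, s.Thumbnail
from staging s;
```

If the staging table can contain the same program more than once, a `select distinct` here would avoid duplicate Programs rows.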

Then create the Tags and Categories tables. Here is how you would load the Tags table:

with cte as (
      -- anchor: peel the first tag off each staging row
      select (case when tags like '%,%'
                   then left(tags, charindex(',', tags) - 1)
                   else tags
              end) as tag,
             (case when tags like '%,%'
                   then substring(tags, charindex(',', tags) + 1, len(tags))
              end) as resttags
      from staging
      where tags is not null and tags <> ''
      union all
      -- recursive step: peel the next tag off the remainder
      select (case when resttags like '%,%'
                   then left(resttags, charindex(',', resttags) - 1)
                   else resttags
              end) as tag,
             (case when resttags like '%,%'
                   then substring(resttags, charindex(',', resttags) + 1, len(resttags))
              end) as resttags
      from cte
      where resttags is not null and resttags <> ''
     )
select distinct tag
from cte;

(Obviously, this needs an insert.)
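One way that insert might look is sketched below. It assumes a Tags table with a Tag column and a staging table with a tags column; the casts to nvarchar(max) are an added precaution so the anchor and recursive parts of the CTE have identical types. The `not exists` check skips tags that are already stored, which covers the requirement of checking each tag against the existing ones.

```sql
-- Sketch only: Tags(Tag) and staging(tags) are assumed names.
with cte as (
    -- anchor: first tag of each row, plus the remainder of the list
    select cast(case when tags like '%,%'
                     then left(tags, charindex(',', tags) - 1)
                     else tags
                end as nvarchar(max)) as tag,
           cast(case when tags like '%,%'
                     then substring(tags, charindex(',', tags) + 1, len(tags))
                end as nvarchar(max)) as resttags
    from staging
    where tags is not null and tags <> ''
    union all
    -- recursive step: next tag from the remainder
    select cast(case when resttags like '%,%'
                     then left(resttags, charindex(',', resttags) - 1)
                     else resttags
                end as nvarchar(max)),
           cast(case when resttags like '%,%'
                     then substring(resttags, charindex(',', resttags) + 1, len(resttags))
                end as nvarchar(max))
    from cte
    where resttags is not null and resttags <> ''
)
insert into Tags (Tag)
select distinct c.tag
from cte c
where c.tag <> ''                                   -- skip empty entries such as 'a,,b'
  and not exists (select 1 from Tags t where t.Tag = c.tag)
option (maxrecursion 0);                            -- lift the default limit of 100 recursion levels
```

Without the `maxrecursion` hint, a row holding more tags than the default limit of 100 recursion levels allows would abort the statement.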

Do the same for Categories.

Then load ProgramTags with:

select p.ProgramId, t.TagId 
from staging s join 
    Programs p 
    on s.<whatever> = p.<whatever> join 
    Tags t 
    on ','+s.tags+',' like '%,'+t.tag+',%';

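The first join gets the program id. The second gets the matching tags. Performance won't be great, but it may be good enough for what you need to do.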
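Wrapped in an insert, the query might look like the sketch below. The join on the program name is an assumption standing in for the unspecified key; any column that uniquely identifies a staging row in Programs would do. The column names ProgramId, TagId, Program, and Name are likewise assumed.

```sql
-- Sketch only: the join key (program name) and all column names are assumptions.
insert into ProgramTags (ProgramId, TagId)
select distinct p.ProgramId, t.TagId
from staging s
join Programs p
    on p.Program = s.Name                              -- assumed natural key
join Tags t
    on ',' + s.Tags + ',' like '%,' + t.Tag + ',%'     -- tag appears as a whole list element
where not exists (select 1
                  from ProgramTags pt
                  where pt.ProgramId = p.ProgramId
                    and pt.TagId = t.TagId);           -- skip links that already exist
```

Note that a tag containing LIKE wildcard characters (`%`, `_`, `[`) would be treated as a pattern here, so this sketch assumes tags are plain words.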

Source

2014-04-20 13:47:42

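Doesn't run ---> Msg 8116, Level 16, State 1, Line 1 Argument data type int is invalid for argument 1 of substring function. Msg 207, Level 16, State 1, Line 12 Invalid column name 'tags'. Msg 207, Level 16, State 1, Line 15 Invalid column name 'tags'. Msg 207, Level 16, State 1, Line 16 Invalid column name 'testtags'. Msg 207, Level 16, State 1, Line 16 Invalid column name 'testtags'.

@FrankieYale . . . I don't know what I was thinking, swapping the arguments for the first parameter of 'substr()'.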
