2016-06-21 48 views
0

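Proper collation in a format file for BULK INSERT

I'm trying to import a .CSV file into a SQL Server table using BULK INSERT with a format file. I can get it to import, but any Latin characters come in as odd characters. I'm very proud of having done this personal project on my own, but I've reached the point where I simply need help. I could run some messy UPDATE and REPLACE statements after the import to fix the characters, but I would really like to import the Latin characters as they appear in the .csv file in a single step. Here are the database and table I created: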

CREATE DATABASE Test; 
GO 

USE Test; 
GO 

CREATE TABLE dbo.rawData 
    ([Position] nvarchar(500) NULL, 
    [Const] nvarchar(500) NULL, 
    [Created] nvarchar(500) NULL, 
    [Modified] nvarchar(500) NULL, 
    [Description] nvarchar(500) NULL, 
    [Title] nvarchar(500) NOT NULL, 
    [TitleType] nvarchar(500) NULL, 
    [Directors] nvarchar(500) NULL, 
    [YouRated] nvarchar(500) NULL, 
    [IMDbRating] nvarchar(500) NULL, 
    [Runtime] nvarchar(500) NULL, 
    [Year] nvarchar(500) NULL, 
    [Genres] nvarchar(500) NULL, 
    [NumVotes] nvarchar(500) NULL, 
    [ReleaseDate] nvarchar(500) NULL, 
    [URL] nvarchar(500) NULL 
    ) 
GO 

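Here is some working data I took from the .CSV file (saved as ratings.csv). I'm using Notepad++, and the file is encoded as UTF-8. Note that the last row, "Dallas Buyers Club", has a director with a Latin character in his name: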

"position","const","created","modified","description","Title","Title type","Directors","You rated","IMDb Rating","Runtime (mins)","Year","Genres","Num. Votes","Release Date (month/day/year)","URL" 
"1","tt0437863","Tue Feb 16 00:00:00 2016","","","The Benchwarmers","Feature Film","Dennis Dugan","5","5.6","80","2006","comedy, romance, sport","39413","2006-04-07","http://www.imdb.com/title/tt0437863/" 
"2","tt0085334","Tue Feb 16 00:00:00 2016","","","A Christmas Story","Feature Film","Bob Clark","6","8.1","94","1983","comedy, family","103770","1983-11-18","http://www.imdb.com/title/tt0085334/" 
"3","tt2403029","Tue Feb 16 00:00:00 2016","","","The Starving Games","Feature Film","Jason Friedberg, Aaron Seltzer","2","3.3","83","2013","comedy","13719","2013-10-31","http://www.imdb.com/title/tt2403029/" 
"4","tt0316465","Tue Feb 16 00:00:00 2016","","","Radio","Feature Film","Michael Tollin","6","6.9","109","2003","biography, drama, sport","31692","2003-10-24","http://www.imdb.com/title/tt0316465/" 
"5","tt0141369","Tue Feb 16 00:00:00 2016","","","Inspector Gadget","Feature Film","David Kellogg","4","4.1","78","1999","action, adventure, comedy, family, sci_fi","35340","1999-07-18","http://www.imdb.com/title/tt0141369/" 
"6","tt0033563","Tue Feb 16 00:00:00 2016","","","Dumbo","Feature Film","Sam Armstrong, Norman Ferguson","6","7.3","64","1941","animation, family, musical","80737","1941-10-23","http://www.imdb.com/title/tt0033563/" 
"7","tt0384642","Tue Feb 16 00:00:00 2016","","","Kicking & Screaming","Feature Film","Jesse Dylan","5","5.5","95","2005","comedy, family, romance, sport","29539","2005-05-01","http://www.imdb.com/title/tt0384642/" 
"8","tt0116705","Tue Feb 16 00:00:00 2016","","","Jingle All the Way","Feature Film","Brian Levant","7","5.4","89","1996","comedy, family","66879","1996-11-16","http://www.imdb.com/title/tt0116705/" 
"9","tt1981677","Tue Feb 16 00:00:00 2016","","","Pitch Perfect","Feature Film","Jason Moore","7","7.2","112","2012","comedy, music, romance","203205","2012-09-28","http://www.imdb.com/title/tt1981677/" 
"10","tt0409459","Tue Feb 16 00:00:00 2016","","","Watchmen","Feature Film","Zack Snyder","7","7.6","162","2009","action, mystery, sci_fi","368137","2009-02-23","http://www.imdb.com/title/tt0409459/" 
"11","tt1343092","Tue Feb 16 00:00:00 2016","","","The Great Gatsby","Feature Film","Baz Luhrmann","5","7.3","143","2013","drama, romance","345664","2013-05-01","http://www.imdb.com/title/tt1343092/" 
"12","tt0332379","Tue Feb 16 00:00:00 2016","","","School of Rock","Feature Film","Richard Linklater","5","7.1","108","2003","comedy, music","202083","2003-09-09","http://www.imdb.com/title/tt0332379/" 
"13","tt0120783","Tue Feb 16 00:00:00 2016","","","The Parent Trap","Feature Film","Nancy Meyers","6","6.4","128","1998","adventure, comedy, drama, family, romance","82087","1998-07-20","http://www.imdb.com/title/tt0120783/" 
"14","tt0790636","Tue Feb 16 00:00:00 2016","","","Dallas Buyers Club","Feature Film","Jean-Marc Vallée","7","8.0","117","2013","biography, drama","308118","2013-09-07","http://www.imdb.com/title/tt0790636/" 

I have a format file (saved as format.fmt) that looks like this when opened in Notepad++:

11.0 
16 
1  SQLCHAR    0  1000 "\",\"" 1  Position     SQL_Latin1_General_CP1_CI_AS 
2  SQLCHAR    0  1000 "\",\"" 2  Const      SQL_Latin1_General_CP1_CI_AS 
3  SQLCHAR    0  1000 "\",\"" 3  Created     SQL_Latin1_General_CP1_CI_AS 
4  SQLCHAR    0  1000 "\",\"" 4  Modified     SQL_Latin1_General_CP1_CI_AS 
5  SQLCHAR    0  1000 "\",\"" 5  Description    SQL_Latin1_General_CP1_CI_AS 
6  SQLCHAR    0  1000 "\",\"" 6  Title      SQL_Latin1_General_CP1_CI_AS 
7  SQLCHAR    0  1000 "\",\"" 7  TitleType     SQL_Latin1_General_CP1_CI_AS 
8  SQLCHAR    0  1000 "\",\"" 8  Directors     SQL_Latin1_General_CP1_CI_AS 
9  SQLCHAR    0  1000 "\",\"" 9  YouRated     SQL_Latin1_General_CP1_CI_AS 
10  SQLCHAR    0  1000 "\",\"" 10 IMDbRating     SQL_Latin1_General_CP1_CI_AS 
11  SQLCHAR    0  1000 "\",\"" 11 Runtime     SQL_Latin1_General_CP1_CI_AS 
12  SQLCHAR    0  1000 "\",\"" 12 Year      SQL_Latin1_General_CP1_CI_AS 
13  SQLCHAR    0  1000 "\",\"" 13 Genres      SQL_Latin1_General_CP1_CI_AS 
14  SQLCHAR    0  1000 "\",\"" 14 NumVotes     SQL_Latin1_General_CP1_CI_AS 
15  SQLCHAR    0  1000 "\",\"" 15 ReleaseDate    SQL_Latin1_General_CP1_CI_AS 
16  SQLCHAR    0  1000 "\""  16 URL      SQL_Latin1_General_CP1_CI_AS 

When I run the code below, everything imports, but the Latin characters are replaced by a series of odd characters. Here is the code I run:

BULK INSERT [Test].[dbo].[rawData] 
FROM 'C:\IMDbRatings\Files\ratings.csv' WITH (FIRSTROW = 2, FORMATFILE= 'C:\IMDbRatings\format.fmt'); 

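A few things I've tried: changing the .csv file to UCS-2, adding different options to the WITH clause of the BULK INSERT, and changing the data types in the format file to SQLNCHAR instead of SQLCHAR. None of them worked. What usually happens in those cases is "0 rows affected" rather than an error. Any help would be greatly appreciated.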
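The "odd characters" described here are consistent with UTF-8 bytes being read through a single-byte code page. The following is a minimal Python sketch of that effect, run outside SQL Server. It assumes the bytes are interpreted as Windows-1252 (code page 1252, the one behind the SQL_Latin1_General_CP1_CI_AS collation named in the format file); the helper name `misread_utf8` is made up for illustration.

```python
def misread_utf8(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with a different code page."""
    return text.encode("utf-8").decode(wrong_encoding)


director = "Jean-Marc Vallée"

# "é" is two bytes in UTF-8 (0xC3 0xA9); a single-byte code page
# turns each of those bytes into its own character.
garbled = misread_utf8(director)
print(garbled)  # Jean-Marc VallÃ©e
```

If the imported value looks like this, the bytes most likely arrived intact and were only decoded with the wrong code page.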

+1

What is the collation of your database, your table, and the column holding the name "Jean-Marc Vallée"? – Whencesoever

+0

All three are using SQL_Latin1_General_CP1_CI_AS – Walker

+1

Didn't you post this exact same question a few days ago? Does the import method have to be BULK INSERT? Do you have SSIS? Or? – Matt

Answers

1

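@Walker I admit I have never used BULK INSERT, but I tried to set up your test case and kept getting errors that the format file I had saved was incomplete or could not be read. Anyway, try changing the encoding to 1252 in Notepad++: Encoding -> Character sets -> Western European -> Windows-1252. Save the file and try the import.

I also just came across the post "How to write UTF-8 characters using bulk insert in SQL Server?", which interestingly suggests that UTF-8 is the problem up until SQL Server 2016. One answer that caught my eye was about SQLNCHAR vs SQLCHAR: since I assume you are storing Unicode data, that would mean you need to change the data types both in your format file and in the table you have created.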
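The Windows-1252 re-save suggested above can also be scripted instead of done by hand in Notepad++. This is a rough Python sketch under stated assumptions: the `reencode` helper and the ratings_1252.csv file name are hypothetical, and a temporary one-row file stands in for the real ratings.csv.

```python
import tempfile
from pathlib import Path


def reencode(src: Path, dst: Path, src_encoding: str = "utf-8-sig",
             dst_encoding: str = "cp1252") -> None:
    """Rewrite src in another encoding; raises if a character cannot be represented."""
    # "utf-8-sig" also accepts files without a byte-order mark.
    # newline="" leaves the original line endings untouched.
    with open(src, "r", encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding=dst_encoding, newline="") as fout:
        fout.write(text)


# Demonstration on a temporary one-row file standing in for ratings.csv.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "ratings.csv"
    dst = Path(tmp) / "ratings_1252.csv"
    src.write_bytes('"Dallas Buyers Club","Jean-Marc Vallée"\r\n'.encode("utf-8"))
    reencode(src, dst)
    converted = dst.read_bytes()

print(converted)  # "é" is now the single byte 0xE9
```

Because the default strict error handling is kept, a character that Windows-1252 cannot represent stops the conversion with an error instead of being silently replaced.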

+0

For the format file or ratings.csv? I really appreciate your help here. – Walker

+1

ratings.csv. I don't think the format file will make a difference. – Matt

+1

I also added a note that I think you need to change the data types to nchar so that you can handle Unicode characters, because char will not store them. – Matt

1

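I'm answering this old question in the hope that it saves someone else the headache I recently went through.

In short: when inserting from a UTF-8 encoded file using code page 65001, you should use the "" collation in the format file. You need SQL Server 2016 for code page 65001 to be available.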


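Do the following:

  1. In your BULK INSERT statement, specify that the file is encoded in UTF-8 with CODEPAGE = 65001
  2. In your format file, specify the character column types as SQLCHAR
  3. In your format file, use the "" collation for all columns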

BULK INSERT statement:

BULK INSERT [Test].[dbo].[rawData] 
FROM 'C:\IMDbRatings\Files\ratings.csv' 
WITH (CODEPAGE = 65001, FIRSTROW = 2, FORMATFILE= 'C:\IMDbRatings\format.fmt'); 
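To see why naming the code page matters, the short Python sketch below decodes the same UTF-8 bytes in two ways. The codec names are stand-ins chosen for illustration: `utf-8` for code page 65001, and `cp437` as an assumed example of a client OEM code page, which the documentation excerpt at the end of this answer gives as the fallback when no code page is specified.

```python
# UTF-8 bytes for the director's name, as they would sit in ratings.csv.
utf8_bytes = "Jean-Marc Vallée".encode("utf-8")

# Python codec names standing in for Windows code pages:
# cp437 ~ a typical US OEM code page (an assumption), utf-8 ~ code page 65001.
decoded = {codec: utf8_bytes.decode(codec) for codec in ("cp437", "utf-8")}

for codec, text in decoded.items():
    print(f"{codec:>6}: {text}")
```

Only the decoding that matches the file's real encoding returns the original name; the other produces two unrelated characters in place of "é".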

Format file:

13.0 
16 
1  SQLCHAR    0  1000 "\",\"" 1  Position     "" 
2  SQLCHAR    0  1000 "\",\"" 2  Const      "" 
3  SQLCHAR    0  1000 "\",\"" 3  Created     "" 
4  SQLCHAR    0  1000 "\",\"" 4  Modified     "" 
5  SQLCHAR    0  1000 "\",\"" 5  Description    "" 
6  SQLCHAR    0  1000 "\",\"" 6  Title      "" 
7  SQLCHAR    0  1000 "\",\"" 7  TitleType     "" 
8  SQLCHAR    0  1000 "\",\"" 8  Directors     "" 
9  SQLCHAR    0  1000 "\",\"" 9  YouRated     "" 
10  SQLCHAR    0  1000 "\",\"" 10 IMDbRating     "" 
11  SQLCHAR    0  1000 "\",\"" 11 Runtime     "" 
12  SQLCHAR    0  1000 "\",\"" 12 Year      "" 
13  SQLCHAR    0  1000 "\",\"" 13 Genres      "" 
14  SQLCHAR    0  1000 "\",\"" 14 NumVotes     "" 
15  SQLCHAR    0  1000 "\",\"" 15 ReleaseDate    "" 
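16  SQLCHAR    0  1000 "\""  16 URL      "" 

On the "" / RAW collation, from https://technet.microsoft.com/en-us/library/ms190657(v=sql.105).aspx:

Specifies that the data is stored in the code page specified in the code-page option of the command or in the bcp_control BCPFILECP hint. If neither is specified, the collation of the data file is that of the client computer's OEM code page.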
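The format file in this answer follows a regular pattern, so it can be generated instead of typed by hand. The Python sketch below emits the same tokens per row as the file shown above; column spacing differs, which is harmless as long as the fields stay whitespace-separated. `build_format_file` and its parameters are hypothetical names, and the column list is taken from the table in the question.

```python
COLUMNS = [
    "Position", "Const", "Created", "Modified", "Description", "Title",
    "TitleType", "Directors", "YouRated", "IMDbRating", "Runtime", "Year",
    "Genres", "NumVotes", "ReleaseDate", "URL",
]


def build_format_file(columns, version="13.0",
                      field_terminator='"\\",\\""',
                      last_terminator='"\\""',
                      collation='""'):
    """Return the text of a non-XML format file with one SQLCHAR row per column."""
    lines = [version, str(len(columns))]
    for i, name in enumerate(columns, start=1):
        # The last field uses its own terminator, as in the file above.
        terminator = last_terminator if i == len(columns) else field_terminator
        lines.append(
            f"{i:<4}SQLCHAR  0  1000  {terminator:<9}{i:<4}{name:<14}{collation}"
        )
    # End with a newline so the last row is a complete line.
    return "\n".join(lines) + "\n"


format_text = build_format_file(COLUMNS)
print(format_text)
```

Passing a different `collation` value, such as the SQL_Latin1_General_CP1_CI_AS used in the question, would reproduce the original format file for comparison.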