2016-06-11

MySQL load ignores some records

I have this CSV file with about 16,916 records. When I load it into MySQL, it only detects 15,945 records. This is what MySQL says:

Records: 15945 Deleted: 0 Skipped: 0 Warnings: 0 

Can anyone tell me why MySQL ignores some records, and how I can fix this?

I load the file with the LOAD DATA statement like this:

LOAD DATA LOCAL INFILE 'germany-filtered.csv' 
INTO TABLE point_of_interest 
FIELDS TERMINATED BY ',' 
    ENCLOSED BY '"' 
LINES TERMINATED BY '\n' 
IGNORE 1 LINES 
(osm_id,lat,lng,access,addr_housename,addr_housenumber,addr_interpolation,admin_level,aerialway,aeroway,amenity,area,barrier,bicycle,brand,bridge,boundary,building,capital,construction,covered,culvert,cutting,denomination,disused,ele,embankment,foot,generator_source,harbour,highway,historic,horse,intermittent,junction,landuse,layer,leisure,ship_lock,man_made,military,motorcar,name,osm_natural,office,oneway,operator,place,poi,population,power,power_source,public_transport,railway,ref,religion,route,service,shop,sport,surface,toll,tourism,tower_type,tunnel,water,waterway,wetland,width,wood); 

This is the database schema I use:

CREATE TABLE point_of_interest (
    `poi_id` int(10) unsigned NOT NULL auto_increment, 
    `lat` DECIMAL(10, 8) default NULL, 
    `lng` DECIMAL(11, 8) default NULL, 
    PRIMARY KEY (`poi_id`), 
    KEY `lat` (`lat`), 
    KEY `lng` (`lng`), 
    osm_id BIGINT, 
    access TEXT, 
    addr_housename TEXT, 
    addr_housenumber TEXT, 
    addr_interpolation TEXT, 
    admin_level TEXT, 
    aerialway TEXT, 
    aeroway TEXT, 
    amenity TEXT, 
    area TEXT, 
    barrier TEXT, 
    bicycle TEXT, 
    brand TEXT, 
    bridge TEXT, 
    boundary TEXT, 
    building TEXT, 
    capital TEXT, 
    construction TEXT, 
    covered TEXT, 
    culvert TEXT, 
    cutting TEXT, 
    denomination TEXT, 
    disused TEXT, 
    ele TEXT, 
    embankment TEXT, 
    foot TEXT, 
    generator_source TEXT, 
    harbour TEXT, 
    highway TEXT, 
    historic TEXT, 
    horse TEXT, 
    intermittent TEXT, 
    junction TEXT, 
    landuse TEXT, 
    layer TEXT, 
    leisure TEXT, 
    ship_lock TEXT, 
    man_made TEXT, 
    military TEXT, 
    motorcar TEXT, 
    name TEXT, 
    osm_natural TEXT, 
    office TEXT, 
    oneway TEXT, 
    operator TEXT, 
    place TEXT, 
    poi TEXT, 
    population TEXT, 
    power TEXT, 
    power_source TEXT, 
    public_transport TEXT, 
    railway TEXT, 
    ref TEXT, 
    religion TEXT, 
    route TEXT, 
    service TEXT, 
    shop TEXT, 
    sport TEXT, 
    surface TEXT, 
    toll TEXT, 
    tourism TEXT, 
    tower_type TEXT, 
    tunnel TEXT, 
    water TEXT, 
    waterway TEXT, 
    wetland TEXT, 
    width TEXT, 
    wood TEXT 
) ENGINE=InnoDB; 

Update:

I have checked the first and the last record, and both exist. Records like this one, with many empty values, also exist:

1503898236,10.5271308,52.7468051,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 

Update 2:

These are records that I found to be missing from the database:

4228380062,9.9386752,53.6135468,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Dammwild,,,,,,,,,,,,,,,,,,,,attraction,,,,,,, 
4228278589,9.9391503,53.5960304,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Kaninchen,,,,,,,,,,,,,,,,,,,,attraction,,,,,,, 
4228278483,9.9396935,53.5960729,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Onager,,,,,,,,,,,,,,,,,,,,attraction,,,,,,, 
4226772791,8.8394263,54.1354887,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Familienlagune Perlebucht,,,,,,,,,,,,,,,,,,,,attraction,,,,,,, 

It seems that almost all records whose osm_id starts with 4 are missing. Very strange.

+0

Probably not what you want to hear, but it would be very interesting to know which rows are being ignored. – fvu

+0

I would like to know that too. I have checked the first and the last record, and both exist. I don't want to check every single record. – Peter

+0

I updated the question with some of the missing records. Maybe that helps to find the cause. – Peter

Answers

0

I did not find the reason why MySQL ignores some records, so I looked for a workaround. Two solutions worked for me:

Split the CSV file into multiple parts

split -l 10 file.csv 

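I figured out that if I split the CSV into multiple parts and load them into MySQL, it recognizes every record. However, this only works for me if the files are very small (about 10 records per file), so this solution is not feasible for me.

Convert the CSV into MySQL INSERT statements

This part of a bash script converts the CSV file into an SQL file containing INSERT INTO statements: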
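One thing to watch for when combining the split approach with the LOAD DATA statement from the question: only the first chunk contains the header row, so IGNORE 1 LINES would presumably drop a real record from every other chunk. Below is a minimal sketch of one way around that, which strips the header before splitting. The sample data, the file names, the chunk_ prefix, and the chunk size of 2 (instead of 10) are all assumptions made for illustration.

```shell
# Hypothetical sample: one header row plus five records.
work=$(mktemp -d)
printf 'id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n' > "$work/file.csv"

# Drop the header row so that no chunk needs IGNORE 1 LINES.
tail -n +2 "$work/file.csv" > "$work/body.csv"

# Split the remaining records into chunks of 2 lines each;
# split names the output files chunk_aa, chunk_ab, ...
split -l 2 "$work/body.csv" "$work/chunk_"

# Sanity check: the chunks together must contain every record.
chunks=$(ls "$work"/chunk_* | wc -l | tr -d ' ')
total=$(cat "$work"/chunk_* | wc -l | tr -d ' ')
echo "$chunks chunks, $total records"

rm -rf "$work"
```

With chunks prepared this way, each one would be loaded with the same LOAD DATA statement minus the IGNORE 1 LINES clause.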

cp file.csv inserts.sql 
# replace empty CSV value with NULL 
sed -r 's;^,|,$;NULL,;g 
:l 
s;,,;,NULL,;g 
t l' -i inserts.sql 

# replace " with ' 
sed -e ':a' -e 'N' -e '$!ba' -e 's/\"/\x27/g' -i inserts.sql 

# enquote every value 
# (this also wraps the NULL placeholders inserted above, turning them into the string "NULL") 
sed 's/[^,][^,]*/"&"/g' -i inserts.sql 

# replace ,, with ,NULL,NULL, 
sed 's/,,/,NULL,NULL,/g' -i inserts.sql 

# replace ,, with , 
sed 's/,,/,/g' -i inserts.sql 

# add INSERT INTO table_name VALUES (NULL, before each line 
# Note: The first value is NULL because it is the primary key, which is set by my table 
sed 's/^/INSERT INTO table_name VALUES (NULL,/' -i inserts.sql 

# add ); at the end of each line 
sed 's/$/);/' -i inserts.sql 

# replace ,); with ); 
sed 's/,);/);/g' -i inserts.sql 

Note: I do not guarantee that this solution works for every CSV file, so please check the generated SQL file before using it.
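The cause was not identified in this thread. One possibility that may be worth ruling out, since the LOAD DATA statement uses ENCLOSED BY '"' and the workaround above rewrites double quotes, is a line with an unbalanced double quote, which could make the parser treat the following lines as part of one field. This is only an assumption, not a confirmed diagnosis. A minimal sketch that lists the line numbers containing an odd number of double quotes, using made-up sample data:

```shell
# Hypothetical sample: line 2 contains a single, unmatched double quote.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1,"Zoo",x
2,5" Pizza,y
3,plain,z
EOF

# gsub returns the number of replacements it made, which here equals the
# number of double quotes on the line; print the line number when it is odd.
odd=$(awk '{ n = gsub(/"/, "&") } n % 2 == 1 { print NR }' "$tmp")
echo "$odd"

rm -f "$tmp"
```

If this prints nothing for the real file, unbalanced quotes can probably be excluded as the cause.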

0

Try this to see whether you have duplicate IDs in the file:

Show the file

# cat mycsv.csv 
6991,10.4232704,49.4970160,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bauernhaus aus Seubersdorf,,,,,,,,,,,,,,,,,,,,attraction,,,,,,, 
4228380062,9.9386752,53.6135468,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Dammwild,,,,,,,,,,,,,,,,,,,,attraction,,,,,,, 
4228278589,9.9391503,53.5960304,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Kaninchen,,,,,,,,,,,,,,,,,,,,attraction,,,,,,, 
4228278483,9.9396935,53.5960729,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Onager,,,,,,,,,,,,,,,,,,,,attraction,,,,,,, 
4226772791,8.8394263,54.1354887,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Familienlagune Perlebucht,,,,,,,,,,,,,,,,,,,,attraction,,,,,,, 
4228278589,9.9391503,53.5960304,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Kaninchen,,,,,,,,,,,,,,,,,,,,attraction,,,,,,, 

Count the lines

# wc -l mycsv.csv 
6 mycsv.csv 

Remove duplicate IDs and count again

# cut -d',' -f1 mycsv.csv | sort | uniq | wc -l 
5 
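The count above only shows that duplicates exist, not which IDs they are. As a small sketch along the same lines, `uniq -d` prints each value that occurs more than once. The sample file here is made up for illustration; only its first column matters.

```shell
# Hypothetical sample: the ID 4228278589 appears twice.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
6991,10.42,49.49,a
4228380062,9.93,53.61,b
4228278589,9.93,53.59,c
4228278589,9.93,53.59,c
EOF

# Extract the first column, sort it, and print every ID that is repeated.
dupes=$(cut -d',' -f1 "$tmp" | sort | uniq -d)
echo "$dupes"

rm -f "$tmp"
```

An empty result means the first column contains no repeated values.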
+0

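Thanks for your answer. It looks like 'cut' did not find any duplicate rows. I get 16920 lines both before and after running this command. Have you tried your solution with the CSV file I linked in the question? – Peter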

+0

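@Peter - found the error :-). You have duplicate keys. The reason is that the id is too large for an INT field, so it was truncated and you get duplicates. Change **poi_id to BIGINT** and everything is fine –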

+1

'poi_id' is 'AUTO_INCREMENT' and is not loaded from the .CSV. – wchiquito