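Reading improperly formatted CSVs into pandas: unescaped quotes
I inherited a few hundred CSV files that I want to import into pandas DataFrames. They are formatted like this: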
username;date;retweets;favorites;text;geo;mentions;hashtags;id;permalink
;2011-03-02 11:04;0;0;"ICYMI: "What you have is 87 people who have common goals of working for [the] next generation; that’s why our...";;;;"42993734165594112";https://twitter.com/AustinScottGA08/status/42993734165594112
;2014-02-25 10:38;3;0;"Will be asking tough questions of #IRS at 2/26 FSGG hearing; supporting bills to make agency more accountable.";;;#IRS;"438352361812426752";https://twitter.com/AnderCrenshaw/status/438352361812426752
;2017-06-14 12:39;4;6;"Thank you to the brave men and women who have answered the call to defend our great nation. Happy 242nd Birthday @USArmy ! #ArmyBDay pic.twitter.com/brBYCOLBJZ";;@USArmy;#ArmyBDay;"875045042758369281";https://twitter.com/AustinScottGA08/status/875045042758369281
To read one of them into a pandas DataFrame, I tried:
tweets = pd.read_csv(file, header=0, sep=';', parse_dates = True)
and got this error:
ParserError: Error tokenizing data. C error: Expected 10 fields in line 1, saw 11
I think this is because there is an unescaped quote inside the text field:
ICYMI: "What you have is 87 people who have common goals of working for [the] next generation; that’s why our...
So I tried:
tweets = pd.read_csv(file, header=0, sep=';', parse_dates = True, quoting=csv.QUOTE_NONE)
and got a new error (I assume because there is a ; inside the field):
Will be asking tough questions of #IRS at 2/26 FSGG hearing; supporting bills to make agency more accountable. http:// tinyurl.com/n8ozeg5
ParserError: Error tokenizing data. C error: Expected 10 fields in line 2, saw 11
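I can't regenerate these CSV files. What I want to know is: how can I preprocess or repair them so that they are correctly formatted (i.e., with the quotes inside fields escaped)? Alternatively, is there a way to read them directly into a DataFrame despite the unescaped quotes?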
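Which versions of Python and pandas are you using? I get different results with Python 3.6.1 and pandas 0.19.2.
Python 3.5.3, pandas 0.20.2. What happens for you? – Libby
For this case I don't need every column, and adding 'usecols' solved my immediate problem. But it doesn't answer my actual question. Here is the line that works: 'tweets = pd.read_csv(file, header=0, sep=';', parse_dates=True, quoting=csv.QUOTE_NONE, usecols=["date", "hashtags", "permalink"])' – Libby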
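One possible way to do the preprocessing the question asks about is sketched below. It relies on assumptions that are not confirmed by the post: only the text column ever contains stray quotes or semicolons, every record sits on a single physical line, the four columns before text and the five after it never contain a semicolon, and the files are UTF-8. The helper names split_row, repair_csv and read_tweets are made up for illustration. The idea is to peel four fields off the left and five off the right, treat whatever remains as the text, and let the csv module re-emit the row with proper escaping.

```python
import csv
import io

N_LEADING = 4   # username, date, retweets, favorites
N_TRAILING = 5  # geo, mentions, hashtags, id, permalink


def split_row(line):
    """Split one raw line into 10 fields.

    Assumption: only the text column (the 5th) may contain ';' or '"'.
    """
    line = line.rstrip("\r\n")
    head = line.split(";", N_LEADING)            # 4 leading fields + the rest
    if len(head) != N_LEADING + 1:
        raise ValueError("too few fields: %r" % line)
    tail = head[-1].rsplit(";", N_TRAILING)      # text + 5 trailing fields
    if len(tail) != N_TRAILING + 1:
        raise ValueError("too few fields: %r" % line)
    text = tail[0]
    # Drop only the outer pair of quotes; inner quotes stay as data.
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        text = text[1:-1]
    trailing = [field.strip('"') for field in tail[1:]]
    return head[:N_LEADING] + [text] + trailing


def repair_csv(raw_lines):
    """Return the rows as one well-formed, semicolon-delimited CSV string."""
    out = io.StringIO()
    # csv.writer doubles embedded quotes by default, which read_csv understands.
    writer = csv.writer(out, delimiter=";", quoting=csv.QUOTE_MINIMAL,
                        lineterminator="\n")
    for line in raw_lines:
        if line.strip():                         # skip blank lines
            writer.writerow(split_row(line))
    return out.getvalue()


def read_tweets(path):
    """Repair one file in memory and load it into a DataFrame."""
    # Imported here so the repair helpers above work without pandas installed.
    import pandas as pd

    with open(path, encoding="utf-8", newline="") as fh:  # encoding is assumed
        fixed = repair_csv(fh)
    # Keep id as a string so the long tweet ids are not converted to numbers.
    return pd.read_csv(io.StringIO(fixed), sep=";", dtype={"id": str},
                       parse_dates=["date"])
```

With this in place, tweets = read_tweets(file) would replace the original read_csv call, and the output of repair_csv could be written to disk instead if repaired files are wanted. A row that breaks the assumptions (for example a tweet with an embedded newline) raises ValueError rather than being silently misparsed.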