2014-12-18 339 views

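python csv module: read a CSV split on commas, but ignore commas inside double or single quotes

I have a .csv file whose column values contain commas. Here is an example: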

Header: ID  Value   Content           Date 
     1  34    "market, business"        12/20/2013 
     2  15    "market, business", yesterday, metric   11/21/2014 
     3  18    "market," business and yesterday     10/20/2014 
     4  19    yesterday, today,        11/22/2014 

That is the table view. If I open the .csv file in Sublime Text, it looks like this:

1, 34, "market, business", 12/20/2013 
2, 15, "market, business", "yesterday, metric, 11/21/2014 
3, 18, "market," business and yesterday, 10/20/2014 
4, 19, yesterday, today, 11/22/2014 

But what I want after Python's csv reader has processed it is:

[1, 34, "market, business", 12/20/2013] 
[2, 15, "market, business" "yesterday metric, 11/21/2014] 
[3, 18, "market," business and yesterday, 10/20/2014] 
[4, 19, yesterday today, 11/22/2014] 

This is just sample data. The "Content" column is the headache here, because the csv module uses "," as the delimiter. I used

reader = csv.reader(f, skipinitialspace=True) 

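It works for the first row, where the whole string sits inside one pair of double quotes. It does not work for the second and third rows, where there are commas outside the quotes (single or double).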
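To see what the reader actually does with each sample row, a minimal sketch along these lines may help. It feeds every line to `csv.reader` on its own, so that an unclosed quote cannot swallow the following line. The `sample_lines` list is retyped from the example above and is only an assumption about the exact file contents; `parse_each_line` is a hypothetical helper name.

```python
import csv

# Sample rows retyped from the question (assumed to match the real file).
sample_lines = [
    '1, 34, "market, business", 12/20/2013',
    '2, 15, "market, business", "yesterday, metric, 11/21/2014',
    '3, 18, "market," business and yesterday, 10/20/2014',
    '4, 19, yesterday, today, 11/22/2014',
]


def parse_each_line(lines):
    """Parse every line independently and return a list of field lists."""
    rows = []
    for line in lines:
        # A one-element list is a valid iterable of lines for csv.reader.
        rows.append(next(csv.reader([line], skipinitialspace=True)))
    return rows


rows = parse_each_line(sample_lines)
for fields in rows:
    # Print the field count next to the fields to spot rows that split oddly.
    print(len(fields), fields)
```

Row 1 comes back as four fields, while row 4 splits into five, because nothing in the file marks its commas as part of the content.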

How can I solve this? Right now I am only using the standard csv module in Python. Is pandas able to handle this?

Thanks.

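Update: I think what I am really asking for is a way to tell commas in different positions apart. Now that I have posted it, the request seems unreasonable: I cannot find any way inside the csv module to distinguish a "," that acts as a separator from a "," inside a field. Even Excel cannot do that...

Any ideas?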
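For contrast, the ambiguity goes away when the file is written with consistent quoting in the first place. The sketch below is only an illustration with hypothetical rows, and it assumes the producer of the file can be changed: `csv.writer` quotes any field that contains a comma and doubles embedded quote characters, so `csv.reader` can recover the fields exactly.

```python
import csv
import io

# Hypothetical rows whose third field contains commas and quote characters.
rows = [
    ['1', '34', 'market, business', '12/20/2013'],
    ['3', '18', '"market," business and yesterday', '10/20/2014'],
]

buffer = io.StringIO(newline='')
# The default QUOTE_MINIMAL mode quotes only the fields that need it
# and doubles any quote character inside a quoted field.
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())

# Reading the text back yields the original fields unchanged.
buffer.seek(0)
round_tripped = list(csv.reader(buffer))
print(round_tripped == rows)
```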

+0

Look at the "Related" questions list on the right. Do any of those answer your question? – kdopen

+0

Please post your csv sample and the desired DataFrame. – unutbu

+1

The desired Python lists would raise SyntaxErrors, because they contain unmatched quotes and strings without any quotes. Please fix. – unutbu

Answers

1

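If we can assume that

  • each line begins with two integers separated by commas,
  • each line ends with a comma followed by a date, and
  • everything left over (in the middle) belongs to the third column,

then your data can be parsed like this: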

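data = []
with open('data') as f:
    for line in f:
        # Split off the two leading integer columns; the rest stays in parts[2].
        parts = line.split(',', 2)
        # Split the date off the right-hand end; everything before it is Content.
        parts[2:4] = parts[2].rsplit(',', 1)
        parts[:2] = map(int, parts[:2])
        # Strip surrounding whitespace (including the newline) from Content and Date.
        parts[2:] = map(str.strip, parts[2:])
        data.append(parts)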

for row in data: 
    print(row) 

which produces

[1, 34, '"market, business"', '12/20/2013'] 
[2, 15, '"market, business", "yesterday, metric', '11/21/2014'] 
[3, 18, '"market," business and yesterday', '10/20/2014'] 
[4, 19, 'yesterday, today', '11/22/2014'] 
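The same three assumptions can also be written as a single regular expression, which has the side benefit of rejecting lines that do not fit, such as a header row. This is only a sketch of an alternative, not part of the approach above: `LINE_RE` and `parse_line` are hypothetical names, the date is assumed to be in MM/DD/YYYY form, and unlike the split-based code it converts the date to a `datetime.date`.

```python
import re
from datetime import datetime

# Two leading integers, free-form content, then a trailing MM/DD/YYYY date.
LINE_RE = re.compile(
    r'^\s*(\d+)\s*,\s*(\d+)\s*,\s*(.*?)\s*,\s*(\d{1,2}/\d{1,2}/\d{4})\s*$'
)


def parse_line(line):
    """Return [id, value, content, date] or raise ValueError on a mismatch."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f'unexpected line format: {line!r}')
    id_text, value_text, content, date_text = match.groups()
    # Assumption: dates are month/day/year, as in the sample data.
    date = datetime.strptime(date_text, '%m/%d/%Y').date()
    return [int(id_text), int(value_text), content, date]


print(parse_line('2, 15, "market, business", "yesterday, metric, 11/21/2014'))
print(parse_line('4, 19, yesterday, today, 11/22/2014'))
```

A header line such as `ID, Value, Content, Date` raises `ValueError`, so a caller can catch that to skip or report non-data lines.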

You can then build a DataFrame like this:

import pandas as pd 
df = pd.DataFrame(data, columns=['Id','Value','Content','Date']) 
print(df) 

which yields

   Id  Value                                 Content        Date
0   1     34                      "market, business"  12/20/2013
1   2     15  "market, business", "yesterday, metric  11/21/2014
2   3     18        "market," business and yesterday  10/20/2014
3   4     19                        yesterday, today  11/22/2014