2017-02-11

I'm having trouble reading a CSV file in Python.

CSV format: here are two entries from the CSV file:

"1", "one", "<long class=\"like\" >\ 
    <short class=\"over\">\ 
    </short> 
    </long>", "3", "<long class=\"like\" >\ 
    <short class=\"over\">\ 
    </short> 
    </long>", "facebook" 
    "2", "two", "<long class=\"like\" >\ 
    <short class=\"over\">\ 
    </short> 
    </long>", "3", "<long class=\"like\" >\ 
    <short class=\"over\">\ 
    </short> 
    </long>", "facebook" 

How can I read each row of a CSV file like this?

That's strange content for a CSV file. What should the expected result look like? – RomanPerekhrest

Are there really 4 spaces at the start of each line, or is that a formatting issue? –

That's just Stack Overflow formatting; the lines have no leading spaces. – justkid

Answers

Suppose the two entries in the CSV file look like this:

"1", "one", "<long class=\"like\" ><short class=\"over\"></short></long>", "3", "<long class=\"like\" ><short class=\"over\"></short></long>" "facebook" 
"2", "two", "<long class=\"like\" ><short class=\"over\"></short></long>", "3", "<long class=\"like\" ><short class=\"over\"></short></long>" "facebook" 

Consider using the re.findall() function:

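import re 

with open('test.csv', 'r') as fh: 
    for line in fh: 
        # capture the id, the name, and everything that follows them
        fields = re.findall(r'^"(\d+)", "(\w+)", (.+)', line.rstrip('\n')) 
        if not fields:  # skip blank lines (e.g. a trailing newline) and non-matching lines
            continue 
        a, b, c = fields[0]  # unpack the single match
        print(a, b, c, sep='\t') 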

Output:

1 one "<long class=\"like\" ><short class=\"over\"></short></long>", "3", "<long class=\"like\" ><short class=\"over\"></short></long>" "facebook" 
2 two "<long class=\"like\" ><short class=\"over\"></short></long>", "3", "<long class=\"like\" ><short class=\"over\"></short></long>" "facebook" 
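A variation on the same re.findall() idea extracts every field rather than just the first two. This is a minimal sketch, assuming every field is wrapped in double quotes and that quotes inside a field are escaped as \"; the names FIELD_RE and split_quoted_fields are hypothetical.

```python
import re

# One double-quoted field: an opening ", then any mix of characters that are
# neither " nor \ or backslash-escaped pairs such as \", then a closing ".
FIELD_RE = re.compile(r'"((?:[^"\\]|\\.)*)"', re.S)

def split_quoted_fields(line):
    """Return the contents of every double-quoted field, with escaped quotes unescaped."""
    return [field.replace('\\"', '"') for field in FIELD_RE.findall(line)]

sample = r'"1", "one", "<long class=\"like\" ><short class=\"over\"></short></long>", "3", "<long class=\"like\" ><short class=\"over\"></short></long>" "facebook"'
print(split_quoted_fields(sample))
```

Because it only looks for quoted fields, it does not depend on the separators, so it also copes with the missing comma before "facebook" in the sample above. It works one line at a time, though, so it does not handle a field that continues onto the next line.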

Why not use the csv module?

You can read each row and do whatever you want with it, for example:

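import csv 

# newline='' lets the csv module handle newlines inside quoted fields
with open('prueba.csv', 'r', newline='') as file: 
    # fields are separated by ", " and quotes inside fields are escaped as \"
    reader = csv.reader(file, delimiter=',', skipinitialspace=True,
                        escapechar='\\') 
    for row in reader: 
        print(row)  # replace with whatever you want to do with each row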

But maybe you want to do something else entirely.
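As a self-contained check that csv.reader can cope with the question's multi-line quoted fields, this sketch parses a shortened, made-up sample in the same format from an in-memory io.StringIO instead of a file. The reader settings (the default comma delimiter plus skipinitialspace=True and escapechar='\\') are assumptions about the format, and SAMPLE and read_rows are hypothetical names.

```python
import csv
import io

# Hypothetical sample in the question's format: fields separated by ", ",
# inner quotes escaped as \", and one quoted field spanning several lines
# (the trailing \ before a newline is treated as an escaped newline).
SAMPLE = (
    '"1", "one", "<long class=\\"like\\" >\\\n'
    '<short class=\\"over\\">\\\n'
    '</short>\n'
    '</long>", "3", "facebook"\n'
    '"2", "two", "<long class=\\"like\\" ></long>", "3", "facebook"\n'
)

def read_rows(f):
    """Parse every record from a text stream opened with newline=''."""
    reader = csv.reader(f, skipinitialspace=True, escapechar='\\')
    return list(reader)

rows = read_rows(io.StringIO(SAMPLE, newline=''))
for row in rows:
    print(row)
```

Each quoted field keeps its embedded newlines, and the escaped \" sequences come back as plain quotes, so the first record has five fields even though it spans four physical lines.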