2015-04-29 83 views

Python - Convert a tab-delimited file to CSV in a specific way

I have a tab-delimited text file with 10 columns, where each record looks like this:

p001 64  20141209  meals (attendees) ML ENTER Entertainment xyz Restaurants  6.0  "_e' Restaurants (123) 456-7890 \r\n   FORUM \r\n  ,Around \r\n\r\n':33 113-2 \r\n\r\n 8440 XYZ09'15  1:11PM \r\n\r\n 1 Burger   6.00 \r\n\r\n SSIONS  6.00 \r\n TOTAL PAID 6 .00 \r\n XXXXXXXXXXX2012 XX/XX \r\n XYZ EXPRESS 
6.00 \r\n\r\n\r\n 7,-10(YOU! FOR DINING WITH US! \r\n\r\n   113-2 \r\n\r\nYour r is: 840  \r\n" 

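P.S.: The text in the last column is already enclosed in double quotes (""). Also, the values in my first column are not unique.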

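I want to convert this text file to a CSV file so that only the data from columns 1, 2, 8, 9 and 10 of each record is kept. In addition, every value should be enclosed in double quotes ("").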

For example, the record above should be converted to the following line in the output CSV file:

"p001","64","xyz Restaurants","6.0","_e' Restaurants (123) 456-7890 \r\n   FORUM \r\n  ,Around \r\n\r\n':33 113-2 \r\n\r\n 8440 XYZ09'15  1:11PM \r\n\r\n 1 Burger   6.00 \r\n\r\n SSIONS  6.00 \r\n TOTAL PAID 6 .00 \r\n XXXXXXXXXXX2012 XX/XX \r\n XYZ EXPRESS 
    6.00 \r\n\r\n\r\n 7,-10(YOU! FOR DINING WITH US! \r\n\r\n   113-2 \r\n\r\nYour r is: 840  \r\n" 

Possible duplicate of [Reading and parsing a TSV file, then manipulating it for saving as CSV (*efficiently*)](http://stackoverflow.com/questions/13992971/reading-and-parsing-a-tsv-file-then-manipulation-it-for-saving-as-csv-efficie)

Answer


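This should work for you. Note that it uses the csv library for both input and output; only the delimiter changes. When the file is written, csv should automatically escape any quote characters in your data.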

import csv

try:
    # newline='' lets the csv module handle line endings itself,
    # including newlines embedded in quoted fields.
    with open('input.tsv', 'r', newline='') as in_f, \
         open('output.csv', 'w', newline='') as out_f:
        reader = csv.reader(in_f, delimiter='\t')
        # Quoting added per comment from @Rob.
        writer = csv.writer(out_f, delimiter=',', quoting=csv.QUOTE_ALL)
        for li in reader:
            try:
                # 0-based indices: columns 1, 2, 3, 8, 9 and 10.
                # Drop li[2] to keep only columns 1, 2, 8, 9 and 10.
                writer.writerow([li[0], li[1], li[2], li[7], li[8], li[9]])
            except IndexError:  # Prevent errors on blank lines.
                pass
except IOError as err:
    print(err)
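
As a side illustration of the automatic escaping mentioned before the script, here is a minimal sketch that writes one row to an in-memory buffer. The row contents and the `io.StringIO` buffer are made up for the demonstration and are not taken from the question's data.

```python
import csv
import io

# In-memory buffer standing in for output.csv (illustrative only).
buf = io.StringIO(newline='')
writer = csv.writer(buf, delimiter=',', quoting=csv.QUOTE_ALL)

# A made-up field that contains the quote character itself.
writer.writerow(['p001', 'say "hi"'])

escaped_line = buf.getvalue()
print(repr(escaped_line))
```

With the csv module's default settings, an embedded double quote is written as two double quotes, which is the form `csv.reader` expects when reading the file back.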

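I couldn't tell where the tabs (as opposed to spaces) belong in your sample data, but testing the script with the following tab-separated data saved as input.tsv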

1 2 3 4 5 6 7 8 9 10 
11 12 13 14 15 16 17 18 19 20 
21 22 23 24 25 26 27 28 29 30 

produces the following result in output.csv:

"1","2","3","8","9","10" 
"11","12","13","18","19","20" 
"21","22","23","28","29","30" 

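The question asks for columns 1, 2, 8, 9 and 10 only, while the test output here also includes column 3. A possible variant that keeps exactly the five requested columns is sketched below. The helper name `select_columns`, the `WANTED` tuple and the sample record are hypothetical, and in-memory streams stand in for the real files.

```python
import csv
import io

# Hypothetical 0-based indices for columns 1, 2, 8, 9 and 10.
WANTED = (0, 1, 7, 8, 9)

def select_columns(in_f, out_f, wanted=WANTED):
    """Copy only the wanted columns of a TSV stream to a fully quoted CSV stream."""
    reader = csv.reader(in_f, delimiter='\t')
    writer = csv.writer(out_f, delimiter=',', quoting=csv.QUOTE_ALL)
    for row in reader:
        if len(row) <= max(wanted):  # skip blank or short lines
            continue
        writer.writerow([row[i] for i in wanted])

# Made-up record: ten tab-separated fields, the last one quoted and spanning two lines.
sample = 'p001\t64\tc3\tc4\tc5\tc6\tc7\txyz Restaurants\t6.0\t"line one\nline two"\n'
out = io.StringIO(newline='')
select_columns(io.StringIO(sample, newline=''), out)
result = out.getvalue()
print(result)
```

Checking the row length up front replaces the `IndexError` handler; which of the two reads better is a matter of taste. For real files, the same function could be called with handles opened with `newline=''`.
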
Update

Note that the code was updated to add quoting=csv.QUOTE_ALL, as suggested in Rob's comment. Thanks for the catch!
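
To see what the quoting option changes, the short sketch below renders the same row with the writer's default `csv.QUOTE_MINIMAL` and with `csv.QUOTE_ALL`. The `render` helper and the row are assumptions made for the comparison only.

```python
import csv
import io

def render(row, quoting):
    """Return the CSV text that csv.writer produces for a single row."""
    buf = io.StringIO(newline='')
    csv.writer(buf, quoting=quoting).writerow(row)
    return buf.getvalue()

row = ['1', '2', 'a,b']  # made-up row; the last field contains the delimiter

minimal = render(row, csv.QUOTE_MINIMAL)  # the writer's default
quoted_all = render(row, csv.QUOTE_ALL)

print(repr(minimal))
print(repr(quoted_all))
```

By default only fields that need quoting (here, the one containing a comma) are quoted, so the "all data in double quotes" requirement is only met with `csv.QUOTE_ALL`.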


I think you missed *all data should be enclosed in ""*. Try `csv.writer(out_f, delimiter=',', quoting=csv.QUOTE_ALL)`.


@Rob - Added to the code. Thanks for the catch.
