Equivalent of R's read.table in Python

I am trying to move some processing work from R to Python. In R, I use read.table() to read really messy CSV files, and it automatically splits the records in the correct format. For example:

391788,"HP Deskjet 3050 scanner always seems to break","<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p> 

<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p> 
","windows-7 printer hp"

is correctly split into 4 columns. A single record can span many lines, and there are commas all over the place. In R I just do this:

read.table(infile, header = FALSE, nrows=chunksize, sep=",", stringsAsFactors=FALSE)

Is there anything in Python that can do this equally well?

Thanks!

2013-10-23 mchangun

You can use the csv module.

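from csv import reader

# newline="" lets the csv module handle line breaks inside quoted fields itself
with open("C:/text.txt", "r", newline="") as f:
    csv_reader = reader(f, quotechar="\"")

    for row in csv_reader:
        print(row)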

['391788', 'HP Deskjet 3050 scanner always seems to break', "<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>\n\n<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>\n", 'windows-7 printer hp']

Length of output = 4

2013-10-23 08:59:05

But this just returns strings. It does not infer the type of each column the way read.table does.
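
One way to work around this with only the standard library is to convert each field after parsing. What follows is a minimal sketch, not a substitute for read.table's type inference: the helper name `coerce` and the in-memory sample record are assumptions made for illustration, and the helper only tries int, then float, and otherwise leaves the string unchanged.

```python
import csv
import io


def coerce(value):
    """Hypothetical helper: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


# Illustrative sample: one record whose quoted fields contain a comma and a line break.
sample = '391788,"HP Deskjet, scanner","<p>line one</p>\n<p>line two</p>","windows-7 printer hp"\n'

# newline="" keeps embedded line breaks untouched so csv.reader can handle them.
rows = [
    [coerce(field) for field in row]
    for row in csv.reader(io.StringIO(sample, newline=""))
]
print(rows[0])
```

This converts field by field, whereas read.table settles on one type per column, so a column with mixed content would end up with mixed Python types here.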

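The pandas module also provides many R-like functions and data structures, including read_csv. The advantage here is that the data is read in as a pandas DataFrame, which is easier to manipulate than standard Python lists or dictionaries (especially if you are used to R). Here is an example: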

>>> from pandas import read_csv 
>>> ugly = read_csv("ugly.csv",header=None) 
>>> ugly 
     0            1 \ 
0 391788 HP Deskjet 3050 scanner always seems to break 

                2      3 
0 <p>I'm running a Windows 7 64 blah blah blah..... windows-7 printer hp
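
To check the behaviour without a file on disk, the same call can be run against an in-memory string. This is a small sketch under stated assumptions: the two sample records are invented for illustration, and `nrows` is used here as a rough counterpart to `nrows=chunksize` in the R call from the question.

```python
import io

import pandas as pd

# Illustrative sample: the first record spans two lines and has a comma inside quotes.
sample = (
    '391788,"HP Deskjet, scanner","<p>line one</p>\n<p>line two</p>","windows-7 printer hp"\n'
    '391789,"Second title","<p>body</p>","linux"\n'
)

# header=None mirrors header = FALSE in R; nrows caps the number of records read.
df = pd.read_csv(io.StringIO(sample), header=None, nrows=2)

print(df.shape)   # number of records and columns
print(df.dtypes)  # per-column types inferred by pandas
```

The first column should come back as an integer type while the text columns stay as strings, which is the per-column inference the csv module does not do.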

2013-10-23 14:25:30 David
