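Equivalent of R's read.table in Python

I am trying to move some processing work from R to Python. In R, I use read.table() to read really messy CSV files, and it automatically splits the records into the correct format. For example: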
391788,"HP Deskjet 3050 scanner always seems to break","<p>I'm running a Windows 7 64 blah blah blah........ake this work permanently?</p>
<p>Update: It might have something to do with my computer. It seems to work much better on another computer, windows 7 laptop. Not sure exactly what the deal is, but I'm still looking into it...</p>
","windows-7 printer hp"
is correctly split into 4 columns. A single record can span many lines, and there are commas all over the place. In R I just do this:
read.table(infile, header = FALSE, nrows=chunksize, sep=",", stringsAsFactors=FALSE)
Is there anything in Python that can do this equally well?
Thanks!
But this just returns strings. It doesn't infer the type of each column the way read.table does.
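One possible way around the strings-only result, as a minimal sketch using only the standard library: let the csv module handle the quoted fields (including embedded commas and line breaks), then try to cast each column to int, then float, and otherwise leave it as strings. The helper names `convert_column` and `read_table` are made up for illustration, the sample text is invented and only modelled on the record above, and the inference is far cruder than what read.table does. If pandas is available, `pandas.read_csv(infile, header=None, nrows=chunksize)` is probably the closer equivalent, since it infers column types itself.

```python
import csv
import io


def convert_column(values):
    """Return the column as ints, else floats, else the original strings."""
    for cast in (int, float):
        try:
            return [cast(v) for v in values]
        except ValueError:
            continue  # at least one value did not fit this type; try the next
    return list(values)


def read_table(handle, sep=","):
    """Parse delimited text into a list of columns with inferred types."""
    # csv.reader keeps quoted fields intact, even across line breaks.
    rows = list(csv.reader(handle, delimiter=sep))
    # Transpose rows into columns; assumes every record has the same field count.
    columns = zip(*rows)
    return [convert_column(col) for col in columns]


# Invented sample: two records, the first with a multi-line quoted field.
sample = io.StringIO(
    '391788,"HP Deskjet, scanner","<p>line one</p>\n<p>line two</p>\n","windows-7 printer hp"\n'
    '391789,"Second, title","<p>body</p>","tag"\n'
)
columns = read_table(sample)
print(columns[0])  # first column, cast to int
print(columns[1])  # second column, left as strings
```

When reading from a real file rather than a StringIO, open it with `newline=""` as the csv documentation recommends, so that line breaks inside quoted fields are passed through unchanged.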