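How to split a text file by ID in Python

I have a bunch of text files containing tab-separated data. The second column contains an ID number, and each file is already sorted by that ID. I want to split each file into multiple files according to the ID number in column 2. Here is what I have: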
import os

readpath = 'path-to-read-file'
writepath = 'path-to-write-file'
for filename in os.listdir(readpath):
    with open(readpath+filename, 'r') as fh:
        lines = fh.readlines()
    lastid = 0
    f = open(writepath+'checkme.txt', 'w')
    f.write(filename)
    for line in lines:
        thisid = line.split("\t")[1]
        if int(thisid) <> lastid:
            f.close()
            f = open(writepath+thisid+'-'+filename,'w')
            lastid = int(thisid)
        f.write(line)
    f.close()
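What I get is simply a copy of each entire file, with that file's first ID number prepended to the new filename. It is as if
thisid = line.split("\t")[1]
is only executed once in the loop. Any clue as to what is going on?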
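EDIT
The problem was that my files used \r instead of \r\n to terminate lines. Corrected code (I just added 'U' to the open call that reads the file, and swapped <> for !=):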
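import os

readpath = 'path-to-read-file'
writepath = 'path-to-write-file'
for filename in os.listdir(readpath):
    # 'rU' = universal newlines, so a bare '\r' also ends a line.
    # (Python 3's plain 'r' mode already does this; 'U' was removed in 3.11.)
    with open(readpath+filename, 'rU') as fh:
        lines = fh.readlines()
    lastid = 0
    f = open(writepath+'checkme.txt', 'w')
    f.write(filename)
    for line in lines:
        thisid = line.split("\t")[1]
        if int(thisid) != lastid:
            # ID changed: close the current output file and start a new one
            f.close()
            f = open(writepath+thisid+'-'+filename,'w')
            lastid = int(thisid)
        f.write(line)
    f.close()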
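For comparison, here is a minimal Python 3 sketch of the same task using itertools.groupby, which groups consecutive lines that share the same key. It assumes, as the question states, that each file is already sorted by the ID in the second column, that every line has at least two tab-separated fields, and that readpath contains only regular files. The function name split_by_id and its id_column parameter are hypothetical, introduced only for illustration.

```python
import itertools
import os


def split_by_id(readpath, writepath, id_column=1):
    """Hypothetical helper: write one output file per ID for every input file.

    id_column is 0-based, so the default 1 means the second column.
    Assumes each input file is already sorted by that column.
    """
    for filename in os.listdir(readpath):
        # Default text mode uses universal newlines, so '\r', '\n' and
        # '\r\n' line endings are all recognised as line breaks.
        with open(os.path.join(readpath, filename)) as fh:
            # groupby only merges *adjacent* lines with the same key,
            # which is why the input must already be sorted by ID.
            groups = itertools.groupby(
                fh, key=lambda line: line.split("\t")[id_column]
            )
            for this_id, lines in groups:
                outname = os.path.join(writepath, f"{this_id}-{filename}")
                with open(outname, "w") as out:
                    # Consume this group fully before moving to the next one.
                    out.writelines(lines)
```

Because groupby only merges adjacent lines, an unsorted file would yield the same ID more than once, and the later group would overwrite the earlier output file since it is opened in 'w' mode. Using os.path.join also avoids depending on readpath and writepath ending with a path separator.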
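Have you checked what `int(thisid)` actually evaluates to, e.g. by putting `print(int(thisid))` before the `if` block so that it runs on every pass through the loop? – nekomatic
I just changed `f.write(line + '\r')` to `f.write(thisid + line + '\r')`, and `thisid` only appears on the first line. – Joseph
Just a note: `<>` was removed in Python 3.x, and `!=` is recommended even when using 2.x. –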
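Extending nekomatic's debugging suggestion, printing repr(line) instead of the parsed ID makes hidden '\r' characters visible, which would have exposed the line-ending problem directly. The sketch below uses a hypothetical in-memory sample rather than a real file, and uses io.StringIO's newline argument to compare "only '\n' ends a line" (roughly Python 2's plain 'r' mode on Unix) against universal newlines.

```python
import io

# Hypothetical sample data: three tab-separated records terminated by a bare
# '\r' (old Mac OS style), the situation described in the EDIT.
raw = "a\t1\tx\rb\t1\ty\rc\t2\tz\r"

# newline="\n" disables universal-newline handling, so only '\n' ends a line.
strict_lines = io.StringIO(raw, newline="\n").readlines()

# newline=None (also open()'s default in Python 3) enables universal newlines:
# '\r', '\n' and '\r\n' all end a line and are translated to '\n'.
universal_lines = io.StringIO(raw, newline=None).readlines()

for line in strict_lines:
    # repr() shows the embedded '\r' characters that print() would hide.
    print(repr(line))

print(len(strict_lines), len(universal_lines))
```

With strict newline handling the whole sample comes back as a single "line", so the loop body runs once and only the first ID is ever seen, which matches what Joseph reports.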