findall function grabs the wrong information

I'm trying to write some Python to read my file. The code is below:
import re, os
captureLevel = [] # capture read scale.
captureQID = [] # capture questionID.
captureDesc = [] # capture description.
file=open(r'E:\Grad\LIS\LIS590 Text mining\Final_Project\finalproject_data.csv','rt')
newfile=open('finalwordlist.csv','w')
mytext=file.read()
for row in mytext.split('\n'):
    grabLevel=re.findall(r'(\d{1})+\n',row)
    captureLevel.append(grabLevel)
    grabQID=re.findall(r'(\w{1}\d{5})',row)
    captureQID.append(grabQID) #ERROR LINE.
    grabDesc=re.findall(r'\,+\s+(\w.+)',row)
    captureDesc.append(grabDesc)
    lineCount = 0
    wordCount = 0
    lines = ''.join(grabDesc).split('.')
    for line in lines:
        lineCount +=1
        for word in line.split(' '):
            wordCount +=1
            newfile.write(''.join(grabLevel) + '|' + ''.join(grabQID) + '|' + str(lineCount) + '|' + str(wordCount) + '|' + word + '\n')
newfile.close()
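A quick check on the level pattern, assuming the loop is indented as shown (the regex runs once per row). `str.split('\n')` removes the newline characters, so no row still ends in `\n` and a pattern that requires a literal `\n` cannot match inside a row. The sample text below is made up in the shape of the data; anchoring on the end of the row with `$` is one possible alternative, not the original code.

```python
import re

# Hypothetical two-record text shaped like the question's data.
text = "a00005, asked retiree,2\na00006, asked scientist,2\n"

# str.split('\n') consumes the newline characters, so no row keeps one.
rows = text.split('\n')

# The question's level pattern requires a literal \n after the digit,
# so it cannot match inside any of these rows.
level_matches = [re.findall(r'(\d{1})+\n', row) for row in rows]
print(level_matches)  # [[], [], []]

# Anchoring on the end of the row instead finds the trailing digit.
end_matches = [re.findall(r'(\d)$', row) for row in rows]
print(end_matches)    # [['2'], ['2'], []]
```

If the first column still comes out non-empty, the code that actually ran is probably not splitting rows the way the listing suggests.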
Here are three lines of my data:
a00004," another oakstr eetrequest, helped student request item",2
a00005, asked retiree if he used journal on circ list,2
a00006, asked scientist about owner of some archival notes,2
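These rows look like CSV, and the first description is quoted because it contains a comma. A regex such as `\,+\s+(\w.+)` cannot tell that comma from a field separator. The sketch below shows how the standard `csv` module splits the sample rows, assuming the columns are question ID, description, and level (that column order is inferred from the sample, not stated in the question).

```python
import csv
import io

# The three sample rows, assumed to be: question ID, description, level.
sample = (
    'a00004," another oakstr eetrequest, helped student request item",2\n'
    'a00005, asked retiree if he used journal on circ list,2\n'
    'a00006, asked scientist about owner of some archival notes,2\n'
)

# csv.reader honours the double quotes, so the comma inside the first
# description does not split that field.
rows = list(csv.reader(io.StringIO(sample)))

for qid, desc, level in rows:
    print(qid, level, desc.strip())
```

With the real file, `open(path, newline='')` can be passed to `csv.reader` in place of the `io.StringIO` object.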
Here is the result:
22|a00002|1|1|a00002,
22|a00002|1|2|
22|a00002|1|3|scientist
22|a00002|1|4|looking
22|a00002|1|5|for
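One way to produce the same `level|QID|sentence|word#|word` layout is to let `csv.reader` find the fields and `csv.writer` (with `delimiter='|'`) write the output. This is a minimal sketch, assuming the column order question ID, description, level; the function name `word_records` is hypothetical. Like the original, sentences are split on `.` and the word counter runs across a whole row. Using `split()` with no argument skips the empty "words" that `split(' ')` produces from doubled or leading spaces (the blank entry on the second output line).

```python
import csv
import io

def word_records(csv_text):
    """Yield (level, qid, sentence_no, word_no, word) for every word.

    Assumes each CSV row is: question ID, description, level.
    """
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:          # skip blank or malformed rows
            continue
        qid, desc, level = row
        word_no = 0                # counts words across the whole row
        for sentence_no, sentence in enumerate(desc.split('.'), start=1):
            # split() with no argument drops empty strings from extra spaces
            for word in sentence.split():
                word_no += 1
                yield level.strip(), qid.strip(), sentence_no, word_no, word

sample = 'a00006, asked scientist about owner of some archival notes,2\n'

out = io.StringIO()
writer = csv.writer(out, delimiter='|', lineterminator='\n')
writer.writerows(word_records(sample))
print(out.getvalue())
```

Here each row carries exactly one level value, so the first column can only ever be that single field.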
The first column of the result should be a single digit, so why does it print two digits?
Any ideas what is wrong here? Thanks.
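Some background on why two digits can appear. `re.findall` returns a list with one entry per match, and `''.join(grabLevel)` concatenates every entry. So if the level pattern matches twice in the text it is applied to, two one-digit matches print as `22`. The text below is hypothetical and only illustrates that behaviour.

```python
import re

# Hypothetical text in which the level pattern matches twice.
text = "a00005, asked retiree,2\na00006, asked scientist,2\n"

# findall returns a list with one item per match, not a single string.
levels = re.findall(r'(\d{1})+\n', text)
print(levels)    # ['2', '2']

# ''.join glues every item together, so two one-digit matches print as "22".
joined = ''.join(levels)
print(joined)    # 22

# Taking one match (here the first) keeps a single digit.
first = levels[0] if levels else ''
print(first)     # 2

# A repeated group such as (\d{1})+ reports only its last repetition.
last_only = re.findall(r'(\d{1})+\n', "12\n")
print(last_only)  # ['2']
```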