2016-03-08 49 views

In Python 2.7.11, why can't I remove the file-open code?

The .txt file holds data like the following (source: "datingTestSet2.txt" in Chapter 2 here):

40920 8.326976 0.953952 largeDoses 
14488 7.153469 1.673904 smallDoses 
26052 1.441871 0.805124 didntLike 
75136 13.147394 0.428964 didntLike 
38344 1.669788 0.134296 didntLike 
... 

Code:

from numpy import * 
import operator 
from os import listdir 

def file2matrix(filename): 
    fr = open(filename) 
    # arr = fr.readlines() # Code1!!!!!!!!!!!!!!!!!!! 
    numberOfLines = len(fr.readlines())  #get the number of lines in the file 
    returnMat = zeros((numberOfLines,3))  #prepare matrix to return 
    classLabelVector = []      #prepare labels return 
    fr = open(filename) # Code2!!!!!!!!!!!!!!!!!!!!! 
    index = 0 
    for line in fr.readlines():
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector 

datingDataMat, datingLabels = file2matrix('datingTestSet2.txt') 

The result of this function is:

 datingDataMat     datingLabels 
40920 8.326976 0.953952   3 
14488 7.153469 1.673904   2 
26052 1.441871 0.805124   1 
75136 13.147394 0.428964   1 
38344 1.669788 0.134296   1 
72993 10.141740 1.032955   1 
35948 6.830792 1.213192   3 
42666 13.276369 0.543880   3 
67497 8.631577 0.749278   1 
35483 12.273169 1.508053   3 
50242 3.723498 0.831917   1 
...  ...   ...    ... 

My questions are:

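  1. When I remove only Code2 (the fr = open(filename) line just above index = 0), the function returns an all-zero matrix and an all-zero vector. Why can't I remove Code2? Doesn't the first fr = open(filename) line already do the job?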

  2. When I only add Code1 (arr = fr.readlines()), I get an error. Why?

    returnMat[index,:] = listFromLine[0:3] 
    
    IndexError: index 0 is out of bounds for axis 0 with size 0 
    

Answers


1) You can't remove the Code2 line because of this line:

numberOfLines = len(fr.readlines())  #get the number of lines in the file 

In that line you read all the way to the end of the file. Opening the file again puts you back at its start.

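2) Similar to the above: calling readlines() reads all the lines and moves the file cursor to the end of the file. Code1 therefore consumes everything, so the next readlines() call returns an empty list, numberOfLines becomes 0, and returnMat is created with zero rows. When the loop (after Code2 reopens the file) then tries to fill row 0, it fails with the IndexError.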
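A minimal sketch of the cursor behaviour described above. It uses io.StringIO as a stand-in for the opened data file (an assumption for illustration; a real file object from open() behaves the same way for these calls) and is written for Python 3.

```python
import io

# io.StringIO stands in for open('datingTestSet2.txt') (illustrative assumption)
f = io.StringIO("40920\t8.326976\t0.953952\t3\n14488\t7.153469\t1.673904\t2\n")

first = f.readlines()   # reads every line and leaves the cursor at the end
second = f.readlines()  # nothing left to read, so this is an empty list

print(len(first))   # 2
print(len(second))  # 0

f.seek(0)               # move the cursor back to the start of the "file"
third = f.readlines()   # all lines are readable again
print(len(third))   # 2
```

The second call returning an empty list is exactly why numberOfLines ends up as 0 when Code1 is added.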


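You are currently at the end of the file, so your second attempt to read its contents finds nothing left to read. You need to go back to the beginning of the file. Use: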

fr.seek(0) 

instead of:

fr = open(filename) # Code2!!!!!!!!!!!!!!!!!!!!! 
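A rough sketch of the seek(0) approach, using plain lists instead of numpy so it runs on its own. The helper name file2lists is hypothetical, io.StringIO stands in for the real file, and it assumes each line holds three numeric features and an integer label separated by tabs (Python 3).

```python
import io

def file2lists(fr):
    # Hypothetical helper: fr is any open text-file object.
    # Assumes each line holds three numeric features and an integer label, tab-separated.
    numberOfLines = len(fr.readlines())  # first pass: cursor ends up at end of file
    fr.seek(0)                           # rewind instead of calling open() a second time
    rows, labels = [], []
    for line in fr.readlines():          # second pass now sees every line again
        fields = line.strip().split('\t')
        rows.append([float(x) for x in fields[0:3]])
        labels.append(int(fields[-1]))
    return numberOfLines, rows, labels

# io.StringIO stands in for open('datingTestSet2.txt') in this example
data = io.StringIO("40920\t8.326976\t0.953952\t3\n14488\t7.153469\t1.673904\t2\n")
n, rows, labels = file2lists(data)
print(n, labels)  # 2 [3, 2]
```

Rewinding keeps a single open file object, so there is also only one file to close afterwards.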

You only need to call readlines once:

from numpy import zeros

def file2matrix(filename):
    fr = open(filename)
    lines = fr.readlines()  # read the whole file once
    fr.close()
    numberOfLines = len(lines)  #get the number of lines in the file
    returnMat = zeros((numberOfLines,3))  #prepare matrix to return
    classLabelVector = []      #prepare labels return
    index = 0
    for line in lines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        # careful here: returnMat is initialized as floats,
        # while listFromLine is a list of strings
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector

I'd also suggest a few other changes:

import numpy as np

def file2matrix(filename):
    with open(filename) as f:  # the file is closed automatically
        lines = f.readlines()
    returnList = []
    classLabelList = []
    for line in lines:
        listFromLine = line.strip().split('\t')
        returnList.append(listFromLine[0:3])
        classLabelList.append(int(listFromLine[-1]))
    returnMat = np.array(returnList, dtype=float)  # convert the string fields to floats
    return returnMat, classLabelList

Or even:

import numpy as np

def file2matrix(filename):
    with open(filename) as f:
        lines = f.readlines()
    ll = [line.strip().split('\t') for line in lines]  # split every line, not just one
    returnMat = np.array([l[0:3] for l in ll], dtype=float)
    classLabelList = [int(l[-1]) for l in ll]
    # classLabelVec = np.array([l[-1] for l in ll], dtype=int)
    return returnMat, classLabelList
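If the file really contains only numbers (four tab-separated numeric columns, as the integer datingLabels output above suggests), numpy.loadtxt can replace the manual readlines() handling entirely, so there is no file cursor to manage. This is a swapped-in technique, not what the answers above use; the function name file2matrix_loadtxt is hypothetical, and io.StringIO stands in for the file path (loadtxt accepts either). It would fail on rows with text labels such as largeDoses.

```python
import io
import numpy as np

def file2matrix_loadtxt(source):
    # Hypothetical alternative: source is a filename or an open text-file object.
    # Assumes four tab-separated numeric columns: three features and an integer label.
    data = np.loadtxt(source, delimiter='\t', ndmin=2)  # one read; ndmin=2 keeps 2-D shape
    returnMat = data[:, 0:3]                            # feature columns as floats
    classLabelList = data[:, -1].astype(int).tolist()   # label column as Python ints
    return returnMat, classLabelList

sample = io.StringIO("40920\t8.326976\t0.953952\t3\n14488\t7.153469\t1.673904\t2\n")
mat, labels = file2matrix_loadtxt(sample)
print(mat.shape)  # (2, 3)
print(labels)     # [3, 2]
```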