2013-06-03 20 views
0

Read a line once without removing it in Python using .readline()

I want to count the number of missing values ("NA") on each line of a text file.

The file foo.txt:

1 1 1 1 1 NA # so, Missings: 1 
1 1 1 NA 1 1 # so, Missings: 1 
1 1 NA 1 1 NA # so, Missings: 2 

But I also want to get the number of elements on the first line (assuming it is the same for all lines).

miss = []
with open("foo.txt") as f:
    for line in f:
        miss.append(line.count("NA"))

>>> miss 
[1, 1, 2]   # correct 

The problem comes when I try to determine the number of elements. I did it with the following code:

miss = []
with open("foo.txt") as f:
    first_line = f.readline()
    elements = first_line.count(" ")  # given that values are separated by space
    for line in f:
        miss.append(line.count("NA"))

>>> (elements + 1) 
6 # True, this is correct   
>>> miss 
[1, 2] # misses the first item because readline() has already consumed the first line

How can I read the first line without removing it from further processing?

+0

Premature optimization is the root of all evil. Just compute the length of _each_ line inside the loop: 'for line in f: ... elements = len(line.split())'. – georg

Answers

2

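Try f.seek(0). This resets the file handle to the beginning of the file, so the loop that follows starts again from the first line.

The complete example is: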

miss = []
with open("foo.txt") as f:
    first_line = f.readline()
    elements = first_line.count(" ")  # given that values are separated by space
    f.seek(0)  # rewind so the loop below sees the first line again
    for line in f:
        miss.append(line.count("NA"))

Even better is to read every line, including the first, only once, and to check the number of elements only once:

miss = []
elements = None
with open("foo.txt") as f:
    for line in f:
        if elements is None:
            elements = line.count(" ")  # given that values are separated by space
        miss.append(line.count("NA"))

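By the way: isn't the number of elements line.count(" ") + 1?

I recommend using len(line.split()), because it also handles tabs, double spaces, leading/trailing whitespace, etc.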
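To illustrate the difference between the two counting approaches, here is a minimal sketch. The two sample strings and the helper names count_by_spaces and count_by_split are made up for this illustration and are not taken from foo.txt.

```python
# Hypothetical sample lines, not taken from foo.txt
clean = "1 1 1 NA 1 1\n"
messy = "  1\t1  1 NA 1 1 \n"  # leading spaces, a tab, a double space, a trailing space


def count_by_spaces(line):
    # Counts single space characters, so it assumes exactly one space between items
    return line.count(" ") + 1


def count_by_split(line):
    # str.split() with no argument splits on any run of whitespace
    return len(line.split())


print(count_by_spaces(clean), count_by_split(clean))
print(count_by_spaces(messy), count_by_split(messy))
```

On the clean line both approaches agree. On the messy line only the split-based count still reports six items.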

+0

This is useful, thanks a lot. – PascalVKooten

0

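You can also just handle the first line separately:

with open("foo.txt") as f:
    first_line = next(f)  # read the first line from the file iterator
    elements = first_line.count(" ")  # given that values are separated by space
    miss = [first_line.count("NA")]  # count the first line before looping
    for line in f:
        miss.append(line.count("NA"))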
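As a rough sketch of how next() behaves here: a file object is its own iterator, so next(f) returns the next line and a following for loop continues from the line after it. The example uses io.StringIO as a stand-in for open("foo.txt") so that it runs without a file on disk; that substitution is an assumption made for the illustration.

```python
import io

# io.StringIO stands in for open("foo.txt") so the sketch runs without a file
f = io.StringIO("1 1 1 1 1 NA\n1 1 1 NA 1 1\n1 1 NA 1 1 NA\n")

first_line = next(f)                # consumes the first line, much like f.readline()
elements = len(first_line.split())
miss = [first_line.count("NA")]     # the first line is counted here, before the loop
for line in f:                      # iteration resumes at the second line
    miss.append(line.count("NA"))

print(elements, miss)  # prints: 6 [1, 1, 2]

# On an empty file next(f) raises StopIteration; a default value avoids that
empty_first = next(io.StringIO(""), None)
```

Unlike f.readline(), which returns an empty string at end of file, next(f) raises StopIteration unless a default is passed as the second argument.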
+0

What exactly does next do? – PascalVKooten

+0

@Dualinity: http://docs.python.org/3/library/functions.html#next – georg

2

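Provided all lines have the same number of items, you can simply count the items on the last line:

miss = []
with open("foo.txt") as f:
    for line in f:
        miss.append(line.count("NA"))
    elements = len(line.split())  # line still holds the last line read by the loop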

A better way to count is probably:

elements = len(line.split())

because this also counts items correctly when they are separated by multiple spaces or tabs.
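Putting the pieces together, a single pass can collect both results and also check the assumption that every line has the same number of items. This is only a sketch: the function name summarize and the decision to raise ValueError on a mismatch are choices made for this illustration, not something from the question.

```python
def summarize(lines):
    """Return (elements, miss) for an iterable of whitespace-separated lines."""
    elements = None
    miss = []
    for number, line in enumerate(lines, start=1):
        tokens = line.split()  # splits on any run of spaces or tabs
        if elements is None:
            elements = len(tokens)  # taken from the first line only
        elif len(tokens) != elements:
            raise ValueError(
                f"line {number} has {len(tokens)} items, expected {elements}"
            )
        # Counts whole tokens equal to "NA"; line.count("NA") would also
        # match substrings such as the "NA" inside "NAN"
        miss.append(tokens.count("NA"))
    return elements, miss


sample = ["1 1 1 1 1 NA\n", "1 1 1 NA 1 1\n", "1 1 NA 1 1 NA\n"]
print(summarize(sample))  # prints: (6, [1, 1, 2])
```

Because the function accepts any iterable of lines, an open file can be passed directly, for example: with open("foo.txt") as f: elements, miss = summarize(f). Note that a blank line would count as zero items and trigger the error.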

+0

Note that '.count(" ")' will be off by 1, so 'len(split)' is the only correct one. – georg

+0

Thanks. Yes. That is the way I would do it. Also, there are often multiple spaces or tabs between items. Removed the OP's version. –