How can I iterate over a data file in Python without code duplication?

I want to write a script to process some data files. The data files are just ASCII text with columns of data; here is a simple example.

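The first column is an ID number, in this case 1 to 3. The second column is the value of interest. (The actual files I use have many more IDs and values, but let's keep it simple here.)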

Contents of data.txt:

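I want to iterate over the data, extract the values for each ID, and process them, i.e. get all the values for ID 1 and do something with them, then get all the values for ID 2, and so on.

So I can write this in Python: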

#!/usr/bin/env python3

def processValues(values):
    print("Will do something with data here:", values)

# read the whole file into a list of lines
with open('data.txt', 'r') as f:
    datalines = f.readlines()

currentID = None
first = True

for line in datalines:
    fields = line.split()

    # if we've moved onto a new ID,
    # then process the values we've collected so far
    if fields[0] != currentID:

        # but if this is our first iteration, then
        # we just need to initialise our ID variable
        if not first:
            processValues(values)  # do something useful

        currentID = fields[0]
        values = []
        first = False

    values.append(fields[1])

processValues(values)  # do something with the last values

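My problem is that processValues() has to be called again at the end. This means duplicated code, and it means that one day I might write a script like this, forget the extra processValues() at the end, and so miss the last ID. It also requires storing whether this is our 'first' iteration, which is annoying.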

Is there any way to do this without two calls to processValues() (one inside the loop for each new ID, and one after the loop for the last ID)?

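The only way I can think of is to store the line number and check inside the loop whether we are at the last line. But that seems to defeat the point of 'foreach'-style processing, where we store the line itself rather than an index or the total number of lines. The same applies to other scripting languages such as Perl, where it is common to iterate over lines with while(<FILE>) and there is no notion of how many lines remain. Is it always necessary to write the function call again at the end?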


2012-11-30 trev

Take a look at itertools.groupby, which works if all occurrences of a given key are contiguous. A basic example:

from itertools import groupby
from operator import itemgetter

with open('somefile.txt') as fin:
    # split each line into its whitespace-separated fields
    lines = (line.split() for line in fin)
    # group consecutive rows that share the same first field (the ID)
    for key, values in groupby(lines, itemgetter(0)):
        print('Key', key, 'has values')
        for value in values:
            print(value)
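To tie this back to the question, here is a minimal sketch of how groupby lets the per-ID processing be called from a single place. The names process_values and process_grouped and the sample rows are made up for illustration, and summing the values merely stands in for whatever processing is actually needed. It assumes the rows for each ID are contiguous, as noted above.

```python
from itertools import groupby
from operator import itemgetter

def process_values(key, values):
    # Hypothetical stand-in for the question's processValues():
    # here we simply sum the values collected for one ID.
    return key, sum(values)

def process_grouped(lines):
    """Call process_values exactly once per run of consecutive rows sharing an ID."""
    # Skip blank lines, split the rest into whitespace-separated fields.
    rows = (line.split() for line in lines if line.strip())
    results = []
    for key, group in groupby(rows, key=itemgetter(0)):
        # Each item in the group is a full row [id, value]; keep the value only.
        values = [float(fields[1]) for fields in group]
        results.append(process_values(key, values))
    return results

# Made-up sample rows; a file object opened on data.txt could be passed instead.
sample = ["1 5", "1 4", "2 15", "2 18", "3 50"]
results = process_grouped(sample)
print(results)
```

Because groupby yields the last group like any other, no extra call after the loop and no 'first' flag are needed.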

Alternatively, you could use a collections.defaultdict with list as the default value.
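A rough sketch of the defaultdict approach follows. The function name group_by_id and the sample rows are assumptions for illustration. Unlike groupby, this does not need the rows for an ID to be contiguous, but it keeps all values in memory until the file has been read.

```python
from collections import defaultdict

def group_by_id(lines):
    """Collect the second field of every row under its first field (the ID)."""
    groups = defaultdict(list)  # missing keys start out as an empty list
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        groups[fields[0]].append(fields[1])
    return groups

# Made-up sample rows, deliberately not sorted by ID.
sample = ["1 5", "2 15", "1 4", "3 50", "2 18"]
groups = group_by_id(sample)

# All processing happens in one place, after the file has been read.
for key, values in groups.items():
    print(key, values)
```

The values are left as strings here, as in the question's code; convert them with int() or float() if numeric processing is needed.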


2012-11-30 12:25:11

With loadtxt() it could look like this:

from numpy import loadtxt, unique

data = loadtxt("data.txt")
ids = unique(data[:, 0]).astype(int)

for id in ids:
    d = data[data[:, 0] == id]
    # d is a reduced matrix containing only the rows for this id
    # .......
    # do some stuff with d

For the example data, printing d (together with the id) gives:

id= 1 
d= 
[[ 1. 5.] 
[ 1. 4.] 
[ 1. 10.] 
[ 1. 19.]] 
id= 2 
d= 
[[ 2. 15.] 
[ 2. 18.] 
[ 2. 20.] 
[ 2. 21.]] 
id= 3 
d= 
[[ 3. 50.] 
[ 3. 52.] 
[ 3. 55.] 
[ 3. 70.]]


2012-11-30 12:36:54 Tengis
