2011-07-12

In Python, how do I parse a file into lists based on a specific value?

I have a large tab-delimited text file; let's call it john_file. For example:

1 john1 23 54 54
2 john2 34 45 66
3 john3 35 43 54
4 john2 34 54 78
5 john1 12 34 65
6 john3 34 55 66

What is a quick way to parse this file into 3 lists based on the name (john1, john2 or john3)?

fh = open('john_file.txt', 'r').readlines()
john1_list = []
for i in fh:
    if i.split('\t')[1] == "john1":
        john1_list.append(i)

Thanks in advance.


Haven't you already answered your own question? Your solution looks fast. –


@Christian: Thanks for the quick response. With this example code I would have to write 3 loops. My real file runs from john1 to john30, so I'm looking for a more concise way. – user839145

Answers

from collections import defaultdict

d = defaultdict(list)

with open('john_file.txt') as f:
    for line in f:
        fields = line.split('\t')
        d[fields[1]].append(line)

The individual lists are then in d['john1'], d['john2'], and so on.
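Since the file is tab-delimited, a variant of the same idea could let the standard csv module do the splitting, which also keeps the trailing newline out of the last field. This is a minimal sketch, assuming the name is always in the second column; group_by_name is a hypothetical helper name, and io.StringIO stands in for the real john_file.txt:

```python
import csv
import io
from collections import defaultdict


def group_by_name(f):
    """Group the rows of a tab-delimited file object by their second column."""
    groups = defaultdict(list)
    for row in csv.reader(f, delimiter='\t'):
        if len(row) < 2:
            # skip blank or malformed lines (csv.reader yields [] for a blank line)
            continue
        groups[row[1]].append(row)
    return groups


# stand-in for open('john_file.txt', newline='')
sample = io.StringIO(
    "1\tjohn1\t23\t54\t54\n"
    "2\tjohn2\t34\t45\t66\n"
    "3\tjohn3\t35\t43\t54\n"
    "4\tjohn2\t34\t54\t78\n"
    "5\tjohn1\t12\t34\t65\n"
    "6\tjohn3\t34\t55\t66\n"
)
d = group_by_name(sample)
print(d['john1'])  # [['1', 'john1', '23', '54', '54'], ['5', 'john1', '12', '34', '65']]
```

With a real file, open it with newline='' as the csv documentation recommends. Each d[name] then holds lists of string fields rather than raw lines.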


You could do it like this:

fh = open('john_file.txt', 'r').readlines()
john_lists = {}
for i in fh:
    j = i.split('\t')[1]
    # create the list the first time a name is seen
    if j not in john_lists:
        john_lists[j] = []
    john_lists[j].append(i)

The advantage is that this doesn't depend on knowing the possible values of the second column in advance.

As others have pointed out, you could also use a defaultdict:

from collections import defaultdict

fh = open('john_file.txt', 'r').readlines()
john_lists = defaultdict(list)
for i in fh:
    j = i.split('\t')[1]
    john_lists[j].append(i)

If you make 'john_lists' a 'collections.defaultdict(list)', you won't need the if statement. –
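With a plain dict, dict.setdefault also removes the if check: it returns the existing list for a key, or inserts and returns a new empty one. A minimal sketch, with a few sample lines inlined instead of read from john_file.txt:

```python
# sample lines as they would come from open('john_file.txt')
lines = [
    "1\tjohn1\t23\t54\t54\n",
    "2\tjohn2\t34\t45\t66\n",
    "5\tjohn1\t12\t34\t65\n",
]

john_lists = {}
for line in lines:
    name = line.split('\t')[1]
    # setdefault inserts [] only the first time a name is seen
    john_lists.setdefault(name, []).append(line)

print(sorted(john_lists))  # ['john1', 'john2']
```

One small cost: the [] argument is built on every call even when the key already exists, which is one reason defaultdict(list) is usually preferred in a loop like this.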

>>> from collections import defaultdict 
>>> a = defaultdict(list) 
>>> for line in '''1 john1 23 54 54 
... 2 john2 34 45 66 
... 3 john3 35 43 54 
... 4 john2 34 54 78 
... 5 john1 12 34 65 
... 6 john3 34 55 66 
... '''.split('\n'): 
...     data = filter(None, line.split())
...     if data:
...         a[data[1]].append(data)
... 
>>> data 
[] 
>>> a 
defaultdict(<type 'list'>, {'john1': [['1', 'john1', '23', '54', '54'], ['5', 'john1', '12', '34', '65']], 'john2': [['2', 'john2', '34', '45', '66'], ['4', 'john2', '34', '54', '78']], 'john3': [['3', 'john3', '35', '43', '54'], ['6', 'john3', '34', '55', '66']]}) 
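The session above is Python 2. In Python 3, filter() returns a lazy iterator, which is always truthy and cannot be indexed, so "if data:" and data[1] no longer behave as intended. Since str.split() with no arguments already discards empty strings, a sketch that runs under Python 3 can simply drop the filter call:

```python
from collections import defaultdict

text = """1 john1 23 54 54
2 john2 34 45 66
3 john3 35 43 54
4 john2 34 54 78
5 john1 12 34 65
6 john3 34 55 66
"""

a = defaultdict(list)
for line in text.split('\n'):
    # split() with no argument drops empty strings, so a blank line gives []
    data = line.split()
    if data:
        a[data[1]].append(data)

print(a['john2'])  # [['2', 'john2', '34', '45', '66'], ['4', 'john2', '34', '54', '78']]
```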

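littletable makes this kind of simple slicing and dicing easy. It turns a list of objects into something you can query and pivot by attribute, like an in-memory mini database, but with less overhead than SQLite.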

from collections import namedtuple 
from littletable import Table 

data = """\ 
1 john1 23 54 54 
2 john2 34 45 66 
3 john3 35 43 54 
4 john2 34 54 78 
5 john1 12 34 65 
6 john3 34 55 66""" 

Record = namedtuple("Record", "id name length width height") 
def makeRecord(s): 
    s = s.strip().split() 
    # convert all but name to ints, and build a Record instance 
    return Record(*(ss if i == 1 else int(ss) for i,ss in enumerate(s))) 

# create a table and load it up 
# (if this were CSV data, would be even simpler) 
t = Table("data") 
t.create_index("id", unique=True) 
t.create_index("name") 
t.insert_many(map(makeRecord, data.splitlines())) 

# get a record by unique key 
# (unique indexes return just the single record) 
print(t.id[4])
print()

# get all records matching an indexed value
# (non-unique index retrievals return a new Table)
for d in t.name['john1']:
    print(d)
print()

# dump summary pivot tables
t.pivot('name').dump_counts()
print()

t.create_index('length') 
t.pivot('name length').dump_counts() 

Prints:

Record(id=4, name='john2', length=34, width=54, height=78) 

Record(id=1, name='john1', length=23, width=54, height=54) 
Record(id=5, name='john1', length=12, width=34, height=65) 

Pivot: name 
john1  2 
john2  2 
john3  2 

Pivot: name,length 
      12  23  34  35 Total 
john1  1  1  0  0  2 
john2  0  0  2  0  2 
john3  0  0  1  1  2 
Total  1  1  3  1  6 
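If adding a third-party package is not an option, the two pivot counts above can be approximated with the standard library's collections.Counter. This is a rough sketch, not littletable's API; it reproduces only the counts, not the formatted table, and assumes length is the third column as in the Record definition:

```python
from collections import Counter

data = """1 john1 23 54 54
2 john2 34 45 66
3 john3 35 43 54
4 john2 34 54 78
5 john1 12 34 65
6 john3 34 55 66"""

rows = [line.split() for line in data.splitlines()]

# counts per name, comparable to pivot('name')
by_name = Counter(row[1] for row in rows)
# counts per (name, length) pair, comparable to pivot('name length')
by_name_length = Counter((row[1], int(row[2])) for row in rows)

print(sorted(by_name.items()))  # [('john1', 2), ('john2', 2), ('john3', 2)]
print(by_name_length[('john2', 34)])  # 2
```

A Counter returns 0 for missing pairs, which matches the zero cells in the pivot output.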