2015-11-16 75 views
1

Load a list of lists from a file based on a grouping variable?

If I have a file:

A pgm1 
A pgm2 
A pgm3 
Z pgm4 
Z pgm5 
C pgm6 
C pgm7 
C pgm8 
C pgm9 

How do I create the list:

[['pgm1','pgm2','pgm3'],['pgm4','pgm5'],['pgm6','pgm7','pgm8','pgm9']] 

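I need to preserve the original order from the load file, so [pgm4, pgm5] must be the second sublist.

My preference is for a new sublist to start whenever the grouping variable changes from the previous line, as in "A, Z, C". However, I can accept it if the grouping variable has to be sequential, i.e. "1, 2, 3".

(This is to support running the programs within each sublist concurrently, while waiting for all upstream programs to finish before moving on to the next list.)

I am using Python 2.6.6 on RHEL 2.6.32.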
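The execution model described in the question could be sketched roughly as follows. This is only an illustration: the question does not say how the programs are launched, so the use of `subprocess.Popen`, the helper names `run_batches` and `fake_pgm`, and the stand-in commands are all assumptions.

```python
import subprocess
import sys


def run_batches(batches):
    """Run each batch's commands concurrently; finish a batch before starting the next."""
    exit_codes = []
    for batch in batches:
        # Start every program in this batch without waiting for it.
        procs = [subprocess.Popen(cmd) for cmd in batch]
        # Block until all of them have exited before moving to the next batch.
        exit_codes.append([p.wait() for p in procs])
    return exit_codes


def fake_pgm(code):
    """Hypothetical stand-in for a real program: a one-liner that exits with `code`."""
    return [sys.executable, "-c", "import sys; sys.exit(%d)" % code]


batches = [
    [fake_pgm(0), fake_pgm(0), fake_pgm(0)],  # first sublist: runs concurrently
    [fake_pgm(0), fake_pgm(3)],               # second sublist: starts after the first ends
]
results = run_batches(batches)
```

Collecting the exit codes per batch makes it possible to stop before the next batch if an upstream program failed; whether that is desirable depends on the actual jobs.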

+0

Could you please show what you have tried so far? – styvane

+0

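I searched the web for more than an hour before posting, using terms like "Python file list of lists". What stumped me was how to detect when the group changes. That said, in future I will do my best to include sample code of what I have tried in all my SO posts. – Scott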

Answers

0

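After my original post, further web searching turned up this: How do I use Python's itertools.groupby()?

This is my current approach. Please let me know if I can make it more Pythonic.

loadfile1.txt (no grouping variable; same output as loadfile4.txt):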

pgm1 
pgm2 
pgm3 

pgm4 
pgm5 

pgm6 
pgm7 
pgm8 
/a/path/with spaces/pgm9 

loadfile2.txt (random grouping variables):

10, pgm1 
10, pgm2 
10, pgm3 

ZZ, pgm4 
ZZ, pgm5 

-5, pgm6 
-5, pgm7 
-5, pgm8 
-5, /a/path/with spaces/pgm9 

loadfile3.txt (same grouping variable, no dependencies, multi-threaded):

,pgm1 
,pgm2 
,pgm3 

,pgm4 
,pgm5 

,pgm6 
,pgm7 
,pgm8 
,/a/path/with spaces/pgm9 

loadfile4.txt (different grouping variables, dependencies, single-threaded):

1, pgm1 
2, pgm2 
3, pgm3 

4, pgm4 
5, pgm5 

6, pgm6 
7, pgm7 
8, pgm8 
9, /a/path/with spaces/pgm9 

My Python script:

#!/usr/bin/python

# See https://stackoverflow.com/questions/4842057/python-easiest-way-to-ignore-blank-lines-when-reading-a-file

from itertools import groupby

filename = 'loadfile2.txt'

# convert file to list of lines, ignoring any blank lines
# (a list comprehension reads everything while the file is still open)
with open(filename) as f_in:
    lines = [line.rstrip() for line in f_in if line.strip()]

print(lines)

# convert list to a list of lists, splitting on the first comma only
lines = [i.split(',', 1) for i in lines]
print(lines)

# create list of lists based on the key value (first item in sub-lists);
# groupby starts a new group whenever the key changes between consecutive lines
listofpgms = []
for key, group in groupby(lines, lambda x: x[0]):
    pgms = []
    for pgm in group:
        try:
            pgms.append(pgm[1].strip())
        except IndexError:
            # no comma on this line: the only field is the program itself
            pgms.append(pgm[0].strip())
    listofpgms.append(pgms)

print(listofpgms)

Output when using loadfile2.txt:

['10, pgm1', '10, pgm2', '10, pgm3', 'ZZ, pgm4', 'ZZ, pgm5', '-5, pgm6', '-5, pgm7', '-5, pgm8', '-5, /a/path/with spaces/pgm9'] 
[['10', ' pgm1'], ['10', ' pgm2'], ['10', ' pgm3'], ['ZZ', ' pgm4'], ['ZZ', ' pgm5'], ['-5', ' pgm6'], ['-5', ' pgm7'], ['-5', ' pgm8'], ['-5', ' /a/path/with spaces/pgm9']] 
[['pgm1', 'pgm2', 'pgm3'], ['pgm4', 'pgm5'], ['pgm6', 'pgm7', 'pgm8', '/a/path/with spaces/pgm9']] 
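For reuse, the same groupby idea might be wrapped in a function along these lines. This is a sketch rather than the script above: the names `split_line` and `load_groups` are made up for illustration, keys are stripped of whitespace (the script does not do that), and a line without a comma uses the program name as its own key, mirroring the script's IndexError fallback.

```python
from itertools import groupby


def split_line(line):
    """Return (key, program); a line without a comma is keyed by the program itself."""
    key, sep, pgm = line.partition(",")
    if not sep:
        name = line.strip()
        return name, name
    return key.strip(), pgm.strip()


def load_groups(lines):
    """Group consecutive programs that share a key, preserving file order."""
    pairs = [split_line(line) for line in lines if line.strip()]
    return [[pgm for _, pgm in group]
            for _, group in groupby(pairs, key=lambda pair: pair[0])]


sample = ["10, pgm1\n", "10, pgm2\n", "\n", "ZZ, pgm4\n",
          "-5, /a/path/with spaces/pgm9\n"]
groups = load_groups(sample)
```

Because groupby only compares neighbouring keys, a key that reappears later in the file starts a new sublist instead of being merged into the earlier one, which matches the "trigger on change" behaviour asked for in the question.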
1

Just use collections.defaultdict()

Code:

import collections 
d = collections.defaultdict(list) 

infile = 'filename' 
with open(infile) as f: 
    a = [i.strip() for i in f] 

a = [i.split() for i in a] 

for key, value in a: 
    d[key].append(value) 

l = list(d.values()) 

Demo:

>>> import collections 
>>> d = collections.defaultdict(list) 

>>> infile = 'filename' 
>>> with open(infile) as f: 
...  a = [i.strip() for i in f] 

>>> a = [i.split() for i in a] 
>>> a 
[['A', 'pgm1'], ['A', 'pgm2'], ['A', 'pgm3'], ['Z', 'pgm4'], ['Z', 'pgm5'], ['C', 'pgm6'], ['C', 'pgm7'], ['C', 'pgm8'], ['C', 'pgm9']] 

>>> for key, value in a: 
...  d[key].append(value) 

>>> d 
defaultdict(<class 'list'>, {'A': ['pgm1', 'pgm2', 'pgm3'], 'C': ['pgm6', 'pgm7', 'pgm8', 'pgm9'], 'Z': ['pgm4', 'pgm5']}) 

>>> d.values() 
dict_values([['pgm1', 'pgm2', 'pgm3'], ['pgm6', 'pgm7', 'pgm8', 'pgm9'], ['pgm4', 'pgm5']]) 

>>> list(d.values()) 
[['pgm1', 'pgm2', 'pgm3'], ['pgm6', 'pgm7', 'pgm8', 'pgm9'], ['pgm4', 'pgm5']] 
>>> 

The code below does the same thing as the code above, but preserves the order:

infile = 'filename' 
with open(infile) as f: 
    a = [i.strip() for i in f] 

a = [i.split() for i in a] 

def orderset(seq): 
    seen = set() 
    seen_add = seen.add 
    return [ x for x in seq if not (x in seen or seen_add(x))] 

l = [] 
for i in orderset([i[0] for i in a]): 
    l.append([j[1] for j in a if j[0] == i]) 
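The order-preserving version above rescans the whole list once per key. A single-pass variant could look like the sketch below; the function name `group_in_first_seen_order` is made up for illustration, and it relies only on a plain dict and a list, so it should not need OrderedDict.

```python
def group_in_first_seen_order(pairs):
    """Group values by key, ordering groups by each key's first appearance."""
    groups = {}  # key -> list of values
    order = []   # keys in the order they were first seen
    for key, value in pairs:
        if key not in groups:
            groups[key] = []
            order.append(key)
        groups[key].append(value)
    return [groups[key] for key in order]


pairs = [("A", "pgm1"), ("A", "pgm2"), ("Z", "pgm4"), ("C", "pgm6"), ("Z", "pgm5")]
result = group_in_first_seen_order(pairs)
# result == [['pgm1', 'pgm2'], ['pgm4', 'pgm5'], ['pgm6']]
```

Note the design difference from itertools.groupby: here a key that reappears later ("Z" in the sample) is merged into its earlier sublist, whereas groupby would start a new sublist on every change of key.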
+0

I need to preserve the original order of the load file, so pgm4, pgm5 need to be the second sublist. I am using Python 2.6.6 on RHEL 2.6.32, so I don't have OrderedDict. – Scott

+0

@Scott: Hmm... let me edit my answer, please wait a moment... –

+0

@Scott: OK, done. Hope this helps :) –