Load a list of lists from a file based on a grouping variable?

If I have a file:

A pgm1 
A pgm2 
A pgm3 
Z pgm4 
Z pgm5 
C pgm6 
C pgm7 
C pgm8 
C pgm9

How can I create the list:

[['pgm1','pgm2','pgm3'],['pgm4','pgm5'],['pgm6','pgm7','pgm8','pgm9']]

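I need to preserve the original order from the load file, so ['pgm4', 'pgm5'] must be the second sublist.

My preference is for a new sublist to be triggered whenever the grouping variable changes from the previous one, e.g. "A, Z, C". But I can accept it if the grouping variable has to be sequential, e.g. "1, 2, 3".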
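The "start a new sublist when the grouping variable changes" behaviour described above can be sketched with a plain loop. This is only an illustration, not code from the thread: the function name `group_on_change` and the in-memory `pairs` list are assumptions, and the real input would come from the load file.

```python
def group_on_change(pairs):
    """Split (key, value) pairs into sublists, opening a new one whenever the key changes."""
    groups = []
    previous_key = object()  # sentinel that never equals a real key
    for key, value in pairs:
        if key != previous_key:
            groups.append([])      # key changed: start a new sublist
            previous_key = key
        groups[-1].append(value)
    return groups


# Hypothetical in-memory version of the file shown above.
pairs = [('A', 'pgm1'), ('A', 'pgm2'), ('A', 'pgm3'),
         ('Z', 'pgm4'), ('Z', 'pgm5'),
         ('C', 'pgm6'), ('C', 'pgm7'), ('C', 'pgm8'), ('C', 'pgm9')]
print(group_on_change(pairs))
# [['pgm1', 'pgm2', 'pgm3'], ['pgm4', 'pgm5'], ['pgm6', 'pgm7', 'pgm8', 'pgm9']]
```

Because only the previous key is compared, the keys do not need to be sorted or sequential, and a key that reappears later starts a fresh sublist.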

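(This is to support running the programs within each sublist concurrently, while waiting for all upstream programs to finish before moving on to the next list.)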
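As a rough sketch of that intended use, assuming threads are an acceptable way to get concurrency: each sublist is started together and joined before the next one begins. `run_in_batches` and `fake_run` are hypothetical names; `fake_run` only records the program name and stands in for whatever actually launches a program.

```python
import threading


def run_in_batches(batches, run_one):
    """Run every item of a batch in its own thread; finish the whole batch before the next."""
    for batch in batches:
        threads = [threading.Thread(target=run_one, args=(pgm,)) for pgm in batch]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # block until every program in this batch is done


finished = []
lock = threading.Lock()


def fake_run(pgm):
    # Hypothetical stand-in for launching a real program.
    with lock:
        finished.append(pgm)


run_in_batches([['pgm1', 'pgm2', 'pgm3'], ['pgm4', 'pgm5']], fake_run)
print(finished)
```

The order inside a batch is not guaranteed, but every entry of the first batch is recorded before any entry of the second.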

I am using Python 2.6.6 on RHEL 2.6.32.

Source

2015-11-16 Scott

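Could you tell us what you have tried so far? – styvane

I did a web search, and searched SO for more than an hour before posting, for "Python file to list of lists". What stumped me was how to detect when the group changes. That said, in future I will do my best to include the example code I have tried as part of all SO posts. – Scott

After my original post, further web searching turned up this: How do I use Python's itertools.groupby()?

Here is my current approach. Please let me know if I can make it more Pythonic.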

loadfile1.txt (no grouping variable - same output as loadfile4.txt):

pgm1 
pgm2 
pgm3 

pgm4 
pgm5 

pgm6 
pgm7 
pgm8 
/a/path/with spaces/pgm9

loadfile2.txt (random grouping variable):

10, pgm1 
10, pgm2 
10, pgm3 

ZZ, pgm4 
ZZ, pgm5 

-5, pgm6 
-5, pgm7 
-5, pgm8 
-5, /a/path/with spaces/pgm9

loadfile3.txt (same grouping variable - no dependencies - multi-threaded):

,pgm1 
,pgm2 
,pgm3 

,pgm4 
,pgm5 

,pgm6 
,pgm7 
,pgm8 
,/a/path/with spaces/pgm9

loadfile4.txt (different grouping variables - dependencies - single-threaded):

1, pgm1 
2, pgm2 
3, pgm3 

4, pgm4 
5, pgm5 

6, pgm6 
7, pgm7 
8, pgm8 
9, /a/path/with spaces/pgm9

My Python script:

#!/usr/bin/python

# See https://stackoverflow.com/questions/4842057/python-easiest-way-to-ignore-blank-lines-when-reading-a-file

from itertools import groupby

# convert file to list of lines, ignoring any blank lines
filename = 'loadfile2.txt'

with open(filename) as f_in:
    # list() keeps this working on Python 3, where filter() is lazy
    lines = list(filter(None, (line.rstrip() for line in f_in)))

print(lines)

# convert list to a list of lists split on comma
lines = [i.split(',') for i in lines]
print(lines)

# create list of lists based on the key value (first item in sub-lists)
listofpgms = []
for key, group in groupby(lines, lambda x: x[0]):
    pgms = []
    for pgm in group:
        try:
            pgms.append(pgm[1].strip())
        except IndexError:
            # no comma on this line: the only field is the program itself
            pgms.append(pgm[0].strip())

    listofpgms.append(pgms)

print(listofpgms)

Output when using loadfile2.txt:

['10, pgm1', '10, pgm2', '10, pgm3', 'ZZ, pgm4', 'ZZ, pgm5', '-5, pgm6', '-5, pgm7', '-5, pgm8', '-5, /a/path/with spaces/pgm9'] 
[['10', ' pgm1'], ['10', ' pgm2'], ['10', ' pgm3'], ['ZZ', ' pgm4'], ['ZZ', ' pgm5'], ['-5', ' pgm6'], ['-5', ' pgm7'], ['-5', ' pgm8'], ['-5', ' /a/path/with spaces/pgm9']] 
[['pgm1', 'pgm2', 'pgm3'], ['pgm4', 'pgm5'], ['pgm6', 'pgm7', 'pgm8', '/a/path/with spaces/pgm9']]
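One possible tightening of the groupby approach, offered as a sketch rather than a definitive answer: parse each line into a (key, program) pair first, then let `groupby` do the splitting. The helper names `parse_line` and `load_groups` are made up for this example, and the input is an in-memory list of lines rather than a file. Using `str.partition` splits only on the first comma, so a path that itself contains a comma stays intact.

```python
from itertools import groupby
from operator import itemgetter


def parse_line(line):
    """Return (key, program). Without a comma the whole line serves as both key and program."""
    key, sep, pgm = line.partition(',')
    if not sep:
        return line.strip(), line.strip()
    return key.strip(), pgm.strip()


def load_groups(lines):
    """Group programs by runs of equal keys, skipping blank lines."""
    pairs = [parse_line(line) for line in lines if line.strip()]
    return [[pgm for _k, pgm in group]
            for _key, group in groupby(pairs, itemgetter(0))]


# Hypothetical in-memory copy of loadfile2.txt.
lines = ['10, pgm1', '10, pgm2', '10, pgm3', '',
         'ZZ, pgm4', 'ZZ, pgm5', '',
         '-5, pgm6', '-5, pgm7', '-5, pgm8', '-5, /a/path/with spaces/pgm9']
print(load_groups(lines))
# [['pgm1', 'pgm2', 'pgm3'], ['pgm4', 'pgm5'], ['pgm6', 'pgm7', 'pgm8', '/a/path/with spaces/pgm9']]
```

`groupby` only merges consecutive equal keys, which matches the "new sublist whenever the grouping variable changes" requirement and keeps file order.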

Source

2015-11-17 00:39:26 Scott

Just use collections.defaultdict().

Code:

import collections 
d = collections.defaultdict(list) 

infile = 'filename' 
with open(infile) as f: 
    a = [i.strip() for i in f] 

a = [i.split() for i in a] 

for key, value in a: 
    d[key].append(value) 

l = list(d.values())

Demo:

>>> import collections 
>>> d = collections.defaultdict(list) 

>>> infile = 'filename' 
>>> with open(infile) as f: 
...  a = [i.strip() for i in f] 

>>> a = [i.split() for i in a] 
>>> a 
[['A', 'pgm1'], ['A', 'pgm2'], ['A', 'pgm3'], ['Z', 'pgm4'], ['Z', 'pgm5'], ['C', 'pgm6'], ['C', 'pgm7'], ['C', 'pgm8'], ['C', 'pgm9']] 

>>> for key, value in a: 
...  d[key].append(value) 

>>> d 
defaultdict(<class 'list'>, {'A': ['pgm1', 'pgm2', 'pgm3'], 'C': ['pgm6', 'pgm7', 'pgm8', 'pgm9'], 'Z': ['pgm4', 'pgm5']}) 

>>> d.values() 
dict_values([['pgm1', 'pgm2', 'pgm3'], ['pgm6', 'pgm7', 'pgm8', 'pgm9'], ['pgm4', 'pgm5']]) 

>>> list(d.values()) 
[['pgm1', 'pgm2', 'pgm3'], ['pgm6', 'pgm7', 'pgm8', 'pgm9'], ['pgm4', 'pgm5']] 
>>>

The code below does the same thing as the code above, but preserves the order:

infile = 'filename' 
with open(infile) as f: 
    a = [i.strip() for i in f] 

a = [i.split() for i in a] 

def orderset(seq): 
    seen = set() 
    seen_add = seen.add 
    return [ x for x in seq if not (x in seen or seen_add(x))] 

l = [] 
for i in orderset([i[0] for i in a]): 
    l.append([j[1] for j in a if j[0] == i])
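The same order-preserving result can also be obtained in a single pass, without OrderedDict, by tracking first-seen keys in a separate list. This is a sketch under the assumption that every line has already been split into a (key, value) pair; `group_by_first_seen` is a hypothetical name.

```python
def group_by_first_seen(pairs):
    """Group values by key, returning the groups in the order each key first appeared."""
    groups = {}   # key -> list of values
    order = []    # keys in first-seen order
    for key, value in pairs:
        if key not in groups:
            groups[key] = []
            order.append(key)
        groups[key].append(value)
    return [groups[key] for key in order]


pairs = [('A', 'pgm1'), ('Z', 'pgm4'), ('A', 'pgm2'), ('C', 'pgm6')]
print(group_by_first_seen(pairs))
# [['pgm1', 'pgm2'], ['pgm4'], ['pgm6']]
```

Note the difference from itertools.groupby: here a key that reappears later is merged into its existing sublist, whereas groupby would start a new one.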

Source

2015-11-16 06:03:10

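I need to preserve the original order of the load file, so pgm4, pgm5 need to be the second sublist. I am using Python 2.6.6 on RHEL 2.6.32, so I don't have OrderedDict. – Scott

@Scott: Hmm... let me edit my answer, please wait... –

@Scott: OK, done. Hope this helps :) –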
