2014-11-21 316 views

Create a sublist from a list

I have a training_list, which is a list of lists, for example:

[[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'],[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'], 
[1,2,3,4,5,6,7,8,9,10,11,12,13,'>50k'],[1,2,3,4,5,6,7,8,9,10,11,12,13,'>50k'], 
[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'],[1,2,3,4,5,6,7,8,9,10,11,12,13,'>50k'], 
... 
] 

I want to split this list into two sublists based on the last attribute. The first, under_50k, should be a list of lists containing all the <50k records, e.g.,

[[1,2,3,4,5,6,7,8,9,10,11,12,13], [1,2,3,4,5,6,7,8,9,10,11,12,13], ...] 

The second, over_50k, should be a list of lists containing all the >50k records, e.g.,

[[1,2,3,4,5,6,7,8,9,10,11,12,13], [1,2,3,4,5,6,7,8,9,10,11,12,13], ...] 

Once the two lists are created, I am trying to add up the lists index by index, for example:

[1,2,3,4,5,6,7,8,9,10,11,12,13] + [1,2,3,4,5,6,7,8,9,10,11,12,13] 
= [2,4,6,8,10,12,14,16,18,20,22,24,26] 
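The index-by-index addition in this example maps directly onto `zip`, which pairs up the elements at the same position. A minimal sketch; `add_lists` is a hypothetical helper name, and it assumes both lists are numeric and the same length (`zip` silently stops at the end of the shorter list).

```python
def add_lists(a, b):
    """Add two equal-length lists element by element."""
    # zip pairs a[0] with b[0], a[1] with b[1], and so on.
    return [x + y for x, y in zip(a, b)]


row = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
print(add_lists(row, row))
# [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26]
```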

I can't seem to get the breakdown of the lists working:

def sums_list():

    sums_list = []
    try:
        for index in range(15):
            sums_list.append(under_50k_list[index]+over_50k_list[index])
    except:
        pass
        return(sums_list)

def under_over_lists():

    under_50k_list = [0]*14
    under_50k_count = 0
    over_50k_list = [0]*14
    over_50k_count = 0
    try:
        for row in training_list:
            if row[-1].lstrip() == '<=50K':
                under_50k_list = sums_list(under_50k_list, row[:-1])
                under_50k_count += 1
            else:
                if row[-1].lstrip() == '>50K':
                    over_50k_list = sums_list(over_50k_list, row[:-1])
                    over_50k_count += 1
    except:
        pass
        print(under_50k_list)
        return under_over_lists

Any help would be greatly appreciated. Thanks – saggart 2014-11-21 14:33:56


You should add more tags, e.g. which programming language this is. – user1438038 2014-11-21 14:35:46


Sorry, I'm new to Stack Overflow; it's Python. – saggart 2014-11-21 14:41:49
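Why the code in the question silently does nothing: `sums_list` is defined with no parameters but called as `sums_list(under_50k_list, row[:-1])`, which raises a `TypeError`; the bare `except: pass` swallows that error, so the totals stay at `[0]*14`; and, as indented, `print` and `return` sit inside the `except` block, with `return under_over_lists` returning the function itself rather than the totals. Below is a minimal sketch of the same accumulate-as-you-go approach with those problems removed. It assumes the label is the last element and may carry surrounding whitespace (as in `' <=50K'`), and, as an assumption based on the sample rows posted later in this thread, that missing values appear as `None` and should count as 0. The names `add_rows` and `under_over_sums` are hypothetical.

```python
def add_rows(totals, row):
    """Add row to totals element by element; None is treated as 0 (assumption)."""
    return [t + (0 if v is None else v) for t, v in zip(totals, row)]


def under_over_sums(training_list):
    """Return (under_sums, under_count, over_sums, over_count)."""
    if not training_list:
        return [], 0, [], 0
    width = len(training_list[0]) - 1  # all columns except the label
    under_sums, under_count = [0] * width, 0
    over_sums, over_count = [0] * width, 0
    for row in training_list:
        label = row[-1].strip()  # ' <=50K' -> '<=50K'
        if label == '<=50K':
            under_sums = add_rows(under_sums, row[:-1])
            under_count += 1
        elif label == '>50K':
            over_sums = add_rows(over_sums, row[:-1])
            over_count += 1
    return under_sums, under_count, over_sums, over_count


data = [[1, None, 3, ' <=50K'], [4, 5, 6, ' >50K'], [7, 8, None, ' <=50K']]
print(under_over_sums(data))
# ([8, 8, 3], 2, [4, 5, 6], 1)
```

Without the bare `except`, an unexpected value (for example a stray string in a numeric column) raises a visible error instead of being ignored.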

Answers

0

You can use numpy if every sublist in the list is the same size:

>>> import numpy as np 
>>> llist=[[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'],[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'], [1,2,3,4,5,6,7,8,9,10,11,12,13,'>50k'],[1,2,3,4,5,6,7,8,9,10,11,12,13,'>50k'],[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'],[1,2,3,4,5,6,7,8,9,10,11,12,13,'>50k']] 
>>> under_50k_list=[i[:-1] for i in llist if i[-1]=='<50k'] 
>>> under_50k_list 
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]] 
>>> over_50k_list=[i[:-1] for i in llist if i[-1]=='>50k'] 
>>> over_50k_list 
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]] 
>>> sum(np.array(under_50k_list)) 
array([ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39]) 
>>> under_50k_sum=sum(np.array(under_50k_list)) 
>>> over_50k_sum=sum(np.array(over_50k_list)) 
>>> under_50k_sum 
array([ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39]) 
>>> over_50k_sum 
array([ 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39]) 
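A variant that stays in NumPy: split off the label column once, build a float array from the remaining columns, and select rows with a boolean mask. `.sum(axis=0)` sums down the columns, which is what the built-in `sum` over a 2-D array does above. A sketch on a small made-up list, assuming every row has the same length and all non-label values are numeric.

```python
import numpy as np

llist = [[1, 2, 3, '<50k'], [1, 2, 3, '<50k'], [1, 2, 3, '>50k']]

labels = np.array([row[-1] for row in llist])                # shape (n,)
values = np.array([row[:-1] for row in llist], dtype=float)  # shape (n, k)

# labels == '<50k' is a boolean array that picks out the matching rows
under_50k_sum = values[labels == '<50k'].sum(axis=0)
over_50k_sum = values[labels == '>50k'].sum(axis=0)
print(under_50k_sum.tolist())  # [2.0, 4.0, 6.0]
print(over_50k_sum.tolist())   # [1.0, 2.0, 3.0]
```

If some values are `None`, as in the asker's real data, `dtype=float` turns them into `nan`, and `np.nansum(values[mask], axis=0)` would treat those entries as 0.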
0

You should use the append method to get your breakdown working. I find it friendlier for dynamically sized lists.

over_50k = []
under_50k = []
for row in training_list:
    if row[-1] == "<50k":
        under_50k.append(row[:-1])
    elif row[-1] == ">50k":
        over_50k.append(row[:-1])
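The same append-based split generalizes to any number of labels by grouping rows into a dictionary keyed on the label. A sketch using `collections.defaultdict`; `split_by_label` is a hypothetical name, and stripping the label is an assumption meant to cover values with a leading space such as `' <=50K'`.

```python
from collections import defaultdict


def split_by_label(rows):
    """Group rows by their last element, dropping the label from each row."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[-1].strip()].append(row[:-1])
    return dict(groups)


training_list = [[1, 2, '<50k'], [3, 4, '>50k'], [5, 6, ' <50k']]
groups = split_by_label(training_list)
under_50k = groups.get('<50k', [])
over_50k = groups.get('>50k', [])
print(under_50k)  # [[1, 2], [5, 6]]
print(over_50k)   # [[3, 4]]
```

Stripping the label sidesteps the leading-space mismatch that an exact `==` comparison would otherwise hit.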

Now, get your sums:

over_50k_sum = [i for i in over_50k[0]] # initialize with the first one
for i in range(1,len(over_50k)):   # skips the first one
    for j in range(len(over_50k[i])):
        over_50k_sum[j] += over_50k[i][j]

under_50k_sum = [i for i in under_50k[0]] # initialize with the first one
for i in range(1,len(under_50k)):   # skips the first one
    for j in range(len(under_50k[i])):
        under_50k_sum[j] += under_50k[i][j]
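The two summing blocks above are identical apart from the list they read, and both raise `IndexError` when no rows matched (because of `over_50k[0]` / `under_50k[0]`). A minimal sketch folding them into one hypothetical helper, `column_sums`, that returns an empty list when given no rows:

```python
def column_sums(rows):
    """Sum a list of equal-length numeric lists column by column."""
    if not rows:
        return []
    sums = list(rows[0])  # copy, so the first row itself is not modified
    for row in rows[1:]:
        for j, value in enumerate(row):
            sums[j] += value
    return sums


over_50k = [[1, 2, 3], [1, 2, 3]]
under_50k = []
print(column_sums(over_50k))   # [2, 4, 6]
print(column_sums(under_50k))  # []
```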

[37,0.7173543689320389,None,None,9,0.3351132686084142,0.05165857605177993,0.2942961165048544,0.8373381877022654,0.6119741100323625,0,0,45,None,' <=50K'],[46,0.7173543689320389,None,None,13,0.03673139158576052,0.113989838187702265,0.3013349514563107,0.8373381877022654,0.38802588996763754,0,0,25,None,' <=50K'],[44,0.7173543689320389,None,None,9,0.1610032362459547,0.12823624595469255,0.3013349514563107,0.8373381877022654,0.6119741100323625,0,0,40,None,' >50K'], – saggart 2014-11-21 17:57:39


Hi Joe, I tried your code: over_50k = [] under_50k = [] for row in training_list: print("u") if row[-1].lstrip() == '<=50K': print("yes") under_50k.append(row[:-1]) print("b") else: if row[-1].lstrip() == '>50K': over_50k.append(row[:-1]) – saggart 2014-11-21 17:58:33


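It doesn't look like the over and under lists get created. I tried adding print statements below for row in training_list:, below if row[-1].lstrip() == '<=50K':, and also below if row[-1].lstrip() == '>50K':, and they all print. I'm not sure what the problem is; I've included my actual training list. – saggart 2014-11-21 18:02:15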

0

Assuming you only want the results and not the intermediate lists, it's just:

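from functools import reduce  # reduce is only a built-in in Python 2

ll = [[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'], ... ]

# element-wise sum of two lists
sumlist = lambda a,b:[x+y for x,y in zip(a,b)]
def sum_if(lists, key):
    # sum the numeric part of every row whose last element equals key
    return reduce(sumlist, (l[:-1] for l in lists if l[-1]==key))

under_50k_count = sum_if(ll, '<50k')
over_50k_count = sum_if(ll, '>50k')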

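It might be worth importing izip from itertools and using it instead of zip if your lists are long and you want to reduce copying, but it's certainly not required. (In Python 3, the built-in zip is already lazy and itertools.izip no longer exists.)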
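One caveat with `reduce`: called without a starting value on an empty sequence, it raises `TypeError`, which is what happens if no row carries the requested key (for instance, a label spelled differently in the real data). A sketch that passes an initializer of zeros; the extra `width` parameter (the number of numeric columns) and the helper name `add_lists` are hypothetical additions, not part of the answer above.

```python
from functools import reduce


def add_lists(a, b):
    """Element-wise sum of two lists."""
    return [x + y for x, y in zip(a, b)]


def sum_if(lists, key, width):
    """Column sums of the rows whose last element equals key.

    The third argument to reduce is the starting value, so an empty
    selection returns [0] * width instead of raising TypeError.
    """
    selected = (l[:-1] for l in lists if l[-1] == key)
    return reduce(add_lists, selected, [0] * width)


ll = [[1, 2, 3, '<50k'], [1, 2, 3, '>50k'], [1, 2, 3, '<50k']]
print(sum_if(ll, '<50k', 3))  # [2, 4, 6]
print(sum_if(ll, '=50k', 3))  # [0, 0, 0]
```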

0

For the fun of one-liners:

ll = [[1,2,3,4,5,6,7,8,9,10,11,12,13,'<50k'], ...] 
under_50k_count = [sum(b) for b in zip(*[a[:-1] for a in ll if a[-1].startswith('<')])] 
over_50k_count = [sum(b) for b in zip(*[a[:-1] for a in ll if a[-1].startswith('>')])] 

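On its own this isn't a useful Stack Overflow answer, since it falls into the "try this" category with no explanation, so let's break it down a bit.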

We use a list comprehension to separate out the two different kinds of lists:

[a[:-1] for a in ll if a[-1].startswith('<')] 

We then unpack this list and pass it to zip, which gives us tuples, one per column:

[(1,1,1,...), (2,2,2,...), ...] 

Then we use another list comprehension to sum these tuples.
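To see the unpacking and the final sum on a small, made-up input: `zip(*rows)` is equivalent to `zip(rows[0], rows[1], rows[2])`, so it yields one tuple per column. In Python 3, `zip` returns an iterator, hence the `list(...)` when displaying it.

```python
rows = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]

# unpack the rows as separate arguments to zip: one tuple per column
columns = list(zip(*rows))
print(columns)  # [(1, 10, 100), (2, 20, 200), (3, 30, 300)]

# sum each column tuple
column_sums = [sum(c) for c in columns]
print(column_sums)  # [111, 222, 333]
```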

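List comprehensions are noticeably fast; unpacking a list is not (especially a large one). So while squashing things together like this is fun, I wouldn't recommend it if speed is at all a concern, or if there is even a remote chance that someone will inherit your code.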