How to form clusters based on a condition?

Sample input file (the actual input file contains about 50,000 entries):

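I have to compare every value in column 0. Rows with the same value, such as 615, 615, 615, must be grouped together into one cluster, and that cluster must contain the corresponding column 1 values, such as 146, 180, ..., 45, 49. The cluster must then break, and another cluster must form for the next set of identical values, 616, 616, 616, ..., and so on.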

The code I have written is:

from __future__ import division
from sys import exit
h = 0
historyjobs = []
targetjobs = []


def quickzh(zhlistsub,
            targetjobs=targetjobs, num=0, denom=0):

    li = []; ji = []
    j = 0
    for i in zhlistsub:
        x1 = targetjobs[j][0]

        x = targetjobs[i][0]

        num += x
        denom += 1
        if x1 >= 0.9 * (num/denom):  # to group all items with same value in column 0
            li.append(targetjobs[i][1])
        else:
            break
    return li


def filewr(listli):
    global h
    s = open("newout1", "a")
    if(len(listli) != 0):
        h += 1
        s.write("cluster: %d" % h)
        s.write("\n")
        s.write(str(listli))
        s.write("\n\n")
    else:
        print "0"


def new(inputfile,
        historyjobs=historyjobs, targetjobs=targetjobs):
    zhlistsub = []; zhlist = []
    k = 0

    with open(inputfile, 'r') as f:
        for line in f:
            job = map(int, line.split())
            targetjobs.append(job)
    while True:
        if len(targetjobs) != 0:

            zhlistsub = [i for i, element in enumerate(targetjobs)]

            if zhlistsub:
                listrun = quickzh(zhlistsub)
                filewr(listrun)
            historyjobs.append(targetjobs.pop(0))
            k += 1
        else:
            break

new('newfinal1')

The output I get is:

cluster: 1 
[146, 180, 53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53] 

cluster: 2 
[180, 53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53] 

cluster: 3 
[53, 42, 52, 52, 51, 45, 49, 34, 44, 42, 41, 42, 42, 43, 42, 33, 33, 33, 33, 33, 47, 68, 449, 41, 1138, 46, 53] 
... and so on

But I need the output to be:

cluster: 1
[146, 180, 53, 42, 52, 52, 51, 45, 49]

cluster: 2
[34, 44, 42, 41, 42]

cluster: 3
[42, 43, 42]
... and so on

Can anyone suggest what changes I should make to the condition to get the desired result? It would be really helpful.

Source

2013-09-27 jhon cooper

I'm having a really hard time understanding what you need... but for grouping, 'itertools.groupby' or 'collections.defaultdict' is usually the way to go... – mgilson

Try this. groupby takes care of creating the groups, so all that is left to do is build the lists:

import itertools as it 
[[y[1] for y in x[1]] for x in it.groupby(data, key=lambda x:x[0])]

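The above assumes that data holds your input and that it has already been filtered and sorted by the first column. For the example in the question, it looks like this: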

data = [[615, 146], [615, 180], [615, 53] ... ]
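For reference, here is a small runnable sketch of the same groupby idea. The rows below are made up for illustration (only the first three pairs and the key 616 come from the question; 617 and the shortened groups are assumptions), and the helper name build_clusters is hypothetical.

```python
import itertools as it

# Hypothetical sample rows in the form [column0, column1],
# already sorted by column0 as the answer requires.
data = [[615, 146], [615, 180], [615, 53], [616, 34], [616, 44], [617, 42]]


def build_clusters(rows):
    """Collect the column-1 values of consecutive rows sharing a column-0 value."""
    return [[row[1] for row in group]
            for _, group in it.groupby(rows, key=lambda row: row[0])]


clusters = build_clusters(data)
for number, values in enumerate(clusters, start=1):
    print("cluster: %d" % number)
    print(values)
```

Note that groupby only merges adjacent rows with equal keys, which is why the input has to be sorted (or already contiguous) by the first column.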

Source

2013-09-27 03:23:58

Can you suggest a condition to use in place of my if condition 'if x1 >= 0.9 * (num/denom):' that would give the result? –

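My answer helps with building the clusters, but it is still not clear how that condition is supposed to filter the values. I can only suggest that you split the problem into two parts: first filter the input and build a list like 'data' in my example, then build the clusters with the list comprehension above. –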
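The two-step split described in this comment could be sketched roughly as follows. The actual filtering rule is never stated in the thread, so keep_row is a hypothetical placeholder that keeps every row; parse_rows, cluster_rows, and the sample lines are likewise assumptions made for illustration.

```python
import itertools as it


def parse_rows(lines):
    """Turn whitespace-separated text lines into [column0, column1] int pairs."""
    return [[int(field) for field in line.split()]
            for line in lines if line.strip()]


def keep_row(row):
    # Hypothetical placeholder: the real filtering rule is not given
    # in the question, so every row is kept.
    return True


def cluster_rows(rows, predicate=keep_row):
    """Step 1: filter and sort by column 0. Step 2: group with groupby."""
    filtered = sorted((row for row in rows if predicate(row)),
                      key=lambda row: row[0])
    return [[row[1] for row in group]
            for _, group in it.groupby(filtered, key=lambda row: row[0])]


# Made-up sample lines in the "column0 column1" layout.
sample_lines = ["615 146", "615 180", "616 34", "", "616 44"]
print(cluster_rows(parse_rows(sample_lines)))
```

Because sorted is stable, values inside each cluster keep their original file order. Swapping in a real predicate only changes step 1; the grouping step stays the same.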

This answer is untested, but it follows this idea:

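from collections import defaultdict

cluster = defaultdict(list)

inputfile = 'newfinal1'  # file name taken from the question's code

with open(inputfile, 'r') as f:
    for line in f:
        clus, val = line.split()
        cluster[clus].append(val)

# iterate over .items() to get (key, values) pairs, not just the keys
for clus, val in cluster.items():
    print("cluster" + str(clus) + "\n")
    print(str(val) + "\n")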
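One difference between the two suggested approaches is worth seeing side by side: a defaultdict collects every row with the same key no matter where it appears, while groupby starts a new group each time the key changes. The sketch below uses made-up rows and hypothetical helper names to show this.

```python
from collections import defaultdict
import itertools as it


def cluster_with_defaultdict(rows):
    """Group column-1 values under their column-0 key, regardless of position."""
    clusters = defaultdict(list)
    for key, value in rows:
        clusters[key].append(value)
    return dict(clusters)


def cluster_with_groupby(rows):
    """Group only consecutive rows that share the same column-0 key."""
    return [(key, [value for _, value in group])
            for key, group in it.groupby(rows, key=lambda row: row[0])]


# Made-up rows where key 615 appears again after 616.
rows = [(615, 146), (615, 180), (616, 34), (615, 53)]
print(cluster_with_defaultdict(rows))  # {615: [146, 180, 53], 616: [34]}
print(cluster_with_groupby(rows))      # [(615, [146, 180]), (616, [34]), (615, [53])]
```

If the clusters are supposed to break whenever the column 0 value changes, the groupby behaviour is probably the closer match. If all rows with the same value belong together, the defaultdict version avoids the need to sort first.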

Source

2013-09-27 03:32:15 Ananta
