2014-09-04 35 views

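De-duplicate a list of tuples, preferring certain tuples

I have a list of 3-tuples. The first two items (GPS coordinates) are often repeated, while the last item is a score (signal strength).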

[(62.45807, -114.41026, 8), 
(62.45807, -114.41026, 11), 
(62.45807, -114.41026, 18), 
(62.45807, -114.41026, 16), 
(62.45807, -114.41026, 9), 
(62.45785, -114.41003, 23), 
(62.45785, -114.41003, 19), 
(62.45785, -114.41003, 11), 
(62.45785, -114.41003, 17), 
(62.45785, -114.41003, 14), 
(62.45785, -114.41003, 11), 
(62.45785, -114.41003, 15), 
(62.45765, -114.40978, 28), 
(62.45765, -114.40978, 16), 
(62.45765, -114.40978, 10), 
(62.45765, -114.40978, 15), 
(62.45765, -114.40978, 25)] 

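I would like to know how to remove the duplicate GPS coordinates, keeping the tuple with the highest score, so that I end up with this: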

[(62.45807, -114.41026, 18), 
(62.45785, -114.41003, 23), 
(62.45765, -114.40978, 28)] 

How would I do the same thing, but with the average score (rounded), to get something like this:

[(62.45807, -114.41026, 12), 
(62.45785, -114.41003, 16), 
(62.45765, -114.40978, 19)] 
How have you tried to solve this problem? – APerson 2014-09-04 13:15:47

pandas has the functionality you want. There is a similar question here: http://stackoverflow.com/questions/12497402/python-pandas-remove-duplicates-by-columns-a-keeping-the-row-with-the-highest – Vicky 2014-09-04 13:22:55

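How is this question 'too broad', please? I provided sample input and expected output, and described the conditions for getting from one to the other. I also got timely answers. I would like to understand how this question could be improved, for future reference. Thanks. – user3481267 2014-09-04 16:12:50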

Answers

2

Sounds like a job for itertools.groupby:

>>> from itertools import groupby 

Maximum:

>>> [max(g, key=lambda x:x[-1]) for k, g in groupby(data, key= lambda x:x[:2])] 
[(62.45807, -114.41026, 18), 
(62.45785, -114.41003, 23), 
(62.45765, -114.40978, 28)] 

Average:

>>> [a + (round(sum(c for _, _, c in b)/float(len(b))),) 
         for a, b in ((k, list(g)) for k, g in 
              groupby(data, key= lambda x:x[:2]))] 
[(62.45807, -114.41026, 12.0), 
(62.45785, -114.41003, 16.0), 
(62.45765, -114.40978, 19.0)] 
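
One caveat with the snippets above: groupby only merges consecutive items that share a key, so they rely on duplicate coordinates being adjacent, as they are in the sample data. Below is a minimal sketch for input that may not be ordered, which sorts by coordinate first. The helper name `reduce_by_coords` is hypothetical (it is not part of this answer), and `statistics.mean` assumes Python 3.4 or later.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

# Key function returning the (lat, lon) pair of a row.
coords = itemgetter(0, 1)


def reduce_by_coords(data, reducer):
    """Collapse rows sharing (lat, lon) into one row per coordinate.

    Hypothetical helper: `reducer` is any function that turns a list
    of scores into a single value, e.g. max or statistics.mean.
    """
    result = []
    # groupby only merges adjacent equal keys, so sort by the key first.
    for key, group in groupby(sorted(data, key=coords), key=coords):
        scores = [score for _, _, score in group]
        result.append(key + (reducer(scores),))
    return result


# Small unordered sample: duplicates are deliberately not adjacent.
sample = [
    (62.45807, -114.41026, 8),
    (62.45785, -114.41003, 23),
    (62.45807, -114.41026, 18),
    (62.45785, -114.41003, 19),
]

best = reduce_by_coords(sample, max)
average = reduce_by_coords(sample, mean)
```

Sorting means the result comes back in coordinate order rather than in first-seen order; if the original order matters, a dictionary-based approach may be a better fit.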
Thanks! This is concise and does the trick. – user3481267 2014-09-04 16:09:52

0

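You could write a function that maps the values into a dictionary keyed by GPS coordinate, where each value is the list of scores for that coordinate: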

def create_gps_score_dict(gps_score_list):
    # Map each (lat, lon) pair to the list of scores recorded there.
    gps_score_dict = {}
    for gps_score in gps_score_list:
        key = (gps_score[0], gps_score[1])
        if key in gps_score_dict:
            gps_score_dict[key].append(gps_score[2])
        else:
            gps_score_dict[key] = [gps_score[2]]
    return gps_score_dict

Now you can generate the results by looking through this simple dictionary:

def max_gps_scores(gps_score_dict):
    # Build one (lat, lon, best_score) tuple per coordinate pair.
    gps_score_list = []
    for gps, scores in gps_score_dict.items():
        gps_score_list.append((gps[0], gps[1], max(scores)))
    return gps_score_list

>>> gps_score_list=[(62.45807, -114.41026, 8), 
    (62.45807, -114.41026, 11), 
    (62.45807, -114.41026, 18), 
    (62.45807, -114.41026, 16), 
    (62.45807, -114.41026, 9), 
    (62.45785, -114.41003, 23), 
    (62.45785, -114.41003, 19), 
    (62.45785, -114.41003, 11), 
    (62.45785, -114.41003, 17), 
    (62.45785, -114.41003, 14), 
    (62.45785, -114.41003, 11), 
    (62.45785, -114.41003, 15), 
    (62.45765, -114.40978, 28), 
    (62.45765, -114.40978, 16), 
    (62.45765, -114.40978, 10), 
    (62.45765, -114.40978, 15), 
    (62.45765, -114.40978, 25)] 

>>> max_gps_scores(create_gps_score_dict(gps_score_list)) 
[(62.45807, -114.41026, 18), (62.45765, -114.40978, 28), (62.45785, -114.41003, 23)]

I'll leave the average up to you!
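
For readers who want a starting point, here is a minimal sketch of the averaging step in the same dictionary style. The function name `avg_gps_scores` is hypothetical, and rounding to the nearest integer is an assumption taken from the expected output in the question.

```python
from collections import defaultdict


def avg_gps_scores(gps_score_list):
    """Return one (lat, lon, rounded_mean_score) tuple per coordinate.

    Hypothetical helper, not part of the original answer.
    """
    # Collect every score under its (lat, lon) key.
    scores_by_gps = defaultdict(list)
    for lat, lon, score in gps_score_list:
        scores_by_gps[(lat, lon)].append(score)
    # float() keeps the division exact on Python 2 as well as Python 3.
    return [(lat, lon, round(sum(scores) / float(len(scores))))
            for (lat, lon), scores in scores_by_gps.items()]


# Two of the coordinate groups from the question.
sample = [
    (62.45807, -114.41026, 8),
    (62.45807, -114.41026, 11),
    (62.45807, -114.41026, 18),
    (62.45807, -114.41026, 16),
    (62.45807, -114.41026, 9),
    (62.45765, -114.40978, 28),
    (62.45765, -114.40978, 16),
    (62.45765, -114.40978, 10),
    (62.45765, -114.40978, 15),
    (62.45765, -114.40978, 25),
]

averages = avg_gps_scores(sample)
```

The order of the returned list follows the dictionary's iteration order, which is insertion order only on Python 3.7 and later, so sort the result if a stable order is needed.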