2017-02-21 130 views

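KeyError when applying a lambda function to a pandas DataFrame

I am applying K-means clustering to a pandas DataFrame. The cluster assignment function is as follows: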

def assign_to_cluster(row):
    lowest_distance = -1
    closest_cluster = -1

    for cluster_id, centroid in centroids_dict.items():
        df_row = [row['PPG'], row['ATR']]
        euclidean_distance = calculate_distance(centroids, df_row)

        if lowest_distance == -1:
            lowest_distance = euclidean_distance
            closest_cluster = cluster_id
        elif euclidean_distance < lowest_distance:
            lowest_distance = euclidean_distance
            closest_cluster = cluster_id
    return closest_cluster

point_guards['CLUSTER'] = point_guards.apply(lambda row: assign_to_cluster(row), axis=1)

But I get the following error when applying the lambda function:

   1945                 return self._engine.get_loc(key)
   1946             except KeyError:
-> 1947                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   1948
   1949         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4154)() 

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4018)() 

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item  (pandas\hashtable.c:12368)() 

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)() 

KeyError: (0, 'occurred at index 0') 

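Can anyone explain what causes this error and how I can fix it? If you need more information, please reply to this post. Apologies for the formatting; this is my first time asking a question on StackOverflow.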

What does point_guards.head() show? – putonspectacles

See: http://stackoverflow.com/questions/16353729/pandas-how-to-use-apply-function-to-multiple-columns – putonspectacles

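@putonspectacles: point_guards is the name of the pandas DataFrame I am working with. The head() function prints the first 10 rows of the DataFrame. At least, that is what I believe it does. –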

Answer

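As it turns out, I had made a simple mistake: a wrong variable name. Instead of passing the 'centroid' value from 'centroids_dict.items()' when calling the function 'calculate_distance':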

for cluster_id, centroid in centroids_dict.items(): 
    df_row = [row['PPG'],row['ATR']] 
    euclidean_distance = calculate_distance(centroid, df_row) 
.... 

I was passing 'centroids' instead:

for cluster_id, centroid in centroids_dict.items(): 
    df_row = [row['PPG'],row['ATR']] 
    euclidean_distance = calculate_distance(centroids, df_row) 

It is resolved now, though.
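
For anyone who wants to try the fix end to end, here is a minimal, self-contained sketch of the corrected function. The sample rows, the centroid values, and the body of calculate_distance are assumptions made for illustration, since the post does not show them. One plausible reading of the original KeyError is that 'centroids' was some other object defined elsewhere (for example a DataFrame), so indexing it with 0 inside calculate_distance looked up a label that did not exist.

```python
import math

import pandas as pd

# Hypothetical sample data and centroids, for illustration only.
point_guards = pd.DataFrame({"PPG": [10.0, 25.0, 12.0], "ATR": [2.0, 3.5, 2.2]})
centroids_dict = {0: [11.0, 2.1], 1: [24.0, 3.4]}


def calculate_distance(centroid, values):
    """Euclidean distance between two equal-length sequences (assumed helper)."""
    return math.sqrt(sum((c - v) ** 2 for c, v in zip(centroid, values)))


def assign_to_cluster(row):
    """Return the id of the centroid closest to this row's (PPG, ATR) point."""
    df_row = [row["PPG"], row["ATR"]]
    closest_cluster = -1
    lowest_distance = math.inf

    for cluster_id, centroid in centroids_dict.items():
        # Pass the loop variable `centroid`, not some other global name.
        distance = calculate_distance(centroid, df_row)
        if distance < lowest_distance:
            lowest_distance = distance
            closest_cluster = cluster_id
    return closest_cluster


# The function can be passed to apply directly; no lambda wrapper is needed.
point_guards["CLUSTER"] = point_guards.apply(assign_to_cluster, axis=1)
print(point_guards)
```

Starting lowest_distance at math.inf removes the need for the special case that checks for -1 on the first iteration.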