2017-10-12 125 views
1

Python pandas KeyError: 'the label is not in the [index]'

I have the following duplicates_to_fetch DataFrame of indexes:

  mail_domaine   Values 
0     @A.com   [0, 2] 
1     @B.com   [1, 4] 

and the following df_to_rearrange DataFrame:

 movie_name    df_pname 
0    A    [mr a] 
1    B    [mr b] 
2    Aa    [mr aa] 
3    D    [mr d] 
4    Bb  [mr Bb, mr Bbb] 
5    E    [mr e] 

I would like to end up with the following transformed DataFrame:

 movie_name     df_pname 
0   [A, Aa]    [mr a, mr aa] 
1   [B, Bb]  [mr b, mr Bb, mr Bbb] 
3    [D]      [mr d] 
5    [E]      [mr e]  

But when I drop rows, the algorithm stops because of the missing index labels.

This is what I do:

for i in range(0, len(duplicates_to_fetch)):
    mylist = duplicates_to_fetch.loc[i, "Values"]
    index_to_fetch_on = mylist[0]

    # rearrange mylist (which can have >2 values)
    mylist = [myindex for myindex in mylist if myindex != index_to_fetch_on]

    for j in mylist:
        df_to_rearrange.loc[index_to_fetch_on, "df_pname"].append(df_to_rearrange.loc[j, "df_pname"])
        df_to_rearrange.drop(df_to_rearrange.index[j], inplace=True)

The error is the following: KeyError: 'the label [179] is not in the [index]'. Is there a more Pythonic way to do this?

+1

Iterating with 'for i in range(0, len(duplicates_to_fetch)): mylist = duplicates_to_fetch.loc[i, "Values"]' is clunky. A better way is: 'for i, duplicate_row in duplicates_to_fetch.iterrows()' – smci

+1

Also, 'mylist = [ix for ix in mylist if ix != index_to_fetch_on]' looks like a simple 'drop()' operation. – smci

Answers

0

One solution is to drop by label rather than by position:

df_to_rearrange.drop(df_to_rearrange.index[np.where((df_to_rearrange.index==j))], inplace=True) 
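For context, a likely reason the original loop breaks: 'df_to_rearrange.index[j]' is a positional lookup, while 'j' comes from 'Values' and is an index label. Once a row has been dropped, position j and label j no longer point at the same row, so the loop can drop the wrong row or hit a label that is gone. The sketch below is only an illustration on the sample data from the question. It uses the iterrows() suggestion from the comments, swaps append for extend so the names stay in one flat list, and drops by label.

```python
import pandas as pd

# Sample data copied from the question.
duplicates_to_fetch = pd.DataFrame({
    "mail_domaine": ["@A.com", "@B.com"],
    "Values": [[0, 2], [1, 4]],
})
df_to_rearrange = pd.DataFrame({
    "movie_name": ["A", "B", "Aa", "D", "Bb", "E"],
    "df_pname": [["mr a"], ["mr b"], ["mr aa"], ["mr d"],
                 ["mr Bb", "mr Bbb"], ["mr e"]],
})

for _, row in duplicates_to_fetch.iterrows():
    # First label is the row to keep; the others are merged into it.
    keep, *others = row["Values"]
    for j in others:
        # extend() mutates the list stored in the kept row in place.
        df_to_rearrange.at[keep, "df_pname"].extend(df_to_rearrange.at[j, "df_pname"])
        # Drop by label, so earlier drops cannot shift the target.
        df_to_rearrange = df_to_rearrange.drop(index=j)

print(df_to_rearrange)
```

Note that this only merges 'df_pname', as the loop in the question does; 'movie_name' is left as a plain string.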
1

Here's another way to do it: use lookup to define the groups, then agg the columns into lists.

In [78]: lookup = {v: i for i, x in enumerate(duplicates_to_fetch['Values']) for v in x} 

In [79]: (df_to_rearrange.groupby(df_to_rearrange.index.to_series().map(lookup).fillna(''))
             .agg({'movie_name': lambda x: [v for v in x],
                   'df_pname': lambda x: [a for v in x.values for a in v]})
             .reset_index(drop=True))
Out[79]: 
  movie_name               df_pname
0    [A, Aa]          [mr a, mr aa]
1    [B, Bb]  [mr b, mr Bb, mr Bbb]
2        [D]                 [mr d]

Details

In [959]: duplicates_to_fetch 
Out[959]: 
  mail_domaine  Values
0       @A.com  [0, 2]
1       @B.com  [1, 4]

In [960]: df_to_rearrange 
Out[960]: 
  movie_name         df_pname
0          A           [mr a]
1          B           [mr b]
2         Aa          [mr aa]
3          D           [mr d]
4         Bb  [mr Bb, mr Bbb]

In [958]: lookup
Out[958]: {0: 0, 1: 1, 2: 0, 4: 1} 
+0

The problem is with 'lambda x: list(x)', but that is due to my data –

+0

Updated now for this case – Zero

+0

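Your code reduces my df from 1135 rows to 37 instead of 1094 (my script renders exactly what I want). For some reason (the groupby, I think), your code puts 1781 values into a single group! Maybe if you add a row 'E' with '[mr E]', you will see the problem –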
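The collapse described in this last comment looks consistent with 'fillna('')' giving every row that is absent from lookup the same empty key, so all non-duplicate rows land in one group. A minimal sketch of one possible workaround, assuming the sample data from the question (including the 'E' row): let each unmatched row fall back to its own index label as the group key. The names 'keys' and 'result' are made up for this illustration, and lookup here maps to the first label of each group rather than to the enumerate position used in the answer.

```python
import pandas as pd

# Sample data copied from the question, including the 'E' row.
duplicates_to_fetch = pd.DataFrame({
    "mail_domaine": ["@A.com", "@B.com"],
    "Values": [[0, 2], [1, 4]],
})
df_to_rearrange = pd.DataFrame({
    "movie_name": ["A", "B", "Aa", "D", "Bb", "E"],
    "df_pname": [["mr a"], ["mr b"], ["mr aa"], ["mr d"],
                 ["mr Bb", "mr Bbb"], ["mr e"]],
})

# Map every duplicate label to the first label of its group.
lookup = {v: vals[0] for vals in duplicates_to_fetch["Values"] for v in vals}

# Rows missing from lookup keep their own label as the group key,
# so they are not merged with each other.
idx = df_to_rearrange.index.to_series()
keys = idx.map(lookup).fillna(idx).astype(int)

result = df_to_rearrange.groupby(keys).agg({
    "movie_name": lambda s: s.tolist(),
    "df_pname": lambda s: [name for names in s for name in names],
})

print(result)
```

On the sample data this keeps the labels 0, 1, 3 and 5, matching the layout requested in the question. Whether it reproduces the 1094 rows on the real data has not been checked here.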