Python pandas: efficiently compare rows of a DataFrame?

I have the DataFrame dfm:

match         group
adamant          86
adamant          86
adamant bild     86
360works         94
360works         94
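For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the sample frame. The column names and values are copied from the table above; only the construction code is an addition.

```python
import pandas as pd

# Sample data from the question: 'match' holds the strings to compare,
# 'group' holds the key whose rows should be compared pairwise.
dfm = pd.DataFrame({
    'match': ['adamant', 'adamant', 'adamant bild', '360works', '360works'],
    'group': [86, 86, 86, 94, 94],
})
print(dfm)
```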

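For rows where the 'group' column is the same, I want to compare the contents of the 'match' column pairwise and add the comparison result in another column, 'result'. For example, the expected result is: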

group  compare                  result
   86  adamant, adamant         same
   86  adamant, adamant bild    not same
   86  adamant, adamant bild    not same
   94  360works, 360works       same

Can anyone help?


2015-04-29 UserYmY

Can you clean up your expected result? I don't think the formatting came out the way you intended. Either way, it seems a bit confusing. – afinit

@benine Sorry! I edited the text. – UserYmY

Do you want to choose every possible pair within each group? –

It's a bit hacky, but it seems to work for me:

import pandas as pd

# initialize the list to store the dictionaries
# that will create the new DataFrame
new_df_dicts = []

# group on 'group' (.items() replaces the Python 2 .iteritems())
for group, indices in dfm.groupby('group').groups.items():
    # get the values in the 'match' column (.loc replaces the removed .ix)
    vals = dfm.loc[indices, 'match'].values
    # choose every possible pair from the array of column values
    for i in range(len(vals)):
        for j in range(i + 1, len(vals)):
            # compute the new values
            compare = vals[i] + ', ' + vals[j]
            if vals[i] == vals[j]:
                result = 'same'
            else:
                result = 'not same'
            # append the row dictionary to the list
            new_df_dicts.append({'group': group, 'compare': compare, 'result': result})

# create the new DataFrame in one step
new_df = pd.DataFrame(new_df_dicts)

Here is my output:

                 compare  group    result
0     360works, 360works     94      same
1       adamant, adamant     86      same
2  adamant, adamant bild     86  not same
3  adamant, adamant bild     86  not same

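Previously I suggested appending rows to an already-initialized DataFrame. Creating the DataFrame from a list of dictionaries, instead of doing many append operations on a DataFrame, runs 9-10 times faster.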
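The two patterns being compared can be sketched side by side. This is an illustration, not a benchmark: the rows are hypothetical stand-ins for comparison results, and since DataFrame.append was removed in pandas 2.0, pd.concat stands in for the row-by-row append here.

```python
import pandas as pd

# Hypothetical rows standing in for the pairwise comparison results.
records = [
    {'group': 86, 'compare': 'adamant, adamant', 'result': 'same'},
    {'group': 86, 'compare': 'adamant, adamant bild', 'result': 'not same'},
    {'group': 94, 'compare': '360works, 360works', 'result': 'same'},
]

# Slow pattern: grow a DataFrame one row at a time. Each pd.concat call
# copies every row accumulated so far, so total work grows roughly
# quadratically with the number of rows.
slow = pd.DataFrame([records[0]])
for rec in records[1:]:
    slow = pd.concat([slow, pd.DataFrame([rec])], ignore_index=True)

# Faster pattern: collect plain dicts in a list, build the frame once.
fast = pd.DataFrame(records)
print(fast)
```

Both produce the same frame; the difference is only in how much copying happens along the way.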


2015-04-29 19:22:07

kellehr, thanks a lot. I get this error: TypeError: unsupported operand type(s) for +: 'float' and 'str' – UserYmY

What happens when you try compare = str(vals[i]) + ', ' + str(vals[j]) ? –

Thanks, that worked. The problem is that the DataFrame is very large, with 193,000 rows. Can this solution be made more efficient? – UserYmY
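The TypeError in this comment thread most likely comes from a missing value in 'match': pandas stores missing entries as NaN, which is a float. That cause is an assumption; the thread does not confirm it. A minimal reproduction of the error and of the str() fix:

```python
import numpy as np

# Assumed cause: a missing 'match' entry, which pandas stores as NaN (a float).
vals = np.array([np.nan, 'adamant'], dtype=object)

error_message = None
try:
    # float + str raises TypeError
    compare = vals[0] + ', ' + vals[1]
except TypeError as exc:
    error_message = str(exc)
print('TypeError:', error_message)

# Converting both sides to str first avoids the error; NaN becomes 'nan'.
compare = str(vals[0]) + ', ' + str(vals[1])
print(compare)
```

Note that with str() the missing value shows up as the literal text 'nan' in the 'compare' column, and NaN never equals anything, so such pairs are reported as 'not same'.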


Here is another option. I'm not sure whether it's more efficient, though.

import itertools
import pandas as pd

rows = []
for grp in set(dfm['group']):
    for combo in itertools.combinations(dfm[dfm['group'] == grp].index, 2):
        # compute the new values; the second value must come from combo[1]
        match1 = dfm.loc[combo[0], 'match']
        match2 = dfm.loc[combo[1], 'match']
        compare = match1 + ', ' + match2
        if match1 == match2:
            result = 'same'
        else:
            result = 'not same'
        # collect the row; DataFrame.append was removed in pandas 2.0,
        # so the DataFrame is built once at the end instead
        rows.append({'group': grp, 'compare': compare, 'result': result})

new_df = pd.DataFrame(rows)
print(new_df)

(Formatting borrowed from James's answer.)
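To see what itertools.combinations hands to the loop above, here is a small sketch using the index labels of the three group-86 rows from the question's sample. Each pair is (first_label, second_label), so match2 has to be read from combo[1]; reading combo[0] for both would compare every row with itself.

```python
import itertools

# Index labels of the three rows in group 86 from the question's sample.
indices = [0, 1, 2]

# combinations(..., 2) yields each unordered pair exactly once, in order.
pairs = list(itertools.combinations(indices, 2))
print(pairs)  # [(0, 1), (0, 2), (1, 2)]

# The first row of each pair is combo[0], the second is combo[1].
for combo in pairs:
    print(combo[0], combo[1])
```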


2015-04-29 20:37:21 afinit
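Both answers above loop in Python over every pair. For the 193,000-row case raised in the comments, one vectorized alternative (a swapped-in technique, not taken from either answer) is a self-merge on 'group' followed by a filter that keeps each pair once. A minimal sketch, assuming dfm has a default 0..n-1 index:

```python
import pandas as pd

dfm = pd.DataFrame({
    'match': ['adamant', 'adamant', 'adamant bild', '360works', '360works'],
    'group': [86, 86, 86, 94, 94],
})

# Keep the original row position so each unordered pair can be kept once.
left = dfm.reset_index()  # adds an 'index' column holding 0..n-1

# Self-join on 'group': every row is paired with every row of its group.
pairs = left.merge(left, on='group', suffixes=('_1', '_2'))

# Keep i < j only, mirroring the nested i/j loops in the first answer.
pairs = pairs[pairs['index_1'] < pairs['index_2']].copy()

pairs['compare'] = pairs['match_1'].astype(str) + ', ' + pairs['match_2'].astype(str)
same = pairs['match_1'] == pairs['match_2']
pairs['result'] = same.map({True: 'same', False: 'not same'})

new_df = pairs[['group', 'compare', 'result']].reset_index(drop=True)
print(new_df)
```

The trade-off is memory: the intermediate merge has one row per ordered pair within each group (group size squared), so if a few groups are very large it may be too big to hold, and a chunked or per-group approach would be needed instead.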
