2017-01-20 70 views
-4

How to split and compare DataFrames in pandas

I have two different DataFrames in Python, like this:

import pandas as pd
df = pd.DataFrame({'AAA': ["a1","a2","a3","a4","a5","a6","a7"],
                   'BBB': ["c1","c2","c2","c2","c3","c3","c1"]})
df2 = pd.DataFrame({'AAA': ["a1","a2","a4","a6","a7","a8","a9"],
                    'BBB': ["c11","c21","c21","c12","c13","c13","c11"]})

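I want to compare the values of 'AAA' and find the number of shared values between the groups defined by 'BBB'. For example, the similarity between c1 and c11 is 1 (a1), and the similarity between c2 and c21 is 2 (a2, a4).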

Sorry, I don't understand this question at all. Could you please reword it? –

How is 'CCC' used? If you don't use it, why show it to us? – DyZ

I removed CCC. –

Answer

0

The following code computes the similarities you want (it does not use the CCC column):

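# Stack both frames (outer merge on the shared columns), concatenate the 'BBB'
# strings per 'AAA' value, then count how often each combination occurs.
# rename_axis/reset_index(name=...) pin the column names to 'index' and 'BBB',
# which otherwise differ between pandas versions.
sims = (pd.merge(df, df2, how='outer')
          .groupby('AAA')['BBB'].sum()
          .value_counts()
          .rename_axis('index')
          .reset_index(name='BBB'))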
# index BBB 
#0 c2c21 2 
#1 c3c12 1 
#2 c1c13 1 
#3 c1c11 1 
#4  c2 1 
#5 c11 1 
#6  c3 1 
#7 c13 1 

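# Split each combined label on 'c' and drop the leading empty string,
# e.g. 'c2c21' -> ['2', '21'].
sims['index'] = sims['index'].str.split('c').str[1:]
# Keep only the rows where an 'AAA' value appeared in both frames.
sims[sims['index'].str.len() > 1]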
#  index BBB 
#0 [2, 21] 2 
#1 [3, 12] 1 
#2 [1, 13] 1 
#3 [1, 11] 1 
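
As a possible alternative to the string concatenation above, here is a minimal sketch (not part of the original answer) that joins the two frames on 'AAA' only and counts the shared 'AAA' values per pair of 'BBB' groups. The suffixes `_df` and `_df2` and the column name `similarity` are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'AAA': ["a1", "a2", "a3", "a4", "a5", "a6", "a7"],
                   'BBB': ["c1", "c2", "c2", "c2", "c3", "c3", "c1"]})
df2 = pd.DataFrame({'AAA': ["a1", "a2", "a4", "a6", "a7", "a8", "a9"],
                    'BBB': ["c11", "c21", "c21", "c12", "c13", "c13", "c11"]})

# Inner join on 'AAA' keeps only the values present in both frames;
# the two 'BBB' columns are disambiguated by the (illustrative) suffixes.
common = pd.merge(df, df2, on='AAA', suffixes=('_df', '_df2'))

# Each row of `common` is one shared 'AAA' value, so the group size is the
# number of shared values for that pair of 'BBB' groups.
pair_counts = (common.groupby(['BBB_df', 'BBB_df2'])
                     .size()
                     .reset_index(name='similarity'))
print(pair_counts)
```

Because the join uses 'AAA' alone, the group labels stay in separate columns and do not need to be split apart afterwards.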