2016-04-29 35 views
0

Python: how to compute the collaborations between pairs in a pandas DataFrame?

I have a DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Item': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
                   'Name': ['Tom', 'John', 'Paul', 'Tom', 'Frank', 'Tom', 'John', 'Richard', 'James'],
                   'Weight': [2, 2, 2, 3, 3, 5, 5, 5, 5]})
df
Item Name    Weight
A    Tom     2
A    John    2
A    Paul    2
B    Tom     3
B    Frank   3
C    Tom     5
C    John    5
C    Richard 5
C    James   5

For each person, I would like the list of people who share an item with them, where each shared item contributes 1/Weight to the pair's total:

df1
Name     People                               Times
Tom      [John, Paul, Frank, Richard, James]  [(1/2+1/5), 1/2, 1/3, 1/5, 1/5]
John     [Tom, Paul, Richard, James]          [(1/2+1/5), 1/2, 1/5, 1/5]
Paul     [Tom, John]                          [1/2, 1/2]
Frank    [Tom]                                [1/3]
Richard  [Tom, John, James]                   [1/5, 1/5, 1/5]
James    [Tom, John, Richard]                 [1/5, 1/5, 1/5]
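
To make the weighting concrete, here is a small sketch of the arithmetic for a single pair. It assumes the weights from the constructor above (item A has Weight 2, item C has Weight 5); the names `weights` and `shared_items` are only illustrative and do not appear in the original post.

```python
# Weight of each item, taken from the DataFrame constructor in the question
weights = {'A': 2, 'B': 3, 'C': 5}

# Tom and John both appear on items A and C
shared_items = ['A', 'C']

# Each shared item contributes 1/Weight to the pair's total
tom_john = sum(1 / weights[item] for item in shared_items)
print(round(tom_john, 3))  # 1/2 + 1/5
```

A pair that shares only one item, such as Tom and Frank on item B, simply gets 1/Weight of that item.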

To count the number of collaborations without taking Weight into account, I did the following:

# many-to-many self-merge on column Item
df1 = pd.merge(df, df, on=['Item'])

# remove self-pairs (rows where Name_x == Name_y)
df1 = df1[~(df1['Name_x'] == df1['Name_y'])]
# print(df1)

# collect each person's collaborators into a list
df1 = df1.groupby('Name_x')['Name_y'].apply(lambda x: x.tolist()).reset_index()
print(df1)
    Name_x          Name_y 
0 Frank          [Tom] 
1 James      [Tom, John, Richard] 
2  John   [Tom, Paul, Tom, Richard, James] 
3  Paul        [Tom, John] 
4 Richard       [Tom, John, James] 
5  Tom [John, Paul, Frank, John, Richard, James] 


# get the unique collaborators and how often each occurs with np.unique
df1['People'] = df1['Name_y'].apply(lambda a: np.unique(a, return_counts=True)[0])
df1['times'] = df1['Name_y'].apply(lambda a: np.unique(a, return_counts=True)[1])
# remove column Name_y
df1 = df1.drop('Name_y', axis=1).rename(columns={'Name_x': 'Name'})
print(df1)
     Name        People   times 
0 Frank        [Tom]    [1] 
1 James     [John, Richard, Tom]  [1, 1, 1] 
2  John   [James, Paul, Richard, Tom]  [1, 1, 1, 2] 
3  Paul       [John, Tom]   [1, 1] 
4 Richard     [James, John, Tom]  [1, 1, 1] 
5  Tom [Frank, James, John, Paul, Richard] [1, 1, 2, 1, 1] 

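The last DataFrame gives the number of collaborations between every pair, but I would like the weighted count of their collaborations.

Answers

0

Starting with: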

import numpy as np
import pandas as pd

df = pd.DataFrame({'Item': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
                   'Name': ['Tom', 'John', 'Paul', 'Tom', 'Frank', 'Tom', 'John', 'Richard', 'James'],
                   'Weight': [2, 2, 2, 3, 3, 5, 5, 5, 5]})

# self-merge on Item, drop self-pairs, and index by the (Name_x, Name_y) pair
df1 = pd.merge(df, df, on=['Item'])
df1 = df1[~(df1['Name_x'] == df1['Name_y'])].set_index(['Name_x', 'Name_y']).drop(['Item', 'Weight_y'], axis=1)

You can use .apply() to compute the values and .unstack() to reshape them into wide format:

collab = df1.groupby(level=['Name_x', 'Name_y']).apply(lambda x: np.sum(1/x)).unstack().loc[:, 'Weight_x'] 

Name_y  Frank James John Paul Richard  Tom 
Name_x             
Frank   NaN NaN NaN NaN  NaN 0.333333 
James   NaN NaN 0.2 NaN  0.2 0.200000 
John   NaN 0.2 NaN 0.5  0.2 0.700000 
Paul   NaN NaN 0.5 NaN  NaN 0.500000 
Richard  NaN 0.2 0.2 NaN  NaN 0.200000 
Tom  0.333333 0.2 0.7 0.5  0.2  NaN 
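
The same wide table can probably be built without `.apply()`. The following is a minimal sketch of that alternative, not the answer's own method: it adds a helper column (called `inv_weight` here, a name chosen only for illustration) and sums it per pair. The variable `pairs` is likewise hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
                   'Name': ['Tom', 'John', 'Paul', 'Tom', 'Frank',
                            'Tom', 'John', 'Richard', 'James'],
                   'Weight': [2, 2, 2, 3, 3, 5, 5, 5, 5]})

# Pair up everyone who shares an item, then drop self-pairs
pairs = pd.merge(df, df, on='Item')
pairs = pairs[pairs['Name_x'] != pairs['Name_y']].copy()

# Each shared item contributes 1/Weight to the pair
pairs['inv_weight'] = 1 / pairs['Weight_x']

# Sum the contributions per pair and pivot Name_y into columns
collab = pairs.groupby(['Name_x', 'Name_y'])['inv_weight'].sum().unstack()
print(collab)
```

Pairs that never share an item stay NaN after `.unstack()`, as in the table above.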

Then iterate over the rows and convert them to lists:

df = pd.DataFrame(columns=['People', 'Times']) 
for p, data in collab.iterrows(): 
    s = data.dropna() 
    df.loc[p] = [s.index.tolist(), s.values] 

             People \ 
Frank         [Tom] 
James     [John, Richard, Tom] 
John    [James, Paul, Richard, Tom] 
Paul        [John, Tom] 
Richard     [James, John, Tom] 
Tom  [Frank, James, John, Paul, Richard] 

             Times 
Frank      [0.333333333333] 
James       [0.2, 0.2, 0.2] 
John      [0.2, 0.5, 0.2, 0.7] 
Paul        [0.5, 0.5] 
Richard      [0.2, 0.2, 0.2] 
Tom  [0.333333333333, 0.2, 0.7, 0.5, 0.2] 
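
As a possible alternative to the `iterrows()` loop, the lists can also be collected with a second `groupby`. This is only a sketch under the assumption that a pandas version with named aggregation (0.25 or later) is available; `pairs`, `inv_weight`, `long_form`, and `result` are illustrative names that are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'Item': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
                   'Name': ['Tom', 'John', 'Paul', 'Tom', 'Frank',
                            'Tom', 'John', 'Richard', 'James'],
                   'Weight': [2, 2, 2, 3, 3, 5, 5, 5, 5]})

# Pair up everyone who shares an item, then drop self-pairs
pairs = pd.merge(df, df, on='Item')
pairs = pairs[pairs['Name_x'] != pairs['Name_y']].copy()
pairs['inv_weight'] = 1 / pairs['Weight_x']

# One row per (Name_x, Name_y) pair with the summed 1/Weight
long_form = (pairs.groupby(['Name_x', 'Name_y'])['inv_weight']
                  .sum()
                  .reset_index())

# Collect collaborators and their weighted counts into parallel lists
result = long_form.groupby('Name_x').agg(People=('Name_y', list),
                                         Times=('inv_weight', list))
print(result)
```

Because `groupby` sorts its keys by default, the names in each `People` list come out in alphabetical order, matching the output shown above.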
+0

This is what I want, but I get the following error – emax

+0

Sorry, I skipped a step; see the update. – Stefan

+0

Great!!!!!! – emax
