Multiply two Python arrays using a 'lookup'

import numpy as np 
import pandas as pd 

columns = ['id', 'A', 'B', 'C'] 
index = np.arange(3) 

df = pd.DataFrame(np.random.randn(3,4), columns=columns, index=index) 

weights = {'A': 0.10, 'B': 1.00, 'C': 1.50}

I need to multiply the value in each 'cell' by its corresponding weight (excluding the first column). For example:

df.at[0,'A'] * weights['A'] 
df.at[0,'B'] * weights['B']

What is the most efficient way to do this and produce the result in a new DataFrame?

Source

2017-05-18 Carl

Setup

df 
Out[1013]: 
         id         A         B         C
0 -0.641314 -0.526509  0.225116 -1.131141
1  0.018321 -0.944734 -0.123334 -0.853356
2  0.703119  0.468857  1.038572 -1.529723

weights 
Out[1026]: {'A': 0.1, 'B': 1.0, 'C': 1.5} 

W = np.asarray([weights[e] for e in sorted(weights.keys())])

Solution

# multiply each column element-wise by its weight (W is broadcast across the rows)
df.loc[:,['A','B','C']] *= W 
df 
Out[1016]: 
         id         A         B         C
0 -0.641314 -0.052651  0.225116 -1.696712
1  0.018321 -0.094473 -0.123334 -1.280034
2  0.703119  0.046886  1.038572 -2.294584
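
To make the broadcasting step easier to follow, here is a minimal sketch with small fixed numbers instead of random data. The values and the names `cols` and `result` are made up for illustration. It works on a copy, since the question asks for the result in a new DataFrame.

```python
import numpy as np
import pandas as pd

# Small fixed frame instead of random data (illustrative values only)
df = pd.DataFrame({'id': [1.0, 2.0], 'A': [10.0, 20.0],
                   'B': [3.0, 4.0], 'C': [2.0, 6.0]})
weights = {'A': 0.10, 'B': 1.00, 'C': 1.50}

cols = sorted(weights)                       # ['A', 'B', 'C']
W = np.asarray([weights[c] for c in cols])   # weights in the same order as cols

# W has shape (3,), so it is broadcast across the rows:
# every value in a column is multiplied by that column's weight.
result = df.copy()                           # leave the original untouched
result[cols] = df[cols] * W
```

Note that this pairs weights with columns by position, so `cols` and `W` have to be built from the same key order.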

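Update

If you need to keep the column names flexible, I think a better approach is to store the column names and the weights in two lists: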

columns = sorted(weights.keys()) 
Out[1072]: ['A', 'B', 'C'] 

weights = [weights[e] for e in columns] 
Out[1074]: [0.1, 1.0, 1.5]

Then you can do it like this:

df.loc[:, columns] *= weights 
df 
Out[1067]: 
         id         A         B         C
0 -0.641314 -0.052651  0.225116 -1.696712
1  0.018321 -0.094473 -0.123334 -1.280034
2  0.703119  0.046886  1.038572 -2.294584

A one-liner solution:

# here weights is the original dict, not the list built above
df.loc[:, sorted(weights.keys())] *= [weights[e] for e in sorted(weights.keys())] 

df 
Out[1089]: 
         id         A         B         C
0 -0.641314 -0.052651  0.225116 -1.696712
1  0.018321 -0.094473 -0.123334 -1.280034
2  0.703119  0.046886  1.038572 -2.294584
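
For the case where the frame's columns are not in the same order as the weights, a rough sketch of the two-list idea might look like this. The data, the shuffled column order, and the names `factors` and `result` are assumptions for illustration only.

```python
import pandas as pd

# Columns deliberately not in alphabetical order (illustrative values only)
df = pd.DataFrame({'id': [1.0, 2.0], 'C': [2.0, 6.0],
                   'A': [10.0, 20.0], 'B': [3.0, 4.0]})
weights = {'B': 1.00, 'A': 0.10, 'C': 1.50}

# Build both lists from the same key order so that position i always
# pairs column name i with weight i.
columns = list(weights)                    # ['B', 'A', 'C']
factors = [weights[c] for c in columns]    # [1.0, 0.1, 1.5]

# Selecting by label returns the columns in the order given by `columns`,
# so the positional multiplication lines up with `factors`.
result = df.copy()
result[columns] = df[columns] * factors
```

The column order of `result` stays the same as in `df`; only the values in the weighted columns change.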

Source

2017-05-18 04:09:18 Allen

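Does this assume the weights array is in the same order as the df columns? In my real data they won't be, so I need a way to look up the corresponding weights. – Carl

weights is not an array, it is a 'dict' –

Please take a look at my updates and see whether this is what you are after. – Allen

Here is a concise way, if it tickles your fancy: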

In [11]: df.assign(**{"{}_product".format(cl): val*df.loc[:,cl] 
    ...:    for cl, val in weights.items()}) 
Out[11]: 
         id         A         B         C  A_product  B_product  C_product
0 -1.893885  0.940408  0.841350 -0.669378   0.094041   0.841350  -1.004067
1 -0.526427  0.472322 -0.546121  0.201615   0.047232  -0.546121   0.302423
2 -0.450193 -0.422066  0.564866  1.866878  -0.042207   0.564866   2.800318

Or this, if you want to replace the data:

In [13]: df.assign(**{cl: val*df.loc[:,cl] 
    ...:    for cl, val in weights.items()}) 
Out[13]: 
         id         A         B         C
0 -1.893885  0.094041  0.841350 -1.004067
1 -0.526427  0.047232 -0.546121  0.302423
2 -0.450193 -0.042207  0.564866  2.800318

This produces a new DataFrame; it does not modify df in place.
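
A small sketch of that behaviour, using made-up values and the hypothetical name `weighted` for the returned frame:

```python
import pandas as pd

df = pd.DataFrame({'id': [1.0, 2.0], 'A': [10.0, 20.0], 'B': [3.0, 4.0]})
weights = {'A': 0.10, 'B': 2.00}

# assign() builds and returns a new DataFrame ...
weighted = df.assign(**{col: w * df[col] for col, w in weights.items()})

# ... so the original is unchanged unless the result is bound back to df.
unchanged = df['A'].tolist()    # still [10.0, 20.0]
```

To get the in-place effect you would rebind the name, for example `df = df.assign(...)`.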

Source

2017-05-18 04:23:58

I think the simplest is to create a Series from the dict, so that its index aligns with the column names:

print (df) 
         id         A         B         C
0 -0.641314 -0.526509  0.225116 -1.131141
1  0.018321 -0.944734 -0.123334 -0.853356
2  0.703119  0.468857  1.038572 -1.529723

print (pd.Series(weights)) 
A 0.1 
B 1.0 
C 1.5 
dtype: float64 

df[['A','B','C']] *= pd.Series(weights) 
print (df) 
         id         A         B         C
0 -0.641314 -0.052651  0.225116 -1.696711
1  0.018321 -0.094473 -0.123334 -1.280034
2  0.703119  0.046886  1.038572 -2.294585

A more general solution, thanks to piRSquared and juanpa.arrivillaga:

df[list(weights)] *= pd.Series(weights) 
print (df) 
         id         A         B         C
0 -0.641314 -0.052651  0.225116 -1.696711
1  0.018321 -0.094473 -0.123334 -1.280034
2  0.703119  0.046886  1.038572 -2.294585
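
Since the Series is matched to the columns by label, the order of the dict and the order of the columns should not matter. Here is a hedged sketch with made-up values that leaves the original frame alone; the names `w`, `weighted` and `result` are illustrative only.

```python
import pandas as pd

# Column order and dict order deliberately differ (illustrative values only)
df = pd.DataFrame({'id': [1.0, 2.0], 'C': [2.0, 6.0],
                   'A': [10.0, 20.0], 'B': [3.0, 4.0]})
weights = {'B': 1.00, 'A': 0.10, 'C': 1.50}

w = pd.Series(weights)

# mul(..., axis='columns') matches the Series index against the column labels,
# so each column gets its own weight regardless of position.
weighted = df[list(weights)].mul(w, axis='columns')

# Copy the frame and overwrite only the weighted columns (matched by label).
result = df.copy()
result.update(weighted)
```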

Source

2017-05-18 04:57:07 jezrael

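Yes... this is your answer. To keep it general: 'df[list(weights.keys())] *= pd.Series(weights)' – piRSquared

Nicely done. Learned something new. – Allen

Very nice, but you should use 'list(weights)' instead of 'list(weights.keys())' –

This accommodates non-overlapping keys in both the DataFrame and the dictionary: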

np.random.seed([3,1415])  
df = pd.DataFrame(
    np.random.randn(3,4), 
    columns='id A B C'.split() 
) 

weights = dict(A=.1, B=1., C=1.5, D=2.) 

df 

         id         A         B         C
0 -2.129724 -1.268466 -1.970500 -2.259055
1 -0.349286 -0.026955  0.316236  0.348782
2  0.715364  0.770763 -0.608208  0.352390

Note: df has id, which weights does not; weights has D, which df does not. This solution modifies only the overlapping columns, and it is very concise.

df.update(df.mul(pd.Series(weights)).dropna(axis=1)) 
df 

         id         A         B         C
0 -2.129724 -0.126847 -1.970500 -3.388583
1 -0.349286 -0.002696  0.316236  0.523173
2  0.715364  0.077076 -0.608208  0.528586
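
If you would rather spell out the overlap than rely on dropping the NaN columns, one possible alternative (not the approach above, and using made-up values) is to intersect the labels first. The names `common` and `result` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1.0, 2.0], 'A': [10.0, 20.0],
                   'B': [3.0, 4.0], 'C': [2.0, 6.0]})
weights = dict(A=.1, B=1., C=1.5, D=2.)   # 'D' has no matching column

w = pd.Series(weights)

# Labels present in both the frame and the weights
common = list(df.columns.intersection(w.index))   # ['A', 'B', 'C']

# Work on a copy and multiply only the shared columns, matched by label.
result = df.copy()
result[common] = df[common] * w[common]
```

This version also keeps the original df untouched, whereas df.update modifies it in place.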

Source

2017-05-18 06:12:19 piRSquared
