Pandas calculation across multiple columns given multiple conditions

I have a wide table in the following format (up to 10 people):

person1_status | person2_status | person3_status | person1_type | person2_type | person3_type
0 | 1 | 0 | 7 | 4 | 6

Status can be 0 or 1 (the first 3 columns).

Type can be a number from 4 to 7. Each type corresponds to a value given in a separate lookup table. So...

Type | Value 
4 | 10 
5 | 20 
6 | 30 
7 | 40

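I need to calculate two columns, 'A' and 'B', where:

A is the sum of the values for each person's type (in that row) where status = 0
B is the sum of the values for each person's type (in that row) where status = 1

For example, the resulting columns 'A' and 'B' would look like this: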

A | B 
70 | 10

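To explain this case:

'A' has the value 70 because person1 and person3 have status 0 and have the corresponding types 7 and 6 (which correspond to the values 40 and 30).

Similarly, there should be another column 'B' with the value 10, because only person2 has status 1 and their type is 4 (which corresponds to the value 10).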
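To make the expected numbers concrete, here is the arithmetic for the sample row spelled out in plain Python. It only checks the figures above and is not the vectorized solution being asked for; the names `type_to_value` and `row` are made up for illustration.

```python
# Type -> Value lookup from the second table.
type_to_value = {4: 10, 5: 20, 6: 30, 7: 40}

# The sample row as (status, type) pairs for person1, person2, person3.
row = [(0, 7), (1, 4), (0, 6)]

# A sums the values of people with status 0, B those with status 1.
A = sum(type_to_value[t] for s, t in row if s == 0)
B = sum(type_to_value[t] for s, t in row if s == 1)

print(A, B)  # 70 10
```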

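This might be a stupid question, but how do I do this in a vectorized way? I don't want to use a for loop or anything like that, since it would be less efficient...

I hope that makes sense... can anyone help me? I think I'm brain-dead from trying to figure this out.

For simpler calculated columns I just use np.where, but I'm a bit stuck here because I need to sum values from multiple columns under given conditions while pulling those values in from a separate table...

Hope that makes sense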

Source

2016-12-20 shishy

Can you provide a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve)? –

Is the example I gave clearer now? – shishy

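Use the filter method, which selects the columns whose names contain the given string.

Make a lookup structure, other_table, for finding the values, with the type column set as its index.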

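# Split the frame into its status columns and its type columns
df_status = df.filter(like='status')
df_type = df.filter(like='type')
# Map every type to its value (applymap is deprecated in pandas >= 2.1; DataFrame.map replaces it)
df_type_lookup = df_type.applymap(lambda x: other_table.loc[x]).values

# Zero out the non-matching people with a boolean mask, then sum each row
df['A'] = np.sum((df_status == 0).values * df_type_lookup, 1)
df['B'] = np.sum((df_status == 1).values * df_type_lookup, 1)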

Full example below:

Make fake data

import numpy as np
import pandas as pd

# 1000 rows: statuses are 0 or 1, types are integers from 4 to 7
df = pd.DataFrame({'person_1_status': np.random.randint(0, 2, 1000),
                   'person_2_status': np.random.randint(0, 2, 1000),
                   'person_3_status': np.random.randint(0, 2, 1000),
                   'person_1_type': np.random.randint(4, 8, 1000),
                   'person_2_type': np.random.randint(4, 8, 1000),
                   'person_3_type': np.random.randint(4, 8, 1000)},
                  columns=['person_1_status', 'person_2_status', 'person_3_status',
                           'person_1_type', 'person_2_type', 'person_3_type'])

   person_1_status  person_2_status  person_3_status  person_1_type  \
0                1                0                0              7
1                0                1                0              6
2                1                0                1              7
3                0                0                0              7
4                0                0                1              4

   person_2_type  person_3_type
0              5              5
1              7              7
2              7              7
3              7              7
4              7              7

Make other_table

other_table = pd.Series({4:10, 5:20, 6:30, 7:40}) 

4 10 
5 20 
6 30 
7 40 
dtype: int64

Filter the status and type columns into their own DataFrames

df_status = df.filter(like = 'status') 
df_type = df.filter(like = 'type')
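As a small illustration of what `filter(like=...)` selects, the sketch below uses a one-row frame named `demo` with the column names from the question (an assumption made for the demo; the fake data above uses `person_1_status` style names, which match in the same way).

```python
import pandas as pd

# One-row frame using the question's column names (illustrative data).
demo = pd.DataFrame({
    'person1_status': [0], 'person2_status': [1], 'person3_status': [0],
    'person1_type': [7], 'person2_type': [4], 'person3_type': [6],
})

# like='status' keeps every column whose label contains the substring 'status'.
status_cols = demo.filter(like='status').columns.tolist()
type_cols = demo.filter(like='type').columns.tolist()

print(status_cols)  # ['person1_status', 'person2_status', 'person3_status']
print(type_cols)    # ['person1_type', 'person2_type', 'person3_type']
```

Note that the later multiplication pairs the two frames by position, so the status and type columns need to come out in the same person order, as they do here.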

Make the lookup table

df_type_lookup = df_type.applymap(lambda x: other_table.loc[x]).values
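`applymap` calls the lambda once per cell, and it is deprecated in pandas 2.1 and later in favor of `DataFrame.map`. One possible alternative, sketched here as a swapped-in technique and not the answerer's method, is to look up all cells with a single `reindex` call on the flattened array and then reshape back. The small `df_type` frame is made-up demo data.

```python
import pandas as pd

other_table = pd.Series({4: 10, 5: 20, 6: 30, 7: 40})

# Small illustrative type frame (values are made up for the demo).
df_type = pd.DataFrame({'person_1_type': [7, 4],
                        'person_2_type': [4, 5],
                        'person_3_type': [6, 6]})

# Flatten the types, look them all up at once, then restore the 2-D shape.
flat_types = df_type.to_numpy().ravel()
df_type_lookup = other_table.reindex(flat_types).to_numpy().reshape(df_type.shape)

print(df_type_lookup.tolist())  # [[40, 10, 30], [10, 20, 30]]
```

With `reindex`, a type that is missing from `other_table` comes back as NaN instead of raising a `KeyError`, which may or may not be what you want.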

Multiply element-wise (a boolean mask times the values, not a true matrix product) and sum across each row.

df['A'] = np.sum((df_status == 0).values * df_type_lookup, 1) 
df['B'] = np.sum((df_status == 1).values * df_type_lookup, 1)

Output

   person_1_status  person_2_status  person_3_status  person_1_type  \
0                0                0                1              7
1                0                1                0              4
2                0                1                1              7
3                0                1                0              6
4                0                0                1              5

   person_2_type  person_3_type   A   B
0              7              5  80  20
1              6              4  20  30
2              5              5  40  40
3              6              4  40  30
4              7              5  60  20
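Since the question mentions `np.where`, here is a minimal sketch of the masking step on the question's own sample row, showing that multiplying by a boolean mask and selecting with `np.where` give the same kind of result. The arrays `status` and `types` are typed in by hand from the question, and `.loc` is used for the lookup only to keep the sketch short.

```python
import numpy as np
import pandas as pd

other_table = pd.Series({4: 10, 5: 20, 6: 30, 7: 40})

# The single sample row from the question: statuses 0, 1, 0 and types 7, 4, 6.
status = np.array([[0, 1, 0]])
types = np.array([[7, 4, 6]])

# Looked-up values for each person, same shape as `types`.
values = other_table.loc[types.ravel()].to_numpy().reshape(types.shape)

# Multiplying by a boolean mask zeroes out the people who fail the condition...
A = ((status == 0) * values).sum(axis=1)
# ...which is equivalent to selecting with np.where before summing.
B = np.where(status == 1, values, 0).sum(axis=1)

print(A, B)  # [70] [10]
```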

Source

2016-12-20 01:37:31

Exactly what I wanted, thank you! I didn't know about the filter command, so... and the lambda function makes it easier. Much appreciated :). – shishy

Consider the DataFrame df

mux = pd.MultiIndex.from_product([['status', 'type'], ['p%i' % i for i in range(1, 6)]]) 
data = np.concatenate([np.random.choice((0, 1), (10, 5)), np.random.rand(10, 5)], axis=1) 
df = pd.DataFrame(data, columns=mux) 
df

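When it's structured this way (here the type columns already hold the values to be summed), we can get the sum for status == 1 with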

df.status.mul(df.type).sum(1) 

0 0.935290 
1 1.252478 
2 1.354461 
3 1.399357 
4 2.102277 
5 1.589710 
6 0.434147 
7 2.553792 
8 1.205599 
9 1.022305 
dtype: float64

and for status == 0

df.status.rsub(1).mul(df.type).sum(1)

0 1.867986 
1 1.068045 
2 0.653943 
3 2.239459 
4 0.214523 
5 0.734449 
6 1.291228 
7 0.614539 
8 0.849644 
9 1.109086 
dtype: float64
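The `rsub(1)` step can be hard to follow with random data. A minimal sketch with two small made-up frames (`status` and `value` are hypothetical stand-ins for `df.status` and `df.type` above) shows how it flips the mask:

```python
import pandas as pd

# Tiny deterministic stand-ins for the random data above (values are made up).
status = pd.DataFrame({'p1': [0, 1], 'p2': [1, 1], 'p3': [0, 0]})
value = pd.DataFrame({'p1': [40, 10], 'p2': [10, 20], 'p3': [30, 30]})

# status is 1 where the condition holds, so it acts as the mask directly.
sum_status_1 = status.mul(value).sum(axis=1)

# rsub(1) computes 1 - status, flipping the mask to select status == 0.
sum_status_0 = status.rsub(1).mul(value).sum(axis=1)

print(sum_status_1.tolist())  # [10, 30]
print(sum_status_0.tolist())  # [70, 30]
```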

You can get your columns into this format with the following code

df.columns = df.columns.str.split('_', expand=True) 
df = df.swaplevel(0, 1, 1)
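Putting the pieces together on the question's sample row might look like the sketch below. It assumes the question's `person1_status` naming (a name like `person_1_status` would split into three levels instead of two), and the `Series.map` lookup step is an addition for illustration, since the random data above skips the type-to-value mapping.

```python
import pandas as pd

other_table = pd.Series({4: 10, 5: 20, 6: 30, 7: 40})

df = pd.DataFrame({
    'person1_status': [0], 'person2_status': [1], 'person3_status': [0],
    'person1_type': [7], 'person2_type': [4], 'person3_type': [6],
})

# 'person1_status' -> ('person1', 'status'); then move the kind to the outer level.
df.columns = df.columns.str.split('_', expand=True)
df = df.swaplevel(0, 1, axis=1)

# Map types to values column by column; Series.map accepts a Series as the lookup.
values = df['type'].apply(lambda col: col.map(other_table))

# Columns align by person label, so the masks pair up with the right values.
A = df['status'].rsub(1).mul(values).sum(axis=1)
B = df['status'].mul(values).sum(axis=1)

print(A.tolist(), B.tolist())  # [70] [10]
```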

Source

2016-12-20 01:51:29 piRSquared
