2016-11-25 91 views
2
    target_value  title        people  start     end  twitter_map
0   AGE_13_TO_17  13 to 17     1       13        17   AGE_13_TO_17
1   AGE_13_TO_24  13 to 24     NaN     13        24   NaN
2   AGE_13_TO_34  13 to 34     NaN     13        34   NaN
3   AGE_13_TO_49  13 to 49     NaN     13        49   NaN
4   AGE_13_TO_54  13 to 54     NaN     13        54   NaN
5   AGE_OVER_13   Age Over 13  NaN     13        -    NaN
6   AGE_18_TO_24  18 to 24     7       18        24   AGE_18_TO_24
7   AGE_18_TO_54  18 to 54     NaN     18        54   NaN
8   AGE_OVER_18   Age Over 18  NaN     18        -    NaN
9   AGE_21_TO_34  21 to 34     NaN     21        34   NaN
10  AGE_21_TO_49  21 to 49     NaN     21        49   NaN
11  AGE_21_TO_54  21 to 54     NaN     21        54   NaN
12  AGE_25_TO_34  25 to 34     34      25        34   AGE_25_TO_34
13  AGE_25_TO_49  25 to 49     NaN     25        49   NaN
14  AGE_OVER_25   Age Over 25  NaN     25        -    NaN
15  AGE_35_TO_44  35 to 44     15      35        44   AGE_35_TO_44
16  AGE_OVER_35   Age Over 35  NaN     35        -    NaN
17  AGE_45_TO_54  45 to 54     1       45        54   AGE_45_TO_54
18  AGE_OVER_50   Age Over 50  NaN     50        -    NaN
19  AGE_55_TO_64  55 to 64     3       55        64   AGE_55_TO_64
20  AGE_OVER_65   65+          6       65        -    AGE_OVER_65
21  None          All Ages     NaN     All Ages  -    NaN

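So I have the DataFrame shown above, which contains age-start and age-end values. Some of the age ranges overlap, though. I need to fill in the correct values in the people column based on the ranges that already have a value. In other words, I want to get the sum for overlapping age ranges in a pandas DataFrame.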

First two rows of the expected output:

    target_value  title        people  start     end  twitter_map
0   AGE_13_TO_17  13 to 17     1       13        17   AGE_13_TO_17
1   AGE_13_TO_24  13 to 24     8       13        24   NaN
+0

The first three columns have been merged with the last three columns –

+2

What is the expected output? –

+0

I gave an example in the first two rows... I hope that explains it –

Answers

2

I'll work with a simplified example:

people  start  end
1       13     17
NaN     13     24
NaN     13     34
NaN     13     -
7       18     24
NaN     18     -
34      25     34

First, replace - with infinity and convert everything to float:

import numpy as np 
df = df.replace({'-': np.inf}).astype(float) 
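
For anyone who wants to reproduce this step, here is a minimal sketch. It assumes the simplified example is built by hand and that the `end` column holds text (object dtype) because of the `-` placeholder, as it might after reading the data from a file. That construction is an assumption, not something shown in the question.

```python
import numpy as np
import pandas as pd

# Simplified example; '-' marks an open-ended age range.
# Assumption: 'end' is stored as text (object dtype) because of the placeholder.
df = pd.DataFrame({
    "people": [1, np.nan, np.nan, np.nan, 7, np.nan, 34],
    "start": [13, 13, 13, 13, 18, 18, 25],
    "end": pd.Series(["17", "24", "34", "-", "24", "-", "34"], dtype=object),
})

# Replace the placeholder with infinity, then cast every column to float.
df = df.replace({"-": np.inf}).astype(float)

print(df)
```

After the conversion, open-ended ranges compare as larger than any finite upper bound, which the later containment check relies on.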

Then select the rows where 'people' is given; these will be the input:

df_input = df.dropna() 

Now define the following function:

def func(row):
    # Sum the known counts whose range lies entirely within this row's range.
    return df_input.loc[
        (df_input['start'] >= row['start']) & (df_input['end'] <= row['end']),
        'people'
    ].sum()

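For each row of the DataFrame, it sums all the input numbers whose age range falls within that row's range (this is where infinity comes in handy).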
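
To make the containment condition concrete, here is a small sketch that evaluates it by hand for a single row, the open-ended "13 and over" range. The frame is rebuilt inline with `end` already converted to float, and the names `mask` and `total` are only illustrative.

```python
import numpy as np
import pandas as pd

# Simplified example, with '-' already replaced by infinity.
df = pd.DataFrame({
    "people": [1, np.nan, np.nan, np.nan, 7, np.nan, 34],
    "start": [13.0, 13.0, 13.0, 13.0, 18.0, 18.0, 25.0],
    "end": [17.0, 24.0, 34.0, np.inf, 24.0, np.inf, 34.0],
})
df_input = df.dropna()  # rows 0, 4 and 6 carry a known count

row = df.loc[3]  # the range from 13 to infinity

# A known range is counted when it starts no earlier and ends no later
# than the row's range; any finite end is <= infinity.
mask = (df_input["start"] >= row["start"]) & (df_input["end"] <= row["end"])
total = df_input.loc[mask, "people"].sum()

print(mask.tolist())  # [True, True, True]
print(total)          # 42.0
```

Without the infinity replacement, the comparison against `-` would not be meaningful, so open-ended rows could not be summed this way.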

Finally, apply the function:

In [36]: df.apply(func, axis=1) 
Out[36]: 
0  1.0 
1  8.0 
2 42.0 
3 42.0 
4  7.0 
5 41.0 
6 34.0 
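
As an alternative sketch, not the method used in this answer, the same sums can be computed without `apply` by using NumPy broadcasting. It assumes, as in the example, that the ranges with known counts do not overlap each other (otherwise people would be counted twice). The column name `people_filled` is hypothetical.

```python
import numpy as np
import pandas as pd

# Simplified example, with '-' already replaced by infinity.
df = pd.DataFrame({
    "people": [1, np.nan, np.nan, np.nan, 7, np.nan, 34],
    "start": [13.0, 13.0, 13.0, 13.0, 18.0, 18.0, 25.0],
    "end": [17.0, 24.0, 34.0, np.inf, 24.0, np.inf, 34.0],
})
known = df.dropna(subset=["people"])

# contained[i, j] is True when known range j lies inside range i.
contained = (
    (known["start"].to_numpy()[None, :] >= df["start"].to_numpy()[:, None])
    & (known["end"].to_numpy()[None, :] <= df["end"].to_numpy()[:, None])
)

# The matrix product adds up the known counts of all contained ranges.
df["people_filled"] = contained @ known["people"].to_numpy()

print(df["people_filled"].tolist())
# [1.0, 8.0, 42.0, 42.0, 7.0, 41.0, 34.0]
```

This builds a boolean matrix with one row per range and one column per known range, so it trades memory for speed; for a table of this size either approach is fine.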
+0

Thanks! For once, I was faster ;) – IanS

+0

Yes, I know what you mean... – jezrael