2017-09-12 157 views

Counting non-null values in a pandas DataFrame

I have a dataset with ~70 columns that looks like this:

ID_number Meeting1 Meeting2 Meeting3 Meeting4 Meeting5 Comments … 
123456789 9/15/2015 1/8/2016 4/27/2016 NaN   NaN   text text … 
987654321 9/22/2016 NaN   2/25/2017 NaN   NaN   text text … 
456789123 10/1/2015 11/30/2015 NaN   NaN   NaN   text text … 

I want to create an additional column (meeting_count) that holds, for each ID number, the count of non-null values across the columns Meeting1-Meeting5.

Normally I would use SQL and do something like:

select 
    Meeting1, 
    Meeting2, 
    Meeting3, 
    Meeting4, 
    Meeting5, 
    (
     select count(*) 
     from (values (Meeting1), (Meeting2), (Meeting3), (Meeting4), (Meeting5)) as v(col) 
     where v.col is not null 
    ) as meeting_count 
from Table 

However, if there is a fairly simple way to do this in Python, I would prefer that.

Answer

Try this:

df['meeting_count'] = df.filter(regex=r'^Meeting').notnull().sum(axis=1) 

Demo:

In [8]: df 
Out[8]: 
      ID_number Meeting1 Meeting2 Meeting3 Meeting4 Meeting5 Comments 
123456789 9/15/2015 1/8/2016 4/27/2016  NaN  NaN  text  text 
987654321 9/22/2016   NaN 2/25/2017  NaN  NaN  text  text 
456789123 10/1/2015 11/30/2015  NaN  NaN  NaN  text  text 

In [9]: df['meeting_count'] = df.filter(regex=r'^Meeting').notnull().sum(axis=1) 

In [10]: df 
Out[10]: 
      ID_number Meeting1 Meeting2 Meeting3 Meeting4 Meeting5 Comments meeting_count 
123456789 9/15/2015 1/8/2016 4/27/2016  NaN  NaN  text  text    3 
987654321 9/22/2016   NaN 2/25/2017  NaN  NaN  text  text    2 
456789123 10/1/2015 11/30/2015  NaN  NaN  NaN  text  text    2
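
A possible variation, as a minimal sketch: the regex `^Meeting` selects every column whose name starts with "Meeting", so in a ~70-column frame it could also pick up unrelated columns with that prefix. Listing the columns explicitly and calling `DataFrame.count(axis=1)`, which counts non-NA cells per row, avoids that. The name `meeting_cols` is hypothetical, and the sample frame is rebuilt from the question with Comments shortened to a single placeholder value.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; Comments is a placeholder value.
df = pd.DataFrame({
    "ID_number": [123456789, 987654321, 456789123],
    "Meeting1": ["9/15/2015", "9/22/2016", "10/1/2015"],
    "Meeting2": ["1/8/2016", np.nan, "11/30/2015"],
    "Meeting3": ["4/27/2016", "2/25/2017", np.nan],
    "Meeting4": [np.nan, np.nan, np.nan],
    "Meeting5": [np.nan, np.nan, np.nan],
    "Comments": ["text", "text", "text"],
})

# Hypothetical helper list: name exactly the columns that should be counted.
meeting_cols = [f"Meeting{i}" for i in range(1, 6)]

# count(axis=1) returns the number of non-NA cells in each row.
df["meeting_count"] = df[meeting_cols].count(axis=1)

print(df[["ID_number", "meeting_count"]])
```

On this sample it gives the same counts as the `filter(...).notnull().sum(axis=1)` one-liner. Note that both approaches only treat real missing values (NaN/None) as null; empty strings would still be counted.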