Look up column values to replace NaN
I have a sample pandas DataFrame as follows:
import numpy as np
import pandas as pd

df = pd.DataFrame({
'notes': pd.Series(['meth cook makes meth with purity of over 96%', 'meth cook is also called Heisenberg', 'meth cook has cancer', 'he is known as the best meth cook', 'Meth Dealer added chili powder to his batch', 'Meth Dealer learned to make the best meth', 'everyone goes to this Meth Dealer for best shot', 'girlfriend of the meth dealer died', 'this lawyer is a people pleasing person', 'cinnabon has now hired the lawyer as a baker', 'lawyer had to take off in the end', 'lawyer has a lot of connections who knows other guy']),
'name': pd.Series([np.nan, 'Walter White', np.nan, np.nan, np.nan, np.nan, 'Jessie Pinkman', np.nan, 'Saul Goodman', np.nan, np.nan, np.nan]),
'occupation': pd.Series(['meth cook', np.nan, np.nan, np.nan, np.nan, np.nan, 'meth dealer', np.nan, np.nan, 'lawyer', np.nan, np.nan])
})
name            notes                                                occupation
NaN             meth cook makes meth with purity of over 96%         meth cook
Walter White    meth cook is also called Heisenberg                  NaN
NaN             meth cook has cancer                                 NaN
NaN             he is known as the best meth cook                    NaN
NaN             Meth Dealer added chili powder to his batch          NaN
NaN             Meth Dealer learned to make the best meth            NaN
Jessie Pinkman  everyone goes to this Meth Dealer for best shot      meth dealer
NaN             girlfriend of the meth dealer died                   NaN
Saul Goodman    this lawyer is a people pleasing person              NaN
NaN             cinnabon has now hired the lawyer as a baker         lawyer
NaN             lawyer had to take off in the end                    NaN
NaN             lawyer has a lot of connections who knows other guy  NaN
So there are three occupations in total:
pd.unique(df.occupation.dropna())
array(['meth cook', 'meth dealer', 'lawyer'], dtype=object)
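I would like to look up the 'occupation' values in the 'notes' column and, wherever a note contains one of the known occupations, fill that row's missing occupation with the matching value. For example, the occupation is missing in the second row. But if we search the 'notes' column for ('meth cook', 'meth dealer', 'lawyer'), we see that 'meth cook' appears in the second row's notes. So the missing occupation should be filled with 'meth cook'.
I tried: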
df.occupation[df.occupation.notnull()].apply(lambda x: df.occupation.str.extract('('+x+')'))
However, it does not give me the result I want. I would like to see the following result:
name            notes                                                occupation
NaN             meth cook makes meth with purity of over 96%         meth cook
Walter White    meth cook is also called Heisenberg                  meth cook
NaN             meth cook has cancer                                 meth cook
NaN             he is known as the best meth cook                    meth cook
NaN             Meth Dealer added chili powder to his batch          meth dealer
NaN             Meth Dealer learned to make the best meth            meth dealer
Jessie Pinkman  everyone goes to this Meth Dealer for best shot      meth dealer
NaN             girlfriend of the meth dealer died                   meth dealer
Saul Goodman    this lawyer is a people pleasing person              lawyer
NaN             cinnabon has now hired the lawyer as a baker         lawyer
NaN             lawyer had to take off in the end                    lawyer
NaN             lawyer has a lot of connections who knows other guy  lawyer
Can anyone offer any input?
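One way this could be approached, offered as a sketch rather than a definitive answer: build a single case-insensitive regex from the known occupations, extract the first match from each note with Series.str.extract, and use it to fill only the missing occupations. The names known, pattern, and found are illustrative and do not come from the question. The sketch also assumes the occupation values are stored in lowercase, as they are in the sample data.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'notes': [
        'meth cook makes meth with purity of over 96%',
        'meth cook is also called Heisenberg',
        'meth cook has cancer',
        'he is known as the best meth cook',
        'Meth Dealer added chili powder to his batch',
        'Meth Dealer learned to make the best meth',
        'everyone goes to this Meth Dealer for best shot',
        'girlfriend of the meth dealer died',
        'this lawyer is a people pleasing person',
        'cinnabon has now hired the lawyer as a baker',
        'lawyer had to take off in the end',
        'lawyer has a lot of connections who knows other guy',
    ],
    'name': [np.nan, 'Walter White', np.nan, np.nan, np.nan, np.nan,
             'Jessie Pinkman', np.nan, 'Saul Goodman', np.nan, np.nan, np.nan],
    'occupation': ['meth cook', np.nan, np.nan, np.nan, np.nan, np.nan,
                   'meth dealer', np.nan, np.nan, 'lawyer', np.nan, np.nan],
})

# Known occupations, with NaN excluded.
known = df['occupation'].dropna().unique()

# One alternation group. Longer names go first so they win over shorter ones,
# and re.escape guards against regex metacharacters in the values.
pattern = '(' + '|'.join(
    re.escape(occ) for occ in sorted(known, key=len, reverse=True)
) + ')'

# First occupation mentioned in each note, ignoring case. The match is
# lowercased on the assumption that known occupations are stored lowercase.
found = df['notes'].str.extract(pattern, flags=re.IGNORECASE, expand=False).str.lower()

# Fill only the rows where occupation is missing; existing values are kept.
df['occupation'] = df['occupation'].fillna(found)
```

Notes that mention none of the known occupations yield NaN from str.extract, so those rows simply stay missing instead of raising an error.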
Hey, thanks! While waiting I did the following:
occupation_list = list(pd.unique(df.occupation))
occupation_list = [x for x in occupation_list if str(x) != 'nan']
df['occupation'].fillna(df.loc[pd.isnull(df.occupation)]['notes'].apply(lambda x: filter(lambda occ: re.search(occ.lower(), x.lower()), occupation_list)[0]), inplace=True)
This seemed to work. However, when I applied it to my actual dataset, I got the following error: list index. I'm not sure how filter() works.
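On the filter() question, a hedged note: in Python 2, filter() on a list returns a list, so indexing it with [0] raises an IndexError whenever a note contains none of the occupations. In Python 3, filter() returns a lazy iterator that cannot be indexed at all. A small sketch of one way around both, using next() with a default value, follows. The helper name first_match is hypothetical, and the third note is a made-up row added only to show the no-match case.

```python
import re

import numpy as np
import pandas as pd


def first_match(note, occupations):
    """Return the first occupation found in note (case-insensitive), or NaN."""
    matches = (
        occ for occ in occupations
        if re.search(re.escape(occ), note, flags=re.IGNORECASE)
    )
    # next() with a default never raises, even when nothing matches.
    return next(matches, np.nan)


occupation_list = ['meth cook', 'meth dealer', 'lawyer']

notes = pd.Series([
    'meth cook has cancer',
    'Meth Dealer learned to make the best meth',
    'a note that mentions no known occupation',  # hypothetical row, not from the question
])

result = notes.apply(lambda note: first_match(note, occupation_list))
```

Rows with no match come back as NaN, so passing the result to fillna leaves those occupations missing rather than failing partway through.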