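The problem is with the `NaN` values, so one possible solution is to replace them with `fillna`. Also, some values in column `name` are not found in either column: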
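Why `NaN` breaks the membership test: `np.nan` is an ordinary Python float, and `in` needs a container on its right-hand side. The minimal sketch below uses only the standard library. It assumes `math.nan` is an acceptable stand-in for `np.nan` (both are plain floats), and `membership` is a hypothetical helper written for illustration:

```python
import math


def membership(name, cell):
    """Evaluate `name in cell`; return the exception class instead if it raises."""
    try:
        return name in cell
    except TypeError as exc:
        return type(exc)


print(membership('b', math.nan))    # NaN is a float, not a container -> TypeError
print(membership('b', ('a', 'b')))  # a real tuple cell: ordinary membership test -> True
print(membership('b', 'tmp'))       # after fillna('tmp') this is a substring check -> False
print(membership('m', 'tmp'))       # ...so a one-letter name like 'm' matches the placeholder -> True
```

The last line shows the one caveat of the `'tmp'` placeholder: it works for the sample data, but any name that is a substring of `'tmp'` would be treated as found.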
```python
# if you need to select by position, use iloc
# (assumes the column order is name, yes, no)
def score(x):
    #print (x)
    if x.iloc[0] in x.iloc[1]:
        return 1
    elif x.iloc[0] in x.iloc[2]:
        return 0

sh['label'] = sh.fillna('tmp').apply(score, axis=1)
print(sh)
```
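Positional access with `iloc` silently depends on the column order. The sample output further down prints the columns as `name no yes`, and with that order `x.iloc[1]` would be `no`, not `yes`. A minimal sketch that keeps `iloc` but looks the positions up by name once with `Index.get_loc`. The `i_*` variables are illustrative names, and the column order here is deliberately `name, no, yes`:

```python
import numpy as np
import pandas as pd

sh = pd.DataFrame({
    'name': ['b', 'b', 'b'],
    'no': [np.nan, np.nan, ('a', 'b')],  # 'no' deliberately placed before 'yes'
    'yes': [('b',), ('a',), np.nan],
})

# Resolve each column's position by name, so the column order no longer matters
i_name, i_yes, i_no = (sh.columns.get_loc(c) for c in ('name', 'yes', 'no'))

def score(x):
    if x.iloc[i_name] in x.iloc[i_yes]:
        return 1
    elif x.iloc[i_name] in x.iloc[i_no]:
        return 0

sh['label'] = sh.fillna('tmp').apply(score, axis=1)
print(sh)
```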
```python
# if you need to select by column name
def score(x):
    #print (x)
    if x['name'] in x['yes']:
        return 1
    elif x['name'] in x['no']:
        return 0

sh['label'] = sh.fillna('tmp').apply(score, axis=1)
print(sh)
```
Sample:
```python
import numpy as np
import pandas as pd

sh = pd.DataFrame({
    'name': ['b','b','b'],
    'yes': [('b',),('a',),np.nan],
    'no':[np.nan, np.nan, ('a','b')]
})
print(sh)
```
```
  name      no   yes
0    b     NaN  (b,)
1    b     NaN  (a,)
2    b  (a, b)   NaN
```
```python
def score(x):
    #print (x)
    if x['name'] in x['yes']:
        return 1
    elif x['name'] in x['no']:
        return 0

sh['label'] = sh.fillna('tmp').apply(score, axis=1)
print(sh)
```
```
  name      no   yes  label
0    b     NaN  (b,)    1.0
1    b     NaN  (a,)    NaN
2    b  (a, b)   NaN    0.0
```
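Because the `'tmp'` placeholder turns `in` into a substring check, a name such as `'t'`, `'m'` or `'p'` would match every missing cell. One alternative, sketched here under the assumption that every non-missing cell is a tuple, is to skip `fillna` and treat any non-tuple cell as an empty tuple. The fourth row (name `'t'`) is hypothetical data added to expose the difference, and `as_tuple` is an illustrative helper name:

```python
import numpy as np
import pandas as pd

sh = pd.DataFrame({
    'name': ['b', 'b', 'b', 't'],
    'yes': [('b',), ('a',), np.nan, np.nan],
    'no': [np.nan, np.nan, ('a', 'b'), ('t',)],
})

# For comparison: with fillna('tmp'), row 3 scores 1 because 't' in 'tmp' is True
naive = sh.fillna('tmp').apply(
    lambda x: 1 if x['name'] in x['yes'] else (0 if x['name'] in x['no'] else np.nan),
    axis=1,
)

def as_tuple(v):
    # NaN (or any other non-tuple cell) becomes an empty tuple
    return v if isinstance(v, tuple) else ()

def score(x):
    if x['name'] in as_tuple(x['yes']):
        return 1
    elif x['name'] in as_tuple(x['no']):
        return 0
    return np.nan  # name found in neither column

sh['label'] = sh.apply(score, axis=1)
print(sh)
```

With this version row 3 is labelled `0`, since `'t'` appears only in its `no` tuple.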
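But the code above has a problem when a value is in both columns `yes` and `no`: `score` returns `1` as soon as it finds the value in `yes`, so the match in `no` is never reported. One possible solution is to create 2 new columns with boolean `True` and `False`, and then convert them to int (`1`, `0`) with `astype`: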
```python
sh = pd.DataFrame({
    'name': ['b','b','b'],
    'yes': [('b',),('a',),np.nan],
    'no':[np.nan, ('b',), ('a','b')]
})
print(sh)
```
```
  name      no   yes
0    b     NaN  (b,)
1    b    (b,)  (a,)
2    b  (a, b)   NaN
```
```python
sh['label-yes'] = sh.fillna('tmp').apply(lambda x: x['name'] in x['yes'], axis=1)
sh['label-no'] = sh.fillna('tmp').apply(lambda x: x['name'] in x['no'], axis=1)
sh[['label-yes', 'label-no']] = sh[['label-yes', 'label-no']].astype(int)
print(sh)
```
```
  name      no   yes  label-yes  label-no
0    b     NaN  (b,)          1         0
1    b    (b,)  (a,)          0         1
2    b  (a, b)   NaN          0         1
```
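The same two flag columns can also be built without `apply(axis=1)`, by zipping the columns in a list comprehension. This is a sketch, not the method used above. The fourth row (a name present in both `yes` and `no`) is hypothetical data added to show that each flag is computed independently, and `contains` is an illustrative helper name:

```python
import numpy as np
import pandas as pd

sh = pd.DataFrame({
    'name': ['b', 'b', 'b', 'b'],
    'yes': [('b',), ('a',), np.nan, ('b',)],
    'no': [np.nan, ('b',), ('a', 'b'), ('b',)],
})

def contains(values, name):
    # True only when the cell is a tuple holding `name`; NaN cells give False
    return isinstance(values, tuple) and name in values

# One independent 0/1 flag per column, so a name in both columns gets 1 in both
for col in ('yes', 'no'):
    sh[f'label-{col}'] = [int(contains(v, n)) for n, v in zip(sh['name'], sh[col])]

print(sh)
```

Iterating with `zip` avoids building a `Series` per row, which is usually cheaper than `apply(axis=1)` on larger frames.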
What is the expected output? – Zero
Why `== True`?... – TigerhawkT3
`x` might not contain what you expect. Put `print(x[2])` before the second `if`. – Barmar