Given a DataFrame created like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'dx1': [25041, 25041, 25041],
                   'dx2': [40391, 40391, 40391],
                   'dx3': [np.nan, 25081, 42822],
                   'dx4': [np.nan, np.nan, 99681],
                   'dxpoa1': ['Y', 'N', '1'],
                   'dxpoa2': ['E', 'W', 'N'],
                   'dxpoa3': [np.nan, 'U', 'Y'],
                   'dxpoa4': [np.nan, np.nan, 'Y']})
which gives:
     dx1    dx2    dx3    dx4 dxpoa1 dxpoa2 dxpoa3 dxpoa4
0  25041  40391    NaN    NaN      Y      E    NaN    NaN
1  25041  40391  25081    NaN      N      W      U    NaN
2  25041  40391  42822  99681      1      N      Y      Y
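Define a function that implements the substitution rule. As far as I can tell from the description, the rule is: replace the value in the target column with zero when the value in the reference column is not 'Y', 'W', '1' or 'E':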
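def subfunc(row, col_reference=None, col_target=None):
    # Zero the target when the reference flag is not one of the accepted
    # values; a missing flag (NaN) also counts as "not accepted".
    if row[col_reference] not in ['Y', 'W', '1', 'E']:
        row[col_target] = 0
    return row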
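As a quick check of the rule, here is a minimal sketch that calls the function on hand-built single rows. The function body is repeated so the snippet runs on its own; the sample rows and the names `zeroed`, `kept` and `missing` are made up for illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

def subfunc(row, col_reference=None, col_target=None):
    # Same rule as above: zero the target unless the flag is accepted.
    if row[col_reference] not in ['Y', 'W', '1', 'E']:
        row[col_target] = 0
    return row

# Flag 'N' is not accepted, so dx1 is zeroed.
zeroed = subfunc(pd.Series({'dx1': 25041, 'dxpoa1': 'N'}),
                 col_reference='dxpoa1', col_target='dx1')

# Flag 'Y' is accepted, so dx1 is left alone.
kept = subfunc(pd.Series({'dx1': 25041, 'dxpoa1': 'Y'}),
               col_reference='dxpoa1', col_target='dx1')

# A missing flag (NaN) is not in the accepted list either, so dx3 is zeroed.
missing = subfunc(pd.Series({'dx3': 25081, 'dxpoa3': np.nan}),
                  col_reference='dxpoa3', col_target='dx3')

print(zeroed['dx1'], kept['dx1'], missing['dx3'])
```

Note that the function modifies the row it is given and returns that same object, which is what lets `DataFrame.apply` collect the updated rows.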
Then loop over the column names and, for each 'dxpoa' column, apply subfunc to every row:
for colname in df.columns:
    if 'dxpoa' in colname:
        # 'dxpoa3' -> '3', so the matching target column is 'dx3'
        colid = colname.split('dxpoa')[1]
        df = df.apply(subfunc, axis=1, col_reference=colname, col_target='dx' + colid)
This results in the DataFrame:
     dx1    dx2    dx3    dx4 dxpoa1 dxpoa2 dxpoa3 dxpoa4
0  25041  40391      0      0      Y      E    NaN    NaN
1      0  40391      0      0      N      W      U    NaN
2  25041      0  42822  99681      1      N      Y      Y
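If the row-wise apply turns out to be slow on a larger frame, the same rule can probably be expressed with boolean masks instead. The following is only a sketch of that alternative, not the approach used above; the names `keep_flags`, `result`, `flag_col` and `target_col` are introduced here for illustration, and it assumes every 'dxpoaN' column has a matching 'dxN' column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'dx1': [25041, 25041, 25041],
                   'dx2': [40391, 40391, 40391],
                   'dx3': [np.nan, 25081, 42822],
                   'dx4': [np.nan, np.nan, 99681],
                   'dxpoa1': ['Y', 'N', '1'],
                   'dxpoa2': ['E', 'W', 'N'],
                   'dxpoa3': [np.nan, 'U', 'Y'],
                   'dxpoa4': [np.nan, np.nan, 'Y']})

keep_flags = ['Y', 'W', '1', 'E']   # flag values that leave the code untouched
result = df.copy()                  # work on a copy so df stays as built

for flag_col in [c for c in result.columns if c.startswith('dxpoa')]:
    # 'dxpoa3' -> 'dx3'
    target_col = 'dx' + flag_col[len('dxpoa'):]
    # isin() is False for NaN, so rows with a missing flag are zeroed too.
    result.loc[~result[flag_col].isin(keep_flags), target_col] = 0

print(result)
```

Columns that contained NaN (dx3 and dx4 here) are stored as floats, so their zeros may print as 0.0 rather than 0.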
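Have you tried anything yet? What problem did you run into? –
@AnandSKumar: I can change the values of a single column, but I don't know how to iterate over rows or columns. I'm trying to use the iterrows() function, but I know very little about Python. – Sanoj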