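Conditional NaN filling

I have a large DataFrame with 11 columns in which I want to replace NaN values with zero, unless every value in another set of columns is NaN for that row; values that are not null should be converted to integers. I am doing it the way shown below, but with only 8,000 observations it takes a very long time to finish (although it does work). I think this run took nearly 20 minutes: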
lt = ['lost_time_a', 'lost_time_b', 'lost_time_c', 'lost_time_d', 'lost_time_e', 'lost_time_f', 'lost_time_g',
      'lost_time_h', 'lost_time_i', 'lost_time_j', 'ttl']
ht = ['hour1', 'hour2', 'hour3', 'hour4', 'hour5', 'hour6', 'hour7', 'hour8', 'hour9', 'hour10', 'hour11',
      'hour12', 'hour13', 'hour14', 'hour15']
for row in FinalDF.index:
    if not all([pd.isnull(FinalDF.loc[row, col]) for col in ht]):
        for Col_ in lt:
            val = FinalDF.loc[row, Col_]
            if pd.isnull(val):
                FinalDF.loc[row, Col_] = 0
            else:
                FinalDF.loc[row, Col_] = int(val)
Any help is appreciated.
Here is some test data for you folks:
import pandas as pd
import numpy as np
from numpy import nan as NA
FinalDF = pd.DataFrame({'hour1' : [NA, NA, NA, 70, 60],
'hour2' : [100, 50, NA, 120, 100],
'hour3' : [120, 80, NA, 130, 100],
'hour4' : [140, 90, NA, 120, 70],
'hour5' : [130, 200, NA, NA, NA],
'hour6' : [NA, NA, NA, 70, 60],
'hour7' : [100, 50, NA, 120, 100],
'hour8' : [120, 80, NA, 130, 100],
'hour9' : [140, 90, NA, 120, 70,],
'hour10' :[130, 200, NA, NA, NA],
'hour11' : [NA, NA, NA, 70, 60],
'hour12' : [100, 50, NA, 120, 100],
'hour13' : [120, 80, NA, 130, 100],
'hour14' : [140, 90, NA, 120, 70],
'hour15' : [130, 200, NA, NA, NA],
'lost_time_a' : [NA, NA, NA, NA, NA],
'lost_time_b' : [NA, 1.0, NA, NA, 4.1],
'lost_time_c' : [NA, NA, NA, NA, 10.1],
'lost_time_d' : [1, 2.3, NA, NA, 1],
'lost_time_e' : [NA, NA, NA, NA, NA],
'lost_time_f' : [NA, 1.0, NA, NA, 4.1],
'lost_time_g' : [NA, NA, NA, NA, 10.1],
'lost_time_h' : [1, 2.3, NA, NA, 1],
'lost_time_i' : [NA, NA, NA, NA, NA],
'lost_time_j' : [NA, 1.0, NA, NA, 4.1],
'ttl' : [NA, NA, NA, NA, NA]})
Partial output (the lost-time variables):
Out[18]:
lost_time_a lost_time_b lost_time_c lost_time_d lost_time_e
0 0 0 0 1 0
1 0 1 0 2 0
2 NaN NaN NaN NaN NaN
3 0 0 0 0 0
4 0 4 10 1 0
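For comparison, the same rule can probably be expressed without a Python-level loop by building a boolean row mask and filling in one step. The following is only a minimal sketch on a reduced, made-up frame (the name `df`, the shortened `lt`/`ht` lists, and the sample values are assumptions for illustration, not the data above); `np.trunc` stands in for `int()`, since both truncate toward zero.

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical column lists and data for illustration only.
lt = ['lost_time_a', 'lost_time_b', 'ttl']
ht = ['hour1', 'hour2']

df = pd.DataFrame({
    'hour1': [np.nan, np.nan, 70.0],
    'hour2': [100.0, np.nan, 120.0],
    'lost_time_a': [np.nan, np.nan, 4.1],
    'lost_time_b': [1.0, np.nan, np.nan],
    'ttl': [np.nan, 2.3, np.nan],
})

# True for rows where at least one hour column holds a value.
has_hours = df[ht].notna().any(axis=1)

# On those rows only: replace NaN with 0, then truncate toward zero
# (the same rounding direction as int() in the loop version).
df.loc[has_hours, lt] = np.trunc(df.loc[has_hours, lt].fillna(0))

print(df[lt])
```

Rows whose hour columns are all NaN are left untouched, so the lost-time columns keep a float dtype as long as any NaN remains in them. If true integer columns are needed, a nullable integer dtype such as `Int64` in newer pandas versions may be worth a look.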
Could you make a self-contained example that people can copy and paste to test? – DSM
I have added test data that goes with the posted code snippet. –