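Consider using the converters argument of pandas.read_csv(), where you pass a dictionary that maps column numbers to converter functions. The example below checks each value for the 'No data' string and conditionally replaces it with np.nan, otherwise keeping the value as is: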
import numpy as np
import pandas as pd
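
# Return NaN for cells containing 'No data'; otherwise parse the cell as a float.
c_fct = lambda x: float(x if 'No data' not in x else np.nan)
# Apply the converter to columns 1-5 by position (column 0 is ID).
convertdict = {1: c_fct, 2: c_fct, 3: c_fct, 4: c_fct, 5: c_fct}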
df = pd.read_csv('Input.csv', converters=convertdict)
Input CSV
ID Col1 Col2 Col3 Col4 Col5
TGG 0.634516647 0.900464347 0.998505978 0.170422713 0.893340128
GRI No data 0.045915333 0.718398939 0.924813864 No data
NLB 0.921127268 0.614460813 0.677857676 0.343612947 0.559437744
SEI 0.081852313 No data 0.890816385 0.943313021 0.874857844
LOY 0.632556715 0.362855866 0.038702448 0.253762859 No data
OPL 0.375088582 0.268283238 0.761552111 0.589547625 0.192223208
CTK 0.349464541 0.844718987 No data 0.841439909 0.898093646
EUE 0.629784261 0.982589843 0.315670377 0.832419474 0.950044814
JLP 0.543942659 0.988380305 0.417191823 0.823857176 0.542514099
RHK 0.728053447 0.521816539 0.402523435 No data 0.558226706
AEM 0.005495116 0.715363776 0.075508356 0.959119268 0.844730368
VLQ 0.21146319 0.558208766 0.501769554 0.226539046 0.795861461
MDB 0.230514689 0.223163664 No data 0.324636384 0.700716246
LPH 0.853433224 0.582678173 0.633109347 0.432191426 No data
PEP 0.41096305 No data 0.627776178 0.482359278 0.179863537
UQK 0.252598809 0.497517585 0.276060768 No data 0.087985623
KGJ 0.033985585 0.033702088 No data 0.286682709 0.543349787
JUQ 0.25971543 0.142067155 0.597985191 0.219841249 0.699822866
NYW No data 0.17187907 0.157413049 0.209011772 0.592824483
Output
print(df)
# ID Col1 Col2 Col3 Col4 Col5
# 0 TGG 0.634517 0.900464 0.998506 0.170423 0.893340
# 1 GRI NaN 0.045915 0.718399 0.924814 NaN
# 2 NLB 0.921127 0.614461 0.677858 0.343613 0.559438
# 3 SEI 0.081852 NaN 0.890816 0.943313 0.874858
# 4 LOY 0.632557 0.362856 0.038702 0.253763 NaN
# 5 OPL 0.375089 0.268283 0.761552 0.589548 0.192223
# 6 CTK 0.349465 0.844719 NaN 0.841440 0.898094
# 7 EUE 0.629784 0.982590 0.315670 0.832419 0.950045
# 8 JLP 0.543943 0.988380 0.417192 0.823857 0.542514
# 9 RHK 0.728053 0.521817 0.402523 NaN 0.558227
# 10 AEM 0.005495 0.715364 0.075508 0.959119 0.844730
# 11 VLQ 0.211463 0.558209 0.501770 0.226539 0.795861
# 12 MDB 0.230515 0.223164 NaN 0.324636 0.700716
# 13 LPH 0.853433 0.582678 0.633109 0.432191 NaN
# 14 PEP 0.410963 NaN 0.627776 0.482359 0.179864
# 15 UQK 0.252599 0.497518 0.276061 NaN 0.087986
# 16 KGJ 0.033986 0.033702 NaN 0.286683 0.543350
# 17 JUQ 0.259715 0.142067 0.597985 0.219841 0.699823
# 18 NYW NaN 0.171879 0.157413 0.209012 0.592824
print(df.dtypes)
# ID object
# Col1 float64
# Col2 float64
# Col3 float64
# Col4 float64
# Col5 float64
# dtype: object
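The reason is that the columns are imported as object because 'No data' is a string literal. Without it, the columns would import as float. Replacing the values does not convert the data type, so convert with astype after the replacement. – Parfait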
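For comparison, here is a minimal sketch of the replace-then-astype route described in the comment above. It assumes a comma-separated file; the three-row sample and the names csv_text and num_cols are made up for illustration and are not part of the original question.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical three-row sample shaped like Input.csv (assumed comma-separated).
csv_text = """ID,Col1,Col2
TGG,0.634516647,0.900464347
GRI,No data,0.045915333
SEI,0.081852313,No data
"""

# Read without converters: columns containing 'No data' are not parsed as float.
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)

# replace() swaps the marker string for NaN but leaves the column dtype alone;
# astype(float) then converts the remaining numeric strings to float64.
num_cols = ["Col1", "Col2"]
df[num_cols] = df[num_cols].replace("No data", np.nan).astype(float)
print(df.dtypes)
```

The converters approach does the same work in one pass during parsing, while this version is handy when the DataFrame has already been loaded.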