2017-10-13

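Filtering DataFrame column values greater than zero?

I have a CSV file that I read with pd.read_csv(file), and I want to keep only the rows whose values are greater than zero.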

The DataFrame has some empty cells, some negative values, and some numbers in exponential notation, e.g. -1.72E+10.

Time            A          B          C          D          E          F          G
9/8/2017 8:40   1.29       0.27       1.78       0.23       0.33       0.05       -13.72
9/8/2017 9:00   1.28       0.26       1.78       0.22       0.35       0.02       -13.59
9/8/2017 9:20   1.43
9/8/2017 9:40   1.44       0.29       1.93       0.25       0.28       0.01       -13.92
9/8/2017 10:00  1.36       0.27       1.84       0.23       0.31       0.02       -13.77
9/8/2017 10:20  1.38       0.27       1.89       0.23       0.31       0.01       -13.83
9/8/2017 10:40  -1.72E+10  -1.72E+10  -1.72E+10  -1.72E+10  -1.72E+10  -1.72E+10
9/8/2017 11:00  1.4        0.28       1.88       0.24       0.28       0.02       -13.92
9/8/2017 11:20  1.43       0.28       1.92       0.24       0.29       0.02       -13.83

Whenever I run the following code, it does not filter the data:

df = df[df > 0] 
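
Part of the problem is what df[df > 0] does: indexing a DataFrame with a boolean DataFrame does not drop rows. pandas masks the cells where the condition is False with NaN and keeps the original shape. A minimal sketch with illustrative values (not the real file):

```python
import numpy as np
import pandas as pd

# Small numeric frame standing in for the question's data (values are illustrative).
df = pd.DataFrame({"A": [1.29, -1.72e10, 1.43], "G": [-13.72, np.nan, 13.83]})

# Boolean-DataFrame indexing works like df.where(df > 0):
# cells where the condition is False become NaN, but every row is kept.
masked = df[df > 0]
print(masked.shape)
print(masked)
```

To actually remove rows you need a boolean Series with one value per row, which is what any(axis=1) or all(axis=1) produce.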

The columns are of type str, not numpy.float64.

Can anyone tell me what the problem is?

I want to filter the rows of the entire DataFrame whose values are greater than 0.
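
Because the columns hold strings, the comparison with 0 is not numeric. One option, sketched here on a hypothetical frame that mimics string columns with a blank cell, is to convert every column except Time with pd.to_numeric(errors="coerce"), which turns unparseable cells such as "" into NaN:

```python
import pandas as pd

# Hypothetical frame where the numeric columns were read as strings,
# including a blank cell; values are illustrative only.
df = pd.DataFrame({
    "Time": ["9/8/2017 8:40", "9/8/2017 9:20", "9/8/2017 10:40"],
    "A": ["1.29", "1.43", "-1.72E+10"],
    "B": ["0.27", "", "-1.72E+10"],
})

cols = df.columns.difference(["Time"])
# errors="coerce" turns anything that cannot be parsed (such as "") into NaN.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")
print(df.dtypes)

# With real floats the row filter behaves as expected.
result = df[(df[cols] > 0).all(axis=1)]
print(result)
```

Unlike astype(float), to_numeric with errors="coerce" does not stop on stray non-numeric strings, at the cost of silently turning them into NaN.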

Answer


I think you need any to check whether at least one value in each row is True:

df = df[(df > 0).any(axis=1)] 
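
A minimal sketch of any(axis=1) on a numeric-only frame (no Time column; values are illustrative). Note that NaN > 0 evaluates to False:

```python
import numpy as np
import pandas as pd

# Illustrative numeric-only frame, loosely based on the question's rows.
df = pd.DataFrame({
    "A": [1.29, 1.43, -1.72e10],
    "F": [0.05, np.nan, -1.72e10],
    "G": [-13.72, np.nan, np.nan],
})

# any(axis=1): keep a row if at least one of its values is > 0.
mask = (df > 0).any(axis=1)
print(mask.tolist())
print(df[mask])
```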

Or all to check whether all values in each row are True:

df = df[(df > 0).all(axis=1)] 
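
With all(axis=1), empty cells matter: NaN > 0 is False, so a row with any blank is dropped too. If blanks should be ignored instead, one option (an assumption about the desired behaviour) is to treat NaN as passing. A sketch with illustrative values:

```python
import numpy as np
import pandas as pd

# Illustrative numeric-only frame; row 1 has blanks, row 2 has a negative value.
df = pd.DataFrame({
    "A": [1.43, 1.43, 1.29],
    "F": [0.02, np.nan, 0.05],
    "G": [13.83, np.nan, -13.72],
})

# Strict: every value must be > 0, so NaN makes the row fail.
strict = (df > 0).all(axis=1)
# Lenient: a cell passes if it is > 0 or missing.
lenient = ((df > 0) | df.isna()).all(axis=1)

print(df[strict])
print(df[lenient])
```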

#the last row and the first numeric column were modified to contain no negative values 
print (df) 
      Time    A    B    C    D \ 
0 9/8/2017 8:40 1.290000e+00 2.700000e-01 1.780000e+00 2.300000e-01 
1 9/8/2017 9:00 1.280000e+00 2.600000e-01 1.780000e+00 2.200000e-01 
2 9/8/2017 9:20 1.430000e+00   NaN   NaN   NaN 
3 9/8/2017 9:40 1.440000e+00 2.900000e-01 1.930000e+00 2.500000e-01 
4 9/8/2017 10:00 1.360000e+00 2.700000e-01 1.840000e+00 2.300000e-01 
5 9/8/2017 10:20 1.380000e+00 2.700000e-01 1.890000e+00 2.300000e-01 
6 9/8/2017 10:40 1.720000e+10 -1.720000e+10 -1.720000e+10 -1.720000e+10 
7 9/8/2017 11:00 1.400000e+00 2.800000e-01 1.880000e+00 2.400000e-01 
8 9/8/2017 11:20 1.430000e+00 2.800000e-01 1.920000e+00 2.400000e-01 

       E    F  G 
0 3.300000e-01 5.000000e-02 -13.72 
1 3.500000e-01 2.000000e-02 -13.59 
2   NaN   NaN NaN 
3 2.800000e-01 1.000000e-02 -13.92 
4 3.100000e-01 2.000000e-02 -13.77 
5 3.100000e-01 1.000000e-02 -13.83 
6 -1.720000e+10 -1.720000e+10 NaN 
7 2.800000e-01 2.000000e-02 -13.92 
8 2.900000e-01 2.000000e-02 13.83 


df1 = df[(df > 0).all(axis=1)] 
print (df1) 
      Time  A  B  C  D  E  F  G 
8 9/8/2017 11:20 1.43 0.28 1.92 0.24 0.29 0.02 13.83 
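
The comparison above also includes the string Time column. On Python 3, comparing a str with an int is not supported, so df > 0 on that column is likely to raise a TypeError. One way around it, assuming Time is the only non-numeric column, is to move it into the index before comparing (tmp is just an illustrative name):

```python
import pandas as pd

# Illustrative frame with a string Time column and float data columns.
df = pd.DataFrame({
    "Time": ["9/8/2017 8:40", "9/8/2017 11:20"],
    "A": [1.29, 1.43],
    "G": [-13.72, 13.83],
})

# Move Time into the index so only the numeric columns are compared with 0,
# then restore it as a regular column.
tmp = df.set_index("Time")
df1 = tmp[(tmp > 0).all(axis=1)].reset_index()
print(df1)
```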

df1 = df.loc[:, (df > 0).all()] 
print (df1) 
      Time    A 
0 9/8/2017 8:40 1.290000e+00 
1 9/8/2017 9:00 1.280000e+00 
2 9/8/2017 9:20 1.430000e+00 
3 9/8/2017 9:40 1.440000e+00 
4 9/8/2017 10:00 1.360000e+00 
5 9/8/2017 10:20 1.380000e+00 
6 9/8/2017 10:40 1.720000e+10 
7 9/8/2017 11:00 1.400000e+00 
8 9/8/2017 11:20 1.430000e+00 
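
df.loc[:, (df > 0).all()] filters columns rather than rows: all() with the default axis=0 checks each column top to bottom. A sketch that keeps Time while testing only the numeric columns (num and keep are illustrative names, values are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Time": ["9/8/2017 8:40", "9/8/2017 9:20", "9/8/2017 10:40"],
    "A": [1.29, 1.43, 1.72e10],
    "B": [0.27, np.nan, -1.72e10],
    "G": [-13.72, np.nan, np.nan],
})

num = df.drop(columns="Time")
# One boolean per column: True if every value in that column is > 0.
keep = (num > 0).all()
# Put Time back next to the columns that passed.
df1 = pd.concat([df[["Time"]], num.loc[:, keep]], axis=1)
print(df1)
```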

EDIT1:

To convert every column except Time to float:

cols = df.columns.difference(['Time']) 
df[cols] = df[cols].astype(float) 
print (df.dtypes) 
Time  object 
A  float64 
B  float64 
C  float64 
D  float64 
E  float64 
F  float64 
G  float64 
dtype: object 

df1 = df.loc[:, (df > 0).all()] 
print (df1) 
      Time    A 
0 9/8/2017 8:40 1.290000e+00 
1 9/8/2017 9:00 1.280000e+00 
2 9/8/2017 9:20 1.430000e+00 
3 9/8/2017 9:40 1.440000e+00 
4 9/8/2017 10:00 1.360000e+00 
5 9/8/2017 10:20 1.380000e+00 
6 9/8/2017 10:40 1.720000e+10 
7 9/8/2017 11:00 1.400000e+00 
8 9/8/2017 11:20 1.430000e+00 

But this does not filter the DataFrame. I still get negative values. – Dheeraj


I think 'all' should work. – jezrael


I want to filter each column separately. – Dheeraj
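
If "filter each column separately" means keeping only the positive values of each column independently, two possible readings are sketched below on made-up data: masking non-positive cells with where (same shape), or building one filtered frame per column (masked and per_column are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Time": ["9/8/2017 8:40", "9/8/2017 9:20", "9/8/2017 10:40"],
    "A": [1.29, 1.43, 1.72e10],
    "B": [0.27, np.nan, -1.72e10],
})

cols = df.columns.difference(["Time"])

# Option 1: same shape; non-positive or missing cells become NaN.
masked = df[cols].where(df[cols] > 0)

# Option 2: one frame per column, each keeping only its own rows with value > 0.
per_column = {c: df.loc[df[c] > 0, ["Time", c]] for c in cols}

print(masked)
for name, part in per_column.items():
    print(name)
    print(part)
```

Option 2 gives each column its own set of surviving rows, so the pieces no longer line up in one rectangular DataFrame.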