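DataFrame created with read_csv() gives unexpected query() results

I am using DataFrame.query() to find rows, and I have run into a problem that I can only reproduce when the data is loaded from a CSV file. If I build what I believe is an identical DataFrame in pure Python, query() works as expected.

Here is the CSV data: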

,ASK_PRICE,ASK_QTY,BID_PRICE,BID_QTY 
2016-06-17 16:38:00.043,104.258,50.0,104.253,100.0 
2016-06-17 16:38:00.043,104.259,100.0,104.253,100.0 
2016-06-17 16:38:02.978,104.259,100.0,104.254,50.0 
2016-06-17 16:38:03.999,104.259,100.0,104.253,50.0 
2016-06-17 16:38:03.999,104.259,100.0,104.251,150.0 
2016-06-17 16:38:04.001,104.259,100.0,104.251,100.0

And here is a sample script that demonstrates the problem:

#!/usr/bin/env python 
import pandas as pd 
import numpy as np 
from datetime import datetime 

timestamp = [ 
     datetime.strptime('2016-06-17 16:38:00.043', '%Y-%m-%d %H:%M:%S.%f'), 
     datetime.strptime('2016-06-17 16:38:00.043', '%Y-%m-%d %H:%M:%S.%f'), 
     datetime.strptime('2016-06-17 16:38:02.978', '%Y-%m-%d %H:%M:%S.%f'), 
     datetime.strptime('2016-06-17 16:38:03.999', '%Y-%m-%d %H:%M:%S.%f'), 
     datetime.strptime('2016-06-17 16:38:03.999', '%Y-%m-%d %H:%M:%S.%f'), 
     datetime.strptime('2016-06-17 16:38:04.001', '%Y-%m-%d %H:%M:%S.%f') 
     ] 
bid_price = [ 104.253, 104.253, 104.254, 104.253, 104.251, 104.251 ] 
bid_qty = [ 100.0, 100.0, 50.0, 50.0, 150.0, 100.0 ] 
ask_price = [ 104.258, 104.259, 104.259, 104.259, 104.259, 104.259 ] 
ask_qty = [ 50.0, 100.0, 100.0, 100.0, 100.0, 100.0 ] 

df1 = pd.DataFrame(index=timestamp, data={'BID_PRICE': bid_price, 
    'BID_QTY': bid_qty, 'ASK_PRICE': ask_price, 'ASK_QTY': ask_qty}) 

df2 = pd.read_csv('in.csv', index_col=0, skip_blank_lines=True) 
df2.index = pd.to_datetime(df2.index) 

# Show both frames, their indexes and their columns side by side
print(df1)
print(df2)
print()
print(df1.index)
print(df2.index)
print()
print(df1.columns)
print(df2.columns)
print()
df1.reset_index(inplace=True)
df2.reset_index(inplace=True)

print(df1)
print(df2)
print()

# The same query is run against both frames; only df1 returns a row
df1m = df1.query('(BID_PRICE == 104.254) and (BID_QTY >= 50)').tail(1)
df2m = df2.query('(BID_PRICE == 104.254) and (BID_QTY >= 50)').tail(1)
print(df1m)
print(df2m)

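The query on the DataFrame created from the CSV fails: it returns no rows. As far as I can see, both DataFrames hold the same data with the same index and column types. What is the difference between them?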

2016-06-24 Luke Bigum

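What do the DataFrames look like in a debugger? Printing them may not reveal the difference, because the object may have a __str__ that formats the data in a way that hides the problem.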
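
To illustrate the point of this comment, stored values can be printed at full precision instead of relying on the default display. This is only a sketch: the series `s` is made up, and `np.nextafter` is used to fabricate a neighbouring float, so none of it is data taken from the question.

```python
import numpy as np
import pandas as pd

# Two floats that are adjacent in binary representation. The second value is
# fabricated with np.nextafter purely for illustration (an assumption).
s = pd.Series([104.254, np.nextafter(104.254, np.inf)], name="BID_PRICE")

# The default display rounds the values, so a tiny difference is easy to miss.
print(s)

# repr() and float.hex() expose the exact stored values.
exact_repr = [repr(float(x)) for x in s]
exact_hex = [float(x).hex() for x in s]
print(exact_repr)
print(exact_hex)
```

Applying the same `repr` or `float.hex` inspection to the `BID_PRICE` columns of `df1` and `df2` would show whether the two frames really hold identical values.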

This is a well-known problem of comparing float values.

Try this:

In [70]: df2.query('(abs(BID_PRICE - 104.254) < 0.000001) and (BID_QTY >= 50)') 
Out[70]: 
                         ASK_PRICE  ASK_QTY  BID_PRICE  BID_QTY
2016-06-17 16:38:02.978    104.259    100.0    104.254     50.0

instead of:

In [72]: df2.query('(BID_PRICE == 104.254) and (BID_QTY >= 50)') 
Out[72]: 
Empty DataFrame 
Columns: [ASK_PRICE, ASK_QTY, BID_PRICE, BID_QTY] 
Index: []
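
The tolerance check can also be written as a boolean mask outside query(), for example with `numpy.isclose`. The following is a minimal sketch under stated assumptions: the small frame `df` is invented, and the slightly-off price is fabricated with `np.nextafter` to stand in for a value that did not parse to exactly 104.254.

```python
import numpy as np
import pandas as pd

# Stand-in for a CSV-loaded value that is one representable step away from
# the literal 104.254 (fabricated for illustration, not real parser output).
slightly_off = np.nextafter(104.254, np.inf)

df = pd.DataFrame({
    "BID_PRICE": [104.253, slightly_off, 104.251],
    "BID_QTY": [100.0, 50.0, 150.0],
})

qty_ok = (df["BID_QTY"] >= 50).to_numpy()

# Exact equality misses the row because the stored float differs slightly.
exact = df[(df["BID_PRICE"] == 104.254).to_numpy() & qty_ok]

# An absolute tolerance of 1e-6 (the same threshold as in the query above)
# treats the two values as equal.
price_close = np.isclose(df["BID_PRICE"].to_numpy(), 104.254, rtol=0.0, atol=1e-6)
close = df[price_close & qty_ok]

print(exact)
print(close)
```

The tolerance should be well below the price tick size (0.001 in this data) so that neighbouring prices are never merged.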

A simple example:

In [73]: 2.2 * 3.0 == 6.6 
Out[73]: False 

In [74]: 3.3 * 2.0 == 6.6 
Out[74]: True
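
If exact equality is really wanted, one option is to ask read_csv for its round-trip float converter through the `float_precision` parameter, so that parsed values match what Python produces for the same text. This is a hedged sketch: the CSV is a shortened copy of the question's data held in a string, and whether the default converter reproduces the mismatch depends on the pandas version, which is not verified here.

```python
import io
import pandas as pd

# Shortened copy of the question's CSV, kept in memory for the example.
csv_text = """,ASK_PRICE,ASK_QTY,BID_PRICE,BID_QTY
2016-06-17 16:38:00.043,104.258,50.0,104.253,100.0
2016-06-17 16:38:02.978,104.259,100.0,104.254,50.0
2016-06-17 16:38:04.001,104.259,100.0,104.251,100.0
"""

# float_precision="round_trip" selects the round-trip converter of the C
# parser, so the text "104.254" becomes the same float as the literal 104.254.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    parse_dates=True,
    float_precision="round_trip",
)

matches = df.query("(BID_PRICE == 104.254) and (BID_QTY >= 50)")
print(matches)
```

Even with round-trip parsing, a tolerance-based comparison remains the more robust choice once prices have been through any arithmetic.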

2016-06-24 14:50:59 MaxU

I don't know the answer, but it seems to be related to the index column. I ran a simplified version of the code and it works as expected.

#!/usr/bin/env python 

import pandas as pd 

timestamp = [1, 2, 3, 4, 5, 6] 
bid_price = [104, 105, 106, 107, 107, 107] 
bid_qty = [100.0, 100.0, 50.0, 50.0, 150.0, 100.0] 

df1 = pd.DataFrame(index=timestamp, 
        data={'BID_PRICE': bid_price, 'BID_QTY': bid_qty}) 

df2 = pd.read_csv('in.csv', index_col=0, skip_blank_lines=True) 

print(df1) 
print(df2) 

df1m = df1.query('(BID_PRICE == 107) and (BID_QTY >= 50)').tail(1) 
df2m = df2.query('(BID_PRICE == 107) and (BID_QTY >= 50)').tail(1) 

print("Result 1: {}".format(df1m)) 
print("Result 2: {}".format(df2m))

---------------- contents of in.csv -----------

Index,BID_PRICE,BID_QTY 
1, 104, 100.0 
2, 105, 100.0 
3, 106, 50.0 
4, 107, 50.0 
5, 107, 150.0 
6, 107, 100.0
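
A possible reason this simplified version behaves, offered as an assumption rather than a confirmed diagnosis, is that its prices are whole numbers: they load as an integer column, and integer equality is exact. The sketch below checks the dtype using an in-memory copy of the file above (written without the spaces after the commas).

```python
import io
import pandas as pd

# In-memory copy of the simplified in.csv, without spaces after the commas.
csv_text = """Index,BID_PRICE,BID_QTY
1,104,100.0
2,105,100.0
3,106,50.0
4,107,50.0
5,107,150.0
6,107,100.0
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Whole-number prices are parsed as integers, so == compares them exactly.
print(df.dtypes)
price_is_integer = pd.api.types.is_integer_dtype(df["BID_PRICE"])

last_match = df.query("(BID_PRICE == 107) and (BID_QTY >= 50)").tail(1)
print(last_match)
```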

2016-06-24 15:15:10
