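I am using DataFrame.query() to find rows, and I have run into a problem that I can reproduce only when the data is loaded from CSV. If I build what I believe is the same DataFrame in pure Python, query() works as expected. The DataFrame created with read_csv() gives unexpected query() results.
Here is the CSV data: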
,ASK_PRICE,ASK_QTY,BID_PRICE,BID_QTY
2016-06-17 16:38:00.043,104.258,50.0,104.253,100.0
2016-06-17 16:38:00.043,104.259,100.0,104.253,100.0
2016-06-17 16:38:02.978,104.259,100.0,104.254,50.0
2016-06-17 16:38:03.999,104.259,100.0,104.253,50.0
2016-06-17 16:38:03.999,104.259,100.0,104.251,150.0
2016-06-17 16:38:04.001,104.259,100.0,104.251,100.0
And here is a sample script that demonstrates the problem:
#!/usr/bin/env python
import pandas as pd
import numpy as np
from datetime import datetime
timestamp = [
    datetime.strptime('2016-06-17 16:38:00.043', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-17 16:38:00.043', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-17 16:38:02.978', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-17 16:38:03.999', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-17 16:38:03.999', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-17 16:38:04.001', '%Y-%m-%d %H:%M:%S.%f')
]
bid_price = [ 104.253, 104.253, 104.254, 104.253, 104.251, 104.251 ]
bid_qty = [ 100.0, 100.0, 50.0, 50.0, 150.0, 100.0 ]
ask_price = [ 104.258, 104.259, 104.259, 104.259, 104.259, 104.259 ]
ask_qty = [ 50.0, 100.0, 100.0, 100.0, 100.0, 100.0 ]
df1 = pd.DataFrame(index=timestamp, data={'BID_PRICE': bid_price,
    'BID_QTY': bid_qty, 'ASK_PRICE': ask_price, 'ASK_QTY': ask_qty})
df2 = pd.read_csv('in.csv', index_col=0, skip_blank_lines=True)
df2.index = pd.to_datetime(df2.index)
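# Show both frames, their indexes, and their columns side by side.
print(df1)
print(df2)
print()
print(df1.index)
print(df2.index)
print()
print(df1.columns)
print(df2.columns)
print()
# Move the timestamp index into an ordinary column in both frames.
df1.reset_index(inplace=True)
df2.reset_index(inplace=True)
print(df1)
print(df2)
print()
# Run the same query against both frames and keep the last matching row.
df1m = df1.query('(BID_PRICE == 104.254) and (BID_QTY >= 50)').tail(1)
df2m = df2.query('(BID_PRICE == 104.254) and (BID_QTY >= 50)').tail(1)
print(df1m)
print(df2m)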
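The query on the DataFrame created from the CSV fails. As far as I can see, both frames have the same data, index, and column types. What is the difference between these two DataFrames?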
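One way to look for a difference between the two frames is to compare the column values exactly instead of by eye. The sketch below is only a possible diagnostic, not a confirmed explanation of the problem. It assumes the CSV text from the question is held in a string (CSV_TEXT, a name invented here) rather than in in.csv, and it also tries the float_precision='round_trip' option of read_csv. Whether the default parser produces any mismatch at all is likely to depend on the pandas version.

```python
import io

import numpy as np
import pandas as pd

# Assumption: the CSV from the question, inlined so the sketch needs no file.
CSV_TEXT = """,ASK_PRICE,ASK_QTY,BID_PRICE,BID_QTY
2016-06-17 16:38:00.043,104.258,50.0,104.253,100.0
2016-06-17 16:38:00.043,104.259,100.0,104.253,100.0
2016-06-17 16:38:02.978,104.259,100.0,104.254,50.0
2016-06-17 16:38:03.999,104.259,100.0,104.253,50.0
2016-06-17 16:38:03.999,104.259,100.0,104.251,150.0
2016-06-17 16:38:04.001,104.259,100.0,104.251,100.0
"""

# The same numbers written as Python float literals.
built = pd.DataFrame({
    'ASK_PRICE': [104.258, 104.259, 104.259, 104.259, 104.259, 104.259],
    'ASK_QTY': [50.0, 100.0, 100.0, 100.0, 100.0, 100.0],
    'BID_PRICE': [104.253, 104.253, 104.254, 104.253, 104.251, 104.251],
    'BID_QTY': [100.0, 100.0, 50.0, 50.0, 150.0, 100.0],
})

# Default float parsing versus the round-trip parser.
parsed = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0).reset_index(drop=True)
round_trip = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0,
                         float_precision='round_trip').reset_index(drop=True)

# Per column: dtype, number of values that differ exactly, largest difference.
report = {}
for col in built.columns:
    diff = np.abs(built[col].values - parsed[col].values)
    report[col] = {
        'dtype': str(parsed[col].dtype),
        'exact_mismatches': int((built[col].values != parsed[col].values).sum()),
        'max_abs_diff': float(diff.max()),
    }
    print(col, report[col])
```

If exact_mismatches is nonzero for BID_PRICE while max_abs_diff is tiny, the two frames differ only in the last bits of the floats, which an equality test in query() would still notice.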
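What does the DataFrame look like in a debugger? Printing the DataFrame may not reveal the difference, because the object may have a __str__ that formats the data in a way that masks the problem.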
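To illustrate the point about printing, here is a small sketch with made-up data. It assumes, without confirmation from this thread, that a parsed value could sit one floating-point step away from 104.254, and it uses np.nextafter to fabricate such a value. The tolerance of 1e-9 is likewise an arbitrary choice for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical value: the nearest float above 104.254, standing in for an
# imprecisely parsed number.
off_by_one_ulp = np.nextafter(104.254, np.inf)

df = pd.DataFrame({'BID_PRICE': [104.253, off_by_one_ulp, 104.251],
                   'BID_QTY': [100.0, 50.0, 150.0]})

print(df)                        # default display rounds the values
print(df['BID_PRICE'].tolist())  # Python floats, shown with full precision

# Exact equality misses the row, a tolerance-based comparison finds it.
# rtol=0 matters here: the default relative tolerance of np.isclose is
# about as large as one price tick at this magnitude.
exact = df.query('(BID_PRICE == 104.254) and (BID_QTY >= 50)')
tolerant = df[np.isclose(df['BID_PRICE'], 104.254, rtol=0.0, atol=1e-9)
              & (df['BID_QTY'] >= 50)]

print(len(exact), len(tolerant))
```

Comparing the output of print(df) with the list of raw floats shows whether the formatted table is hiding extra digits.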