2013-08-01 69 views
1

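MultiIndex and DateTime

I have a sorted CSV data set that I want to index with a MultiIndex built from four columns, two of which are datetime columns: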

Alex,Beta,2011-03-01 00:00:00,2011-03-03 00:00:00,A,8,11.4 
Alex,Beta,2011-03-03 00:00:00,2011-03-05 00:00:00,B,10,17.2 
Alex,Beta,2011-03-05 00:00:00,2011-03-07 00:00:00,A,3,11.4 
Alex,Beta,2011-03-07 00:00:00,2011-03-09 00:00:00,B,7,17.2 
Alex,Orion,2011-03-02 00:00:00,2011-03-04 00:00:00,A,4,11.4 
Alex,Orion,2011-03-03 00:00:00,2011-03-05 00:00:00,B,6,17.2 
Alex,Orion,2011-03-04 00:00:00,2011-03-06 00:00:00,A,3,11.4 
Alex,Orion,2011-03-05 00:00:00,2011-03-07 00:00:00,B,11,17.2 
Alex,ZZYZX,2011-03-02 00:00:00,2011-03-05 00:00:00,A,10,11.4 
Alex,ZZYZX,2011-03-04 00:00:00,2011-03-07 00:00:00,A,15,11.4 
Alex,ZZYZX,2011-03-06 00:00:00,2011-03-09 00:00:00,B,20,17.2 
Alex,ZZYZX,2011-03-08 00:00:00,2011-03-11 00:00:00,B,5,17.2 

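I can load this into a hierarchically indexed DataFrame with read_csv, but indexing into it is another matter. As far as I can tell, pandas does not like the DateTime levels of the index here. If I comment out the DateTime labels in index_col and the corresponding entries in the indexing statement (df.loc), it works fine.

Any ideas?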

#!/usr/bin/env python 

import numpy as np 
import pandas as pd 

pd.set_option('display.width', 400)
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 30)

try:
    df = pd.read_csv(
        './sales.csv',
        header=None,
        na_values=['NULL'],
        names=[
            'salesperson',
            'customer',
            'invoice_date',
            'ship_date',
            'product',
            'quantity',
            'price',
        ],
        index_col=[
            'salesperson',
            'customer',
            'invoice_date',
            'ship_date',
        ],
        parse_dates=[
            'invoice_date',
            'ship_date',
        ],
    )
except Exception as e:
    print(e)

try:
    print(df)
    print(df.loc[(
        'Alex',                 # salesperson
        'ZZYZX',                # customer
        '2011-03-02 00:00:00',  # invoice_date
        '2011-03-05 00:00:00',  # ship_date
    )])
except Exception as e:
    print(e)

Answers

1

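It seems to work fine; I'm getting the appropriate df. That said, I would try to avoid the empty entries in each list.

If you use parse_dates, you should also access those index levels with proper datetime objects: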

df.loc[('Alex', 'ZZYZX', pd.Timestamp('2011-03-02'), pd.Timestamp('2011-03-05'))]

product  A 
quantity  10 
price  11.4 
Name: (Alex, ZZYZX, 2011-03-02 00:00:00, 2011-03-05 00:00:00), dtype: object 
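For anyone who wants to try this without a sales.csv on disk, here is a minimal sketch of the same lookup. It assumes a recent pandas, reads three of the rows from the question out of an in-memory string (my own shortcut, not part of the original script), and uses pd.Timestamp for the two date levels:

```python
import io

import pandas as pd

# Three rows copied from the question's data, held in memory instead of ./sales.csv
# (an assumption made only so the example is self-contained).
csv_text = """\
Alex,Orion,2011-03-05 00:00:00,2011-03-07 00:00:00,B,11,17.2
Alex,ZZYZX,2011-03-02 00:00:00,2011-03-05 00:00:00,A,10,11.4
Alex,ZZYZX,2011-03-04 00:00:00,2011-03-07 00:00:00,A,15,11.4
"""

index_cols = ["salesperson", "customer", "invoice_date", "ship_date"]

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=index_cols + ["product", "quantity", "price"],
    index_col=index_cols,
    parse_dates=["invoice_date", "ship_date"],
)

# The two date levels should now hold datetime64 values, not strings.
print(df.index.get_level_values("invoice_date").dtype)

# Build the key from real timestamps, one entry per index level.
key = ("Alex", "ZZYZX", pd.Timestamp("2011-03-02"), pd.Timestamp("2011-03-05"))
row = df.loc[key]
print(row)
```

Checking the dtype of the date levels first is a quick way to confirm that parse_dates actually took effect before blaming the lookup.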
+0

Thanks. I had not seen that properly formed datetime objects were needed; I assumed I should pass the dates as strings. – highpost

+0

Sometimes passing a string 'as' a datetime works with pandas, but here it is not parsed 'on the fly'. –
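If the dates arrive as strings anyway, one way to follow the advice in the comments is to convert them explicitly before building the key. This is only a sketch: lookup is a hypothetical helper name of my own, and the small frame is constructed by hand rather than read from the CSV.

```python
import pandas as pd

# Hand-built frame with the same four index levels as in the question
# (two rows only, to keep the example short).
index = pd.MultiIndex.from_tuples(
    [
        ("Alex", "ZZYZX", pd.Timestamp("2011-03-02"), pd.Timestamp("2011-03-05")),
        ("Alex", "ZZYZX", pd.Timestamp("2011-03-04"), pd.Timestamp("2011-03-07")),
    ],
    names=["salesperson", "customer", "invoice_date", "ship_date"],
)
sales = pd.DataFrame(
    {"product": ["A", "A"], "quantity": [10, 15], "price": [11.4, 11.4]},
    index=index,
)


def lookup(frame, salesperson, customer, invoice_date, ship_date):
    """Hypothetical helper: convert date strings to Timestamps, then index."""
    key = (
        salesperson,
        customer,
        pd.to_datetime(invoice_date),  # string -> Timestamp
        pd.to_datetime(ship_date),
    )
    return frame.loc[key]


row = lookup(sales, "Alex", "ZZYZX", "2011-03-02 00:00:00", "2011-03-05 00:00:00")
print(row)
```

Doing the conversion in one place means the rest of the code can keep passing date strings around without relying on pandas to interpret them during the lookup.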