2017-02-27

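Length mismatch error when scaling up a MultiIndex slicer to a large dataset

I want to use the pandas MultiIndex slicer command .xs() to split an imported csv file (a time series) and process it. The following df replicates the structure of my imported csv file.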

import pandas as pd 

df = pd.DataFrame(
    {'Sensor ID': [14,1,3,14,3], 
    'Building ID': [109,109,109,109,109], 
    'Date/Time': ["26/10/2016 14:31:14","26/10/2016 14:31:16", "26/10/2016 14:32:17", "26/10/2016 14:35:14", "26/10/2016 14:35:38"], 
    'Reading': [20.95, 20.62, 22.45, 20.65, 22.83], 
    }) 

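# Index by sensor first, then timestamp; sorting keeps the MultiIndex lexsorted for slicing
df.set_index(['Sensor ID','Date/Time'], inplace=True)
df.sort_index(inplace=True)
print(df)

SensorList = [1, 3, 14]

for s in SensorList:
    # All rows for sensor s; xs drops the 'Sensor ID' level from the result
    df1 = df.xs(s, level='Sensor ID')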

I have tested the code on a small sample of the csv data and it works fine. However, when running it on the whole csv file, I get the error: ValueError: Length mismatch: Expected axis has 19562 elements, new values have 16874 elements

Printing df.info() returns the following:

<class 'pandas.core.frame.DataFrame'> 
MultiIndex: 65981 entries, (1, 2016-10-26 14:35:15) to (19, 2016-11-07 11:27:14) 
Data columns (total 2 columns): 
Building ID 65981 non-null int64 
Reading  65981 non-null float64 
dtypes: float64(1), int64(1) 
memory usage: 1.5+ MB 
None 

Any hints as to what might be causing the error?

EDIT

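I inadvertently truncated my code, which left it meaningless in its current form. The original code resamples the values into 15-minute and 1-hour intervals.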

With:

units = ['D1','D3','D6','D10']
# 'unit' presumably comes from an enclosing loop over units (not shown in the question)
unit_output_path = './' + unit + '/'

What the loop does:

for s in SensorList: 

    ## Slice multi-index to isolate all readings for sensor s 
    df1 = df_mi.xs(s, level='Sensor ID') 
    df1.drop('Building ID', axis=1, inplace=True) 

    ## Resample into 15-min and 1-hr intervals and export individual csv files
    df1_15min = df1.resample('15Min').mean().round(1) 
    df1_hr = df1.resample('60Min').mean().round(1) 
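For reference, here is a minimal sketch of the slice-and-resample loop running on the question's dummy data. It assumes the day-first timestamp format shown above and parses it before building the MultiIndex, since resample needs datetimes; the dicts per_sensor_15min and per_sensor_hr are hypothetical names that keep each sensor's result instead of overwriting df1. It does not reproduce the ValueError, which only appears with the full file.

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame({
    'Sensor ID': [14, 1, 3, 14, 3],
    'Building ID': [109] * 5,
    'Date/Time': ["26/10/2016 14:31:14", "26/10/2016 14:31:16", "26/10/2016 14:32:17",
                  "26/10/2016 14:35:14", "26/10/2016 14:35:38"],
    'Reading': [20.95, 20.62, 22.45, 20.65, 22.83],
})

# resample() needs real datetimes; the day-first format is an assumption
df['Date/Time'] = pd.to_datetime(df['Date/Time'], format='%d/%m/%Y %H:%M:%S')
df_mi = df.set_index(['Sensor ID', 'Date/Time']).sort_index()

# Hypothetical containers so each sensor's result is kept, not overwritten
per_sensor_15min = {}
per_sensor_hr = {}
for s in [1, 3, 14]:
    # xs drops the 'Sensor ID' level, leaving a DatetimeIndex
    df1 = df_mi.xs(s, level='Sensor ID').drop(columns='Building ID')
    per_sensor_15min[s] = df1.resample('15min').mean().round(1)
    per_sensor_hr[s] = df1.resample('60min').mean().round(1)
```

Reassigning the result of drop (rather than using inplace=True on the slice) keeps each step explicit.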

Traceback:

  File "D:\AN6478\AN6478_POE_ABo.py", line 52, in <module>
    df1 = df_mi.xs(s, level='Sensor ID')
  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1736, in xs
    setattr(result, result._get_axis_name(axis), new_ax)
  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2685, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas\src\properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas\lib.c:44748)
  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\generic.py", line 428, in _set_axis
    self._data.set_axis(axis, labels)
  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2635, in set_axis
    (old_len, new_len))
ValueError: Length mismatch: Expected axis has 19562 elements, new values have 16874 elements

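What end result do you expect? Maybe a loop is not needed here. Where exactly does the error occur; can you provide the traceback? At the moment, your code overwrites 'df1' in every iteration of the loop. – pansen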
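A loop-free alternative, sketched under the same assumptions (dummy data, day-first timestamps): a single groupby with two pd.Grouper objects, one on the Sensor ID level and one binning the Date/Time level into 15-minute intervals. This is a swapped-in technique, not code from this thread; quarter_hourly is an illustrative name.

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame({
    'Sensor ID': [14, 1, 3, 14, 3],
    'Building ID': [109] * 5,
    'Date/Time': ["26/10/2016 14:31:14", "26/10/2016 14:31:16", "26/10/2016 14:32:17",
                  "26/10/2016 14:35:14", "26/10/2016 14:35:38"],
    'Reading': [20.95, 20.62, 22.45, 20.65, 22.83],
})
df['Date/Time'] = pd.to_datetime(df['Date/Time'], format='%d/%m/%Y %H:%M:%S')
df_mi = df.set_index(['Sensor ID', 'Date/Time']).sort_index()

# One call: group by sensor, and bin each sensor's timestamps into 15-minute intervals
quarter_hourly = (
    df_mi.groupby([pd.Grouper(level='Sensor ID'),
                   pd.Grouper(level='Date/Time', freq='15min')])['Reading']
    .mean()
    .round(1)
)
```

The result is a Series indexed by (sensor, 15-minute bin), so no per-sensor df1 is created or overwritten.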


@pansen I edited my question to include the commands executed in the loop. – Andreuccio

Answer


I can't tell you exactly why df1 = df_mi.xs(s, level='Sensor ID') raises a ValueError here. Where does df_mi come from?

Here is an alternative using groupby, which does what you want on the dummy data frame without relying on the MultiIndex and xs:

# move 'Sensor ID' back to a column so the index is just Date/Time (resample needs a DatetimeIndex)
df = df.reset_index(0)
# the timestamps are day-first (26/10/2016), so say so explicitly
df.index = pd.to_datetime(df.index, dayfirst=True)

# group the "Reading" column by sensor
grouped = df.groupby("Sensor ID")["Reading"]

# iterate over each sensor's readings (one Series per sensor)
for sensor, sub_df in grouped:
    quarterly = sub_df.resample('15min').mean().round(1)  # 15-minute means
    hourly = sub_df.resample('60min').mean().round(1)

    # implement your to_csv saving here
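One possible way to fill in the to_csv placeholder, assuming one file per sensor and interval. The output folder (a temporary directory here) and the sensor_<id>_<interval>.csv naming scheme are hypothetical; in the question the path would come from something like './' + unit + '/'.

```python
import os
import tempfile

import pandas as pd

# Dummy data from the question, indexed by parsed (assumed day-first) timestamps
df = pd.DataFrame({
    'Sensor ID': [14, 1, 3, 14, 3],
    'Date/Time': ["26/10/2016 14:31:14", "26/10/2016 14:31:16", "26/10/2016 14:32:17",
                  "26/10/2016 14:35:14", "26/10/2016 14:35:38"],
    'Reading': [20.95, 20.62, 22.45, 20.65, 22.83],
})
df['Date/Time'] = pd.to_datetime(df['Date/Time'], format='%d/%m/%Y %H:%M:%S')
df = df.set_index('Date/Time')

# Hypothetical output folder
out_dir = tempfile.mkdtemp()

written = []
for sensor, readings in df.groupby('Sensor ID')['Reading']:
    quarter_hourly = readings.resample('15min').mean().round(1)
    hourly = readings.resample('60min').mean().round(1)
    # Hypothetical naming scheme: one file per sensor and interval
    for label, series in (('15min', quarter_hourly), ('1h', hourly)):
        path = os.path.join(out_dir, f'sensor_{sensor}_{label}.csv')
        series.to_csv(path, header=True)
        written.append(path)
```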

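Note that, since you want to resample afterwards, you could also use groupby on the MultiIndex via df.groupby(level="Sensor ID"); however, it is easier to drop Sensor ID from the MultiIndex altogether, since that simplifies things overall.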
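A minimal sketch of the groupby(level="Sensor ID") variant mentioned above, run on the dummy data with parsed timestamps (assumed day-first). Each group still carries both index levels, so Sensor ID is dropped before resampling; results is a hypothetical name.

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame({
    'Sensor ID': [14, 1, 3, 14, 3],
    'Building ID': [109] * 5,
    'Date/Time': ["26/10/2016 14:31:14", "26/10/2016 14:31:16", "26/10/2016 14:32:17",
                  "26/10/2016 14:35:14", "26/10/2016 14:35:38"],
    'Reading': [20.95, 20.62, 22.45, 20.65, 22.83],
})
df['Date/Time'] = pd.to_datetime(df['Date/Time'], format='%d/%m/%Y %H:%M:%S')
df_mi = df.set_index(['Sensor ID', 'Date/Time']).sort_index()

results = {}
for sensor, group in df_mi.groupby(level='Sensor ID'):
    # The group keeps both levels; drop 'Sensor ID' so resample sees a DatetimeIndex
    readings = group['Reading'].droplevel('Sensor ID')
    results[sensor] = readings.resample('15min').mean().round(1)
```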


I was hoping to get it working with MultiIndex slicing, but this solution also works for my data. Thanks! – Andreuccio


As mentioned above, you can group on the MultiIndex with 'df.groupby(level="Sensor ID")'. This works very well. Alternatively, you could try [pd.IndexSlice](http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers), e.g. 'df_mi.loc[pd.IndexSlice[14, :], "Reading"]'. – pansen
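A sketch of the pd.IndexSlice approach on the dummy data, assuming the index order from the question (Sensor ID first, then Date/Time). Because Sensor ID is the first level, the sensor value goes in the first slot of the slicer; the result keeps both levels, so the sensor level is dropped before resampling. sensor_14 and hourly_14 are illustrative names.

```python
import pandas as pd

# Dummy data from the question
df = pd.DataFrame({
    'Sensor ID': [14, 1, 3, 14, 3],
    'Building ID': [109] * 5,
    'Date/Time': ["26/10/2016 14:31:14", "26/10/2016 14:31:16", "26/10/2016 14:32:17",
                  "26/10/2016 14:35:14", "26/10/2016 14:35:38"],
    'Reading': [20.95, 20.62, 22.45, 20.65, 22.83],
})
df['Date/Time'] = pd.to_datetime(df['Date/Time'], format='%d/%m/%Y %H:%M:%S')
# Slicing with IndexSlice requires a sorted MultiIndex
df_mi = df.set_index(['Sensor ID', 'Date/Time']).sort_index()

idx = pd.IndexSlice
# Sensor 14, all timestamps, only the Reading column
sensor_14 = df_mi.loc[idx[14, :], 'Reading']
# Drop the sensor level so resample sees a DatetimeIndex
hourly_14 = sensor_14.droplevel('Sensor ID').resample('60min').mean().round(1)
```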


Thanks, really appreciate it. – Andreuccio
