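Length mismatch error when scaling up MultiIndex slicer on a large dataset

I want to use the pandas MultiIndex slicer .xs() to slice an imported csv file (a time series) and process it. The df below replicates the structure of my imported csv file.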
import pandas as pd

df = pd.DataFrame(
    {'Sensor ID': [14, 1, 3, 14, 3],
     'Building ID': [109, 109, 109, 109, 109],
     'Date/Time': ["26/10/2016 14:31:14", "26/10/2016 14:31:16", "26/10/2016 14:32:17", "26/10/2016 14:35:14", "26/10/2016 14:35:38"],
     'Reading': [20.95, 20.62, 22.45, 20.65, 22.83],
    })
df.set_index(['Sensor ID', 'Date/Time'], inplace=True)
df.sort_index(inplace=True)
print(df)

SensorList = [1, 3, 14]
for s in SensorList:
    df1 = df.xs(s, level='Sensor ID')
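I tested the code on a small sample of the csv data and it works fine. However, when I run it on the whole csv file, I get the error: ValueError: Length mismatch: Expected axis has 19562 elements, new values have 16874 elements.

Printing df.info() returns the following: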
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 65981 entries, (1, 2016-10-26 14:35:15) to (19, 2016-11-07 11:27:14)
Data columns (total 2 columns):
Building ID 65981 non-null int64
Reading 65981 non-null float64
dtypes: float64(1), int64(1)
memory usage: 1.5+ MB
None
Any hints as to what might be causing the error?
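For anyone trying to narrow this down, here is a minimal diagnostic sketch on the toy frame from above. It is not a confirmed fix: it only checks a few properties of the MultiIndex (sort order, duplicate entries, rows per sensor) and shows a boolean-mask selection that can be compared against the .xs() result. The names is_sorted, n_duplicates, rows_per_sensor, mask and df_s are made up for illustration, and parsing 'Date/Time' with pd.to_datetime is an assumption about how the real csv is loaded.

```python
import pandas as pd

# Toy frame with the same structure as the sample data in the question.
df = pd.DataFrame({
    'Sensor ID': [14, 1, 3, 14, 3],
    'Date/Time': pd.to_datetime(
        ["26/10/2016 14:31:14", "26/10/2016 14:31:16", "26/10/2016 14:32:17",
         "26/10/2016 14:35:14", "26/10/2016 14:35:38"],
        format="%d/%m/%Y %H:%M:%S"),
    'Reading': [20.95, 20.62, 22.45, 20.65, 22.83],
}).set_index(['Sensor ID', 'Date/Time']).sort_index()

# Basic health checks on the MultiIndex.
is_sorted = df.index.is_monotonic_increasing      # is the index fully sorted?
n_duplicates = int(df.index.duplicated().sum())   # repeated (sensor, timestamp) pairs
rows_per_sensor = df.groupby(level='Sensor ID').size()

# Alternative to .xs(): boolean mask on one level, then drop that level.
s = 14
mask = df.index.get_level_values('Sensor ID') == s
df_s = df[mask].reset_index(level='Sensor ID', drop=True)

print(is_sorted, n_duplicates)
print(rows_per_sensor)
print(df_s)
```

If the mask-based selection works on the full file where .xs() fails, that would at least point at the index rather than the data itself.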
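Edit

I inadvertently truncated my code, which left it pointless in its current form. The original code resamples the values to 15-minute and 1-hour intervals.

With: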
units = ['D1','D3','D6','D10']
unit_output_path = './' + unit + '/'
What the loop does:

for s in SensorList:
    ## Slice multi-index to isolate all readings for sensor s
    df1 = df_mi.xs(s, level='Sensor ID')
    df1.drop('Building ID', axis=1, inplace=True)
    ## Resample by 15min and 1hr intervals and export individual csv files
    df1_15min = df1.resample('15Min').mean().round(1)
    df1_hr = df1.resample('60Min').mean().round(1)

Traceback:
  File "D:\AN6478\AN6478_POE_ABo.py", line 52, in <module>
    df1 = df_mi.xs(s, level='Sensor ID')
  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1736, in xs
    setattr(result, result._get_axis_name(axis), new_ax)
  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2685, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas\src\properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas\lib.c:44748)
  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\generic.py", line 428, in _set_axis
    self._data.set_axis(axis, labels)
  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2635, in set_axis
    (old_len, new_len))
ValueError: Length mismatch: Expected axis has 19562 elements, new values have 16874 elements
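What is your expected end result? Maybe a loop isn't needed here. Where does the error actually occur - can you provide the traceback? At the moment, your code overwrites 'df1' on each loop iteration. – pansen
@pansen I edited my question to include the commands executed in the loop – Andreuccio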
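Following up on the suggestion that a loop may not be needed: a rough sketch of per-sensor resampling with groupby instead of .xs(), using the toy data from the question. This is only one possible approach and has not been run against the full csv, so it may or may not avoid the Length mismatch error. The name resampled_15min is made up for illustration, and the date format string is an assumption based on the sample values.

```python
import pandas as pd

# Toy data with the same columns as the sample frame in the question.
df = pd.DataFrame({
    'Sensor ID': [14, 1, 3, 14, 3],
    'Building ID': [109, 109, 109, 109, 109],
    'Date/Time': ["26/10/2016 14:31:14", "26/10/2016 14:31:16", "26/10/2016 14:32:17",
                  "26/10/2016 14:35:14", "26/10/2016 14:35:38"],
    'Reading': [20.95, 20.62, 22.45, 20.65, 22.83],
})

# resample() needs real timestamps, so parse the strings first
# (day-first format assumed from the sample values).
df['Date/Time'] = pd.to_datetime(df['Date/Time'], format="%d/%m/%Y %H:%M:%S")

# Group by sensor, then resample each sensor's readings into 15-minute bins.
resampled_15min = (
    df.set_index('Date/Time')
      .groupby('Sensor ID')['Reading']
      .resample('15min')
      .mean()
      .round(1)
)

print(resampled_15min)
```

The result is a Series indexed by (Sensor ID, Date/Time), so the readings for a single sensor can still be pulled out afterwards, e.g. resampled_15min.loc[14], and the 1-hour version only needs a different rule string.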