Iteratively training an algorithm in scikit-learn

I have a dataset that, when fed to scikit-learn's RandomForestClassifier, causes the algorithm to run out of memory. I am loading the data with a pandas DataFrame. Is there a way to train the algorithm iteratively, i.e. split the data into ten parts and train on each part in turn so that the whole dataset is covered? Is this possible?
EDIT: complete traceback
Traceback (most recent call last):
File "F:\major\solution-1.py", line 234, in <module>
prep_data()
File "F:\major\solution-1.py", line 160, in prep_data
selector.fit(data[predictors], data['ED2'])
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1963, in __getitem__
return self._getitem_array(key)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2008, in _getitem_array
return self.take(indexer, axis=1, convert=True)
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 1368, in take
self._consolidate_inplace()
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 2411, in _consolidate_inplace
self._protect_consolidate(f)
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 2402, in _protect_consolidate
result = f()
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 2410, in f
self._data = self._data.consolidate()
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3194, in consolidate
bm._consolidate_inplace()
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3199, in _consolidate_inplace
self.blocks = tuple(_consolidate(self.blocks))
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 4189, in _consolidate
_can_consolidate=_can_consolidate)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 4212, in _merge_blocks
new_values = new_values[argsort]
MemoryError
What is the size of your data? – dooms
200,000 rows and 337 columns... –
Have you tried applying RandomForestClassifier to a small subset of your data? You didn't print the actual error. If the data fits in your memory, it should work, as long as you don't use a large number of trees. – dooms
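For genuinely out-of-core learning, scikit-learn offers `partial_fit` on some estimators (e.g. SGDClassifier); RandomForestClassifier does not implement it, so switching models is the trade-off. A minimal sketch on simulated stand-in data, feeding the classifier one chunk at a time:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Stand-in data in place of the real 200,000 x 337 DataFrame.
rng = np.random.RandomState(0)
X = rng.rand(1000, 10)
y = (X[:, 0] > 0.5).astype(int)

clf = SGDClassifier(random_state=0)
classes = np.unique(y)  # all class labels must be declared on the first call

# Each partial_fit call updates the model with one chunk only,
# so the full dataset never needs to be in memory at once.
for X_chunk, y_chunk in zip(np.array_split(X, 10), np.array_split(y, 10)):
    clf.partial_fit(X_chunk, y_chunk, classes=classes)
```

Combined with reading the source file in pieces (for example `pandas.read_csv(..., chunksize=...)`), this keeps memory usage proportional to the chunk size.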