Looping through files with pandas

I'm using the following code, which I got from Comparing and replacing values inside DataFrames:
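import pandas as pd

# Source of the coordinates (pipe-separated) and the file to be filled in
main_df = pd.read_csv('main.txt', sep='|', encoding='utf-8')
data_df = pd.read_csv('data.csv', encoding='utf-8')
# Keep only the coordinate and name columns and rename them to match data_df
main_df_part = main_df[['PRIM_LAT_DEC', 'PRIM_LONG_DEC', 'FEATURE_NAME', 'STATE_ALPHA']]
main_df_part.columns = ['LAT', 'LONG', 'CITY', 'STATE']
# Align both frames on the (CITY, STATE) pair, then copy LAT/LONG across
main_df_part = main_df_part.set_index(['CITY', 'STATE'])
data_df = data_df.set_index(['CITY', 'STATE'])
data_df.update(main_df_part)
data_df.to_csv('data/new.csv', sep=',', mode='a')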
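To make the update step above concrete, here is a minimal sketch that runs it on a few made-up rows standing in for main.txt and data.csv (the city names and coordinates are illustrative only). DataFrame.update matches rows on the (CITY, STATE) index, overwrites LAT/LONG only where main_df_part has a non-missing value, and ignores rows that exist only in main_df_part. Using rename instead of assigning .columns is a stylistic choice, not a fix.

```python
import io

import pandas as pd

# Made-up stand-ins for main.txt (pipe-separated) and data.csv.
main_txt = io.StringIO(
    'FEATURE_NAME|STATE_ALPHA|PRIM_LAT_DEC|PRIM_LONG_DEC\n'
    'Austin|TX|30.5|-97.75\n'
    'Boston|MA|42.25|-71.0\n'
)
data_csv = io.StringIO(
    'CITY,STATE,LAT,LONG\n'
    'Austin,TX,,\n'
    'Denver,CO,,\n'
)

main_df = pd.read_csv(main_txt, sep='|')
data_df = pd.read_csv(data_csv)  # empty LAT/LONG cells become NaN

# Same column selection as the question, renamed in one step.
main_df_part = (
    main_df[['PRIM_LAT_DEC', 'PRIM_LONG_DEC', 'FEATURE_NAME', 'STATE_ALPHA']]
    .rename(columns={'PRIM_LAT_DEC': 'LAT', 'PRIM_LONG_DEC': 'LONG',
                     'FEATURE_NAME': 'CITY', 'STATE_ALPHA': 'STATE'})
    .set_index(['CITY', 'STATE'])
)
data_df = data_df.set_index(['CITY', 'STATE'])

# Rows are matched on (CITY, STATE); only non-NaN values from main_df_part are copied.
data_df.update(main_df_part)
print(data_df)
```

Boston does not appear in the result: update never inserts rows, it only fills rows that already exist in data_df.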
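I have about 60 files that I need to run through as main_df. In short, I've tried the following:

1. Concatenating the files, but I keep getting pandas.parser.CParserError: Error tokenizing data. C error: out of memory.
2. Using chunksize, but this turns the DataFrame into a pandas.io.parsers.TextFileReader, which breaks everything I did before.
3. Finally, I tried iterating through each file and putting the correct file name in place of main.txt, but doing so I keep getting Exception: cannot handle a non-unique multi-index!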
This is the code for the third method:
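import os
import pandas as pd
files = [f for f in os.listdir('./data') if os.path.isfile(os.path.join('./data', f))]
for w in files:
    # os.listdir returns bare file names, so join them with the folder before reading
    main_df = pd.read_csv(os.path.join('./data', w), sep='|', low_memory=False, encoding='utf-8')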
Any ideas how to fix the multi-index problem?
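One way the third method's loop could be structured is sketched below, assuming all the files sit in one folder and share the column names from main.txt. It reads only the four needed columns (usecols), drops repeated (CITY, STATE) pairs so that update can reindex (keeping the first row is an arbitrary choice), and updates a single data_df that would be written once at the end rather than appended per file. The helper names load_part and update_from_dir are hypothetical, and the demo files in a temporary folder are made up.

```python
import os
import tempfile

import pandas as pd

COLS = ['PRIM_LAT_DEC', 'PRIM_LONG_DEC', 'FEATURE_NAME', 'STATE_ALPHA']
RENAME = {'PRIM_LAT_DEC': 'LAT', 'PRIM_LONG_DEC': 'LONG',
          'FEATURE_NAME': 'CITY', 'STATE_ALPHA': 'STATE'}


def load_part(path):
    """Hypothetical helper: read the four needed columns of one pipe-separated file."""
    part = pd.read_csv(path, sep='|', usecols=COLS, encoding='utf-8')
    part = part.rename(columns=RENAME).set_index(['CITY', 'STATE'])
    # Keep one row per (CITY, STATE) so update() can reindex without error.
    return part[~part.index.duplicated(keep='first')]


def update_from_dir(data_df, folder):
    """Hypothetical helper: update data_df in place from every file in folder."""
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)  # listdir returns bare names
        if os.path.isfile(path):
            data_df.update(load_part(path))
    return data_df


# Demo with two made-up input files; Austin appears twice in a.txt.
with tempfile.TemporaryDirectory() as folder:
    header = 'FEATURE_NAME|STATE_ALPHA|PRIM_LAT_DEC|PRIM_LONG_DEC|OTHER\n'
    files = {
        'a.txt': header + 'Austin|TX|30.5|-97.75|x\nAustin|TX|99.0|99.0|dup\n',
        'b.txt': header + 'Denver|CO|39.75|-105.0|y\n',
    }
    for name, text in files.items():
        with open(os.path.join(folder, name), 'w', encoding='utf-8') as fh:
            fh.write(text)

    data_df = pd.DataFrame(
        {'CITY': ['Austin', 'Denver', 'Reno'], 'STATE': ['TX', 'CO', 'NV'],
         'LAT': [float('nan')] * 3, 'LONG': [float('nan')] * 3}
    ).set_index(['CITY', 'STATE'])
    result = update_from_dir(data_df, folder)

print(result)
```

Because update overwrites with any non-missing value, a pair that occurs in several files ends up with the coordinates from the last file processed (sorted name order here).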
Additional information

Error from method 1:
Traceback (most recent call last):
File "C:/Users/Leb/Desktop/Python/py-script/geo_pandas.py", line 6, in <module>
main_df = pd.read_csv('data.txt', sep='|', low_memory=False, encoding='utf-8')
File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 474, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 260, in _read
return parser.read()
File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 721, in read
ret = self._engine.read(nrows)
File "C:\Python34\lib\site-packages\pandas\io\parsers.py", line 1170, in read
data = self._reader.read(nrows)
File "pandas\parser.pyx", line 772, in pandas.parser.TextReader.read (pandas\parser.c:7581)
File "pandas\parser.pyx", line 858, in pandas.parser.TextReader._read_rows (pandas\parser.c:8532)
File "pandas\parser.pyx", line 1742, in pandas.parser.raise_parser_error (pandas\parser.c:20715)
pandas.parser.CParserError: Error tokenizing data. C error: out of memory
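The traceback shows the error being raised inside the C parser while reading one large file. Assuming only the four coordinate and name columns are ever needed, one thing that may reduce memory use is passing usecols, so each file contributes just those columns before concatenation. This is a sketch with two made-up in-memory files standing in for the real ones; whether it is enough depends on how large and how wide the real files are.

```python
import io

import pandas as pd

COLS = ['PRIM_LAT_DEC', 'PRIM_LONG_DEC', 'FEATURE_NAME', 'STATE_ALPHA']

# Made-up stand-ins for two of the pipe-separated files; EXTRA1/EXTRA2 are dropped.
sources = [
    io.StringIO('FEATURE_NAME|STATE_ALPHA|PRIM_LAT_DEC|PRIM_LONG_DEC|EXTRA1|EXTRA2\n'
                'Austin|TX|30.5|-97.75|a|b\n'),
    io.StringIO('FEATURE_NAME|STATE_ALPHA|PRIM_LAT_DEC|PRIM_LONG_DEC|EXTRA1|EXTRA2\n'
                'Denver|CO|39.75|-105.0|c|d\n'),
]

# usecols keeps only the needed columns in each parsed frame, so the combined
# frame holds 4 columns per row instead of every column in the source files.
main_df = pd.concat(
    (pd.read_csv(src, sep='|', usecols=COLS) for src in sources),
    ignore_index=True,
)
print(main_df.shape)  # (2, 4)
```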
Error from method 2:
Traceback (most recent call last):
File "C:/Users/Leb/Desktop/Python/py-script/geo_pandas.py", line 11, in <module>
main_df_part = main_df[['PRIM_LAT_DEC', 'PRIM_LONG_DEC','FEATURE_NAME', 'STATE_ALPHA']]
TypeError: 'TextFileReader' object is not subscriptable
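The TypeError arises because, with chunksize, read_csv returns a TextFileReader rather than a DataFrame; it cannot be indexed with [...], but iterating over it yields ordinary DataFrames. A minimal sketch, assuming the same column names and a data_df already indexed on (CITY, STATE), applies the original selection and update to each chunk in turn. The rows and the chunk size of 2 are made up for the demo; real files would use a much larger chunksize.

```python
import io

import pandas as pd

COLS = ['PRIM_LAT_DEC', 'PRIM_LONG_DEC', 'FEATURE_NAME', 'STATE_ALPHA']
RENAME = {'PRIM_LAT_DEC': 'LAT', 'PRIM_LONG_DEC': 'LONG',
          'FEATURE_NAME': 'CITY', 'STATE_ALPHA': 'STATE'}

# Made-up stand-in for one large pipe-separated file.
big_file = io.StringIO(
    'FEATURE_NAME|STATE_ALPHA|PRIM_LAT_DEC|PRIM_LONG_DEC\n'
    'Austin|TX|30.5|-97.75\n'
    'Denver|CO|39.75|-105.0\n'
    'Reno|NV|39.5|-119.75\n'
)

data_df = pd.DataFrame(
    {'CITY': ['Austin', 'Reno'], 'STATE': ['TX', 'NV'],
     'LAT': [float('nan')] * 2, 'LONG': [float('nan')] * 2}
).set_index(['CITY', 'STATE'])

# Iterating the TextFileReader yields DataFrames of at most `chunksize` rows.
reader = pd.read_csv(big_file, sep='|', usecols=COLS, chunksize=2)
for chunk in reader:
    part = chunk.rename(columns=RENAME).set_index(['CITY', 'STATE'])
    part = part[~part.index.duplicated(keep='first')]  # unique within this chunk
    data_df.update(part)

print(data_df)
```

Duplicates are only removed within a chunk; a pair that reappears in a later chunk simply overwrites the earlier values, which avoids the reindex error but may not be the row you want to keep.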
Error from method 3:
Traceback (most recent call last):
File "C:/Users/Leb/Desktop/Python/py-script/geo_pandas.py", line 32, in <module>
data_df.update(main_df_part)
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 3416, in update
other = other.reindex_like(self)
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1564, in reindex_like
return self.reindex(**d)
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 2511, in reindex
**kwargs)
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1773, in reindex
method, fill_value, copy).__finalize__(self)
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 2470, in _reindex_axes
fill_value, limit)
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 2477, in _reindex_index
limit=limit)
File "C:\Python34\lib\site-packages\pandas\core\index.py", line 4929, in reindex
"cannot handle a non-unique multi-index!")
Exception: cannot handle a non-unique multi-index!
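Please post the exact traceback for each thing you tried. – Manhattan
It'll take a minute, but I'll work on it. – Leb
I'll hazard a guess for method 3: main_df_part has two identical index entries. Scan for them. You probably have a city-state combination that appears more than once in one of your files. – Manhattan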
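A minimal sketch for checking that guess: Index.duplicated(keep=False) flags every (CITY, STATE) pair that occurs more than once, and filtering with ~index.duplicated(keep='first') leaves one row per pair so update can reindex. The rows are made up, and keeping the first occurrence is an assumption; aggregating the duplicates some other way would be just as valid.

```python
import pandas as pd

# Made-up rows: ('Springfield', 'IL') occurs twice, as a repeated name might in a real file.
main_df_part = pd.DataFrame(
    {'CITY': ['Springfield', 'Springfield', 'Salem'],
     'STATE': ['IL', 'IL', 'OR'],
     'LAT': [39.75, 39.5, 44.75],
     'LONG': [-89.5, -89.75, -123.0]}
).set_index(['CITY', 'STATE'])

data_df = pd.DataFrame(
    {'CITY': ['Springfield', 'Salem'], 'STATE': ['IL', 'OR'],
     'LAT': [float('nan')] * 2, 'LONG': [float('nan')] * 2}
).set_index(['CITY', 'STATE'])

# List every (CITY, STATE) pair that occurs more than once.
dupes = main_df_part.index[main_df_part.index.duplicated(keep=False)].unique().tolist()
print(dupes)  # [('Springfield', 'IL')]

# Keep the first row per pair; update() then works on a unique index.
main_df_part = main_df_part[~main_df_part.index.duplicated(keep='first')]
data_df.update(main_df_part)
print(data_df)
```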