Python pandas error while removing extra white space

I am trying to clean a column in a data frame of extra white space using the command below. The data frame has nearly 8 million records.

datt2.My_variable=datt2.My_variable.str.replace('\s+', ' ')

I end up getting the error below:

MemoryError        Traceback (most recent call last) 
<ipython-input-10-158a51cfaa3d> in <module>() 
----> 1 datt2.My_variable=datt2.My_variable.str.replace('\s+', ' ') 

c:\python27\lib\site-packages\pandas\core\strings.pyc in replace(self, pat, repl, n, case, flags) 
    1504  def replace(self, pat, repl, n=-1, case=True, flags=0): 
    1505   result = str_replace(self._data, pat, repl, n=n, case=case, 
-> 1506        flags=flags) 
    1507   return self._wrap_result(result) 
    1508 

c:\python27\lib\site-packages\pandas\core\strings.pyc in str_replace(arr, pat, repl, n, case, flags) 
    334   f = lambda x: x.replace(pat, repl, n) 
    335 
--> 336  return _na_map(f, arr) 
    337 
    338 

c:\python27\lib\site-packages\pandas\core\strings.pyc in _na_map(f, arr, na_result, dtype) 
    152 def _na_map(f, arr, na_result=np.nan, dtype=object): 
    153  # should really _check_ for NA 
--> 154  return _map(f, arr, na_mask=True, na_value=na_result, dtype=dtype) 
    155 
    156 

c:\python27\lib\site-packages\pandas\core\strings.pyc in _map(f, arr, na_mask, na_value, dtype) 
    167   try: 
    168    convert = not all(mask) 
--> 169    result = lib.map_infer_mask(arr, f, mask.view(np.uint8), convert) 
    170   except (TypeError, AttributeError): 
    171 

pandas\src\inference.pyx in pandas.lib.map_infer_mask (pandas\lib.c:65837)() 

pandas\src\inference.pyx in pandas.lib.maybe_convert_objects (pandas\lib.c:56806)() 

MemoryError:

2017-03-17 Enthusiast

What if you use 'datt2.My_variable.str.replace(r'\s+', ' ', inplace=True, regex=True)'? –

@WiktorStribiżew, this function has no 'inplace' or 'regex' parameter. I went ahead and ran it anyway, and the error message was as expected: TypeError: replace() got an unexpected keyword argument 'inplace' – Enthusiast

Is that a lot of data you are dealing with? – languitar

Question: I am trying to clean a column in data frame of extra white space ...
datt2.My_variable=datt2.My_variable.str.replace('\s+', ' ')

Please comment: do I understand your expression correctly?

pandas     Column         Column            DataSeries
DataFrame  Name           DataSeries        Method
|-^-|      |----^-----|   |-------^-------| |----------^-----------|
datt2      .My_variable = datt2.My_variable .str.replace('\s+', ' ')

I am pretty sure that using re.sub is the same as using pandas.str.replace(...), but without copying the whole column's data.

From the pandas doc:
Series.str.replace(pat, repl, n=-1, case=True, flags=0)
Replace occurrences of pattern/regex in the Series/Index with some other string.
Equivalent to str.replace() or re.sub().
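The equivalence quoted from the docs can be checked on a tiny Series. This is only a sketch under assumptions: the sample strings are made up, and `regex=True` assumes a recent pandas release. On the pandas 0.19 used in this thread that parameter does not exist (see the TypeError in the comments above), so it would have to be dropped there.

```python
import re

import pandas as pd

# Made-up sample data, only for illustration.
s = pd.Series(["a   b", "c \t d"])

# Vectorized form: one call over the whole Series.
# regex=True is an assumption about the pandas version (see lead-in).
vectorized = s.str.replace(r"\s+", " ", regex=True)

# Element-wise form: the same substitution applied with re.sub.
looped = s.map(lambda text: re.sub(r"\s+", " ", text))

print(vectorized.tolist())
print(looped.tolist())
```

Both forms apply the same regular expression to every element; the difference is only in who drives the loop.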

Try pure Python, for example:

import re

# Collapse each run of two or more whitespace characters, one row at a time
for idx in df.index:
    df.loc[idx, 'My_variable'] = re.sub(r'\s\s+', ' ', df.loc[idx, 'My_variable'])

Note: consider using '\s\s+' instead of '\s+'.
Using '\s+' will replace ONE BLANK with ONE BLANK, which is useless.
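A quick way to compare the two patterns from the note, using only the standard library. The helper names `collapse_any` and `collapse_runs` and the sample strings are made up for this sketch. One difference worth knowing before choosing: '\s+' also turns a lone tab or newline into a space, while '\s\s+' leaves a single whitespace character of any kind untouched.

```python
import re


def collapse_any(text):
    """Replace every run of one or more whitespace characters with a space."""
    return re.sub(r"\s+", " ", text)


def collapse_runs(text):
    """Replace only runs of two or more whitespace characters with a space."""
    return re.sub(r"\s\s+", " ", text)


# Made-up samples: single blank, several blanks, a lone tab, mixed whitespace.
samples = ["a b", "a   b", "a\tb", "a \n b"]

for text in samples:
    print(repr(text), repr(collapse_any(text)), repr(collapse_runs(text)))
```

For a column that only ever contains plain spaces the two patterns give the same result, so the choice matters mainly when tabs or newlines can appear.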

Tested with Python: 3.4.2 - pandas: 0.19.2
Come back and flag your question as answered if this works for you, or comment why not.
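A middle ground between the row-by-row loop and a single call over the full column would be to run the vectorized replace on slices of the column. The sketch below is not from the original answer: the helper name `collapse_whitespace_in_chunks` and its `chunk_size` default are hypothetical, it assumes unique column names, and `regex=True` assumes a recent pandas (drop it on the 0.19 release mentioned above). It may lower peak memory because the temporary result only covers one slice at a time, but whether that is enough to avoid the MemoryError depends on the machine.

```python
import pandas as pd


def collapse_whitespace_in_chunks(df, column, chunk_size=500000):
    """Collapse whitespace runs in df[column], one slice of rows at a time.

    Hypothetical helper: modifies df in place and returns it.
    """
    col_pos = df.columns.get_loc(column)  # assumes the column name is unique
    for start in range(0, len(df), chunk_size):
        stop = start + chunk_size
        chunk = df.iloc[start:stop, col_pos]
        # Vectorized replace on this slice only, then write it back by position.
        cleaned = chunk.str.replace(r"\s+", " ", regex=True)
        df.iloc[start:stop, col_pos] = cleaned.to_numpy()
    return df


# Made-up data, with a tiny chunk size just to exercise several slices.
demo = pd.DataFrame({"My_variable": ["a   b", "c\t\td", "e f", "g \n h", "i  j"]})
collapse_whitespace_in_chunks(demo, "My_variable", chunk_size=2)
print(demo["My_variable"].tolist())
```

The loop here runs once per slice rather than once per row, so most of the work still happens inside the vectorized call.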

2017-03-18 16:24:27 stovfl

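This is a loop, isn't it? That is exactly why I used a vectorized regex replace over the whole data frame column. This does not answer my original question. – Enthusiast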

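In a pandas data frame we have rows and columns. With re.sub, for a pandas column you have to process everything row by row, as the loop in your code block does. pandas.str.replace, however, can process the whole column without writing a loop. This has worked for me in the past. However, the data I am working with now has 8 million rows. The library does not scale to this volume of data. – Enthusiast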

@Enthusiast: I understand your point. Please confirm whether I have understood your "expression" correctly. – stovfl
