I am trying to remove punctuation from a column of a pandas DataFrame. This is the function I use:
import re

def remove_punctuation(text):
    return re.sub(r'[^\w\s]', '', text)
This is how I apply it:
review_without_punctuation = products['review'].apply(remove_punctuation)
Here, products is a pandas DataFrame.
This is the error message I get:
TypeError Traceback (most recent call last)
<ipython-input-19-196c188dfb67> in <module>()
----> 1 review_without_punctuation = products['review'].apply(remove_punctuation)
/Users/username/Dropbox/workspace/private/pydev/ml/classification/.env/lib/python3.6/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
2292 else:
2293 values = self.asobject
-> 2294 mapped = lib.map_infer(values, f, convert=convert_dtype)
2295
2296 if len(mapped) and isinstance(mapped[0], Series):
pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:66124)()
<ipython-input-18-0950dc65d8b8> in remove_punctuation(text)
1 def remove_punctuation(text):
----> 2 return re.sub(r'[^\w\s]','',text)
/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/re.py in sub(pattern, repl, string, count, flags)
189 a callable, it's passed the match object and must return
190 a replacement string to be used."""
--> 191 return _compile(pattern, flags).sub(repl, string, count)
192
193 def subn(pattern, repl, string, count=0, flags=0):
TypeError: expected string or bytes-like object
What am I doing wrong?
Please give us a small example DataFrame. – Denziloe
Can you check whether any row of the 'review' column contains 'nan' or another non-string value? – Ali
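Following up on Ali's comment, here is a minimal sketch of how that check could look, and one possible way to make the function tolerate such rows. The `products` DataFrame below is a made-up stand-in (the real data was not posted), and mapping non-string values to an empty string is only an assumed policy; dropping or filling those rows beforehand may suit the data better.

```python
import re

import pandas as pd

# Hypothetical stand-in for the `products` DataFrame from the question;
# it deliberately contains a NaN and an integer in the 'review' column.
products = pd.DataFrame(
    {"review": ["Great product!!!", float("nan"), "Okay, I guess...", 5]}
)

# Boolean mask marking the rows whose 'review' value is a real str.
is_string = products["review"].apply(lambda value: isinstance(value, str))

# Rows that re.sub would reject with
# "TypeError: expected string or bytes-like object".
non_string_rows = products[~is_string]
print(non_string_rows)


def remove_punctuation(text):
    # Assumed policy: treat NaN and other non-string values as empty text.
    if not isinstance(text, str):
        return ""
    return re.sub(r"[^\w\s]", "", text)


review_without_punctuation = products["review"].apply(remove_punctuation)
print(review_without_punctuation.tolist())
```

If `non_string_rows` comes back non-empty on the real data, those rows are the likely cause of the TypeError, since `re.sub` only accepts `str` or bytes-like input.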