Remove punctuation from every row in a pandas DataFrame

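I'm new to Python, so this may be a very basic question. I'm trying to use a lambda to remove the punctuation from every row of a pandas DataFrame. I used the following, but I get an error. I'm trying to avoid converting the df to a list, appending the cleaned results to a new list, and then converting that back to a df.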

Any suggestions would be appreciated!

import string 

df['cleaned'] = df['old'].apply(lambda x: x.replace(c,'') for c in string.punctuation)
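As written, the `for c in string.punctuation` clause binds to the whole `lambda`, so `apply` receives a generator of lambdas rather than a single function; the exact error you see depends on the pandas version. A minimal sketch of the same "replace each punctuation character" idea written as an ordinary function (the name `strip_punctuation` and the sample data are hypothetical):

```python
import string

import pandas as pd


def strip_punctuation(text):
    # Call str.replace once per punctuation character, as the question intended.
    for c in string.punctuation:
        text = text.replace(c, '')
    return text


# Hypothetical sample data; the column name 'old' matches the question.
df = pd.DataFrame({'old': ["Hello, world!", "It's (mostly) fine."]})
df['cleaned'] = df['old'].apply(strip_punctuation)
print(df['cleaned'].tolist())
```

This makes one pass over the string per punctuation character, so it is mainly useful for seeing why the original lambda failed rather than as the fastest option.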


2015-10-09 RJL

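You need to iterate over the characters of each string in the DataFrame, not over string.punctuation. You also need to build the string back up with .join().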

df['cleaned'] = df['old'].apply(lambda x:''.join([i for i in x 
                if i not in string.punctuation]))
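A self-contained version of the one-liner above, with a small hypothetical DataFrame so it can be run as-is:

```python
import string

import pandas as pd

# Hypothetical sample frame; 'old' is the column name used in the question.
df = pd.DataFrame({'old': ["a..b", "c<=d", "no punctuation"]})

# Keep every character that is not in string.punctuation, then join back into a str.
df['cleaned'] = df['old'].apply(
    lambda x: ''.join([i for i in x if i not in string.punctuation]))

print(df)
```

This assumes every value in the column is a `str`; a missing value (NaN, which is a float) would raise a `TypeError` when the comprehension tries to iterate over it.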

When a lambda gets long, it can be more readable to write the function definition out separately, e.g. (thanks to @AndyHayden for the optimization tip):

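import string

PUNCTUATION = frozenset(string.punctuation)  # build the set once, not once per character

def remove_punctuation(s):
    # keep only the characters that are not punctuation
    return ''.join([i for i in s if i not in PUNCTUATION])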

df['cleaned'] = df['old'].apply(remove_punctuation)
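A common alternative for a named helper, not taken from this thread, is `str.translate` with a deletion table built once by `str.maketrans`. A sketch, where `remove_punctuation_translate` and the sample data are hypothetical:

```python
import string

import pandas as pd

# str.maketrans(x, y, z) maps every character in z to None, i.e. deletes it.
# The table is built once, at import time.
_DELETE_PUNCTUATION = str.maketrans('', '', string.punctuation)


def remove_punctuation_translate(s):
    # Hypothetical helper; same effect as remove_punctuation above.
    return s.translate(_DELETE_PUNCTUATION)


df = pd.DataFrame({'old': ["e|}f", "Hello, world!"]})
df['cleaned'] = df['old'].apply(remove_punctuation_translate)

# pandas also exposes str.translate through the .str accessor.
df['cleaned_str'] = df['old'].str.translate(_DELETE_PUNCTUATION)
```

`str.translate` does the per-character lookup in C, so it avoids building an intermediate list in Python for every row.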


2015-10-09 22:13:31 bernie

Very nice! Thanks! – RJL

You're very welcome! – bernie

If this works for you, you can accept this answer. –

Using a regular expression will most likely be faster here:

In [11]: RE_PUNCTUATION = '|'.join([re.escape(x) for x in string.punctuation]) # perhaps this is available in the re/regex library? 

In [12]: s = pd.Series(["a..b", "c<=d", "e|}f"]) 

In [13]: s.str.replace(RE_PUNCTUATION, "", regex=True)
Out[13]: 
0 ab 
1 cd 
2 ef 
dtype: object
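The same session as a standalone script. The explicit `regex=True` matters on pandas 2.0 and later, where `Series.str.replace` treats the pattern as a literal string by default:

```python
import re
import string

import pandas as pd

# Alternation of individually escaped punctuation characters.
RE_PUNCTUATION = '|'.join(re.escape(x) for x in string.punctuation)

s = pd.Series(["a..b", "c<=d", "e|}f"])

# regex=True: interpret RE_PUNCTUATION as a regular expression, not a literal.
cleaned = s.str.replace(RE_PUNCTUATION, "", regex=True)
print(cleaned.tolist())
```

The `re.escape` call is what makes characters such as `.`, `|` and `}` match literally instead of acting as regex operators.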


2015-10-09 22:42:15

This should be the accepted answer... – clg4

Likewise: `s.str.replace('[{}]'.format(re.escape(string.punctuation)), '', regex=True)` –
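A sketch of the character-class variant from the comment above. The `re.escape` call is needed because `string.punctuation` contains `\]`: unescaped, the backslash is consumed as an escape for `]`, so backslashes would not be removed. The sample strings are hypothetical:

```python
import re
import string

import pandas as pd

# One character class instead of an alternation; re.escape makes '[', ']', '\\',
# '^' and '-' literal members of the class.
RE_PUNCT_CLASS = '[{}]'.format(re.escape(string.punctuation))

s = pd.Series(["a..b", "back\\slash", "x[y]z"])
cleaned = s.str.replace(RE_PUNCT_CLASS, '', regex=True)
print(cleaned.tolist())
```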
