Applying string operations to a pandas DataFrame

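There are similar answers, but I could not apply them to my own case. I want to get rid of the characters that are forbidden in Windows directory names in my pandas DataFrame, so that the column ends up looking like this: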

item_name 
0 stback 
1 yhhxx 
2 adfgs 
3 ghytt23 
4 ghh_h

How can I achieve this? I tried something like the following:

df1['item_name'] = "".join(x for x in df1['item_name'].rstrip() if x.isalnum() or x in [" ", "-", "_"]) if df1['item_name'] else ""

Assume I have something like this:

item_name 
0 st*back 
1 yhh?\xx 
2 adfg%s 
3 ghytt&{23 
4 ghh_h

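I would like to get a cleaned DataFrame from it. Note: I scraped the data from the internet earlier and used the following code for the old version: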

item_name = "".join(x for x in item_name.text.rstrip() if x.isalnum() or x in [" ", "-", "_"]) if item_name else ""

Now I have new reviews for the same items, and I want to merge them with the old reviews. But I forgot to apply the same method when I re-scraped.

Source

2017-04-17 edyvedy13

`df.item_name = df.item_name.apply(lambda x: x.replace("\s|-|_", ""))` –

No, I want to keep "_" and "-"; I only want to get rid of the characters that are forbidden in Windows directory names. – edyvedy13

It should be `re.sub`. –

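You can summarize the condition as a negated character class and use str.replace to remove the matches. Here \w stands for word characters (alphanumerics plus _), \s stands for whitespace, and - is a literal dash. With ^ at the start of the character class, [^\w\s-] matches any character that is neither alphanumeric nor one of [" ", "-", "_"], so you can use the replace method to remove them: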

df.item_name.str.replace(r"[^\w\s-]", "", regex=True)  # regex=True: recent pandas versions treat the pattern as a literal string by default

#0  stback 
#1  yhhxx 
#2  adfgs 
#3 ghytt23 
#4  ghh_h 
#Name: item_name, dtype: object
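For readers who want to reproduce this, here is a minimal, self-contained sketch. The DataFrame is rebuilt from the sample rows in the question, and the variable name `cleaned` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question; the backslash is escaped in the literal.
df = pd.DataFrame(
    {"item_name": ["st*back", "yhh?\\xx", "adfg%s", "ghytt&{23", "ghh_h"]}
)

# Remove every character that is not a word character, whitespace, or a dash.
# regex=True is passed explicitly so the pattern is not taken as a literal string.
cleaned = df["item_name"].str.replace(r"[^\w\s-]", "", regex=True)

print(cleaned.tolist())
```

To update the column in place, the result can be assigned back with `df["item_name"] = cleaned`.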

Source

2017-04-17 21:03:01 Psidom

Sorry, I edited my question. Will it achieve the same result as what I did before? – edyvedy13

It should. As stated in the answer, the pattern removes characters that are not in `[a-zA-Z0-9, _, -, " "]`. – Psidom
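One way to check this on the sample data is to wrap the per-string expression from the question in a function and compare it with the regex version. This is only a sketch: `clean_name` is a hypothetical helper name, and the comparison covers just the five sample rows. The two approaches can still differ on other input, since the original also strips trailing whitespace and keeps only the plain space, while \s keeps tabs and newlines as well.

```python
import pandas as pd

def clean_name(text):
    # Hypothetical wrapper around the per-string expression used in the question.
    if not text:
        return ""
    return "".join(
        ch for ch in text.rstrip() if ch.isalnum() or ch in [" ", "-", "_"]
    )

names = pd.Series(["st*back", "yhh?\\xx", "adfg%s", "ghytt&{23", "ghh_h"])

by_function = names.apply(clean_name)                       # element-wise Python function
by_regex = names.str.replace(r"[^\w\s-]", "", regex=True)   # vectorized regex replacement

print(by_function.tolist() == by_regex.tolist())
```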

Try

import re 
df.item_name.apply(lambda x: re.sub(r'\W+', '', x))  # raw string avoids an invalid-escape warning

0  stback 
1  yhhxx 
2  adfgs 
3 ghytt23 
4  ghh_h
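Note that \W matches every non-word character, so this pattern also removes spaces and dashes, which the question wanted to keep. The sample rows contain neither, so the difference does not show there. A small sketch with a made-up string (the value of `sample` is purely illustrative):

```python
import re

sample = "my item-1 *new*"  # hypothetical name containing a space and a dash

# \W+ strips everything that is not a letter, digit, or underscore.
strip_all_nonword = re.sub(r"\W+", "", sample)

# The negated class keeps whitespace and dashes and removes only the rest.
keep_space_and_dash = re.sub(r"[^\w\s-]", "", sample)

print(strip_all_nonword)
print(keep_space_and_dash)
```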

Source

2017-04-17 21:03:54 Vaishali

If you have a list of properly escaped characters:

lst = [r'\\', r'\*', r'\?', '%', '&', r'\{']  # raw strings; each entry is a regex for one literal character
df.replace(lst, '', regex=True) 

    item_name 
0 stback 
1  yhhxx 
2  adfgs 
3 ghytt23 
4  ghh_h
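If escaping each character by hand is error-prone, the list could also be built with `re.escape` from the standard library. This is a sketch under that assumption, not part of the original answer. The `forbidden` list simply mirrors the characters used above, and the variable names are illustrative.

```python
import re
import pandas as pd

df = pd.DataFrame(
    {"item_name": ["st*back", "yhh?\\xx", "adfg%s", "ghytt&{23", "ghh_h"]}
)

# Plain characters to remove; re.escape adds backslashes where the regex needs them.
forbidden = ["\\", "*", "?", "%", "&", "{"]
lst = [re.escape(ch) for ch in forbidden]

# Each pattern in lst is replaced with the empty string across the whole frame.
cleaned = df.replace(lst, "", regex=True)

print(cleaned["item_name"].tolist())
```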

Source

2017-04-17 21:05:03 piRSquared
