Find a substring and remove it using a regular expression, Python

I have a dataset that looks like this:

"See the new #Gucci 5th Ave NY windows customized by @troubleandrew for the debut of the #GucciGhost collection." 
"Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew"

So I am trying to get rid of every @ and the word attached to it. My dataset should look like this:

"See the new #Gucci 5th Ave NY windows customized by for the debut of the #GucciGhost collection." 
"Before the #GucciGhost collection debuts tomorrow, read about the artist"

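I can use a simple replace statement to get rid of the @ itself, but the word attached to it is the problem.

I am using re to search for the occurrences, but I am unable to remove the word.

P.S. If it were always the same word, this would not be a problem. But there are many different words attached to @ throughout the data.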

Source

2016-09-15 M PAUL

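What is your question? Which code fails to remove the @ + word? Have you tried 're.sub'?

My problem was that I couldn't remove the whole @ + word. I was using 're.findall'. Anyway, 're.sub' works. Thanks.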

You can use a regular expression:

import re 

a = [ 
"See the new #Gucci 5th Ave NY windows customized by @troubleandrew for the debut of the #GucciGhost collection.", 
"Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew" 
] 
pat = re.compile(r"@\S+") # \S+ all non-space characters 
for i in range(len(a)): 
    a[i] = re.sub(pat, "", a[i]) # replace it with empty string 
print(a)

This will give you what you want.
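To connect this with the comments above, here is a minimal sketch of why 're.findall' alone could not remove the word while 're.sub' can. The variable names (text, mentions, cleaned) are only illustrative and do not come from the original post.

```python
import re

text = "Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew"

# re.findall only reports what matched; the input string is left untouched.
mentions = re.findall(r"@\S+", text)

# re.sub builds and returns a new string with every match replaced.
cleaned = re.sub(r"@\S+", "", text)

print(mentions)
print(repr(cleaned))  # repr makes any leftover whitespace visible
```

Strings are immutable in Python, so the result of re.sub has to be assigned back, as the loop in the answer does with a[i]. The space that preceded the mention is not part of the match, so it stays in the result.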

Source

2016-09-15 09:15:34

Idiomatic version, which also subs out the extra space:

import re 

a = [ 
    "See the new #Gucci 5th Ave NY windows customized by @troubleandrew for the debut of the #GucciGhost collection.", 
    "Before the #GucciGhost collection debuts tomorrow, read about the artist @troubleandrew" 
] 

rgx = re.compile(r"\s?@\S+")  # optional whitespace, then @ and the word attached to it

b = [re.sub(rgx, "", row) for row in a]

print(b)

\s?: \s matches ' ' (or any other whitespace character) and ? stands for zero or one occurrence.
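The effect of the optional \s? can be seen by running both patterns on the same row. This is a small sketch; the names without_space and with_space are made up for the comparison. The last two lines show a variation using \w+ instead of \S+. That variation is not from either answer, it is just an assumption about what you might want when a mention is followed by punctuation.

```python
import re

row = ("See the new #Gucci 5th Ave NY windows customized by @troubleandrew "
       "for the debut of the #GucciGhost collection.")

# Without \s? the space in front of the mention survives, leaving two spaces in a row.
without_space = re.sub(r"@\S+", "", row)

# With \s? the single whitespace character before the @ is removed along with the word.
with_space = re.sub(r"\s?@\S+", "", row)

print(repr(without_space))
print(repr(with_space))

# \S+ also swallows punctuation glued to the mention; \w+ stops at it (assumed variation).
greedy = re.sub(r"\s?@\S+", "", "thanks @troubleandrew.")
wordy = re.sub(r"\s?@\w+", "", "thanks @troubleandrew.")
```

Because ? allows zero occurrences, the pattern still matches a mention at the very start of a string, where there is no preceding whitespace.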

Source

2016-09-16 00:02:12 Marcs
