2015-11-02 56 views
2

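While analyzing tweets, I run into "words" that contain \ or / (possibly more than once within a single "word"). I would like to remove such words completely, but I can't quite work out how to remove words containing the special characters "\" and "/".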

This is my attempt:

sen = 'this is \re\store and b\\fre' 
sen1 = 'this i\s /re/store and b//fre/' 

slash_back = r'(?:[\w_]+\\[\w_]+)' 
slash_fwd = r'(?:[\w_]+/+[\w_]+)' 
slash_all = r'(?<!\S)[a-z-]+(?=[,.!?:;]?(?!\S))' 

strt = re.sub(slash_back,"",sen) 
strt1 = re.sub(slash_fwd,"",sen1) 
strt2 = re.sub(slash_all,"",sen1) 
print strt 
print strt1 
print strt2 

I would like to get:

this is and 
this i\s and 
this and 

However, I get:

and 
this i\s/and/
i\s /re/store b//fre/ 

To add: in this case, a "word" is a string delimited by spaces or punctuation signs (as in normal text).

+1

Beautifully asked question. I wish there were a question template that askers had to follow in a similar way. – d0nut

+1

@iismathwizard I had to reload the page to double-check that my eyes weren't deceiving me –

Answers

1

How about this? I added some examples with punctuation:

import re 

sen = r'this is \re\store and b\\fre' 
sen1 = r'this i\s /re/store and b//fre/' 
sen2 = r'this is \re\store, and b\\fre!' 
sen3 = r'this i\s /re/store, and b//fre/!' 

slash_back = r'\s*(?:[\w_]*\\(?:[\w_]*\\)*[\w_]*)' 
slash_fwd = r'\s*(?:[\w_]*/(?:[\w_]*/)*[\w_]*)' 
slash_all = r'\s*(?:[\w_]*[/\\](?:[\w_]*[/\\])*[\w_]*)' 

strt = re.sub(slash_back,"",sen) 
strt1 = re.sub(slash_fwd,"",sen1) 
strt2 = re.sub(slash_all,"",sen1) 
strt3 = re.sub(slash_back,"",sen2) 
strt4 = re.sub(slash_fwd,"",sen3) 
strt5 = re.sub(slash_all,"",sen3) 
print(strt) 
print(strt1) 
print(strt2) 
print(strt3) 
print(strt4) 
print(strt5) 

Output:

this is and 
this i\s and 
this and 
this is, and! 
this i\s, and! 
this, and! 
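
For reuse, the combined pattern can be wrapped in a small helper. This is only a sketch: the names `SLASH_WORD` and `drop_slash_words` are made up for illustration, and `[\w_]` is shortened to `\w` since `\w` already includes the underscore. It also shows why the `r` prefix on the test strings matters: in a normal string literal `'\r'` is a carriage return, so `'\re\store'` never contains the backslash you expect.

```python
import re

# Same idea as slash_all above, precompiled once.
# SLASH_WORD and drop_slash_words are illustrative names, not part of any library.
SLASH_WORD = re.compile(r'\s*\w*[/\\](?:\w*[/\\])*\w*')

def drop_slash_words(text):
    """Remove each word that contains a slash or backslash, plus the whitespace before it."""
    return SLASH_WORD.sub('', text)

# Without the r prefix, '\r' is one character (carriage return), so the lengths differ.
print(len('\re'), len(r'\re'))  # 2 3

print(drop_slash_words(r'this i\s /re/store, and b//fre/!'))  # this, and!
```

Because the pattern only consumes the whitespace in front of a match, a sentence that starts with a slashed word keeps the space that followed it; a final `.strip()` may be wanted in that case.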
+0

Beautiful! Works like a dream! Thanks a lot!! – Toly

0

You can do this without re. One way is to use join and a comprehension.

# Raw strings keep every backslash literal ('\r' would otherwise be a carriage return)
sen = r'this is \re\store and b\\fre'
sen1 = r'this i\s /re/store and b//fre/'

# Keep only the whitespace-separated tokens that lack the given slash
remove_back = lambda s: ' '.join(i for i in s.split() if '\\' not in i)
remove_forward = lambda s: ' '.join(i for i in s.split() if '/' not in i)

>>> print(remove_back(sen))
this is and
>>> print(remove_forward(sen1))
this i\s and
>>> print(remove_back(remove_forward(sen1)))
this and
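
The two lambdas can also be folded into one pass. A minimal sketch, assuming tokens are separated by whitespace only; `remove_slash_words` and its `chars` parameter are names invented here for illustration:

```python
def remove_slash_words(text, chars='/\\'):
    """Drop whitespace-separated tokens that contain any character in chars."""
    return ' '.join(w for w in text.split() if not any(c in w for c in chars))

print(remove_slash_words(r'this i\s /re/store and b//fre/'))    # this and
print(remove_slash_words(r'this i\s /re/store, and b//fre/!'))  # this and
```

Note the second call: punctuation glued to a slashed token is dropped together with it, so this approach fits input where losing the trailing `,` or `!` is acceptable.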
+0

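Interesting approach! I just think it is a specific solution for a specific case, whereas I am looking for a general one. Mark's solution has so far handled the wildest strings in my Twitter collection. Thanks! – Toly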