Finding a substring in a string in Python

I want to count the occurrences of a substring in a string in Python, but I need the search to be very specific. Before searching the string, I remove all punctuation:

myString.translate(None, string.punctuation)

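Now I search for the substring. Suppose I am searching for the substring "hello bob", and the string I am searching contains the text "hello bob-something else" or "hello bob'" along with some other text. When I remove the punctuation, the two quote characters ' are not removed because they are not unicode characters, so the two strings mentioned above should not be counted as occurrences of the phrase "hello bob".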
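A note for readers on Python 3: the two-argument translate(None, ...) call above is the Python 2 str form. Below is a minimal sketch of a Python 3 equivalent. The sample text is invented, and it assumes the surviving characters are typographic quotes such as U+2019, which the post does not confirm. string.punctuation contains only ASCII punctuation, so a character like that passes through untouched.

```python
import string

# Python 3: build a table that deletes every ASCII punctuation character.
strip_ascii_punct = str.maketrans("", "", string.punctuation)

# Hypothetical sample; the quote after the first "bob" is U+2019,
# not the ASCII apostrophe, so it is not in string.punctuation.
sample = "hello bob\u2019 and hello bob, again!"
cleaned = sample.translate(strip_ascii_punct)

print(cleaned)  # the comma and "!" are gone, the U+2019 quote is still there
```

If such characters should be stripped as well, they would have to be added to the third argument of str.maketrans explicitly.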

I use the regex code below to try to get the correct number of occurrences, but on large files (3000 lines and up) I started getting the wrong count:

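import re

counter = 0
# Match "hello bob" only when followed by a non-word, non-hyphen character or end of line.
searcher = re.compile("hello bob" + r'([^\w-]|$)').search
with open(myFile, 'r') as source:
    for line in source:
        if searcher(line):
            counter += 1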
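One thing worth noting about the loop above: search returns at most one match per line, so counter ends up holding the number of matching lines, not the number of occurrences. That may be one reason the totals drift on larger files. A minimal sketch of the difference, using made-up lines in place of the file:

```python
import re

pattern = re.compile("hello bob" + r"([^\w-]|$)")

# Hypothetical stand-in for the lines read from myFile.
lines = [
    "hello bob and hello bob again\n",
    "hello bob-something else\n",
    "hello bob\n",
]

# What the loop in the question computes: lines containing at least one match.
matching_lines = sum(1 for line in lines if pattern.search(line))

# Total number of non-overlapping matches across all lines.
occurrences = sum(len(pattern.findall(line)) for line in lines)

print(matching_lines, occurrences)
```

Reading line by line also cannot see a phrase that is split across a line break.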

Something else I tried to get the correct number of occurrences:

I want to use the findall function, because so far it has given me the correct count for the words I enter.

I found this on Stack Overflow:

re.findall(r'\bword\b', read)

Is there any way I can use a variable instead of the literal word?

For example, I would like to use:

myPhrase = "hello bob" 
re.findall(r'\bmyPhrase\b', read)

which should be the same as:

re.findall(r'\bhello bob\b', read)


2017-02-13 memoryManagers

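Give a sample input and the expected output. –

Look up the information about re.findall() – TallChuck

@juanpa.arrivillaga That would be very difficult, because the code above works in most cases but fails on large text files (3000 lines or more) – memoryManagers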

You can solve this with string interpolation, using the trick below.

myphrase = "hello bob" 
pattern = r'\b{var}\b'.format(var = myphrase)
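To show how that pattern plugs into re.findall, here is a small sketch with an invented sample text. It also shows why the literal r'\bmyPhrase\b' from the question finds nothing: inside the string, myPhrase is just those eight characters, not the variable.

```python
import re

myphrase = "hello bob"
read = "hello bob! say hello bob to hello bobby"  # hypothetical sample text

# Searches for the literal text "myPhrase", which does not occur in read.
literal = re.findall(r'\bmyPhrase\b', read)

# Interpolating the variable yields the pattern r'\bhello bob\b'.
pattern = r'\b{var}\b'.format(var=myphrase)
matches = re.findall(pattern, read)

print(len(literal), len(matches))
```

Note that format inserts the phrase as-is, so any regex metacharacters in it (such as . or +) stay active in the pattern.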


2017-02-13 04:42:50 Prerit

This works flawlessly, thanks – memoryManagers

@memoryManagers You're welcome! :D – Prerit

You can use re.escape(myPhrase) for the variable substitution.

import re

read = "hello bob ! how are you?"
myPhrase = "hello bob"
# re.escape neutralizes any regex metacharacters in the phrase.
my_regex = r"\b" + re.escape(myPhrase) + r"\b"

counter = 0
if re.search(my_regex, read, re.IGNORECASE):
    counter += 1
else:
    print("not found")
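The snippet above only reports whether the phrase occurs at all. Going back to the original goal, a sketch that counts every occurrence and also rejects matches followed by a hyphen or an apostrophe might look like the following. The helper name count_phrase, the set of excluded characters, and the sample text are assumptions for illustration. Plain \b would still accept "hello bob-something", since a hyphen is a non-word character.

```python
import re

def count_phrase(phrase, text):
    """Count occurrences of phrase not adjacent to a word character, '-', ' or U+2019."""
    # Assumed set of characters that disqualify a match when they touch the phrase.
    glue = r"[\w\-'\u2019]"
    pattern = r"(?<!" + glue + r")" + re.escape(phrase) + r"(?!" + glue + r")"
    return len(re.findall(pattern, text, re.IGNORECASE))

# Hypothetical sample: only the first two occurrences should count.
read = "hello bob ! Hello Bob, hello bob-something else, hello bob\u2019 hello bobby"
print(count_phrase("hello bob", read))
```

The lookbehind and lookahead check the neighbouring characters without consuming them, so adjacent occurrences separated by a single space are still counted individually.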


2017-02-13 04:49:36
