Regex to match the first and last word

I have a huge file containing a list of data, like this:

#fabulous  7.526 2301 2 
#excellent  7.247 2612 3 
#superb 7.199 1660 2 
#perfection  7.099 3004 4 
#terrific  6.922 629  1

I also have a file containing a list of sentences like this:

Terrific Theo Walcott is still shit, watch Rafa and Johnny deal with him on Saturday. 
its not that I'm a GSP fan, fabulous 
Iranian general says Israel's Iron Dome can't deal with their missiles 
with J Davlar 11th. Main rivals are team Poland.

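I want to check the following using regex:

Whether the first word of each sentence matches any word in the file, e.g. whether Terrific, its, Iranian, or with occurs in the file.
Whether the last word of each sentence matches any word in the file, e.g. whether Saturday, fabulous, missiles, or Poland occurs in the file.
Whether a 2- or 3-character prefix or suffix of a single word in the sentence matches a 2- or 3-character prefix or suffix in the file, e.g. whether Ter, its, Ira, or wi matches the 2- or 3-character prefix of any word in the file. The same applies to suffixes.

I am very new to regex. This is the approach I came up with, but it did not give results (term2.lower() is the first column in the file):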

wordanalysis["trail"] = found if re.match(sentence[-1], term2.lower()) else not(found)
wordanalysis["lead"] = found if re.match(sentence[0], term2.lower()) else not(found)

Source

2013-12-13 fscore

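Hi @r3mus, please check my edit. – fscore

I want to check whether the first word matches the list of words in the file. What is wrong with that? I am working on a project. – fscore

@r3mus Sorry. Yes, you are right. Check the examples in my edit. – fscore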

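Update: following the great suggestion from @justhalf, a regex is not needed to split out the words. If you want case-sensitive matching, remove .lower() (and the re.I flag passed to re.search). The regex originally suggested was: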

(^\s?\w+\b|(\b\w+)[\.?!\s]*$)

Matches:

MATCH 1-1. Terrific 
MATCH 2-1. Saturday. 
     2. Saturday 
MATCH 3-1. its 
MATCH 4-1. fabulous 
     2. fabulous 
MATCH 5-1. Iranian 
MATCH 6-1. missiles 
     2. missiles 
MATCH 7-1. with 
MATCH 8-1. Poland. 
     2. Poland
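
To see how the pattern behaves, here is a minimal Python 3 sketch that applies it to the sample sentences with re.finditer. The sentences list is hard-coded from the question, and the helper name first_and_last is made up for this illustration. In the pattern, group 2 holds the last word without trailing punctuation, and the first-word branch leaves group 2 unset.

```python
import re

# Pattern from above: the first word, or the last word followed by
# optional punctuation/whitespace up to the end of the string.
pattern = re.compile(r"(^\s?\w+\b|(\b\w+)[\.?!\s]*$)")

# Sample sentences copied from the question (hard-coded for illustration).
sentences = [
    "Terrific Theo Walcott is still shit, watch Rafa and Johnny deal with him on Saturday. ",
    "its not that I'm a GSP fan, fabulous ",
    "Iranian general says Israel's Iron Dome can't deal with their missiles ",
    "with J Davlar 11th. Main rivals are team Poland.",
]

def first_and_last(sentence):
    """Return (first_word, last_word); a side the pattern does not find is None."""
    first = last = None
    for m in pattern.finditer(sentence):
        if m.group(2) is not None:
            last = m.group(2)           # last-word branch, punctuation excluded
        else:
            first = m.group(1).strip()  # first-word branch
    return first, last

results = [first_and_last(s) for s in sentences]
for first, last in results:
    print(first, last)
```

Note that a one-word sentence is consumed entirely by the first-word branch, so last stays None in that case.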

Implementation

This checks the first and last word of each sentence (excluding any punctuation or trailing whitespace) against your data list:

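import re
import string

# Read the sentences (one per line) and the raw text of the data file.
with open("sentences.txt") as f:
    sentences = f.read().splitlines()
with open("data.txt") as f:
    data = f.read()

# Translation table that deletes all punctuation (Python 3 form of translate(None, ...)).
strip_punct = str.maketrans("", "", string.punctuation)

for line in sentences:
    words = line.strip().split()
    if not words:
        continue  # skip blank lines
    first = words[0].lower()
    last = words[-1].translate(strip_punct).lower()
    # re.escape stops characters in the word from being read as regex syntax.
    # Note this is a substring search over the whole file, not a whole-word match.
    if re.search(re.escape(first), data, re.I):
        print("Found " + first + " in data.txt")
    if last and re.search(re.escape(last), data, re.I):
        print("Found " + last + " in data.txt")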

This may not be the most elegant way to do it, but you get the idea.

The code is tested and working. Output:

Found terrific in data.txt
Found fabulous in data.txt

Note that this does not cover your third criterion. Test it and see whether it works for you so far.
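
For the third criterion, one possible reading is sketched below in Python 3: take the 2- and 3-character prefixes and suffixes of a sentence word and check whether any of them is also a prefix or suffix of a word in the data. The question leaves the exact rule open, so this is an assumption, as are the helper names (affixes, shares_prefix, shares_suffix), the hard-coded data_words list, and the choice to drop the leading "#" before comparing.

```python
# Data words from the question, hard-coded for illustration; the leading "#"
# is dropped, which is an assumption about how the terms should be compared.
data_words = ["fabulous", "excellent", "superb", "perfection", "terrific"]

def affixes(word, sizes=(2, 3)):
    """Return (prefixes, suffixes) of the given sizes for a lowercased word."""
    word = word.lower()
    prefixes = {word[:n] for n in sizes if len(word) >= n}
    suffixes = {word[-n:] for n in sizes if len(word) >= n}
    return prefixes, suffixes

# Collect every 2- and 3-character prefix and suffix that occurs in the data.
data_prefixes, data_suffixes = set(), set()
for w in data_words:
    p, s = affixes(w)
    data_prefixes |= p
    data_suffixes |= s

def shares_prefix(word):
    """True if a 2- or 3-character prefix of word is also a prefix of a data word."""
    return bool(affixes(word)[0] & data_prefixes)

def shares_suffix(word):
    """True if a 2- or 3-character suffix of word is also a suffix of a data word."""
    return bool(affixes(word)[1] & data_suffixes)

for word in ["Terrific", "its", "Iranian", "with"]:
    print(word, shares_prefix(word), shares_suffix(word))
```

Plain string slicing is enough here; a regex would only be needed if the prefixes had to be located inside longer text.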

Source

2013-12-13 01:01:46 brandonscript

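Why do you need a regex to get the first and last words? You can just split on whitespace, like this: 'words = line.strip().split(); first, last = words[0], words[-1]' – justhalf

@justhalf Good point. Updated, and it now handles punctuation as well. – brandonscript

How do I get the second column of data.txt? – fscore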
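
One way to read the second column, sketched here in Python 3 under the assumption that the columns in data.txt are separated by whitespace: split each line and map the term to its second field. The function name parse_second_column is hypothetical, the sample lines are hard-coded from the question, and the question does not say what the numeric columns mean.

```python
# Sample lines in the same layout as data.txt (hard-coded for illustration).
lines = [
    "#fabulous  7.526 2301 2 ",
    "#excellent  7.247 2612 3 ",
    "#superb 7.199 1660 2 ",
    "#perfection  7.099 3004 4 ",
    "#terrific  6.922 629  1",
]

def parse_second_column(rows):
    """Map each term (leading '#' removed, lowercased) to its second column as a float."""
    column2 = {}
    for row in rows:
        fields = row.split()  # split() with no argument handles runs of spaces
        if len(fields) < 2:
            continue          # skip blank or malformed lines
        term = fields[0].lstrip("#").lower()
        column2[term] = float(fields[1])
    return column2

scores = parse_second_column(lines)
print(scores.get("terrific"))  # value for an exact, whole-word lookup
print(scores.get("its"))       # None, because "its" is not a term in the data
```

To read from the real file, pass the open file object instead of the list, e.g. with open("data.txt") as f: scores = parse_second_column(f). A dictionary lookup also gives an exact whole-word match, unlike the substring search above.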
