Using a list of words to find words in a text

I have a list of words and want to search a text for those words. My list looks like this:

split_list = [y for x in old_list for y in x.split()]
set_list = list(set(split_list))
['Hello', 'Welcome', 'World']  # this is what the list looks like

Now I want to take set_list and search the text for every word in that list. I tried:

words_text = set(set_list).intersection(the_text)
print(words_text)

I only get set_list printed. What am I missing? If set_list contains the word "Hello", I need every "Hello" from the text in a new list. Like: ['Hello', 'Hello', 'Welcome', 'Hello', ...]


2014-10-09 TAN-C-F-OK

A set intersection only returns the words the two sets have in common. Each word is listed only once (because that is what a set is).
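A small sketch of that behaviour, assuming Python 3 and made-up sample data (the values of `set_list` and `the_text` below are illustrative, not the asker's actual data). It also shows what happens if `the_text` is a plain string: the intersection then compares against single characters.

```python
# Hypothetical sample data, for illustration only.
set_list = ['Hello', 'Welcome', 'World']
the_text = "Hello World Hello Welcome Hello"

# Intersecting with the split text yields each common word at most once,
# no matter how often it occurs in the text.
common = set(set_list).intersection(the_text.split())
print(sorted(common))  # ['Hello', 'Welcome', 'World']

# Passing the string itself iterates over its characters, so no whole
# word can match and the result is empty.
chars = set(set_list).intersection(the_text)
print(chars)  # set()
```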

To list a word as many times as it occurs, try:

[word for word in the_text.split() if word in set_list]
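As a rough usage sketch (Python 3, with invented sample data), the comprehension keeps every occurrence in text order. Converting the list to a set first is an optional tweak, not part of the answer above; it only speeds up the membership test. `Counter` is shown as one possible way to count the hits.

```python
from collections import Counter

# Hypothetical sample data, for illustration only.
set_list = ['Hello', 'Welcome', 'World']
the_text = "Hello World and Hello again Welcome"

wanted = set(set_list)  # set lookup is O(1) on average

# Keep every whitespace-separated token that is one of the wanted words.
matches = [word for word in the_text.split() if word in wanted]
print(matches)  # ['Hello', 'World', 'Hello', 'Welcome']

# Optional: count how often each wanted word occurred.
counts = Counter(matches)
print(counts['Hello'])  # 2
```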


2014-10-09 21:38:43

Thanks, this seems to work. But it's strange, I don't get all of them. I got 26 of the 41 words in the text. – 2014-10-09 21:47:36

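There may be an explanation. For example, if you search for "Hello" and the text contains "Hello,World" without a space, "Hello" will not be found, because .split() only splits on whitespace and you are comparing whole tokens. Also, upper and lower case are treated as different unless you handle that yourself, so "Hello" will not match "hello". – 2014-10-09 21:49:15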

Yes, commas. And some have a period after them. OK, so I just need to get rid of the punctuation in the text, right? Like:
punctuation = re.compile(r"[-.?!,:;()0-9\n\r|]")
text_punct = [punctuation.sub("", word) for word in the_text]
– 2014-10-09 21:53:18
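One possible way to handle both issues raised in these comments, sketched in Python 3 with invented sample data: lower-case everything and extract word tokens with `re.findall` instead of removing punctuation character by character. This is an alternative to the snippet in the comment, not the asker's method. Note also that if `the_text` is a string, `for word in the_text` loops over single characters, so the text has to be split into tokens first.

```python
import re

# Hypothetical sample data, for illustration only.
set_list = ['Hello', 'Welcome', 'World']
the_text = "Hello,World! hello again. Welcome: HELLO?"

# Compare in lower case so "Hello", "hello" and "HELLO" all match.
wanted = {w.lower() for w in set_list}

# \w+ matches runs of letters, digits and underscores, so commas,
# periods and other punctuation never become part of a token.
tokens = re.findall(r"\w+", the_text.lower())

matches = [t for t in tokens if t in wanted]
print(matches)  # ['hello', 'world', 'hello', 'welcome', 'hello']
```

The matches come back lower-cased; if the original spelling matters, tokenize the unmodified text and compare `t.lower()` instead.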
