2017-04-10 127 views

Python - Searching text with NLTK

I have the following text file (you can download it from here).

I am trying to search the file for the word language. To do that, I have the following Python script:

import nltk 

file = open('NLTK.txt', 'r') 
read_file = file.read() 
text = nltk.Text(read_file) 
match = text.concordance('language') 
print(match) 

However, when I run the program I get the output below, even though the file contains the word language:

No matches 
None 

Why can't the program find the word language if it exists in the file?

Edit 1

I noticed that the statement text = nltk.Text(read_file) returns:

<Text: T h i s i s ...> 

Thanks.


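The accepted answer is correct about how to fix this, but here is another piece of advice: don't bother learning to work with the 'Text' class; it is only meant for interactive exploration and demos. Go straight to 'PlaintextCorpusReader' (and its counterparts for annotated formats). – alexis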

Answer


I believe you need to tokenize the raw text first (as per ch3). With your sample text, tokenizing and then processing gave me results.

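import nltk

# Read the raw text; the with-block closes the file afterwards.
with open('NLTK.txt', 'r') as file:
    read_file = file.read()

# Tokenize first so that Text receives a list of words, not a raw string.
text = nltk.Text(nltk.word_tokenize(read_file))

# concordance() prints the matches itself.
match = text.concordance('language')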
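
To see why the original script reported no matches: `nltk.Text` expects a sequence of tokens, and a Python string is itself a sequence of single characters, which is what the `<Text: T h i s i s ...>` output in the question shows. The sketch below uses only plain Python, with a made-up sample string and `str.split()` standing in as a crude tokenizer (an assumption for illustration; `nltk.word_tokenize` handles punctuation far better).

```python
# Sample text made up for illustration; not the contents of NLTK.txt.
raw = "This is a book about natural language processing"

# Iterating over a string yields single characters, so a raw string
# used where a token list is expected becomes one-character "tokens".
char_tokens = list(raw)

# Crude whitespace tokenizer, standing in for nltk.word_tokenize.
word_tokens = raw.split()

print(char_tokens[:4])              # ['T', 'h', 'i', 's']
print('language' in char_tokens)    # False
print('language' in word_tokens)    # True
```

No single-character token can ever equal the word language, so a search over the untokenized string has nothing to match.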

Alternatively, you can use an NLTK corpus reader to do the tokenization and processing:

import nltk
from nltk.corpus import PlaintextCorpusReader

# The reader takes a root directory and a file ID, and tokenizes the file.
corp = PlaintextCorpusReader(r'C:/', 'NLTK.txt')
text = nltk.Text(corp.words())

match = text.concordance('language')

Matching results:

Displaying 18 of 18 matches: 
            Language Processing . By `` natural languag 
            language '' we mean a language that is used 
            language that is used for everyday communic 
licit rules . We will take Natural Language Processing ・or NLP for short ・in a 
f computer manipulation of natural language . At one extreme , it could be as 
ted access to stored information , language processing has come to play a cent 
e textbook for a course on natural language processing or computational lingui 
is based on the Python programming language together with an open source libra 
source library called the Natural Language Toolkit (NLTK) . NLTK includes e 
s are deployed in a variety of new language technologies . For this reason it 
rite programs that analyze written language , regardless of previous programmi 
is book to get immersed in natural language processing . All relevant Python f 
ty for this application area . The language index will help you locate relevan 
mples and dig into the interesting language analysis material that starts in 1 
text using Python and the Natural Language Toolkit . To learn about advanced 
an help you manipulate and analyze language data , and how to write these prog 
s are used to describe and analyse language How data structures and algorithms 
and algorithms are used in NLP How language data is stored in standard formats
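
Note that `concordance()` prints its results and returns `None`, which is why the question's `print(match)` showed `None`. If the matches are needed as data rather than as printed lines, one option is to collect them by hand. The following is a minimal pure-Python sketch, not NLTK's implementation; `simple_concordance` and its `window` parameter are hypothetical names, and the sample tokens are made up.

```python
def simple_concordance(tokens, word, window=3):
    """Return (left, match, right) context tuples for each occurrence of word.

    Matching is case-insensitive; window is the number of tokens kept
    on each side of the match.
    """
    target = word.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Made-up sample tokens for illustration.
tokens = "Natural Language Processing deals with natural language data".split()
for left, match, right in simple_concordance(tokens, "language"):
    print(f"{left:>25} [{match}] {right}")
```

The case-insensitive comparison mirrors the output above, which lists both Language and language among the matches.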