2017-06-15

How do I fix the following error when using TextRank?

Hello, I would like to use a package called textrank, available at the following URL:

https://github.com/davidadamojr/TextRank 

After cloning the repository and installing all of its dependencies with pip3, I tried to use it as follows:

textrank extract_summary test 

However, I got the following error:

MacBook-Pro:TextRank-master $ textrank extract_summary test 
Traceback (most recent call last): 
    File "/usr/local/bin/textrank", line 11, in <module> 
    load_entry_point('textrank==0.1.0', 'console_scripts', 'textrank')() 
    File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__ 
    return self.main(*args, **kwargs) 
    File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main 
    rv = self.invoke(ctx) 
    File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke 
    return _process_result(sub_ctx.command.invoke(sub_ctx)) 
    File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke 
    return ctx.invoke(self.callback, **ctx.params) 
    File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke 
    return callback(*args, **kwargs) 
    File "/usr/local/lib/python3.6/site-packages/main.py", line 21, in extract_summary 
    summary = textrank.extract_sentences(f.read()) 
    File "/usr/local/lib/python3.6/site-packages/textrank/__init__.py", line 169, in extract_sentences 
    sent_detector = nltk.data.load('tokenizers/punkt/english.pickle') 
    File "/usr/local/lib/python3.6/site-packages/nltk/data.py", line 801, in load 
    opened_resource = _open(resource_url) 
    File "/usr/local/lib/python3.6/site-packages/nltk/data.py", line 919, in _open 
    return find(path_, path + ['']).open() 
    File "/usr/local/lib/python3.6/site-packages/nltk/data.py", line 641, in find 
    raise LookupError(resource_not_found) 
LookupError: 
********************************************************************** 
    Resource 'tokenizers/punkt/PY3/english.pickle' not found. 
    Please use the NLTK Downloader to obtain the resource: >>> 
    nltk.download() 
    Searched in: 
    - '/Users/ad/nltk_data' 
    - '/usr/share/nltk_data' 
    - '/usr/local/share/nltk_data' 
    - '/usr/lib/nltk_data' 
    - '/usr/local/lib/nltk_data' 
    - '' 
********************************************************************** 

It looks like a file belonging to the NLTK library is missing, so I tried:

MacBook-Pro:TextRank-master adolfocamachogonzalez$ python3 
Python 3.6.1 (default, Apr 4 2017, 09:40:21) 
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import nltk 
>>> nltk.download() 
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml 

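However, I could not obtain the resource. I tried copying and pasting the link into a browser, but all I got was an XML structure like the following: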

<?xml version="1.0"?> 
<?xml-stylesheet href="index.xsl" type="text/xsl"?> 
<nltk_data> 
    <packages> 
    <package checksum="d577c2cd0fdae148b36d046b14eb48e6" id="maxent_ne_chunker" languages="English" name="ACE Named Entity Chunker (Maximum entropy)" size="13404747" subdir="chunkers" unzip="1" unzipped_size="23604982" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/chunkers/maxent_ne_chunker.zip" /> 
    <package author="Australian Broadcasting Commission" checksum="ffb36b67ff24cbf7daaf171c897eb904" id="abc" name="Australian Broadcasting Commission 2006" size="1487851" subdir="corpora" unzip="1" unzipped_size="4054966" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/abc.zip" webpage="http://www.abc.net.au/" /> 
    <package checksum="ae529a1c5f13d6074f5b0d68d8edb537" contact="Gertjan van Noord" id="alpino" license="Distributed with permission of Gertjan van Noord" name="Alpino Dutch Treebank" size="2797255" subdir="corpora" unzip="1" unzipped_size="21604821" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/alpino.zip" webpage="http://www.let.rug.nl/~vannoord/trees/" /> 
    <package checksum="d3be36b53ab201372f1cd63ffc75e9a9" copyright="Public Domain (not copyrighted)" id="biocreative_ppi" license="Public Domain" name="BioCreAtIvE (Critical Assessment of Information Extraction Systems in Biology)" size="223566" subdir="corpora" unzip="1" unzipped_size="1537086" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/biocreative_ppi.zip" webpage="http://www.mitre.org/public/biocreative/" /> 
    <package author="W. N. Francis and H. Kucera" checksum="a0a8630959d3d937873b1265b0a05497" id="brown" license="May be used for non-commercial purposes." name="Brown Corpus" size="3314357" subdir="corpora" unzip="1" unzipped_size="10117565" url="https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/pa 

Answer

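The file english.pickle is part of the "punkt" tokenizer, which splits text into sentences. To download it, run the following (or find "punkt" under Models in the interactive downloader).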

import nltk 
nltk.download("punkt") 
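If the download should only happen when the resource is actually missing, a small guard can wrap the call. The following is a minimal sketch, not part of textrank or NLTK: the helper name `ensure_resource` and its `find`/`download` parameters are hypothetical, introduced here so the logic can be exercised without network access. By default it relies on `nltk.data.find`, which raises `LookupError` when a resource is absent (the same error shown in the traceback), and on `nltk.download`.

```python
def ensure_resource(resource_path, package_id, find=None, download=None):
    """Download an NLTK package only if its resource cannot be found.

    `find` and `download` are hypothetical injection points for testing;
    when omitted, the real nltk.data.find and nltk.download are used.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested without NLTK

        find = find or nltk.data.find
        download = download or nltk.download

    try:
        # Raises LookupError if the resource is not in any search directory.
        find(resource_path)
        return False
    except LookupError:
        download(package_id)
        return True
```

With NLTK installed, calling `ensure_resource("tokenizers/punkt", "punkt")` before running textrank should fetch the tokenizer on the first run and do nothing afterwards.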

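The downloader checks a list of standard paths for a location it can write to and saves the model files there. After that, the resource will be available to textrank's internals.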
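The "first writable location" idea can be illustrated with a short standard-library sketch. This is not NLTK's actual implementation; the function name `first_writable_dir`, the `fallback` parameter, and the candidate list are assumptions made for illustration. In NLTK itself, the directories that are searched are listed in `nltk.data.path`, which corresponds to the "Searched in" list in the error message.

```python
import os
import tempfile


def first_writable_dir(candidates, fallback=None):
    """Return the first existing, writable directory from `candidates`.

    If none qualifies, return `fallback` (a hypothetical stand-in for
    whatever default location a downloader might use).
    """
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return fallback


# Demonstration with a temporary directory: the first candidate does not
# exist, so the second one (which exists and is writable) is chosen.
with tempfile.TemporaryDirectory() as scratch:
    missing = os.path.join(scratch, "does_not_exist")
    chosen = first_writable_dir([missing, scratch])
```

To see where NLTK will look on a given machine, printing `nltk.data.path` in the Python shell shows the actual search list.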