Finding the path to a corpus in NLTK

I am writing a Python program using the Natural Language Toolkit (NLTK), and I am trying to load a corpus made of my own files. To do that, I am using code along these lines:

from nltk.corpus import PlaintextCorpusReader
corpus_root = "(insert filepath here)"  # placeholder for the corpus directory path
wordlists = PlaintextCorpusReader(corpus_root, '.*')

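Say my file is called reader.py and my corpus files live in a directory called "corpus" that sits in the same directory as reader.py. I would like a general way to find the file path above, so that my code can locate the "corpus" directory for anyone who uses it, wherever they have put it. I tried these posts, but they only give me absolute file paths: Find current directory and file's directory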

Any help would be greatly appreciated!

Source

2013-06-27 MEric

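As far as I understand:

Your reader.py file and the corpus directory are always in the same directory.
You are looking for a way to refer to corpus from reader.py no matter where in your directory structure you put them.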

In that case, the question that you referred to seems to be what you need. Another way of doing it is in this other answer. Using that second option, your code would be:

from nltk.corpus import PlaintextCorpusReader
import os.path

# Directory that contains this script (reader.py)
basepath = os.path.dirname(__file__)
# The "corpus" directory next to reader.py, as an absolute path
corpus_root = os.path.abspath(os.path.join(basepath, "corpus"))
wordlists = PlaintextCorpusReader(corpus_root, '.*')

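Keep in mind that although this produces an absolute path, nothing is hard-coded: the path is built at run time from the basepath = os.path.dirname(__file__) line above, which gives the directory reader.py currently lives in. Move reader.py and corpus together and the result follows them. For the official details, check out the documentation.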
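To make that point concrete, here is a minimal sketch of the same idea wrapped in a function. The helper name corpus_root_for and the example paths are made up for illustration and are not part of NLTK or of the original code; NLTK itself is left out so the snippet runs without a corpus on disk.

```python
import os


def corpus_root_for(script_path, dirname="corpus"):
    """Return the absolute path of `dirname` located next to `script_path`.

    Hypothetical helper for illustration only.
    """
    # Absolute directory of the script, independent of the current working directory
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, dirname)


# Inside reader.py you would call corpus_root_for(__file__).
# The two script locations below are invented examples.
for script in ("/home/alice/project/reader.py", "/tmp/copy/reader.py"):
    print(corpus_root_for(script))
```

The returned string could then be passed to PlaintextCorpusReader as corpus_root. Each call resolves to the corpus directory beside whichever reader.py it was given, which is the behaviour the question asks for.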

Source

2013-06-27 18:08:47 arturomp
