Python NLTK exercises: chapter 5

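Hi guys, I have started learning NLTK by following the official book from the NLTK team.

I'm on chapter 5, "Tagging", and I can't solve one of the exercises on page 186 of the PDF version:

Given the list of past participles specified by cfd2['VN'].keys(), try to collect a list of all the word-tag pairs that immediately precede items in that list.

I tried this: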

wsj = nltk.corpus.treebank.tagged_words(simplify_tags=True) 

[wsj[wsj.index((word,tag))-1:wsj.index((word,tag))+1] for (word,tag) in wsj if word in cfd2['VN'].keys()]

but it gives me this error:

Traceback (most recent call last): 
File "<stdin>", line 1, in <module> 
File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/util.py", line 401, in iterate_from 
for tok in piece.iterate_from(max(0, start_tok-offset)): 
File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/util.py", line 295, in iterate_from 
self._stream.seek(filepos) 
AttributeError: 'NoneType' object has no attribute 'seek'

I think I'm doing something wrong when accessing the wsj structure, but I can't figure out what!

Can you help me?

Thanks in advance!


2013-04-30 The Condor

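wsj is of type nltk.corpus.reader.util.ConcatenatedCorpusView, which behaves like a list (that is why you can use functions such as index()), but "behind the scenes" NLTK never reads the whole list into memory; it only reads the parts it needs from a file object. It looks as if, when you iterate over a CorpusView object and call index() at the same time (which has to iterate again), the file object ends up being None.

This version works, although it is less elegant than a list comprehension: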

for i in range(len(wsj)):
    if wsj[i][0] in cfd2['VN'].keys():
        print wsj[(i-1):(i+1)]
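
For reference, the same index-based scan can also be written as a comprehension. This is only a minimal sketch on a small hand-made list of (word, tag) pairs: `tagged` stands in for wsj and `wanted` for cfd2['VN'].keys(), both made up for illustration, and it has not been run against the corpus view itself.

```python
# Toy stand-ins (assumptions): `tagged` plays the role of wsj,
# `wanted` the role of cfd2['VN'].keys().
tagged = [
    ("the", "DET"), ("company", "N"), ("was", "V"), ("founded", "VN"),
    ("and", "CNJ"), ("has", "V"), ("been", "VN"), ("sold", "VN"),
]
wanted = {"founded", "been", "sold"}

# Access by position only; start at 1 so that i - 1 is never negative.
pairs = [
    tagged[i - 1:i + 1]
    for i in range(1, len(tagged))
    if tagged[i][0] in wanted
]

for pair in pairs:
    print(pair)
```

Each element of `pairs` is a two-item slice: the preceding token followed by the matching past participle.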


2013-04-30 22:14:54

Thanks, it works! – 2013-05-02 20:14:45

It looks like both indexing and slicing cause the exception:

wsj = nltk.corpus.treebank.tagged_words(simplify_tags=True)
cfd2 = nltk.ConditionalFreqDist((t,w) for w,t in wsj)
wanted = cfd2['VN'].keys()

# just getting the index -> exception before 60 items
for w, t in wsj:
    if w in wanted:
        print wsj.index((w,t))

# just slicing -> sometimes finishes, sometimes throws exception
for i, (w,t) in enumerate(wsj):
    if w in wanted:
        print wsj[i-1:i+1]

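My guess is that it is caused by accessing earlier items of the stream while you are still iterating over it.

If you iterate over wsj once to build a list of indices, and then use those indices in a second pass to grab the slices, it works fine: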

results = [
    wsj[j-1:j+1]
    for j in [
        i for i, (w,t) in enumerate(wsj)
        if w in wanted
    ]
]

Side note: calling index() without a start argument will return the first match every time.
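
To illustrate the side note with a plain Python list (a made-up token list, not the corpus view): index() without a start argument keeps returning the position of the first match, whereas passing the current position as start finds the occurrence you are actually looking at.

```python
# Made-up tokens for illustration; ("been", "VN") occurs three times.
tokens = [("been", "VN"), ("has", "V"), ("been", "VN"),
          ("sold", "VN"), ("been", "VN")]
target = ("been", "VN")

# Without a start argument, every lookup finds the first occurrence.
naive = [tokens.index(tok) for tok in tokens if tok == target]

# With the current position as start, the search begins where we are.
positions = [tokens.index(tok, i)
             for i, tok in enumerate(tokens) if tok == target]

print(naive)
print(positions)
```

In practice enumerate() already gives you the position, so the second index() call is redundant; it is shown here only to make the effect of the start argument visible.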


2013-04-30 16:58:01

wsj is of type ConcatenatedCorpusView, and I think it is choking on an empty tuple ('.', '.'). The simplest solution is to convert the ConcatenatedCorpusView explicitly to a list. You can do that with:

wsj = list(wsj)

Iterating then works fine. Getting the index of duplicate items is a separate problem. See: https://gist.github.com/denten/11388676
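
Once the tokens are in an ordinary list, one way to pair each token with its predecessor is zip(), which avoids index() altogether and so is not affected by duplicates. The following is only a sketch: `tokens` is a small made-up list standing in for list(wsj), and `wanted` is a hypothetical stand-in for cfd2['VN'].keys().

```python
# Toy stand-ins (assumptions): `tokens` for list(wsj),
# `wanted` for cfd2['VN'].keys().
tokens = [
    ("the", "DET"), ("company", "N"), ("was", "V"), ("founded", "VN"),
    ("and", "CNJ"), ("has", "V"), ("been", "VN"), ("sold", "VN"),
]
wanted = {"founded", "been", "sold"}

# zip(tokens, tokens[1:]) yields (previous, current) for each adjacent pair.
preceding = [
    prev
    for prev, cur in zip(tokens, tokens[1:])
    if cur[0] in wanted
]

print(preceding)
```

Here `preceding` holds only the word-tag pairs that come immediately before a past participle, which is what the exercise asks for.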


2014-04-29 04:51:00 denten
