I came across the following snippet of a script on a gensim tutorial page. What does this "word for word" syntax mean in Python?
What syntax is being used in the Python script below?
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]
This is a list comprehension. The code you posted loops through every element in document.lower().split() and creates a new list that contains only the elements satisfying the if condition. It does this for each document in documents.
Try it out...
elems = [1, 2, 3, 4]
squares = [e*e for e in elems] # square each element
big = [e for e in elems if e > 2] # keep elements bigger than 2
As you can see from your example, list comprehensions can also be nested.
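For instance, a nested comprehension might look like this (the grid values here are just made up for illustration):
grid = [[1, 2], [3, 4], [5, 6]]
doubled = [[n * 2 for n in row] for row in grid]  # [[2, 4], [6, 8], [10, 12]]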
This is a list comprehension. A simpler example might be:
evens = [num for num in range(100) if num % 2 == 0]
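A quick check in the interpreter shows what it produces:
>>> evens = [num for num in range(100) if num % 2 == 0]
>>> evens[:5]
[0, 2, 4, 6, 8]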
I'm pretty sure I've seen that exact line in some NLP applications.
This list comprehension:
[[word for word in document.lower().split() if word not in stoplist] for document in documents]
is the same as:
ending_list = []  # often known as a document stream in NLP
for document in documents:  # loop through the list of documents
    internal_list = []  # often known as a list of tokens
    for word in document.lower().split():
        if word not in stoplist:
            internal_list.append(word)  # this is where the [[word for word ...] ...] appears
    ending_list.append(internal_list)
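To convince yourself the two versions match, you could run them on some toy data (the documents and stoplist below are made up for illustration):
documents = ["The quick brown fox", "The lazy dog"]
stoplist = {"the", "a", "of"}
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]
print(texts)  # [['quick', 'brown', 'fox'], ['lazy', 'dog']]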
Basically you want a list of documents, each of which contains a list of tokens. So you loop through the documents,
for document in documents:
then you split each document into tokens,
list_of_tokens = []
for word in document.lower().split():
and then you build a list of those tokens:
list_of_tokens.append(word)
For example:
>>> doc = "This is a foo bar sentence ."
>>> [word for word in doc.lower().split()]
['this', 'is', 'a', 'foo', 'bar', 'sentence', '.']
It's the same as:
>>> doc = "This is a foo bar sentence ."
>>> list_of_tokens = []
>>> for word in doc.lower().split():
...     list_of_tokens.append(word)
...
>>> list_of_tokens
['this', 'is', 'a', 'foo', 'bar', 'sentence', '.']
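Adding the if filter then drops the stopwords, which gets you back to the inner comprehension from your question (the stoplist here is only an example):
>>> stoplist = {'is', 'a', '.'}
>>> [word for word in doc.lower().split() if word not in stoplist]
['this', 'foo', 'bar', 'sentence']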
Thanks, that helped a lot with the explanation... –
Glad the answer helped =) – alvas