2014-01-06 62 views

Answers

6

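This is a list comprehension. The code you posted loops over every element of document.lower().split() and builds a new list containing only the elements that satisfy the if condition. It does this for each document in documents.

Try it out: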

elems = [1, 2, 3, 4] 
squares = [e*e for e in elems] # square each element 
big = [e for e in elems if e > 2] # keep elements bigger than 2 

As your example shows, list comprehensions can be nested.
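To make the nesting concrete, here is a minimal sketch that reuses the two ideas above (squaring and filtering) on a list of lists. The name `rows` and its values are made up for illustration.

```python
# Hypothetical sample data: a list of lists of numbers.
rows = [[1, 2, 3], [4, 5, 6]]

# The outer comprehension walks over each row; the inner one
# keeps the elements bigger than 2 and squares them.
squared_big = [[e * e for e in row if e > 2] for row in rows]

print(squared_big)  # [[9], [16, 25, 36]]
```

The result has one inner list per row, which is the same shape your documents example produces: one list of words per document.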

5

This is a list comprehension. A simpler example might be:

evens = [num for num in range(100) if num % 2 == 0]

4

I'm pretty sure I have seen this line in some NLP applications.

This list comprehension:

[[word for word in document.lower().split() if word not in stoplist] for document in documents] 

is the same as:

ending_list = []  # often called the document stream in NLP
for document in documents:  # loop through the list of documents
    internal_list = []  # often called a list of tokens
    for word in document.lower().split():
        if word not in stoplist:
            internal_list.append(word)  # the inner [word for word in ...] part
    ending_list.append(internal_list)
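If you want to convince yourself the two forms are equivalent, here is a small runnable check. The contents of `documents` and `stoplist` are assumptions made up for this sketch, since the question does not show them.

```python
# Hypothetical inputs; the real documents and stoplist are not shown in the question.
documents = ["The quick brown fox", "A lazy dog and the fox"]
stoplist = {"the", "a", "and"}

# Nested list comprehension, as in the question.
from_comprehension = [
    [word for word in document.lower().split() if word not in stoplist]
    for document in documents
]

# Equivalent explicit loops.
from_loops = []
for document in documents:
    tokens = []
    for word in document.lower().split():
        if word not in stoplist:
            tokens.append(word)
    from_loops.append(tokens)

print(from_comprehension)  # [['quick', 'brown', 'fox'], ['lazy', 'dog', 'fox']]
print(from_comprehension == from_loops)  # True
```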

Basically, you want a list of documents, where each document is a list of tokens. So, looping through the documents,

for document in documents: 

you then split each document into tokens

    list_of_tokens = []
    for word in document.lower().split():

and then build a list of these tokens:

        list_of_tokens.append(word)

For example:

>>> doc = "This is a foo bar sentence ." 
>>> [word for word in doc.lower().split()] 
['this', 'is', 'a', 'foo', 'bar', 'sentence', '.'] 

It's the same as:

>>> doc = "This is a foo bar sentence ." 
>>> list_of_tokens = [] 
>>> for word in doc.lower().split(): 
...     list_of_tokens.append(word)
... 
>>> list_of_tokens 
['this', 'is', 'a', 'foo', 'bar', 'sentence', '.'] 
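The single-document example above leaves out the `if word not in stoplist` filter from your original line. A minimal sketch of that last step, assuming a made-up `stoplist` (yours will differ):

```python
doc = "This is a foo bar sentence ."
stoplist = {"is", "a", "."}  # hypothetical stop words, for illustration only

# Comprehension with the filter condition at the end.
filtered = [word for word in doc.lower().split() if word not in stoplist]

# The same thing written as a loop.
filtered_loop = []
for word in doc.lower().split():
    if word not in stoplist:
        filtered_loop.append(word)

print(filtered)  # ['this', 'foo', 'bar', 'sentence']
```

Using a set for the stop words keeps each `not in` check fast, but a list would give the same result.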
+0

Thanks, the explanation helped a lot... –

+1

Glad the answer helped =) – alvas