2017-10-15 170 views
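def unpack_dict(matrix, map_index_to_word):
    # table[i] is the word whose index in map_index_to_word is i
    table = sorted(map_index_to_word, key=map_index_to_word.get)
    # CSR internals: non-zero values, their column indices, and row pointers
    data = matrix.data
    indices = matrix.indices
    indptr = matrix.indptr
    num_doc = matrix.shape[0]
    # One {word: weight} dict per row (document) of the sparse matrix
    return [{k: v for k, v in zip([table[word_id] for word_id in
                                   indices[indptr[i]:indptr[i+1]]],
                                  data[indptr[i]:indptr[i+1]].tolist())}
            for i in range(num_doc)]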

wiki['tf_idf'] = unpack_dict(tf_idf, map_index_to_word) 

Can anyone explain this list comprehension?

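map_index_to_word is a dictionary mapping words to indices (word: index) for a few thousand words. tf_idf is a sparse matrix of TF-IDF vectors. The wiki DataFrame was shown in a screenshot.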
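The first line of unpack_dict inverts the word-to-index dictionary into a list. Below is a minimal sketch with a hypothetical four-word vocabulary (the words and indices are invented for illustration). It assumes the indices run from 0 to n-1 with no gaps, which is what makes table[i] the word for column i; an explicit inverse dict gives the same lookups.

```python
# Hypothetical toy vocabulary: word -> column index (mirrors map_index_to_word)
map_index_to_word = {'dog': 1, 'bird': 3, 'cat': 0, 'fish': 2}

# Iterating a dict yields its keys; sorting them by their index values
# puts the word for column 0 first, column 1 second, and so on.
table = sorted(map_index_to_word, key=map_index_to_word.get)
print(table)  # ['cat', 'dog', 'fish', 'bird']

# Equivalent explicit inversion: index -> word
index_to_word = {index: word for word, index in map_index_to_word.items()}
print([index_to_word[i] for i in range(len(table))] == table)  # True
```

If the indices had gaps, table[i] would no longer line up with column i, whereas the inverse dict would still work.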

Answers

[{k: v for k, v in zip([table[word_id] for word_id in indices[indptr[i]:indptr[i + 1]]],data[indptr[i]:indptr[i + 1]].tolist())} for i in range(num_doc)] 

is equivalent to:

final_list = []
for i in range(num_doc):
    # Map each column index stored for row i to its word
    new_list = []
    for word_id in indices[indptr[i]:indptr[i + 1]]:
        new_list.append(table[word_id])

    # Pair each word with the matching non-zero value of row i
    new_dict = {}
    for k, v in zip(new_list, data[indptr[i]:indptr[i + 1]].tolist()):
        new_dict[k] = v
    final_list.append(new_dict)
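To check the equivalence concretely, here is a sketch on hypothetical toy data. Plain Python lists stand in for the numpy arrays that scipy's csr_matrix exposes as .data, .indices and .indptr, so list(...) replaces .tolist(); the vocabulary and values are invented for illustration, and the third row is deliberately empty.

```python
# Hypothetical toy CSR components for a 3 x 4 matrix
table = ['cat', 'dog', 'fish', 'bird']
data = [0.5, 1.2, 0.7, 0.3]
indices = [0, 2, 1, 3]
indptr = [0, 2, 4, 4]  # row 2 has no non-zeros
num_doc = len(indptr) - 1

# One-line comprehension form
comprehension = [{k: v for k, v in zip([table[word_id] for word_id in indices[indptr[i]:indptr[i + 1]]],
                                       list(data[indptr[i]:indptr[i + 1]]))}
                 for i in range(num_doc)]

# Explicit-loop form
final_list = []
for i in range(num_doc):
    new_list = []
    for word_id in indices[indptr[i]:indptr[i + 1]]:
        new_list.append(table[word_id])
    new_dict = {}
    for k, v in zip(new_list, list(data[indptr[i]:indptr[i + 1]])):
        new_dict[k] = v
    final_list.append(new_dict)

print(comprehension == final_list)  # True
print(final_list)  # [{'cat': 0.5, 'fish': 1.2}, {'dog': 0.7, 'bird': 0.3}, {}]
```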

Take this expression:

[{k: v for k, v in zip([table[word_id] for word_id in
                        indices[indptr[i]:indptr[i+1]]],
                       data[indptr[i]:indptr[i+1]].tolist())}
 for i in range(num_doc)]

The outer comprehension is

[... for i in range(num_doc) ] 

which simply loops num_doc times, once per row (document) of the matrix.

Inside it is a dictionary comprehension:

{k:v for k,v in zip()} 
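The pattern {k: v for k, v in zip(keys, values)} pairs the first key with the first value, the second with the second, and so on; it builds the same dictionary as dict(zip(keys, values)). A minimal illustration with hypothetical keys and values:

```python
keys = ['cat', 'fish']
values = [0.5, 1.2]

# Dictionary comprehension over zip, as in the question
d1 = {k: v for k, v in zip(keys, values)}
# Equivalent built-in form
d2 = dict(zip(keys, values))

print(d1 == d2)  # True
print(d1)        # {'cat': 0.5, 'fish': 1.2}
```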

zip takes the keys k from:

[table[word_id] for word_id in indices[indptr[i]:indptr[i+1]] ] 

and the values v from:

data[indptr[i]:indptr[i+1]].tolist() 
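For a single row, the two sources can be evaluated separately. This sketch uses the same hypothetical toy CSR components as plain lists (standing in for the numpy arrays of a scipy csr_matrix, so list(...) replaces .tolist()) and picks row i = 1:

```python
# Hypothetical toy CSR components for a 3 x 4 matrix
table = ['cat', 'dog', 'fish', 'bird']
data = [0.5, 1.2, 0.7, 0.3]
indices = [0, 2, 1, 3]
indptr = [0, 2, 4, 4]

i = 1  # second document
keys = [table[word_id] for word_id in indices[indptr[i]:indptr[i + 1]]]
values = list(data[indptr[i]:indptr[i + 1]])

print(keys)                     # ['dog', 'bird']
print(values)                   # [0.7, 0.3]
print(dict(zip(keys, values)))  # {'dog': 0.7, 'bird': 0.3}
```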

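So i, the outer loop variable, determines the slice range indptr[i]:indptr[i+1]. In CSR format, the non-zero entries of row i are stored at positions indptr[i] through indptr[i+1] - 1 of both indices (their column numbers) and data (their values).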
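To see why indptr[i]:indptr[i+1] selects exactly row i, here is a sketch that builds the three CSR arrays by hand from a small hypothetical dense matrix. The helper to_csr is invented for illustration and is not part of scipy; it follows the same layout scipy's csr_matrix uses for .data, .indices and .indptr.

```python
# Hypothetical dense matrix: 3 documents x 4 words
dense = [
    [0.5, 0.0, 1.2, 0.0],
    [0.0, 0.7, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.0],
]

def to_csr(rows):
    """Build CSR arrays (data, indices, indptr) from a dense list of rows."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # the non-zero value
                indices.append(col)   # its column number
        indptr.append(len(data))      # where this row's segment ends
    return data, indices, indptr

data, indices, indptr = to_csr(dense)
print(indptr)  # [0, 2, 4, 4]

# Row i's non-zeros live in data/indices between indptr[i] and indptr[i + 1]
for i, row in enumerate(dense):
    start, end = indptr[i], indptr[i + 1]
    nonzeros = {col: v for col, v in enumerate(row) if v != 0}
    assert dict(zip(indices[start:end], data[start:end])) == nonzeros
```

An empty row simply gives indptr[i] == indptr[i+1], so its slice is empty and the resulting dictionary is {}.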

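So the result is a list of dictionaries, one per document. Each dictionary's keys are table[word_id], where word_id ranges over the slice of indices for that row, and its values are the corresponding slice of data.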
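The same logic can be written with one named slice per row, which some find easier to read. This is an alternative sketch, not the original author's code: it swaps the sorted table for an explicit index-to-word dict (so gaps in the indices would not matter), and it converts values with float(), which assumes the matrix holds floating-point TF-IDF weights. The SimpleNamespace object is a hypothetical stand-in exposing the same attributes as scipy's csr_matrix.

```python
from types import SimpleNamespace

def unpack_dict(matrix, map_index_to_word):
    """Return one {word: weight} dict per row of a CSR matrix."""
    # Invert word -> index into index -> word (assumes indices are unique)
    index_to_word = {index: word for word, index in map_index_to_word.items()}
    data, indices, indptr = matrix.data, matrix.indices, matrix.indptr
    rows = []
    for i in range(matrix.shape[0]):
        start, end = indptr[i], indptr[i + 1]  # row i's segment of data/indices
        rows.append({index_to_word[j]: float(v)
                     for j, v in zip(indices[start:end], data[start:end])})
    return rows

# Hypothetical stand-in with the same attribute names as scipy's csr_matrix
toy = SimpleNamespace(data=[0.5, 1.2, 0.7, 0.3], indices=[0, 2, 1, 3],
                      indptr=[0, 2, 4, 4], shape=(3, 4))
print(unpack_dict(toy, {'cat': 0, 'dog': 1, 'fish': 2, 'bird': 3}))
# [{'cat': 0.5, 'fish': 1.2}, {'dog': 0.7, 'bird': 0.3}, {}]
```

With a real csr_matrix, float() turns numpy scalars into plain Python floats, which is the same effect .tolist() has in the original.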
