Creating (text, word) tuples for a conditional frequency distribution

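I want to create a table that shows the frequency of certain words in 3 texts, with the texts as columns and the words as rows.

In the table, I want to see which word occurs in which text.

Here are my texts and words: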

texts = [text1, text2, text3] 
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night', 'happiness', 'heart', 'horse']

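To create the conditional frequency distribution, I want to build a list of tuples that looks like lot = [('text1', 'blood'), ('text1', 'young'), ... ('text2', 'blood'), ...]

I tried to create lot like this: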

lot = [(words, texte) 
    for word in words 
    for text in texts]

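But instead of lot = [('text1', 'blood'), ...], each tuple contains the entire text from the list where 'text1' should be.

How can I create the list of tuples for the conditional frequency distribution function?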

Source

2015-06-21 Fadinha

Not sure I fully understand what you want, but this might help: http://stackoverflow.com/questions/30970342/remove-punctuation-from-a-list/30970369#30970369

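I hope I have understood your question correctly. I think you are putting the whole lists `words` and `texts` into each tuple, rather than the loop variables `word` and `text`.

Try the following: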

texts = [text1, text2, text3]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night', 'happiness', 'heart', 'horse']
lot = [(word, text)
       for word in words
       for text in texts]

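Edit: Because the change is so subtle, I should explain it in more detail. In your original code you put `words` and `texts` themselves into each tuple, i.e. you were using the entire list rather than each element of the list.

Source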
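To make the difference visible, here is a minimal sketch with toy data. The two short strings and the two-word list are made-up stand-ins for the question's `text1`, `text2`, ... and `words`, and the buggy version is written with `texts` as an assumption about what the original tuple was meant to contain.

```python
# Toy stand-ins for the question's data (hypothetical values).
texts = ["blood and fear", "a young heart"]
words = ["blood", "heart"]

# Buggy shape: every tuple holds the two entire lists.
whole = [(words, texts) for word in words for text in texts]

# Fixed shape: every tuple holds one word and one text.
pairs = [(word, text) for word in words for text in texts]

print(whole[0])  # both complete lists
print(pairs[0])  # a single word paired with a single text
```

Both lists have the same length; only the contents of each tuple differ.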

2015-06-22 01:13:40 user3636636

I think this nested list comprehension might be what you are trying to do?

lot = [(word, 'text'+str(i)) 
    for i,text in enumerate(texts) 
    for word in text.split() 
    if word in words]
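As far as I know, `nltk.ConditionalFreqDist` treats the first element of each pair as the condition and the second as the sample, so for a table with one column per text you would probably want the label first. Below is a rough sketch of that ordering using only the standard library; the `defaultdict(Counter)` is just a stand-in for the NLTK class, the sample strings are invented, and it assumes each text is a plain string.

```python
from collections import Counter, defaultdict

# Invented sample data; the real texts would come from the question.
texts = [
    "blood and fear in the night",
    "a young heart without fear",
    "the horse and the man",
]
words = ["blood", "fear", "heart", "horse"]

# (condition, sample) pairs: the text label first, then the word.
pairs = [
    ("text" + str(i), token)
    for i, text in enumerate(texts, start=1)  # start=1 gives text1, text2, ...
    for token in text.split()
    if token in words
]

# Group the counts by label, similar in spirit to a conditional
# frequency distribution.
cfd = defaultdict(Counter)
for label, token in pairs:
    cfd[label][token] += 1

print(pairs)
print(cfd["text2"]["fear"])
```

The same `pairs` list could then be passed to the NLTK class instead of the hand-rolled grouping.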

But you might want to consider using Counter instead:

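from collections import Counter

# One inner dict per target word, so counts[word][label] can be assigned directly.
counts = {word: {} for word in words}
for i, text in enumerate(texts):
    C = Counter(text.split())  # token counts for this text
    for word in words:
        # A Counter returns 0 for missing keys, so no membership test is needed.
        counts[word]['text' + str(i)] = C[word]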
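Since the goal was a table with words as rows and texts as columns, here is one possible way to print the Counter results in that layout. This is only a sketch: the sample strings, the `labels` list and the `format_table` helper are my own illustrative names, not part of any library, and the texts are assumed to be plain strings.

```python
from collections import Counter

# Invented sample data standing in for the question's texts.
texts = [
    "blood and fear in the night",
    "a young heart without fear",
    "the horse and the man",
]
words = ["blood", "fear", "heart", "horse"]
labels = ["text" + str(i) for i in range(1, len(texts) + 1)]

# counts[word][label] -> number of occurrences of word in that text.
counts = {word: {} for word in words}
for label, text in zip(labels, texts):
    tally = Counter(text.split())
    for word in words:
        counts[word][label] = tally[word]  # 0 when the word is absent


def format_table(counts, labels):
    """Return a plain-text table: one row per word, one column per text."""
    header = "word".ljust(10) + "".join(label.rjust(8) for label in labels)
    rows = [
        word.ljust(10) + "".join(str(per_text[label]).rjust(8) for label in labels)
        for word, per_text in counts.items()
    ]
    return "\n".join([header] + rows)


print(format_table(counts, labels))
```

A zero in a cell means the word does not occur in that text, which answers the "which word occurs in which text" part directly.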

Source

2015-06-22 01:15:42 maxymoo
