Dictionary comprehension not behaving as expected

def tf(tokens): 
    """ Compute TF 
    Args: 
     tokens (list of str): input list of tokens from tokenize 
    Returns: 
     dictionary: a dictionary of tokens to its TF values 
    """ 
    li = {} 
    total = len(tokens) 
    li = {token: 1 if not token in li else li[token] + 1 for token in tokens } 
    return {token: li[token]/ float(total) for token in li}

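Basically, I want a dictionary in which the tokens are the keys and each value is the frequency of that token in the list of tokens.

I want my comprehension to check whether the token is already in li. If it is, just increment its value by 1; if not, create it and set its value to 1.

For some reason, every key ends up with a value of 1, no matter how many times it appears in the list of tokens.

Can you help me see why this is happening?

I could solve it with a loop, but I want to master dictionary comprehensions.

Thanks a lot!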

Source

2015-06-21 Paca

Use a Counter dict:

from collections import Counter 
li = Counter(tokens)

With a normal dict, you need to use a for loop and dict.setdefault:

li = {} 

for t in tokens: 
    li.setdefault(t,0) # if key not yet added create key/value pairing 
    li[t] += 1 # increment count for the key

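You cannot increment a count inside a dict comprehension; you will always end up with a count of 1. You need a Counter dict or an explicit loop to handle repeated keys, because li refers to the empty dict until the comprehension has completed.

if not token in li is always True, so the value is always set to 1.

In your function, use a Counter dict and iterate over its items: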

from collections import Counter

def tf(tokens):
    """ Compute TF
    Args:
     tokens (list of str): input list of tokens from tokenize
    Returns:
     dictionary: a dictionary of tokens to its TF values
    """
    total = float(len(tokens))
    li = Counter(tokens)  # token -> number of occurrences
    # items() works on both Python 2 and 3; iteritems() is Python 2 only
    return {token: v / total for token, v in li.items()}


2015-06-21 12:16:44

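The dictionary comprehension is executed first, producing a new dictionary object. Only when that expression has completed is li bound to the new dictionary.

In other words, this is what happens under the hood, except that _result is not available for the loop to reference: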

li = {} 
_result = {} 
for token in tokens: 
    _result[token] = 1 if not token in li else li[token] + 1 
li = _result

Since li is empty throughout the whole loop, token in li is always going to be False. The dictionary comprehension itself works just fine.
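
A quick way to convince yourself of this is to run the comprehension on a small list. The sketch below uses a made-up three-token list (the data is illustrative, not from the question) and shows that every count comes out as 1:

```python
# Illustrative input; any list with a repeated token shows the effect.
tokens = ["a", "b", "a"]

li = {}
# While the comprehension runs, the name li still refers to the empty dict
# created above, so "token not in li" is true for every token.
li = {token: 1 if token not in li else li[token] + 1 for token in tokens}

# "a" appears twice in tokens, yet its value is still 1.
print(li)
```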

If you want to count values, you can just make it an explicit loop:

li = {} 
for token in tokens: 
    li[token] = 1 if not token in li else li[token] + 1

But you would be better off using a collections.Counter() object, which encapsulates the same procedure and adds other functionality on top:

from collections import Counter 

def tf(tokens): 
    li = Counter(tokens) 
    total = float(len(tokens)) 
    return {token: li[token]/total for token in li}
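
As a rough usage sketch (the token list is invented for illustration), the function above can be exercised as follows. The second function, tf_comprehension, is a hypothetical name for a variant that does stay within a single dictionary comprehension by calling list.count for each distinct token. It rescans the list once per distinct token, so it is likely to be slower than Counter on large inputs.

```python
from collections import Counter

def tf(tokens):
    li = Counter(tokens)
    total = float(len(tokens))
    return {token: li[token] / total for token in li}

def tf_comprehension(tokens):
    # Hypothetical comprehension-only variant: count each distinct token
    # with list.count instead of reading a dict that is still being built.
    total = float(len(tokens))
    return {token: tokens.count(token) / total for token in set(tokens)}

sample = ["a", "b", "a", "c"]  # illustrative data
print(tf(sample))
print(tf_comprehension(sample) == tf(sample))
```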


2015-06-21 12:22:42

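'itertools.Counter() object'? I think you meant 'collections.Counter()' – Abhijit

@Abhijit: I did, of course. Ugh, I am not even sure why Chrome autocompleted that URL; I must have made that mistake before. Off to the site search engine!

@Abhijit: bingo, [Add item to dictionary with variable name key python](https://stackoverflow.com/a/12732835) has been corrected.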

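A comprehension expression, such as a list or dict comprehension, is a builder expression, and the object is not constructed until the expression has been fully evaluated. Only after that is the symbol name assigned a reference to the generated dictionary.

In your particular example, you are referring to the symbol li, which refers to an empty dictionary object. So during the evaluation of the expression, li continues to refer to an empty dictionary, which means the dictionary comprehension can equivalently be written as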

li = {token: 1 if not token in {} else {}[token] + 1 for token in tokens }

or, since a membership test on an empty dictionary is always false, simplified to

li = {token: 1 for token in tokens }
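
To check that simplification on a concrete input (the token list is made up for illustration), the three spellings can be compared directly. The else branch that indexes {} is never evaluated, because the membership test on an empty dict always fails:

```python
tokens = ["x", "y", "x", "x"]  # illustrative data

li = {}
# The comprehension as written in the question.
original = {token: 1 if not token in li else li[token] + 1 for token in tokens}
# The same comprehension with li replaced by the empty dict it refers to.
inlined = {token: 1 if not token in {} else {}[token] + 1 for token in tokens}
# The simplified form.
simplified = {token: 1 for token in tokens}

print(original == inlined == simplified)
```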

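What you need is an already available library utility or a state-based solution.

Fortunately, the standard library module collections provides a class called Counter, which was written and designed for exactly this purpose.

This simplifies your function to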

from collections import Counter

def tf(tokens):
    """ Compute TF
    Args:
     tokens (list of str): input list of tokens from tokenize
    Returns:
     dictionary: a dictionary of tokens to its TF values
    """
    # Note: this returns raw counts; divide each by len(tokens) to get TF
    return Counter(tokens)

The state-based solution just needs an external counter for each unique occurrence

from collections import defaultdict

def tf(tokens):
    """ Compute TF
    Args:
     tokens (list of str): input list of tokens from tokenize
    Returns:
     dictionary: a dictionary of tokens to its TF values
    """
    counter = defaultdict(int)  # missing keys start at 0
    for token in tokens:
        counter[token] += 1
    # Note: this returns raw counts; divide each by len(tokens) to get TF
    return counter

Or, if you do not intend to use defaultdict

def tf(tokens):
    """ Compute TF
    Args:
     tokens (list of str): input list of tokens from tokenize
    Returns:
     dictionary: a dictionary of tokens to its TF values
    """
    counter = {}
    for token in tokens:
        # get() returns 0 for a token that has not been seen yet
        counter[token] = counter.get(token, 0) + 1
    # Note: this returns raw counts; divide each by len(tokens) to get TF
    return counter
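
As a sanity check, the three counting approaches above can be compared on a small input and then turned into term frequencies. The helper names below (count_with_counter and so on) and the sample list are hypothetical, chosen only for this sketch:

```python
from collections import Counter, defaultdict

def count_with_counter(tokens):
    return Counter(tokens)

def count_with_defaultdict(tokens):
    counter = defaultdict(int)
    for token in tokens:
        counter[token] += 1
    return counter

def count_with_get(tokens):
    counter = {}
    for token in tokens:
        counter[token] = counter.get(token, 0) + 1
    return counter

tokens = ["a", "b", "a"]  # illustrative data
counts = count_with_get(tokens)

# Convert to plain dicts so the comparison does not depend on subclass details.
same = (dict(count_with_counter(tokens))
        == dict(count_with_defaultdict(tokens))
        == counts)
print(same)

# Dividing each count by the number of tokens gives the TF values.
total = float(len(tokens))
tf_values = {token: n / total for token, n in counts.items()}
print(tf_values)
```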


2015-06-21 12:23:25 Abhijit

def tf(tokens):
    mydic = {}
    for key in tokens:
        if key not in mydic:
            mydic[key] = 1
        else:
            mydic[key] = mydic[key] + 1
    d2 = dict((k, float(v)/len(tokens)) for k,v in mydic.items())
    return d2


2015-06-25 22:19:32
