Creating a dictionary of dictionaries from a CSV file

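Hi, I want to write a function, classify(csv_file), that creates a dictionary of default dictionaries from a CSV file. The first "column" (the first item in each row) is the key for each entry in the dictionary, and the second "column" (the second item in each row) contains the values.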

However, I want to transform the values by calling two functions (in this order):

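trigram_c(string): creates a default dictionary of the trigram counts within the string (which is the value).
normal(tri_counts): takes the output of trigram_c and normalises the counts (i.e. converts the count for each trigram into a number).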

So my final output will be a dictionary of dictionaries:

{value: {trigram1 : normalised_count, trigram2: normalised_count}, value2: {trigram1: normalised_count...}...} and so on

My current code looks like this:

def classify(csv_file): 
    l_rows = list(csv.reader(open(csv_file))) 
    classified = dict((l_rows[0], l_rows[1]) for rows in l_rows)

For example, if the CSV file was:

Snippet1, "It was a dark stormy day" 
Snippet2, "Hello world!" 
Snippet3, "How are you?"

The final output would be something like:

{Snippet1: {'It ': 0.5352, 't w': 0.43232}, Snippet2: {'Hel' : 0.438724,...}...} and so on.

(Of course, there would be more than just two trigram counts, and the numbers are just random for the purpose of the example.)

Any help would be much appreciated!

Source

2016-05-08 Indifferent Potato

First, please check the classify function, because I could not run it. Here is a corrected version:

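import csv

def classify(csv_file):
    # Read all rows, closing the file once reading is done
    with open(csv_file, newline='') as f:
        l_rows = list(csv.reader(f))
    # Map the first item of each row (key) to the second item (text)
    classified = dict((row[0], row[1]) for row in l_rows)
    return classified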

It returns a dictionary whose keys come from the first column and whose values are the strings in the second column.
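One caveat with the sample file from the question: every comma is followed by a space, so with its default settings csv.reader keeps that space and the quote characters as part of the value. A minimal sketch of one way around this, assuming the file looks like the sample; classify_stripped is a hypothetical name, while skipinitialspace is a standard csv.reader formatting parameter:

```python
import csv

def classify_stripped(csv_file):
    # Hypothetical variant of classify(). skipinitialspace=True drops the
    # space after each comma, so the quoted text is parsed as a quoted field.
    with open(csv_file, newline='') as f:
        reader = csv.reader(f, skipinitialspace=True)
        # Rows with fewer than two items (e.g. blank lines) are skipped
        return {row[0]: row[1] for row in reader if len(row) >= 2}
```

With this variant the value for Snippet2 should come back as Hello world! without the surrounding quotes.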
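You should then iterate over each dictionary entry and pass its value to the trigram_c function. I did not understand how you calculate the trigram counts, but if, for example, you just count the number of times each trigram appears in the string, you could use the function below. If you want a different kind of count, you only need to update the code inside the for loop.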

def trigram_c(string):
    trigram_dict = {}
    start = 0
    end = 3
    for i in range(len(string) - 2):
        # you could implement your logic in this loop
        trigram = string[start:end]
        if trigram in trigram_dict:
            trigram_dict[trigram] += 1
        else:
            trigram_dict[trigram] = 1
        start += 1
        end += 1
    return trigram_dict
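Putting the pieces together, the whole pipeline could be sketched roughly as follows. Several things here are assumptions rather than anything stated in the question: the normalisation divides each count by the total number of trigrams in the string (the question does not say how normal works), collections.Counter stands in for the manual counting loop, classify_trigrams is a hypothetical name, and the division assumes Python 3.

```python
import csv
from collections import Counter

def trigram_c(string):
    # Count every overlapping three-character substring
    return Counter(string[i:i + 3] for i in range(len(string) - 2))

def normal(tri_counts):
    # Assumed normalisation: relative frequency of each trigram
    total = sum(tri_counts.values())
    if total == 0:
        return {}
    return {trigram: count / total for trigram, count in tri_counts.items()}

def classify_trigrams(csv_file):
    # Hypothetical wrapper: key -> {trigram: normalised count}
    with open(csv_file, newline='') as f:
        # skipinitialspace handles the space after each comma in the sample
        rows = csv.reader(f, skipinitialspace=True)
        return {row[0]: normal(trigram_c(row[1])) for row in rows if len(row) >= 2}
```

If a different normalisation is wanted (for example dividing by the largest count), only the body of normal needs to change.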

Source

2016-05-08 17:31:44
