Python: dictionary of lists of lists

def makecounter(): 
    return collections.defaultdict(int) 

class RankedIndex(object): 
    def __init__(self): 
    self._inverted_index = collections.defaultdict(list) 
    self._documents = [] 
    self._inverted_index = collections.defaultdict(makecounter) 


def index_dir(self, base_path): 
    num_files_indexed = 0 
    allfiles = os.listdir(base_path) 
    self._documents = os.listdir(base_path) 
    num_files_indexed = len(allfiles) 
    docnumber = 0 
    self._inverted_index = collections.defaultdict(list) 

    docnumlist = [] 
    for file in allfiles: 
      self.documents = [base_path+file] #list of all text files 
      f = open(base_path+file, 'r') 
      lines = f.read() 

      tokens = self.tokenize(lines) 
      docnumber = docnumber + 1 
      for term in tokens: 
       if term not in sorted(self._inverted_index.keys()): 
        self._inverted_index[term] = [docnumber] 
        self._inverted_index[term][docnumber] +=1           
       else: 
        if docnumber not in self._inverted_index.get(term): 
         docnumlist = self._inverted_index.get(term) 
         docnumlist = docnumlist.append(docnumber) 
      f.close() 
    print '\n \n' 
    print 'Dictionary contents: \n' 
    for term in sorted(self._inverted_index): 
     print term, '->', self._inverted_index.get(term) 
    return num_files_indexed 
    return 0

When I run this code I get an index error: list index out of range.

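The code above builds a dictionary index that stores each 'term' as a key and, as its value, a list of the numbers of the documents in which that term occurs. For example, if the term "cat" appears in 1.txt, 5.txt and 7.txt, the dictionary will hold: cat <- [1,5,7]

Now I want to modify it to add term frequencies, so that if the word cat occurs twice in document 1, three times in document 5 and once in document 7, the expected result is: term <- [[docnumber, term freq], [docnumber, term freq]] <- a list of lists in a dictionary! cat <- [[1,2],[5,3],[7,1]]

I have played around with the code, but nothing works. I don't know how to modify this data structure to achieve the above.

Thanks in advance.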

Source

2010-10-05 csguy11

First, use a factory. Start with:

def makecounter(): 
    return collections.defaultdict(int)

and later use

self._inverted_index = collections.defaultdict(makecounter)

and, as the body of the for term in tokens: loop,

for term in tokens: 
    self._inverted_index[term][docnumber] += 1

This leaves a dict in each self._inverted_index[term], such as

{1:2,5:3,7:1}

in your example. Since you want a list of lists in each self._inverted_index[term] instead, then right after the loop has finished, add:

self._inverted_index = dict((t, [[d, v[d]] for d in sorted(v)]) 
          for t, v in self._inverted_index.items())

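Once built (this way or any other; I'm just showing one simple way to construct it!), this data structure will be as awkward to use as you have made it needlessly hard to construct (a dict of dicts is much more useful and easier both to use and to construct), but hey, one man's meat, &c ;-).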
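The steps above can be put together in a minimal, self-contained sketch. It works on in-memory token lists rather than files; the names build_index and docs are hypothetical and exist only for illustration. It is written to run on both Python 2.7 and Python 3.

```python
import collections

def makecounter():
    # Factory: each new term gets its own {docnumber: count} mapping.
    return collections.defaultdict(int)

def build_index(docs):
    """docs: a list of token lists; document numbers start at 1."""
    inverted_index = collections.defaultdict(makecounter)
    for docnumber, tokens in enumerate(docs, 1):
        for term in tokens:
            inverted_index[term][docnumber] += 1
    # Convert each inner dict into a list of [docnumber, frequency] pairs,
    # ordered by document number.
    return dict((t, [[d, v[d]] for d in sorted(v)])
                for t, v in inverted_index.items())

docs = [["cat", "cat", "dog"], ["dog"], ["cat"]]
index = build_index(docs)
# index["cat"] is [[1, 2], [3, 1]] and index["dog"] is [[1, 1], [2, 1]]
```

Keeping the dict of dicts until the very end means the counting loop needs no membership tests; the conversion to lists happens once, after all documents are processed.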

Source

2010-10-05 03:14:44

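I have made the changes you suggested. I realize your approach is simpler and clearer than implementing a list of lists. However, it currently gives me an error; I have edited the code above. – csguy11 2010-10-05 03:37:37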

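@csguy, in your 'indexdir' method (assuming it **is** one; your indentation as posted is all wrong), you completely obliterate whatever was previously assigned to 'self._inverted_index' by assigning to it the previous, wrong data structure, which makes your edits to the code completely irrelevant. When you do 'self.a = b', you do realize that it doesn't matter in the least what, if anything, was previously assigned to 'self.a', right?! – 2010-10-05 05:10:25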
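To make that point concrete, here is a small sketch (an illustration, not the asker's exact code) showing that rebinding the name to defaultdict(list) discards the counter factory, and that the list-based branch then raises the reported error. The variable names index and error are hypothetical.

```python
import collections

def makecounter():
    return collections.defaultdict(int)

index = collections.defaultdict(makecounter)
index = collections.defaultdict(list)  # rebinding: the first object is discarded

docnumber = 1
index["cat"] = [docnumber]             # a one-element list; only position 0 exists
try:
    index["cat"][docnumber] += 1       # position 1 does not exist
    error = None
except IndexError as exc:
    error = str(exc)                   # the message reported in the question
```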

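I see where the problem is, but since I don't fully understand your implementation, I have decided to stick with my approach, a dictionary of lists of lists, even though it is overly complicated. – csguy11 2010-10-05 06:42:29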

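Perhaps you could create a simple class for (docname, frequency).

Your dictionary could then hold lists of this new data type. You could also use a list of lists, but a separate data type would be cleaner.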
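One possible reading of this suggestion is sketched below. The class name Posting and the helper add_occurrence are hypothetical, and the sketch assumes documents are processed one at a time, so that only the most recent entry for a term can belong to the current document.

```python
class Posting(object):
    """Hypothetical record pairing a document name with a term frequency."""

    def __init__(self, docname, frequency=0):
        self.docname = docname
        self.frequency = frequency

    def __repr__(self):
        return "Posting(%r, %r)" % (self.docname, self.frequency)

def add_occurrence(index, term, docname):
    postings = index.setdefault(term, [])
    # Documents are handled sequentially, so only the last posting can match.
    if postings and postings[-1].docname == docname:
        postings[-1].frequency += 1
    else:
        postings.append(Posting(docname, 1))

index = {}
for docname, tokens in [("1.txt", ["cat", "cat"]), ("5.txt", ["cat", "dog"])]:
    for term in tokens:
        add_occurrence(index, term, docname)
# index["cat"] now holds one Posting for 1.txt and one for 5.txt
```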

Source

2010-10-05 03:06:17 JoshD

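Here is a general algorithm you can use, though you will need to adapt some of the code. It produces a dictionary that contains a word-count dictionary for each file.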

filedicts = {} 
for file in allfiles: 
    # one word-count dictionary per file
    filedict = filedicts[file] = {} 

    for term in terms:  # terms: the tokens read from this file
        filedict.setdefault(term, 0) 
        filedict[term] += 1
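A runnable variant of this loop is sketched below, under two assumptions that are not in the answer: the file contents are already in memory (the texts mapping is hypothetical) and tokens are split on whitespace. It swaps the setdefault bookkeeping for collections.Counter, available from Python 2.7.

```python
import collections

# Hypothetical stand-in for the contents of the files being indexed.
texts = {"1.txt": "cat dog cat", "5.txt": "dog"}

filedicts = {}
for name, text in texts.items():
    # Counter tallies how often each whitespace-separated token occurs.
    filedicts[name] = collections.Counter(text.split())
```

Note that this structure is keyed by file first and term second, the reverse of the inverted index asked for in the question.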

Source

2010-10-05 03:09:32 mikerobi
