Associating multiple values with one key in a Python dictionary

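I am working on a text mining project. I open all the files, extract the organization and abstract information, split each abstract into words, and then want to find out how many files each word appears in. My question is about that last step: how many files does a word appear in? To answer it, I am building a dictionary, wordFrequency, to do the counting. The logic I want is: if the word is not yet in the dictionary, store the word together with the file number it came from; if the word is already in the dictionary but the file number differs from every file number already stored for it, append the file number; if both the word and its file number are already in the dictionary, ignore it. Here is my code.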

import re

# `matches` is a list of file paths built earlier (not shown here).
capturedfiles = []
capturedabstracts = []
wordFrequency = {}
wordlist = open('test.txt', 'w')
worddict = open('test3.txt', 'w')
for filepath in matches[0:5]:
    with open(filepath, 'rt') as infile:
        mytext = infile.read()
    #print mytext

    # code to capture file organizations.
    grabFile = re.findall(r'File\s+\:\s+(\w\d{7})', mytext)
    if len(grabFile) == 0:
        matchFile = "N/A"
    else:
        matchFile = grabFile[0]
    capturedfiles.append(matchFile)

    # code to capture file abstracts
    grabAbs = re.findall(r'Abstract\s\:\s\d{7}\s(\w.+)', mytext)
    if len(grabAbs) == 0:
        # keep matchAbs a list in both branches so matchAbs[0] below is a whole string
        matchAbs = ["N/A"]
    else:
        matchAbs = grabAbs
    capturedabstracts.append(matchAbs)

    # arrange words in format.
    lineCount = 0
    wordCount = 0
    lines = matchAbs[0].split('. ')
    for line in lines:
        lineCount += 1
        for word in line.split(' '):
            wordCount += 1
            wordlist.write(matchFile + '|' + str(lineCount) + '|' + str(wordCount) + '|' + word + '\n')

            if word not in wordFrequency:
                wordFrequency[word] = [matchFile]
            else:
                if matchFile not in wordFrequency[word]:
                    wordFrequency[word].append(matchFile)
                # written inside the word loop, so one output line per occurrence
                worddict.write(word + '|' + str(matchFile) + '\n')


wordlist.close()
worddict.close()

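What I get now is every word printed together with its matching file number. If a word appears twice in the whole text, it is printed twice, on separate lines. Here is an example of what the output looks like: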

changes|a9500006
is|a9500006
is|a9500007

I would like it to look like this:

changes|a9500006
is|a9500006, a9500007

Source

2014-04-04 Q-ximi

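The behavior you want is exactly how 'dict' objects work; the problem here is in how you print the text. If you just print the dictionary, you should see keys paired with multiple values.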

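When I try 'print wordFrequency', it prints the result over and over. When I write it to another file, every single word is listed. If a word appears in several files, or several times in one file, each occurrence is listed separately.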

Put 'print wordFrequency' outside of any loop, at the bottom of your code.
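To illustrate the point made in these comments, here is a minimal sketch of the dictionary step on its own. The sample `abstracts` data and the `build_word_files` helper are assumptions made up for the illustration, not part of the original code, and the sketch uses Python 3 syntax. It shows that a dict of lists already pairs each word with several file numbers, provided it is printed once after the loops have finished.

```python
def build_word_files(abstracts):
    """Map each word to the list of file numbers whose abstract contains it."""
    word_files = {}
    for file_id, text in abstracts.items():
        for word in text.split():
            # setdefault returns the existing list, or stores and returns a new one
            files = word_files.setdefault(word, [])
            if file_id not in files:
                files.append(file_id)
    return word_files


# Hypothetical sample data standing in for the real files
abstracts = {
    "a9500006": "changes is is",
    "a9500007": "is",
}

word_files = build_word_files(abstracts)

# Printed once, outside every loop
print(word_files)
```

Repeated words inside one file do not add duplicate file numbers, because the membership check runs before each append.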

Instead of writing to worddict on every loop iteration, write it once, after the whole wordFrequency dictionary has been built. Like this:

#assuming wordFrequency is a correctly built dictionary
for key, value in wordFrequency.items():
    #key is a word, value is a list of file numbers
    worddict.write(key + '|')
    for fileNumber in value:
        #write each file number in value
        worddict.write(fileNumber)
        #if it's not the last file number, write a comma
        if fileNumber != value[-1]:
            worddict.write(', ')
    #no more file numbers, end line
    worddict.write('\n')
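The same write step can probably be expressed more compactly with `str.join`, which puts the separator only between items and so avoids the "is this the last one?" check. This is a sketch under assumptions: `format_word_files` is a hypothetical helper name, the dictionary below is sample data, and sorting the words is an added choice to make the output order predictable.

```python
def format_word_files(word_frequency):
    """Return one 'word|file1, file2' line per word, sorted by word."""
    lines = []
    for word in sorted(word_frequency):
        lines.append(word + '|' + ', '.join(word_frequency[word]))
    return lines


# Sample data in the shape the question's wordFrequency is meant to have
wordFrequency = {'is': ['a9500006', 'a9500007'], 'changes': ['a9500006']}

for line in format_word_files(wordFrequency):
    print(line)
```

To write to a file instead of printing, each returned line can be passed to `worddict.write(line + '\n')` after the main loop has finished.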

PS: never, never, never mix tabs and spaces! Especially in Python!

Source

2014-04-04 19:35:58 Dunno

This is not what I am looking for...

@Q-ximi Please explain why; your question is very confusing. Looking at the comments, it seems I am not the only one who thinks so. – Dunno
