Making a complete dictionary from a text file?


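What is the easiest way to create a dictionary from a .txt file? Each word in the text file is separated by a space. Every word in the file should be a key in the dictionary, and its value should be a list of all the words that follow it anywhere in the file, including repeats.

So if the text file is: I like cats and dogs. Dogs like cats. I like dogs more.

The dictionary would be:

d = {'I': ['like', 'like'], 'like': ['cats', 'cats', 'dogs'], 'cats': ['and', '.']...

...and so on, until every word has become a key.

Edit: Sorry for not showing the code I have so far; I'm an extreme beginner and barely know what I'm doing, and it looks terrible. Still, here is some of it: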

def textDictionary(fileName):
    p = open(fileName)
    f = p.read()
    w = f.split()
    newDictionary = {}
    for i in range(len(w)):
        newDictionary[w[i]] = w[i+1]
    return newDictionary

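Of course this isn't supposed to do everything I want yet, but shouldn't it at least return:

{'I': 'like', 'like': 'cats', 'cats': 'and'...}

...and so on?

But it gives me something completely different.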
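The loop above has two problems: `newDictionary[w[i]] = w[i+1]` replaces the stored value every time a word repeats instead of collecting the followers, and `w[i+1]` raises `IndexError` on the last word. A minimal sketch with both fixed, using a plain dict and `dict.setdefault` rather than the `defaultdict` the answers use; the names `follower_dict` and `text_dictionary` are hypothetical:

```python
def follower_dict(words):
    """Map each word to the list of words that follow it (repeats kept)."""
    result = {}
    # Stop one short of the end so words[i + 1] always exists.
    for i in range(len(words) - 1):
        # setdefault stores an empty list the first time a word is seen and
        # returns the stored list, so followers of repeated words accumulate.
        result.setdefault(words[i], []).append(words[i + 1])
    return result


def text_dictionary(file_name):
    # 'with' closes the file automatically, unlike a bare open().
    with open(file_name) as f:
        return follower_dict(f.read().split())
```

For a file containing `I like cats and dogs like cats`, `text_dictionary` would return `{'I': ['like'], 'like': ['cats', 'cats'], 'cats': ['and'], 'and': ['dogs'], 'dogs': ['like']}`.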


2014-02-24 user3317405

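While many of us would be happy to help answer your question, it is much easier for us to understand it and give a useful answer if you show us what you have already tried. Here is some information on how to provide [Minimal, Complete, Tested and Readable](http://stackoverflow.com/help/mcve) code. – mhlester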

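To me, this looks like a job for a defaultdict. First you need to decide how to split the words. For simplicity I'll just split on whitespace, but since you have punctuation, this may be a job for a regular expression: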

from collections import defaultdict

d = defaultdict(list)

with open('textfile') as fin:
    data = fin.read()
    words = data.split()

for i, w in enumerate(words):  # enumerate yields (index, word) pairs
    try:
        d[w].append(words[i + 1])
    except IndexError:
        pass  # the last word has no word following it...
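The answer mentions that punctuation may call for a regular expression. One possible sketch, assuming a word is a run of `\w` characters and every other non-space character is a separate token, splits the text with `re.findall` and then collects followers the same way; the function name `follower_dict_regex` is hypothetical:

```python
import re
from collections import defaultdict


def follower_dict_regex(text):
    # \w+ matches a word; [^\w\s] matches a single punctuation character,
    # so "dogs." is split into the two tokens "dogs" and ".".
    tokens = re.findall(r"\w+|[^\w\s]", text)
    d = defaultdict(list)
    # Skip the final token: nothing follows it.
    for i, w in enumerate(tokens[:-1]):
        d[w].append(tokens[i + 1])
    return dict(d)
```

With this tokenization `'.'` itself also becomes a key, whose values are the words that start the following sentences.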


2014-02-24 04:48:55 mgilson

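The best approach is to iterate over the words as two parallel sequences, offset by one. To do this, use zip on the original list and on list[1:].

Each pair this produces gives you a key and a value for your dictionary, or in this case a defaultdict. A defaultdict created with list automatically initializes each new key to an empty list.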
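A small illustration of the two building blocks, using the example sentence from this answer: `zip(words, words[1:])` pairs each word with the next one, and a `defaultdict(list)` hands back an empty list for a key it has not seen yet.

```python
from collections import defaultdict

words = "I like cats and dogs like cats".split()

# zip stops at the shorter input, so the last word (which has no
# follower) is never used as a key.
pairs = list(zip(words, words[1:]))
# pairs == [('I', 'like'), ('like', 'cats'), ('cats', 'and'),
#           ('and', 'dogs'), ('dogs', 'like'), ('like', 'cats')]

d = defaultdict(list)
d['cats'].append('and')  # no KeyError: the missing key starts as []
```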

For example, the text `I like cats and dogs like cats` becomes:

{'I': ['like'], 'and': ['dogs'], 'cats': ['and'], 'like': ['cats', 'cats'], 'dogs': ['like']}

So there is no need to set initial values yourself:

from collections import defaultdict

def textDictionary(fileName):
    with open(fileName) as p:  # 'with' opens the file and closes it automatically
        f = p.read()
        w = f.split()

    newDictionary = defaultdict(list)
    # a defaultdict initialized with list makes each new element a list automatically;
    # this is great for `append`ing

    for key, value in zip(w, w[1:]):
        newDictionary[key].append(value)  # easy append!

    return dict(newDictionary)  # dict() converts the defaultdict back to a normal dict

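I noticed that in this case `like` is followed by `cats` twice. If you only want each follower once, initialize the defaultdict with set instead of list, and use .add instead of .append.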
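A sketch of that set-based variant, assuming the words have already been split into a list (the name `text_dictionary_unique` is hypothetical):

```python
from collections import defaultdict


def text_dictionary_unique(words):
    # defaultdict(set): each new key starts as an empty set,
    # so a repeated follower is stored only once.
    d = defaultdict(set)
    for key, value in zip(words, words[1:]):
        d[key].add(value)
    return dict(d)
```

Sets are unordered, so the order in which the followers appeared in the text is not preserved.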

Documentation on zip
Documentation on defaultdict


2014-02-24 05:05:15 mhlester

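OK, this works, but I don't want separate keys for words that differ only in punctuation, such as 'cats' and 'cats.'. However, I do want the values to keep those differences (i.e. 'like': ['cats', 'cats.']). Also, can I make a key that represents the end/beginning of a sentence? For example, there would be a key '&' whose value would be every word that starts a sentence somewhere in the file. This '&' key would also show up in the values of other keys (i.e. 'cats': ['and', '&'] would be an entry). Is there any way I can implement these changes? – user3317405

Glad to hear it works! As for your new question, you should go ahead and ask it as a new question by clicking the [Ask Question](http://stackoverflow.com/questions/ask) button. Stack Overflow is geared toward individual questions with specific answers. – mhlester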
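For readers who want the behavior requested in the comment anyway, here is one possible sketch. Its assumptions are not from either answer: a token ending in `.`, `!` or `?` ends a sentence, keys have trailing `.!?,;:` stripped while values keep the raw token, and `'&'` serves both as the key for sentence-starting words and as the follower of sentence-ending words. `build_chain` is a hypothetical name.

```python
from collections import defaultdict

SENTENCE = '&'  # marker for a sentence boundary, as proposed in the comment


def build_chain(text):
    chain = defaultdict(list)
    prev = SENTENCE  # the first word of the text starts a sentence
    for token in text.split():
        chain[prev].append(token)     # values keep their punctuation
        key = token.rstrip('.!?,;:')  # keys ignore trailing punctuation
        if token.endswith(('.', '!', '?')):
            chain[key].append(SENTENCE)  # this word ends a sentence
            prev = SENTENCE              # the next word starts a new one
        else:
            prev = key
    return dict(chain)
```

For the question's example text this gives, among others, `'like': ['cats', 'cats.', 'dogs']`, `'cats': ['and', '&']` and `'&': ['I', 'Dogs', 'I']`.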

After reading the lines from the file, you can do this:

from collections import defaultdict

line = 'I like cats and dogs. Dogs like cats. I like dogs more.'
line = line.replace('.', ' .')  # so that 'dogs.' or 'cats.' do not become dictionary keys
op = defaultdict(list)
words = line.split()
for i, word in enumerate(words):
    if word != '.':  # so that '.' is not a key in the dictionary
        try:
            op[word].append(words[i + 1])
        except IndexError:
            pass

The only thing you need to handle explicitly is the full stop; the comments explain how the code does that. Result of the code above:

{'and': ['dogs'], 'like': ['cats', 'cats', 'dogs'], 'I': ['like', 'like'], 'dogs': ['.', 'more'], 'cats': ['and', '.'], 'Dogs': ['like'], 'more': ['.']}


2014-02-24 05:06:48 shaktimaan
