Creating a dictionary in a faster way - Python

I have the following file, which contains more than 500,000 lines. The lines look like this:

0-0 0-1 1-2 1-3 2-4 3-5 
0-1 0-2 1-3 2-4 3-5 4-6 5-7 6-7 
0-9 1-8 2-14 3-7 5-6 4-7 5-8 6-10 7-11

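In each pair, the first number is the index of a word on line n of text a, and the second number is the index of a word on the same line n, but in text b. Note that the same word in text a may be connected to more than one word in text b: on the line at index 0, for example, the word at position 0 in text a is connected to the words at positions 0 and 1 in text b. I want to extract the information from the lines above so that it is easy to look up which word in text a is connected to which word in text b. My idea was to use dictionaries, as in the following code: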

#suppose that I have opened the file as f
for line in f.readlines():
    #I create a dictionary to save my results
    dict_st=dict()
    #I split the line so to get items like '0-0', '0-1', etc.
    items=line.split()
    for item in align_spl:
        #I split each item at the hyphen so to get the two digits that are now string.
        als=item.split('-')
        #I fill the dictionary
        if dict_st.has_key(int(als[0]))==False:
            dict_st[int(als[0])]=[int(als[1])]
        else: dict_st[int(als[0])].append(int(als[1]))

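Once all the word-correspondence information for the whole text has been extracted, I print the words that are aligned with each other. This approach is very slow, especially since I have to repeat it for more than 500,000 sentences. I would like to know whether there is a faster way to extract this information. Thanks.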

Source

2013-06-13 user1718064

Don't use 'has_key'. 'if int(als[0]) not in dict_st:' works fine –
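A minimal sketch of what this comment suggests, applied to a single line of the question's data. `has_key` was removed in Python 3, so the `not in` test is also the portable form. The function name `parse_line` is made up for illustration, and the loop iterates over `items` on the assumption that this is what the undefined `align_spl` was meant to be.

```python
def parse_line(line):
    """Map each index in text a to the list of aligned indices in text b."""
    dict_st = {}
    items = line.split()  # e.g. ['0-0', '0-1', '1-2', ...]
    for item in items:    # assumption: the question's 'align_spl' meant 'items'
        als = item.split('-')
        key, value = int(als[0]), int(als[1])
        if key not in dict_st:   # replaces dict_st.has_key(key) == False
            dict_st[key] = [value]
        else:
            dict_st[key].append(value)
    return dict_st


print(parse_line("0-0 0-1 1-2 1-3 2-4 3-5"))
# {0: [0, 1], 1: [2, 3], 2: [4], 3: [5]}
```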

What is 'align_spl'? –

Hi, I'm not sure this is what you need.

If you need a dictionary for each line:

for line in f:
    dict_st=dict()
    for item in line.split():
        k, v = map(int, item.split('-'))
        dict_st.setdefault(k, set()).add(v)
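In the loop above, `dict_st` is rebuilt on every iteration, so each line's result has to be used before the next line is read. If the per-line dictionaries need to be passed on, one option is to yield them one at a time. This is only a sketch: the helper name `alignments_per_line` is hypothetical, and `sample` stands in for the open file (any iterable of lines works).

```python
def alignments_per_line(lines):
    """Yield one {index_a: set of index_b} dictionary per input line."""
    for line in lines:
        dict_st = {}
        for item in line.split():
            k, v = map(int, item.split('-'))
            dict_st.setdefault(k, set()).add(v)
        yield dict_st


# Two lines taken from the question's sample data.
sample = ["0-0 0-1 1-2 1-3 2-4 3-5", "0-1 0-2 1-3 2-4 3-5 4-6 5-7 6-7"]
for n, dict_st in enumerate(alignments_per_line(sample)):
    print(n, dict_st)
```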

If you need one dictionary for the whole file:

dict_st={}
for line in f:
    for item in line.split():
        k, v = map(int, item.split('-'))
        dict_st.setdefault(k, set()).add(v)

I have used a set instead of a list to prevent duplicate values. If you need those duplicates, use 'list':

dict_st={}
for line in f:
    for item in line.split():
        k, v = map(int, item.split('-'))
        dict_st.setdefault(k, []).append(v)

N.B. You can iterate over the file directly, without first reading it into memory with readlines().
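To make the note concrete: a file object is itself an iterator over its lines, so `for line in f` reads one line at a time, whereas `f.readlines()` builds a list of every line first. The sketch below uses a small temporary file in place of the real data. The file contents and the name `path` are placeholders, not anything from the question.

```python
import os
import tempfile

# Write a small stand-in for the real alignment file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("0-0 0-1 1-2\n0-1 0-2 1-3\n")
    path = tmp.name

dict_st = {}
with open(path) as f:
    for line in f:  # lazily reads one line at a time; no readlines() needed
        for item in line.split():
            k, v = map(int, item.split('-'))
            dict_st.setdefault(k, []).append(v)

os.remove(path)  # clean up the placeholder file
print(dict_st)
# {0: [0, 1, 1, 2], 1: [2, 3]}
```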

Source

2013-06-13 11:38:58 oleg

Using 'defaultdict(set)' would be neater. Also, 'for line in f:' does not need to read the whole file into memory at once –
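A sketch of the `defaultdict(set)` variant mentioned in this comment, for the whole-file case. The list `lines` stands in for the open file and holds two lines from the question's sample data.

```python
from collections import defaultdict

# Placeholder input; in practice this would be the open file object.
lines = ["0-0 0-1 1-2 1-3 2-4 3-5", "0-1 0-2 1-3 2-4 3-5 4-6 5-7 6-7"]

dict_st = defaultdict(set)  # missing keys start out as an empty set
for line in lines:
    for item in line.split():
        k, v = map(int, item.split('-'))
        dict_st[k].add(v)   # no setdefault or membership test needed

print(dict(dict_st))
```

One thing to keep in mind: merely looking up a missing key on a `defaultdict` inserts an empty set for it, so converting with `dict(dict_st)` once the data is loaded may be preferable if later lookups should raise `KeyError` instead.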

Sorry, you are right. I copied that line and didn't notice the 'readlines()' – oleg
