2013-01-24 73 views

How to make a dictionary from a text file in Python

My file looks like this:

aaien 12 13 39 
aan 10 
aanbad 12 13 14 57 58 38 
aanbaden 12 13 14 57 58 38 
aanbeden 12 13 14 57 58 38 
aanbid 12 13 14 57 58 39 
aanbidden 12 13 14 57 58 39 
aanbidt 12 13 14 57 58 39 
aanblik 27 28 
aanbreken 39 
... 

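I want a dictionary whose keys are the words (such as 'aaien') and whose values are the lists of numbers that follow each word. So it should look like this: {'aaien':['12,13,39'],'aan':['10']}

This code doesn't seem to work.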

document = open('LIWC_words.txt', 'r') 
liwcwords = document.read() 
dictliwc = {} 
for line in liwcwords: 
    k, v = line.strip().split(' ') 
    answer[k.strip()] = v.strip() 

liwcwords.close() 

The Python shell gives this error: ValueError: need more than 1 value to unpack

Thanks!

Answers


You are splitting each line into a list of words, but unpacking the result into only a key and a value.

This will work:

with open('LIWC_words.txt', 'r') as document:
    answer = {}
    for line in document:
        line = line.split()
        if not line:  # empty line?
            continue
        answer[line[0]] = line[1:]

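Note that you don't need to give .split() an argument; without arguments it splits on whitespace and strips the results as well. That saves you from having to call .strip() explicitly.

An alternative is to split only on the first whitespace: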
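A small sketch of the difference, using one sample line from the question (the variable names are only for illustration):

```python
# One line as it might come from the file, with trailing space and newline.
line = "aaien 12 13 39 \n"

# Splitting on a single space leaves the newline behind as its own element.
by_space = line.split(' ')

# Splitting with no argument splits on runs of whitespace and ignores
# leading/trailing whitespace, so no separate .strip() call is needed.
by_whitespace = line.split()

print(by_space)       # ['aaien', '12', '13', '39', '\n']
print(by_whitespace)  # ['aaien', '12', '13', '39']
```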

with open('LIWC_words.txt', 'r') as document:
    answer = {}
    for line in document:
        if line.strip():  # non-empty line?
            key, value = line.split(None, 1)  # None means 'all whitespace', the default
            answer[key] = value.split()

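The second argument to .split() limits the number of splits made, guaranteeing that at most 2 elements are returned, which makes it possible to unpack the result into key and value.

Either method results in: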
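As a rough illustration of that limit, on sample lines modelled on the question's data (the names below are hypothetical): note that "at most 2" also means a line containing only a key yields a single element, so this sketch checks the length before unpacking.

```python
line = "aanbad 12 13 14 57 58 38 \n"

# Split once, on the first run of whitespace; the remainder stays in one piece.
key, value = line.split(None, 1)
numbers = value.split()

print(repr(key))    # 'aanbad'
print(repr(value))  # '12 13 14 57 58 38 \n'
print(numbers)      # ['12', '13', '14', '57', '58', '38']

# A line with a key but no numbers gives only one element, so two-name
# unpacking would raise ValueError; inspect the length first.
parts = "aan\n".split(None, 1)
print(len(parts))   # 1
```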

{'aaien': ['12', '13', '39'], 
'aan': ['10'], 
'aanbad': ['12', '13', '14', '57', '58', '38'], 
'aanbaden': ['12', '13', '14', '57', '58', '38'], 
'aanbeden': ['12', '13', '14', '57', '58', '38'], 
'aanbid': ['12', '13', '14', '57', '58', '39'], 
'aanbidden': ['12', '13', '14', '57', '58', '39'], 
'aanbidt': ['12', '13', '14', '57', '58', '39'], 
'aanblik': ['27', '28'], 
'aanbreken': ['39']} 

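If you still see only one key with the rest of the file as its (split) value, your input file may be using non-standard line separators. In Python 2, open the file with universal line ending support by adding the U character to the mode (Python 3 text mode already handles this by default, and the U flag was removed in Python 3.11):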

with open('LIWC_words.txt', 'rU') as document: 

Now it gives this error: IndexError: list index out of range – user2007220

One of your lines is probably empty –

@user2007220: Added a test to skip empty lines. –

>liwcwords = document.read() 
>dictliwc = {}  
>for line in liwcwords: 

You are iterating over a string here, which is not what you want. Try document.readlines() instead. Here is another solution.

from pprint import pprint

with open('LIWC_words.txt') as fd:
    d = {}
    for line in fd:
        entry = line.split()
        if entry:  # skip empty lines
            d[entry[0]] = entry[1:]

pprint(d)

Here is what the output looks like:

{'aaien': ['12', '13', '39'], 
'aan': ['10'], 
'aanbad': ['12', '13', '14', '57', '58', '38'], 
'aanbaden': ['12', '13', '14', '57', '58', '38'], 
'aanbeden': ['12', '13', '14', '57', '58', '38'], 
'aanbid': ['12', '13', '14', '57', '58', '39'], 
'aanbidden': ['12', '13', '14', '57', '58', '39'], 
'aanbidt': ['12', '13', '14', '57', '58', '39'], 
'aanblik': ['27', '28'], 
'aanbreken': ['39']}
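
To see why the loop in the question fails, here is a minimal sketch that uses io.StringIO as a stand-in for the opened file (an assumption made only so the example runs without LIWC_words.txt):

```python
import io

# Stand-in for the open file; io.StringIO behaves like a text file object.
document = io.StringIO("aaien 12 13 39 \naan 10 \n")

# read() returns one big string; looping over a string yields characters.
text = document.read()
first_items = list(text)[:3]

# A single character splits into a one-element list, which cannot be
# unpacked into two names (hence the ValueError in the question).
pieces = first_items[0].strip().split(' ')

# Rewind, then loop over the file object itself: this yields whole lines.
document.seek(0)
lines = list(document)

print(first_items)  # ['a', 'a', 'i']
print(pieces)       # ['a']
print(lines)        # ['aaien 12 13 39 \n', 'aan 10 \n']
```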