2016-03-02 53 views
0

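How to insert dictionaries as values in a dictionary using a loop in Python

I am currently facing a problem turning my CSV data into a dictionary.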

The file has 3 columns that I want to use:

userID, placeID, rating 
U1000, 12222, 3 
U1000, 13333, 2 
U1001, 13333, 4 

The result I want looks like this:

{'U1000': {'12222': 3, '13333': 2}, 
'U1001': {'13333': 4}} 

In other words, I want my data structure to look like:

sample = {} 
sample["U1000"] = {} 
sample["U1001"] = {} 
sample["U1000"]["12222"] = 3 
sample["U1000"]["13333"] = 2 
sample["U1001"]["13333"] = 4 

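But I have a lot of data to process. I want to build the result with a loop, but I have been trying for 2 hours without success.

---The following code may confuse you---

My result currently looks like this: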

{'U1000': ['12222', 3], 
'U1001': ['13333', 4]} 
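  1. The value in the dictionary is a list rather than a dictionary
  2. User "U1000" appears multiple times in the data, but only once in my result

I think my code has a lot of mistakes. If you don't mind, please take a look: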

reader = np.array(pd.read_csv("rating_final.csv")) 
included_cols = [0, 1, 2] 

sample= {} 
target=[] 
target1 =[] 
for row in reader: 
     content = list(row[i] for i in included_cols) 
     target.append(content[0]) 
     target1.append(content[1:3]) 

sample = dict(zip(target, target1)) 

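How can I improve the code? I have looked through Stack Overflow, but I have not been able to work it out on my own. Could anyone please help me?

Thanks a lot!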
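
For reference, the two symptoms listed above can be reproduced without the CSV file. The sketch below uses made-up rows in place of rating_final.csv (an assumption, not the real data) and shows that slicing each row yields a list, and that dict(zip(...)) keeps only the last value for a repeated key:

```python
# Made-up rows standing in for the contents of rating_final.csv (assumption)
rows = [
    ["U1000", "12222", 3],
    ["U1000", "13333", 2],
    ["U1001", "13333", 4],
]

keys = [row[0] for row in rows]      # 'U1000' appears twice
values = [row[1:3] for row in rows]  # each value is a list, not a dict

# dict() overwrites earlier entries when a key repeats, so the last pair wins
collapsed = dict(zip(keys, values))
print(collapsed)
# -> {'U1000': ['13333', 2], 'U1001': ['13333', 4]}
```

So the nested structure has to be built up key by key inside the loop rather than with a single dict(zip(...)) call.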

+0

It looks like you want dictionaries as _values_, not _keys_. Maybe correct the title to match? – ShadowRanger

+0

Thanks for the reminder. I have corrected the title as well as the content! –

+0

Also, your example has {'U1000':{'12222':3},{'1333':2},'U1001':{'13333':4}}, which has the keys 'U1000' and 'U1001' but no key (or no value) associated with {'1333':2}. You could have {'U1000':{'12222':3,'1333':2},'U1001':{'13333':4}} or {'U1000':[{'12222':3},{'1333':2}],'U1001':[{'13333':4}]}, but not what you provided. – ShadowRanger

Answers

2

This should do what you want:

import collections 

reader = ... 
sample = collections.defaultdict(dict) 

for user_id, place_id, rating in reader: 
    rating = int(rating) 
    sample[user_id][place_id] = rating 

print(dict(sample))  # convert so it prints like a plain dict
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}

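defaultdict is a handy tool that supplies a default value whenever you try to access a key that is not in the dictionary. If you don't like that behaviour (for example, because you want sample['non-existent-user-id'] to fail with a KeyError), use: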

reader = ... 
sample = {} 

for user_id, place_id, rating in reader: 
    rating = int(rating) 
    if user_id not in sample: 
        sample[user_id] = {}
    sample[user_id][place_id] = rating 
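
As a rough end-to-end sketch of the plain-dict version: the snippet below assumes the file is comma-separated with a header row, and uses an in-memory string in place of rating_final.csv so that it runs on its own. The csv_text contents and the missing_raises flag are illustrative, not part of the answer above.

```python
import csv
import io

# Stand-in for rating_final.csv (assumption: comma-separated, with a header row)
csv_text = """userID,placeID,rating
U1000,12222,3
U1000,13333,2
U1001,13333,4
"""

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row

sample = {}
for user_id, place_id, rating in reader:
    if user_id not in sample:
        sample[user_id] = {}  # first time this user is seen
    sample[user_id][place_id] = int(rating)

print(sample)
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}

# With a plain dict, an unknown user raises KeyError instead of being created
try:
    sample["no-such-user"]
except KeyError:
    missing_raises = True
else:
    missing_raises = False
```

With a real file, io.StringIO(csv_text) would be replaced by an open file object.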
+0

Thanks for the clarification, this really helps! –

1

The expected output in your example is not possible, because {'1333': 2} would not be associated with a key. You can get {'U1000': {'12222': 3, '1333': 2}, 'U1001': {'13333': 4}} though, with a dict of dicts:

sample = {} 
for row in reader: 
    userID, placeID, rating = row[:3] 
    sample.setdefault(userID, {})[placeID] = rating # Possibly int(rating)? 
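
To make the setdefault call above a little more concrete, here is a small sketch with hard-coded values instead of the CSV rows (the names first, second and same_object are only for illustration). It shows that setdefault inserts the empty dict once and afterwards returns the dict that is already stored:

```python
sample = {}

# Key is missing: setdefault stores the {} and returns it
first = sample.setdefault("U1000", {})
first["12222"] = 3

# Key is present: the new {} is discarded and the stored dict is returned
second = sample.setdefault("U1000", {})
second["13333"] = 2

same_object = first is second
print(sample)
# -> {'U1000': {'12222': 3, '13333': 2}}
```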

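Alternatively, use collections.defaultdict(dict) to avoid the need for setdefault (or for alternative approaches involving try/except KeyError or if userID in sample:, which sacrifice the atomicity of setdefault in exchange for not creating empty dicts unnecessarily):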

import collections 

sample = collections.defaultdict(dict) 
for row in reader: 
    userID, placeID, rating = row[:3] 
    sample[userID][placeID] = rating 

# Optional conversion back to plain dict 
sample = dict(sample) 

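Converting back to a plain dict ensures that future lookups don't auto-vivify keys and instead raise KeyError as normal, and that it looks like a normal dict if you print it.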
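
The auto-vivification mentioned above can be seen in a short sketch (the key names are made up for illustration): a mere lookup on the defaultdict creates an entry, while the same lookup on the converted plain dict raises KeyError.

```python
from collections import defaultdict

sample = defaultdict(dict)
sample["U1000"]["12222"] = 3

# Only a lookup, but the defaultdict silently creates the missing key
_ = sample["typo-user"]
keys_after_lookup = sorted(sample)  # ['U1000', 'typo-user']

plain = dict(sample)    # shallow copy into a regular dict
del plain["typo-user"]  # drop the accidental entry (illustration only)

# The plain dict no longer invents keys on lookup
try:
    plain["another-typo"]
    raised = False
except KeyError:
    raised = True
```

Note that dict(sample) copies only the outer mapping; the inner dicts are shared with the original defaultdict.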

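If included_cols is important (because the names or column indices might change), you can use operator.itemgetter to speed up and simplify extracting all the desired columns at once: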

from collections import defaultdict 
from operator import itemgetter 

included_cols = (0, 1, 2) 
# If columns in data were actually: 
# rating, foo, bar, userID, placeID 
# we'd do this instead, itemgetter will handle all the rest: 
# included_cols = (3, 4, 0) 
get_cols = itemgetter(*included_cols) # Create function to get needed indices at once 

sample = defaultdict(dict) 
# map(get_cols, ...) efficiently converts each row to a tuple of just 
# the three desired values as it goes, which also lets us unpack directly 
# in the for loop, simplifying code even more by naming all variables directly 
for userID, placeID, rating in map(get_cols, reader): 
    sample[userID][placeID] = rating # Possibly int(rating)? 
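
For a version of the above that runs on its own, the sketch below swaps reader for a hard-coded list of rows in the reordered layout mentioned in the comments (rating, foo, bar, userID, placeID). The rows and the foo/bar filler values are made up for illustration.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical rows laid out as: rating, foo, bar, userID, placeID
rows = [
    (3, "x", "y", "U1000", "12222"),
    (2, "x", "y", "U1000", "13333"),
    (4, "x", "y", "U1001", "13333"),
]

included_cols = (3, 4, 0)  # userID, placeID, rating
get_cols = itemgetter(*included_cols)

sample = defaultdict(dict)
for user_id, place_id, rating in map(get_cols, rows):
    sample[user_id][place_id] = rating

result = dict(sample)
print(result)
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}
```

Only included_cols has to change if the column order changes; the loop body stays the same.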
+0

Thanks for your answer, it really helps! –