2017-05-03 45 views

I use this code to read data from a JSON file. How can I count the words in a list?

import json
from collections import Counter

json_file='report.json' 

json_data=open(json_file) 
data = json.load(json_data) 

t0 = [] 
t1 = [] 
tn = [] 

#counts = Counter(data['behavior']['processes'][3]['calls']) 
print (type(data['behavior']['processes'][3]['calls'])) 

for i in data['behavior']['processes'][3]['calls']: 

    t0 = i['arguments'] 
    print(t0) 

json_data.close() 

It prints data like this.

<class 'list'> 
aa 
bb 
aa 
cc 
bb 
cc 
aa 

I want to count the frequency of each word. The result should be aa = 3, bb = 2, cc = 2.

If I uncomment the Counter(data['behavior']['processes'][3]['calls']) line, it raises an error.

TypeError: unhashable type: 'dict' 

How can I count the words in a list?

Can you show us your sample data?

Answers


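Counter needs an iterable of hashable items as input. The calls list contains dicts, which is why you get the TypeError, so build a list of the 'arguments' values first.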

from collections import Counter 

#create a list from your data 
mylist = [i['arguments'] for i in data['behavior']['processes'][3]['calls']] 

#make a dict of counts 
counter_dict = Counter(mylist) 

#print out counts per item 
for val in counter_dict: 
    print('%s has %i occurrences' % (val, counter_dict[val]))

(Untested code)
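To make the difference concrete, here is a small self-contained sketch. The `calls` list is made-up sample data shaped like the output in the question (each call is a dict whose `arguments` value is a string); the real report.json may look different.

```python
from collections import Counter

# Hypothetical stand-in for data['behavior']['processes'][3]['calls'].
calls = [{'arguments': a} for a in ['aa', 'bb', 'aa', 'cc', 'bb', 'cc', 'aa']]

# Passing the list of dicts directly fails, because Counter uses each
# element as a dictionary key and dicts are not hashable.
try:
    Counter(calls)
except TypeError as exc:
    error_message = str(exc)

# Counting the extracted strings works.
counts = Counter(call['arguments'] for call in calls)
for word, n in counts.items():
    print('%s = %d' % (word, n))
```

Under that assumption the counts come out as aa = 3, bb = 2, cc = 2, which is the result asked for.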


I haven't tested this because I don't have the data you are using, but I think it will work.

import json

json_file='report.json' 

json_data=open(json_file) 
data = json.load(json_data) 

t0 = [] 
t1 = [] 
tn = [] 

#counts = Counter(data['behavior']['processes'][3]['calls']) 
print (type(data['behavior']['processes'][3]['calls'])) 

data_count = {} 
for i in data['behavior']['processes'][3]['calls']: 

    t0 = i['arguments'] 
    count = data_count.get(t0) 
    if count is None: 
        data_count[t0] = 1 
    else: 
        data_count[t0] = count + 1 

    print(t0) 

json_data.close() 
print(data_count) 
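
The same tally can be wrapped in a function that opens the file with a `with` block, so the file is closed even if parsing fails. This is only a sketch: the function name `count_arguments` and its `process_index` parameter are invented here, and it assumes every call's `arguments` value is hashable (for example a string).

```python
import json


def count_arguments(path, process_index=3):
    """Tally the 'arguments' value of every call of one process.

    Hypothetical helper; assumes the report layout used in the question.
    """
    # The with block closes the file automatically, even on errors.
    with open(path) as json_data:
        data = json.load(json_data)

    data_count = {}
    for call in data['behavior']['processes'][process_index]['calls']:
        key = call['arguments']
        data_count[key] = data_count.get(key, 0) + 1
    return data_count
```

Calling `count_arguments('report.json')` would then return the dict of counts instead of printing it, which makes the result easier to reuse.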

You can do:

Counter(map(lambda x: x['arguments'], data['behavior']['processes'][3]['calls'])) 
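
A variation on this one-liner, sketched on made-up sample data, uses `operator.itemgetter` in place of the lambda and `Counter.most_common()` to list the values by frequency. The `calls` list below is an assumption standing in for the list read from the report.

```python
from collections import Counter
from operator import itemgetter

# Made-up sample data; the real list comes from the JSON report.
calls = [{'arguments': a} for a in ['aa', 'bb', 'aa', 'cc', 'bb', 'cc', 'aa']]

# itemgetter('arguments') does the same job as lambda x: x['arguments'].
counts = Counter(map(itemgetter('arguments'), calls))

# most_common() returns (value, count) pairs, highest count first.
for word, n in counts.most_common():
    print(word, n)
```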

Another option is a plain dict with get; the changed lines are marked with <==:

import json

counterDict = {} # <== 
json_file='report.json' 
json_data=open(json_file) 
data = json.load(json_data) 

t0 = [] 
t1 = [] 
tn = [] 

#counts = Counter(data['behavior']['processes'][3]['calls']) 
print (type(data['behavior']['processes'][3]['calls'])) 

for i in data['behavior']['processes'][3]['calls']: 

    t0 = i['arguments'] 
    counterDict[t0] = counterDict.get(t0,0)+1 # <=== 

json_data.close() 

print(counterDict)
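
One caveat: all of the approaches above assume each `arguments` value is hashable, as the strings in the question's output are. If a report held a dict or list there instead, the same TypeError would come back. A possible workaround, shown here on invented data with a hypothetical helper `as_key`, is to serialise each value to a canonical JSON string before counting.

```python
import json
from collections import Counter

# Invented example in which 'arguments' is a dict rather than a string.
calls = [
    {'arguments': {'path': 'a.txt', 'mode': 'r'}},
    {'arguments': {'mode': 'r', 'path': 'a.txt'}},
    {'arguments': {'path': 'b.txt', 'mode': 'w'}},
]


def as_key(value):
    """Return a hashable string for a JSON value, ignoring dict key order."""
    return json.dumps(value, sort_keys=True)


counts = Counter(as_key(call['arguments']) for call in calls)
```

With `sort_keys=True` the first two calls map to the same string, so they are counted together even though their keys were written in a different order.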