Create a dictionary and see whether a key always has the same value

If I have a file of lines that each start with a number followed by some text, how can I see how often a number is followed by different texts? For example:

0 Brucella abortus Brucellaceae 
0 Brucella ceti Brucellaceae 
0 Brucella canis Brucellaceae 
0 Brucella ceti Brucellaceae

So here I would want to know that 0 is followed by 3 different "types" of text.

Ideally, I could read the file into a Python script whose output looks like this:

1:250 
2:98 
3:78 
4:65 
etc.

The first number (before the colon) is the number of different "texts", and the second number is how many numbers this happens for.

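I have the script below, which counts how many times each "text" is found across the different numbers. I want to reverse it, so that I know how many different texts each number has, and how many different texts exist overall. The script turns the file of numbers and "texts" into a dictionary, but I'm not sure how to manipulate this dictionary to get what I want.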

#!/usr/bin/env python
# Dictionary broken down by species, genus, family

fileIn = 'usearchclusternumgenus.txt'

d = {}
with open(fileIn, "r") as f:
    for line in f:
        clu, gen, spec, fam = line.split()
        d.setdefault(clu, []).append(spec)


# Iterate through and find out how many clusters each species occurs in
vals = {}    # A dictionary to store how often each value occurs.
for i in d.values():
    for j in set(i):    # Convert to a set to remove duplicates
        # If we've seen this value, increment its count;
        # otherwise start from the default of 0
        vals[j] = 1 + vals.get(j, 0)
#print(vals)

# Iterate through each possible frequency and find how many values have that count.
counts = {}    # A dictionary to store the final frequencies.
# We will iterate from 0 (which is a valid count) to the maximum count
for i in range(0, max(vals.values()) + 1):
    # Find all values that have the current frequency, count them
    # and add them to the frequency dictionary
    counts[i] = len([x for x in vals.values() if x == i])

for key in sorted(counts.keys()):
    if counts[key] > 0:
        print("{}:{}".format(key, counts[key]))
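A minimal sketch of the reversal, assuming the dictionary d has the shape this script builds (cluster number mapped to a list of species): instead of counting how many clusters each species appears in, take the size of the set of species stored under each cluster key. The sample dictionary is hypothetical (the question's example lines plus a made-up cluster '1'), and per_cluster and all_species are illustrative names.

```python
# Hypothetical sample in the shape the script builds: cluster -> list of species
d = {
    '0': ['abortus', 'ceti', 'canis', 'ceti'],
    '1': ['coli', 'coli'],  # made-up cluster for illustration
}

# Number of distinct species recorded under each cluster
per_cluster = {clu: len(set(specs)) for clu, specs in d.items()}

# Distinct species across the whole file
all_species = set()
for specs in d.values():
    all_species.update(specs)

for clu in sorted(per_cluster):
    print('{}:{}'.format(clu, per_cluster[clu]))
print('distinct species overall: {}'.format(len(all_species)))
```

With this sample the output is 0:3, 1:1 and 4 distinct species overall.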


2014-03-01 Jen

I may be misunderstanding what you are counting here. My answer simplifies your code; it counts how many unique 'spec' values there are per 'clu'. –

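That's a good way of putting what I'm actually looking for. I'm looking for how many times (or how many 'clu') have unique 'spec' values. Eventually I want to see which ones, but first I want to know how many times it happens. – Jen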
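For the eventual goal of seeing which clusters have a single species, a minimal sketch assuming a mapping from cluster number to the set of distinct species seen with it. The mapping unique and the name single_species are hypothetical and filled with made-up data so the snippet runs on its own.

```python
# Hypothetical mapping: cluster -> set of distinct species seen in that cluster
unique = {
    '0': {'abortus', 'ceti', 'canis'},
    '1': {'coli'},
    '2': {'melitensis'},
}

# Clusters whose lines all name the same species
single_species = sorted(clu for clu, specs in unique.items() if len(specs) == 1)
print(single_species)
```

Here this prints ['1', '2'].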

Your terms *text* and *number* are very confusing, though. Could you give me a small sample of input lines and the expected output lines? –

Use a collections.defaultdict() object with set as the factory to track the distinct values per number, then print out the sizes of the collected sets:

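from collections import defaultdict

fileIn = 'usearchclusternumgenus.txt'

# Map each cluster number to the set of distinct species seen with it
unique_clu = defaultdict(set)

with open(fileIn) as infh:
    for line in infh:
        # Split off the first three fields; the rest of the line is ignored
        clu, gen, spec, rest = line.split(None, 3)
        unique_clu[clu].add(spec)

for key in sorted(unique_clu):
    count = len(unique_clu[key])
    if count:
        print('{}:{}'.format(key, count))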
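Note that sorted(unique_clu) sorts the cluster numbers as strings, so '10' comes before '2'. If the cluster numbers are always integers (an assumption about the input file), passing key=int gives numeric order. The sample lines below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample lines in the question's format
lines = [
    '10 Brucella abortus Brucellaceae',
    '2 Brucella ceti Brucellaceae',
    '2 Brucella canis Brucellaceae',
]

unique_clu = defaultdict(set)
for line in lines:
    clu, gen, spec, rest = line.split(None, 3)
    unique_clu[clu].add(spec)

print(sorted(unique_clu))           # string order: ['10', '2']
print(sorted(unique_clu, key=int))  # numeric order, assumes integer IDs: ['2', '10']
```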


2014-03-01 20:23:17

Thanks! I ran this and got an error on this line: 'unique_clu[clu].add(spec)' (it says unique_clu is undefined). Should 'unique_clu[clu]' be 'unique_spec'? – Jen

@Jen: Sorry, yes, that was an editing error. Corrected. –

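This is great! Is there a quick way to see how many times each of these values occurs for 'clu'? That is, how many times is there 'clu:1'? Thanks a lot! – Jen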
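One way to get that distribution, sketched under the assumption that unique_clu has already been filled as in the answer's code, is to count the set sizes with collections.Counter. This produces the "number of distinct texts : how many numbers" format from the question. The sample data is made up so the snippet runs on its own, and histogram is an illustrative name.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the unique_clu built from the file
unique_clu = defaultdict(set)
unique_clu['0'].update(['abortus', 'ceti', 'canis'])
unique_clu['1'].update(['coli'])
unique_clu['2'].update(['melitensis'])

# Map "number of distinct species" -> "number of clusters with that many"
histogram = Counter(len(specs) for specs in unique_clu.values())

for n_species in sorted(histogram):
    print('{}:{}'.format(n_species, histogram[n_species]))
```

With this sample the output is 1:2 (two clusters with a single species) and 3:1 (one cluster with three).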
