If you want a more statistical approach to finding outliers, you could do something like this:
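import numpy as np
data = {'18': [3.89, 1.28], '20': [1.39, 3.15], '15': [1.42, 3.10]}
# Flatten every list into one sequence so the statistics cover all values
values = [x for sublist in data.values() for x in sublist]
avg = np.mean(values)
stddev = np.std(values)  # population standard deviation (NumPy's default, ddof=0)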
To keep only the values within one standard deviation of the mean:
n_stddevs = 1
{k: [x for x in v if x >= avg-stddev*n_stddevs and x <= avg+stddev*n_stddevs] for k, v in data.items()}
# {'15': [1.42, 3.1], '18': [], '20': [1.39, 3.15]}
For 2:
n_stddevs = 2
{k: [x for x in v if x >= avg-stddev*n_stddevs and x <= avg+stddev*n_stddevs] for k, v in data.items()}
# {'15': [1.42, 3.1], '18': [3.89, 1.28], '20': [1.39, 3.15]}
And for 0.5:
n_stddevs = 0.5
{k: [x for x in v if x >= avg-stddev*n_stddevs and x <= avg+stddev*n_stddevs] for k, v in data.items()}
# {'15': [], '18': [], '20': []}
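Since the same comprehension is repeated for every threshold, it may be easier to wrap it in a function. The following is only a sketch: the name `split_by_stddev` and the choice to return the outliers alongside the kept values are my own additions, not something the question asked for.

```python
import numpy as np

def split_by_stddev(data, n_stddevs=1.0):
    """Split each list into values inside and outside avg +/- n_stddevs * stddev.

    Hypothetical helper: the mean and (population) standard deviation are
    computed over all values in all lists combined.
    """
    values = [x for sublist in data.values() for x in sublist]
    avg = np.mean(values)
    stddev = np.std(values)
    low, high = avg - n_stddevs * stddev, avg + n_stddevs * stddev
    inliers = {k: [x for x in v if low <= x <= high] for k, v in data.items()}
    outliers = {k: [x for x in v if not low <= x <= high] for k, v in data.items()}
    return inliers, outliers

data = {'18': [3.89, 1.28], '20': [1.39, 3.15], '15': [1.42, 3.10]}
inliers, outliers = split_by_stddev(data, n_stddevs=1)
print(inliers)   # {'18': [], '20': [1.39, 3.15], '15': [1.42, 3.1]}
print(outliers)  # {'18': [3.89, 1.28], '20': [], '15': []}
```

Keeping the dictionary keys in both results means you can still tell which entry each outlier came from.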
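Are all the lists *guaranteed* to have the same length? –
Yes, they are always 2-element lists. –
So to be clear, you want to find the values that are further from the mean of all values than a specified threshold? If so, do you only need to store the values, or the dictionary keys as well? – timgeb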