2016-07-21 46 views
1

Translate integers in a numpy array to a contiguous range 0...n

I want to translate arbitrary integers in a numpy array to a contiguous range 0...n, like this:

source: [2 3 4 5 4 3] 
translating [2 3 4 5] -> [0 1 2 3] 
target: [0 1 2 3 2 1] 

There has to be a better way than the following:

import numpy as np 

def translate_ids(source, source_ids, target_ids): 
    """Translate arbitrary integers in the source array to the contiguous range 0...n."""
    target = source.copy() 

    for i in range(len(source_ids)): 
        x = source_ids[i] 
        x_i = source == x 
        target[x_i] = target_ids[i] 

    return target 

# 

source = np.array([ 2, 3, 4, 5, 4, 3 ]) 
source_ids = np.unique(source) 
target_ids = np.arange(len(source_ids)) 

target = translate_ids(source, source_ids, target_ids) 

print("source:", source) 
print("translating", source_ids, '->', target_ids) 
print("target:", target) 

What would that be?

Answers

4

IIUC, you can simply use np.unique's optional argument return_inverse, like so -

np.unique(source,return_inverse=True)[1] 

Sample run -

In [44]: source 
Out[44]: array([2, 3, 4, 5, 4, 3]) 

In [45]: np.unique(source,return_inverse=True)[1] 
Out[45]: array([0, 1, 2, 3, 2, 1]) 
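
To illustrate why this works: with return_inverse=True, np.unique returns both the sorted unique values and, for each element of the input, its index into those unique values. That index array is exactly the contiguous translation, and indexing the unique values with it recovers the original array. The sketch below wraps this in a helper; the name to_contiguous is hypothetical and only used for illustration.

```python
import numpy as np

def to_contiguous(source):
    """Map arbitrary integers to 0...n-1 in sorted order (hypothetical helper)."""
    source = np.asarray(source)
    # uniques: sorted distinct values
    # inverse: index of each source element within uniques
    uniques, inverse = np.unique(source, return_inverse=True)
    # reshape keeps the result aligned with the input shape across numpy versions
    return uniques, inverse.reshape(source.shape)

source = np.array([2, 3, 4, 5, 4, 3])
uniques, target = to_contiguous(source)

print(target)           # [0 1 2 3 2 1]
print(uniques[target])  # [2 3 4 5 4 3], the original values recovered
```

Keeping uniques around is useful whenever the translated ids need to be mapped back to the original integers later.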
1

pandas.factorize is one way to do it:

import pandas as pd 

lst = [2, 3, 4, 5, 4, 3] 
res = pd.factorize(lst, sort=True)[0] 

# [0 1 2 3 2 1] 

Note: lst here is a plain Python list; the codes returned by pd.factorize are an np.ndarray, just like the result of np.unique.
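
As a small sketch of what the sort argument changes: without sort=True, pd.factorize numbers the values in order of first appearance, which generally differs from the np.unique result. The input values below are made up for illustration, and an ndarray is passed instead of a list on the assumption that newer pandas versions may warn about plain list input.

```python
import numpy as np
import pandas as pd

values = np.array([5, 2, 5, 3])  # illustrative data, not from the question

# Default (sort=False): codes follow the order of first appearance
codes_seen, uniques_seen = pd.factorize(values)

# sort=True: codes follow the sorted order of the unique values,
# matching np.unique(values, return_inverse=True)[1]
codes_sorted, uniques_sorted = pd.factorize(values, sort=True)

print(codes_seen, uniques_seen)      # [0 1 0 2] [5 2 3]
print(codes_sorted, uniques_sorted)  # [2 0 2 1] [2 3 5]
print(type(codes_sorted))            # <class 'numpy.ndarray'>
```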