2014-04-14 66 views
3

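Distance to non-consecutive elements in a numpy array

Using numpy or itertools, is there an efficient way to determine the distance to the next non-consecutive element?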

> import numpy as np 
> a=np.array(['a','b','b','c','d','a','b','b','c','c','c','d']) 

I would like the output to be:

[1,2,1,1,1,1,2,1,3,2,1]

Extending this, I would like the distance to two new elements. The expected output should be:

[3,3,2,2,2,3,5,4]

because the two new elements after a are b (two of them) and c, and so on.

Edit 1: I have two versions for finding the next new element:

import numpy as np               
a = np.array(['a', 'b', 'b', 'c', 'd', 'a', 'b', 'b', 'c', 'c', 'c', 'd']) 

# Using numpy 
u, idx = np.unique(a, return_inverse=True)              
idx = np.diff(idx)              
idx[idx < 0] = 1 
idx[idx > 1] = 1 
count = 1 
while 0 in idx:
    # np.diff(idx) is one element shorter than idx, so apply the mask to idx[:-1]
    idx[:-1][np.diff(idx) == count] = count + 1
    count += 1
print(idx)

# Using loop
oldElement = a[0]
dist = []
count = 1
for elm in a[1:]:
    if elm == oldElement:
        count += 1
    else:
        dist.extend(range(count, 0, -1))
        count = 1
        oldElement = elm
print(dist)

However, this approach cannot easily be generalized to find the distance to two new elements.

+0

What is the expected output for the array `np.array(['a','a','a'])`? – CoryKramer

+0

An empty array []. – imsc

+0

For the "two new elements" distance, what should the output be for `['a','b','a','b']`? – shx2

Answers

1

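Unfortunately, I don't have a numpy/vectorized solution to the general problem.

Here is a general solution that works for any depth. The first part of your question corresponds to depth=1 and the second to depth=2. The solution also works for higher depths. Obviously, if you only want to solve the depth=1 case, a much simpler solution is possible. For this problem, however, generality adds complexity.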

from itertools import groupby, chain, izip 

ilen = lambda it: sum(1 for dummy in it) 

def get_squeezed_counts(a): 
    """ 
    Squeeze a sequence into a sequence of [value, count] pairs.
    E.g. ['a', 'a', 'a', 'b'] --> [['a',3], ['b',1]] 
    """ 
    return [ [ v, ilen(it) ] for v, it in groupby(a) ] 

def get_element_dist(counts, index, depth): 
    """ 
    For a given index in a "squeezed" list, return the distance (in the 
    original-array) with a given depth, or None. 
    E.g. 
    get_element_dist([['a',1],['b',2],['c',1]], 0, depth=1) --> 1  # from a to first b 
    get_element_dist([['a',1],['b',2],['c',1]], 1, depth=1) --> 2  # from first b to c 
    get_element_dist([['a',1],['b',2],['c',1]], 0, depth=2) --> 3  # from a to c 
    get_element_dist([['a',1],['b',2],['c',1]], 1, depth=2) --> None # from first b to end of sequence 
    """ 
    seen = set() 
    sum_counts = 0 
    for i in xrange(index, len(counts)):
        v, count = counts[i]
        seen.add(v)
        if len(seen) > depth:
            return sum_counts
        sum_counts += count
    # reached end of sequence before finding the next value 
    return None 

def get_squeezed_dists(counts, depth): 
    """ 
    Construct a per-squeezed-element distance list, by calling get_element_dist() 
    for each element in counts. 
    E.g. 
    get_squeezed_dists([['a',1],['b',2],['c',1]], depth=1) --> [1,2,None] 
    """ 
    return [ get_element_dist(counts, i, depth=depth) for i in xrange(len(counts)) ] 

def get_dists(a, depth): 
    counts = get_squeezed_counts(a) 
    squeezed_dists = get_squeezed_dists(counts, depth=depth) 
    # "Unpack" squeezed dists: 
    return list(chain.from_iterable(
        xrange(dist, dist-count, -1)
        for (v, count), dist in izip(counts, squeezed_dists)
        if dist is not None
    ))

print get_dists(['a','b','b','c','d','a','b','b','c','c','c','d'], depth = 1) 
# => [1, 2, 1, 1, 1, 1, 2, 1, 3, 2, 1] 
print get_dists(['a','a','a'], depth = 1) 
# => [] 
print get_dists(['a','b','b','c','d','a','b','b','c','c','c','d'], depth = 2) 
# => [3, 3, 2, 2, 2, 3, 5, 4] 
print get_dists(['a','b','a', 'b'], depth = 2) 
# => [] 

For Python 3, replace xrange -> range and izip -> zip, and call print as a function.
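As a cross-check on what "depth" means in the code above, here is a minimal brute-force sketch in Python 3. The name `brute_force_dists` is hypothetical and not part of the answer's code. It scans forward from each position until more than `depth` distinct values have been seen, so it is quadratic in the worst case and is meant only as a readable reference, not as a replacement for the groupby-based version.

```python
def brute_force_dists(a, depth):
    """For each position i, return the distance to the first later position
    at which more than `depth` distinct values have been seen (counting a[i]).
    Positions for which no such later position exists are omitted."""
    out = []
    n = len(a)
    for i in range(n):
        seen = set()
        for j in range(i, n):
            seen.add(a[j])
            if len(seen) > depth:
                out.append(j - i)
                break
        else:
            # The rest of the sequence holds at most `depth` distinct values,
            # so no later starting position can succeed either.
            break
    return out


seq = ['a', 'b', 'b', 'c', 'd', 'a', 'b', 'b', 'c', 'c', 'c', 'd']
print(brute_force_dists(seq, depth=1))  # [1, 2, 1, 1, 1, 1, 2, 1, 3, 2, 1]
print(brute_force_dists(seq, depth=2))  # [3, 3, 2, 2, 2, 3, 5, 4]
```

Both printed lists match the expected outputs given in the question, and `['a','a','a']` with depth=1 yields an empty list.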

+1

Thanks. This looks promising. – imsc

0

Here is my attempt at the one-element distance.

import numpy as np 
a=np.array(['a','b','b','c','d','a','b','b','c','c','c','d']) 
out = [] 
for i in range(len(a)):
    count = 0
    for j in range(len(a) - i):
        if a[i] != a[j+i]:
            out.append(count)
            break
        else:
            count += 1

Result

>>> out 
[1, 2, 1, 1, 1, 1, 2, 1, 3, 2, 1] 
0

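Here is an idea for vectorizing this, provided there are not too many unique elements. It may not be general enough and may only really solve the problem for your sequence a :) (I played with the arrays until it worked). The while loop at the end is something that could probably be optimized: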

import numpy as np 
a = np.array(['a','b','b','c','d','a','b','b','c','c','c','d']) 

aa = a[:, np.newaxis] == np.unique(a) 
aaa = np.cumsum(aa[::-1], axis=0)[::-1] * aa 

# this is where it gets messy 

negative_jump = True 
while negative_jump: 
    d = np.diff(aaa, axis=0) 
    correction = (d + 1) * (d < -1) 
    negative_jump = (correction != 0).any() 
    aaa[:-1] += correction 
result = aaa[:-1].sum(axis=1) 

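Explanation: look at aaa before the loop. It contains numbers that are too far from 0. In this view of the data, the decrement from one row to the next should never be less than -1. If it is, the number in the row above is too large. The loop reduces it until the decrement is -1 or 0. Again, this is not optimal, and one can do better.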
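For the one-element case only, a loop-free alternative can be sketched by locating the positions where the value changes and looking up, for each index, the next such position. This is a different technique from the one above, not a refinement of it. The function name `next_change_distance` is hypothetical, and the sketch assumes a 1-D array; it does not cover the two-element case.

```python
import numpy as np


def next_change_distance(a):
    """Distance from each position to the start of the next run of a
    different value. Positions in the final run are omitted."""
    a = np.asarray(a)
    # Indices at which a new run starts (index 0 is excluded).
    change = np.flatnonzero(a[1:] != a[:-1]) + 1
    if change.size == 0:
        return np.array([], dtype=int)
    # Only positions before the last run have a following run.
    pos = np.arange(change[-1])
    # For each position, pick the first run start strictly after it.
    nxt = change[np.searchsorted(change, pos, side='right')]
    return nxt - pos


a = np.array(['a', 'b', 'b', 'c', 'd', 'a', 'b', 'b', 'c', 'c', 'c', 'd'])
print(next_change_distance(a).tolist())  # [1, 2, 1, 1, 1, 1, 2, 1, 3, 2, 1]
```

Because it relies only on `np.flatnonzero` and `np.searchsorted`, the cost does not depend on the number of unique elements.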