Intersecting an np.array with a set

I want to intersect an np.array with a set without first converting the np.array to a list (which slows the program down to an unworkable level).

Here is my current code (note that I am getting this data from the B, G, R rawCapture, and selection_data is simply a set built beforehand):

def GreenCalculations(data): 
    data.reshape(1,-1,3) 
    data={tuple(item) for item in data[0]} 
    ColourCount=selection_data & set(data) 
    return ColourCount

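My current problem, I think, is that because of data[0] I am only comparing the first part of the picture. Is it possible to iterate over all rows?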

Note: tolist() takes a lot of time.


2016-08-09 Davey Boy

First, some sample data; I'm guessing it is an n x n x 3 array with dtype uint8:

In [791]: data=np.random.randint(0,256,(8,8,3),dtype=np.uint8)

The reshape method returns a new array with the new shape, but it does not change the original in place:

In [793]: data.reshape(1,-1,3)

data.shape=(1,-1,3) would do that in place. But why the initial 1?
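To make the point about reshape concrete, here is a minimal sketch (the zero-filled array is only a stand-in for the real image data). It shows that the result of reshape has to be kept, because the original array keeps its shape:

```python
import numpy as np

# Stand-in for an 8x8 BGR image; the real data would come from the camera.
data = np.zeros((8, 8, 3), dtype=np.uint8)

# reshape hands back a new array object; `data` itself is untouched.
reshaped = data.reshape(1, -1, 3)
print(data.shape)      # (8, 8, 3)
print(reshaped.shape)  # (1, 64, 3)

# To keep the new shape, bind the result to a name.
pixels = data.reshape(-1, 3)
print(pixels.shape)    # (64, 3)
```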

Instead:

In [795]: aset={tuple(item) for item in data.reshape(-1,3)} 
In [796]: aset 
Out[796]: 
{(3, 92, 60), 
(5, 211, 227), 
(6, 185, 183), 
(9, 37, 0), 
.... 

In [797]: len(aset) 
Out[797]: 64

In my case that is a set of 64 unique items, which is not surprising given how I generated the values.

Your do-nothing data.reshape line, together with {tuple(item) for item in data[0]}, accounts for why it seems to be working on the first row of the picture only.
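As a rough illustration of that diagnosis (random sample data, seeded only so the run is repeatable), the set built from data[0] covers at most one image row, while the set built from the reshaped array covers every pixel:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)

# The discarded reshape changes nothing, so data[0] is just the first row (8 pixels).
first_row_only = {tuple(item) for item in data[0].tolist()}

# Flattening to (-1, 3) and iterating visits all 64 pixels.
all_pixels = {tuple(item) for item in data.reshape(-1, 3).tolist()}

print(len(first_row_only) <= 8)       # at most one row's worth of colours
print(first_row_only <= all_pixels)   # the first row is a subset of the whole image
```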

I'm guessing selection_data is a similar set of 3-item tuples, such as:

In [801]: selection_data = {tuple(data[1,3,:]), (1,2,3), tuple(data[5,5,:])} 
In [802]: selection_data 
Out[802]: {(1, 2, 3), (49, 132, 26), (76, 131, 16)} 
In [803]: selection_data&aset 
Out[803]: {(49, 132, 26), (76, 131, 16)}

You don't say where you tried to use tolist, but I'm guessing it was in generating the set of tuples.

But curiously, tolist speeds up the conversion:

In [808]: timeit {tuple(item) for item in data.reshape(-1,3).tolist()} 
10000 loops, best of 3: 57.7 µs per loop 
In [809]: timeit {tuple(item) for item in data.reshape(-1,3)} 
1000 loops, best of 3: 239 µs per loop 
In [815]: timeit data.reshape(-1,3).tolist() 
100000 loops, best of 3: 19.8 µs per loop 
In [817]: timeit {tuple(item.tolist()) for item in data.reshape(-1,3)} 
10000 loops, best of 3: 100 µs per loop

So for this kind of list and set operation, we might as well jump to the list format right away.
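Putting the reshape fix and the tolist shortcut together, the original function could be sketched roughly as follows. The name green_calculations, the explicit selection_data parameter, and the tiny hand-made image are my own choices for illustration, not something from the question:

```python
import numpy as np

def green_calculations(data, selection_data):
    """Return the pixels of `data` (H x W x 3) that also occur in the set `selection_data`."""
    # Flatten to one pixel per row, then convert to nested Python lists in one go.
    pixels = data.reshape(-1, 3).tolist()
    # Tuples are hashable, so they can go into a set and be intersected.
    return selection_data & {tuple(p) for p in pixels}

# Small hand-made image: two pixels match the selection, the rest are black.
data = np.zeros((4, 4, 3), dtype=np.uint8)
data[1, 3] = (49, 132, 26)
data[3, 2] = (76, 131, 16)
selection_data = {(1, 2, 3), (49, 132, 26), (76, 131, 16)}

result = green_calculations(data, selection_data)
print(result)
```

Passing selection_data in as an argument instead of reading a global just keeps the sketch self-contained.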

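numpy has some set functions, for example np.in1d. It only operates on 1d arrays, but as has been demonstrated in the unique-row questions, we can get around that by viewing the 2d array as a structured array. I had to fiddle around a bit to get this far: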

In [880]: dt=np.dtype('uint8,uint8,uint8') 
In [881]: data1=data.reshape(-1,3).view(dt).ravel() 
In [882]: data1 
Out[882]: 
array([(41, 145, 254), (138, 144, 7), (192, 241, 203), (42, 177, 215), 
     (78, 132, 87), (221, 176, 87), (107, 171, 147), (231, 13, 53), 
     ... 
     dtype=[('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1')])

Construct a selection array with the same structured dtype:

In [883]: selection=[data[1,3,:],[1,2,3],data[5,5,:]] 
In [885]: selection=np.array(selection,np.uint8).view(dt) 
In [886]: selection 
Out[886]: 
array([[(49, 132, 26)], 
     [(1, 2, 3)], 
     [(76, 131, 16)]], 
     dtype=[('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1')])

So the items in selection that are also found in data1 are:

In [888]: np.in1d(selection,data1) 
Out[888]: array([ True, False, True], dtype=bool)

And the items in data1 that are in selection are:

In [890]: np.where(np.in1d(data1,selection)) 
Out[890]: (array([11, 45], dtype=int32),)

or, in the unravelled shape:

In [891]: np.where(np.in1d(data1,selection).reshape(8,8)) 
Out[891]: (array([1, 5], dtype=int32), array([3, 5], dtype=int32))

These are the same (1,3) and (5,5) items that I used to generate selection.
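The structured-array steps above can be gathered into one helper. This is only a sketch: the name find_pixels is made up, and it uses np.isin rather than np.in1d, since np.in1d is deprecated in NumPy 2.0 in favour of np.isin.

```python
import numpy as np

def find_pixels(data, selection):
    """Return (rows, cols) of pixels in `data` (H x W x 3, uint8) that appear in `selection`."""
    dt = np.dtype('uint8,uint8,uint8')
    h, w, _ = data.shape
    # View each 3-byte pixel as one structured element so it compares as a unit.
    flat = np.ascontiguousarray(data).reshape(-1, 3).view(dt).ravel()
    sel = np.array(selection, dtype=np.uint8).reshape(-1, 3).view(dt).ravel()
    # Membership test on the 1d structured arrays, then back to the image layout.
    mask = np.isin(flat, sel).reshape(h, w)
    return np.where(mask)

data = np.zeros((4, 4, 3), dtype=np.uint8)
data[1, 3] = (49, 132, 26)
data[3, 2] = (76, 131, 16)

rows, cols = find_pixels(data, [(49, 132, 26), (1, 2, 3), (76, 131, 16)])
print(rows, cols)
```

Unlike the set intersection, this gives positions in the image, so the matching pixels can be counted or masked directly.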

The in1d timings are competitive:

In [892]: %%timeit 
    ...: data1=data.reshape(-1,3).view(dt).ravel() 
    ...: np.in1d(data1,selection) 
    ...: 
10000 loops, best of 3: 65.7 µs per loop 

In [894]: timeit selection_data&{tuple(item) for item in data.reshape(-1,3).tolist()} 
10000 loops, best of 3: 91.5 µs per loop


2016-08-09 05:54:52 hpaulj

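It is expected that 'tolist' speeds up the conversion. Numpy objects store their data as raw values, not as python objects. This means that every access from python requires numpy to create a wrapper object for the value. This also happens when iterating. The 'tolist' method creates all the wrappers in one optimized C loop and puts them into a python list; the subsequent iteration then runs over a python list, which is fast because it does not need to create wrapper objects. – Bakuriu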
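A small sketch of what that comment describes (the single-pixel array is just an example). Iterating the array yields NumPy scalar wrappers, tolist yields plain Python ints, and the resulting tuples still compare equal:

```python
import numpy as np

row = np.array([[49, 132, 26]], dtype=np.uint8)

# Iterating the ndarray directly: each element is wrapped in a NumPy scalar.
from_array = tuple(row[0])
# Going through tolist first: the elements are ordinary Python ints.
from_list = tuple(row.tolist()[0])

print(type(from_array[0]))      # NumPy scalar type (uint8)
print(type(from_list[0]))       # built-in int
print(from_array == from_list)  # the values are the same either way
```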

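If I understand your question correctly (and I'm not 100% sure that I do, but I am using the same assumptions as hpaulj), your problem can be solved with the numpy_indexed package: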

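import numpy as np
import numpy_indexed as npi
# np.asarray on a set gives a 0-d object array, so go through list() first
ColourCount = npi.intersection(data.reshape(-1, 3), np.asarray(list(selection_data), dtype=np.uint8))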

That is, it treats both the reshaped array and the set as sequences of length-3 ndarrays, and finds their intersection in a vectorized manner.


2016-08-09 07:30:30
