Fast way to fill a dataset with the same compound data value using h5py

I have a large amount of compound data in an HDF file. The compound data type looks like this:

numpy.dtype([('Image', h5py.special_dtype(ref=h5py.Reference)), 
       ('NextLevel', h5py.special_dtype(ref=h5py.Reference))])

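With it I create datasets that hold, at every position, a reference to an image and a reference to another dataset. These datasets have dimensions n x n, where n is typically at least 256 but more likely > 2000. To begin with, I have to fill every position of these datasets with the same value: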

[[(image.ref, dataset.ref)...(image.ref, dataset.ref)], 
     . 
     . 
     . 
    [(image.ref, dataset.ref)...(image.ref, dataset.ref)]]

I am trying to avoid filling it with two for loops, such as:

for i in range(0, n): 
    for j in range(0, n): 
        # one separate write per element, n*n writes in total
        daset[i, j] = (image.ref, dataset.ref)

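because the performance is very bad. So I am looking for something like numpy.fill, numpy.shape, numpy.reshape, numpy.array, numpy.arange, [:] and so on. I have tried these functions in various ways, but they all seem to work only with numeric and string data types. Is there a way to fill these datasets that is faster than the for loops?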

Thank you in advance.

2013-06-24 samson

You can use either numpy broadcasting or a combination of numpy.repeat and numpy.reshape:

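import numpy
import h5py

my_dtype = numpy.dtype([('Image', h5py.special_dtype(ref=h5py.Reference)), 
      ('NextLevel', h5py.special_dtype(ref=h5py.Reference))]) 
# 0-d structured array holding the single (image ref, dataset ref) record
ref_array = numpy.array((image.ref, dataset.ref), dtype=my_dtype) 
# repeat() gives a flat array of n*n records; reshape it to n x n
dataset = numpy.repeat(ref_array, n*n) 
dataset = dataset.reshape((n,n))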
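
The broadcasting alternative mentioned above can be sketched as follows. This is a minimal illustration rather than the answerer's exact code: so that it runs without an HDF5 file, it assumes a stand-in compound dtype with two plain object fields, and placeholder strings take the place of image.ref and dataset.ref.

```python
import numpy as np

# Assumption: generic object fields stand in for the HDF5 reference fields.
my_dtype = np.dtype([('Image', 'O'), ('NextLevel', 'O')])
n = 4  # small grid for illustration

# Placeholders for (image.ref, dataset.ref).
ref_array = np.array(('image-ref', 'dataset-ref'), dtype=my_dtype)

# Broadcasting: allocate the n x n array, then assign the single
# 0-d record to every cell in one statement.
broadcast_grid = np.empty((n, n), dtype=my_dtype)
broadcast_grid[...] = ref_array

# repeat + reshape: build n*n copies in a flat array, then reshape.
repeat_grid = np.repeat(ref_array, n * n).reshape((n, n))
```

Both variables end up as n x n arrays in which every cell holds the same record, so either one can be used as the fill data.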

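Note that numpy.repeat returns a flattened array, which is why numpy.reshape is needed afterwards. It also appears that repeat is faster than simply broadcasting: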

%timeit empty_dataset=np.empty(2*2,dtype=my_dtype); empty_dataset[:]=ref_array 
100000 loops, best of 3: 9.09 us per loop 

%timeit repeat_dataset=np.repeat(ref_array, 2*2).reshape((2,2)) 
100000 loops, best of 3: 5.92 us per loop
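
Once the array has been built in memory, it still has to be written to the HDF5 dataset. A hedged sketch of how that might look with a single bulk assignment instead of n*n element writes is given below. The file name, dataset names, and sizes are hypothetical, and the in-memory core driver is an assumption chosen only so that nothing is written to disk.

```python
import h5py
import numpy as np

n = 8  # small grid for illustration
ref_dtype = h5py.special_dtype(ref=h5py.Reference)
my_dtype = np.dtype([('Image', ref_dtype), ('NextLevel', ref_dtype)])

# Hypothetical in-memory file; backing_store=False means no file is saved.
with h5py.File('example.h5', 'w', driver='core', backing_store=False) as f:
    image = f.create_dataset('image', data=np.zeros((4, 4)))
    next_level = f.create_dataset('next_level', data=np.zeros((2, 2)))
    grid = f.create_dataset('grid', shape=(n, n), dtype=my_dtype)

    # Build the n x n array of identical records in memory.
    ref_array = np.array((image.ref, next_level.ref), dtype=my_dtype)
    filled = np.repeat(ref_array, n * n).reshape((n, n))

    # One bulk write replaces the nested for loops.
    grid[...] = filled

    # Read one cell back and dereference both fields as a sanity check.
    first = grid[0, 0]
    image_name = f[first['Image']].name
    next_name = f[first['NextLevel']].name
```

The point of the design is that h5py receives the whole array in one call, so the per-element overhead of the double loop is paid only once.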

2013-06-25 07:40:29 Yossarian

Thanks, works perfectly. – samson
