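Combining hdf5 files into a single dataset

I have many hdf5 files, each containing a single dataset. I want to combine them into one dataset in which all the data sit in the same volume (each file is an image, and I want one large time-lapse image).

I wrote a Python script that extracts the data into a numpy array and then writes it to a new h5 file. However, this approach does not work, because the combined data need more memory than the 32 GB of RAM I have.

I also tried the h5copy command-line tool.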

h5copy -i file1.h5 -o combined.h5 -s '/dataset' -d '/new_data/t1' 
h5copy -i file2.h5 -o combined.h5 -s '/dataset' -d '/new_data/t2'

This works, but it produces many separate datasets in the new file instead of concatenating all the data into one dataset.


2015-10-06 not_a_computer_person

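Although you cannot explicitly append rows to an hdf5 dataset, you can pass the maxshape keyword when creating the dataset so that you can later "resize" it to fit new data. (See http://docs.h5py.org/en/latest/faq.html#appending-data-to-a-dataset)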
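As a minimal sketch of the maxshape and resize idea on its own, the snippet below grows a small dataset by two rows. The file name resize_demo.h5, the dataset name "demo", and the toy values are assumptions made only for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file and dataset names, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "resize_demo.h5")

with h5py.File(path, "w") as f:
    # maxshape=(None, 3): axis 0 may grow without limit, axis 1 stays at 3.
    dset = f.create_dataset("demo", shape=(2, 3), maxshape=(None, 3), dtype="i8")
    dset[:] = np.arange(6).reshape(2, 3)

    # Grow the dataset to four rows, then fill the two new rows.
    dset.resize(4, axis=0)
    dset[2:4, :] = np.arange(6, 12).reshape(2, 3)

    final_shape = dset.shape
    last_row = dset[3, :].tolist()

print(final_shape, last_row)
```

Only the rows being written need to be in memory at any time, which is what makes this pattern useful when the combined data do not fit in RAM.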

Your code would end up looking something like this, assuming the number of columns in each dataset is always the same:

import h5py

# Input files to combine; the names follow the example in the question
file_list = ['file1.h5', 'file2.h5']

output_file = h5py.File('your_output_file.h5', 'w')

# keep track of the total number of rows
total_rows = 0

for n, f in enumerate(file_list):
    # <get your data from f>: for HDF5 inputs, read the dataset into a numpy array
    with h5py.File(f, 'r') as input_file:
        your_data = input_file['dataset'][...]
    total_rows = total_rows + your_data.shape[0]
    total_columns = your_data.shape[1]

    if n == 0:
        # first file; create the dataset with an unlimited max shape so it can be resized later
        # (dtype is passed explicitly so the data are not cast to the default float type)
        create_dataset = output_file.create_dataset("Name", (total_rows, total_columns), maxshape=(None, None), dtype=your_data.dtype)
        # fill the first section of the dataset
        create_dataset[:, :] = your_data
        where_to_start_appending = total_rows

    else:
        # resize the dataset to accommodate the new data
        create_dataset.resize(total_rows, axis=0)
        create_dataset[where_to_start_appending:total_rows, :] = your_data
        where_to_start_appending = total_rows

output_file.close()
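For the time-lapse case in the question, where each file holds one image, a variation is to stack the images along a new leading time axis instead of appending rows. The sketch below assumes every image has the same shape and dtype and that each input stores it under "dataset" (as in the h5copy commands above). The function stack_images and the output dataset name "timelapse" are hypothetical names chosen for illustration. Because the number of files is known up front, the output is allocated at its final size and no resize is needed.

```python
import os
import tempfile

import h5py
import numpy as np


def stack_images(file_list, out_path, src_name="dataset", dst_name="timelapse"):
    """Copy one image per input file into a single (time, ...) dataset."""
    with h5py.File(out_path, "w") as out:
        stack = None
        for t, path in enumerate(file_list):
            with h5py.File(path, "r") as src:
                image = src[src_name][...]  # only this one image is held in memory
            if stack is None:
                # Allocate the full output once, using the first image as the template.
                stack = out.create_dataset(
                    dst_name,
                    shape=(len(file_list),) + image.shape,
                    dtype=image.dtype,
                )
            stack[t] = image  # written to disk; the array can then be freed
    return out_path


# Small demonstration with three made-up 4x5 images.
tmp = tempfile.mkdtemp()
inputs = []
for i in range(3):
    p = os.path.join(tmp, f"file{i + 1}.h5")
    with h5py.File(p, "w") as f:
        f.create_dataset("dataset", data=np.full((4, 5), i, dtype="uint16"))
    inputs.append(p)

combined = stack_images(inputs, os.path.join(tmp, "combined.h5"))
with h5py.File(combined, "r") as f:
    combined_shape = f["timelapse"].shape
    frame_means = [float(f["timelapse"][t].mean()) for t in range(3)]

print(combined_shape, frame_means)
```

Peak memory use is roughly one image, regardless of how many files are combined.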


2015-10-06 23:05:29

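What is <get your data from f>? – user1241241

That stands for whatever commands or steps you need to get the data out of each file, which depends on the file type. For example, if you have a list of HDF5 files, you would use h5py.File to create a file object and then read the data with something like file_object["dataset_name"][slice]. –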
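The reading step described in this comment might look like the sketch below. It first writes a small stand-in file so that the example runs on its own; the file name file1.h5, the dataset name "dataset", and the toy values are assumptions borrowed from the question, not part of the original answer.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small stand-in input file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "file1.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("dataset", data=np.arange(12).reshape(4, 3))

with h5py.File(path, "r") as file_object:
    dset = file_object["dataset"]    # a handle to the dataset; no data read yet
    your_data = dset[...]            # read the whole dataset into a numpy array
    first_two_rows = dset[0:2, :]    # read only rows 0 and 1 from disk

print(your_data.shape, first_two_rows.tolist())
```

Slicing the dataset handle reads only the requested region, so a large file can be processed piece by piece.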
