h5py dataset not seen

After transferring my HDF5 file to an Amazon EC2 Linux instance, I can't seem to see the datasets in the file (5 GB; md5sum checked after the transfer).
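(Aside: the md5sum check mentioned above rules out corruption in transit. A minimal sketch of an equivalent check in Python, assuming only the standard hashlib module, could look like this; the question does not say how the check was actually run.)

import hashlib

def md5sum(path, block_size=2**20):
    # Hash the file in 1 MB blocks so a 5 GB file never has to fit in memory.
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(block_size), b''):
            h.update(chunk)
    return h.hexdigest()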
When I run this code:
import h5py
h5_fname = 'DATA\DATA.h5'
print (h5py.version.info)
f = h5py.File(h5_fname, 'r')
print(f)
for name in f:
    print(name)
    print(f[name].shape)
f.close()
On my local machine I get (which is correct):
h5py 2.6.0
HDF5 1.8.15
Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
sys.platform win32
sys.maxsize 9223372036854775807
numpy 1.12.0
<HDF5 file "DATA.h5" (mode r)>
X_train
(1397, 1, 128, 128, 128)
y_train
(1397, 1)
i_train
(1397, 1)
X_test
(198, 1, 128, 128, 128)
y_test
(198, 1)
i_test
(198, 1)
When run on the Amazon instance:
h5py 2.6.0
HDF5 1.8.17
Python 3.5.1 (default, Sep 13 2016, 18:48:37)
[GCC 4.8.3 20140911 (Red Hat 4.8.3-9)]
sys.platform linux
sys.maxsize 9223372036854775807
numpy 1.11.3
<HDF5 file "DATA\DATA.h5" (mode r)>
There is a version difference, but I don't think that's the problem here. Any suggestions?
Edit: The code showing how I create the HDF5 file may be useful:
import h5py
import numpy as np

def create_h5(fname_):
    f = h5py.File(fname_, 'w', libver='latest')
    dtype_ = h5py.special_dtype(vlen=bytes)
    num_samples_train = 1397
    num_samples_test = 1595 - 1397
    chunks_ = (1, 1, 128, 128, 128)  # one sample per chunk (~8 MB of float32, uncompressed)
    chunks_2 = (1, 1)
    f.create_dataset('X_train', (num_samples_train, 1, 128, 128, 128), dtype=np.float32, maxshape=(None, None, None, 128, 128), chunks=chunks_, compression="gzip")
    f.create_dataset('y_train', (num_samples_train, 1), dtype=np.int32, maxshape=(None, 1), chunks=chunks_2, compression="gzip")
    f.create_dataset('i_train', (num_samples_train, 1), dtype=dtype_, maxshape=(None, 1), chunks=chunks_2, compression="gzip")
    f.create_dataset('X_test', (num_samples_test, 1, 128, 128, 128), dtype=np.float32, maxshape=(None, None, None, 128, 128), chunks=chunks_, compression="gzip")
    f.create_dataset('y_test', (num_samples_test, 1), dtype=np.int32, maxshape=(None, 1), chunks=chunks_2, compression="gzip")
    f.create_dataset('i_test', (num_samples_test, 1), dtype=dtype_, maxshape=(None, 1), chunks=chunks_2, compression="gzip")
    f.flush()
    f.close()
    print('HDF5 file created')
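(The question does not show how the datasets are populated; a purely hypothetical sketch of writing one sample into the pre-allocated, chunked datasets might look like the following. The file name and placeholder values are assumptions.)

import numpy as np
import h5py

with h5py.File('DATA.h5', 'r+') as f:
    volume = np.zeros((1, 1, 128, 128, 128), dtype=np.float32)  # placeholder data
    f['X_train'][0:1] = volume        # one chunk-aligned 128**3 volume
    f['y_train'][0, 0] = 0            # its integer label (placeholder)
    f['i_train'][0, 0] = b'sample_0'  # its identifier (vlen bytes dtype)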
You should use 'os.path.join' from the 'os' package –
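(Following that suggestion, a minimal sketch of building the path portably, so the same script resolves the file on both Windows and Linux:)

import os

# Hard-coding 'DATA\DATA.h5' embeds a Windows separator; on Linux the
# backslash becomes part of the file name itself, which is why the
# instance reports <HDF5 file "DATA\DATA.h5"> instead of opening
# DATA/DATA.h5. os.path.join picks the separator for the current platform.
h5_fname = os.path.join('DATA', 'DATA.h5')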