TensorFlow TFRecord with many images crashes during reading

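I'm having trouble reading TFRecord files that contain "many" (more than 500) events. If I create a file with 500 events, everything is fine, but going above 500 leads to an error when I try to read and parse the file: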

W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: Could not parse example input, value: 
... 
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb5 in position 40: invalid start byte

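The images are floats with shape (N, 2, 127, 50) (reshaped to (N, 127, 50, 2) during reading). I tried writing them in two different ways, as a bytes list and as a float list, and both fail in the same way.

For the "bytes method", the business part of the code is: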

import tensorflow as tf  # TensorFlow 1.x API

def write_to_tfrecord(data_dict, tfrecord_file):
    # Pack every entry of data_dict into a single Example,
    # one bytes feature per key.
    writer = tf.python_io.TFRecordWriter(tfrecord_file)
    features_dict = {}
    for k in data_dict.keys():
        features_dict[k] = tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[data_dict[k]['byte_data']])
        )
    example = tf.train.Example(
        features=tf.train.Features(feature=features_dict)
    )
    writer.write(example.SerializeToString())
    writer.close()

and then, for reading:

import tensorflow as tf  # TensorFlow 1.x API

def tfrecord_to_graph_ops_xtxutuvtv(filenames):
    def process_hitimes(inp, shape):
        # Raw bytes -> float32 -> (N, 2, 127, 50) -> (N, 127, 50, 2)
        hitimes = tf.decode_raw(inp, tf.float32)
        hitimes = tf.reshape(hitimes, shape)
        hitimes = tf.transpose(hitimes, [0, 2, 3, 1])
        return hitimes

    file_queue = tf.train.string_input_producer(filenames, name='file_queue')
    reader = tf.TFRecordReader()
    _, tfrecord = reader.read(file_queue)

    tfrecord_features = tf.parse_single_example(
        tfrecord,
        features={
            'hitimes-x': tf.FixedLenFeature([], tf.string),
        },
        name='data'
    )
    # Fixed: this call was misspelled as proces_hitimes.
    hitimesx = process_hitimes(
        tfrecord_features['hitimes-x'], [-1, 2, 127, 50]
    )
    return hitimesx

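(Normally I read and write other tensors as well, but the problem shows up with just this one.)

For the "float method", the code looks like this: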

import tensorflow as tf  # TensorFlow 1.x API

def write_to_tfrecord(data_dict, tfrecord_file):
    # Store the flattened float32 array as a single float_list feature.
    writer = tf.python_io.TFRecordWriter(tfrecord_file)
    features_dict = {}
    features_dict['hitimes-x'] = tf.train.Feature(
        float_list=tf.train.FloatList(
            value=data_dict['hitimes-x']['data'].flatten()
        )
    )
    example = tf.train.Example(
        features=tf.train.Features(feature=features_dict)
    )
    writer.write(example.SerializeToString())
    writer.close()

and, when reading:

import tensorflow as tf  # TensorFlow 1.x API

def tfrecord_to_graph_ops_xtxutuvtv(filenames):
    def process_hitimes(inp, shape):
        # VarLenFeature yields a SparseTensor, so densify before reshaping.
        hitimes = tf.sparse_tensor_to_dense(inp)
        hitimes = tf.reshape(hitimes, shape)
        hitimes = tf.transpose(hitimes, [0, 2, 3, 1])
        return hitimes

    file_queue = tf.train.string_input_producer(filenames, name='file_queue')
    reader = tf.TFRecordReader()
    _, tfrecord = reader.read(file_queue)

    tfrecord_features = tf.parse_single_example(
        tfrecord,
        features={
            'hitimes-x': tf.VarLenFeature(tf.float32),
        },
        name='data'
    )
    hitimesx = process_hitimes(
        tfrecord_features['hitimes-x'], [-1, 2, 127, 50]
    )
    return hitimesx

The data being written are NumPy ndarrays of type float32.
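One way to rule out the byte packing itself is to check the round trip outside TensorFlow. The sketch below is not the asker's code: `encode_events` and `decode_events` are hypothetical helpers that mirror the `decode_raw`, `reshape`, and `transpose` steps with NumPy only, and it assumes the bytes were produced from a C-contiguous float32 array on a little-endian machine.

```python
import numpy as np

EVENT_SHAPE = (2, 127, 50)  # per-event shape taken from the question


def encode_events(events):
    """Serialize a float32 array of shape (N, 2, 127, 50) to raw bytes."""
    events = np.ascontiguousarray(events, dtype=np.float32)
    assert events.shape[1:] == EVENT_SHAPE
    return events.tobytes()


def decode_events(raw):
    """Mirror decode_raw + reshape + transpose: bytes -> (N, 127, 50, 2)."""
    flat = np.frombuffer(raw, dtype=np.float32)
    per_event = int(np.prod(EVENT_SHAPE))
    if flat.size % per_event != 0:
        raise ValueError("byte length is not a whole number of events")
    return flat.reshape((-1,) + EVENT_SHAPE).transpose(0, 2, 3, 1)


# Small synthetic batch of 3 events (hypothetical data, for illustration only).
events = np.arange(3 * 2 * 127 * 50, dtype=np.float32).reshape(3, 2, 127, 50)
decoded = decode_events(encode_events(events))
```

If the byte length of a real `byte_data` entry is not a multiple of 2 * 127 * 50 * 4, the `ValueError` points at the writer side rather than at TensorFlow's reader.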

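I would love to know whether this is a bug (I'm using TensorFlow 1.0), because both approaches work fine for up to 500 images but break when I try to use more. I looked through the documentation for an argument I should add so that the reader and writer can handle larger files, but I didn't find anything (besides, 500 images is not many - I need to write tens of millions).

Any ideas? I plan to try TensorFlow 1.2 today but haven't had a chance yet.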

Source

2017-07-12 Gabriel Perdue

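I highly doubt it is related to the number of events. I'm working with tfrecord files that each hold 10 million events, and everything works fine. I suggest you take a single image and save it 1k times to confirm that the number 500 has nothing to do with it. Then find which image upsets the reader and see how it differs from the ones that work.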
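To hunt for the offending record as suggested, it can help to walk the file one record at a time without going through the TensorFlow reader. The sketch below assumes the standard TFRecord framing (a little-endian uint64 length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload) and skips the CRCs instead of verifying them. Both function names are hypothetical, and `write_unchecked_frames` exists only to exercise the scanner.

```python
import struct


def iter_tfrecord_frames(path):
    """Yield (index, payload_bytes) for each record, one record at a time.

    Assumes standard TFRecord framing; CRC fields are skipped, not checked.
    """
    with open(path, "rb") as f:
        index = 0
        while True:
            header = f.read(12)  # 8-byte length + 4-byte CRC of the length
            if not header:
                return  # clean end of file
            if len(header) < 12:
                raise IOError("truncated header at record %d" % index)
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            footer = f.read(4)  # CRC of the payload
            if len(payload) < length or len(footer) < 4:
                raise IOError("truncated payload at record %d" % index)
            yield index, payload
            index += 1


def write_unchecked_frames(path, payloads):
    """Hypothetical test helper: writes frames with zeroed CRC fields.

    Such files only exercise the scanner above; TensorFlow itself would
    reject them because the CRCs are wrong.
    """
    with open(path, "wb") as f:
        for payload in payloads:
            f.write(struct.pack("<Q", len(payload)))
            f.write(b"\x00" * 4)
            f.write(payload)
            f.write(b"\x00" * 4)
```

Each payload from a real file could then be passed to `tf.train.Example.FromString(payload)` inside a try/except, so the index and byte length of the first record that fails to parse can be compared with the ones that succeed.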

It is not event 500 - I tried that. I think this is a bug in TF 1.0.

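I upgraded to TF 1.2.1 and the problem above went away (at least when using a BytesList - I'm not sure which approach is more idiomatic TensorFlow, but treating everything as a BytesList and byte data is simpler for me here).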

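There is a new problem, I believe, when reading a large file (I can now write over 25K events, maybe more, to a TFRecord file): TF opens the whole file at once and loads all of it into memory, which is more than my test machine can handle. I don't directly blame TensorFlow for this, although I will need to come up with some sort of convenient compression or chunking scheme.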
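A chunking scheme of the kind mentioned here could start from a simple shard plan. The sketch below is an assumption about how one might split the events, not something from the thread: `plan_shards`, the file name pattern, and the shard size of 10000 are all hypothetical.

```python
def plan_shards(n_events, events_per_shard, prefix="events"):
    """Return a list of (filename, start, stop) covering range(n_events)."""
    if events_per_shard <= 0:
        raise ValueError("events_per_shard must be positive")
    # Ceiling division: the last shard may be smaller than the others.
    n_shards = (n_events + events_per_shard - 1) // events_per_shard
    plan = []
    for i in range(n_shards):
        start = i * events_per_shard
        stop = min(start + events_per_shard, n_events)
        name = "%s-%05d-of-%05d.tfrecord" % (prefix, i, n_shards)
        plan.append((name, start, stop))
    return plan


# Hypothetical example: 25K events split into shards of at most 10K events.
plan = plan_shards(25000, 10000)
```

Each `(start, stop)` slice would go to its own writer call, and the resulting file names can be passed together to `tf.train.string_input_producer(filenames)` as in the question. For compression, TF 1.x also provides `tf.python_io.TFRecordOptions` with `tf.python_io.TFRecordCompressionType.GZIP`; as far as I know, the same options then have to be given to the reader as well.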

Source

2017-07-13 14:03:01
