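Generating variable-length data with TensorFlow ops

I am working on a classifier for audio files. I read my WAV files and convert them into a sequence of spectrogram images for training inside a custom Python function. The function is called through tf.py_func and returns an array of images that all have the same shape. In other words, the shape of each image is well defined, but the number of images is dynamic (e.g. 3 spectrograms for a short clip, 15 for a long one).

Is there a way to unpack the resulting list for further processing/enqueueing in tf.train.batch_join()? The undefined sequence length seems to be a problem for many TF ops. Can the length be inferred somehow?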
...
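# Read the audio file name and label from a CSV file
# (record_defaults is required; a string file name and an integer label are assumed here)
audio_file, label = tf.decode_csv(csv_content, record_defaults=[[""], [0]])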

def read_audio(audio_file):
    signal = read_wav(audio_file)
    images = [generate_image(segment) for segment in split_audio(signal)]
    # The length of this output varies with the length of the audio file.
    return images

# Convert the audio file to a variable-length sequence of images
# Shape: <unknown>, which is to be expected from tf.py_func
image_sequence = tf.py_func(read_audio, [audio_file], [tf.float32])[0]

# Auxiliary function to set a shape for the images produced by tf.py_func
def process_image(in_image):
    image = tf.image.convert_image_dtype(in_image, dtype=tf.float32)
    image.set_shape([600, 39, 1])
    return (image, label)

# Shape: (?, 600, 39, 1)
images_and_labels = tf.map_fn(process_image, image_sequence, dtype=(tf.float32, tf.int32))

# This will not work: 'images_and_labels' needs to be a list
images, label_index_batch = tf.train.batch_join(
    images_and_labels,
    batch_size=batch_size,
    capacity=2 * num_preprocess_threads * batch_size,
    shapes=[data_shape, []],
)
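
One possible way to get a variable-length result into a queue is to let tf.train.batch() treat the leading dimension as a list of separate examples via enqueue_many=True. The following is only a sketch under assumptions, not a tested answer for the setup above: fake_read_audio, build_batch and segment_ids are hypothetical names, the Python function returns blank images instead of real spectrograms, and tensorflow.compat.v1 is imported so that the snippet can also run on a TF 2.x installation (on TF 1.x, a plain `import tensorflow as tf` without disable_v2_behavior() would presumably be used instead). It also shows that the sequence length can be read at run time with tf.shape(), even though the static shape is unknown.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

IMAGE_SHAPE = [600, 39, 1]  # per-image shape taken from the question


def fake_read_audio(n_segments):
    # Hypothetical stand-in for read_audio(): stacks a variable number of
    # equally shaped images into one float32 array of shape (n, 600, 39, 1).
    return np.zeros([int(n_segments)] + IMAGE_SHAPE, dtype=np.float32)


def build_batch(n_segments, batch_size):
    images = tf.py_func(fake_read_audio, [n_segments], tf.float32)
    # Restore the known part of the shape; the leading dimension stays dynamic.
    images.set_shape([None] + IMAGE_SHAPE)

    # The sequence length is only known at run time, as a scalar tensor.
    length = tf.shape(images)[0]
    # Hypothetical per-image value: position of each image within its clip.
    segment_ids = tf.range(length)

    # enqueue_many=True enqueues every slice along dimension 0 as its own
    # example, so `shapes` describes a single example, not the whole sequence.
    return tf.train.batch(
        [images, segment_ids],
        batch_size=batch_size,
        enqueue_many=True,
        shapes=[IMAGE_SHAPE, []],
    )
```

The returned tensors have a fully defined static shape (batch_size, 600, 39, 1) and (batch_size,). As with any queue-based input, the queue runners still have to be started with tf.train.start_queue_runners() before the batch is evaluated in a session.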
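One more thing. I understand what this example is doing, but I am missing the 'label part'. How do I merge/concatenate each label with a sequence? Each variable-length output of 'py_func' should be paired with its label when it is fed into 'tf.train.batch()'. How can I do that? – Tom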
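
For the label part, one option might be to build the labels inside the same Python function, repeating the clip's label once per image so that both outputs share the same leading dimension. This is a minimal NumPy sketch; read_audio_with_labels is a hypothetical name, and `images` stands in for the list produced by the spectrogram code.

```python
import numpy as np


def read_audio_with_labels(images, label):
    # Stack the equally shaped images into a single (n, 600, 39, 1) array.
    stacked = np.stack(images).astype(np.float32)
    # Repeat the label n times so that image i is paired with labels[i].
    labels = np.full(len(images), label, dtype=np.int32)
    return stacked, labels
```

A function like this could presumably be wrapped with tf.py_func using [tf.float32, tf.int32] as the output types, and both outputs then passed to tf.train.batch() with enqueue_many=True. Note that np.stack() raises an error for an empty list, so clips that yield no segments would need to be filtered out first.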