2016-07-21

Generating variable-length data with TensorFlow operations

I want to build a classifier for audio files. I read my WAV files and convert them into a series of spectrogram images for training, inside a custom Python function. That function is invoked via tf.py_func and returns a list of images that all have the same shape. In other words, the shape of each image is known, but the number of images is dynamic (e.g. 3 spectrograms for a short clip, 15 for a long one).

Is there a way to unpack the resulting list for further processing/queueing with tf.train.batch_join()? Undefined sequence lengths seem to be a problem for many TF operations. Can the length be inferred somehow?

...
# Read the audio file name and label from a CSV file
audio_file, label = tf.decode_csv(csv_content)

def read_audio(audio_file):
    signal = read_wav(audio_file)
    images = [generate_image(segment) for segment in split_audio(signal)]

    # This output is of varying length, depending on the length of the audio file.
    return images

# Convert the audio file to a variable-length sequence of images.
# Shape: <unknown>, which is to be expected from tf.py_func
image_sequence = tf.py_func(read_audio, [audio_file], [tf.float32])[0]

# Auxiliary function to set a shape for the images produced by tf.py_func
def process_image(in_image):
    image = tf.image.convert_image_dtype(in_image, dtype=tf.float32)
    image.set_shape([600, 39, 1])

    return (image, label)


# Shape: (?, 600, 39, 1)
images_labels = tf.map_fn(process_image, image_sequence, dtype=(tf.float32, tf.int32))


# This will not work: batch_join expects a list of (images, label) tuples.
images, label_index_batch = tf.train.batch_join(
    images_labels,
    batch_size=batch_size,
    capacity=2 * num_preprocess_threads * batch_size,
    shapes=[data_shape, []],
)

Answer


You can use a variable-sized tensor as input and use enqueue_many to treat this tensor as a variable-sized batch of inputs.

Below is an example in which py_func generates variable-sized batches, and batching with enqueue_many converts them into constant-sized batches.

import numpy as np
import tensorflow as tf

tf.reset_default_graph()

# start with a time-out to prevent hangs when experimenting
config = tf.ConfigProto()
config.operation_timeout_in_ms = 2000
sess = tf.InteractiveSession(config=config)

# initialize the first queue with 1, 2, 1, 2
queue1 = tf.FIFOQueue(capacity=4, dtypes=[tf.int32])
queue1_input = tf.placeholder(tf.int32)
queue1_enqueue = queue1.enqueue(queue1_input)
sess.run(queue1_enqueue, feed_dict={queue1_input: 1})
sess.run(queue1_enqueue, feed_dict={queue1_input: 2})
sess.run(queue1_enqueue, feed_dict={queue1_input: 1})
sess.run(queue1_enqueue, feed_dict={queue1_input: 2})
sess.run(queue1.close())

# range_func will produce variable-sized tensors
def range_func(x):
    return np.array(range(x), dtype=np.int32)

[call_func] = tf.py_func(range_func, [queue1.dequeue()], [tf.int32])
queue2_dequeue = tf.train.batch([call_func], batch_size=3, shapes=[[]], enqueue_many=True)

coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
try:
    while True:
        print(sess.run(queue2_dequeue))
except tf.errors.OutOfRangeError:
    pass
finally:
    coord.request_stop()
    coord.join(threads)
sess.close()

You should see:

[0 0 1] 
[0 0 1] 
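To see why those are the batches (a plain-NumPy sketch of what `enqueue_many` does here, outside the TF graph): the queue held 1, 2, 1, 2, so `range_func` emits the sequences [0], [0 1], [0], [0 1]; `enqueue_many=True` treats each output's first dimension as individual elements, flattens them into one stream, and re-cuts the stream into fixed-size batches of 3:

```python
import numpy as np

# The queue held 1, 2, 1, 2, so range_func emits [0], [0 1], [0], [0 1].
# enqueue_many=True flattens these along the first dimension into one
# element stream, which batching then re-cuts into fixed batches of 3.
stream = np.concatenate([np.arange(n) for n in [1, 2, 1, 2]])
batches = [stream[i:i + 3] for i in range(0, len(stream) - len(stream) % 3, 3)]
# batches → [array([0, 0, 1]), array([0, 0, 1])]
```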

One more thing: I understand what this example is doing, but I am missing the 'label part'. How do I merge/join each label with its sequence? Each variable-length output of py_func should be paired with its label when it goes into tf.train.batch(). How can I achieve that? – Tom
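One way to handle the label pairing asked about above (a sketch, not part of the original answer; the helper name and sample values are hypothetical): replicate each scalar label to the length of its sequence before flattening, so every row that passes through enqueue_many still carries its label. In the graph this could look like `labels = tf.fill([tf.shape(images)[0]], label)` followed by `tf.train.batch([images, labels], ..., enqueue_many=True)`. The NumPy equivalent of that tiling:

```python
import numpy as np

# Hypothetical helper: tile each scalar label to the length of its
# variable-length sequence, then flatten both, keeping rows aligned
# with their labels (what enqueue_many would consume pairwise).
def pair_sequences_with_labels(sequences, labels):
    frames = np.concatenate(sequences)
    frame_labels = np.concatenate(
        [np.full(len(seq), lab, dtype=np.int32) for seq, lab in zip(sequences, labels)]
    )
    return frames, frame_labels

seqs = [np.array([0]), np.array([0, 1])]   # two variable-length sequences
labs = [7, 9]                              # one label per sequence
frames, frame_labels = pair_sequences_with_labels(seqs, labs)
# frames → [0 0 1], frame_labels → [7 9 9]
```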