Multithreaded image loading in TensorFlow

So I have this toy example code:

import glob 
from tqdm import tqdm 
import tensorflow as tf 

imgPaths = glob.glob("/home/msmith/imgs/*/*") # Some images 

filenameQ = tf.train.string_input_producer(imgPaths) 
reader = tf.WholeFileReader() 
key, value = reader.read(filenameQ) 

img = tf.image.decode_jpeg(value) 
init_op = tf.initialize_all_variables() 

with tf.Session() as sess: 
    sess.run(init_op) 
    coord = tf.train.Coordinator() 
    threads = tf.train.start_queue_runners(coord=coord) 
    for i in tqdm(range(10000)):
        img.eval().mean()

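It loads images and prints the mean of each one. How do I edit it so that the image-loading part is multithreaded? Loading is currently the bottleneck in my TF image scripts.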

Source

2016-11-25 mattdns

I'd take a look at the [QueueRunner](https://www.tensorflow.org/versions/r0.11/how_tos/threading_and_queues/index.html#queuerunner) class, although it's not very clear to me how to connect it with the prebuilt readers. – sygi

EDIT (2018/3/5): It's now easier to achieve the same result using the tf.data API.

import glob 
from tqdm import tqdm 
import tensorflow as tf 

imgPaths = glob.glob("/home/msmith/imgs/*/*") # Some images 

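dataset = (tf.data.Dataset.from_tensor_slices(imgPaths)
           # Read and decode files in parallel, reducing each image to its mean.
           .map(lambda x: tf.reduce_mean(tf.image.decode_jpeg(tf.read_file(x))),
                num_parallel_calls=16)
           # Keep up to 128 results buffered ahead of the consumer.
           .prefetch(128))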

iterator = dataset.make_one_shot_iterator() 
next_mean = iterator.get_next() 

with tf.Session() as sess: 
    for i in tqdm(range(10000)):
        sess.run(next_mean)

As sygi suggests in their comment, a tf.train.QueueRunner can be used to define some ops that run in a separate thread and (typically) enqueue values into a TensorFlow queue.

import glob 
from tqdm import tqdm 
import tensorflow as tf 

imgPaths = glob.glob("/home/msmith/imgs/*/*") # Some images 

filenameQ = tf.train.string_input_producer(imgPaths) 

# Define a subgraph that takes a filename, reads the file, decodes it, and                      
# enqueues it.                                     
filename = filenameQ.dequeue() 
image_bytes = tf.read_file(filename) 
decoded_image = tf.image.decode_jpeg(image_bytes) 
image_queue = tf.FIFOQueue(128, [tf.uint8], None) 
enqueue_op = image_queue.enqueue(decoded_image) 

# Create a queue runner that will enqueue decoded images into `image_queue`.                     
NUM_THREADS = 16 
queue_runner = tf.train.QueueRunner(
    image_queue, 
    [enqueue_op] * NUM_THREADS, # Each element will be run from a separate thread.                      
    image_queue.close(), 
    image_queue.close(cancel_pending_enqueues=True)) 

# Ensure that the queue runner threads are started when we call                        
# `tf.train.start_queue_runners()` below.                              
tf.train.add_queue_runner(queue_runner) 

# Dequeue the next image from the queue, for returning to the client.                       
img = image_queue.dequeue() 

init_op = tf.global_variables_initializer() 

with tf.Session() as sess: 
    sess.run(init_op) 
    coord = tf.train.Coordinator() 
    threads = tf.train.start_queue_runners(sess=sess, coord=coord) 
    for i in tqdm(range(10000)):
        img.eval().mean()

Source

2016-12-05 23:40:55 mrry

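This is great. A couple more things: if I want to do preprocessing, would I do that before `image_queue.dequeue()`? Also, how can I find out when the threads have finished the input list? – mattdns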

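For preprocessing, you could do that before `image_queue.dequeue()`, but if you want another set of threads to do it in parallel with the parsing, you could add another queue/`QueueRunner`. You might find [`tf.train.batch()`](https://www.tensorflow.org/versions/r0.12/api_docs/python/io_ops.html#batch) useful for this, if the images are all the same size. The easiest way to tell when the threads have finished is to use `while not coord.should_stop():` instead of the `for` loop. – mrry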
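
As a rough, framework-free illustration of that shutdown pattern (an analogy, not TensorFlow's actual implementation), the sketch below uses only Python's `threading` and `queue` modules. The names `load_image`, `run_pipeline`, and `_DONE` are made up for the example, and `load_image` is a stand-in that never touches disk. Several threads drain a filename queue into a bounded results queue, and the last thread to finish puts a sentinel, which plays the role of the closed queue that ends a `while not coord.should_stop():` loop.

```python
import queue
import threading

NUM_THREADS = 4    # hypothetical value; the answer above uses 16
_DONE = object()   # sentinel standing in for "the queue has been closed"


def load_image(path):
    """Stand-in for read + decode; returns a fake list of 'pixels'."""
    return [len(path)] * 3


def run_pipeline(paths, num_threads=NUM_THREADS, capacity=128):
    """Yield (path, image) pairs produced by `num_threads` worker threads."""
    filenames = queue.Queue()
    for p in paths:
        filenames.put(p)
    results = queue.Queue(maxsize=capacity)  # bounded, like FIFOQueue(128, ...)
    remaining = [num_threads]
    lock = threading.Lock()

    def worker():
        while True:
            try:
                path = filenames.get_nowait()
            except queue.Empty:
                break  # the input list is exhausted
            results.put((path, load_image(path)))
        with lock:
            remaining[0] -= 1
            last = remaining[0] == 0
        if last:
            # Every other worker has already finished its puts, so the
            # sentinel is guaranteed to be the final item in the queue.
            results.put(_DONE)

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    while True:
        item = results.get()
        if item is _DONE:
            break
        yield item
    for t in threads:
        t.join()
```

Results arrive in whatever order the threads finish, which is why each one carries its path: `for path, image in run_pipeline(paths): ...` stops by itself once every file has been consumed.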

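Very good. The labels of the images are encoded in the filename strings. If I can turn a label into a one-hot vector and I want to output the right vector at the right time... could I do this by adding another `enqueue_op`, one that enqueues the class-vector tensor, to the list `[enqueue_op]`? By the way, I can't award the bounty for another 2 hours. – mattdns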
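
One way to keep each label aligned with its image, sketched here under assumptions, is to build the pair first and pass it through the queue as a single element. A second, separately scheduled enqueue could interleave with the first when several threads are running. The example is plain Python rather than TensorFlow. It assumes the class name is the parent directory of each path (the `imgs/*/*` glob hints at this, but the question does not say so), and `CLASSES`, `one_hot_label`, and `make_example` are hypothetical names.

```python
import os

CLASSES = ["cat", "dog", "bird"]  # hypothetical class names


def one_hot_label(path, classes=CLASSES):
    """Return a one-hot list for the class named by the parent directory."""
    class_name = os.path.basename(os.path.dirname(path))
    index = classes.index(class_name)  # raises ValueError for unknown classes
    return [1.0 if i == index else 0.0 for i in range(len(classes))]


def make_example(path, image):
    """Bundle an image with its label so both travel through a queue together."""
    return (image, one_hot_label(path))
```

In the queue-based answer above, the likely equivalent is a queue declared with two dtypes and a single `enqueue` call that takes both tensors. With `tf.data`, the function passed to `map` can return the pair.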
