Accessing filenames from a file queue in TensorFlow

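I have a directory of images and a separate file that maps image filenames to labels. The image directory contains files such as "train/001.jpg", and the label file looks like this: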

train/001.jpg 1 
train/002.jpg 2 
...

I can easily load the images from the image directory in TensorFlow by creating a file queue from the filenames:

filequeue = tf.train.string_input_producer(filenames) 
reader = tf.WholeFileReader() 
img = reader.read(filequeue)

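But I can't figure out how to pair these files with the labels from the label file. It seems I need access to the filenames inside the queue at each step. Is there a way to get them? Furthermore, once I have a filename, I need to be able to look up the label keyed by that filename. It seems a standard Python dictionary won't work, because these computations need to happen at each step in the graph.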

2015-12-02 bschreck

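Given that your data isn't too large and you are supplying the list of filenames as a Python array, I'd suggest just doing the preprocessing in Python. Create two lists (in the same order) of the filenames and the labels, insert them into either a RandomShuffleQueue or a plain queue, and dequeue from that. If you want the "loops infinitely" behavior of string_input_producer, you can re-run the 'enqueue' at the start of every epoch.

A very toy example: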

import tensorflow as tf 

f = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"] 
l = ["l1", "l2", "l3", "l4", "l5", "l6", "l7", "l8"] 

fv = tf.constant(f) 
lv = tf.constant(l) 

rsq = tf.RandomShuffleQueue(10, 0, [tf.string, tf.string], shapes=[[],[]]) 
do_enqueues = rsq.enqueue_many([fv, lv]) 

gotf, gotl = rsq.dequeue() 

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    tf.train.start_queue_runners(sess=sess)
    sess.run(do_enqueues)  # put all eight filename/label pairs into the queue
    for i in range(2):
        # each dequeue returns one matching filename/label pair
        one_f, one_l = sess.run([gotf, gotl])
        print("F: ", one_f, "L: ", one_l)

The key is that you are effectively enqueueing filename/label pairs when you run the enqueue, and those pairs are what dequeue returns.
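For the label file shown in the question, the two parallel lists could be built in plain Python before anything is enqueued. This is a minimal sketch, assuming each line holds a path and an integer label separated by whitespace; `parse_label_file` is a hypothetical helper name, not part of TensorFlow.

```python
def parse_label_file(lines):
    """Split 'path label' lines into two parallel lists (same order)."""
    filenames, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # split on the last run of whitespace: path on the left, label on the right
        path, label = line.rsplit(None, 1)
        filenames.append(path)
        labels.append(int(label))
    return filenames, labels


# Assumed sample input, mirroring the label file format from the question.
sample = ["train/001.jpg 1 ", "train/002.jpg 2 ", ""]
filenames, labels = parse_label_file(sample)
print(filenames)
print(labels)
```

Because the labels come out as integers here, the queue's dtypes would need to match them (for example tf.int32 for the label component instead of tf.string as in the toy example).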

2015-12-03 00:54:29 dga

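Great, that's exactly what I needed! I hadn't thought of simply matching the two up in Python and shuffling beforehand. I was trying to use the code from the CIFAR tutorial, which loads the files and then shuffles. – bschreck

Actually, I just tried it, and I think my list of filenames is too large. With this code it just hangs, but it works when I reduce the number of elements in the list. There are 87,000 files, by the way. – bschreck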

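Interesting. It shouldn't hang, really. Did you increase the RandomShuffleQueue's maximum size to be large enough to hold the number of items you are putting into it? I'll caveat that I've never tried a really big random shuffle queue. :) If you want to save memory, you could rewrite the file as a CSV, use a TextLineReader feeding a CSV decoder, and then put the results into a queue, kept running by a QueueRunner. That's a lot of work for only about 1 MB of filenames, though. – dga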
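The CSV route mentioned in this comment would start with a one-off conversion of the label file. A minimal sketch of that conversion in plain Python, assuming whitespace-separated "path label" lines with no spaces inside the paths; `label_lines_to_csv` is a hypothetical helper name, and reading the result back with a TextLineReader and a CSV decoder is not shown here.

```python
import csv
import io


def label_lines_to_csv(lines, out):
    """Write 'path label' lines to `out` as 'path,label' CSV rows."""
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        writer.writerow(parts)
        count += 1
    return count  # number of rows written


# Assumed sample input; an in-memory buffer stands in for a real output file.
buf = io.StringIO()
rows = label_lines_to_csv(["train/001.jpg 1 ", "train/002.jpg 2 "], buf)
print(rows)
print(buf.getvalue())
```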

Here is what I was able to do.

I first shuffled the filenames and matched the labels to them in Python:

np.random.shuffle(filenames) 
labels = [label_dict[f] for f in filenames]

Then I created a string_input_producer for the filenames with shuffling turned off, and a FIFO queue for the labels:

lv = tf.constant(labels) 
label_fifo = tf.FIFOQueue(len(filenames),tf.int32, shapes=[[]]) 
file_fifo = tf.train.string_input_producer(filenames, shuffle=False, capacity=len(filenames)) 
label_enqueue = label_fifo.enqueue_many([lv])

Then, to read the images, I can use a WholeFileReader, and to get the labels I can dequeue from the FIFO queue:

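reader = tf.WholeFileReader()
# key is the filename, value is the raw file contents
key, value = reader.read(file_fifo)
image = tf.image.decode_jpeg(value, channels=3)
image.set_shape([128,128,3])
# result is presumably a plain record object, as in the CIFAR tutorial code
result.uint8image = image
result.label = label_fifo.dequeue()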

And I generate the batches as follows:

min_fraction_of_examples_in_queue = 0.4 
min_queue_examples = int(num_examples_per_epoch *
                         min_fraction_of_examples_in_queue)
num_preprocess_threads = 16 
images, label_batch = tf.train.shuffle_batch(
    [result.uint8image, result.label], 
    batch_size=FLAGS.batch_size, 
    num_threads=num_preprocess_threads, 
    capacity=min_queue_examples + 3 * FLAGS.batch_size, 
    min_after_dequeue=min_queue_examples)

2015-12-03 16:47:44 bschreck

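This label-reading approach fits my code well, which also uses 'tf.WholeFileReader' to read the image files. However, the user must remember to run 'sess.run(label_enqueue)' before starting training; otherwise the program hangs, waiting for the enqueue operation to happen. –

I tried to use the same idea as your code, but I couldn't keep the labels in sync with the images. http://stackoverflow.com/questions/43567552/tf-slice-input-producer-not-keeping-tensors-in-sync – rasen58

There is tf.py_func(), which you can use to implement the mapping from file path to label.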

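import os
import tensorflow as tf
from tensorflow.python.platform import gfile

files = gfile.Glob(data_pattern)
filename_queue = tf.train.string_input_producer(
    files, num_epochs=num_epochs, shuffle=True)  # list of files to read

def extract_label(s):
    # path to label logic for cat&dog dataset
    # py_func hands the key over as bytes under Python 3, so decode it first
    name = s.decode() if isinstance(s, bytes) else str(s)
    return 0 if os.path.basename(name).startswith('cat') else 1

def read(filename_queue):
    reader = tf.WholeFileReader()
    # key is the filename, value is the raw file contents
    key, value = reader.read(filename_queue)
    image = tf.image.decode_jpeg(value, channels=3)
    image = tf.cast(image, tf.float32)
    # arguments are (image, target_height, target_width)
    image = tf.image.resize_image_with_crop_or_pad(image, height, width)
    label = tf.cast(tf.py_func(extract_label, [key], tf.int64), tf.int32)
    label = tf.reshape(label, [])
    return image, label

training_data = [read(filename_queue) for _ in range(num_readers)]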

... 

tf.train.shuffle_batch_join(training_data, ...)

2017-03-27 13:48:39

I use this:

filename = filename.strip().decode('ascii')
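The decode step matters because, under Python 3, a string tensor handed to a Python function through tf.py_func arrives as bytes rather than str. Once decoded, an ordinary Python dictionary can serve as the label lookup the question asks about. This is a minimal sketch in plain Python; `label_dict` and `lookup_label` are hypothetical names with made-up contents, and wrapping the function with tf.py_func is not shown.

```python
# Hypothetical mapping, as it might be built from the label file in the question.
label_dict = {"train/001.jpg": 1, "train/002.jpg": 2}


def lookup_label(key):
    """Return the label for a filename that may arrive as bytes."""
    if isinstance(key, bytes):
        # same normalization as the one-liner above
        key = key.strip().decode("ascii")
    return label_dict[key]


print(lookup_label(b"train/001.jpg"))
print(lookup_label("train/002.jpg"))
```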

2017-03-29 22:21:59

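Another suggestion is to save your data in TFRecord format. In that case you can save all the images and all the labels in the same file. For a large number of files this offers several advantages:

Data and labels can be stored in the same place
The data lives in one place (no need to remember different directories)
If there are many files (images), opening and closing each file is time-consuming. Seeking to a file's location on the SSD/HDD also takes time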

2017-07-02 11:15:07
