Reading a numpy matrix in batches in Tensorflow

I am trying to run some regression models on a GPU, but my GPU utilization is very low, peaking at about 20%. Going through the code, I found this:

for i in range(epochs):
    rand_index = np.random.choice(args.train_pr,
                                  size=args.batch_size)
    rand_x = X_train[rand_index]
    rand_y = Y_train[rand_index]

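I use these three lines to select a random batch for each iteration. So my question is: while training on one batch is running, can I prepare another batch for the next iteration?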

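I am working on a regression problem, not a classification problem. I have seen threading in Tensorflow, but I only found examples for images, and none for training on a large 100000 x 1000 matrix.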


2017-07-19 Deepak Singla

Possible duplicate of: https://stackoverflow.com/questions/45110098/tensorflow-next-batch-function-of-np-array/45110647#45110647

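This is a good use case for generators. You can set up a generator function that yields slices of your numpy matrix one chunk at a time. If you use a wrapper such as Keras, you can hand the generator to fit_generator, or pull batches from it and pass each one to train_on_batch (which takes a single batch of arrays, not the generator itself). If you prefer to use Tensorflow directly, you can use: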

sess = tf.Session()
sess.run(init)
batch_gen = generator(data)
batch = next(batch_gen)  # next(gen) works on Python 2 and 3; gen.next() is Python 2 only
sess.run([optimizer, loss, ...], feed_dict={X: batch[0], y: batch[1]})

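Note: optimizer and loss here are stand-in names; replace them with your own definitions. Also note that your generator should yield an (x, y) tuple. If you are not familiar with generators, there are plenty of examples online, but here is a simple one from the Keras documentation that shows how to read numpy matrices in batches from a file: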

def generate_arrays_from_file(path):
    while 1:
        f = open(path)
        for line in f:
            x, y = process_line(line)
            yield (x, y)
        f.close()
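
The generator(data) call in the session snippet above is never defined in this answer. For arrays that are already in memory, as in the question, it could look roughly like the sketch below. The name batch_generator, its seed parameter, and the toy array shapes are assumptions made for illustration only.

```python
import numpy as np

def batch_generator(x, y, batch_size, seed=None):
    """Yield random (x_batch, y_batch) tuples forever."""
    rng = np.random.RandomState(seed)
    n = x.shape[0]
    while True:
        # Sample row indices with replacement, like np.random.choice in the question
        idx = rng.choice(n, size=batch_size)
        yield x[idx], y[idx]

# Toy regression data standing in for the 100000 x 1000 matrix
X_train = np.random.rand(1000, 20).astype(np.float32)
Y_train = np.random.rand(1000, 1).astype(np.float32)

batch_gen = batch_generator(X_train, Y_train, batch_size=64, seed=0)
x_batch, y_batch = next(batch_gen)
print(x_batch.shape, y_batch.shape)
```

Each call to next(batch_gen) returns a fresh batch that can go straight into feed_dict.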

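More fundamentally, though, low GPU usage does not really indicate a problem with loading batches; it more likely means your batch size is too small.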


2017-07-19 22:12:41 rvd

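You have a huge numpy array sitting in host memory. You want to process it in parallel on the CPU and send batches to the device. This is a good scenario for using queues.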

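Here is a simple example that just extracts random slices of the numpy arrays (as you do) and lets you do your preprocessing in Python with your favorite tools: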

import numpy as np 
import tensorflow as tf 

def make_batch(x, y, batch_size): 
    rand_index = np.random.choice(x.shape[0], size=batch_size) 
    x_batch, y_batch = x[rand_index], y[rand_index] 
    # Do all your pre-processing here 
    # ... 
    return (x_batch, y_batch) 

x = np.arange(10, dtype=np.float32) 
y = np.arange(10, dtype=np.int32) 
batch_size = 2 
tf_make_batch = tf.py_func(make_batch, [x,y,batch_size], (tf.float32, tf.int32)) 

queue = tf.FIFOQueue(capacity=1000, dtypes=(tf.float32, tf.int32)) 
enqueue_op = queue.enqueue(tf_make_batch) 
inputs = queue.dequeue() 
qr = tf.train.QueueRunner(queue, [enqueue_op] * 4) 
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    # Start the background threads that keep filling the queue
    enqueue_threads = qr.create_threads(sess, coord=coord, start=True)
    for step in range(10):
        print(sess.run(inputs))
    coord.request_stop()
    coord.join(enqueue_threads)

It uses a FIFOQueue because the random sampling already happens in make_batch.

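Of course, to really benefit from multithreading, make_batch should do more than just sampling. You will probably start to see a noticeable difference once you add some significant preprocessing to the pipeline.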
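
The same producer/consumer idea can also be sketched without Tensorflow queues, using only the standard library's queue and threading modules. This is an illustration of the pattern, not a replacement for the code above: start_prefetch, capacity and num_threads are hypothetical names, and make_batch is a simplified copy of the function in the answer.

```python
import queue
import threading
import numpy as np

def make_batch(x, y, batch_size):
    idx = np.random.choice(x.shape[0], size=batch_size)
    # Any Python-side preprocessing would go here
    return x[idx], y[idx]

def start_prefetch(x, y, batch_size, capacity=8, num_threads=2):
    """Fill a bounded queue with batches from background threads."""
    batches = queue.Queue(maxsize=capacity)
    stop = threading.Event()

    def worker():
        while not stop.is_set():
            batch = make_batch(x, y, batch_size)
            # Retry with a timeout so the thread notices the stop signal
            while not stop.is_set():
                try:
                    batches.put(batch, timeout=0.1)
                    break
                except queue.Full:
                    pass

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    return batches, stop, threads

x = np.arange(10, dtype=np.float32)
y = np.arange(10, dtype=np.int32)
batches, stop, threads = start_prefetch(x, y, batch_size=2)

shapes = []
for step in range(10):
    # The next batch has usually been prepared while the previous step ran
    x_batch, y_batch = batches.get()
    shapes.append((x_batch.shape, y_batch.shape))

stop.set()
for t in threads:
    t.join()
```

In a training loop, the body of the for loop would feed x_batch and y_batch to sess.run. Python threads only overlap work that releases the GIL, so the gain depends on how much of the preprocessing and the training call does so.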


2017-07-20 06:14:58 user1735003

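With this approach, can I change x and y at some step? Say I am using K-fold cross-validation, and after a certain step I need to swap some samples of x with others. Is it possible to do that? Thanks!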

@Deepak Yes! I edited my post to mark where the preprocessing should happen. – user1735003
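
One way to read the reply above is that the batch function itself decides which rows are eligible, so swapping samples means changing state that the function reads. A rough sketch under that assumption follows; SwappableSampler and set_rows are hypothetical names that do not appear in the answer.

```python
import threading
import numpy as np

class SwappableSampler:
    """Hypothetical helper: samples batches from a row subset that can be replaced."""

    def __init__(self, x, y, train_rows):
        self._x, self._y = x, y
        self._rows = np.asarray(train_rows)
        self._lock = threading.Lock()

    def set_rows(self, train_rows):
        # Called from the training loop when the fold changes
        with self._lock:
            self._rows = np.asarray(train_rows)

    def make_batch(self, batch_size):
        # Safe to call from several enqueue threads
        with self._lock:
            rows = self._rows
        idx = np.random.choice(rows, size=batch_size)
        return self._x[idx], self._y[idx]

x = np.arange(10, dtype=np.float32)
y = np.arange(10, dtype=np.int32)

sampler = SwappableSampler(x, y, train_rows=np.arange(0, 8))  # rows 8-9 held out
x_a, y_a = sampler.make_batch(4)

sampler.set_rows(np.arange(2, 10))                            # rows 0-1 held out
x_b, y_b = sampler.make_batch(4)
```

Batches that were already sitting in a queue when set_rows is called still come from the old subset, so the queue would need to be drained or kept small if the switch has to take effect immediately.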

I believe there is an error in the script, because for tf.py_func the input should be a tf.placeholder rather than a Numpy array. You can refer to this: https://www.tensorflow.org/api_docs/python/tf/py_func
