2017-08-17

I am using the TensorFlow Object Detection API for a semi-real-time object detection task. Images are captured by a camera at 2 frames per second, and each image is cropped into 4 smaller images, so 8 images per second need to be processed in total. I am looking for a more efficient way to load the images for detection.

My detection model has been exported as a frozen graph (a .pb file) and loaded into GPU memory. I then load the images into numpy arrays and feed them into the model.

The detection itself takes only about 0.1 s per image; loading each image, however, takes about 0.45 s.

The script I use was modified from the code example provided by the Object Detection API (link): it reads each image, converts it into a numpy array, and feeds it into the detection model. The most time-consuming part of this process is load_image_into_numpy_array, which takes about 0.45 s per image.
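For reference, the per-pixel image.getdata() call is what makes load_image_into_numpy_array slow; Pillow images also expose the array interface, so np.asarray performs the same conversion in C. A minimal comparison on a synthetic frame (not part of the original post; the 1280x720 size is an arbitrary stand-in for a camera image):

```python
import timeit

import numpy as np
from PIL import Image

# Synthetic stand-in for a camera frame.
image = Image.new('RGB', (1280, 720))

def slow_load(image):
    # The original helper: builds a Python list of pixels, then an array.
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)

def fast_load(image):
    # Same result via the array interface, without the per-pixel loop.
    return np.asarray(image, dtype=np.uint8)

print('getdata: %.3fs' % timeit.timeit(lambda: slow_load(image), number=3))
print('asarray: %.3fs' % timeit.timeit(lambda: fast_load(image), number=3))
```

Both helpers return the same (height, width, 3) uint8 array, so either can be dropped into the script below unchanged.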

The script is as follows:

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import timeit
import scipy.misc

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

from utils import label_map_util
from utils import visualization_utils as vis_util

# Path to frozen detection graph. This is the actual model that is used for
# the object detection.
PATH_TO_CKPT = 'animal_detection.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'animal_label_map.pbtxt')

NUM_CLASSES = 1

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)

# For the sake of simplicity we will use only a few images.
# If you want to test the code with your images, just add the paths to the
# images to TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test'
TEST_IMAGE_PATHS = [
    os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.png'.format(i))
    for i in range(1, 10)]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)

config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
with detection_graph.as_default():
    with tf.Session(graph=detection_graph, config=config) as sess:
        for image_path in TEST_IMAGE_PATHS:
            start = timeit.default_timer()
            image = Image.open(image_path)
            # The array-based representation of the image will be used later
            # to prepare the result image with boxes and labels on it.
            image_np = load_image_into_numpy_array(image)
            # Expand dimensions since the model expects images to have shape:
            # [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            end = timeit.default_timer()
            print(end - start)
            start = timeit.default_timer()
            # Each box represents a part of the image where a particular
            # object was detected.
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for each of the
            # objects. The score is shown on the result image, together with
            # the class label.
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            # Actual detection.
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            stop = timeit.default_timer()
            print(stop - start)
            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=2)

I would like a more efficient way to load the images produced by the camera. My first thought was to avoid the numpy array conversion and use some TensorFlow-native way of loading images instead, but I don't know where to start, since I am quite new to TensorFlow.

If I could find such a TensorFlow way to load images, perhaps I could also group the 4 crops into one batch and feed them into my model together, which should give some further speedup.
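Independently of how the files are read, the 4 crops can already be batched with plain numpy before being fed to image_tensor, since the exported graph accepts a [N, None, None, 3] input. A sketch with a hypothetical crop_into_quadrants helper (the frame size here is arbitrary):

```python
import numpy as np

def crop_into_quadrants(frame):
    # Split one frame into its 4 quadrants and stack them into one batch.
    # All quadrants must share the same shape, so odd dimensions are truncated.
    h, w = frame.shape[0] // 2, frame.shape[1] // 2
    crops = [frame[:h, :w], frame[:h, w:2 * w],
             frame[h:2 * h, :w], frame[h:2 * h, w:2 * w]]
    return np.stack(crops, axis=0)

frame = np.zeros((480, 640, 3), dtype=np.uint8)
batch = crop_into_quadrants(frame)
print(batch.shape)  # (4, 240, 320, 3)
```

The resulting batch can be passed directly as feed_dict={image_tensor: batch}, replacing the per-image np.expand_dims call.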

One rough idea is to save the 4 small images cropped from each original image into a tf_record file and load that tf_record file into the model as one batch, but I have no idea how to achieve that.

Any help would be appreciated.

Answer


I found a solution that reduces the image loading time from 0.4 s to 0.01 s, and I am posting it here in case someone else runs into the same problem. Instead of using PIL.Image and numpy, we can use imread from opencv. I also managed to batch the images, which gives a further speedup.

The script is as follows:

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tensorflow as tf
import timeit
import cv2

from collections import defaultdict

from utils import label_map_util
from utils import visualization_utils as vis_util

MODEL_PATH = sys.argv[1]
IMAGE_PATH = sys.argv[2]
BATCH_SIZE = int(sys.argv[3])

# Path to frozen detection graph. This is the actual model that is used for
# the object detection.
PATH_TO_CKPT = os.path.join(MODEL_PATH, 'frozen_inference_graph.pb')

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'animal_label_map.pbtxt')

NUM_CLASSES = 1

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

PATH_TO_TEST_IMAGES_DIR = IMAGE_PATH
TEST_IMAGE_PATHS = [
    os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.png'.format(i))
    for i in range(1, 129)]

config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
with detection_graph.as_default():
    with tf.Session(graph=detection_graph, config=config) as sess:
        for i in range(0, len(TEST_IMAGE_PATHS), BATCH_SIZE):
            images = []
            start = timeit.default_timer()
            for j in range(0, BATCH_SIZE):
                # cv2.imread already returns a numpy array, so no per-pixel
                # conversion is needed.
                image = cv2.imread(TEST_IMAGE_PATHS[i + j])
                image = np.expand_dims(image, axis=0)
                images.append(image)
            # Concatenate once per batch instead of once per image.
            image_np_expanded = np.concatenate(images, axis=0)
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular
            # object was detected.
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for each of the
            # objects. The score is shown on the result image, together with
            # the class label.
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            # Actual detection.
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            stop = timeit.default_timer()
            print(stop - start)
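One caveat with this approach: cv2.imread returns pixels in BGR channel order, while PIL uses RGB, so a model trained on RGB inputs may see slightly worse scores unless the channels are swapped. Reversing the last axis is equivalent to cv2.cvtColor(image, cv2.COLOR_BGR2RGB); shown here with plain numpy so it can be checked without OpenCV installed:

```python
import numpy as np

def bgr_to_rgb(image):
    # Reverse the channel axis: BGR (OpenCV's default) -> RGB.
    return image[:, :, ::-1]

bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[:, :, 0] = 255  # fill the blue channel
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0])  # the blue value is now in the last position
```

In the script above, this would be applied right after cv2.imread, before np.expand_dims.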