I ran a quick test of the Object Detection API (ODA) on my own dataset. I noticed that it only ever detects one class, as if there were only a single class! Why does the TensorFlow Object Detection API detect only the first class and ignore the rest?

Here is an example where it detects the correct class:

[Example image]

classes=[[ 1. 1. 2. 2. 1. 2. 1. 2. 1. 2. 2. 1. 2. 2. 2. 2. 2. 2. 
    2. 2. 2. 2. 1. 2. 1. 2. 1. 1. 2. 1. 2. 1. 2. 2. 2. 2. 
    1. 2. 2. 1. 2. 1. 1. 1. 2. 2. 2. 1. 1. 1. 2. 1. 1. 2. 
    2. 2. 1. 1. 2. 1. 2. 2. 1. 1. 1. 2. 1. 2. 2. 1. 2. 2. 
    2. 2. 1. 1. 1. 1. 2. 1. 2. 2. 1. 1. 2. 1. 2. 1. 2. 2. 
    1. 1. 2. 1. 1. 2. 2. 2. 1. 2.]] 

Here is an example where it does not detect anything at all:

[Example image]

The numbers printed below each image are the contents of the classes variable (from the code given below), which I print to check whether any other classes are recognized.

classes=[[ 1. 1. 2. 2. 1. 2. 1. 1. 1. 1. 2. 1. 2. 2. 2. 2. 2. 2. 
    2. 2. 2. 1. 2. 1. 1. 1. 1. 1. 1. 2. 2. 2. 1. 2. 1. 2. 
    2. 1. 2. 1. 2. 1. 2. 2. 2. 2. 1. 2. 1. 1. 1. 1. 2. 1. 
    2. 1. 2. 2. 1. 2. 1. 2. 2. 1. 2. 1. 1. 2. 1. 1. 2. 2. 
    2. 1. 1. 1. 2. 2. 1. 2. 1. 2. 2. 1. 1. 1. 2. 2. 2. 2. 
    1. 2. 2. 2. 2. 1. 1. 2. 1. 1.]] 

Here is an example where it detects the wrong class (again, you can see that it only ever detects class 1):

[Example image]

classes=[[ 1. 2. 2. 1. 1. 2. 1. 2. 2. 2. 2. 1. 1. 1. 1. 2. 1. 1. 
    2. 2. 2. 2. 2. 2. 1. 1. 2. 1. 2. 1. 1. 1. 1. 2. 1. 2. 
    2. 1. 1. 2. 1. 2. 1. 1. 1. 2. 1. 1. 2. 2. 1. 2. 1. 2. 
    2. 1. 1. 1. 1. 2. 1. 1. 1. 2. 2. 2. 2. 2. 2. 2. 1. 2. 
    2. 2. 1. 1. 2. 2. 1. 1. 2. 2. 2. 2. 2. 1. 2. 1. 1. 1. 
    2. 1. 1. 1. 1. 1. 1. 1. 2. 1.]] 

So basically it only draws rectangles for class 1 and completely ignores class 2. The code I am using, which comes from the example Jupyter notebook, is as follows:

with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        # Definite input and output Tensors for detection_graph
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # Each box represents a part of the image where a particular object was detected.
        detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        # Each score represents the level of confidence for each of the objects.
        # The score is shown on the result image, together with the class label.
        detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
        detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
        for image_path in TEST_IMAGE_PATHS:
            image = Image.open(image_path)
            # The array-based representation of the image will be used later to prepare the
            # result image with boxes and labels on it.
            image_np = load_image_into_numpy_array(image)
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            # Actual detection.
            (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=4, max_boxes_to_draw=50)
            #print(scores)
            plt.figure(figsize=(image_np.shape[1]/float(96), image_np.shape[0]/float(96)))  # IMAGE_SIZE
            plt.imshow(image_np)
            #matplotlib.image.imsave(os.path.basename(image_path), image_np)
            plt.show()
            print(classes)

I even tried setting min_score_thresh=0.1, but nothing changed! Then I tried max_boxes_to_draw and, as you can see, again to no avail. Everything else in the code is exactly the same as this, except for the part that downloads the model from the internet, which I commented out so that I could load my own pretrained model instead.
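
For reference, here is a minimal sketch of what that looks like inside the per-image loop above (same variables): min_score_thresh, which defaults to 0.5 in vis_util, is passed explicitly, and only the detections that actually clear it are printed, since the classes array always holds 100 entries regardless of confidence.

# Sketch: replaces the visualization call inside the loop above, with an explicit threshold.
scores_sq = np.squeeze(scores)
classes_sq = np.squeeze(classes).astype(np.int32)
min_score_thresh = 0.1

vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    np.squeeze(boxes),
    classes_sq,
    scores_sq,
    category_index,
    use_normalized_coordinates=True,
    line_thickness=4,
    max_boxes_to_draw=50,
    min_score_thresh=min_score_thresh)

# Only the detections that pass the threshold are actually drawn and labeled.
print(classes_sq[scores_sq >= min_score_thresh])
print(scores_sq[scores_sq >= min_score_thresh])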

I am new to object detection and do not know what is causing this.

UPDATE

My label map looks like this:

item{ 
id: 1 
name: 'class1' 
} 
item{ 
id: 2 
name: 'class2' 
} 
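
For reference, the example notebook builds category_index from this label map with the label_map_util helpers; a minimal sketch, where PATH_TO_LABELS is a placeholder and NUM_CLASSES is assumed to be 2 so that both classes get labels:

from object_detection.utils import label_map_util

PATH_TO_LABELS = 'PATH/TO/label_map.pbtxt'  # placeholder path to the label map above
NUM_CLASSES = 2  # must cover both class1 and class2

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)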

My dataset consists of XML files like the one below, which are converted to CSV using the snippet I give further down. An example annotation:

<annotation> 
    <folder>Imagenet_fldr</folder> 
    <filename>resized_imgnet_17.jpg</filename> 
    <path>G:\Tensorflow_section\dataset\Imagenet_fldr\resized_imgnet_17.jpg</path> 
    <source> 
    <database>arven</database> 
    </source> 
    <size> 
    <width>384</width> 
    <height>256</height> 
    <depth>3</depth> 
    </size> 
    <segmented>0</segmented> 
    <object> 
    <name>class1</name> 
    <pose>unknown</pose> 
    <truncated>1</truncated> 
    <difficult>0</difficult> 
    <bndbox> 
     <xmin>2</xmin> 
     <ymin>2</ymin> 
     <xmax>380</xmax> 
     <ymax>252</ymax> 
    </bndbox> 
    </object> 
</annotation> 

And here is the snippet I use to convert the XML to CSV:

import os 
import glob 
import pandas as pd 
import xml.etree.ElementTree as ET 
import sys 

def xml_to_csv(path, directory):
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        #print(xml_file)
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (directory + '\\' + root.find('filename').text,
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    for directory in os.listdir(sys.argv[1]):
        image_path = sys.argv[1] + '\\' + directory
        #print(image_path)
        xml_df = xml_to_csv(image_path, directory)
        xml_df.to_csv('{0}_labels.csv'.format(directory), index=None)
        print('Successfully converted xml to csv.')


main() 
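
To illustrate, for the annotation shown earlier (assuming its XML file sits under an Imagenet_fldr directory), the generated CSV would contain a row roughly like this:

filename,width,height,class,xmin,ymin,xmax,ymax
Imagenet_fldr\resized_imgnet_17.jpg,384,256,class1,2,2,380,252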

Finally, this is how I create the TFRecords:

""" 
Usage: 
    # First specify the folder containing images! 
    # Create train data: 
    python xgenerate_tf_record.py --images_folder G:\\Tensorflow_section\\dtset\\ --csv_input=train_labels.csv --output_path=train.record 

    # Create test data: 
    python xgenerate_tf_record.py --images_folder G:\\Tensorflow_section\\dtset\\ --csv_input=test_labels.csv --output_path=test.record 
""" 
from __future__ import division 
from __future__ import print_function 
from __future__ import absolute_import 

import os 
import io 
import pandas as pd 
import tensorflow as tf 

from PIL import Image 
from object_detection.utils import dataset_util 
from collections import namedtuple, OrderedDict 
from pathlib import Path 

flags = tf.app.flags 
flags.DEFINE_string('images_folder', '', 'Path to the directory containing images') 
flags.DEFINE_string('csv_input', '', 'Path to the CSV input') 
flags.DEFINE_string('output_path', '', 'Path to output TFRecord') 
FLAGS = flags.FLAGS 


# TO-DO replace this with label map 
def class_text_to_int(row_label):
    if row_label == 'class2':
        return 0
    if row_label == 'class1':
        return 1
    else:
        None


def split(df, group): 
    data = namedtuple('data', ['filename', 'object']) 
    gb = df.groupby(group) 
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)] 


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_img = fid.read()
        #print(group, path)
    encoded_img_io = io.BytesIO(encoded_img)
    image = Image.open(encoded_img_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        ext = (Path(row['filename']).suffixes)[0].split(".")[1].lower()
        #print('format = ', ext)
        image_format = bytes(ext, encoding="utf8")
        xmins.append(row['xmin']/width)
        xmaxs.append(row['xmax']/width)
        ymins.append(row['ymin']/height)
        ymaxs.append(row['ymax']/height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_img),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    #print('In the name of Allah')
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    dataset_folders = FLAGS.images_folder  #'G:\\Tensorflow_section\\dtset\\'
    #print('dataset_folders = ' + dataset_folders)

    path = dataset_folders
    examples = pd.read_csv(FLAGS.csv_input)
    #print('examples: ', examples)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__': 
    tf.app.run() 
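
When debugging a class problem like this, it can also help to read the written record back and inspect the stored labels; a minimal sketch using the TF 1.x record iterator, with train.record assumed to be the output of the script above:

import tensorflow as tf

# Print the class labels and names that actually ended up in the TFRecord.
for record in tf.python_io.tf_record_iterator('train.record'):
    example = tf.train.Example.FromString(record)
    labels = example.features.feature['image/object/class/label'].int64_list.value
    texts = example.features.feature['image/object/class/text'].bytes_list.value
    print(list(labels), [t.decode('utf8') for t in texts])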

What does your label map look like? How do you create the TFRecords? – ITiger


@ITiger: I updated the question with the required information. – Breeze


Now I see it, I think I made a mistake in the class_text_to_int() method! I should have written 2 instead of 0, and it looks like 0 makes the class get ignored entirely! Am I right? – Breeze

Answers


Since you have already created the label map, just use it in your code. As mentioned in the tutorial, the indices must start at 1, because class 0 is treated as the background class. You can use the label_map_util module to create the labels.

from object_detection.utils import label_map_util
from object_detection.utils import dataset_util
import xml.etree.ElementTree as ET
import os
import tensorflow as tf

LABEL_MAP_PATH = "/PATH/TO/LABEL_MAP.pbtxt"

def create_tf_example(directory, name):

    # IMAGE_DIRECTORY and ANNOTATION_DIRECTORY are assumed to be defined elsewhere.
    # Read image file
    image_filename = "{}{}{}.jpg".format(directory, IMAGE_DIRECTORY, name)
    # Read XML annotation
    xml_filename = os.path.join("{}{}{}.xml".format(directory, ANNOTATION_DIRECTORY, name))
    tree = ET.parse(xml_filename)
    root = tree.getroot()

    label_map_dict = label_map_util.get_label_map_dict(LABEL_MAP_PATH)

    classes = []
    classes_text = []
    for o in root.findall('object'):
        classes_text.append(o.find('name').text.encode('utf8'))
        classes.append(label_map_dict[o.find('name').text])

    # Build the Example once all objects have been collected.
    example = tf.train.Example(features=tf.train.Features(feature={
        # Do all the other stuff
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return example
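
With the label map from the question, get_label_map_dict should give a simple name-to-id dictionary, which is what ends up in the classes list:

label_map_dict = label_map_util.get_label_map_dict(LABEL_MAP_PATH)
print(label_map_dict)  # expected: {'class1': 1, 'class2': 2}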

I found the culprit!

Here:

# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'class2':
        return 0
    if row_label == 'class1':
        return 1
    else:
        None

needs to be changed to

# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'class2':
        return 2
    if row_label == 'class1':
        return 1
    else:
        None

to match

item{ 
id: 1 
name: 'class1' 
} 
item{ 
id: 2 
name: 'class2' 
} 
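
Alternatively, as the other answer suggests, the hard-coded mapping can be read directly from the label map, so the two can never drift apart; a minimal sketch, assuming a path to the .pbtxt file shown above:

from object_detection.utils import label_map_util

# Hypothetical path to the label map file shown above.
LABEL_MAP_PATH = 'label_map.pbtxt'
label_map_dict = label_map_util.get_label_map_dict(LABEL_MAP_PATH)

def class_text_to_int(row_label):
    # Returns 1 for 'class1' and 2 for 'class2', taken straight from the .pbtxt file.
    return label_map_dict[row_label]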