Region Proposal Object Detection with OpenCV, Keras, and TensorFlow
In this article, we will learn how to implement region proposal object detection with OpenCV, Keras, and TensorFlow.
Installing the dependencies
Use the pip command to install all the dependencies:
pip install tensorflow keras imutils
pip install opencv-contrib-python
Note: Make sure you install the OpenCV contrib package above, otherwise you may run into import errors.
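Before moving on, you can quickly verify that the contrib modules were picked up correctly. This is a minimal sanity check, not part of the detection pipeline itself:
import cv2
# selective search lives in the ximgproc contrib module;
# if this prints False, reinstall opencv-contrib-python
print(cv2.__version__)
print(hasattr(cv2, 'ximgproc'))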
Step 1: Read the image and apply OpenCV's selective search method
In this step, we will read the image and apply OpenCV's selective search method to it. This method returns a list of rectangles, which are essentially our regions of interest. OpenCV provides two different variants of selective search, a "fast" method and a "quality" method, and you have to decide which one to use based on your use case.
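If you are unsure which variant fits your needs, a rough timing comparison can help. The sketch below assumes the same input image used in this article; the exact numbers will vary with your machine and image size:
import time
import cv2
# assuming the same input image used in this article
img = cv2.imread('Assets/img2.jpg')
search = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
search.setBaseImage(img)
# time the fast variant
search.switchToSelectiveSearchFast()
start = time.time()
rects_fast = search.process()
print('fast: %.2fs, %d boxes' % (time.time() - start, len(rects_fast)))
# time the quality variant
search.switchToSelectiveSearchQuality()
start = time.time()
rects_quality = search.process()
print('quality: %.2fs, %d boxes' % (time.time() - start, len(rects_quality)))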
Once we have these rectangles, before we go any further, let's try to visualize the regions of interest they represent.
import numpy as np
import cv2
# this is the model we'll be using for
# object detection
from tensorflow.keras.applications import Xception
# for preprocessing the input
from tensorflow.keras.applications.xception import preprocess_input
from tensorflow.keras.applications import imagenet_utils
from tensorflow.keras.preprocessing.image import img_to_array
from imutils.object_detection import non_max_suppression
# read the input image
img = cv2.imread('Assets/img2.jpg')
# grab the image dimensions for the size filter below
(H, W) = img.shape[:2]
# instantiate the selective search
# segmentation algorithm of opencv
search = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
# set the base image as the input image
search.setBaseImage(img)
# since we'll use the fast method we set it as such
search.switchToSelectiveSearchFast()
# you can also use this for more accuracy:
# search.switchToSelectiveSearchQuality()
rects = search.process()  # process the image
roi = img.copy()
for (x, y, w, h) in rects:
    # Check if the width and height of
    # the ROI is at least 10 percent
    # of the image dimensions and only then
    # show it
    if (w / float(W) < 0.1 or h / float(H) < 0.1):
        continue
    # Let's visualize all these ROIs
    cv2.rectangle(roi, (x, y), (x + w, y + h),
                  (0, 200, 0), 2)
roi = cv2.resize(roi, (640, 640))
final = cv2.hconcat([cv2.resize(img, (640, 640)), roi])
cv2.imshow('ROI', final)
cv2.waitKey(0)
Output:
These are all the regions of interest our function received after filtering out the ROIs that are not big enough; that is, if an ROI's width or height is less than 10% of the corresponding image dimension, we do not consider it.
Step 2: Create a final input array and a list of bounding boxes from the ROIs
We will create two separate lists: one containing the ROI images in RGB format, and another containing the bounding box coordinates. These lists will be used for making predictions and drawing the bounding boxes, respectively. We will also make sure we only predict on ROIs that are sufficiently large, in this case at least 20% of the image's width or height.
rois = []
boxes = []
(H, W) = img.shape[:2]
for (x, y, w, h) in rects:
    # check if the ROI is at least
    # 20% the size of our image
    if w / float(W) < 0.2 or h / float(H) < 0.2:
        continue
    # Extract the ROI from the image
    roi = img[y:y + h, x:x + w]
    # Convert it to RGB format
    roi = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)
    # Resize it to fit the input requirements of the model
    roi = cv2.resize(roi, (299, 299))
    # Further preprocessing
    roi = img_to_array(roi)
    roi = preprocess_input(roi)
    # Append it to our rois list
    rois.append(roi)
    # now let's store the box coordinates
    x1, y1, x2, y2 = x, y, x + w, y + h
    boxes.append((x1, y1, x2, y2))
Now that we have our filtered and preprocessed regions of interest, let's use them to generate predictions with our model.
Step 3: Generate predictions with the model
We use the pretrained Xception model from Keras, mainly because it is not too heavy on the machine and also has high accuracy (its expected input size of 299 x 299 is what we resized the ROIs to above). So first, we will create our model instance, then pass in our input, the list of ROIs, and generate the predictions.
In code, it looks like this:
# ------------ Model --------------- #
model = Xception(weights='imagenet')
# Convert the ROIs list to an array for predictions
input_array = np.array(rois)
print("Input array shape is:", input_array.shape)
# ---------- Make Predictions ------- #
preds = model.predict(input_array)
preds = imagenet_utils.decode_predictions(preds, top=1)
Now that we have the predictions, let's show the result on the image.
Step 4: Create the objects dictionary
In this step, we will create a new dictionary that contains the labels as keys and the bounding boxes and probabilities as values. This way we can easily access the predictions for each label and apply non-maxima suppression to them. We do this by looping over the predictions and keeping only those with a confidence of at least 90% (you can change this to suit your needs). Let's look at the code:
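For reference, decode_predictions(preds, top=1) returns one list per ROI, each holding a single (class_id, label, probability) tuple, which is why the code below indexes pred[0][0], pred[0][1], and pred[0][2]. Here is a minimal sketch with made-up values:
# one element of preds, with made-up values for illustration
pred = [('n03000134', 'chainlink_fence', 0.97)]
# unpack the (class_id, label, probability) tuple
iD, label, prob = pred[0]
print(label, prob)  # chainlink_fence 0.97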
# Initialize the dictionary
objects = {}
for (i, pred) in enumerate(preds):
    # extract the prediction tuple
    # and store its values
    iD = pred[0][0]
    label = pred[0][1]
    prob = pred[0][2]
    if prob >= 0.9:
        # grab the bounding box associated
        # with the prediction
        box = boxes[i]
        # get the list of existing predictions
        # for this label (or an empty list)
        value = objects.get(label, [])
        # append the (box, probability) tuple to it
        value.append((box, prob))
        # store the updated list back in the objects
        # dictionary that we initialized
        objects[label] = value
Output:
{'img': [((126, 295, 530, 800), 0.5174897), ((166, 306, 497, 613), 0.510667), ((176, 484, 520, 656), 0.56631094), ((161, 304, 499, 613), 0.55209666), ((161, 306, 504, 613), 0.6020483), ((161, 306, 499, 613), 0.54256636), ((140, 305, 499, 800), 0.5012991), ((144, 305, 516, 800), 0.50028765), ((162, 305, 499, 642), 0.84315413), ((141, 306, 517, 800), 0.5257749), ((173, 433, 433, 610), 0.56347036)], 'matchstick': [((169, 633, 316, 800), 0.56465816), ((172, 633, 313, 800), 0.7206488), ((333, 639, 467, 800), 0.60068905), ((169, 633, 314, 800), 0.693922), ((172, 633, 314, 800), 0.70851576), ((167, 632, 314, 800), 0.6374499), ((172, 633, 316, 800), 0.5995729), ((169, 640, 307, 800), 0.67480534)], 'guillotine': [((149, 591, 341, 800), 0.59910816), ((149, 591, 338, 800), 0.7370558), ((332, 633, 469, 800), 0.5568006), ((142, 591, 341, 800), 0.6165994), ((332, 634, 468, 800), 0.63907826), ((332, 633, 468, 800), 0.57237893), ((142, 590, 321, 800), 0.6664309), ((331, 635, 467, 800), 0.5186203), ((332, 634, 467, 800), 0.58919555)], 'water_tower': [((144, 596, 488, 800), 0.50619787)], 'barber_chair': [((165, 465, 461, 576), 0.5565266)]}
As you can see, it is a dictionary in which a label, for example 'matchstick', is the key, and a list of tuples containing the bounding boxes and probabilities is stored as the value for that label.
Step 5: Show the detected objects on the image
In case you haven't noticed yet, take another look at the objects dictionary: we have multiple bounding boxes for a single label, so wouldn't drawing them all directly on the image produce a cluttered result?
Therefore, we need to use the non_max_suppression method, which will solve this problem for us. To use this function, we need to pass it an array of bounding boxes and an array of probabilities, and it returns an array of bounding boxes.
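To get a feel for what non_max_suppression does before applying it to our detections, here is a minimal standalone sketch with made-up boxes: the two heavily overlapping boxes collapse into a single box, while the distinct box survives.
import numpy as np
from imutils.object_detection import non_max_suppression
# two heavily overlapping boxes and one distinct box (toy values)
toy_boxes = np.array([(10, 10, 110, 110), (12, 12, 112, 112), (300, 300, 400, 400)])
toy_probs = np.array([0.9, 0.8, 0.95])
# the overlapping pair is suppressed down to one box
picked = non_max_suppression(toy_boxes, probs=toy_probs)
print(picked)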
# Loop through the labels
# for each label apply the non_max_suppression
for label in objects.keys():
    # clone the original image
    # so that we can draw on it
    img_copy = img.copy()
    boxes = np.array([pred[0] for pred in objects[label]])
    proba = np.array([pred[1] for pred in objects[label]])
    boxes = non_max_suppression(boxes, proba)
    # Now unpack the coordinates of each bounding
    # box that survived the suppression
    for (startX, startY, endX, endY) in boxes:
        # Draw the bounding box
        cv2.rectangle(img_copy, (startX, startY),
                      (endX, endY), (0, 255, 0), 2)
        y = startY - 10 if startY - 10 > 10 else startY + 10
        # Put the label on the image
        cv2.putText(img_copy, label, (startX, y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
    # Show the image
    cv2.imshow("Regional proposal object detection", img_copy)
    cv2.waitKey(0)
Here is the complete implementation:
# import the packages
import numpy as np
import cv2
# this is the model we'll be using for
# object detection
from tensorflow.keras.applications import Xception
# for preprocessing the input
from tensorflow.keras.applications.xception import preprocess_input
from tensorflow.keras.applications import imagenet_utils
from tensorflow.keras.preprocessing.image import img_to_array
from imutils.object_detection import non_max_suppression
# read the input image
img = cv2.imread('/content/img4.jpg')
# instantiate the selective search
# segmentation algorithm of opencv
search = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
# set the base image as the input image
search.setBaseImage(img)
search.switchToSelectiveSearchFast()
# you can also use this for more accuracy ->
# search.switchToSelectiveSearchQuality()
rects = search.process() # process the image
rois = []
boxes = []
(H, W) = img.shape[:2]
for (x, y, w, h) in rects:
    # check if the ROI is at least
    # 20% the size of our image
    if w / float(W) < 0.2 or h / float(H) < 0.2:
        continue
    # Extract the ROI from the image
    roi = img[y:y + h, x:x + w]
    # Convert it to RGB format
    roi = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)
    # Resize it to fit the input requirements of the model
    roi = cv2.resize(roi, (299, 299))
    # Further preprocessing
    roi = img_to_array(roi)
    roi = preprocess_input(roi)
    # Append it to our rois list
    rois.append(roi)
    # now let's store the box coordinates
    x1, y1, x2, y2 = x, y, x + w, y + h
    boxes.append((x1, y1, x2, y2))
# ------------ Model --------------- #
model = Xception(weights='imagenet')
# Convert the ROIs list to an array for predictions
input_array = np.array(rois)
print("Input array shape is:", input_array.shape)
# ---------- Make Predictions ------- #
preds = model.predict(input_array)
preds = imagenet_utils.decode_predictions(preds, top=1)
# Initialize the dictionary
objects = {}
for (i, pred) in enumerate(preds):
    # extract the prediction tuple
    # and store its values
    iD = pred[0][0]
    label = pred[0][1]
    prob = pred[0][2]
    if prob >= 0.9:
        # grab the bounding box associated
        # with the prediction
        box = boxes[i]
        # get the list of existing predictions
        # for this label (or an empty list)
        value = objects.get(label, [])
        # append the (box, probability) tuple to it
        value.append((box, prob))
        # store the updated list back in the objects
        # dictionary that we initialized
        objects[label] = value
# Loop through the labels
# for each label apply the non_max_suppression
for label in objects.keys():
    # clone the original image so that
    # we can draw on it
    img_copy = img.copy()
    boxes = np.array([pred[0] for pred in objects[label]])
    proba = np.array([pred[1] for pred in objects[label]])
    boxes = non_max_suppression(boxes, proba)
    # Now unpack the coordinates of each bounding
    # box that survived the suppression
    for (startX, startY, endX, endY) in boxes:
        # Draw the bounding box
        cv2.rectangle(img_copy, (startX, startY),
                      (endX, endY), (0, 255, 0), 2)
        y = startY - 10 if startY - 10 > 10 else startY + 10
        # Put the label on the image
        cv2.putText(img_copy, label, (startX, y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
    # Show the image
    cv2.imshow("Regional proposal object detection", img_copy)
    cv2.waitKey(0)
Output: