Today we'll work through a fun experiment: counting fingers in real time with Python, and explain the algorithms used along the way. The result looks like this:
Separating a foreground object from a cluttered background is a hard problem. The most obvious reason is the gap between how a person sees an image and how a computer sees the same image. A person can easily figure out what is in an image, but to a computer an image is just a 3-dimensional matrix. This is why computer vision remains a challenge. Look at the picture below.
A person looking at the image above would pick out the different regions and give them labels such as "sky", "person", "tree" and "grass". How can a computer do the same? We first need to isolate the hand region, removing all the unwanted parts of the video sequence. After segmenting the hand region, we then count the fingers shown in the video sequence. So we proceed in two steps.
Step 1: Find and segment the hand region from the video sequence.
Step 2: Count the number of fingers in the segmented hand region.
▊ Step 1: Segmenting and Extracting the Hand Region
The first step in gesture recognition is obviously to find the hand region by eliminating all the other unwanted parts of the video sequence. This may seem daunting at first, but don't worry: with Python and OpenCV it is much easier than it looks!
Note: a video sequence is just a collection of frames, i.e. images, ordered in time.
Before diving into the details, let's understand how to determine the hand region.
▶ Background Subtraction
First, we need an efficient way to separate the foreground from the background. For this, we use the concept of a running average. We let the system look at 30 frames of a particular scene. During this period, we compute the running average of the current frame over the previous frames. By doing this, we are essentially telling our system:
"Okay, machine! The video sequence you have been staring at (the running average of these 30 frames) is the background."
After establishing the background, we bring our hand into the frame, so the system understands that the hand is a new entry on top of the background, meaning it becomes a foreground object. But how do we isolate this foreground alone? The answer is background subtraction.
Look at the picture below, which illustrates how background subtraction works.
After the background model has been determined using the running average, we take the current frame, which also contains the foreground object (in our case, the hand), along with the background. We compute the absolute difference between the background model (updated over time) and the current frame (which contains our hand) to obtain a difference image containing the newly added foreground object (our hand). That is all background subtraction is.
▶ Motion Detection and Thresholding
To detect the hand region in this difference image, we threshold it so that only the hand region remains visible and all other unwanted regions are painted black. That is what motion detection amounts to here.
Note: thresholding assigns pixel intensities to 0 and 1 based on a particular threshold level, so that only the object of interest is captured from the image.
▶ Contour Extraction
After thresholding the difference image, we find the contours in the resulting image. The contour with the largest area is assumed to be our hand.
Note: a contour is the outline or boundary of an object located in an image.
The code:
```python
# organize imports
import cv2
import imutils
import numpy as np

# global variables
bg = None
```
```python
#--------------------------------------------------
# To find the running average over the background
#--------------------------------------------------
def run_avg(image, aWeight):
    global bg
    # initialize the background
    if bg is None:
        bg = image.copy().astype("float")
        return

    # compute weighted average, accumulate it and update the background
    cv2.accumulateWeighted(image, bg, aWeight)
```
```python
#---------------------------------------------
# To segment the region of hand in the image
#---------------------------------------------
def segment(image, threshold=25):
    global bg
    # find the absolute difference between background and current frame
    diff = cv2.absdiff(bg.astype("uint8"), image)

    # threshold the diff image so that we get the foreground
    thresholded = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)[1]

    # get the contours in the thresholded image
    # (imutils.grab_contours handles the differing return
    # signatures of cv2.findContours across OpenCV versions)
    cnts = imutils.grab_contours(cv2.findContours(thresholded.copy(),
                                                  cv2.RETR_EXTERNAL,
                                                  cv2.CHAIN_APPROX_SIMPLE))

    # return None, if no contours detected
    if len(cnts) == 0:
        return
    else:
        # based on contour area, get the maximum contour which is the hand
        segmented = max(cnts, key=cv2.contourArea)
        return (thresholded, segmented)
```
First, we find the absolute difference between the background model and the current frame using the cv2.absdiff() function.
Next, we threshold the difference image so that only the hand region remains visible. Finally, we run contour extraction on the thresholded image and take the contour with the largest area (which is our hand).
We return the thresholded image together with the segmented contour as a tuple. The math behind thresholding is very simple. If x(n) denotes the pixel intensity of the input image at a particular pixel coordinate, then the threshold T decides how we binarize the image. The formula is:

y(n) = 255 if x(n) > T, otherwise y(n) = 0
```python
#-----------------
# MAIN FUNCTION
#-----------------
if __name__ == "__main__":
    # initialize weight for running average
    aWeight = 0.5

    # get the reference to the webcam
    camera = cv2.VideoCapture(0)

    # region of interest (ROI) coordinates
    top, right, bottom, left = 10, 350, 225, 590

    # initialize num of frames
    num_frames = 0

    # keep looping, until interrupted
    while(True):
        # get the current frame
        (grabbed, frame) = camera.read()

        # resize the frame
        frame = imutils.resize(frame, width=700)

        # flip the frame so that it is not the mirror view
        frame = cv2.flip(frame, 1)

        # clone the frame
        clone = frame.copy()

        # get the height and width of the frame
        (height, width) = frame.shape[:2]

        # get the ROI
        roi = frame[top:bottom, right:left]

        # convert the roi to grayscale and blur it
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (7, 7), 0)

        # to get the background, keep looking till a threshold is reached
        # so that our running average model gets calibrated
        if num_frames < 30:
            run_avg(gray, aWeight)
        else:
            # segment the hand region
            hand = segment(gray)

            # check whether hand region is segmented
            if hand is not None:
                # if yes, unpack the thresholded image and segmented region
                (thresholded, segmented) = hand

                # draw the segmented region and display the frame
                cv2.drawContours(clone, [segmented + (right, top)], -1, (0, 0, 255))
                cv2.imshow("Thresholded", thresholded)

        # draw the ROI rectangle
        cv2.rectangle(clone, (left, top), (right, bottom), (0, 255, 0), 2)

        # increment the number of frames
        num_frames += 1

        # display the frame with segmented hand
        cv2.imshow("Video Feed", clone)

        # observe the keypress by the user
        keypress = cv2.waitKey(1) & 0xFF

        # if the user pressed "q", then stop looping
        if keypress == ord("q"):
            break

    # free up memory
    camera.release()
    cv2.destroyAllWindows()
```

The code above is the main routine of our program. We initialize aWeight to 0.5. As the running-average step showed earlier, a lower value of this weight makes the running average span a larger number of previous frames, and vice versa. We get a reference to our webcam with cv2.VideoCapture(0), which grabs the default camera instance on the machine.
▊ Step 2: Counting the Fingers in the Segmented Hand Region

```python
# import for the pairwise euclidean distance used below
from sklearn.metrics import pairwise

#--------------------------------------------------------------
# To count the number of fingers in the segmented hand region
#--------------------------------------------------------------
def count(thresholded, segmented):
    # find the convex hull of the segmented hand region
    chull = cv2.convexHull(segmented)

    # find the most extreme points in the convex hull
    extreme_top    = tuple(chull[chull[:, :, 1].argmin()][0])
    extreme_bottom = tuple(chull[chull[:, :, 1].argmax()][0])
    extreme_left   = tuple(chull[chull[:, :, 0].argmin()][0])
    extreme_right  = tuple(chull[chull[:, :, 0].argmax()][0])

    # find the center of the palm
    cX = int((extreme_left[0] + extreme_right[0]) / 2)
    cY = int((extreme_top[1] + extreme_bottom[1]) / 2)

    # find the maximum euclidean distance between the center of the palm
    # and the most extreme points of the convex hull
    distance = pairwise.euclidean_distances([(cX, cY)],
                                            Y=[extreme_left, extreme_right,
                                               extreme_top, extreme_bottom])[0]
    maximum_distance = distance[distance.argmax()]

    # calculate the radius of the circle with 80% of the max euclidean distance obtained
    radius = int(0.8 * maximum_distance)

    # find the circumference of the circle
    circumference = (2 * np.pi * radius)

    # take out the circular region of interest which has
    # the palm and the fingers
    circular_roi = np.zeros(thresholded.shape[:2], dtype="uint8")

    # draw the circular ROI
    cv2.circle(circular_roi, (cX, cY), radius, 255, 1)

    # take bit-wise AND between thresholded hand using the circular ROI as the mask
    # which gives the cuts obtained using mask on the thresholded hand image
    circular_roi = cv2.bitwise_and(thresholded, thresholded, mask=circular_roi)

    # compute the contours in the circular ROI
    # (imutils.grab_contours handles the differing return
    # signatures of cv2.findContours across OpenCV versions)
    cnts = imutils.grab_contours(cv2.findContours(circular_roi.copy(),
                                                  cv2.RETR_EXTERNAL,
                                                  cv2.CHAIN_APPROX_NONE))

    # initialize the finger count
    count = 0

    # loop through the contours found
    for c in cnts:
        # compute the bounding box of the contour
        (x, y, w, h) = cv2.boundingRect(c)

        # increment the count of fingers only if -
        # 1. the contour region is not the wrist (bottom area)
        # 2. the number of points along the contour does not exceed
        #    25% of the circumference of the circular ROI
        if ((cY + (cY * 0.25)) > (y + h)) and ((circumference * 0.25) > c.shape[0]):
            count += 1

    return count
```
```shell
git clone https://github.com/Gogul09/gesture-recognition.git
```
Note: do not shake your webcam during the 30-frame calibration period. If there is any shaking during the first 30 frames, the whole algorithm will not behave as expected.
After that, you can bring your hand into the bounding box, show a gesture, and the corresponding finger count will be displayed.