CN108921001B - A video surveillance pan-tilt using artificial intelligence predictive tracking and its tracking method - Google Patents
A video surveillance pan-tilt using artificial intelligence predictive tracking and its tracking method
- Publication number
- CN108921001B (application number CN201810348195.7A)
- Authority
- CN
- China
- Prior art keywords
- scene
- target
- important
- video
- moving
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
This application proposes a video surveillance pan-tilt that uses artificial intelligence for predictive tracking, together with its predictive tracking method. The pan-tilt can analyze how video frames of different scenes differ in scene mode and then, for a given scene mode, use AI-based machine learning to determine the state characteristics of key targets, thereby training recognition and tracking criteria for key targets under that specific scene mode. In a scene containing multiple moving targets, it can extract the important target based on the correlation of each moving target with the scene mode and the collection purpose, and thus achieve predictive tracking by the pan-tilt according to that important target's motion.
Description
Technical Field
The present application relates to the technical field of video surveillance applied to smart cities, and in particular to a video surveillance pan-tilt using artificial intelligence predictive tracking and its tracking method.
Background Art
A smart city uses Internet of Things technology to connect smart buildings, smart communities, smart streets, and smart homes into an integrated digital information system with comprehensive coverage, accurate collection, and intelligent analysis. The collection and application of video information plays an indispensable role in smart cities and is the front-end basis for important smart-city functions such as regional security monitoring, face identification, and traffic information extraction.
In public spaces, video surveillance cameras are the main front-end facilities for collecting video information. A video surveillance camera generally consists of two parts: a camera and a surveillance pan-tilt. The pan-tilt structurally carries the camera and rotates it to adjust the shooting direction and pitch angle, so that one camera with a limited field of view can cover a larger predetermined spatial range.
Locking onto a specific moving object and driving the camera to track and shoot it is the core function of the surveillance pan-tilt. To ensure proper exposure and sharp focus, the main subject is generally required to stay in the central region of the field of view during video collection. Therefore, when the main subject is a moving object, the pan-tilt must follow that object and drive the camera to adjust the shooting direction and pitch angle, so that the object is always kept near the central region of the field of view.
To realize this tracking function, the pan-tilt needs to extract the moving object from the continuous real-time frames captured by the camera and, based on the change of the object's position across frames, adaptively drive the camera whenever the object deviates from the central region of the field of view, so that the object returns to the center.
To reduce tracking lag, the pan-tilt can also introduce a prediction mechanism: from the pattern of the object's position changes across several previous frames, it extracts the object's direction and speed of motion, predicts the expected position deviation of the object in subsequent frames according to that direction and speed, and adjusts the pan-tilt in advance with reference to the prediction, so that the object returns to the central region of the field of view in the subsequent frames.
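As a minimal sketch of such a prediction mechanism (an illustration, not the patent's own implementation; the `positions` list and `lookahead` parameter are assumed inputs), the object's direction and speed can be estimated by a linear fit over recent centroid positions and extrapolated forward:

```python
import numpy as np

def predict_position(positions, lookahead=5):
    """Extrapolate a target's centroid under a constant-velocity assumption.

    positions: (x, y) centroids from the most recent frames, oldest first.
    lookahead: number of frames to predict ahead.
    """
    pts = np.asarray(positions, dtype=float)
    frames = np.arange(len(pts))
    # Least-squares linear fits of x(t) and y(t) give speed and direction.
    vx, x0 = np.polyfit(frames, pts[:, 0], 1)
    vy, y0 = np.polyfit(frames, pts[:, 1], 1)
    t = len(pts) - 1 + lookahead
    return (x0 + vx * t, y0 + vy * t)
```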
However, in real smart-city applications, the structure of the public-space scene being recorded is often complex: the scene usually contains a large number of objects, including many moving objects. How the pan-tilt can achieve accurate predictive tracking in such complex, multi-object scenes is therefore an urgent problem. For example, when multiple moving targets exist in the scene and differ in direction and speed of motion, the pan-tilt is required to select and lock onto the appropriate moving target based on the predetermined purpose of video collection, and to carry the camera to track it.
In the prior art, when the scene contains multiple moving targets, the tracking strategies adopted by the pan-tilt generally include: locking onto the moving target that appeared first in the scene and tracking it until it leaves, then switching to the earliest-appearing of the remaining targets; locking onto the most recently appeared moving target each time; setting a key region in the scene and tracking the moving target closest to that region; or selecting the moving target that occupies the largest proportion of the frame.
However, these tracking strategies often do not fully match the characteristics of the scene or the predetermined purpose of video collection. Ideally, the moving target with the greatest importance and relevance to the collection purpose under the given scene characteristics should be selected for tracking, rather than the choice being dictated by which region of the scene the target occupies, how large it appears in the frame, or how early or late it appeared.
But the association between the collection purpose of a video scene and the scene's characteristics usually cannot be defined by explicit, fixed rules; the characteristics of the scene are difficult to describe in advance; and the tracking strategy must be adjusted dynamically as the scene changes. It is therefore difficult for the simple, explicit, static tracking rules of the prior art to lock onto and track, under arbitrary scene characteristics, the moving target with the greatest importance and relevance to the collection purpose.
SUMMARY OF THE INVENTION
In view of this, the purpose of the present application is to propose a video surveillance pan-tilt using artificial intelligence predictive tracking and its tracking method. The invention uses artificial intelligence to recognize scene characteristics during video collection and learns, for each scene mode, the strategy for selecting the moving target to be locked and tracked, thereby automatically extracting and predictively tracking the moving target with the greatest importance and relevance.
The present application provides a video surveillance pan-tilt using artificial intelligence predictive tracking, characterized by comprising: a scene mode recognition unit, a target area intelligent learning unit, a tracking target extraction unit, and a pan-tilt drive unit; wherein
the scene mode recognition unit is used to extract the targets contained in each scene from the scene video frames captured by the pan-tilt's camera, and to quantify each target's state into a state value; based on the targets' state values, it determines whether two adjacent scene video frames belong to the same scene mode, and labels frames of the same scene mode with the same scene mode tag;
the target area intelligent learning unit is used to learn, from a certain number of samples, the state-value characteristics of important moving targets under each scene mode;
the tracking target extraction unit uses the learning results of the target area intelligent learning unit to judge, from the state value of each moving target recognized in the current scene video frame, whether that moving target is an important target; when the result indicates that a moving target in the current frame is an important target, it provides that target's frame position coordinates to the pan-tilt drive unit;
the pan-tilt drive unit obtains from the tracking target extraction unit the frame position coordinates of the important target in the current scene video frame, locks onto the important target according to those coordinates, and performs tracking shooting.
Preferably, the scene mode recognition unit compares whether two adjacent scene video frames contain the same targets, and whether the proportion of shared targets among all targets in each frame is below a first threshold. If the shared-target percentage of at least one of the two frames is below the first threshold, the two frames are judged to belong to different scene modes. If the shared-target percentages of both frames are greater than or equal to the first threshold, the unit then uses the state values of the shared targets in the two frames to compute the overall state difference of all shared targets; if the overall state difference is greater than or equal to a second threshold, the two frames are judged to have different scene modes, and if it is below the second threshold, they are judged to have the same scene mode.
Preferably, the target area intelligent learning unit extracts a certain number of scene video frames for each scene mode as sample frames, and the moving-target regions that should be locked and tracked in these sample frames are identified manually, determining the moving target in each sample frame that is an important target; the state value of that important target in the sample frame is determined; the state values of the important targets in all sample frames of each scene mode are used for training, yielding a trained important-target recognition SVM classifier; and for all sample frames of each scene mode, all moving targets and each target's state value are extracted to obtain a target state-value list, and the target state lists of all sample frames of each scene mode are used for training, yielding a trained scene mode recognition SVM classifier.
Preferably, the tracking target extraction unit substitutes the moving-target state-value list of the current scene video frame into each trained scene mode recognition SVM classifier, thereby determining which scene mode the current frame belongs to; it then substitutes the state value of each moving target in that frame into the trained important-target recognition SVM classifier corresponding to that scene mode, thereby identifying whether each moving target is an important target, and acts when the classification result indicates that a moving target in the current frame is an important target.
Preferably, the pan-tilt drive unit analyzes the change of the important target's position across scene video frames; when the important target deviates from the central region of the field of view, it extracts the target's direction and speed of motion, predicts the target's expected position deviation according to that direction and speed, and drives the pan-tilt's rotating mechanism in advance with reference to the prediction.
The present invention further provides an artificial intelligence predictive tracking method applied to a video surveillance pan-tilt, characterized by comprising:
a scene mode recognition step for extracting the targets contained in each scene from the scene video frames and quantifying each target's state into a state value; based on the targets' state values, determining whether two adjacent scene video frames belong to the same scene mode, and labeling frames of the same scene mode with the same scene mode tag;
a target area intelligent learning step for learning, from a certain number of samples, the state-value characteristics of important moving targets under each scene mode;
a tracking target extraction step for using the learning results of the target area intelligent learning step to judge, from the state value of each moving target recognized in the current scene video frame, whether that moving target is an important target; when the result indicates that a moving target in the current frame is an important target, providing that target's frame position coordinates to the pan-tilt drive step;
a pan-tilt drive step for obtaining from the tracking target extraction step the frame position coordinates of the important target in the current scene video frame, locking onto the important target according to those coordinates, and performing tracking shooting.
Preferably, in the scene mode recognition step, two adjacent scene video frames are compared as to whether they contain the same targets, and whether the proportion of shared targets among all targets in each frame is below a first threshold; if the shared-target percentage of at least one of the two frames is below the first threshold, the two frames are judged to belong to different scene modes; if the shared-target percentages of both frames are greater than or equal to the first threshold, the state values of the shared targets in the two frames are used to compute the overall state difference of all shared targets; if the overall state difference is greater than or equal to a second threshold, the two frames are judged to have different scene modes, and if it is below the second threshold, they are judged to have the same scene mode.
Preferably, in the target area intelligent learning step, a certain number of scene video frames are extracted for each scene mode as sample frames, and the moving-target regions that should be locked and tracked in these sample frames are identified manually, determining the moving target in each sample frame that is an important target; the state value of that important target in the sample frame is determined; the state values of the important targets in all sample frames of each scene mode are used for training, yielding a trained important-target recognition SVM classifier; and for all sample frames of each scene mode, all moving targets and each target's state value are extracted to obtain a target state-value list, and the target state lists of all sample frames of each scene mode are used for training, yielding a trained scene mode recognition SVM classifier.
Preferably, in the tracking target extraction step, the moving-target state-value list of the current scene video frame is substituted into each trained scene mode recognition SVM classifier to determine which scene mode the current frame belongs to; the state value of each moving target in that frame is then substituted into the trained important-target recognition SVM classifier corresponding to that scene mode, thereby identifying whether each moving target is an important target, with action taken when the classification result indicates that a moving target in the current frame is an important target.
Preferably, in the pan-tilt drive step, the change of the important target's position across scene video frames is analyzed; when the important target deviates from the central region of the field of view, the target's direction and speed of motion are extracted, the target's expected position deviation is predicted according to that direction and speed, and the pan-tilt's rotating mechanism is driven in advance with reference to the prediction.
It can be seen that the advantages of the present invention include: it can analyze how video frames of different scenes differ in scene mode; for a given scene mode, it uses AI-based machine learning to determine the state characteristics of key targets, thereby training recognition and tracking criteria for key targets under that specific scene mode; and in a scene with multiple moving targets, it can extract the important target based on the correlation of each moving target with the scene mode and the collection purpose, achieving predictive tracking by the pan-tilt according to that important target's motion.
Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments made with reference to the accompanying drawings:
FIG. 1 is a schematic diagram of the overall structure of the video surveillance pan-tilt according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the specific structure of the target area intelligent learning unit according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the specific structure of the tracking target extraction unit according to an embodiment of the present application;
FIG. 4 is a flowchart of the artificial intelligence predictive tracking method according to an embodiment of the present application.
Detailed Description of the Embodiments
The present application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention, not to limit it. It should also be noted that, for convenience of description, only the parts related to the invention are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments in the present application and the features of the embodiments may be combined with one another. The present application will be described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 is a schematic diagram of the overall structure of the video surveillance pan-tilt using artificial intelligence predictive tracking proposed by the present invention. The pan-tilt is divided overall into: a scene mode recognition unit 101, a target area intelligent learning unit 102, a tracking target extraction unit 103, and a pan-tilt drive unit 104.
First, the video surveillance pan-tilt is connected to the camera via a wired or wireless signal and obtains each real-time frame captured by the camera; each real-time frame represents one scene video frame of the spatial region that the pan-tilt is responsible for shooting.
The scene mode recognition unit 101 is used to extract the targets contained in each scene from the scene video frames captured by the pan-tilt's camera, and to quantify each target's state into a state value. For any two adjacent scene video frames, it first uses Canny edge extraction to take each image region enclosed by a closed edge as a target, and obtains for each target any one or more of its color features, texture features, and the vector features from the region centroid to the region edge. By comparing these features, it judges whether the two adjacent frames contain the same targets, and whether the proportion of shared targets among all targets in each frame is below a first threshold. If the shared-target percentage of at least one of the two frames is below the first threshold, the two frames are judged to belong to different scene modes. If the shared-target percentages of both frames are greater than or equal to the first threshold, the unit then uses the state values of the shared targets in the two frames to compute the overall state difference of all shared targets in the two frames; if the overall state difference is greater than or equal to a second threshold, the two frames are judged to have different scene modes, and if it is below the second threshold, they are judged to have the same scene mode. The state value of a target here may include the values of several parameters such as the size of the image region the target occupies in the frame, its aspect ratio, its frame position coordinates, and its inclination. Based on these judgments, the scene mode recognition unit 101 labels each scene video frame with its scene mode; frames with the same scene mode receive the same scene mode tag.
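As an illustrative sketch of this target-extraction step (OpenCV is an assumed library choice; the patent names only the Canny operator), regions enclosed by closed edges can be taken as targets and their state values read off directly:

```python
import cv2

def extract_targets(frame_bgr, min_area=50.0):
    """Extract closed-edge regions as targets and quantify their state values."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    targets = []
    for c in contours:
        area = cv2.contourArea(c)
        if area < min_area:                        # skip noise regions
            continue
        x, y, w, h = cv2.boundingRect(c)
        targets.append({
            "size": area,                          # region size
            "aspect_ratio": w / h,                 # bounding-box shape
            "position": (x + w / 2, y + h / 2),    # frame position coords
            "inclination": cv2.minAreaRect(c)[2],  # tilt of the region
        })
    return targets
```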
For example, for all the scene video frames S_1, S_2, …, S_{n-1}, S_n, …, S_m collected within a time interval, consider two adjacent frames S_{n-1} and S_n. After target recognition, the set of all moving targets contained in S_{n-1} is Object_{n-1} = {O_1, O_2, …, O_k}, and the set contained in S_n is Object_n = {O′_1, O′_2, …, O′_m}. The targets shared by S_{n-1} and S_n are determined by taking the shared targets of the two frames as the intersection Object_{n-1} ∩ Object_n of the two sets, and the number of targets in the intersection is compared, as a percentage, against the number of targets in Object_{n-1} and in Object_n respectively; if either percentage is below the first threshold, S_{n-1} and S_n are judged to have different scene modes. If both percentages are greater than or equal to the first threshold, then for each target O_i in the intersection, the absolute difference between its state value in S_{n-1} and its state value in S_n is computed (where a target has several types of state values in S_{n-1} and S_n, such as size, aspect ratio, frame position coordinates, and inclination, the absolute difference is obtained by scoring the difference of each type of state value), and the absolute state-value differences of all targets in the intersection are summed with weights as the overall state difference, that is:

DIFF(S_n, S_{n-1}) = Σ_i α_i · |Stat_{S_{n-1}}(O_i) − Stat_{S_n}(O_i)|

where DIFF(S_n, S_{n-1}) denotes the overall state difference between S_{n-1} and S_n, Stat_{S_{n-1}}(O_i) and Stat_{S_n}(O_i) denote the state values, in S_{n-1} and S_n respectively, of a target O_i shared by the two frames, and α_i denotes the weighted summation coefficient. Different targets contribute to different degrees to the overall state difference between scenes, and the weighting coefficients express this: targets closest to the center of the foreground field of view receive the largest weight, non-central foreground targets receive a smaller weight than central foreground targets, and background targets receive the smallest weight. If the overall state difference DIFF(S_n, S_{n-1}) is greater than or equal to the second threshold, S_{n-1} and S_n are judged to have different scene modes; if it is below the second threshold, S_{n-1} and S_n are judged to have the same scene mode. In this way, the scene mode recognition unit 101 labels every scene video frame S_1, S_2, …, S_m collected by the camera within this viewing interval with a scene mode tag: frames judged by the above algorithm to belong to the same scene mode receive the same tag, and frames judged to belong to different scene modes receive different tags.
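A minimal sketch of this overall-state-difference computation, assuming the per-type difference scoring has been reduced to the simple scheme below and that the weights α_i have already been assigned by foreground/background position (the function names here are illustrative):

```python
def state_diff(stat_prev, stat_curr):
    """Score the per-type state-value differences of one shared target."""
    score = abs(stat_prev["size"] - stat_curr["size"])
    score += abs(stat_prev["aspect_ratio"] - stat_curr["aspect_ratio"])
    score += abs(stat_prev["inclination"] - stat_curr["inclination"])
    (x1, y1), (x2, y2) = stat_prev["position"], stat_curr["position"]
    score += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return score

def overall_state_difference(shared, alpha):
    """DIFF(S_n, S_{n-1}): weighted sum of absolute state differences.

    shared: (stat_prev, stat_curr) pairs for targets in the intersection.
    alpha:  matching weighting coefficients, largest for targets nearest
            the center of the foreground, smallest for background targets.
    """
    return sum(a * state_diff(p, c) for a, (p, c) in zip(alpha, shared))
```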
The target area intelligent learning unit 102 obtains the scene video frames and their scene mode tags from the scene mode recognition unit 101; then, applying artificial intelligence learning methods to a certain number of samples, it learns for each scene mode the state characteristics of the moving target with the greatest importance and relevance to the video collection purpose. As shown in FIG. 2, the target area intelligent learning unit 102 includes an important moving target extraction unit 102A, an important moving target classification training unit 102B, and a scene mode classification training unit 102C.
First, the formation of the learning samples used to recognize the state characteristics of important moving targets under each scene mode is introduced. The target area intelligent learning unit 102 extracts a certain number of scene video frames for each scene mode as sample frames, and the moving-target regions that should be locked and tracked in these sample frames are identified manually. Human visual perception tends to concentrate attention on the important target of interest while ignoring secondary targets in the field of view: while viewing a scene video frame, the viewer's gaze focuses on the frame's important moving target, cycling through a pattern of pointing at the important target, shifting to surrounding secondary targets, and returning to the important target; the gaze dwells on the important target for noticeably longer periods and on the surrounding secondary targets for noticeably shorter ones. The important moving target extraction unit 102A can therefore determine the moving target that is the important target in each sample frame by recording and analyzing which image regions the annotator's gaze points at while manually reviewing the sample frames. Of course, a human annotator may also mark the important moving target in each scene video frame by hand.
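A hedged sketch of this gaze-based labeling idea: the patent specifies no eye-tracking interface, so the `fixations` input of (target_id, dwell_seconds) pairs is an assumed preprocessing of raw gaze data mapped onto the extracted target regions; the important target is then the one with the longest cumulative dwell time:

```python
from collections import defaultdict

def label_important_target(fixations, min_dwell=1.0):
    """Pick the important target of a sample frame from gaze fixations.

    fixations: iterable of (target_id, dwell_seconds) pairs.
    Returns the target watched longest in total, or None if no target
    accumulates at least min_dwell seconds.
    """
    dwell = defaultdict(float)
    for target_id, seconds in fixations:
        dwell[target_id] += seconds
    if not dwell:
        return None
    best = max(dwell, key=dwell.get)
    return best if dwell[best] >= min_dwell else None
```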
For the important targets manually identified from the sample frames of each scene mode, denoted O_Key, the important moving target classification training unit 102B of the target area intelligent learning unit 102 determines the state value of the important target in each sample frame, denoted Stat(O_Key); as described above, Stat(O_Key) is an array composed of the values of several parameters such as the important target O_Key's size, aspect ratio, frame position coordinates, and inclination in the sample frame. The Stat(O_Key) arrays of all sample frames of each scene mode are fed into an SVM classifier for training. After training, the target area intelligent learning unit 102 holds several trained important-target recognition SVM classifiers, each of which classifies, for one scene mode, whether a moving target in a scene video frame is an important target.
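A minimal sketch of training one per-scene-mode important-target classifier with scikit-learn (an assumed library; the patent only specifies an SVM). Positive examples are the Stat(O_Key) arrays of the labeled important targets; treating the remaining moving targets of the same sample frames as negatives is an assumption, since the patent does not state the negative set:

```python
import numpy as np
from sklearn.svm import SVC

def to_vector(stat):
    """Flatten one target's state values into a fixed-length array."""
    return np.array([stat["size"], stat["aspect_ratio"],
                     stat["position"][0], stat["position"][1],
                     stat["inclination"]])

def train_important_target_svm(important_stats, other_stats):
    """Train the important-target recognition SVM for one scene mode."""
    X = np.vstack([to_vector(s) for s in important_stats + other_stats])
    y = np.array([1] * len(important_stats) + [0] * len(other_stats))
    clf = SVC(kernel="rbf")        # binary: important vs. not important
    clf.fit(X, y)
    return clf
```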
Furthermore, for all the sample frames of each scene mode, the scene mode classification training unit 102C of the target area intelligent learning unit 102 extracts all the moving targets contained in each sample frame and each target's state value, obtaining a target state-value list {Stat(O_1), Stat(O_2), …, Stat(O_{n-1}), Stat(O_n), …, Stat(O_k)} composed of the state-value arrays of every moving target in the sample frame. The target area intelligent learning unit 102 then feeds the target state lists of all sample frames of each scene mode into an SVM classifier to train recognition of that scene mode. After training, the target area intelligent learning unit 102 holds several trained scene mode recognition SVM classifiers, each of which can use the state values of the moving targets in a scene video frame to classify whether that frame belongs to the scene mode corresponding to that classifier.
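A sketch of the scene-mode classifiers under the same assumptions, reusing `to_vector` from the previous sketch. An SVM needs fixed-length inputs while frames contain varying numbers of targets, so the per-target vectors are padded or truncated to a fixed count here; that representation is an assumption the patent leaves open:

```python
import numpy as np
from sklearn.svm import SVC

STATE_DIM = 5                      # length of one per-target state vector

def to_frame_vector(target_stats, max_targets=16):
    """Concatenate per-target state vectors into one fixed-length vector."""
    vecs = [to_vector(s) for s in target_stats[:max_targets]]
    padded = np.zeros(max_targets * STATE_DIM)
    if vecs:
        flat = np.concatenate(vecs)
        padded[:len(flat)] = flat
    return padded

def train_scene_mode_svm(frames_in_mode, frames_out_of_mode):
    """Train one one-vs-rest scene mode recognition SVM."""
    X = np.vstack([to_frame_vector(f)
                   for f in frames_in_mode + frames_out_of_mode])
    y = np.array([1] * len(frames_in_mode) + [0] * len(frames_out_of_mode))
    clf = SVC(kernel="rbf")
    clf.fit(X, y)
    return clf
```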
The tracking target extraction unit 103 substitutes the moving-target state-value list of the current scene video frame captured by the pan-tilt's camera into each scene mode recognition SVM classifier trained by the scene mode classification training unit 102C, thereby determining which scene mode the current frame belongs to; the tracking target extraction unit 103 then substitutes the state value of each moving target in that frame into the important-target recognition SVM classifier, trained by the important moving target classification training unit 102B, corresponding to that scene mode, thereby identifying whether each moving target is an important target. When the classification result indicates that a moving target in the current frame is an important target, the tracking target extraction unit 103 provides that target's frame position coordinates to the pan-tilt drive unit 104. Specifically, as shown in FIG. 3, the tracking target extraction unit 103 obtains the current scene video frame and its scene mode tag from the scene mode recognition unit 101 and judges whether the current frame belongs to the same scene mode as the preceding frame; if not, the scene mode determination unit 103A of the tracking target extraction unit 103 obtains from the scene mode recognition unit 101 all moving targets of the current frame and each target's state value, forming {Stat(O_1), Stat(O_2), …, Stat(O_{n-1}), Stat(O_n), …, Stat(O_k)} as the moving-target state-value list, and substitutes this list into each trained scene mode recognition SVM classifier; each classifier outputs a binary result of "yes" or "no", where a classifier outputting "yes" indicates that the current frame belongs to the scene mode corresponding to that classifier. The key target determination unit 103B of the tracking target extraction unit 103 then obtains from the scene mode recognition unit 101 the state values Stat(O_1), Stat(O_2), …, Stat(O_{n-1}), Stat(O_n), …, Stat(O_k) of every moving target in the current frame and substitutes each target's state value into the important-target recognition SVM classifier of the determined scene mode; that classifier outputs "yes" or "no" for each input state value, where "yes" indicates that the moving target corresponding to that state value is an important target of the current frame, and the key target determination unit 103B provides that target's frame position coordinates to the pan-tilt drive unit 104.
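A hedged end-to-end inference sketch tying the pieces above together; `scene_mode_svms` and `important_target_svms` are assumed to be dictionaries of the trained classifiers keyed by scene mode tag:

```python
def extract_tracking_target(target_stats, scene_mode_svms,
                            important_target_svms):
    """Return the frame position of an important target, or None.

    target_stats: state-value dicts of every moving target in the frame.
    """
    frame_vec = to_frame_vector(target_stats).reshape(1, -1)
    # Scene mode determination: the classifier answering "yes" wins.
    mode = next((m for m, svm in scene_mode_svms.items()
                 if svm.predict(frame_vec)[0] == 1), None)
    if mode is None:
        return None
    clf = important_target_svms[mode]
    for stat in target_stats:
        if clf.predict(to_vector(stat).reshape(1, -1))[0] == 1:
            return stat["position"]   # coordinates handed to the drive unit
    return None
```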
The pan-tilt drive unit 104 obtains from the tracking target extraction unit 103 the frame position coordinates of the important target in the current scene video frame, and locks onto the important target according to those coordinates. The pan-tilt drive unit 104 analyzes the change of the important target's position across scene video frames; when the important target deviates from the central region of the field of view, it extracts the target's direction and speed of motion, predicts the target's expected position deviation according to that direction and speed, and drives the pan-tilt's rotating mechanism in advance with reference to the prediction, so that the moving object returns to the central region of the field of view as quickly as possible in subsequent scene video frames.
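As a sketch of this drive step under stated assumptions (a pinhole mapping from pixel offset to pan/tilt angles, with `focal_px` and the `rotate` callback standing in for the calibrated optics and the real rotating mechanism), the predicted position can be turned into advance pan/tilt commands:

```python
import math

def drive_pan_tilt(predicted_xy, frame_size, focal_px, rotate, deadband=0.1):
    """Drive the rotating mechanism so the target re-centers in advance.

    predicted_xy: predicted target position in pixels.
    frame_size:   (width, height) of the frame in pixels.
    focal_px:     focal length in pixels, assumed known from calibration.
    rotate:       callback taking relative (pan_deg, tilt_deg) commands.
    deadband:     fraction of the half-frame treated as the central region.
    """
    w, h = frame_size
    dx = predicted_xy[0] - w / 2
    dy = predicted_xy[1] - h / 2
    # Act only when the predicted position leaves the central region.
    if abs(dx) < deadband * w / 2 and abs(dy) < deadband * h / 2:
        return
    pan = math.degrees(math.atan2(dx, focal_px))
    tilt = -math.degrees(math.atan2(dy, focal_px))  # image y grows downward
    rotate(pan, tilt)
```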
As shown in FIG. 4, the present invention further provides an artificial intelligence predictive tracking method applied to a video surveillance pan-tilt. The method includes the following steps.
In the scene mode recognition step, the targets contained in each scene are extracted from the scene video frames captured by the pan-tilt's camera, and each target's state value is quantified; two adjacent scene video frames are compared as to whether they contain the same targets and whether the proportion of shared targets among all targets in each frame is below a first threshold; if the shared-target percentage of at least one of the two frames is below the first threshold, the two frames are judged to belong to different scene modes; if the shared-target percentages of both frames are greater than or equal to the first threshold, the state values of the shared targets in the two frames are used to compute the overall state difference of all shared targets; if the overall state difference is greater than or equal to a second threshold, the two frames are judged to have different scene modes, and if it is below the second threshold, they are judged to have the same scene mode.
Specifically, for all the scene video frames S_1, S_2, …, S_{n-1}, S_n, …, S_m collected within a time interval, consider two adjacent frames S_{n-1} and S_n. After target recognition, the set of all moving targets contained in S_{n-1} is Object_{n-1} = {O_1, O_2, …, O_k}, and the set contained in S_n is Object_n = {O′_1, O′_2, …, O′_m}. The targets shared by S_{n-1} and S_n are determined by taking the shared targets of the two frames as the intersection Object_{n-1} ∩ Object_n of the two sets, and the number of targets in the intersection is compared, as a percentage, against the number of targets in Object_{n-1} and in Object_n respectively; if either percentage is below the first threshold, S_{n-1} and S_n are judged to have different scene modes. If both percentages are greater than or equal to the first threshold, then for each target O_i in the intersection, the absolute difference between its state value in S_{n-1} and its state value in S_n is computed (where a target has several types of state values in S_{n-1} and S_n, such as size, aspect ratio, frame position coordinates, and inclination, the absolute difference is obtained by scoring the difference of each type of state value), and the absolute state-value differences of all targets in the intersection are summed with weights as the overall state difference:

DIFF(S_n, S_{n-1}) = Σ_i α_i · |Stat_{S_{n-1}}(O_i) − Stat_{S_n}(O_i)|

where DIFF(S_n, S_{n-1}) denotes the overall state difference between S_{n-1} and S_n, Stat_{S_{n-1}}(O_i) and Stat_{S_n}(O_i) denote the state values, in S_{n-1} and S_n respectively, of a target O_i shared by the two frames, and α_i denotes the weighted summation coefficient. Different targets contribute to different degrees to the overall state difference between scenes, and the weighting coefficients express this: targets closest to the center of the foreground field of view receive the largest weight, non-central foreground targets receive a smaller weight than central foreground targets, and background targets receive the smallest weight. If DIFF(S_n, S_{n-1}) is greater than or equal to the second threshold, S_{n-1} and S_n are judged to have different scene modes; if it is below the second threshold, they are judged to have the same scene mode. In this way, every scene video frame S_1, S_2, …, S_m collected by the camera within this viewing interval is labeled with a scene mode tag: frames judged by the above algorithm to belong to the same scene mode receive the same tag, and frames judged to belong to different scene modes receive different tags.
In the target area intelligent learning step, the scene video frames and their scene mode tags are obtained, and artificial intelligence learning methods are applied to a certain number of samples to learn, for each scene mode, the state-value characteristics of the important target with the greatest importance and relevance to the video collection purpose; these state-value characteristics are used to train the important-target recognition SVM classifiers, and the state values of the samples' moving targets are used to train the scene mode recognition SVM classifiers. Specifically, a certain number of scene video frames are extracted as sample frames for each scene mode identified in the scene mode recognition step, and the moving-target regions that should be locked and tracked in these sample frames are identified manually, determining the moving target in each sample frame that is an important target; the state value of that important target in the sample frame is determined and denoted Stat(O_Key), which, as described above, is an array composed of the values of several parameters such as the important target O_Key's size, aspect ratio, frame position coordinates, and inclination in the sample frame; the Stat(O_Key) arrays of all sample frames of each scene mode are fed into an SVM classifier for training, yielding several trained important-target recognition SVM classifiers; for all sample frames of each scene mode, all moving targets and each target's state value are extracted to obtain a target state-value list {Stat(O_1), Stat(O_2), …, Stat(O_{n-1}), Stat(O_n), …, Stat(O_k)}, composed of the state-value arrays of every moving target in the sample frame; and the target state lists of all sample frames of each scene mode are fed into an SVM classifier to train recognition of each scene mode, yielding several trained scene mode recognition SVM classifiers.
In the tracking target extraction step, the moving-target state-value list of the current scene video frame is substituted into each trained scene mode recognition SVM classifier to determine which scene mode the current frame belongs to; the state value of each moving target in that frame is then substituted into the trained important-target recognition SVM classifier corresponding to that scene mode, thereby identifying whether each moving target is an important target; when the classification result indicates that a moving target in the current frame is an important target, that target's frame position coordinates are applied to the pan-tilt drive.
In the pan-tilt drive step, the frame position coordinates of the important target in the current scene video frame are obtained, the important target is locked onto according to those coordinates, and the change of the important target's position across scene video frames is analyzed; when the important target deviates from the central region of the field of view, the target's direction and speed of motion are extracted, the target's expected position deviation is predicted according to that direction and speed, and the pan-tilt's rotating mechanism is driven in advance with reference to the prediction, so that the moving object returns to the central region of the field of view as quickly as possible in subsequent scene video frames.
The above description is merely a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features; it also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example technical solutions formed by replacing the above features with technical features of similar function disclosed in (but not limited to) this application.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810348195.7A CN108921001B (en) | 2018-04-18 | 2018-04-18 | A video surveillance pan-tilt using artificial intelligence predictive tracking and its tracking method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810348195.7A CN108921001B (en) | 2018-04-18 | 2018-04-18 | A video surveillance pan-tilt using artificial intelligence predictive tracking and its tracking method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN108921001A CN108921001A (en) | 2018-11-30 |
| CN108921001B true CN108921001B (en) | 2019-07-02 |
Family
ID=64403677
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810348195.7A Active CN108921001B (en) | 2018-04-18 | 2018-04-18 | A video surveillance pan-tilt using artificial intelligence predictive tracking and its tracking method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN108921001B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11335112B2 (en) | 2020-04-27 | 2022-05-17 | Adernco Inc. | Systems and methods for identifying a unified entity from a plurality of discrete parts |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111277745B * | 2018-12-04 | 2023-12-05 | Beijing Qihoo Technology Co., Ltd. | Target person tracking method and apparatus, electronic device, and readable storage medium |
| CN109727268A (en) * | 2018-12-29 | 2019-05-07 | 西安天和防务技术股份有限公司 | Method for tracking target, device, computer equipment and storage medium |
| CN109995756B (en) * | 2019-02-26 | 2022-02-01 | 西安电子科技大学 | Online single-classification active machine learning method for information system intrusion detection |
| CN110633648B (en) * | 2019-08-21 | 2020-09-11 | 重庆特斯联智慧科技股份有限公司 | Face recognition method and system in natural walking state |
| CN111010546A (en) * | 2019-12-20 | 2020-04-14 | 浙江大华技术股份有限公司 | Method and device for adjusting monitoring preset point and storage medium |
| US11403734B2 (en) | 2020-01-07 | 2022-08-02 | Ademco Inc. | Systems and methods for converting low resolution images into high resolution images |
| US11978328B2 (en) | 2020-04-28 | 2024-05-07 | Ademco Inc. | Systems and methods for identifying user-customized relevant individuals in an ambient image at a doorbell device |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103024344A (en) * | 2011-09-20 | 2013-04-03 | 佳都新太科技股份有限公司 | Automatic PTZ (Pan/Tilt/Zoom) target tracking method based on particle filter |
| CN102883175B (en) * | 2012-10-23 | 2015-06-17 | 青岛海信信芯科技有限公司 | Methods for extracting depth map, judging video scene change and optimizing edge of depth map |
| US10424341B2 (en) * | 2014-11-12 | 2019-09-24 | Massachusetts Institute Of Technology | Dynamic video summarization |
| CN107105207A (en) * | 2017-06-09 | 2017-08-29 | 北京深瞐科技有限公司 | Target monitoring method, target monitoring device and video camera |
- 2018-04-18 CN CN201810348195.7A patent/CN108921001B/en active Active
Also Published As
| Publication number | Publication date |
|---|---|
| CN108921001A (en) | 2018-11-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN108921001B (en) | A video surveillance pan-tilt using artificial intelligence predictive tracking and its tracking method | |
| CN101427263B (en) | Method and apparatus for selective rejection of digital images | |
| Hossain et al. | Crowd counting using scale-aware attention networks | |
| CN103761514B (en) | The system and method for recognition of face is realized based on wide-angle gunlock and many ball machines | |
| US10515471B2 (en) | Apparatus and method for generating best-view image centered on object of interest in multiple camera images | |
| CN105611230B (en) | Image processing apparatus and image processing method | |
| Wheeler et al. | Face recognition at a distance system for surveillance applications | |
| TWI382762B (en) | Method for tracking moving object | |
| CN108111818A (en) | Moving target active perception method and apparatus based on multiple-camera collaboration | |
| CN103310187B (en) | Face image prioritization based on face quality analysis | |
| CN103905727B (en) | Object area tracking apparatus, control method, and program of the same | |
| KR101441333B1 (en) | Human body part detecting device and method thereof | |
| CN105930822A (en) | Human face snapshot method and system | |
| CN109583373B (en) | Pedestrian re-identification implementation method | |
| KR20150021526A (en) | Self learning face recognition using depth based tracking for database generation and update | |
| US10366482B2 (en) | Method and system for automated video image focus change detection and classification | |
| TWI601425B (en) | A method for tracing an object by linking video sequences | |
| US20170201723A1 (en) | Method of providing object image based on object tracking | |
| GB2409030A (en) | Face detection | |
| JP2021071794A (en) | Main subject determination device, imaging device, main subject determination method, and program | |
| CN109905641B (en) | Target monitoring method, device, equipment and system | |
| CN110633648B (en) | Face recognition method and system in natural walking state | |
| CN114120165A (en) | Gun and ball linked target tracking method and device, electronic device and storage medium | |
| CN116758474A (en) | Stranger stay detection method, device, equipment and storage medium | |
| CN113382304B (en) | Video stitching method based on artificial intelligence technology |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant | ||
| CP03 | Change of name, title or address | ||
| CP03 | Change of name, title or address |
Address after: 101100 rooms 1-6, building 1, courtyard 3, binhuibei 1st Street, Tongzhou District, Beijing
Patentee after: Teslan Technology Group Co.,Ltd.
Country or region after: China
Address before: 100027 11 Floor West Tower of Qihao Building, 8 Xinyuan South Road, Chaoyang District, Beijing
Patentee before: TERMINUS (BEIJING) TECHNOLOGY Co.,Ltd.
Country or region before: China
| TR01 | Transfer of patent right | ||
| TR01 | Transfer of patent right |
Effective date of registration: 20250626
Address after: 266011 No. 8 Ruijing Road, Shibei District, Qingdao City, Shandong Province
Patentee after: Guangte Haizhi Marine Technology (Qingdao) Co.,Ltd.
Country or region after: China
Address before: 101100 rooms 1-6, building 1, courtyard 3, binhuibei 1st Street, Tongzhou District, Beijing
Patentee before: Teslan Technology Group Co.,Ltd.
Country or region before: China