
CN112907626B - Moving target extraction method based on satellite super-time phase data multi-source information - Google Patents


Info

Publication number
CN112907626B
Authority
CN
China
Prior art keywords
image
target
moving
images
moving target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110172481.4A
Other languages
Chinese (zh)
Other versions
CN112907626A (en)
Inventor
鹿明
李峰
辛蕾
杨雪
鲁啸天
张南
任志聪
肖化超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Space Technology CAST
Original Assignee
China Academy of Space Technology CAST
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Space Technology CAST filed Critical China Academy of Space Technology CAST
Priority to CN202110172481.4A priority Critical patent/CN112907626B/en
Publication of CN112907626A publication Critical patent/CN112907626A/en
Application granted granted Critical
Publication of CN112907626B publication Critical patent/CN112907626B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/207: Analysis of motion for motion estimation over a hierarchy of resolutions
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10016: Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a moving target extraction method based on multi-source information of satellite super-time-phase data, comprising the steps of: a. collecting super-time-phase images of an area and preprocessing the images; b. initially extracting the moving targets in the images and extracting the road targets on which the moving targets depend; c. completing further extraction of the moving targets in the image according to the initially extracted moving targets and road targets; d. performing morphological processing on the target result image extracted in step c to obtain the final result. On the basis of the spectral, textural and time-series characteristics of video-satellite data, the invention extracts the spatial geometry, motion speed, dependent road environment and other information of the moving target and uses all of this information jointly for detection, thereby avoiding false moving targets caused by parallax as well as those caused by registration errors and random noise, effectively improving the accuracy of moving target detection and reducing the false detection rate.

Description

Moving target extraction method based on satellite super-time phase data multi-source information
Technical Field
The invention relates to a moving target extraction method based on multi-source information of satellite super-time-phase data.
Background
Moving target detection is a technology formed by the cross-fusion of computer vision, remote sensing image processing, artificial intelligence and related fields, and is an important research topic in situation awareness. It senses not only the existence of objects but also their dynamic trends, and plays an extremely important role in real-time reconnaissance, real-time monitoring and real-time control in both military and civil fields. Video satellites are a novel Earth-observation technology that can capture continuous images from a moving satellite platform, providing a reliable data source for moving target detection and situation awareness. Large-area observation, high spatial resolution and continuous video imaging enable video satellites to rapidly acquire real-time dynamic information about the Earth's surface.
Because of the relative motion between the satellite platform and the ground surface, and factors such as terrain relief, different video frames exhibit translation, rotation, distortion, stretching and similar deformations. As a result, tall objects such as high-rise buildings and towers can exhibit significant spurious motion characteristics. If traditional moving target detection algorithms such as the inter-frame difference method, background modeling or the optical flow method are applied directly to video-satellite data, these tall objects may be misjudged as moving targets. Moreover, pseudo-motion of tall targets, caused by parallax changes at ground-object edges under image translation, is difficult to handle by simple improvements to the traditional methods.
Disclosure of Invention
The invention aims to provide a moving target extraction method based on multi-source information of satellite super-time-phase data.
To achieve the above object, the present invention provides a moving target extraction method based on multi-source information of satellite super-time-phase data, comprising the following steps:
a. collecting super-time-phase images of an area, and preprocessing the images;
b. initially extracting the moving targets in the image, and extracting the road targets on which the moving targets depend;
c. completing further extraction of the moving targets in the image according to the initially extracted moving targets and road targets;
d. performing morphological processing on the target result image extracted in step c to obtain the final result.
According to one aspect of the present invention, in step a, a current frame image is collected together with one frame each from a period before and after the current frame, forming a three-frame sequence of previous frame, current frame and subsequent frame;
the acquisition interval of the three frames is determined jointly by the speed and length of the moving target and the frame rate of the video, and the motion amplitude of the moving target between the extracted adjacent frames is between 10 m and 100 m.
According to one aspect of the present invention, the preprocessing in step a is inter-frame registration, with the current frame as reference, of the frames extracted before and after it;
the inter-frame registration comprises reading the current frame and the frame before or after it into an array, and performing keypoint detection and feature description on the two frames using the SIFT, SURF, ORB or AKAZE algorithm;
feature matching is performed on the keypoints of the two frames with a matcher; the matching method is to compute the descriptor distance between each pair of keypoints and return the minimum distance among the k best matches for each keypoint;
the homography matrix of the transformation between the two images is computed from the matched point pairs, the frame before or after the current frame is warped accordingly, and the RANSAC algorithm is used to remove outlier point pairs during warping.
According to one aspect of the present invention, in step b, the moving targets in the image are initially extracted based on the speed attribute and the time-series attribute of the moving targets, respectively;
when extracting moving targets based on the speed attribute, the optical flow method is used to perform a dense optical flow solution on the current frame and the frame before it, obtaining the optical flow state of each pixel; pixels whose speed and direction are both unchanged are background, otherwise they are foreground targets;
when extracting moving targets based on the time-series attribute, the three-frame difference method is used, in combination with the motion characteristics of the moving target in the time series, to preliminarily divide foreground and background targets;
the road targets on which the moving targets depend are extracted from the image using the deep-learning-based D_LinkNet network.
According to one aspect of the invention, when extracting moving targets based on the speed attribute, the previous frame and the current frame requiring optical flow computation are input in sequence, and a pyramid is constructed for each image with a specified image ratio;
the number of pyramid layers, the averaging window size, the number of algorithm iterations at each pyramid level, the number of neighboring pixels used for the polynomial expansion at each pixel, the Gaussian standard deviation for smoothing derivatives, and the initial flow approximation are determined;
the computed optical flow is converted from the Cartesian coordinate system to the polar coordinate system, and the speed and direction of each pixel are obtained;
according to the speed and direction of the optical flow solution, pixels whose speed and direction values are both 0 represent background targets; otherwise, they represent foreground targets.
According to one aspect of the invention, when the moving object is extracted based on the time sequence attribute of the moving object in the step b, all the acquired three frames of images are read, and the images are converted into gray level images from RGB images;
Respectively carrying out inter-frame difference on the gray level image of the current frame and the gray level images of the frames before and after the current frame to obtain two difference values;
setting a threshold value to respectively binarize the two difference images to obtain two binary images which distinguish foreground objects and background objects;
and performing AND operation on the two binary images, and extracting a moving target from the intersected images.
According to one aspect of the invention, when the road target on which the moving target depends is extracted in the step b, firstly, a sample data set for network training and testing is constructed and used for generating a road extraction network;
selecting remote sensing images of some target satellites, and marking road targets and other targets in the images;
dividing a data set into a training set, a verification set and a test set according to a certain proportion, wherein the training set is used for carrying out iterative training on network parameters, and the verification set is used for verifying whether a trained model can reach expected precision;
And taking the current frame image as a test data set, inputting the test data set into a trained road extraction network, and obtaining a final road segmentation result.
According to one aspect of the invention, the road extraction network is a U-shaped network based on an encoder-bridge-decoder structure;
the encoder part of the network is a ResNet34 and the bridge part is five convolution blocks;
the decoder part is the inverse of the encoder part, and it upsamples and is superimposed with the encoder part at the same level.
According to one aspect of the present invention, in step c, the three types of results extracted in step b are each stored in binary form as 0 and 1, where 0 represents a background target and 1 represents foreground;
targets whose stored values are jointly 1 are extracted as the precise extraction result of the moving targets.
According to one aspect of the present invention, the morphological processing in step d comprises applying a morphological opening operation with a 3×3 circular structuring element to the final target-extraction image, to eliminate spots in the image;
a morphological closing operation is then applied with a 3×3 circular structuring element to the opened image, to eliminate holes in the image;
the method further comprises performing connectivity analysis on the morphologically processed image and extracting the final moving targets according to the following rule:
a target is determined to be a vehicle when its size is between 4 and 2000 pixels, its aspect ratio is below 8, the ratio of the target area to the minimum circumscribed rectangle area is greater than 0.2, and the average pixel value of the target is between 10 and 250.
According to the invention, on the basis of the spectral, textural and time-series characteristics of video-satellite data, the spatial geometry, motion speed, dependent road environment and other information of the moving target are extracted and used jointly for detection, avoiding the influence of false moving targets caused by parallax as well as those caused by registration errors and random noise, effectively improving the accuracy of moving target detection and reducing the false detection rate.
According to one aspect of the present invention, the road targets on which moving targets depend are extracted from the image using a deep learning algorithm. Objects with pseudo-motion characteristics that were mistakenly treated as moving targets in the initial extraction can thereby be screened out, so that the moving targets of the video satellite are extracted more accurately, overcoming the defect in the prior art.
According to one scheme of the invention, the optical flow method and the improved three-frame difference method are used to extract the moving targets in the image separately, and the two extraction results are then overlaid with the road-target extraction result, so that the two methods correct each other and the accuracy of the final moving target extraction is further improved.
According to one aspect of the present invention, the captured images are preprocessed before the moving targets and road targets are extracted. The preprocessing mainly comprises inter-frame registration of the two frames before and after the current frame, to prevent image distortion caused by factors such as satellite jitter from affecting subsequent target extraction.
Drawings
FIG. 1 schematically illustrates a flow chart of a moving target extraction method based on multi-source information of satellite super-time-phase data according to one embodiment of the present invention;
FIG. 2 schematically illustrates a diagram of a best matching keypoint pair of a current frame and a previous frame in an image registration process;
FIG. 3 schematically illustrates a schematic representation of the velocity of a moving object acquired by an optical flow method;
FIG. 4 schematically illustrates a moving target obtained by the improved three-frame difference method;
FIG. 5 schematically illustrates a road target (road) extracted by the deep-learning-based method;
fig. 6 schematically shows the current frame (left) and the final object extraction performed on the current frame (right).
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
The present invention will be described in detail below with reference to the drawings and specific embodiments; the embodiments of the present invention are not limited to the following examples.
Referring to fig. 1, in the moving target extraction method based on multi-source information of satellite super-time-phase data, super-time-phase images (also called multi-temporal images) of a certain area are first collected; a super-time-phase image is image data with a continuous time sequence, which can be understood as video data. The collected images are then preprocessed by a data preprocessing module (which can also complete the image reading), after which a target information extraction module performs the preliminary extraction of moving targets in the image. Considering that tall objects may be misjudged as moving targets, according to the concept of the present invention, the road targets on which the moving targets depend are additionally extracted based on the background-environment-dependence attribute of the moving targets. Taking vehicles and roads as an example, a vehicle is the moving target and a road is the road target on which the vehicle depends. The initially extracted moving targets may include both vehicles and high-rise buildings; however, these targets should be extracted as moving targets only when they are present on a road. The invention therefore additionally considers the environment-dependence attribute of the moving target and further refines the extraction on the basis of the initial extraction, eliminating the prior-art defect whereby objects with pseudo-motion characteristics, such as high-rise buildings or towers, are erroneously judged to be moving targets.
The method of the present invention will be described in detail below with reference to fig. 1, taking imagery of the Jilin-1 Video 03 satellite over an airport area in Atlanta as an embodiment.
When collecting images, the collected super-time-phase images comprise a current frame and one frame each collected within a period before and after it, forming a three-frame sequence of previous frame, current frame and subsequent frame. The time interval of data reading is determined jointly by the speed and length of the target to be extracted and the frame rate of the video, so that overlap of the moving target between adjacent frames is avoided as far as possible. In particular, for these three frames, the acquisition interval between adjacent frames should ensure that the motion amplitude of the moving target is between 10 m and 100 m, so that the moving target does not overlap across frames. Specifically, the previous frame, the current frame and the subsequent frame may be read in sequence: after initializing the first two frames, the next frame is read as the frame following the current frame. For this embodiment, the frame rate of the Jilin-1 video satellite is 10 frames/s, the movement speed of ground vehicles is typically 20-160 m/s, and the size of a ground vehicle target is typically 3-15 m. Therefore, in this embodiment, 0.5 s is set as the interval of image acquisition (or image reading), so that most vehicle targets can be detected; only a very small number of very long trucks and very slow vehicles will produce holes in the difference images, and these can be eliminated during post-processing. Through these steps, three frames are obtained whose time interval is determined jointly by the speed of the moving target and the frame rate of the super-time-phase data.
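To make the interval choice concrete, a minimal check (Python; illustrative only, the patent itself gives no code) confirms that a 0.5 s sampling gap keeps the quoted speed range inside the 10-100 m displacement window:

```python
def displacement_m(speed_mps: float, interval_s: float) -> float:
    """Ground distance a target covers between two sampled frames."""
    return speed_mps * interval_s

# Jilin-1 video: 10 frames/s, so a 0.5 s gap means sampling every 5th frame.
# With the quoted ground-vehicle speeds of 20-160 m/s:
assert displacement_m(20, 0.5) == 10.0    # slowest target moves 10 m (lower bound met)
assert displacement_m(160, 0.5) == 80.0   # fastest target moves 80 m (under the 100 m cap)
```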
The image preprocessing of the invention registers (transforms) the two frames extracted before and after the current frame, with the current frame as reference. This prevents image distortion caused by factors such as satellite jitter from affecting subsequent target extraction. Specifically, referring to fig. 2, the current frame and the previous frame are read into an array, and keypoint detection and feature description are performed on both frames using the SIFT, SURF, ORB or AKAZE algorithm. In this embodiment, the AKAZE algorithm is used and 644 keypoints common to the two frames are selected. Once the keypoints of the two frames are identified, a matcher performs feature matching on them: the descriptor distance between each pair of keypoints is computed, and the minimum distance among the k best matches for each keypoint is returned. That is, the distance between the descriptors of each keypoint pair is measured, and the k best-matching keypoint pairs with the smallest distance to each keypoint are obtained. As shown in fig. 2, 515 best-matching keypoint pairs are returned in this embodiment. Then the homography matrix (homography) of the transformation between the two images is computed from the matched point pairs, and the frame before the current frame is warped accordingly. In this embodiment, to ensure the best warping result, the RANSAC algorithm is used to remove outlier point pairs during warping. After warping, a registered previous-frame image is obtained. The current frame and the frame after it are then read into the array, and the same steps yield a registered subsequent-frame image.
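The registration step just described can be sketched with OpenCV as follows. This is a minimal illustration under the stated choices (AKAZE features, descriptor matching with the k best candidates, RANSAC homography); the ratio-test value and reprojection threshold are assumptions, not values from the patent:

```python
import cv2
import numpy as np

def register_to_current(current, other, ratio=0.75):
    """Warp `other` onto `current`: AKAZE keypoints, k-best matching, RANSAC homography."""
    akaze = cv2.AKAZE_create()
    kp_c, des_c = akaze.detectAndCompute(current, None)
    kp_o, des_o = akaze.detectAndCompute(other, None)

    # Hamming distance suits AKAZE's binary descriptors; keep a keypoint's best
    # match only when it is clearly closer than the second best (ratio test).
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(des_o, des_c, k=2)
    good = [m for m, n in pairs if m.distance < ratio * n.distance]

    src = np.float32([kp_o[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_c[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC discards outlier point pairs while estimating the homography.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = current.shape[:2]
    return cv2.warpPerspective(other, H, (w, h))
```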
The above steps complete the inter-frame registration (i.e. preprocessing) of the images, ensuring that objects in the images are not distorted in a way that would affect the subsequent detection and recognition flow. The invention initially extracts the moving targets in two ways: based on the speed attribute and based on the time-series attribute of the moving target. When extracting based on the speed attribute, the optical flow method is used to perform a dense optical flow solution on the current frame and the frame before it, obtaining the optical flow state of each pixel; pixels whose speed and direction are both unchanged are regarded as background, otherwise as foreground targets. In this way, the movement speed of the target is obtained at the pixel level, realizing speed measurement of the target. When extracting based on the time-series attribute, the improved three-frame difference method is used, in combination with the motion characteristics of the target in the time series, to perform a preliminary division into foreground and background targets.
Extracting moving targets with the optical flow method rests on the brightness-constancy assumption and the local-flow-constancy assumption: the brightness of the same point does not change as time passes. This is the assumption of the basic optical flow method, which all optical flow variants must satisfy in order to derive the basic optical flow equation. In addition, temporal change should not cause drastic changes in position, so that the partial derivative of gray level with respect to position can be approximated by the gray-level change caused by a unit displacement between consecutive frames. In this embodiment, the Gunnar Farneback algorithm is used to compute dense optical flow; the method matches all points on the image point-to-point and computes the offset of every point to obtain the optical flow field. The previous frame and the current frame requiring optical flow computation are input in sequence, the image ratio is specified as 0.5, and an (image) pyramid is constructed for each image. The number of pyramid layers is set to 3, the averaging window size to 12, the number of algorithm iterations at each pyramid level to 3, and the number of neighboring pixels used for the polynomial expansion at each pixel to 7. The Gaussian standard deviation for smoothing derivatives, which serves as the basis of the polynomial expansion, is set to 1.5 in this embodiment. In addition, this embodiment uses the input flow as the initial flow approximation, although a Gaussian filter may be used in other embodiments.
After these settings are made, the computed optical flow is converted from the Cartesian coordinate system to the polar coordinate system, and the speed and direction of the moving target at each pixel are obtained. As shown in fig. 3, the brighter a point in the optical flow map, the higher its speed; a threshold can then be set to identify the moving targets in the map. For example, if the speed values are 0, 1, 2, 3, ..., points with values greater than or equal to 1 may be regarded as moving targets, i.e. the threshold may be set to 1. In this way the moving targets are initially extracted according to the speed attribute, namely by extracting targets whose speed is greater than or equal to 1. The direction of a moving target may additionally be characterized by color. According to the speed and direction of the optical flow solution, pixels whose speed and direction values are both 0 represent background targets; otherwise, they represent foreground targets.
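A minimal OpenCV sketch of this dense-optical-flow step, using the parameter values quoted in the embodiment (pyramid scale 0.5, 3 levels, window 12, 3 iterations, poly_n 7, poly_sigma 1.5) and the magnitude threshold of 1 from the text; the function and parameter names are OpenCV's, not the patent's:

```python
import cv2
import numpy as np

def dense_flow_speed(prev_gray, cur_gray):
    """Farneback dense optical flow, then Cartesian-to-polar conversion."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, cur_gray, None,
        pyr_scale=0.5,   # image ratio between pyramid levels
        levels=3,        # number of pyramid layers
        winsize=12,      # averaging window size
        iterations=3,    # iterations per pyramid level
        poly_n=7,        # neighborhood size for the polynomial expansion
        poly_sigma=1.5,  # Gaussian std for smoothing derivatives
        flags=0)         # cv2.OPTFLOW_USE_INITIAL_FLOW would reuse a supplied flow
    # Cartesian -> polar: per-pixel speed (magnitude) and direction (angle).
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    foreground = (mag >= 1).astype(np.uint8)  # speed >= 1 -> moving target
    return mag, ang, foreground
```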
When extracting moving targets with the three-frame difference method, the three frames obtained in the image acquisition step are read and converted from RGB to grayscale, and inter-frame differences are computed between the grayscale current frame and the grayscale frames before and after it, yielding two difference maps. A threshold is then set and each difference map is binarized, giving two binary maps. In this embodiment the threshold is set to 40: pixels above the threshold are set to 1 and pixels below it to 0. The foreground (bright-spot regions) and background (black regions) are thus distinguished, giving the moving targets extracted separately from the current frame and from the frames before and after it. The two binary maps are then combined with an AND (intersection) operation, which can be understood as taking the intersection of the two binary maps (also called overlay analysis): a point that is 1 in both maps is a moving target (bright spot). The moving targets in the image can thus be initially extracted from the intersected binary map, as shown in fig. 4.
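The improved three-frame difference step reduces to a few OpenCV calls; a sketch with the threshold of 40 used in this embodiment:

```python
import cv2

def three_frame_difference(prev_gray, cur_gray, next_gray, thresh=40):
    """AND of the two binarized difference maps: 1 = moving foreground."""
    d1 = cv2.absdiff(cur_gray, prev_gray)   # current vs previous frame
    d2 = cv2.absdiff(next_gray, cur_gray)   # next vs current frame
    _, b1 = cv2.threshold(d1, thresh, 1, cv2.THRESH_BINARY)
    _, b2 = cv2.threshold(d2, thresh, 1, cv2.THRESH_BINARY)
    return cv2.bitwise_and(b1, b2)          # intersection of the two binary maps
```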
The moving targets in the image have now been initially extracted according to their speed attribute and time-series attribute by the optical flow method and the three-frame difference method, respectively. According to the concept of the invention described above, the road targets on which the moving targets depend must also be extracted. For this, the invention uses a deep-learning-based target (road) recognition method to extract the environment elements on which the targets depend (i.e. the road targets); extracting the environment on which ground moving targets depend mainly means extracting road targets from high-resolution satellite remote sensing images. The invention adopts the D_LinkNet network for road extraction, so a road extraction network must first be constructed. Since this embodiment takes the vehicle as the moving target and the road as the road target, the road extraction network is a U-shaped network based on the encoder-bridge-decoder structure. The encoder part of the network is a ResNet34 and the bridge part is five (ordinary) convolution blocks; the decoder part is the inverse of the encoder part and uses upsampling with superposition onto the encoder part at the same level, realizing feature fusion across different spatial scales. To construct the road extraction network, a sample dataset for network training and testing is first used to generate the network. Specifically, a representative portion of remote sensing images from target satellites is selected in a targeted manner. Representative data may be selected by gray-level attributes, for example the typical gray values of road targets at different historical times; a dataset built from such images ensures that the trained model can recognize various types of road targets at all times. The characteristics of the target can of course also be captured through other attributes such as space and spectrum, to facilitate model recognition. After the representative images are selected, the road targets and other targets in them are annotated, and the dataset is divided in a certain proportion into a training set, a validation set and a test set; the training set is used to iteratively train the network parameters, and the validation set to verify whether the trained model reaches the expected accuracy. The training set is input into the road extraction network and the parameters are trained iteratively until the network reaches ideal loss and accuracy on the training and validation sets. Finally, the image frame of the video satellite (i.e. the current frame) is taken as the test dataset and input into the trained network, and the final road segmentation result is output, namely the road map (extraction result) corresponding to the current frame, as shown in fig. 5.
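The encoder-bridge-decoder structure described above can be sketched in PyTorch roughly as follows. This is a structural illustration only: it follows the text's ResNet34 encoder, five-convolution-block bridge and decoder that upsamples and fuses with the same-level encoder stage, but it simplifies the bridge to a cascade of dilated convolutions and makes no claim to reproduce the D_LinkNet reference implementation:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

class DecoderBlock(nn.Module):
    """LinkNet-style decoder: 1x1 reduce -> transposed-conv upsample -> 1x1 expand."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        mid = in_ch // 4
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(mid, mid, 3, stride=2, padding=1, output_padding=1),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, out_ch, 1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class DLinkNet34Sketch(nn.Module):
    def __init__(self):
        super().__init__()
        r = resnet34(weights=None)                      # encoder: ResNet34
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.enc1, self.enc2 = r.layer1, r.layer2
        self.enc3, self.enc4 = r.layer3, r.layer4
        # Bridge: five convolution blocks; dilation enlarges the receptive field
        # without losing resolution (a simplification of D_LinkNet's bridge).
        self.bridge = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(512, 512, 3, padding=d, dilation=d),
                          nn.ReLU(inplace=True))
            for d in (1, 2, 4, 8, 16)])
        self.dec4 = DecoderBlock(512, 256)
        self.dec3 = DecoderBlock(256, 128)
        self.dec2 = DecoderBlock(128, 64)
        self.dec1 = DecoderBlock(64, 64)
        self.head = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x):
        x = self.stem(x)
        e1 = self.enc1(x); e2 = self.enc2(e1)
        e3 = self.enc3(e2); e4 = self.enc4(e3)
        b = self.bridge(e4)
        d4 = self.dec4(b) + e3    # same-level skip fusion across spatial scales
        d3 = self.dec3(d4) + e2
        d2 = self.dec2(d3) + e1
        d1 = self.dec1(d2)
        return torch.sigmoid(self.head(d1))   # per-pixel road probability
```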
Thus, the moving targets extracted in the two ways and the road target obtained with the deep learning algorithm are available from the above steps. Overlay analysis is then performed on the three extraction results, i.e. the final extraction of the moving targets is completed jointly from the three. Specifically, the three types of results (the moving targets extracted by the optical flow method and by the three-frame difference method, and the extraction result of the road targets extracted by deep learning) are stored in binary form as 0 and 1 to form binary maps, a moving target being stored as 1 (a bright spot); that is, 0 represents a background target and 1 represents foreground. Targets whose stored values are jointly 1 are then extracted from the current frame image, giving the precise (further) extraction result of the moving targets obtained by fusing the multi-source information attributes.
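The overlay analysis itself is a per-pixel AND of the three binary maps; a one-function sketch (NumPy; names illustrative):

```python
import numpy as np

def fuse_masks(flow_mask, diff_mask, road_mask):
    """Keep a pixel only where all three binary maps agree (value 1)."""
    return ((flow_mask == 1) & (diff_mask == 1) & (road_mask == 1)).astype(np.uint8)
```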
In essence, therefore, the invention extracts moving targets from the images with the optical flow method and the three-frame difference method respectively, that is, it integrates the target speed attribute obtained by the optical flow method with the foreground targets obtained by the three-frame difference method. Each method has its strengths and weaknesses; combining them lets them correct each other, and together with consideration of the environment attribute on which the moving target depends, the most accurate moving target extraction result can be obtained. Although the final extraction of the moving targets is at this point essentially complete, the extraction result is only preliminary and still needs to be refined by a post-processing step.
Specifically, as shown in fig. 6, a data post-processing module performs morphological processing on the target-extraction image, eliminating the influence of small spots and small holes (i.e. random noise and image registration error) on the integrity of the moving target (vehicle) and producing the final result. A 3×3 circular structuring element is used to apply a morphological opening operation to the image, eliminating small isolated points (spots); the same 3×3 circular structuring element is then used to apply a morphological closing operation to the opened image, removing the influence of small holes on the completeness of the vehicle. Connectivity analysis is also performed on the result, and the desired moving targets are extracted according to rules that can be formulated from the target's size, aspect ratio, the ratio of the target area to its minimum circumscribed rectangle, and so on. For a vehicle target, for example, only moving targets whose size is greater than 4 pixels and less than 2000 pixels, whose aspect ratio is less than or equal to 8, whose ratio of target area to minimum circumscribed rectangle area is greater than 0.2, and whose average pixel value is between 10 and 250 are regarded as vehicle targets. After morphological analysis and connectivity analysis, the result map is saved, giving the current frame image marked with the final moving target extraction result.
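A sketch of this post-processing stage with OpenCV. The elliptical 3×3 structuring element stands in for the circular template, the axis-aligned bounding box approximates the minimum circumscribed rectangle, and the length-width ratio is taken from the box sides; these are implementation assumptions around the rules quoted in the text:

```python
import cv2
import numpy as np

def postprocess(mask, gray):
    """Morphological open/close with a 3x3 element, then rule-based filtering."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove isolated spots
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes

    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    out = np.zeros_like(mask)
    for i in range(1, n):                   # label 0 is the background
        x, y, w, h, area = stats[i]
        aspect = max(w, h) / max(1, min(w, h))
        rect_ratio = area / float(w * h)    # target area vs bounding-box area
        mean_val = gray[labels == i].mean()
        if (4 <= area <= 2000 and aspect <= 8
                and rect_ratio > 0.2 and 10 <= mean_val <= 250):
            out[labels == i] = 1            # accept as a vehicle target
    return out
```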
In summary, the method addresses moving target detection based on super-time-phase data. On the basis of traditional inter-frame difference and optical flow methods, it integrates inter-frame registration of super-time-phase data, optical flow computation, three-frame differencing, deep-learning-based extraction of the environment on which targets depend, and post-processing, thereby comprehensively exploiting the speed attribute, time-series attribute, spectral attribute and spatial-position attribute of ground objects, perceiving the multi-dimensional attributes of the target from different angles, and forming a complete moving target detection method for satellite super-time-phase data. Compared with traditional moving target detection methods such as inter-frame differencing, optical flow and background modeling, the method has higher extraction accuracy, can perform more precise moving target detection, and is more practical in satellite super-time-phase data applications. It thus resolves the low detection accuracy and high false-detection rate that arise when current moving target detection algorithms are applied to satellite video data, achieving more accurate detection of moving targets. The method can be used for research on detection and tracking of moving targets (ground moving vehicle targets) in space-based satellite super-time-phase data, has very good universality and high extraction accuracy, is of great significance for research on traffic flow analysis based on video satellites and on positioning and tracking of ground moving targets, and can be widely applied in intelligent transportation, smart cities, emergency rescue and other fields.
The above description is only one embodiment of the present invention and is not intended to limit the present invention, and various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A moving target extraction method based on multi-source information of satellite super-time-phase data, comprising the following steps:
a. collecting super-time-phase images of an area and preprocessing the images;
b. initially extracting the moving targets in the image, and extracting the road targets on which the moving targets depend;
c. performing overlay analysis on the initially extracted moving targets and road targets to jointly complete further extraction of the moving targets in the image;
d. performing morphological processing on the target result image extracted in step c to obtain the final result;
in step b, the moving targets in the image are initially extracted based on the speed attribute and the time-series attribute of the moving targets, respectively;
when extracting moving targets based on the speed attribute, the optical flow method is used to perform a dense optical flow solution on the current frame and the frame before it, obtaining the optical flow state of each pixel; pixels whose speed and direction are both unchanged are background, otherwise they are foreground targets;
when extracting moving targets based on the time-series attribute, the three-frame difference method is used, in combination with the motion characteristics of the moving target in the time series, to preliminarily divide foreground and background targets;
the deep-learning-based D_LinkNet network is used to extract from the image the road targets on which the moving targets depend;
when extracting moving targets based on the speed attribute, the previous frame and the current frame requiring optical flow computation are input in sequence, and a pyramid is constructed for each image with a specified image ratio;
the number of pyramid layers, the averaging window size, the number of algorithm iterations at each pyramid level, the number of neighboring pixels used for the polynomial expansion at each pixel, the Gaussian standard deviation for smoothing derivatives, and the initial flow approximation are determined;
the computed optical flow is converted from the Cartesian coordinate system to the polar coordinate system, and the speed and direction of each pixel are obtained;
according to the speed and direction of the optical flow solution, pixels whose speed and direction values are both 0 represent background targets; otherwise, they represent foreground targets;
the morphological processing in step d comprises applying a morphological opening operation with a 3×3 circular structuring element to the final target-extraction image to eliminate spots in the image, and applying a morphological closing operation with a 3×3 circular structuring element to the opened image to eliminate holes in the image;
the method further comprises performing connectivity analysis on the morphologically processed image and extracting the final moving targets according to the following rule: a target is determined to be a vehicle when its size is between 4 and 2000 pixels, its aspect ratio is below 8, the ratio of the target area to the minimum circumscribed rectangle area is greater than 0.2, and the average pixel value of the target is between 10 and 250.
2. The method according to claim 1, wherein in step a, a current frame image is collected together with one frame each from a period before and after the current frame, forming a three-frame sequence of previous frame, current frame and subsequent frame; the acquisition interval of the three frames is determined jointly by the speed and length of the moving target and the frame rate of the video, and the motion amplitude of the moving target between the extracted adjacent frames is between 10 m and 100 m.
3. The method according to claim 2, wherein the preprocessing in step a is inter-frame registration, with the current frame as reference, of the frames extracted before and after it; the inter-frame registration comprises: reading the current frame and the frame before or after it into an array, and performing keypoint detection and feature description on the two frames using the SIFT, SURF, ORB or AKAZE algorithm; performing feature matching on the keypoints of the two frames with a matcher, the matching method being to compute the descriptor distance between each pair of keypoints and return the minimum distance among the k best matches for each keypoint; computing the homography matrix of the transformation between the two images from the matched point pairs, warping the frame before or after the current frame, and removing outlier point pairs with the RANSAC algorithm during warping.
4. The method according to claim 1, wherein when extracting moving targets based on the time-series attribute in step b, all three collected frames are read and converted from RGB to grayscale; inter-frame differences are computed between the grayscale current frame and the grayscale frames before and after it, giving two difference maps; a threshold is set to binarize the two difference maps, giving two binary maps that distinguish foreground and background targets; an AND operation is applied to the two binary maps, and the moving targets are extracted from the intersected image.
5. The method according to claim 1 or 4, wherein when extracting the road targets on which the moving targets depend in step b, a sample dataset for network training and testing is first constructed to generate the road extraction network; remote sensing images of target satellites are selected, and the road targets and other targets in the images are annotated; the dataset is divided in a certain proportion into a training set, a validation set and a test set, the training set being used to iteratively train the network parameters and the validation set to verify whether the trained model reaches the expected accuracy; the current frame image is used as the test dataset and input into the trained road extraction network to obtain the final road segmentation result.
6. The method according to claim 5, wherein the road extraction network is a U-shaped network based on an encoder-bridge-decoder structure; the encoder part of the network is a ResNet34 and the bridge part is five convolution blocks; the decoder part is the inverse of the encoder part, and it upsamples and is superimposed with the encoder part at the same level.
7. The method according to claim 6, wherein in step c the three types of results extracted in step b are each stored in binary form as 0 and 1, 0 representing a background target and 1 representing foreground; targets whose stored values are jointly 1 are extracted as the precise extraction result of the moving targets.
CN202110172481.4A 2021-02-08 2021-02-08 Moving target extraction method based on satellite super-time phase data multi-source information Active CN112907626B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110172481.4A CN112907626B (en) 2021-02-08 2021-02-08 Moving target extraction method based on satellite super-time phase data multi-source information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110172481.4A CN112907626B (en) 2021-02-08 2021-02-08 Moving target extraction method based on satellite super-time phase data multi-source information

Publications (2)

Publication Number Publication Date
CN112907626A CN112907626A (en) 2021-06-04
CN112907626B true CN112907626B (en) 2025-01-17

Family

ID=76122744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110172481.4A Active CN112907626B (en) 2021-02-08 2021-02-08 Moving target extraction method based on satellite super-time phase data multi-source information

Country Status (1)

Country Link
CN (1) CN112907626B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114866500B (en) * 2022-04-01 2024-04-23 中国卫星海上测控部 Real-time optimization method for multisource remote control commands
CN117236566B (en) * 2023-11-10 2024-02-06 山东顺发重工有限公司 Whole-process visual flange plate package management system
CN117648889B (en) * 2024-01-30 2024-04-26 中国石油集团川庆钻探工程有限公司 Method for measuring velocity of blowout fluid based on interframe difference method
CN118155005B (en) * 2024-05-13 2024-08-02 成都坤舆空间科技有限公司 Ecological restoration map spot matching classification method based on RAFT-Stereo algorithm

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3934113B2 (en) * 2004-02-23 2007-06-20 アジア航測株式会社 Mobile object detection system, mobile object detection apparatus, mobile object detection method, and mobile object detection program
CN100385461C (en) * 2006-06-01 2008-04-30 电子科技大学 A method for detecting moving targets in infrared image sequences under complex background
TWI393074B (en) * 2009-12-10 2013-04-11 Ind Tech Res Inst Apparatus and method for moving object detection
CN102184550B (en) * 2011-05-04 2013-02-13 华中科技大学 Mobile platform ground movement object detection method
JP6617085B2 (en) * 2016-08-31 2019-12-04 株式会社デンソーアイティーラボラトリ Object situation estimation system, object situation estimation apparatus, object situation estimation method, and object situation estimation program
CN106683119B (en) * 2017-01-09 2020-03-13 河北工业大学 Moving vehicle detection method based on aerial video image
CN109102523A (en) * 2018-07-13 2018-12-28 南京理工大学 A kind of moving object detection and tracking

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on moving vehicle recognition technology in video surveillance; Li Qianqian; Information Science and Technology Series; abstract, chapters 3-5 of the main text *

Also Published As

Publication number Publication date
CN112907626A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN112907626B (en) Moving target extraction method based on satellite super-time phase data multi-source information
Li et al. Automatic crack detection and measurement of concrete structure using convolutional encoder-decoder network
CN107154040B (en) A method for detecting cracks in tunnel lining surface images
Zhang et al. Object-oriented shadow detection and removal from urban high-resolution remote sensing images
CN111325769B (en) Target object detection method and device
Tian et al. Building change detection based on satellite stereo imagery and digital surface models
Khoshelham et al. Performance evaluation of automated approaches to building detection in multi-source aerial data
Movaghati et al. Road extraction from satellite images using particle filtering and extended Kalman filtering
CN111079556A (en) Multi-temporal unmanned aerial vehicle video image change area detection and classification method
CN106558072A (en) A kind of method based on SIFT feature registration on remote sensing images is improved
CN115423851B (en) OS-SIFT-based visible light-SAR image registration algorithm
CN109063564A (en) A kind of object variations detection method
CN116385477A (en) Tower image registration method based on image segmentation
CN116503733A (en) Remote sensing image target detection method, device and storage medium
CN111428538B (en) Lane line extraction method, device and equipment
CN119206530B (en) Dynamic target identification method, device, equipment and medium for remote sensing image
Tashlinskii et al. Synthesis of stochastic algorithms for image registration by the criterion of maximum mutual information
CN112686880B (en) Method for detecting abnormity of railway locomotive component
CN113223033A (en) Poultry body temperature detection method, device and medium based on image fusion
CN112381748A (en) Terahertz and visible light image registration method and device based on texture feature points
Xing et al. High-Resolution LiDAR Depth Completion Algorithm Guided by Image Topography Maps
Abraham et al. A fuzzy based automatic bridge detection technique for satellite images
Lu et al. A novel multi-sensor image matching algorithm based on adaptive multiscale structure orientation
Zhou et al. Road detection based on edge feature with GAC model in aerial image
Yang et al. Robust Image Registration via Consistent Topology Sort and Vision Inspection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant