
CN113469041A - Image processing method and device, computer equipment and storage medium


Info

Publication number
CN113469041A
CN113469041A (application CN202110738815.XA)
Authority
CN
China
Prior art keywords
target
image
region
matched
head
Prior art date
Legal status
Pending
Application number
CN202110738815.XA
Other languages
Chinese (zh)
Inventor
吴尧
四建楼
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority application: CN202110738815.XA
Publication: CN113469041A
Related PCT application: PCT/CN2021/130440 (WO2023273102A1)


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image processing method and apparatus, a computer device, and a storage medium. The method includes: acquiring, from a target image, a first image region corresponding to a first preset part of at least one object to be matched, the target image being a frame of image in a video clip; screening, from the at least one object to be matched, a target matching object that matches a target tracking object, based on first image feature information corresponding to a first preset part of the target tracking object and second image feature information corresponding to each first image region; acquiring, from the target image, a second image region corresponding to a second preset part of the target matching object based on the first image region of the target matching object; and performing semantic segmentation on the second image region to obtain a segmentation result of the second preset part.

Description

Image processing method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, a computer device, and a storage medium.
Background
For a video file comprising a plurality of objects, the same target object tends to appear in a plurality of frames of images, so that in some scenes, a need may exist for tracking the target object and continuously segmenting a specific area of the target object. However, most of the existing segmentation techniques perform overall segmentation on a human body of a target object, and for segmentation of a partial region (e.g., a head region) in the target object, segmentation accuracy and efficiency cannot be guaranteed, and a tracking effect on the target object cannot be guaranteed while the segmentation accuracy is guaranteed.
Disclosure of Invention
The embodiment of the disclosure at least provides an image processing method, an image processing device, a computer device and a storage medium, so as to ensure the tracking effect of a target object while ensuring the segmentation precision.
In a first aspect, an embodiment of the present disclosure provides an image processing method, including:
acquiring a face area to be matched corresponding to the face of at least one object to be matched from a target image;
screening a target matching object matched with the target tracking object from the at least one object to be matched based on first image characteristic information corresponding to the face of the target tracking object and second image characteristic information corresponding to each face area to be matched;
acquiring a target head area corresponding to the head of a target matching object from the target image based on a face area to be matched of the target matching object;
and performing semantic segmentation on the target head region to obtain a head segmentation result of the target matching object.
Based on the first image feature information corresponding to the face of the target tracking object and the second image feature information of the face region to be matched corresponding to each object to be matched, the second image feature information that matches the first image feature information can be accurately screened out from the plurality of pieces of second image feature information; that is, the target matching object that matches the target tracking object can be accurately screened out, so that accurate tracking of the target tracking object is achieved. Then, the target head region corresponding to the head is determined from the screened face region to be matched, and only the target head region needs to be semantically segmented, without processing other regions in the target image. This narrows the range of image processing, ensures the accuracy of the image region to be processed, and reduces the amount of computation in the segmentation process because fewer image details need to be processed. In addition, since only the image details of the target head region need to be processed, the segmentation precision of the head is improved and a more accurate segmentation result can be obtained.
In a possible implementation manner, the obtaining, from the target image, a face region to be matched corresponding to a face of at least one object to be matched includes:
performing key point detection on the target image, and determining a target key point corresponding to the face of each object to be matched in at least one object to be matched included in the target image;
and for each object to be matched in at least one object to be matched, determining a face area to be matched corresponding to the object to be matched based on the target key point corresponding to the object to be matched.
By performing key point detection on the target image, the target key points corresponding to the face of each object to be matched can first be determined accurately; then, based on these accurate target key points, the face region to be matched corresponding to the face of each object to be matched can be determined accurately.
In a possible implementation manner, the screening, from the at least one object to be matched, a target matching object matching the target tracking object based on first image feature information corresponding to a face of the target tracking object and second image feature information corresponding to each face region to be matched, includes:
and matching the first image characteristic information corresponding to the face of the target tracking object with the second image characteristic information corresponding to each face area to be matched, and taking the object to be matched corresponding to the second image characteristic information matched with the first image characteristic information as the target matching object.
The image characteristic information of the face of the target tracking object is necessarily the same or similar in different images, so that the second image characteristic information corresponding to the object to be matched is screened by utilizing the first image characteristic information corresponding to the target tracking object, the second image characteristic information matched with the first image characteristic information can be accurately screened, and further, the target matching object can be accurately determined based on the screened second image characteristic information.
In a possible implementation manner, before the obtaining, from the target image, a face region to be matched corresponding to a face of at least one object to be matched, the method further includes:
acquiring a tracking image;
based on the tracking image, determining a target tracking object and first image feature information corresponding to the face of the target tracking object in the tracking image.
By using the tracking image and some preset tracking requirements, the target tracking object and the first image characteristic information can be more accurately determined from the tracking image.
In one possible embodiment, the determining the target tracking object based on the tracking image includes:
performing key point detection on the tracking image, and determining a target key point corresponding to the face of each initial object in at least one initial object included in the tracking image;
and screening the target tracking object from the at least one initial object based on the key point confidence information of the target key point corresponding to the face of each initial object.
The higher the key point confidence, the more accurate the determined target key points. Therefore, based on the key point confidence information of the target key points corresponding to each initial object, the target key points with the highest confidence can be screened out, and the target tracking object determined from these target key points can be used, which effectively improves the probability of successful tracking.
In a possible implementation manner, the acquiring, from the target image, a target head region corresponding to a head of a target matching object based on a face region to be matched of the target matching object includes:
determining second position relation information between the face area to be matched and the target head area based on the first position relation information of the face and the head;
and acquiring a target head area corresponding to the head of the target matching object from the target image based on the second position relation information and the face area to be matched.
Based on the first positional relationship information between the face and the head, the size relationship and positional relationship between the face region to be matched (corresponding to the face) and the target head region (corresponding to the head) can be determined relatively accurately. From the determined size relationship and positional relationship, the conversion relationship between the two image regions, namely the second positional relationship information, can then be determined, so that the target head region can be determined relatively accurately based on the second positional relationship information.
In one possible embodiment, the method further comprises:
and under the condition that the second image characteristic information matched with the first image characteristic information does not exist, determining a new target tracking object based on the key point confidence information of the target key point corresponding to the face of each object to be matched.
Based on the confidence information of the target key points corresponding to the face of each object to be matched, the target key points corresponding to the face with the highest confidence can be determined, and then the new target tracking object with the highest success rate can be screened out by utilizing the target key points.
In a possible implementation, the performing semantic segmentation on the target head region to obtain a head segmentation result of the target matching object includes:
cutting out a region segmentation image corresponding to the target head region from the target image based on the target head region;
extracting the regional characteristic information and the structural information of the regional segmentation image by using a pre-trained deep neural network;
extracting region feature information respectively corresponding to different feature dimensions in the region segmentation image; wherein the plurality of feature dimensions include a first feature dimension and a second feature dimension that are adjacent, the first feature dimension being lower than the second feature dimension; the region feature information corresponding to the first feature dimension is determined based on the region feature information corresponding to the second feature dimension and the structural information of the region feature information corresponding to the second feature dimension; the structural information of the regional characteristic information corresponding to the second characteristic dimension is extracted by using the pre-trained deep neural network;
and performing semantic segmentation on the region segmentation image based on region feature information respectively corresponding to different feature dimensions to obtain a head segmentation result of the target matching object.
The low feature dimension can reflect the depth features of the main body part of the region segmentation image, and the high feature dimension can reflect the depth features of the edge part of the region segmentation image. Therefore, the region feature information of different feature dimensions can completely and accurately reflect the overall depth features of the region segmentation image, and segmenting the head based on the region feature information of different feature dimensions improves the segmentation precision and yields an accurate segmentation result.
In a possible implementation manner, the performing semantic segmentation on the region segmentation image based on the region feature information respectively corresponding to different feature dimensions includes:
for each feature dimension in the different feature dimensions, determining a first semantic prediction result of the region segmentation image under the feature dimension based on region feature information corresponding to the feature dimension;
determining the probability that each pixel point in the region segmentation image is a pixel point corresponding to the head based on a first semantic prediction result of the region segmentation image under each feature dimension;
and performing semantic segmentation on the region segmentation image based on the probability that each pixel point in the region segmentation image is the pixel point corresponding to the head and a preset segmentation probability value.
The first semantic prediction result is used for representing the probability that each pixel point is the pixel point corresponding to the head, the pixel points with lower probability can be screened out by utilizing the preset segmentation probability value, the pixel points with higher probability are reserved, and the head segmentation is carried out by utilizing the probability of multiple dimensions corresponding to the pixel points and the preset segmentation probability value, so that the improvement of the head segmentation precision is facilitated.
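As an illustration of this thresholding step, the following sketch (not part of the disclosure) binarizes a per-pixel head-probability map against a preset segmentation probability value; the probability map, the image size, and the 0.5 threshold are assumptions chosen for the example.

```python
import numpy as np

def segment_by_probability(head_prob: np.ndarray, seg_threshold: float = 0.5) -> np.ndarray:
    """Binarize a per-pixel head-probability map (H, W) into a head mask.

    Pixels whose probability of belonging to the head reaches the preset
    segmentation probability value are kept; the rest are discarded.
    """
    return (head_prob >= seg_threshold).astype(np.uint8)

# Example with a random probability map standing in for the fused prediction.
rng = np.random.default_rng(0)
prob_map = rng.random((128, 128))          # assumed fused head probabilities
head_mask = segment_by_probability(prob_map, seg_threshold=0.5)
print(head_mask.shape, head_mask.dtype, int(head_mask.sum()))
```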
In a possible implementation manner, the determining, based on a first semantic prediction result of the region segmentation image in each feature dimension, a probability that each pixel point in the region segmentation image is a pixel point corresponding to a head portion includes:
performing multiple times of fusion processing according to the sequence of the different feature dimensions from low to high to obtain the probability that each pixel point in the region segmentation image is the pixel point corresponding to the head;
wherein, the ith fusion processing in the multiple fusion processing comprises the following steps:
determining semantic confidence information of a first semantic prediction result under the first feature dimension;
fusing the first semantic prediction result under the first characteristic dimension and the first semantic prediction result under the second characteristic dimension by utilizing semantic confidence information of the first semantic prediction result under the first characteristic dimension to obtain a target semantic prediction result under the second characteristic dimension;
and updating the target semantic prediction result into a first semantic prediction result of a first feature dimension in the (i + 1) th fusion process.
The semantic confidence information can reflect the accuracy of the first semantic prediction result, multiple times of fusion processing are performed according to the sequence of different feature dimensions from low to high, and finally a target semantic prediction result fused with each first semantic prediction result is obtained, so that the deep neural network can generate different attention to the first semantic prediction results of the multiple feature dimensions, and the accuracy of the deep neural network is improved.
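A minimal sketch of this fusion loop is given below. It assumes every feature dimension already provides a first semantic prediction map of the same spatial size, and it uses the lower-dimension prediction itself (interpreted as a probability) as its semantic confidence; that choice is an illustrative assumption, not the disclosed network's exact formulation.

```python
import numpy as np

def fuse_predictions(preds_low_to_high: list[np.ndarray]) -> np.ndarray:
    """Fuse first semantic predictions from low to high feature dimension.

    In the i-th fusion step, the lower (first) dimension's prediction is fused
    with the adjacent higher (second) dimension's prediction, weighted by the
    confidence of the lower-dimension prediction; the fused result becomes the
    lower-dimension prediction of step i + 1.
    """
    fused = preds_low_to_high[0]
    for higher in preds_low_to_high[1:]:
        confidence = fused                      # assumed: probability used as confidence
        fused = confidence * fused + (1.0 - confidence) * higher
    return fused                                # per-pixel head probability

rng = np.random.default_rng(1)
preds = [rng.random((64, 64)) for _ in range(3)]   # three feature dimensions, low to high
head_prob = fuse_predictions(preds)
print(head_prob.shape, float(head_prob.min()), float(head_prob.max()))
```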
In a possible implementation manner, after the obtaining of the head segmentation result of the target matching object, the method further includes:
setting the color value of a pixel point corresponding to the head part in the target image as a first target value based on the segmentation result;
and setting the color value of the pixel point except the head in the target image as a second target value.
In this way, the segmented head can be clearly displayed in the target image.
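The sketch below illustrates this display step, assuming the segmentation result is available as a binary head mask; the specific target values (255 for head pixels, 0 elsewhere) are only an example.

```python
import numpy as np

def render_head_mask(image: np.ndarray, head_mask: np.ndarray,
                     head_value: int = 255, background_value: int = 0) -> np.ndarray:
    """Set head pixels to a first target value and all other pixels to a second one."""
    out = np.full_like(image, background_value)
    out[head_mask.astype(bool)] = head_value
    return out

target_image = np.zeros((128, 128, 3), dtype=np.uint8)   # stand-in target image
mask = np.zeros((128, 128), dtype=np.uint8)
mask[30:90, 40:100] = 1                                   # assumed head segmentation result
display = render_head_mask(target_image, mask)
print(display.shape, int(display.max()))
```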
In a possible implementation manner, after the obtaining of the head segmentation result of the target matching object, the method further includes:
acquiring a special effect processing request aiming at the target head area of the target image;
performing special effect processing on the target head region based on the special effect processing request and the head segmentation result; or
Acquiring color changing information of a hair region in the target head region for the target image;
and performing color changing processing on the hair area based on the color changing information and the head segmentation result.
Therefore, the special effect processing of the target image can be realized, and the target image with different special effects can be obtained.
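As an illustration only, the following sketch recolors the hair region of the target head area by blending a requested color into the pixels selected by the segmentation result; the mask layout, the blending strength, and the helper name recolor_hair are assumptions.

```python
import numpy as np

def recolor_hair(image: np.ndarray, hair_mask: np.ndarray,
                 new_color: tuple[int, int, int], strength: float = 0.6) -> np.ndarray:
    """Blend a requested hair color into the pixels marked as hair by the segmentation."""
    out = image.astype(np.float32)
    color = np.array(new_color, dtype=np.float32)
    sel = hair_mask.astype(bool)
    out[sel] = (1.0 - strength) * out[sel] + strength * color
    return out.astype(np.uint8)

frame = np.full((96, 96, 3), 128, dtype=np.uint8)   # stand-in target image
hair = np.zeros((96, 96), dtype=np.uint8)
hair[10:40, 20:80] = 1                               # assumed hair part of the head segmentation
dyed = recolor_hair(frame, hair, new_color=(180, 60, 200))
print(dyed[20, 30], dyed[80, 30])
```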
In a second aspect, an embodiment of the present disclosure further provides an image processing apparatus, including:
the first acquisition module is used for acquiring a face area to be matched corresponding to the face of at least one object to be matched from a target image;
the matching module is used for screening a target matching object matched with the target tracking object from the at least one object to be matched based on first image characteristic information corresponding to the face of the target tracking object and second image characteristic information corresponding to each face area to be matched;
the second acquisition module is used for acquiring a target head area corresponding to the head of a target matching object from the target image based on a face area to be matched of the target matching object;
and the segmentation module is used for performing semantic segmentation on the target head region to obtain a head segmentation result of the target matching object.
In a possible implementation manner, the first obtaining module is configured to perform keypoint detection on the target image, and determine a target keypoint corresponding to a face of each object to be matched in at least one object to be matched included in the target image;
and for each object to be matched in at least one object to be matched, determining a face area to be matched corresponding to the object to be matched based on the target key point corresponding to the object to be matched.
In a possible implementation manner, the matching module is configured to match first image feature information corresponding to the face of the target tracking object with second image feature information corresponding to each face region to be matched, and use an object to be matched corresponding to the second image feature information matched with the first image feature information as the target matching object.
In a possible embodiment, the apparatus further comprises:
the third acquisition module is used for acquiring a tracking image before the first acquisition module acquires a face area to be matched corresponding to the face of at least one object to be matched from the target image;
based on the tracking image, determining a target tracking object and first image feature information corresponding to the face of the target tracking object in the tracking image.
In a possible implementation manner, the third obtaining module is configured to perform keypoint detection on the tracking image, and determine a target keypoint corresponding to a face of each initial object in at least one initial object included in the tracking image;
and screening the target tracking object from the at least one initial object based on the key point confidence information of the target key point corresponding to the face of each initial object.
In a possible implementation manner, the second obtaining module is configured to determine, based on the first positional relationship information of the face and the head, second positional relationship information between the face region to be matched and the target head region;
and acquiring a target head area corresponding to the head of the target matching object from the target image based on the second position relation information and the face area to be matched.
In a possible implementation manner, the matching module is further configured to determine a new target tracking object based on the keypoint confidence information of the target keypoint corresponding to the face of each object to be matched, if it is determined that there is no second image feature information matching the first image feature information.
In a possible implementation manner, the segmentation module is configured to crop out, based on the target head region, a region segmentation image corresponding to the target head region from the target image;
extracting the regional characteristic information and the structural information of the regional segmentation image by using a pre-trained deep neural network;
extracting region feature information respectively corresponding to different feature dimensions in the region segmentation image; wherein the plurality of feature dimensions include a first feature dimension and a second feature dimension that are adjacent, the first feature dimension being lower than the second feature dimension; the region feature information corresponding to the first feature dimension is determined based on the region feature information corresponding to the second feature dimension and the structural information of the region feature information corresponding to the second feature dimension; the structural information of the regional characteristic information corresponding to the second characteristic dimension is extracted by using the pre-trained deep neural network;
and performing semantic segmentation on the region segmentation image based on region feature information respectively corresponding to different feature dimensions to obtain a head segmentation result of the target matching object.
In a possible implementation manner, the segmentation module is configured to determine, for each feature dimension of the different feature dimensions, a first semantic prediction result of the region segmentation image in the feature dimension based on region feature information corresponding to the feature dimension;
determining the probability that each pixel point in the region segmentation image is a pixel point corresponding to the head based on a first semantic prediction result of the region segmentation image under each feature dimension;
and performing semantic segmentation on the region segmentation image based on the probability that each pixel point in the region segmentation image is the pixel point corresponding to the head and a preset segmentation probability value.
In a possible implementation manner, the segmentation module is configured to perform multiple times of fusion processing according to the sequence from low to high of the different feature dimensions, and then obtain a probability that each pixel in the region segmentation image is a pixel corresponding to the head;
wherein, the ith fusion processing in the multiple fusion processing comprises the following steps:
determining semantic confidence information of a first semantic prediction result under the first feature dimension;
fusing the first semantic prediction result under the first characteristic dimension and the first semantic prediction result under the second characteristic dimension by utilizing semantic confidence information of the first semantic prediction result under the first characteristic dimension to obtain a target semantic prediction result under the second characteristic dimension;
and updating the target semantic prediction result into a first semantic prediction result of a first feature dimension in the (i + 1) th fusion process.
In a possible embodiment, the apparatus further comprises:
the setting module is used for setting the color value of a pixel point corresponding to the head in the target image as a first target value based on the segmentation result after the segmentation module obtains the head segmentation result of the target matching object;
and setting the color value of the pixel point except the head in the target image as a second target value.
In a possible embodiment, the apparatus further comprises:
the processing module is used for acquiring a special effect processing request aiming at the target head area of the target image after the head segmentation result of the target matching object is obtained by the segmentation module;
performing special effect processing on the target head region based on the special effect processing request and the head segmentation result; or
Acquiring color changing information of a hair region in the target head region for the target image;
and performing color changing processing on the hair area based on the color changing information and the head segmentation result.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed by the processor, the steps in the first aspect or any one of the possible implementations of the first aspect are performed.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when run, performs the steps in the first aspect or any one of the possible implementations of the first aspect.
For the description of the effects of the image processing apparatus, the computer device, and the computer-readable storage medium, reference is made to the description of the image processing method, which is not repeated here.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the embodiments are briefly described below. The drawings incorporated in and forming a part of the specification illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; for those of ordinary skill in the art, other related drawings can be derived from these drawings without creative effort.
Fig. 1 shows a flowchart of an image processing method provided by an embodiment of the present disclosure;
fig. 2 is a flowchart illustrating a method for determining a face region to be matched according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram illustrating a determined face region to be matched in a target image according to an embodiment of the present disclosure;
fig. 4 is a schematic diagram illustrating a face region to be matched and a target head region provided by an embodiment of the present disclosure;
FIG. 5 illustrates a flow chart of a method for semantically segmenting a target head region provided by an embodiment of the present disclosure;
fig. 6 shows a flowchart of a method for determining region feature information corresponding to a first feature dimension according to an embodiment of the present disclosure;
FIG. 7 illustrates a flow chart of a method for semantically segmenting a target head region provided by an embodiment of the present disclosure;
FIG. 8 is a flow chart illustrating a fusion process provided by an embodiment of the present disclosure;
fig. 9 is a schematic diagram illustrating a deep neural network performing semantic segmentation processing on a region segmentation image corresponding to a target head region according to an embodiment of the present disclosure;
FIG. 10 is a schematic flow chart illustrating a process for determining a head segmentation image corresponding to a target image according to an embodiment of the present disclosure;
fig. 11 shows a schematic diagram of an image processing apparatus provided by an embodiment of the present disclosure;
fig. 12 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Furthermore, the terms "first," "second," and the like in the description and in the claims, and in the drawings described above, in the embodiments of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein.
Reference herein to "a plurality or a number" means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Research shows that, for a video file comprising a plurality of objects, the same target object often appears in a plurality of frames of the video file. Therefore, in some scenes, there may be a need to track the target object, continuously segment the head of the target object, and perform special effect processing on the segmented head, for example, dyeing the hair or adding special effects to the head.
However, most of the conventional segmentation techniques segment the human body of the target object as a whole and cannot ensure the segmentation accuracy for a local part of the human body such as the head. Even if the head is segmented, the whole human body of the target object needs to be segmented first, and the head region is then segmented from the segmented human body, which reduces the segmentation efficiency.
Moreover, when the head is segmented based on the obtained human body, all pixel points corresponding to the human body need to be identified, which increases the amount of computation and expands the range of image processing; some head details are inevitably ignored in the segmentation process, which further reduces the head segmentation precision.
Based on the above research, the present disclosure provides an image processing method and apparatus, a computer device, and a storage medium. Based on the first image feature information corresponding to the face of the target tracking object and the second image feature information of the face region to be matched corresponding to each object to be matched, the second image feature information that matches the first image feature information can be screened out relatively accurately from the plurality of pieces of second image feature information; that is, the target matching object that matches the target tracking object can be screened out relatively accurately, so that accurate tracking of the target tracking object is achieved. Then, the target head region corresponding to the head is determined from the screened face region to be matched, and only the target head region needs to be semantically segmented, without processing other regions in the target image. This narrows the range of image processing, ensures the accuracy of the image region to be processed, and reduces the amount of computation in the segmentation process because fewer image details need to be processed. In addition, since only the image details of the target head region need to be processed, the segmentation precision of the head is improved and a more accurate segmentation result can be obtained.
The above-mentioned drawbacks are conclusions reached by the inventor after practice and careful study. Therefore, the process of discovering the above problems and the solutions proposed in the present disclosure for the above problems should be regarded as contributions made by the inventor in the course of the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, first, an image processing method disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the image processing method provided in the embodiments of the present disclosure is generally a computer device with certain computing capability, and in some possible implementations, the image processing method may be implemented by a processor calling a computer readable instruction stored in a memory.
The following describes an image processing method provided by an embodiment of the present disclosure, taking an execution subject as a computer device as an example.
As shown in fig. 1, a flowchart of an image processing method provided in an embodiment of the present disclosure may include the following steps:
s101: and acquiring a face area to be matched corresponding to the face of at least one object to be matched from the target image.
The target image may be a frame of image in a video clip, and may include at least one object to be matched.
The object to be matched may be an object appearing in the target image, or may appear in other frame images in the video segment corresponding to the target image. The object to be matched may be a person, an animal, or the like appearing in the target image.
For acquiring the target image, after the video segment is acquired, the current frame image being processed may be taken as the target image. Alternatively, the target image may be a specified target image directly acquired by user input.
In specific implementation, after the target image is acquired, image processing may be performed on the target image to determine an object to be matched included in the target image, and then, for each object to be matched, the face of the object to be matched and the position of the face in the target image are determined, and further, a face region to be matched corresponding to the face of the object to be matched may be determined based on the determined position of the face of the object to be matched in the target image.
S102: and screening a target matching object matched with the target tracking object from at least one object to be matched based on first image characteristic information corresponding to the face of the target tracking object and second image characteristic information corresponding to each face area to be matched.
Here, the target tracking object is a specific object requiring head segmentation, and the first image feature information corresponding to the face of the target tracking object may be stored in advance before image processing, or may be obtained based on an image previously processed during object tracking and image segmentation, which is not limited herein.
The target matching object is an object to be matched, which is the same object as the target tracking object, in the objects to be matched. The image feature information may include key point information, texture information, color information, and the like corresponding to the image region.
In this step, for each object to be matched in at least one object to be matched, after determining a face region to be matched corresponding to the face of the object to be matched, second image feature information corresponding to the face region to be matched may be determined. Based on the first image characteristic information, second image characteristic information corresponding to each object to be matched in the at least one object to be matched can be determined.
Then, based on the second image feature information corresponding to each object to be matched and the first image feature information corresponding to the face of the target tracking object, the second image feature information, in which the face corresponding to the image feature information is consistent with the face corresponding to the first image feature information, can be screened out, and further, based on the second image feature information, the corresponding object to be matched can be determined to be the target matching object.
S103: and acquiring a target head area corresponding to the head of the target matching object from the target image based on the face area to be matched of the target matching object.
In specific implementation, the position relationship, the size relationship and the like between the face region to be matched and the target head region can be determined according to the incidence relationship between the face and the head corresponding to the face region to be matched, and then the target head region corresponding to the head of the target matching object is determined from the target image based on the face region to be matched, the determined position relationship, the determined size relationship and the like.
In one embodiment, after the target head region is determined, an image corresponding to the target head region may be cropped in the target image to obtain a region segmentation image corresponding to the target head region.
S104: and performing semantic segmentation on the target head region to obtain a head segmentation result of the target matching object.
In this step, after the target head region is determined, semantic segmentation may be performed on the target head region in a semantic segmentation manner, so as to determine semantic information of each pixel point in the target head region.
Furthermore, the pixel points belonging to the head in the target head region can be determined according to the semantic information of the pixel points corresponding to the head and the semantic information of each pixel point. The target head region may then be segmented based on the pixel points belonging to the head, resulting in a head segmentation result of the target matching object.
In addition, if the region segmentation image corresponding to the target head region is determined, the region segmentation image can be directly subjected to semantic segmentation to obtain a head segmentation result of the target matching object.
Therefore, based on the first image feature information corresponding to the face of the target tracking object and the second image feature information of the face region to be matched corresponding to each object to be matched, the second image feature information that matches the first image feature information can be screened out relatively accurately from the plurality of pieces of second image feature information; that is, the target matching object that matches the target tracking object can be screened out relatively accurately, so that accurate tracking of the target tracking object is achieved. Then, the target head region corresponding to the head is determined from the screened face region to be matched, and only the target head region needs to be semantically segmented, without processing other regions in the target image. This narrows the range of image processing, ensures the accuracy of the image region to be processed, and reduces the amount of computation in the segmentation process because fewer image details need to be processed. In addition, since only the image details of the target head region need to be processed, the segmentation precision of the head is improved and a more accurate segmentation result can be obtained.
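Purely as an overview of S101-S104, a skeleton of the per-frame processing might look like the sketch below. Every helper passed in (detect_face_regions, extract_feature, face_to_head_region, segment_head) and the matching threshold are hypothetical placeholders for the steps described above, not APIs of the disclosure.

```python
import numpy as np

def process_frame(target_image, tracked_face_feature,
                  detect_face_regions, extract_feature,
                  face_to_head_region, segment_head, match_threshold=0.8):
    """S101-S104: find the tracked object's face, derive its head region, segment it."""
    # S101: face regions to be matched for every object in the frame.
    face_regions = detect_face_regions(target_image)

    # S102: screen the target matching object by comparing image feature information.
    best_region, best_score = None, -1.0
    for region in face_regions:
        score = float(np.dot(extract_feature(target_image, region), tracked_face_feature))
        if score > best_score:
            best_region, best_score = region, score
    if best_region is None or best_score < match_threshold:
        return None                      # tracked object not present in this frame

    # S103: convert the matched face region into the target head region.
    head_region = face_to_head_region(best_region)

    # S104: semantically segment only the head region.
    return segment_head(target_image, head_region)
```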
In an embodiment, for S101, a face region to be matched may be determined according to the method shown in fig. 2, and as shown in fig. 2, a flowchart of a method for determining a face region to be matched provided by the embodiment of the present disclosure may include the following steps:
s201: and carrying out key point detection on the target image, and determining a target key point corresponding to the face of each object to be matched in at least one object to be matched included in the target image.
Here, the key point detection neural network may be used to perform key point detection on the target image to determine a target key point corresponding to the face in the target image.
In specific implementation, the target image can be input into a pre-trained key point detection neural network, and the key point detection neural network is utilized to process the target image and determine all key points in the target image. Then, the target key points belonging to the face of the same object to be matched can be determined according to the position relationship between each target key point, and further, the target key point corresponding to the face of each object to be matched in at least one object to be matched included in the target image can be determined.
S202: and for each object to be matched in at least one object to be matched, determining a face area to be matched corresponding to the object to be matched based on the target key point corresponding to the object to be matched.
In specific implementation, for each object to be matched, based on the target key point corresponding to the object to be matched, an image region determined by the target key point may be determined, for example, a circumscribed rectangular region determined by the target key point may be determined. Furthermore, the determined image area can be used as the face area to be matched corresponding to the object to be matched.
Further, a face area to be matched corresponding to each object to be matched in the target image can be determined. Therefore, the trained key point detection neural network has reliable performance, so that accurate key point information can be output, and the face area to be matched corresponding to the object to be matched can be accurately determined by utilizing the determined key point information.
Taking an object to be matched that is a target person as an example, fig. 3 shows a schematic diagram of a face region to be matched determined in a target image according to an embodiment of the present disclosure, where A represents the face region to be matched and the points in A are the determined face key points.
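A small sketch of how a face region to be matched could be derived from the detected key points is shown below; it simply takes the circumscribed axis-aligned rectangle of the points, as described above, and the (N, 2) key-point array format is an assumption.

```python
import numpy as np

def face_region_from_keypoints(keypoints: np.ndarray) -> tuple[int, int, int, int]:
    """Return the circumscribed rectangle (x_min, y_min, x_max, y_max) of face key points.

    `keypoints` is assumed to be an (N, 2) array of (x, y) positions for one face.
    """
    x_min, y_min = keypoints.min(axis=0)
    x_max, y_max = keypoints.max(axis=0)
    return int(x_min), int(y_min), int(x_max), int(y_max)

pts = np.array([[52, 40], [88, 42], [70, 60], [58, 85], [84, 84]])  # assumed detected key points
print(face_region_from_keypoints(pts))   # -> (52, 40, 88, 85)
```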
In one embodiment, the image feature information of the face of the target tracking object is necessarily the same or similar in different images. Therefore, for S102, after the second image feature information corresponding to each object to be matched is determined, the first image feature information corresponding to the face of the target tracking object may be obtained and then matched against each piece of second image feature information to determine whether matched image feature information exists, that is, whether there is second image feature information whose corresponding face is consistent with the face corresponding to the first image feature information. If so, the object to be matched corresponding to the matched second image feature information can be used as the target matching object matched with the target tracking object; that is, the target matching object is screened out.
In another embodiment, after the second image feature information corresponding to one object to be matched is determined, it can be matched with the first image feature information to determine whether the two match. If they match, the object to be matched corresponding to this second image feature information can be directly determined as the target matching object; if not, after the second image feature information corresponding to the next object to be matched is determined, the step of matching with the first image feature information is performed again. In this way, if the second image feature information determined first is the matched image feature information, the target matching object can be determined directly without determining the second image feature information corresponding to the other objects to be matched, which improves the matching speed and reduces the amount of data that needs to be processed to determine the second image feature information corresponding to the other objects to be matched.
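The matching logic in the two embodiments above can be sketched as follows; cosine similarity and the 0.8 threshold are assumptions standing in for whatever matching criterion is actually applied to the image feature information.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def find_target_matching_object(first_feature: np.ndarray,
                                candidate_features: list[np.ndarray],
                                threshold: float = 0.8):
    """Return the index of the object whose face feature matches the tracked face.

    Candidates are checked one by one so the loop can stop as soon as a match is
    found, mirroring the early-exit variant described above.
    """
    for idx, second_feature in enumerate(candidate_features):
        if cosine_similarity(first_feature, second_feature) >= threshold:
            return idx                     # target matching object found
    return None                            # no second feature matches the first

rng = np.random.default_rng(2)
tracked = rng.standard_normal(128)                              # first image feature information
candidates = [rng.standard_normal(128),                         # unrelated object
              tracked + 0.05 * rng.standard_normal(128)]        # same face, slightly perturbed
print(find_target_matching_object(tracked, candidates))         # expected: 1
```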
In an embodiment, before determining the face region to be matched in the target image, a tracking image corresponding to the target tracking object needs to be obtained. Then, based on the tracking image, the target tracking object and the image feature information corresponding to the face of the target tracking object in the tracking image may be determined, and then, may be used as the first image feature information for matching. In this way, based on the determined first image feature information, accurate tracking of the target tracking object can be achieved.
Here, the tracking image may be an adjacent frame image corresponding to the target image in the video segment, or may be a non-adjacent frame image corresponding to the target image in the video segment, which is not limited herein.
In specific implementation, the target tracking object may be determined according to the following steps:
step one, carrying out key point detection on a tracking image, and determining a target key point corresponding to the face of each initial object in at least one initial object included in the tracking image.
Here, the initial object may be an object included in the tracking image.
In this step, after the tracking image is obtained, the pre-trained keypoint detection neural network may be used to perform keypoint detection on the tracking image, and determine a target keypoint corresponding to the face of each initial object in at least one initial object included in the tracking image.
Here, the key point detection neural network may also output key point confidence information of the target key point corresponding to each initial object while determining the target key point corresponding to the face.
And secondly, screening the target tracking object from at least one initial object based on the key point confidence information of the target key point corresponding to the face of each initial object.
In specific implementation, the target key points with the highest confidence are screened out based on the key point confidence information of the target key points corresponding to each initial object; the face region corresponding to these target key points is then determined; further, the initial object corresponding to this face region can be determined and used as the target tracking object.
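The screening in this step can be illustrated with the short sketch below; each initial object is assumed to carry its face key points together with a single per-object confidence score produced by the key point detection network.

```python
def pick_target_tracking_object(initial_objects: list[dict]) -> dict:
    """Pick the initial object whose face key points have the highest confidence.

    Each element is assumed to look like {"keypoints": [...], "confidence": float};
    the object with the largest confidence becomes the target tracking object.
    """
    return max(initial_objects, key=lambda obj: obj["confidence"])

objects = [
    {"keypoints": [(10, 12), (30, 14)], "confidence": 0.71},
    {"keypoints": [(80, 40), (98, 41)], "confidence": 0.93},   # highest confidence
]
target_tracking_object = pick_target_tracking_object(objects)
print(target_tracking_object["confidence"])   # 0.93
```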
In another embodiment, after the tracking image is acquired, a tracking area specified by the user in the tracking image may be further determined, and then the key point detection may be directly performed on the specified tracking area, and an initial object of the area is determined and is used as the target tracking object.
Similarly, for the target image, whether a tracking area designated by the user exists can be determined, if so, the object to be matched corresponding to the area can be directly used as the target tracking object to perform semantic segmentation, the step of determining the target matching object can be omitted, and therefore, the segmentation of the preset part of the designated target object can be realized.
In addition, after the target key point with the highest confidence degree is determined or the target key point corresponding to the tracking area specified by the user is determined, first image feature information corresponding to the face corresponding to the target key point in the tracking image or the target image needs to be determined, so that object tracking can be performed based on the first image feature information subsequently.
In one embodiment, regarding S102, in the case that it is determined that there is no second image feature information matching the first image feature information, it may be stated that the target tracking object to be tracked is not in the target image. Furthermore, the object to be matched corresponding to the target key point with the highest confidence level can be determined based on the key point confidence level information of the target key point corresponding to the face of each object to be matched, and the object to be matched is used as a new target tracking object. Meanwhile, the image feature information of the face region corresponding to the new target tracking object may be determined and stored as new first image feature information. In this way, tracking of a new target tracking object can be achieved.
Or, in another embodiment, if it is determined that there is no second image feature information matching the first image feature information, the segmentation of the current target image may be abandoned, other images in the video segment may continue to be acquired, and the step of determining the target matching object may be performed again. In this way, segmentation can be performed on the one or more frames of the video clip in which the target object appears, so that tracking and segmentation of the preset part of the target object are achieved and the pertinence of image segmentation is improved.
In one embodiment, for S103: the target head region corresponding to the head may be determined as follows:
step one, determining second position relation information between a face area to be matched and a target head area based on first position relation information of a face and the head.
Here, the first positional relationship information is used to characterize the positional relationship between the face and the head, the size inclusion relationship, and the like. The second position relation information is used for representing the conversion relation between the face area to be matched and the target head area.
During specific implementation, the relative position relationship and the size containing relationship of the face and the head in the target image can be determined according to the first position relationship information of the face and the head, and then the conversion relationship between the target head region corresponding to the head and the face region to be matched can be determined according to the determined relative position relationship, the determined size containing relationship and the face region to be matched corresponding to the face. That is, the second positional relationship information between the face region to be matched and the target head region is determined.
And secondly, acquiring a target head area corresponding to the head of the target matching object from the target image based on the second position relation information and the face area to be matched.
In this step, the face area to be matched may be converted into the target head area in the target image by using the conversion relationship corresponding to the determined second position relationship information with reference to the face area to be matched, that is, the target head area corresponding to the head is obtained.
Taking the face and head of the target person as an example, based on the face and head, it may be determined that the head contains a face. Furthermore, a conversion relation between the face and the head can be determined based on the position relation and the size containing relation between the face and the head, and a target head area corresponding to the head can be determined by using the conversion relation.
Fig. 4 is a schematic diagram of a face region to be matched and a target head region according to an embodiment of the present disclosure. Wherein, A represents the face area to be matched, and B represents the target head area. In specific implementation, after the face area to be matched is determined, a first height and a first width of a first rectangle corresponding to the face area to be matched may be determined, and further, a second width corresponding to a first preset multiple of the first width, a second height corresponding to a second preset multiple of the first height, and a third height corresponding to a third preset multiple of the first height may be determined.
Then, the center of the first rectangle corresponding to the face region to be matched may be taken as the center of the second rectangle corresponding to the target head region, twice the second width may be taken as the width of the second rectangle, and the sum of the second height and the third height may be taken as the height of the second rectangle, so that the second rectangle corresponding to the target head region can be determined. The image region corresponding to the second rectangle is then taken as the target head region. In one embodiment, the first preset multiple and the second preset multiple may be 1.5, and the third preset multiple may be 2.5. The specific values of the first, second and third preset multiples may be set according to actual needs and are not limited here.
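As a rough geometric sketch of the above (not part of the claimed method), the second rectangle can be derived from the first rectangle as shown below. The sketch assumes axis-aligned boxes in pixel coordinates and reads the height of the second rectangle as the sum of the second and third heights; clamping to the image bounds is omitted.

```python
def head_box_from_face_box(face_box,
                           first_multiple=1.5,    # first preset multiple (width)
                           second_multiple=1.5,   # second preset multiple (height)
                           third_multiple=2.5):   # third preset multiple (height)
    """face_box and the returned box are (x_min, y_min, x_max, y_max)."""
    x0, y0, x1, y1 = face_box
    first_width, first_height = x1 - x0, y1 - y0
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0     # shared centre of both rectangles
    second_width = first_multiple * first_width
    second_height = second_multiple * first_height
    third_height = third_multiple * first_height
    head_w = 2.0 * second_width                   # twice the second width
    head_h = second_height + third_height         # assumed: sum of the two heights
    return (cx - head_w / 2.0, cy - head_h / 2.0,
            cx + head_w / 2.0, cy + head_h / 2.0)
```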
In an embodiment, for S104, the target head region may be semantically segmented according to the method shown in Fig. 5 to obtain the head segmentation result of the target matching object. As shown in Fig. 5, a flowchart of a method for semantically segmenting the target head region provided by an embodiment of the present disclosure may include the following steps:
S501: based on the target head region, a region segmentation image corresponding to the target head region is cut out from the target image.
Here, after the target head region is determined, a partial image corresponding to the target head region may be clipped from the target image as a region segmentation image corresponding to the target head region based on a position of the target head region in the target image.
S502: and extracting the regional characteristic information and the structural information of the regional segmentation image by using a pre-trained deep neural network.
Here, the region feature information may include color feature information, feature points, structured information, and the like, where the feature points may be determined according to pixel points in the target head region and have a corresponding relationship with the pixel points in the target head region. The color feature information can reflect the color of the feature point. The structured information is information for characterizing a positional relationship between feature points in the region feature information.
In specific implementation, after the target head region is determined, the region segmentation image corresponding to the target head region may be input to a pre-trained deep neural network, and then, the deep neural network may extract region feature information of the region segmentation image and may also extract structural information of the region segmentation image.
In addition, the pre-trained deep neural network mentioned in the embodiments of the present disclosure may be adapted to the processing capability of the computer device on which it runs, making it compatible with various computer devices without affecting the segmentation accuracy.
S503: and extracting the region feature information respectively corresponding to different feature dimensions in the region segmentation image.
The plurality of feature dimensions comprise a first feature dimension and a second feature dimension which are adjacent to each other, and the first feature dimension is lower than the second feature dimension.
The region feature information corresponding to the first feature dimension is determined based on the region feature information corresponding to the second feature dimension and the structured information of the region feature information corresponding to the second feature dimension.
And the structural information of the region feature information corresponding to the second feature dimension is extracted by using the pre-trained deep neural network.
In this step, the step of extracting the region feature information corresponding to different feature dimensions in the region segmentation image is performed by a pre-trained deep neural network. The pre-trained deep neural network comprises feature extractors respectively corresponding to a plurality of feature dimensions; each feature extractor may extract regional feature information in its corresponding feature dimension. Based on a plurality of feature extractors, the region feature information respectively corresponding to different feature dimensions in the region segmentation image can be respectively extracted. For example, a pre-trained deep neural network may include 4 feature extractors, which can extract region feature information in 4 feature dimensions.
The feature dimension may be an image resolution, and the region segmentation image corresponding to the target head region has an initial image resolution. Specifically, the initial image resolution may be the same as the image resolution of the target image.
In specific implementation, firstly, the pre-trained deep neural network can be used to extract the region feature information and the structural information corresponding to the initial image resolution of the region segmentation image. And then taking the initial image resolution as a second feature dimension, and determining the region feature information corresponding to the first feature dimension based on the region feature information and the structural information corresponding to the second feature dimension. Moreover, when the region feature information corresponding to the first feature dimension is determined, the structural information of the region feature information can also be determined.
Then, the first feature dimension may be used as a new second feature dimension, and region feature information and structured information corresponding to a next first feature dimension lower than the new second feature dimension may be determined. In this way, the region feature information corresponding to each of the different feature dimensions in the region segmentation image, together with the structured information in that region feature information, can be extracted. The feature dimension corresponding to the initial image resolution is the highest feature dimension.
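A minimal stand-in for this feature-pyramid extraction is sketched below; the learned feature extractors of the deep neural network are replaced here by a simple 2x2 average pool, which is only an assumption used to show how each lower feature dimension is derived from the one above it.

```python
import numpy as np

def pool2x(feat):
    """Stand-in for the feature extractor of the next (lower) feature dimension:
    aggregate each 2x2 neighbourhood of feature points into one feature point."""
    c, h, w = feat.shape
    h2, w2 = h // 2, w // 2
    return feat[:, :h2 * 2, :w2 * 2].reshape(c, h2, 2, w2, 2).mean(axis=(2, 4))

def extract_feature_pyramid(region_image, num_dims=4):
    """region_image: (C, H, W) array at the initial image resolution, i.e. the
    highest feature dimension.  Returns num_dims feature maps ordered from the
    highest feature dimension down to the lowest, each derived from the one above."""
    feats = [region_image.astype(np.float32)]
    for _ in range(num_dims - 1):
        feats.append(pool2x(feats[-1]))
    return feats
```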
S504: and performing semantic segmentation on the region segmentation image based on the region feature information respectively corresponding to different feature dimensions to obtain a head segmentation result of the target matching object.
The region feature information in the high feature dimension can reflect the depth features of the edge part of the region segmentation image, and the region feature information in the low feature dimension can reflect the depth features of the main part of the region segmentation image.
In specific implementation, based on the region feature information corresponding to different feature dimensions, a main region and an edge region belonging to a head in a region segmentation image corresponding to a target head region can be determined, and then, a region corresponding to the head in the region segmentation image can be determined.
Furthermore, based on the determined region corresponding to the head, semantic segmentation of the region segmentation image can be completed, and a head segmentation result of the target matching object is obtained.
In one embodiment, the structured information of the region feature information corresponding to the second feature dimension includes a positional relationship between the first feature points in the region feature information corresponding to the second feature dimension. Regarding the step of determining the region feature information corresponding to the first feature dimension, a method shown in fig. 6 may be adopted, and as shown in fig. 6, a flowchart of a method for determining the region feature information corresponding to the first feature dimension provided by the embodiment of the present disclosure may include the following steps:
S601: for each second feature point in the first feature dimension, screening a first target feature point corresponding to the second feature point from the first feature points corresponding to the second feature dimension based on the position information of the second feature point.
Here, the region feature information corresponding to each feature dimension includes different numbers of feature points, and the number of second feature points in the region feature information corresponding to the first feature dimension is smaller than the number of first feature points in the region feature information corresponding to the second feature dimension. That is, the number of feature points corresponding to the low image resolution is smaller than the number of feature points corresponding to the high image resolution.
And each second feature point corresponding to the first feature dimension has a first feature point corresponding to the second feature point in the second feature dimension.
In this step, for each second feature point in the first feature dimension, the position information of the second feature point may be determined, and for each first feature point in the second feature dimension, the position information of the first feature point may also be determined. Then, based on the position information of each second feature point in the first feature dimension and the position information of each first feature point in the second feature dimension, a first feature point and a second feature point located at the same position can be identified in the second feature dimension and the first feature dimension respectively, and that first feature point is taken as the first target feature point screened out for the second feature point. That is, the first target feature point corresponding to each second feature point in the first feature dimension can be determined from the first feature points corresponding to the second feature dimension.
S602: based on the first feature dimension and the second feature dimension, a target number of second feature points in the first feature dimension corresponding to the first feature points in the second feature dimension is determined.
Here, the region feature information of one second feature point in the first feature dimension may be determined from the region feature information of a plurality of first feature points in the second feature dimension.
In specific implementation, the target number of first feature points in the second feature dimension corresponding to one second feature point in the first feature dimension may be determined based on the conversion relationship between the first feature dimension and the second feature dimension. For example, one second feature point in the first feature dimension may correspond to 10 first feature points in the second feature dimension.
S603: screening a target number of second target feature points from the first feature points corresponding to the second feature dimension based on the positional relationship among the first feature points and the position information of the first target feature point.
In this step, based on the structured information corresponding to the second feature dimension, a position relationship between first feature points in the second feature dimension may be determined, and then, for each determined first target feature point, a target number of first feature points may be selected and screened from the first feature points corresponding to the second feature dimension as second target feature points according to the position relationship between the position information of the first target feature point and the first feature points.
In specific implementation, a target number of first feature points located within a preset distance of the first target feature point are screened from the first feature points corresponding to the second feature dimension, and these first feature points are used as the second target feature points.
S604: and determining the region feature information of the second feature points based on the region feature information of the second target feature points, and determining the region feature information corresponding to the first feature dimension based on the region feature information of each second feature point in the determined first feature dimension.
Here, from the area feature information of each of the determined target number of second target feature points, the area feature information of the second feature point in the first feature dimension corresponding to the second target feature point may be determined.
Further, based on the above steps, the regional characteristic information of each second characteristic point in the first characteristic dimension can be determined, and the regional characteristic information corresponding to the first characteristic dimension can be determined based on the regional characteristic information of each second characteristic point.
In this way, the region feature information corresponding to different feature dimensions in the region-divided image corresponding to the target head region can be extracted.
In specific implementation, for the region feature information corresponding to the second feature dimension, the feature extractor corresponding to the first feature dimension may perform downsampling on the region feature information corresponding to the second feature dimension in a downsampling manner to determine the region feature information corresponding to the first feature dimension.
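The screening and aggregation of S602 to S604 can be pictured as a neighbourhood gather, sketched below. Nearest-neighbour selection stands in for the preset-distance screening and mean pooling stands in for the aggregation; both are assumptions, since the disclosure does not fix either choice.

```python
import numpy as np

def downsample_by_neighbourhood(first_feats, first_positions,
                                second_positions, target_number):
    """first_feats: (N1, C) region features of the first feature points (second dimension).
    first_positions: (N1, 2) positions of those feature points.
    second_positions: (N2, 2) positions of the second feature points (first dimension).
    For each second feature point, gather the target_number nearest first feature
    points (the second target feature points) and average their features."""
    out = []
    for pos in second_positions:
        dists = np.linalg.norm(first_positions - pos, axis=1)
        nearest = np.argsort(dists)[:target_number]    # second target feature points
        out.append(first_feats[nearest].mean(axis=0))
    return np.stack(out)                               # (N2, C) first-dimension features
```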
Further, after obtaining the region feature information corresponding to different feature dimensions, semantic segmentation may be performed on the region segmentation image corresponding to the target head region according to the method shown in fig. 7, as shown in fig. 7, which is a flowchart of a method for performing semantic segmentation on the target head region provided in the embodiment of the present disclosure, the method may include the following steps:
S701: for each feature dimension of the different feature dimensions, determining a first semantic prediction result of the region segmentation image in that feature dimension based on the region feature information corresponding to the feature dimension.
Here, the first semantic prediction result is used to represent the probability that the pixel point in the region segmentation image is the pixel point corresponding to the head.
In specific implementation, for each feature dimension, after the feature extractor corresponding to the feature dimension in the deep neural network determines the region feature information corresponding to the feature dimension, the classifier corresponding to the feature dimension may determine the first semantic prediction result of the region segmentation image in the feature dimension according to the region feature information corresponding to the feature dimension.
Further, based on each classifier in the deep neural network, a first semantic prediction result of the region segmentation image in different feature dimensions may be determined.
In one embodiment, for S701, a first semantic prediction result of the region segmentation image in each feature dimension may be determined according to the following steps:
Step one: for the lowest feature dimension, determining a first semantic prediction result of the region segmentation image in the lowest feature dimension based on the region feature information corresponding to the lowest feature dimension.
After obtaining the region feature information corresponding to each feature dimension, for a lowest feature dimension, the classifier corresponding to the lowest feature dimension may output a first semantic prediction result of the region segmentation image in the lowest feature dimension according to the region feature information corresponding to the lowest feature dimension.
Step two: for each second feature dimension other than the lowest feature dimension, determining a first semantic prediction result of the region segmentation image in the second feature dimension based on the region feature information corresponding to the second feature dimension and the first semantic prediction result of the region segmentation image in the first feature dimension.
Here, since the first feature dimension is lower than the second feature dimension, the lowest feature dimension must be one first feature dimension. After the classifier corresponding to the lowest feature dimension determines the first semantic prediction result in the lowest feature dimension, the classifier corresponding to the second feature dimension corresponding to the lowest feature dimension may determine the first semantic prediction result of the region segmentation image in the second feature dimension based on the first semantic prediction result in the lowest feature dimension and the region feature information in the second feature dimension. Furthermore, the classifier corresponding to each second feature dimension may determine the first semantic prediction result in the second feature dimension based on the first semantic prediction result in the first feature dimension and the region feature information in the second feature dimension.
In specific implementation, the classifier corresponding to each second feature dimension may upsample the first semantic prediction result of the lower feature dimension and combine it with the region feature information of the second feature dimension to determine the first semantic prediction result of the second feature dimension.
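The cascade of per-dimension classifiers can be sketched as follows; a channel mean stands in for each learned classifier and nearest-neighbour upsampling by a factor of two stands in for the learned upsampling, so the sketch only illustrates the data flow, not the actual network.

```python
import numpy as np

def upsample2x(pred):
    """Nearest-neighbour 2x upsampling of an (H, W) prediction map."""
    return pred.repeat(2, axis=0).repeat(2, axis=1)

def first_semantic_predictions(feats):
    """feats: (C, H, W) feature maps ordered from the highest feature dimension
    down to the lowest, each half the spatial size of the previous one (H and W
    are assumed divisible by 2 ** (len(feats) - 1)).  Returns prediction maps
    index-aligned with feats, computed from the lowest dimension upwards."""
    preds = [None] * len(feats)
    prev = None
    for idx in range(len(feats) - 1, -1, -1):   # start from the lowest feature dimension
        score = feats[idx].mean(axis=0)         # classifier stand-in
        prev = score if prev is None else score + upsample2x(prev)
        preds[idx] = prev
    return preds
```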
S702: and determining the probability that each pixel point in the region segmentation image is the pixel point corresponding to the head part based on the first semantic prediction result of the region segmentation image in each feature dimension.
Here, after the first semantic prediction result under each feature dimension is obtained, multiple times of fusion processing may be performed according to a sequence from low to high of different feature dimensions, and then, a probability that each pixel point in the region segmentation image is a pixel point corresponding to the head may be obtained.
S703: and performing semantic segmentation on the region segmentation image based on the probability that each pixel point in the region segmentation image is the pixel point corresponding to the head and a preset segmentation probability value.
In specific implementation, the probability that each pixel point in the region segmentation image is a pixel point corresponding to the head can be compared with a preset segmentation probability value. A pixel point is taken as a pixel point corresponding to the head when its probability is determined to be greater than the preset segmentation probability value, and is determined not to be a pixel point corresponding to the head otherwise.
Furthermore, the head pixel point and the non-head pixel point in the region segmentation image can be determined, and based on the determined result, the semantic segmentation of the region segmentation image is completed to obtain a head segmentation result.
In a specific implementation, the head segmentation result may be a head segmentation image corresponding to the head.
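A straightforward sketch of the thresholding in S703 is shown below; the value 0.5 used for the preset segmentation probability value is an assumption.

```python
import numpy as np

def segment_head(prob_map, seg_probability=0.5):
    """prob_map: (H, W) probability that each pixel belongs to the head.
    Pixels whose probability exceeds the preset segmentation probability value
    are marked as head pixels; the result is a binary head segmentation mask."""
    return (prob_map > seg_probability).astype(np.uint8)
```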
In an embodiment, the i-th fusion process of the multiple fusion processes may be performed according to the procedure shown in Fig. 8. As shown in Fig. 8, a flowchart of a fusion process provided by an embodiment of the present disclosure may include the following steps:
S801: semantic confidence information of the first semantic prediction result under the first feature dimension is determined.
Here, the first feature dimension is the lowest feature dimension.
In specific implementation, a second feature dimension corresponding to the first feature dimension may be determined, and a fusion structure corresponding to the second feature dimension may be determined. Then, the fusion structure corresponding to the second feature dimension can determine the semantic confidence of the first semantic prediction result under the first feature dimension according to the activation function formula therein, so as to obtain semantic confidence information.
The activation function formula may be that of a softmax function.
S802: and fusing the first semantic prediction result under the first characteristic dimension and the first semantic prediction result under the second characteristic dimension by utilizing the semantic confidence information of the first semantic prediction result under the first characteristic dimension to obtain a target semantic prediction result under the second characteristic dimension.
In this step, after obtaining the semantic confidence information of the first semantic prediction result in the first feature dimension, the fusion structure may fuse the first semantic prediction result in the first feature dimension and the first semantic prediction result in the second feature dimension based on the semantic confidence information to obtain the target semantic prediction result in the second feature dimension.
In specific implementation, the semantic confidence of the first semantic prediction result of each pixel point is determined from the semantic confidence information and compared with a preset confidence threshold. When the semantic confidence is not smaller than the preset confidence threshold, the first semantic prediction result of that pixel point under the first feature dimension is taken as the target semantic prediction result under the second feature dimension. When the semantic confidence is smaller than the preset confidence threshold, the first semantic prediction result of that pixel point under the second feature dimension is taken as the target semantic prediction result.
S803: and updating the target semantic prediction result into a first semantic prediction result of a first feature dimension in the (i + 1) th fusion process.
In specific implementation, the second feature dimension may be used as a new first feature dimension, and the target semantic prediction result in the second feature dimension may be updated to a new first feature dimension first semantic prediction result in the (i + 1) th fusion process.
Based on the steps, a target semantic prediction result under the highest feature dimension can be determined, wherein the target semantic prediction result is also used for representing the probability that each pixel point in the region segmentation image is the pixel point corresponding to the head.
Therefore, based on the target semantic prediction result under the highest characteristic dimension, the probability that each pixel point in the region segmentation image is the pixel point corresponding to the head can be determined.
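One fusion step can be sketched as below, assuming two-channel (head / non-head) logits and a preset confidence threshold of 0.8; both the channel layout and the threshold value are assumptions, and the lower-dimension prediction is assumed to have been upsampled to the size of the higher-dimension one beforehand.

```python
import numpy as np

def softmax(logits, axis=0):
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_once(pred_low, pred_high, conf_threshold=0.8):
    """pred_low:  (2, H, W) first semantic prediction in the first (lower) feature dimension.
    pred_high: (2, H, W) first semantic prediction in the second feature dimension.
    Per pixel, keep the lower-dimension prediction where its semantic confidence is
    at least the threshold, otherwise keep the higher-dimension prediction; the
    result is the target semantic prediction in the second feature dimension."""
    conf_low = softmax(pred_low, axis=0).max(axis=0)        # semantic confidence information
    keep_low = conf_low >= conf_threshold
    return np.where(keep_low[None], pred_low, pred_high)
```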
As shown in Fig. 9, a schematic diagram of performing semantic segmentation on the region segmentation image corresponding to the target head region using a deep neural network is provided by an embodiment of the present disclosure. Feature extractor a, feature extractor b, feature extractor c and feature extractor d correspond to different feature dimensions and extract the region feature information of those dimensions from the region segmentation image corresponding to the target head region: feature extractor a extracts the region feature information corresponding to feature dimension X1, feature extractor b that of feature dimension X2, feature extractor c that of feature dimension X3, and feature extractor d that of feature dimension X4, where X1 is higher than X2, X2 is higher than X3, and X3 is higher than X4.

Classifier a, classifier b, classifier c and classifier d are the classifiers corresponding to the different feature dimensions. Classifier d determines the first semantic prediction result under feature dimension X4 based on the region feature information corresponding to X4; classifier c determines the first semantic prediction result under feature dimension X3 based on the first semantic prediction result under X4 and the region feature information corresponding to X3; classifier b determines the first semantic prediction result under feature dimension X2 based on the first semantic prediction result under X3 and the region feature information corresponding to X2; and classifier a determines the first semantic prediction result under feature dimension X1 based on the first semantic prediction result under X2 and the region feature information corresponding to X1.

Fusion structure c determines the target semantic prediction result under feature dimension X3, corresponding to classifier c, based on the first semantic prediction results output by classifier d and classifier c; fusion structure b determines the target semantic prediction result under feature dimension X2, corresponding to classifier b, based on the target semantic prediction result output by fusion structure c and the first semantic prediction result output by classifier b; and fusion structure a determines the target semantic prediction result under feature dimension X1, corresponding to classifier a, based on the target semantic prediction result output by fusion structure b and the first semantic prediction result output by classifier a. Thereafter, the deep neural network completes the semantic segmentation of the target head region based on the target semantic prediction result under feature dimension X1 corresponding to classifier a, obtaining the head segmentation image.
In addition, the key point detection neural network mentioned in the embodiment of the present disclosure may also determine the target key point corresponding to the face of each object to be matched based on the feature information and the structural information of the extracted target image in different feature dimensions. For a specific implementation process, the process of extracting the region feature information and the structural information of the region segmentation image corresponding to the target head region may be referred to, and details are not repeated here.
In an embodiment, after the segmentation result of the head is determined, the color value of the pixel point corresponding to the head in the target image may also be set as the first target value based on the determined segmentation result. And setting the color value of the pixel point in the target image except the head as a second target value.
In specific implementation, after the segmentation result of the head is obtained, the color value of each pixel point in the target image may be uniformly set as the second target value, for example, the color value corresponding to black. Then, the pixel point belonging to the head in the target image may be determined based on the position of each pixel point belonging to the head in the segmented image and the position of each pixel point in the target image, and the color value of the pixel point may be set as a first target value, for example, a color value corresponding to white. Thus, a head segmentation image corresponding to the target image can be obtained.
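This can be sketched directly as a mask-colouring step; white (255) and black (0) are used here only as example first and second target values.

```python
import numpy as np

def head_mask_image(target_image, head_mask,
                    first_target_value=255, second_target_value=0):
    """target_image: (H, W, 3) uint8 image; head_mask: (H, W) binary mask of the
    head pixels in the target image's coordinates.  Returns an image in which
    head pixels take the first target value and all other pixels the second."""
    out = np.full_like(target_image, second_target_value)
    out[head_mask.astype(bool)] = first_target_value
    return out
```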
And, in an embodiment, the trained deep neural network may directly output a head segmentation image corresponding to the target image.
Fig. 10 is a schematic flow chart illustrating a process of determining a head segmentation image corresponding to a target image according to an embodiment of the present disclosure. The process comprises obtaining the target image, determining the region segmentation image, and outputting the head segmentation image corresponding to the target image. In the figure, C represents the image, determined after key point detection on the target image, that contains the target key points and the face region to be matched; A represents the face region to be matched; D represents the region segmentation image cut out from the target image based on the determined target head region; and E represents the output head segmentation image corresponding to the target image.
In an embodiment, after obtaining the head segmentation result of the target matching object, a special effect processing request for a target head region of the target image may also be obtained.
Here, the special effect processing request for the target head region of the target image may be a user-initiated request for performing special effect processing on the head within the target head region of the target matching object in the target image, for example, adding a head pendant or an expression to the head, or enhancing its brightness, contrast, and the like.
Further, the target head region may be subjected to special effect processing based on the special effect processing request and the head segmentation result.
Here, based on the determined head segmentation result and the target image, a specific position of the head in the target head region in the target image is determined, and thereafter, the determined head may be subjected to the special effect processing in response to the special effect processing request. For example, a cartoon head pendant is added to the head of the target matching object in the target image.
Alternatively, color change information for a hair region in the target head region of the target image may also be acquired.
Here, the user may also initiate a color changing request for a hair region corresponding to the head in the target head region, where the color changing request may include color changing information, and the color changing information may include a target color value.
Thereafter, the color change processing may be performed on the hair region based on the color change information and the head segmentation result.
In specific implementation, after determining the specific position of the head in the target head region in the target image based on the head segmentation result, the hair region corresponding to the head may be further determined. Then, based on the color changing information, color changing processing can be performed on each pixel point corresponding to the hair area, and color changing of the hair area is completed. Specifically, the color value of each pixel point corresponding to the hair region can be adjusted to the target color value in the color change information.
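A minimal sketch of the colour change is given below, assuming the hair region is available as a binary mask and the colour changing information carries a single RGB target colour value; a practical implementation would more likely blend towards the target colour to preserve shading.

```python
import numpy as np

def recolor_hair(target_image, hair_mask, target_color):
    """target_image: (H, W, 3) uint8; hair_mask: (H, W) binary mask of the hair
    region inside the target head region; target_color: (R, G, B) target color value.
    Adjusts the color value of every hair pixel to the target color value."""
    out = target_image.copy()
    out[hair_mask.astype(bool)] = np.asarray(target_color, dtype=out.dtype)
    return out
```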
It will be understood by those skilled in the art that, in the method of the present disclosure, the order in which the steps are written does not imply a strict execution order or any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, an image processing apparatus corresponding to the image processing method is also provided in the embodiments of the present disclosure, and since the principle of the apparatus in the embodiments of the present disclosure for solving the problem is similar to the image processing method described above in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
As shown in fig. 11, a schematic diagram of an image processing apparatus provided in an embodiment of the present disclosure includes:
a first obtaining module 1101, configured to obtain, from a target image, a face region to be matched corresponding to a face of at least one object to be matched;
a matching module 1102, configured to filter a target matching object matched with a target tracking object from the at least one object to be matched based on first image feature information corresponding to a face of the target tracking object and second image feature information corresponding to each face region to be matched;
a second obtaining module 1103, configured to obtain, from the target image, a target head region corresponding to a head of a target matching object based on a face region to be matched of the target matching object;
and a segmentation module 1104, configured to perform semantic segmentation on the target head region to obtain a head segmentation result of the target matching object.
In a possible implementation manner, the first obtaining module 1101 is configured to perform keypoint detection on the target image, and determine a target keypoint corresponding to a face of each object to be matched in at least one object to be matched included in the target image;
and for each object to be matched in at least one object to be matched, determining a face area to be matched corresponding to the object to be matched based on the target key point corresponding to the object to be matched.
In a possible implementation manner, the matching module 1102 is configured to match first image feature information corresponding to the face of the target tracking object with second image feature information corresponding to each face region to be matched, and use an object to be matched corresponding to the second image feature information matched with the first image feature information as the target matching object.
In a possible embodiment, the apparatus further comprises:
a third obtaining module 1105, configured to obtain a tracking image before the first obtaining module obtains a to-be-matched face region corresponding to the face of at least one to-be-matched object from the target image;
based on the tracking image, determining a target tracking object and first image feature information corresponding to the face of the target tracking object in the tracking image.
In a possible implementation manner, the third obtaining module 1105 is configured to perform keypoint detection on the tracking image, and determine a target keypoint corresponding to a face of each initial object in at least one initial object included in the tracking image;
and screening the target tracking object from the at least one initial object based on the key point confidence information of the target key point corresponding to the face of each initial object.
In a possible implementation manner, the second obtaining module 1103 is configured to determine, based on first positional relationship information of the face and the head, second positional relationship information between the face region to be matched and the target head region;
and acquiring a target head area corresponding to the head of the target matching object from the target image based on the second position relation information and the face area to be matched.
In a possible implementation manner, the matching module 1102 is further configured to determine a new target tracking object based on the keypoint confidence information of the target keypoint corresponding to the face of each object to be matched, when it is determined that there is no second image feature information matching the first image feature information.
In a possible implementation manner, the segmentation module 1104 is configured to crop out, based on the target head region, a region segmentation image corresponding to the target head region from the target image;
extracting the regional characteristic information and the structural information of the regional segmentation image by using a pre-trained deep neural network;
extracting region feature information respectively corresponding to different feature dimensions in the region segmentation image; wherein the plurality of feature dimensions include a first feature dimension and a second feature dimension that are adjacent, the first feature dimension being lower than the second feature dimension; the region feature information corresponding to the first feature dimension is determined based on the region feature information corresponding to the second feature dimension and the structural information of the region feature information corresponding to the second feature dimension; the structural information of the regional characteristic information corresponding to the second characteristic dimension is extracted by using the pre-trained deep neural network;
and performing semantic segmentation on the region segmentation image based on region feature information respectively corresponding to different feature dimensions to obtain a head segmentation result of the target matching object.
In a possible implementation manner, the segmentation module 1104 is configured to determine, for each of the different feature dimensions, a first semantic prediction result of the region segmentation image in the feature dimension based on region feature information corresponding to the feature dimension;
determining the probability that each pixel point in the region segmentation image is a pixel point corresponding to the head based on a first semantic prediction result of the region segmentation image under each feature dimension;
and performing semantic segmentation on the region segmentation image based on the probability that each pixel point in the region segmentation image is the pixel point corresponding to the head and a preset segmentation probability value.
In a possible implementation manner, the segmentation module 1104 is configured to perform multiple times of fusion processing according to the order from low to high of the different feature dimensions, and then obtain a probability that each pixel point in the region segmentation image is a pixel point corresponding to the head;
wherein, the ith fusion processing in the multiple fusion processing comprises the following steps:
determining semantic confidence information of a first semantic prediction result under the first feature dimension;
fusing the first semantic prediction result under the first characteristic dimension and the first semantic prediction result under the second characteristic dimension by utilizing semantic confidence information of the first semantic prediction result under the first characteristic dimension to obtain a target semantic prediction result under the second characteristic dimension;
and updating the target semantic prediction result into a first semantic prediction result of a first feature dimension in the (i + 1) th fusion process.
In a possible embodiment, the apparatus further comprises:
a setting module 1106, configured to, after the segmentation module 1104 obtains a head segmentation result of the target matching object, set a color value of a pixel point corresponding to a head in the target image as a first target value based on the segmentation result;
and setting the color value of the pixel point except the head in the target image as a second target value.
In a possible embodiment, the apparatus further comprises:
a processing module 1107, configured to obtain a special effect processing request for the target head region of the target image after the segmentation module 1104 obtains a head segmentation result of the target matching object;
performing special effect processing on the target head region based on the special effect processing request and the head segmentation result; or
Acquiring color changing information of a hair region in the target head region for the target image;
and performing color changing processing on the hair area based on the color changing information and the head segmentation result.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
An embodiment of the present disclosure further provides a computer device, as shown in fig. 12, which is a schematic structural diagram of a computer device provided in an embodiment of the present disclosure, and includes:
a processor 1201 and a memory 1202; the memory 1202 stores machine-readable instructions executable by the processor 1201, and the processor 1201 is configured to execute the machine-readable instructions stored in the memory 1202; when the machine-readable instructions are executed by the processor 1201, the processor 1201 performs the following steps: S101: acquiring a face region to be matched corresponding to the face of at least one object to be matched from a target image; S102: screening a target matching object matched with the target tracking object from the at least one object to be matched based on first image feature information corresponding to the face of the target tracking object and second image feature information corresponding to each face region to be matched; S103: acquiring a target head region corresponding to the head of the target matching object from the target image based on the face region to be matched of the target matching object; and S104: performing semantic segmentation on the target head region to obtain a head segmentation result of the target matching object.
The memory 1202 includes an internal memory 1221 and an external memory 1222; the internal memory 1221 temporarily stores operation data of the processor 1201 and data exchanged with the external memory 1222 such as a hard disk, and the processor 1201 exchanges data with the external memory 1222 through the internal memory 1221.
For the specific execution process of the instruction, reference may be made to the steps of the image processing method described in the embodiments of the present disclosure, and details are not described here.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the image processing method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the image processing method provided in the embodiments of the present disclosure includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute steps of the image processing method described in the above method embodiments, which may be referred to specifically for the above method embodiments, and are not described herein again.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implementing, and for example, a plurality of units or components may be combined, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive of the technical solutions described in the foregoing embodiments or equivalent technical features thereof within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (15)

1. An image processing method, comprising:
acquiring a face area to be matched corresponding to the face of at least one object to be matched from a target image;
screening a target matching object matched with the target tracking object from the at least one object to be matched based on first image characteristic information corresponding to the face of the target tracking object and second image characteristic information corresponding to each face area to be matched;
acquiring a target head area corresponding to the head of a target matching object from the target image based on a face area to be matched of the target matching object;
and performing semantic segmentation on the target head region to obtain a head segmentation result of the target matching object.
2. The method according to claim 1, wherein the obtaining, from the target image, a face region to be matched corresponding to a face of at least one object to be matched comprises:
performing key point detection on the target image, and determining a target key point corresponding to the face of each object to be matched in at least one object to be matched included in the target image;
and for each object to be matched in at least one object to be matched, determining a face area to be matched corresponding to the object to be matched based on the target key point corresponding to the object to be matched.
3. The method according to claim 1 or 2, wherein the screening of the target matching object matching the target tracking object from the at least one object to be matched based on the first image feature information corresponding to the face of the target tracking object and the second image feature information corresponding to each face region to be matched comprises:
and matching the first image characteristic information corresponding to the face of the target tracking object with the second image characteristic information corresponding to each face area to be matched, and taking the object to be matched corresponding to the second image characteristic information matched with the first image characteristic information as the target matching object.
4. The method according to any one of claims 1 to 3, before the obtaining, from the target image, a face region to be matched corresponding to a face of at least one object to be matched, further comprising:
acquiring a tracking image;
based on the tracking image, determining a target tracking object and first image feature information corresponding to the face of the target tracking object in the tracking image.
5. The method of claim 4, wherein determining a target tracking object based on the tracking image comprises:
performing key point detection on the tracking image, and determining a target key point corresponding to the face of each initial object in at least one initial object included in the tracking image;
and screening the target tracking object from the at least one initial object based on the key point confidence information of the target key point corresponding to the face of each initial object.
6. The method according to any one of claims 1 to 5, wherein the obtaining a target head region corresponding to the head of a target matching object from the target image based on a face region to be matched of the target matching object comprises:
determining second position relation information between the face area to be matched and the target head area based on the first position relation information of the face and the head;
and acquiring a target head area corresponding to the head of the target matching object from the target image based on the second position relation information and the face area to be matched.
7. The method of claim 3, further comprising:
and under the condition that the second image characteristic information matched with the first image characteristic information does not exist, determining a new target tracking object based on the key point confidence information of the target key point corresponding to the face of each object to be matched.
8. The method according to any one of claims 1 to 7, wherein the performing semantic segmentation on the target head region to obtain a head segmentation result of the target matching object comprises:
cutting out a region segmentation image corresponding to the target head region from the target image based on the target head region;
extracting the regional characteristic information and the structural information of the regional segmentation image by using a pre-trained deep neural network;
extracting region feature information respectively corresponding to different feature dimensions in the region segmentation image; wherein the plurality of feature dimensions include a first feature dimension and a second feature dimension that are adjacent, the first feature dimension being lower than the second feature dimension; the region feature information corresponding to the first feature dimension is determined based on the region feature information corresponding to the second feature dimension and the structural information of the region feature information corresponding to the second feature dimension; the structural information of the regional characteristic information corresponding to the second characteristic dimension is extracted by using the pre-trained deep neural network;
and performing semantic segmentation on the region segmentation image based on region feature information respectively corresponding to different feature dimensions to obtain a head segmentation result of the target matching object.
9. The method according to claim 8, wherein the performing semantic segmentation on the region segmentation image based on the region feature information respectively corresponding to different feature dimensions comprises:
for each feature dimension in the different feature dimensions, determining a first semantic prediction result of the region segmentation image under the feature dimension based on region feature information corresponding to the feature dimension;
determining the probability that each pixel point in the region segmentation image is a pixel point corresponding to the head based on a first semantic prediction result of the region segmentation image under each feature dimension;
and performing semantic segmentation on the region segmentation image based on the probability that each pixel point in the region segmentation image is the pixel point corresponding to the head and a preset segmentation probability value.
10. The method according to claim 9, wherein the determining, based on the first semantic prediction result of the region-segmented image in each feature dimension, a probability that each pixel point in the region-segmented image is a head-corresponding pixel point comprises:
performing multiple times of fusion processing according to the sequence of the different feature dimensions from low to high to obtain the probability that each pixel point in the region segmentation image is the pixel point corresponding to the head;
wherein, the ith fusion processing in the multiple fusion processing comprises the following steps:
determining semantic confidence information of a first semantic prediction result under the first feature dimension;
fusing the first semantic prediction result under the first characteristic dimension and the first semantic prediction result under the second characteristic dimension by utilizing semantic confidence information of the first semantic prediction result under the first characteristic dimension to obtain a target semantic prediction result under the second characteristic dimension;
and updating the target semantic prediction result into a first semantic prediction result of a first feature dimension in the (i + 1) th fusion process.
11. The method according to any one of claims 1 to 10, further comprising, after the obtaining the head segmentation result of the target matching object:
setting the color value of a pixel point corresponding to the head part in the target image as a first target value based on the segmentation result;
and setting the color value of the pixel point except the head in the target image as a second target value.
12. The method according to any one of claims 1 to 11, further comprising, after obtaining the head segmentation result of the target matching object:
acquiring a special effect processing request for the target head region of the target image;
performing special effect processing on the target head region based on the special effect processing request and the head segmentation result; or
acquiring, for the target image, color change information of a hair region in the target head region;
and performing color change processing on the hair region based on the color change information and the head segmentation result.
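For the hair recoloring branch of claim 12, a minimal sketch follows, assuming a boolean hair mask derived from the head segmentation result and simple linear blending toward the requested color; the blend strength and the color itself are illustrative assumptions.

import numpy as np

def recolor_hair(target_image, hair_mask, target_color=(120, 60, 200), strength=0.6):
    """target_image: HxWx3 uint8 array; hair_mask: HxW boolean mask of hair pixels inside the target head region."""
    out = target_image.astype(np.float32)
    color = np.asarray(target_color, dtype=np.float32)
    # blend hair pixels toward the requested color, leave everything else untouched
    out[hair_mask] = (1.0 - strength) * out[hair_mask] + strength * color
    return np.clip(out, 0, 255).astype(np.uint8)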
13. An image processing apparatus, characterized by comprising:
a first acquisition module, configured to acquire, from a target image, a face region to be matched corresponding to a face of at least one object to be matched;
a matching module, configured to screen, from the at least one object to be matched, a target matching object that matches a target tracking object, based on first image feature information corresponding to the face of the target tracking object and second image feature information corresponding to each face region to be matched;
a second acquisition module, configured to acquire, from the target image, a target head region corresponding to the head of the target matching object, based on the face region to be matched of the target matching object;
and a segmentation module, configured to perform semantic segmentation on the target head region to obtain a head segmentation result of the target matching object.
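The four modules of claim 13 can be composed as a small pipeline object; the sketch below only illustrates that structure, with assumed callables standing in for each module rather than the patented apparatus.

class ImageProcessingApparatus:
    def __init__(self, first_acquisition, matching, second_acquisition, segmentation):
        self.first_acquisition = first_acquisition    # face regions to be matched from the target image
        self.matching = matching                      # screens out the target matching object
        self.second_acquisition = second_acquisition  # target head region of the matched object
        self.segmentation = segmentation              # semantic segmentation of the head region

    def process(self, target_image, tracking_face_features):
        face_regions = self.first_acquisition(target_image)
        matched = self.matching(tracking_face_features, face_regions)
        head_region = self.second_acquisition(target_image, matched)
        return self.segmentation(head_region)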
14. A computer device, comprising: a processor and a memory storing machine-readable instructions executable by the processor, wherein the processor is configured to execute the machine-readable instructions stored in the memory, and when the machine-readable instructions are executed by the processor, the processor performs the steps of the image processing method according to any one of claims 1 to 12.
15. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a computer device, performs the steps of the image processing method according to any one of claims 1 to 12.
CN202110738815.XA 2021-06-30 2021-06-30 Image processing method and device, computer equipment and storage medium Pending CN113469041A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110738815.XA CN113469041A (en) 2021-06-30 2021-06-30 Image processing method and device, computer equipment and storage medium
PCT/CN2021/130440 WO2023273102A1 (en) 2021-06-30 2021-11-12 Image processing method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110738815.XA CN113469041A (en) 2021-06-30 2021-06-30 Image processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113469041A true CN113469041A (en) 2021-10-01

Family

ID=77876628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110738815.XA Pending CN113469041A (en) 2021-06-30 2021-06-30 Image processing method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113469041A (en)
WO (1) WO2023273102A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023273102A1 (en) * 2021-06-30 2023-01-05 北京市商汤科技开发有限公司 Image processing method and apparatus, computer device, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119850986B (en) * 2025-03-19 2025-06-20 武汉科技大学 Image semantic matching method, device, equipment and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003281539A (en) * 2002-03-25 2003-10-03 Oki Electric Ind Co Ltd Facial component searching device and facial component searching method
CN109155067A (en) * 2018-01-23 2019-01-04 深圳市大疆创新科技有限公司 The control method of moveable platform, equipment, computer readable storage medium
CN109274891A (en) * 2018-11-07 2019-01-25 北京旷视科技有限公司 A kind of image processing method, device and its storage medium
WO2019128508A1 (en) * 2017-12-28 2019-07-04 Oppo广东移动通信有限公司 Method and apparatus for processing image, storage medium, and electronic device
CN110097586A (en) * 2019-04-30 2019-08-06 青岛海信网络科技股份有限公司 A kind of Face datection method for tracing and device
CN110378264A (en) * 2019-07-08 2019-10-25 Oppo广东移动通信有限公司 Method for tracking target and device
CN110675420A (en) * 2019-08-22 2020-01-10 华为技术有限公司 Image processing method and electronic equipment
CN111310710A (en) * 2020-03-03 2020-06-19 平安科技(深圳)有限公司 Face detection method and system
CN111523546A (en) * 2020-04-16 2020-08-11 湖南大学 Image semantic segmentation method, system and computer storage medium
CN112651364A (en) * 2020-12-31 2021-04-13 北京市商汤科技开发有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN112967211A (en) * 2021-01-31 2021-06-15 成都商汤科技有限公司 Image processing method and device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3034644A1 (en) * 2016-08-22 2018-03-01 Magic Leap, Inc. Augmented reality display device with deep learning sensors
CN112766021A (en) * 2019-11-04 2021-05-07 广东毓秀科技有限公司 Method for re-identifying pedestrians based on key point information and semantic segmentation information of pedestrians
CN113469041A (en) * 2021-06-30 2021-10-01 北京市商汤科技开发有限公司 Image processing method and device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PEISONG WEN et al.: "DMVOS: Discriminative Matching for real-time Video Object Segmentation", ASSOCIATION FOR COMPUTING MACHINERY, pages 2048 - 2056 *
YUAN Yang et al.: "Real-time Human Detection and Tracking System for Different Distances", Pattern Recognition and Artificial Intelligence, vol. 27, no. 10, pages 1 - 4 *

Also Published As

Publication number Publication date
WO2023273102A1 (en) 2023-01-05

Similar Documents

Publication Publication Date Title
CN106952269B (en) Near-neighbor reversible video foreground object sequence detection and segmentation method and system
US9299004B2 (en) Image foreground detection
US10019823B2 (en) Combined composition and change-based models for image cropping
US9330334B2 (en) Iterative saliency map estimation
KR101605983B1 (en) Image recomposition using face detection
JP6763417B2 (en) Electronics
WO2023273022A1 (en) Image processing method and apparatus, computer device, storage medium, and computer program product
EP2894634B1 (en) Electronic device and image compostition method thereof
JP5554984B2 (en) Pattern recognition method and pattern recognition apparatus
JP5695257B1 (en) Image processing apparatus, image processing method, and image processing program
CN113469041A (en) Image processing method and device, computer equipment and storage medium
US20210073517A1 (en) Selecting representative recent digital portraits as cover images
CN108875931A (en) Neural metwork training and image processing method, device, system
CN108781252A (en) A kind of image capturing method and device
WO2023230927A1 (en) Image processing method and device, and readable storage medium
Li et al. Seam carving based aesthetics enhancement for photos
JP2008269490A (en) Image management method, image management apparatus, control program, and computer-readable storage medium
CN113129319A (en) Image processing method, image processing device, computer equipment and storage medium
JP2018180646A (en) Object candidate area estimation device, object candidate area estimation method, and object candidate area estimation program
KR101592087B1 (en) Method for generating saliency map based background location and medium for recording the same
JP6044138B2 (en) Image region dividing apparatus, method, and program
JP2015165433A (en) Information processing device, information processing method, and program
US10380447B1 (en) Providing regions of interest in an image
CN115546149A (en) Liver segmentation method and device, electronic device and storage medium
HK40052317A (en) An image processing method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211001
