WO2018176281A1

WO2018176281A1 - Sketch image generation method and device

Info

Publication number: WO2018176281A1
Application number: PCT/CN2017/078637
Authority: WO
Inventors: 谭文伟; 林倞; 张冬雨
Original assignee: 华为技术有限公司
Priority date: 2017-03-29
Filing date: 2017-03-29
Publication date: 2018-10-04
Also published as: CN110023989A; CN110023989B

Abstract

Disclosed are a sketch image generation method and device, applicable for solving the issue in which prior art face sketch image automatic generation technology is less accurate, has poorer generalization capabilities and slower sketch image generation. The method comprises: acquiring a face image to be processed; acquiring, by means of P convolutional layers of a first network branch in a pre-trained deep convolutional neural network model, facial sketch features of the face image so as to obtain a facial structure sketch image; acquiring, by means of P convolutional layers of a second network branch in the deep convolutional neural network model, hair sketch features of the face image so as to obtain a hair texture sketch image; and synthesizing the facial structure sketch image and the hair texture sketch image to obtain a sketch image of the face image.

Description

Method and device for generating sketch image

Technical field

The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for generating a sketch image.

Background technique

The sketch portrait automatic generation refers to a process of automatically generating a face image with a sketch style by inputting a face image.

The automatic generation technology of face sketch images has important applications in many fields. For example, in the field of public safety, a sketch image generated based on a photo of a suspect’s ID card can be used to compare with a sketch image drawn according to the witness’s description, thereby assisting the public security organ in determining the identity of the suspect; in the animation industry and The field of social networking is mainly used to render sketches of people's photos in a stylized manner.

The current face sketch image automatic generation technology is mainly based on a synthetic method, that is, a complete sketch image is synthesized by using similar parts in the sample image with the input image.

Specifically, first, a database including a plurality of sample image blocks and a sketch image block corresponding to each sample image block is created, wherein each sample image block respectively includes different feature information related to the face, such as facial features and faces. Decorative information such as decorations, hair, and beards. Secondly, the input image is divided into a plurality of image blocks, and a sample image block similar to the image block is searched in the database for each image segmentation, and a sketch image block corresponding to each image block similar to the image block is obtained, and all the acquired sketches are obtained. The image block is combined into a sketch image. Then, the multi-scale Markov Random Filed (MRF) algorithm model is used to remove the edges between adjacent sub-blocks in the synthesized sketch image to obtain a relatively natural sketch image.

However, the synthetic face-based sketch image automatic generation technology smoothes the synthesized sketch image by the MRF algorithm model, so that some features such as flaws and scars on the face of the synthesized sketch image are smoothed out. As a result, the synthesized sketch image does not retain the texture detail information in the face photo well. And the synthetic face-based sketch image automatic generation technology usually needs to establish a sample database, and the sample information included in the sample database is related to the face, and the sample database in the established sample database has a limited number of samples and cannot cover enough samples. Data, so when the elements not included in the sample data appear in the face image, the synthetic face-based sketch image automatic generation technology cannot accurately generate the sketch image, so the accuracy of the synthetic face-based sketch image automatic generation technique is better. Low and generalization ability is poor. Moreover, based on the synthetic face sketch image automatic generation technology, when the original image is generated into a sketch image, the original data is searched and compared with all the sample image blocks, and all the acquired sketch image blocks are synthesized, resulting in a large workload. Sketch images are generated slowly.

Summary of the invention

The embodiment of the present application provides a method and a device for generating a sketch image, which are used to solve the problem that the automatic generation technology of the face sketch image existing in the prior art has low accuracy, poor generalization ability, and slow sketch image generation. problem.

In a first aspect, an embodiment of the present application provides a method for generating a sketch image, which may be applied to an electronic device, including:

After the electronic device acquires the image of the face to be processed, the first network in the deep convolutional neural network model is pre-trained Obtaining a facial sketch feature in the face image, obtaining a facial structure sketch map, and obtaining the P convolution layer of the second network branch in the deep convolutional neural network model The hair sketch feature in the face image obtains a hair texture sketch map, wherein the P is an integer greater than zero.

The facial structure sketch map and the hair texture sketch map are then synthesized to obtain a sketch image of the face image.

In the embodiment of the present application, based on a deep convolutional neural network, a design includes a first network branch for generating a face feature, and a structure for generating a second network branch including a hair feature, from a large number of training samples. The effective feature expression is learned, and the network model which can generate the accurate and natural face sketch image of the original image is trained to realize the automatic generation of the face sketch image. Compared with the prior art synthetic face-based sketch image automatic generation technology, the technique of generating a face sketch image based on a deep convolutional neural network no longer depends on the sample database, but through the deep convolutional neural network. A network branch generates a structural sketch map including facial features, generates a structural sketch map including hair features through a second network branch in the deep convolutional neural network, and then synthesizes the structural sketch map and the texture sketch map to obtain a final human face sketch image. It improves the accuracy and generalization ability of the face sketch image generation technology, and reduces the workload in the face sketch image generation process, thereby improving the speed of face sketch image generation.

Optionally, each of the convolutional layers in the deep convolutional neural network model has a modified linear unit (English: Rectified Linear Units, referred to as: ReLU) as an activation function. The convolution kernel size model used by each of the convolutional layers in the deep convolutional neural network model is an r x r model.

In a possible design, the first N convolutional layers of the first network branch are the same as or coincide with the first N convolutional layers of the second network branch, and the N is an integer greater than 0 and less than P.

Specifically, the first N convolution layers of the first network branch are the same as the first N convolution layers of the second network branch, or the first N convolution layers of the first network branch and the The first N convolutional layers of the second network branch share the first N convolutional layers in the deep convolutional neural network model.

The first N convolution layers of the first network branch in the embodiment of the present application are the same as or coincide with the first N convolution layers of the second network branch, which improves the computational efficiency of the deep convolutional neural network model.

In a possible design, the obtaining, by the P convolution layers of the first network branch in the deep convolutional neural network model, the facial sketch features in the face image, including:

Filtering background features in the face image by using the first N convolution layers of the first network branch in the deep convolutional neural network model to obtain a facial feature map;

Obtaining a facial sketch feature in the face feature map by the last M convolution layers of the first network branch.

Obtaining the hair sketch feature in the face image by using the P convolution layers of the second network branch in the deep convolutional neural network model, including:

Filtering background features in the face image by using the first N convolution layers of the second network branch in the deep convolutional neural network model to obtain a facial feature map;

Obtaining a hair sketch feature in the face feature map by using the last M convolution layers of the second network branch;

Where P = M + N.

In the above design, the first N convolutional layers of the first network branch are used to filter the background features in the face image to be processed, and the last M convolutional layers are used to obtain the facial structure sketch map; the second network branch The first N convolutional layers are used to filter the background features in the face image to be processed, and the M convolutional layers are used to obtain the texture sketch of the hair texture, which improves the accuracy of the face sketch image generation technology and improves the accuracy. The speed at which face sketch images are generated.

In a possible design, the convolution kernel size of the last M convolutional layers of the first network branch, and the second The convolution kernel sizes of the last M convolutional layers of the network branch are equal.

The convolution kernel size of the last M convolutional layers of the first network branch in the above design is equal to the convolution kernel size of the last M convolutional layers of the second network branch, and the face sketch image generation is improved. The accuracy of the technology.

In a possible design, the M is 2, the convolution kernels of the last two convolution layers of the first network branch are equal in size, and the convolutions of the last two convolution layers of the second network branch are The core size is equal.

In a possible design, the N is 4, and the first N convolution layers of the first network branch in the deep convolutional neural network model are filtered to filter background features in the face image, including:

The background features in the horizontal direction and the vertical direction of the face image are filtered by the first convolution layer and the second convolution layer in the first N convolution layers of the first network branch in the deep convolutional neural network model.

Through the third convolutional layer and the fourth convolutional layer in the first N convolutional layers of the first network branch in the deep convolutional neural network model, the face image filtered for the background feature is horizontally and vertically Smoothing in the direction.

In the above design, the first convolutional layer and the second convolutional layer in the first N convolutional layers of the first network branch are used to filter the background features of the horizontal and vertical directions of the face image to be processed, and the third The convolution layer and the fourth convolution layer are used for smoothing the horizontal and vertical directions for the face image filtered by the background feature, thereby improving the accuracy of the face sketch image generation technique and causing the generated Sketch images are more natural.

In one possible design, the convolution kernel size of the first convolutional layer is equal to the convolution kernel size of the second convolutional layer, and the convolution kernel size of the third convolutional layer and the fourth convolutional layer The convolution kernel has the same size.

In the above design, the convolution kernel size of the first convolutional layer is equal to the convolution kernel size of the second convolutional layer, and the convolution kernel size of the third convolutional layer and the convolution kernel size of the fourth convolutional layer The same, the accuracy of the face sketch image generation technology is improved.

In one possible design, the method further includes:

Obtaining a hair probability that each pixel in the face image is a hair feature point;

And combining the facial structure sketch map and the hair texture sketch map to obtain a sketch image of the face image, which meets the following formula requirements:

S _(i,j) =(1-P _h(i,j) )×S _S(i,j) +P _h(i,j) ×S _t(i,j)

Wherein the S _{(i, j)} is a pixel value of a pixel point of the i-th row and the j-th column in the sketch image of the face image, and P _{h(i, j)} is a sketch image of the face image The hair probability of the pixel of the i-th row and the j-th column, S _{S(i, j)} is the pixel value of the pixel of the i-th row and the j-th column in the sketch image of the face structure, and S _{t(i, j)} is A pixel value of a pixel of the i-th row and the j-th column in the hair texture sketch map, wherein i, j are integers greater than zero.

In the above design, by using a facial image sketch map and a hair texture sketch map based on the hair probability to obtain a sketch image of the face image, the synthesized sketch image can not only retain the facial structure information well, but also the hair texture information. Also reserved is better.

In one possible design, the deep convolutional neural network model is trained as follows:

Performing training by inputting a plurality of personal face sample images in the training sample database into the initialized deep convolutional neural network model; the training sample database includes a plurality of personal face sample images and a sketch sample image corresponding to each face sample image, the initialized Deep convolutional neural network models include weights and offsets;

In the Kth training process, the background features in the face sample image are filtered by the first N convolution layers of the K-1 sub-depth convolutional neural network model to obtain the face sample image. a face feature map, the K being an integer greater than 0;

Passing the last M convolutional layers of the first network branch of the K-1 sub-depended deep convolutional neural network model, Obtaining a facial sketch feature in the face feature image of the face sample image, and obtaining a facial structure sketch map of the face sample image;

Acquiring the hair sketch feature in the face feature image of the face sample image by using the last M convolution layers of the second network branch of the K-1 adjusted deep convolutional neural network model, Sketch image of hair texture of face sample image;

Combining a facial structure sketch map of the face sample image and a hair texture sketch map of the face sample image to obtain a sketch image of the face sample image;

Obtaining an error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image after the Kth training;

The weight and offset used in the K+1th training process are adjusted based on an error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image.

In the above design, the deep convolutional neural network model is trained by using a large number of face sample images, and the sketch image is not dependent on the sample database when generating the sketch image of the face image to be processed, but can directly pass the trained depth volume. The neural network model generates the sketch image of the face image, which improves the accuracy and generalization ability of the face sketch image generation technology, and reduces the workload in the process of generating the face sketch image, thereby improving the face sketch. The speed of image generation.

In a possible design, during the Kth training, the background features in the face sample image are filtered by the first N convolution layers of the K-1 sub-adjusted deep convolutional neural network model. include:

Adding the face sample image and the pixel value of the pixel at the same position in the sketch average image to obtain a face enhancement image;

The pixel value of any one of the pixel points in the sketch average map is an average value of pixel values of pixel points in the same position in the sketch sample image in the training sample database that are at the same position as the any one of the pixel points;

The background features in the face enhancement image are filtered by the first N convolutional layers of the K-1 sub-depth convolutional neural network model.

In the above design, by adding the pixel values of the pixel points at the same position in the face sample image and the sketch average image, the face enhancement image is obtained, and the facial feature information and the hair feature of the face sample image are enhanced. Information, which improves the accuracy of face sketch image generation techniques.

In a possible design, the face feature map of the face sample image is obtained by the last M convolution layers of the first network branch of the K-1 sub-depth deep convolutional neural network model. Facial sketch features, including:

Dividing a face feature map of the face sample image into a plurality of mutually overlapping image blocks, and acquiring image blocks including facial feature information from the plurality of mutually overlapping image blocks;

Determining, for each of the image blocks including the facial feature information, the image block including the facial feature information corresponding to the target region in the facial feature map of the facial sample image, and the image in the target region And adding a pixel value of the pixel at the same position in each of the image blocks including the facial feature information to obtain a face enhancement feature map;

For each facial enhancement feature map, the facial sketch feature in the facial enhancement feature map is acquired by the last M convolutional layers of the first network branch of the K-1 adjusted deep convolutional neural network model.

In the above design, the face sample image is enhanced by adding the pixel values of the pixel points including the face feature information and the pixel points at the same position of the image block in the corresponding target region in the face feature map. The facial feature information enables the synthesized sketch image to retain the facial structure information well.

In a possible design, the face feature map of the face sample image is obtained by the last M convolution layers of the second network branch of the K-1 sub-depth deep convolutional neural network model. Hair sketch features, including:

Dividing a face feature map of the face sample image into a plurality of mutually overlapping image blocks, and acquiring image blocks including hair feature information from the plurality of mutually overlapping image blocks;

And adding, to each of the image blocks including the hair feature information, the pixel values of the pixel positions at the same position in the image block including the hair feature information to obtain a hair enhancement feature map;

For each hair enhancement feature map, the hair sketch feature in the hair enhancement feature map is obtained by the last M convolution layers of the second network branch of the K-1 adjusted deep convolutional neural network model.

In the above design, the face sample image is enhanced by adding the pixel values of the image block including the hair feature information and the pixel points at the same position of the image block in the corresponding target region in the face feature map. The hair feature information enables the synthesized sketch image to retain the hair texture information well.

In a possible design, the obtaining an image block including facial feature information from the plurality of mutually overlapping image blocks includes:

Determining, for each of the plurality of mutually overlapping image blocks, a face probability of each pixel point in each of the image blocks as a facial feature point; determining a number of pixel points whose face probability is not 0 When it is greater than the preset threshold, it is determined that each of the image blocks is an image block including facial feature information.

In the above design, by determining a face probability of each pixel point in each image block as a facial feature point, and then determining that the number of pixel points whose face probability is not 0 is greater than a preset threshold, determining each of the An image block is an image block including facial feature information, which improves the accuracy of acquiring an image block including facial feature information.

In a possible design, the obtaining an image block including hair feature information from the plurality of mutually overlapping image blocks includes:

Determining, for each of the plurality of mutually overlapping image blocks, a hair probability of each pixel point in each of the image blocks as a hair feature point; determining a number of pixel points whose hair probability is not zero When it is greater than the preset threshold, it is determined that each of the image blocks is an image block including hair feature information.

In the above design, by determining a face probability of each pixel point in each image block as a hair feature point, and then determining that the number of pixel points whose hair probability is not 0 is greater than a preset threshold, determining each of the An image block is an image block including hair feature information, which improves the accuracy of acquiring an image block including hair feature information.

In a second aspect, the embodiment of the present application provides a device for generating a sketch image, including:

An obtaining module, configured to obtain a face image to be processed;

a depth convolutional neural network model, configured to acquire a facial structure sketch map and a hair texture sketch map in the face image acquired by the obtaining module; the deep convolutional neural network model is pre-trained, including the first network a branch module and a second network branch module;

The first network branching module is configured to acquire a facial sketch feature in the facial image acquired by the acquiring module, to obtain a facial structure sketch map, where the first network branching module includes P convolutional layers. Wherein P is an integer greater than 0;

The second network branching module is configured to obtain a hair sketch feature in the face image acquired by the acquiring module, to obtain a hair texture sketch map; and the second network branching module includes P convolution layers;

And a synthesizing module, configured to synthesize the facial structure sketch map obtained by the first network branching module and the hair texture sketch map obtained by the second network branching module to obtain a sketch image of the facial image.

In a possible design, the first N convolution layers of the P convolutional layers included in the first network branching module are The first N convolutional layers in the P convolutional layers included in the second network branching module are the same or coincident, and the N is an integer greater than 0 and less than P.

In a possible design, the first network branching module is specifically configured to:

Filtering background features in the face image by using the first N convolution layers of the first network branching module to obtain a facial feature map;

Obtaining a facial sketch feature in the facial feature map by using the last M convolution layers of the first network branching module;

The second network branch module is specifically configured to:

Where P = M + N.

In a possible design, the convolution kernel size of the last M convolutional layers of the first network branching module is equal to the convolution kernel size of the last M convolutional layers of the second network branching module. .

In a possible design, the N is 4, and the first network branching module filters the background features in the face image when passing through the first N convolution layers of the first network branching module. Specifically for:

Filtering background features in a horizontal direction and a vertical direction of the face image by using a first convolution layer and a second convolution layer in the first N convolution layers of the first network branching module;

Smoothing in the horizontal direction and the vertical direction for the face image filtered by the background feature by the third convolution layer and the fourth convolution layer in the first N convolution layers of the first network branching module deal with.

In a possible design, the acquiring module is further configured to acquire a hair probability of each pixel point in the face image as a hair feature point;

The synthesis module is specifically configured to:

Combining the facial structure sketch map obtained by the first network branching module and the hair texture sketch map obtained by the second network branching module to obtain a sketch image of the face image, which meets the following formula requirements:

S _(i,j) =(1-P _h(i,j) )×S _S(i,j) +P _h(i,j) ×S _t(i,j)

In one possible design, the device further includes:

a training module for training the deep convolutional neural network model by:

The last M convolutions of the first network branch module through the K-1 adjusted deep convolutional neural network model a layer, acquiring a facial sketch feature in the face feature image of the face sample image, and obtaining a facial structure sketch map of the face sample image;

Obtaining a hair sketch feature in the face feature image of the face sample image by using the last M convolution layers of the second network branch module of the K-1 sub-depended deep convolutional neural network model a sketch of the hair texture of the face sample image;

In a possible design, the training module filters the face sample image by the first N convolution layers of the K-1 sub-depth convolutional neural network model during the Kth training process. When used in the background feature, specifically for:

In a possible design, the acquiring module is further configured to divide the face sample image into a plurality of mutually overlapping image blocks, and obtain facial feature information from the plurality of mutually overlapping image blocks. Image block

The training module acquires the face feature map of the face sample image in the last M convolution layers of the first network branch module of the deep convolutional neural network model that has undergone K-1 adjustments. When the facial sketch feature is used, it is specifically used to:

Determining, for each image block that includes the facial feature information acquired by the acquiring module, that each of the image blocks including the facial feature information is in a corresponding target region in the facial feature map of the facial sample image, and Adding pixel values of the image points in the target area and the pixel points of the same position in each of the image blocks including the facial feature information to obtain a face enhancement feature map;

Obtaining facial sketch features in the facial enhancement feature map by using the last M convolutional layers of the first network branching module of the K-1 sub-depended deep convolutional neural network model for each facial enhancement feature map .

In a possible design, the acquiring module is further configured to divide the face sample image into a plurality of overlapping image blocks, and obtain hair feature information from the plurality of mutually overlapping image blocks. Image block

The training module acquires the face feature map of the face sample image in the last M convolution layers of the second network branch module of the deep convolutional neural network model that has undergone K-1 adjustments. When sketching hair features, it is specifically used to:

And adding, to the image block including the hair feature information acquired by the acquiring module, the pixel sample values of the face sample image and the pixel position of the same position in the image block including the hair feature information to obtain a hair enhancement feature Figure

Obtaining the hair sketch feature in the hair enhancement feature map by using the rear M convolution layers of the second network branching module of the K-1 sub-depth deep convolutional neural network model for each hair enhancement feature map .

In a possible design, the acquiring module is specifically configured to: when acquiring an image block including facial feature information from the plurality of mutually overlapping image blocks:

In a possible design, the acquiring module is specifically configured to: when acquiring an image block including hair feature information from the plurality of mutually overlapping image blocks:

In a third aspect, an embodiment of the present invention further provides a deep convolutional neural network model, where the model includes a first network branching module and a second network branching module;

The first network branching module includes P convolution layers, and is configured to acquire a facial sketch feature in the face image acquired by the acquiring module, to obtain a facial structure sketch map; wherein the P is greater than 0. The integer.

The second network branching module includes P convolution layers, and is configured to obtain a hair sketch feature in the face image acquired by the acquiring module, to obtain a hair texture sketch map.

In a fourth aspect, an embodiment of the present application further provides a terminal, where the terminal includes a processor and a memory, where the memory is used to store a software program, and the processor is configured to read a software program stored in the memory and implement the first The method provided by one aspect or any of the above first aspects of the design. The electronic device can be a mobile terminal, a computer, or the like.

In a fifth aspect, the embodiment of the present application further provides a computer storage medium, where the software program stores a software program, where the software program can be implemented by one or more processors and can implement the first aspect or the first Any of the aspects provided by the design.

DRAWINGS

FIG. 1 is a schematic flowchart diagram of a method for generating a sketch image according to an embodiment of the present application;

2A is a schematic structural diagram of a first deep convolutional neural network model according to an embodiment of the present application;

2B is a schematic structural diagram of another first deep convolutional neural network model according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of a method for filtering background features in a face image according to an embodiment of the present disclosure;

4 is a schematic structural diagram of a second deep convolutional neural network model according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a process for generating a sketch image according to an embodiment of the present application; FIG.

FIG. 6A is a view of four face images to be processed according to an embodiment of the present application; FIG.

6B is an effect diagram of generating a sketch image of four face images to be processed according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a first deep convolutional neural network model according to an embodiment of the present application;

FIG. 8 is a schematic flowchart of a first deep convolutional neural network model training process according to an embodiment of the present disclosure;

FIG. 9 is a schematic diagram of a method for adding an image block according to an embodiment of the present application;

FIG. 10 is a schematic structural diagram of a device for generating a sketch image according to an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a deep convolutional neural network model according to an embodiment of the present application;

FIG. 12 is a schematic structural diagram of a terminal implementation manner according to an embodiment of the present disclosure.

detailed description

The embodiments of the present application will be further described in detail below with reference to the accompanying drawings.

The embodiment of the present invention provides a method and a device for generating a sketch image, which are used to solve the problem that the automatic face generation image automatic generation technology in the prior art has low accuracy, poor generalization ability, and slow sketch image generation. . The method and the device are based on the same inventive concept. Since the principles of the method and the device for solving the problem are similar, the implementation of the device and the method can be referred to each other, and the repeated description is not repeated.

The embodiments of the present application can be applied to electronic devices, such as computers, tablets, notebooks, smart phones, servers, and the like.

The fields of application of the embodiments of the present application include, but are not limited to, a face image field, a vehicle image field, a plant image field, or other types of image fields.

Correspondingly, the embodiment of the present application is applied to the face image field, and when generating a face sketch image, a plurality of personal face sample images are used for training in advance; when applied to a vehicle image field, when generating a vehicle sketch image, a plurality of vehicle sample images are used in advance. Training; applied to the field of plant images, when plant sketch images are generated, several plant sample images are used in advance for training; when applied to other types of image fields, when generating other types of sketch images, several other types of sample images are pre-applied. Train.

Embodiments of the present application can be used to generate grayscale images in addition to being used to generate sketch images.

Correspondingly, the embodiment of the present application is applied to the field of face images, and when generating a face gray image, a plurality of face sketch sample images are used in advance for training; when applied to a vehicle image field, when generating a vehicle gray image, a plurality of The vehicle sketch sample image is used for training; in the field of plant image, when plant grayscale image is generated, several plant sketch sample images are used for training in advance; when applied to other types of image fields, other types of grayscale images are generated in advance. Several other types of sketch sample images are trained.

In order to make the embodiments of the present application easier to understand, in the following, some descriptions of the embodiments of the present application are first described in order to be understood by those skilled in the art, and the description should not be regarded as the scope of protection required by the present application. limited.

A convolutional neural network is a multi-layered neural network, each layer consisting of multiple two-dimensional planes, each of which consists of multiple independent neurons. In the embodiment of the present application, a neuron can be considered as one pixel.

A few, referring to two or more.

In addition, it should be understood that in the description of the present application, the terms "first", "second" and the like are used only to distinguish the purpose of description, and are not to be understood as indicating or implying relative importance, nor as an indication. Or suggest the order.

FIG. 1 is a flowchart of a method for generating a sketch image according to an embodiment of the present disclosure. The method is performed by an electronic device, and specifically includes the following:

Step S101: Acquire a face image to be processed.

It should be noted that, in step S101, the manner of acquiring the face image to be processed includes, but is not limited to, collecting a face image to be processed through the sensing device, acquiring a face image to be processed in a database, and the like.

The sensing device includes, but is not limited to, a light sensing device, an imaging device, an acquisition device, and the like.

The database includes, but is not limited to, a local database, a cloud database, a USB flash drive, a hard disk, and the like.

Step S102: Acquire a facial sketch feature in the face image by using P convolution layers of the first network branch in the pre-trained first deep convolutional neural network model, to obtain a facial structure sketch map, where the P Is an integer greater than 0.

Step S103: Acquire a hair sketch feature in the face image by using P convolution layers of the second network branch in the first deep convolutional neural network model to obtain a hair texture sketch map.

Step S104, synthesizing the facial structure sketch map and the hair texture sketch map to obtain a sketch image of the face image.

It should be noted that step S102 and step S103 are not strictly sequential. Step S103 may be performed after step S102, or step S103 may be performed after step S102, or step S102 and step S103 may be performed simultaneously. The embodiment is not specifically limited herein.

In the embodiment of the present application, the first deep convolutional neural network model may further include an input layer before the P convolution layers of the first network branch and the P convolution layers of the second network branch, where the input layer The number of filter channels is 3. After acquiring the image of the face to be processed, the electronic device processes the image of the face to be processed through the input layer to obtain three images, which are images including red (English: red, referred to as: R) elements, and green. (English: green, abbreviation: G) Image of the element, blue (English: blue, abbreviated as: B) element image. Further, an image of the R element, an image of the G element, and an image of the B element are input to the first convolutional layer. The first deep convolutional neural network model may also extract an element feature generation image separately for the luminance chrominance YUV element.

Each of the convolutional layers in the first deep convolutional neural network model may be an activation function by using a modified linear unit (English: Rectified Linear Units, referred to as ReLU).

In the embodiment of the present application, the convolution kernel (Convolution, referred to as Conv) used in each convolution layer in the first deep convolutional neural network model may be A*B, wherein both A and B are positive. The integers, A and B, may be equal or non-equal, and are not specifically limited in this embodiment.

It should be noted that the input and output of each convolution layer in the first deep convolutional neural network model have one or more feature maps, the number of output feature maps and the number of input feature maps and filtering. The number of channels is related, for example, inputting a face image, and after passing through the three filtering channels of the input layer, three feature maps are obtained.

The first network branch and the second network branch in the embodiment of the present application may be two independent branches, as shown in FIG. 2A; of course, the first network branch and the second network branch may also be shared. The first depth convolution god The first N convolutional layers in the network model are shown in Figure 2B. In the case where the first network branch and the second network branch are two independent branches, the first N convolution layers of the first network branch and the first N convolutions of the second network branch The layers are the same, and the N is an integer greater than 0 and less than P.

In the case that the first network branch and the second network branch are two independent branches, or the first network branch and the second network branch share the first N convolution layers, the first The first N convolutional layers of a network branch and the first N convolutional layers of the second network branch are used to filter background features in the face image to obtain a facial feature map. In the following, referring to FIG. 3, taking N as 4 as an example, a process of obtaining a face feature map by filtering background features in the face image through the first N convolution layers is described in detail:

S301. Filter the background of the horizontal and vertical directions of the face image by using a first convolution layer and a second convolution layer in the first N convolution layers of the first network branch in the first deep convolutional neural network model. feature.

The convolution kernel size of the first convolutional layer is equal to the convolution kernel size of the second convolutional layer.

In step S301, the first convolution layer may be a convolution layer for filtering background features in a horizontal direction of the face image, and the second convolution layer is for filtering a vertical direction of the face image. a convolutional layer of a background feature; the first convolutional layer may also be a convolution layer for filtering background features in a vertical direction of the face image, the second convolutional layer being for filtering the face The convolution layer of the background feature in the horizontal direction of the image is not specifically limited herein. That is, in the embodiment of the present application, the order of filtering the background feature in the horizontal direction of the face image or the background feature in the vertical direction of the face image is not specifically limited.

S302, by using a third convolution layer and a fourth convolution layer in the first N convolution layers of the first network branch in the first deep convolutional neural network model, for the face image filtered by the background feature, Smoothing is performed in the horizontal direction and in the vertical direction.

In step S302, the convolution kernel size of the third convolutional layer is equal to the convolution kernel size of the fourth convolutional layer. The third convolutional layer may be a convolutional layer for smoothing processing in a horizontal direction for the face image filtered with background features, the fourth convolutional layer being used for filtering background features The face image is a convolution layer that is smoothed in a vertical direction; the third convolution layer may also be used for smoothing the vertical direction for the face image filtered by the background feature. The convolutional layer is a convolutional layer for performing smoothing processing in the horizontal direction for the face image in which the background feature is filtered. The embodiment of the present application is not specifically limited herein. That is, in the embodiment of the present application, the smoothing process in the horizontal direction or the smoothing process in the vertical direction is not specifically limited.

The first N convolutional layers of the first network branch in the embodiment of the present application are the same as the first N convolutional layers of the second network branch or share the first N in the first deep convolutional neural network model. a convolution layer that improves computational efficiency of the first deep convolutional neural network model and uses a first convolutional layer and a second convolutional layer in the first N convolutional layers of the first network branch for filtering The background feature of the face image to be processed in the horizontal direction and the vertical direction, the third convolution layer and the fourth convolution layer are used for smoothing the horizontal and vertical directions for the face image filtered with the background feature , improve the accuracy of the face sketch image generation technology, and make the generated sketch image more natural.

Optionally, the number of filter channels of the first convolution layer is a, the number of filter channels of the second convolution layer is b, and the number of filter channels of the third convolution layer is c, the fourth The number of filtering channels of the convolutional layer is d, and the a, b may both be positive integers greater than or equal to 100 and less than or equal to 200, and the a and the b are equal, and the c and d may both be greater than or equal to 1. And a positive integer less than or equal to 100, the c and the d being equal. Specifically, the number of the filtering channels of each convolution layer is not specifically limited in this embodiment.

Filtering background features in the face image by using the first N convolution layers of the first network branch to obtain a face feature map, and obtaining the location through the last M convolution layers of the first network branch The facial sketch feature in the face feature map is obtained, and a facial structure sketch map is obtained. Where P = M + N. The specific values of M and N are not specifically limited in the embodiment of the present application. The N is 4 and the M is 2. For example, the fifth convolutional layer and the sixth branch of the first network branch may be adopted. The convolution layer acquires facial sketch features in the horizontal and vertical directions in the facial feature map to obtain a facial structure sketch map.

The convolution kernel size of the fifth convolutional layer of the first network branch is equal to the convolution kernel size of the sixth convolutional layer. The fifth convolution layer of the first network branch may be a convolution layer for acquiring a horizontal sketch feature in the horizontal feature map, and the sixth convolution layer is for acquiring the facial feature a convolution layer of a face sketch feature in a vertical direction in the figure; the fifth convolution layer of the first network branch may also be a convolution layer for acquiring a face sketch feature in a vertical direction in the face feature map The sixth volume is a convolution layer for acquiring the horizontal sketch feature in the horizontal feature map. The embodiment of the present application is not specifically limited herein. That is, in the embodiment of the present application, the order of acquiring the face sketch feature in the horizontal direction in the face feature map or the face sketch feature in the vertical direction in the face feature map is not specifically limited.

Filtering a background feature in the face image by using the first N convolution layers of the second network branch to obtain a face feature map, and obtaining a location through the last M convolution layers of the second network branch The hair sketch feature in the face feature map is obtained, and a hair texture sketch is obtained. Taking N as 4 and M as 2 as an example, the horizontal and vertical hair sketches in the face feature map are obtained by the fifth convolution layer and the sixth convolution layer of the second network branch. Features to get a sketch of the hair texture sketch.

The convolution kernel size of the fifth convolutional layer of the second network branch is equal to the convolution kernel size of the sixth convolutional layer. The fifth convolution layer of the second network branch may be a convolution layer for acquiring a horizontal direction hair sketch feature in the face feature map, and the sixth convolution layer is for acquiring the facial feature a convolutional layer of a vertical hair sketch feature in the figure; the fifth convolutional layer of the second network branch may also be a convolution layer for acquiring a vertical hair sketch feature in the facial feature map The sixth volume layer is a convolution layer for acquiring the horizontal direction hair sketch feature in the face feature map, which is not specifically limited herein. That is, in the embodiment of the present application, the order of acquiring the hair sketch feature in the horizontal direction in the face feature map or the hair sketch feature in the vertical direction in the face feature map is not specifically limited.

Optionally, the convolution kernel size of the last M convolution layers of the first network branch is equal to the convolution kernel size of the last M convolution layers of the second network branch. Taking N as 4 and M as 2 as an example, the convolution kernel size of the fifth convolutional layer of the first network branch is equal to the convolution kernel size of the fifth convolutional layer of the second network branch, The convolution kernel size of the sixth convolutional layer of the first network branch is equal to the convolution kernel size of the sixth convolutional layer of the second network branch.

Optionally, a fifth convolution layer of the first network branch, a sixth convolution layer of the first network branch, a fifth convolution layer of the second network branch, and the second network branch The number of filter channels of the four convolutional layers of the sixth volume can be all 1.

In a possible implementation manner, when the facial structure sketch map and the hair texture sketch map are synthesized to obtain a sketch image of the face image, the hair probability of the hair feature point may be based on each pixel point. Basically, the synthesis is performed. Specifically, before performing the synthesis of the facial structure sketch map and the hair texture sketch map to obtain the sketch image of the face image, each pixel point in the face image is acquired. The probability of hair for the hair feature points. After obtaining the facial structure sketch map and the hair texture sketch map by the above method, the facial structure sketch map and the hair texture sketch map are combined to obtain a sketch image of the face image, which meets the following formula requirements:

S _(i,j) =(1-P _h(i,j) )×S _S(i,j) +P _h(i,j) ×S _t(i,j)

In the above implementation manner, the synthetic sketch image not only retains the facial structure information well, but also the hair texture by adopting a method of synthesizing the facial structure sketch map and the hair texture sketch map based on the hair probability to obtain the sketch image of the face image. Information is also kept better.

Optionally, the hair probability of each pixel in the face image is obtained by using a second deep convolutional neural network model.

For example, as shown in FIG. 4, the second deep convolutional neural network model may include seven connection layers, wherein the first connection layer, the second connection layer, and the third connection layer each include one ReLU A convolution layer with a convolution (Conv) kernel size of 5×5, a pooling layer with a Conv size of 3×3, and a local response normalization (English: Local Response Normalization, for short: The LRN) layer; the fourth connection layer includes a convolution layer having a ReLU as an activation function and a Conv kernel size of 3×3; and the fifth connection layer includes a convolution with a ReLU as an activation function and a Conv kernel size of 3×3. The sixth connection layer includes a convolution layer having a ReLU as an activation function and a Conv core size of 1×1; and the seventh connection layer includes a convolution layer having a ReLU as an activation function and a Conv size of 1×1. The second deep convolutional neural network model can be trained in advance by sample images in the Helen dataset of the Helen database.

The first connection layer, the second connection layer, and the third connection layer are configured to acquire hair features, facial features, and background features of the face image; a fourth connection layer and a fifth connection layer, configured to Obtaining the facial image of the hair feature, the facial feature, and the background feature, acquiring the facial contour feature, the hair contour feature, and the background contour feature in the horizontal direction and the vertical direction; the sixth connection layer and the seventh connection layer are used for acquiring The face image of the facial contour feature, the hair contour feature, and the background contour feature are smoothed in the horizontal direction and the vertical direction.

When the hair probability of each pixel in the face image is acquired by the second depth convolutional neural network model, for each pixel point in the face image, the pixel contour is covered at each pixel point In the region, the hair probability of each pixel is 1, the face probability and the background probability are both 0; when the pixel is located in the area covered by the face contour, the face probability of each pixel is 1 The hair probability and the background probability are both 0; when each pixel point is located in an area covered by the background contour, the background probability of each pixel point is 1, and the hair probability and the face probability are both 0. The pixels in the area covered by the facial contour are facial feature points, the pixels in the area covered by the hair contour are hair feature points, and the pixels in the area covered by the background contour are background feature points.

For a better understanding of the embodiment of the present application, the following takes P as 4 as an example. Referring to FIG. 5, an exemplary process of generating a sketch image is exemplified:

Entering a face image into the first deep convolutional neural network model, and acquiring a facial structure sketch of the face image by using four convolution layers of the first network branch of the first deep convolutional neural network model; A second network branch of a deep convolutional neural network model, four convolutional layers, acquires a hair texture sketch of the face image. Then, the facial portion of the facial structure sketch map is obtained according to the hair probability of each pixel point as the hair feature point, and the hair portion of the hair texture sketch map is obtained according to the hair probability of each pixel point as the hair feature point, and finally The face portion and the hair portion synthesize a sketch image of the face image.

The face image is synthesized by the sketch generation method provided in the embodiment of the present application to obtain a sketch image, as shown in FIG. 6A. As shown in FIG. 6B, there are four face images to be processed, as shown in FIG. 6B, and an effect diagram of generating a sketch image for the four face images to be processed shown in FIG. 6A, that is, four to-be-processed as shown in FIG. 6A. The face image is processed by the first deep convolutional neural network model to obtain a sketch image.

The first deep convolutional neural network model used in the embodiment of the present application may be obtained by training the initial depth convolutional neural network model in the training sample database in advance, and the training sample database includes several individuals. The face sample image and the sketch sample image corresponding to each face sample image, the initialized first deep convolutional neural network model may include weights and offsets, and may of course include only weights, offset to zero.

Next, taking the first deep convolutional neural network model shown in FIG. 7 as an example, the training process of the first deep convolutional neural network model is specifically described, as shown in the first deep convolutional neural network model shown in FIG. The first four convolutional layers of the first deep convolutional neural network model shared by the first network branch and the second network branch are respectively a ReLU with an activation function and a Conv kernel size of 5×5. A roll of layers, a second convolutional layer with ReLU as the activation function and a Conv kernel size of 5×5, a third convolutional layer with ReLU as the activation function and a Conv kernel size of 1×1, with ReLU as the activation function And the Conv core size is 1×1 of the fourth convolutional layer; the last two convolutional layers of the first network branch are respectively the fifth volume of the first network branch with ReLU as the activation function and the Conv core size of 3×3 The sixth convolutional layer of the first network branch with the ReLU as the activation function and the Conv kernel size of 3×3; the last two convolutional layers of the second network branch have the ReLU as the activation function and the Conv size is The fifth convolutional layer of the 3×3 second network branch, with ReLU as the activation function and the Conv kernel size is 3× The sixth convolutional layer of the second network branch of 3. The size of the above convolution kernel is only an example, and does not specifically limit the configuration of the size of the convolution kernel in the present application. The size of the size of the convolution core is not specifically limited in the embodiment of the present application. The training process of the first deep convolutional neural network model is shown in Figure 8:

S801. Train a plurality of personal face sample images in the training sample database into the initialized first deep convolutional neural network model for training.

Optionally, the weighted configuration of the initialized first deep convolutional neural network model conforms to a Gaussian distribution with a mean of 0 and a variance of 0.01, and the offset configuration is 0.

S802. In the Kth training process, the face sample image and the pixel value of the pixel at the same position in the sketch average image are added to obtain a face enhancement image.

The pixel value of any one of the pixel points in the sketch average graph is an average value of pixel values of pixel points at the same position as the any one of the sketch sample images in the training sample database.

S803, filtering the background feature of the face enhancement image in the horizontal direction and the vertical direction by the first convolution layer and the second convolution layer in the first depth convolutional neural network model adjusted by K-1 times.

S804, by using a third convolutional layer and a fourth convolution layer in the first depth convolutional neural network model adjusted by K-1 times, the face enhancement image for filtering the background feature is horizontally and vertically Smoothing in the direction.

S805. The face sample image is divided into a plurality of mutually overlapping image blocks, and an image block including facial feature information and an image block including hair feature information are acquired from the plurality of mutually overlapping image blocks.

The number of image blocks including facial feature information is H, and the number of image blocks including hair feature information is Q, and both H and Q are positive integers.

Optionally, in step 805, acquiring an image block that includes facial feature information from the plurality of mutually overlapping image blocks may be implemented by:

Implementation manner 1: determining each of the image blocks of the plurality of mutually overlapping image blocks Each pixel in the image block is a face probability of the facial feature point; when it is determined that the number of pixel points whose face probability is not 0 is greater than a preset threshold, determining each of the image blocks as an image block including facial feature information .

Implementation 2: Obtain an image block including facial feature information from the plurality of mutually overlapping image blocks by a feature recognition method. The method for feature recognition may include a feature recognition method based on a local histogram, a feature recognition method based on a binarized histogram, and the like, which are not specifically limited in this embodiment of the present application.

Optionally, in step 805, acquiring an image block including the header feature information from the plurality of mutually overlapping image blocks may be implemented by:

Embodiment 1 : determining, for each of the plurality of mutually overlapping image blocks, a hair probability that each pixel in each image block is a hair feature point; determining that the hair probability is not zero When the number of pixels is greater than a preset threshold, it is determined that each of the image blocks is an image block including hair feature information.

Implementation 2: Obtaining an image block including hair feature information from the plurality of mutually overlapping image blocks by a feature recognition method. The method for feature recognition may include a feature recognition method based on a local histogram, a feature recognition method based on a binarized histogram, and the like, which are not specifically limited in this embodiment of the present application.

S806. For the fth image block including the facial feature information, determining that the fth image block including the facial feature information is in a corresponding target region in the facial feature image of the facial sample image, and the target is The image block in the area and the pixel value of the pixel at the same position in the f-th image block including the face feature information are added, as shown in FIG. 9, the f-th face strong feature map is obtained.

Wherein f is a positive integer that is not more than H.

S807. Acquire a facial sketch feature in the fth facial enhancement feature map by using the last M convolution layers of the first network branch of the K-1th adjusted first deep convolutional neural network model. A sketch of the facial structure of the fth face sample image.

S808, determining, for the gth image block including hair feature information, that the gth image block including the hair feature information corresponds to a target region in the face feature image of the face sample image, and the target is The image block in the region and the pixel value of the pixel at the same position in the gth image block including the hair feature information are added to obtain a g-th hair enhancement feature map.

Wherein g is a positive integer that is not more than Q.

S809. Acquire a hair sketch feature in the gth hair enhancement feature map by using the last M convolution layers of the first network branch of the K-1th adjusted first deep convolutional neural network model. A hair texture sketch of the gth face sample image.

It should be noted that, in step S806 and step S808, there is no strict sequence. Step S808 may be performed after step S806, or step S808 may be performed first, or step S806 may be performed first, or step S806 and step S808 may be performed simultaneously. The example is not specifically limited here.

S810, synthesizing the facial structure sketch map of the fth face sample image and the hair texture sketch map of the gth face sample image to obtain a sketch image of the face sample image.

S811. Acquire an error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image.

S812, adjusting weights and offsets used in the K+1th training process based on an error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image.

Specifically, an error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image and an adjustment amount of the offset are determined according to the network learning rate, and then adjusted according to the adjustment amount. K+1 times The weights and offsets used in the training process.

The network learning rate is the weight and the offset of each adjustment. The network learning rate of the first deep convolutional neural network model may be k×10 ⁻¹⁰ , where k is a positive integer not greater than 100. The example is not specifically limited here.

S813. After the Kth training, obtain a loss function value of the first deep convolutional neural network model.

If the loss function value of the first deep convolutional neural network model is greater than a preset threshold, performing K+1th training; if the loss function value of the first deep convolutional neural network model is less than or equal to a preset threshold, the first Deep convolutional neural network model training is completed.

Specifically, the loss function of the first deep convolutional neural network model conforms to the following formula:

L _g =L _s +αL _t ;

Where L _g is the loss function value of the first deep convolutional neural network model; L _s is the loss function value of the first network branch; L _t is the loss function value of the second network branch; α is a scalar parameter for maintaining A balance between the loss function value of the first network branch and the loss function value of the second network branch.

The value of the loss function of the first network branch may be the mean square error of the first network branch (English: Mean Squared Error, MSE for short), or may be the absolute error and (Sum of Absolute Difference, SAD) value. The value of the mean absolute error (MAD) value may be used as the value of the average absolute error (MAD), and may be other error values. The embodiment of the present application is not specifically limited herein. Taking the loss function value of the first network branch as an example of the MSE value of the first network branch, the loss function value of the first network branch can be determined by the following formula:

Wherein, L _s is a loss function value of the first network branch; p _s is an image block including the facial feature information of the fth; the s _s is a sketch sample image corresponding to the face sample image, f image blocks included in the target region corresponding to the image block including the facial feature information; P _s is all image blocks including facial feature information; |P _s | is the number of all image blocks including facial feature information, ie |P _s | equal to H;

In the face structure sketch map of the fth face sample image, the image block included in the target region corresponding to the fth image block including the face feature information, that is,

w _g is the weight and offset of the first N convolutional layers of the first deep convolutional neural network model, w _s is the weight of the last M convolutional layers of the first network branch of the first deep convolutional neural network model Offset.

The value of the loss function of the second network branch may be the weight of the MSE value and the Sorted Matching Mean Square Error (SM) value of the second network branch, and may also be other error values. The example is not specifically limited here. Among them, SM (·) = sort {MSE (·)}, Sort () is a sort function.

Taking the loss function value of the second network branch as the weight of the MSE value and the SM value of the second network branch, the loss function value of the second network branch can be determined by the following formula:

Wherein β is a scalar parameter, L _t is a loss function value of the first network branch; p _t is the gth image block including hair feature information; and the s _t is a sketch corresponding to the face sample image In the sample image, the image block included in the target region corresponding to the gth image block including the hair feature information; P _t is all image blocks including hair feature information; |P _t | is all images including hair feature information The number of blocks, ie |P _t | is equal to Q;

In the hair texture sketch map of the gth face sample image, the image block included in the target region corresponding to the gth image block including the hair feature information, ie

w _g is the weight and offset of the first N convolutional layers of the first deep convolutional neural network model, and w _t is the weight of the last M convolutional layers of the second network branch of the first deep convolutional neural network model Offset.

The loss function function of the first network branch is the MSE value of the first network branch, and the loss function value of the second network branch is the weight of the MSE value and the SM value of the second network branch, for example, the first deep convolutional neural network model The loss function value is determined by the following formula:

Based on the same inventive concept as the method embodiment, the embodiment of the present invention provides a sketch image generating apparatus 10, specifically for implementing the method described in the embodiments described in FIG. 1 to FIG. 5, FIG. 7, and FIG. The structure is as shown in FIG. 10, and includes an acquisition module 11, a deep convolutional neural network model 12, and a synthesis module 13, wherein:

The obtaining module 11 is configured to acquire a face image to be processed.

a depth convolutional neural network model 12, configured to acquire a facial structure sketch map and a hair texture sketch map in the face image acquired by the obtaining module 11; wherein the deep convolutional neural network model 12 is pre-trained The first network branching module 121 and the second network branching module 122 are configured. The structure of the deep convolutional neural network model 12 is as shown in FIG. 11:

The first network branching module 121 is configured to acquire a facial sketch feature in the face image acquired by the acquiring module 11 to obtain a facial structure sketch map, where the first network branching module includes P convolution layers. Wherein P is an integer greater than zero.

The second network branching module 122 is configured to obtain a hair sketch feature in the face image acquired by the obtaining module 11 to obtain a hair texture sketch map; and the second network branching module includes P convolution layers.

a synthesizing module 13 configured to synthesize the facial structure sketch map obtained by the first network branching module 121 and the hair texture sketch map obtained by the second network branching module 122 to obtain a sketch image of the facial image .

In a possible implementation manner, the first N convolution layers of the P convolution layers included in the first network branching module 121 and the P convolution layers included in the second network branching module 122 The first N convolutional layers are the same or coincident, The N is an integer greater than 0 and less than P.

In a possible implementation, the first network branching module 121 is configured to filter background features in the face image by using the first N convolution layers of the first network branching module 121. Obtaining a face feature map, and then acquiring the face sketch feature in the face feature map by the last M convolution layers of the first network branching module 121. The second network branching module 122 is configured to filter background features in the face image by using the first N convolution layers of the second network branch in the deep convolutional neural network model 12 to obtain a person. The face feature map is then obtained by the last M convolution layers of the second network branch to obtain the hair sketch feature in the face feature map. Where P = M + N.

Optionally, the convolution kernel size of the last M convolutional layers of the first network branching module 121 is equal to the convolution kernel size of the last M convolutional layers of the second network branching module 122.

In a possible implementation manner, the N is 4, and the first network branching module 121 filters the face image in the first N convolution layers of the first network branching module 121. The background feature is specifically configured to filter the background of the face image in the horizontal direction and the vertical direction by using the first convolution layer and the second convolution layer in the first N convolution layers of the first network branching module 121. a feature, then through the third convolutional layer and the fourth convolutional layer of the first N convolutional layers of the first network branching module 121, for the face image filtered by the background feature, horizontally and vertically Smoothing in the direction.

Optionally, the convolution kernel size of the first convolution layer is equal to the convolution kernel size of the second convolution layer, and the convolution kernel size of the third convolution layer and the convolution kernel size of the fourth convolution layer the same.

In a possible implementation manner, the obtaining module 11 is further configured to acquire a hair probability that each pixel point in the face image is a hair feature point. The synthesizing module 13 is configured to synthesize the facial structure sketch map obtained by the first network branching module 121 and the hair texture sketch map obtained by the second network branching module 122 to obtain the facial image. The sketch image meets the following formula requirements:

S _(i,j) =(1-P _h(i,j) )×S _S(i,j) +P _h(i,j) ×S _t(i,j)

Optionally, the device further includes:

The training module 14 is configured to train the deep convolutional neural network model 12 by:

Performing training by inputting a plurality of personal face sample images in the training sample database into the initialized deep convolutional neural network model 12; the training sample database includes a plurality of personal face sample images and a sketch sample image corresponding to each face sample image, the initialization The deep convolutional neural network model 12 includes weights and offsets.

In the Kth training process, the background features in the face sample image are filtered by the first N convolution layers of the K-1 sub-depth convolutional neural network model 12 to obtain the face sample image. The face feature map, the K being an integer greater than zero.

Obtaining a facial sketch feature in the face feature image of the face sample image by using the last M convolution layers of the first network branching module 121 of the K-1 sub-depended deep convolutional neural network model 12, A facial structure sketch map of the face sample image is obtained.

Acquiring the hair sketch feature in the face feature image of the face sample image by using the last M convolution layers of the second network branching module 122 of the K-1 sub-depended deep convolutional neural network model 12, Getting the face sample Image of hair texture sketch illustration.

A face structure sketch map of the face sample image and a hair texture sketch map of the face sample image are combined to obtain a sketch image of the face sample image.

After the Kth training, an error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image is acquired.

Optionally, the training module 14 filters the first N convolution layers of the deep convolutional neural network model 12 that has undergone K-1 adjustments during the Kth training process to filter the image in the face sample image. When the background feature is used, it is specifically used to:

The face sample image and the pixel value of the pixel at the same position in the sketch average map are added to obtain a face enhancement image. The pixel value of any one of the pixel points in the sketch average map is an average value of pixel values of pixel points in the same position in the sketch sample image in the training sample database that are at the same position as the any one of the pixel points; The background features in the face enhancement image are filtered by the first N convolutional layers of the deep convolutional neural network model 12 that have been K-1 adjusted.

In a possible implementation, the acquiring module 11 is further configured to divide the face sample image into a plurality of mutually overlapping image blocks, and obtain a face including the plurality of mutually overlapping image blocks. An image block of feature information. The training module 14 acquires the facial features of the face sample image by using the last M convolution layers of the first network branching module 121 of the deep convolutional neural network model 12 that has undergone K-1 adjustments. The facial sketch features in the figure are specifically used to:

Determining, for each image block that includes the facial feature information acquired by the acquiring module 11, determining that each of the image blocks including the facial feature information corresponds to a target region in the facial feature image of the facial sample image, and The image block in the target area and the pixel values of the pixel points at the same position in each of the image blocks including the facial feature information are added to obtain a face enhancement feature map; for each face enhancement feature map, through the The rear M convolutional layers of the first network branching module 121 of the K-1 sub-adjusted deep convolutional neural network model 12 acquire the facial sketch features in the facial enhancement feature map.

In a possible implementation, the acquiring module 11 is further configured to divide the face sample image into a plurality of overlapping image blocks, and obtain the hair including the plurality of mutually overlapping image blocks. An image block of feature information. The training module 14 acquires the facial features of the face sample image after the M M convolution layers of the second network branching module 122 of the deep convolutional neural network model 12 that has undergone K-1 adjustments. The hair sketch feature in the figure is specifically used to:

And for each image block including the hair feature information acquired by the obtaining module 11, adding the pixel values of the pixel positions at the same position in the image block including the hair feature information to the hair enhancement Feature map; for each hair enhancement feature map, the hair enhancement feature map is obtained by the last M convolution layers of the second network branch module 122 of the K-1 sub-depended deep convolutional neural network model 12 Hair sketch features.

Optionally, the acquiring module 11 is configured to: when acquiring an image block that includes facial feature information from the plurality of mutually overlapping image blocks, specifically:

In a possible design, the obtaining module 11 acquires a packet from the plurality of mutually overlapping image blocks. When an image block including hair feature information is used, it is specifically used to:

The division of the modules in the embodiment of the present application is schematic, and is only a logical function division. In actual implementation, there may be another division manner. In addition, each functional module in each embodiment of the present application may be integrated into one processing. In the device, it can also be physically existed alone, or two or more modules can be integrated into one module. The above integrated modules can be implemented in the form of hardware or in the form of software functional modules.

Wherein, when the integrated module can be implemented in the form of hardware, as shown in FIG. 12, the collector 1201, the processor 1202, and the memory 1203 can be included. The physical hardware corresponding to the deep convolutional neural network model 12, the synthesis module 13 and the training module 14 may be the processor 1202. The processor 1202 can be a central processing unit (English: central processing unit, CPU for short), or a digital processing unit or the like. The processor 1202 acquires a face image to be processed through the collector 1201. The memory 1203 is configured to store a program executed by the processor 1202.

The specific connection medium between the above-mentioned collector 1201, processor 1202 and memory 1203 is not limited in the embodiment of the present application. In the embodiment of the present application, the memory 1203, the processor 1202, and the collector 1201 are connected by a bus 1204 in FIG. 12, and the bus is indicated by a thick line in FIG. 12, and the connection manner between other components is only schematically illustrated. , not limited to. The bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 12, but it does not mean that there is only one bus or one type of bus.

The memory 1203 may be a volatile memory (English: volatile memory), such as a random access memory (English: random-access memory, abbreviation: RAM); the memory 1203 may also be a non-volatile memory (English: non-volatile memory) For example, read-only memory (English: read-only memory, abbreviation: ROM), flash memory (English: flash memory), hard disk (English: hard disk drive, abbreviation: HDD) or solid state drive (English: solid-state drive Abbreviation: SSD), or memory 1203 is any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1203 may be a combination of the above memories.

The processor 1202 is configured to execute the program code stored in the memory 1203, and is specifically configured to perform the method described in the foregoing embodiments corresponding to FIG. 1 to FIG. 9, and may be specifically implemented by referring to the corresponding embodiments in FIG. 1 to FIG. Narration.

The embodiments described herein are for illustrative purposes only and are not intended to limit the present application, and the functional modules in the embodiments and embodiments of the present application may be combined with each other without conflict.

Those skilled in the art will appreciate that embodiments of the present application can be provided as a method, system, or computer program product. Therefore, the present application may employ an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. The form of the case. Moreover, the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (system), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block of the flowchart illustrations and/or FIG. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine for the execution of instructions for execution by a processor of a computer or other programmable data processing device. Means for implementing the functions specified in one or more of the flow or in a block or blocks of the flow chart.

The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device. The apparatus implements the functions specified in one or more blocks of a flow or a flow and/or block diagram of the flowchart.

These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on a computer or other programmable device to produce computer-implemented processing for execution on a computer or other programmable device. The instructions provide steps for implementing the functions specified in one or more of the flow or in a block or blocks of a flow diagram.

It is apparent that those skilled in the art can make various changes and modifications to the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, it is intended that the present invention cover the modifications and variations of the embodiments of the present invention.

Claims

A method for generating a sketch image, comprising:

Obtaining a face image to be processed;

Obtaining a facial sketch feature in the face image by obtaining P convolution layers of the first network branch in the pre-trained deep convolutional neural network model, and obtaining a facial structure sketch map, wherein the P is an integer greater than 0 ;

Obtaining a hair sketch feature in the face image by using P convolution layers of the second network branch in the deep convolutional neural network model to obtain a hair texture sketch map;

The facial structure sketch map and the hair texture sketch map are combined to obtain a sketch image of the face image.
The method according to claim 1, wherein the first N convolutional layers of the first network branch are the same as or coincide with the first N convolutional layers of the second network branch, and the N is greater than 0. An integer less than P.
The method of claim 2, wherein the acquiring the facial sketch features in the face image by the P convolution layers of the first network branch in the deep convolutional neural network model comprises:

Filtering background features in the face image by using the first N convolution layers of the first network branch in the deep convolutional neural network model to obtain a facial feature map;

Obtaining a facial sketch feature in the face feature map by using the last M convolution layers of the first network branch;

Obtaining the hair sketch feature in the face image by using the P convolution layers of the second network branch in the deep convolutional neural network model, including:

Filtering background features in the face image by using the first N convolution layers of the second network branch in the deep convolutional neural network model to obtain a facial feature map;

Obtaining a hair sketch feature in the face feature map by using the last M convolution layers of the second network branch;

Where P = M + N.
The method of claim 3, wherein a convolution kernel size of the last M convolutional layers of the first network branch and a convolution kernel of the last M convolutional layers of the second network branch The dimensions correspond to each other.
The method according to claim 3 or 4, wherein the N is 4, and the first N convolutional layers of the first network branch in the deep convolutional neural network model are filtered to filter the face image Background features, including:

Filtering the background features in the horizontal direction and the vertical direction of the face image by the first convolution layer and the second convolution layer in the first N convolution layers of the first network branch in the deep convolutional neural network model;

Through the third convolutional layer and the fourth convolutional layer in the first N convolutional layers of the first network branch in the deep convolutional neural network model, the face image filtered for the background feature is horizontally and vertically Smoothing in the direction.
The method according to claim 5, wherein the convolution kernel size of the first convolutional layer is equal to the convolution kernel size of the second convolutional layer, and the convolution kernel size and the third convolutional layer The convolution kernel of the four-volume layer is the same size.
The method of any of claims 1 to 6, wherein the method further comprises:

Obtaining a hair probability that each pixel in the face image is a hair feature point;

And combining the facial structure sketch map and the hair texture sketch map to obtain a sketch image of the face image, which meets the following formula requirements:

S (i,j) =(1-P h(i,j) )×S S(i,j) +P h(i,j) ×S t(i,j)

Wherein the S (i, j) is a pixel value of a pixel point of the i-th row and the j-th column in the sketch image of the face image, and P h(i, j) is a sketch image of the face image The hair probability of the pixel of the i-th row and the j-th column, S S(i, j) is the pixel value of the pixel of the i-th row and the j-th column in the sketch image of the face structure, and S t(i, j) is A pixel value of a pixel of the i-th row and the j-th column in the hair texture sketch map, wherein i, j are integers greater than zero.
The method of any of claims 2-7, wherein the deep convolutional neural network model is trained by:

Performing training by inputting a plurality of personal face sample images in the training sample database into the initialized deep convolutional neural network model; the training sample database includes a plurality of personal face sample images and a sketch sample image corresponding to each face sample image, the initialized Deep convolutional neural network models include weights and offsets;

In the Kth training process, the background features in the face sample image are filtered by the first N convolution layers of the K-1 sub-depth convolutional neural network model to obtain the face sample image. a face feature map, the K being an integer greater than 0;

Obtaining a facial sketch feature in a face feature image of the face sample image by using the last M convolution layers of the first network branch of the K-1 adjusted deep convolutional neural network model, to obtain the a sketch of the facial structure of the face sample image;

Acquiring the hair sketch feature in the face feature image of the face sample image by using the last M convolution layers of the second network branch of the K-1 adjusted deep convolutional neural network model, Sketch image of hair texture of face sample image;

Combining a facial structure sketch map of the face sample image and a hair texture sketch map of the face sample image to obtain a sketch image of the face sample image;

Obtaining an error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image after the Kth training;

The weight and offset used in the K+1th training process are adjusted based on an error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image.
The method according to claim 8, wherein in the Kth training, the face sample image is filtered by the first N convolution layers of the K-1 sub-adjusted deep convolutional neural network model Background features in, including:

Adding the face sample image and the pixel value of the pixel at the same position in the sketch average image to obtain a face enhancement image;

The pixel value of any one of the pixel points in the sketch average map is an average value of pixel values of pixel points in the same position in the sketch sample image in the training sample database that are at the same position as the any one of the pixel points;

The background features in the face enhancement image are filtered by the first N convolutional layers of the K-1 sub-depth convolutional neural network model.
The method according to claim 8 or 9, wherein the face sample is obtained by the last M convolution layers of the first network branch of the K-1 adjusted deep convolutional neural network model The facial sketch features in the face feature map of the image, including:

Dividing the face sample image into a plurality of mutually overlapping image blocks, and acquiring image blocks including facial feature information from the plurality of mutually overlapping image blocks;

Determining, for each of the image blocks including the facial feature information, the image block including the facial feature information corresponding to the target region in the facial feature map of the facial sample image, and the image in the target region And adding a pixel value of the pixel at the same position in each of the image blocks including the facial feature information to obtain a face enhancement feature map;

For each facial enhancement feature map, through the K-1 adjusted deep convolutional neural network model A rear M convolutional layer of a network branch acquires a facial sketch feature in the facial enhancement feature map.
The method according to any one of claims 8 to 10, wherein the latter M convolutional layer of the second network branch of the K-1 sub-adjusted deep convolutional neural network model is obtained The hair sketch features in the face feature map of the face sample image, including:

Dividing the face sample image into a plurality of mutually overlapping image blocks, and acquiring image blocks including hair feature information from the plurality of mutually overlapping image blocks;

And adding, to each of the image blocks including the hair feature information, the pixel values of the pixel positions at the same position in the image block including the hair feature information to obtain a hair enhancement feature map;

For each hair enhancement feature map, the hair sketch feature in the hair enhancement feature map is obtained by the last M convolution layers of the second network branch of the K-1 adjusted deep convolutional neural network model.
The method of claim 10, wherein the obtaining an image block comprising facial feature information from the plurality of mutually overlapping image blocks comprises:

Determining, for each of the plurality of mutually overlapping image blocks, a face probability of each pixel point in each of the image blocks as a facial feature point; determining a number of pixel points whose face probability is not 0 When it is greater than the preset threshold, it is determined that each of the image blocks is an image block including facial feature information.
The method of claim 11, wherein the obtaining an image block comprising hair feature information from the plurality of mutually overlapping image blocks comprises:

Determining, for each of the plurality of mutually overlapping image blocks, a hair probability of each pixel point in each of the image blocks as a hair feature point; determining a number of pixel points whose hair probability is not zero When it is greater than the preset threshold, it is determined that each of the image blocks is an image block including hair feature information.
A device for generating a sketch image, comprising:

An obtaining module, configured to obtain a face image to be processed;

a depth convolutional neural network model, configured to acquire a facial structure sketch map and a hair texture sketch map in the face image acquired by the obtaining module; the deep convolutional neural network model is pre-trained, including the first network a branch module and a second network branch module;

The first network branching module is configured to acquire a facial sketch feature in the facial image acquired by the acquiring module, to obtain a facial structure sketch map, where the first network branching module includes P convolutional layers. Wherein P is an integer greater than 0;

The second network branching module is configured to obtain a hair sketch feature in the face image acquired by the acquiring module, to obtain a hair texture sketch map; and the second network branching module includes P convolution layers;

And a synthesizing module, configured to synthesize the facial structure sketch map obtained by the first network branching module and the hair texture sketch map obtained by the second network branching module to obtain a sketch image of the facial image.
The apparatus according to claim 14, wherein the first N convolution layers of the P convolutional layers included in the first network branching module and the P convolutional layers included in the second network branching module The first N convolutional layers are the same or coincident, and the N is an integer greater than 0 and less than P.
The device according to claim 15, wherein the first network branching module is specifically configured to:

Filtering background features in the face image by using the first N convolution layers of the first network branching module to obtain a facial feature map;

Obtaining a facial sketch feature in the facial feature map by using the last M convolution layers of the first network branching module;

The second network branch module is specifically configured to:

Filtering background features in the face image by using the first N convolution layers of the second network branch in the deep convolutional neural network model to obtain a facial feature map;

Obtaining a hair sketch feature in the face feature map by using the last M convolution layers of the second network branch;

Where P = M + N.
The apparatus according to claim 16, wherein a convolution kernel size of the last M convolutional layers of the first network branching module and a volume of the last M convolutional layers of the second network branching module The product core sizes are equal.
The apparatus according to claim 16 or 17, wherein said N is 4, said first network branching module filtering said person through said first N convolutional layers of said first network branching module When the background feature in the face image is used, it is specifically used to:

Filtering background features in a horizontal direction and a vertical direction of the face image by using a first convolution layer and a second convolution layer in the first N convolution layers of the first network branching module;

Smoothing in the horizontal direction and the vertical direction for the face image filtered by the background feature by the third convolution layer and the fourth convolution layer in the first N convolution layers of the first network branching module deal with.
The apparatus according to claim 18, wherein a convolution kernel size of said first convolutional layer is equal to a convolution kernel size of said second convolutional layer, and a convolution kernel size and a third convolutional layer The convolution kernel of the four-volume layer is the same size.
The device according to any one of claims 14 to 19, wherein the obtaining module is further configured to acquire a hair probability of each pixel point in the face image as a hair feature point;

The synthesis module is specifically configured to:

Combining the facial structure sketch map obtained by the first network branching module and the hair texture sketch map obtained by the second network branching module to obtain a sketch image of the face image, which meets the following formula requirements:

S (i,j) =(1-P h(i,j) )×S S(i,j) +P h(i,j) ×S t(i,j)

Wherein the S (i, j) is a pixel value of a pixel point of the i-th row and the j-th column in the sketch image of the face image, and P h(i, j) is a sketch image of the face image The hair probability of the pixel of the i-th row and the j-th column, S S(i, j) is the pixel value of the pixel of the i-th row and the j-th column in the sketch image of the face structure, and S t(i, j) is A pixel value of a pixel of the i-th row and the j-th column in the hair texture sketch map, wherein i, j are integers greater than zero.
The device according to any one of claims 14 to 20, further comprising:

a training module for training the deep convolutional neural network model by:

Performing training by inputting a plurality of personal face sample images in the training sample database into the initialized deep convolutional neural network model; the training sample database includes a plurality of personal face sample images and a sketch sample image corresponding to each face sample image, the initialized Deep convolutional neural network models include weights and offsets;

In the Kth training process, the background features in the face sample image are filtered by the first N convolution layers of the K-1 sub-depth convolutional neural network model to obtain the face sample image. a face feature map, the K being an integer greater than 0;

Obtaining facial sketch features in the facial feature map of the face sample image by using the last M convolution layers of the first network branching module of the K-1 sub-depended deep convolutional neural network model a sketch of the facial structure of the face sample image;

Obtaining a hair sketch feature in the face feature image of the face sample image by using the last M convolution layers of the second network branch module of the K-1 sub-depended deep convolutional neural network model a sketch of the hair texture of the face sample image;

Forming a face structure sketch of the face sample image and a hair texture sketch of the face sample image Obtaining a sketch image of the face sample image;

Obtaining an error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image after the Kth training;

The weight and offset used in the K+1th training process are adjusted based on an error value between the sketch image of the face sample image and the sketch sample image corresponding to the face sample image.
The apparatus according to claim 21, wherein said training module filters the first N convolutional layers of the deep convolutional neural network model subjected to K-1 adjustments during the Kth training period When describing the background features in the face sample image, it is specifically used to:

Adding the face sample image and the pixel value of the pixel at the same position in the sketch average image to obtain a face enhancement image;

The pixel value of any one of the pixel points in the sketch average map is an average value of pixel values of pixel points in the same position in the sketch sample image in the training sample database that are at the same position as the any one of the pixel points;

The background features in the face enhancement image are filtered by the first N convolutional layers of the K-1 sub-depth convolutional neural network model.
The device according to claim 21 or 22, wherein the acquisition module is further configured to divide the face sample image into a plurality of mutually overlapping image blocks, and from the plurality of mutually overlapping images Obtaining an image block including facial feature information in the block;

The training module acquires the face feature map of the face sample image in the last M convolution layers of the first network branch module of the deep convolutional neural network model that has undergone K-1 adjustments. When the facial sketch feature is used, it is specifically used to:

Determining, for each image block that includes the facial feature information acquired by the acquiring module, that each of the image blocks including the facial feature information is in a corresponding target region in the facial feature map of the facial sample image, and Adding pixel values of the image points in the target area and the pixel points of the same position in each of the image blocks including the facial feature information to obtain a face enhancement feature map;

Obtaining facial sketch features in the facial enhancement feature map by using the last M convolutional layers of the first network branching module of the K-1 sub-depended deep convolutional neural network model for each facial enhancement feature map .
The device according to any one of claims 21 to 23, wherein the acquisition module is further configured to divide the face sample image into a plurality of overlapping image blocks, and from the plurality of mutual Obtaining an image block including hair feature information in the overlapping image blocks;

The training module acquires the face feature map of the face sample image in the last M convolution layers of the second network branch module of the deep convolutional neural network model that has undergone K-1 adjustments. When sketching hair features, it is specifically used to:

And adding, to the image block including the hair feature information acquired by the acquiring module, the pixel sample values of the face sample image and the pixel position of the same position in the image block including the hair feature information to obtain a hair enhancement feature Figure

Obtaining the hair sketch feature in the hair enhancement feature map by using the rear M convolution layers of the second network branching module of the K-1 sub-depth deep convolutional neural network model for each hair enhancement feature map .
The device according to claim 23, wherein the obtaining module is configured to: when acquiring an image block including facial feature information from the plurality of mutually overlapping image blocks,

Determining, for each of the plurality of mutually overlapping image blocks, a face probability of each pixel point in each of the image blocks as a facial feature point; determining a number of pixel points whose face probability is not 0 Greater than the preset threshold At the time of the value, it is determined that each of the image blocks is an image block including facial feature information.
The device according to claim 24, wherein the obtaining module is configured to: when acquiring an image block including hair feature information from the plurality of mutually overlapping image blocks,

Determining, for each of the plurality of mutually overlapping image blocks, a hair probability of each pixel point in each of the image blocks as a hair feature point; determining a number of pixel points whose hair probability is not zero When it is greater than the preset threshold, it is determined that each of the image blocks is an image block including hair feature information.
A device for generating a sketch image, comprising: a collector, a memory, and a processor;

The collector is configured to acquire a face image to be processed;

a memory for storing a program executed by the processor;

And a processor, configured to execute the memory stored program based on the face image acquired by the collector to perform the method of any one of claims 1 to 13.
A computer storage medium, characterized in that the computer readable storage medium stores computer executable instructions for causing the computer to perform the method of any one of claims 1 to 13.