
WO2016112797A1 - Method and device for determining image display information - Google Patents


Info

Publication number
WO2016112797A1
Authority
WO
WIPO (PCT)
Prior art keywords
picture
display information
information
display
training
Prior art date
Application number
PCT/CN2016/070157
Other languages
French (fr)
Chinese (zh)
Inventor
石克阳
曹阳
Original Assignee
阿里巴巴集团控股有限公司
石克阳
曹阳
Application filed by 阿里巴巴集团控股有限公司, 石克阳, 曹阳 filed Critical 阿里巴巴集团控股有限公司
Publication of WO2016112797A1 publication Critical patent/WO2016112797A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present application relates to the field of computers, and in particular, to a method and apparatus for determining picture display information.
  • the online shopping platform provides various commodity information release mechanisms for various e-commerce providers, and merchants can upload photos of products with multiple angles and multiple backgrounds to attract users.
  • the poor image display method not only hinders the user from obtaining the required information, but also wastes the user's valuable bandwidth resources and reduces the user's screen utilization.
  • due to the open nature of the Internet, such a situation will continue to exist; moreover, due to the explosive growth of Internet information, it is not feasible to attempt to manually review the display of these pictures.
  • a method for determining picture display information includes:
  • an apparatus for determining picture display information comprising:
  • a first device configured to acquire a plurality of training pictures that have been marked with display information
  • a second device configured to obtain a corresponding picture detection model by training a convolutional neural network based on the plurality of training pictures
  • a third device configured to determine, according to the picture detection model, picture display information of the picture to be detected.
  • the present application models the different display modes of pictures and determines the picture display information of a picture to be detected through the built model, thereby realizing efficient and accurate recognition of the picture display mode; this in turn supports further improvement of the displayed pictures or their display modes, thereby improving the efficiency with which users obtain information, increasing the utilization rate of the screen resources of the user terminal, and improving the user experience.
  • FIG. 1 shows a schematic diagram of an apparatus for determining picture display information in accordance with an aspect of the present application
  • FIG. 2 is a schematic diagram showing a correspondence relationship between training pictures and display information acquired in an apparatus for determining picture display information according to an aspect of the present application;
  • FIG. 3 shows a schematic diagram of a first device in an apparatus for determining picture display information in accordance with a preferred embodiment of the present application
  • FIG. 4 shows a flow chart executed by a second device in an apparatus for determining picture display information in accordance with a preferred embodiment of the present application
  • FIG. 5 is a schematic diagram showing a third device in an apparatus for determining picture display information according to a preferred embodiment of the present application.
  • FIG. 6 shows a flow chart of a method for determining picture display information according to another aspect of the present application.
  • Figure 7 shows a flow chart of step S1 in a method for determining picture display information in accordance with a preferred embodiment of the present application
  • FIG. 8 shows a flow chart of step S3 in a method for determining picture display information in accordance with another preferred embodiment of the present application.
  • the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, and the like.
  • computer readable media, as defined herein, does not include transitory media such as modulated data signals and carrier waves.
  • FIG. 1 shows an apparatus 1 for determining picture display information in accordance with an aspect of the present application.
  • the device 1 includes a first device 11, a second device 12, and a third device 13.
  • the first device 11 is configured to acquire a plurality of training pictures that have been labeled with display information
  • the second device 12 is configured to obtain a corresponding picture detection model by using a convolutional neural network based on the plurality of training pictures
  • the third device 13 is configured to determine picture display information of the to-be-detected picture according to the picture detection model.
  • the device 1 may be implemented by a network host, a single network server, a plurality of network server sets, or a cloud composed of a plurality of servers.
  • the cloud is composed of a large number of hosts or network servers based on cloud computing, which is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
  • the device 1 includes an electronic device capable of automatically performing numerical calculation and information processing according to instructions set or stored in advance, the hardware of which includes, but is not limited to, a microprocessor, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), embedded devices, and the like.
  • the first device 11 acquires training pictures that meet the size, format, and other requirements of the picture detection model to be built by the second device 12, either remotely through an agreed communication method such as http or https, or by local reading.
  • the training picture may be the stored source picture, or may be a picture obtained after the source picture is trimmed.
  • the first device 11 uniformly acquires each training picture in accordance with the classification of the display information.
  • the display information includes any information that can describe the placement information of the item displayed by the training picture, and the details or overall effect of the item.
  • for example, the display information describing a training picture of clothing includes, but is not limited to: model upper body display information, model lower body display information, model whole body display information, upper body tile display information, lower body tile display information, whole body tile display information, detail display information, stacked display information, multi-picture display information, other display information, and the like; its correspondence with the training pictures is exemplified in FIG. 2.
  • the display information describing the training picture of the furniture includes, but is not limited to, front display information, three-dimensional display information, side display information, detail display information, other display information, and the like.
  • the display information corresponding to each of the training pictures may be directly obtained from a database corresponding to each training picture. It may also be determined according to the obtained display manner of each training picture, wherein the display manner includes a plurality of display classification information, and the marked display information includes at least one of the plurality of display classification information.
  • for example, the display manner preset in the first device 11 for lower-body garments includes three types of display classification information, specifically: model lower body display classification information, lower body tile display classification information, and lower body detail display classification information.
  • the training pictures acquired by the first device 11 include: a picture of a model showing pants and a picture of a model showing a skirt; the display manner obtained for both training pictures is the model lower body display classification information, so the display information corresponding to each picture is: model lower body display information.
  • in another example, a picture among the training pictures acquired by the first device 11 includes both an image of a model showing pants and an image of the pants' details; according to the acquired corresponding display manner, the first device 11 determines that the display information of the picture includes the model lower body display information and the lower body detail display classification information.
  • the first device 11 obtains a plurality of training pictures by trimming the source picture.
  • the first device 11 includes: a first unit 111 and a second unit 112 (shown in FIG. 3).
  • the first unit 111 is configured to acquire a plurality of sample pictures that have been labeled with display information; the second unit 112 is configured to pre-process each sample picture to obtain a corresponding training picture.
  • the first unit 111 acquires a plurality of sample pictures remotely by means of an agreed communication method such as http or https, or by local reading or the like; since the acquired sample pictures differ in size, color, and the like, the second unit 112 pre-processes each sample picture to obtain training pictures that meet the preset size and color requirements.
  • the manner in which the second unit 112 pre-processes each sample picture includes selecting, from the acquired sample pictures, pictures that meet preset size, color, and similar requirements as the training pictures.
  • the processing manner comprises: normalizing each sample picture to obtain a corresponding training picture.
  • the manner of the normalization processing includes, but is not limited to, at least one of the following: 1) converting a sample picture into a three primary color (RGB) representation; for example, if the acquired sample picture is in JPG format, the second unit 112 converts the sample picture into RGB format; 2) scaling the sample picture so that one side has a fixed length.
  • for example, the second unit 112 scales the sample picture so that its short side has size a and its long side has size (a*b_i/a_i), where i is the sequence number of the sample picture, 0 < i ≤ n, n is the number of sample pictures acquired, a_i is the short side size of the i-th sample picture, and b_i is the long side size of the i-th sample picture.
  • 3) cropping the sample picture to make it square; for example, the second unit 112 trims the acquired sample picture by a width of (a_i - a)/2 on each side along its short dimension and by a width of (b_i - a)/2 on each side along its long dimension, where i is the sequence number of each sample picture, 0 < i ≤ n, n is the number of sample pictures acquired, a_i is the short side size of the i-th sample picture, b_i is the long side size of the i-th sample picture, and a is the side length of the cropped sample picture.
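The scale-and-crop geometry above can be sketched in a few lines. This is an illustration, not the patent's implementation; it assumes the scaled long side preserves the aspect ratio (a*b_i/a_i) and that trims are taken symmetrically, with the example sizes (400 x 600 sample, target a = 256) chosen arbitrarily.

```python
# Sketch of the normalization geometry: scale the short side to a,
# or crop the a_i x b_i sample picture down to an a x a square.

def scaled_size(a, a_i, b_i):
    """Scale so the short side becomes a; the long side becomes a*b_i/a_i."""
    return a, a * b_i // a_i  # integer pixel sizes

def crop_margins(a, a_i, b_i):
    """Width trimmed from each side to crop an a_i x b_i picture to a x a."""
    return (a_i - a) // 2, (b_i - a) // 2

# Example: a 400 x 600 sample picture, target side a = 256.
short, long_ = scaled_size(256, 400, 600)     # (256, 384)
m_short, m_long = crop_margins(256, 400, 600)  # (72, 172)
```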
  • the second unit 112 can also first convert the sample picture into a three primary color representation, then scale the sample picture so that one side has a fixed length, and/or crop it.
  • the number of the sample pictures may be the same as the number of training pictures, or may be less than the number of training pictures.
  • in a preferred embodiment, the second unit 112 intercepts a plurality of corresponding training pictures from each normalized sample picture by using a moving window.
  • for example, the number of sample pictures acquired by the first unit 111 is n; the second unit 112 first normalizes the acquired sample pictures according to any one or more of the foregoing manners.
  • then, each cropped sample picture of size a*a is traversed by a moving window of size a'*a', wherein the moving step is t.
  • the number of training pictures intercepted from each sample picture is 1+(a-a')/t, so the total number of training pictures obtained by the second unit 112 is n*(1+(a-a')/t).
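The window count above can be checked with a short sketch (an illustration with arbitrary example sizes, not the patent's code): along one axis, an a'-wide window moved in steps of t over an a-wide picture visits offsets 0, t, 2t, ..., a - a', which is exactly 1 + (a - a')/t positions.

```python
# Enumerate the offsets of an a' x a' moving window over an a x a picture.

def window_offsets(a, a_prime, t):
    """Valid window offsets along one axis, stepping by t."""
    return list(range(0, a - a_prime + 1, t))

a, a_prime, t = 256, 224, 8
offsets = window_offsets(a, a_prime, t)
count = len(offsets)  # matches 1 + (a - a')/t = 1 + 32/8 = 5
```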
  • in a preferred embodiment, the second unit 112 intercepts, from each normalized sample picture, a plurality of corresponding training pictures by using a moving window such that each obtained training picture retains the lower half of the original sample picture.
  • for example, the second unit 112 keeps the moving window aligned with the bottom of the sample picture during the movement, thereby intercepting from the sample picture a plurality of training pictures that retain the lower half of the original sample picture.
  • the second unit 112 can also obtain more training pictures by applying rotation processing, such as mirror flipping and in-plane rotation, to the intercepted training pictures; for example, after performing normalization according to any one or more of the above manners, and even after intercepting a plurality of corresponding training pictures, the second unit 112 rotates the obtained training pictures so as to obtain more training pictures, which are then delivered to the second device 12.
  • the items displayed in the training pictures acquired by the first device 11 should belong to the same type of items.
  • for example, all the acquired training pictures display clothing items; or, all the acquired training pictures display digital products, and the like.
  • the second device 12 trains the convolutional neural network based on the plurality of training pictures to obtain a corresponding picture detection model.
  • specifically, the second device 12 performs convolutional neural network training on each training picture acquired by the first device 11 to obtain the feature vectors (i.e., neurons) corresponding to each type of display information, and then classifies the obtained feature vectors according to each type of display information to obtain the picture detection model.
  • that is, the second device 12 performs convolutional neural network training on each training picture and associates the obtained feature vector with the display information of that training picture; when all training pictures have completed the convolutional neural network training, the fixed feature vectors corresponding to the same display information are subjected to normalized classification processing in all dimensions, and the classified feature vectors of each dimension, each corresponding to one type of display information, finally constitute the picture detection model.
  • the convolutional neural network includes three convolution layers and two fully connected layers.
  • the second device 12 iterates the results obtained by each convolution layer in a gradient descent manner, and then uses the two fully connected layers to establish connection relationships between the obtained feature vectors.
  • the convolutional neural network may also preferably set a dropout layer (shown in FIG. 4) in one of the fully connected layers to improve the efficiency of model convergence; here, the role of the dropout layer is to put some of the parameters in its corresponding convolution layer or fully connected layer into dormancy, wherein their parameter values are retained but not updated until they are no longer selected for dormancy.
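The dormancy mechanism above can be illustrated with the standard (inverted) dropout formulation, a simplification of what the patent describes: here randomly selected units output zero for one pass while kept units are rescaled, rather than freezing parameter updates. The drop probability and input values are arbitrary examples.

```python
import random

# Illustrative standard inverted dropout on a layer's activations:
# dropped units output 0.0 this pass; kept units are divided by the
# keep probability so the expected activation is unchanged.

def dropout(activations, p_drop, rng):
    keep = 1.0 - p_drop
    return [0.0 if rng.random() < p_drop else x / keep for x in activations]

rng = random.Random(0)  # seeded for reproducibility
out = dropout([1.0, 2.0, 3.0, 4.0], p_drop=0.5, rng=rng)
```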
  • the convolutional neural network also includes a softmax layer; during the training phase, the training pictures and the corresponding display information are used together, and the whole network is trained through multiple layers, such as the dropout layer, the convolution layers, and the fully connected layers; the display information comes into play in the softmax layer, which is the last layer.
  • specifically, the softmax layer includes a nonlinear classifier that uses the feature vectors output by the fully connected layer and the corresponding labels for classifier training.
  • the whole softmax process can be divided into three steps: the first step finds the maximum value over all dimensions of the fixed feature vector X, denoted Max_i; the second step uses the exponential function exp to map each dimension, shifted by Max_i, into the interval (0, 1]; the third step normalizes the exponentiated values by their sum so that the output forms a probability distribution.
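The softmax steps above can be sketched directly; this is the standard numerically stable formulation (subtracting Max_i keeps exp from overflowing), with an arbitrary three-dimensional example vector.

```python
import math

# The softmax steps: find the maximum, exponentiate the shifted values,
# then normalize by the sum to obtain a probability distribution.

def softmax(x):
    max_i = max(x)                            # step 1: maximum over all dimensions
    exps = [math.exp(v - max_i) for v in x]   # step 2: exp of shifted values, in (0, 1]
    total = sum(exps)
    return [e / total for e in exps]          # step 3: normalize to probabilities

probs = softmax([2.0, 1.0, 0.0])
```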
  • in another preferred embodiment, the second device 12 inputs the picture itself as features into the convolutional neural network for training: each acquired training picture is directly converted into a feature matrix [W, H, C], where W is the width dimension of the training picture, H is the height dimension of the training picture, and C is the display information, such as the display classification information, of the training picture; all the pictures are then fed into the model for training in batches of K.
  • the stochastic gradient descent method is used to iteratively train the above convolutional neural network, where K is generally 32 or 64.
  • each iteration updates the parameters of each layer in the network, such as the weight values and bias values of the nodes in the network layers, until the values of these parameters converge to an optimal solution.
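The iterate-until-convergence loop above can be illustrated with a toy gradient descent on a one-dimensional quadratic; this is not the patent's network, just the same update rule (move each parameter against its gradient) with arbitrary learning rate and target.

```python
# Toy gradient descent: minimize (w - 3)^2, so the weight w should
# converge to 3. Each iteration applies w <- w - lr * gradient, the
# same parameter-update step described for the network's weights.

def train(w, lr, steps):
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # gradient of (w - 3)^2 with respect to w
        w -= lr * grad
    return w

w_final = train(w=0.0, lr=0.1, steps=100)  # converges close to 3.0
```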
  • after the three convolution layers, the second device 12 can downsample the results of the convolution layer processing (as shown by the Maxpooling (max pooling) layer in FIG. 4); then, the second device 12 uses the fully connected layer to establish connection relationships among all the feature vectors (i.e., neurons) output through the downsampling, thereby implementing abstract expression.
  • in a preferred embodiment, the second device 12 is provided with a RELU (rectified linear unit, an activation function) layer and a normalization layer after each convolution layer.
  • the normalization layer performs normalization based on a local window around each pixel, that is, a local normalization operation, which can enhance the overall generalization performance of the model.
  • the convolution layer comprises a Gaussian convolution layer
  • the Gaussian convolution layer is configured to perform a convolution operation on the output result of the previous layer with a plurality of Gaussian filter kernels, wherein the Gaussian filter kernels are learned based on the plurality of training pictures.
  • specifically, the second device 12 uses the Gaussian convolution layer to perform a convolution operation on the output result of the previous layer with a plurality of preset Gaussian filter kernels, and the parameters of the Gaussian kernels are learned.
  • for example, the size of the Gaussian kernels set by the second device 12 for the three Gaussian convolution layers is 5*5, and in each Gaussian convolution layer the convolution kernel traverses all pixels of the picture for calculation.
  • the second device 12 learns 64 Gaussian convolution kernels for the first convolution layer, 32 Gaussian convolution kernels for the second convolution layer, and 16 Gaussian convolution kernels for the third convolution layer.
  • the numbers of Gaussian convolution kernels of the above convolution layers are only examples; in practice, the number of Gaussian convolution kernels of each convolution layer may be determined by actual needs.
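A single 5*5 Gaussian kernel and one "valid" convolution pass can be sketched as follows. This is an illustration of the operation only: the patent's kernels are learned, whereas here the kernel is a fixed Gaussian with an assumed sigma of 1.0, applied to a small constant grayscale image.

```python
import math

# Build a normalized 5 x 5 Gaussian filter kernel, then slide it over
# an image ("valid" convolution: the output shrinks by kernel size - 1).

def gaussian_kernel(size=5, sigma=1.0):
    c = size // 2
    k = [[math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * sigma ** 2))
          for j in range(size)] for i in range(size)]
    s = sum(sum(row) for row in k)
    return [[v / s for v in row] for row in k]  # entries sum to 1

def convolve(img, kernel):
    n, ksz = len(img), len(kernel)
    out_sz = n - ksz + 1
    return [[sum(img[i + u][j + v] * kernel[u][v]
                 for u in range(ksz) for v in range(ksz))
             for j in range(out_sz)] for i in range(out_sz)]

kernel = gaussian_kernel()
flat = [[1.0] * 8 for _ in range(8)]  # constant 8 x 8 image
out = convolve(flat, kernel)          # 4 x 4 output, still ~1.0 everywhere
```

Because the kernel is normalized to sum 1, a constant image passes through unchanged, which is a quick sanity check on the implementation.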
  • after the picture detection model is established, the second device 12 provides the picture detection model to the third device 13; when a user uploads a picture to be detected, the third device 13 determines the picture display information of the picture to be detected according to the picture detection model.
  • specifically, the third device 13 inputs the picture to be detected into the picture detection model and obtains a probability vector over the display information corresponding to the picture to be detected; the display information corresponding to the maximum value of the probability vector, or to the probability values exceeding a preset threshold, is taken as the picture display information of the picture to be detected.
  • the picture display information may be only one piece of display information, or may include at least one of the plurality of display classification information.
  • for example, the display information that can be detected by the picture detection model includes three types of display classification information, specifically: a front display type, a side display type, and a detail display type.
  • if the probability values of the front display type and the detail display type both exceed the preset threshold, the third device 13 determines that the picture display information of the picture to be detected includes: the front display type and the detail display type.
  • if every probability value obtained by the third device 13 is less than the preset threshold, the picture to be detected is determined to be non-compliant.
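The selection rule above can be sketched as follows. The labels, probabilities, and threshold are hypothetical illustrations; the logic simply keeps every display type whose probability exceeds the threshold, and flags the picture as non-compliant when none does.

```python
# Thresholded selection of display information from a probability vector.

def select_display_info(probs, labels, threshold):
    """Return labels whose probability exceeds the threshold,
    or None to mark a non-compliant picture."""
    picked = [lab for lab, p in zip(labels, probs) if p > threshold]
    return picked if picked else None

labels = ["front display", "side display", "detail display"]
result = select_display_info([0.55, 0.10, 0.35], labels, threshold=0.30)
bad = select_display_info([0.05, 0.04, 0.02], labels, threshold=0.30)
```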
  • in a preferred embodiment, the third device 13 includes: a first unit 131 and a second unit 132 (as shown in FIG. 5).
  • the unit 131 is configured to determine, according to the picture related information of the picture to be detected, the corresponding picture detection submodel from the picture detection model.
  • the unit 132 is configured to determine, according to the picture detection submodel, the picture display information of the picture to be detected, where the picture display information includes at least one of the plurality of display classification information.
  • each of the picture detection sub-models corresponds to detecting a type of item picture.
  • the picture detection sub-model A corresponds to detecting a clothing type picture
  • the picture detection sub-model B corresponds to detecting a digital product type picture.
  • the unit 131 can also acquire the picture related information of the picture to be detected while acquiring the picture to be detected; for example, the unit 131 acquires, through communication protocols such as http and https, a form containing the picture to be detected and its picture related information.
  • the picture related information includes, but is not limited to: 1) display subject information of the to-be-detected picture.
  • the display subject information is used to indicate an item name, a category, and the like displayed in the to-be-detected picture.
  • the display subject information includes: a garment, a top.
  • 2) display position information of the picture to be detected; the display position information is used to indicate the placement position of the item displayed in the picture to be detected, and the like.
  • the display position information includes: a main body view of the furniture, a left side view of the furniture, a right side view of the furniture, a partial view of the furniture, and the like.
  • 3) application related information of the application to which the picture to be detected belongs, which is used to indicate the source from which the picture to be detected was uploaded; for example, the application related information includes: digital product information provided by an application client, clothing information uploaded via a WEB page, and so on.
  • the unit 131 can obtain the corresponding picture detection submodel according to the picture related information.
  • if the unit 131 cannot obtain a corresponding picture detection submodel according to the picture related information, the acquired picture to be detected is determined to be non-compliant.
  • after that, the unit 132 determines the picture display information of the picture to be detected according to the picture detection submodel; the manner in which the unit 132 does so is the same as or similar to the manner in which the third device 13 determines the picture display information according to the picture detection model, and will not be described in detail here.
  • FIG. 6 illustrates a method for determining picture display information in accordance with an aspect of the present application.
  • the method is mainly performed by a determining device.
  • the method comprises steps S1, S2 and S3; specifically, in step S1, the determining device acquires a plurality of training pictures that have been labeled with display information; in step S2, the determining device obtains a corresponding picture detection model by training a convolutional neural network based on the plurality of training pictures; in step S3, the determining device determines the picture display information of the picture to be detected according to the picture detection model.
  • the determining device may be implemented by a network host, a single network server, a plurality of network server sets, or a cloud composed of a plurality of servers.
  • the cloud is composed of a large number of hosts or network servers based on cloud computing, which is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
  • the determining device includes an electronic device capable of automatically performing numerical calculation and information processing according to instructions set or stored in advance, the hardware of which includes, but is not limited to, a microprocessor, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), embedded devices, and the like.
  • the determining device acquires the training pictures and the corresponding display information, which meet the size, format, and other requirements for building the picture detection model, either remotely through an agreed communication method such as http or https, or by local reading.
  • the training picture may be the stored source picture, or may be a picture obtained by trimming the source picture or the like.
  • the determining device uniformly acquires each training picture according to the classification of the display information.
  • the display information includes any information that can describe the placement information of the item displayed by the training picture, and the details or overall effect of the item.
  • for example, the display information describing a training picture of clothing includes, but is not limited to: model upper body display information, model lower body display information, model whole body display information, upper body tile display information, lower body tile display information, whole body tile display information, detail display information, stacked display information, multi-picture display information, other display information, and the like; its correspondence with the training pictures is exemplified in FIG. 2.
  • the display information describing the training picture of the furniture includes, but is not limited to, front display information, three-dimensional display information, side display information, detail display information, other display information, and the like.
  • the display information corresponding to each of the training pictures may be directly obtained from a database corresponding to each training picture. It may also be determined according to the obtained display manner of each training picture, wherein the display manner includes a plurality of display classification information, and the marked display information includes at least one of the plurality of display classification information.
  • for example, the display manner for lower-body garments preset in the determining device includes three types of display classification information, specifically: model lower body display classification information, lower body tile display classification information, and lower body detail display classification information.
  • the training pictures acquired by the determining device include: a picture of a model showing pants and a picture of a model showing a skirt; the display manner obtained for both training pictures is the model lower body display classification information, so the display information corresponding to each picture is: model lower body display information.
  • in another example, a picture among the training pictures acquired by the determining device includes both an image of a model showing pants and an image of the pants' details; according to the acquired corresponding display manner, the determining device determines that the display information of the picture includes the model lower body display information and the lower body detail display classification information.
  • the determining device obtains a plurality of training pictures by trimming the source picture.
  • in a preferred embodiment, the step S1 includes: step S11 and step S12, as shown in FIG. 7.
  • in step S11, the determining device acquires a plurality of sample pictures that have been labeled with display information; in step S12, the determining device pre-processes each sample picture to obtain a corresponding training picture.
  • the determining device acquires a plurality of sample pictures remotely through an agreed communication method such as http or https, or by local reading or the like; since the acquired sample pictures differ in size, color, and the like, the determining device pre-processes each sample picture to obtain training pictures that meet the preset size and color requirements.
  • the manner in which the determining device performs preprocessing on each sample picture includes selecting, from the acquired sample pictures, a picture that meets a requirement of a preset size, color, and the like as the training picture.
  • the processing manner comprises: normalizing each sample picture to obtain a corresponding training picture.
  • the manner of the normalization processing includes, but is not limited to, at least one of the following: 1) converting a sample picture into a three primary color (RGB) representation; for example, if the acquired sample picture is in JPG format, the determining device converts the sample picture into RGB format; 2) scaling the sample picture so that one side has a fixed length.
  • For example, since the acquired sample pictures differ in size, the determining device converts each sample picture so that its short side has size a and its long side has size (a*a_i/b_i), where i is the sequence number of the sample picture, 0 < i < n, n is the number of sample pictures acquired, a_i is the short-side size of the i-th sample picture, and b_i is the long-side size of the i-th sample picture.
  • 3) Cropping the sample picture so that it is square. For example, the determining device crops a width of (a_i - a)/2 from each of the two short sides of each acquired sample picture and a width of (b_i - a)/2 from each of the two long sides, where i is the sequence number of each sample picture, 0 < i < n, n is the number of sample pictures acquired, a_i is the short-side size of the i-th sample picture, b_i is the long-side size of the i-th sample picture, and a is the side length (length and width) of the cropped sample picture.
  • It should be noted that the above normalization manners are only examples. In fact, the determining device may first convert the sample picture into a three-primary-color representation, and then scale the sample picture so that one side has a fixed length, and/or crop it.
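The scaling and cropping arithmetic above can be sketched in plain Python. This is an illustrative sketch that follows the formulas exactly as stated in the text; the function names and example sizes are hypothetical and not part of the application:

```python
def scaled_size(a, ai, bi):
    """Scale the i-th sample picture (short side ai, long side bi) so that
    its short side becomes the fixed length a; the long side becomes
    a * ai / bi, per the formula stated in the text."""
    return a, a * ai / bi

def crop_margins(a, ai, bi):
    """Widths trimmed from each short side ((ai - a)/2) and each long side
    ((bi - a)/2) so that the picture becomes an a*a square."""
    return (ai - a) / 2, (bi - a) / 2

# Example: a sample picture with short side 300 and long side 500,
# normalized toward a target side length a = 256.
short_cut, long_cut = crop_margins(256, 300, 500)
print(short_cut, long_cut)  # 22.0 122.0
```

Usage note: these helpers only compute sizes; applying them to actual pixels would require an image library, which the text does not specify.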
  • the number of the sample pictures may be the same as the number of training pictures, or may be less than the number of training pictures.
  • Preferably, the determining device uses a moving window to intercept a plurality of corresponding training pictures from each normalized sample picture.
  • For example, suppose the number of sample pictures acquired by the determining device is n, and the determining device normalizes the acquired sample pictures according to any one or more of the foregoing manners. Each cropped sample picture of size a*a is then swept exhaustively by a moving window of size a'*a', where the step of the movement is t. In this way, the number of training pictures intercepted from each sample picture is 1+(a-a')/t, and the total number of training pictures obtained by the determining device is n*(1+(a-a')/t).
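The window-count arithmetic above can be sketched as follows; this is a hypothetical illustration that assumes, as the formula 1+(a-a')/t implies, that (a-a') is divisible by the step t:

```python
def crops_per_picture(a, a_prime, t):
    """Number of training pictures cut from one a*a sample picture by an
    a'*a' window moving with step t, per the formula 1 + (a - a') / t."""
    return 1 + (a - a_prime) // t

def total_training_pictures(n, a, a_prime, t):
    """Total over n sample pictures: n * (1 + (a - a') / t)."""
    return n * crops_per_picture(a, a_prime, t)

# Example: 100 sample pictures of size 256*256, window 224*224, step 8.
print(crops_per_picture(256, 224, 8))             # 5
print(total_training_pictures(100, 256, 224, 8))  # 500
```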
  • More preferably, the determining device uses the moving window to intercept a plurality of corresponding training pictures from each normalized sample picture in such a way that the obtained training pictures retain the lower-half information of the original sample picture; for example, the determining device intercepts, from each sample picture, a plurality of training pictures that retain the lower half of the original sample picture by keeping the moving window aligned with the bottom of the sample picture during the movement.
  • In addition, the determining device may further obtain more training pictures by applying rotation processing, such as mirror flipping and in-plane rotation, to the intercepted training pictures. For example, after the determining device performs normalization according to any one or more of the above manners, and possibly intercepts a plurality of corresponding training pictures, it rotates the obtained training pictures to obtain still more training pictures, after which step S2 is performed.
  • Preferably, all training pictures acquired by the determining device should belong to the same type of item. For example, the acquired training pictures all display clothing items; or the acquired training pictures all display digital items, and the like.
  • In step S2, the determining device trains a convolutional neural network based on the plurality of training pictures to obtain a corresponding picture detection model.
  • Specifically, the determining device performs convolutional neural network training on each training picture acquired in step S1 to obtain a feature vector (i.e., a neuron) corresponding to each item of display information, and then classifies the obtained feature vectors according to each item of display information to obtain the picture detection model.
  • For example, the determining device performs convolutional neural network training on each training picture and associates the obtained feature vector with the display information of that training picture; the feature vectors corresponding to the same display information are subjected to normalized classification processing in all dimensions, and finally the classified feature vectors of each dimension correspond to the display information in the picture detection model.
  • Preferably, the convolutional neural network includes three convolution layers and two fully connected layers. Specifically, the determining device iterates the results obtained by each convolution layer in a gradient-descent manner, and then uses the two fully connected layers to establish connection relationships among the obtained feature vectors.
  • More preferably, the convolutional neural network may also set a dropout layer (as shown in FIG. 4) in one of the fully connected layers to improve the efficiency of model convergence; here, the role of the dropout layer is to put part of the parameters in the corresponding convolution layer or fully connected layer into hibernation, where the corresponding parameter values are retained but not updated until those parameters are no longer selected for hibernation.
  • More preferably, the convolutional neural network also includes a softmax layer. During the training phase, the training pictures and the corresponding display information are used together and trained through the multi-layer network, such as the dropout layer, the convolution layers, and the fully connected layers; the display information comes into play in the final softmax layer.
  • Here, the softmax layer includes a nonlinear classifier that uses the feature vectors output by the fully connected layer and the corresponding labels for classifier training.
  • The whole softmax process can be divided into three steps. The first step finds the maximum value over all dimensions of the feature vector X, denoted Max_i. The second step subtracts Max_i from each dimension and uses the exponential function exp to convert each dimension of the vector into the range (0, 1]. The third step, which completes the standard softmax computation, divides each dimension by the sum over all dimensions so that the result is a probability vector.
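A numerically stable softmax along these lines can be sketched in plain Python; subtracting the maximum corresponds to the Max_i step, and the final normalization is the standard completion of the process:

```python
import math

def softmax(x):
    """Convert a feature vector x into a probability vector.
    Step 1: find the maximum over all dimensions (Max_i).
    Step 2: subtract it and exponentiate, mapping each dimension into (0, 1].
    Step 3: divide by the sum so the dimensions sum to 1."""
    max_i = max(x)
    exps = [math.exp(v - max_i) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```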
  • Preferably, the determining device inputs the picture itself as a feature into the convolutional neural network for training: each acquired training picture is directly converted into a feature matrix [W, H, C], where W is the width dimension of the training picture, H is the height dimension of the training picture, and C is information such as the display classification information of the training picture.
  • K is generally 32 or 64.
  • Each iteration updates the parameters of each layer in the network, such as the weight values and bias values of the nodes in the network layer, until the values of these parameters converge to an optimal solution.
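The gradient-descent iteration described above can be illustrated on a single parameter; the learning rate, tolerance, and objective below are illustrative assumptions, not values from the application:

```python
def gradient_descent(grad, w0, lr=0.1, tol=1e-8, max_iter=10_000):
    """Repeatedly update a parameter w (e.g., a weight or bias value) in
    the direction opposite its gradient until the updates converge."""
    w = w0
    for _ in range(max_iter):
        step = lr * grad(w)
        w -= step
        if abs(step) < tol:
            break
    return w

# Example: minimize (w - 3)^2, whose gradient is 2*(w - 3).
w_opt = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_opt, 6))  # 3.0
```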
  • Preferably, the determining device may downsample the results of the three convolution layers' processing (as shown by the max-pooling layer in FIG. 4). Then, the determining device uses the fully connected layer to establish connection relationships among all the feature vectors (i.e., neurons) output through the downsampling, thereby implementing abstract expression.
  • Preferably, the determining device sets a ReLU (rectified linear unit, an activation function) layer and a normalization layer after each convolution layer. The ReLU layer exploits the non-saturating nonlinear characteristics of each neuron in the neural network to improve the overall training efficiency of the model.
  • the normalization layer performs normalization processing based on a local window of each pixel point, that is, a local normalization operation, which can enhance the overall generalization performance of the model.
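As an illustrative sketch of the two layers just described (the patent gives no exact normalization formula, so the constants k, alpha, beta, and radius below are assumptions in the spirit of a local-window normalization):

```python
def relu(v):
    """Non-saturating activation: max(0, v)."""
    return max(0.0, v)

def local_normalize(values, k=2.0, alpha=1e-4, beta=0.75, radius=2):
    """Normalize each value by a term computed over a local 1-D window of
    neighbors, mirroring the per-pixel local-window normalization above.
    All constants are illustrative assumptions."""
    out = []
    for i, v in enumerate(values):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        scale = (k + alpha * sum(x * x for x in values[lo:hi])) ** beta
        out.append(v / scale)
    return out

activated = [relu(v) for v in [-1.0, 0.5, 2.0]]
print(activated)  # [0.0, 0.5, 2.0]
```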
  • Preferably, the convolution layers comprise Gaussian convolution layers. A Gaussian convolution layer is configured to perform a convolution operation between the output of the previous layer and a plurality of Gaussian filter kernels, where the Gaussian filter kernels are learned based on the multiple training pictures. That is, the determining device uses a Gaussian convolution layer to convolve the output of the previous layer with a plurality of preset Gaussian filter kernels, where the parameters of the Gaussian kernels are obtained by learning.
  • For example, the determining device sets the Gaussian kernel size of the three Gaussian convolution layers to 5*5, and in each Gaussian convolution layer the convolution kernel traverses all pixel points of the picture. For example, the determining device learns 64 Gaussian convolution kernels for the first convolution layer, 32 Gaussian convolution kernels for the second convolution layer, and 16 Gaussian convolution kernels for the third convolution layer.
  • It should be noted that the numbers of Gaussian convolution kernels of the above convolution layers are only examples; in fact, the number of Gaussian convolution kernels of each convolution layer may be determined by actual needs.
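To make the layer sizes concrete, the following sketch walks an input picture through three 5*5 convolution layers with 64, 32, and 16 kernels; the 224*224 input size, stride 1, valid padding, and 2*2 max pooling after each layer are illustrative assumptions, not values specified by the text:

```python
def conv_out(size, kernel=5, stride=1):
    """Spatial size after a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, window=2):
    """Spatial size after non-overlapping max pooling."""
    return size // window

size, shapes = 224, []
for channels in (64, 32, 16):        # kernels per convolution layer
    size = pool_out(conv_out(size))  # 5*5 conv, then 2*2 max pooling
    shapes.append((size, size, channels))
print(shapes)  # [(110, 110, 64), (53, 53, 32), (24, 24, 16)]
```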
  • After the picture detection model is established, the determining device saves the picture detection model.
  • the determining device performs step S3, that is, determines picture display information of the to-be-detected picture according to the picture detection model.
  • Specifically, the determining device inputs the to-be-detected picture into the picture detection model and obtains a probability vector over the items of display information corresponding to the to-be-detected picture; the display information corresponding to the largest probability value, or to a probability value exceeding a preset threshold, is the picture display information of the picture to be detected.
  • The picture display information may be a single item of display information, or may include at least one of the plurality of display classification information.
  • For example, the display information that can be detected by the picture detection model includes three items of display classification information, specifically: a front display type, a side display type, and a detail display type. When the two probability values that exceed the preset threshold among the probability vector values obtained by the determining device correspond to the front display type and the detail display type respectively, the determining device determines that the picture display information of the to-be-detected picture includes the front display type and the detail display type. When every probability value obtained by the determining device is less than the preset threshold, it is determined that the picture to be detected is not in compliance.
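The decision rule described above can be sketched as follows; the label names and threshold value are illustrative, not specified by the application:

```python
def picture_display_info(probabilities, labels, threshold=0.5):
    """Return every display classification whose probability exceeds the
    threshold; an empty result means the picture is not in compliance."""
    return [lab for p, lab in zip(probabilities, labels) if p > threshold]

labels = ["front display", "side display", "detail display"]
print(picture_display_info([0.81, 0.10, 0.62], labels))
# ['front display', 'detail display']
print(picture_display_info([0.2, 0.1, 0.3], labels))  # [] -> not in compliance
```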
  • Preferably, step S3 comprises steps S31 and S32, as shown in Figure 8.
  • In step S31, the determining device determines, according to the picture related information of the picture to be detected, the corresponding picture detection sub-model from the picture detection model. In step S32, the determining device determines the picture display information of the to-be-detected picture according to the picture detection sub-model, where the picture display information includes at least one of the plurality of display classification information.
  • Here, each picture detection sub-model corresponds to detecting one type of item picture. For example, picture detection sub-model A corresponds to detecting clothing-type pictures, and picture detection sub-model B corresponds to detecting digital-product-type pictures.
  • the determining device can acquire the picture related information of the to-be-detected picture while acquiring the picture to be detected.
  • For example, the determining device acquires, through a communication protocol such as HTTP or HTTPS, a form that includes the picture to be detected and the picture related information.
  • the picture related information includes, but is not limited to: 1) display subject information of the to-be-detected picture.
  • the display subject information is used to indicate an item name, a category, and the like displayed in the to-be-detected picture.
  • the display subject information includes: a garment, a top. 2) Display position information of the picture to be detected.
  • the display position information is used to indicate a placement position of the item displayed in the picture to be detected, and the like.
  • the display position information includes: a main body view of the furniture, a left side view of the furniture, a right side view of the furniture, a partial view of the furniture, and the like.
  • 3) Application related information of the application to which the picture to be detected belongs, which is used to indicate the upload source information of the to-be-detected picture.
  • the application related information includes: digital information provided by the application client, clothing category upload information in the WEB page, and the like.
  • Thus, the determining device can obtain the corresponding picture detection sub-model according to the picture related information. When the determining device cannot obtain a corresponding picture detection sub-model according to the picture related information, it is determined that the acquired picture to be detected is not in compliance.
  • the determining device determines the picture display information of the to-be-detected picture according to the picture detection sub-model.
  • In step S32, the manner in which the picture display information of the to-be-detected picture is determined according to the picture detection sub-model is the same as or similar to the manner, described in step S3 above, in which the picture display information of the to-be-detected picture is determined according to the picture detection model, and will not be described in detail here.
  • In summary, the method and device for determining picture display information of the present application model the display of different display modes of similar items and determine the picture display information of the picture to be detected through the built model, thereby realizing efficient and accurate recognition of the display mode of a picture. This supports further improvement of the displayed picture or of its display mode, which in turn improves the efficiency with which users obtain information, improves the utilization rate of the user terminal's screen, and improves the user experience.
  • Moreover, the present application normalizes the acquired sample pictures, which facilitates unified processing of the training pictures during modeling, makes it possible to obtain enough training pictures from fewer sample pictures, and improves modeling efficiency. In addition, using three convolution layers and two fully connected layers for neural network training can effectively improve the accuracy of the picture detection model, so that the recognition accuracy when identifying the picture to be detected exceeds 90%. Therefore, the present application effectively overcomes various shortcomings in the prior art and has high industrial utilization value.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

A method and device for determining image display information, the method comprising: first acquiring, by a device, a plurality of training images having labelled display information (S1); then acquiring a corresponding image detection model by training a convolutional neural network based on the plurality of training images (S2); and, when an image to be detected is obtained, determining, by the device, image display information of the image to be detected according to the image detection model (S3). The method realizes efficient and accurate recognition of the display manner of images, and in turn supports further improvement of the displayed images or of the manner in which they are displayed, thus improving information acquisition efficiency for the user and the screen resource utilization rate of the user terminal, and improving the user experience.

Description

一种用于确定图片陈列信息的方法及设备Method and device for determining picture display information 技术领域Technical field
本申请涉及计算机领域,尤其涉及一种用于确定图片陈列信息的方法及设备。The present application relates to the field of computers, and in particular, to a method and apparatus for determining picture display information.
背景技术Background technique
随着互联网技术的发展,图片因其相对文字具有表达直观、内容丰富等优势,在越来越多的网页及应用中被广泛应用。例如,网购平台为各电商提供了各种商品信息发布机制,商家可以上传多角度、多背景的商品照片,以吸引用户。With the development of Internet technology, pictures are widely used in more and more web pages and applications because of their intuitive expression and rich content. For example, the online shopping platform provides various commodity information release mechanisms for various e-commerce providers, and merchants can upload photos of products with multiple angles and multiple backgrounds to attract users.
然而,在实际应用中,糟糕的图片陈列方式不仅阻碍了用户获取所需信息,也浪费了用户宝贵的带宽资源、降低了用户的屏幕利用率。显然,鉴于互联网的开放性本质,这样的情况将会持续存在;而且,由于互联网信息的爆发性,试图通过人工来审核这些图片的陈列方式也是不可行的。However, in practical applications, the poor image display method not only hinders the user from obtaining the required information, but also wastes the user's valuable bandwidth resources and reduces the user's screen utilization. Obviously, given the open nature of the Internet, such a situation will continue to exist; and, due to the explosive nature of Internet information, it is not feasible to attempt to manually review the display of these images.
发明内容Summary of the invention
本申请的目的是提供一种用于确定图片陈列信息的方法及设备。It is an object of the present application to provide a method and apparatus for determining picture display information.
根据本申请的一个方面,提供了一种用于确定图片陈列信息的方法,其中,该方法包括:According to an aspect of the present application, a method for determining picture display information is provided, wherein the method includes:
获取已标注陈列信息的多个训练图片;Obtaining multiple training pictures with labeled display information;
基于所述多个训练图片经卷积神经网络训练得对应的图片检测模型;Performing a corresponding picture detection model based on the plurality of training pictures via a convolutional neural network;
根据所述图片检测模型确定待检测图片的图片陈列信息。Determining picture display information of the picture to be detected according to the picture detection model.
根据本申请的另一方面,还提供了一种用于确定图片陈列信息的设备,其中,该设备包括:According to another aspect of the present application, there is also provided an apparatus for determining picture display information, wherein the apparatus comprises:
第一装置,用于获取已标注陈列信息的多个训练图片;a first device, configured to acquire a plurality of training pictures that have been marked with display information;
第二装置,用于基于所述多个训练图片经卷积神经网络训练得对应的图片检测模型;a second device, configured to perform a corresponding picture detection model by using a convolutional neural network based on the plurality of training pictures;
第三装置,用于根据所述图片检测模型确定待检测图片的图片陈列信 息。a third device, configured to determine, according to the picture detection model, a picture display letter of the picture to be detected interest.
与现有技术相比,本申请通过对图片的不同陈列方式的展示进行建模,并通过所建模型来确定待检测图片的图片陈列信息,实现高效、准确地识别图片的陈列方式,从而支持进一步改进所陈列图片或该图片的陈列方式,进而提高用户获取信息效率、提供用户终端屏幕资源利用率并改善用户的使用体验。Compared with the prior art, the present application models the display of different display modes of the picture, and determines the picture display information of the picture to be detected through the built model, thereby realizing efficient and accurate recognition of the picture display mode, thereby supporting The picture displayed or the display mode of the picture is further improved, thereby improving the efficiency of the user to obtain information, providing the utilization rate of the screen resources of the user terminal, and improving the user experience.
附图说明DRAWINGS
通过阅读参照以下附图所作的对非限制性实施例所作的详细描述,本申请的其它特征、目的和优点将会变得更明显:Other features, objects, and advantages of the present application will become more apparent from the detailed description of the accompanying drawings.
图1示出根据本申请一个方面的一种用于确定图片陈列信息的设备示意图;1 shows a schematic diagram of an apparatus for determining picture display information in accordance with an aspect of the present application;
图2示出根据本申请一个方面的一种用于确定图片陈列信息的设备中所获取的训练图片与陈列信息的对应关系示意图;2 is a schematic diagram showing a correspondence relationship between training pictures and display information acquired in an apparatus for determining picture display information according to an aspect of the present application;
图3示出根据本申请一个优选实施例的一种用于确定图片陈列信息的设备中第一装置的示意图;3 shows a schematic diagram of a first device in an apparatus for determining picture display information in accordance with a preferred embodiment of the present application;
图4示出根据本申请一个优选实施例的一种用于确定图片陈列信息的设备中第二装置所执行的流程图;4 shows a flow chart executed by a second device in an apparatus for determining picture display information in accordance with a preferred embodiment of the present application;
图5示出根据本申请一个优选实施例的一种用于确定图片陈列信息的设备中第三装置的示意图;FIG. 5 is a schematic diagram showing a third device in an apparatus for determining picture display information according to a preferred embodiment of the present application; FIG.
图6示出根据本申请另一个方面的一种用于确定图片陈列信息的方法流程图;6 shows a flow chart of a method for determining picture display information according to another aspect of the present application;
图7示出根据本申请一个优选实施例的一种用于确定图片陈列信息的方法中步骤S1的流程图;Figure 7 shows a flow chart of step S1 in a method for determining picture display information in accordance with a preferred embodiment of the present application;
图8示出根据本申请另一个优选实施例的一种用于确定图片陈列信息的方法中步骤S3的流程图。FIG. 8 shows a flow chart of step S3 in a method for determining picture display information in accordance with another preferred embodiment of the present application.
附图中相同或相似的附图标记代表相同或相似的部件。The same or similar reference numerals in the drawings denote the same or similar components.
具体实施方式 detailed description
下面结合附图对本申请作进一步详细描述。The present application is further described in detail below with reference to the accompanying drawings.
在本申请一个典型的配置中,终端、服务网络的设备和可信方均包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。In a typical configuration of the present application, the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory. Memory is an example of a computer readable medium.
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括非暂存电脑可读媒体(transitory media),如调制的数据信号和载波。Computer readable media includes both permanent and non-persistent, removable and non-removable media. Information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory. (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, A magnetic tape cartridge, magnetic tape storage or other magnetic storage device or any other non-transportable medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media does not include non-transitory computer readable media, such as modulated data signals and carrier waves.
图1示出根据本申请一个方面的一种用于确定图片陈列信息的设备1。其中,所述设备1包括:第一装置11、第二装置12、第三装置13。具体地,所述第一装置11用于获取已标注陈列信息的多个训练图片;所述第二装置12用于基于所述多个训练图片经卷积神经网络训练得到对应的图片检测模型;所述第三装置13用于根据所述图片检测模型确定待检测图片的图片陈列信息。1 shows an apparatus 1 for determining picture display information in accordance with an aspect of the present application. The device 1 includes a first device 11, a second device 12, and a third device 13. Specifically, the first device 11 is configured to acquire a plurality of training pictures that have been labeled with display information; and the second device 12 is configured to obtain a corresponding picture detection model by using a convolutional neural network based on the plurality of training pictures; The third device 13 is configured to determine picture display information of the to-be-detected picture according to the picture detection model.
在此,所述设备1可由网络主机、单个网络服务器、多个网络服务器集或多个服务器构成的云等实现。在此,云由基于云计算(Cloud Computing)的大量主机或网络服务器构成,其中,云计算是分布式计算的一种,由一群松散耦合的计算机集组成的一个超级虚拟计算机。本领域技术人员应能理解上述网络设备仅为举例,其他现有的或今后可能出现的网络设备如可适用于本申请,也应包含在本申请保护范围以内,并 在此以引用方式包含于此。在此,所述设备1包括一种能够按照事先设定或存储的指令,自动进行数值计算和信息处理的电子设备,其硬件包括但不限于微处理器、专用集成电路(ASIC)、可编程门阵列(FPGA)、数字处理器(DSP)、嵌入式设备等。Here, the device 1 may be implemented by a network host, a single network server, a plurality of network server sets, or a cloud composed of a plurality of servers. Here, the cloud is composed of a large number of host or network servers based on Cloud Computing, which is a kind of distributed computing, a super virtual computer composed of a group of loosely coupled computers. A person skilled in the art should understand that the foregoing network device is only an example, and other existing or future network devices may be applicable to the present application, and should also be included in the protection scope of the present application. It is hereby incorporated by reference. Here, the device 1 includes an electronic device capable of automatically performing numerical calculation and information processing according to an instruction set or stored in advance, the hardware of which includes but is not limited to a microprocessor, an application specific integrated circuit (ASIC), and a programmable Gate array (FPGA), digital processor (DSP), embedded devices, etc.
具体地,所述第一装置11按照所述第二装置12构建图片检测模型所要求的尺寸、格式等,通过http、https等约定通信方式远程调用、或通过本地读取等方式获取训练图片及所对应的陈列信息。其中,所述训练图片可以是所存储的源图片,也可以是对源图片进行修剪之后所得到的图片等。所述第一装置11按照陈列信息的分类均匀地获取各训练图片。其中,所述陈列信息包括任何能够描述所述训练图片所展示的物品的摆放信息、以及所述物品的细节或整体效果的信息等。例如,描述服装的训练图片的陈列信息包括但不限于:模特上身陈列信息、模特下身陈列信息、模特全身陈列信息、上身平铺陈列信息、下身平铺陈列信息、全身平铺陈列信息、细节陈列信息、堆叠陈列信息、多图陈列信息、其他陈列信息等。其与训练图片的对应关系如图2举例。又如,描述家具的训练图片的陈列信息包括但不限于:正面陈列信息、立体陈列信息、侧面陈列信息、细节陈列信息、其他陈列信息等。Specifically, the first device 11 constructs a size, format, and the like required by the second device 12 according to the size, format, and the like of the image detection model, and obtains the training image by using a predetermined communication method such as http, https, or by local reading. Corresponding display information. The training picture may be the stored source picture, or may be a picture obtained after the source picture is trimmed. The first device 11 uniformly acquires each training picture in accordance with the classification of the display information. Wherein, the display information includes any information that can describe the placement information of the item displayed by the training picture, and the details or overall effect of the item. For example, the display information describing the training picture of the clothing includes but is not limited to: the upper body display information, the model lower body display information, the model body display information, the upper body tile display information, the lower body tile display information, the whole body tile display information, the detail display Information, stacked display information, multi-picture display information, other display information, etc. Its correspondence with the training picture is exemplified in FIG. 2. 
For another example, the display information describing the training picture of the furniture includes, but is not limited to, front display information, three-dimensional display information, side display information, detail display information, other display information, and the like.
在此,各所述训练图片所对应的陈列信息可以由对应各训练图片的数据库中直接获取。也可以根据所获取的各训练图片的陈列方式来确定,其中,所述陈列方式包括多种陈列分类信息,所标注的陈列信息包括所述多种陈列分类信息中至少一个。Here, the display information corresponding to each of the training pictures may be directly obtained from a database corresponding to each training picture. It may also be determined according to the obtained display manner of each training picture, wherein the display manner includes a plurality of display classification information, and the marked display information includes at least one of the plurality of display classification information.
例如,所述第一装置11中预设了针对下身服装的陈列方式包括三种陈列分类信息,具体为:模特下身陈列分类信息、下身平铺陈列分类信息和下身细节陈列分类信息。所述第一装置11所获取的训练图片包括:模特展示裤子的图片、模特展示半身裙的图片,同时一并获取了该两幅训练图片的陈列方式均为模特下身陈列分类信息,则对应各图片的陈列信息均为:模特下身陈列信息。For example, the display manner of the first device 11 for the lower body garment includes three types of display classification information, specifically: the model lower body display classification information, the lower body tile display classification information, and the lower body detail display classification information. The training picture acquired by the first device 11 includes: a model showing a picture of the pants, a model showing a picture of the skirt, and simultaneously obtaining the display manners of the two training pictures, the display information of the model lower body, corresponding to each The display information of the pictures are: the model shows the information under the body.
又如,所述第一装置11所获取的训练图片中的一幅图片中既包含模特展示裤子的图像还包含裤子细节的图像,则根据所获取的对应的陈列方 式,所述第一装置11确定所述图片的陈列信息包含模特下身陈列信息和下身细节陈列分类信息。For another example, a picture in the training picture acquired by the first device 11 includes an image of the model showing the pants and an image of the pants details, according to the acquired corresponding display side. The first device 11 determines that the display information of the picture includes the model lower body display information and the lower body detail display classification information.
优选地,所述第一装置11通过对源图片进行修剪得到多个训练图片。具体地,所述第一装置11包括:第一一单元111、第一二单元112(如图3所示)。Preferably, the first device 11 obtains a plurality of training pictures by trimming the source picture. Specifically, the first device 11 includes: a first unit 111 and a first two unit 112 (shown in FIG. 3).
具体地,所述第一一单元111用于获取已标注陈列信息的多个样本图片;所述第一二单元112用于对每个样本图片进行预处理以获得对应的训练图片。Specifically, the first unit 111 is configured to acquire a plurality of sample pictures that have been labeled with display information; and the first two units 112 are configured to perform pre-processing on each sample picture to obtain a corresponding training picture.
在此,所述第一一单元111通过http、https等约定通信方式远程调用、或通过本地读取等方式获取多个样本图片。由于所获取的样本图片的尺寸、色彩等各不相同,则所述第一二单元112对每个样本图片进行预处理,以得到符合预设尺寸、色彩要求的各训练图片。Here, the first unit 111 acquires a plurality of sample pictures remotely by means of an agreed communication method such as http, https, or the like by local reading or the like. Since the size, color, and the like of the acquired sample pictures are different, the first two units 112 preprocess each sample picture to obtain each training picture that meets the preset size and color requirements.
在此，所述第一二单元112对每个样本图片进行预处理的方式包括从所获取的样本图片中选取符合预设尺寸、色彩等要求的图片作为所述训练图片。优选地，所述预处理方式包括：对每个样本图片进行归一化处理以获得对应的训练图片。具体地，所述归一化处理的方式包括但不限于以下至少任一项：1)将样本图片转换为三原色表示。例如，所获取的样本图片为JPG格式，则所述第一二单元112将该样本图片转换为RGB格式。2)对样本图片按比例缩放使其一边为定长。例如，所获取的各样本图片的尺寸各不相同，则所述第一二单元112将该样本图片转换成短边尺寸为a、长边尺寸为(a*bi/ai)。其中，i为样本图片的序号，0<i<n，n为所获取的样本图片的数量，ai为第i个样本图片的短边尺寸，bi为第i个样本图片的长边尺寸。3)裁剪样本图片使其为正方形。例如，所述第一二单元112将所获取的各样本图片的两短边分别裁剪(ai-a)/2的宽度，两长边分别裁剪(bi-a)/2的宽度。其中，i为各样本图片的序号，0<i<n，n为所获取的样本图片的数量，ai为第i个样本图片的短边尺寸，bi为第i个样本图片的长边尺寸，a为裁剪后的样本图片的长和宽尺寸。Here, the preprocessing performed by the first two units 112 on each sample picture includes selecting, from the acquired sample pictures, pictures that meet requirements on preset size, color, and the like as the training pictures. Preferably, the preprocessing includes: normalizing each sample picture to obtain a corresponding training picture. Specifically, the normalization includes, but is not limited to, at least any one of the following: 1) Converting the sample picture into a three-primary-color representation. For example, if an acquired sample picture is in JPG format, the first two units 112 convert it into RGB format. 2) Scaling the sample picture proportionally so that one side has a fixed length. For example, if the acquired sample pictures differ in size, the first two units 112 convert a sample picture so that its short side has size a and its long side has size (a*bi/ai), where i is the sequence number of the sample picture, 0<i<n, n is the number of acquired sample pictures, ai is the short-side size of the i-th sample picture, and bi is its long-side size. 3) Cropping the sample picture into a square. For example, the first two units 112 crop a width of (ai-a)/2 from each of the two short sides of each acquired sample picture, and a width of (bi-a)/2 from each of the two long sides, where i is the sequence number of each sample picture, 0<i<n, n is the number of acquired sample pictures, ai is the short-side size of the i-th sample picture, bi is its long-side size, and a is the side length (both length and width) of the cropped sample picture.
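The normalization steps above (three-primary-color conversion, fixed-length scaling of the short side, and square cropping) can be sketched as follows. This is only an illustrative reading, not the patented implementation: the function name, the nearest-neighbour resize, and the centre-crop choice are assumptions, and NumPy arrays stand in for picture files.

```python
import numpy as np

def normalize_sample(img: np.ndarray, a: int = 64) -> np.ndarray:
    """Normalize one sample picture: img is an H x W (grayscale) or
    H x W x C uint8 array; returns an a x a x 3 array."""
    if img.ndim == 2:
        img = img[:, :, None]
    if img.shape[2] == 1:                        # step 1: three-primary-color (RGB) representation
        img = np.repeat(img, 3, axis=2)
    h, w = img.shape[:2]
    s = a / min(h, w)                            # step 2: scale so the short side equals a
    nh, nw = max(a, round(h * s)), max(a, round(w * s))
    ys = (np.arange(nh) * h / nh).astype(int)    # nearest-neighbour resize indices
    xs = (np.arange(nw) * w / nw).astype(int)
    img = img[ys][:, xs]
    top, left = (nh - a) // 2, (nw - a) // 2     # step 3: centre-crop to a square a x a
    return img[top:top + a, left:left + a]
```

A grayscale 100x200 input, for instance, comes out as a 64x64x3 square.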
需要说明的是，本领域技术人员应该理解，上述归一化处理的方式仅为举例。事实上，所述第一二单元112还可以先将样本图片转换为三原色表示，再对样本图片按比例缩放使其一边为定长、和/或进行裁剪。It should be noted that those skilled in the art should understand that the above normalization manners are merely examples. In fact, the first two units 112 may also first convert the sample picture into a three-primary-color representation, and then scale it proportionally so that one side has a fixed length, and/or crop it.
其中,所述样本图片的数量可以与训练图片的数量相同,也可以少于训练图片的数量。The number of the sample pictures may be the same as the number of training pictures, or may be less than the number of training pictures.
优选地，所述第一二单元112利用移动窗从经所述归一化处理的每个样本图片中截取多个对应的训练图片。例如，所述第一一单元111所获取的样本图片的数量为n，所述第一二单元112先将所获取的样本图片按照上述任一种或多种方式进行归一化处理。接着，以a'*a'的移动窗对每幅裁剪后的尺寸为a*a的样本图片进行地毯式的移动，其中，移动的步进为t。如此，每幅样本图片被截取出的训练图片的数量为1+(a-a')/t，则所述第一二单元112共得到的训练图片的数量为n*(1+(a-a')/t)。更优选地，所述第一二单元112利用移动窗从经所述归一化处理的每个样本图片中截取多个对应的训练图片，以使所得到的训练图片保留原样本图片的下半部信息；例如，第一二单元112通过使移动窗在移动过程中保持与样本图片底部对齐，从该样本图片中截取多个保留原样本图片的下半部信息的训练图片。Preferably, the first two units 112 use a moving window to cut a plurality of corresponding training pictures out of each normalized sample picture. For example, if the number of sample pictures acquired by the first unit 111 is n, the first two units 112 first normalize the acquired sample pictures in any one or more of the above manners, and then sweep a moving window of size a'*a' exhaustively over each cropped sample picture of size a*a with a moving step of t. In this way, the number of training pictures cut out of each sample picture is 1+(a-a')/t, so the first two units 112 obtain n*(1+(a-a')/t) training pictures in total. More preferably, the first two units 112 cut the training pictures out of each normalized sample picture such that the obtained training pictures retain the lower-half information of the original sample picture; for example, by keeping the moving window aligned with the bottom of the sample picture while it moves, the first two units 112 cut out of that sample picture a plurality of training pictures that retain the lower-half information of the original.
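A minimal sketch of the bottom-aligned moving-window cropping described above, assuming square a*a inputs as NumPy arrays. Sliding only horizontally with step t while staying flush with the bottom edge is one plausible reading of the text; the names are illustrative.

```python
import numpy as np

def window_crops(img: np.ndarray, ap: int, t: int) -> list:
    """Cut a'*a' (here: ap*ap) training crops out of an a*a normalized sample
    with horizontal step t, keeping the window aligned with the bottom edge so
    every crop retains the lower-half information of the original picture."""
    a = img.shape[0]
    # one crop per horizontal window position: 1 + (a - a')/t crops in total
    return [img[a - ap:a, x:x + ap] for x in range(0, a - ap + 1, t)]
```

For a = 8, a' = 4, t = 2 this yields 1 + (8-4)/2 = 3 crops, matching the count in the text.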
更为优选地，所述第一二单元112还可以通过对所截取的训练图片进行如镜像翻转、平面旋转等的旋转处理，得到更多的训练图片。例如，所述第一二单元112按照上述任一种或多种方式进行归一化处理、甚至截取了多个对应的训练图片之后，将所得到的训练图片进行旋转处理，如此得到更多的训练图片，并将其输送至所述第二装置12。More preferably, the first two units 112 can also obtain more training pictures by applying rotation processing, such as mirror flipping or in-plane rotation, to the cut-out training pictures. For example, after performing normalization in any one or more of the above manners, and even cutting out a plurality of corresponding training pictures, the first two units 112 rotate the obtained training pictures, thereby obtaining more training pictures, which are delivered to the second device 12.
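The rotation-based augmentation mentioned above could look like the following sketch; the exact set of transforms is an assumption, since the text names only mirror flipping and plane rotation as examples.

```python
import numpy as np

def augment(img: np.ndarray) -> list:
    """Expand one training crop into several by mirror flip and in-plane rotation."""
    return [
        img,               # original crop
        np.fliplr(img),    # mirror (left-right) flip
        np.rot90(img, 1),  # rotated 90 degrees in the image plane
        np.rot90(img, 2),  # 180 degrees
        np.rot90(img, 3),  # 270 degrees
    ]
```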
需要说明的是,所述第一装置11所获取的各训练图片中所展示的物品应属于同一类物品。例如,所获取的各训练图片中所展示的均为服装类物品;或者,所获取的各训练图片中所展示的均为数码类物品等。It should be noted that the items displayed in the training pictures acquired by the first device 11 should belong to the same type of items. For example, all the acquired training pictures are displayed as clothing items; or, the obtained training pictures are all digital items and the like.
所述第二装置12基于所述多个训练图片经卷积神经网络训练得到对应的图片检测模型。The second device 12 trains the convolutional neural network based on the plurality of training pictures to obtain a corresponding picture detection model.
具体地，所述第二装置12将所述第一装置11所获取的各训练图片进行卷积神经网络训练，得到对应各陈列信息的特征向量(即神经元)，再按照各陈列信息对所得到的各特征向量进行分类处理，得到图片检测模型。Specifically, the second device 12 performs convolutional neural network training on the training pictures acquired by the first device 11 to obtain feature vectors (i.e., neurons) corresponding to the pieces of display information, and then classifies the obtained feature vectors according to the display information to obtain the picture detection model.
例如，所述第二装置12将每个训练图片进行卷积神经网络训练，并将得到的特征向量与所属训练图片的陈列信息相对应，当所有训练图片完成卷积神经网络训练后，将对应同一陈列信息的各固定特征向量在所有维度上进行归一化的分类处理，最终得到分类后的每个维度的特征向量对应一个陈列信息的图片检测模型。For example, the second device 12 performs convolutional neural network training on each training picture and associates the resulting feature vector with the display information of that training picture. After all training pictures have completed the convolutional neural network training, the fixed feature vectors corresponding to the same display information are subjected to normalized classification over all dimensions, finally yielding a picture detection model in which, after classification, each dimension's feature vector corresponds to one piece of display information.
在此，所述卷积神经网络(convolutional neural network)包括三层卷积层及两层全连通层。具体地，所述第二装置12采用梯度下降的方式对每层卷积层所得到的结果进行迭代。再利用两层全连通层将所得到的各特征向量建立连接关系。其中，所述卷积神经网络还可以优选地在其中一层全连通层设置dropout(休眠)层(如图4所示)，用以提升模型收敛的效率；在此，dropout层的作用是将其对应的卷积层或全连通层中的部分参数休眠，但是其对应的参数值会保留但是不更新，直到下一次不被选中进行休眠才会更新。该卷积神经网络还包括softmax层；训练阶段，训练图片和对应的陈列信息会一起被利用，整个问题会经过多层网络进行训练，比如dropout层、卷积层、全连通层等等；其中陈列信息是在最后一层的softmax层发挥作用。softmax层中包含的是一个非线性分类器，其利用全连通层输出的特征向量与对应的标签进行分类器训练。整个softmax的过程可以分为三步：第一步是对固定特征向量X所有维的值求最大值，记为Max_i；第二步使用指数函数exp将向量中的每一维都转化到0~1之间的数，即向量X中的每一维x[i]=exp(x[i]-Max_i)；第三步对所有的值求和，然后相应地做归一化，即x[i]=x[i]/sum(x[i])。Here, the convolutional neural network includes three convolution layers and two fully connected layers. Specifically, the second device 12 iterates the results of each convolution layer by gradient descent, and then uses the two fully connected layers to establish connections between the obtained feature vectors. Preferably, the convolutional neural network may further include a dropout layer in one of the fully connected layers (as shown in FIG. 4) to improve the efficiency of model convergence. Here, the dropout layer puts some parameters of its corresponding convolution layer or fully connected layer to sleep; the values of those parameters are retained but not updated until the next round in which they are not selected for sleeping. The convolutional neural network further includes a softmax layer. In the training phase, the training pictures and the corresponding display information are used together, and the whole problem is trained through the multi-layer network, e.g., the dropout layer, the convolution layers, and the fully connected layers; the display information comes into play at the final softmax layer. The softmax layer contains a nonlinear classifier, which is trained using the feature vectors output by the fully connected layer together with the corresponding labels. The whole softmax process can be divided into three steps: the first step takes the maximum over all dimensions of the fixed feature vector X, denoted Max_i; the second step uses the exponential function exp to map each dimension of the vector to a number between 0 and 1, i.e., x[i]=exp(x[i]-Max_i) for each dimension of X; the third step sums all the values and normalizes accordingly, i.e., x[i]=x[i]/sum(x[i]).
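The three softmax steps spelled out above translate directly into code; this is the standard numerically stable softmax, with the Max_i subtraction as step one (the function name is illustrative).

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Three-step softmax over a fixed feature vector X."""
    max_i = x.max()          # step 1: maximum over all dimensions of X, denoted Max_i
    e = np.exp(x - max_i)    # step 2: x[i] = exp(x[i] - Max_i), each value now in (0, 1]
    return e / e.sum()       # step 3: x[i] = x[i] / sum(x[i]), normalizing to sum 1
```

Subtracting Max_i leaves the output unchanged while preventing overflow for large inputs, which is why the first step exists.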
例如，所述第二装置12将图片本身作为一个特征输入所述卷积神经网络进行训练，得到的每张训练图片直接转化为一个特征矩阵[W,H,C]，其中，W为所述训练图片的宽度尺寸，H为所述训练图片的高度尺寸，C为所述训练图片的陈列分类信息等陈列信息。然后所有图片以K张为单位调入模型中进行训练，训练过程中使用了随机梯度下降方法对上述的卷积神经网络进行迭代学习，此处K一般取32或64。其中，每一轮迭代都会更新网络中每一层的参数，如网络层内结点的权重值以及偏置值等，直到这些参数值收敛，取得最优解。更为优选地，所述第二装置12可将三层卷积层处理后的结果进行降采样(如图4中的Maxpooling层(最大值合并层)所示)。接着，所述第二装置12使用全连通层将经过降采样所输出的所有特征向量(即神经元)互相之间建立连接关系，从而实现抽象化表达。For example, the second device 12 feeds the picture itself as a feature into the convolutional neural network for training, and each training picture is directly converted into a feature matrix [W, H, C], where W is the width of the training picture, H is its height, and C is display information such as the display classification information of the training picture. All pictures are then fed into the model for training in batches of K, and during training the convolutional neural network described above is iteratively learned using stochastic gradient descent, where K is generally 32 or 64. Each round of iteration updates the parameters of every layer in the network, such as the weight values and bias values of the nodes within a layer, until these parameter values converge to an optimal solution. More preferably, the second device 12 may downsample the results processed by the three convolution layers (as shown by the Maxpooling (maximum merging) layer in FIG. 4). Then, the second device 12 uses the fully connected layers to establish connections among all the feature vectors (i.e., neurons) output after downsampling, thereby achieving an abstract representation.
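The batching and parameter-update loop described above can be illustrated with a deliberately simplified stand-in: a single-layer softmax classifier trained by mini-batch stochastic gradient descent with batch size K. The real network's convolution and fully connected layers are omitted here; only the K-sized batching and the iterative SGD update are shown, and all names are illustrative.

```python
import numpy as np

def sgd_train(X, y, n_classes, K=32, lr=0.1, epochs=5, seed=0):
    """Mini-batch SGD on a single-layer softmax classifier (stand-in for the CNN)."""
    rng = np.random.default_rng(seed)
    W = np.zeros((X.shape[1], n_classes))          # parameters updated each iteration
    for _ in range(epochs):
        order = rng.permutation(len(X))            # shuffle, then feed pictures K at a time
        for start in range(0, len(X), K):
            idx = order[start:start + K]
            logits = X[idx] @ W
            logits -= logits.max(axis=1, keepdims=True)
            p = np.exp(logits)
            p /= p.sum(axis=1, keepdims=True)      # softmax probabilities per sample
            p[np.arange(len(idx)), y[idx]] -= 1.0  # gradient of the cross-entropy loss
            W -= lr * X[idx].T @ p / len(idx)      # SGD parameter update
    return W

def accuracy(W, X, y):
    return float((np.argmax(X @ W, axis=1) == y).mean())
```

With well-separated synthetic clusters this loop reaches near-perfect training accuracy within a few epochs, mirroring the "iterate until the parameters converge" description.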
优选地,如图4所示,所述第二装置12在每层卷积层后均设置RELU层和归一化层。其中,RELU(rectified linear unit,校正线性单元,一种激活函数)层利用神经网络中的各神经元的不饱和的非线性特性,提高模型整体的训练效率。所述归一化层基于每个像素点的局部窗口进行归一化处理,也就是局部归一化操作,能够增强模型整体的泛化性能。Preferably, as shown in FIG. 4, the second device 12 is provided with a RELU layer and a normalized layer after each layer of the convolution layer. Among them, RELU (rectified linear unit, an activation function) layer utilizes the unsaturated nonlinear characteristics of each neuron in the neural network to improve the overall training efficiency of the model. The normalization layer performs normalization processing based on a local window of each pixel point, that is, a local normalization operation, which can enhance the overall generalization performance of the model.
其中，所述卷积层包括高斯卷积层，所述高斯卷积层用于对前一层的输出结果与多个高斯滤波核进行卷积操作，其中，所述高斯滤波核是基于所述多个训练图片经学习获得的。The convolution layers include Gaussian convolution layers, and a Gaussian convolution layer is configured to convolve the output of the previous layer with a plurality of Gaussian filter kernels, wherein the Gaussian filter kernels are obtained by learning based on the plurality of training pictures.
例如，所述第二装置12利用高斯卷积层对前一层的输出结果与多个预设的高斯滤波核进行卷积操作。其中，高斯核的参数是经过学习得到的。所述第二装置12设置三层高斯卷积层所使用的高斯核的尺寸均为5*5，并且，在每一个高斯卷积层中，卷积核均是对图片所有的像素点进行遍历计算。其中，所述第二装置12针对第一层卷积层学习了64个高斯卷积核，针对第二层卷积层学习了32个高斯卷积核，针对第三层卷积层学习了16个高斯卷积核。For example, the second device 12 uses a Gaussian convolution layer to convolve the output of the previous layer with a plurality of preset Gaussian filter kernels, where the parameters of the Gaussian kernels are obtained by learning. The second device 12 sets the Gaussian kernels used by the three Gaussian convolution layers to a size of 5*5, and in each Gaussian convolution layer the convolution kernels traverse all the pixels of the picture. The second device 12 learns 64 Gaussian convolution kernels for the first convolution layer, 32 Gaussian convolution kernels for the second convolution layer, and 16 Gaussian convolution kernels for the third convolution layer.
需要说明的是,本领域技术人员应该理解,上述各层卷积层的高斯卷积核的数量仅为举例,事实上,各层卷积层的高斯卷积核的数量可由实际需求而定。It should be noted that those skilled in the art should understand that the number of Gaussian convolution kernels of the above-mentioned layers of convolution layers is only an example. In fact, the number of Gaussian convolution kernels of each layer convolution layer may be determined by actual needs.
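Putting the pieces together, the feature-map shapes through the three 5*5 convolution layers (64/32/16 kernels) can be traced with a small helper. The 'same' padding and a 2*2 max-pooling step after each convolution are assumptions about placement that the text leaves open; the helper only does shape arithmetic, no actual convolution.

```python
def feature_shapes(a: int = 64):
    """Trace (side, side, channels) through conv1..conv3 with 64/32/16 kernels.
    Each 5x5 convolution is assumed 'same'-padded (spatial size preserved, as
    the ReLU and local normalization steps also are); each pooling halves the sides."""
    shapes = [("input", a, a, 3)]
    size, channels = a, 3
    for name, kernels in (("conv1", 64), ("conv2", 32), ("conv3", 16)):
        channels = kernels
        shapes.append((name, size, size, channels))
        size //= 2  # 2x2 max-pooling downsampling
        shapes.append((name + "_pool", size, size, channels))
    flat = size * size * channels  # length of the vector entering the fully connected layers
    shapes.append(("flatten", flat))
    return shapes, flat
```

For a 64*64*3 input this gives an 8*8*16 map (a 1024-dimensional vector) at the first fully connected layer, under the stated assumptions.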
在建立了图片检测模型后,所述第二装置12将所述图片检测模型提供给所述第三装置13。当用户上传一待检测图片时,所述第三装置13根据所述图片检测模型确定待检测图片的图片陈列信息。After the picture detection model is established, the second device 12 provides the picture detection model to the third device 13. When the user uploads a to-be-detected picture, the third device 13 determines the picture display information of the picture to be detected according to the picture detection model.
具体地，所述第三装置13将所述待检测图片输入所述图片检测模型，得到所述待检测图片对应各陈列信息的概率向量，取概率向量的值最大者、或者概率值超出预设阈值者所对应的陈列信息为所述待检测图片的图片陈列信息。Specifically, the third device 13 inputs the to-be-detected picture into the picture detection model to obtain a probability vector of the to-be-detected picture over the pieces of display information, and takes the display information corresponding to the largest value in the probability vector, or to the probability values exceeding a preset threshold, as the picture display information of the to-be-detected picture.
在此，所述图片陈列信息可以仅为一个陈列信息，还可以包括所述多种陈列分类信息中至少一个。Here, the picture display information may be a single piece of display information, or may include at least one of the plurality of pieces of display classification information.
例如，所述图片检测模型所能检测的陈列信息包括三种陈列分类信息，具体为：正面陈列类型、侧面陈列类型及细节陈列类型。当所述第三装置13所得到的各概率向量的值中超出预设阈值的两个概率向量的值分别对应正面陈列类型和细节陈列类型，则所述第三装置13确定所述待检测图片的图片陈列信息包括：正面陈列类型和细节陈列类型。For example, the display information that the picture detection model can detect includes three pieces of display classification information, specifically: a front display type, a side display type, and a detail display type. When, among the probability values obtained by the third device 13, the two values exceeding the preset threshold correspond to the front display type and the detail display type respectively, the third device 13 determines that the picture display information of the to-be-detected picture includes: the front display type and the detail display type.
若所述第三装置13所得到的各概率向量的值均小于预设阈值，则认定所对应的待检测图片不合规。If all the values of the probability vector obtained by the third device 13 are less than the preset threshold, the corresponding to-be-detected picture is deemed non-compliant.
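The decision rule in the preceding paragraphs (take the largest probability, or every probability above a preset threshold, with an empty result marking the picture non-compliant) can be sketched as follows; the names are illustrative.

```python
import numpy as np

def decide_display_info(probs, labels, threshold=None):
    """Map a probability vector over display-information labels to a decision.
    With a threshold: keep every label whose probability exceeds it (an empty
    result means the to-be-detected picture is deemed non-compliant).
    Without one: keep the single label with the largest probability."""
    if threshold is None:
        return [labels[int(np.argmax(probs))]]
    return [lab for lab, p in zip(labels, probs) if p > threshold]
```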
优选地,所述第三装置13包括:第三一单元131、第三二单元132。(如图5所示)Preferably, the third device 13 includes: a third unit 131 and a third unit 132. (as shown in Figure 5)
所述第三一单元131用于根据待检测图片的图片相关信息从所述图片检测模型中确定对应的所述图片检测子模型。所述第三二单元132用于根据所述图片检测子模型确定所述待检测图片的图片陈列信息,其中,所述图片陈列信息包括所述多种陈列分类信息中至少一个。The third unit 131 is configured to determine, according to the picture related information of the picture to be detected, the corresponding picture detection submodel from the picture detection model. The third two unit 132 is configured to determine, according to the picture detection submodel, picture display information of the to-be-detected picture, where the picture display information includes at least one of the plurality of display classification information.
在此,每个所述图片检测子模型对应检测一类物品图片。例如,图片检测子模型A对应检测服装类图片,图片检测子模型B对应检测数码产品类图片。Here, each of the picture detection sub-models corresponds to detecting a type of item picture. For example, the picture detection sub-model A corresponds to detecting a clothing type picture, and the picture detection sub-model B corresponds to detecting a digital product type picture.
所述第三一单元131在获取待检测图片的同时,还能获取所述待检测图片的图片相关信息。The third unit 131 can also acquire the picture related information of the to-be-detected picture while acquiring the picture to be detected.
例如，所述第三一单元131通过http、https等通信约定获取包含待检测图片和图片相关信息的表格。其中，所述图片相关信息包括但不限于：1)所述待检测图片的展示主体信息。其中，所述展示主体信息用于表示所述待检测图片中所展示的物品名称、类别等。例如，所述展示主体信息包括：服装、上衣。2)所述待检测图片的陈列位置信息。其中，所述陈列位置信息用于表示所述待检测图片中所展示的物品的摆放位置等。例如，所述陈列位置信息包括：家具的主体图、家具的左侧图、家具的右侧图、家具的局部图等。3)所述待检测图片所属应用的应用相关信息。其中，所述应用相关信息用于表示上传所述待检测图片的来源信息等。例如，所述应用相关信息包括：应用客户端所提供的数码类信息、WEB页面中的服装类上传信息等。For example, the third unit 131 acquires, through communication protocols such as http and https, a table containing the to-be-detected picture and picture-related information. The picture-related information includes, but is not limited to: 1) display subject information of the to-be-detected picture, which indicates the name, category, etc. of the item shown in the to-be-detected picture; for example, the display subject information includes: clothing, tops. 2) Display position information of the to-be-detected picture, which indicates the placement of the item shown in the to-be-detected picture; for example, the display position information includes: a main view of a piece of furniture, a left-side view, a right-side view, a partial view, and the like. 3) Application-related information of the application to which the to-be-detected picture belongs, which indicates the source from which the to-be-detected picture was uploaded; for example, the application-related information includes: digital-category information provided by an application client, clothing-category upload information in a WEB page, and the like.
由上可见,所述第三一单元131可以根据所述图片相关信息得到所对应的图片检测子模型。当所述第三一单元131根据所述图片相关信息无法得到所对应的图片检测子模型时,则认定所获取的待检测图片不合规。As can be seen from the above, the third unit 131 can obtain the corresponding picture detection submodel according to the picture related information. When the third unit 131 cannot obtain the corresponding picture detection submodel according to the picture related information, it is determined that the acquired picture to be detected is not in compliance.
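Routing a to-be-detected picture to its sub-model by the picture-related information, and flagging non-compliance when no sub-model matches, might look like this minimal sketch (the key name "subject" and the sub-model values are hypothetical).

```python
def pick_submodel(picture_info: dict, submodels: dict):
    """Return the detection sub-model for the picture's item category,
    or None when no sub-model corresponds (picture deemed non-compliant)."""
    category = picture_info.get("subject")  # e.g. display-subject info: "clothing", "digital"
    return submodels.get(category)          # None => non-compliant
```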
接着,所述第三二单元132根据所述图片检测子模型确定所述待检测图片的图片陈列信息。Next, the third two unit 132 determines the picture display information of the to-be-detected picture according to the picture detection sub-model.
需要说明的是，本领域技术人员应该理解，所述第三二单元132根据所述图片检测子模型确定所述待检测图片的图片陈列信息的方式与前述第三装置13根据所述图片检测模型确定所述待检测图片的图片陈列信息的方式相同或相似，在此不再详述。It should be noted that those skilled in the art should understand that the manner in which the third two unit 132 determines the picture display information of the to-be-detected picture according to the picture detection sub-model is the same as or similar to the manner in which the foregoing third device 13 determines it according to the picture detection model, and is not detailed again here.
图6示出根据本申请一个方面的一种用于确定图片陈列信息的方法。其中,所述方法主要由确定设备来执行。其中,所述方法包括步骤S1、S2和S3。具体地,在步骤S1中,所述确定设备获取已标注陈列信息的多个训练图片;在步骤S2中,所述确定设备基于所述多个训练图片经卷积神经网络训练得到对应的图片检测模型;在步骤S3中,所述确定设备根据所述图片检测模型确定待检测图片的图片陈列信息。FIG. 6 illustrates a method for determining picture display information in accordance with an aspect of the present application. Wherein, the method is mainly performed by a determining device. Wherein the method comprises steps S1, S2 and S3. Specifically, in step S1, the determining device acquires a plurality of training pictures that have been labeled with the display information; in step S2, the determining device performs corresponding picture detection by using the convolutional neural network training based on the plurality of training pictures. a model; in step S3, the determining device determines picture display information of the picture to be detected according to the picture detection model.
在此，所述确定设备可由网络主机、单个网络服务器、多个网络服务器集或多个服务器构成的云等实现。在此，云由基于云计算(Cloud Computing)的大量主机或网络服务器构成，其中，云计算是分布式计算的一种，是由一群松散耦合的计算机集组成的一个超级虚拟计算机。本领域技术人员应能理解上述网络设备仅为举例，其他现有的或今后可能出现的网络设备如可适用于本申请，也应包含在本申请保护范围以内，并在此以引用方式包含于此。在此，所述确定设备包括一种能够按照事先设定或存储的指令，自动进行数值计算和信息处理的电子设备，其硬件包括但不限于微处理器、专用集成电路(ASIC)、可编程门阵列(FPGA)、数字处理器(DSP)、嵌入式设备等。Here, the determining device may be implemented by a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers. Here, the cloud is composed of a large number of hosts or network servers based on Cloud Computing, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers. Those skilled in the art should understand that the above network devices are merely examples; other existing or future network devices, if applicable to the present application, should also be included in the protection scope of the present application and are incorporated herein by reference. Here, the determining device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions, the hardware of which includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
具体地，所述确定设备按照构建图片检测模型所要求的尺寸、格式等，通过http、https等约定通信方式远程调用、或通过本地读取等方式获取训练图片及所对应的陈列信息。其中，所述训练图片可以是所存储的源图片，也可以是对源图片进行修剪之后所得到的图片等。所述确定设备按照陈列信息的分类均匀地获取各训练图片。其中，所述陈列信息包括任何能够描述所述训练图片所展示的物品的摆放信息、以及所述物品的细节或整体效果的信息等。例如，描述服装的训练图片的陈列信息包括但不限于：模特上身陈列信息、模特下身陈列信息、模特全身陈列信息、上身平铺陈列信息、下身平铺陈列信息、全身平铺陈列信息、细节陈列信息、堆叠陈列信息、多图陈列信息、其他陈列信息等。其与训练图片的对应关系如图2举例。Specifically, the determining device acquires the training pictures and the corresponding display information, in the size, format, etc. required for building the picture detection model, by remote invocation through agreed communication protocols such as http or https, or by local reading. The training pictures may be stored source pictures, or pictures obtained by trimming source pictures, and the like. The determining device acquires the training pictures evenly across the categories of display information, where the display information includes any information that can describe the placement of the item shown in the training picture, as well as the details or overall effect of the item. For example, the display information describing a training picture of clothing includes, but is not limited to: model upper-body display information, model lower-body display information, model full-body display information, upper-body flat-lay display information, lower-body flat-lay display information, full-body flat-lay display information, detail display information, stacked display information, multi-picture display information, other display information, and the like. Its correspondence with the training pictures is exemplified in FIG. 2.
For another example, the display information describing the training picture of the furniture includes, but is not limited to, front display information, three-dimensional display information, side display information, detail display information, other display information, and the like.
在此,各所述训练图片所对应的陈列信息可以由对应各训练图片的数据库中直接获取。也可以根据所获取的各训练图片的陈列方式来确定,其中,所述陈列方式包括多种陈列分类信息,所标注的陈列信息包括所述多种陈列分类信息中至少一个。Here, the display information corresponding to each of the training pictures may be directly obtained from a database corresponding to each training picture. It may also be determined according to the obtained display manner of each training picture, wherein the display manner includes a plurality of display classification information, and the marked display information includes at least one of the plurality of display classification information.
例如，所述确定设备中预设了针对下身服装的陈列方式，包括三种陈列分类信息，具体为：模特下身陈列分类信息、下身平铺陈列分类信息和下身细节陈列分类信息。所述确定设备所获取的训练图片包括：模特展示裤子的图片、模特展示半身裙的图片，同时一并获取了该两幅训练图片的陈列方式均为模特下身陈列分类信息，则对应各图片的陈列信息均为：模特下身陈列信息。For example, the determining device presets display manners for lower-body clothing, including three pieces of display classification information, specifically: model lower-body display classification information, lower-body flat-lay display classification information, and lower-body detail display classification information. The training pictures acquired by the determining device include a picture of a model showing pants and a picture of a model showing a skirt; since the display manner acquired along with both training pictures is the model lower-body display classification information, the display information corresponding to each picture is: model lower-body display information.
又如，所述确定设备所获取的训练图片中的一幅图片中既包含模特展示裤子的图像还包含裤子细节的图像，则根据所获取的对应的陈列方式，所述确定设备确定所述图片的陈列信息包含模特下身陈列信息和下身细节陈列分类信息。For another example, if one of the training pictures acquired by the determining device contains both an image of a model showing pants and an image of pants details, then, according to the acquired corresponding display manner, the determining device determines that the display information of the picture includes the model lower-body display information and the lower-body detail display classification information.
优选地,所述确定设备通过对源图片进行修剪得到多个训练图片。具体地,所述步骤S1包括:步骤S11、步骤S12。如图7所示。在步骤S11中,所述确定设备获取已标注陈列信息的多个样本图片;在步骤S12中,所述确定设备对每个样本图片进行预处理以获得对应的训练图片。Preferably, the determining device obtains a plurality of training pictures by trimming the source picture. Specifically, the step S1 includes: step S11, step S12. As shown in Figure 7. In step S11, the determining device acquires a plurality of sample pictures that have been labeled with the display information; in step S12, the determining device performs pre-processing on each sample picture to obtain a corresponding training picture.
在此，所述确定设备通过http、https等约定通信方式远程调用、或通过本地读取等方式获取多个样本图片。由于所获取的样本图片的尺寸、色彩等各不相同，则所述确定设备对每个样本图片进行预处理，以得到符合预设尺寸、色彩要求的各训练图片。Here, the determining device acquires a plurality of sample pictures remotely through an agreed communication protocol such as http or https, or locally, e.g. by reading them from storage. Since the acquired sample pictures differ in size, color, and the like, the determining device preprocesses each sample picture to obtain training pictures that meet the preset size and color requirements.
在此，所述确定设备对每个样本图片进行预处理的方式包括从所获取的样本图片中选取符合预设尺寸、色彩等要求的图片作为所述训练图片。优选地，所述预处理方式包括：对每个样本图片进行归一化处理以获得对应的训练图片。具体地，所述归一化处理的方式包括但不限于以下至少任一项：1)将样本图片转换为三原色表示。例如，所获取的样本图片为JPG格式，则所述确定设备将该样本图片转换为RGB格式。2)对样本图片按比例缩放使其一边为定长。例如，所获取的各样本图片的尺寸各不相同，则所述确定设备将该样本图片转换成短边尺寸为a、长边尺寸为(a*bi/ai)。其中，i为样本图片的序号，0<i<n，n为所获取的样本图片的数量，ai为第i个样本图片的短边尺寸，bi为第i个样本图片的长边尺寸。3)裁剪样本图片使其为正方形。例如，所述确定设备将所获取的各样本图片的两短边分别裁剪(ai-a)/2的宽度，两长边分别裁剪(bi-a)/2的宽度。其中，i为各样本图片的序号，0<i<n，n为所获取的样本图片的数量，ai为第i个样本图片的短边尺寸，bi为第i个样本图片的长边尺寸，a为裁剪后的样本图片的长和宽尺寸。Here, the preprocessing performed by the determining device on each sample picture includes selecting, from the acquired sample pictures, pictures that meet requirements on preset size, color, and the like as the training pictures. Preferably, the preprocessing includes: normalizing each sample picture to obtain a corresponding training picture. Specifically, the normalization includes, but is not limited to, at least any one of the following: 1) Converting the sample picture into a three-primary-color representation. For example, if an acquired sample picture is in JPG format, the determining device converts it into RGB format. 2) Scaling the sample picture proportionally so that one side has a fixed length. For example, if the acquired sample pictures differ in size, the determining device converts a sample picture so that its short side has size a and its long side has size (a*bi/ai), where i is the sequence number of the sample picture, 0<i<n, n is the number of acquired sample pictures, ai is the short-side size of the i-th sample picture, and bi is its long-side size. 3) Cropping the sample picture into a square. For example, the determining device crops a width of (ai-a)/2 from each of the two short sides of each acquired sample picture, and a width of (bi-a)/2 from each of the two long sides, where i is the sequence number of each sample picture, 0<i<n, n is the number of acquired sample pictures, ai is the short-side size of the i-th sample picture, bi is its long-side size, and a is the side length (both length and width) of the cropped sample picture.
需要说明的是,本领域技术人员应该理解,上述归一化处理的方式仅为举例。事实上,所述确定设备还可以先将样本图片转换为三原色表示,再对样本图片按比例缩放使其一边为定长、和/或进行裁剪。It should be noted that those skilled in the art should understand that the manner of the above normalization processing is merely an example. In fact, the determining device may first convert the sample picture into a three primary color representation, and then scale the sample picture to make the side length, and/or crop.
其中,所述样本图片的数量可以与训练图片的数量相同,也可以少于训练图片的数量。The number of the sample pictures may be the same as the number of training pictures, or may be less than the number of training pictures.
优选地，所述确定设备利用移动窗从经所述归一化处理的每个样本图片中截取多个对应的训练图片。例如，所述确定设备所获取的样本图片的数量为n，所述确定设备先将所获取的样本图片按照上述任一种或多种方式进行归一化处理。接着，以a'*a'的移动窗对每幅裁剪后的尺寸为a*a的样本图片进行地毯式的移动，其中，移动的步进为t。如此，每幅样本图片被截取出的训练图片的数量为1+(a-a')/t，则所述确定设备共得到的训练图片的数量为n*(1+(a-a')/t)。更优选地，所述确定设备利用移动窗从经所述归一化处理的每个样本图片中截取多个对应的训练图片，以使所得到的训练图片保留原样本图片的下半部信息；例如，该确定设备通过使移动窗在移动过程中保持与样本图片底部对齐，从该样本图片中截取多个保留原样本图片的下半部信息的训练图片。Preferably, the determining device uses a moving window to cut a plurality of corresponding training pictures out of each normalized sample picture. For example, if the number of sample pictures acquired by the determining device is n, the determining device first normalizes the acquired sample pictures in any one or more of the above manners, and then sweeps a moving window of size a'*a' exhaustively over each cropped sample picture of size a*a with a moving step of t. In this way, the number of training pictures cut out of each sample picture is 1+(a-a')/t, so the determining device obtains n*(1+(a-a')/t) training pictures in total. More preferably, the determining device cuts the training pictures out of each normalized sample picture such that the obtained training pictures retain the lower-half information of the original sample picture; for example, by keeping the moving window aligned with the bottom of the sample picture while it moves, the determining device cuts out of that sample picture a plurality of training pictures that retain the lower-half information of the original.
更为优选地,所述确定设备还可以通过对所截取的训练图片进行如镜像翻转、平面旋转等的旋转处理,得到更多的训练图片。例如,所述确定设备按照上述任一种或多种方式进行归一化处理、甚至截取了多个对应的训练图片之后,将所得到的训练图片进行旋转处理,如此得到更多的训练图片,并执行步骤S2。More preferably, the determining device may further obtain more training pictures by performing rotation processing on the intercepted training pictures such as mirror flipping, plane rotation, and the like. For example, after the determining device performs normalization processing according to any one or more of the above manners, and even intercepts a plurality of corresponding training pictures, the obtained training pictures are rotated, so that more training pictures are obtained. And step S2 is performed.
需要说明的是,所述确定设备所获取的各训练图片中所展示的物品应属于同一类物品。例如,所获取的各训练图片中所展示的均为服装类物品;或者,所获取的各训练图片中所展示的均为数码类物品等。It should be noted that the items displayed in each training picture acquired by the determining device should belong to the same type of items. For example, all the acquired training pictures are displayed as clothing items; or, the obtained training pictures are all digital items and the like.
在步骤S2中,所述确定设备基于所述多个训练图片经卷积神经网络训练得到对应的图片检测模型。In step S2, the determining device trains the convolutional neural network based on the plurality of training pictures to obtain a corresponding picture detection model.
具体地，所述确定设备将在所述步骤S1中所获取的各训练图片进行卷积神经网络训练，得到对应各陈列信息的特征向量(即神经元)，再按照各陈列信息对所得到的各特征向量进行分类处理，得到图片检测模型。Specifically, the determining device performs convolutional neural network training on the training pictures acquired in step S1 to obtain feature vectors (i.e., neurons) corresponding to the pieces of display information, and then classifies the obtained feature vectors according to the display information to obtain the picture detection model.
例如，所述确定设备将每个训练图片进行卷积神经网络训练，并将得到的特征向量与所属训练图片的陈列信息相对应，当所有训练图片完成卷积神经网络训练后，将对应同一陈列信息的各固定特征向量在所有维度上进行归一化的分类处理，最终得到分类后的每个维度的特征向量对应一个陈列信息的图片检测模型。For example, the determining device performs convolutional neural network training on each training picture and associates the resulting feature vector with the display information of that training picture. After all training pictures have completed the convolutional neural network training, the fixed feature vectors corresponding to the same display information are subjected to normalized classification over all dimensions, finally yielding a picture detection model in which, after classification, each dimension's feature vector corresponds to one piece of display information.
Here, the convolutional neural network includes three convolutional layers and two fully connected layers. Specifically, the determining device iterates over the results of each convolutional layer using gradient descent, and then uses the two fully connected layers to establish connections among the obtained feature vectors. Preferably, the convolutional neural network may further include a dropout layer attached to one of the fully connected layers (as shown in FIG. 4) to improve the efficiency of model convergence. Here, the dropout layer temporarily deactivates some of the parameters of its corresponding convolutional or fully connected layer; the deactivated parameter values are retained but not updated until the next iteration in which they are not selected for deactivation. The convolutional neural network further includes a softmax layer. During the training phase, the training pictures and the corresponding display information are used together, and the whole problem is trained through a multi-layer network comprising the dropout layer, the convolutional layers, the fully connected layers, and so on; the display information comes into play at the final softmax layer. The softmax layer contains a nonlinear classifier that is trained on the feature vectors output by the fully connected layers together with the corresponding labels.
The softmax computation can be divided into three steps. First, the maximum over all dimensions of the fixed feature vector X is computed and denoted Max_i. Second, the exponential function exp maps each dimension of the vector to a value between 0 and 1, i.e., each dimension x[i] of X becomes x[i] = exp(x[i] − Max_i). Third, all values are summed and each is normalized accordingly, i.e., x[i] = x[i] / sum(x[i]).
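The three steps above can be written directly as code; this is a generic, numerically stable softmax sketch matching the description, not code from the application itself:

```python
import numpy as np

def softmax(x):
    """The three-step softmax from the description: subtract the maximum
    (for numerical stability), exponentiate, then normalize to sum to 1."""
    max_i = np.max(x)                 # step 1: maximum over all dimensions
    e = np.exp(x - max_i)             # step 2: exp of shifted values, each in (0, 1]
    return e / np.sum(e)              # step 3: normalize so entries sum to 1

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
```

Subtracting the maximum leaves the result unchanged mathematically while preventing overflow in the exponential.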
For example, the determining device feeds the picture itself as a feature into the convolutional neural network for training; each training picture is converted directly into a feature matrix [W, H, C], where W is the width of the training picture, H is its height, and C is its display information such as display classification information. All pictures are then fed into the model for training in batches of K, where K is typically 32 or 64; during training, stochastic gradient descent is used to iteratively train the above convolutional neural network. Each iteration updates the parameters of every layer in the network, such as the weights and biases of the nodes in each layer, until these parameter values converge to an optimal solution. More preferably, the determining device may downsample the output of the three convolutional layers (as shown by the Maxpooling layer in FIG. 4). The determining device then uses the fully connected layers to establish connections among all the feature vectors (i.e., neurons) output after downsampling, thereby achieving an abstract representation.
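The minibatch training loop described here (batches of K samples, one stochastic-gradient update per batch, repeated until the parameters converge) can be illustrated on a toy one-parameter problem; the data, learning rate, and epoch count below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: fit y = 2*x with a single weight, standing in for the network parameters
X = rng.uniform(-1, 1, size=(256, 1))
y = 2.0 * X[:, 0]

K = 32            # minibatch size, as in the text (typically 32 or 64)
w = 0.0           # the "network parameter" being learned
lr = 0.5

for epoch in range(20):
    perm = rng.permutation(len(X))           # reshuffle each epoch (stochastic)
    for start in range(0, len(X), K):
        batch = perm[start:start + K]
        pred = X[batch, 0] * w
        grad = np.mean(2 * (pred - y[batch]) * X[batch, 0])  # dMSE/dw on the minibatch
        w -= lr * grad                       # one SGD update per minibatch
```

The same pattern applies per layer in the real network: each minibatch produces a gradient that updates that layer's weights and biases.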
Preferably, as shown in FIG. 4, the determining device places a ReLU (rectified linear unit, an activation function) layer and a normalization layer after each convolutional layer. The ReLU layer exploits the non-saturating nonlinearity of the neurons in the neural network to improve the overall training efficiency of the model. The normalization layer normalizes over a local window around each pixel, i.e., a local normalization operation, which enhances the overall generalization performance of the model.
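As an illustration of these two layers, the following sketch implements ReLU and a simplified per-pixel local-window normalization; the exact normalization formula used by the model is not given in the text, so the root-mean-square form below is an assumption:

```python
import numpy as np

def relu(x):
    """Non-saturating activation: negative responses are clamped to zero."""
    return np.maximum(0.0, x)

def local_normalize(fmap, radius=1, eps=1e-5):
    """Simplified local normalization: divide each pixel by the RMS of the
    values in its (2*radius+1)^2 neighborhood (a hedged stand-in for the
    local-window normalization mentioned in the text)."""
    h, w = fmap.shape
    out = np.empty_like(fmap, dtype=float)
    for i in range(h):
        for j in range(w):
            win = fmap[max(0, i - radius):i + radius + 1,
                       max(0, j - radius):j + radius + 1]
            out[i, j] = fmap[i, j] / np.sqrt(np.mean(win ** 2) + eps)
    return out

fmap = relu(np.array([[-1.0, 2.0],
                      [ 3.0, -4.0]]))
normed = local_normalize(fmap)
```
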
The convolutional layers include Gaussian convolutional layers. A Gaussian convolutional layer convolves the output of the previous layer with a plurality of Gaussian filter kernels, where the Gaussian filter kernels are learned from the plurality of training pictures.
For example, the determining device uses a Gaussian convolutional layer to convolve the output of the previous layer with a plurality of preset Gaussian filter kernels, whose parameters are obtained through learning. The determining device sets the size of the Gaussian kernels used in all three Gaussian convolutional layers to 5×5, and within each Gaussian convolutional layer the convolution kernel traverses all pixels of the picture. The determining device learns 64 Gaussian convolution kernels for the first convolutional layer, 32 Gaussian convolution kernels for the second, and 16 Gaussian convolution kernels for the third.
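A sketch of a single 5×5 Gaussian kernel and a "valid" convolution pass over an image may make this concrete; note that in the described model the kernel parameters are learned rather than fixed by a sigma, so the fixed-sigma kernel here is purely illustrative:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """A size x size Gaussian filter kernel, normalized to sum to 1.
    (In the described model the parameters would instead be learned.)"""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def conv2d_valid(img, kernel):
    """Slide the kernel over every position of the image ('valid' convolution)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

k = gaussian_kernel()            # 5x5, as in the example
img = np.ones((8, 8))            # a flat test image
smoothed = conv2d_valid(img, k)  # a normalized kernel leaves a flat image flat
```

A layer with 64 such kernels simply applies 64 kernels in parallel, producing 64 output feature maps.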
It should be noted that those skilled in the art will understand that the above numbers of Gaussian convolution kernels per convolutional layer are merely examples; in practice, the number of Gaussian convolution kernels in each convolutional layer may be determined by actual requirements.
After the picture detection model has been built, the determining device saves the picture detection model. When a user uploads a picture to be detected, the determining device performs step S3, i.e., determines the picture display information of the picture to be detected according to the picture detection model.
Specifically, the determining device feeds the picture to be detected into the picture detection model to obtain a probability vector over the kinds of display information; the display information corresponding to the largest entry of the probability vector, or to the entries whose probability exceeds a preset threshold, is taken as the picture display information of the picture to be detected.
Here, the picture display information may be a single piece of display information, or may include at least one of the plurality of kinds of display classification information.
For example, the display information that the picture detection model can detect includes three kinds of display classification information: front display type, side display type, and detail display type. If, among the entries of the probability vector obtained by the determining device, the two entries exceeding the preset threshold correspond to the front display type and the detail display type respectively, the determining device determines that the picture display information of the picture to be detected includes the front display type and the detail display type.
If all entries of the probability vector obtained by the determining device are below the preset threshold, the corresponding picture to be detected is deemed non-compliant.
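The decision rule described above (keep every label whose probability exceeds the threshold, otherwise deem the picture non-compliant) can be sketched as follows; the label names come from the example, while the threshold value is an assumption for illustration:

```python
LABELS = ["front display", "side display", "detail display"]  # example types from the text

def decide(probs, threshold=0.5):
    """Map a model's probability vector to picture display information:
    keep every label whose probability exceeds the threshold; if none
    qualifies, the picture is deemed non-compliant."""
    chosen = [label for label, p in zip(LABELS, probs) if p > threshold]
    return chosen if chosen else "non-compliant"

r1 = decide([0.7, 0.1, 0.6])   # two entries above threshold -> two labels
r2 = decide([0.2, 0.3, 0.1])   # all below threshold -> non-compliant
```
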
Preferably, step S3 includes steps S31 and S32, as shown in FIG. 4.
In step S31, the determining device determines the corresponding picture detection sub-model from the picture detection model according to the picture-related information of the picture to be detected. In step S32, the determining device determines the picture display information of the picture to be detected according to the picture detection sub-model, where the picture display information includes at least one of the plurality of kinds of display classification information.
Here, each picture detection sub-model detects pictures of one category of items. For example, picture detection sub-model A detects clothing pictures, and picture detection sub-model B detects digital product pictures.
When acquiring the picture to be detected, the determining device can also acquire the picture-related information of the picture to be detected.
For example, the determining device acquires, via a communication protocol such as HTTP or HTTPS, a form containing the picture to be detected and its picture-related information. The picture-related information includes, but is not limited to: 1) the display subject information of the picture to be detected, which indicates the name, category, etc. of the item shown in the picture, e.g., clothing or tops; 2) the display position information of the picture to be detected, which indicates the placement of the item shown in the picture, e.g., a main view of a piece of furniture, its left view, its right view, or a close-up; 3) the application-related information of the application to which the picture to be detected belongs, which indicates the source from which the picture was uploaded, e.g., digital-product information provided by an application client or clothing uploads from a web page.
As can be seen from the above, the determining device can obtain the corresponding picture detection sub-model from the picture-related information. If the determining device cannot obtain a corresponding picture detection sub-model from the picture-related information, the acquired picture to be detected is deemed non-compliant.
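The sub-model lookup with its non-compliance fallback might look like the following sketch; the registry keys and sub-model names are hypothetical:

```python
# hypothetical registry mapping an item category (from the picture-related
# information) to its picture detection sub-model; the names are illustrative
SUB_MODELS = {"clothing": "sub-model A", "digital": "sub-model B"}

def select_sub_model(picture_info):
    """Pick the sub-model for the picture's category; if no sub-model
    matches, the picture is deemed non-compliant, as in the text."""
    return SUB_MODELS.get(picture_info.get("category"), "non-compliant")

m1 = select_sub_model({"category": "clothing"})   # known category
m2 = select_sub_model({"category": "furniture"})  # no matching sub-model
```
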
The determining device then determines the picture display information of the picture to be detected according to the picture detection sub-model.
It should be noted that those skilled in the art will understand that the manner of determining the picture display information of the picture to be detected according to the picture detection sub-model in step S32 is the same as or similar to that of determining it according to the picture detection model in step S3 above, and is not detailed again here.
In summary, the method and device for determining picture display information of the present application model the different display modes of items of the same category, and use the built model to determine the picture display information of a picture to be detected, thereby efficiently and accurately identifying the display mode of the picture. This supports further improvement of the displayed picture or its display mode, improves the efficiency with which users obtain information, raises the screen-resource utilization of user terminals, and improves the user experience. In addition, the present application normalizes the acquired sample pictures, which facilitates uniform processing of training pictures during modeling and allows a sufficient number of training pictures to be obtained from fewer sample pictures, improving modeling efficiency. Furthermore, training the neural network with three convolutional layers and two fully connected layers effectively improves the accuracy of the picture detection model, so that the recognition accuracy when discriminating and identifying pictures to be detected reaches above 90%. The present application therefore effectively overcomes the various shortcomings of the prior art and has high industrial utility.
It is obvious to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments, and that the application can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as illustrative and not restrictive, the scope of the application being defined by the appended claims rather than by the foregoing description; all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claims concerned. Moreover, the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by one unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (20)

  1. A method for determining picture display information, wherein the method comprises:
    acquiring a plurality of training pictures labeled with display information;
    training a convolutional neural network on the plurality of training pictures to obtain a corresponding picture detection model;
    determining picture display information of a picture to be detected according to the picture detection model.
  2. The method according to claim 1, wherein acquiring the plurality of training pictures labeled with display information comprises:
    acquiring a plurality of sample pictures labeled with display information;
    preprocessing each sample picture to obtain a corresponding training picture.
  3. The method according to claim 2, wherein preprocessing each sample picture to obtain a corresponding training picture comprises:
    normalizing each sample picture to obtain a corresponding training picture.
  4. The method according to claim 3, wherein preprocessing each sample picture to obtain a corresponding training picture further comprises:
    cropping, with a moving window, a plurality of corresponding training pictures from each normalized sample picture.
  5. The method according to claim 3 or 4, wherein the normalization comprises at least one of the following:
    converting the sample picture to a three-primary-color representation;
    scaling the sample picture so that one side has a fixed length;
    cropping the sample picture so that it is square.
  6. The method according to any one of claims 1 to 5, wherein the convolutional neural network comprises three convolutional layers and two fully connected layers.
  7. The method according to claim 6, wherein the convolutional layers comprise a Gaussian convolutional layer for convolving the output of the previous layer with a plurality of Gaussian filter kernels, wherein the Gaussian filter kernels are learned from the plurality of training pictures.
  8. The method according to any one of claims 1 to 7, wherein acquiring the plurality of training pictures labeled with display information comprises:
    acquiring a plurality of training pictures labeled with display information, wherein the display modes of the training pictures include a plurality of kinds of display classification information, and the labeled display information includes at least one of the plurality of kinds of display classification information;
    wherein determining the picture display information of the picture to be detected according to the picture detection model comprises:
    determining the picture display information of the picture to be detected according to the picture detection model, wherein the picture display information includes at least one of the plurality of kinds of display classification information.
  9. The method according to claim 8, wherein the picture detection model comprises a plurality of picture detection sub-models;
    wherein determining the picture display information of the picture to be detected according to the picture detection model comprises:
    determining the corresponding picture detection sub-model from the picture detection model according to picture-related information of the picture to be detected;
    determining the picture display information of the picture to be detected according to the picture detection sub-model, wherein the picture display information includes at least one of the plurality of kinds of display classification information.
  10. The method according to claim 9, wherein the picture-related information comprises at least one of the following:
    display subject information of the picture to be detected;
    display position information of the picture to be detected;
    application-related information of the application to which the picture to be detected belongs.
  11. A device for determining picture display information, wherein the device comprises:
    a first means for acquiring a plurality of training pictures labeled with display information;
    a second means for training a convolutional neural network on the plurality of training pictures to obtain a corresponding picture detection model;
    a third means for determining picture display information of a picture to be detected according to the picture detection model.
  12. The device according to claim 11, wherein the first means comprises:
    a first-first unit for acquiring a plurality of sample pictures labeled with display information;
    a first-second unit for preprocessing each sample picture to obtain a corresponding training picture.
  13. The device according to claim 12, wherein the first-second unit is configured to:
    normalize each sample picture to obtain a corresponding training picture.
  14. The device according to claim 13, wherein the first-second unit is further configured to:
    crop, with a moving window, a plurality of corresponding training pictures from each normalized sample picture.
  15. The device according to claim 13 or 14, wherein the normalization comprises at least one of the following:
    converting the sample picture to a three-primary-color representation;
    scaling the sample picture so that one side has a fixed length;
    cropping the sample picture so that it is square.
  16. The device according to any one of claims 11 to 15, wherein the convolutional neural network comprises three convolutional layers and two fully connected layers.
  17. The device according to claim 16, wherein the convolutional layers comprise a Gaussian convolutional layer for convolving the output of the previous layer with a plurality of Gaussian filter kernels, wherein the Gaussian filter kernels are learned from the plurality of training pictures.
  18. The device according to any one of claims 11 to 17, wherein the first means is configured to:
    acquire a plurality of training pictures labeled with display information, wherein the display modes of the training pictures include a plurality of kinds of display classification information, and the labeled display information includes at least one of the plurality of kinds of display classification information;
    wherein the third means is configured to:
    determine the picture display information of the picture to be detected according to the picture detection model, wherein the picture display information includes at least one of the plurality of kinds of display classification information.
  19. The device according to claim 18, wherein the picture detection model comprises a plurality of picture detection sub-models;
    wherein the third means comprises:
    a third-first unit for determining the corresponding picture detection sub-model from the picture detection model according to picture-related information of the picture to be detected;
    a third-second unit for determining the picture display information of the picture to be detected according to the picture detection sub-model, wherein the picture display information includes at least one of the plurality of kinds of display classification information.
  20. The device according to claim 19, wherein the picture-related information comprises at least one of the following:
    display subject information of the picture to be detected;
    display position information of the picture to be detected;
    application-related information of the application to which the picture to be detected belongs.
PCT/CN2016/070157 2015-01-15 2016-01-05 Method and device for determining image display information WO2016112797A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510020689.9 2015-01-15
CN201510020689.9A CN105843816A (en) 2015-01-15 2015-01-15 Method and device for determining display information of picture

Publications (1)

Publication Number Publication Date
WO2016112797A1 true WO2016112797A1 (en) 2016-07-21

Family

ID=56405240

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/070157 WO2016112797A1 (en) 2015-01-15 2016-01-05 Method and device for determining image display information

Country Status (2)

Country Link
CN (1) CN105843816A (en)
WO (1) WO2016112797A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886344A (en) * 2016-09-30 2018-04-06 北京金山安全软件有限公司 Convolutional neural network-based cheating advertisement page identification method and device

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145908B (en) * 2017-05-08 2019-09-03 江南大学 A small target detection method based on R-FCN
CN108052523A (en) * 2017-11-03 2018-05-18 中国互联网络信息中心 Gambling site recognition methods and system based on convolutional neural networks
CN107944022A (en) * 2017-12-11 2018-04-20 努比亚技术有限公司 Picture classification method, mobile terminal and computer-readable recording medium
CN109657681A (en) * 2018-12-28 2019-04-19 北京旷视科技有限公司 Mask method, device, electronic equipment and the computer readable storage medium of picture
CN110705744B (en) * 2019-08-26 2022-10-21 南京苏宁加电子商务有限公司 Planogram generation method, planogram generation apparatus, computer device, and storage medium
CN110851902B (en) * 2019-11-06 2023-04-07 广东博智林机器人有限公司 Method and device for generating spatial arrangement scheme
CN117612159B (en) * 2022-11-08 2024-08-27 郑州英视江河生态环境科技有限公司 Microscopic biological image processing method, neural network training method, device and equipment
CN115601631B (en) * 2022-12-15 2023-04-07 深圳爱莫科技有限公司 Cigarette display image recognition method, system, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101950400A (en) * 2010-10-09 2011-01-19 姚建 Network shopping guiding method
US7920745B2 (en) * 2006-03-31 2011-04-05 Fujifilm Corporation Method and apparatus for performing constrained spectral clustering of digital image data
CN103544506A (en) * 2013-10-12 2014-01-29 Tcl集团股份有限公司 Method and device for classifying images on basis of convolutional neural network
CN103793717A (en) * 2012-11-02 2014-05-14 阿里巴巴集团控股有限公司 Methods for determining image-subject significance and training image-subject significance determining classifier and systems for same
CN103955718A (en) * 2014-05-15 2014-07-30 厦门美图之家科技有限公司 Image subject recognition method
CN104050568A (en) * 2013-03-11 2014-09-17 阿里巴巴集团控股有限公司 Method and system for commodity picture displaying
CN104077577A (en) * 2014-07-03 2014-10-01 浙江大学 Trademark detection method based on convolutional neural network
CN104268524A (en) * 2014-09-24 2015-01-07 朱毅 Convolutional neural network image recognition method based on dynamic adjustment of training targets

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034116B (en) * 2010-05-07 2013-05-01 大连交通大学 Commodity image classifying method based on complementary features and class description
CN103345645B (en) * 2013-06-27 2016-09-28 复旦大学 Commodity image class prediction method towards net purchase platform


Also Published As

Publication number Publication date
CN105843816A (en) 2016-08-10


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16737023

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16737023

Country of ref document: EP

Kind code of ref document: A1

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载