
WO2018166438A1 - Image processing method, apparatus, and electronic device (图像处理方法、装置及电子设备)

Image processing method, apparatus, and electronic device (图像处理方法、装置及电子设备)

Info

Publication number
WO2018166438A1
Authority
WO
WIPO (PCT)
Prior art keywords
map
feature
image
feature map
attention
Prior art date
Application number
PCT/CN2018/078810
Other languages
English (en)
French (fr)
Inventor
王飞
钱晨
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Publication of WO2018166438A1
Priority to US16/451,334 (published as US10943145B2)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/251: Fusion techniques of input or preprocessed data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/803: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of input or preprocessed data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/19: Recognition using electronic means
    • G06V30/191: Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/24: Character recognition characterised by the processing or recognition method
    • G06V30/248: Character recognition characterised by the processing or recognition method involving plural approaches, e.g. verification by template match; Resolving confusion among similar patterns, e.g. "O" versus "Q"
    • G06V30/2504: Coarse or fine approaches, e.g. resolution of ambiguities or multiscale approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection

Definitions

  • the present application relates to computer vision technology, and in particular, to an image processing method, apparatus, and electronic device.
  • Computer vision is a simulation of biological vision using computers and related equipment.
  • In computer vision, the visual attention mechanism has long been of interest. Through the visual attention mechanism, humans can quickly scan the entire field of view, filter out areas unrelated to the target object, and attend only to the area where the target object is located. The attention mechanism therefore greatly improves the efficiency with which humans acquire information about the target object.
  • the embodiment of the present application proposes a technical solution for image processing.
  • According to one aspect, an image processing method is provided, including: extracting features of an image to be processed to obtain a first feature map of the image; generating an attention map of the image based on the first feature map; fusing the attention map and the first feature map to obtain a fusion map; and extracting the features of the image again based on the fusion map.
  • Optionally, generating the attention map of the image based on the first feature map includes: sequentially performing N downsampling processes on the first feature map, where N is an integer greater than or equal to 1; and sequentially performing N upsampling processes on the feature map after the Nth downsampling process to obtain the attention map of the image, wherein the resolution of the attention map is the same as the resolution of the first feature map.
  • Optionally, sequentially performing the N upsampling processes on the feature map after the Nth downsampling process includes: performing a convolution operation on the feature map after the (N-n)th downsampling process and the feature map after the nth upsampling process, where n is an integer greater than 1 and less than N; and performing the (n+1)th upsampling process on the feature map after the convolution operation.
  • Optionally, performing the convolution operation on the feature map after the (N-n)th downsampling process and the feature map after the nth upsampling process includes: performing convolution processing on the feature map after the (N-n)th downsampling process to obtain a convolution map; adding the feature value of at least one pixel in the convolution map to the feature value of the corresponding pixel in the feature map after the nth upsampling process; and performing a convolution operation on the added feature map.
  • Optionally, sequentially performing the N upsampling processes on the feature map after the Nth downsampling process further includes: performing at least one convolution operation on the feature map after the Nth downsampling process; and performing the first upsampling process on the feature map after the last convolution operation.
  • Optionally, fusing the attention map and the first feature map to obtain a fusion map includes: performing at least one convolution operation on the first feature map; and fusing the attention map and the first feature map after the last convolution operation to obtain the fusion map.
  • Optionally, fusing the attention map and the first feature map to obtain a fusion map includes: performing normalization processing on at least the attention map; and fusing the normalized attention map and the first feature map to obtain the fusion map.
  • Optionally, performing normalization processing on at least the attention map includes: sequentially performing at least one convolution process on the attention map; and normalizing the attention map after the last convolution process.
  • Optionally, fusing the attention map and the first feature map to obtain a fusion map includes: multiplying the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
  • Optionally, fusing the attention map and the first feature map to obtain a fusion map includes: multiplying the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain a multiplied map; and adding the feature value of at least one pixel in the multiplied map to the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
  • Optionally, the method further includes at least one of: detecting or identifying an object included in the image according to the features of the image extracted again; determining the category of an object included in the image according to the features of the image extracted again; and segmenting the image according to the features of the image extracted again.
  • According to another aspect, an image processing apparatus is provided, comprising: a first feature extraction unit configured to extract features of an image to be processed and obtain a first feature map of the image; an attention extraction unit configured to generate an attention map of the image based on the first feature map; a fusion unit configured to fuse the attention map and the first feature map to obtain a fusion map; and a second feature extraction unit configured to extract the features of the image again based on the fusion map.
  • Optionally, the attention extraction unit includes: a downsampling module configured to sequentially perform N downsampling processes on the first feature map, where N is an integer greater than or equal to 1; and an upsampling module configured to sequentially perform N upsampling processes on the feature map after the Nth downsampling process to obtain the attention map of the image, wherein the resolution of the attention map is the same as that of the first feature map.
  • Optionally, the upsampling module is configured to perform a convolution operation on the feature map after the (N-n)th downsampling process and the feature map after the nth upsampling process, where n is an integer greater than 1 and less than N, and to perform the (n+1)th upsampling process on the feature map after the convolution operation to obtain the attention map of the image.
  • Optionally, when performing the convolution operation on the feature map after the (N-n)th downsampling process and the feature map after the nth upsampling process, the upsampling module is configured to: perform convolution processing on the feature map after the (N-n)th downsampling process to obtain a convolution map; add the feature value of at least one pixel in the convolution map to the feature value of the corresponding pixel in the feature map after the nth upsampling process; and perform a convolution operation on the added feature map to obtain the attention map of the image.
  • Optionally, the upsampling module is configured to perform at least one convolution operation on the feature map after the Nth downsampling process, and to perform the first upsampling process on the feature map after the last convolution operation.
  • Optionally, the apparatus further includes a second convolution unit configured to perform at least one convolution operation on the first feature map; and the fusion unit is configured to fuse the attention map and the first feature map after the last convolution operation to obtain the fusion map.
  • Optionally, the apparatus further includes a normalization unit configured to perform normalization processing on at least the attention map; and the fusion unit is configured to fuse the normalized attention map and the first feature map to obtain the fusion map.
  • Optionally, the apparatus further includes a second convolution unit configured to sequentially perform at least one convolution process on the attention map; and the normalization unit is configured to normalize the attention map after the last convolution process.
  • Optionally, the fusion unit is configured to multiply the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
  • Optionally, the fusion unit is configured to: multiply the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain a multiplied map; and add the feature value of at least one pixel in the multiplied map to the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
  • Optionally, the apparatus further includes at least one of: a detecting unit configured to detect or identify an object included in the image according to the features of the image extracted again; a classifying unit configured to determine the category of an object included in the image according to the features of the image extracted again; and a segmentation unit configured to segment the image according to the features of the image extracted again.
  • a computer readable storage medium having stored thereon computer instructions that, when executed, implement the operations of the steps in the image processing method of any of the embodiments of the present application.
  • According to another aspect, an electronic device is provided, including a processor and a memory; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the image processing method according to any of the embodiments of the present application.
  • According to another aspect, a computer program is provided, comprising computer readable code; when the computer readable code runs on a device, a processor in the device executes instructions for implementing the steps of the image processing method of any embodiment of the present application.
  • The image processing method and apparatus, electronic device, program, and medium provided by the embodiments of the present application first extract features of an image to be processed to obtain a first feature map of the image, generate an attention map of the image based on the first feature map, then fuse the attention map and the first feature map, and finally extract the features of the image again based on the resulting fusion map, thereby introducing the attention mechanism into image processing and effectively improving the efficiency of obtaining information from the image.
  • FIG. 1 is a flow chart of one embodiment of a method for detecting a target object in accordance with the present application
  • FIG. 2 is a flow chart showing a generation attention map of a method for detecting a target object according to the present application
  • FIG. 3a is a schematic diagram of a network structure corresponding to the flow shown in FIG. 2;
  • FIG. 3b is a schematic diagram of another network structure corresponding to the flow shown in FIG. 2;
  • FIG. 4 is a flow chart showing a fusion attention map and a first feature map of a method for detecting a target object according to the present application
  • FIG. 5a is a schematic structural diagram of a neural network corresponding to the flow shown in FIG. 4;
  • Figure 5b is a schematic diagram of the processing of the neural network shown in Figure 5a;
  • FIG. 6 is a schematic structural diagram of a deep convolutional neural network composed of the neural network shown in FIG. 5a;
  • FIG. 7 is a schematic structural diagram of an embodiment of an image processing apparatus according to the present application.
  • FIG. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server of an embodiment of the present application.
  • Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments including any of the above, and the like.
  • Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • the image processing method of this embodiment includes the following steps:
  • Step 101 Extract features of the image to be processed to obtain a first feature map of the image.
  • The image to be processed may be an image including various objects, buildings, people, and scenery; it may be a static image or a frame image in a video.
  • Optionally, this step may be implemented by using one or more convolution layers in a neural network to extract the features of the image and obtain the first feature map of the image.
  • the step 101 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first feature extraction unit 701 that is executed by the processor.
  • Step 102 Generate an attention map of the image based on the first feature map.
  • a series of processes may be performed on the feature map to obtain an attention map of the image.
  • The above series of processing may be, for example: performing multiple downsampling processes on the first feature map; performing downsampling and then upsampling processes on the first feature map; performing convolution or averaging on the first feature map; and the like.
  • The method for generating the attention map based on the feature map may adopt any one of the methods provided in the following embodiments of the present application, and may also adopt other existing methods for generating an attention map based on the attention mechanism, which is not limited in the embodiments of the present application.
  • The attention map generated by the attention mechanism of computer vision technology may include both the global information of the image to be processed and the weight information of the features to which attention is paid. It can thereby simulate the human visual system, increasing the weight of salient feature information in the image without losing the global information of the image.
  • the step 102 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an attention extraction unit 702 that is executed by the processor.
  • Step 103: Fuse the attention map and the first feature map to obtain a fusion map.
  • After obtaining the attention map and the first feature map, the two can be fused to obtain effective information about the objects, people, and scenery contained in the image to be processed; that is, the fusion map can more effectively express information such as objects, people, and scenery in the image to be processed.
  • the step 103 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a fusion unit 703 that is executed by the processor.
  • Step 104: Based on the fusion map, extract the features of the image again.
  • the features of the image may be extracted again, and the obtained features may be further applied.
  • Optionally, extracting the features of the image again can be implemented by using a plurality of cascaded convolution layers or residual units.
  • the step 104 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second feature extraction unit 704 that is executed by the processor.
  • the image processing method of various embodiments of the present application may be implemented by a neural network. It can be understood that in order to achieve better extraction of the features of the image to be processed, the neural network may be repeated multiple times to form a deeper neural network. In this way, more comprehensive global information of the image to be processed can be obtained, thereby enabling improved feature expression capabilities of the image to be processed.
  • Optionally, before use, the above neural network may be trained with annotated images, backpropagating according to the training results to modify the parameters of the neural network; once training is complete, the above neural network is obtained.
  • The image processing method provided by the above embodiment of the present application first extracts features of an image to be processed to obtain a first feature map of the image, generates an attention map of the image based on the first feature map, then fuses the attention map and the first feature map, and finally extracts the features of the image again based on the resulting fusion map, thereby introducing the attention mechanism into image processing and effectively improving the efficiency of obtaining information from the image.
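  • As an illustration of the overall flow of steps 101 to 104, the following is a minimal PyTorch sketch; the module name, channel count, and the single-convolution stand-ins for each unit are assumptions for illustration, not the disclosed implementation:

```python
import torch
import torch.nn as nn

class AttentionFusionBlock(nn.Module):
    """Minimal sketch of steps 101-104; layer choices are illustrative."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # Step 101: extract the first feature map.
        self.first_extract = nn.Conv2d(channels, channels, 3, padding=1)
        # Step 102: attention branch (a placeholder; see the flow of FIG. 2).
        self.attention_branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),  # normalized weights in [0, 1]
        )
        # Step 104: extract features again from the fusion map.
        self.second_extract = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        first = self.first_extract(x)              # step 101
        attention = self.attention_branch(first)   # step 102
        fusion = attention * first + first         # step 103 (multiply-then-add variant)
        return self.second_extract(fusion)         # step 104
```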
  • Referring to FIG. 2, there is shown a flow 200 of generating an attention map in accordance with the image processing method of the present application.
  • the attention map of the image to be processed is generated by the following steps.
  • Step 201: The first feature map is sequentially subjected to N downsampling processes, where N is an integer and N ≥ 1.
  • The downsampling process is performed on the first feature map obtained in step 101, and the global information of the first feature map can be obtained. However, the more downsampling is performed, the larger the difference between the dimensions of the obtained global information map and the dimensions of the first feature map.
  • The downsampling operation may be implemented by, but not limited to, a pooling layer with a stride greater than 1, a convolution layer with a stride greater than 1, or an average pooling layer.
  • For example, assume the resolution of the first feature map is 224×224; after 3 downsampling processes, the resolution of the obtained feature map is 28×28. Because the resolution of the feature map obtained after N downsampling processes differs from that of the first feature map, this feature map contains the global information of the first feature map but cannot directly guide the learning of the features of the 224×224 first feature map.
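  • The arithmetic can be checked with a short sketch, assuming each downsampling is a stride-2 max pooling that halves the height and width:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)  # one downsampling: halves H and W
x = torch.randn(1, 64, 224, 224)              # a 224x224 first feature map
for _ in range(3):
    x = pool(x)
    print(tuple(x.shape[-2:]))  # (112, 112), then (56, 56), then (28, 28)
```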
  • the step 201 may be performed by a processor invoking a corresponding instruction stored in the memory or by a downsampling module executed by the processor.
  • Step 202: Perform N upsampling processes on the feature map after the Nth downsampling process to obtain the attention map of the image.
  • the above feature map may be subjected to N times of upsampling processing.
  • the resolution of the feature map after the N times of upsampling processing is the same as the resolution of the first feature map.
  • the upsampling operation may be implemented by, but not limited to, using a deconvolution layer, a nearest neighbor interpolation layer, and a linear interpolation layer to perform an upsampling operation.
  • For example, if the resolution of the feature map obtained after the downsampling processes is 28×28, then after the same number of upsampling processes, the resolution of the obtained attention map is the same as the resolution of the first feature map.
  • Optionally, a convolution operation may be performed on the feature map obtained by the downsampling process and the feature map obtained by the upsampling process; after the convolution operation, the next upsampling process is performed. That is, a convolution operation is performed on the feature map after the (N-n)th downsampling process and the feature map after the nth upsampling process, and the (n+1)th upsampling process is performed on the feature map after the convolution operation.
  • Here n is a positive integer and 1 ≤ n < N.
  • the step 202 may be performed by a processor invoking a corresponding instruction stored in a memory or by an upsampling module executed by the processor.
  • the convolution operation in this implementation manner may be implemented by using a convolution layer, or may be implemented by using a residual unit, which is not limited in this implementation manner.
  • the residual unit described above may be a network structure including two or more convolution layers.
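  • For illustration, such a residual unit might be sketched as follows (a common two-convolution form with an identity shortcut; the exact composition is not specified by the description beyond "two or more convolution layers"):

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Two stacked convolutions plus an identity shortcut (illustrative form)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # shortcut preserves resolution and channels
```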
  • The attention map thus obtained can be used to guide the subsequent learning of the features in the first feature map. By performing convolution operations on the feature maps obtained after the downsampling process and the upsampling process, the features of the feature maps at different scales can be better learned.
  • Convolution processing is performed on the feature map after the (N-n)th downsampling process to obtain a convolution map; the feature value of at least one pixel (for example, each pixel) in the convolution map is added to the feature value of the corresponding pixel in the feature map after the nth upsampling process; and a convolution operation is performed on the added feature map.
  • Here n is a positive integer and 1 ≤ n < N.
  • N may be a preset value, or a value calculated according to the resolution of the first feature map. The value of N may be determined as follows: set a minimum resolution for the feature map obtained after downsampling, and then determine the number of downsampling processes that can be performed, that is, the value of N, according to the resolution of the first feature map and the minimum resolution. For example, if the resolution of the first feature map is 56×56, the minimum resolution is set to 7×7, and each downsampling operation reduces the resolution of the obtained feature map to one quarter of the feature map before downsampling, then the value of N is at most 3.
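  • This calculation can be written out directly; the sketch below assumes each downsampling halves the side length (i.e., reduces the area to one quarter):

```python
def max_downsamplings(side: int, min_side: int) -> int:
    """Number of halvings that fit before dropping below min_side."""
    n = 0
    while side // 2 >= min_side:
        side //= 2
        n += 1
    return n

print(max_downsamplings(56, 7))  # 3: 56 -> 28 -> 14 -> 7
```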
  • Feature maps with the same resolution are obtained during the downsampling process and during the upsampling process.
  • The feature map after the (N-n)th downsampling process may be convolved to obtain a convolution map. Then, the feature value of at least one pixel in the convolution map is added to the feature value of the corresponding pixel in the feature map after the nth upsampling process, and the added feature map is convolved.
  • By adding the feature maps of the same resolution obtained during the downsampling process and the upsampling process, deeper information of the image to be processed can be obtained.
  • At least one convolution operation is performed on the feature map after the Nth downsampling process, and the first upsampling process is performed on the feature map after the last convolution operation.
  • That is, the feature map obtained after the Nth downsampling process is convolved to obtain a global information map; the convolution operation is then performed on the global information map again, and the first upsampling process is performed on the feature map after this convolution operation.
  • In this way, the downsampling process and the upsampling process are two symmetric processes, and the resulting attention map better reflects the feature information contained in the image to be processed.
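  • Putting the symmetric downsampling/upsampling and the skip additions together, the attention branch for N = 2 might look like the sketch below; max pooling and bilinear interpolation stand in for the downsampling and upsampling layers, and ResidualUnit is the illustrative block sketched above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionBranch(nn.Module):
    """Symmetric two-down / two-up branch with one skip addition (sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.MaxPool2d(2, 2)
        self.down1 = ResidualUnit(channels)  # convolution after 1st downsampling
        self.down2 = ResidualUnit(channels)  # convolution after 2nd downsampling
        self.skip = ResidualUnit(channels)   # processes the same-resolution skip
        self.mid = ResidualUnit(channels)    # convolution before 1st upsampling
        self.up = ResidualUnit(channels)     # convolution before 2nd upsampling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d1 = self.down1(self.pool(x))   # 1/2 side length
        d2 = self.down2(self.pool(d1))  # 1/4 side length (global information map)
        u1 = F.interpolate(self.mid(d2), scale_factor=2,
                           mode='bilinear', align_corners=False)
        u1 = u1 + self.skip(d1)         # add the same-resolution feature maps
        u2 = F.interpolate(self.up(u1), scale_factor=2,
                           mode='bilinear', align_corners=False)
        return u2                       # same resolution as the input x
```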
  • the image processing method of the present implementation may be implemented by using the network structure shown in FIG. 3a.
  • The network structure of this implementation includes an input layer 301, a plurality of cascaded convolution layers 302, a plurality of downsampling units 303, a plurality of upsampling units 304, a plurality of residual units 305, and an output layer 306. It can be understood that the convolution operation in this implementation is implemented by residual units.
  • the input layer 301 is used to input an image to be processed.
  • The cascaded plurality of convolution layers 302 are used to extract features of the image to be processed to obtain the first feature map. It can be understood that the cascaded plurality of convolution layers 302 can also be implemented by residual units.
  • Each downsampling unit 303 includes a downsampling layer 3031 and a residual unit 3032.
  • the downsampling layer 3031 is configured to downsample the first feature map obtained by the cascaded plurality of convolution layers 302.
  • The resolution of the feature map obtained by the downsampling process is equivalent to one quarter of the resolution of the first feature map.
  • Each residual unit 3032 is configured to perform a convolution operation on the downsampled feature map after each downsampling process to extract features of the downsampled feature map.
  • For example, assume the resolution of the first feature map is 56×56. After the first downsampling process, the resolution of the obtained feature map is 28×28, and the residual unit 3032 extracts the features of this 28×28 feature map. After processing by the third downsampling unit, the resolution of the obtained feature map is 7×7, and the residual unit of the third downsampling unit extracts the features of this 7×7 feature map, which is obtained as the global information map of the first feature map.
  • the number of the downsampling units 303 in the foregoing network structure may be arbitrary, and the implementation manner is not limited thereto.
  • the structure of the residual unit in each downsampling unit can be the same, that is, including the same number of convolutional layers, but the parameters of the respective convolutional layers are different.
  • The upsampling unit 304 may include one residual unit 3041 and an upsampling layer 3042; the residual unit 3041 and the residual unit 3032 may have the same structure but different parameters.
  • the residual unit 3041 is configured to extract the feature of the global information map obtained by the residual unit 3032.
  • The resolution of the feature map obtained by the upsampling process of the upsampling layer 3042 is four times the resolution of the global information map. After the same number of upsampling processes as downsampling processes, the resolution of the resulting attention map is the same as the resolution of the first feature map.
  • Optionally, the downsampling layer in the downsampling unit 303 can be implemented by a max pooling layer, and the upsampling layer in the upsampling unit 304 can be implemented by a bilinear interpolation layer.
  • each of the downsampling unit 303 and the upsampling unit 304 may further include a plurality of residual units.
  • Optionally, the downsampling unit 303' includes a max pooling layer and r cascaded residual units, and the upsampling unit 304' includes r cascaded residual units and a bilinear interpolation layer; between the last max pooling layer and the first bilinear interpolation layer there are 2r cascaded residual units. The feature maps of the same resolution obtained during the upsampling process and the downsampling process can be added after being convolved by a residual unit, so a residual unit 305' is connected before the last max pooling layer and the first bilinear interpolation layer. Here r is an integer greater than or equal to 1.
  • Feature maps with the same resolution are obtained during downsampling and upsampling. For example, feature maps with resolutions of 28×28, 14×14, and 7×7 are obtained during the downsampling process, and similarly, feature maps with resolutions of 14×14, 28×28, and 56×56 are obtained during the upsampling process. The 14×14 feature map obtained in the downsampling process can be processed by one residual unit 305 and added to the feature values of the corresponding pixels in the 14×14 feature map obtained in the upsampling process, for use in subsequent upsampling; the 28×28 feature map obtained in the downsampling process can likewise be processed by one residual unit 305 and added to the 28×28 feature map obtained in the upsampling process, for use in subsequent upsampling.
  • Such processing can capture multi-scale features of objects contained in the image to be processed, while enhancing the intensity of features of at least one object of interest, and suppressing the intensity of features of other objects that are not of interest.
  • Referring to FIG. 4, there is shown a flow 400 used to fuse the attention map and the first feature map in the image processing method of the present application.
  • the fusion operation of this embodiment can be implemented by the following steps:
  • Step 401: At least the attention map is normalized.
  • When the attention map is normalized, the weight value of at least one pixel (for example, each pixel) in the attention map may be limited to [0, 1].
  • The above normalization operation can be implemented, for example, by a sigmoid function, which is a threshold function of a neural network and can map variables to [0, 1].
  • the step 401 can be performed by a processor invoking a corresponding instruction stored in a memory or by a normalization unit that is executed by the processor.
  • Step 402: Fuse the normalized attention map and the first feature map to obtain a fusion map.
  • the normalized attention map and the first feature map are merged to obtain a fusion map.
  • The normalization of the attention map facilitates subsequent data processing on the one hand, and on the other hand helps subsequent data processing obtain more accurate results.
  • The step 402 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a fusion unit 703 that is executed by the processor.
  • In some optional implementations, at least one convolution process may be performed on the attention map first, and the attention map after the last convolution process is then normalized.
  • the operations described above may be performed by a processor invoking a corresponding instruction stored in a memory, or by a second convolution unit and a normalization unit being executed by the processor.
  • the convolution operation may be implemented by a convolution layer.
  • Optionally, the convolution kernel of the convolution layer may be set to 1×1, which may enhance the expression ability of the features included in the attention map.
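  • Together, the convolution and normalization form a small mask head. A sketch, assuming two 1×1 convolutions followed by the sigmoid (the number of convolutions is an illustrative choice):

```python
import torch.nn as nn

def make_mask_head(channels: int) -> nn.Sequential:
    """1x1 convolutions followed by sigmoid normalization into [0, 1] (sketch)."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=1),  # enhance feature expression
        nn.Conv2d(channels, channels, kernel_size=1),
        nn.Sigmoid(),  # limits every pixel's weight to [0, 1]
    )
```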
  • In some optional implementations, the weight value of at least one pixel (for example, each pixel) in the normalized attention map may be multiplied by the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
  • the above operations may be performed by a processor invoking a corresponding instruction stored in a memory, or by a fusing unit 703 being executed by the processor.
  • Since the attention map has the same resolution as the first feature map, at least one pixel in the attention map may be in one-to-one correspondence with at least one pixel in the first feature map.
  • The weight value of at least one pixel in the attention map is normalized, the normalized weight value is multiplied by the feature value of the corresponding pixel in the first feature map, and the resulting multiplied map is used as the fusion map.
  • In some optional implementations, the feature values of at least one pixel in the obtained multiplied map may be further added to the feature values of the corresponding pixels in the first feature map to obtain the fusion map.
  • the above operations may be performed by a processor invoking a corresponding instruction stored in a memory, or by a fusing unit 703 being executed by the processor.
  • The feature information in the image to be processed may be referred to as useful information.
  • The processing of the attention map or of the feature map may reduce the signal strength of the feature information in the image to be processed, that is, reduce the feature values of at least one pixel in the first feature map.
  • the attenuation of signal strength is not conducive to the learning of features by neural networks, and the attenuation of useful information above directly affects the feature learning ability of neural networks.
  • Adding the feature value of at least one pixel in the multiplied map to the feature value of the corresponding pixel in the first feature map, on the one hand, increases the proportion of the feature values of the useful information in the entire fusion map, which is equivalent to suppressing information other than the useful information; on the other hand, it prevents the attenuation of signal strength.
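  • In tensor terms, the two fusion variants reduce to an element-wise multiply, optionally followed by an element-wise add. A sketch, where M denotes the normalized attention map and T the first feature map (after any convolution operations):

```python
import torch

def fuse(M: torch.Tensor, T: torch.Tensor, residual: bool = True) -> torch.Tensor:
    """Element-wise fusion of attention weights M in [0, 1] with features T."""
    out = M * T        # multiply: weights suppress features outside attention
    if residual:
        out = out + T  # add T back: equivalent to (1 + M) * T, which
    return out         # prevents attenuation of the useful signal
```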
  • In combination with the solution described in the embodiment shown in FIG. 2, the neural network structure of this embodiment can be obtained as shown in FIG. 5a.
  • The neural network 500 includes a first feature extraction unit 501, a first convolution unit 502, an attention extraction unit 503, a second convolution unit 504, a normalization unit 505, a fusion unit 506, and a second feature extraction unit 507.
  • The first feature extraction unit 501, the first convolution unit 502, and the second feature extraction unit 507 are all formed by a plurality of residual units: the first feature extraction unit 501 includes p cascaded residual units, the first convolution unit 502 includes t cascaded residual units, and the second feature extraction unit 507 includes p cascaded residual units, where p and t are integers greater than 1.
  • The first feature extraction unit 501 serves the same function as the cascaded plurality of convolution layers 302 in FIG. 3a, extracting features of the image to be processed to obtain the first feature map.
  • the first convolution unit 502 can further extract features of the first feature map.
  • The attention extraction unit 503 serves the same function as the plurality of downsampling units 303, the plurality of upsampling units 304, and the plurality of residual units 305 in FIG. 3a, acquiring the attention map.
  • the second convolution unit 504 is configured to perform at least one convolution operation on the attention map prior to normalizing the attention map.
  • the normalization unit 505 is used to normalize the attention map.
  • the fusion unit 506 is configured to fuse the normalized processed attention map and the first feature map to obtain a fusion map.
  • the second feature extraction unit 507 is for extracting features of the fused map again.
  • The processing of the neural network shown in FIG. 5a is illustrated in FIG. 5b, where the input feature, i.e., the first feature map, is represented by x.
  • The receptive field of the attention extraction unit 503 shown in FIG. 5a and the receptive field of the first convolution unit 502 together simulate the attention of human vision.
  • the left branch in FIG. 5b corresponds to the attention extraction unit 503, and the right branch corresponds to the first convolution unit 502.
  • the left branch in Figure 5b includes two downsamplings and two upsamplings.
  • After the first downsampling process, the resolution of the obtained feature map is one quarter of the resolution of the first feature map x; after the second downsampling process, the resolution of the obtained feature map is one sixteenth of the resolution of the first feature map x. Then the first upsampling process is performed, and the obtained feature map has the same resolution as the feature map after the first downsampling process; after the second upsampling process, the obtained feature map has the same resolution as the first feature map.
  • In this way, the weight M(x) of the features to which attention is paid in the image is determined.
  • the right branch in Figure 5b includes a convolution operation on the first feature map x, resulting in a feature T(x).
  • The obtained weight M(x) is fused with the feature T(x) to obtain a fusion map, and the fusion map includes the fused feature (1+M(x))×T(x).
  • The neural network 500 can also be used repeatedly as a sub-neural network, and sub-neural networks with different parameters can be stacked to obtain a deep convolutional neural network 600 as shown in FIG. 6.
  • the deep convolutional neural network 600 may include a plurality of sub-neural networks, and three sub-neural networks are schematically illustrated in FIG. 6, which are a sub-neural network 601, a sub-neural network 602, and a sub-neural network 603, respectively.
  • the parameters of each sub-neural network may be the same or different.
  • the parameters of the sub-neural network referred to herein may include: the number of downsampling and upsampling in the attention extraction unit, the number of residual units in the first convolution unit, and the like.
  • Each sub-neural network may be repeated multiple times; for example, the deep convolutional neural network 600 may include m sub-neural networks 601 and k sub-neural networks 602.
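  • Stacking can be expressed by simply chaining sub-networks with independent parameters. A sketch reusing the illustrative AttentionFusionBlock above, where m and k are arbitrary repeat counts:

```python
import torch.nn as nn

def build_deep_network(channels: int, m: int, k: int) -> nn.Sequential:
    """Chain m + k attention sub-networks with independent parameters (sketch)."""
    blocks = [AttentionFusionBlock(channels) for _ in range(m + k)]
    return nn.Sequential(*blocks)
```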
  • During training, the neural network constructed based on the image processing method proposed in this embodiment can effectively reduce the number of parameters required for training and improve the efficiency of feature learning. During image processing based on the trained neural network, no parameter-adjusting operations are needed; and through the same number of downsampling and upsampling operations, the backward transfer of global information is realized, thereby promoting the transfer of the useful information to which attention is paid.
  • the image to be processed may include multiple objects, and the multiple objects may be objects of the same kind or different types of objects.
  • the above objects may be at least one kind of objects, for example, may include various vehicles such as airplanes, bicycles, automobiles, and the like, and may also include various animals such as birds, dogs, and lions.
  • The objects included in the image may be detected or recognized using the re-extracted features.
  • The image may also be segmented using the re-extracted features, and the portion including the object may be separated out.
  • Detecting or recognizing the objects contained in the image can be applied to unmanned driving or blind-guidance devices; classifying the objects contained in the image can be applied to detection devices in the military field; and segmenting the image can be applied to further analysis of objects.
  • any of the image processing methods provided by the embodiments of the present application may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device, a server, and the like.
  • Any image processing method provided by the embodiments of the present application may be executed by a processor; for example, the processor executes corresponding instructions stored in the memory to perform any of the image processing methods mentioned in the embodiments of the present application. This will not be repeated below.
  • The foregoing program may be stored in a computer readable storage medium; when executed, the program performs the steps of the foregoing method embodiments. The foregoing storage medium includes at least one medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • The present application provides an embodiment of an image processing apparatus, the apparatus embodiment corresponding to the method embodiment shown in FIG. 1; the apparatus is optionally applicable to at least one kind of electronic device.
  • the image processing apparatus 700 of the present embodiment includes a first feature extraction unit 701, an attention extraction unit 702, a fusion unit 703, and a second feature extraction unit 704.
  • the first feature extraction unit 701 is configured to extract features of the image to be processed, and obtain a first feature image of the image.
  • the attention extraction unit 702 is configured to generate an attention map of the image based on the first feature map.
  • the fusion unit 703 is configured to fuse the attention map and the first feature map to obtain a fusion map.
  • the second feature extraction unit 704 is configured to extract the feature of the image again based on the fusion map.
  • the attention extraction unit 702 may further include a downsampling module and an upsampling module not shown in FIG. 7.
  • The downsampling module is configured to sequentially perform N downsampling processes on the first feature map, where N is an integer greater than or equal to 1.
  • The upsampling module is configured to sequentially perform N upsampling processes on the feature map after the Nth downsampling process to obtain the attention map of the image, wherein the resolution of the attention map is the same as that of the first feature map.
  • Optionally, the upsampling module may be configured to perform a convolution operation on the feature map after the (N-n)th downsampling process and the feature map after the nth upsampling process, and to perform the (n+1)th upsampling process on the feature map after the convolution operation to obtain the attention map of the image, where n is an integer greater than 1 and less than N.
  • Optionally, when performing the convolution operation on the feature map after the (N-n)th downsampling process and the feature map after the nth upsampling process, the upsampling module is configured to: perform convolution processing on the feature map after the (N-n)th downsampling process to obtain a convolution map; add the feature value of at least one pixel in the convolution map to the feature value of the corresponding pixel in the feature map after the nth upsampling process; and perform a convolution operation on the added feature map.
  • Optionally, the upsampling module may be configured to: perform at least one convolution operation on the feature map after the Nth downsampling process; and perform the first upsampling process on the feature map after the last convolution operation to obtain the attention map of the image.
  • the image processing apparatus 700 may further include a second convolution unit not shown in FIG. 7 for performing at least one convolution operation on the first feature map.
  • the fusion unit 703 is configured to fuse the attention map and the first feature map after the last convolution operation to obtain the fusion map.
  • the image processing apparatus 700 may further include a normalization unit not shown in FIG. 7 for normalizing at least the attention map.
  • the fusion unit 703 is configured to fuse the normalized attention map and the first feature map to obtain the fusion map.
  • the image processing apparatus 700 may further include a second convolution unit not shown in FIG. 7 for sequentially performing at least one convolution process on the attention map.
  • the normalization unit is used for normalizing the attention map after the last convolution process.
  • Optionally, the fusion unit 703 may be further configured to multiply the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
  • Optionally, the fusion unit 703 may be further configured to: multiply the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain a multiplied map; and add the feature value of at least one pixel in the multiplied map to the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
  • Optionally, the image processing apparatus 700 may further include at least one of the following units not shown in FIG. 7: a detecting unit, a classifying unit, and a segmentation unit.
  • the detecting unit is configured to detect or identify an object included in the image according to the feature of the image extracted again.
  • a classifying unit configured to determine a category of the object included in the image according to the feature of the image extracted again.
  • The segmentation unit is configured to segment the image according to the features of the image extracted again.
  • The image processing apparatus provided by the above embodiment of the present application first extracts features of an image to be processed to obtain a first feature map of the image, generates an attention map of the image based on the first feature map, then fuses the attention map and the first feature map, and finally extracts the features of the image again based on the resulting fusion map, thereby introducing the attention mechanism into image processing and effectively improving the efficiency of obtaining information from the image.
  • Each block of the flowchart or block diagrams can represent a module, a program segment, or a portion of code that includes one or more executable instructions.
  • the functions noted in the blocks may also occur in a different order than that illustrated in the drawings. For example, two successively represented blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present application may be implemented by software or by hardware.
  • the described unit may also be disposed in the processor.
  • a processor includes a first feature extraction unit, an attention extraction unit, a fusion unit, and a second feature extraction unit.
  • the names of the units do not constitute a limitation on the unit itself in some cases.
  • For example, the first feature extraction unit may also be described as "a unit that extracts the features of the image to be processed and obtains the first feature map of the image."
  • The embodiment of the present application further provides another electronic device, including: a processor and a memory, where the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the image processing method in any of the foregoing embodiments of the present application.
  • the embodiment of the present application further provides an electronic device, such as a mobile terminal, a personal computer (PC), a tablet computer, a server, and the like.
  • Referring to FIG. 8, there is shown a schematic structural diagram of an electronic device 800 suitable for implementing a terminal device or a server of an embodiment of the present application.
  • the computer system 800 includes one or more processors and a communication unit.
  • The one or more processors may be, for example: one or more central processing units (CPUs) 801, and/or one or more graphics processors (GPUs) 813, etc. The processors may perform at least one appropriate action and process according to executable instructions stored in a read-only memory (ROM) 802 or executable instructions loaded into a random access memory (RAM) 803 from a storage portion 808.
  • the communication unit 812 can include, but is not limited to, a network card, which can include, but is not limited to, an IB (Infiniband) network card.
  • The processor can communicate with the ROM 802 and/or the RAM 803 to execute executable instructions, connect to the communication unit 812 via the bus 804, and communicate with other target devices via the communication unit 812, thereby completing operations corresponding to any of the methods provided by the embodiments of the present application, for example: extracting features of the image to be processed to obtain a first feature map of the image; generating an attention map of the image based on the first feature map; fusing the attention map and the first feature map to obtain a fusion map; and extracting the features of the image again based on the fusion map.
  • In the RAM 803, at least one program and the data required for the operation of the device may be stored.
  • the CPU 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804.
  • ROM 802 is an optional module.
  • the RAM 803 stores executable instructions, or writes executable instructions to the ROM 802 at runtime, and the executable instructions cause the CPU 801 to perform operations corresponding to the above-described communication methods.
  • An input/output (I/O) interface 805 is also coupled to bus 804.
  • the communication unit 812 may be integrated or may be provided with a plurality of sub-modules (e.g., a plurality of IB network cards) and linked on the bus 804.
  • the following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, etc.; an output portion 807 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, and a storage portion 808 including a hard disk or the like. And a communication portion 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet.
  • A drive 810 is also coupled to the I/O interface 805 as needed.
  • a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like, is mounted on the drive 810 as needed so that a computer program read therefrom is installed into the storage portion 808 as needed.
  • it should be noted that the architecture shown in FIG. 8 is only an optional implementation; in practice, the number and types of the components in FIG. 8 may be selected, reduced, added, or replaced according to actual needs;
  • for different functional components, implementations such as separate configuration or integrated configuration may also be adopted;
  • for example, the GPU 813 and the CPU 801 may be configured separately, or the GPU 813 may be integrated on the CPU 801; the communication unit may be configured separately, or may be integrated on the CPU 801 or the GPU 813;
  • all of these alternative implementations fall within the scope of protection disclosed herein.
  • an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium; the computer program comprises program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: extracting features of an image to be processed to obtain a first feature map of the image; generating an attention map of the image based on the first feature map; fusing the attention map and the first feature map to obtain a fusion map; and extracting features of the image again based on the fusion map.
  • the computer program can be downloaded and installed from a network via the communication portion 809, and/or installed from the removable medium 811; when the computer program is executed by the CPU 801, the above-described functions defined in the methods of the present application are performed.
  • an embodiment of the present application further provides a computer program including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing each step of the image processing method in any of the foregoing embodiments of the present application.
  • an embodiment of the present application further provides a computer-readable storage medium for storing computer-readable instructions, which, when executed, implement the operations of each step of the image processing method in any of the foregoing embodiments of the present application.
  • the methods, apparatuses, and devices of the embodiments of the present application may be implemented in many ways.
  • for example, the methods, apparatuses, and devices of the embodiments of the present application can be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above-described sequence of steps of the method is for illustrative purposes only; the steps of the methods of the embodiments of the present application are not limited to the order described above unless otherwise specified.
  • in some embodiments, the present application may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the embodiments of the present application.
  • thus, the embodiments of the present application also cover recording media storing programs for executing the methods according to the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present application discloses an image processing method and apparatus, and an electronic device. One embodiment of the image processing method comprises: extracting features of an image to be processed to obtain a first feature map of the image; generating an attention map of the image based on the first feature map; fusing the attention map and the first feature map to obtain a fusion map; and extracting features of the image again based on the fusion map. This embodiment introduces an attention mechanism into image processing, effectively improving the efficiency of obtaining information from images.

Description

Image processing method and apparatus, and electronic device
This application claims priority to Chinese patent application No. CN201710145253.1, entitled "Image Processing Method and Apparatus, and Electronic Device", filed with the Chinese Patent Office on March 13, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to computer vision technologies, and in particular, to an image processing method and apparatus, and an electronic device.
Background
Computer vision is a simulation of biological vision using computers and related devices. In the field of computer vision, the visual attention mechanism has long attracted attention. Through the visual attention mechanism, humans can quickly scan the entire area within the visual field, filter out the regions irrelevant to a target object, and focus only on the region where the target object is located. The attention mechanism therefore greatly improves the efficiency with which humans acquire information about target objects.
Summary
The embodiments of the present application provide a technical solution for image processing.
According to one aspect of the embodiments of the present application, an image processing method is provided, comprising: extracting features of an image to be processed to obtain a first feature map of the image; generating an attention map of the image based on the first feature map; fusing the attention map and the first feature map to obtain a fusion map; and extracting features of the image again based on the fusion map.
In some embodiments, generating the attention map of the image based on the first feature map comprises: performing N downsampling operations on the first feature map in sequence, where N is an integer greater than or equal to 1; and performing N upsampling operations in sequence on the feature map obtained after the N-th downsampling to obtain the attention map of the image, where the resolution of the attention map is the same as the resolution of the first feature map.
In some embodiments, performing N upsampling operations in sequence on the feature map obtained after the N-th downsampling comprises: performing a convolution operation on the feature map after the (N-n)-th downsampling and the feature map after the n-th upsampling, where n is an integer greater than 1 and less than N; and performing the (n+1)-th upsampling on the convolved feature map.
In some embodiments, performing the convolution operation on the feature map after the (N-n)-th downsampling and the feature map after the n-th upsampling comprises: performing convolution processing on the feature map after the (N-n)-th downsampling to obtain a convolution map; adding the feature value of at least one pixel in the convolution map to the feature value of the corresponding pixel in the feature map after the n-th upsampling; and performing a convolution operation on the summed feature map.
In some embodiments, performing N upsampling operations in sequence on the feature map obtained after the N-th downsampling further comprises: performing at least one convolution operation on the feature map after the N-th downsampling; and performing the first upsampling on the feature map after the last convolution operation.
In some embodiments, fusing the attention map and the first feature map to obtain the fusion map comprises: performing at least one convolution operation on the first feature map; and fusing the attention map and the first feature map after the last convolution operation to obtain the fusion map.
In some embodiments, fusing the attention map and the first feature map to obtain the fusion map comprises: performing at least normalization processing on the attention map; and fusing the normalized attention map and the first feature map to obtain the fusion map.
In some embodiments, performing at least normalization processing on the attention map comprises: performing at least one convolution processing on the attention map in sequence; and performing normalization processing on the attention map after the last convolution processing.
In some embodiments, fusing the attention map and the first feature map to obtain the fusion map comprises: multiplying the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
In some embodiments, fusing the attention map and the first feature map to obtain the fusion map comprises: multiplying the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain a multiplied map; and adding the feature value of at least one pixel in the multiplied map to the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
In some embodiments, after extracting the features of the image again based on the fusion map, the method further comprises at least one of the following: detecting or recognizing objects included in the image according to the re-extracted features of the image; determining categories of objects included in the image according to the re-extracted features of the image; and segmenting the image according to the re-extracted features of the image.
According to another aspect of the embodiments of the present application, an image processing apparatus is provided, comprising: a first feature extraction unit configured to extract features of an image to be processed to obtain a first feature map of the image; an attention extraction unit configured to generate an attention map of the image based on the first feature map; a fusion unit configured to fuse the attention map and the first feature map to obtain a fusion map; and a second feature extraction unit configured to extract the features of the image again based on the fusion map.
In some embodiments, the attention extraction unit comprises: a downsampling module configured to perform N downsampling operations on the first feature map in sequence, where N is an integer greater than or equal to 1; and an upsampling module configured to perform N upsampling operations in sequence on the feature map after the N-th downsampling to obtain the attention map of the image, where the resolution of the attention map is the same as that of the first feature map.
In some embodiments, the upsampling module is configured to: perform a convolution operation on the feature map after the (N-n)-th downsampling and the feature map after the n-th upsampling, where n is an integer greater than 1 and less than N; and perform the (n+1)-th upsampling on the convolved feature map to obtain the attention map of the image.
In some embodiments, when performing the convolution operation on the feature map after the (N-n)-th downsampling and the feature map after the n-th upsampling, the upsampling module is configured to: perform convolution processing on the feature map after the (N-n)-th downsampling to obtain a convolution map; add the feature value of at least one pixel in the convolution map to the feature value of the corresponding pixel in the feature map after the n-th upsampling; and perform a convolution operation on the summed feature map to obtain the attention map of the image.
In some embodiments, the upsampling module is configured to: perform at least one convolution operation on the feature map after the N-th downsampling; and perform the first upsampling on the feature map after the last convolution operation.
In some embodiments, the apparatus further comprises: a second convolution unit configured to perform at least one convolution operation on the first feature map; and the fusion unit is configured to fuse the attention map and the first feature map after the last convolution operation to obtain the fusion map.
In some embodiments, the apparatus further comprises: a normalization unit configured to perform at least normalization processing on the attention map; and the fusion unit is configured to fuse the normalized attention map and the first feature map to obtain the fusion map.
In some embodiments, the apparatus further comprises: a second convolution unit configured to perform at least one convolution processing on the attention map in sequence; and the normalization unit is configured to perform normalization processing on the attention map after the last convolution processing.
In some embodiments, the fusion unit is configured to: multiply the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
In some embodiments, the fusion unit is configured to: multiply the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain a multiplied map; and add the feature value of at least one pixel in the multiplied map to the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
In some embodiments, the apparatus further comprises at least one of the following: a detection unit configured to detect or recognize objects included in the image according to the re-extracted features of the image; a classification unit configured to determine categories of objects included in the image according to the re-extracted features of the image; and a segmentation unit configured to segment the image according to the re-extracted features of the image.
According to yet another aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which computer instructions are stored; when the instructions are executed, the operations of each step of the image processing method according to any embodiment of the present application are implemented.
According to still another aspect of the embodiments of the present application, an electronic device is provided, comprising a processor and a memory; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the image processing method according to any embodiment of the present application.
According to still another aspect of the embodiments of the present application, a computer program is provided, comprising computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing each step of the image processing method according to any embodiment of the present application.
The image processing method and apparatus, electronic device, program, and medium provided by the embodiments of the present application first extract the features of an image to be processed to obtain a first feature map of the image, generate an attention map of the image based on the first feature map, then fuse the attention map and the first feature map, and finally extract the features of the image again based on the obtained fusion map, thereby introducing the attention mechanism into image processing and effectively improving the efficiency of obtaining information from images.
The technical solutions of the embodiments of the present application are further described in detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings, which constitute a part of the specification, illustrate embodiments of the present application and, together with the description, serve to explain the principles of the present application.
Other features, objects, and advantages of the present application will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is a flowchart of one embodiment of the method for detecting a target object according to the present application;
FIG. 2 is a schematic flowchart of generating an attention map in the method for detecting a target object according to the present application;
FIG. 3a is a schematic diagram of one network structure corresponding to the flow shown in FIG. 2;
FIG. 3b is a schematic diagram of another network structure corresponding to the flow shown in FIG. 2;
FIG. 4 is a schematic flowchart of fusing the attention map and the first feature map in the method for detecting a target object according to the present application;
FIG. 5a is a schematic structural diagram of a neural network corresponding to the flow shown in FIG. 4;
FIG. 5b is a schematic diagram of the processing procedure of the neural network shown in FIG. 5a;
FIG. 6 is a schematic structural diagram of a deep convolutional neural network composed of the neural networks shown in FIG. 5a;
FIG. 7 is a schematic structural diagram of one embodiment of the image processing apparatus according to the present application;
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present application.
Detailed Description
The embodiments of the present application are further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are merely used to explain the relevant application, rather than to limit the embodiments of the application. It should also be understood that, for ease of description, the sizes of at least one part shown in the drawings are not drawn according to actual scale.
In addition, it should be noted that, for ease of description, only the parts related to the relevant embodiments of the application are shown in the drawings.
It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other without conflict. Technologies, methods, and devices known to those of ordinary skill in the relevant fields may not be discussed in detail, but where appropriate, such technologies, methods, and devices should be regarded as part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
The embodiments of the present application may be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing technology environments including any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform specific tasks or implement specific abstract data types. The computer system/server may be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
The present application is described in detail below with reference to the accompanying drawings and in combination with embodiments.
Referring to FIG. 1, a flow 100 of one embodiment of the image processing method according to the present application is shown. The image processing method of this embodiment includes the following steps:
Step 101: extracting features of an image to be processed to obtain a first feature map of the image.
In the embodiments of the present application, the image to be processed may be an image containing various objects, buildings, persons, or scenery; it may be a static image or a frame of a video. When extracting the features of the image to be processed, one or more convolutional layers of a neural network may be used. The features of the image are extracted to obtain the first feature map of the image.
In an optional example, step 101 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a first feature extraction unit 701 run by the processor.
Step 102: generating an attention map of the image based on the first feature map.
After the first feature map of the image to be processed is obtained, a series of processing may be performed on the feature map to obtain the attention map of the image. The series of processing may be, for example: performing multiple downsampling operations on the first feature map; alternately performing downsampling and upsampling on the first feature map; performing multiple downsampling operations followed by multiple upsampling operations on the first feature map; performing convolution or average pooling on the first feature map; and so on. The attention map may be generated from the feature map using any method provided below in the embodiments of the present application, or using other existing methods for generating attention maps based on the attention mechanism; the embodiments of the present application are not limited in this respect. An attention map generated based on the attention mechanism of computer vision technology can contain the global information of the image to be processed, together with the weight information of the features that attention focuses on within this global information; it can thus simulate the human visual system, focusing on the features with large weights in the image without losing the global information of the image.
In an optional example, step 102 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by an attention extraction unit 702 run by the processor.
Step 103: fusing the attention map and the first feature map to obtain a fusion map.
After the attention map and the first feature map are obtained, the two may be fused to obtain effective information about the objects, persons, and scenery contained in the image to be processed; in other words, the fusion map can express information such as the objects, persons, and scenery in the image to be processed more effectively.
In an optional example, step 103 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a fusion unit 703 run by the processor.
Step 104: extracting the features of the image again based on the fusion map.
In the embodiments of the present application, after the fusion map is obtained, the features of the image may be extracted again, and the obtained features can be used for further applications. When extracting the features of the image again, multiple cascaded convolutional layers or residual units may be used.
In an optional example, step 104 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a second feature extraction unit 704 run by the processor.
In an optional example, the image processing method of the embodiments of the present application may be implemented by a neural network. It can be understood that, in order to better extract the features of the image to be processed, the above neural network may be repeated multiple times to form a deeper neural network. In this way, more comprehensive global information of the image to be processed can be obtained, thereby improving the ability to express the features of the image to be processed.
It can be understood that, before use, the above neural network may be trained using images with annotation information, and the parameters of the neural network may be modified by back-propagation according to the training results, thereby completing the training of the neural network and obtaining the above neural network.
The image processing method provided by the above embodiment of the present application first extracts the features of an image to be processed to obtain a first feature map of the image, generates an attention map of the image based on the first feature map, then fuses the attention map and the first feature map, and finally extracts the features of the image again based on the obtained fusion map, thereby introducing the attention mechanism into image processing and effectively improving the efficiency of obtaining information from the image.
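[Editorial note: purely as a non-limiting illustration, and not part of the original disclosure, the four steps above can be sketched in PyTorch-style Python roughly as follows; the layer types, channel counts, and names are assumptions made for readability.]

    import torch
    import torch.nn as nn

    class AttentionPipelineSketch(nn.Module):
        """Hypothetical sketch of steps 101-104; all layers are placeholders."""
        def __init__(self, channels: int = 64):
            super().__init__()
            self.first_extractor = nn.Conv2d(3, channels, 3, padding=1)          # step 101
            self.attention_branch = nn.Conv2d(channels, channels, 3, padding=1)  # stand-in for step 102
            self.second_extractor = nn.Conv2d(channels, channels, 3, padding=1)  # step 104

        def forward(self, image: torch.Tensor) -> torch.Tensor:
            first_map = self.first_extractor(image)                       # first feature map
            attention = torch.sigmoid(self.attention_branch(first_map))   # attention map
            fused = attention * first_map                                 # fusion map (step 103)
            return self.second_extractor(fused)                           # re-extract features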
Referring to FIG. 2, a flow 200 of generating an attention map in the image processing method according to the present application is shown. As shown in FIG. 2, in this embodiment, the attention map of the image to be processed is generated through the following steps.
Step 201: performing N downsampling operations on the first feature map in sequence.
In the embodiments of the present application, N is an integer and N ≥ 1. By downsampling the first feature map obtained in step 101, the global information of the first feature map can be obtained. However, the more downsampling operations are performed, the larger the difference between the dimensions of the obtained global information map and those of the first feature map. In this embodiment, the downsampling operation may be implemented by, but is not limited to, pooling layers with different strides, convolutional layers with different strides, or average pooling layers. For example, when a pooling layer with a stride of 2 is used to downsample the first feature map, assuming that the resolution of the first feature map is 224×224, the resolution of the feature map obtained after three downsampling operations is 28×28. Since the feature map obtained after N downsampling operations differs in resolution from the first feature map, although it contains the global information of the first feature map, it cannot guide the learning of the 224×224-resolution features.
In an optional example, step 201 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a downsampling module run by the processor.
Step 202: performing N upsampling operations in sequence on the feature map obtained after the N-th downsampling, to obtain the attention map of the image.
In this embodiment, after the feature map after N downsampling operations is obtained, N upsampling operations may be performed on the feature map, so that the resolution of the feature map after the N upsampling operations is the same as that of the first feature map. In this embodiment, the upsampling operation may be implemented by, but is not limited to, deconvolution layers, nearest-neighbor interpolation layers, or linear interpolation layers. For example, after three downsampling operations, the resolution of the obtained feature map is 28×28; after this feature map further undergoes three upsampling operations, the resolution of the obtained attention map is the same as that of the first feature map.
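[Editorial note: a minimal sketch of steps 201-202, assuming max pooling with stride 2 for downsampling and bilinear interpolation for upsampling, both of which are among the options named in this embodiment; the function below is illustrative, not the exact network of the patent.]

    import torch
    import torch.nn.functional as F

    def attention_hourglass(x: torch.Tensor, n: int = 3) -> torch.Tensor:
        """N downsamplings then N upsamplings that restore the input resolution."""
        sizes = []
        for _ in range(n):                      # N downsampling passes, e.g. 224 -> 112 -> 56 -> 28
            sizes.append(x.shape[-2:])          # remember each resolution for the way back up
            x = F.max_pool2d(x, kernel_size=2)
        for _ in range(n):                      # N upsampling passes back to the original size
            x = F.interpolate(x, size=sizes.pop(), mode="bilinear", align_corners=False)
        return x

    # attention_hourglass(torch.randn(1, 64, 224, 224)).shape == (1, 64, 224, 224)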
In some optional implementations of this embodiment, after each downsampling and each upsampling operation, a convolution operation may further be performed on the feature map obtained by the downsampling and on the feature map obtained by the upsampling. After the convolution operation on the feature map obtained by upsampling, the next upsampling operation is performed. That is, a convolution operation is performed on the feature map after the (N-n)-th downsampling and the feature map after the n-th upsampling, and the (n+1)-th upsampling is performed on the convolved feature map, where n is a positive integer and 1 < n < N.
In an optional example, step 202 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by an upsampling module run by the processor.
It can be understood that, in this implementation, after the convolution operation is performed on the feature map obtained by the (N-1)-th upsampling, the N-th upsampling is performed, and no convolution operation needs to be performed on the attention map obtained by the N-th upsampling. The convolution operation in this implementation may be implemented by a convolutional layer or by a residual unit, which is not limited in this implementation. The residual unit may be a network structure including two or more convolutional layers.
With the image processing method of this implementation, not only is the resolution of the attention map the same as that of the first feature map, so that the obtained attention map can be used to guide the subsequent learning of the features in the first feature map; at the same time, by performing a convolution operation on the obtained feature map after each downsampling and upsampling operation, the features in the feature maps of different dimensions can be learned better.
In some optional implementations of this embodiment, the convolution operation on the feature map after the (N-n)-th downsampling and the feature map after the n-th upsampling may also be implemented through the following steps:
performing convolution processing on the feature map after the (N-n)-th downsampling to obtain a convolution map; adding the feature value of at least one pixel (for example, each pixel) in the convolution map to the feature value of the corresponding pixel in the feature map after the n-th upsampling; and performing a convolution operation on the summed feature map.
In this implementation, n is a positive integer and 1 < n < N. N may be a preset value, or a value calculated according to the resolution of the first feature map, and the value of N may be determined by the following calculation method: setting a minimum resolution for the feature map obtained after downsampling, and determining the number of downsampling operations that can be performed, i.e., the value of N, according to the resolution of the first feature map and the minimum resolution. For example, if the resolution of the first feature map is 56×56, the set minimum resolution is 7×7, and each downsampling operation reduces the resolution of the obtained feature map to one quarter of that of the feature map before downsampling, then the maximum value of N is determined to be 3.
Since this implementation may perform N downsampling operations on the first feature map followed by N upsampling operations, feature maps of the same resolution are obtained during the downsampling process and during the upsampling process. To obtain deeper information of the image to be processed, convolution processing may be performed on the feature map after the (N-n)-th downsampling to obtain a convolution map; the feature value of at least one pixel in the convolution map is then added to the feature value of the corresponding pixel in the feature map after the n-th upsampling, and a convolution operation is performed on the summed feature map.
The image processing method of this implementation adds the feature maps of the same resolution obtained during the downsampling and upsampling processes (see the sketch below), so that deeper information of the image to be processed can be obtained.
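[Editorial note: the skip connection just described can be sketched as follows; the channel count and kernel sizes are assumptions.]

    import torch.nn as nn

    class SkipFuseSketch(nn.Module):
        """Convolve the (N-n)-th downsampled map, add it pixel-wise to the n-th
        upsampled map of the same resolution, then convolve the sum."""
        def __init__(self, channels: int = 64):
            super().__init__()
            self.lateral_conv = nn.Conv2d(channels, channels, kernel_size=1)
            self.post_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, down_feat, up_feat):
            merged = self.lateral_conv(down_feat) + up_feat  # element-wise add of feature values
            return self.post_conv(merged)                    # convolution on the summed map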
In some optional implementations of this embodiment, performing N upsampling operations in sequence on the feature map after the N-th downsampling may further include the following steps:
performing at least one convolution operation on the feature map after the N-th downsampling, and performing the first upsampling on the feature map after the last convolution operation.
In this implementation, after the last downsampling operation, a convolution operation is performed on the obtained feature map to obtain a global information map; another convolution operation is then performed on the global information map, and the first upsampling is performed on the feature map after this second convolution. In this way, the downsampling process and the upsampling process are two symmetric processes, and the finally obtained attention map can better reflect the feature information contained in the image to be processed.
In an optional example, the image processing method of this implementation may be implemented using the network structure shown in FIG. 3a. As shown in FIG. 3a, the network structure of this implementation includes an input layer 301, multiple cascaded convolutional layers 302, multiple downsampling units 303, multiple upsampling units 304, multiple residual units 305, and an output layer 306. It can be understood that the convolution operation in this implementation is implemented by residual units.
The input layer 301 is used to input the image to be processed. The multiple cascaded convolutional layers 302 are used to extract the features of the image to be processed to obtain the first feature map. It can be understood that the multiple cascaded convolutional layers 302 may also be implemented by residual units.
Each downsampling unit 303 includes a downsampling layer 3031 and a residual unit 3032. The downsampling layer 3031 is used to downsample the first feature map obtained by the multiple cascaded convolutional layers 302, and each downsampling operation reduces the feature map to one quarter of its previous resolution in area. Each residual unit 3032 is used to perform a convolution operation on the downsampled feature map after each downsampling operation, so as to extract the features of the downsampled feature map. For example, if the resolution of the first feature map is 56×56, after one downsampling operation by the downsampling layer 3031, the resolution of the obtained image is 28×28, and the residual unit 3032 extracts the features of this 28×28 image. If the network structure includes three downsampling units 303, the resolution of the feature map obtained after the third downsampling unit is 7×7, the residual unit of the third downsampling unit extracts the features of this 7×7 image, and the global information map of the first feature map is obtained. It can be understood that the number of downsampling units 303 in the network structure may be arbitrary, which is not limited in this implementation. Meanwhile, it can be understood that the residual units in the downsampling units may have the same structure, i.e., include the same number of convolutional layers, but the parameters of the individual convolutional layers differ.
After the global information map of the first feature map is obtained, the upsampling units 304 continue to process the global information map. Each upsampling unit 304 may include a residual unit 3041 and an upsampling layer 3042; the residual unit 3041 may have the same structure as the residual unit 3032 but different parameters. The residual unit 3041 is used to extract the features of the global information map obtained by the residual unit 3032; after these features are extracted, the upsampling by the upsampling layer 3042 yields a feature map whose resolution is four times that of the global information map. After the same number of upsampling operations as downsampling operations, the resolution of the finally obtained attention map is the same as that of the first feature map.
It can be understood that, in FIG. 3a, the downsampling layer in the downsampling unit 303 may be implemented by a max pooling layer, and the upsampling layer in the upsampling unit 304 may be implemented by a bilinear interpolation layer. In addition, each downsampling unit 303 and upsampling unit 304 may include multiple residual units. Referring to FIG. 3b, the downsampling unit 303' includes a max pooling layer and r cascaded residual units, the upsampling unit 304' includes r cascaded residual units and an interpolation layer, and 2r cascaded residual units lie between the last max pooling layer and the first bilinear interpolation layer. Moreover, the feature maps of the same resolution obtained during upsampling and downsampling can be added after convolution by a residual unit; therefore, a residual unit 305' is connected before the last max pooling layer and after the first bilinear interpolation layer. Here, r is an integer greater than or equal to 1.
Since feature maps of the same resolution are obtained during downsampling and upsampling (for example, feature maps of resolutions 28×28, 14×14, and 7×7 are obtained during downsampling, and likewise feature maps of resolutions 14×14, 28×28, and 56×56 are obtained during upsampling), the 14×14 feature map obtained during downsampling can, after being processed by a residual unit 305, be added to the feature values of the corresponding pixels in the 14×14 feature map obtained during upsampling, for use in subsequent upsampling; the 28×28 feature map obtained during downsampling can, after being processed by a residual unit 304, be added to the feature values of the corresponding pixels in the 28×28 feature map obtained during upsampling, for use in subsequent upsampling. Such processing can capture the multi-scale features of the objects contained in the image to be processed, while enhancing the strength of the features of at least one object that attention focuses on and suppressing the strength of the features of other objects that attention does not focus on.
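[Editorial note: one plausible form of the residual unit mentioned above is a pre-activation block with two convolutional layers; the patent fixes neither the layer count nor the normalization, so the following is an assumption.]

    import torch.nn as nn

    class ResidualUnitSketch(nn.Module):
        """A two-convolution residual unit: identity shortcut plus a conv branch."""
        def __init__(self, channels: int = 64):
            super().__init__()
            self.body = nn.Sequential(
                nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            )

        def forward(self, x):
            return x + self.body(x)  # the shortcut keeps signal strength from decaying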
Referring to FIG. 4, a flow 400 for fusing the attention map and the first feature map in the image processing method according to the present application is shown. As shown in FIG. 4, the fusion operation of this embodiment may be implemented through the following steps:
Step 401: performing at least normalization processing on the attention map.
In this embodiment, normalizing the attention map can confine the weight value of at least one pixel (for example, each pixel) in the attention map to the range [0, 1]. The normalization operation may be implemented, for example, by the sigmoid function; the sigmoid function is a threshold function of neural networks that can map a variable into the range [0, 1].
In an optional example, step 401 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a normalization unit run by the processor.
Step 402: fusing the normalized attention map and the first feature map to obtain the fusion map.
After the attention map is normalized, the normalized attention map and the first feature map are fused to obtain the fusion map.
In this embodiment, normalizing the attention map facilitates subsequent data processing on the one hand, and helps the subsequent data processing obtain more accurate results on the other.
In an optional example, step 402 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a fusion unit 703 run by the processor.
In some optional implementations of this embodiment, before the attention map is normalized in step 401, at least one convolution processing may first be performed on the attention map, and normalization is then performed on the attention map after the last convolution processing. In an optional example, the above operations may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a second convolution unit and a normalization unit run by the processor.
In an optional example, the convolution operation may be implemented by a convolutional layer; optionally, the convolution kernel of this convolutional layer may be set to 1×1, which can enhance the ability to express the features contained in the attention map.
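[Editorial note: for illustration only, the convolution-then-normalization head might look as follows in PyTorch; the channel count of 64 and the use of two 1×1 convolutions are assumptions.]

    import torch.nn as nn

    # 1x1 convolutions refine the attention features, then a sigmoid squashes
    # each pixel's weight into [0, 1], as the normalization step above requires.
    normalization_head = nn.Sequential(
        nn.Conv2d(64, 64, kernel_size=1),
        nn.Conv2d(64, 64, kernel_size=1),
        nn.Sigmoid(),
    )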
In some optional implementations of this embodiment, when the attention map and the first feature map are fused in step 402, the weight value of at least one pixel (for example, each pixel) in the normalized attention map may be multiplied by the feature value of the corresponding pixel in the first feature map to obtain the fusion map. In an optional example, the above operation may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a fusion unit 703 run by the processor.
In this implementation, since the attention map has the same resolution as the first feature map, at least one pixel in the attention map corresponds one-to-one to at least one pixel in the first feature map. Moreover, since the weight value of at least one pixel in the attention map has already been normalized in step 401, the normalized weight values can be fused with the feature values of the corresponding pixels in the first feature map by multiplication, and the resulting multiplied map is taken as the fusion map.
In some optional implementations of this embodiment, after the multiplied map is obtained in step 402, the feature value of at least one pixel in the multiplied map may further be added to the feature value of the corresponding pixel in the first feature map, and the summed feature map is taken as the fusion map. In an optional example, the above operation may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a fusion unit 703 run by the processor.
The multiplied map contains the feature information of the image to be processed, which may be called useful information. Moreover, whether the attention map or the feature map is processed, the signal strength of the feature information in the image to be processed is reduced, i.e., the feature value of at least one pixel in the first feature map is reduced. Attenuation of signal strength is unfavorable for the neural network's learning of features, and the attenuation of the above useful information directly affects the feature learning capability of the neural network.
In this implementation, adding the feature value of at least one pixel in the multiplied map to the feature value of the corresponding pixel in the first feature map can, on the one hand, increase the proportion of the useful information within the feature values of at least one pixel of the whole fusion map, which is equivalent to suppressing information other than the useful information; on the other hand, it can prevent the attenuation of signal strength.
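[Editorial note: a compact sketch of this fusion (multiply, then residual add); tensor shapes are assumed to match, as the text guarantees for same-resolution maps.]

    import torch

    def fuse_sketch(attention: torch.Tensor, first_map: torch.Tensor) -> torch.Tensor:
        """Per-pixel weight times feature value, then add the first feature map back."""
        multiplied = attention * first_map   # weighted map: useful information emphasized
        return multiplied + first_map        # residual add prevents signal attenuation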
Based on the network structure shown in FIG. 3, combined with the solution described in the embodiment shown in FIG. 1, the neural network structure of this embodiment can be obtained as shown in FIG. 5a. In FIG. 5a, the neural network 500 includes a first feature extraction unit 501, a first convolution unit 502, an attention extraction unit 503, a second convolution unit 504, a normalization unit 505, a fusion unit 506, and a second feature extraction unit 507. The first feature extraction unit 501, the first convolution unit 502, and the second feature extraction unit 507 are each formed by multiple residual units: the first feature extraction unit 501 includes p cascaded residual units, the first convolution unit 502 includes t cascaded residual units, and the second feature extraction unit 507 includes p cascaded residual units, where p and t are integers greater than 1.
The function of the first feature extraction unit 501 corresponds to the multiple cascaded convolutional layers 302 in FIG. 3, and it is used to extract the features of the image to be processed to obtain the first feature map. The first convolution unit 502 can further extract the features of the first feature map. The function of the attention extraction unit 503 corresponds to the multiple downsampling units 303, multiple upsampling units 304, and multiple residual units 305 in FIG. 3, and it obtains the attention map. The second convolution unit 504 is used to perform at least one convolution operation on the attention map before the attention map is normalized. The normalization unit 505 is used to normalize the attention map. The fusion unit 506 is used to fuse the normalized attention map and the first feature map to obtain the fusion map. The second feature extraction unit 507 is used to extract the features of the fusion map again.
The processing procedure of the neural network shown in FIG. 5a may refer to FIG. 5b. As shown in FIG. 5b, let x denote the input features, i.e., the first feature map. The receptive field of the attention extraction unit 503 shown in FIG. 5a and the receptive field of the first convolution unit 502 are used respectively to simulate the attention of human vision. The left branch in FIG. 5b corresponds to the attention extraction unit 503, and the right branch corresponds to the first convolution unit 502.
The left branch in FIG. 5b includes two downsamplings and two upsamplings: after the first downsampling, the resolution of the obtained feature map is one quarter of the resolution of the first feature map x; after the second downsampling, the resolution of the obtained feature map is one sixteenth of the resolution of the first feature map x; then, after the first upsampling, the obtained feature map has the same resolution as the feature map obtained after the first downsampling; after the second upsampling, the obtained feature map has the same resolution as the first feature map. Meanwhile, through the above two downsamplings and two upsamplings, i.e., after traversing the entire feature map, the weights M(x) of the features in the image that attention focuses on are determined.
The right branch in FIG. 5b includes convolution operations on the first feature map x, yielding the features T(x).
Finally, the obtained weights M(x) and the features T(x) are fused to obtain the fusion map, which contains the fused features (1 + M(x)) · T(x).
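[Editorial note: putting the two branches together, a hedged sketch of the FIG. 5b computation (1 + M(x)) · T(x) might read as follows; the branch depths and layer types are illustrative assumptions.]

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionBlockSketch(nn.Module):
        """Trunk branch T(x) plus a mask branch M(x) with two down- and two upsamplings."""
        def __init__(self, channels: int = 64):
            super().__init__()
            self.trunk = nn.Sequential(                       # right branch: T(x)
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            self.mask_conv = nn.Conv2d(channels, channels, 3, padding=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            t = self.trunk(x)
            m = F.max_pool2d(x, 2)                            # 1st downsampling: 1/4 the area
            m = F.max_pool2d(m, 2)                            # 2nd downsampling: 1/16 the area
            m = self.mask_conv(m)
            m = F.interpolate(m, scale_factor=2, mode="bilinear", align_corners=False)
            m = F.interpolate(m, size=t.shape[-2:], mode="bilinear", align_corners=False)
            m = torch.sigmoid(m)                              # left branch: weights M(x) in [0, 1]
            return (1.0 + m) * t                              # fused features (1 + M(x)) * T(x)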
It can be understood that, to construct a deeper neural network, in an optional embodiment of the present application, the above neural network 500 may also serve as a sub-neural-network and be repeated multiple times, and sub-neural-networks with different parameters may be stacked to obtain the deep convolutional neural network 600 shown in FIG. 6. The deep convolutional neural network 600 may include multiple sub-neural-networks; FIG. 6 schematically shows three sub-neural-networks, namely sub-neural-network 601, sub-neural-network 602, and sub-neural-network 603. The parameters of each sub-neural-network may be the same or different. The parameters of a sub-neural-network referred to here may include: the number of downsamplings and upsamplings in the attention extraction unit, the number of residual units in the first convolution unit, and so on. In addition, each sub-neural-network may be repeated multiple times; for example, when at least one of sub-neural-network 601, sub-neural-network 602, and sub-neural-network 603 differs from the others, the deep convolutional neural network 600 may include m sub-neural-networks 601, k sub-neural-networks 602, and j sub-neural-networks 603, where m, k, and j are all positive integers.
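[Editorial note: a rough sketch of this stacking, reusing the AttentionBlockSketch above; the counts m, k, and j are arbitrary placeholders.]

    import torch.nn as nn

    def build_deep_network_sketch(channels: int, m: int, k: int, j: int) -> nn.Module:
        """Stack m + k + j attention sub-networks end to end, in the spirit of FIG. 6."""
        stages = [AttentionBlockSketch(channels) for _ in range(m + k + j)]
        return nn.Sequential(*stages)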
A neural network constructed based on the image processing method proposed in this embodiment can effectively reduce the number of parameters that need to be adjusted during neural network training, improving the efficiency of feature learning; at the same time, during image processing based on the trained neural network, no parameter-adjusting operation is required; through the same number of downsampling and upsampling operations, the backward transfer of global information is achieved, thereby promoting the transfer of the useful information that attention focuses on.
In some optional implementations of this embodiment, the image to be processed may contain multiple objects, and the multiple objects may be of the same category or of different categories. The objects may belong to at least one category; for example, they may include various vehicles such as airplanes, bicycles, and automobiles, and may also include various animals such as birds, dogs, and lions.
After the features of the image to be processed are extracted again based on the fusion map, the re-extracted features may be used to detect or recognize the objects included in the image.
Further, the re-extracted features may also be used to determine the categories of the objects included in the image.
Further, the re-extracted features may also be used to segment the image, separating out the parts containing objects.
With the image processing method of this implementation, after the features of the image to be processed are extracted again, the re-extracted features can be used for different applications, meeting the image processing needs of different tasks. For example, objects contained in the image can be detected or recognized, which can be applied to driverless vehicles or guide devices for the blind; objects contained in the image can be classified, which can be applied to detection devices in the military field; and the image can be segmented, which can be applied to further analysis of the objects.
Any image processing method provided by the embodiments of the present application may be executed by any suitable device having data processing capability, including but not limited to terminal devices and servers. Alternatively, any image processing method provided by the embodiments of the present application may be executed by a processor; for example, the processor executes any image processing method mentioned in the embodiments of the present application by invoking corresponding instructions stored in a memory. Details are not repeated below.
Those of ordinary skill in the art can understand that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps including those of the above method embodiments. The aforementioned storage medium includes at least one medium capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
As an implementation of the methods shown in at least one of the above figures, the present application provides an embodiment of an image processing apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 1, and the apparatus may optionally be applied to at least one kind of electronic device.
As shown in FIG. 7, the image processing apparatus 700 of this embodiment includes: a first feature extraction unit 701, an attention extraction unit 702, a fusion unit 703, and a second feature extraction unit 704.
The first feature extraction unit 701 is configured to extract features of an image to be processed to obtain a first feature map of the image.
The attention extraction unit 702 is configured to generate an attention map of the image based on the first feature map.
The fusion unit 703 is configured to fuse the attention map and the first feature map to obtain a fusion map.
The second feature extraction unit 704 is configured to extract the features of the image again based on the fusion map.
In some optional implementations of this embodiment, the attention extraction unit 702 may further include a downsampling module and an upsampling module, not shown in FIG. 7.
The downsampling module is configured to perform N downsampling operations on the first feature map in sequence, where N is an integer greater than or equal to 1.
The upsampling module is configured to perform N upsampling operations in sequence on the feature map after the N-th downsampling, to obtain the attention map of the image, where the resolution of the attention map is the same as that of the first feature map.
In some optional implementations of this embodiment, the upsampling module may be configured to: perform a convolution operation on the feature map after the (N-n)-th downsampling and the feature map after the n-th upsampling; and perform the (n+1)-th upsampling on the convolved feature map to obtain the attention map of the image, where n is an integer greater than 1 and less than N.
In some optional implementations of this embodiment, when performing the convolution operation on the feature map after the (N-n)-th downsampling and the feature map after the n-th upsampling, the upsampling module is configured to: perform convolution processing on the feature map after the (N-n)-th downsampling to obtain a convolution map; add the feature value of at least one pixel in the convolution map to the feature value of the corresponding pixel in the feature map after the n-th upsampling; and perform a convolution operation on the summed feature map.
In some optional implementations of this embodiment, the upsampling module may be configured to: perform at least one convolution operation on the feature map after the N-th downsampling; and perform the first upsampling on the feature map after the last convolution operation, to obtain the attention map of the image. In some optional implementations of this embodiment, the image processing apparatus 700 may further include a second convolution unit, not shown in FIG. 7, configured to perform at least one convolution operation on the first feature map. Correspondingly, the fusion unit 703 is configured to fuse the attention map and the first feature map after the last convolution operation to obtain the fusion map.
In some optional implementations of this embodiment, the image processing apparatus 700 may further include a normalization unit, not shown in FIG. 7, configured to perform at least normalization processing on the attention map. Correspondingly, the fusion unit 703 is configured to fuse the normalized attention map and the first feature map to obtain the fusion map.
In some optional implementations of this embodiment, the image processing apparatus 700 may further include a second convolution unit, not shown in FIG. 7, configured to perform at least one convolution processing on the attention map in sequence. Correspondingly, the normalization unit is configured to perform normalization processing on the attention map after the last convolution processing.
In some optional implementations of this embodiment, the fusion unit 703 may further be configured to: multiply the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
In some optional implementations of this embodiment, the fusion unit 703 may further be configured to: multiply the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain a multiplied map; and add the feature value of at least one pixel in the multiplied map to the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
In some optional implementations of this embodiment, the image processing apparatus 700 may further include at least one of the following, not shown in FIG. 7: a detection unit, a classification unit, and a segmentation unit.
The detection unit is configured to detect or recognize the objects included in the image according to the re-extracted features of the image.
The classification unit is configured to determine the categories of the objects included in the image according to the re-extracted features of the image.
The segmentation unit is configured to segment the image according to the re-extracted features of the image.
The image processing apparatus provided by the above embodiment of the present application first extracts the features of an image to be processed to obtain a first feature map of the image, generates an attention map of the image based on the first feature map, then fuses the attention map and the first feature map, and finally extracts the features of the image again based on the obtained fusion map, thereby introducing the attention mechanism into image processing and effectively improving the efficiency of obtaining information from the image.
The flowcharts and block diagrams in the drawings illustrate the architectures, functions, and operations of possible implementations of systems, methods, and computer program products according to at least one embodiment of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code that contains one or more executable instructions for implementing specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments described in the present application may be implemented by software or by hardware. The described units may also be provided in a processor; for example, it may be described as: a processor including a first feature extraction unit, an attention extraction unit, a fusion unit, and a second feature extraction unit. The names of these units do not, in certain cases, constitute a limitation on the units themselves; for example, the first feature extraction unit may also be described as "a unit for extracting features of an image to be processed to obtain a first feature map of the image".
In addition, an embodiment of the present application further provides another electronic device, comprising a processor and a memory; the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the image processing method of any of the foregoing embodiments of the present application.
An embodiment of the present application further provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, or a server. Referring now to FIG. 8, a schematic structural diagram of an electronic device 800 suitable for implementing a terminal device or a server of an embodiment of the present application is shown. As shown in FIG. 8, the computer system 800 includes one or more processors, a communication unit, and the like. The one or more processors include, for example, one or more central processing units (CPUs) 801 and/or one or more graphics processing units (GPUs) 813; the processors may perform at least one appropriate action and process according to executable instructions stored in a read-only memory (ROM) 802 or executable instructions loaded into a random access memory (RAM) 803 from a storage portion 808. The communication unit 812 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card.
The processor can communicate with the ROM 802 and/or the RAM 803 to execute the executable instructions, is connected to the communication unit 812 via the bus 804, and communicates with other target devices via the communication unit 812, thereby completing the operations corresponding to any of the methods provided by the embodiments of the present application, for example: extracting features of an image to be processed to obtain a first feature map of the image; generating an attention map of the image based on the first feature map; fusing the attention map and the first feature map to obtain a fusion map; and extracting features of the image again based on the fusion map.
In addition, the RAM 803 may also store at least one program and data required for the operation of the device. The CPU 801, the ROM 802, and the RAM 803 are connected to each other via the bus 804. Where the RAM 803 is present, the ROM 802 is an optional module. The RAM 803 stores executable instructions, or executable instructions are written into the ROM 802 at runtime, and the executable instructions cause the CPU 801 to perform the operations corresponding to the above communication method. An input/output (I/O) interface 805 is also connected to the bus 804. The communication unit 812 may be integrated, or may be provided with multiple sub-modules (e.g., multiple IB network cards) linked on the bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output portion 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 808 including a hard disk and the like; and a communication portion 809 including a network interface card such as a LAN card or a modem. The communication portion 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom is installed into the storage portion 808 as needed.
It should be noted that the architecture shown in FIG. 8 is only an optional implementation; in practice, the number and types of the components in FIG. 8 may be selected, reduced, added, or replaced according to actual needs; for different functional components, implementations such as separate configuration or integrated configuration may also be adopted. For example, the GPU 813 and the CPU 801 may be configured separately, or the GPU 813 may be integrated on the CPU 801; the communication unit may be configured separately, or may be integrated on the CPU 801 or the GPU 813; and so on. All of these alternative implementations fall within the scope of protection disclosed by the present application.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium; the computer program comprises program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: extracting features of an image to be processed to obtain a first feature map of the image; generating an attention map of the image based on the first feature map; fusing the attention map and the first feature map to obtain a fusion map; and extracting features of the image again based on the fusion map. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 809, and/or installed from the removable medium 811. When the computer program is executed by the CPU 801, the above-described functions defined in the methods of the present application are performed.
In addition, an embodiment of the present application further provides a computer program including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing each step of the image processing method of any of the foregoing embodiments of the present application.
In addition, an embodiment of the present application further provides a computer-readable storage medium for storing computer-readable instructions, which, when executed, implement the operations of each step of the image processing method of any of the foregoing embodiments of the present application.
The methods, apparatuses, and devices of the embodiments of the present application may be implemented in many ways. For example, the methods, apparatuses, and devices of the embodiments of the present application may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described sequence of steps of the method is for illustrative purposes only; the steps of the methods of the embodiments of the present application are not limited to the order described above unless otherwise specified. In addition, in some embodiments, the present application may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the embodiments of the present application. Thus, the embodiments of the present application also cover recording media storing programs for executing the methods according to the embodiments of the present application.
The description of the embodiments of the present application is given for the purposes of illustration and description, and is neither exhaustive nor intended to limit the embodiments of the present application to the disclosed forms. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles and practical applications of the embodiments of the present application, and to enable those of ordinary skill in the art to understand the embodiments of the present application and thereby design at least one embodiment, with at least one modification, suited to a particular use.

Claims (25)

  1. An image processing method, characterized by comprising:
    extracting features of an image to be processed to obtain a first feature map of the image;
    generating an attention map of the image based on the first feature map;
    fusing the attention map and the first feature map to obtain a fusion map; and
    extracting features of the image again based on the fusion map.
  2. The method according to claim 1, characterized in that generating the attention map of the image based on the first feature map comprises:
    performing N downsampling operations on the first feature map in sequence, where N is an integer greater than or equal to 1; and
    performing N upsampling operations in sequence on the feature map after the N-th downsampling, to obtain the attention map of the image, wherein the resolution of the attention map is the same as the resolution of the first feature map.
  3. The method according to claim 2, characterized in that performing N upsampling operations in sequence on the feature map after the N-th downsampling comprises:
    performing a convolution operation on the feature map after the (N-n)-th downsampling and the feature map after the n-th upsampling, where n is an integer greater than 1 and less than N; and
    performing the (n+1)-th upsampling on the convolved feature map.
  4. The method according to claim 3, characterized in that performing the convolution operation on the feature map after the (N-n)-th downsampling and the feature map after the n-th upsampling comprises:
    performing convolution processing on the feature map after the (N-n)-th downsampling to obtain a convolution map;
    adding the feature value of at least one pixel in the convolution map to the feature value of the corresponding pixel in the feature map after the n-th upsampling; and
    performing a convolution operation on the summed feature map.
  5. The method according to claim 3 or 4, characterized in that performing N upsampling operations in sequence on the feature map after the N-th downsampling further comprises:
    performing at least one convolution operation on the feature map after the N-th downsampling; and
    performing the first upsampling on the feature map after the last convolution operation.
  6. The method according to any one of claims 1-5, characterized in that fusing the attention map and the first feature map to obtain the fusion map comprises:
    performing at least one convolution operation on the first feature map; and
    fusing the attention map and the first feature map after the last convolution operation to obtain the fusion map.
  7. The method according to any one of claims 1-6, characterized in that fusing the attention map and the first feature map to obtain the fusion map comprises:
    performing at least normalization processing on the attention map; and
    fusing the normalized attention map and the first feature map to obtain the fusion map.
  8. The method according to claim 7, characterized in that performing at least normalization processing on the attention map comprises:
    performing at least one convolution processing on the attention map in sequence; and
    performing normalization processing on the attention map after the last convolution processing.
  9. The method according to claim 7 or 8, characterized in that fusing the attention map and the first feature map to obtain the fusion map comprises:
    multiplying the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
  10. The method according to claim 7 or 8, characterized in that fusing the attention map and the first feature map to obtain the fusion map comprises:
    multiplying the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain a multiplied map; and
    adding the feature value of at least one pixel in the multiplied map to the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
  11. The method according to any one of claims 1-10, characterized in that, after extracting the features of the image again based on the fusion map, the method further comprises at least one of the following:
    detecting or recognizing objects included in the image according to the re-extracted features of the image;
    determining categories of objects included in the image according to the re-extracted features of the image; and
    segmenting the image according to the re-extracted features of the image.
  12. An image processing apparatus, characterized by comprising:
    a first feature extraction unit, configured to extract features of an image to be processed to obtain a first feature map of the image;
    an attention extraction unit, configured to generate an attention map of the image based on the first feature map;
    a fusion unit, configured to fuse the attention map and the first feature map to obtain a fusion map; and
    a second feature extraction unit, configured to extract the features of the image again based on the fusion map.
  13. The apparatus according to claim 12, characterized in that the attention extraction unit comprises:
    a downsampling module, configured to perform N downsampling operations on the first feature map in sequence, where N is an integer greater than or equal to 1; and
    an upsampling module, configured to perform N upsampling operations in sequence on the feature map after the N-th downsampling, to obtain the attention map of the image, wherein the resolution of the attention map is the same as the resolution of the first feature map.
  14. The apparatus according to claim 13, characterized in that the upsampling module is configured to:
    perform a convolution operation on the feature map after the (N-n)-th downsampling and the feature map after the n-th upsampling, where n is an integer greater than 1 and less than N; and
    perform the (n+1)-th upsampling on the convolved feature map to obtain the attention map of the image.
  15. The apparatus according to claim 14, characterized in that, when performing the convolution operation on the feature map after the (N-n)-th downsampling and the feature map after the n-th upsampling, the upsampling module is configured to:
    perform convolution processing on the feature map after the (N-n)-th downsampling to obtain a convolution map;
    add the feature value of at least one pixel in the convolution map to the feature value of the corresponding pixel in the feature map after the n-th upsampling; and
    perform a convolution operation on the summed feature map.
  16. The apparatus according to claim 14 or 15, characterized in that the upsampling module is configured to:
    perform at least one convolution operation on the feature map after the N-th downsampling; and
    perform the first upsampling on the feature map after the last convolution operation to obtain the attention map of the image.
  17. The apparatus according to any one of claims 12-16, characterized by further comprising:
    a second convolution unit, configured to perform at least one convolution operation on the first feature map;
    wherein the fusion unit is configured to fuse the attention map and the first feature map after the last convolution operation to obtain the fusion map.
  18. The apparatus according to any one of claims 12-17, characterized by further comprising:
    a normalization unit, configured to perform at least normalization processing on the attention map;
    wherein the fusion unit is configured to fuse the normalized attention map and the first feature map to obtain the fusion map.
  19. The apparatus according to claim 18, characterized by further comprising:
    a second convolution unit, configured to perform at least one convolution processing on the attention map in sequence; and
    wherein the normalization unit is configured to perform normalization processing on the attention map after the last convolution processing.
  20. The apparatus according to claim 18 or 19, characterized in that the fusion unit is configured to:
    multiply the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
  21. The apparatus according to claim 18 or 19, characterized in that the fusion unit is configured to:
    multiply the weight value of at least one pixel in the normalized attention map by the feature value of the corresponding pixel in the first feature map to obtain a multiplied map; and
    add the feature value of at least one pixel in the multiplied map to the feature value of the corresponding pixel in the first feature map to obtain the fusion map.
  22. The apparatus according to any one of claims 14-21, characterized by further comprising at least one of the following:
    a detection unit, configured to detect or recognize objects included in the image according to the re-extracted features of the image;
    a classification unit, configured to determine categories of objects included in the image according to the re-extracted features of the image; and
    a segmentation unit, configured to segment the image according to the re-extracted features of the image.
  23. A computer-readable storage medium storing computer instructions, characterized in that, when the instructions are executed, the operations of each step of the image processing method according to any one of claims 1-11 are implemented.
  24. An electronic device, characterized by comprising: a processor and a memory;
    wherein the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the image processing method according to any one of claims 1-11.
  25. A computer program, comprising computer-readable code, characterized in that, when the computer-readable code runs on a device, a processor in the device executes instructions for implementing each step of the image processing method according to any one of claims 1-11.
PCT/CN2018/078810 2017-03-13 2018-03-13 图像处理方法、装置及电子设备 WO2018166438A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/451,334 US10943145B2 (en) 2017-03-13 2019-06-25 Image processing methods and apparatus, and electronic devices

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710145253.1A CN106934397B (zh) 2017-03-13 2017-03-13 图像处理方法、装置及电子设备
CN201710145253.1 2017-03-13

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/451,334 Continuation US10943145B2 (en) 2017-03-13 2019-06-25 Image processing methods and apparatus, and electronic devices

Publications (1)

Publication Number Publication Date
WO2018166438A1 true WO2018166438A1 (zh) 2018-09-20

Family

ID=59433696

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/078810 WO2018166438A1 (zh) 2017-03-13 2018-03-13 图像处理方法、装置及电子设备

Country Status (3)

Country Link
US (1) US10943145B2 (zh)
CN (1) CN106934397B (zh)
WO (1) WO2018166438A1 (zh)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447990A (zh) * 2018-10-22 2019-03-08 北京旷视科技有限公司 图像语义分割方法、装置、电子设备和计算机可读介质
CN110046627A (zh) * 2018-10-16 2019-07-23 杭州依图医疗技术有限公司 一种乳腺影像识别的方法及装置
CN110598788A (zh) * 2019-09-12 2019-12-20 腾讯科技(深圳)有限公司 目标检测方法、装置、电子设备及存储介质
CN111108508A (zh) * 2019-12-23 2020-05-05 深圳市优必选科技股份有限公司 脸部情感识别方法、智能装置和计算机可读存储介质
CN111199516A (zh) * 2019-12-30 2020-05-26 深圳大学 基于图像生成网络模型的图像处理方法、系统及存储介质
CN111402274A (zh) * 2020-04-14 2020-07-10 上海交通大学医学院附属上海儿童医学中心 一种磁共振左心室图像分割的处理方法、模型及训练方法
CN111861897A (zh) * 2019-05-17 2020-10-30 北京嘀嘀无限科技发展有限公司 一种图像处理方法及装置
CN113379667A (zh) * 2021-07-16 2021-09-10 浙江大华技术股份有限公司 脸部图像生成方法、装置、设备及介质
EP3871158A4 (en) * 2019-05-16 2021-12-29 Samsung Electronics Co., Ltd. Image processing apparatus and operating method of the same
CN114677661A (zh) * 2022-03-24 2022-06-28 智道网联科技(北京)有限公司 一种路侧标识识别方法、装置和电子设备

Families Citing this family (82)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934397B (zh) 2017-03-13 2020-09-01 北京市商汤科技开发有限公司 图像处理方法、装置及电子设备
CN107291945B (zh) * 2017-07-12 2020-03-31 上海媒智科技有限公司 基于视觉注意力模型的高精度服装图像检索方法及系统
CN108229531B (zh) * 2017-09-29 2021-02-26 北京市商汤科技开发有限公司 对象特征提取方法、装置、存储介质和电子设备
US10891723B1 (en) * 2017-09-29 2021-01-12 Snap Inc. Realistic neural network based image style transfer
CN108876813B (zh) * 2017-11-01 2021-01-26 北京旷视科技有限公司 用于视频中物体检测的图像处理方法、装置及设备
CN108229302A (zh) * 2017-11-10 2018-06-29 深圳市商汤科技有限公司 特征提取方法、装置、计算机程序、存储介质和电子设备
CN108229650B (zh) * 2017-11-15 2021-04-09 北京市商汤科技开发有限公司 卷积处理方法、装置及电子设备
CN108171260B (zh) * 2017-12-15 2022-02-11 百度在线网络技术(北京)有限公司 一种图片识别方法及系统
CN107993217B (zh) * 2017-12-22 2021-04-09 北京奇虎科技有限公司 视频数据实时处理方法及装置、计算设备
CN108280451B (zh) * 2018-01-19 2020-12-29 北京市商汤科技开发有限公司 语义分割及网络训练方法和装置、设备、介质
CN108154145B (zh) * 2018-01-24 2020-05-19 北京地平线机器人技术研发有限公司 检测自然场景图像中的文本的位置的方法和装置
CN108038519B (zh) * 2018-01-30 2020-11-24 浙江大学 一种基于稠密的特征金字塔网络的宫颈图像处理方法及装置
CN108364023A (zh) * 2018-02-11 2018-08-03 北京达佳互联信息技术有限公司 基于注意力模型的图像识别方法和系统
CN108647585B (zh) * 2018-04-20 2020-08-14 浙江工商大学 一种基于多尺度循环注意力网络的交通标识符检测方法
CN108734290B (zh) * 2018-05-16 2021-05-18 湖北工业大学 一种基于注意力机制的卷积神经网络构建方法及应用
CN108830322A (zh) * 2018-06-15 2018-11-16 联想(北京)有限公司 一种图像处理方法及装置、设备、存储介质
CN109190649B (zh) * 2018-07-02 2021-10-01 北京陌上花科技有限公司 一种深度学习网络模型服务器的优化方法和装置
US11429824B2 (en) * 2018-09-11 2022-08-30 Intel Corporation Method and system of deep supervision object detection for reducing resource usage
CN111091593B (zh) * 2018-10-24 2024-03-22 深圳云天励飞技术有限公司 图像处理方法、装置、电子设备及存储介质
CN109257622A (zh) * 2018-11-01 2019-01-22 广州市百果园信息技术有限公司 一种音视频处理方法、装置、设备及介质
CN109658346B (zh) * 2018-11-13 2021-07-02 达闼科技(北京)有限公司 图像修复方法、装置、计算机可读存储介质及电子设备
CN113591755B (zh) * 2018-11-16 2024-04-16 北京市商汤科技开发有限公司 关键点检测方法及装置、电子设备和存储介质
CN109993735A (zh) * 2019-03-29 2019-07-09 成都信息工程大学 基于级联卷积的图像分割方法
CN110136197A (zh) * 2019-04-22 2019-08-16 南方电网科学研究院有限责任公司 机器人巡检图像的表计位置检测方法、装置及存储介质
CN110135307B (zh) * 2019-04-30 2022-07-01 北京邮电大学 基于注意力机制的交通标志检测方法和装置
CN110647794B (zh) * 2019-07-12 2023-01-03 五邑大学 基于注意力机制的多尺度sar图像识别方法及装置
CN110675409A (zh) * 2019-09-20 2020-01-10 上海商汤智能科技有限公司 图像处理方法及装置、电子设备和存储介质
EP4032062A4 (en) 2019-10-25 2022-12-14 Samsung Electronics Co., Ltd. Image processing method, apparatus, electronic device and computer readable storage medium
CN110796412B (zh) * 2019-10-29 2022-09-06 浙江大华技术股份有限公司 包裹跟踪方法以及相关装置
CN110956122B (zh) * 2019-11-27 2022-08-02 深圳市商汤科技有限公司 图像处理方法及装置、处理器、电子设备、存储介质
CN112927146A (zh) * 2019-12-05 2021-06-08 北大方正集团有限公司 压缩图像复原方法、装置、设备和存储介质
CN111145196A (zh) * 2019-12-11 2020-05-12 中国科学院深圳先进技术研究院 图像分割方法、装置及服务器
CN111079767B (zh) * 2019-12-22 2022-03-22 浪潮电子信息产业股份有限公司 一种用于分割图像的神经网络模型及其图像分割方法
SG10201913754XA (en) * 2019-12-30 2020-12-30 Sensetime Int Pte Ltd Image processing method and apparatus, electronic device, and storage medium
CN112219224B (zh) * 2019-12-30 2024-04-26 商汤国际私人有限公司 图像处理方法及装置、电子设备和存储介质
US11450021B2 (en) 2019-12-30 2022-09-20 Sensetime International Pte. Ltd. Image processing method and apparatus, electronic device, and storage medium
CN111401415A (zh) * 2020-03-02 2020-07-10 北京三快在线科技有限公司 计算机视觉任务模型的训练方法、装置、设备和存储介质
US20230103737A1 (en) * 2020-03-03 2023-04-06 Nec Corporation Attention mechanism, image recognition system, and feature conversion method
CN111414962B (zh) * 2020-03-19 2023-06-23 创新奇智(重庆)科技有限公司 一种引入物体关系的图像分类方法
CN111476737B (zh) * 2020-04-15 2022-02-11 腾讯科技(深圳)有限公司 一种图像处理方法、智能设备及计算机可读存储介质
CN111539887B (zh) * 2020-04-21 2023-07-14 温州大学 一种基于混合卷积的通道注意力机制和分层学习的神经网络图像去雾方法
CN111639652B (zh) * 2020-04-28 2024-08-20 博泰车联网(南京)有限公司 一种图像处理方法、装置及计算机存储介质
CN111729304B (zh) * 2020-05-26 2024-04-05 广州尊游软件科技有限公司 一种展示海量对象的方法
CN111627038B (zh) * 2020-05-27 2021-05-11 杭州王道控股有限公司 一种背景去除方法、装置、设备及可读存储介质
CN111368942B (zh) * 2020-05-27 2020-08-25 深圳创新奇智科技有限公司 商品分类识别方法、装置、电子设备及存储介质
CN112084865A (zh) * 2020-08-06 2020-12-15 中国科学院空天信息创新研究院 目标检测方法、装置、电子设备和存储介质
CN112149661B (zh) * 2020-08-07 2024-06-21 珠海欧比特宇航科技股份有限公司 车牌识别方法、装置及介质
CN112101456B (zh) * 2020-09-15 2024-04-26 推想医疗科技股份有限公司 注意力特征图获取方法及装置、目标检测的方法及装置
US12045288B1 (en) * 2020-09-24 2024-07-23 Amazon Technologies, Inc. Natural language selection of objects in image data
CN112241955B (zh) * 2020-10-27 2023-08-25 平安科技(深圳)有限公司 三维图像的碎骨分割方法、装置、计算机设备及存储介质
CN112258487B (zh) * 2020-10-29 2024-06-18 成都芯昇动力科技有限公司 图像检测系统及方法
KR102562731B1 (ko) * 2020-11-06 2023-08-01 연세대학교 산학협력단 자기 집중 모듈 및 이를 이용한 정규화 방법
US20220156587A1 (en) * 2020-11-16 2022-05-19 Objectvideo Labs, Llc Multi-head deep metric machine-learning architecture
CN112464810A (zh) * 2020-11-25 2021-03-09 创新奇智(合肥)科技有限公司 一种基于注意力图的吸烟行为检测方法及装置
CN112562819B (zh) * 2020-12-10 2022-06-17 清华大学 一种针对先心病的超声多切面数据的报告生成方法
CN112489033B (zh) * 2020-12-13 2025-05-02 杭州追猎科技有限公司 基于分类权重的混凝土养护箱的清洁效果的检测方法
CN112633352B (zh) * 2020-12-18 2023-08-29 浙江大华技术股份有限公司 一种目标检测方法、装置、电子设备及存储介质
CN112884007B (zh) * 2021-01-22 2022-08-09 重庆交通大学 一种像素级统计描述学习的sar图像分类方法
CN113158738B (zh) * 2021-01-28 2022-09-20 中南大学 一种基于注意力机制的港口环境下目标检测方法、系统、终端及可读存储介质
CN112991351B (zh) * 2021-02-23 2022-05-27 新华三大数据技术有限公司 遥感图像语义分割方法、装置及存储介质
CN112949654B (zh) * 2021-02-25 2025-02-25 上海商汤善萃医疗科技有限公司 图像检测方法及相关装置、设备
CN112819818B (zh) * 2021-02-26 2023-11-14 中国人民解放军总医院第一医学中心 图像识别模块训练方法和装置
CN112967264A (zh) * 2021-03-19 2021-06-15 深圳市商汤科技有限公司 缺陷检测方法及装置、电子设备和存储介质
CN113139543B (zh) * 2021-04-28 2023-09-01 北京百度网讯科技有限公司 目标对象检测模型的训练方法、目标对象检测方法和设备
CN113222846B (zh) * 2021-05-18 2024-05-10 北京达佳互联信息技术有限公司 图像处理方法和图像处理装置
CN113239840B (zh) * 2021-05-24 2024-10-15 中国农业银行股份有限公司 字迹鉴定方法、装置、设备和存储介质
CN113255700B (zh) * 2021-06-10 2021-11-02 展讯通信(上海)有限公司 图像的特征图的处理方法及装置、存储介质、终端
CA3225826A1 (en) * 2021-07-27 2023-02-02 Caroline ROUGIER Two-dimensional pose estimations
CN113344827B (zh) * 2021-08-05 2021-11-23 浙江华睿科技股份有限公司 一种图像去噪方法、图像去噪网络运算单元及设备
CN114565941B (zh) * 2021-08-24 2024-09-24 商汤国际私人有限公司 纹理生成方法、装置、设备及计算机可读存储介质
CN113570003B (zh) * 2021-09-23 2022-01-07 深圳新视智科技术有限公司 基于注意力机制的特征融合缺陷检测方法及装置
CN114119627B (zh) * 2021-10-19 2022-05-17 北京科技大学 基于深度学习的高温合金微观组织图像分割方法及装置
US12277671B2 (en) * 2021-11-10 2025-04-15 Adobe Inc. Multi-stage attention model for texture synthesis
CN114187213B (zh) * 2021-12-14 2024-12-03 成都微光集电科技有限公司 图像融合方法及其装置、设备和存储介质
CN114723760B (zh) * 2022-05-19 2022-08-23 北京世纪好未来教育科技有限公司 人像分割模型的训练方法、装置及人像分割方法、装置
CN114926686A (zh) * 2022-05-25 2022-08-19 上海商汤智能科技有限公司 图像识别方法、装置、计算机设备和存储介质
WO2023236576A1 (zh) * 2022-06-06 2023-12-14 京东科技控股股份有限公司 图像特征处理方法、装置、产品、介质及设备
CN115243031B (zh) * 2022-06-17 2024-06-21 合肥工业大学智能制造技术研究院 一种基于质量注意力机制的视频时空特征优化方法、系统、电子设备及存储介质
CN114821202B (zh) * 2022-06-29 2022-10-04 武汉纺织大学 一种基于用户偏好的服装推荐方法
CN115468570A (zh) * 2022-08-31 2022-12-13 北京百度网讯科技有限公司 高精地图地面要素的提取方法、装置、设备及存储介质
CN119251064A (zh) * 2024-10-14 2025-01-03 徐州创活信息科技有限公司 一种夜间低照度图像增强方法及系统
CN119380337B (zh) * 2024-12-27 2025-03-14 西北农林科技大学 基于图像分析的葡萄霜霉病孢子检测方法、系统和装置

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101697593A (zh) * 2009-09-08 2010-04-21 武汉大学 一种基于时域预测的关注度提取方法
CN101866422A (zh) * 2010-06-29 2010-10-20 上海大学 基于图像的多特征融合提取图像关注度的方法
CN101980248A (zh) * 2010-11-09 2011-02-23 西安电子科技大学 基于改进视觉注意力模型的自然场景目标检测方法
CN103679718A (zh) * 2013-12-06 2014-03-26 河海大学 一种基于显著性的快速场景分析方法
US20140153651A1 (en) * 2011-07-19 2014-06-05 Thomson Licensing Method and apparatus for reframing and encoding a video signal
CN103996185A (zh) * 2014-04-29 2014-08-20 重庆大学 一种基于注意力td-bu机制的图像分割方法
CN105228033A (zh) * 2015-08-27 2016-01-06 联想(北京)有限公司 一种视频处理方法及电子设备
CN106157319A (zh) * 2016-07-28 2016-11-23 哈尔滨工业大学 基于卷积神经网络的区域和像素级融合的显著性检测方法
CN106934397A (zh) * 2017-03-13 2017-07-07 北京市商汤科技开发有限公司 图像处理方法、装置及电子设备
CN107729901A (zh) * 2016-08-10 2018-02-23 阿里巴巴集团控股有限公司 图像处理模型的建立方法、装置及图像处理方法及系统

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100040236A (ko) * 2008-10-09 2010-04-19 삼성전자주식회사 시각적 관심에 기반한 2차원 영상의 3차원 영상 변환기 및 변환 방법
US20170262996A1 (en) * 2016-03-11 2017-09-14 Qualcomm Incorporated Action localization in sequential data with attention proposals from a recurrent network
US10354362B2 (en) * 2016-09-08 2019-07-16 Carnegie Mellon University Methods and software for detecting objects in images using a multiscale fast region-based convolutional neural network
CN109118459B (zh) * 2017-06-23 2022-07-19 南开大学 图像显著性物体检测方法和装置

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101697593A (zh) * 2009-09-08 2010-04-21 武汉大学 一种基于时域预测的关注度提取方法
CN101866422A (zh) * 2010-06-29 2010-10-20 上海大学 基于图像的多特征融合提取图像关注度的方法
CN101980248A (zh) * 2010-11-09 2011-02-23 西安电子科技大学 基于改进视觉注意力模型的自然场景目标检测方法
US20140153651A1 (en) * 2011-07-19 2014-06-05 Thomson Licensing Method and apparatus for reframing and encoding a video signal
CN103679718A (zh) * 2013-12-06 2014-03-26 河海大学 一种基于显著性的快速场景分析方法
CN103996185A (zh) * 2014-04-29 2014-08-20 重庆大学 一种基于注意力td-bu机制的图像分割方法
CN105228033A (zh) * 2015-08-27 2016-01-06 联想(北京)有限公司 一种视频处理方法及电子设备
CN106157319A (zh) * 2016-07-28 2016-11-23 哈尔滨工业大学 基于卷积神经网络的区域和像素级融合的显著性检测方法
CN107729901A (zh) * 2016-08-10 2018-02-23 阿里巴巴集团控股有限公司 图像处理模型的建立方法、装置及图像处理方法及系统
CN106934397A (zh) * 2017-03-13 2017-07-07 北京市商汤科技开发有限公司 图像处理方法、装置及电子设备

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046627A (zh) * 2018-10-16 2019-07-23 杭州依图医疗技术有限公司 一种乳腺影像识别的方法及装置
CN110046627B (zh) * 2018-10-16 2021-09-10 杭州依图医疗技术有限公司 一种乳腺影像识别的方法及装置
CN109447990A (zh) * 2018-10-22 2019-03-08 北京旷视科技有限公司 图像语义分割方法、装置、电子设备和计算机可读介质
EP3871158A4 (en) * 2019-05-16 2021-12-29 Samsung Electronics Co., Ltd. Image processing apparatus and operating method of the same
CN111861897A (zh) * 2019-05-17 2020-10-30 北京嘀嘀无限科技发展有限公司 一种图像处理方法及装置
CN110598788A (zh) * 2019-09-12 2019-12-20 腾讯科技(深圳)有限公司 目标检测方法、装置、电子设备及存储介质
CN110598788B (zh) * 2019-09-12 2023-06-30 腾讯科技(深圳)有限公司 目标检测方法、装置、电子设备及存储介质
CN111108508A (zh) * 2019-12-23 2020-05-05 深圳市优必选科技股份有限公司 脸部情感识别方法、智能装置和计算机可读存储介质
CN111108508B (zh) * 2019-12-23 2023-10-13 深圳市优必选科技股份有限公司 脸部情感识别方法、智能装置和计算机可读存储介质
CN111199516A (zh) * 2019-12-30 2020-05-26 深圳大学 基于图像生成网络模型的图像处理方法、系统及存储介质
CN111199516B (zh) * 2019-12-30 2023-05-05 深圳大学 基于图像生成网络模型的图像处理方法、系统及存储介质
CN111402274A (zh) * 2020-04-14 2020-07-10 上海交通大学医学院附属上海儿童医学中心 一种磁共振左心室图像分割的处理方法、模型及训练方法
CN111402274B (zh) * 2020-04-14 2023-05-26 上海交通大学医学院附属上海儿童医学中心 一种磁共振左心室图像分割的处理方法、模型及训练方法
CN113379667A (zh) * 2021-07-16 2021-09-10 浙江大华技术股份有限公司 脸部图像生成方法、装置、设备及介质
CN114677661A (zh) * 2022-03-24 2022-06-28 智道网联科技(北京)有限公司 一种路侧标识识别方法、装置和电子设备

Also Published As

Publication number Publication date
CN106934397B (zh) 2020-09-01
US20190311223A1 (en) 2019-10-10
US10943145B2 (en) 2021-03-09
CN106934397A (zh) 2017-07-07

Similar Documents

Publication Publication Date Title
WO2018166438A1 (zh) 图像处理方法、装置及电子设备
JP7415251B2 (ja) 画像処理用の装置及び方法、並びにニューラルネットワークトをトレーニングするシステム
CN112465828B (zh) 一种图像语义分割方法、装置、电子设备及存储介质
CN108154222B (zh) 深度神经网络训练方法和系统、电子设备
CN111402130B (zh) 数据处理方法和数据处理装置
CN108460411B (zh) 实例分割方法和装置、电子设备、程序和介质
CN108229531B (zh) 对象特征提取方法、装置、存储介质和电子设备
WO2018153322A1 (zh) 关键点检测方法、神经网络训练方法、装置和电子设备
Singh et al. Single image dehazing for a variety of haze scenarios using back projected pyramid network
CN108235116B (zh) 特征传播方法和装置、电子设备和介质
US9269025B1 (en) Object detection in images
CN112581379A (zh) 图像增强方法以及装置
KR20200087808A (ko) 인스턴스 분할 방법 및 장치, 전자 기기, 프로그램 및 매체
CN113673562B (zh) 一种特征增强的方法、目标分割方法、装置和存储介质
CN111626956A (zh) 图像去模糊方法和装置
CN108154153B (zh) 场景分析方法和系统、电子设备
KR102527642B1 (ko) 딥러닝 기반 소형 표적 탐지 시스템 및 방법
US20250118068A1 (en) Method and system for detecting changes in areas
Shit et al. An encoder‐decoder based CNN architecture using end to end dehaze and detection network for proper image visualization and detection
Du et al. Dehazing network: Asymmetric unet based on physical model
CN108229281B (zh) 神经网络的生成方法和人脸检测方法、装置及电子设备
Chen et al. Object counting in remote sensing via selective spatial‐frequency pyramid network
US20170045619A1 (en) Method and apparatus to recover scene data using re-sampling compressive sensing
Taş et al. Camera-based wildfire smoke detection for foggy environments
Ke et al. Scale-aware dimension-wise attention network for small ship instance segmentation in synthetic aperture radar images

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18767955

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 17.12.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 18767955

Country of ref document: EP

Kind code of ref document: A1

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载