CN113221969A - Semantic segmentation system and method based on Internet of Things perception and dual-feature fusion - Google Patents
Semantic segmentation system and method based on Internet of Things perception and dual-feature fusion
- Publication number
- CN113221969A (application number CN202110446945.6A)
- Authority
- CN
- China
- Prior art keywords
- feature
- fusion
- features
- attention vector
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
Abstract
The invention discloses a semantic segmentation system and method based on dual-feature fusion for Internet of Things perception. The method comprises the following steps: S1, performing feature coding on an original image to obtain features of different scales; S2, learning the features of different scales through two attention refinement blocks to obtain multi-level fusion features; S3, reducing the dimension of the multi-level fusion features to obtain dimension reduction features; S4, performing context coding on the dimension reduction features by using depthwise separable convolutions with different convolution scales to obtain local features of different scales; S5, performing global pooling on the dimension reduction features by using a global average pooling layer to obtain global features; S6, performing channel splicing and fusion on the global features and the local features to obtain multi-scale context fusion features; S7, performing channel splicing and fusion on the dimension reduction features and the multi-scale context fusion features to obtain splicing features; and S8, obtaining the output according to the splicing features. The method alleviates the semantic gap among multi-level features, enriches the information representation, and improves recognition accuracy.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a semantic segmentation system and method based on dual-feature fusion for Internet of Things perception.
Background
Semantic segmentation, which aims to densely assign each pixel to a predefined class, is becoming an increasingly attractive research direction in computer vision. Owing to the strong representation learning capability of deep learning methods, semantic segmentation has achieved good performance in many Internet of Things applications such as autonomous driving and diabetic retinopathy image analysis. Two important factors, the feature fusion scheme and the complexity of the network, largely determine the performance of a semantic segmentation method. In particular, to accurately parse complex scenes in resource-constrained Internet of Things (IoT) environments, it is both important and challenging to encode robust multi-level features and diverse context information in an efficient and effective manner, so as to achieve accurate, fast, and lightweight performance.
Existing semantic segmentation methods can be roughly divided into two categories: accuracy-oriented and efficiency-oriented methods. Early work mostly focused on a single perspective: the recognition accuracy of the algorithm or its execution efficiency. In the first category, the design of the semantic segmentation model mainly focuses on how to integrate diversified features, with complex frameworks designed to achieve high-accuracy segmentation. For example, researchers proposed pyramid structures, the atrous spatial pyramid pooling module (ASPP) and the context pyramid module (CPM), encoding multi-scale context information at the end of the ResNet101 backbone (2048 feature maps) to handle multi-scale variation of targets. In addition, U-shaped networks directly fuse hierarchical features through long skip connections and extract spatial information from different levels as much as possible, thereby achieving accurate pixel-level segmentation. On the other hand, asymmetric encoder-decoder structures have also been studied extensively. The ENet and ESPNet networks greatly compress the network size through pruning operations and process large-scale images online at very high speed. To improve the overall performance of semantic segmentation, recent literature shows a trend of jointly considering the efficiency and effectiveness of a segmentation network when encoding multi-level features and multi-scale context information. In particular, ERFNet employs a large number of factorized convolutions with different dilation rates in the decoder, reducing parameter redundancy while enlarging the receptive field. In addition, researchers have proposed BiSeNet, CANet, and ICNet, which process the input image through several lightweight sub-networks and then fuse multi-level features or deep context information. In recent research, CIFReNet encodes multi-level and multi-scale information by introducing a feature refinement module and a context integration module to achieve accurate and efficient scene segmentation.
Although existing research achieves good segmentation performance in terms of either high accuracy or high speed, existing methods have at least the following problems: 1) in the multi-level information fusion process, feature extraction requires considerable time and computational complexity, so the model learns inefficiently and the computational cost is high; 2) methods that directly fuse multi-source information through element-wise addition or concatenation rarely consider how to narrow the semantic gap between multi-level features. As a result, the interaction between the various information sources is hindered, leading to sub-optimal segmentation accuracy.
Disclosure of Invention
To address the problems in the prior art, the invention provides a semantic segmentation system and method based on dual-feature fusion for Internet of Things perception, achieving a balanced overall performance in terms of accuracy, speed, memory, and computational complexity.
A semantic segmentation method based on dual-feature fusion of Internet of things perception comprises the following steps:
S1, inputting an original image, and performing feature coding on the original image by using a backbone network to obtain features of different scales;
S2, learning the features of different scales through two attention refinement blocks to obtain multi-level fusion features;
s3, reducing the dimension of the multilevel fusion features to obtain dimension reduction features;
S4, context coding is carried out on the dimension reduction features by using depthwise separable convolutions with different convolution scales so as to obtain local features with different scales;
s5, performing global pooling on the dimensionality reduction features by using a global mean pooling layer to obtain global features;
s6, performing channel splicing and fusion on the global features and the local features of different scales to obtain multi-scale context fusion features;
s7, performing channel splicing and fusion on the dimensionality reduction feature and the multi-scale context fusion feature to obtain a splicing feature;
and S8, reducing the dimension of the spliced features and performing up-sampling to obtain the final output.
Preferably, step S1 specifically includes:
feature encoding is performed on the original image by using a backbone network to obtain a first feature, a second feature and a third feature, wherein the first feature is 1/4 of the original image scale, the second feature is 1/8 of the original image scale, and the third feature is 1/16 of the original image scale.
Preferably, step S2 includes the following steps:
S2.1, fusing the first feature and the second feature through a first attention refinement block to output semantic features;
and S2.2, fusing the semantic features and the third features through a second attention refinement block to obtain multi-level fusion features.
Preferably, step S2.1 specifically includes the following steps:
s2.1.1, mapping the first feature to a scale consistent with the second feature through a down-sampling layer to obtain a first scale feature;
s2.1.2, mapping the channel dimension of the first scale feature to be consistent with the channel dimension of the second feature through the first 1 x 1 convolution layer to obtain a first channel feature;
s2.1.3, channel splicing and fusing the first scale features and the second features to obtain first fused features;
s2.1.4, inputting the first fusion feature into a first adaptive mean pooling layer and a first adaptive maximum pooling layer respectively to output a first attention vector and a second attention vector respectively;
S2.1.5, performing nonlinear mapping on the first attention vector and the second attention vector through the first multi-layer perceptron to output a first mixed attention vector and a second mixed attention vector, and fusing the first mixed attention vector and the second mixed attention vector to output a first fused mixed attention vector;
s2.1.6, normalizing the first fused mixed attention vector to obtain a first normalized mixed attention vector;
s2.1.7, mapping the first channel feature with the first normalized mixed attention vector weighting;
s2.1.8, fusing the second feature and the weighted first channel feature to output a semantic feature.
Preferably, step S2.2 specifically includes the following steps:
s2.2.1, mapping the third feature to a scale consistent with the second feature through an upsampling layer to obtain a second scale feature;
s2.2.2, mapping the channel dimension of the second scale feature to be consistent with the channel dimension of the second feature through the second 1 x 1 convolution layer to obtain a second channel feature;
s2.2.3, channel splicing and fusing the second scale features and the semantic features to obtain second fusion features;
s2.2.4, inputting the second fusion feature into a second adaptive mean pooling layer and a second adaptive maximum pooling layer respectively to output a third attention vector and a fourth attention vector respectively;
S2.2.5, performing nonlinear mapping on the third attention vector and the fourth attention vector through the second multi-layer perceptron to output a third mixed attention vector and a fourth mixed attention vector, and fusing the third mixed attention vector and the fourth mixed attention vector to output a second fused mixed attention vector;
s2.2.6, normalizing the second fused mixed attention vector to obtain a second normalized mixed attention vector;
s2.2.7, mapping the second channel feature with a second normalized mixed attention vector weighting;
s2.2.8, fusing the semantic features and the weighted second channel features to obtain a multi-level fused feature.
Preferably, the first fusion feature is input into the first adaptive mean pooling layer and the first adaptive maximum pooling layer respectively in step S2.1.4 to output the first attention vector and the second attention vector respectively, specifically using the following formulas:
V1 = AAP1(C[F1, F2]),
V2 = AMP1(C[F1, F2]),
wherein V1 is the first attention vector, V2 is the second attention vector, F1 is the first scale feature, F2 is the second feature, C[·] denotes channel splicing and fusion, AAP1(·) denotes the first adaptive average pooling layer, and AMP1(·) denotes the first adaptive maximum pooling layer.
Preferably, in step S2.2.4, the second fusion feature is respectively input into the second adaptive mean pooling layer and the second adaptive maximum pooling layer to respectively output the third attention vector and the fourth attention vector, specifically using the following formulas:
V3 = AAP2(C[L1, L2]),
V4 = AMP2(C[L1, L2]),
wherein V3 is the third attention vector, V4 is the fourth attention vector, L1 is the second scale feature, L2 is the semantic feature, AAP2(·) denotes the second adaptive average pooling layer, and AMP2(·) denotes the second adaptive maximum pooling layer.
Preferably, in step S2.1.5, the first attention vector and the second attention vector are nonlinearly mapped by the first multi-layer perceptron to output a first mixed attention vector and a second mixed attention vector, and the first mixed attention vector and the second mixed attention vector are channel-spliced and fused to output a first fused mixed attention vector, specifically using the following formula:
Va1 = MLP1(C[V1, V2]),
in step S2.2.5, the third attention vector and the fourth attention vector are nonlinearly mapped by the second multi-layer perceptron to output a third mixed attention vector and a fourth mixed attention vector, and the third mixed attention vector and the fourth mixed attention vector are fused to output a second fused mixed attention vector, specifically using the following formula:
Va2 = MLP2(C[V3, V4]),
wherein Va1 is the first fused mixed attention vector, Va2 is the second fused mixed attention vector, MLP1(·) is the first multi-layer perceptron, and MLP2(·) is the second multi-layer perceptron.
Preferably, in steps S2.1.6 to S2.1.8, the first fused mixed attention vector is normalized to obtain a first normalized mixed attention vector, the first channel feature is weighted and mapped by the first normalized mixed attention vector, and the second feature and the weighted first channel feature are fused to output the semantic feature, specifically using the following formula:
L2 = F2 ⊕ (Sig1(Va1) ⊗ F'1),
and in steps S2.2.6 to S2.2.8, the second fused mixed attention vector is normalized to obtain a second normalized mixed attention vector, the second channel feature is weighted and mapped by the second normalized mixed attention vector, and the semantic feature and the weighted second channel feature are fused to obtain the multi-level fusion feature, specifically using the following formula:
L'2 = L2 ⊕ (Sig2(Va2) ⊗ L'1),
wherein L2 is the semantic feature, L'2 is the multi-level fusion feature, Sig1(·) denotes the first activation function, Sig2(·) denotes the second activation function, F'1 is the first channel feature, L'1 is the second channel feature, F2 is the second feature, H denotes the height of the feature map, W denotes the width of the feature map, the normalized mixed attention vectors are broadcast over the H × W spatial positions, ⊗ denotes a pixel-level dot-product operation, and ⊕ denotes a pixel-level dot-addition operation.
Correspondingly, a semantic segmentation system based on dual-feature fusion for Internet of Things perception is further provided, comprising a multi-level feature fusion module and a lightweight semantic pyramid module which are connected with each other;
the multi-layer feature fusion module comprises a backbone network unit and a proofreading unit;
the lightweight semantic pyramid module comprises a first dimension reduction unit, a second dimension reduction unit, a third dimension reduction unit, a context coding unit, a global pooling unit, a first channel splicing and fusing unit, a second channel splicing and fusing unit and an upsampling unit;
the backbone network unit is connected with a proofreading unit, the proofreading unit is respectively connected with a first dimension reduction unit and a second dimension reduction unit, the first dimension reduction unit is respectively connected with a context coding unit and a global pooling unit, the context coding unit and the global pooling unit are both connected with a first channel splicing and fusing unit, the second dimension reduction unit and the first channel splicing and fusing unit are both connected with a second channel splicing and fusing unit, the second channel splicing and fusing unit is also connected with a third dimension reduction unit, and an up-sampling unit is connected with the third dimension reduction unit;
the backbone network unit is used for performing feature coding on the original image by using a backbone network to obtain features of different scales;
the checking unit is used for learning the features with different scales through the two attention thinning blocks so as to obtain multi-level fusion features;
the first dimension reduction unit and the second dimension reduction unit are used for reducing the dimension of the multi-level fusion feature so as to output a first dimension reduction feature and a second dimension reduction feature respectively, and the first dimension reduction feature and the second dimension reduction feature are the same;
the context coding unit is used for respectively carrying out context coding on the first dimension reduction characteristics through depth separable convolutions with different convolution scales so as to obtain local characteristics with different scales;
the global pooling unit is used for performing global pooling on the first dimension reduction feature through a global mean pooling layer to obtain a global feature;
the first channel splicing and fusing unit is used for carrying out channel splicing and fusing on the global features and the local features of different scales so as to obtain multi-scale context fusion features;
the second channel splicing and fusing unit is used for carrying out channel splicing and fusing on the second dimension reduction feature and the multi-scale context fusion feature to obtain a splicing feature;
the third dimension reduction unit is used for reducing the dimension of the splicing feature;
and the up-sampling unit is used for up-sampling the splicing characteristics subjected to the dimensionality reduction so as to obtain final output.
The invention has the beneficial effects that:
(1) A multi-level feature fusion module (MFFM) is proposed that employs two recursive attention refinement blocks (ARBs) to improve the effectiveness of multi-level feature fusion. At a controllable computational cost, the proposed ARB corrects the spatial detail information in the low-level features by using the abstract semantic information of the high-level features, thereby alleviating the semantic gap among multi-level features.
(2) A lightweight semantic pyramid module (LSPM) is provided that decomposes the convolution operators to reduce the computational overhead of context information coding. In addition, the module fuses the multi-level fusion features with the multi-scale context information, enriching the information representation and thereby improving recognition accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a flow chart of the semantic segmentation method based on dual-feature fusion for Internet of Things perception according to the invention;
FIG. 2 is a schematic structural diagram of the semantic segmentation system based on dual-feature fusion for Internet of Things perception according to the invention;
FIG. 3 is a schematic structural diagram of the attention refinement block according to the invention.
Detailed Description
The following description of the embodiments of the present invention is provided by way of specific examples, and other advantages and effects of the present invention will be readily apparent to those skilled in the art from the disclosure herein. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
The first embodiment is as follows:
Referring to FIG. 1, FIG. 2 and FIG. 3, this embodiment provides a semantic segmentation method based on dual-feature fusion for Internet of Things perception, which includes the following steps:
s1, inputting an original image, and performing feature coding on the original image by using a backbone network to obtain features of different scales;
S2, learning the features of different scales through two attention refinement blocks to obtain multi-level fusion features;
s3, reducing the dimension of the multilevel fusion features to obtain dimension reduction features;
S4, context coding is carried out on the dimension reduction features by using depthwise separable convolutions with different convolution scales so as to obtain local features with different scales;
s5, performing global pooling on the dimensionality reduction features by using a global mean pooling layer to obtain global features;
s6, performing channel splicing and fusion on the global features and the local features of different scales to obtain multi-scale context fusion features;
s7, performing channel splicing and fusion on the dimensionality reduction feature and the multi-scale context fusion feature to obtain a splicing feature;
and S8, reducing the dimension of the spliced features and performing up-sampling to obtain the final output.
Wherein, step S1 specifically includes:
feature encoding is performed on the original image by using a backbone network to obtain a first feature, a second feature and a third feature, wherein the first feature is 1/4 of the original image scale, the second feature is 1/8 of the original image scale, and the third feature is 1/16 of the original image scale.
Each layer of the backbone network has a different feature expression capability. Shallower layers contain more spatial detail but lack semantic information, while deeper layers retain rich semantic information but lose a large amount of spatial detail. Intuitively, fusing information from multiple layers benefits the learning of discriminative and comprehensive feature representations.
Based on the above observation, we obtain features of different scales from the backbone network, sequentially denoted as I1/4, I1/8 and I1/16, and then unify the scales of all feature maps to the 1/8 size to reduce information loss and resource usage. Specifically, I1/4 is down-sampled by a pooling layer with kernel size 2 and stride 2 to obtain T'1/8, and the higher-level feature map I1/16 is up-sampled by a bilinear layer to obtain T''1/8. Finally, the three are fused to obtain the multi-level fusion feature O. The above process is represented by the following formulas:
T'1/8 = T(GAPk=2,s=2(I1/4)),
T''1/8 = Upsample(I1/16),
O = I1/8 ⊕ T'1/8 ⊕ T''1/8,
wherein GAPk=2,s=2(·) denotes an average pooling layer with kernel size 2 and stride 2, T(·) is a channel transform operation that changes the number of feature maps, Upsample(·) denotes an up-sampling layer, and ⊕ denotes a pixel-level dot-addition operation.
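As a concrete illustration, the following PyTorch sketch performs this scale unification under stated assumptions: the channel counts (24, 32, 96), the use of nn.AvgPool2d for GAPk=2,s=2, 1 × 1 convolutions for the channel transform T(·) on both resampled branches, and bilinear interpolation for Upsample(·) are illustrative choices, not details taken from the patent text.

```python
import torch.nn as nn
import torch.nn.functional as F

class ScaleUnify(nn.Module):
    """Bring backbone features I_1/4, I_1/8 and I_1/16 to a common 1/8 scale
    and fuse them by pixel-level addition (baseline fusion, before the ARBs).

    Channel counts (24, 32, 96) are illustrative assumptions."""
    def __init__(self, c_low=24, c_mid=32, c_high=96):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)                   # GAP with k=2, s=2
        self.t_low = nn.Conv2d(c_low, c_mid, kernel_size=1, bias=False)     # channel transform T()
        self.t_high = nn.Conv2d(c_high, c_mid, kernel_size=1, bias=False)   # channel transform (assumed)

    def forward(self, i_quarter, i_eighth, i_sixteenth):
        t_low = self.t_low(self.pool(i_quarter))                            # T'_{1/8}
        t_high = self.t_high(F.interpolate(i_sixteenth, size=i_eighth.shape[2:],
                                           mode='bilinear', align_corners=False))  # T''_{1/8}
        return i_eighth + t_low + t_high                                    # fused feature O
```

For example, with a 512 × 1024 input the three maps have spatial sizes 128 × 256, 64 × 128 and 32 × 64, and the fused map O has size 64 × 128.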
Although the above feature fusion operation facilitates the mutual utilization of complementary information between multi-level features, directly integrating low-level features with high-level features may not be efficient or comprehensive because of the semantic differences between the levels. To address this problem, the invention designs a feature refinement strategy, called the attention refinement block (ARB). Both ARBs focus on modeling the inter-channel relationships of the multi-level fusion features. In this way, when a channel contains valuable information, the model can emphasize the weights of neurons that are highly related to the target object.
Namely, step S2 includes the following steps:
S2.1, fusing the first feature and the second feature through a first attention refinement block to output semantic features;
and S2.2, fusing the semantic features and the third features through a second attention refinement block to obtain multi-level fusion features.
Further, step S2.1 specifically includes the following steps:
s2.1.1, mapping the first feature to a scale consistent with the second feature through a down-sampling layer to obtain a first scale feature;
s2.1.2, mapping the channel dimension of the first scale feature to be consistent with the channel dimension of the second feature through the first 1 x 1 convolution layer to obtain a first channel feature;
s2.1.3, channel splicing and fusing the first scale features and the second features to obtain first fused features;
S2.1.4, inputting the first fusion feature into a first adaptive average pooling layer (AAP) and a first adaptive maximum pooling layer (AMP) respectively to output a first attention vector and a second attention vector respectively; both the adaptive average pooling layer (AAP) and the adaptive maximum pooling layer (AMP) model the importance of each feature channel by weighting all channels of the multi-level fused feature: the higher the importance of a feature channel, the larger its corresponding weight.
S2.1.5, performing nonlinear mapping on the first attention vector and the second attention vector through the first multi-layer perceptron to output a first mixed attention vector and a second mixed attention vector, which improves the nonlinearity and robustness of the features, and fusing the first mixed attention vector and the second mixed attention vector to output a first fused mixed attention vector;
s2.1.6, normalizing the first fused mixed attention vector to obtain a first normalized mixed attention vector;
s2.1.7, mapping the first channel feature with the first normalized mixed attention vector weighting;
s2.1.8, fusing the second feature and the weighted first channel feature to output a semantic feature.
Further, step S2.2 specifically includes the following steps:
s2.2.1, mapping the third feature to a scale consistent with the second feature through an upsampling layer to obtain a second scale feature;
s2.2.2, mapping the channel dimension of the second scale feature to be consistent with the channel dimension of the second feature through the second 1 x 1 convolution layer to obtain a second channel feature;
s2.2.3, channel splicing and fusing the second scale features and the semantic features to obtain second fusion features;
s2.2.4, inputting the second fusion feature into a second adaptive mean pooling layer and a second adaptive maximum pooling layer respectively to output a third attention vector and a fourth attention vector respectively;
S2.2.5, performing nonlinear mapping on the third attention vector and the fourth attention vector through the second multi-layer perceptron to output a third mixed attention vector and a fourth mixed attention vector, and fusing the third mixed attention vector and the fourth mixed attention vector to output a second fused mixed attention vector;
s2.2.6, normalizing the second fused mixed attention vector to obtain a second normalized mixed attention vector;
s2.2.7, mapping the second channel feature with a second normalized mixed attention vector weighting;
s2.2.8, fusing the semantic features and the weighted second channel features to obtain a multi-level fused feature.
Further, in step S2.1.4, the first fusion feature is respectively input into the first adaptive mean pooling layer and the first adaptive maximum pooling layer to respectively output the first attention vector and the second attention vector, specifically using the following formulas:
V1 = AAP1(C[F1, F2]),
V2 = AMP1(C[F1, F2]),
wherein V1 is the first attention vector, V2 is the second attention vector, F1 is the first scale feature, F2 is the second feature, C[·] denotes channel splicing and fusion, AAP1(·) denotes the first adaptive average pooling layer, and AMP1(·) denotes the first adaptive maximum pooling layer.
In step S2.2.4, the second fusion feature is respectively input into the second adaptive average pooling layer and the second adaptive maximum pooling layer to respectively output the third attention vector and the fourth attention vector, specifically using the following formulas:
V3 = AAP2(C[L1, L2]),
V4 = AMP2(C[L1, L2]),
wherein V3 is the third attention vector, V4 is the fourth attention vector, L1 is the second scale feature, L2 is the semantic feature, AAP2(·) denotes the second adaptive average pooling layer, and AMP2(·) denotes the second adaptive maximum pooling layer.
Further, in step S2.1.5, the first attention vector and the second attention vector are nonlinearly mapped by the first multi-layer perceptron to output a first mixed attention vector and a second mixed attention vector, and the first mixed attention vector and the second mixed attention vector are channel-spliced and fused to output a first fused mixed attention vector, specifically using the following formula:
Va1 = MLP1(C[V1, V2]),
in step S2.2.5, the third attention vector and the fourth attention vector are nonlinearly mapped by the second multi-layer perceptron to output a third mixed attention vector and a fourth mixed attention vector, and the third mixed attention vector and the fourth mixed attention vector are fused to output a second fused mixed attention vector, specifically using the following formula:
Va2 = MLP2(C[V3, V4]),
wherein Va1 is the first fused mixed attention vector, Va2 is the second fused mixed attention vector, MLP1(·) is the first multi-layer perceptron, and MLP2(·) is the second multi-layer perceptron.
Further, in steps S2.1.6 to S2.1.8, the first fused mixed attention vector is normalized to obtain a first normalized mixed attention vector, the first channel feature is weighted and mapped by the first normalized mixed attention vector, and the second feature and the weighted first channel feature are fused to output the semantic feature, specifically using the following formula:
L2 = F2 ⊕ (Sig1(Va1) ⊗ F'1),
and in steps S2.2.6 to S2.2.8, the second fused mixed attention vector is normalized to obtain a second normalized mixed attention vector, the second channel feature is weighted and mapped by the second normalized mixed attention vector, and the semantic feature and the weighted second channel feature are fused to obtain the multi-level fusion feature, specifically using the following formula:
L'2 = L2 ⊕ (Sig2(Va2) ⊗ L'1),
wherein L2 is the semantic feature, L'2 is the multi-level fusion feature, Sig1(·) denotes the first activation function, Sig2(·) denotes the second activation function, F'1 is the first channel feature, L'1 is the second channel feature, F2 is the second feature, H denotes the height of the feature map, W denotes the width of the feature map, the normalized mixed attention vectors are broadcast over the H × W spatial positions, ⊗ denotes a pixel-level dot-product operation, and ⊕ denotes a pixel-level dot-addition operation.
Technically, the design of the ARB can be regarded as an information calibration strategy: two attention-based paths predict the importance of each channel in a complementary manner, so that more semantic information is transferred to the low-level features, alleviating the semantic gap between features of different levels and achieving effective feature fusion. The experimental results in the following section verify the effectiveness of this design. It is worth noting that the ARBs have only 0.03M parameters in total, so the entire multi-level feature fusion remains computationally lightweight.
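To make the data flow concrete, the following PyTorch sketch shows one possible reading of the ARB described above: scale alignment, 1 × 1 channel mapping, channel splicing, adaptive average/max pooling, a multi-layer perceptron, sigmoid normalization, channel-wise weighting, and pixel-level addition. The MLP width, the ReLU activation, and the form of the `resample` argument are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionRefinementBlock(nn.Module):
    """Attention refinement block (ARB) sketch: fuse an adjacent-level feature
    into a reference feature via channel attention.

    `resample` spatially aligns the incoming feature to the reference and is
    assumed not to change its channel count (e.g. AvgPool2d or Upsample)."""
    def __init__(self, c_in, c_ref, resample):
        super().__init__()
        self.resample = resample
        self.align = nn.Conv2d(c_in, c_ref, kernel_size=1, bias=False)  # 1x1 channel mapping
        self.aap = nn.AdaptiveAvgPool2d(1)    # adaptive average pooling -> V1 / V3
        self.amp = nn.AdaptiveMaxPool2d(1)    # adaptive max pooling     -> V2 / V4
        self.mlp = nn.Sequential(             # multi-layer perceptron (width assumed)
            nn.Linear(2 * (c_in + c_ref), c_ref),
            nn.ReLU(inplace=True),
            nn.Linear(c_ref, c_ref),
        )
        self.sig = nn.Sigmoid()               # normalization of the mixed attention vector

    def forward(self, x, ref):
        x = self.resample(x)                        # scale alignment
        x_aligned = self.align(x)                   # channel alignment
        fused = torch.cat([x, ref], dim=1)          # channel splicing and fusion
        v_avg = self.aap(fused).flatten(1)          # first (third) attention vector
        v_max = self.amp(fused).flatten(1)          # second (fourth) attention vector
        v = self.mlp(torch.cat([v_avg, v_max], 1))  # fused mixed attention vector
        w = self.sig(v).unsqueeze(-1).unsqueeze(-1) # normalized, broadcast over H x W
        return ref + w * x_aligned                  # weighted mapping + pixel-level addition
```

Under this sketch, the first ARB would be built with a stride-2 average-pooling resampler to fuse the 1/4-scale feature into the 1/8-scale feature, and the second ARB with a ×2 bilinear up-sampler to fuse the 1/16-scale feature into the resulting semantic feature.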
Further, to improve the computational efficiency of the context extraction module, the invention adopts a depthwise separable factorized convolution (DFC) operation in place of the standard convolutional layer. Inspired by depthwise separable convolution and factorized convolution, the main idea of the lightweight feature extraction is to integrate these two techniques. First, a normalization layer and an activation function are used as two preprocessing steps to improve the regularity of the convolutional layer; next, the 3 × 3 depthwise convolution is factorized into two depthwise separable convolution layers with kernel sizes of 3 × 1 and 1 × 3, respectively. In this way, the sparsity of the dense convolution kernels is kept uniform across all channels, which reduces the computational complexity and resource overhead of the convolution. Finally, the local features of all scales and the global feature are fused to obtain the multi-scale context fusion feature.
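A minimal sketch of one DFC branch follows, assuming batch normalization and ReLU as the two preprocessing layers, a trailing 1 × 1 pointwise convolution, and dilation as the mechanism that realizes the different convolution scales; these specifics are illustrative assumptions rather than details stated in the text.

```python
import torch.nn as nn

class DFC(nn.Module):
    """Depthwise separable factorized convolution branch (sketch).

    A 3x3 depthwise convolution is factorized into 3x1 and 1x3 depthwise
    convolutions, preceded by normalization and activation and followed by a
    1x1 pointwise convolution; the dilation rate sets this branch's scale."""
    def __init__(self, channels, dilation=1):
        super().__init__()
        d = dilation
        self.block = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),                # preprocessing
            nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(d, 0),
                      dilation=(d, 1), groups=channels, bias=False),        # 3x1 depthwise
            nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, d),
                      dilation=(1, d), groups=channels, bias=False),        # 1x3 depthwise
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),       # pointwise
        )

    def forward(self, x):
        return self.block(x)
```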
After the multi-scale context is coded, the dimension-reduced multi-level fusion feature is further combined with the global feature and the local features of different scales to predict the final segmentation result. This design has two advantages: on one hand, multi-level information and multi-scale context information are integrated in a unified framework to achieve a more effective feature representation; on the other hand, the skip connection encourages information transfer and gradient propagation from the earlier multi-level information, thereby improving recognition efficiency.
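Building on the DFC sketch above, the following assembly illustrates how the lightweight semantic pyramid module could combine dimension reduction, multi-scale local branches, a global average pooling branch, channel splicing, the skip concatenation, and the final up-sampling; the number of branches, the dilation rates, and the single shared dimension-reduction layer are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSPM(nn.Module):
    """Lightweight semantic pyramid module (sketch); branch layout is assumed."""
    def __init__(self, c_in, c_red, num_classes, dilations=(1, 2, 4)):
        super().__init__()
        self.reduce = nn.Conv2d(c_in, c_red, kernel_size=1, bias=False)   # dimension reduction
        self.branches = nn.ModuleList([DFC(c_red, d) for d in dilations]) # local context branches
        self.gap = nn.AdaptiveAvgPool2d(1)                                # global context branch
        c_ctx = c_red * (len(dilations) + 1)                              # local + global channels
        self.fuse = nn.Conv2d(c_ctx + c_red, num_classes, kernel_size=1)  # reduction to class logits

    def forward(self, x, out_size):
        r = self.reduce(x)
        local = [b(r) for b in self.branches]                             # multi-scale local features
        g = F.interpolate(self.gap(r), size=r.shape[2:],
                          mode='bilinear', align_corners=False)           # global feature
        ctx = torch.cat(local + [g], dim=1)                               # multi-scale context fusion
        out = self.fuse(torch.cat([r, ctx], dim=1))                       # skip concatenation + reduction
        return F.interpolate(out, size=out_size,
                             mode='bilinear', align_corners=False)        # up-sample to input size
```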
The key points of the technology of the invention are as follows:
(1) The invention discloses a novel dual-feature fusion real-time semantic segmentation network (DFFNet) oriented to the Internet of Things. Compared with state-of-the-art methods, DFFNet reduces FLOPs by a factor of about 2.5 and increases execution speed by a factor of 1.8, while achieving better accuracy.
(2) A multi-level feature fusion module (MFFM) is proposed that employs two recursive attention refinement blocks (ARBs) to improve the effectiveness of multi-level feature fusion. At a controllable computational cost, the proposed ARB corrects the spatial detail information in the low-level features by using the abstract semantic information of the high-level features, thereby alleviating the semantic gap among multi-level features.
(3) A lightweight semantic pyramid module (LSPM) is provided that decomposes the convolution operators to reduce the computational overhead of context information coding. In addition, the module fuses the multi-level fusion features with the multi-scale context information, enriching the information representation and thereby improving recognition accuracy.
Further, this embodiment also compares the invention against existing methods on benchmark data to verify its effectiveness.
Data set: the data set used in the invention is the widely recognized scene perception benchmark Cityscapes, consisting of 25000 annotated images at 2048 × 1024 resolution. The annotation set contains 30 classes, 19 of which are used for training and evaluation. In the experiments of the invention, only the 5000 finely annotated images were used: 2975 images for training, 500 for validation, and 1525 for testing.
Parameter settings: all experiments were run on an NVIDIA GTX 1080 Ti GPU card. Images were randomly scaled by a factor of 0.5 to 1.5 and randomly flipped horizontally on the training set. The initial learning rate was set to 0.005 and decayed with a poly strategy. The network was trained with a stochastic gradient descent optimizer by minimizing the pixel-wise cross-entropy loss, with a momentum of 0.9 and a weight decay of 5e-4. Finally, a batch normalization layer was applied before all standard or dilated convolutional layers to achieve fast convergence.
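A minimal sketch of this training schedule in PyTorch is given below; the model and data loader are placeholders supplied by the caller, and the total iteration count, the poly power of 0.9, and the ignore index of 255 are conventional assumptions not stated in the text.

```python
import torch
import torch.nn as nn

def train_dffnet(model, train_loader, max_iter=60000, base_lr=0.005, power=0.9):
    """Train with SGD and poly learning-rate decay as described above.

    max_iter and power are assumptions; labels are expected as long class-index
    maps with 255 marking ignored pixels (a common Cityscapes convention)."""
    criterion = nn.CrossEntropyLoss(ignore_index=255)            # pixel-wise cross entropy
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                                momentum=0.9, weight_decay=5e-4)
    it = 0
    while it < max_iter:
        for images, labels in train_loader:
            lr = base_lr * (1.0 - it / max_iter) ** power        # poly decay
            for group in optimizer.param_groups:
                group['lr'] = lr
            loss = criterion(model(images), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            it += 1
            if it >= max_iter:
                break
    return model
```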
Evaluation metrics: the invention adopts four evaluation metrics widely recognized in the semantic segmentation field: segmentation accuracy (mIoU), inference speed, number of network parameters, and computational complexity (FLOPs).
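For the accuracy metric, mean intersection-over-union can be computed from a per-class confusion matrix as sketched below; this is the standard definition of mIoU rather than anything specific to the patent.

```python
import numpy as np

def confusion(pred, label, num_classes=19, ignore=255):
    """Confusion matrix for one image (rows: ground truth, columns: prediction);
    pixels labelled `ignore` are skipped."""
    mask = label != ignore
    idx = num_classes * label[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(conf_matrix):
    """Mean IoU over all classes; absent classes contribute an IoU of 0 here."""
    tp = np.diag(conf_matrix)
    fp = conf_matrix.sum(axis=0) - tp
    fn = conf_matrix.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1)
    return float(iou.mean())
```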
Ablation experiments on the multi-level feature fusion module:
As shown in Table 1, the invention compares four multi-level feature fusion variants with a baseline model: element-wise additive fusion (EAF), average-pooling attention refinement (AAR), maximum-pooling attention refinement (MAR), and the combination of AAR and MAR. As the table shows, EAF is only 1.12% higher than the baseline network, which indicates that directly fusing the multi-level features is a sub-optimal solution. Compared with the baseline network, AAR and MAR achieve improvements of 2.61% and 2.54% mIoU respectively, which shows that modeling the inter-channel dependencies can reduce the semantic gap between multi-level features. The bilateral pooling attention strategy proposed by the invention lets the saliency information and the global information compensate each other; thus MFFM achieves a further gain of 0.55% and 0.62% mIoU over AAR and MAR respectively. In addition, the proposed MFFM adds negligible extra computation (only 0.06M parameters and 0.11 GFLOPs), which verifies the efficiency and effectiveness of the proposed module.
| Model | Speed (ms) | Params (M) | FLOPs (G) | mIoU (%) |
|---|---|---|---|---|
| Baseline | 15.40 | 1.82 | 2.79 | 67.83 |
| EAF | 15.81 | 1.85 | 2.90 | 68.95 |
| AAR | 15.80 | 1.86 | 2.90 | 70.44 |
| MAR | 15.81 | 1.86 | 2.90 | 70.37 |
| AAR+MAR | 16.03 | 1.88 | 2.90 | 70.99 |
TABLE 1 Ablation study of the multi-level feature fusion module
Ablation experiments on the lightweight semantic pyramid module:
This experiment evaluates the performance of the lightweight semantic pyramid module. SC-SPM, FC-SPM, DC-SPM and DFC-SPM denote four semantic pyramid module variants built on standard convolution, factorized convolution, depthwise convolution and depthwise separable factorized convolution, respectively. As shown in Table 2: 1) compared with the reference model EAF, the semantic segmentation methods with a semantic pyramid module improve the mIoU by about 1.11% to 2.70%, which indicates that extracting local and global context information can significantly improve the learning ability of the model; 2) although SC-SPM, FC-SPM, DC-SPM and DFC-SPM achieve similar accuracy, building the semantic pyramid module on efficient convolutions achieves better efficiency (faster speed and lower computational complexity) than building it on standard convolution; DFC-SPM achieves 71.02% mIoU with only 0.05M extra parameters and 0.20 GFLOPs; 3) the LSPM further integrates the context information with the multi-level feature information through a short-range feature learning operation, which encourages information transfer and gradient propagation from the earlier multi-level information. As a result, the accuracy is improved from 71.02% mIoU for DFC-SPM to 71.65% mIoU. The above results demonstrate the efficiency and effectiveness of the proposed LSPM.
| Model | Speed (ms) | Params (M) | FLOPs (G) | mIoU (%) |
|---|---|---|---|---|
| EAF | 15.81 | 1.85 | 2.90 | 68.95 |
| SC-SPM | 16.22 | 2.11 | 4.43 | 70.81 |
| FC-SPM | 16.10 | 2.03 | 3.72 | 70.06 |
| DC-SPM | 15.76 | 1.90 | 3.11 | 71.00 |
| DFC-SPM | 15.72 | 1.90 | 3.10 | 71.02 |
| LSPM | 15.65 | 1.89 | 3.06 | 71.65 |
TABLE 2 Ablation study of the lightweight semantic pyramid module
Evaluation on the benchmark dataset:
On the Cityscapes dataset, DFFNet is compared with other existing semantic segmentation methods. "-" indicates that the corresponding performance value is not reported by the method.
TABLE 3 Overall performance of the proposed method and comparison methods on the Cityscapes dataset
As shown in Table 3, SegNet and ENet improve speed by heavily compressing the model scale at the expense of segmentation accuracy. LW-RefineNet and ERFNet design asymmetric encoder-decoder structures to maintain a balance between accuracy and efficiency. BiSeNet, CANet and ICNet adopt multi-branch structures and achieve a good balance between accuracy and speed, but introduce more additional learnable parameters. In contrast, DFFNet achieves better accuracy and efficiency, particularly in terms of the reduction in network parameters (1.9M parameters) and computational complexity (3.1 GFLOPs). In addition, FCN and Dilation10 use computationally expensive VGG backbone networks (e.g., VGG16 and VGG19) as feature extractors, requiring 2 seconds or more to process an image. DRN, DeepLab v2, RefineNet and PSPNet employ deep ResNet backbone networks (e.g., ResNet50 and ResNet101) to enhance the multi-scale feature representation, at the cost of substantial computation and memory usage. Compared with these accuracy-oriented methods, the proposed method needs only 12 ms to process an image of 640 × 360 resolution while achieving a segmentation accuracy of 71.0% mIoU.
In conclusion, the method achieves well-rounded segmentation performance in terms of accuracy and efficiency (inference speed, network parameters and computational complexity), and therefore has great potential for deployment on resource-constrained Internet of Things devices.
Example two:
Referring to FIG. 3, this embodiment provides a semantic segmentation system based on dual-feature fusion for Internet of Things perception, which comprises a multi-level feature fusion module and a lightweight semantic pyramid module connected with each other;
the multi-level feature fusion module comprises a backbone network unit and a refinement unit;
the lightweight semantic pyramid module comprises a first dimension reduction unit, a second dimension reduction unit, a third dimension reduction unit, a context coding unit, a global pooling unit, a first channel splicing and fusing unit, a second channel splicing and fusing unit and an upsampling unit;
the backbone network unit is connected with the refinement unit, the refinement unit is respectively connected with a first dimension reduction unit and a second dimension reduction unit, the first dimension reduction unit is respectively connected with a context coding unit and a global pooling unit, the context coding unit and the global pooling unit are both connected with a first channel splicing and fusing unit, the second dimension reduction unit and the first channel splicing and fusing unit are both connected with a second channel splicing and fusing unit, the second channel splicing and fusing unit is also connected with a third dimension reduction unit, and the up-sampling unit is connected with the third dimension reduction unit;
the backbone network unit is used for performing feature coding on the original image by using a backbone network to obtain features of different scales;
the refinement unit is used for learning the features of different scales through the two attention refinement blocks so as to obtain multi-level fusion features;
the first dimension reduction unit and the second dimension reduction unit are used for reducing the dimension of the multi-level fusion feature so as to output a first dimension reduction feature and a second dimension reduction feature respectively, and the first dimension reduction feature and the second dimension reduction feature are the same;
the context coding unit is used for respectively carrying out context coding on the first dimension reduction characteristics through depth separable convolutions with different convolution scales so as to obtain local characteristics with different scales;
the global pooling unit is used for performing global pooling on the first dimension reduction feature through a global mean pooling layer to obtain a global feature;
the first channel splicing and fusing unit is used for carrying out channel splicing and fusing on the global features and the local features of different scales so as to obtain multi-scale context fusion features;
the second channel splicing and fusing unit is used for carrying out channel splicing and fusing on the second dimension reduction feature and the multi-scale context fusion feature to obtain a splicing feature;
the third dimension reduction unit is used for reducing the dimension of the splicing feature;
and the up-sampling unit is used for up-sampling the splicing characteristics subjected to the dimensionality reduction so as to obtain final output.
It should be noted that the semantic segmentation system based on dual-feature fusion for Internet of Things perception provided in this embodiment corresponds to the method of the first embodiment, and its details are therefore not repeated here.
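To show how the units above fit together end to end, the sketch below wires a backbone, the two ARBs and the LSPM into one network, reusing the AttentionRefinementBlock, DFC and LSPM classes sketched in the first embodiment; the MobileNetV2 backbone, its split points and the channel counts are purely illustrative assumptions, and input sizes are assumed divisible by 32.

```python
import torch.nn as nn
import torchvision

class DFFNet(nn.Module):
    """Illustrative assembly of the described system (not the patented
    implementation): backbone -> two ARBs -> LSPM -> up-sampled class logits."""
    def __init__(self, num_classes=19):
        super().__init__()
        feats = torchvision.models.mobilenet_v2(weights=None).features  # assumed backbone
        self.stage1 = feats[:4]    # 1/4 scale, 24 channels (first feature)
        self.stage2 = feats[4:7]   # 1/8 scale, 32 channels (second feature)
        self.stage3 = feats[7:14]  # 1/16 scale, 96 channels (third feature)
        self.arb1 = AttentionRefinementBlock(24, 32, nn.AvgPool2d(2, 2))
        self.arb2 = AttentionRefinementBlock(
            96, 32, nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False))
        self.lspm = LSPM(32, 32, num_classes)

    def forward(self, x):
        f4 = self.stage1(x)
        f8 = self.stage2(f4)
        f16 = self.stage3(f8)
        sem = self.arb1(f4, f8)        # first ARB: semantic feature
        fused = self.arb2(f16, sem)    # second ARB: multi-level fusion feature
        return self.lspm(fused, x.shape[2:])
```

Under these assumptions, a forward pass on a 1 × 3 × 512 × 1024 tensor yields 1 × 19 × 512 × 1024 class logits.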
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solutions of the present invention by those skilled in the art should fall within the protection scope of the present invention without departing from the design spirit of the present invention.
Claims (10)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110446945.6A CN113221969A (en) | 2021-04-25 | 2021-04-25 | Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion |
| PCT/CN2022/081427 WO2022227913A1 (en) | 2021-04-25 | 2022-03-17 | Double-feature fusion semantic segmentation system and method based on internet of things perception |
| LU503090A LU503090B1 (en) | 2021-04-25 | 2022-03-17 | A semantic segmentation system and method based on dual feature fusion for iot sensing |
| ZA2022/07731A ZA202207731B (en) | 2021-04-25 | 2022-07-12 | A semantic segmentation system and method based on dual feature fusion for iot sensing |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110446945.6A CN113221969A (en) | 2021-04-25 | 2021-04-25 | Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN113221969A true CN113221969A (en) | 2021-08-06 |
Family
ID=77088741
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110446945.6A Withdrawn CN113221969A (en) | 2021-04-25 | 2021-04-25 | Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion |
Country Status (4)
| Country | Link |
|---|---|
| CN (1) | CN113221969A (en) |
| LU (1) | LU503090B1 (en) |
| WO (1) | WO2022227913A1 (en) |
| ZA (1) | ZA202207731B (en) |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114445430A (en) * | 2022-04-08 | 2022-05-06 | 暨南大学 | Real-time image semantic segmentation method and system based on lightweight multi-scale feature fusion |
| CN114821768A (en) * | 2022-03-18 | 2022-07-29 | 中国科学院自动化研究所 | Skeleton behavior identification method and device and electronic equipment |
| CN114821042A (en) * | 2022-04-27 | 2022-07-29 | 南京国电南自轨道交通工程有限公司 | An R-FCN knife gate detection method combining local and global features |
| CN114913325A (en) * | 2022-03-24 | 2022-08-16 | 北京百度网讯科技有限公司 | Semantic segmentation method, device and computer program product |
| WO2022227913A1 (en) * | 2021-04-25 | 2022-11-03 | 浙江师范大学 | Double-feature fusion semantic segmentation system and method based on internet of things perception |
| CN116740866A (en) * | 2023-08-11 | 2023-09-12 | 上海银行股份有限公司 | Banknote loading and clearing system and method for self-service machine |
| CN119399457A (en) * | 2024-09-18 | 2025-02-07 | 广州大学 | A real-time semantic segmentation method and system for multi-shape pyramids in traffic scenes |
Families Citing this family (59)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115272677A (en) * | 2022-08-01 | 2022-11-01 | 安徽理工大学环境友好材料与职业健康研究院(芜湖) | Multi-scale feature fusion semantic segmentation method, equipment and storage medium |
| CN115713624A (en) * | 2022-09-02 | 2023-02-24 | 郑州大学 | Self-adaptive fusion semantic segmentation method for enhancing multi-scale features of remote sensing image |
| CN115830449B (en) * | 2022-12-01 | 2025-09-02 | 北京理工大学重庆创新中心 | Remote sensing target detection method guided by explicit contours and enhanced by spatially varying context |
| CN116664875A (en) * | 2023-01-16 | 2023-08-29 | 河北师范大学 | PVT-based saliency target detection method for gating network |
| CN116030307B (en) * | 2023-02-03 | 2025-08-05 | 山东大学 | Breast pathology image recognition system based on context-aware multi-scale feature fusion |
| CN116229065B (en) * | 2023-02-14 | 2023-12-01 | 湖南大学 | A segmentation method for robotic surgical instruments based on multi-branch fusion |
| CN116342884B (en) * | 2023-03-28 | 2024-02-06 | 阿里云计算有限公司 | Image segmentation and model training method and server |
| CN116052007B (en) * | 2023-03-30 | 2023-08-11 | 山东锋士信息技术有限公司 | Remote sensing image change detection method integrating time and space information |
| CN116434069B (en) * | 2023-04-27 | 2025-09-19 | 南京信息工程大学 | Remote sensing image change detection method based on local-global transducer network |
| CN116205928B (en) * | 2023-05-06 | 2023-07-18 | 南方医科大学珠江医院 | Image segmentation processing method, device and equipment for laparoscopic surgery video and medium |
| CN116580241B (en) * | 2023-05-22 | 2024-05-14 | 内蒙古农业大学 | Image processing method and system based on dual-branch multi-scale semantic segmentation network |
| CN116630386B (en) * | 2023-06-12 | 2024-02-20 | 新疆生产建设兵团医院 | CTA scanning image processing method and system thereof |
| CN116721253B (en) * | 2023-06-12 | 2025-08-15 | 湖南科技大学 | Abdominal CT image multi-organ segmentation method based on deep learning |
| CN117115435B (en) * | 2023-06-30 | 2025-07-18 | 重庆理工大学 | Attention and multi-scale feature extraction-based real-time semantic segmentation method |
| CN116721351B (en) * | 2023-07-06 | 2024-06-18 | 内蒙古电力(集团)有限责任公司内蒙古超高压供电分公司 | Remote sensing intelligent extraction method for road environment characteristics in overhead line channel |
| CN116559778B (en) * | 2023-07-11 | 2023-09-29 | 海纳科德(湖北)科技有限公司 | Vehicle whistle positioning method and system based on deep learning |
| CN116612124B (en) * | 2023-07-21 | 2023-10-20 | 国网四川省电力公司电力科学研究院 | A transmission line defect detection method based on dual-branch serial hybrid attention |
| CN116721420B (en) * | 2023-08-10 | 2023-10-20 | 南昌工程学院 | A method and system for constructing a semantic segmentation model for UV images of electrical equipment |
| CN117115443B (en) * | 2023-08-18 | 2024-06-11 | 中南大学 | A segmentation method for identifying small infrared targets |
| CN117058383B (en) * | 2023-08-18 | 2025-06-20 | 河南大学 | An efficient and lightweight real-time semantic segmentation method for urban street scenes |
| CN117197763B (en) * | 2023-09-07 | 2025-09-26 | 湖北工业大学 | Road crack detection method and system based on cross-attention guided feature alignment network |
| CN117095172B (en) * | 2023-09-09 | 2025-07-04 | 西北工业大学 | A continuous semantic segmentation method based on internal and external distillation |
| CN117314787B (en) * | 2023-11-14 | 2025-07-08 | 河北工业大学 | Underwater image enhancement method based on adaptive multi-scale fusion and attention mechanism |
| CN117636165A (en) * | 2023-11-30 | 2024-03-01 | 电子科技大学 | A multi-task remote sensing semantic change detection method based on token mixing |
| CN118212543B (en) * | 2023-12-11 | 2024-10-22 | 自然资源部国土卫星遥感应用中心 | Bilateral fusion and lightweight network improved radiation abnormal target detection method |
| CN117809294B (en) * | 2023-12-29 | 2024-07-19 | 天津大学 | A text detection method based on feature correction and difference-guided attention |
| CN117710694B (en) * | 2024-01-12 | 2024-10-22 | 中国科学院自动化研究所 | Multimode characteristic information acquisition method and system, electronic equipment and storage medium |
| CN117876929B (en) * | 2024-01-12 | 2024-06-21 | 天津大学 | A temporal object localization method based on progressive multi-scale context learning |
| CN117593633B (en) * | 2024-01-19 | 2024-06-14 | 宁波海上鲜信息技术股份有限公司 | Ocean scene-oriented image recognition method, system, equipment and storage medium |
| CN117745745B (en) * | 2024-02-18 | 2024-05-10 | 湖南大学 | CT image segmentation method based on context fusion perception |
| CN118037664B (en) * | 2024-02-20 | 2024-10-01 | 成都天兴山田车用部品有限公司 | Deep hole surface defect detection and CV size calculation method |
| CN117789153B (en) * | 2024-02-26 | 2024-05-03 | 浙江驿公里智能科技有限公司 | Automobile oil tank outer cover positioning system and method based on computer vision |
| CN117828280B (en) * | 2024-03-05 | 2024-06-07 | 山东新科建工消防工程有限公司 | Intelligent fire information acquisition and management method based on Internet of things |
| CN118052739B (en) * | 2024-03-08 | 2025-01-14 | 东莞理工学院 | A traffic image defogging method and intelligent traffic image processing system based on deep learning |
| CN117993442B (en) * | 2024-03-21 | 2024-10-18 | 济南大学 | Hybrid neural network method and system for fusing local and global information |
| CN118072357B (en) * | 2024-04-16 | 2024-07-02 | 南昌理工学院 | Control method and system of intelligent massage robot |
| CN118429808B (en) * | 2024-05-10 | 2024-12-17 | 北京信息科技大学 | Remote sensing image road extraction method and system based on lightweight network structure |
| CN118230175B (en) * | 2024-05-23 | 2024-08-13 | 济南市勘察测绘研究院 | Real estate mapping data processing method and system based on artificial intelligence |
| CN118366000B (en) * | 2024-06-14 | 2024-10-29 | 陕西天润科技股份有限公司 | Cultural relic health management method based on digital twinning |
| CN118587506A (en) * | 2024-06-19 | 2024-09-03 | 兰州大学 | A deep learning-based atmospheric cloud classification method |
| CN118397298B (en) * | 2024-06-28 | 2024-09-06 | 杭州安脉盛智能技术有限公司 | Self-attention spatial pyramid pooling method based on mixed pooling and related components |
| CN118429335B (en) * | 2024-07-02 | 2024-09-24 | 新疆胜新复合材料有限公司 | Online defect detection system and method for carbon fiber sucker rod |
| CN118470679B (en) * | 2024-07-10 | 2024-09-24 | 山东省计算中心(国家超级计算济南中心) | A lightweight lane line segmentation and recognition method and system |
| CN118485835B (en) * | 2024-07-16 | 2024-10-01 | 杭州电子科技大学 | Multispectral image semantic segmentation method based on modal divergence difference fusion |
| CN118898718B (en) * | 2024-07-25 | 2025-04-18 | 中国矿业大学 | A semantic segmentation method with enhanced boundary perception |
| CN119168951A (en) * | 2024-08-27 | 2024-12-20 | 上海茹钰生物科技有限公司 | Essence liquid automated production line and method thereof |
| CN118840559B (en) * | 2024-09-20 | 2024-12-13 | 泉州职业技术大学 | Rail surface defect segmentation method and device based on ordered cross-scale feature interaction |
| CN119048763B (en) * | 2024-10-30 | 2025-04-08 | 江西师范大学 | A colonoscopy polyp image segmentation method based on hybrid model |
| CN119068201B (en) * | 2024-11-04 | 2025-04-22 | 江西师范大学 | Image segmentation method and system based on multistage multi-scale gradual fusion network |
| CN119152075B (en) * | 2024-11-11 | 2025-02-14 | 浙江大学 | Object elimination method and device for environment interaction perception association |
| CN119649022B (en) * | 2024-11-14 | 2025-07-04 | 华东交通大学 | Real-time semantic segmentation tunnel over-excavation and under-excavation monitoring method and system |
| CN119151802A (en) * | 2024-11-15 | 2024-12-17 | 无锡学院 | Method, system, equipment and storage medium for fusing infrared image and visible light image |
| CN119577685B (en) * | 2024-11-29 | 2025-09-26 | 西安电子科技大学 | S-D network full-level sensing-based efficient detection system and detection method thereof |
| CN119314086B (en) * | 2024-12-13 | 2025-03-25 | 浙江师范大学 | Image matting method |
| CN119810663B (en) * | 2024-12-31 | 2025-09-30 | 同济大学 | Road extraction method based on mixed attention mechanism and direction prior |
| CN119888285B (en) * | 2025-03-26 | 2025-07-22 | 厦门理工学院 | A multi-scale image matching method and system |
| CN120216970B (en) * | 2025-05-30 | 2025-09-19 | 大连理工大学 | Remaining life prediction method and device based on multi-scale decomposition enhancement |
| CN120374992B (en) * | 2025-06-30 | 2025-09-12 | 江苏富翰医疗产业发展有限公司 | Image segmentation method based on spatial attention mechanism |
| CN120526608B (en) * | 2025-07-25 | 2025-10-03 | 湖南工商大学 | Road traffic flow prediction method based on spatiotemporal hybrid attention network |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150104102A1 (en) * | 2013-10-11 | 2015-04-16 | Universidade De Coimbra | Semantic segmentation method with second-order pooling |
| CN112651973B (en) * | 2020-12-14 | 2022-10-28 | 南京理工大学 | Semantic segmentation method based on cascade of feature pyramid attention and mixed attention |
| CN113221969A (en) * | 2021-04-25 | 2021-08-06 | 浙江师范大学 | Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion |
- 2021
  - 2021-04-25 CN CN202110446945.6A patent/CN113221969A/en not_active Withdrawn
- 2022
  - 2022-03-17 WO PCT/CN2022/081427 patent/WO2022227913A1/en not_active Ceased
  - 2022-03-17 LU LU503090A patent/LU503090B1/en active IP Right Grant
  - 2022-07-12 ZA ZA2022/07731A patent/ZA202207731B/en unknown
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111210432A (en) * | 2020-01-12 | 2020-05-29 | 湘潭大学 | An image semantic segmentation method based on multi-scale and multi-level attention mechanism |
| CN111915619A (en) * | 2020-06-05 | 2020-11-10 | 华南理工大学 | A fully convolutional network semantic segmentation method with dual feature extraction and fusion |
| CN111932553A (en) * | 2020-07-27 | 2020-11-13 | 北京航空航天大学 | Remote sensing image semantic segmentation method based on area description self-attention mechanism |
Non-Patent Citations (1)
| Title |
|---|
| XIANGYAN TANG et al.: "DFFNet: An IoT-perceptive dual feature fusion network for general real-time semantic segmentation", INFORMATION SCIENCES 565 (2021) 326–343, 12 February 2021 (2021-02-12), pages 2 - 4 * |
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2022227913A1 (en) * | 2021-04-25 | 2022-11-03 | 浙江师范大学 | Double-feature fusion semantic segmentation system and method based on internet of things perception |
| CN114821768A (en) * | 2022-03-18 | 2022-07-29 | 中国科学院自动化研究所 | Skeleton behavior identification method and device and electronic equipment |
| CN114913325A (en) * | 2022-03-24 | 2022-08-16 | 北京百度网讯科技有限公司 | Semantic segmentation method, device and computer program product |
| CN114913325B (en) * | 2022-03-24 | 2024-05-10 | 北京百度网讯科技有限公司 | Semantic segmentation method, semantic segmentation device and computer program product |
| CN114445430A (en) * | 2022-04-08 | 2022-05-06 | 暨南大学 | Real-time image semantic segmentation method and system based on lightweight multi-scale feature fusion |
| CN114821042A (en) * | 2022-04-27 | 2022-07-29 | 南京国电南自轨道交通工程有限公司 | An R-FCN knife gate detection method combining local and global features |
| CN114821042B (en) * | 2022-04-27 | 2025-07-22 | 南京国电南自轨道交通工程有限公司 | R-FCN knife switch detection method combining local features and global features |
| CN116740866A (en) * | 2023-08-11 | 2023-09-12 | 上海银行股份有限公司 | Banknote loading and clearing system and method for self-service machine |
| CN116740866B (en) * | 2023-08-11 | 2023-10-27 | 上海银行股份有限公司 | Banknote loading and clearing system and method for self-service machine |
| CN119399457A (en) * | 2024-09-18 | 2025-02-07 | 广州大学 | A real-time semantic segmentation method and system for multi-shape pyramids in traffic scenes |
| CN119399457B (en) * | 2024-09-18 | 2025-10-03 | 广州大学 | A real-time semantic segmentation method and system for multi-shape pyramids in traffic scenarios |
Also Published As
| Publication number | Publication date |
|---|---|
| LU503090B1 (en) | 2023-03-22 |
| WO2022227913A1 (en) | 2022-11-03 |
| ZA202207731B (en) | 2022-07-27 |
Similar Documents
| Publication | Title |
|---|---|
| CN113221969A (en) | Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion |
| CN112651973B (en) | Semantic segmentation method based on cascade of feature pyramid attention and mixed attention |
| CN112418292B (en) | Image quality evaluation method, device, computer equipment and storage medium |
| CN113516133B (en) | Multi-modal image classification method and system |
| CN112991350A (en) | RGB-T image semantic segmentation method based on modal difference reduction |
| EP4336378A1 (en) | Data processing method and related device |
| CN113486190A (en) | Multi-mode knowledge representation method integrating entity image information and entity category information |
| CN108388900A (en) | The video presentation method being combined based on multiple features fusion and space-time attention mechanism |
| CN113033570A (en) | Image semantic segmentation method for improving fusion of void volume and multilevel characteristic information |
| CN113961736B (en) | Method, apparatus, computer device and storage medium for text generation image |
| CN113066089B (en) | A Real-time Image Semantic Segmentation Method Based on Attention Guidance Mechanism |
| CN113919479B (en) | Method for extracting data features and related device |
| CN114863229A (en) | Image classification method and training method and device for image classification model |
| CN114861907A (en) | Data computing method, device, storage medium and device |
| CN114936901A (en) | Visual perception recommendation method and system based on cross-modal semantic reasoning and fusion |
| CN117033609A (en) | Text visual question-answering method, device, computer equipment and storage medium |
| US11948090B2 (en) | Method and apparatus for video coding |
| CN112966672B (en) | Gesture recognition method under complex background |
| CN111652349A (en) | A neural network processing method and related equipment |
| CN118885601A (en) | Personalized recommendation method and system based on emotion-aware knowledge graph convolutional network |
| CN118154866A (en) | A city-level point cloud semantic segmentation system and method based on spatial perception |
| CN116912268A (en) | Skin lesion image segmentation method, device, equipment and storage medium |
| CN119048399B (en) | Image restoration method, system, device and medium integrating cross attention |
| CN112784831A (en) | Character recognition method for enhancing attention mechanism by fusing multilayer features |
| WO2024174583A1 (en) | Model training method and apparatus, and device, storage medium and product |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB03 | Change of inventor or designer information | Inventor after: Zhu Xinzhong; Xu Huiying; Zhao Jianmin. Inventor before: Zhu Xinzhong; Xu Huiying; Tu Wenxuan; Liu Xinwang; Zhao Jianmin |
| | WW01 | Invention patent application withdrawn after publication | Application publication date: 20210806 |