
CN113221969A - Semantic segmentation system and method based on Internet of Things perception and dual-feature fusion - Google Patents

Semantic segmentation system and method based on Internet of Things perception and dual-feature fusion

Info

Publication number
CN113221969A
CN113221969A
Authority
CN
China
Prior art keywords
feature
fusion
features
attention vector
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110446945.6A
Other languages
Chinese (zh)
Inventor
朱信忠
徐慧英
涂文轩
刘新旺
赵建民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Normal University CJNU
Original Assignee
Zhejiang Normal University CJNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Normal University CJNU filed Critical Zhejiang Normal University CJNU
Priority to CN202110446945.6A priority Critical patent/CN113221969A/en
Publication of CN113221969A publication Critical patent/CN113221969A/en
Priority to PCT/CN2022/081427 priority patent/WO2022227913A1/en
Priority to LU503090A priority patent/LU503090B1/en
Priority to ZA2022/07731A priority patent/ZA202207731B/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a semantic segmentation system and method based on Internet of Things perception and dual-feature fusion. The method comprises the following steps: S1, performing feature encoding on an original image to obtain features of different scales; S2, learning the features of different scales through two attention refinement blocks to obtain multi-level fusion features; S3, reducing the dimension of the multi-level fusion features to obtain dimension reduction features; S4, performing context encoding on the dimension reduction features with depthwise separable convolutions of different convolution scales to obtain local features of different scales; S5, performing global pooling on the dimension reduction features with a global mean pooling layer to obtain global features; S6, performing channel splicing and fusion on the global features and the local features to obtain multi-scale context fusion features; S7, performing channel splicing and fusion on the dimension reduction features and the multi-scale context fusion features to obtain spliced features; and S8, obtaining the output from the spliced features. The semantic difference among the multi-level features is alleviated, the information representation is enriched, and the recognition accuracy is improved.

Description

Semantic segmentation system and method based on Internet of Things perception and dual-feature fusion
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a semantic segmentation system and method based on Internet of Things perception and dual-feature fusion.
Background
Semantic segmentation, which aims to densely assign each pixel to a predefined class, is an increasingly attractive research direction in the field of computer vision. Owing to the strong representation learning capability of deep learning methods, research on semantic segmentation has achieved good performance in many Internet of Things applications such as autonomous driving and diabetic retinopathy image analysis. Two important factors, the feature fusion scheme and the complexity of the network, largely determine the performance of a semantic segmentation method. In particular, to accurately parse complex scenes in resource-constrained Internet of Things (IoT) environments, it is both important and challenging to encode robust multi-level features and diverse context information in an efficient and effective manner so as to achieve accurate, fast and lightweight performance.
Existing semantic segmentation methods can be roughly divided into two categories: accuracy-oriented and efficiency-oriented methods. Early work mostly focused on a single perspective: either the recognition accuracy of the algorithm or its execution speed. In the first category, the design of the segmentation model mainly focuses on how to integrate diversified features, and complex frameworks are designed to achieve high-accuracy segmentation. For example, researchers have proposed pyramid structures such as the atrous spatial pyramid pooling module (ASPP) or the Context Pyramid Module (CPM), which encode multi-scale context information at the end of the ResNet101 backbone (2048 feature maps) to handle multi-scale variation of targets. In addition, U-shaped networks directly fuse hierarchical features through long skip connections and extract spatial information of different levels as far as possible, thereby achieving accurate pixel-level segmentation. On the other hand, typical asymmetric encoder-decoder structures have also been studied extensively. The ENet and ESPNet networks greatly compress the network size through pruning operations and process large-scale images online at very high speed. To improve the overall performance of semantic segmentation methods, the recent literature shows a trend of jointly considering the efficiency and effectiveness of a segmentation network when encoding multi-level features and multi-scale context information. In particular, ERFNet employs a large number of factorized convolutions with different dilation rates in the decoder, reducing parameter redundancy while enlarging the receptive field. In addition, researchers have proposed BiSeNet, CANet and ICNet, which process the input image through several lightweight sub-networks and then fuse multi-level features or deep context information. In recent work, CIFReNet encodes multi-level and multi-scale information by introducing a feature refinement module and a context integration module to achieve accurate and efficient scene segmentation.
Although existing research achieves good segmentation performance in terms of either high accuracy or high speed, the existing methods have at least the following problems: 1) in the multi-level information fusion process, feature extraction relies on considerable time and computational complexity, so that model learning efficiency is low and computational cost is high; 2) methods that directly fuse multi-source information through element-wise addition or concatenation rarely consider how to narrow the semantic gap between multi-level features. Interaction between the various information sources is therefore hindered, resulting in unsatisfactory segmentation accuracy.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a semantic segmentation system and method based on Internet of Things perception and dual-feature fusion, which achieve a balance of overall performance in terms of accuracy, speed, memory and computational complexity.
A semantic segmentation method based on Internet of Things perception and dual-feature fusion comprises the following steps:
S1, inputting an original image, and performing feature encoding on the original image with a backbone network to obtain features of different scales;
S2, learning the features of different scales through two attention refinement blocks to obtain multi-level fusion features;
S3, reducing the dimension of the multi-level fusion features to obtain dimension reduction features;
S4, performing context encoding on the dimension reduction features with depthwise separable decomposed convolutions of different convolution scales to obtain local features of different scales;
S5, performing global pooling on the dimension reduction features with a global mean pooling layer to obtain global features;
S6, performing channel splicing and fusion on the global features and the local features of different scales to obtain multi-scale context fusion features;
S7, performing channel splicing and fusion on the dimension reduction features and the multi-scale context fusion features to obtain spliced features;
and S8, reducing the dimension of the spliced features and up-sampling them to obtain the final output.
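The composition of steps S1-S8 can be illustrated with the following PyTorch-style sketch. It is only a minimal structural outline under assumed module interfaces (a backbone returning features at 1/4, 1/8 and 1/16 scale, two attention refinement blocks and a lightweight semantic pyramid module); none of the names or signatures are prescribed by the invention.

```python
# Illustrative composition of steps S1-S8 (module names and interfaces are assumptions).
import torch.nn as nn
import torch.nn.functional as F

class DualFeatureFusionNet(nn.Module):
    def __init__(self, backbone, arb1, arb2, lspm):
        super().__init__()
        self.backbone = backbone    # S1: features at 1/4, 1/8 and 1/16 of the input scale
        self.arb1 = arb1            # S2.1: fuse the 1/4 and 1/8 features
        self.arb2 = arb2            # S2.2: fuse the result with the 1/16 features
        self.lspm = lspm            # S3-S8 (everything except the final up-sampling)

    def forward(self, x):
        f4, f8, f16 = self.backbone(x)
        semantic = self.arb1(f4, f8)
        fused = self.arb2(f16, semantic)
        logits = self.lspm(fused)
        # S8: up-sample the reduced, concatenated features to the input resolution
        return F.interpolate(logits, size=x.shape[2:], mode='bilinear', align_corners=False)
```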
Preferably, step S1 specifically includes:
feature encoding is performed on the original image by using a backbone network to obtain a first feature, a second feature and a third feature, wherein the scale of the first feature is 1/4 of the original image scale, the scale of the second feature is 1/8 of the original image scale, and the scale of the third feature is 1/16 of the original image scale.
Preferably, step S2 includes the following steps:
S2.1, fusing the first feature and the second feature through a first attention refinement block to output semantic features;
and S2.2, fusing the semantic features and the third feature through a second attention refinement block to obtain the multi-level fusion features.
Preferably, step S2.1 specifically includes the following steps:
S2.1.1, mapping the first feature to a scale consistent with the second feature through a down-sampling layer to obtain a first scale feature;
S2.1.2, mapping the channel dimension of the first scale feature to be consistent with the channel dimension of the second feature through a first 1×1 convolution layer to obtain a first channel feature;
S2.1.3, performing channel splicing and fusion on the first scale feature and the second feature to obtain a first fusion feature;
S2.1.4, inputting the first fusion feature into a first adaptive mean pooling layer and a first adaptive max pooling layer respectively to output a first attention vector and a second attention vector respectively;
S2.1.5, performing nonlinear mapping on the first attention vector and the second attention vector through a first multi-layer perceptron to output a first mixed attention vector and a second mixed attention vector, and fusing the first mixed attention vector and the second mixed attention vector to output a first fused mixed attention vector;
S2.1.6, normalizing the first fused mixed attention vector to obtain a first normalized mixed attention vector;
S2.1.7, weighting and mapping the first channel feature with the first normalized mixed attention vector;
S2.1.8, fusing the second feature and the weighted first channel feature to output the semantic features.
Preferably, step S2.2 specifically includes the following steps:
S2.2.1, mapping the third feature to a scale consistent with the second feature through an up-sampling layer to obtain a second scale feature;
S2.2.2, mapping the channel dimension of the second scale feature to be consistent with the channel dimension of the second feature through a second 1×1 convolution layer to obtain a second channel feature;
S2.2.3, performing channel splicing and fusion on the second scale feature and the semantic features to obtain a second fusion feature;
S2.2.4, inputting the second fusion feature into a second adaptive mean pooling layer and a second adaptive max pooling layer respectively to output a third attention vector and a fourth attention vector respectively;
S2.2.5, performing nonlinear mapping on the third attention vector and the fourth attention vector through a second multi-layer perceptron to output a third mixed attention vector and a fourth mixed attention vector, and fusing the third mixed attention vector and the fourth mixed attention vector to output a second fused mixed attention vector;
S2.2.6, normalizing the second fused mixed attention vector to obtain a second normalized mixed attention vector;
S2.2.7, weighting and mapping the second channel feature with the second normalized mixed attention vector;
S2.2.8, fusing the semantic features and the weighted second channel feature to obtain the multi-level fusion features.
Preferably, the first fusion feature is input into the first adaptive mean pooling layer and the first adaptive maximum pooling layer respectively in step S2.1.4 to output the first attention vector and the second attention vector respectively, specifically using the following formulas:
V1=AAP1(C[F1,F2]),
V2=AMP1(C[F1,F2]),
wherein V1 is the first attention vector, V2 is the second attention vector, F1 is the first scale feature, F2 is the second feature, C[] denotes channel splicing and fusion, AAP1() denotes the first adaptive mean pooling layer, and AMP1() denotes the first adaptive max pooling layer.
Preferably, in step S2.2.4, the second fusion feature is respectively input into the second adaptive mean pooling layer and the second adaptive maximum pooling layer to respectively output the third attention vector and the fourth attention vector, specifically using the following formulas:
V3=AAP2(C[L1,L2]),
V4=AMP2(C[L1,L2]),
wherein V3 is the third attention vector, V4 is the fourth attention vector, L1 is the second scale feature, L2 is the semantic feature, AAP2() denotes the second adaptive mean pooling layer, and AMP2() denotes the second adaptive max pooling layer.
Preferably, in step S2.1.5, the first attention vector and the second attention vector are nonlinearly mapped by the first multi-layer perceptron to output the first mixed attention vector and the second mixed attention vector, and the first mixed attention vector and the second mixed attention vector are fused by channel splicing to output the first fused mixed attention vector, specifically using the following formula:
Va1=MLP1(C[V1,V2]),
and in step S2.2.5, the third attention vector and the fourth attention vector are nonlinearly mapped by the second multi-layer perceptron to output the third mixed attention vector and the fourth mixed attention vector, and the third mixed attention vector and the fourth mixed attention vector are fused to output the second fused mixed attention vector, specifically using the following formula:
Va2=MLP2(C[V3,V4]),
wherein Va1 is the first fused mixed attention vector, Va2 is the second fused mixed attention vector, MLP1() is the first multi-layer perceptron, and MLP2() is the second multi-layer perceptron.
Preferably, steps S2.1.6 (normalizing the first fused mixed attention vector to obtain a first normalized mixed attention vector), S2.1.7 (weighting and mapping the first channel feature with the first normalized mixed attention vector) and S2.1.8 (fusing the second feature and the weighted first channel feature to output the semantic features) specifically adopt the following formula:
L2=(Sig1(Va1)⊙F'1)⊕F2,
and steps S2.2.6 (normalizing the second fused mixed attention vector to obtain a second normalized mixed attention vector), S2.2.7 (weighting and mapping the second channel feature with the second normalized mixed attention vector) and S2.2.8 (fusing the semantic features and the weighted second channel feature to obtain the multi-level fusion features) specifically adopt the following formula:
L'2=(Sig2(Va2)⊙L'1)⊕L2,
wherein L2 is the semantic feature, L'2 is the multi-level fusion feature, Sig1() denotes the first activation function, Sig2() denotes the second activation function, F'1 is the first channel feature, L'1 is the second channel feature, H denotes the height of the feature map, W denotes the width of the feature map (the normalized mixed attention vector being broadcast over the H×W spatial positions), ⊙ denotes a pixel-level dot-product operation, and ⊕ denotes a pixel-level dot-addition operation.
Correspondingly, a semantic segmentation system based on Internet of Things perception and dual-feature fusion is further provided, which comprises a multi-layer feature fusion module and a lightweight semantic pyramid module connected with each other;
the multi-layer feature fusion module comprises a backbone network unit and a proofreading unit;
the lightweight semantic pyramid module comprises a first dimension reduction unit, a second dimension reduction unit, a third dimension reduction unit, a context coding unit, a global pooling unit, a first channel splicing and fusing unit, a second channel splicing and fusing unit and an upsampling unit;
the backbone network unit is connected with a proofreading unit, the proofreading unit is respectively connected with a first dimension reduction unit and a second dimension reduction unit, the first dimension reduction unit is respectively connected with a context coding unit and a global pooling unit, the context coding unit and the global pooling unit are both connected with a first channel splicing and fusing unit, the second dimension reduction unit and the first channel splicing and fusing unit are both connected with a second channel splicing and fusing unit, the second channel splicing and fusing unit is also connected with a third dimension reduction unit, and an up-sampling unit is connected with the third dimension reduction unit;
the backbone network unit is used for performing feature coding on the original image by using a backbone network to obtain features of different scales;
the proofreading unit is used for learning the features of different scales through the two attention refinement blocks to obtain multi-level fusion features;
the first dimension reduction unit and the second dimension reduction unit are used for reducing the dimension of the multi-level fusion feature so as to output a first dimension reduction feature and a second dimension reduction feature respectively, and the first dimension reduction feature and the second dimension reduction feature are the same;
the context coding unit is used for respectively carrying out context coding on the first dimension reduction characteristics through depth separable convolutions with different convolution scales so as to obtain local characteristics with different scales;
the global pooling unit is used for performing global pooling on the first dimension reduction feature through a global mean pooling layer to obtain a global feature;
the first channel splicing and fusing unit is used for carrying out channel splicing and fusing on the global features and the local features of different scales so as to obtain multi-scale context fusion features;
the second channel splicing and fusing unit is used for carrying out channel splicing and fusing on the second dimension reduction feature and the multi-scale context fusion feature to obtain a splicing feature;
the third dimension reduction unit is used for reducing the dimension of the splicing feature;
and the up-sampling unit is used for up-sampling the splicing characteristics subjected to the dimensionality reduction so as to obtain final output.
The invention has the beneficial effects that:
(1) A multi-level feature fusion module (MFFM) is proposed, which employs two recursive attention refinement blocks (ARBs) to effectively improve the effectiveness of multi-level feature fusion. With a controllable additional computational cost, the proposed ARB uses the abstract semantic information of high-level features to calibrate the spatial detail information in low-level features, thereby alleviating the semantic difference among the multi-level features.
(2) A lightweight semantic pyramid module (LSPM) is provided, which factorizes the convolution operators to reduce the computational overhead of context information encoding. In addition, the module fuses the multi-level fusion features with the multi-scale context information, enriching the information representation and thereby improving the recognition accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flow chart of the semantic segmentation method based on Internet of Things perception and dual-feature fusion according to the invention;
FIG. 2 is a schematic structural diagram of the semantic segmentation system based on Internet of Things perception and dual-feature fusion according to the invention;
FIG. 3 is a schematic structural diagram of the attention refinement block according to the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided by way of specific examples, and other advantages and effects of the present invention will be readily apparent to those skilled in the art from the disclosure herein. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
Embodiment one:
Referring to fig. 1, fig. 2 and fig. 3, this embodiment provides a semantic segmentation method based on Internet of Things perception and dual-feature fusion, which includes the steps of:
S1, inputting an original image, and performing feature encoding on the original image with a backbone network to obtain features of different scales;
S2, learning the features of different scales through two attention refinement blocks to obtain multi-level fusion features;
S3, reducing the dimension of the multi-level fusion features to obtain dimension reduction features;
S4, performing context encoding on the dimension reduction features with depthwise separable decomposed convolutions of different convolution scales to obtain local features of different scales;
S5, performing global pooling on the dimension reduction features with a global mean pooling layer to obtain global features;
S6, performing channel splicing and fusion on the global features and the local features of different scales to obtain multi-scale context fusion features;
S7, performing channel splicing and fusion on the dimension reduction features and the multi-scale context fusion features to obtain spliced features;
and S8, reducing the dimension of the spliced features and up-sampling them to obtain the final output.
Wherein, step S1 specifically includes:
feature encoding is performed on the original image by using a backbone network to obtain a first feature, a second feature and a third feature, wherein the scale of the first feature is 1/4 of the original image scale, the scale of the second feature is 1/8 of the original image scale, and the scale of the third feature is 1/16 of the original image scale.
Each layer of the backbone network has a different feature expression capability. Shallower layers contain more spatial detail but lack semantic information, while deeper layers retain rich semantic information but lose a large amount of spatial information. Intuitively, fusing multi-level information together is beneficial for learning discriminative and comprehensive feature representations.
Based on the above observation, we obtain features of different scales from the backbone network, denoted sequentially as I1/4, I1/8 and I1/16, and then unify the scales of all feature maps to the 1/8 size to reduce information loss and resource consumption. Specifically, I1/4 is down-sampled by a pooling layer with kernel size 2 and stride 2 to obtain T'1/8, and the high-level feature map I1/16 is up-sampled by a bilinear layer to obtain T''1/8. Finally, the three are fused to obtain the multi-level fusion feature O. The above process is expressed by the following formulas:
T'1/8=T(GAPk=2,s=2(I1/4))
T''1/8=Upsample(I1/16)
O=T'1/8⊕I1/8⊕T''1/8
wherein GAPk=2,s=2() denotes a mean pooling layer with kernel size 2 and stride 2, T() is defined as a channel transformation operation that changes the number of feature maps, Upsample() denotes an up-sampling layer, and ⊕ denotes a pixel-level dot-addition operation.
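A minimal PyTorch sketch of this alignment and fusion is given below, assuming T() is realized as a 1×1 convolution; the 1×1 channel transform applied to the up-sampled 1/16 branch is an additional assumption made only so that the channel dimensions match.

```python
# Sketch of aligning I_1/4, I_1/8, I_1/16 to the 1/8 scale and fusing them (channel widths assumed).
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAlignFuse(nn.Module):
    def __init__(self, c4, c8, c16):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)   # GAP_{k=2,s=2}: down-sample 1/4 -> 1/8
        self.t4 = nn.Conv2d(c4, c8, kernel_size=1)          # T(): channel transform for the 1/4 branch
        self.t16 = nn.Conv2d(c16, c8, kernel_size=1)        # assumed channel transform for the 1/16 branch

    def forward(self, i4, i8, i16):
        t4 = self.t4(self.pool(i4))                                       # T'_{1/8}
        t16 = F.interpolate(self.t16(i16), size=i8.shape[2:],             # T''_{1/8}: bilinear up-sampling
                            mode='bilinear', align_corners=False)
        return t4 + i8 + t16                                              # O: pixel-level addition
```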
Although the above feature fusion operation facilitates the mutual use of complementary information between multi-level features, directly integrating low-level features with high-level features may be neither efficient nor comprehensive because of the semantic differences between the levels. To address this problem, the invention designs a feature refinement strategy, called the Attention Refinement Block (ARB). Both ARBs focus on modeling the inter-channel relationships of the multi-level fusion features. In this way, when the current channel contains valuable information, the model can emphasize the weights of the neurons that are highly related to the target object.
Namely, step S2 includes the following steps:
S2.1, fusing the first feature and the second feature through a first attention refinement block to output semantic features;
and S2.2, fusing the semantic features and the third feature through a second attention refinement block to obtain the multi-level fusion features.
Further, step S2.1 specifically includes the following steps:
S2.1.1, mapping the first feature to a scale consistent with the second feature through a down-sampling layer to obtain a first scale feature;
S2.1.2, mapping the channel dimension of the first scale feature to be consistent with the channel dimension of the second feature through a first 1×1 convolution layer to obtain a first channel feature;
S2.1.3, performing channel splicing and fusion on the first scale feature and the second feature to obtain a first fusion feature;
S2.1.4, inputting the first fusion feature into a first adaptive mean pooling layer (AAP) and a first adaptive max pooling layer (AMP) respectively to output a first attention vector and a second attention vector respectively; both the adaptive mean pooling layer (AAP) and the adaptive max pooling layer (AMP) model the importance of individual feature channels by weighting all channels of the multi-level fused feature, and the higher the importance of the current feature channel, the larger the corresponding weight;
S2.1.5, performing nonlinear mapping on the first attention vector and the second attention vector through a first multi-layer perceptron to output a first mixed attention vector and a second mixed attention vector, which improves the nonlinearity and robustness of the features, and fusing the first mixed attention vector and the second mixed attention vector to output a first fused mixed attention vector;
S2.1.6, normalizing the first fused mixed attention vector to obtain a first normalized mixed attention vector;
S2.1.7, weighting and mapping the first channel feature with the first normalized mixed attention vector;
S2.1.8, fusing the second feature and the weighted first channel feature to output the semantic features.
Further, step S2.2 specifically includes the following steps:
S2.2.1, mapping the third feature to a scale consistent with the second feature through an up-sampling layer to obtain a second scale feature;
S2.2.2, mapping the channel dimension of the second scale feature to be consistent with the channel dimension of the second feature through a second 1×1 convolution layer to obtain a second channel feature;
S2.2.3, performing channel splicing and fusion on the second scale feature and the semantic features to obtain a second fusion feature;
S2.2.4, inputting the second fusion feature into a second adaptive mean pooling layer and a second adaptive max pooling layer respectively to output a third attention vector and a fourth attention vector respectively;
S2.2.5, performing nonlinear mapping on the third attention vector and the fourth attention vector through a second multi-layer perceptron to output a third mixed attention vector and a fourth mixed attention vector, and fusing the third mixed attention vector and the fourth mixed attention vector to output a second fused mixed attention vector;
S2.2.6, normalizing the second fused mixed attention vector to obtain a second normalized mixed attention vector;
S2.2.7, weighting and mapping the second channel feature with the second normalized mixed attention vector;
S2.2.8, fusing the semantic features and the weighted second channel feature to obtain the multi-level fusion features.
Further, in step S2.1.4, the first fusion feature is respectively input into the first adaptive mean pooling layer and the first adaptive maximum pooling layer to respectively output the first attention vector and the second attention vector, specifically using the following formulas:
V1=AAP1(C[F1,F2]),
V2=AMP1(C[F1,F2]),
wherein V1 is the first attention vector, V2 is the second attention vector, F1 is the first scale feature, F2 is the second feature, C[] denotes channel splicing and fusion, AAP1() denotes the first adaptive mean pooling layer, and AMP1() denotes the first adaptive max pooling layer.
In step S2.2.4, the second fusion feature is input into the second adaptive mean pooling layer and the second adaptive max pooling layer respectively to output the third attention vector and the fourth attention vector, specifically using the following formulas:
V3=AAP2(C[L1,L2]),
V4=AMP2(C[L1,L2]),
wherein V3 is the third attention vector, V4 is the fourth attention vector, L1 is the second scale feature, L2 is the semantic feature, AAP2() denotes the second adaptive mean pooling layer, and AMP2() denotes the second adaptive max pooling layer.
Further, in step S2.1.5, the first attention vector and the second attention vector are nonlinearly mapped by the first multi-layer perceptron to output the first mixed attention vector and the second mixed attention vector, and the first mixed attention vector and the second mixed attention vector are fused by channel splicing to output the first fused mixed attention vector, specifically using the following formula:
Va1=MLP1(C[V1,V2]),
and in step S2.2.5, the third attention vector and the fourth attention vector are nonlinearly mapped by the second multi-layer perceptron to output the third mixed attention vector and the fourth mixed attention vector, and the third mixed attention vector and the fourth mixed attention vector are fused to output the second fused mixed attention vector, specifically using the following formula:
Va2=MLP2(C[V3,V4]),
wherein Va1 is the first fused mixed attention vector, Va2 is the second fused mixed attention vector, MLP1() is the first multi-layer perceptron, and MLP2() is the second multi-layer perceptron.
Further, steps S2.1.6 (normalizing the first fused mixed attention vector to obtain a first normalized mixed attention vector), S2.1.7 (weighting and mapping the first channel feature with the first normalized mixed attention vector) and S2.1.8 (fusing the second feature and the weighted first channel feature to output the semantic features) specifically adopt the following formula:
L2=(Sig1(Va1)⊙F'1)⊕F2,
and steps S2.2.6 (normalizing the second fused mixed attention vector to obtain a second normalized mixed attention vector), S2.2.7 (weighting and mapping the second channel feature with the second normalized mixed attention vector) and S2.2.8 (fusing the semantic features and the weighted second channel feature to obtain the multi-level fusion features) specifically adopt the following formula:
L'2=(Sig2(Va2)⊙L'1)⊕L2,
wherein L2 is the semantic feature, L'2 is the multi-level fusion feature, Sig1() denotes the first activation function, Sig2() denotes the second activation function, F'1 is the first channel feature, L'1 is the second channel feature, H denotes the height of the feature map, W denotes the width of the feature map (the normalized mixed attention vector being broadcast over the H×W spatial positions), ⊙ denotes a pixel-level dot-product operation, and ⊕ denotes a pixel-level dot-addition operation.
Technically, the design of the ARB can be regarded as an information calibration strategy: two attention-based paths predict the importance of each channel in a complementary manner, so that more semantic information is transferred to the low-level features, alleviating the semantic difference among features of different levels and achieving effective feature fusion. The experimental results in the following section verify the effectiveness of this design. It is worth noting that the ARBs have only 0.03M parameters in total, so the entire multi-level feature fusion remains computationally lightweight.
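A minimal PyTorch-style sketch of one ARB is given below for illustration. The use of bilinear interpolation for both the down- and up-sampling directions of the scale alignment, the two-layer 1×1-convolution form of the multi-layer perceptron and the channel-reduction ratio are all assumptions, not details fixed by the invention.

```python
# Sketch of an Attention Refinement Block (ARB); pooling/MLP layer choices are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionRefinementBlock(nn.Module):
    def __init__(self, c_src, c_ref, reduction=4):
        super().__init__()
        self.align = nn.Conv2d(c_src, c_ref, kernel_size=1)   # 1x1 conv: match the channel dimension
        self.aap = nn.AdaptiveAvgPool2d(1)                     # adaptive mean pooling path
        self.amp = nn.AdaptiveMaxPool2d(1)                     # adaptive max pooling path
        in_ch = c_src + c_ref
        self.mlp = nn.Sequential(                              # multi-layer perceptron (two 1x1 convs assumed)
            nn.Conv2d(2 * in_ch, in_ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch // reduction, c_ref, 1))

    def forward(self, src, ref):
        # Map the source feature to the scale of the reference feature (down- or up-sampling).
        src = F.interpolate(src, size=ref.shape[2:], mode='bilinear', align_corners=False)
        ch = self.align(src)                                         # channel feature F'
        fused = torch.cat([src, ref], dim=1)                         # channel splicing fusion C[., .]
        vec = torch.cat([self.aap(fused), self.amp(fused)], dim=1)   # both attention vectors
        attn = torch.sigmoid(self.mlp(vec))                          # Sig(MLP(C[V, V])): normalized mixed attention
        return ref + ch * attn                                       # ref ⊕ (F' ⊙ attention)
```

An ARB built this way can serve both roles described above: fusing the 1/4 feature with the 1/8 feature (first ARB) and fusing the 1/16 feature with the intermediate semantic feature (second ARB).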
Further, to enhance the computational efficiency of the context extraction module, the invention proposes a depthwise separable decomposed convolution (DFC) operation instead of a standard convolutional layer. Inspired by depthwise separable convolution and factorized convolution, a main idea of lightweight feature extraction is to integrate the ideas of the two techniques. First, a normalization layer and an activation function are used as two preprocessing steps to improve the regularity of the convolutional layer; next, the 3×3 depthwise convolution is decomposed into two depthwise separable convolution layers with kernel sizes of 3×1 and 1×3, respectively. In this way, the sparsity of the dense convolution kernels is kept uniform across all channels, so that the computational complexity and resource overhead of the convolution are reduced. Finally, the local features of all scales and the global features are fused to obtain the multi-scale context fusion feature.
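One possible realization of such a DFC block is sketched below, again as an assumption-laden illustration: the normalization/activation preprocessing, the 3×1 and 1×3 depthwise layers, a 1×1 pointwise convolution (implied by "depthwise separable" but not spelled out above), and a configurable dilation rate standing in for the "different convolution scales".

```python
# Sketch of a depthwise separable decomposed convolution (DFC); structure partly assumed.
import torch.nn as nn

class DFCBlock(nn.Module):
    def __init__(self, channels, out_channels, dilation=1):
        super().__init__()
        self.pre = nn.Sequential(nn.BatchNorm2d(channels), nn.ReLU(inplace=True))    # preprocessing
        self.dw3x1 = nn.Conv2d(channels, channels, (3, 1), padding=(dilation, 0),
                               dilation=(dilation, 1), groups=channels, bias=False)  # 3x1 depthwise
        self.dw1x3 = nn.Conv2d(channels, channels, (1, 3), padding=(0, dilation),
                               dilation=(1, dilation), groups=channels, bias=False)  # 1x3 depthwise
        self.pw = nn.Conv2d(channels, out_channels, 1, bias=False)                   # 1x1 pointwise (assumed)

    def forward(self, x):
        x = self.pre(x)
        x = self.dw1x3(self.dw3x1(x))
        return self.pw(x)
```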
After the multi-scale context is encoded, the dimension-reduced multi-level fusion features are further combined with the global features and the local features of different scales to predict the final segmentation result. This design has two advantages: on the one hand, multi-level information and multi-scale context information are integrated in a unified framework to achieve a more effective feature representation; on the other hand, the skip connection encourages information transfer and gradient propagation from the earlier multi-level features, thereby improving recognition efficiency.
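Putting the pieces together, the lightweight semantic pyramid module might be assembled as in the sketch below, which reuses the DFCBlock above; the number of branches, the reduced channel width and the dilation rates are illustrative assumptions.

```python
# Sketch of the Lightweight Semantic Pyramid Module (LSPM); hyper-parameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSPMSketch(nn.Module):
    def __init__(self, in_channels, mid_channels=64, dilations=(1, 2, 4), num_classes=19):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, mid_channels, 1)               # S3: dimension reduction
        self.branches = nn.ModuleList(
            [DFCBlock(mid_channels, mid_channels, d) for d in dilations])   # S4: multi-scale context
        self.gap = nn.AdaptiveAvgPool2d(1)                                  # S5: global mean pooling
        fused = mid_channels * (len(dilations) + 1)
        self.out = nn.Conv2d(fused + mid_channels, num_classes, 1)          # S7/S8: fuse skip path and reduce

    def forward(self, x):
        r = self.reduce(x)
        local = [b(r) for b in self.branches]                               # local features (S4)
        g = F.interpolate(self.gap(r), size=r.shape[2:], mode='nearest')    # global feature (S5)
        ctx = torch.cat(local + [g], dim=1)                                 # S6: multi-scale context fusion
        spliced = torch.cat([r, ctx], dim=1)                                # S7: skip connection with r
        return self.out(spliced)                                            # final up-sampling done by the caller
```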
The key points of the technology of the invention are as follows:
(1) The invention discloses a novel IoT-oriented dual-feature fusion real-time semantic segmentation network (DFFNet). Compared with advanced methods, DFFNet reduces FLOPs by about 2.5 times and increases model execution speed by 1.8 times while achieving better accuracy.
(2) A multi-level feature fusion module (MFFM) is proposed, which employs two recursive attention refinement blocks (ARBs) to effectively improve the effectiveness of multi-level feature fusion. With a controllable additional computational cost, the proposed ARB uses the abstract semantic information of high-level features to calibrate the spatial detail information in low-level features, thereby alleviating the semantic difference among the multi-level features.
(3) A lightweight semantic pyramid module (LSPM) is provided, which factorizes the convolution operators to reduce the computational overhead of context information encoding. In addition, the module fuses the multi-level fusion features with the multi-scale context information, enriching the information representation and thereby improving the recognition accuracy.
Further, this embodiment also compares the present invention across multiple data sets against existing methods to verify the effectiveness of the present invention.
Data set: the data set used in the present invention is a recognized standard scene perception data set, cityscaps, consisting of 25000 annotated 2048 × 1024 resolution images. The annotations set contains 30 classes, 19 of which are used for training and evaluation. In the experiments of the present invention, only 5000 images with fine annotations were used. There were 2975 images for training, 500 images for verification, and 1525 images for testing.
Setting parameters: all experiments were in NVIDIA1080 tivpu card. We perform a 0.5 to 1.5 times random scaling of the image and apply a random left-right flip operation randomly on all training sets. Further, the initial learning rate was set to 0.005, and the learning rate was attenuated using a poly strategy. The network was trained using a stochastic gradient descent optimization algorithm by minimizing the pixel cross entropy loss, with a momentum of 0.9 and a weight decay of 5 e-4. Finally, a batch normalization layer is applied before all conventional or expanded convolutional layers to achieve fast convergence.
Evaluation metrics: the invention adopts four evaluation metrics recognized in the semantic segmentation field: segmentation accuracy, inference speed, number of network parameters, and computational complexity.
Ablation experiment on the multi-level feature fusion module:
as shown in table 1, the present invention compares four multi-level feature fusion models with a reference model: elemental Additive Fusion (EAF), average pool attention refinement (AAR), Maximum Attention Refinement (MAR), and the use of AAR and MAR in combination. As shown in the table, the EAF performance is only 1.12% higher than the baseline network, which indicates that directly fusing the multi-level feature is a sub-optimal solution. Compared with a reference network, the AAR and the MAR realize the performance improvement of 2.61% of mIoU and 2.54% of mIoU, which shows that the interdependence relationship between modeling channels can reduce the semantic difference between multi-level features. The bilateral pooling attention strategy provided by the invention mutually compensates the significance information and the global information. Thus, MFFM achieves a further boost of 0.55% and 0.62% mlou compared to AAR and MAR. In addition, the proposed MFFM adds negligible additional calculations (only 0.06M parameter and 0.11GFLOPs), which verifies the efficiency and effectiveness of the proposed module.
Model    Speed (ms)    Params (M)    FLOPs (G)    mIoU (%)
Baseline 15.40 1.82 2.79 67.83
EAF 15.81 1.85 2.90 68.95
AAR 15.80 1.86 2.90 70.44
MAR 15.81 1.86 2.90 70.37
AAR+MAR 16.03 1.88 2.90 70.99
Table 1: Ablation study of the multi-level feature fusion module
Ablation experiment on the lightweight semantic pyramid module:
the test evaluates the performance of the lightweight semantic pyramid module. SC-SPM, FC-SPM, DC-SPM and DFC-SPM respectively represent methods with four semantic pyramid modules, which are built on conventional convolution, decomposed convolution, deep convolution and deep separable deconvolution, respectively. As shown in table 2, 1) compared with the reference model EAF, the semantic segmentation method with the semantic pyramid module can improve the mIOU segmentation accuracy by about 1.11% to 2.70%, which indicates that extracting the local and global context information can significantly improve the learning ability of the model. 2) Although SC-SPM, FC-SPM, DC-SPM, and DFC-SPM achieve similar accuracy performance, building a semantic pyramid module based on efficient convolution achieves better efficiency (faster speed and less computational complexity) than building a module based on conventional convolution. DFC-SPM achieved 71.02% IU, with only 0.05M additional parameters and 0.20G FLOPs. 3) The LSPM integrates context information and multi-level feature information by designing a short-distance feature learning operation, and is used for encouraging information transfer and gradient conduction of front-level multi-level information. Therefore, the accuracy performance of the DFC-SPM method is improved from 71.02% mIoU to 71.65% mIoU. The above results demonstrate the high efficiency and effectiveness of the proposed LSPM.
Model    Speed (ms)    Params (M)    FLOPs (G)    mIoU (%)
EAF 15.81 1.85 2.90 68.95
SC-SPM 16.22 2.11 4.43 70.81
FC-SPM 16.10 2.03 3.72 70.06
DC-SPM 15.76 1.90 3.11 71.00
DFC-SPM 15.72 1.90 3.10 71.02
LSPM 15.65 1.89 3.06 71.65
Table 2: Ablation study of the lightweight semantic pyramid module
Evaluation on the benchmark data set:
on the cityscaps dataset, DFFNet was compared to other existing semantic segmentation methods. "-" indicates that the method does not publish the corresponding performance value.
Table 3: Comprehensive performance of the proposed method and the comparison methods on the Cityscapes dataset
As shown in Table 3, SegNet and ENet improve speed by greatly compressing the model scale at the expense of segmentation accuracy. LW-RefineNet and ERFNet design asymmetric encoder-decoder structures to maintain a balance between accuracy and efficiency. BiSeNet, CANet and ICNet adopt multi-branch structures and achieve a good balance between accuracy and speed, but introduce more additional learnable parameters. In contrast, DFFNet achieves better accuracy and efficiency, particularly in terms of reduced network parameters (1.9M parameters) and computational complexity (3.1 GFLOPs). In addition, FCN and Dilation10 use computationally expensive VGG backbone networks (e.g. VGG16 and VGG19) as feature extractors and require 2 seconds or more to process an image. DRN, DeepLab v2, RefineNet and PSPNet employ deep ResNet backbone networks (e.g. ResNet50 and ResNet101) to enhance the multi-scale feature representation, at the cost of significant computation and memory usage. Compared with these accuracy-oriented methods, the proposed method needs only 12 ms to process an image at 640×360 resolution while achieving a segmentation accuracy of 71.0% mIoU.
In conclusion, the method achieves well-rounded segmentation performance in terms of accuracy and efficiency (inference speed, network parameters and computational complexity), and therefore has great deployment potential on resource-constrained Internet of Things devices.
Embodiment two:
referring to fig. 3, the embodiment provides a semantic segmentation system based on dual-feature fusion of internet of things perception, which includes a multilayer feature fusion module and a lightweight semantic pyramid module connected to each other;
the multi-layer feature fusion module comprises a backbone network unit and a proofreading unit;
the lightweight semantic pyramid module comprises a first dimension reduction unit, a second dimension reduction unit, a third dimension reduction unit, a context coding unit, a global pooling unit, a first channel splicing and fusing unit, a second channel splicing and fusing unit and an upsampling unit;
the backbone network unit is connected with a proofreading unit, the proofreading unit is respectively connected with a first dimension reduction unit and a second dimension reduction unit, the first dimension reduction unit is respectively connected with a context coding unit and a global pooling unit, the context coding unit and the global pooling unit are both connected with a first channel splicing and fusing unit, the second dimension reduction unit and the first channel splicing and fusing unit are both connected with a second channel splicing and fusing unit, the second channel splicing and fusing unit is also connected with a third dimension reduction unit, and an up-sampling unit is connected with the third dimension reduction unit;
the backbone network unit is used for performing feature coding on the original image by using a backbone network to obtain features of different scales;
the proofreading unit is used for learning the features of different scales through the two attention refinement blocks to obtain multi-level fusion features;
the first dimension reduction unit and the second dimension reduction unit are used for reducing the dimension of the multi-level fusion feature so as to output a first dimension reduction feature and a second dimension reduction feature respectively, and the first dimension reduction feature and the second dimension reduction feature are the same;
the context coding unit is used for respectively carrying out context coding on the first dimension reduction characteristics through depth separable convolutions with different convolution scales so as to obtain local characteristics with different scales;
the global pooling unit is used for performing global pooling on the first dimension reduction feature through a global mean pooling layer to obtain a global feature;
the first channel splicing and fusing unit is used for carrying out channel splicing and fusing on the global features and the local features of different scales so as to obtain multi-scale context fusion features;
the second channel splicing and fusing unit is used for carrying out channel splicing and fusing on the second dimension reduction feature and the multi-scale context fusion feature to obtain a splicing feature;
the third dimension reduction unit is used for reducing the dimension of the splicing feature;
and the up-sampling unit is used for up-sampling the splicing characteristics subjected to the dimensionality reduction so as to obtain final output.
It should be noted that the semantic segmentation system based on Internet of Things perception and dual-feature fusion provided in this embodiment corresponds to the method of embodiment one, and is therefore not described herein in detail.
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solutions of the present invention by those skilled in the art should fall within the protection scope of the present invention without departing from the design spirit of the present invention.

Claims (10)

1. A dual-feature fusion semantic segmentation method based on Internet of Things perception, comprising the following steps:
S1. inputting an original image and encoding it with a backbone network to obtain features of different scales;
S2. learning the features of different scales with two attention refinement blocks to obtain multi-level fusion features;
S3. reducing the dimensionality of the multi-level fusion features to obtain dimension-reduced features;
S4. performing context encoding on the dimension-reduced features with depthwise separable convolutions of different kernel scales to obtain local features of different scales;
S5. performing global pooling on the dimension-reduced features with a global mean pooling layer to obtain global features;
S6. performing channel concatenation and fusion of the global features and the local features of different scales to obtain multi-scale context fusion features;
S7. performing channel concatenation and fusion of the dimension-reduced features and the multi-scale context fusion features to obtain concatenated features;
S8. reducing the dimensionality of the concatenated features and up-sampling them to obtain the final output.
2. The dual-feature fusion semantic segmentation method based on Internet of Things perception according to claim 1, wherein step S1 specifically comprises: encoding the original image with the backbone network to obtain a first feature, a second feature and a third feature, wherein the scale of the first feature is 1/4 of the original image scale, the scale of the second feature is 1/8 of the original image scale, and the scale of the third feature is 1/16 of the original image scale.
3. The dual-feature fusion semantic segmentation method based on Internet of Things perception according to claim 2, wherein step S2 comprises the following steps:
S2.1. fusing the first feature and the second feature through a first attention refinement block to output a semantic feature;
S2.2. fusing the semantic feature and the third feature through a second attention refinement block to obtain the multi-level fusion features.
4. The dual-feature fusion semantic segmentation method based on Internet of Things perception according to claim 3, wherein step S2.1 specifically comprises the following steps:
S2.1.1. mapping the first feature to the scale of the second feature through a down-sampling layer to obtain a first scale feature;
S2.1.2. mapping the channel dimension of the first scale feature to the channel dimension of the second feature through a first 1×1 convolution layer to obtain a first channel feature;
S2.1.3. performing channel concatenation and fusion of the first scale feature and the second feature to obtain a first fusion feature;
S2.1.4. inputting the first fusion feature into a first adaptive mean pooling layer and a first adaptive max pooling layer to output a first attention vector and a second attention vector, respectively;
S2.1.5. nonlinearly mapping the first attention vector and the second attention vector through a first multi-layer perceptron layer to output a first mixed attention vector and a second mixed attention vector, and fusing the first mixed attention vector and the second mixed attention vector to output a first fused mixed attention vector;
S2.1.6. normalizing the first fused mixed attention vector to obtain a first normalized mixed attention vector;
S2.1.7. weighting the first channel feature with the first normalized mixed attention vector;
S2.1.8. fusing the second feature and the weighted first channel feature to output the semantic feature.
5. The dual-feature fusion semantic segmentation method based on Internet of Things perception according to claim 4, wherein step S2.2 specifically comprises the following steps:
S2.2.1. mapping the third feature to the scale of the second feature through an up-sampling layer to obtain a second scale feature;
S2.2.2. mapping the channel dimension of the second scale feature to the channel dimension of the second feature through a second 1×1 convolution layer to obtain a second channel feature;
S2.2.3. performing channel concatenation and fusion of the second scale feature and the semantic feature to obtain a second fusion feature;
S2.2.4. inputting the second fusion feature into a second adaptive mean pooling layer and a second adaptive max pooling layer to output a third attention vector and a fourth attention vector, respectively;
S2.2.5. nonlinearly mapping the third attention vector and the fourth attention vector through a second multi-layer perceptron layer to output a third mixed attention vector and a fourth mixed attention vector, and fusing the third mixed attention vector and the fourth mixed attention vector to output a second fused mixed attention vector;
S2.2.6. normalizing the second fused mixed attention vector to obtain a second normalized mixed attention vector;
S2.2.7. weighting the second channel feature with the second normalized mixed attention vector;
S2.2.8. fusing the semantic feature and the weighted second channel feature to obtain the multi-level fusion features.
6. The dual-feature fusion semantic segmentation method based on Internet of Things perception according to claim 5, wherein step S2.1.4 computes the first attention vector and the second attention vector with the following formulas:
V1 = AAP1(C[F1, F2]),
V2 = AMP1(C[F1, F2]),
wherein V1 is the first attention vector, V2 is the second attention vector, F1 is the first scale feature, F2 is the second feature, C[·] denotes channel concatenation and fusion, AAP1(·) denotes the first adaptive mean pooling layer, and AMP1(·) denotes the first adaptive max pooling layer.
7. The dual-feature fusion semantic segmentation method based on Internet of Things perception according to claim 6, wherein step S2.2.4 computes the third attention vector and the fourth attention vector with the following formulas:
V3 = AAP2(C[L1, L2]),
V4 = AMP2(C[L1, L2]),
wherein V3 is the third attention vector, V4 is the fourth attention vector, L1 is the second scale feature, L2 is the semantic feature, AAP2(·) denotes the second adaptive mean pooling layer, and AMP2(·) denotes the second adaptive max pooling layer.
8. The dual-feature fusion semantic segmentation method based on Internet of Things perception according to claim 7, wherein the nonlinear mapping and fusion of step S2.1.5 uses the formula
Va1 = MLP1(C[V1, V2]),
and the nonlinear mapping and fusion of step S2.2.5 uses the formula
Va2 = MLP2(C[V3, V4]),
wherein Va1 is the first fused mixed attention vector, Va2 is the second fused mixed attention vector, MLP1(·) denotes the first multi-layer perceptron layer, and MLP2(·) denotes the second multi-layer perceptron layer.
9. The dual-feature fusion semantic segmentation method based on Internet of Things perception according to claim 8, wherein steps S2.1.6 to S2.1.8 (normalizing the first fused mixed attention vector, weighting the first channel feature with the first normalized mixed attention vector, and fusing the second feature with the weighted first channel feature to output the semantic feature) use the following formula:
L2 = F2 ⊕ (Sig1(Va1) ⊗ F'1),
and steps S2.2.6 to S2.2.8 (normalizing the second fused mixed attention vector, weighting the second channel feature with the second normalized mixed attention vector, and fusing the semantic feature with the weighted second channel feature to obtain the multi-level fusion features) use the following formula:
L'2 = L2 ⊕ (Sig2(Va2) ⊗ L'1),
wherein L2 is the semantic feature, L'2 is the multi-level fusion feature, Sig1(·) denotes the first activation function, Sig2(·) denotes the second activation function, F'1 is the first channel feature, L'1 is the second channel feature, the normalized attention vectors are broadcast over the H×W spatial positions of the feature maps (H denotes the height and W the width of the feature map), ⊗ denotes the pixel-level dot-product operation, and ⊕ denotes the pixel-level addition operation.
10. A dual-feature fusion semantic segmentation system based on Internet of Things perception, comprising a multi-layer feature fusion module and a lightweight semantic pyramid module which are connected;
the multi-layer feature fusion module comprises a backbone network unit and a proofreading unit;
the lightweight semantic pyramid module comprises a first dimension-reduction unit, a second dimension-reduction unit, a third dimension-reduction unit, a context encoding unit, a global pooling unit, a first channel concatenation-fusion unit, a second channel concatenation-fusion unit, and an up-sampling unit;
wherein the backbone network unit is connected to the proofreading unit; the proofreading unit is connected to the first dimension-reduction unit and the second dimension-reduction unit respectively; the first dimension-reduction unit is connected to the context encoding unit and the global pooling unit respectively; the context encoding unit and the global pooling unit are both connected to the first channel concatenation-fusion unit; the second dimension-reduction unit and the first channel concatenation-fusion unit are both connected to the second channel concatenation-fusion unit; the second channel concatenation-fusion unit is further connected to the third dimension-reduction unit; and the up-sampling unit is connected to the third dimension-reduction unit;
the backbone network unit is configured to encode the original image with a backbone network to obtain features of different scales;
the proofreading unit is configured to learn the features of different scales through two attention refinement blocks to obtain multi-level fusion features;
the first dimension-reduction unit and the second dimension-reduction unit are both configured to reduce the dimensionality of the multi-level fusion features, so as to output a first dimension-reduced feature and a second dimension-reduced feature respectively, the first dimension-reduced feature and the second dimension-reduced feature being identical;
the context encoding unit is configured to perform context encoding on the first dimension-reduced feature with depthwise separable convolutions of different kernel scales to obtain local features of different scales;
the global pooling unit is configured to perform global pooling on the first dimension-reduced feature with a global mean pooling layer to obtain global features;
the first channel concatenation-fusion unit is configured to perform channel concatenation and fusion of the global features and the local features of different scales to obtain multi-scale context fusion features;
the second channel concatenation-fusion unit is configured to perform channel concatenation and fusion of the second dimension-reduced feature and the multi-scale context fusion features to obtain concatenated features;
the third dimension-reduction unit is configured to reduce the dimensionality of the concatenated features; and
the up-sampling unit is configured to up-sample the dimension-reduced concatenated features to obtain the final output.
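Read as a whole, claims 4–9 describe the attention refinement block as: resample one feature to the other's spatial scale, project its channels with a 1×1 convolution, concatenate the pair, pool the concatenation with adaptive mean and adaptive max pooling, map the concatenated pooled vectors through an MLP, sigmoid-normalize the result, weight the projected feature with it, and add the weighted feature to the other input. The following PyTorch-style sketch is one possible reading of those steps, not the patented implementation; the module name, the MLP reduction ratio, and all channel sizes are illustrative assumptions.

# A minimal sketch of the attention refinement block of claims 4-9 (assumed names and sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionRefinementBlock(nn.Module):
    """Fuses feature f1 into feature f2 following steps S2.1.1-S2.1.8."""

    def __init__(self, in_ch1: int, in_ch2: int, reduction: int = 4):
        super().__init__()
        # S2.1.2: 1x1 convolution mapping f1's channels to f2's channel count.
        self.proj = nn.Conv2d(in_ch1, in_ch2, kernel_size=1, bias=False)
        fused_ch = in_ch1 + in_ch2
        # S2.1.5: MLP applied to the concatenated pooled vectors (reduction ratio is assumed).
        self.mlp = nn.Sequential(
            nn.Linear(2 * fused_ch, fused_ch // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(fused_ch // reduction, in_ch2),
        )

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        # S2.1.1 / S2.2.1: resample f1 to f2's spatial scale (down- or up-sampling).
        f1 = F.interpolate(f1, size=f2.shape[2:], mode="bilinear", align_corners=False)
        f1_proj = self.proj(f1)                                # S2.1.2: first channel feature
        fused = torch.cat([f1, f2], dim=1)                     # S2.1.3: first fusion feature
        v1 = F.adaptive_avg_pool2d(fused, 1).flatten(1)        # S2.1.4: V1 = AAP(C[F1, F2])
        v2 = F.adaptive_max_pool2d(fused, 1).flatten(1)        # S2.1.4: V2 = AMP(C[F1, F2])
        va = self.mlp(torch.cat([v1, v2], dim=1))              # S2.1.5: Va = MLP(C[V1, V2])
        attn = torch.sigmoid(va).unsqueeze(-1).unsqueeze(-1)   # S2.1.6: normalize
        return f2 + f1_proj * attn                             # S2.1.7-S2.1.8: weight and fuse

For example, AttentionRefinementBlock(64, 128)(f_quarter, f_eighth) would fuse a 64-channel 1/4-scale feature into a 128-channel 1/8-scale feature; fed the 1/16-scale feature and the semantic feature, the same module would play the role of the second refinement block.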
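The lightweight semantic pyramid module of steps S3–S8 (claim 10) can be sketched in the same spirit: a 1×1 dimension reduction whose output is shared by the context and skip branches (claim 10 states the two reduced features are identical), depthwise separable convolutions of several kernel scales plus a global mean pooling branch, channel concatenation of all context branches, concatenation with the reduced feature, a final dimension reduction to class scores, and up-sampling. The sketch below is again an assumption-laden illustration: the kernel sizes, channel counts, use of batch normalization, and output stride are not specified in the claims.

# A minimal sketch of the lightweight semantic pyramid module of steps S3-S8 (assumed sizes).
import torch
import torch.nn as nn

def depthwise_separable(ch: int, kernel: int) -> nn.Sequential:
    """Depthwise separable convolution encoding context at one kernel scale (S4)."""
    return nn.Sequential(
        nn.Conv2d(ch, ch, kernel, padding=kernel // 2, groups=ch, bias=False),
        nn.Conv2d(ch, ch, 1, bias=False),
        nn.BatchNorm2d(ch),
        nn.ReLU(inplace=True),
    )

class LightweightSemanticPyramid(nn.Module):
    def __init__(self, in_ch: int, mid_ch: int, num_classes: int,
                 kernels=(3, 5, 7), upscale: int = 8):
        super().__init__()
        self.upscale = upscale
        # S3: 1x1 dimension reduction, reused for both branches since the claim
        # states the first and second dimension-reduced features are identical.
        self.reduce = nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False)
        # S4: context encoding with depthwise separable convolutions of several scales.
        self.contexts = nn.ModuleList(depthwise_separable(mid_ch, k) for k in kernels)
        # S5: global mean pooling branch.
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        # S8 (first half): reduce the concatenated features to class scores.
        concat_ch = mid_ch * (len(kernels) + 2)  # context branches + global branch + skip branch
        self.classifier = nn.Conv2d(concat_ch, num_classes, kernel_size=1)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        x = self.reduce(fused)                                   # S3
        locals_ = [ctx(x) for ctx in self.contexts]              # S4
        g = self.global_pool(x).expand_as(x)                     # S5 (broadcast back to HxW)
        multi_scale = torch.cat(locals_ + [g], dim=1)            # S6
        spliced = torch.cat([x, multi_scale], dim=1)             # S7
        logits = self.classifier(spliced)                        # S8: dimension reduction
        return nn.functional.interpolate(                        # S8: up-sample to input size
            logits, scale_factor=self.upscale, mode="bilinear", align_corners=False)

With the assumed output stride of 8, LightweightSemanticPyramid(in_ch=128, mid_ch=64, num_classes=19) applied to a 1/8-scale fused feature map returns full-resolution class logits.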
CN202110446945.6A 2021-04-25 2021-04-25 Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion Withdrawn CN113221969A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202110446945.6A CN113221969A (en) 2021-04-25 2021-04-25 Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion
PCT/CN2022/081427 WO2022227913A1 (en) 2021-04-25 2022-03-17 Double-feature fusion semantic segmentation system and method based on internet of things perception
LU503090A LU503090B1 (en) 2021-04-25 2022-03-17 A semantic segmentation system and method based on dual feature fusion for iot sensing
ZA2022/07731A ZA202207731B (en) 2021-04-25 2022-07-12 A semantic segmentation system and method based on dual feature fusion for iot sensing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110446945.6A CN113221969A (en) 2021-04-25 2021-04-25 Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion

Publications (1)

Publication Number Publication Date
CN113221969A true CN113221969A (en) 2021-08-06

Family

ID=77088741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110446945.6A Withdrawn CN113221969A (en) 2021-04-25 2021-04-25 Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion

Country Status (4)

Country Link
CN (1) CN113221969A (en)
LU (1) LU503090B1 (en)
WO (1) WO2022227913A1 (en)
ZA (1) ZA202207731B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445430A (en) * 2022-04-08 2022-05-06 暨南大学 Real-time image semantic segmentation method and system based on lightweight multi-scale feature fusion
CN114821768A (en) * 2022-03-18 2022-07-29 中国科学院自动化研究所 Skeleton behavior identification method and device and electronic equipment
CN114821042A (en) * 2022-04-27 2022-07-29 南京国电南自轨道交通工程有限公司 An R-FCN knife gate detection method combining local and global features
CN114913325A (en) * 2022-03-24 2022-08-16 北京百度网讯科技有限公司 Semantic segmentation method, device and computer program product
WO2022227913A1 (en) * 2021-04-25 2022-11-03 浙江师范大学 Double-feature fusion semantic segmentation system and method based on internet of things perception
CN116740866A (en) * 2023-08-11 2023-09-12 上海银行股份有限公司 Banknote loading and clearing system and method for self-service machine
CN119399457A (en) * 2024-09-18 2025-02-07 广州大学 A real-time semantic segmentation method and system for multi-shape pyramids in traffic scenes

Families Citing this family (59)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272677A (en) * 2022-08-01 2022-11-01 安徽理工大学环境友好材料与职业健康研究院(芜湖) Multi-scale feature fusion semantic segmentation method, equipment and storage medium
CN115713624A (en) * 2022-09-02 2023-02-24 郑州大学 Self-adaptive fusion semantic segmentation method for enhancing multi-scale features of remote sensing image
CN115830449B (en) * 2022-12-01 2025-09-02 北京理工大学重庆创新中心 Remote sensing target detection method guided by explicit contours and enhanced by spatially varying context
CN116664875A (en) * 2023-01-16 2023-08-29 河北师范大学 PVT-based saliency target detection method for gating network
CN116030307B (en) * 2023-02-03 2025-08-05 山东大学 Breast pathology image recognition system based on context-aware multi-scale feature fusion
CN116229065B (en) * 2023-02-14 2023-12-01 湖南大学 A segmentation method for robotic surgical instruments based on multi-branch fusion
CN116342884B (en) * 2023-03-28 2024-02-06 阿里云计算有限公司 Image segmentation and model training method and server
CN116052007B (en) * 2023-03-30 2023-08-11 山东锋士信息技术有限公司 Remote sensing image change detection method integrating time and space information
CN116434069B (en) * 2023-04-27 2025-09-19 南京信息工程大学 Remote sensing image change detection method based on local-global transducer network
CN116205928B (en) * 2023-05-06 2023-07-18 南方医科大学珠江医院 Image segmentation processing method, device and equipment for laparoscopic surgery video and medium
CN116580241B (en) * 2023-05-22 2024-05-14 内蒙古农业大学 Image processing method and system based on dual-branch multi-scale semantic segmentation network
CN116630386B (en) * 2023-06-12 2024-02-20 新疆生产建设兵团医院 CTA scanning image processing method and system thereof
CN116721253B (en) * 2023-06-12 2025-08-15 湖南科技大学 Abdominal CT image multi-organ segmentation method based on deep learning
CN117115435B (en) * 2023-06-30 2025-07-18 重庆理工大学 Attention and multi-scale feature extraction-based real-time semantic segmentation method
CN116721351B (en) * 2023-07-06 2024-06-18 内蒙古电力(集团)有限责任公司内蒙古超高压供电分公司 Remote sensing intelligent extraction method for road environment characteristics in overhead line channel
CN116559778B (en) * 2023-07-11 2023-09-29 海纳科德(湖北)科技有限公司 Vehicle whistle positioning method and system based on deep learning
CN116612124B (en) * 2023-07-21 2023-10-20 国网四川省电力公司电力科学研究院 A transmission line defect detection method based on dual-branch serial hybrid attention
CN116721420B (en) * 2023-08-10 2023-10-20 南昌工程学院 A method and system for constructing a semantic segmentation model for UV images of electrical equipment
CN117115443B (en) * 2023-08-18 2024-06-11 中南大学 A segmentation method for identifying small infrared targets
CN117058383B (en) * 2023-08-18 2025-06-20 河南大学 An efficient and lightweight real-time semantic segmentation method for urban street scenes
CN117197763B (en) * 2023-09-07 2025-09-26 湖北工业大学 Road crack detection method and system based on cross-attention guided feature alignment network
CN117095172B (en) * 2023-09-09 2025-07-04 西北工业大学 A continuous semantic segmentation method based on internal and external distillation
CN117314787B (en) * 2023-11-14 2025-07-08 河北工业大学 Underwater image enhancement method based on adaptive multi-scale fusion and attention mechanism
CN117636165A (en) * 2023-11-30 2024-03-01 电子科技大学 A multi-task remote sensing semantic change detection method based on token mixing
CN118212543B (en) * 2023-12-11 2024-10-22 自然资源部国土卫星遥感应用中心 Bilateral fusion and lightweight network improved radiation abnormal target detection method
CN117809294B (en) * 2023-12-29 2024-07-19 天津大学 A text detection method based on feature correction and difference-guided attention
CN117710694B (en) * 2024-01-12 2024-10-22 中国科学院自动化研究所 Multimode characteristic information acquisition method and system, electronic equipment and storage medium
CN117876929B (en) * 2024-01-12 2024-06-21 天津大学 A temporal object localization method based on progressive multi-scale context learning
CN117593633B (en) * 2024-01-19 2024-06-14 宁波海上鲜信息技术股份有限公司 Ocean scene-oriented image recognition method, system, equipment and storage medium
CN117745745B (en) * 2024-02-18 2024-05-10 湖南大学 CT image segmentation method based on context fusion perception
CN118037664B (en) * 2024-02-20 2024-10-01 成都天兴山田车用部品有限公司 Deep hole surface defect detection and CV size calculation method
CN117789153B (en) * 2024-02-26 2024-05-03 浙江驿公里智能科技有限公司 Automobile oil tank outer cover positioning system and method based on computer vision
CN117828280B (en) * 2024-03-05 2024-06-07 山东新科建工消防工程有限公司 Intelligent fire information acquisition and management method based on Internet of things
CN118052739B (en) * 2024-03-08 2025-01-14 东莞理工学院 A traffic image defogging method and intelligent traffic image processing system based on deep learning
CN117993442B (en) * 2024-03-21 2024-10-18 济南大学 Hybrid neural network method and system for fusing local and global information
CN118072357B (en) * 2024-04-16 2024-07-02 南昌理工学院 Control method and system of intelligent massage robot
CN118429808B (en) * 2024-05-10 2024-12-17 北京信息科技大学 Remote sensing image road extraction method and system based on lightweight network structure
CN118230175B (en) * 2024-05-23 2024-08-13 济南市勘察测绘研究院 Real estate mapping data processing method and system based on artificial intelligence
CN118366000B (en) * 2024-06-14 2024-10-29 陕西天润科技股份有限公司 Cultural relic health management method based on digital twinning
CN118587506A (en) * 2024-06-19 2024-09-03 兰州大学 A deep learning-based atmospheric cloud classification method
CN118397298B (en) * 2024-06-28 2024-09-06 杭州安脉盛智能技术有限公司 Self-attention space pyramid pooling method based on mixed pooling and related components
CN118429335B (en) * 2024-07-02 2024-09-24 新疆胜新复合材料有限公司 Online defect detection system and method for carbon fiber sucker rod
CN118470679B (en) * 2024-07-10 2024-09-24 山东省计算中心(国家超级计算济南中心) A lightweight lane line segmentation and recognition method and system
CN118485835B (en) * 2024-07-16 2024-10-01 杭州电子科技大学 Multispectral image semantic segmentation method based on modal divergence difference fusion
CN118898718B (en) * 2024-07-25 2025-04-18 中国矿业大学 A semantic segmentation method with enhanced boundary perception
CN119168951A (en) * 2024-08-27 2024-12-20 上海茹钰生物科技有限公司 Essence liquid automated production line and method thereof
CN118840559B (en) * 2024-09-20 2024-12-13 泉州职业技术大学 Rail surface defect segmentation method and device based on ordered cross-scale feature interaction
CN119048763B (en) * 2024-10-30 2025-04-08 江西师范大学 A colonoscopy polyp image segmentation method based on hybrid model
CN119068201B (en) * 2024-11-04 2025-04-22 江西师范大学 Image segmentation method and system based on multistage multi-scale gradual fusion network
CN119152075B (en) * 2024-11-11 2025-02-14 浙江大学 Object elimination method and device for environment interaction perception association
CN119649022B (en) * 2024-11-14 2025-07-04 华东交通大学 Real-time semantic segmentation tunnel over-excavation and under-excavation monitoring method and system
CN119151802A (en) * 2024-11-15 2024-12-17 无锡学院 Method, system, equipment and storage medium for fusing infrared image and visible light image
CN119577685B (en) * 2024-11-29 2025-09-26 西安电子科技大学 S-D network full-level sensing-based efficient detection system and detection method thereof
CN119314086B (en) * 2024-12-13 2025-03-25 浙江师范大学 Image matting method
CN119810663B (en) * 2024-12-31 2025-09-30 同济大学 Road extraction method based on mixed attention mechanism and direction prior
CN119888285B (en) * 2025-03-26 2025-07-22 厦门理工学院 A multi-scale image matching method and system
CN120216970B (en) * 2025-05-30 2025-09-19 大连理工大学 Remaining life prediction method and device based on multi-scale decomposition enhancement
CN120374992B (en) * 2025-06-30 2025-09-12 江苏富翰医疗产业发展有限公司 Image segmentation method based on spatial attention mechanism
CN120526608B (en) * 2025-07-25 2025-10-03 湖南工商大学 Road traffic flow prediction method based on spatiotemporal hybrid attention network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111210432A (en) * 2020-01-12 2020-05-29 湘潭大学 An image semantic segmentation method based on multi-scale and multi-level attention mechanism
CN111915619A (en) * 2020-06-05 2020-11-10 华南理工大学 A fully convolutional network semantic segmentation method with dual feature extraction and fusion
CN111932553A (en) * 2020-07-27 2020-11-13 北京航空航天大学 Remote sensing image semantic segmentation method based on area description self-attention mechanism

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150104102A1 (en) * 2013-10-11 2015-04-16 Universidade De Coimbra Semantic segmentation method with second-order pooling
CN112651973B (en) * 2020-12-14 2022-10-28 南京理工大学 Semantic segmentation method based on cascade of feature pyramid attention and mixed attention
CN113221969A (en) * 2021-04-25 2021-08-06 浙江师范大学 Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111210432A (en) * 2020-01-12 2020-05-29 湘潭大学 An image semantic segmentation method based on multi-scale and multi-level attention mechanism
CN111915619A (en) * 2020-06-05 2020-11-10 华南理工大学 A fully convolutional network semantic segmentation method with dual feature extraction and fusion
CN111932553A (en) * 2020-07-27 2020-11-13 北京航空航天大学 Remote sensing image semantic segmentation method based on area description self-attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIANGYAN TANG et al.: "DFFNet: An IoT-perceptive dual feature fusion network for general real-time semantic segmentation", INFORMATION SCIENCES 565 (2021) 326–343, 12 February 2021 (2021-02-12), pages 2 - 4 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022227913A1 (en) * 2021-04-25 2022-11-03 浙江师范大学 Double-feature fusion semantic segmentation system and method based on internet of things perception
CN114821768A (en) * 2022-03-18 2022-07-29 中国科学院自动化研究所 Skeleton behavior identification method and device and electronic equipment
CN114913325A (en) * 2022-03-24 2022-08-16 北京百度网讯科技有限公司 Semantic segmentation method, device and computer program product
CN114913325B (en) * 2022-03-24 2024-05-10 北京百度网讯科技有限公司 Semantic segmentation method, semantic segmentation device and computer program product
CN114445430A (en) * 2022-04-08 2022-05-06 暨南大学 Real-time image semantic segmentation method and system based on lightweight multi-scale feature fusion
CN114821042A (en) * 2022-04-27 2022-07-29 南京国电南自轨道交通工程有限公司 An R-FCN knife gate detection method combining local and global features
CN114821042B (en) * 2022-04-27 2025-07-22 南京国电南自轨道交通工程有限公司 R-FCN knife switch detection method combining local features and global features
CN116740866A (en) * 2023-08-11 2023-09-12 上海银行股份有限公司 Banknote loading and clearing system and method for self-service machine
CN116740866B (en) * 2023-08-11 2023-10-27 上海银行股份有限公司 Banknote loading and clearing system and method for self-service machine
CN119399457A (en) * 2024-09-18 2025-02-07 广州大学 A real-time semantic segmentation method and system for multi-shape pyramids in traffic scenes
CN119399457B (en) * 2024-09-18 2025-10-03 广州大学 A real-time semantic segmentation method and system for multi-shape pyramids in traffic scenarios

Also Published As

Publication number Publication date
LU503090B1 (en) 2023-03-22
WO2022227913A1 (en) 2022-11-03
ZA202207731B (en) 2022-07-27

Similar Documents

Publication Publication Date Title
CN113221969A (en) Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion
CN112651973B (en) Semantic segmentation method based on cascade of feature pyramid attention and mixed attention
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
CN113516133B (en) Multi-modal image classification method and system
CN112991350A (en) RGB-T image semantic segmentation method based on modal difference reduction
EP4336378A1 (en) Data processing method and related device
CN113486190A (en) Multi-mode knowledge representation method integrating entity image information and entity category information
CN108388900A (en) The video presentation method being combined based on multiple features fusion and space-time attention mechanism
CN113033570A (en) Image semantic segmentation method for improving fusion of void volume and multilevel characteristic information
CN113961736B (en) Method, apparatus, computer device and storage medium for text generation image
CN113066089B (en) A Real-time Image Semantic Segmentation Method Based on Attention Guidance Mechanism
CN113919479B (en) Method for extracting data features and related device
CN114863229A (en) Image classification method and training method and device for image classification model
CN114861907A (en) Data computing method, device, storage medium and device
CN114936901A (en) Visual perception recommendation method and system based on cross-modal semantic reasoning and fusion
CN117033609A (en) Text visual question-answering method, device, computer equipment and storage medium
US11948090B2 (en) Method and apparatus for video coding
CN112966672B (en) Gesture recognition method under complex background
CN111652349A (en) A neural network processing method and related equipment
CN118885601A (en) Personalized recommendation method and system based on emotion-aware knowledge graph convolutional network
CN118154866A (en) A city-level point cloud semantic segmentation system and method based on spatial perception
CN116912268A (en) Skin lesion image segmentation method, device, equipment and storage medium
CN119048399B (en) Image restoration method, system, device and medium integrating cross attention
CN112784831A (en) Character recognition method for enhancing attention mechanism by fusing multilayer features
WO2024174583A1 (en) Model training method and apparatus, and device, storage medium and product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Zhu Xinzhong

Inventor after: Xu Huiying

Inventor after: Zhao Jianmin

Inventor before: Zhu Xinzhong

Inventor before: Xu Huiying

Inventor before: Tu Wenxuan

Inventor before: Liu Xinwang

Inventor before: Zhao Jianmin

WW01 Invention patent application withdrawn after publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20210806
