
CN115641465B - A remote sensing image classification method based on Transformer lightweight model - Google Patents

A remote sensing image classification method based on Transformer lightweight model

Info

Publication number
CN115641465B
Authority
CN
China
Prior art keywords
module
remote sensing
image
information
sensing image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211105685.7A
Other languages
Chinese (zh)
Other versions
CN115641465A (en)
Inventor
李玲玲
孙钰凯
马晶晶
焦李成
刘芳
赵帅
吴文童
刘旭
张梦璇
张丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202211105685.7A priority Critical patent/CN115641465B/en
Publication of CN115641465A publication Critical patent/CN115641465A/en
Application granted granted Critical
Publication of CN115641465B publication Critical patent/CN115641465B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image classification method based on a Transformer lightweight model, and relates to the technical field of image classification. The method visualizes and preprocesses an optical remote sensing image according to actual conditions, acquires image encoding information of the optical remote sensing image through an image encoding module, performs feature extraction on the image encoding information through an MLP module to obtain an information code, and extracts feature information from the information code through an Attention module. The feature information extracted by the Attention module and the information code obtained by the MLP module are superposed by a residual module and then normalized by a normalization module; finally, the normalized result is sent through an FNN module into a classification module to obtain the final classification result.

Description

Remote sensing image classification method based on Transformer lightweight model
Technical Field
The invention relates to the technical field of image classification, in particular to a remote sensing image classification method based on a Transformer lightweight model.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Image classification is a very active research direction in the fields of computer vision, pattern recognition and machine learning. It is widely applied in many fields, including face recognition and intelligent video analysis in the security field; traffic scene object recognition, vehicle counting and license plate recognition in the traffic field; and content-based image retrieval and automatic album classification in the internet field. Its aim is to determine the categories of objects in an image so as to obtain the image category.
Feature extraction is an important component of image classification, and mainly comprises traditional features (such as the Histogram of Oriented Gradients (HOG) combined with a Support Vector Machine (SVM)) and deep learning-based features (convolutional neural network (CNN)-based methods, Transformer-based methods, and the like).
In Transformer-based methods, the classical Transformer model encodes blocks of the original input image, so the obtained features are rough and much information is ignored during image encoding; meanwhile, the Attention module of the classical Transformer model suffers from high model complexity, low calculation efficiency and long consumption time, so its effect in the field of optical remote sensing image classification cannot be fully realized.
Disclosure of Invention
Aiming at the defects of large calculation resource consumption and long calculation time of Transformer-based optical remote sensing image classification models, the invention provides a remote sensing image classification method based on a Transformer lightweight model, whose training effect on small and medium-sized datasets is better than that of the classical Transformer model; it reduces the model calculation amount, saves calculation resources and improves the classification speed, thereby solving the above problems.
The technical scheme of the invention is as follows:
A remote sensing image classification method based on a Transformer lightweight model comprises the following steps:
Step S1, selecting an optical remote sensing image, and visualizing and preprocessing the selected optical remote sensing image according to actual conditions;
Step S2, encoding the selected optical remote sensing image through an image encoding module to obtain image encoding information of the selected optical remote sensing image;
Step S3, performing feature extraction on the image encoding information of the selected optical remote sensing image through an MLP module to obtain an information code;
Step S4, extracting the feature information in the information code through an Attention module;
Step S5, superposing, through a residual module, the feature information extracted by the Attention module and the information code obtained by the MLP module, and then performing normalization through a normalization module;
Step S6, after the normalization module performs normalization, sending the result through an FNN module into a classification module, so as to obtain the final classification result.
Further, in the step S1, the visualization converts the selected optical remote sensing image into a picture, and the preprocessing comprises image rotation, definition adjustment and image size transformation;
the image size transformation includes transforming the image size of the selected optical remote sensing image to (224, 224).
Further, in the step S1, the method further includes:
storing the selected optical remote sensing images into different folders according to categories, wherein each folder is named by the name of the category.
Further, the step S2 includes:
The image encoding module performs a convolution operation on the selected optical remote sensing image through 4 convolution layers, each with a convolution kernel size of 3, to extract the image feature information in the optical remote sensing image;
the dimension of the image feature information is then adjusted through a convolution layer with a convolution kernel size of 1, converting the two-dimensional image feature information into one-dimensional image encoding information.
Further, in the step S2, the finally extracted feature dimension is (batch, 28, 28, 192).
Further, the step S3 includes:
step S31, performing feature extraction on the image encoding information obtained in the step S2 through the MLP module, and obtaining features with a feature dimension of (batch, 14, 14, 192) through downsampling;
step S32, expanding the features obtained in the step S31 to obtain an information code with a feature dimension of (batch, 196, 192);
step S33, splitting the information code obtained in the step S32 according to the feature dimension (batch, 8, 196, 24), and inputting the split information code as the data input Q to the Attention module and the residual module.
Further, the calculation formula of the Attention module is as follows:
Attention(Q) = softmax(QQ^T / √d_n) · Q
wherein d_n is the dimension value of the data input Q, and Q^T is the transpose of the data input Q matrix.
Compared with the prior art, the invention has the beneficial effects that:
1. The remote sensing image classification method based on the Transformer lightweight model improves the accuracy of the Transformer model while reducing its parameter quantity, so as to obtain a better classification result; the accuracy improvement is very large compared with the classical Transformer model and the CNN model.
2. The remote sensing image classification method based on the Transformer lightweight model greatly reduces the huge parameter quantity and calculation amount brought by the complexity of the Transformer model.
3. By outputting features of different scales inside the model, in contrast to the traditional Transformer model whose feature extraction scale remains unchanged, the remote sensing image classification method based on the Transformer lightweight model can better obtain features of different scales and improve the overall accuracy of the model.
Drawings
FIG. 1 is a flow chart of a remote sensing image classification method based on a Transformer lightweight model;
FIG. 2 is a general classification flow chart in the first embodiment;
FIG. 3 is a diagram of an example of a data sample;
FIG. 4 is a diagram of a QAttention model structure;
FIG. 5 is a block diagram of an MLP module;
fig. 6 is a precision comparison chart.
Detailed Description
It is noted that relational terms such as "first" and "second", and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises that element.
The features and capabilities of the present invention are described in further detail below in connection with examples.
Example 1
In Transformer-based methods, the classical Transformer model encodes blocks of the original input image, so the obtained features are rough and much information is ignored during image encoding; meanwhile, the Attention module of the classical Transformer model suffers from high model complexity, low calculation efficiency and long consumption time, so its effect in the field of optical remote sensing image classification cannot be fully realized.
The Transformer-based image classification model also has the defects of high complexity and difficult convergence during training, and its training effect is poor on small and medium-sized datasets such as those used for optical remote sensing image classification.
Aiming at the problems of the classical Transformer model, namely high model complexity, low calculation efficiency, long consumption time and difficult convergence on small and medium-sized datasets, this embodiment provides a remote sensing image classification method based on a Transformer lightweight model. By modifying the model structure and using an improved QAttention module, a convolutional implementation of the MLP module and an improved image encoding module, the fitting capability of the model is improved, the network parameters are reduced, and the average accuracy and training speed of optical remote sensing image classification are improved; the method can be used to obtain the category of an optical remote sensing image.
It should be noted that, referring to fig. 2, the overall classification flow includes: obtaining training samples and test samples from the optical remote sensing image classification dataset; performing data enhancement on the training samples; constructing the image encoding module, the QAttention module, the MLP module and the classification module; inputting the training samples into the constructed classification model for training to obtain a trained model; and inputting the test samples into the trained model to predict and output the optical remote sensing image classification result.
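To make this flow concrete, the following is a minimal training/prediction loop sketched in PyTorch; the stand-in model, hyper-parameters and toy data are illustrative assumptions rather than values from the patent (the actual model assembles the image encoding, QAttention, MLP, residual/normalization, FNN and classification modules described below).

```python
# Minimal sketch of the overall flow in fig. 2 (train, then predict); the stand-in
# classifier and all hyper-parameters below are assumptions for illustration only.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 45))  # stand-in classifier
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)           # a toy batch of preprocessed samples
labels = torch.randint(0, 45, (8,))

model.train()
for epoch in range(10):                        # training phase
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

model.eval()                                   # prediction on test samples
with torch.no_grad():
    predictions = model(images).argmax(dim=1)
```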
With respect to the image encoding module, the image encoding module of the classical Transformer model is rough when extracting image features, which greatly affects the convergence of the model and the final accuracy result. In this embodiment, in order to improve the stability of model training, the feature information in the original image is extracted as completely as possible and downsampling is performed with multi-layer convolution, which greatly alleviates the problem that the Vision Transformer model is difficult to train on small-scale data.
Regarding the QAttention module, the original calculation formula of the Attention module is as follows:
Attention(Q, K, V) = softmax(QK^T / √d_k) · V
the calculation formula after removing the V parameter is as follows:
Attention(Q, K) = softmax(QK^T / √d_n) · K
the calculation formula that keeps only the Q parameter is as follows:
Attention(Q) = softmax(QQ^T / √d_n) · Q
After experimental comparison, no obvious difference was found in the test accuracy of the different structures, while the parameter quantity and calculation amount of the model decrease as the Q, K and V parameters are reduced. Therefore, in this embodiment, the calculation formula that keeps only the Q parameter is used as the calculation formula of the QAttention module, namely:
Attention(Q) = softmax(QQ^T / √d_n) · Q
The QAttention block diagram is shown in figure 4.
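As a concrete illustration of the formula above, the following is a minimal PyTorch sketch of a QAttention layer; it is not the patent's implementation, and the learned Q projection, the output projection and the module/variable names are assumptions. Only the computation softmax(QQ^T/√d_n)·Q follows the formula given above.

```python
# Hedged sketch of a Q-only attention layer: Attention(Q) = softmax(Q Q^T / sqrt(d_n)) Q.
import torch
import torch.nn as nn


class QAttention(nn.Module):
    def __init__(self, dim: int = 192, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads              # 192 / 8 = 24 per head
        self.scale = self.head_dim ** -0.5            # 1 / sqrt(d_n)
        self.to_q = nn.Linear(dim, dim, bias=False)   # only Q is learned; K and V are dropped (projection assumed)
        self.proj = nn.Linear(dim, dim)               # output projection (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 196, 192) -> Q split into heads: (batch, 8, 196, 24)
        b, n, c = x.shape
        q = self.to_q(x).reshape(b, n, self.num_heads, self.head_dim).permute(0, 2, 1, 3)
        attn = (q @ q.transpose(-2, -1)) * self.scale  # (batch, 8, 196, 196)
        attn = attn.softmax(dim=-1)
        out = attn @ q                                  # Q is reused in place of V
        out = out.permute(0, 2, 1, 3).reshape(b, n, c)
        return self.proj(out)
```

With dim = 192 and num_heads = 8, the per-head dimension is 24, matching the (batch, 8, 196, 24) split described later in this embodiment.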
Regarding the MLP module, since the convolution operation is realized by a linear operation on the elements within the receptive field, in the implementation of the MLP module a convolution layer with a kernel size of 1 is regarded as a fully connected model over the feature map, and a two-dimensional convolution is used to replace the one-dimensional MLP module, which further reduces the parameter quantity of the model and improves its operation efficiency and stability.
The structure of the MLP module is shown in fig. 5.
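The following sketch illustrates this convolutional MLP idea under stated assumptions: 1x1 convolutions act as per-position fully connected layers, and an optional stride-2 convolution performs the downsampling (e.g., 28x28 -> 14x14). The expansion ratio, activation and exact downsampling form are assumptions, not taken from the patent.

```python
# Hedged sketch of an MLP module implemented with 2-D convolutions.
import torch
import torch.nn as nn


class ConvMLP(nn.Module):
    def __init__(self, dim: int = 192, expansion: int = 4, downsample: bool = False):
        super().__init__()
        # stride-2 convolution halves the spatial size, e.g. 28x28 -> 14x14 (assumed form)
        self.down = nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1) if downsample else nn.Identity()
        self.fc1 = nn.Conv2d(dim, dim * expansion, kernel_size=1)  # 1x1 conv ~ fully connected layer
        self.act = nn.GELU()
        self.fc2 = nn.Conv2d(dim * expansion, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 192, 28, 28) in channels-first layout
        x = self.down(x)
        return self.fc2(self.act(self.fc1(x)))
```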
Regarding the classification module, the classification module is not modified in this embodiment, and those skilled in the art should know the operation principle thereof, and therefore, a detailed description thereof will be omitted.
Referring to figs. 1-6, the remote sensing image classification method based on the Transformer lightweight model, which is also the process of establishing the trained model, includes:
Step S1, selecting an optical remote sensing image, and visualizing and preprocessing the selected optical remote sensing image according to actual conditions; in the step S1, the visualization preferably converts the selected optical remote sensing image into a picture, and the preprocessing includes image rotation, definition adjustment and image size transformation;
the image size transformation includes transforming the image size of the selected optical remote sensing image to (224, 224); a data sample schematic is shown in fig. 3;
the step S1 further comprises storing the selected optical remote sensing images into different folders according to category, naming each folder after that category, and then placing all the optical remote sensing images belonging to that category into the folder; for example, in the NWPU-RESISC dataset, each folder contains the 700 optical remote sensing images of that category.
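A minimal sketch of this folder-per-class organization is shown below, assuming torchvision's ImageFolder is used for loading; the root path, rotation angle and batch size are hypothetical, while the resize to (224, 224) follows step S1.

```python
# Hedged sketch of loading a folder-per-class remote sensing dataset.
import torch
from torchvision import datasets, transforms

train_transform = transforms.Compose([
    transforms.RandomRotation(degrees=15),   # image rotation (angle assumed)
    transforms.Resize((224, 224)),           # image size transformation to (224, 224)
    transforms.ToTensor(),
])

# each sub-folder of the root is named after a class and holds that class's images
train_set = datasets.ImageFolder(root="NWPU-RESISC/train", transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
```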
Step S2, encoding the selected optical remote sensing image through the image encoding module to obtain the image encoding information of the selected optical remote sensing image; preferably, the step S2 comprises:
the image encoding module performs a convolution operation on the selected optical remote sensing image through 4 convolution layers, each with a convolution kernel size of 3, to extract the image feature information in the optical remote sensing image;
the dimension of the image feature information is then adjusted through a convolution layer with a convolution kernel size of 1, converting the two-dimensional image feature information into one-dimensional image encoding information, so that the finally extracted feature dimension is (batch, 28, 28, 192).
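The following is a hedged sketch of such an image encoding module: four 3x3 convolution layers followed by a 1x1 convolution that sets the embedding dimension to 192. The stride pattern (reducing 224x224 to 28x28) and the intermediate channel widths are assumptions made for illustration.

```python
# Hedged sketch of the image encoding module (strides and widths assumed).
import torch
import torch.nn as nn


class ImageEncoder(nn.Module):
    def __init__(self, in_ch: int = 3, embed_dim: int = 192):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 48, kernel_size=3, stride=2, padding=1),   # 224 -> 112
            nn.GELU(),
            nn.Conv2d(48, 96, kernel_size=3, stride=2, padding=1),      # 112 -> 56
            nn.GELU(),
            nn.Conv2d(96, 192, kernel_size=3, stride=2, padding=1),     # 56 -> 28
            nn.GELU(),
            nn.Conv2d(192, 192, kernel_size=3, stride=1, padding=1),    # 28 -> 28
            nn.GELU(),
        )
        self.proj = nn.Conv2d(192, embed_dim, kernel_size=1)            # 1x1 conv adjusts the dimension

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, 224, 224) -> (batch, 192, 28, 28)
        return self.proj(self.features(x))
```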
Step S3, performing feature extraction on the image encoding information of the selected optical remote sensing image through the MLP module to obtain an information code; preferably, the step S3 comprises:
step S31, performing feature extraction on the image encoding information obtained in the step S2 through the MLP module, and obtaining features with a feature dimension of (batch, 14, 14, 192) through downsampling;
step S32, expanding the features obtained in the step S31 to obtain an information code with a feature dimension of (batch, 196, 192);
step S33, splitting the information code obtained in the step S32 according to the feature dimension (batch, 8, 196, 24), and inputting the split information code as the data input Q to the Attention module and the residual module, as sketched below.
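The dimension transitions of steps S31-S33 can be reproduced with plain tensor reshapes, as in the following sketch (tensor names are illustrative):

```python
# (batch, 14, 14, 192) -> flatten -> (batch, 196, 192) -> split into 8 heads -> (batch, 8, 196, 24)
import torch

batch = 4
feat = torch.randn(batch, 14, 14, 192)                       # output of the MLP downsampling (S31)
tokens = feat.reshape(batch, 14 * 14, 192)                   # information code, (batch, 196, 192)  (S32)
q = tokens.reshape(batch, 196, 8, 24).permute(0, 2, 1, 3)    # data input Q, (batch, 8, 196, 24)    (S33)
print(q.shape)  # torch.Size([4, 8, 196, 24])
```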
Step S4, extracting the feature information in the information code through the Attention module; preferably, the data input Q is substituted into the following formula to extract the feature information in the information code.
The calculation formula of the Attention module is as follows:
Attention(Q) = softmax(QQ^T / √d_n) · Q
wherein d_n is the dimension value of the data input Q, and Q^T is the transpose of the data input Q matrix.
Step S5, after the feature information extracted by the Attention module and the information code obtained by the MLP module are superposed by the residual module, normalization is carried out by the normalization module; the residual module and the normalization module are used to improve the convergence capability of the model and prevent the degradation of model accuracy caused by vanishing gradients.
Step S6, after the normalization module carries out normalization, the result is sent through the FNN module into the classification module to obtain the final classification result; the function of the FNN module is to adjust the internal channel structure of the model so that feature information at different positions can be attended to between QAttention modules, improving the accuracy of the model.
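A block-level sketch of how steps S4-S6 could be chained is given below, reusing the QAttention class sketched earlier; the FFN width, the token pooling and the number of classes (45, as in NWPU-RESISC) are assumptions for illustration rather than the patent's exact structure.

```python
# Hedged sketch of one encoder block: QAttention -> residual + LayerNorm -> FFN -> classification.
# Assumes the QAttention class from the sketch earlier in this description.
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    def __init__(self, dim: int = 192, num_classes: int = 45):
        super().__init__()
        self.attn = QAttention(dim)                     # step S4
        self.norm = nn.LayerNorm(dim)                   # normalization module (step S5)
        self.ffn = nn.Sequential(                       # FNN module (step S6)
            nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim),
        )
        self.head = nn.Linear(dim, num_classes)         # classification module

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 196, 192) information code from the MLP module
        x = self.norm(x + self.attn(x))                 # residual superposition + normalization
        x = x + self.ffn(x)
        return self.head(x.mean(dim=1))                 # pool tokens, then classify
```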
The number of parameters with the QAttention module in this embodiment is 32.11M and the calculation amount is 7.07 GFlops, which are significantly reduced compared with the 48.06M parameters and 9.42 GFlops with the original Attention module.
Referring to fig. 6, the classification accuracy of the remote sensing image classification method based on the Transformer lightweight model on the NWPU-RESISC dataset with a 20% training set is 79.76%, which is 13.36% higher than the 66.4% of the classical Transformer model and 4.53% higher than ResNet; the accuracy improvement is obvious, and the difficulty of fitting the Transformer model is greatly alleviated.
The classification speed of the remote sensing image classification method based on the Transformer lightweight model is 824.4 images/s, which is obviously improved compared with the 759.6 images/s of the classical Transformer model.
The above examples merely describe specific embodiments of the application in detail and are not to be construed as limiting its scope of protection. It should be noted that a person skilled in the art can make several variants and modifications without departing from the technical idea of the application, all of which fall within the scope of protection of the application.
This background section is provided to generally present the context of the present invention. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present invention.

Claims (6)

1. A remote sensing image classification method based on a Transformer lightweight model, characterized by comprising:
Step S1: selecting an optical remote sensing image, and visualizing and preprocessing the selected optical remote sensing image;
Step S2: encoding the selected optical remote sensing image through an image encoding module to obtain image encoding information of the selected optical remote sensing image;
Step S3: performing feature extraction on the image encoding information of the selected optical remote sensing image through an MLP module to obtain an information code;
Step S4: extracting feature information from the information code through an Attention module;
Step S5: superposing, through a residual module, the feature information extracted by the Attention module and the information code obtained by the MLP module, and then performing normalization through a normalization module;
Step S6: after the normalization module performs normalization, sending the result through an FNN module into a classification module to obtain the final classification result;
wherein the calculation formula of the Attention module is as follows:
Attention(Q) = softmax(QQ^T / √d_n) · Q
wherein d_n is the dimension value of the data input Q, and Q^T is the transpose of the data input Q matrix.
2. The remote sensing image classification method based on a Transformer lightweight model according to claim 1, characterized in that, in the step S1, the visualization converts the selected optical remote sensing image into a picture, and the preprocessing includes image rotation, definition adjustment and image size transformation.
3. The remote sensing image classification method based on a Transformer lightweight model according to claim 1, characterized in that the step S1 further comprises: storing the selected optical remote sensing images into different folders according to category, each folder being named after that category.
4. The remote sensing image classification method based on a Transformer lightweight model according to claim 1, characterized in that the step S2 comprises: the image encoding module performs a convolution operation on the selected optical remote sensing image through 4 convolution layers with a convolution kernel size of 3 to extract the image feature information therein; the dimension of the image feature information is then adjusted through a convolution layer with a convolution kernel size of 1, converting the two-dimensional image feature information into one-dimensional image encoding information.
5. The remote sensing image classification method based on a Transformer lightweight model according to claim 4, characterized in that, in the step S2, the finally extracted feature dimension is (batch, 28, 28, 192).
6. The remote sensing image classification method based on a Transformer lightweight model according to claim 1, characterized in that the step S3 comprises:
Step S31: performing feature extraction on the image encoding information obtained in step S2 through the MLP module, and obtaining features with a feature dimension of (batch, 14, 14, 192) through downsampling;
Step S32: expanding the features obtained in step S31 to obtain an information code with a feature dimension of (batch, 196, 192);
Step S33: splitting the information code obtained in step S32 according to the feature dimension (batch, 8, 196, 24), and inputting the split information code as the data input Q into the Attention module and the residual module.
CN202211105685.7A 2022-09-09 2022-09-09 A remote sensing image classification method based on Transformer lightweight model Active CN115641465B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211105685.7A CN115641465B (en) 2022-09-09 2022-09-09 A remote sensing image classification method based on Transformer lightweight model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211105685.7A CN115641465B (en) 2022-09-09 2022-09-09 A remote sensing image classification method based on Transformer lightweight model

Publications (2)

Publication Number Publication Date
CN115641465A CN115641465A (en) 2023-01-24
CN115641465B true CN115641465B (en) 2025-09-12

Family

ID=84941203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211105685.7A Active CN115641465B (en) 2022-09-09 2022-09-09 A remote sensing image classification method based on Transformer lightweight model

Country Status (1)

Country Link
CN (1) CN115641465B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310506B (en) * 2023-02-09 2025-09-12 燕山大学 Image classification method based on Transformer lightweight model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126282A (en) * 2019-12-25 2020-05-08 中国矿业大学 A Content Description Method for Remote Sensing Images Based on Variational Self-Attention Reinforcement Learning
CN112214599A (en) * 2020-10-20 2021-01-12 电子科技大学 Multi-label text classification method based on statistics and pre-trained language models

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287962B (en) * 2019-05-20 2023-10-27 平安科技(深圳)有限公司 Remote sensing image target extraction method, device and medium based on super object information
CN111428781A (en) * 2020-03-20 2020-07-17 中国科学院深圳先进技术研究院 Remote sensing image feature classification method and system
CN112732913B (en) * 2020-12-30 2023-08-22 平安科技(深圳)有限公司 Method, device, equipment and storage medium for classifying unbalanced samples
CN112991351B (en) * 2021-02-23 2022-05-27 新华三大数据技术有限公司 Remote sensing image semantic segmentation method and device and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126282A (en) * 2019-12-25 2020-05-08 中国矿业大学 A Content Description Method for Remote Sensing Images Based on Variational Self-Attention Reinforcement Learning
CN112214599A (en) * 2020-10-20 2021-01-12 电子科技大学 Multi-label text classification method based on statistics and pre-trained language models

Also Published As

Publication number Publication date
CN115641465A (en) 2023-01-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载