CN115641465B - A remote sensing image classification method based on Transformer lightweight model - Google Patents
A remote sensing image classification method based on a Transformer lightweight model
- Publication number
- CN115641465B (application CN202211105685.7A)
- Authority
- CN
- China
- Prior art keywords
- module
- remote sensing
- image
- information
- sensing image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a remote sensing image classification method based on a Transformer lightweight model, relating to the technical field of image classification. The method comprises: visualizing and preprocessing an optical remote sensing image according to actual conditions; acquiring image coding information of the optical remote sensing image through an image coding module; performing feature extraction on the image coding information through an MLP module to obtain an information code; extracting feature information from the information code through an Attention module; superposing, through a residual module, the feature information extracted by the Attention module and the information code obtained by the MLP module, and normalizing the result through a normalization module; and finally, after the normalization processing, sending the result through an FNN module into a classification module to obtain the final classification result.
Description
Technical Field
The invention relates to the technical field of image classification, and in particular to a remote sensing image classification method based on a Transformer lightweight model.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Image classification is a highly active research direction in computer vision, pattern recognition and machine learning. It is widely applied in many fields, including face recognition and intelligent video analysis in the security field; traffic scene object recognition, vehicle counting and license plate recognition in the traffic field; and content-based image retrieval and automatic album classification in the internet field. Its purpose is to determine the categories of the objects in an image so as to obtain the image category.
Feature extraction is an important component of image classification. It mainly comprises traditional features (such as the Histogram of Oriented Gradients (HOG) combined with a Support Vector Machine (SVM)) and deep-learning-based features (convolutional neural network (CNN) based methods, Transformer-based methods, and the like).
Among the Transformer-based methods, the classical Transformer model encodes blocks of the original input image, so the obtained features are coarse and much information is ignored during encoding. Meanwhile, the Attention module of the classical Transformer model suffers from high model complexity, low computational efficiency and long computation time, so its potential in the field of optical remote sensing image classification cannot be fully realized.
Disclosure of Invention
Aiming at the defects of large computing-resource consumption and long computation time of Transformer-based optical remote sensing image classification models, the invention provides a remote sensing image classification method based on a Transformer lightweight model. Its training effect on small and medium-sized datasets is better than that of the classical Transformer model, the computation amount of the model is reduced, computing resources are saved, and the classification speed is improved, thereby solving the above problems.
The technical scheme of the invention is as follows:
A remote sensing image classification method based on a Transformer lightweight model comprises the following steps:
Step S1, selecting an optical remote sensing image, and visualizing and preprocessing the selected optical remote sensing image according to actual conditions;
Step S2, encoding the selected optical remote sensing image through an image encoding module to obtain image encoding information of the selected optical remote sensing image;
Step S3, performing feature extraction on the image encoding information of the selected optical remote sensing image through an MLP module to obtain an information code;
Step S4, extracting feature information from the information code through an Attention module;
Step S5, superposing, through a residual module, the feature information extracted by the Attention module and the information code obtained by the MLP module, and then performing normalization through a normalization module;
Step S6, after the normalization processing by the normalization module, sending the result through an FNN module into a classification module to obtain the final classification result.
Further, in the step S1, the visualization converts the selected optical remote sensing image into a picture, and the preprocessing comprises image rotation, sharpness adjustment and image size transformation;
the image size transformation comprises resizing the selected optical remote sensing image to (224, 224).
Further, in the step S1, the method further includes:
storing the selected optical remote sensing images into different folders according to categories, wherein each folder is named by the name of the category.
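A minimal sketch, assuming torchvision is used, of this preprocessing and folder-per-category layout; the rotation range, sharpness factor and dataset path are illustrative assumptions rather than values from the patent:

```python
import torchvision.transforms as T
from torchvision.datasets import ImageFolder

# Step S1 preprocessing: image rotation, sharpness adjustment and a resize to
# (224, 224). The concrete rotation range and sharpness factor are assumptions.
train_transform = T.Compose([
    T.RandomRotation(degrees=15),                    # image rotation
    T.RandomAdjustSharpness(sharpness_factor=2.0),   # sharpness adjustment
    T.Resize((224, 224)),                            # image size transformation
    T.ToTensor(),
])

# One folder per category, each folder named after its class; this is the
# layout torchvision's ImageFolder expects, e.g. dataset/train/airport/xxx.jpg.
train_set = ImageFolder(root="dataset/train", transform=train_transform)
```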
Further, the step S2 includes:
The image coding module performs convolution operations on the selected optical remote sensing image through four convolution layers with a kernel size of 3 to extract the image feature information in the optical remote sensing image;
the dimension of the image feature information is then adjusted by a convolution layer with a kernel size of 1, and the two-dimensional image feature information is converted into one-dimensional image coding information.
Further, in the step S2, the dimension of the finally extracted features is (batch, 28, 28, 192).
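A PyTorch sketch of such an image coding module; the strides and intermediate channel widths are assumptions chosen so that a 224x224 input yields the (batch, 28, 28, 192) features mentioned above:

```python
import torch
import torch.nn as nn

class ImageEncoding(nn.Module):
    """Four 3x3 convolutions followed by a 1x1 convolution that sets the
    embedding dimension to 192; strides and widths are assumptions."""
    def __init__(self, in_ch=3, embed_dim=192):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 48, kernel_size=3, stride=2, padding=1),   # 224 -> 112
            nn.GELU(),
            nn.Conv2d(48, 96, kernel_size=3, stride=2, padding=1),      # 112 -> 56
            nn.GELU(),
            nn.Conv2d(96, 96, kernel_size=3, stride=1, padding=1),      # 56 -> 56
            nn.GELU(),
            nn.Conv2d(96, 192, kernel_size=3, stride=2, padding=1),     # 56 -> 28
            nn.GELU(),
        )
        self.adjust = nn.Conv2d(192, embed_dim, kernel_size=1)  # 1x1 conv adjusts the dimension

    def forward(self, x):                   # x: (batch, 3, 224, 224)
        x = self.adjust(self.features(x))   # (batch, 192, 28, 28)
        return x.permute(0, 2, 3, 1)        # (batch, 28, 28, 192), channel-last as in the text

x = torch.randn(2, 3, 224, 224)
print(ImageEncoding()(x).shape)             # torch.Size([2, 28, 28, 192])
```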
Further, the step S3 includes:
Step S31, performing feature extraction on the image coding information obtained in the step S2 through the MLP module, and obtaining features with dimension (batch, 14, 14, 192) through downsampling;
Step S32, flattening the features obtained in the step S31 to obtain an information code with dimension (batch, 196, 192);
Step S33, splitting the information code obtained in the step S32 into the dimension (batch, 8, 196, 24) and inputting it, as the data input Q, to the Attention module and the residual module.
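A dimension-flow sketch of steps S31-S33; average pooling stands in for the MLP module's downsampling purely to keep the tensor shapes explicit, which is an assumption:

```python
import torch
import torch.nn.functional as F

batch, heads = 2, 8
x = torch.randn(batch, 28, 28, 192)                     # output of the image coding module

# S31: downsample the 28x28 grid to 14x14 -> (batch, 14, 14, 192)
x = F.avg_pool2d(x.permute(0, 3, 1, 2), 2).permute(0, 2, 3, 1)

# S32: flatten the spatial grid into 14 * 14 = 196 tokens -> (batch, 196, 192)
x = x.reshape(batch, 196, 192)

# S33: split the 192 channels into 8 heads of 24 -> (batch, 8, 196, 24),
# which is then used as the data input Q of the Attention module.
q = x.reshape(batch, 196, heads, 24).permute(0, 2, 1, 3)
print(q.shape)                                          # torch.Size([2, 8, 196, 24])
```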
Further, the calculation formula of the Attention module is as follows:
Attention(Q) = softmax(Q·Q^T / √D_n)·Q
wherein D_n is the dimension value of the data input Q, and Q^T is the transpose of the data input Q matrix.
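A minimal functional sketch of this computation in PyTorch, assuming the formula as reconstructed above and the (batch, 8, 196, 24) data input Q from step S33:

```python
import math
import torch

def q_attention(q):
    """Q-only attention: softmax(Q Q^T / sqrt(D_n)) Q, with q of shape
    (batch, heads, tokens, D_n)."""
    d_n = q.size(-1)
    scores = q @ q.transpose(-2, -1) / math.sqrt(d_n)   # (batch, heads, tokens, tokens)
    return torch.softmax(scores, dim=-1) @ q             # same shape as q

out = q_attention(torch.randn(2, 8, 196, 24))
print(out.shape)   # torch.Size([2, 8, 196, 24])
```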
Compared with the prior art, the invention has the beneficial effects that:
1. The remote sensing image classification method based on a Transformer lightweight model reduces the parameter quantity of the model while improving the precision of the Transformer model, thereby obtaining a better classification result; the precision improvement over the classical Transformer model and CNN models is substantial.
2. The remote sensing image classification method based on a Transformer lightweight model can greatly reduce the huge parameter quantity and computation amount caused by the complexity of the Transformer model.
3. By outputting features of different scales inside the model, in contrast to the traditional Transformer model whose feature extraction scale remains unchanged, the remote sensing image classification method based on a Transformer lightweight model can better obtain features of different scales and improve the overall accuracy of the model.
Drawings
FIG. 1 is a flow chart of the remote sensing image classification method based on a Transformer lightweight model;
FIG. 2 is a general classification flow chart in the first embodiment;
FIG. 3 is a diagram of an example of a data sample;
FIG. 4 is a diagram of a QAttention model structure;
FIG. 5 is a block diagram of an MLP module;
FIG. 6 is a precision comparison chart.
Detailed Description
It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
The features and capabilities of the present invention are described in further detail below in connection with examples.
Example 1
Among the Transformer-based methods, the classical Transformer model encodes blocks of the original input image, so the obtained features are coarse and much information is ignored during encoding. Meanwhile, the Attention module of the classical Transformer model suffers from high model complexity, low computational efficiency and long computation time, so its potential in the field of optical remote sensing image classification cannot be fully realized.
Transformer-based image classification models also have the defects of high complexity and difficult training convergence, and they train poorly on small and medium-sized datasets such as those used for optical remote sensing image classification.
This embodiment provides a remote sensing image classification method based on a Transformer lightweight model. Aiming at the problems of the classical Transformer model, such as high model complexity, low computational efficiency, long computation time and difficult convergence on small and medium-sized datasets, it modifies the model structure and uses an improved QAttention module, a convolutional implementation of the MLP module and an improved image coding module. This improves the fitting ability of the model, reduces network parameters, and improves the average precision and training speed of optical remote sensing image classification; the method can be used to identify the category of an optical remote sensing image.
It should be noted that, referring to fig. 2, the overall classification flow comprises: obtaining training samples and test samples from the optical remote sensing image classification dataset; performing data enhancement on the training samples; constructing the image coding module, the QAttention module, the MLP module and the classification module; inputting the training samples into the constructed classification model for training to obtain a trained model; and inputting the test samples into the trained model to predict and output the optical remote sensing image classification results.
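A hedged outline of that training and prediction flow in PyTorch; the optimizer, learning rate, batch size and epoch count are placeholder assumptions, and `model` stands for the constructed classification model:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def run(model, train_set, test_set, epochs=10, lr=1e-4,
        device="cuda" if torch.cuda.is_available() else "cpu"):
    """Train the constructed classification model on the (already enhanced)
    training samples, then predict the test samples with the trained model."""
    model = model.to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=64)

    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            opt.zero_grad()
            loss_fn(model(images), labels).backward()
            opt.step()

    # Prediction / evaluation on the test samples.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total   # overall classification accuracy
```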
With respect to the image coding module: because the image coding module of the classical Transformer model extracts image features coarsely, the convergence of the model and the final accuracy are greatly affected. In this embodiment, to improve the stability of model training, the feature information in the original image is extracted as completely as possible and downsampling is performed with multi-layer convolution, which greatly alleviates the difficulty of training the Vision Transformer model on small-scale data.
Regarding the QAttention module, the original calculation formula of the Attention module is as follows:
Attention(Q, K, V) = softmax(Q·K^T / √d_k)·V
The calculation formula after removing the V parameter is as follows:
Attention(Q, K) = softmax(Q·K^T / √d_k)·K
The calculation formula that keeps only the Q parameter is as follows:
Attention(Q) = softmax(Q·Q^T / √D_n)·Q
After experimental comparison, no obvious difference was found in the test precision of the different structures, while the parameter quantity and computation amount of the model decrease as the Q, K and V parameters are removed. Therefore, in this embodiment, the formula that keeps only the Q parameter is used as the calculation formula of the QAttention module, namely Attention(Q) = softmax(Q·Q^T / √D_n)·Q.
The QAttention block diagram is shown in figure 4.
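The following PyTorch sketch shows one way such a Q-only attention block could look; the linear input and output projections, head count and embedding width are assumptions added to make the block self-contained, not details taken from the patent:

```python
import math
import torch
import torch.nn as nn

class QAttention(nn.Module):
    """Q-only attention: a standard attention block projects its input to Q, K
    and V (three weight matrices); dropping K and V leaves a single projection,
    which is where the parameter and computation savings come from."""
    def __init__(self, dim=192, heads=8):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads
        self.to_q = nn.Linear(dim, dim)   # only the Q projection remains
        self.proj = nn.Linear(dim, dim)   # output projection (assumed)

    def forward(self, x):                 # x: (batch, tokens, dim)
        b, n, d = x.shape
        q = self.to_q(x).reshape(b, n, self.heads, self.head_dim).transpose(1, 2)
        attn = torch.softmax(q @ q.transpose(-2, -1) / math.sqrt(self.head_dim), dim=-1)
        out = (attn @ q).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)

x = torch.randn(2, 196, 192)
print(QAttention()(x).shape)              # torch.Size([2, 196, 192])
```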
Regarding the MLP module: a convolution operation is essentially a linear operation over the elements within its receptive field. Therefore, in the implementation of the MLP module, a convolution layer with a kernel size of 1 is regarded as a fully connected model over the feature map, and a two-dimensional convolution replaces the one-dimensional MLP module, which further reduces the parameter quantity of the model and improves its operating efficiency and stability.
The structure of the MLP module is shown in fig. 5.
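A minimal sketch of an MLP module built from kernel-1 convolutions in the spirit of the description above; the hidden expansion ratio of 4 and the GELU activation are assumptions:

```python
import torch
import torch.nn as nn

class ConvMLP(nn.Module):
    """A kernel-1 convolution applies the same linear map to every spatial
    position, so it behaves like a per-token fully connected layer."""
    def __init__(self, dim=192, ratio=4):
        super().__init__()
        self.fc1 = nn.Conv2d(dim, dim * ratio, kernel_size=1)
        self.act = nn.GELU()
        self.fc2 = nn.Conv2d(dim * ratio, dim, kernel_size=1)

    def forward(self, x):          # x: (batch, dim, H, W)
        return self.fc2(self.act(self.fc1(x)))

x = torch.randn(2, 192, 14, 14)
print(ConvMLP()(x).shape)          # torch.Size([2, 192, 14, 14])
```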
Regarding the classification module, it is not modified in this embodiment; its operation principle is known to those skilled in the art, so a detailed description is omitted.
Referring to fig. 1-6, the remote sensing image classification method based on a Transformer lightweight model, which is also the process of establishing the trained model, comprises the following steps:
Step S1, selecting an optical remote sensing image, and visualizing and preprocessing the selected optical remote sensing image according to actual conditions; in the step S1, the visualization preferably converts the selected optical remote sensing image into a picture;
the image size transformation comprises resizing the selected optical remote sensing image to (224, 224); a data sample schematic is shown in FIG. 3;
The step S1 further comprises: storing the selected optical remote sensing images into different folders according to category, naming each folder after its category, and then placing all the optical remote sensing images belonging to that category into the folder; for example, in the NWPU-RESISC45 dataset, each folder contains the 700 optical remote sensing images of its category.
Step S2, encoding the selected optical remote sensing image through an image encoding module to obtain image encoding information of the selected optical remote sensing image, wherein preferably, the step S2 comprises the following steps:
The image coding module performs convolution operations on the selected optical remote sensing image through four convolution layers with a kernel size of 3 to extract the image feature information in the optical remote sensing image;
the dimension of the image feature information is then adjusted by a convolution layer with a kernel size of 1, and the two-dimensional image feature information is converted into one-dimensional image coding information, so that the dimension of the finally extracted features is (batch, 28, 28, 192).
Step S3, extracting features of image coding information of the selected optical remote sensing image through an MLP module to obtain information codes, wherein the step S3 preferably comprises the following steps:
Step S31, performing feature extraction on the image coding information obtained in the step S2 through the MLP module, and obtaining features with dimension (batch, 14, 14, 192) through downsampling;
Step S32, flattening the features obtained in the step S31 to obtain an information code with dimension (batch, 196, 192);
Step S33, splitting the information code obtained in the step S32 into the dimension (batch, 8, 196, 24) and inputting it, as the data input Q, to the Attention module and the residual module.
Step S4, extracting the feature information from the information code through the Attention module; preferably, the data input Q is substituted into the following formula to extract the feature information in the information code.
The calculation formula of the Attention module is as follows:
Attention(Q) = softmax(Q·Q^T / √D_n)·Q
wherein D_n is the dimension value of the data input Q, and Q^T is the transpose of the data input Q matrix.
Step S5, after the feature information extracted by the Attention module and the information code obtained by the MLP module are superposed by the residual module, normalization is performed by the normalization module; the residual module and the normalization module improve the convergence ability of the model and prevent the loss of model accuracy caused by vanishing gradients.
Step S6, after the normalization processing by the normalization module, the result is sent through the FNN module into the classification module to obtain the final classification result; the function of the FNN module is to adjust the internal channel structure of the model so that feature information at different positions can be attended to between QAttention modules, thereby improving the accuracy of the model.
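A sketch combining steps S5 and S6 into one block; the FNN layout, the mean pooling before classification and the class count of 45 (matching NWPU-RESISC45) are assumptions made only so the block runs end to end:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Residual superposition of the attention output and the MLP-module code,
    layer normalization, an FNN that reshuffles channels, and a simple
    classification head (the head layout is an assumption)."""
    def __init__(self, dim=192, num_classes=45):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.fnn = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, attn_out, code):          # both: (batch, tokens, dim)
        x = self.norm1(attn_out + code)         # residual superposition + normalization (S5)
        x = self.norm2(x + self.fnn(x))         # FNN with its own residual + normalization (S6)
        return self.classifier(x.mean(dim=1))   # pooled tokens -> class scores

attn_out, code = torch.randn(2, 196, 192), torch.randn(2, 196, 192)
print(EncoderBlock()(attn_out, code).shape)     # torch.Size([2, 45])
```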
In this embodiment, the QAttention module has 32.11M parameters and a computation amount of 7.07 GFLOPs, which is a significant reduction compared with the 48.06M parameters and 9.42 GFLOPs of the original Attention module.
Referring to fig. 6, with a 20% training split of the NWPU-RESISC45 dataset, the classification accuracy of the remote sensing image classification method based on a Transformer lightweight model is 79.76%, which is 13.36% higher than the 66.4% of the classical Transformer model and 4.53% higher than ResNet. The accuracy improvement is significant, and the difficulty of fitting the Transformer model is greatly alleviated.
The remote sensing image classification method based on a Transformer lightweight model reaches a classification speed of 824.4 images/s, a clear improvement over the 759.6 images/s of the classical Transformer model.
The above examples merely describe specific embodiments of the application in more detail and are not to be construed as limiting its scope of protection. It should be noted that a person skilled in the art can make several variations and modifications without departing from the technical idea of the application, and these all fall within the scope of protection of the application.
This background section is provided to generally present the context of the present invention. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present invention.
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211105685.7A CN115641465B (en) | 2022-09-09 | 2022-09-09 | A remote sensing image classification method based on Transformer lightweight model |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN115641465A (en) | 2023-01-24 |
| CN115641465B (en) | 2025-09-12 |
Family
ID=84941203
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211105685.7A (Active) | A remote sensing image classification method based on Transformer lightweight model | 2022-09-09 | 2022-09-09 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115641465B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116310506B (en) * | 2023-02-09 | 2025-09-12 | 燕山大学 | Image classification method based on Transformer lightweight model |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111126282A (en) * | 2019-12-25 | 2020-05-08 | 中国矿业大学 | A Content Description Method for Remote Sensing Images Based on Variational Self-Attention Reinforcement Learning |
| CN112214599A (en) * | 2020-10-20 | 2021-01-12 | 电子科技大学 | Multi-label text classification method based on statistics and pre-trained language models |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110287962B (en) * | 2019-05-20 | 2023-10-27 | 平安科技(深圳)有限公司 | Remote sensing image target extraction method, device and medium based on super object information |
| CN111428781A (en) * | 2020-03-20 | 2020-07-17 | 中国科学院深圳先进技术研究院 | Remote sensing image feature classification method and system |
| CN112732913B (en) * | 2020-12-30 | 2023-08-22 | 平安科技(深圳)有限公司 | Method, device, equipment and storage medium for classifying unbalanced samples |
| CN112991351B (en) * | 2021-02-23 | 2022-05-27 | 新华三大数据技术有限公司 | Remote sensing image semantic segmentation method and device and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |