CN115641465B - A remote sensing image classification method based on Transformer lightweight model - Google Patents
A remote sensing image classification method based on a Transformer lightweight model
- Publication number
- CN115641465B (application CN202211105685.7A)
- Authority
- CN
- China
- Prior art keywords
- module
- remote sensing
- image
- information
- sensing image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a remote sensing image classification method based on a Transformer lightweight model, relating to the technical field of image classification. The method comprises: visualizing and preprocessing an optical remote sensing image according to actual conditions; acquiring image coding information of the optical remote sensing image through an image coding module; performing feature extraction on the image coding information through an MLP module to obtain an information code; extracting feature information from the information code through an Attention module; superposing, through a residual module, the feature information extracted by the Attention module and the information code obtained by the MLP module, and normalizing the result through a normalization module; and finally, after the normalization processing, sending the result through an FNN module into a classification module to obtain the final classification result.
Description
Technical Field
The invention relates to the technical field of image classification, and in particular to a remote sensing image classification method based on a Transformer lightweight model.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Image classification is a highly active research direction in computer vision, pattern recognition and machine learning. It is widely applied in many fields, including face recognition and intelligent video analysis in the security field; traffic scene object recognition, vehicle counting and license plate recognition in the traffic field; and content-based image retrieval and automatic album classification in the internet field. Its purpose is to determine the categories of the objects in an image so as to obtain the image category.
Feature extraction is an important component of image classification. It mainly comprises traditional features (such as the Histogram of Oriented Gradients (HOG) combined with a Support Vector Machine (SVM)) and deep-learning-based features (convolutional neural network (CNN) based methods, Transformer-based methods, and the like).
Among the Transformer-based methods, the classical Transformer model encodes blocks of the original input image, so the obtained features are coarse and much information is ignored during encoding. Meanwhile, the Attention module of the classical Transformer model suffers from high model complexity, low computational efficiency and long computation time, so its potential in the field of optical remote sensing image classification cannot be fully realized.
Disclosure of Invention
Aiming at the defects of large computing-resource consumption and long computation time of Transformer-based optical remote sensing image classification models, the invention provides a remote sensing image classification method based on a Transformer lightweight model. Its training effect on small and medium-sized datasets is better than that of the classical Transformer model, the computation amount of the model is reduced, computing resources are saved, and the classification speed is improved, thereby solving the above problems.
The technical scheme of the invention is as follows:
A remote sensing image classification method based on a Transformer lightweight model comprises the following steps:
Step S1, selecting an optical remote sensing image, and visualizing and preprocessing the selected optical remote sensing image according to actual conditions;
Step S2, encoding the selected optical remote sensing image through an image encoding module to obtain image encoding information of the selected optical remote sensing image;
Step S3, performing feature extraction on the image encoding information of the selected optical remote sensing image through an MLP module to obtain an information code;
Step S4, extracting feature information from the information code through an Attention module;
Step S5, superposing, through a residual module, the feature information extracted by the Attention module and the information code obtained by the MLP module, and then performing normalization through a normalization module;
Step S6, after the normalization processing by the normalization module, sending the result through an FNN module into a classification module to obtain the final classification result.
Further, in the step S1, the visualization converts the selected optical remote sensing image into a picture, and the preprocessing comprises image rotation, sharpness adjustment and image size transformation;
the image size transformation comprises resizing the selected optical remote sensing image to (224, 224).
Further, in the step S1, the method further includes:
storing the selected optical remote sensing images into different folders according to categories, wherein each folder is named by the name of the category.
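A minimal sketch, assuming torchvision is used, of this preprocessing and folder-per-category layout; the rotation range, sharpness factor and dataset path are illustrative assumptions rather than values from the patent:

```python
import torchvision.transforms as T
from torchvision.datasets import ImageFolder

# Step S1 preprocessing: image rotation, sharpness adjustment and a resize to
# (224, 224). The concrete rotation range and sharpness factor are assumptions.
train_transform = T.Compose([
    T.RandomRotation(degrees=15),                    # image rotation
    T.RandomAdjustSharpness(sharpness_factor=2.0),   # sharpness adjustment
    T.Resize((224, 224)),                            # image size transformation
    T.ToTensor(),
])

# One folder per category, each folder named after its class; this is the
# layout torchvision's ImageFolder expects, e.g. dataset/train/airport/xxx.jpg.
train_set = ImageFolder(root="dataset/train", transform=train_transform)
```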
Further, the step S2 includes:
The image coding module performs convolution operations on the selected optical remote sensing image through four convolution layers with a kernel size of 3 to extract the image feature information in the optical remote sensing image;
the dimension of the image feature information is then adjusted by a convolution layer with a kernel size of 1, and the two-dimensional image feature information is converted into one-dimensional image coding information.
Further, in the step S2, the dimension of the finally extracted features is (batch, 28, 28, 192).
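A PyTorch sketch of such an image coding module; the strides and intermediate channel widths are assumptions chosen so that a 224x224 input yields the (batch, 28, 28, 192) features mentioned above:

```python
import torch
import torch.nn as nn

class ImageEncoding(nn.Module):
    """Four 3x3 convolutions followed by a 1x1 convolution that sets the
    embedding dimension to 192; strides and widths are assumptions."""
    def __init__(self, in_ch=3, embed_dim=192):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 48, kernel_size=3, stride=2, padding=1),   # 224 -> 112
            nn.GELU(),
            nn.Conv2d(48, 96, kernel_size=3, stride=2, padding=1),      # 112 -> 56
            nn.GELU(),
            nn.Conv2d(96, 96, kernel_size=3, stride=1, padding=1),      # 56 -> 56
            nn.GELU(),
            nn.Conv2d(96, 192, kernel_size=3, stride=2, padding=1),     # 56 -> 28
            nn.GELU(),
        )
        self.adjust = nn.Conv2d(192, embed_dim, kernel_size=1)  # 1x1 conv adjusts the dimension

    def forward(self, x):                   # x: (batch, 3, 224, 224)
        x = self.adjust(self.features(x))   # (batch, 192, 28, 28)
        return x.permute(0, 2, 3, 1)        # (batch, 28, 28, 192), channel-last as in the text

x = torch.randn(2, 3, 224, 224)
print(ImageEncoding()(x).shape)             # torch.Size([2, 28, 28, 192])
```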
Further, the step S3 includes:
Step S31, performing feature extraction on the image coding information obtained in the step S2 through the MLP module, and obtaining features with dimension (batch, 14, 14, 192) through downsampling;
Step S32, flattening the features obtained in the step S31 to obtain an information code with dimension (batch, 196, 192);
Step S33, splitting the information code obtained in the step S32 into the dimension (batch, 8, 196, 24) and inputting it, as the data input Q, to the Attention module and the residual module.
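A dimension-flow sketch of steps S31-S33; average pooling stands in for the MLP module's downsampling purely to keep the tensor shapes explicit, which is an assumption:

```python
import torch
import torch.nn.functional as F

batch, heads = 2, 8
x = torch.randn(batch, 28, 28, 192)                     # output of the image coding module

# S31: downsample the 28x28 grid to 14x14 -> (batch, 14, 14, 192)
x = F.avg_pool2d(x.permute(0, 3, 1, 2), 2).permute(0, 2, 3, 1)

# S32: flatten the spatial grid into 14 * 14 = 196 tokens -> (batch, 196, 192)
x = x.reshape(batch, 196, 192)

# S33: split the 192 channels into 8 heads of 24 -> (batch, 8, 196, 24),
# which is then used as the data input Q of the Attention module.
q = x.reshape(batch, 196, heads, 24).permute(0, 2, 1, 3)
print(q.shape)                                          # torch.Size([2, 8, 196, 24])
```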
Further, the calculation formula of the Attention module is as follows:
Attention(Q) = softmax(Q·Q^T / √D_n)·Q
wherein D_n is the dimension value of the data input Q, and Q^T is the transpose of the data input Q matrix.
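A minimal functional sketch of this computation in PyTorch, assuming the formula as reconstructed above and the (batch, 8, 196, 24) data input Q from step S33:

```python
import math
import torch

def q_attention(q):
    """Q-only attention: softmax(Q Q^T / sqrt(D_n)) Q, with q of shape
    (batch, heads, tokens, D_n)."""
    d_n = q.size(-1)
    scores = q @ q.transpose(-2, -1) / math.sqrt(d_n)   # (batch, heads, tokens, tokens)
    return torch.softmax(scores, dim=-1) @ q             # same shape as q

out = q_attention(torch.randn(2, 8, 196, 24))
print(out.shape)   # torch.Size([2, 8, 196, 24])
```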
Compared with the prior art, the invention has the beneficial effects that:
1. The remote sensing image classification method based on a Transformer lightweight model reduces the parameter quantity of the model while improving the precision of the Transformer model, thereby obtaining a better classification result; the precision improvement over the classical Transformer model and CNN models is substantial.
2. The remote sensing image classification method based on a Transformer lightweight model can greatly reduce the huge parameter quantity and computation amount caused by the complexity of the Transformer model.
3. By outputting features of different scales inside the model, in contrast to the traditional Transformer model whose feature extraction scale remains unchanged, the remote sensing image classification method based on a Transformer lightweight model can better obtain features of different scales and improve the overall accuracy of the model.
Drawings
FIG. 1 is a flow chart of the remote sensing image classification method based on a Transformer lightweight model;
FIG. 2 is a general classification flow chart in the first embodiment;
FIG. 3 is a diagram of an example of a data sample;
FIG. 4 is a diagram of a QAttention model structure;
FIG. 5 is a block diagram of an MLP module;
FIG. 6 is a precision comparison chart.
Detailed Description
It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
The features and capabilities of the present invention are described in further detail below in connection with examples.
Example 1
Among the Transformer-based methods, the classical Transformer model encodes blocks of the original input image, so the obtained features are coarse and much information is ignored during encoding. Meanwhile, the Attention module of the classical Transformer model suffers from high model complexity, low computational efficiency and long computation time, so its potential in the field of optical remote sensing image classification cannot be fully realized.
Transformer-based image classification models also have the defects of high complexity and difficult training convergence, and they train poorly on small and medium-sized datasets such as those used for optical remote sensing image classification.
This embodiment provides a remote sensing image classification method based on a Transformer lightweight model. Aiming at the problems of the classical Transformer model, such as high model complexity, low computational efficiency, long computation time and difficult convergence on small and medium-sized datasets, it modifies the model structure and uses an improved QAttention module, a convolutional implementation of the MLP module and an improved image coding module. This improves the fitting ability of the model, reduces network parameters, and improves the average precision and training speed of optical remote sensing image classification; the method can be used to identify the category of an optical remote sensing image.
It should be noted that, referring to fig. 2, the overall classification flow comprises: obtaining training samples and test samples from the optical remote sensing image classification dataset; performing data enhancement on the training samples; constructing the image coding module, the QAttention module, the MLP module and the classification module; inputting the training samples into the constructed classification model for training to obtain a trained model; and inputting the test samples into the trained model to predict and output the optical remote sensing image classification results.
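A hedged outline of that training and prediction flow in PyTorch; the optimizer, learning rate, batch size and epoch count are placeholder assumptions, and `model` stands for the constructed classification model:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def run(model, train_set, test_set, epochs=10, lr=1e-4,
        device="cuda" if torch.cuda.is_available() else "cpu"):
    """Train the constructed classification model on the (already enhanced)
    training samples, then predict the test samples with the trained model."""
    model = model.to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=64)

    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            opt.zero_grad()
            loss_fn(model(images), labels).backward()
            opt.step()

    # Prediction / evaluation on the test samples.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total   # overall classification accuracy
```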
With respect to the image coding module: because the image coding module of the classical Transformer model extracts image features coarsely, the convergence of the model and the final accuracy are greatly affected. In this embodiment, to improve the stability of model training, the feature information in the original image is extracted as completely as possible and downsampling is performed with multi-layer convolution, which greatly alleviates the difficulty of training the Vision Transformer model on small-scale data.
Regarding the QAttention module, the original calculation formula of the Attention module is as follows:
Attention(Q, K, V) = softmax(Q·K^T / √d_k)·V
The calculation formula after removing the V parameter is as follows:
Attention(Q, K) = softmax(Q·K^T / √d_k)·K
The calculation formula that keeps only the Q parameter is as follows:
Attention(Q) = softmax(Q·Q^T / √D_n)·Q
After experimental comparison, no obvious difference was found in the test precision of the different structures, while the parameter quantity and computation amount of the model decrease as the Q, K and V parameters are removed. Therefore, in this embodiment, the formula that keeps only the Q parameter is used as the calculation formula of the QAttention module, namely Attention(Q) = softmax(Q·Q^T / √D_n)·Q.
The QAttention block diagram is shown in figure 4.
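The following PyTorch sketch shows one way such a Q-only attention block could look; the linear input and output projections, head count and embedding width are assumptions added to make the block self-contained, not details taken from the patent:

```python
import math
import torch
import torch.nn as nn

class QAttention(nn.Module):
    """Q-only attention: a standard attention block projects its input to Q, K
    and V (three weight matrices); dropping K and V leaves a single projection,
    which is where the parameter and computation savings come from."""
    def __init__(self, dim=192, heads=8):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads
        self.to_q = nn.Linear(dim, dim)   # only the Q projection remains
        self.proj = nn.Linear(dim, dim)   # output projection (assumed)

    def forward(self, x):                 # x: (batch, tokens, dim)
        b, n, d = x.shape
        q = self.to_q(x).reshape(b, n, self.heads, self.head_dim).transpose(1, 2)
        attn = torch.softmax(q @ q.transpose(-2, -1) / math.sqrt(self.head_dim), dim=-1)
        out = (attn @ q).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)

x = torch.randn(2, 196, 192)
print(QAttention()(x).shape)              # torch.Size([2, 196, 192])
```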
Regarding the MLP module: a convolution operation is essentially a linear operation over the elements within its receptive field. Therefore, in the implementation of the MLP module, a convolution layer with a kernel size of 1 is regarded as a fully connected model over the feature map, and a two-dimensional convolution replaces the one-dimensional MLP module, which further reduces the parameter quantity of the model and improves its operating efficiency and stability.
The structure of the MLP module is shown in fig. 5.
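A minimal sketch of an MLP module built from kernel-1 convolutions in the spirit of the description above; the hidden expansion ratio of 4 and the GELU activation are assumptions:

```python
import torch
import torch.nn as nn

class ConvMLP(nn.Module):
    """A kernel-1 convolution applies the same linear map to every spatial
    position, so it behaves like a per-token fully connected layer."""
    def __init__(self, dim=192, ratio=4):
        super().__init__()
        self.fc1 = nn.Conv2d(dim, dim * ratio, kernel_size=1)
        self.act = nn.GELU()
        self.fc2 = nn.Conv2d(dim * ratio, dim, kernel_size=1)

    def forward(self, x):          # x: (batch, dim, H, W)
        return self.fc2(self.act(self.fc1(x)))

x = torch.randn(2, 192, 14, 14)
print(ConvMLP()(x).shape)          # torch.Size([2, 192, 14, 14])
```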
Regarding the classification module, it is not modified in this embodiment; its operation principle is known to those skilled in the art, so a detailed description is omitted.
Referring to fig. 1-6, the remote sensing image classification method based on a Transformer lightweight model, which is also the process of establishing the trained model, comprises the following steps:
Step S1, selecting an optical remote sensing image, and visualizing and preprocessing the selected optical remote sensing image according to actual conditions; in the step S1, the visualization preferably converts the selected optical remote sensing image into a picture;
the image size transformation comprises resizing the selected optical remote sensing image to (224, 224); a data sample schematic is shown in FIG. 3;
The step S1 further comprises: storing the selected optical remote sensing images into different folders according to category, naming each folder after its category, and then placing all the optical remote sensing images belonging to that category into the folder; for example, in the NWPU-RESISC45 dataset, each folder contains the 700 optical remote sensing images of its category.
Step S2, encoding the selected optical remote sensing image through an image encoding module to obtain image encoding information of the selected optical remote sensing image, wherein preferably, the step S2 comprises the following steps:
The image coding module performs convolution operations on the selected optical remote sensing image through four convolution layers with a kernel size of 3 to extract the image feature information in the optical remote sensing image;
the dimension of the image feature information is then adjusted by a convolution layer with a kernel size of 1, and the two-dimensional image feature information is converted into one-dimensional image coding information, so that the dimension of the finally extracted features is (batch, 28, 28, 192).
Step S3, extracting features of image coding information of the selected optical remote sensing image through an MLP module to obtain information codes, wherein the step S3 preferably comprises the following steps:
Step S31, performing feature extraction on the image coding information obtained in the step S2 through the MLP module, and obtaining features with dimension (batch, 14, 14, 192) through downsampling;
Step S32, flattening the features obtained in the step S31 to obtain an information code with dimension (batch, 196, 192);
Step S33, splitting the information code obtained in the step S32 into the dimension (batch, 8, 196, 24) and inputting it, as the data input Q, to the Attention module and the residual module.
Step S4, extracting the feature information from the information code through the Attention module; preferably, the data input Q is substituted into the following formula to extract the feature information in the information code.
The calculation formula of the Attention module is as follows:
Attention(Q) = softmax(Q·Q^T / √D_n)·Q
wherein D_n is the dimension value of the data input Q, and Q^T is the transpose of the data input Q matrix.
Step S5, after the feature information extracted by the Attention module and the information code obtained by the MLP module are superposed by the residual module, normalization is performed by the normalization module; the residual module and the normalization module improve the convergence ability of the model and prevent the loss of model accuracy caused by vanishing gradients.
Step S6, after the normalization processing by the normalization module, the result is sent through the FNN module into the classification module to obtain the final classification result; the function of the FNN module is to adjust the internal channel structure of the model so that feature information at different positions can be attended to between QAttention modules, thereby improving the accuracy of the model.
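A sketch combining steps S5 and S6 into one block; the FNN layout, the mean pooling before classification and the class count of 45 (matching NWPU-RESISC45) are assumptions made only so the block runs end to end:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Residual superposition of the attention output and the MLP-module code,
    layer normalization, an FNN that reshuffles channels, and a simple
    classification head (the head layout is an assumption)."""
    def __init__(self, dim=192, num_classes=45):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.fnn = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, attn_out, code):          # both: (batch, tokens, dim)
        x = self.norm1(attn_out + code)         # residual superposition + normalization (S5)
        x = self.norm2(x + self.fnn(x))         # FNN with its own residual + normalization (S6)
        return self.classifier(x.mean(dim=1))   # pooled tokens -> class scores

attn_out, code = torch.randn(2, 196, 192), torch.randn(2, 196, 192)
print(EncoderBlock()(attn_out, code).shape)     # torch.Size([2, 45])
```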
In this embodiment, the QAttention module has 32.11M parameters and a computation amount of 7.07 GFLOPs, which is a significant reduction compared with the 48.06M parameters and 9.42 GFLOPs of the original Attention module.
Referring to fig. 6, with a 20% training split of the NWPU-RESISC45 dataset, the classification accuracy of the remote sensing image classification method based on a Transformer lightweight model is 79.76%, which is 13.36% higher than the 66.4% of the classical Transformer model and 4.53% higher than ResNet. The accuracy improvement is significant, and the difficulty of fitting the Transformer model is greatly alleviated.
The remote sensing image classification method based on a Transformer lightweight model reaches a classification speed of 824.4 images/s, a clear improvement over the 759.6 images/s of the classical Transformer model.
The above examples merely describe specific embodiments of the application in more detail and are not to be construed as limiting its scope of protection. It should be noted that a person skilled in the art can make several variations and modifications without departing from the technical idea of the application, and these all fall within the scope of protection of the application.
This background section is provided to generally present the context of the present invention. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present invention.
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211105685.7A CN115641465B (en) | 2022-09-09 | 2022-09-09 | A remote sensing image classification method based on Transformer lightweight model |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN115641465A (en) | 2023-01-24 |
| CN115641465B (en) | 2025-09-12 |
Family
ID=84941203
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211105685.7A (Active) | A remote sensing image classification method based on Transformer lightweight model | 2022-09-09 | 2022-09-09 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115641465B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116310506B (en) * | 2023-02-09 | 2025-09-12 | 燕山大学 | Image classification method based on Transformer lightweight model |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111126282A (en) * | 2019-12-25 | 2020-05-08 | 中国矿业大学 | A Content Description Method for Remote Sensing Images Based on Variational Self-Attention Reinforcement Learning |
| CN112214599A (en) * | 2020-10-20 | 2021-01-12 | 电子科技大学 | Multi-label text classification method based on statistics and pre-trained language models |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110287962B (en) * | 2019-05-20 | 2023-10-27 | 平安科技(深圳)有限公司 | Remote sensing image target extraction method, device and medium based on super object information |
| CN111428781A (en) * | 2020-03-20 | 2020-07-17 | 中国科学院深圳先进技术研究院 | Remote sensing image feature classification method and system |
| CN112732913B (en) * | 2020-12-30 | 2023-08-22 | 平安科技(深圳)有限公司 | Method, device, equipment and storage medium for classifying unbalanced samples |
| CN112991351B (en) * | 2021-02-23 | 2022-05-27 | 新华三大数据技术有限公司 | Remote sensing image semantic segmentation method and device and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |