CN118865144A

CN118865144A - Tumor multi-gene detection method based on hyperspectral imaging

Info

Publication number: CN118865144A
Application number: CN202411354180.3A
Authority: CN
Inventors: 韩景泓; 张文格; 黄心怡; 卢佳; 刘宸宇; 王艺珍; 常树磊; 王山; 李长域; 党广虹; 杨子卿
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2024-09-27
Filing date: 2024-09-27
Publication date: 2024-10-29
Anticipated expiration: 2044-09-27
Also published as: CN118865144B

Abstract

The present invention discloses a method for detecting tumor multi-genes based on hyperspectral images, which belongs to the technical field of mutation sample detection, and specifically includes: obtaining tumor tissue sections of different carcinogenic driver gene mutation states after H&E staining; photographing tumor tissue sections by a hyperspectral camera, obtaining hyperspectral images of tumor samples and preprocessing; building a global feature extractor based on S3Anet; building a tumor multi-gene detection model and using multi-task learning training; collecting tumor samples to be detected, and obtaining hyperspectral data of the samples to be detected by a hyperspectral camera; inputting the hyperspectral data into a trained S3ANet model for feature extraction, and then inputting it into a trained multi-gene detection model to obtain the gene detection results of the tumor samples. The beneficial effects of the present invention are: the present invention effectively reduces the dependence on traditional pathological diagnosis experience and gene sequencing, significantly improves the accuracy, reliability and generalization ability of detection, and reduces the detection cost.

Description

Tumor multi-gene detection method based on hyperspectral imaging

技术领域Technical Field

本发明涉及突变样本检测技术领域，特别涉及一种基于高光谱图像的肿瘤多基因检测方法。The present invention relates to the technical field of mutation sample detection, and in particular to a tumor multi-gene detection method based on hyperspectral images.

背景技术Background Art

在现有医疗技术背景下，肿瘤靶向治疗对于提高患者生存率具有重要意义。在进行肿瘤靶向治疗前，通常需要对致癌驱动基因进行检测。目前，常用的基因检测手段主要包括免疫组化（免疫组织化学技术）和基因测序。然而，这两种方法存在诸多不足。免疫组化和基因测序往往检测时间较长，成本较高，并且耗费大量的人力资源，难以满足医院和患者对快速、低成本基因检测的需求。因此，现有的检测手段在广泛应用中受到限制，亟需一种高效且经济的新型肿瘤基因检测方法。Under the background of existing medical technology, targeted tumor therapy is of great significance for improving patient survival rates. Before conducting targeted tumor therapy, it is usually necessary to detect oncogenic driver genes. At present, the commonly used genetic testing methods mainly include immunohistochemistry (immunohistochemistry technology) and gene sequencing. However, these two methods have many shortcomings. Immunohistochemistry and gene sequencing often take a long time to test, are costly, and consume a lot of human resources, making it difficult to meet the needs of hospitals and patients for rapid and low-cost genetic testing. Therefore, the existing detection methods are limited in wide application, and a new, efficient and economical tumor gene detection method is urgently needed.

发明内容Summary of the invention

本发明的目的是提供一种基于高光谱图像的肿瘤多基因检测方法，以解决现有技术中基因检测方法检测时间长、成本高、应用范围有限等问题。本发明是通过利用肿瘤切片的高光谱图像信息作为训练数据，构建模型并完成训练后，能够准确识别患者组织样本中致癌驱动基因的突变水平，从而为肿瘤的靶向治疗提供一种高效、低成本且精确的基因检测手段。The purpose of the present invention is to provide a tumor multi-gene detection method based on hyperspectral images to solve the problems of long detection time, high cost, and limited application scope of gene detection methods in the prior art. The present invention uses the hyperspectral image information of tumor slices as training data, builds a model and completes the training, which can accurately identify the mutation level of oncogenic driver genes in patient tissue samples, thereby providing an efficient, low-cost and accurate gene detection method for targeted tumor treatment.

为了实现上述发明目的，本发明提供了一种基于高光谱图像的肿瘤多基因检测方法，所述检测方法包括以下步骤：In order to achieve the above-mentioned object of the invention, the present invention provides a method for detecting multiple genes of tumors based on hyperspectral images, and the detection method comprises the following steps:

步骤S1：收集肿瘤组织样本，获取经过H&E染色的不同致癌驱动基因突变状态的肿瘤组织切片，并进行低温保存；Step S1: Collect tumor tissue samples, obtain tumor tissue sections with different mutation states of oncogenic driver genes after H&E staining, and store them at low temperatures;

步骤S2：获取环境光光谱特性，通过无荧光、白色漫反射材料作为参照物，采集参照物区域的高光谱数据，经过平均处理得到环境光特征曲线；Step S2: Acquire the spectral characteristics of ambient light, use a non-fluorescent, white diffuse reflective material as a reference, collect hyperspectral data of the reference area, and obtain an ambient light characteristic curve through averaging;

步骤S3：解冻肿瘤组织样本，通过高光谱相机拍摄肿瘤组织切片，获取肿瘤样本的高光谱图像，所述肿瘤样本的高光谱图像包括光谱维和立体空间维；Step S3: thawing the tumor tissue sample, photographing the tumor tissue slice with a hyperspectral camera, and obtaining a hyperspectral image of the tumor sample, wherein the hyperspectral image of the tumor sample includes a spectral dimension and a stereoscopic spatial dimension;

步骤S4：对肿瘤样本的高光谱图像进行预处理；Step S4: preprocessing the hyperspectral image of the tumor sample;

步骤S5：基于S3Anet搭建全局特征提取器，其中S3ANet为3D卷积神经网络与自注意力机制结合的基本结构框架，用于提取肿瘤样本的形态及光学特征，并通过自监督学习任务训练模型以自动提取高光谱数据的特征；Step S5: building a global feature extractor based on S3Anet, where S3Anet is a basic structural framework combining a 3D convolutional neural network with a self-attention mechanism, which is used to extract the morphological and optical features of tumor samples, and train the model through a self-supervised learning task to automatically extract features of hyperspectral data;

步骤S6：搭建肿瘤多基因检测模型并利用多任务学习训练，肿瘤多基因检测模型利用由S3ANet提取的特征向量，通过多个任务特定层对每个基因的突变状态进行二分类，优化多个基因的检测结果；Step S6: Build a tumor multi-gene detection model and use multi-task learning training. The tumor multi-gene detection model uses the feature vector extracted by S3ANet to perform binary classification on the mutation status of each gene through multiple task-specific layers to optimize the detection results of multiple genes;

步骤S7：收集待检测的肿瘤样本，通过高光谱相机获取该待检测样本的高光谱数据；Step S7: collecting a tumor sample to be detected, and obtaining hyperspectral data of the sample to be detected by a hyperspectral camera;

步骤S8：将步骤S7中获得的高光谱数据输入到经过训练的S3ANet模型中进行特征提取，将得到的特征向量输入到训练好的多基因检测模型，利用该模型预测待检测肿瘤样本的基因突变状态，输出肿瘤样本的基因检测结果。Step S8: The hyperspectral data obtained in step S7 is input into the trained S3ANet model for feature extraction, and the obtained feature vector is input into the trained multi-gene detection model, which is used to predict the gene mutation status of the tumor sample to be detected and output the gene detection result of the tumor sample.

其中，所述步骤S1中收集的肿瘤组织样本包括携带EGFR基因突变阳性及阴性、ALK、FGFR1、PIK3CA、KRAS、ERCC1、RRM1和HER2基因突变状态的肿瘤切片，样本来源于手术患者术后边缘游离的正常组织。该肿瘤组织样本的获取不改变手术术式或治疗方案，所有样本采集均获得患者授权，且术前未确定样本来源病人的基因突变状态。The tumor tissue samples collected in step S1 include tumor sections carrying positive and negative EGFR gene mutations, ALK, FGFR1, PIK3CA, KRAS, ERCC1, RRM1 and HER2 gene mutations, and the samples are derived from normal tissues free from the postoperative margin of surgical patients. The acquisition of the tumor tissue samples does not change the surgical procedure or treatment plan, all sample collections are authorized by the patients, and the gene mutation status of the sample source patients is not determined before surgery.

所述步骤S4的预处理包括：The pre-processing of step S4 includes:

步骤S401：利用步骤S2获取的环境光特征曲线，对肿瘤样本的高光谱数据进行环境降噪处理，减去环境光光谱特征，获得降噪后的肿瘤样本高光谱数据；Step S401: using the ambient light characteristic curve obtained in step S2, performing environmental noise reduction processing on the hyperspectral data of the tumor sample, subtracting the ambient light spectrum characteristics, and obtaining the hyperspectral data of the tumor sample after noise reduction;

步骤S402：对降噪后的肿瘤样本高光谱数据进行归一化处理，将光谱值转换到[0,1]范围内，通过min-max归一化方法提高数据一致性；Step S402: normalizing the denoised hyperspectral data of the tumor sample, converting the spectral values to the range of [0, 1], and improving data consistency through a min-max normalization method;

步骤S403：进行数据增强处理，包括通过图像旋转生成新的训练样本，以增强模型的泛化能力；Step S403: performing data enhancement processing, including generating new training samples by image rotation to enhance the generalization ability of the model;

步骤S404：将高光谱图像划分为小的三维数据块，每个三维数据块均包含一定数量的相邻像素和所有光谱通道的信息，用于提升内存利用率和后期的训练效率，并采用随机梯度下降方法加速模型训练过程，避免过拟合现象。Step S404: Divide the hyperspectral image into small three-dimensional data blocks, each of which contains a certain number of adjacent pixels and information of all spectral channels, so as to improve memory utilization and later training efficiency, and use the stochastic gradient descent method to accelerate the model training process and avoid overfitting.

其中，步骤S402中的归一化处理通过计算整个数据集中的最小值和最大值，将每个像素的光谱值减去最小值后除以最大值与最小值的差值。The normalization process in step S402 is performed by calculating the minimum and maximum values in the entire data set, subtracting the minimum value from the spectral value of each pixel and dividing the result by the difference between the maximum value and the minimum value.

步骤S403中的数据增强通过随机旋转角度对原始图像进行处理，模拟实际应用中的视角变化，以提高模型对不同图像形态的鲁棒性。The data enhancement in step S403 processes the original image by randomly rotating the angle to simulate the change of viewing angle in actual applications, so as to improve the robustness of the model to different image forms.

所述S3ANet网络包括三维卷积神经网络（3D-CNN）模块，所述3D-CNN模块由多个卷积层、全局平均池化层和全连接层组成；The S3ANet network includes a three-dimensional convolutional neural network (3D-CNN) module, which consists of multiple convolutional layers, global average pooling layers and fully connected layers;

卷积层，用于提取高光谱图像中的空间维和光谱维的特征；Convolutional layer, used to extract the features of spatial and spectral dimensions in hyperspectral images;

全局平均池化层，将提取的特征映射为固定长度的特征向量；The global average pooling layer maps the extracted features into a feature vector of fixed length;

全连接层，用于提取和压缩全局特征，以便于分类任务中的特征表达。The fully connected layer is used to extract and compress global features to facilitate feature expression in classification tasks.

所述S3ANet网络还包括自注意力机制模块，所述自注意力机制通过计算高光谱图像特征之间的相似度，生成特征权重矩阵，利用该矩阵对高光谱图像特征进行加权处理，以突出重要特征，抑制冗余信息。The S3ANet network also includes a self-attention mechanism module, which generates a feature weight matrix by calculating the similarity between hyperspectral image features, and uses the matrix to weight the hyperspectral image features to highlight important features and suppress redundant information.

所述S3ANet网络通过自监督学习进行训练，所述自监督学习包含图像重建任务，具体为：将高光谱图像输入S3ANet网络，经过3D-CNN模块和自注意力机制模块处理后，输出重建的高光谱图像，所述重建任务通过计算输入图像与重建图像的光谱相似性来优化网络参数，直至最小化重建误差。The S3ANet network is trained by self-supervised learning, which includes an image reconstruction task, specifically: a hyperspectral image is input into the S3ANet network, and after being processed by a 3D-CNN module and a self-attention mechanism module, a reconstructed hyperspectral image is output. The reconstruction task optimizes the network parameters by calculating the spectral similarity between the input image and the reconstructed image until the reconstruction error is minimized.

所述步骤S6中的多任务学习模型用于同时预测多个肿瘤基因的突变状态，所述多任务学习模型包括输入层、共享层和任务特定层；The multi-task learning model in step S6 is used to simultaneously predict the mutation status of multiple tumor genes, and the multi-task learning model includes an input layer, a shared layer, and a task-specific layer;

所述共享层用于提取多个基因分类任务的公共特征；The shared layer is used to extract common features of multiple gene classification tasks;

所述任务特定层为每个基因任务独立构建，包含全连接层和sigmoid激活函数，用于二分类预测基因的突变状态。The task-specific layer is independently constructed for each gene task, including a fully connected layer and a sigmoid activation function, which is used for binary classification prediction of the mutation status of the gene.

所述多任务学习模型通过优化每个基因任务的独立损失函数进行训练，最终损失为所有任务损失的加权和，所述训练采用批量梯度下降法，并通过验证集评估模型的准确性、F1分数、召回率和精确率。The multi-task learning model is trained by optimizing the independent loss function of each gene task, and the final loss is the weighted sum of all task losses. The training adopts batch gradient descent method, and the accuracy, F1 score, recall rate and precision of the model are evaluated through the validation set.

本发明的有益效果是：本发明通过高光谱成像技术、S3ANet模型的全局特征提取和自监督、多任务学习方法实现肿瘤多基因的检测，有效减少了对传统病理学诊断经验和基因测序的依赖，显著提高了检测的准确性、可靠性及泛化能力，同时降低了检测成本，而且还通过提升内存利用率和训练效率，优化了系统的计算资源消耗，使得该检测方法适用于大规模数据处理。整体上，本发明为肿瘤基因分型和个性化治疗提供了高效、经济的解决方案，有助于提高癌症早期筛查率和患者预后；具体的：The beneficial effects of the present invention are as follows: the present invention realizes the detection of multiple tumor genes through hyperspectral imaging technology, global feature extraction and self-supervision of the S3ANet model, and multi-task learning methods, effectively reducing the dependence on traditional pathological diagnosis experience and gene sequencing, significantly improving the accuracy, reliability and generalization ability of detection, while reducing the detection cost, and also optimizing the system's computing resource consumption by improving memory utilization and training efficiency, making the detection method suitable for large-scale data processing. Overall, the present invention provides an efficient and economical solution for tumor gene typing and personalized treatment, which helps to improve the early screening rate of cancer and patient prognosis; specifically:

1、本发明通过高光谱成像技术结合自动化的处理流程，能够快速获取肿瘤切片的光谱和空间信息，进而进行肿瘤基因分型。医生可根据检测结果迅速识别肿瘤的基因特征，为患者选择最适合的治疗方案，尤其是免疫抑制剂，从而缩短了诊断时间，提高了治疗效率。1. The present invention uses hyperspectral imaging technology combined with automated processing procedures to quickly obtain spectral and spatial information of tumor slices, and then perform tumor gene typing. Doctors can quickly identify the genetic characteristics of tumors based on the test results and choose the most suitable treatment plan for patients, especially immunosuppressants, thereby shortening the diagnosis time and improving treatment efficiency.

2、采用自动化处理方法，减少了对病理学家临床经验的依赖。通过系统化的肿瘤基因检测流程，显著降低了临床诊断过程中的主观偏差，提高了诊断的准确性，避免了患者因过度治疗或治疗不足而产生的风险。2. The use of automated processing methods reduces the reliance on the clinical experience of pathologists. Through the systematic tumor gene testing process, the subjective bias in the clinical diagnosis process is significantly reduced, the accuracy of diagnosis is improved, and the risk of over-treatment or under-treatment of patients is avoided.

3、本发明采用的快速基因检测方法能够有效支持癌症早期发现与治疗，从而降低国民因癌症导致的严重健康风险。3. The rapid gene detection method adopted by the present invention can effectively support the early detection and treatment of cancer, thereby reducing the serious health risks caused by cancer to the public.

4、本发明采用了结合3D卷积神经网络（3D-CNN）和自注意力机制的S3ANet模型，能够集成光谱和空间信息，提取肿瘤样本的全局特征。该模型不仅提高了肿瘤基因检测的准确性，还增强了系统的可靠性。4. The present invention adopts the S3ANet model that combines 3D convolutional neural network (3D-CNN) and self-attention mechanism, which can integrate spectral and spatial information and extract global features of tumor samples. This model not only improves the accuracy of tumor gene detection, but also enhances the reliability of the system.

5、通过将高光谱图像划分为小的三维数据块并进行数据随机化处理，本发明显著提高了内存利用效率，缩短了模型训练时间，同时减少了计算资源的消耗，适合大规模数据处理。5. By dividing the hyperspectral image into small three-dimensional data blocks and performing data randomization processing, the present invention significantly improves memory utilization efficiency, shortens model training time, and reduces the consumption of computing resources, making it suitable for large-scale data processing.

6、本发明通过自监督学习技术，减少了对大量标注数据的依赖，使模型能够在较少的标注数据条件下进行有效训练。同时，自监督学习的应用提高了模型的适应性，能够更好地应对新数据和新环境，保证检测结果的准确性和可靠性。6. The present invention reduces the reliance on a large amount of labeled data through self-supervised learning technology, so that the model can be effectively trained under the condition of less labeled data. At the same time, the application of self-supervised learning improves the adaptability of the model, which can better cope with new data and new environments, and ensure the accuracy and reliability of the detection results.

7、本发明利用多任务学习优化多个基因状态的检测任务，通过共享特征提取层，提升了模型的训练效率，并通过多个任务特定层精确分类各基因状态，进一步提高了检测结果的准确度和一致性。7. The present invention uses multi-task learning to optimize the detection tasks of multiple gene states. By sharing the feature extraction layer, the training efficiency of the model is improved, and the gene states are accurately classified through multiple task-specific layers, further improving the accuracy and consistency of the detection results.

8、通过准确的肿瘤基因分型结果，本发明为个性化医疗奠定了基础，使得治疗方案可以根据患者的具体基因特征进行定制化选择，极大提高了治疗效果，提升了患者的生存率。8. Through accurate tumor gene typing results, the present invention lays the foundation for personalized medicine, so that treatment plans can be customized according to the patient's specific genetic characteristics, greatly improving the treatment effect and increasing the patient's survival rate.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1是本发明实施例一和二的肿瘤多基因检测流程示意图；FIG1 is a schematic diagram of a multi-gene tumor detection process according to embodiments 1 and 2 of the present invention;

图2是本发明实施例一和二中数据预处理流程示意图；FIG2 is a schematic diagram of a data preprocessing process in Embodiments 1 and 2 of the present invention;

图3是本发明实施例一和二中用于高光谱图像特征提取的S3ANet神经网络架构示意图。FIG3 is a schematic diagram of the S3ANet neural network architecture for hyperspectral image feature extraction in Embodiments 1 and 2 of the present invention.

具体实施方式DETAILED DESCRIPTION

为能清楚说明本方案的技术特点，下面通过具体实施方式，对本方案进行阐述。In order to clearly illustrate the technical features of this solution, this solution is described below through a specific implementation method.

实施例一Embodiment 1

本发明实施例提供了基于高光谱图像的肿瘤多基因检测方法，其包括以下步骤：The embodiment of the present invention provides a method for multi-gene detection of tumors based on hyperspectral images, which comprises the following steps:

步骤S1：收集肿瘤组织样本，获取经过H&E染色的不同致癌驱动基因突变状态的肿瘤组织切片，并进行低温保存；Step S1: Collect tumor tissue samples, obtain tumor tissue sections with different oncogenic driver gene mutation states after H&E staining, and store them at low temperatures;

所述步骤S4的预处理包括：The pre-processing of step S4 includes:

实施例二Embodiment 2

如图1-图2所示，本发明实施例提供了一种以高光谱影像为基础的肿瘤多基因检测方法，具体包括以下几个步骤：As shown in FIG. 1 and FIG. 2 , an embodiment of the present invention provides a method for multi-gene detection of tumors based on hyperspectral imaging, which specifically includes the following steps:

步骤S1：收集肿瘤手术患者的肿瘤组织样本，样本低温保存。Step S1: Collect tumor tissue samples from patients undergoing tumor surgery and store the samples at low temperatures.

本实施例要获取肿瘤组织样本的高光谱图像信息作为训练数据，使用到的肿瘤切片包括肿瘤 EGFR 基因突变阳性切片和肿瘤 EGFR 基因突变阴性切片等多种致癌驱动基因不同突变性的肿瘤切片，所有样本的使用均经过授权，样本的采集不需改变手术术式或治疗方案，术前未确定样本来源病人。样本均来源于术后的肿瘤组织边缘游离的正常组织。上述切片均经过H&E染色。病理结果根据免疫组化结果进行分类，形成EGFR、ALK、FGFR1、PIK3CA、KRAS、ERCC1、RRM1、HER2基因突变性不同分级的肿瘤样本，将其全部置于低温放置。In this embodiment, hyperspectral image information of tumor tissue samples is obtained as training data. The tumor slices used include tumor EGFR gene mutation-positive slices and tumor EGFR gene mutation-negative slices and other tumor slices with different mutations of carcinogenic driver genes. The use of all samples is authorized, and the collection of samples does not require changes in surgical procedures or treatment plans. The source patient of the samples was not determined before surgery. The samples are all derived from normal tissues free at the edge of the postoperative tumor tissue. The above slices are all stained with H&E. The pathological results are classified according to the results of immunohistochemistry to form tumor samples with different grades of EGFR, ALK, FGFR1, PIK3CA, KRAS, ERCC1, RRM1, and HER2 gene mutations, and all of them are placed at low temperature.

步骤S2：获取环境光光谱特性。Step S2: Obtaining the spectral characteristics of ambient light.

本实施例中，在高光谱成像过程中，使用一块无荧光、白色漫反射材料作为参照物，放置于成像平面上。对参照物区域进行高光谱成像，采集该区域的一维光谱数据；然后再通过对采集到的光谱数据进行平均处理，得到代表环境光条件的光谱特征曲线。In this embodiment, during the hyperspectral imaging process, a piece of non-fluorescent, white diffuse reflective material is used as a reference and placed on the imaging plane. Hyperspectral imaging is performed on the reference area to collect one-dimensional spectral data of the area; and then the collected spectral data are averaged to obtain a spectral characteristic curve representing the ambient light conditions.

步骤S3：基于高光谱扫描早期肿瘤组织样本。Step S3: Scanning early tumor tissue samples based on the hyperspectral spectrum.

将经过步骤S1处理的组织样本解冻后，通过高光谱相机拍摄，获取高光谱原始图像。本实施例中，高光谱图像包括光谱维和立体空间维，同时收集高光谱数据。After the tissue sample processed in step S1 is thawed, it is photographed by a hyperspectral camera to obtain a hyperspectral original image. In this embodiment, the hyperspectral image includes a spectral dimension and a stereoscopic spatial dimension, and the hyperspectral data is collected at the same time.

步骤S4：对高光谱图像进行预处理。Step S4: preprocessing the hyperspectral image.

首先，将高光谱图像减去环境光光谱特征，以实现环境降噪，即将肿瘤高光谱图像中各像素点的一维光谱数据减去环境光平均光谱特征。Firstly, the spectral characteristics of ambient light are subtracted from the hyperspectral image to achieve environmental noise reduction, that is, the one-dimensional spectral data of each pixel in the tumor hyperspectral image is subtracted from the average spectral characteristics of ambient light.

为了确保数据在相同的范围内，进行归一化处理。这一过程涉及将每个像素的光谱值转换到[0, 1]的范围内。具体操作为，先计算整个数据集中的最小值和最大值，然后将每个像素的光谱值减去最小值，最后将结果除以最大值与最小值之差。这种min-max归一化方法有助于统一不同数据源的尺度，使得模型在训练过程中能够更公平地对待每个像素点，从而提高数据的一致性。To ensure that the data is in the same range, normalization is performed. This process involves converting the spectral value of each pixel to the range of [0, 1]. The specific operation is to first calculate the minimum and maximum values in the entire data set, then subtract the minimum value from the spectral value of each pixel, and finally divide the result by the difference between the maximum and minimum values. This min-max normalization method helps to unify the scale of different data sources, allowing the model to treat each pixel more fairly during training, thereby improving data consistency.

为了增强模型的泛化能力，本实施例还采用数据增强技术，即通过旋转原始图像生成新的训练样本，能够模拟真实世界中可能出现的视角变化，增加数据的多样性，从而帮助模型学习到更加鲁棒的特征。In order to enhance the generalization ability of the model, this embodiment also adopts data enhancement technology, that is, generating new training samples by rotating the original image, which can simulate the perspective changes that may occur in the real world and increase the diversity of data, thereby helping the model learn more robust features.

在训练深度学习模型时，将高光谱图像划分为小的三维数据块。不仅可以提高内存的利用率，还可以通过每个批次的随机梯度下降来加速模型的训练过程。When training deep learning models, hyperspectral images are divided into small three-dimensional data blocks. This not only improves memory utilization, but also accelerates the model training process through stochastic gradient descent in each batch.

为了防止模型在训练过程中对数据的特定顺序产生依赖，从而导致过拟合，需在每个训练周期（epoch）结束后对数据进行随机打乱。这种数据随机化的方法确保了每次训练时样本的顺序都是不同的，迫使模型学习到更加泛化的特征，而不是仅仅记住特定的数据顺序。In order to prevent the model from becoming dependent on the specific order of data during training, which can lead to overfitting, the data needs to be randomly shuffled after each training cycle (epoch). This method of data randomization ensures that the order of samples is different each time during training, forcing the model to learn more generalized features instead of just remembering a specific data order.

步骤S5：基于S3ANet提取图像全局特征。Step S5: Extract image global features based on S3ANet.

为了尽可能提取更多的图像形态及光学特征，本实施例创新性提出S3ANet，该网络结合了能够有效处理高光谱图像的3D-CNN网络和对形态光学特征重点关注的自注意力机制形成新的网络架构，通过自监督任务训练模型得到能够在无标注高光谱数据上进行特征自动提取的特征提取器；具体为：In order to extract as many image morphological and optical features as possible, this embodiment innovatively proposes S3ANet, which combines the 3D-CNN network that can effectively process hyperspectral images and the self-attention mechanism that focuses on morphological optical features to form a new network architecture. The feature extractor that can automatically extract features on unlabeled hyperspectral data is obtained through the self-supervised task training model; specifically:

（1）模型构建：(1) Model construction:

使用3D卷积神经网络来提取高光谱图像中的空间和光谱特征。3D-CNN能够同时处理图像的空间和光谱信息，捕捉到更加细粒度的特征。在3D-CNN的基础上，添加自注意力机制，以增强对重要的图像形态及光学特征的关注。自注意力机制通过计算特征之间的相似度，选择性地关注更重要的特征，提高特征提取的精度。Use 3D convolutional neural network to extract spatial and spectral features in hyperspectral images. 3D-CNN can process the spatial and spectral information of images at the same time and capture more fine-grained features. On the basis of 3D-CNN, add self-attention mechanism to enhance the focus on important image morphology and optical features. The self-attention mechanism calculates the similarity between features and selectively focuses on more important features, thereby improving the accuracy of feature extraction.

（2）网络结构：(2) Network structure:

多层3D卷积层，用于提取图像的低级和高级特征。Multiple layers of 3D convolutional layers are used to extract low-level and high-level features of the image.

全局平均池化层，用于将特征映射到一个固定长度的向量，便于后续处理。The global average pooling layer is used to map features to a vector of fixed length for subsequent processing.

自注意力层，用于增强特征之间的相互作用和重要性。Self-attention layer to enhance the interaction and importance between features.

全连接层，用于进一步提取全局特征。The fully connected layer is used to further extract global features.

（3）自监督学习任务训练模型：(3) Self-supervised learning task training model:

设定一个图像重建的自监督学习任务用于训练模型。自监督学习任务的目的是通过没有标签的数据来训练模型，使其能够学习到有效的特征表示。在图像重建任务中，模型的目标是将输入的高光谱图像重建出来。通过最小化重建图像与原始图像之间的差异，模型能够学习到输入数据的有效特征。A self-supervised learning task of image reconstruction is set to train the model. The purpose of the self-supervised learning task is to train the model with unlabeled data so that it can learn effective feature representation. In the image reconstruction task, the goal of the model is to reconstruct the input hyperspectral image. By minimizing the difference between the reconstructed image and the original image, the model can learn the effective features of the input data.

训练过程：Training process:

将数据集分成训练集和验证集，确保模型能够在训练过程中进行验证和调优。Splitting the dataset into training and validation sets ensures that the model can be validated and tuned during the training process.

使用前述的自监督学习任务，训练模型。通过反向传播算法，不断调整模型的参数，使其能够准确地重建输入图像。在每个训练迭代结束后，评估模型在验证集上的表现，确保模型能够有效地学习到数据的特征。Use the self-supervised learning task described above to train the model. Use the back-propagation algorithm to continuously adjust the model's parameters so that it can accurately reconstruct the input image. After each training iteration, evaluate the model's performance on the validation set to ensure that the model can effectively learn the characteristics of the data.

（4）提取特征：(4) Feature extraction:

训练完成后，使用特征提取器从高光谱图像中提取全局特征。这些特征向量可以用于后续的多任务学习。After training, a feature extractor is used to extract global features from the hyperspectral image. These feature vectors can be used for subsequent multi-task learning.

步骤S6：利用多任务学习训练肿瘤多基因检测模型。Step S6: Use multi-task learning to train a tumor multi-gene detection model.

（1）数据准备：(1) Data preparation:

获取标注数据，包含每个样本的6种基因的阴性或阳性状态。每个基因的状态用0（阴性）和1（阳性）表示。使用已经训练好的S3ANet模型，从高光谱图像中提取特征，这些特征将作为多任务学习模型的输入。Obtain labeled data, including the negative or positive status of the six genes for each sample. The status of each gene is represented by 0 (negative) and 1 (positive). Use the trained S3ANet model to extract features from the hyperspectral image, which will be used as the input of the multi-task learning model.

（2）构建多任务学习模型：(2) Building a multi-task learning model:

网络架构为：The network architecture is:

输入层：接受从S3ANet提取的特征向量；Input layer: accepts feature vectors extracted from S3ANet;

共享层：一层或多层共享的全连接层，用于提取公共特征；Shared layer: one or more shared fully connected layers used to extract common features;

任务特定层：为每个基因单独构建一组任务特定层，使用全连接层和sigmoid激活函数进行二分类。Task-specific layers: A set of task-specific layers is constructed for each gene separately, using fully connected layers and sigmoid activation functions for binary classification.

（3）数据划分：(3) Data division:

将数据集划分为训练集和验证集。训练集用于模型训练，验证集用于模型评估和调优。The dataset is divided into a training set and a validation set. The training set is used for model training, and the validation set is used for model evaluation and tuning.

（4）训练模型：(4) Training model:

使用多任务学习方法，同时优化所有基因的分类任务。每个基因的输出层通过一个独立的损失函数进行优化，总损失是所有任务损失的加权和。使用批量梯度下降法进行模型训练，最小化总损失。在每个训练迭代结束后，评估模型在验证集上的性能，并进行必要的超参数调优。Use a multi-task learning approach to optimize the classification tasks of all genes simultaneously. The output layer of each gene is optimized using an independent loss function, and the total loss is the weighted sum of all task losses. Use batch gradient descent to train the model and minimize the total loss. After each training iteration, evaluate the performance of the model on the validation set and perform necessary hyperparameter tuning.

（5）模型评估：(5) Model evaluation:

使用准确率、F1分数、召回率和精确率等指标评估模型性能。The model performance was evaluated using metrics such as accuracy, F1 score, recall, and precision.

对每个基因的分类结果进行单独评估，并计算整体的模型性能。The classification results for each gene were evaluated individually and the overall model performance was calculated.

在验证集上进行模型评估，确保模型的泛化能力和鲁棒性。The model is evaluated on the validation set to ensure the generalization ability and robustness of the model.

实施例三Embodiment 3

本发明实施例提供了一种实现实施例一或二中实现上述基于高光谱图像的肿瘤多基因检测方法的系统，该系统包括：The embodiment of the present invention provides a system for implementing the above-mentioned tumor multi-gene detection method based on hyperspectral image in embodiment 1 or 2, and the system includes:

样本收集模块，用于收集肿瘤组织样本，获取经过H&E染色的不同致癌驱动基因突变状态的肿瘤组织切片，并对其进行低温保存；A sample collection module is used to collect tumor tissue samples, obtain tumor tissue sections with different mutation states of oncogenic driver genes after H&E staining, and store them at low temperatures;

光谱采集模块，用于获取环境光光谱特性，通过无荧光、白色漫反射材料作为参照物，采集参照物区域的高光谱数据，并经过平均处理得到环境光特征曲线；The spectrum acquisition module is used to obtain the spectral characteristics of the ambient light. It uses a non-fluorescent, white diffuse reflective material as a reference object to collect the hyperspectral data of the reference area, and obtains the ambient light characteristic curve through averaging processing;

高光谱成像模块，用于解冻肿瘤组织样本，并通过高光谱相机拍摄肿瘤组织切片，获取包含光谱维和立体空间维的肿瘤样本的高光谱图像；A hyperspectral imaging module is used to thaw tumor tissue samples and photograph tumor tissue sections using a hyperspectral camera to obtain hyperspectral images of tumor samples that include spectral and stereoscopic spatial dimensions;

图像预处理模块，用于对肿瘤样本的高光谱图像进行预处理；An image preprocessing module, used for preprocessing the hyperspectral images of tumor samples;

全局特征提取模块，基于S3ANet构建，其中S3ANet为结合3D卷积神经网络与自注意力机制的基本结构框架，用于提取肿瘤样本的形态及光学特征，并通过自监督学习任务训练模型以自动提取高光谱数据的特征；The global feature extraction module is built on S3ANet, which is a basic structural framework combining 3D convolutional neural networks and self-attention mechanisms. It is used to extract the morphological and optical features of tumor samples and train the model through self-supervised learning tasks to automatically extract features of hyperspectral data;

肿瘤多基因检测模块，用于基于S3ANet提取的特征向量，利用多任务学习进行训练，并通过多个任务特定层对每个基因的突变状态进行二分类，优化多个基因的检测结果；The tumor multi-gene detection module is used to train the feature vectors extracted by S3ANet using multi-task learning, and to classify the mutation status of each gene through multiple task-specific layers to optimize the detection results of multiple genes;

样本检测模块，用于采集待检测肿瘤样本的高光谱数据，通过高光谱相机获取该待检测样本的高光谱图像；A sample detection module is used to collect hyperspectral data of a tumor sample to be detected and obtain a hyperspectral image of the sample to be detected through a hyperspectral camera;

数据输入与预测模块，用于将待检测样本的高光谱数据输入到经过训练的S3ANet模型中进行特征提取，并将提取的特征向量输入到训练好的肿瘤多基因检测模型，以预测待检测肿瘤样本的基因突变状态，输出肿瘤样本的基因检测结果。The data input and prediction module is used to input the hyperspectral data of the sample to be tested into the trained S3ANet model for feature extraction, and input the extracted feature vector into the trained tumor multi-gene detection model to predict the gene mutation status of the tumor sample to be tested and output the gene detection results of the tumor sample.

实施例四Embodiment 4

本实施例提供了一种计算机可读存储介质，其上存储有计算机程序，在处理器执行时能够实现实施例一和二前列腺肿瘤识别方法中的步骤，如上述实施例以和二中所述内容。This embodiment provides a computer-readable storage medium having a computer program stored thereon, which can implement the steps in the prostate tumor identification method of embodiments one and two when executed by a processor, such as the contents described in the above embodiments one and two.

实施例五Embodiment 5

本实施例提供了一种计算机设备，包括存储器、处理器和存储在存储器上并可运行在处理器上的计算机程序，实现了上述实施例一和二所述基于高光谱图像的前列腺肿瘤识别方法中的步骤，并在处理器执行所述程序时实现。This embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, which implements the steps in the prostate tumor identification method based on hyperspectral images described in the above-mentioned embodiments 1 and 2, and is implemented when the processor executes the program.

以上所述仅为本发明的较佳实施例，并不用以限制本发明，凡在本发明的精神和原则之内，所作的任何修改、等同替换、改进等，均应包含在本发明的保护范围之内。The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. The tumor polygene detection method based on hyperspectral images is characterized by comprising the following steps of:

step S1: collecting tumor tissue samples, obtaining tumor tissue sections of different oncogene mutation states subjected to H & E staining, and preserving at low temperature;

Step S2: acquiring the spectrum characteristics of the ambient light, collecting hyperspectral data of a reference object area by taking a non-fluorescent and white diffuse reflection material as a reference object, and obtaining an ambient light characteristic curve through average treatment;

Step S3: thawing a tumor tissue sample, shooting a tumor tissue slice through a hyperspectral camera, and obtaining a hyperspectral image of the tumor sample, wherein the hyperspectral image of the tumor sample comprises a spectrum and a three-dimensional space dimension;

step S4: preprocessing a hyperspectral image of a tumor sample;

Step S5: building a global feature extractor based on S3Anet, wherein S3Anet is a basic structure framework combining a 3D convolutional neural network and a self-attention mechanism, and is used for extracting the form and optical features of a tumor sample, and automatically extracting the features of hyperspectral data by a self-supervision learning task training model;

Step S6: constructing a tumor polygene detection model, utilizing multitask learning training, utilizing feature vectors extracted by S3ANet to perform two-classification on mutation states of each gene through a plurality of task specific layers, and optimizing detection results of a plurality of genes;

Step S7: collecting a tumor sample to be detected, and acquiring hyperspectral data of the sample to be detected through a hyperspectral camera;

step S8: inputting the hyperspectral data obtained in the step S7 into a trained S3ANet model for feature extraction, inputting the obtained feature vector into a trained polygene detection model, predicting the gene mutation state of a tumor sample to be detected by using the model, and outputting the gene detection result of the tumor sample.

2. The hyperspectral image-based tumor polygene detection method of claim 1, wherein the tumor tissue samples collected in the step S1 comprise tumor sections carrying mutation states of EGFR gene mutation positive and negative, ALK, FGFR1, PIK3CA, KRAS, ERCC1, RRM1 and HER2 genes, and the samples are derived from normal tissues with free postoperative edges of the surgical patients.

3. The hyperspectral image-based tumor polygene detection method of claim 1, wherein the preprocessing of step S4 comprises:

step S401: performing environmental noise reduction treatment on the hyperspectral data of the tumor sample by using the environmental light characteristic curve obtained in the step S2, and subtracting the environmental light spectrum characteristic to obtain the hyperspectral data of the tumor sample after noise reduction;

Step S402: carrying out normalization processing on the tumor sample hyperspectral data after noise reduction, converting a spectrum value into a range of [0, 1], and improving data consistency by a min-max normalization method;

step S403: performing data enhancement processing, including generating new training samples through image rotation;

Step S404: the hyperspectral image is divided into small three-dimensional data blocks, each three-dimensional data block contains a certain number of adjacent pixels and information of all spectrum channels, and a random gradient descent method is adopted to accelerate the model training process.

4. The hyperspectral image based tumor polygene detection method as claimed in claim 3, wherein the normalization process in step S402 divides the spectral value of each pixel by the difference between the maximum value and the minimum value by calculating the minimum value and the maximum value in the whole dataset.

5. The hyperspectral image based tumor polygene detection method as set forth in claim 3, wherein the data enhancement in step S403 processes the original image by a random rotation angle, simulating the change of viewing angle in practical application.

6. The hyperspectral image based tumor polygene detection method of claim 1, wherein the S3ANet comprises a three-dimensional convolutional neural network module consisting of a plurality of convolutional layers, a global averaging pooling layer and a fully connected layer;

the convolution layer is used for extracting the characteristics of the space dimension and the spectrum dimension in the hyperspectral image;

the global average pooling layer maps the extracted features into feature vectors with fixed lengths;

And the full connection layer is used for extracting and compressing global features so as to facilitate the feature expression in the classification task.

7. The hyperspectral image based tumor polygene detection method of claim 6, wherein the S3ANet further comprises a self-attention mechanism module, wherein the self-attention mechanism generates a feature weight matrix by calculating the similarity between hyperspectral image features, and weights the hyperspectral image features by using the matrix.

8. The hyperspectral image based tumor polygene detection method of claim 7, wherein the S3ANet is trained by self-supervised learning, the self-supervised learning comprising an image reconstruction task, in particular: and inputting the hyperspectral image into the S3ANet, processing the hyperspectral image by the three-dimensional convolutional neural network module and the self-attention mechanism module, and outputting a reconstructed hyperspectral image, wherein the reconstruction task optimizes network parameters by calculating the spectrum similarity of the input image and the reconstructed image until the reconstruction error is minimized.

9. The hyperspectral image based tumor polygene detection method of claim 1, wherein the multi-task learning model in step S6 is used for simultaneously predicting mutation states of a plurality of tumor genes, and comprises an input layer, a shared layer and a task specific layer;

the sharing layer is used for extracting common characteristics of a plurality of gene classification tasks;

The task specific layer is independently constructed for each gene task and comprises a full-connection layer and a sigmoid activation function, and is used for two-class prediction of the mutation state of genes.

10. The hyperspectral image based tumor polygene detection method of claim 9, wherein the multitasking learning model is trained by optimizing the independent loss function of each genetic task, the final loss being a weighted sum of all task losses, the training employing a batch gradient descent method, and evaluating the model for accuracy, F1 score, recall, and precision by a validation set.