CN119515773A

CN119515773A - A self-supervised staining standardization method based on a denoising diffusion probability model

Info

Publication number: CN119515773A
Application number: CN202411395800.8A
Authority: CN
Inventors: 刘少军; 杨淮水
Original assignee: Shenzhen Technology University
Current assignee: Shenzhen Technology University
Priority date: 2024-10-08
Filing date: 2024-10-08
Publication date: 2025-02-25

Abstract

The present invention provides a self-supervised staining standardization method based on a denoising diffusion probability model, which is mainly used to solve the problems of low robustness and generalization in the analysis of histopathological images by computer-aided systems, as well as information loss and staining errors in current staining standardization algorithms. Compared with the prior art, the present invention achieves staining standardization of histopathological images with multiple staining styles through a denoising diffusion probability model and a self-supervised training method, retains the cell morphological structure, improves the accuracy of disease diagnosis in downstream tasks such as classification and segmentation by computer-aided systems, and is beneficial to clinical practice and the advancement of medical research.

Description

Self-supervised staining standardization method based on a denoising diffusion probability model

Technical Field

The invention relates to the technical field of staining standardization of histopathological images, and in particular to a self-supervised staining standardization method based on a denoising diffusion probability model.

Background

With the development of computer-aided detection/diagnosis, histopathological images are becoming increasingly important for the diagnosis and prognosis of cancer. However, histopathological images exhibit different staining styles due to differences in staining techniques, handling skills, and scanner specifications. These diverse staining styles reduce the robustness of computer-aided detection/diagnostic algorithms.

In recent years, staining standardization methods have been limited to traditional stain vector decomposition and to learning-based generative adversarial networks and their variants. Traditional staining standardization methods typically require a medical professional to select staining templates, which consumes a significant amount of time and effort. Among the learning-based methods, supervised methods need paired images, which are difficult to acquire in clinical practice. Unsupervised methods often suffer from staining errors and changes to cell structure, and most of them map grayscale images to color images, which leads to poor generalization and information loss.

Disclosure of Invention

In view of the shortcomings of existing staining standardization techniques, the invention aims to provide a self-supervised staining standardization method based on a denoising diffusion probability model. The method recasts the staining standardization task as a self-supervised, pixel-aligned color mapping, which removes the need for paired data, adapts to a variety of staining styles, and applies to different staining scenarios without retraining. It also introduces a re-planning sampling algorithm that is fast and accurate and does not damage cell morphological structure. The method thus provides a robust and effective solution for staining standardization of digital histopathology images, helps improve the accuracy of disease diagnosis and subsequent analysis, and advances clinical practice and medical research.

The invention realizes the above purpose through the following technical scheme:

A self-supervised staining standardization method based on a denoising diffusion probability model comprises the following steps:

obtaining histopathological images with different staining styles, and dividing them into a target staining data set and other staining data sets;

preprocessing the other staining data sets, and integrating the extracted staining vectors into a staining matrix database;

designing a self-supervised staining standardization training strategy, and performing staining augmentation preprocessing on the target staining data set using staining vectors from the staining matrix database;

constructing a staining standardization model based on a denoising diffusion probability model, and performing self-supervised training of the model on the preprocessed data set;

judging whether the training achieves the expected effect; if so, sampling histopathological images with the trained model using a re-normalization sampling strategy to obtain the staining standardization result; otherwise, readjusting the self-supervised staining standardization training strategy and retraining the model with the adjusted strategy until a satisfactory standardization effect is achieved.

In the self-supervised staining standardization method based on the denoising diffusion probability model provided by the invention, organizing the acquired histopathological image data set comprises:

traversing all images in the acquired histopathological image data set, and grouping the histopathological images by staining style;

selecting the images of one staining style as the target staining images and storing them as data set 0, and storing the images of all other staining styles as data set 1.

In the self-supervised staining standardization method based on the denoising diffusion probability model provided by the invention, preprocessing the other staining data sets comprises:

for each histopathological image in data set 1, mapping the image from the RGB color space to the LAB color space, and thresholding the lightness channel to obtain a pixel mask of the tissue region;

mapping the image from the RGB color space to the optical density space, and multiplying the optical density image by the tissue region pixel mask to remove weakly stained regions and highlight the staining range of interest;

flattening the masked image into vectors and applying singular value decomposition to obtain two mutually orthogonal staining matrices, wherein the staining matrices of all images in data set 1 form the staining matrix database.
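The three preprocessing steps above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation of the invention: the function names, the 0.8 lightness threshold (applied to L* rescaled to 0..1), the sRGB-to-lightness conversion, and the choice of the first two right singular vectors as the staining matrix are all assumptions made for the example.

```python
import numpy as np

def tissue_mask(rgb, threshold=0.8):
    """Mark tissue pixels: CIELAB lightness L* (rescaled to 0..1) below threshold."""
    srgb = rgb.astype(np.float64) / 255.0
    # sRGB -> linear RGB, then relative luminance Y (assumed D65 white point).
    linear = np.where(srgb <= 0.04045, srgb / 12.92, ((srgb + 0.055) / 1.055) ** 2.4)
    y = linear @ np.array([0.2126, 0.7152, 0.0722])
    # CIELAB lightness computed from luminance.
    delta = 6.0 / 29.0
    f = np.where(y > delta ** 3, np.cbrt(y), y / (3 * delta ** 2) + 4.0 / 29.0)
    lightness = (116.0 * f - 16.0) / 100.0
    return lightness < threshold

def optical_density(rgb):
    """OD = -log10(I), with I scaled to (0, 1]; zeros are clamped to avoid log(0)."""
    intensity = np.maximum(rgb.astype(np.float64), 1.0) / 255.0
    return -np.log10(intensity)

def staining_matrix(rgb, threshold=0.8):
    """Return a 3x2 matrix whose columns are two orthonormal OD directions."""
    mask = tissue_mask(rgb, threshold)
    od_pixels = optical_density(rgb)[mask]      # shape (n_tissue_pixels, 3)
    if od_pixels.shape[0] < 2:
        raise ValueError("not enough tissue pixels to estimate a staining matrix")
    # Assumption: the first two right singular vectors span the stain plane.
    _, _, vt = np.linalg.svd(od_pixels, full_matrices=False)
    return vt[:2].T
```

Calling `staining_matrix` on every image of data set 1 and stacking the results would give an array that can serve as the staining matrix database.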

In the self-supervised staining standardization method based on the denoising diffusion probability model provided by the invention, designing the self-supervised staining standardization training strategy comprises:

randomly selecting a staining matrix from the staining matrix database;

adding a perturbation to the randomly selected staining matrix;

for the perturbed staining matrix, using a K-nearest-neighbor algorithm to search the staining matrix database for samples within a Euclidean distance of 0.1, and requiring more than 5 such neighbors; otherwise, returning to the staining matrix database and selecting another staining matrix for augmentation;

for an input image, calculating its staining matrix;

replacing the staining matrix of the input image with the perturbed staining matrix to obtain a staining-augmented image;

applying Gaussian blur to the staining-augmented image to obtain the final staining-augmented preprocessed image.

In the self-supervised staining standardization method based on the denoising diffusion probability model provided by the invention, the perturbation added to the randomly selected staining matrix is expressed by the following formula:

M′=M×a+b

where M is the randomly selected staining matrix, M′ is the perturbed staining matrix, a is a random number in the range 0.3 to 1.7, and b is a random number in the range -0.7 to 0.7.
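The perturbation and the neighbor check can be sketched together. The sketch below assumes that a and b are scalars drawn uniformly from their ranges and that the database is a NumPy array of shape (n, 3, 2); the function names and the `max_tries` limit are hypothetical and are not taken from the source.

```python
import numpy as np

def perturb(matrix, rng):
    """Apply M' = M * a + b with scalar a in [0.3, 1.7) and b in [-0.7, 0.7)."""
    a = rng.uniform(0.3, 1.7)
    b = rng.uniform(-0.7, 0.7)
    return matrix * a + b

def count_neighbors(candidate, database, radius=0.1):
    """Count database matrices within the given Euclidean distance of candidate."""
    flat = database.reshape(len(database), -1)
    distances = np.linalg.norm(flat - candidate.ravel(), axis=1)
    return int(np.sum(distances <= radius))

def sample_perturbed_matrix(database, rng, radius=0.1, min_neighbors=5,
                            max_tries=100_000):
    """Draw and perturb matrices until one has more than min_neighbors neighbors."""
    for _ in range(max_tries):
        base = database[rng.integers(len(database))]
        candidate = perturb(base, rng)
        if count_neighbors(candidate, database, radius) > min_neighbors:
            return candidate
    raise RuntimeError("no plausible perturbed staining matrix found")
```

Rejecting candidates with too few neighbors keeps the augmented staining styles close to styles that actually occur in the database.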

In the self-supervised staining standardization method based on the denoising diffusion probability model provided by the invention, constructing the denoising diffusion probability model comprises:

constructing a forward noising process, in which noise is gradually added to the image through a parameterized Markov chain until its distribution approaches a standard Gaussian distribution;

constructing a UNet network for noise prediction, which receives the noisy image x_t, the image x to be standardized and the current time t as inputs, and outputs the noise predicted for the current state;

constructing a backward denoising process, in which the noisy image x_{t-1} at the next moment is obtained by reverse inference from the predicted noise and the noisy image x_t, and this is repeated until the noise is completely removed and the staining-standardized image is obtained.

In the self-supervised staining standardization method based on the denoising diffusion probability model provided by the invention, the gradual addition of noise to the image through a parameterized Markov chain, until its distribution approaches a standard Gaussian distribution, is expressed by the following formula:

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$$

where x_0 is the clean image to which no noise has been added, x_t is the noisy image at time t, ε is random noise sampled from a normal distribution, and ᾱ_t is a hyperparameter of the denoising diffusion probability model at time t.
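As an illustration of the forward noising process, the following sketch samples x_t from x_0 in a single step. It assumes the standard closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1-ᾱ_t)·ε and a linear beta schedule with 1000 steps; the schedule values and function names are assumptions, since the source does not state them.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Sample x_t directly from x_0 in one step; also return the noise used."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps
```

During training, the returned noise would be the regression target for the noise-prediction network.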

In the self-supervised staining standardization method based on the denoising diffusion probability model provided by the invention, judging whether the training achieves the expected effect comprises:

comparing the staining-standardized images generated by the trained model with the ground-truth staining-standardized images in terms of structural similarity (SSIM), peak signal-to-noise ratio (PSNR) and learned perceptual image patch similarity (LPIPS), expressed by the following formulas:

$$SSIM(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

$$PSNR = 10 \log_{10}\left(\frac{maxValue^2}{MSE}\right)$$

$$LPIPS(x, y) = \sum_l \frac{1}{H_l W_l} \sum_{h,w} \left\| w_l \odot \left(\hat{y}^l_{hw} - \hat{y}^l_{0hw}\right) \right\|_2^2$$

where μ is the mean, σ is the standard deviation, σ_xy is the covariance of image x and image y, C_1 and C_2 are small constants that stabilize the division, MSE is the mean square error of the two images, maxValue is the maximum value an image pixel can take, ŷ^l and ŷ_0^l are the l-th layer features of the two images extracted by a pre-trained neural network, H_l and W_l are the height and width of the l-th layer feature map, and w_l is the corresponding weight vector.

The staining standardization result is also applied to related downstream tasks such as classification and segmentation, where the computer-aided detection/diagnosis system should score better on the corresponding metrics with staining-standardized images than with images that have not been standardized.
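Two of the three image metrics are simple enough to sketch in NumPy. The version below is illustrative only: SSIM is computed once over the whole image rather than over sliding windows, and the constants C_1 = (0.01·maxValue)² and C_2 = (0.03·maxValue)² are the commonly used defaults, assumed here because the source does not give them. LPIPS is omitted because it requires a pre-trained network.

```python
import numpy as np

def psnr(x, y, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

def global_ssim(x, y, max_value=255.0):
    """SSIM computed over the whole image as a single window (a simplification)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Higher PSNR and SSIM indicate closer agreement with the reference image, whereas a lower LPIPS value indicates closer perceptual similarity.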

In the self-supervised staining standardization method based on the denoising diffusion probability model provided by the invention, sampling the histopathological image with the re-normalization sampling strategy comprises:

adding a certain degree of noise to the image to be standardized, and using the result as the initial image of the backward denoising process;

skipping some of the steps of the backward denoising process by a non-Markov chain method, so that the staining-standardized image of the denoising diffusion probability model is obtained quickly and accurately.

In the self-supervised staining standardization method based on the denoising diffusion probability model provided by the invention, skipping some of the steps of the backward denoising process by a non-Markov chain method is expressed by the following formula:

where the subscript prev of x_prev indicates a time step that may be several steps away from t.
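One update that fits this description is the deterministic step used by denoising diffusion implicit models (DDIM). It is given here as an assumption about the general form, not as the exact formula of the invention; ε_θ denotes the noise predicted by the UNet network and ᾱ the cumulative noise-schedule hyperparameter:

```latex
x_{prev} = \sqrt{\bar{\alpha}_{prev}} \cdot
           \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, x, t)}{\sqrt{\bar{\alpha}_t}}
         + \sqrt{1-\bar{\alpha}_{prev}}\,\epsilon_\theta(x_t, x, t)
```

The fraction is the current estimate of the clean image, which is then re-noised to the level of the earlier step prev, so prev does not have to equal t-1.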

Compared with the prior art, the invention therefore has the following beneficial effects:

1. Automation. Traditional staining standardization of histopathological images usually requires professionals to manually select a staining template according to the staining effect. The invention uses the denoising diffusion probability model to standardize histopathological images with different staining styles, and this automated processing significantly reduces the manual workload and speeds up the whole histopathological image analysis workflow.

2. Accuracy. Through thorough training with the self-supervised staining standardization training strategy, the invention achieves a higher level of staining standardization and of morphological structure preservation for histopathological images. Compared with currently available methods, it identifies the critical information in histopathological images more accurately and thereby provides more reliable staining standardization.

3. The invention adopts an efficient re-planning sampling algorithm that overcomes the slow sampling of the denoising diffusion probability model. It is 120 times faster than the original sampling process of the denoising diffusion probability model, reaches a speed comparable to that of traditional methods, and supports rapid diagnostic decisions in real-time analysis.

4. The invention standardizes the staining of histopathological images with different staining styles, which addresses the poor generalization of computer-aided detection/diagnosis systems across staining styles. Combining the analysis results of the computer-aided detection/diagnosis system with the experience and judgment of doctors can improve the accuracy of disease diagnosis and prognosis analysis and the effect of treatment, and helps doctors make more accurate diagnosis and treatment decisions.

5. Self-supervised learning can be trained with unpaired data, which reduces the reliance on large numbers of paired histopathological images with different staining styles. Through self-supervised learning, the model learns the intrinsic rules and features of the data, so it generalizes better to new, unseen data.

6. The denoising diffusion probability model (DDPM) performs very well in image generation and can produce high-quality, realistic images. This generative capability provides a solid foundation for staining standardization, so that the model can better simulate and restore the target staining style. Because the DDPM learns from the data by gradually adding and removing noise, the model adapts better to pathological images with different staining styles, which improves the accuracy and robustness of staining standardization.

7. The invention designs a self-supervised staining standardization training strategy and adjusts it dynamically according to the training effect, so that the model adapts to different data distributions and staining styles and achieves a better standardization effect. Sampling histopathological images with the trained model using the re-normalization sampling strategy yields a high-precision staining standardization result, and the sampling strategy ensures that the standardized images have a consistent staining style while the original tissue structure is preserved.

The invention is described in further detail below with reference to the drawings and the detailed description.

Drawings

FIG. 1 is a flow chart of an embodiment of a self-supervised staining normalization method based on a denoising diffusion probability model.

FIG. 2 is a flow chart of a self-supervised staining normalization training strategy for an algorithm model in an embodiment of a self-supervised staining normalization method based on a denoising diffusion probability model according to the present invention.

FIG. 3 is a schematic diagram of an overall framework of an algorithm model in an embodiment of a self-supervised staining normalization method based on a denoising diffusion probability model.

FIG. 4 is a schematic diagram showing the effect on data set 1 of an embodiment of the self-supervised staining normalization method based on a denoising diffusion probability model.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

Referring to FIG. 1 to FIG. 4, the present invention provides a self-supervised staining standardization method based on a denoising diffusion probability model, the method comprising the following steps:

Step S1, obtaining histopathological images with different staining styles, and dividing them into a target staining data set and other staining data sets;

Step S2, preprocessing the other staining data sets, and integrating the extracted staining vectors into a staining matrix database;

Step S3, designing a self-supervised staining standardization training strategy, and performing staining augmentation preprocessing on the target staining data set using staining vectors from the staining matrix database;

Step S4, constructing a staining standardization model based on a denoising diffusion probability model, and performing self-supervised training of the model on the preprocessed data set;

Step S5, judging whether the training achieves the expected effect; if so, executing step S6; otherwise, readjusting the self-supervised staining standardization training strategy and retraining the model with the adjusted strategy until a satisfactory standardization effect is achieved;

Step S6, sampling histopathological images with the trained model using a re-normalization sampling strategy to obtain the staining standardization result.

In step S1, organizing the acquired histopathological image data set comprises:

Step S11, traversing all images in the acquired histopathological image data set, and grouping the histopathological images by the staining styles that result from different dyes, staining procedures, digital scanners and the like;

Step S12, selecting the images of one staining style as the target staining images and storing them as data set 0, and storing the images of all other staining styles as data set 1.

In step S2, preprocessing the other staining data sets comprises:

Step S21, for each histopathological image in data set 1, mapping the image from the RGB color space to the LAB color space, and thresholding the lightness channel (for example, at 0.8) to obtain a pixel mask of the tissue region;

Step S22, mapping the image from the RGB color space to the optical density space, and multiplying the optical density image by the tissue region pixel mask to remove weakly stained regions and highlight the staining range of interest;

Optical density (OD) is directly related to the amount of dye: the more dye is present, the less light is transmitted and the higher the OD. The optical density space therefore reflects the distribution of staining in a histopathological image better than the RGB color space does. The Lambert-Beer law and the conversion from the RGB color space to the optical density space are expressed as:

$$A = -\log_{10}(T) = K \cdot c \cdot L \quad (1)$$

$$OD = -\log_{10}(I) \quad (2)$$

where A is the absorbance, T is the transmittance (the ratio of transmitted light intensity to incident light intensity), K is the molar absorption coefficient, c is the concentration of the light-absorbing substance, L is the thickness of the absorbing layer, and I is the RGB vector of the image.

Step S23, flattening the masked image into vectors and applying singular value decomposition to obtain two mutually orthogonal staining matrices, wherein the staining matrices of all images in data set 1 form the staining matrix database.

In step S3, designing the self-supervised staining standardization training strategy comprises:

randomly selecting a staining matrix from the staining matrix database;

adding a perturbation to the randomly selected staining matrix, expressed by the following formula:

M′=M×a+b (3)

for an input image, calculating its staining matrix;

replacing the staining matrix of the input image with the perturbed staining matrix to obtain a staining-augmented image, a process that can be represented by the following formula:

where 99 denotes the 99th percentile, M_x is the staining matrix of the input image, M′ is the randomly selected staining matrix after the perturbation is added, and M′_x is the corrected staining matrix of the input image.
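The idea of swapping a staining matrix can be illustrated with a simplified sketch. It assumes that the image is decomposed into stain concentrations by least squares in optical density space and then recomposed with the new matrix; the 99th-percentile scaling mentioned above is left out, so this is not the correction formula of the invention. The function name `restain` and the 3x2 matrix layout are hypothetical.

```python
import numpy as np

def restain(rgb, source_matrix, target_matrix):
    """Swap the staining matrix of an image while keeping its stain concentrations.

    source_matrix and target_matrix are 3x2 arrays (one column per stain).
    Returns a float image in the range 0..255 with the same shape as rgb.
    """
    shape = rgb.shape
    od = -np.log10(np.maximum(rgb.astype(np.float64), 1.0) / 255.0).reshape(-1, 3)
    # Least-squares stain concentrations under the source matrix: OD^T ~ M @ C.
    concentrations, *_ = np.linalg.lstsq(source_matrix, od.T, rcond=None)
    # Recompose the optical density with the target staining matrix.
    od_new = (target_matrix @ concentrations).T
    rgb_new = 255.0 * np.power(10.0, -od_new)
    return np.clip(rgb_new, 0.0, 255.0).reshape(shape)
```

Passing the same matrix as source and target leaves an image unchanged if its optical density lies in the span of that matrix, which makes a convenient sanity check.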

In step S4, constructing the denoising diffusion probability model comprises:

constructing a forward noising process. Noise is added so that the denoising diffusion probability model can learn to generate target data samples from noise. Noise is gradually added to the image through a parameterized Markov chain until its distribution approaches a standard Gaussian distribution, expressed by the following formula:

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$$

where x_0 is the clean image, x_t is the noisy image at time t, ε is random noise sampled from a normal distribution, and ᾱ_t is a hyperparameter of the model at time t;

constructing a backward denoising process, in which the noisy image x_{t-1} at the next moment is obtained by reverse inference from the predicted noise ε_θ and the noisy image x_t, until the noise is completely removed and the staining-standardized image is obtained, expressed by the following formula:

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_\theta(x_t, x, t) \right) + \sigma_t z$$

where z is random noise sampled from a normal distribution, and σ_t, α_t and ᾱ_t are hyperparameters of the denoising diffusion probability model at time t. By repeatedly obtaining the image x_{t-1} at the next moment, a clean staining-standardized image is finally restored.
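A single backward denoising step can be sketched as below, assuming the standard DDPM ancestral update with σ_t² = β_t, zero-based time indexing, and a predicted noise `eps_pred` supplied by the caller in place of the UNet network. These choices and the function name are assumptions for illustration.

```python
import numpy as np

def reverse_step(x_t, eps_pred, t, betas, rng):
    """One ancestral DDPM step from x_t to x_{t-1}, given the predicted noise."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bar[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean                      # no noise is added at the final step
    sigma_t = np.sqrt(betas[t])          # assumed choice: sigma_t^2 = beta_t
    return mean + sigma_t * rng.standard_normal(x_t.shape)
```

Iterating this step from the last time index down to 0, with the network prediction recomputed at every step, gives the full (slow) sampling process that the re-planned sampling strategy shortens.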

In step S5, judging whether the training achieves the expected effect comprises:

comparing the staining-standardized images generated by the trained model with the ground-truth staining-standardized images in terms of structural similarity (SSIM), peak signal-to-noise ratio (PSNR) and learned perceptual image patch similarity (LPIPS). The method should outperform current methods on these metrics and should also look good under subjective visual evaluation. The metrics are expressed by the following formulas:

$$SSIM(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

$$PSNR = 10 \log_{10}\left(\frac{maxValue^2}{MSE}\right)$$

$$LPIPS(x, y) = \sum_l \frac{1}{H_l W_l} \sum_{h,w} \left\| w_l \odot \left(\hat{y}^l_{hw} - \hat{y}^l_{0hw}\right) \right\|_2^2$$

where μ is the mean, σ is the standard deviation, σ_xy is the covariance of image x and image y, C_1 and C_2 are small constants that stabilize the division, MSE is the mean square error of the two images, maxValue is the maximum value an image pixel can take, ŷ^l and ŷ_0^l are the l-th layer features of the two images extracted by a pre-trained neural network, H_l and W_l are the height and width of the l-th layer feature map, and w_l is the corresponding weight vector.

The staining standardization result is also applied to related downstream tasks such as classification and segmentation. The computer-aided detection/diagnosis system should score better with staining-standardized images than with images that have not been standardized on the corresponding metrics, such as classification accuracy (the percentage of correctly classified samples among all samples) and the segmentation Dice similarity coefficient (DSC, which measures the similarity of two samples), expressed by the following formulas:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

$$DSC = \frac{2TP}{2TP + FP + FN}$$

where TP is the number of samples that are actually positive and predicted positive, FP is the number of samples that are actually negative but predicted positive, TN is the number of samples that are actually negative and predicted negative, and FN is the number of samples that are actually positive but predicted negative. In the classification task the samples are images, and in the segmentation task the samples are image pixels.
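Both downstream metrics follow directly from the four confusion counts. A minimal sketch for binary labels (image-level labels for classification, pixel masks for segmentation) is given below; the function names are hypothetical, and returning 1.0 for two empty masks is a convention assumed here.

```python
import numpy as np

def confusion_counts(pred, truth):
    """Return (TP, FP, TN, FN) for two binary arrays of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    tn = int(np.sum(~pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return tp, fp, tn, fn

def accuracy(pred, truth):
    """Fraction of samples whose predicted label matches the true label."""
    tp, fp, tn, fn = confusion_counts(pred, truth)
    return (tp + tn) / (tp + fp + tn + fn)

def dice(pred, truth):
    """Dice similarity coefficient; defined as 1.0 when both masks are empty."""
    tp, fp, _, fn = confusion_counts(pred, truth)
    denominator = 2 * tp + fp + fn
    return 1.0 if denominator == 0 else 2 * tp / denominator
```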

Further, in step S6, sampling the histopathological image with the re-normalization sampling strategy to perform staining standardization comprises:

adding a certain degree of noise to the image to be standardized, and using the result as the initial image of the backward denoising process, expressed by the following formula:

$$x_N = \sqrt{\bar{\alpha}_N}\, x + \sqrt{1-\bar{\alpha}_N}\, \epsilon$$

where x_N is the initial image of the backward denoising process, x is the image to be standardized, ε is random noise sampled from a normal distribution, and N is the initial sampling time step, chosen empirically, which determines the degree of noise.

skipping some of the steps of the backward denoising process by a non-Markov chain method, so that the staining-standardized image of the denoising diffusion probability model is obtained quickly and accurately, expressed by the following formula:
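The two sampling steps can be sketched end to end. The sketch assumes a deterministic DDIM-style update for the skipped steps, a linear beta schedule, and a caller-supplied `predict_noise(x_t, x, t)` function standing in for the trained UNet network; `start_step`, `num_jumps` and all function names are hypothetical and do not come from the source.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def replanned_sample(x, predict_noise, alpha_bar, start_step, num_jumps, rng):
    """Noise x up to start_step, then denoise in num_jumps DDIM-style jumps."""
    # Step 1: partially noise the image to be standardized (x_N).
    eps = rng.standard_normal(x.shape)
    x_t = np.sqrt(alpha_bar[start_step]) * x + np.sqrt(1.0 - alpha_bar[start_step]) * eps
    # Step 2: visit only a few time steps; index -1 stands for the clean image.
    steps = np.linspace(start_step, -1, num_jumps + 1).round().astype(int)
    for t, prev in zip(steps[:-1], steps[1:]):
        eps_hat = predict_noise(x_t, x, t)
        x0_hat = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])
        ab_prev = alpha_bar[prev] if prev >= 0 else 1.0
        x_t = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat
    return x_t
```

With, for example, `start_step=250` and `num_jumps=10`, the network is evaluated 10 times instead of once per time step. Starting from a partially noised input rather than from pure noise also keeps the tissue structure of the input as the starting point of the denoising.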

The sampling strategy of the original denoising diffusion probability model and the re-planning sampling strategy were each verified on the test set with their respective optimal parameters. The results are compared in Table (1):

Table (1)

As Table (1) shows, the re-planning sampling strategy of the invention improves SSIM, PSNR and LPIPS and speeds up inference by a factor of 120. This shows that the invention achieves better staining standardization performance with lower time complexity.

For the breast cancer classification and colorectal gland segmentation tasks, the invention was verified on the test set with and without staining standardization, each using the best parameters obtained in its own training. The results are compared in Tables (2) and (3):

Table (2)

Table (3)

Tables (2) and (3) show that the self-supervised staining standardization method based on the denoising diffusion probability model improves both breast cancer classification and colorectal gland segmentation. The method therefore achieves a good staining standardization effect, improves the accuracy of disease diagnosis by computer-aided systems in downstream tasks such as classification and segmentation, and benefits clinical practice and medical research.

In summary, the self-supervised staining standardization method based on the denoising diffusion probability model proceeds as follows. First, histopathological images with different staining styles are collected and divided into a target staining style part and an other staining styles part. The image data sets of the other styles are then preprocessed: the images are converted to the optical density space, the stained regions of interest are extracted, and the staining matrices are extracted to form the staining matrix database used for training. In the preprocessing stage of model training, a staining matrix is randomly selected from the staining matrix database and perturbed, a K-nearest-neighbor algorithm checks whether the perturbed staining matrix is realistic, the staining matrix of the input image is replaced with the perturbed staining matrix, and Gaussian blur is added to obtain the final preprocessing result. In the model construction stage, a denoising diffusion probability model is adopted, whose overall framework comprises a forward noising process, a UNet network for noise prediction, and a backward denoising process. After training, the performance of the model is evaluated in terms of structural similarity (SSIM), peak signal-to-noise ratio (PSNR), learned perceptual image patch similarity (LPIPS) and inference time to judge whether the training achieves the expected effect. In addition, the staining standardization result can be applied to related downstream tasks such as classification and segmentation, where a computer-aided detection/diagnosis system processes the staining-standardized images to determine whether its recognition capability has improved.
If all of these results meet expectations, the self-supervised staining standardization model based on the denoising diffusion probability model is deployed on a histopathological image processing platform. Histopathological images with different staining styles are uploaded to the platform and standardized with the trained model, which yields images with a uniform staining style. The goal of the overall workflow is to improve the accuracy of disease detection and diagnosis by computer-aided systems in downstream tasks such as classification and segmentation, and to advance clinical practice and medical research.

The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.

The above embodiments are only preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, but any insubstantial changes and substitutions made by those skilled in the art on the basis of the present invention are intended to be within the scope of the present invention as claimed.

Claims

1. A self-supervised staining standardization method based on a denoising diffusion probability model, characterized by comprising the following steps:

2. The method of claim 1, wherein the sorting of the acquired histopathological image dataset comprises:

3. The method according to claim 1, wherein the preprocessing of the acquired further staining data sets comprises:

4. The method of claim 1, wherein the designing a self-supervising staining standardized training strategy comprises:

Randomly selecting a staining matrix from a staining matrix database;

adding a disturbance to the randomly selected staining matrix;

For an input image, calculating a staining matrix thereof;

5. The method according to claim 4, characterized in that for a randomly selected staining matrix, a perturbation is added, expressed as the following formula:

M′=M×a+b

where M is the randomly selected staining matrix, M′ is the perturbed staining matrix, a is a random number in the range 0.3 to 1.7, and b is a random number in the range -0.7 to 0.7.

6. The method of claim 1, wherein the constructing of the denoising diffusion probability model comprises:

7. The method of claim 6, wherein gradually adding noise to the image through the parameterized Markov chain until its distribution approaches a standard Gaussian distribution is expressed by the following formula:

8. The method of claim 1, wherein determining whether the training meets the expected effect comprises:

where μ is the mean, σ is the standard deviation, σ_xy is the covariance of image x and image y, MSE is the mean square error of the two images, maxValue is the maximum value an image pixel can take, the two feature terms are the l-th layer features extracted by a pre-trained neural network, and w_l is the corresponding weight vector;

The dyeing standardization result is applied to the related downstream tasks, and the corresponding index of the computer-aided detection/diagnosis system on the dyed standardized image is better than that of the non-dyed standardized image.

9. The method of claim 1, wherein sampling the histopathological image using a re-normalization sampling strategy comprises:

10. The method according to claim 9, wherein:

skipping some of the steps of the backward denoising process by a non-Markov chain method is expressed by the following formula: