CN117542469A

CN117542469A - Diagnostic report generation method, system and medium for multi-modal medical data

Info

Publication number: CN117542469A
Application number: CN202311517989.9A
Authority: CN
Inventors: 黄飞跃; 马勇; 徐宇辰; 柏志安
Original assignee: Ruijin Hospital Affiliated to Shanghai Jiaotong University School of Medicine Co Ltd
Current assignee: Ruijin Hospital Affiliated to Shanghai Jiaotong University School of Medicine Co Ltd
Priority date: 2023-11-15
Filing date: 2023-11-15
Publication date: 2024-02-09

Abstract

The disclosure provides a diagnostic report generation method, system and medium for multi-modal medical data. The method comprises the following steps: acquiring a multi-modal training data set that comprises medical images and text reports; splitting each text report according to preset keywords to determine negative description text and positive description text; inputting the medical image, the negative description text and the positive description text into a pre-trained diagnostic report generation model, and outputting a loss function; optimizing the pre-trained diagnostic report generation model according to the loss function to determine a multi-modal medical diagnosis report generation model; and inputting a medical image to be tested into the multi-modal medical diagnosis report generation model to generate the diagnostic report corresponding to that image. The method addresses the imbalance of training samples in the model training stage and improves the model's ability to detect abnormal descriptions for medical images.

Description

Diagnostic report generation method, system and medium for multi-modal medical data

Technical Field

The present disclosure relates to the technical fields of computer vision and natural language processing, and in particular to a diagnostic report generation method, system and medium for multi-modal medical data.

Background

With the improvement of medical care and the development of medical imaging technology, medical imaging data has become an important part of a patient's electronic record. Writing a qualified medical imaging report requires a doctor to have professional knowledge of medical imaging, knowledge of clinical medicine and clinical experience, the ability to assess how lesions change over time, and the ability to make a comprehensive judgment based on the patient's medical history and other examination results.

However, medical imaging data is now growing exponentially, and the excess data puts tremendous pressure on clinicians. The medical field has accumulated a large amount of "medical image-text report" data, and these image-text pairs are strongly intrinsically correlated. Based on this massive in-hospital data, a computer-aided diagnosis system can optimize the doctor's workflow: the computer analyzes and processes the medical images, extracts information from the examination reports, and automatically generates the imaging report, so that the doctor only needs to review and revise the final report. This can greatly relieve doctors' work pressure and substantially reduce misdiagnoses and missed diagnoses, so the application has significant practical value.

When writing an imaging report, a doctor refers to both the medical images and features such as examination reports. The text reports include laboratory test reports, clinical records and the like. This information is difficult to process comprehensively with traditional techniques; deep learning methods are needed to extract and align features from the image data and the text data, and to generate the imaging report text from the aligned features. At the same time, imaging report text contains a large number of non-abnormal statements, defined as negative descriptions, alongside descriptions of abnormal features such as lesions appearing in the image, defined as positive descriptions. The large number of negative descriptions causes an imbalance between positive and negative samples, which lowers the detection rate of abnormal descriptions in the final report generation task.

Summary of the Invention

In view of the deficiencies in the prior art, the purpose of the present disclosure is to provide a diagnostic report generation method, system and medium for multi-modal medical data.

To achieve the above purpose, according to a first aspect of the present disclosure, a diagnostic report generation method for multi-modal medical data is provided, including:

acquiring a multi-modal training data set, the multi-modal training data set including medical images and text reports;

splitting the text report according to preset keywords to determine negative description text and positive description text;

inputting the medical image, the negative description text and the positive description text into a pre-trained diagnostic report generation model, outputting a loss function, and performing model training on the pre-trained diagnostic report generation model, the loss function including a negative loss function and a positive loss function;

optimizing the pre-trained diagnostic report generation model according to the loss function to determine a multi-modal medical diagnosis report generation model;

inputting a medical image to be tested into the multi-modal medical diagnosis report generation model to generate a diagnostic report corresponding to the medical image to be tested.

Optionally, the pre-trained diagnostic report generation model includes a text editor, an image editor and a multi-modal text generator.

Optionally, inputting the medical image, the negative description text and the positive description text into the pre-trained diagnostic report generation model, outputting a loss function, and performing model training on the pre-trained diagnostic report generation model includes:

inputting the medical image into the image editor for image feature extraction, and outputting the image features of the medical image;

inputting the negative description text and the positive description text into the text editor, and outputting the text features of the text report, the text features of the text report including negative description text features and positive description text features;

inputting the image features and the text features into the multi-modal text generator, and outputting the negative loss function and the positive loss function.

Optionally, inputting the image features and the text features into the multi-modal text generator and outputting the negative loss function and the positive loss function includes:

performing feature alignment on the image features and the text features in the same feature space to determine the aligned image features and text features;

generating a diagnostic report corresponding to the medical image according to the aligned image features and text features;

outputting the negative loss function and the positive loss function according to the diagnostic report corresponding to the medical image and the text report.

Optionally, optimizing the pre-trained diagnostic report generation model according to the loss function to determine the multi-modal medical diagnosis report generation model includes:

setting the ratio of the negative description text to the positive description text in the text report according to the negative loss function and the positive loss function, optimizing the parameters of the pre-trained diagnostic report generation model, and determining the multi-modal medical diagnosis report generation model.

Optionally, the pre-trained diagnostic report generation model is pre-trained using text reports that consist entirely of negative description text.

Optionally, inputting the medical image to be tested into the multi-modal medical diagnosis report generation model to generate the diagnostic report corresponding to the medical image to be tested includes:

inputting the medical image to be tested into the image editor, and outputting the image features of the medical image to be tested;

inputting the image features of the medical image to be tested into the multi-modal text generator to generate the diagnostic report corresponding to the medical image to be tested.

According to a second aspect of the present disclosure, a diagnostic report generation system for multi-modal medical data is provided, including:

According to a third aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored. When the program is executed by a processor, it implements the diagnostic report generation method for multi-modal medical data provided by the first aspect of the present disclosure.

According to a fourth aspect of the present disclosure, an electronic device is provided, including:

a memory on which a computer program is stored;

a processor configured to execute the computer program in the memory to implement the steps of the diagnostic report generation method for multi-modal medical data provided by the first aspect of the present disclosure.

Compared with the prior art, the embodiments of the present disclosure have at least one of the following beneficial effects:

With the above technical solution, the negative loss function and the positive loss function are output while the pre-trained diagnostic report generation model is being trained. The ratio of negative description text to positive description text in the text report can be set according to these two loss functions, so that the positive description text guides the training of the pre-trained model and the imbalance of training samples is addressed. Adjusting the loss weights of the negative and positive description text improves the detection accuracy for abnormal descriptions of medical images and effectively prevents abnormal lesions from being missed. In addition, the trained multi-modal medical diagnosis report generation model generates diagnostic reports automatically, which relieves doctors' work pressure and improves efficiency.

Brief Description of the Drawings

Other features, objects and advantages of the present disclosure will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a flowchart of a diagnostic report generation method for multi-modal medical data according to an exemplary embodiment.

FIG. 2 is a schematic structural diagram of a pre-trained diagnostic report generation model according to an exemplary embodiment.

FIG. 3 is a flowchart of a method for training a pre-trained diagnostic report generation model according to an exemplary embodiment.

FIG. 4 is a block diagram of a diagnostic report generation system for multi-modal medical data according to an exemplary embodiment.

FIG. 5 is a block diagram of an electronic device according to an exemplary embodiment.

Detailed Description

The present disclosure is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present disclosure, but do not limit it in any form. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present disclosure, all of which fall within its scope of protection.

FIG. 1 is a flowchart of a diagnostic report generation method for multi-modal medical data according to an exemplary embodiment. As shown in FIG. 1, the method includes steps S11 to S15.

S11: Acquire a multi-modal training data set.

The multi-modal training data set includes medical images and text reports, and may take the form of pairs, each consisting of a medical image and its corresponding text report. The data set is used to train the pre-trained diagnostic report generation model.

S12: Split the text report according to preset keywords to determine negative description text and positive description text.

Negative description text is text in the report stating that no abnormal lesion features appear in the medical image. Positive description text is text in the report describing abnormal lesion features that do appear in the medical image.
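The splitting in S12 can be sketched as a sentence-level keyword match. This is only an illustration: the patent does not list the preset keywords, so `NEGATIVE_KEYWORDS`, the sentence delimiters and the function name `split_report` are all hypothetical choices, and a real system would likely need a more careful treatment of negation.

```python
import re

# Hypothetical keyword list; the source only says the keywords are "preset".
NEGATIVE_KEYWORDS = ("no", "not", "without", "normal", "unremarkable", "clear")


def split_report(report, negative_keywords=NEGATIVE_KEYWORDS):
    """Split a report into (negative_sentences, positive_sentences).

    A sentence that contains any negative keyword as a whole word is treated
    as negative description text; every other sentence is treated as positive.
    """
    # Whole-word matching, so "abnormal" does not match the keyword "normal".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in negative_keywords) + r")\b",
        re.IGNORECASE,
    )
    # Split on Western and Chinese sentence delimiters (an assumption).
    sentences = [s.strip() for s in re.split(r"[.;。；]", report) if s.strip()]
    negative, positive = [], []
    for sentence in sentences:
        (negative if pattern.search(sentence) else positive).append(sentence)
    return negative, positive


example_report = (
    "No pleural effusion. The heart size is normal. "
    "A 2 cm nodule is seen in the right upper lobe."
)
negative_text, positive_text = split_report(example_report)
```

In this toy report the first two sentences land in `negative_text` and the nodule finding lands in `positive_text`.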

S13: Input the medical image, the negative description text and the positive description text into the pre-trained diagnostic report generation model, output a loss function, and perform model training on the pre-trained diagnostic report generation model.

The loss function includes a negative loss function and a positive loss function. The negative loss function is the loss associated with the negative description text, and the positive loss function is the loss associated with the positive description text.
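One way to read the two loss terms is as the same token-level loss averaged separately over negative and positive description text. The sketch below assumes a negative log-likelihood (cross-entropy) loss, which the patent does not specify; `grouped_nll` and its arguments are hypothetical names.

```python
import math


def grouped_nll(token_probs, is_positive):
    """Return (negative_loss, positive_loss) as mean negative log-likelihoods.

    token_probs[i] is the probability the model assigned to the i-th reference
    token; is_positive[i] says whether that token belongs to positive
    description text. Cross-entropy is an assumption made for illustration.
    """
    sums = {False: 0.0, True: 0.0}
    counts = {False: 0, True: 0}
    for prob, flag in zip(token_probs, is_positive):
        sums[bool(flag)] += -math.log(prob)
        counts[bool(flag)] += 1
    negative_loss = sums[False] / counts[False] if counts[False] else 0.0
    positive_loss = sums[True] / counts[True] if counts[True] else 0.0
    return negative_loss, positive_loss


# Two well-predicted negative tokens and one poorly predicted positive token.
neg_loss, pos_loss = grouped_nll([1.0, 1.0, 0.5], [False, False, True])
```

Keeping the two averages separate is what makes it possible to weight them differently later in training.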

In some possible embodiments, a preset diagnostic report generation model is first pre-trained using text reports that consist entirely of negative description text, together with their corresponding medical images, to obtain the pre-trained diagnostic report generation model. The pre-trained model is then trained step by step using text reports that contain both negative and positive description text, which improves the model's ability to recognize positive description text.
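The two-stage procedure above can be pictured as a schedule for how much positive description text enters training at each step. The linear ramp, the parameter names and the default cap of 0.5 below are assumptions made purely for illustration; the source only states that positive text is introduced gradually after an all-negative pre-training stage.

```python
def positive_fraction(step, pretrain_steps, ramp_steps, max_fraction=0.5):
    """Fraction of positive description text to include at a training step.

    Stage 1 (step < pretrain_steps): only negative description text is used.
    Stage 2: the positive share grows linearly over ramp_steps, then stays
    at max_fraction. The linear shape is a hypothetical choice.
    """
    if ramp_steps <= 0:
        raise ValueError("ramp_steps must be positive")
    if step < pretrain_steps:
        return 0.0
    progress = min(1.0, (step - pretrain_steps) / ramp_steps)
    return max_fraction * progress


# Example schedule: 100 pre-training steps, then a 50-step ramp.
schedule = [positive_fraction(s, 100, 50) for s in (0, 99, 125, 150, 1000)]
```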

As shown in FIG. 2, the diagnostic report generation model includes a text editor, an image editor and a multi-modal text generator. The text editor extracts text features from the text report, the image editor extracts image features from the medical image, and the multi-modal text generator aligns the text features and the image features in the same feature space and generates a diagnostic report from the aligned features.

As an example, pre-training the preset diagnostic report generation model includes:

inputting text reports that consist entirely of negative description text, together with their corresponding medical images, into the preset diagnostic report generation model, and outputting the negative loss function.

Specifically, the all-negative description text is input into the text editor, which outputs its text features; the medical image is input into the image editor, which outputs the image features of the medical image; and the text features of the all-negative description text and the image features of the medical image are input into the multi-modal text generator, which outputs the negative loss function.

As another example, the pre-trained diagnostic report generation model is trained using text reports that contain both negative and positive description text. The positive description text guides the training of the pre-trained model, which prevents an imbalance in the training samples from leaving the trained multi-modal medical diagnosis report generation model unable to detect abnormal descriptions reliably.

S14: Optimize the pre-trained diagnostic report generation model according to the loss function to determine the multi-modal medical diagnosis report generation model.

In one possible embodiment, the ratio of negative description text to positive description text in the text report is set according to the negative loss function and the positive loss function, the pre-trained diagnostic report generation model is optimized, and the multi-modal medical diagnosis report generation model is determined.

In the model training stage, the ratio of negative to positive description text can be adjusted freely. Adjusting the share of positive description text in the text report in turn adjusts the weights of the negative loss function and the positive loss function. This improves, to a certain extent, the trained model's ability to recognize abnormal lesion features appearing in medical images, so that diagnostic reports are generated accurately from the medical images and missed detection of abnormal lesions is effectively prevented.
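The relationship between the text ratio and the loss weights is not given as a formula in the source. A minimal sketch, assuming a convex combination of the two losses and an inverse-frequency heuristic for the positive weight (both hypothetical), might look like this:

```python
def weight_from_ratio(n_negative, n_positive):
    """Hypothetical inverse-frequency weight for the positive loss.

    The rarer the positive description text, the larger its weight.
    """
    total = n_negative + n_positive
    if total == 0:
        return 0.5  # no data: fall back to equal weighting
    return n_negative / total


def combined_loss(negative_loss, positive_loss, positive_weight):
    """Weighted sum of the negative and positive losses (an assumed form)."""
    if not 0.0 <= positive_weight <= 1.0:
        raise ValueError("positive_weight must lie in [0, 1]")
    return (1.0 - positive_weight) * negative_loss + positive_weight * positive_loss


# 90 negative sentences and 10 positive sentences in a training batch.
w_pos = weight_from_ratio(90, 10)
total_loss = combined_loss(0.2, 1.0, w_pos)
```

With this heuristic the scarce positive text contributes most of the gradient signal, which is one way to counter the sample imbalance described above.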

S15: Input the medical image to be tested into the multi-modal medical diagnosis report generation model to generate the diagnostic report corresponding to the medical image to be tested.

As an example, in the testing phase the medical image to be tested is input into the trained multi-modal medical diagnosis report generation model. First, the medical image to be tested is input into the image editor, which outputs its image features. Next, these image features are input into the multi-modal text generator, which generates the corresponding diagnostic report. Finally, the multi-modal medical diagnosis report generation model outputs the diagnostic report corresponding to the medical image to be tested.
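The test-time data flow can be sketched as a two-stage function composition. The stand-in components below (`toy_image_editor`, `toy_text_generator`) are hypothetical placeholders, not trained models; they only show that inference passes the image through the image editor and then the multi-modal text generator, without using the text editor.

```python
from typing import Callable, List

Image = List[List[float]]
Features = List[float]


def generate_report(
    image: Image,
    image_editor: Callable[[Image], Features],
    text_generator: Callable[[Features], str],
) -> str:
    """Run the inference path: image -> image features -> report text."""
    features = image_editor(image)
    return text_generator(features)


def toy_image_editor(image: Image) -> Features:
    """Placeholder feature extractor: the mean pixel intensity."""
    pixels = [value for row in image for value in row]
    return [sum(pixels) / len(pixels)]


def toy_text_generator(features: Features) -> str:
    """Placeholder generator: thresholds the single feature."""
    if features[0] > 0.5:
        return "Abnormal density noted."
    return "No abnormality detected."


bright_report = generate_report([[0.9, 0.8], [0.7, 0.6]], toy_image_editor, toy_text_generator)
dark_report = generate_report([[0.1, 0.2]], toy_image_editor, toy_text_generator)
```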

As shown in FIG. 3, in some possible embodiments, inputting the medical image, the negative description text and the positive description text into the pre-trained diagnostic report generation model, outputting a loss function, and performing model training on the pre-trained diagnostic report generation model includes steps S21 to S23.

S21: Input the medical image into the image editor for image feature extraction, and output the image features of the medical image.

S22: Input the negative description text and the positive description text into the text editor, and output the text features of the text report.

The text features of the text report include negative description text features and positive description text features.

S23: Input the image features and the text features into the multi-modal text generator, and output the negative loss function and the positive loss function.

In one possible embodiment, within the multi-modal text generator, the image features and the text features are aligned in the same feature space to determine the aligned image features and text features; a diagnostic report corresponding to the medical image is generated from the aligned features; and the negative loss function and the positive loss function are output according to the generated diagnostic report and the text report.
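The source does not say how features are aligned in the shared space. A common approach, used here only as an assumption, is to project each modality with its own linear map into a space of equal dimension and compare the results with cosine similarity. The matrices `W_IMAGE` and `W_TEXT` below are hand-picked toy values, not learned parameters.

```python
import math


def project(vector, matrix):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy features of different sizes: 3-d image feature, 2-d text feature.
image_feature = [1.0, 0.0, 2.0]
text_feature = [0.5, 1.0]

# Hypothetical projections into a shared 2-d space.
W_IMAGE = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
W_TEXT = [[2.0, 0.0], [0.0, 2.0]]

image_shared = project(image_feature, W_IMAGE)
text_shared = project(text_feature, W_TEXT)
similarity = cosine_similarity(image_shared, text_shared)
```

In a trained model the projections would be learned so that matching image-report pairs score a high similarity and mismatched pairs score a low one.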

With the above technical solution, adjusting the loss weights of the negative and positive description text lets the positive description text guide the training and optimization of the pre-trained diagnostic report generation model. The parameters of the text editor, the image editor and the multi-modal text generator are optimized, the training of the pre-trained diagnostic report generation model is completed, and the multi-modal medical diagnosis report generation model is obtained.

Based on the same concept, the present disclosure also provides a diagnostic report generation system for multi-modal medical data. Referring to FIG. 4, the diagnostic report generation system 100 for multi-modal medical data includes an acquisition module 110, a text processing module 120, a model training module 130, a model optimization module 140 and a diagnostic report generation module 150.

The acquisition module 110 is configured to acquire a multi-modal training data set, the multi-modal training data set including medical images and text reports;

the text processing module 120 is configured to split the text report according to preset keywords to determine negative description text and positive description text;

the model training module 130 is configured to input the medical image, the negative description text and the positive description text into a pre-trained diagnostic report generation model, output a loss function, and perform model training on the pre-trained diagnostic report generation model, the loss function including a negative loss function and a positive loss function;

the model optimization module 140 is configured to optimize the pre-trained diagnostic report generation model according to the loss function to determine a multi-modal medical diagnosis report generation model;

the diagnostic report generation module 150 is configured to input a medical image to be tested into the multi-modal medical diagnosis report generation model to generate the diagnostic report corresponding to the medical image to be tested.

Optionally, the model training module 130 includes:

an image feature extraction submodule, configured to input the medical image into the image editor for image feature extraction and output the image features of the medical image;

a text feature extraction submodule, configured to input the negative description text and the positive description text into the text editor and output the text features of the text report, the text features of the text report including negative description text features and positive description text features;

a multi-modal processing submodule, configured to input the image features and the text features into the multi-modal text generator and output the negative loss function and the positive loss function.

Optionally, the multi-modal processing submodule includes:

an alignment submodule, configured to perform feature alignment on the image features and the text features in the same feature space and determine the aligned image features and text features;

a diagnostic report generation submodule, configured to generate a diagnostic report corresponding to the medical image according to the aligned image features and text features;

a loss submodule, configured to output the negative loss function and the positive loss function according to the diagnostic report corresponding to the medical image and the text report.

Optionally, the model optimization module 140 includes:

an optimization submodule, configured to set the ratio of the negative description text to the positive description text in the text report according to the negative loss function and the positive loss function, optimize the parameters of the pre-trained diagnostic report generation model, and determine the multi-modal medical diagnosis report generation model.

Optionally, the diagnostic report generation module 150 includes:

a test-image feature extraction submodule, configured to input the medical image to be tested into the image editor and output the image features of the medical image to be tested;

a test-report generation submodule, configured to input the image features of the medical image to be tested into the multi-modal text generator and generate the diagnostic report corresponding to the medical image to be tested.

Regarding the system embodiments above, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.

As shown in FIG. 5, in some possible embodiments the present disclosure may also provide an electronic device, for example a terminal for medical diagnosis reports. The electronic device 500 may include a processor 501 and a memory 502, and may further include one or more of a multimedia component 503, an input/output interface 504 and a communication component 505.

The processor 501 controls the overall operation of the electronic device 500 to complete all or some of the steps of the diagnostic report generation method for multi-modal medical data of the first aspect. The memory 502 stores various types of data to support operation of the electronic device 500. Such data may include, for example, instructions for any application program or method operating on the electronic device 500, as well as application-related data such as contact data, sent and received messages, pictures, audio and video. The memory 502 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.

The multimedia component 503 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals; the received audio signals may be stored in the memory 502 or sent through the communication component 505. The audio component also includes at least one speaker for outputting audio signals. The input/output interface 504 provides an interface between the processor 501 and other interface modules, which may be a keyboard, a mouse, buttons and the like; the buttons may be virtual or physical. The communication component 505 is used for wired or wireless communication between the electronic device 500 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G and so on, or a combination of one or more of them, which is not limited here. Accordingly, the communication component 505 may include a Wi-Fi module, a Bluetooth module, an NFC module and the like.

In another exemplary embodiment, a non-transitory computer-readable storage medium including program instructions is also provided. When executed by a processor, the program instructions implement the steps of the diagnostic report generation method for multi-modal medical data of the first aspect described above. For example, the computer-readable storage medium may be the above-mentioned memory including program instructions, and the program instructions may be executed by the processor of the electronic device to complete the diagnostic report generation method for multi-modal medical data.

In another exemplary embodiment, a computer program product is also provided. The computer program product comprises a computer program executable by a programmable device, the computer program having code portions for performing the above diagnostic report generation method for multi-modal medical data when executed by the programmable device.

Specific embodiments of the present disclosure have been described above. It should be understood that the present disclosure is not limited to the specific embodiments described above, and those skilled in the art may make various variations or modifications within the scope of the claims without affecting the essential content of the present disclosure. The above preferred features may be used in any combination as long as they do not conflict with each other.

Claims

1. A diagnostic report generation method for multi-modal medical data, characterized by comprising:

Obtaining a multi-modal training data set, the multi-modal training data set including medical images and text reports;

Splitting the text report according to preset keywords to determine negative description text and positive description text;

Inputting the medical image, the negative description text, and the positive description text into a pre-trained diagnostic report generation model, outputting a loss function, and performing model training on the pre-trained diagnostic report generation model, wherein the loss function includes a negative loss function and a positive loss function;

Optimizing the pre-trained diagnostic report generation model according to the loss function to determine a multi-modal medical diagnosis report generation model;

Inputting the medical image to be tested into the multi-modal medical diagnosis report generation model, and generating a diagnosis report corresponding to the medical image to be tested.
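By way of non-limiting illustration, the keyword-based splitting step of claim 1 might be sketched as follows. The keyword list, the sentence delimiters, and the function name `split_report` are assumptions made for this example only; the claims do not specify them. The sketch also assumes that "negative" text describes normal findings and "positive" text describes abnormal findings.

```python
import re

# Hypothetical preset keywords; the actual keyword list is not given in the claims.
NEGATIVE_PATTERN = re.compile(r"\b(?:no|normal|unremarkable|without)\b", re.IGNORECASE)

def split_report(report, negative_pattern=NEGATIVE_PATTERN):
    """Split a text report into negative and positive description sentences.

    A sentence containing any preset keyword is treated as a negative
    description; every other sentence is treated as a positive description.
    """
    # Split on common English and Chinese sentence delimiters (an assumption).
    sentences = [s.strip() for s in re.split(r"[.;。；]", report) if s.strip()]
    negative, positive = [], []
    for sentence in sentences:
        if negative_pattern.search(sentence):
            negative.append(sentence)
        else:
            positive.append(sentence)
    return negative, positive
```

Word-boundary matching is used so that a keyword such as "normal" does not match inside "abnormal".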

2. The method according to claim 1, characterized in that the pre-trained diagnostic report generation model includes a text editor, an image editor, and a multi-modal text generator.

3. The method according to claim 2, characterized in that inputting the medical image, the negative description text, and the positive description text into the pre-trained diagnostic report generation model, outputting the loss function, and performing model training on the pre-trained diagnostic report generation model includes:

Input the medical image into the image editor for image feature extraction processing, and output the image features of the medical image;

Input the negative description text and the positive description text into the text editor, and output the text features of the text report, where the text features of the text report include negative description text features and positive description text features;

The image features and the text features are input into the multi-modal text generator, and the negative loss function and the positive loss function are output.

4. The method according to claim 3, characterized in that inputting the image features and the text features into the multi-modal text generator and outputting the negative loss function and the positive loss function includes:

Perform feature alignment processing on the image features and text features in the same feature space, and determine the image features and text features that have undergone the alignment processing;

Generate a diagnosis report corresponding to the medical image according to the image features and the text features that have undergone the alignment process;

The negative loss function and the positive loss function are output according to the diagnosis report corresponding to the medical image and the text report.
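As a hedged illustration of the feature alignment step of claim 4, the sketch below projects image and text features into a shared space and scores their agreement by cosine similarity. The claims do not state how alignment is performed, so the linear projections, the L2 normalization, and all function names here are assumptions rather than the claimed technique.

```python
import math

def project(vector, weights):
    """Apply a linear map (given as a list of rows) to a feature vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def alignment_score(image_feat, text_feat, image_proj, text_proj):
    """Cosine similarity of image and text features in a shared space."""
    img = l2_normalize(project(image_feat, image_proj))
    txt = l2_normalize(project(text_feat, text_proj))
    return sum(a * b for a, b in zip(img, txt))
```

A score near 1 indicates that the projected image and text features point in the same direction in the shared space, and a score near 0 indicates that they are unrelated.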

5. The method according to claim 1, characterized in that optimizing the pre-trained diagnostic report generation model according to the loss function and determining a multi-modal medical diagnosis report generation model includes:

Set the proportion of the negative description text and the positive description text in the text report according to the negative loss function and the positive loss function, optimize the parameters of the pre-trained diagnostic report generation model, and determine the multi-modal medical diagnosis report generation model.
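The claims do not give the rule that links the two losses to the proportion of negative and positive text. As one simple stand-in, the sketch below computes a separate cross-entropy loss for each text type and combines them with an adjustable weight. The weight `neg_weight`, the function names, and the use of mean token-level cross-entropy are all assumptions for illustration.

```python
import math

def cross_entropy(token_probs):
    """Mean negative log-likelihood of the reference tokens.

    token_probs holds the probability the model assigned to each
    reference token; an empty list yields a loss of 0.0.
    """
    if not token_probs:
        return 0.0
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def combined_loss(neg_probs, pos_probs, neg_weight=0.3):
    """Return (negative loss, positive loss, weighted total loss).

    neg_weight is a hypothetical hyperparameter: lowering it reduces the
    influence of the more frequent negative descriptions on training.
    """
    neg_loss = cross_entropy(neg_probs)
    pos_loss = cross_entropy(pos_probs)
    total = neg_weight * neg_loss + (1.0 - neg_weight) * pos_loss
    return neg_loss, pos_loss, total
```

Down-weighting the negative term is one way to counter the sample imbalance mentioned in the abstract, since negative descriptions typically outnumber positive ones in reports.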

6. The method according to claim 5, characterized in that the pre-trained diagnostic report generation model uses a text report with a full amount of negative description text for pre-training processing.

7. The method according to claim 2, characterized in that inputting the medical image to be tested into the multi-modal medical diagnosis report generation model and generating a diagnosis report corresponding to the medical image to be tested includes:

Input the medical image to be tested into the image editor, and output the image features of the medical image to be tested;

The image features of the medical image to be tested are input into the multi-modal text generator to generate a diagnosis report corresponding to the medical image to be tested.

8. A diagnostic report generation system for multi-modal medical data, characterized by including:

An acquisition module, used to acquire a multi-modal training data set, where the multi-modal training data set includes medical images and text reports;

A text processing module, configured to split the text report according to preset keywords and determine negative description text and positive description text;

A model training module, configured to input the medical image, the negative description text, and the positive description text into a pre-trained diagnostic report generation model, output a loss function, and perform model training on the pre-trained diagnostic report generation model, wherein the loss function includes a negative loss function and a positive loss function;

A model optimization module, configured to optimize the pre-trained diagnostic report generation model according to the loss function and determine a multi-modal medical diagnosis report generation model;

A diagnostic report generation module, configured to input the medical image to be tested into the multi-modal medical diagnosis report generation model and generate a diagnostic report corresponding to the medical image to be tested.

9. A non-transitory computer-readable storage medium with a computer program stored thereon, characterized in that when the program is executed by a processor, the steps of the method according to any one of claims 1-7 are implemented.

10. An electronic device, characterized in that it includes:

A memory on which a computer program is stored;

A processor, configured to execute the computer program in the memory to implement the steps of the method according to any one of claims 1-7.