CN115563297A - Method for constructing and completing a food safety knowledge graph based on a graph neural network

Info

Publication number: CN115563297A
Application number: CN202211134812.6A
Authority: CN
Inventors: 向金海; 翁永琳; 倪福川; 李国亮
Original assignee: Huazhong Agricultural University
Current assignee: Huazhong Agricultural University
Priority date: 2022-09-19
Filing date: 2022-09-19
Publication date: 2023-01-03

Abstract

The invention provides a method for constructing and completing a food safety knowledge graph based on a graph neural network. The method comprises: acquiring national food safety standard documents relating to food categories, food additives, and pesticide residues in food; processing them, through data cleaning, formatting, and similar operations, into triples that can be loaded into a knowledge graph; building the ontology-layer schema of the food safety knowledge graph; designing a food safety knowledge graph query system that supports entity query and visual display; generating word vectors for the entity names in the knowledge graph with the pre-trained language model BERT; and using a graph neural network architecture to fuse the text information and the graph structure information in the knowledge graph, then applying the fused features to two downstream tasks, entity classification and link prediction, so as to complete the knowledge graph. The invention improves the completeness and practicality of the food safety knowledge graph and enables intelligent application of food safety standard information.

Description

Method for constructing and completing a food safety knowledge graph based on a graph neural network

Technical field

The invention belongs to the technical field of knowledge graphs and graph neural networks, and in particular relates to a method for constructing and completing a food safety knowledge graph based on a graph neural network.

Background

As living standards continue to rise, people pay increasing attention to food safety. Regulators and the general public alike need a set of practical and effective standards for guidance. At present, however, the national food safety standards are still stored as documents; the documents are numerous and varied, their formats are not uniform, and they cite one another. Although China has continued to promote the standardization and information-based management of food quality, the structured integration of these standards and the systematic analysis of their relationships are still lacking.

A knowledge graph describes the concepts and relationships of the objective world as the nodes and edges of a graph structure, which makes it easier to query the content of food standards and the relationships between them. Most existing food safety knowledge graphs, in China and abroad, focus on individual sub-fields, and there is no authoritative knowledge base of food safety standards to serve as a guide.

Building a knowledge graph solely from manually extracted standard content is inefficient for a field as large as food, and the resulting information is often incomplete. Graph neural networks are well suited to graph-structured data and can be applied to the entity classification and link prediction tasks of knowledge graph completion. However, because of the structure of graph neural networks, over-smoothing occurs when deep graph information is gathered, which degrades model performance. Moreover, if only the graph structure is considered, the information contained in the knowledge graph itself is not fully used. Both issues need to be addressed.

Summary of the invention

The technical problem to be solved by the present invention is to provide a method for constructing and completing a food safety knowledge graph based on a graph neural network, which improves the completeness of the food safety knowledge graph.

The technical solution adopted by the present invention to solve the above technical problem is a method for constructing and completing a food safety knowledge graph based on a graph neural network, comprising the following steps:

(This part is consistent with the claims and is to be filled in once the claims are confirmed.)

The beneficial effects of the present invention are:

1. The method constructs a knowledge graph by extracting food safety standard information and generates word vectors for the entity names in the graph with the pre-trained language model BERT. It uses a graph neural network architecture to fuse the text information and the graph structure information in the food safety knowledge graph, and applies the fused features to two downstream tasks, entity classification and link prediction, to complete the graph, thereby improving its completeness. By introducing text representations into the graph neural network, the invention enriches the node information and alleviates the over-smoothing caused by the structure of multi-layer graph neural networks, which helps the model learn more of the information in the knowledge graph. The invention also improves the scoring function of the R-GCN link prediction model: the new scoring function follows the idea of a segmented, block-wise dot product and computes the graph structure feature vectors and the text feature vectors separately, which improves link prediction performance.

2. The invention obtains national food safety standard documents relating to food categories, food additives, and pesticide residues in food. Through data cleaning, formatting, and similar operations, it converts the national standard information (such as food categories, food additives, and maximum limits of pesticide residues in food) into triples that can be loaded into a knowledge graph, and constructs the ontology-layer schema of the food safety knowledge graph, filling a gap in knowledge graph data for the food safety field.

3. The invention builds a web-based visualization system on the Python Flask framework and Echarts.js. It stores the content of the national food safety standards as a graph, applies the data to downstream tasks, and provides a food safety knowledge graph query system with entity query and visual display.

4. The invention remedies the shortcomings of the current technology, improves the completeness and practicality of the food safety knowledge graph, and enables intelligent application of food safety standard information.

Description of drawings

Fig. 1 is a flowchart of the preliminary construction of the food safety knowledge graph according to an embodiment of the present invention.

Fig. 2 is an ontology structure diagram of the food safety knowledge graph according to an embodiment of the present invention.

Fig. 3 is a flowchart of data extraction and processing according to an embodiment of the present invention.

Fig. 4 is a schematic structural diagram of the entity classification model according to an embodiment of the present invention.

Fig. 5 is a schematic structural diagram of the link prediction model according to an embodiment of the present invention.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Referring to Fig. 1, the embodiment of the present invention includes a module for the preliminary construction of the food safety knowledge graph, a knowledge graph entity classification model based on a graph neural network and a pre-trained text model, and an encoder-decoder-based knowledge graph link prediction model.

The method of the present invention for constructing and completing a food safety knowledge graph based on a graph neural network comprises the following steps:

S1: Collect data. Extract national food safety standard information, including the semi-structured and unstructured information in the standard documents, by means including web crawling and table parsing. The data extraction workflow is shown in Fig. 3.

S11: Use a web crawler to obtain national food safety standard information from websites and download the national food safety standard documents;

S12: Run OCR on the downloaded national food safety standard documents and parse the tables they contain;

S13: Obtain semi-structured and unstructured data through rule matching and through web page and table parsing.

S2: Build the ontology. Based on the semi-structured information obtained and on manually assisted annotation, construct the ontology framework of the food safety knowledge graph, and use this framework to further refine the data extraction procedure. The resulting ontology framework is shown in Fig. 2; it includes the entity types and relation types related to food categories, food standards, food additives, and pesticide residues.

S3: Process the data. According to the constructed ontology framework and the storage requirements of the knowledge graph, format and clean the semi-structured and unstructured data to obtain triple data in csv format. The data processing workflow is shown in Fig. 3.
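As an illustration of step S3, the following minimal Python sketch shows how parsed table rows might be cleaned, de-duplicated, and written out as csv triples. The column names (`food_category`, `additive`, `max_limit`), the relation labels (`uses_additive`, `has_max_limit`), and the sample rows are hypothetical placeholders; the patent does not specify them.

```python
import csv
import io
import re


def clean(text):
    """Collapse runs of whitespace and trim the ends of a raw table cell."""
    return re.sub(r"\s+", " ", text).strip()


def rows_to_triples(rows):
    """Turn parsed table rows into de-duplicated (head, relation, tail) triples."""
    triples, seen = [], set()
    for row in rows:
        food = clean(row["food_category"])
        additive = clean(row["additive"])
        limit = clean(row["max_limit"])
        if not food or not additive:
            continue  # skip rows that cannot form a complete triple
        candidates = [(food, "uses_additive", additive)]
        if limit:
            candidates.append((additive, "has_max_limit", limit))
        for triple in candidates:
            if triple not in seen:
                seen.add(triple)
                triples.append(triple)
    return triples


def write_triples_csv(triples, handle):
    """Write triples with a header row, ready for a graph-database import."""
    writer = csv.writer(handle)
    writer.writerow(["head", "relation", "tail"])
    writer.writerows(triples)


# Placeholder rows, standing in for the output of table parsing.
sample_rows = [
    {"food_category": " Food  A ", "additive": "Additive X", "max_limit": "0.5 g/kg"},
    {"food_category": "Food A", "additive": "Additive X", "max_limit": "0.5 g/kg"},
    {"food_category": "", "additive": "Additive Y", "max_limit": "1.0 g/kg"},
]
sample_triples = rows_to_triples(sample_rows)
buffer = io.StringIO()
write_triples_csv(sample_triples, buffer)
```

The second row is a duplicate once whitespace is normalized, and the third row is dropped because it has no food category, so only two triples are written.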

S4: Store the knowledge graph and build a visual query system.

S41: Store the csv-format knowledge graph triple data in the Neo4j graph database;

S42: Based on the Python Flask framework, query the Neo4j graph database and display the knowledge graph as an Echarts.js force-directed graph;

S43: Execute Neo4j Cypher statements from Python to query knowledge graph entities, and display the results on the web page.
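Steps S42 and S43 can be sketched as follows, under stated assumptions. The Cypher text, the `name` node property, and the helper `to_echarts_graph` are illustrative and not taken from the patent. The sketch deliberately leaves out the database connection and the Flask route: in a real deployment the rows would come from running the query through a Neo4j driver, and a Flask view would return the resulting dictionary as JSON.

```python
# Parameterized Cypher: the entity name is passed as $name, not concatenated in.
ENTITY_QUERY = (
    "MATCH (n {name: $name})-[r]-(m) "
    "RETURN n.name AS source, type(r) AS relation, m.name AS target"
)


def to_echarts_graph(records):
    """Convert query rows into the data/links lists used by an ECharts graph series."""
    names, links = [], []
    for rec in records:
        for name in (rec["source"], rec["target"]):
            if name not in names:
                names.append(name)  # keep first-seen order, no duplicates
        links.append(
            {"source": rec["source"], "target": rec["target"], "value": rec["relation"]}
        )
    return {"data": [{"name": n} for n in names], "links": links}


# Placeholder rows shaped like the RETURN clause above.
sample_records = [
    {"source": "Food A", "relation": "uses_additive", "target": "Additive X"},
    {"source": "Food A", "relation": "follows_standard", "target": "Standard S"},
]
graph = to_echarts_graph(sample_records)
```

On the page, `graph["data"]` and `graph["links"]` would be assigned to a series of type `graph` with `layout: 'force'` to obtain the force-directed display.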

S5: Use the graph neural network and the pre-trained text model BERT to perform knowledge graph entity classification. Fig. 4 shows the relational graph convolutional network combined with text representations used in the present invention; it comprises a BERT text representation part and an R-GCN graph structure information part.

S51: Build a dataset from the triples of the food safety knowledge graph and divide it into a training set, a validation set, and a test set; map the food-safety-related entity types to type numbers and the entities themselves to entity numbers; construct a DGLDataset from the entity type table and the triple table;

S52: Use the pre-trained BERT model as the text representation part of the entity classification model (see Fig. 4) to extract word-vector representations of the knowledge graph entity names, and reduce the dimensionality of the resulting word vectors;

The entity classification model combines the structural information of the graph with the textual descriptions of the entities. It extends the knowledge graph representation from a purely structural representation to a joint representation of structure and text, yielding a model that fuses multiple sources of knowledge graph information. The pre-trained text model BERT produces word-vector representations of the entity descriptions; these are added to the entity representations and fused with the existing entity-relation structure information, and the knowledge graph representation is then learned jointly through the relational graph convolutional network.

The BERT service bert-as-service is used; it wraps the pre-trained BERT model in a client/server architecture and offers it to clients as a service, which makes it easier to support engineering projects. Because most entity names in the food safety knowledge graph are Chinese words, the Chinese pre-trained BERT model chinese_L-12_H-768_A-12 is selected, giving a 768-dimensional word vector for every entity name. Using such high-dimensional vectors directly would greatly increase the computational cost of the model, so the 768-dimensional word vectors are reduced with Principal Component Analysis (PCA). Subsequent experiments showed that reducing the dimensionality has little effect on the results. Weighing computational complexity against the completeness of the text representation, the entity classification model uses 5-dimensional text representations throughout, with floating-point values rounded to 4 decimal places for convenient later use.
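The dimensionality reduction described above can be sketched with NumPy as follows. This is a minimal illustration, not the patent's implementation: random vectors stand in for the 768-dimensional BERT output (no BERT service is called), and the function name `pca_reduce` is hypothetical. PCA is computed here through a singular value decomposition of the centered data.

```python
import numpy as np


def pca_reduce(vectors, n_components=5, decimals=4):
    """Project row vectors onto their top principal components and round the result."""
    x = np.asarray(vectors, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T
    return np.round(reduced, decimals)


rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(20, 768))  # stand-in for BERT word vectors
reduced = pca_reduce(fake_embeddings)  # 20 entities, 5 dimensions each
```

Each entity name ends up with a 5-dimensional vector rounded to 4 decimal places, matching the dimensions stated in the text.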

S53: Use R-GCN to aggregate information, update the node representations, and obtain the graph structure information;

S54: Embed the text word vectors of the entity names before the output layer of the R-GCN;

S55: After the output layer, classify the entities with softmax, predicting a category for each entity in the sample, and optimize by minimizing the cross-entropy loss over all labeled nodes.

Experiments confirm that the R-GCN model combined with text information achieves higher entity classification accuracy than the same model without text representations.
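Steps S53 to S55 can be illustrated with a small NumPy sketch. It is a simplified reading of the text, under these assumptions: one hidden R-GCN layer, mean normalization of incoming messages per relation, fusion of the text vectors by concatenation before the output layer, randomly initialized weights, and a toy graph. None of the names (`rgcn_propagate`, `classify`, `params`) come from the patent, and no training loop or basis decomposition is shown.

```python
import numpy as np


def rgcn_propagate(h, edges, w_rel, w_self):
    """R-GCN propagation: self-loop term plus normalized, relation-specific messages."""
    out = h @ w_self
    for rel, w in enumerate(w_rel):
        msgs = np.zeros_like(out)
        counts = np.zeros(h.shape[0])
        for src, r, dst in edges:
            if r == rel:
                msgs[dst] += h[src] @ w
                counts[dst] += 1
        out += msgs / np.maximum(counts, 1)[:, None]  # divide by neighbor count
    return out


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def classify(node_feats, text_feats, edges, params):
    """Hidden R-GCN layer with ReLU, text fusion, then an R-GCN output layer with softmax."""
    w_rel, w_self = params["hidden"]
    hidden = np.maximum(rgcn_propagate(node_feats, edges, w_rel, w_self), 0.0)
    fused = np.concatenate([hidden, text_feats], axis=1)  # text enters before the output layer
    w_rel, w_self = params["output"]
    return softmax(rgcn_propagate(fused, edges, w_rel, w_self))


def cross_entropy(probs, labels, labeled_idx):
    """Summed cross-entropy over the labeled nodes only."""
    picked = probs[labeled_idx, labels[labeled_idx]]
    return float(-np.log(picked).sum())


rng = np.random.default_rng(0)
num_nodes, num_rels, in_dim, hid_dim, text_dim, num_classes = 4, 2, 4, 3, 5, 2
edges = [(0, 0, 1), (1, 0, 2), (2, 1, 3), (3, 1, 0)]  # (source, relation, target)
node_feats = np.eye(num_nodes, in_dim)  # one-hot node features
text_feats = rng.normal(size=(num_nodes, text_dim))  # stand-in for reduced BERT vectors
params = {
    "hidden": (rng.normal(size=(num_rels, in_dim, hid_dim)),
               rng.normal(size=(in_dim, hid_dim))),
    "output": (rng.normal(size=(num_rels, hid_dim + text_dim, num_classes)),
               rng.normal(size=(hid_dim + text_dim, num_classes))),
}
labels = np.array([0, 1, 0, 1])
labeled_idx = np.array([0, 1])  # only these nodes contribute to the loss
probs = classify(node_feats, text_feats, edges, params)
loss = cross_entropy(probs, labels, labeled_idx)
```

In practice the weights would be trained by gradient descent on this loss, typically with a graph learning library rather than hand-written loops.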

S6: Perform knowledge graph link prediction with an encoder-decoder graph neural network model. Fig. 5 shows the structure of the encoder-decoder-based link prediction model, which comprises a BERT text representation part and an R-GCN graph structure information part. The R-GCN model serves as the encoder and produces the representations of the triples; the scoring function serves as the decoder.

Constructing the knowledge graph from triples and classifying entities with a graph neural network and text representations improves the completeness of the food safety knowledge graph with respect to entity types, but some relations are still missing from the graph. Link prediction addresses this by predicting the missing triples and using a scoring function to judge whether a candidate triple is valid. For a triple of the form (head entity, relation, tail entity), one entity and the relation are used to predict the other entity, which solves the head and tail entity prediction problems.

A link prediction model obtained in this way can also be applied to knowledge graph reasoning: given an entity and its relational structure, it predicts other entities that may have a specific association with it. For food safety, this makes it possible to infer, from a given food category, other foods related to it or other substances that may exceed their limits. Potential safety hazards along the whole food chain (production, transportation, storage, and so on) can thus be identified in time and prevented.

The link prediction task involves two main pieces of work. First, for the model to learn the information in the knowledge graph, a representation of the knowledge graph must be obtained; compared with the entity classification model, more attention must be paid to the types and representations of the edges in the graph. Second, a triple scoring function suited to the problem must be chosen, since this function directly affects the training and the results of the model.

S61: Split the food safety knowledge graph dataset. To prevent type imbalance from affecting prediction, keep the proportions of relation types in the training, validation, and test sets as consistent as possible when sampling; map the food-safety-related entity types and relation types to type numbers;
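One way to keep the relation-type proportions consistent across splits, as S61 requires, is to split each relation type separately. The sketch below assumes an 80/10/10 split and a fixed random seed; both are illustrative choices, as are the function names and the toy triples.

```python
import random
from collections import defaultdict


def build_index(values):
    """Map each distinct name to an integer number, in sorted order."""
    return {name: i for i, name in enumerate(sorted(set(values)))}


def stratified_split(triples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split triples so every relation type keeps roughly the same share in each part."""
    rng = random.Random(seed)
    by_relation = defaultdict(list)
    for triple in triples:
        by_relation[triple[1]].append(triple)
    train, valid, test = [], [], []
    for relation in sorted(by_relation):
        group = by_relation[relation][:]
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_valid = int(len(group) * ratios[1])
        train += group[:n_train]
        valid += group[n_train:n_train + n_valid]
        test += group[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test


# Toy graph: two relation types with ten triples each.
toy_triples = [(f"food_{i}", "uses_additive", f"additive_{i}") for i in range(10)]
toy_triples += [(f"food_{i}", "follows_standard", f"standard_{i}") for i in range(10)]
relation_ids = build_index(t[1] for t in toy_triples)
train, valid, test = stratified_split(toy_triples)
```

Relation types with very few triples may end up with empty validation or test shares under this simple rounding, so a real pipeline would need a rule for such cases.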

S62: Use the pre-trained BERT model to extract text features from the description of each node in the knowledge graph and obtain word-vector representations;

S63: Encoder of the link prediction model. The node representations are updated by the R-GCN hidden layers. The node descriptions, as text information, are passed through the pre-trained BERT model to obtain word-vector representations, and these text vectors are embedded before the last R-GCN hidden layer. The graph structure information of the knowledge graph itself is given an initial embedding and fed into the R-GCN hidden layers to obtain the knowledge graph representation. The two kinds of features are fused and learned jointly before entering the last hidden layer, and then pass through the last R-GCN hidden layer;

S64: Decoder of the link prediction model. The scoring function evaluates the triples, and the rank of each positive sample among the scores of all samples is obtained. The scoring function is defined as $f(s, r, o)$, where $s$, $r$, and $o$ denote the head entity, the relation, and the tail entity, respectively.

S641: The scoring function follows the idea of a segmented, block-wise dot product. Because the node representations contain both graph structure information and text information, it is computed in two parts, $f_{graph}(s, r, o)$ and $f_{word}(s, r, o)$, that is, $f(s, r, o) = f_{graph}(s, r, o) + f_{word}(s, r, o)$. The graph structure features and the text features are each divided into 2 segments: $f_{graph}(s, r, o) = \sum_{0 \le x, y < 2} s_{x,y} \cdot \langle r_x, s_y, o_z \rangle$ and $f_{word}(s, r, o) = \sum_{2 \le x, y < 4} \langle r_x, s_y, o_z \rangle$, where $x$, $y$, and $z$ are the segment numbers of the relation representation, the head entity representation, and the tail entity representation, respectively.

S642: The graph structure part must take symmetry into account: odd segment numbers are used to represent asymmetric relations and even segment numbers to represent symmetric relations, and a parameter $s_{x,y}$ is introduced. No relation constraint is added in the word-vector part. When $x$ is odd and $x + y \ge 2$, $s_{x,y}$ is $-1$; otherwise it is $1$.

S643: To simplify the calculation, the tail-entity index $w_{x,y}$ is introduced, which reduces the complexity.
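The scoring function of S641 to S643 can be sketched in plain Python. The text does not give the formula for the tail index $w_{x,y}$, so the sketch assumes the convention used in segmented-embedding models: $w_{x,y} = y$ when $x$ is even and $(x + y) \bmod 2$ when $x$ is odd, applied within each 2-segment block. That rule, the layout of the vectors (segments 0 and 1 for graph features, 2 and 3 for text features), and all function names are assumptions made for illustration.

```python
def trilinear(a, b, c):
    """Three-way dot product <a, b, c> = sum_i a_i * b_i * c_i."""
    return sum(x * y * z for x, y, z in zip(a, b, c))


def split_segments(vec, k):
    """Cut a vector into k equally sized consecutive segments."""
    size = len(vec) // k
    return [vec[i * size:(i + 1) * size] for i in range(k)]


def tail_index(x, y, k=2):
    """Assumed rule for w_{x,y}: y for even x, (x + y) mod k for odd x."""
    return y if x % 2 == 0 else (x + y) % k


def sign(x, y, k=2):
    """s_{x,y}: -1 when x is odd and x + y >= k, otherwise 1."""
    return -1 if x % 2 == 1 and x + y >= k else 1


def score(head, rel, tail):
    """f = f_graph + f_word over 4 segments: 0-1 graph features, 2-3 text features."""
    h, r, t = split_segments(head, 4), split_segments(rel, 4), split_segments(tail, 4)
    f_graph = sum(
        sign(x, y) * trilinear(r[x], h[y], t[tail_index(x, y)])
        for x in range(2) for y in range(2)
    )
    # Text block: same indexing shifted by 2, with no sign term.
    f_word = sum(
        trilinear(r[x], h[y], t[2 + tail_index(x - 2, y - 2)])
        for x in range(2, 4) for y in range(2, 4)
    )
    return f_graph + f_word


def rank_of_positive(positive_score, candidate_scores):
    """Rank 1 means no candidate scores strictly higher than the positive triple."""
    return 1 + sum(1 for s in candidate_scores if s > positive_score)


example_score = score([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 1, 1])
example_rank = rank_of_positive(example_score, [20, 3, 16, -1])
```

With these inputs the graph part contributes 1 + 2 + 1 - 2 = 2 and the text part 3 + 4 + 3 + 4 = 14, so `example_score` is 16, and `example_rank` is 2 because only one candidate scores higher.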

S65: The dataset constructed earlier contains no negative examples, so negative sampling is performed by randomly corrupting positive examples: a given proportion of negative samples is generated by replacing the head entity or the tail entity with another random entity from the dataset, and the model is optimized with a cross-entropy loss.
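The negative sampling of S65 can be sketched as follows. The 50/50 choice between corrupting the head and the tail, the number of negatives per positive, and the function names are assumptions; the sketch does not check whether a corrupted triple happens to be another true triple of the graph. The loss is the binary cross-entropy on raw scores, written in a numerically stable form.

```python
import math
import random


def negative_samples(positives, entities, per_positive=1, seed=0):
    """Corrupt each positive triple by swapping its head or tail for another entity."""
    rng = random.Random(seed)
    negatives = []
    for head, rel, tail in positives:
        for _ in range(per_positive):
            if rng.random() < 0.5:
                new_head = rng.choice([e for e in entities if e != head])
                negatives.append((new_head, rel, tail))
            else:
                new_tail = rng.choice([e for e in entities if e != tail])
                negatives.append((head, rel, new_tail))
    return negatives


def binary_cross_entropy(scores, labels):
    """Mean cross-entropy of sigmoid(score) against 0/1 labels, computed stably."""
    total = 0.0
    for s, y in zip(scores, labels):
        total += max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
    return total / len(scores)


entities = ["food_a", "food_b", "additive_x", "additive_y"]
positives = [("food_a", "uses_additive", "additive_x"),
             ("food_b", "uses_additive", "additive_y")]
negatives = negative_samples(positives, entities, per_positive=2)
```

During training, the positive triples would be labeled 1 and the corrupted ones 0, and their scores from the decoder would be passed to `binary_cross_entropy`.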

The above embodiments are intended only to illustrate the design concept and features of the present invention, so that those skilled in the art can understand and implement it; the scope of protection of the present invention is not limited to these embodiments. All equivalent changes or modifications made according to the principles and design ideas disclosed herein therefore fall within the scope of protection of the present invention.

Claims

1. A method for constructing and completing a food safety knowledge graph based on a graph neural network, characterized by comprising the following steps:

S1: Extract national food safety standard information by means including web crawling and table parsing, the national food safety standard information including the semi-structured data and unstructured data in the standard documents;

S2: Based on the semi-structured data and manually assisted annotation, construct the ontology framework of the food safety knowledge graph, and use the ontology framework to further refine the data extraction procedure;

S3: According to the ontology framework and the storage requirements of the knowledge graph, format and clean the semi-structured and unstructured data to obtain food safety knowledge graph triple data in csv format;

S4: Store the food safety knowledge graph triple data and build a visual query system;

S5: Use a relational graph convolutional network combined with text representations, comprising a BERT text representation part and an R-GCN graph structure information part; use the graph neural network and the pre-trained text model BERT to build an entity classification model that performs the knowledge graph entity classification task; the entity classification model combines the structural information of the graph with the textual descriptions of the entities, extends the knowledge graph representation from a purely structural representation to a joint representation of structure and text, and thereby forms a model that fuses multiple sources of knowledge graph information; the word-vector representations of the entity descriptions are obtained through the pre-trained text model BERT, added to the entity representations, and fused with the existing entity-relation structure information, after which the knowledge graph representation is learned jointly through the relational graph convolutional network model;

S6: Use the R-GCN model as the encoder to obtain the representations of the triples, and use the scoring function as the decoder; construct an encoder-decoder-based link prediction model comprising a BERT text representation part and an R-GCN graph structure information part; the link prediction model predicts, from an entity and its relational structure, other entities that may have a certain relationship with it, by predicting the missing triples in the knowledge graph and judging with the scoring function whether a triple is valid, thereby completing the knowledge graph link prediction task.

2. The method for constructing and completing a food safety knowledge graph based on a graph neural network according to claim 1, characterized in that step S1 specifically comprises:

S11: Obtain national food safety standard information from the Internet through a crawler program, and download the national food safety standard documents;

S12: Run OCR on the downloaded national food safety standard documents and parse the tables they contain;

S13: Obtain semi-structured and unstructured data through rule matching and through web page and table parsing.

3. The method for constructing and completing a food safety knowledge graph based on a graph neural network according to claim 1, characterized in that, in step S2, the ontology framework includes the entity types and relation types related to food categories, food standards, food additives, and pesticide residues.

4. The method for constructing and completing a food safety knowledge graph based on a graph neural network according to claim 3, characterized in that step S4 specifically comprises:

S41: Store the csv-format knowledge graph triple data in the Neo4j graph database;

S42: Based on the Python Flask framework, query the Neo4j graph database and display the knowledge graph as an Echarts.js force-directed graph;

S43: Execute Neo4j Cypher statements from Python to query knowledge graph entities, and display the results on the web page.

5. The method for constructing and completing a food safety knowledge graph based on a graph neural network according to claim 4, characterized in that step S5 specifically comprises:

S51: Build a dataset from the triple data of the food safety knowledge graph and divide it into a training set, a validation set, and a test set; map the food-safety-related entity types to type numbers and the entities themselves to entity numbers; construct a DGLDataset from the entity type table and the triple table;

S52: Use the pre-trained BERT model as the text representation part of the entity classification model to extract word-vector representations of the knowledge graph entity names, and reduce the dimensionality of the resulting word vectors;

S53: Use R-GCN to aggregate information, update the node representations, and obtain the graph structure information;

S54: Embed the text word vectors of the entity names before the output layer of the R-GCN;

S55: After the output layer, classify the entities with softmax, predicting a category for each entity in the sample, and optimize by minimizing the cross-entropy loss over all labeled nodes.

6. The method for constructing and completing a food safety knowledge graph based on a graph neural network according to claim 5, characterized in that, in the step S6, the entity classification model includes an input layer, several hidden layers, and an output layer;

Each layer of the entity classification model comprises a relational graph convolutional network (R-GCN) layer, which computes, through a message function, the outgoing message of each node in the training set from the node representations and edge types, and aggregates the messages to obtain new node representations;

The input layer of the model uses the types and numbers of the entities as features; the input layer and the hidden layers use ReLU as the activation function; the output of the last layer is classified with softmax;

In the hidden layers, the node representations obtained from the input layer are processed by multiple relational graph convolutional network (R-GCN) layers, each followed by the ReLU activation function; let $Y$ be the index set of the labeled nodes, $K$ the total number of categories, $h_{ik}^{(l)}$ the predicted value of the $k$-th category for the $i$-th labeled node at layer $l$, and $t_{ik}$ the true label; the output layer then minimizes the cross-entropy loss over all labeled nodes:

$Loss = -\sum_{i \in Y} \sum_{k=1}^{K} t_{ik} \ln h_{ik}^{(l)}$

In the entity classification task, the computational cost of the model is reduced by basis decomposition; the word vectors obtained from the pre-trained BERT model are embedded into the output layer of the R-GCN model to obtain the best effect.

7. The method for constructing and completing a food safety knowledge graph based on a graph neural network according to claim 5, characterized in that step S6 specifically comprises:

S61: Split the food safety knowledge graph dataset, keeping the proportions of relation types in the training, validation, and test sets consistent when sampling so that type imbalance does not affect prediction; map the food-safety-related entity types and relation types to type numbers;

S62: Use the pre-trained BERT model to extract text features from the description of each node in the knowledge graph and obtain word-vector representations;

S63: The encoder of the link prediction model updates the node representations through the R-GCN hidden layers; the node descriptions, as text information, are passed through the pre-trained BERT model to obtain word-vector representations, and these text vectors are embedded before the last R-GCN hidden layer; the graph structure information of the knowledge graph itself is given an initial embedding and fed into the R-GCN hidden layers to obtain the knowledge graph representation; the two kinds of features are fused and learned jointly before entering the last hidden layer, and then pass through the last R-GCN hidden layer;

S64: Let $s$, $r$, and $o$ denote the head entity, the relation, and the tail entity, respectively, and define the scoring function as $f(s, r, o)$; the decoder of the link prediction model evaluates the triples with the scoring function and obtains the rank of each positive sample among the scores of all samples;

S65: The dataset constructed earlier contains no negative examples; negative sampling is performed by randomly corrupting positive examples, a given proportion of negative samples being generated by replacing the head entity or the tail entity with another random entity from the dataset, and the model is optimized with a cross-entropy loss.

8. The method for constructing and completing a food safety knowledge graph based on a graph neural network according to claim 7, characterized in that step S64 specifically comprises:

S641: Let the graph structure part of the scoring function be $f_{graph}(s, r, o)$ and the text part be $f_{word}(s, r, o)$; then:

$f(s, r, o) = f_{graph}(s, r, o) + f_{word}(s, r, o)$;

Let $x$, $y$, and $z$ be the segment numbers of the relation representation, the head entity representation, and the tail entity representation, respectively, and divide the graph structure features and the text features into two segments each, as follows:

$f_{graph}(s, r, o) = \sum_{0 \le x, y < 2} s_{x,y} \cdot \langle r_x, s_y, o_z \rangle$,

$f_{word}(s, r, o) = \sum_{2 \le x, y < 4} \langle r_x, s_y, o_z \rangle$;

S642: Take symmetry into account in the graph structure part, using odd segment numbers to represent asymmetric relations and even segment numbers to represent symmetric relations, and introduce the parameter $s_{x,y}$; add no relation constraint in the word-vector part; when $x$ is odd and $x + y \ge 2$, $s_{x,y}$ is $-1$, and otherwise it is $1$;

S643: Introduce the tail-entity index $w_{x,y}$ to simplify the calculation and reduce the complexity.

9. A computer storage medium, characterized in that it stores a computer program executable by a computer processor, the computer program executing the method for constructing and completing a food safety knowledge graph based on a graph neural network according to any one of claims 1 to 8.