CN109033129A - Multi-source information fusion knowledge graph representation learning method based on adaptive weight

Info

Publication number: CN109033129A
Application number: CN201810563786.6A
Authority: CN
Inventors: 常亮; 张舜尧; 匡海丽; 王文凯
Original assignee: Guilin University of Electronic Technology
Current assignee: Guilin University of Electronic Technology
Priority date: 2018-06-04
Filing date: 2018-06-04
Publication date: 2018-12-18
Anticipated expiration: 2038-06-04
Also published as: CN109033129B

Abstract

The invention discloses a multi-source information fusion knowledge graph representation learning method based on adaptive weights. The method first fuses text information with structured information using a translation-based model between entity vectors and relation vectors. It optimizes the score function by adjusting the weight between the two information sources, and it applies type-constrained training to structured information that was classified in advance, without introducing additional parameters. A loss function then ties the entity vectors and relation vectors together; once this loss is minimized and the optimization goal is reached, a vector has been learned for every entity and every relation in the knowledge graph. The invention addresses the problem that existing fusion of text information and structured information in a knowledge base ignores their relative weight, uses the hierarchical information already present in the structured information to represent the connections between entities and relations more accurately, and can be applied to large-scale knowledge graphs.

Description

Multi-source information fusion knowledge graph representation learning method based on adaptive weight

Technical field

The invention relates to the technical field of knowledge graphs and deep learning, and in particular to a multi-source information fusion knowledge graph representation learning method based on adaptive weights.

Background

With the rapid development of society, we are gradually entering an information age. Massive amounts of new data and information are generated every day in many forms. The mobile Internet has become the most effective and convenient platform for obtaining information, and users' demand for reliable information keeps growing. Extracting useful information from massive data has become a major problem in many fields, and knowledge graphs emerged in response.

Knowledge in a knowledge base is usually organized as a network in which each node represents an entity and each edge represents a relation between two entities. A fact is written as a triple (entity 1, relation, entity 2). Figure 1 shows a typical triple in a knowledge graph: the nodes "Shakespeare" and "Romeo and Juliet", drawn as ellipses, are entities, and the edge "author" is a relation. Most knowledge can thus be expressed as triples, each corresponding to one edge of the knowledge base network and the two entities it links; this is the general representation of a knowledge base. In recent years, deep learning has attracted wide attention in speech recognition, image analysis and natural language processing. Representation learning aims to represent the semantic information of an object as a dense, low-dimensional, real-valued vector. In this low-dimensional vector space, the closer two objects are, the higher their semantic similarity. Recent progress in this direction makes it possible to compute semantic connections between entities and relations efficiently in a low-dimensional space, mitigates data sparsity, and significantly improves knowledge acquisition, fusion and reasoning.
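The triple format described above can be sketched as a small data structure. This is only an illustration: the `Triple` type, the `neighbors` helper, the direction of the "author" edge and the second fact are assumptions introduced here, not part of the source.

```python
from collections import namedtuple

# A knowledge-graph fact: (head entity, relation, tail entity).
Triple = namedtuple("Triple", ["head", "relation", "tail"])

# Toy knowledge base. The first fact mirrors Figure 1 (edge direction assumed);
# the second fact is added purely for illustration.
triples = [
    Triple("Shakespeare", "author", "Romeo and Juliet"),
    Triple("Shakespeare", "author", "Hamlet"),
]

def neighbors(kb, entity):
    """Return (relation, tail) pairs for every edge leaving `entity`."""
    return [(t.relation, t.tail) for t in kb if t.head == entity]
```

Calling `neighbors(triples, "Shakespeare")` lists the edges that leave the "Shakespeare" node, which is the network view of the same triples.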

A major challenge for knowledge representation learning is multi-source information fusion. Existing knowledge graph representation learning models such as TransE use only the triple structure of the knowledge graph, leaving a large amount of other knowledge-related information in the knowledge base unused, such as the descriptions and category information of entities and relations.

Summary of the invention

Existing knowledge graph representation learning methods, once fused with text information, cannot fully exploit the relationship between the structured model and the text information. To address this problem, the invention proposes a multi-source information fusion knowledge graph representation learning method based on adaptive weights.

To solve the above problem, the invention is realized through the following technical solution:

A multi-source information fusion knowledge graph representation learning method based on adaptive weights, comprising the following steps:

Step 1. Use an adaptive weight to balance the fusion of text information and structured information, and define a total score function f(h,r,t) that couples the two:

$$f(h,r,t) = (1-\lambda)\left(\lVert h_d + r - t_d\rVert + \lVert h_d + r - M_{rt}\,t_S\rVert + \lVert M_{rh}\,h_S + r - t_d\rVert\right) + \lambda\,\lVert M_{rh}\,h + r + M_{rt}\,t\rVert$$

where $\lambda$ is the weight, $h$ is the head entity, $t$ is the tail entity, $r$ is the relation between head entity $h$ and tail entity $t$, $h_d$ is the text-based representation of the head entity, $t_d$ is the text-based representation of the tail entity, $h_S$ is the structure-based representation of the head entity, $t_S$ is the structure-based representation of the tail entity, $M_{rh}$ is the projection matrix defined for the head entity, and $M_{rt}$ is the projection matrix defined for the tail entity;
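A minimal sketch of the total score function, under stated assumptions: the norm is taken to be the Euclidean norm (the source does not name it), the vectors h and t in the structural term are taken to be the structure-based vectors h_S and t_S, and the "+" in front of the projected tail in the structural term is kept exactly as the formula is printed. The function and parameter names are hypothetical.

```python
import math

def norm(v):
    """Euclidean (L2) norm of a vector given as a list of floats (assumed norm)."""
    return math.sqrt(sum(x * x for x in v))

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def total_score(lam, r, h_d, t_d, h_s, t_s, m_rh, m_rt):
    """f(h,r,t) = (1 - lam) * f_D + lam * f_S, following the formula as printed."""
    ph = matvec(m_rh, h_s)  # projected structure-based head
    pt = matvec(m_rt, t_s)  # projected structure-based tail
    f_d = (norm(sub(add(h_d, r), t_d))    # text head, text tail
           + norm(sub(add(h_d, r), pt))   # text head, projected structural tail
           + norm(sub(add(ph, r), t_d)))  # projected structural head, text tail
    f_s = norm(add(add(ph, r), pt))       # structural term, "+" sign as printed
    return (1 - lam) * f_d + lam * f_s
```

With identity projection matrices the function reduces to plain vector arithmetic, which makes it easy to check a value by hand.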

Step 2. Based on the total score function f(h,r,t) defined in step 1, build a loss function for the adaptive-weight fusion of text information and structured information, and learn the vector representations of entities and relations by minimizing this loss function, thereby reaching the optimization goal.

In step 1 above, the weight λ takes values in λ∈(0,1).

In step 2 above, the loss function is minimized by stochastic gradient descent.

In step 2 above, the constructed loss function L is:

$$L = \sum_{(h,r,t)\in S(h,r,t)}\ \sum_{(h',r,t')\in S'(h,r,t)} \left[f(h,r,t) + \gamma - f(h',r,t')\right]_+$$

where $[f(h,r,t)+\gamma-f(h',r,t')]_+ = \max(0,\ f(h,r,t)+\gamma-f(h',r,t'))$; $\gamma$ is the preset margin; $(h,r,t)$ is a triple of the knowledge graph, that is, a positive triple, in which $h$ is the head entity, $t$ is the tail entity and $r$ is the relation between them; $f(h,r,t)$ is the score function of the positive triple and $S(h,r,t)$ is the set of positive triples; $(h',r,t')$ is a negative triple built by randomly replacing the head entity $h$ and the tail entity $t$; $f(h',r,t')$ is the score function of the negative triple and $S'(h,r,t)$ is the set of negative triples.
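The hinge term defined above can be sketched in a few lines. This is an illustration only: it assumes the scores have already been computed and that each positive score is paired with the score of one negative triple, which is one common way to evaluate such a sum. The names `hinge` and `margin_loss` are hypothetical.

```python
def hinge(x):
    """[x]_+ = max(0, x)."""
    return max(0.0, x)

def margin_loss(pos_scores, neg_scores, gamma):
    """Sum of [f(pos) + gamma - f(neg)]_+ over paired scores.

    pos_scores[i] is the score of a positive triple and neg_scores[i] the
    score of one negative triple built from it (an assumed pairing).
    """
    if len(pos_scores) != len(neg_scores):
        raise ValueError("each positive score needs one negative score")
    return sum(hinge(p + gamma - n) for p, n in zip(pos_scores, neg_scores))
```

A pair contributes nothing once the negative triple scores at least γ higher than the positive one, so only pairs that violate the margin drive the training.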

Compared with the prior art, the invention first fuses text information with structured information using a translation-based model between entity vectors and relation vectors. It optimizes the score function by adjusting the weight between the two, and it applies type-constrained training to structured information that was classified in advance, without introducing additional parameters. It then uses a loss function to tie the entity vectors and relation vectors together and minimizes that loss; once the optimization goal is reached, a vector has been learned for every entity and every relation in the knowledge graph. The invention addresses the problem that the fusion of text information and structured information in a knowledge base ignores their relative weight, uses the hierarchical information already present in the structured information to represent the connections between entities and relations more accurately, and can be applied to large-scale knowledge graphs.

Description of the drawings

Figure 1 is an example of a relation triple in a knowledge graph.

Figure 2 is a flow chart of the knowledge graph representation learning method of the invention.

Figure 3a is an example of the triple representation obtained with an existing knowledge graph representation learning method.

Figure 3b is an example of the triple representation obtained with the knowledge graph representation learning method of the invention.

Detailed description

To make the purpose, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to specific examples and the accompanying drawings.

The prior art considers only the fusion of text information and structured information. It does not adequately consider how to adjust the weight between the two to achieve the best result, it does not use the hierarchical information already present in the structured information of the knowledge base, and it learns a large number of parameters. As a result, it cannot accurately represent the connections between entities and relations and does not scale well to large knowledge graphs.

The invention fuses text information and structured information with an adaptive weight, and enriches the structured information with hierarchical information that was previously unused. When the two kinds of information are fused, an adaptive weight balances their contributions, and type-constrained training is applied to structured information that was classified in advance; in this way the score function is minimized and the optimization goal is reached. Adding adaptive weights to multi-source information fusion addresses the heterogeneity and imbalance of entities and relations in the knowledge base, represents entities, relations and their connections more accurately, and can be applied to large-scale knowledge graphs.

Specifically, a multi-source information fusion knowledge graph representation learning method based on adaptive weights, as shown in Figure 2, comprises the following steps:

Step 1. When text information and structured information are fused, introduce an adaptive weight to balance the two, and apply type-constrained training to the structured information that was classified in advance.

Based on the fusion of text information and structured information, define the coupled total score function f:

$$f(h,r,t) = (1-\lambda)\,f_D(h,r,t) + \lambda\,f_S(h,r,t)$$

$f_D(h,r,t)$ is the score function based on the text representation:

$$f_D(h,r,t) = f_{DD}(h,r,t) + f_{DS}(h,r,t) + f_{SD}(h,r,t) = \lVert h_d + r - t_d\rVert + \lVert h_d + r - M_{rt}\,t_S\rVert + \lVert M_{rh}\,h_S + r - t_d\rVert$$

$f_S(h,r,t)$ is the score function based on the structured representation:

$$f_S(h,r,t) = \lVert M_{rh}\,h + r + M_{rt}\,t\rVert$$

where $\lambda$ is the weight, $\lambda\in(0,1)$, $h$ is the head entity, $t$ is the tail entity, $r$ is the relation between head entity $h$ and tail entity $t$, $h_d$ is the text-based representation of the head entity, $t_d$ is the text-based representation of the tail entity, $h_S$ is the structure-based representation of the head entity, $t_S$ is the structure-based representation of the tail entity, $M_{rh}$ is the projection matrix defined for the head entity, and $M_{rt}$ is the projection matrix defined for the tail entity.
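As a rough numeric illustration of how the weight trades off the two scores, the total score can be evaluated for two values of the weight. The values f_D = 3.0 and f_S = 1.0 are made up for this example and do not come from the source.

```latex
\begin{aligned}
\lambda = 0.25:\quad f &= (1-0.25)\cdot 3.0 + 0.25\cdot 1.0 = 2.25 + 0.25 = 2.5\\
\lambda = 0.75:\quad f &= (1-0.75)\cdot 3.0 + 0.75\cdot 1.0 = 0.75 + 0.75 = 1.5
\end{aligned}
```

A larger weight pulls the total toward the structure-based score, and a smaller weight pulls it toward the text-based score.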

Step 2. Build a loss function for the adaptive-weight fusion of text information and structured information, and learn the vector representations of entities and relations by minimizing this loss function, thereby reaching the optimization goal.

Step 21. Define the loss function as:

$$L = \sum_{(h,r,t)\in S(h,r,t)}\ \sum_{(h',r,t')\in S'(h,r,t)} \left[f(h,r,t) + \gamma - f(h',r,t')\right]_+$$

where $[f(h,r,t)+\gamma-f(h',r,t')]_+ = \max(0,\ f(h,r,t)+\gamma-f(h',r,t'))$; $\gamma$ is the preset margin; $(h,r,t)$ is a triple of the knowledge graph, that is, a positive triple, in which $h$ is the head entity, $t$ is the tail entity and $r$ is the relation between head entity $h$ and tail entity $t$; $f(h,r,t)$ is the score function of the positive triple and $S(h,r,t)$ is the set of positive triples; $(h',r,t')$ is a negative triple built by randomly replacing the head entity $h$ and the tail entity $t$; $f(h',r,t')$ is the score function of the negative triple and $S'(h,r,t)$ is the set of negative triples;

Step 22. Minimize the loss function by stochastic gradient descent to learn the vector of every entity and every relation in the knowledge graph, together with the connections between them.
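One stochastic gradient descent update on a single positive/negative pair might look like the sketch below. It is deliberately simplified and is not the method of the source: the score is a plain translation error h + r - t with a squared Euclidean norm (chosen so the gradient is easy to write by hand), the text-based terms and projection matrices are left out, and the names `sgd_step`, `emb`, `gamma` and `lr` are hypothetical.

```python
def residual(h, r, t):
    """h + r - t, the translation error of a triple."""
    return [a + b - c for a, b, c in zip(h, r, t)]

def score(h, r, t):
    """Squared L2 translation error (a simplification chosen for easy gradients)."""
    return sum(x * x for x in residual(h, r, t))

def sgd_step(emb, pos, neg, gamma=1.0, lr=0.01):
    """One SGD update on [f(pos) + gamma - f(neg)]_+.

    emb maps names to vectors (lists of floats) and is updated in place.
    pos and neg are (head, relation, tail) name triples.
    Returns the loss measured before the update.
    """
    loss = score(*(emb[k] for k in pos)) + gamma - score(*(emb[k] for k in neg))
    if loss <= 0:
        return 0.0  # margin already satisfied, so the gradient is zero
    grads = {}
    for triple, sign in ((pos, 1.0), (neg, -1.0)):
        res = residual(*(emb[k] for k in triple))
        # d score / d head = d score / d relation = 2 * res; d score / d tail = -2 * res
        for name, coeff in zip(triple, (2.0, 2.0, -2.0)):
            g = grads.setdefault(name, [0.0] * len(res))
            for i, x in enumerate(res):
                g[i] += sign * coeff * x
    # Apply all gradients only after they have been accumulated.
    for name, g in grads.items():
        emb[name] = [w - lr * gi for w, gi in zip(emb[name], g)]
    return loss
```

Repeating the step over many sampled pairs lowers the score of positive triples relative to negative ones until the margin is met.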

Minimizing the loss function amounts to minimizing the score function, and this minimization is how the optimization goal is reached. The energy function E used in the triple score function is that of the TransE model. While the loss function is minimized, whether the relation r is of the simple type 1-1 or of a complex type 1-N, N-1 or N-N, the vectors h, t and r are adjusted continually so that h+r is as close to t as possible.

The model learned in this way, through adaptive-weight multi-source information fusion together with type-constrained training on structured information classified in advance, is more accurate and more effective.

Adding text information solves a problem that existing methods cannot. When predicting for a newly appearing entity (one that has not been trained), the original methods assign it a random vector, so its score is much worse than that of a trained entity, its loss grows, and prediction quality suffers. In the unimproved way of adding text information, the structured method still assigns the new entity a random vector, but the knowledge base contains a text description of the entity. This description can be processed into a text-based vector, and a new score function is obtained by adding the score from the trained structured vectors to the score from the text-based vectors, which moves toward the optimization goal. However, although the text description improves the score function for a new entity, simply adding the two score functions means that the structured part still contributes wrong information, and with a large share of the total. This is clearly unreasonable.

The invention proposes fusing structured information and text information with an adaptive weight. The weights of the two are updated according to the number of times an entity has been trained: each time the entity is trained, the weight of its structured representation increases slightly and the weight of its text representation decreases slightly. The reason is that, without text information, the number of training updates per entity follows a long-tail distribution. Common entities and entities with many categories occur often and receive enough training, so their representations approach the correct ones, whereas entities trained only a few times have weaker representations. Entities that occur often are therefore considered well represented by structured information, and the weight of their structured part is raised; the more often an entity occurs, the larger this weight becomes. Conversely, entities that occur rarely are not represented well enough by structured information, so the weight of their text part is raised. If an entity has never occurred, the weight of its structured part is 0 and it is represented entirely by text information. This both uses text information to represent unseen entities and filters out structured information that comes from a randomly assigned vector.
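The source does not give a formula for how the weight depends on the training count, so the sketch below uses a hypothetical schedule, λ = n / (n + k), purely to illustrate the described behavior: an unseen entity gets a structural weight of 0, and the weight rises a little with every additional training occurrence. The functional form, the constant `k`, the function names and the choice to count an entity once per triple it appears in are all assumptions.

```python
from collections import Counter

def structural_weight(train_count, k=10.0):
    """Hypothetical schedule: lambda = n / (n + k).

    n = 0 gives 0 (text only); lambda grows toward 1 as n increases.
    Both the functional form and k are assumptions, not taken from the source.
    """
    if train_count < 0:
        raise ValueError("train_count must be non-negative")
    return train_count / (train_count + k)

def blend(train_count, f_text, f_struct, k=10.0):
    """Total score (1 - lambda) * f_D + lambda * f_S for a given training count."""
    lam = structural_weight(train_count, k)
    return (1.0 - lam) * f_text + lam * f_struct

# Count how often each entity appears in the (toy) training triples.
counts = Counter()
for head, _, tail in [("Shakespeare", "author", "Hamlet"),
                      ("Shakespeare", "author", "Romeo and Juliet")]:
    counts[head] += 1
    counts[tail] += 1
```

With this schedule, `blend(0, f_text, f_struct)` returns the text-based score alone, which matches the stated goal of ignoring a randomly initialized structural vector for an unseen entity.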

Figure 3a is an example of the triple representation obtained with an existing knowledge graph representation learning method. In Figure 3a, the hierarchical information of the knowledge graph triples is not considered. Hierarchical information refers to the fact that an entity may play different roles in different contexts; for example, Shakespeare is both a writer and a musician, and Bob has the same attributes. Entities with multiple types should have different representations under different relations. A type-specific projection matrix M_r is constructed from the hierarchical structure, and the head entity h and tail entity t are then represented through this projection matrix. An entity thus has as many mappings as it has relations, each giving its specific representation under one relation. In Figure 3b, the types of entities are expressed through specific relations. During training, entities of the same type tend to form a cluster and have similar representations, which in fact is the main cause of error in entity prediction. The invention can raise the probability of selecting, during training, entities that share the same type under the type information of a specific relation, and optimizes the objective in this way.
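One possible reading of "raising the probability of selecting entities of the same type" is a weighted draw when choosing a replacement entity during training. The sketch below is an assumption, not the procedure of the source: every entity of the relation's type gets weight k + 1, every other entity gets weight 1, and `k`, `same_type_probability` and `sample_replacement` are hypothetical names.

```python
import random

def same_type_probability(n_entities, n_same_type, k):
    """Chance that a draw lands in the same-type set when each same-type
    entity has weight k + 1 and every other entity has weight 1."""
    return (k + 1) * n_same_type / (n_entities + k * n_same_type)

def sample_replacement(entities, same_type, k, rng=random):
    """Draw one replacement entity, favoring entities in `same_type`."""
    weights = [k + 1 if e in same_type else 1 for e in entities]
    return rng.choices(entities, weights=weights, k=1)[0]
```

With k = 0 the draw is uniform; increasing k makes same-type entities more likely to be chosen, so the model is trained more often on the entities it is most likely to confuse.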

The invention addresses several problems of the prior art: the imbalance and heterogeneity of entities and relations, overly complex computation caused by too many parameters, the inability to represent the connections between entities and relations in the knowledge graph well, and poor applicability to large-scale knowledge graphs. It therefore has good practical value.

It should be noted that the embodiments described above are illustrative and do not limit the invention, so the invention is not restricted to the specific embodiments above. Any other embodiment obtained by a person skilled in the art under the teaching of the invention, without departing from its principles, is regarded as falling within the protection of the invention.

Claims

1. A multi-source information fusion knowledge graph representation learning method based on adaptive weights, characterized by comprising the following steps:

Step 1. Use an adaptive weight to balance the fusion of text information and structured information, and define a total score function f(h,r,t) that couples the two:

$$f(h,r,t) = (1-\lambda)\left(\lVert h_d + r - t_d\rVert + \lVert h_d + r - M_{rt}\,t_S\rVert + \lVert M_{rh}\,h_S + r - t_d\rVert\right) + \lambda\,\lVert M_{rh}\,h + r + M_{rt}\,t\rVert$$

where $\lambda$ is the weight, $h$ is the head entity, $t$ is the tail entity, $r$ is the relation between head entity $h$ and tail entity $t$, $h_d$ is the text-based representation of the head entity, $t_d$ is the text-based representation of the tail entity, $h_S$ is the structure-based representation of the head entity, $t_S$ is the structure-based representation of the tail entity, $M_{rh}$ is the projection matrix defined for the head entity, and $M_{rt}$ is the projection matrix defined for the tail entity;

Step 2. Based on the total score function f(h,r,t) defined in step 1, build a loss function for the adaptive-weight fusion of text information and structured information, and learn the vector representations of entities and relations by minimizing this loss function, thereby reaching the optimization goal.

2. The multi-source information fusion knowledge graph representation learning method based on adaptive weights according to claim 1, characterized in that, in step 1, the weight λ takes values in λ∈(0,1).

3. The multi-source information fusion knowledge graph representation learning method based on adaptive weights according to claim 1, characterized in that, in step 2, the loss function is minimized by stochastic gradient descent.

4. The multi-source information fusion knowledge graph representation learning method based on adaptive weights according to claim 1, characterized in that, in step 2, the constructed loss function L is:

$$L = \sum_{(h,r,t)\in S(h,r,t)}\ \sum_{(h',r,t')\in S'(h,r,t)} \left[f(h,r,t) + \gamma - f(h',r,t')\right]_+$$

where $[f(h,r,t)+\gamma-f(h',r,t')]_+ = \max(0,\ f(h,r,t)+\gamma-f(h',r,t'))$; $\gamma$ is the preset margin; $(h,r,t)$ is a triple of the knowledge graph, that is, a positive triple, in which $h$ is the head entity, $t$ is the tail entity and $r$ is the relation between them; $f(h,r,t)$ is the score function of the positive triple and $S(h,r,t)$ is the set of positive triples; $(h',r,t')$ is a negative triple built by randomly replacing the head entity $h$ and the tail entity $t$; $f(h',r,t')$ is the score function of the negative triple and $S'(h,r,t)$ is the set of negative triples.