
CN109033129A - Multi-source information fusion knowledge graph representation learning method based on adaptive weights - Google Patents


Info

Publication number
CN109033129A
Authority
CN
China
Prior art keywords
entity
information
representation
structured
fusion
Prior art date
Legal status
Granted
Application number
CN201810563786.6A
Other languages
Chinese (zh)
Other versions
CN109033129B (en)
Inventor
常亮
张舜尧
匡海丽
王文凯
Current Assignee
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN201810563786.6A
Publication of CN109033129A
Application granted
Publication of CN109033129B
Legal status: Active
Anticipated expiration


Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a multi-source information fusion knowledge graph representation learning method based on adaptive weights.

The method first fuses textual information with structured information. It adopts a translation-based model between entity vectors and relation vectors and optimizes the score function by adjusting the weight between the two information sources. It also performs type-constrained training on structured information that has been classified in advance, without introducing additional parameters.

A loss function then links the entity vectors and the relation vectors. Once this loss function has been optimized to the target, a vector is learned for every entity and every relation in the knowledge graph.

The invention addresses the problem that existing fusions of textual and structured information in a knowledge base do not weight the two sources. It exploits the hierarchical information already present in the structured information of the knowledge base to represent the relationships between entities and relations more accurately, and it can be applied to large-scale knowledge graphs.

Description

Multi-source information fusion knowledge graph representation learning method based on adaptive weights

Technical Field

The invention relates to the technical fields of knowledge graphs and deep learning. In particular, it relates to a multi-source information fusion knowledge graph representation learning method based on adaptive weights.

Background Art

With the rapid development of society, we are entering the information age. Huge volumes of new data and information are generated every day in many different forms. The mobile Internet has become the most effective and convenient platform for acquiring information, and users' demand for accurate information keeps growing. Extracting useful information from massive data has therefore become a major problem in many fields, and knowledge graphs emerged in response.

Knowledge in a knowledge base is usually organized as a network in which each node represents an entity and each edge represents a relation between two entities. A fact is written as a triple (entity 1, relation, entity 2). Figure 1 shows a typical triple in a knowledge graph: the nodes drawn as ellipses, "Shakespeare" and "Romeo and Juliet", are entities, and the edge labeled "author" is the relation. Most knowledge can therefore be expressed as triples, each corresponding to one edge in the knowledge base network together with the two entities it links. This is the general representation of a knowledge base.

In recent years, deep learning has attracted wide attention in speech recognition, image analysis and natural language processing. Representation learning aims to represent the semantic information of the objects under study as dense, low-dimensional, real-valued vectors. In this low-dimensional vector space, the closer two objects are, the more semantically similar they are. This direction has recently made important progress. The semantic connections between entities and relations can now be computed efficiently in the low-dimensional space, which effectively mitigates data sparsity and significantly improves the performance of knowledge acquisition, fusion and reasoning.

A major challenge in knowledge representation learning is multi-source information fusion. Existing models such as TransE learn representations from the triple structure of the knowledge graph alone. This leaves a large amount of other knowledge-related information in the knowledge base unused, such as descriptions and category information of entities and relations.
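For reference, the TransE energy mentioned above scores a triple by how far the head vector, translated by the relation vector, lands from the tail vector. The following is a minimal Python/NumPy sketch. The toy two-dimensional vectors are invented for illustration and are not learned embeddings.

```python
import numpy as np


def transe_energy(h: np.ndarray, r: np.ndarray, t: np.ndarray, p: int = 2) -> float:
    """TransE energy ||h + r - t||_p: the lower the value, the more plausible the triple."""
    return float(np.linalg.norm(h + r - t, ord=p))


# Toy 2-D vectors, chosen by hand so that h + r lands exactly on t.
h = np.array([0.1, 0.2])
r = np.array([0.5, 0.3])
t = np.array([0.6, 0.5])
t_other = np.array([-0.4, 0.9])  # an unrelated tail entity

print(transe_energy(h, r, t))        # close to zero: plausible triple
print(transe_energy(h, r, t_other))  # larger: implausible triple
```

Either the L1 or the L2 norm can be used; `p` selects between them.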

Summary of the Invention

Existing knowledge graph representation learning methods that fuse textual information cannot fully exploit the relationship between the structured model and the textual information. To address this problem, the invention proposes a multi-source information fusion knowledge graph representation learning method based on adaptive weights.

To solve the above problem, the invention adopts the following technical solution:

A multi-source information fusion knowledge graph representation learning method based on adaptive weights comprises the following steps:

Step 1. Use an adaptive weight to balance the fusion of textual and structured information, and define an overall score function f(h, r, t) that links the textual and structured information:

$$f(h,r,t) = (1-\lambda)\left(\lVert h_d + r - t_d \rVert + \lVert h_d + r - M_{rt} t_S \rVert + \lVert M_{rh} h_S + r - t_d \rVert\right) + \lambda \lVert M_{rh} h + r + M_{rt} t \rVert$$

The symbols are defined as follows:
- λ is the weight.
- h is the head entity, t is the tail entity, and r is the relation between h and t.
- h_d and t_d are the text-based representations of the head and tail entities.
- h_S and t_S are the structure-based representations of the head and tail entities.
- M_rh is the projection matrix defined for the head entity, and M_rt is the projection matrix defined for the tail entity.
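The overall score of Step 1 can be sketched in Python/NumPy as follows. All vectors are assumed to share one dimension k, and both projection matrices are assumed to be k × k.

One assumption departs from the formula as printed. The structural term is computed as the translation-style difference ||M_rh h_S + r − M_rt t_S||, which matches the other three terms and the goal that h + r approach t. The printed formula reads "+ M_rt t". The function name `overall_score` is illustrative.

```python
import numpy as np


def _l2(x: np.ndarray) -> float:
    return float(np.linalg.norm(x))


def overall_score(h_d, t_d, h_s, t_s, r, M_rh, M_rt, lam: float) -> float:
    """Adaptive-weight score f(h, r, t); lower means more plausible.

    h_d, t_d: text-based vectors; h_s, t_s: structure-based vectors;
    M_rh, M_rt: k x k projection matrices for the head and tail entities.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lambda must lie in the open interval (0, 1)")
    f_dd = _l2(h_d + r - t_d)                    # text head -> text tail
    f_ds = _l2(h_d + r - M_rt @ t_s)             # text head -> structural tail
    f_sd = _l2(M_rh @ h_s + r - t_d)             # structural head -> text tail
    f_text = f_dd + f_ds + f_sd                  # f_D in the description
    f_struct = _l2(M_rh @ h_s + r - M_rt @ t_s)  # f_S, sign assumed (see above)
    return (1.0 - lam) * f_text + lam * f_struct


# Identity projections; only the text-based tail vector is off by one unit.
I = np.eye(2)
h = np.array([0.0, 0.0])
r = np.array([1.0, 0.0])
t = np.array([1.0, 0.0])
print(overall_score(h, np.array([1.0, 1.0]), h, t, r, I, I, lam=0.25))  # 1.5
```

In the example, two of the three text-based terms are 1 and the structural term is 0, so f = 0.75 × 2 = 1.5.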

Step 2. Based on the overall score function f(h, r, t) defined in Step 1, construct a loss function for the adaptive-weight fusion of textual and structured information. Learn vector representations of the entities and relations by minimizing this loss function, thereby reaching the optimization objective.

In Step 1, the weight λ takes values in the range λ ∈ (0, 1).

In Step 2, the loss function is minimized by stochastic gradient descent.

In Step 2, the loss function L is constructed as:

$$L = \sum_{(h,r,t) \in S} \sum_{(h',r,t') \in S'} \left[ f(h,r,t) + \gamma - f(h',r,t') \right]_+$$

The terms are defined as follows:
- [f(h,r,t) + γ − f(h',r,t')]_+ = max(0, f(h,r,t) + γ − f(h',r,t')).
- γ is a preset margin.
- (h, r, t) is a triple of the knowledge graph, i.e. a positive triple, in which h is the head entity, t is the tail entity and r is the relation between them. f(h, r, t) is the score of the positive triple, and S is the set of positive triples.
- (h', r, t') is a negative triple constructed by randomly replacing the head entity h or the tail entity t. f(h', r, t') is the score of the negative triple, and S' is the set of negative triples.
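A minimal Python sketch of this margin-based ranking loss follows. Two simplifications are assumptions rather than part of the text:
- Instead of summing over every pair in S × S', one negative triple is sampled per positive triple, which is the usual practice in TransE-style training.
- A corrupted triple that happens to be a known positive is re-drawn.

Entities and relations are plain integer IDs, and `score` stands for any score function such as f(h, r, t).

```python
import random
from typing import Callable, List, Set, Tuple

Triple = Tuple[int, int, int]


def corrupt(triple: Triple, num_entities: int, known: Set[Triple],
            rng: random.Random) -> Triple:
    """Replace the head or the tail (chosen at random) with a random entity."""
    h, r, t = triple
    while True:
        e = rng.randrange(num_entities)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate not in known:  # skip true triples, including the triple itself
            return candidate


def margin_loss(positives: List[Triple], score: Callable[[Triple], float],
                gamma: float, num_entities: int, rng: random.Random) -> float:
    """Sum of [f(pos) + gamma - f(neg)]_+ with one sampled negative per positive."""
    known = set(positives)
    total = 0.0
    for pos in positives:
        neg = corrupt(pos, num_entities, known, rng)
        total += max(0.0, score(pos) + gamma - score(neg))
    return total


positives = [(0, 0, 1), (1, 0, 2)]


def perfect(triple: Triple) -> float:
    """Toy score: true triples score 0, everything else scores 5."""
    return 0.0 if triple in positives else 5.0


print(margin_loss(positives, perfect, gamma=1.0, num_entities=4, rng=random.Random(0)))  # 0.0
```

A score function that already separates positives from negatives by more than γ yields zero loss, so those pairs contribute no gradient.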

Compared with the prior art, the invention first fuses textual and structured information. It adopts a translation-based model between entity and relation vectors, optimizes the score function by adjusting the weight between the two sources, and performs type-constrained training on structured information classified in advance, without introducing additional parameters.

A loss function then links the entity and relation vectors and is optimized. Once the optimization objective is reached, a vector for every entity and relation in the knowledge graph is learned.

The invention thus solves the problem that fusing textual and structured information in a knowledge base ignores their relative weights. It exploits the existing hierarchical information of the structured data to represent the relationships between entities and relations more accurately, which makes it applicable to large-scale knowledge graphs.

Description of Drawings

Figure 1 shows an example of relational triples in a knowledge graph.

Figure 2 is a schematic flow chart of the knowledge graph representation learning method of the invention.

Figure 3a shows an example of triple representations obtained with an existing knowledge graph representation learning method.

Figure 3b shows an example of triple representations obtained with the knowledge graph representation learning method of the invention.

Detailed Description of the Embodiments

To make the objectives, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to specific examples and the accompanying drawings.

The prior art has three shortcomings:
1. It merely fuses textual and structured information, without considering how to adjust the weight between them to achieve the best result.
2. It does not use the hierarchical information already present in the structured information of the knowledge base.
3. It requires many learned parameters.

As a result, it cannot accurately represent the connections between entities and relations and does not scale well to large knowledge graphs.

The invention fuses textual and structured information with an adaptive weight and enriches the structured information with previously unused hierarchical information. When the two are fused, an adaptive weight balances their contributions, and type-constrained training is performed on structured information classified in advance. The score function is minimized in this way to reach the optimization objective.

This adaptively weighted multi-source fusion addresses the heterogeneity and imbalance of entities and relations in the knowledge base. It represents entities, relations and their interconnections more accurately and applies to large-scale knowledge graphs.

Specifically, the multi-source information fusion knowledge graph representation learning method based on adaptive weights, shown in Figure 2, comprises the following steps:

Step 1. When fusing textual and structured information, introduce an adaptive weight to balance the fusion, and perform type-constrained training on the structured information classified in advance.

Based on the fusion of textual and structured information, define the overall score function f:

$$f(h,r,t) = (1-\lambda) f_D(h,r,t) + \lambda f_S(h,r,t)$$

f_D(h, r, t) is the score function based on the text representation:

$$f_D(h,r,t) = f_{DD}(h,r,t) + f_{DS}(h,r,t) + f_{SD}(h,r,t) = \lVert h_d + r - t_d \rVert + \lVert h_d + r - M_{rt} t_S \rVert + \lVert M_{rh} h_S + r - t_d \rVert$$

f_S(h, r, t) is the score function based on the structured representation:

$$f_S(h,r,t) = \lVert M_{rh} h + r + M_{rt} t \rVert$$

The symbols are defined as follows:
- λ is the weight, with λ ∈ (0, 1).
- h is the head entity, t is the tail entity, and r is the relation between h and t.
- h_d and t_d are the text-based representations of the head and tail entities.
- h_S and t_S are the structure-based representations of the head and tail entities.
- M_rh is the projection matrix defined for the head entity, and M_rt is the projection matrix defined for the tail entity.

Step 2. Construct a loss function for the adaptive-weight fusion of textual and structured information. Learn vector representations of entities and relations by minimizing it, thereby reaching the optimization objective.

Step 21. Define the loss function as:

$$L = \sum_{(h,r,t) \in S} \sum_{(h',r,t') \in S'} \left[ f(h,r,t) + \gamma - f(h',r,t') \right]_+$$

The terms are defined as follows:
- [f(h,r,t) + γ − f(h',r,t')]_+ = max(0, f(h,r,t) + γ − f(h',r,t')).
- γ is a preset margin.
- (h, r, t) is a triple of the knowledge graph, i.e. a positive triple, in which h is the head entity, t is the tail entity and r is the relation between h and t. f(h, r, t) is the score of the positive triple, and S is the set of positive triples.
- (h', r, t') is a negative triple constructed by randomly replacing the head entity h or the tail entity t. f(h', r, t') is the score of the negative triple, and S' is the set of negative triples.

Step 22. Minimize the loss function by stochastic gradient descent to learn a vector for every entity and relation in the knowledge graph, together with the interconnections between them.
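The stochastic gradient descent of Step 22 can be illustrated with a simplified, hand-differentiated update. The following simplifications are assumptions made for this sketch:
- The score is a single TransE term ||h + r − t|| (L2 norm) on free entity and relation embeddings. The full method would also differentiate the text-based terms, the projection matrices and the weight λ, typically with an automatic-differentiation library.
- Entity vectors are re-normalized to unit length after each step, following common TransE practice.

The function names `hinge` and `sgd_step` are illustrative.

```python
import numpy as np


def hinge(E, R, pos, neg, gamma):
    """[||h + r - t|| + gamma - ||h' + r - t'||]_+ for one positive/negative pair."""
    h, r, t = pos
    h2, _, t2 = neg
    d_pos = np.linalg.norm(E[h] + R[r] - E[t])
    d_neg = np.linalg.norm(E[h2] + R[r] - E[t2])
    return max(0.0, float(d_pos + gamma - d_neg))


def sgd_step(E, R, pos, neg, gamma, lr, eps=1e-12):
    """One SGD update on the hinge loss; E and R are modified in place.

    Returns the loss before the update (0.0 if the margin is already met).
    """
    h, r, t = pos
    h2, _, t2 = neg
    d_pos = E[h] + R[r] - E[t]
    d_neg = E[h2] + R[r] - E[t2]
    n_pos, n_neg = np.linalg.norm(d_pos), np.linalg.norm(d_neg)
    loss = n_pos + gamma - n_neg
    if loss <= 0.0:
        return 0.0  # margin satisfied: zero gradient, no update
    g_pos = d_pos / (n_pos + eps)  # gradient of ||d_pos|| with respect to d_pos
    g_neg = d_neg / (n_neg + eps)
    grad_E = np.zeros_like(E)
    grad_E[h] += g_pos
    grad_E[t] -= g_pos
    grad_E[h2] -= g_neg  # accumulates correctly even if h2 == h
    grad_E[t2] += g_neg
    E -= lr * grad_E
    R[r] -= lr * (g_pos - g_neg)
    E /= np.linalg.norm(E, axis=1, keepdims=True)  # keep entities on the unit sphere
    return float(loss)


E = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])  # three unit entity vectors
R = np.zeros((1, 2))                                  # one relation vector
pos, neg = (0, 0, 1), (0, 0, 2)
before = hinge(E, R, pos, neg, gamma=1.0)
sgd_step(E, R, pos, neg, gamma=1.0, lr=0.1)
after = hinge(E, R, pos, neg, gamma=1.0)
print(before, after)  # the hinge loss decreases
```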

Minimizing the loss function amounts to lowering the scores of positive triples relative to those of negative triples, and this minimization is how the optimization objective is reached. The translation terms of the triple score function follow the energy function of the TransE model. During minimization, h, r and t are therefore adjusted iteratively so that h + r is as close to t as possible. This applies whether the relation r is of the simple 1-1 type or of the complex 1-N, N-1 or N-N type.
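The text does not say how a relation is assigned to the 1-1, 1-N, N-1 or N-N category. One common convention, used in the TransE evaluation protocol and assumed here, works as follows:
- Average the number of distinct tails per (head, relation) pair.
- Average the number of distinct heads per (relation, tail) pair.
- Treat an average of 1.5 or more as "N".

A Python sketch:

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def relation_categories(triples: Iterable[Tuple[Hashable, Hashable, Hashable]],
                        threshold: float = 1.5) -> Dict[Hashable, str]:
    """Label each relation as '1-1', '1-N', 'N-1' or 'N-N'."""
    tails_of = defaultdict(set)  # (h, r) -> distinct tails
    heads_of = defaultdict(set)  # (r, t) -> distinct heads
    for h, r, t in triples:
        tails_of[(h, r)].add(t)
        heads_of[(r, t)].add(h)

    def per_relation_mean(groups, rel_index):
        sums, counts = defaultdict(float), defaultdict(int)
        for key, members in groups.items():
            rel = key[rel_index]
            sums[rel] += len(members)
            counts[rel] += 1
        return {rel: sums[rel] / counts[rel] for rel in sums}

    tails_per_head = per_relation_mean(tails_of, 1)
    heads_per_tail = per_relation_mean(heads_of, 0)
    labels = {}
    for rel in tails_per_head:
        head_side = "N" if heads_per_tail[rel] >= threshold else "1"
        tail_side = "N" if tails_per_head[rel] >= threshold else "1"
        labels[rel] = f"{head_side}-{tail_side}"
    return labels


triples = [
    ("Shakespeare", "wrote", "Romeo and Juliet"),
    ("Shakespeare", "wrote", "Hamlet"),
    ("Romeo and Juliet", "written_by", "Shakespeare"),
    ("Hamlet", "written_by", "Shakespeare"),
]
print(relation_categories(triples))  # {'wrote': '1-N', 'written_by': 'N-1'}
```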

In this way, the method learns a multi-source information fusion knowledge graph representation based on adaptive weights. The model trained with type constraints on pre-classified structured information is more accurate and effective.

Adding textual information solves a problem that existing methods cannot. When predicting for a newly emerging (untrained) entity, existing methods assign it a random vector. Its score is then much worse than that of trained entities, its loss is larger, and prediction quality is poor.

In an unimproved way of adding textual information, a new (untrained) entity still receives a random vector from the structured method. However, the knowledge base contains a textual description of the new entity, which can be encoded into a text-based vector. Adding the part computed from the trained structured vectors to the part computed from the text-based vectors yields a new score function and reaches the optimization objective.

This simple addition has a flaw. For a new (untrained) entity, the textual description does improve the score, but when the two scores are simply added, the structured information can still contribute wrong information with a large share of the total. This is clearly unreasonable.
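The patent does not specify how an entity description is turned into a text-based vector. As a stand-in, the sketch below averages pre-trained word vectors, a continuous-bag-of-words style encoder. The word-vector table and the description are invented for illustration. An entity never seen in training still gets a meaningful h_d this way, which is what lets the text-based terms score it.

```python
import re
from typing import Dict

import numpy as np


def text_vector(description: str, word_vectors: Dict[str, np.ndarray], dim: int) -> np.ndarray:
    """Average the vectors of known words in the description (zeros if none are known)."""
    tokens = re.findall(r"[a-z]+", description.lower())
    known = [word_vectors[w] for w in tokens if w in word_vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)


# Hypothetical 2-D word vectors.
word_vectors = {
    "english": np.array([1.0, 0.0]),
    "playwright": np.array([0.0, 1.0]),
}
h_d = text_vector("An English playwright and poet.", word_vectors, dim=2)
print(h_d)  # [0.5 0.5]
```

A learned encoder, such as a convolutional network over the description, could replace the average without changing how h_d enters the score function.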

The invention fuses structured and textual information with adaptive weights. The two weights are updated according to how many times an entity has been trained. Each time an entity is trained, the weight of its structured representation is increased slightly and the weight of its textual representation is decreased slightly.

The reason is that, even without textual information, the number of training updates per entity follows a long-tail distribution. Common entities and entities with multiple categories appear often and are trained sufficiently, so their representations approach the correct ones. Rarely trained entities have relatively weak representations.

Entities that appear often are therefore considered to have good enough structured representations, and the weight of the structured part for triples involving them is increased: the more often an entity appears, the larger its structured weight. Conversely, entities that appear rarely have insufficient structured representations, so the weight of their textual representation is increased.

If an entity has never appeared, the weight of its structured part is 0 and it is represented entirely by textual information. This both uses textual information to represent unseen new entities and filters out the randomly initialized structured vectors.
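The description gives the direction of the weight update but not its exact form. A minimal sketch, assuming a linear schedule:
- Each training visit of an entity adds a fixed step to its structural weight, capped at an upper bound below 1.
- A triple uses the smaller of its head and tail weights as λ, so a single never-seen entity switches the structural term off.

The class name, `step` and `lam_max` are hypothetical. An unseen entity gets weight 0, as the description states, even though claim 2 gives λ the open range (0, 1).

```python
from collections import Counter
from typing import Hashable


class AdaptiveWeights:
    """Per-entity structural weight that grows with the entity's training count."""

    def __init__(self, step: float = 0.01, lam_max: float = 0.9):
        if not 0.0 < lam_max < 1.0:
            raise ValueError("lam_max must lie in (0, 1)")
        self.step = step
        self.lam_max = lam_max
        self.counts: Counter = Counter()

    def record(self, entity: Hashable) -> None:
        """Call once each time the entity takes part in a training update."""
        self.counts[entity] += 1

    def entity_weight(self, entity: Hashable) -> float:
        """Structural weight: 0 for unseen entities, then step * count up to lam_max."""
        return min(self.lam_max, self.step * self.counts[entity])

    def triple_weight(self, head: Hashable, tail: Hashable) -> float:
        """Lambda for a triple; the text-based part then gets weight 1 - lambda."""
        return min(self.entity_weight(head), self.entity_weight(tail))


weights = AdaptiveWeights(step=0.1, lam_max=0.9)
for _ in range(5):
    weights.record("Shakespeare")
weights.record("Romeo and Juliet")
print(weights.triple_weight("Shakespeare", "Romeo and Juliet"))  # 0.1
print(weights.triple_weight("Shakespeare", "NewEntity"))         # 0.0
```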

Figure 3a shows triple representations obtained with an existing knowledge graph representation learning method, in which the hierarchical information of knowledge graph triples is not considered. Hierarchical information refers to the fact that an entity may play different roles in different contexts. For example, Shakespeare is both a writer and a musician, and Bob has similar attributes.

Entities with multiple types should have different representations under different relations. A type-specific projection matrix M_r is constructed from the type hierarchy, and the head entity h and tail entity t are represented through these projection matrices. An entity thus has as many mappings as it has relations, each giving its specific representation under that relation.

In Figure 3b, entity types are expressed through specific relations. During training, entities of the same type tend to form a cluster with similar representations, which is in fact the main source of errors in entity prediction. The invention therefore raises the probability of selecting entities of the same type, given the type information of a specific relation, during training, and optimizes the objective in this way.
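One way to realize the type-constrained training described above is to draw most corrupted entities from the same type as the entity being replaced. The model is then trained on exactly the hard, same-type confusions. The sketch below assumes each entity has a set of type labels and uses a hypothetical mixing probability `p_same_type`. The type assignments are toy data based on the Shakespeare example.

```python
import random
from typing import Dict, Hashable, List, Set, Tuple

Triple = Tuple[Hashable, Hashable, Hashable]


def type_constrained_corrupt(triple: Triple, entities: List[Hashable],
                             types_of: Dict[Hashable, Set[str]],
                             rng: random.Random, p_same_type: float = 0.8) -> Triple:
    """Corrupt the head or tail, preferring replacements that share a type with it.

    Assumes at least two entities exist.
    """
    h, r, t = triple
    replace_head = rng.random() < 0.5
    original = h if replace_head else t
    original_types = types_of.get(original, set())
    same_type = [e for e in entities
                 if e != original and types_of.get(e, set()) & original_types]
    if same_type and rng.random() < p_same_type:
        e = rng.choice(same_type)  # hard negative of the same type
    else:
        e = rng.choice([x for x in entities if x != original])  # uniform fallback
    return (e, r, t) if replace_head else (h, r, e)


types_of = {
    "Shakespeare": {"writer", "musician"},
    "Bob": {"writer", "musician"},
    "Dickens": {"writer"},
    "Romeo and Juliet": {"work"},
    "Hamlet": {"work"},
}
entities = list(types_of)
triple = ("Shakespeare", "author_of", "Romeo and Juliet")
print(type_constrained_corrupt(triple, entities, types_of, random.Random(42)))
```

With `p_same_type=1.0`, every replacement shares a type with the entity it replaces. Lower values keep some easy, uniformly sampled negatives in the mix.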

The invention addresses two weaknesses of the prior art: the imbalance and heterogeneity of entities and relations, and the excessive computational complexity caused by too many parameters. These weaknesses prevented the prior art from representing the interconnections between entities and relations in a knowledge graph well and from scaling to large knowledge graphs. The invention therefore has good practicality.

It should be noted that although the embodiments described above are illustrative, they do not limit the invention, and the invention is not restricted to the specific embodiments above. Other embodiments obtained by those skilled in the art under the teaching of the invention, without departing from its principles, are all considered to be within the scope of protection of the invention.

Claims (4)

1. A multi-source information fusion knowledge graph representation learning method based on adaptive weights, characterized by comprising the following steps:
Step 1. Use an adaptive weight to balance the fusion of textual and structured information, and define an overall score function f(h, r, t) that links the textual and structured information:

$$f(h,r,t) = (1-\lambda)\left(\lVert h_d + r - t_d \rVert + \lVert h_d + r - M_{rt} t_S \rVert + \lVert M_{rh} h_S + r - t_d \rVert\right) + \lambda \lVert M_{rh} h + r + M_{rt} t \rVert$$

where λ is the weight; h is the head entity, t is the tail entity, and r is the relation between h and t; h_d and t_d are the text-based representations of the head and tail entities; h_S and t_S are their structure-based representations; M_rh is the projection matrix defined for the head entity; and M_rt is the projection matrix defined for the tail entity.
Step 2. Based on the overall score function f(h, r, t) defined in Step 1, construct a loss function for the adaptive-weight fusion of textual and structured information, and learn vector representations of the entities and relations by minimizing this loss function, thereby reaching the optimization objective.

2. The method according to claim 1, characterized in that in Step 1 the weight λ takes values in the range λ ∈ (0, 1).

3. The method according to claim 1, characterized in that in Step 2 the loss function is minimized by stochastic gradient descent.

4. The method according to claim 1, characterized in that in Step 2 the constructed loss function L is:

$$L = \sum_{(h,r,t) \in S} \sum_{(h',r,t') \in S'} \left[ f(h,r,t) + \gamma - f(h',r,t') \right]_+$$

where [f(h,r,t) + γ − f(h',r,t')]_+ = max(0, f(h,r,t) + γ − f(h',r,t')); γ is a preset margin; (h, r, t) is a triple of the knowledge graph, i.e. a positive triple, in which h is the head entity, t is the tail entity and r is the relation between them; f(h, r, t) is the score of the positive triple; S is the set of positive triples; (h', r, t') is a negative triple constructed by randomly replacing the head entity h or the tail entity t; f(h', r, t') is the score of the negative triple; and S' is the set of negative triples.
CN201810563786.6A 2018-06-04 2018-06-04 Multi-source information fusion knowledge graph representation learning method based on adaptive weights Active CN109033129B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810563786.6A CN109033129B (en) 2018-06-04 2018-06-04 Multi-source information fusion knowledge graph representation learning method based on adaptive weights

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810563786.6A CN109033129B (en) 2018-06-04 2018-06-04 Multi-source information fusion knowledge graph representation learning method based on adaptive weights

Publications (2)

Publication Number Publication Date
CN109033129A true CN109033129A (en) 2018-12-18
CN109033129B CN109033129B (en) 2021-08-03

Family

ID=64612062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810563786.6A Active CN109033129B (en) 2018-06-04 2018-06-04 Multi-source information fusion knowledge graph representation learning method based on adaptive weights

Country Status (1)

Country Link
CN (1) CN109033129B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109739996A (en) * 2018-12-29 2019-05-10 北京航天数据股份有限公司 Construction method and device for an industry knowledge graph
CN110232186A (en) * 2019-05-20 2019-09-13 浙江大学 Knowledge graph representation learning method fusing entity descriptions, hierarchical types and textual relation information
CN110263324A (en) * 2019-05-16 2019-09-20 华为技术有限公司 Text processing method, model training method and device
CN110275959A (en) * 2019-05-22 2019-09-24 广东工业大学 A Fast Learning Method for Large-Scale Knowledge Base
CN111159485A (en) * 2019-12-30 2020-05-15 科大讯飞(苏州)科技有限公司 Tail entity linking method, device, server and storage medium
WO2020135048A1 (en) * 2018-12-29 2020-07-02 颖投信息科技(上海)有限公司 Data merging method and apparatus for knowledge graph
CN111488402A (en) * 2020-03-26 2020-08-04 天津大学 A Representation Learning Method with Hierarchical Relational Structure Knowledge Graph
CN111680109A (en) * 2020-04-22 2020-09-18 北京三快在线科技有限公司 Knowledge graph representation learning model training method and device and electronic equipment
CN111881290A (en) * 2020-06-17 2020-11-03 国家电网有限公司 A multi-source grid entity fusion method for distribution network based on weighted semantic similarity
CN112596031A (en) * 2020-12-22 2021-04-02 电子科技大学 Target radar threat degree assessment method based on knowledge graph
CN113032582A (en) * 2021-04-20 2021-06-25 杭州叙简科技股份有限公司 Knowledge graph based entity unified model establishment and entity unified method
CN113312487A (en) * 2021-01-16 2021-08-27 江苏网进科技股份有限公司 Knowledge representation learning method facing legal text based on TransE model
CN113590843A (en) * 2021-08-06 2021-11-02 中国海洋大学 Knowledge representation learning method fusing molecular structure characteristics
CN115935968A (en) * 2023-01-04 2023-04-07 北京工业大学 Knowledge graph embedding method based on fusion embedding of semantic and relational structure

Citations (5)

Publication number Priority date Publication date Assignee Title
US20110137919A1 (en) * 2009-12-09 2011-06-09 Electronics And Telecommunications Research Institute Apparatus and method for knowledge graph stabilization
CN106250412A (en) * 2016-07-22 2016-12-21 Zhejiang University The knowledge mapping construction method merged based on many source entities
CN106886543A (en) * 2015-12-16 2017-06-23 Tsinghua University The knowledge mapping of binding entity description represents learning method and system
CN107273490A (en) * 2017-06-14 2017-10-20 Beijing University of Technology A kind of combination mistake topic recommendation method of knowledge based collection of illustrative plates
CN107885760A (en) * 2016-12-21 2018-04-06 Guilin University of Electronic Technology It is a kind of to represent learning method based on a variety of semantic knowledge mappings

Non-Patent Citations (1)

Title
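Xu Zenglin et al.: "A Survey of Knowledge Graph Techniques" (知识图谱技术综述), Journal of University of Electronic Science and Technology of China *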

Cited By (19)

Publication number Priority date Publication date Assignee Title
WO2020135048A1 (en) * 2018-12-29 2020-07-02 Yingtou Information Technology (Shanghai) Co., Ltd. Data merging method and apparatus for knowledge graph
CN109739996A (en) * 2018-12-29 2019-05-10 Beijing Aerospace Data Co., Ltd. A kind of construction method and device of industry knowledge mapping
CN109739996B (en) * 2018-12-29 2020-12-25 Beijing Aerospace Data Co., Ltd. Construction method and device of industrial knowledge map
CN110263324A (en) * 2019-05-16 2019-09-20 Huawei Technologies Co., Ltd. Text handling method, model training method and device
US12204859B2 (en) 2019-05-16 2025-01-21 Huawei Technologies Co., Ltd. Text processing method, model training method, and apparatus
CN110232186A (en) * 2019-05-20 2019-09-13 Zhejiang University The knowledge mapping for merging entity description, stratification type and text relation information indicates learning method
CN110275959A (en) * 2019-05-22 2019-09-24 Guangdong University of Technology A Fast Learning Method for Large-Scale Knowledge Base
CN111159485A (en) * 2019-12-30 2020-05-15 iFLYTEK (Suzhou) Technology Co., Ltd. Tail entity linking method, device, server and storage medium
CN111159485B (en) * 2019-12-30 2020-11-13 iFLYTEK (Suzhou) Technology Co., Ltd. Tail entity linking method, device, server and storage medium
CN111488402A (en) * 2020-03-26 2020-08-04 Tianjin University A Representation Learning Method with Hierarchical Relational Structure Knowledge Graph
CN111680109B (en) * 2020-04-22 2024-03-29 Beijing Sankuai Online Technology Co., Ltd. Knowledge graph representation learning model training method and device and electronic equipment
CN111680109A (en) * 2020-04-22 2020-09-18 Beijing Sankuai Online Technology Co., Ltd. Knowledge graph representation learning model training method and device and electronic equipment
CN111881290A (en) * 2020-06-17 2020-11-03 State Grid Corporation of China A multi-source grid entity fusion method for distribution network based on weighted semantic similarity
CN112596031A (en) * 2020-12-22 2021-04-02 University of Electronic Science and Technology of China Target radar threat degree assessment method based on knowledge graph
CN113312487A (en) * 2021-01-16 2021-08-27 Jiangsu Wangjin Technology Co., Ltd. Knowledge representation learning method facing legal text based on TransE model
CN113032582A (en) * 2021-04-20 2021-06-25 Hangzhou Xujian Technology Co., Ltd. Knowledge graph based entity unified model establishment and entity unified method
CN113590843A (en) * 2021-08-06 2021-11-02 Ocean University of China Knowledge representation learning method fusing molecular structure characteristics
CN113590843B (en) * 2021-08-06 2023-06-23 Ocean University of China A Knowledge Representation Learning Method Integrating Molecular Structure Features
CN115935968A (en) * 2023-01-04 2023-04-07 Beijing University of Technology Knowledge graph embedding method based on fusion embedding of semantic and relational structure

Also Published As

Publication number Publication date
CN109033129B (en) 2021-08-03

Similar Documents

Publication Publication Date Title
CN109033129A (en) Multi-source Information Fusion knowledge mapping based on adaptive weighting indicates learning method
CN112131404B (en) Entity alignment method in four-risk one-gold domain knowledge graph
CN111737552B (en) Method, device and equipment for training information extraction model and obtaining knowledge graph
CN109376242B (en) Text classification method based on cyclic neural network variant and convolutional neural network
CN107885760A (en) It is a kind of to represent learning method based on a variety of semantic knowledge mappings
CN109977234A (en) A kind of knowledge mapping complementing method based on subject key words filtering
CN108763376B (en) A Knowledge Representation Learning Method Integrating Relation Path, Type, and Entity Description Information
CN105630901A (en) Knowledge graph representation learning method
CN108763326A (en) A kind of sentiment analysis model building method of the diversified convolutional neural networks of feature based
CN108197290A (en) A kind of knowledge mapping expression learning method for merging entity and relationship description
CN114676687B (en) Aspect-level emotion classification method based on enhanced semantic syntax information
CN107608953B (en) A word vector generation method based on variable-length context
CN116578708B (en) Paper data name disambiguation method based on graph neural network
CN114580638A (en) Knowledge Graph Representation Learning Method and System Based on Text Graph Enhancement
CN110264372B (en) Topic community discovery method based on node representation
CN108052625A (en) A kind of entity sophisticated category method
CN112000689B (en) A multi-knowledge graph fusion method based on text analysis
CN115438197B (en) Method and system for complementing relationship of affair knowledge graph based on double-layer heterogeneous graph
CN107145514A (en) Chinese sentence pattern sorting technique based on decision tree and SVM mixed models
CN114444506B (en) Relation triplet extraction method for fusing entity types
CN113722439A (en) Cross-domain emotion classification method and system based on antagonism type alignment network
CN115936115A (en) Knowledge Graph Embedding Method Based on Graph Convolution Contrastive Learning and XLNet
CN115131605A (en) Structure perception graph comparison learning method based on self-adaptive sub-graph
CN115809346A (en) A small-sample knowledge map completion method based on multi-view semantic enhancement
CN117035008A (en) Image text matching method based on graphic neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 2018-12-18

Assignee: Guilin Biqi Information Technology Co., Ltd.

Assignor: Guilin University of Electronic Technology

Contract record no.: X2023980045831

Title of invention: A Learning Method for Knowledge Graph Representation of Multi-source Information Fusion Based on Adaptive Weighting

Granted publication date: 2021-08-03

License type: Common License

Record date: 2023-11-07
