CN107291687A

CN107291687A - Chinese unsupervised open entity relationship extraction method based on dependency semantics

Info

Publication number: CN107291687A
Application number: CN201710285995.4A
Authority: CN
Inventors: 向阳; 贾圣宾; 鄂世嘉; 吕东东
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2017-04-27
Filing date: 2017-04-27
Publication date: 2017-10-24
Anticipated expiration: 2037-04-27
Also published as: CN107291687B

Abstract

The invention relates to a Chinese unsupervised open entity relationship extraction method based on dependency semantics. The method includes the following steps: preprocessing the input text by performing Chinese word segmentation, part-of-speech tagging and dependency parsing; performing named entity recognition on the input text; selecting any two of the recognized entities to form a candidate entity pair; finding the dependency path between the two entities of the candidate pair; and determining whether the syntactic structure mapped by the dependency path matches a normal form in the set of dependency semantic normal forms. If it does, words or phrases are extracted from the rest of the input text as relation words according to the matched normal form, and the relation words and the candidate entity pair form a relation triple; otherwise, matching proceeds to the next candidate entity pair. Finally, the relation triples are output. Compared with the prior art, the invention has low computational complexity and high extraction efficiency, overcomes distance and position restrictions, and can perform extraction on even a single sentence.

Description

A Chinese Unsupervised Open Entity Relationship Extraction Method Based on Dependency Semantics

Technical Field

The invention relates to information extraction research in the fields of artificial intelligence and natural language processing, and in particular to a Chinese unsupervised open entity relationship extraction method based on dependency semantics.

Background

The wave of big data is surging in like the tidal bore of the Qiantang River, and the data accumulated on the Internet is growing explosively. Faced with the massive amount of information on the web, users find it very difficult to quickly locate the information they care about. A traditional search engine can only return a large number of web pages related to the query, and users must then browse those pages to find what they need. Simply returning web pages no longer meets the practical needs of users facing massive network data. The Internet offers an inexhaustible source of information, and how to mine valuable information from it automatically, quickly and accurately has become a focus of attention.

Information extraction technology emerged in response. Its task is to output the unstructured information contained in text in a structured or semi-structured form, so that the content users really care about can be obtained quickly and intelligent, user-friendly information services can be provided. For example, from news reports of a plane crash, information such as the people involved, time, location, number of casualties and cause of the accident can be extracted so that users can quickly grasp what happened. Named entity relation extraction, also called entity relation extraction or relation extraction, is a core subtask of information extraction. It extracts the semantic relations between related named entities from unstructured natural language text and organizes them into structured relation triples (Entity1, RelationWords, Entity2), where Entity1 and Entity2 are a pair of related entities and RelationWords is a word or word sequence describing the semantic relation between them.

Entity relation extraction has important research value and is widely applied in many areas of artificial intelligence, such as knowledge graphs, intelligent search engines, automatic question answering systems, text mining and machine translation.

Traditional information extraction identifies target relation types with a trained extractor, which requires predefined relation types and a large annotated training corpus. Traditional Chinese relation extraction is based on supervised machine learning, mainly feature-based methods and kernel-based methods. Such methods have several shortcomings. First, it is very difficult to define a comprehensive system of entity relation types. Second, they rely heavily on large annotated training corpora; manual annotation is time-consuming and laborious, and its quality is hard to control. Finally, open web text is massive and its relation types cannot be defined in advance, so traditional methods cannot meet the needs of open-domain information extraction. Open entity relation extraction overcomes these drawbacks and can automatically discover arbitrary relation types in web text, which gives it important development prospects and research value.

Research on open relation extraction has mainly applied clustering algorithms. Candidate entity pairs are extracted under position and distance restrictions and clustered into groups of similar entity pairs; each cluster is then given a relation label by selecting a representative word as the relation descriptor of that class. This approach has two problems. The clustering algorithm needs a considerable number of related entity pairs, so it cannot produce useful results for a single entity pair or a small number of pairs, and its performance degrades severely when the training corpus is insufficient. It is also hard to determine whether the final core relation word is a valid relation feature word, and the descriptor chosen for a cluster does not necessarily suit every entity pair in that cluster. In addition, some researchers have studied methods based on deep syntactic analysis or semantic role labeling with good results, but this work has focused mainly on English corpora.

Research on open relation extraction for English corpora has achieved remarkable results, but there has been relatively little work on Chinese corpora. Chinese has its own flexibility and complexity in word formation, sentence construction and expression, and is considerably harder to handle than English, so existing English entity relation extraction systems cannot be applied directly to Chinese corpora. Chinese morphology and syntax must be studied carefully and incorporated into entity relation extraction in order to obtain an extraction system suited to Chinese.

Studies have found that, in entity relation extraction, a pair of related entities usually stands in a certain syntactic relation. For example, if two entities are the subject and the object of a sentence, the relation feature word of the pair is very likely the predicate verb. If the syntactic relation between the entities is known in advance, the relation feature word can be determined fairly accurately. Dependency parsing reflects the semantic modification relations between the components of a sentence. Since a named entity in a sentence necessarily appears in the dependency structure as a noun phrase, the dependency path between two entities necessarily reflects the relation features of that entity pair.

In summary, to make entity relation extraction better suited to Chinese corpora, to build on the syntactic and semantic features specific to Chinese, and to fully demonstrate the adaptability and effectiveness of unsupervised methods in the open domain, the invention proposes an unsupervised Chinese open relation extraction method based on Dependency Semantic Normal Forms (DSNFs), contributing an innovative result to research on Chinese open relation extraction.

Summary of the Invention

The purpose of the invention is to overcome the above defects of the prior art by providing a Chinese unsupervised open entity relationship extraction method based on dependency semantics. The invention aims to avoid the drawbacks of traditional extraction methods, namely demanding training corpus requirements, poor portability and scalability, and inability to adapt to open web text. Because the complexity and flexibility of Chinese morphology and grammar prevent extraction methods developed for English corpora from being transferred to Chinese, the invention proposes an open, unsupervised entity relation extraction method for web text that is grounded in the characteristics of the Chinese language.

To solve the above technical problems, the invention takes the mapping between entity relations and the dependency parse tree as its basis, mines the dependency semantics implied by the shortest dependency path, and uses features such as dependency relations, part-of-speech information and positional relations as constraints to obtain the dependency semantic normal forms. On this basis it proposes and implements a novel unsupervised Chinese open relation extraction method.

The purpose of the invention can be achieved through the following technical solution:

A Chinese unsupervised open entity relationship extraction method based on dependency semantics, comprising the following steps:

S1. Preprocess the input text: perform Chinese word segmentation, part-of-speech tagging and dependency parsing on the input text;

S2. Perform named entity recognition on the input text;

S3. Select any two of the recognized entities to form a candidate entity pair;

S4. Find the dependency path between the two entities of the candidate entity pair;

S5. Determine whether the syntactic structure mapped by the dependency path between the two entities of the candidate pair matches a normal form in the set of dependency semantic normal forms; if so, extract words or phrases from the rest of the input text as relation words according to the matched normal form, the extracted relation words and the candidate entity pair forming a relation triple; if not, proceed to match the next candidate entity pair;

S6. Output the relation triples.

The relation triple has the form (Entity1, RelationWords, Entity2), where Entity1 and Entity2 are a pair of related entities and RelationWords is a word or phrase describing the semantic relation between them.
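The control flow of steps S3 to S6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `Triple`, `Matcher` and `extract_triples` are hypothetical names, and each matcher stands in for one normal form, returning the relation words for a candidate pair or `None` when the pair does not match.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Triple:
    """Relation triple (Entity1, RelationWords, Entity2)."""
    entity1: str
    relation_words: str
    entity2: str


# Hypothetical matcher type: one callable per normal form.
Matcher = Callable[[str, str], Optional[str]]


def extract_triples(entities: List[str], matchers: Iterable[Matcher]) -> List[Triple]:
    """Pair the entities (S3), try each normal form (S5), collect triples (S6)."""
    matchers = list(matchers)
    triples: List[Triple] = []
    # combinations() yields every unordered pair once, in textual order.
    for e1, e2 in combinations(entities, 2):
        for match in matchers:
            relation = match(e1, e2)
            if relation is not None:
                triples.append(Triple(e1, relation, e2))
                break  # first matching normal form wins in this sketch
    return triples
```

In a fuller version each matcher would close over the parsed sentence, so that it can inspect the dependency path between the two entities (S4) before deciding.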

The dependency semantic normal forms comprise a first class of pre-modification structures, a second class of coordinate structures, a third class of verb-related structures, a fourth class of templated structures, and other structures.

The first class, pre-modification structures, includes the combined attributive structure and the structure in which the structural particle "的" connects the attributive to the head. The combined attributive structure corresponds to the normal form "Entity1+AttWord1(+AttWord2)+Entity2". The "的" structure corresponds to the normal form "Entity1+的+Noun+Entity2" or "Entity1+的+Entity2+Noun". Here Entity1 and Entity2 are a pair of related entities, AttWord1 and AttWord2 are different attributive words, and Noun is a noun.

The second class, coordinate structures, includes coordinate noun structures and coordinate verb structures.

The coordinate noun structures include coordinated entities as subject, coordinated entities as object of the predicate, coordinated entities as object of a preposition, and mixtures of these three. Coordinated entities as subject correspond to the normal form "Entity2+Conj+(Entity1++)+Pred+Entity3"; coordinated entities as object of the predicate correspond to "Entity2+Pred+Entity3+Conj+(Entity1++)"; coordinated entities as object of a preposition correspond to "Entity2+Prep+Entity3+Conj+(Entity1++)+Pred(+Dobj)". Here Entity2 and Entity3 are a pair of related entities, (Entity1++) denotes one or more coordinated entities, Conj is a conjunction, Pred is a predicate, Prep is a preposition, and Dobj is a direct object.

The coordinate verb structures include serial verb structures and coordinate compound sentence structures.

The third class, verb-related structures, includes the subject-predicate/verb-object structure and the subject-predicate/preposition-object structure. The former corresponds to the normal form "Entity1+Pred+Entity2" and the latter to "Entity1+Prep+Entity2+Pred(+Dobj)", where Entity1 and Entity2 are a pair of related entities, Pred is a predicate, Prep is a preposition, and Dobj is a direct object.

Compared with the prior art, the invention has the following advantages:

1) The proposed method is well equipped to handle complex Chinese syntax. During extraction there is no need to restrict the relative positions of the entity pair and the relation words, which avoids the drawbacks of position restrictions in traditional methods;

2) The proposed method yields richer results: it can extract relation phrases centered on either verbs or nouns, whereas some other well-performing extractors can only extract verbs as relation words;

3) The proposed method recognizes long-span dependencies well. In coordinate structures in particular, it can extract co-occurring relation triples, which avoids the drawbacks of distance restrictions in traditional methods;

4) The proposed method requires no training corpus and can perform relation extraction on a single sentence. It has low computational complexity and high extraction efficiency and can meet strict real-time requirements.

Brief Description of the Drawings

Figure 1 is a flow chart of the extraction method of the invention;

Figure 2 is the graph model of dependency semantic normal form DSNF1;

Figure 3 is the graph model of dependency semantic normal form DSNF2;

Figure 4 is the graph model of dependency semantic normal form DSNF3;

Figure 5 is the graph model of dependency semantic normal form DSNF4;

Figure 6 is the graph model of dependency semantic normal form DSNF5;

Figure 7 is the graph model of dependency semantic normal form DSNF6;

Figure 8 is the graph model of dependency semantic normal form DSNF7;

Figure 9 is the graph model of dependency semantic normal form DSNF8;

Figure 10 is the graph model of dependency semantic normal form DSNF9.

Detailed Description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

Embodiment

The Chinese unsupervised open entity relationship extraction method based on dependency semantics proposed by the invention is an entity relation extraction method based on dependency semantic normal forms (DSNFs). It performs extraction automatically, without manual intervention: the input is a raw, unprocessed natural language sentence, and the output is a set of entity relation triples. As shown in Figure 1, the whole process is as follows:

Step 1: Preprocess the input text. Each sentence undergoes a series of natural language processing operations, including word segmentation, part-of-speech tagging and dependency parsing, in preparation for the subsequent steps. The proposed method performs these operations with the natural language processing tools of the Language Technology Platform (LTP) developed by the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology.

Step 2: Select candidate entity pairs. A named entity recognition module recognizes the entities in the input text, and all recognized candidate entities are then paired two by two. The method combines the named entity recognition provided by LTP with an iterative heuristic. The heuristic obtains maximal noun phrases by merging adjacent nouns whose part of speech is restricted to {ni, nh, ns, nz, j}, denoting organization names, person names, place names, other proper nouns and abbreviations, respectively. The two approaches complement each other and are applied together.
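The noun-merging heuristic and the pairing in Step 2 can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the text calls the heuristic iterative, while this version merges runs of adjacent tokens in a single pass; the function names `merge_entity_nouns` and `candidate_pairs` are hypothetical; and the segmentation and tags in the usage example are assigned by hand rather than produced by LTP.

```python
from itertools import combinations
from typing import List, Tuple

# POS tags that the text allows inside a merged entity phrase.
ENTITY_POS = {"ni", "nh", "ns", "nz", "j"}


def merge_entity_nouns(words: List[str], pos_tags: List[str]) -> List[str]:
    """Join each maximal run of adjacent entity-type nouns into one phrase."""
    phrases: List[str] = []
    current: List[str] = []
    for word, pos in zip(words, pos_tags):
        if pos in ENTITY_POS:
            current.append(word)
        elif current:
            phrases.append("".join(current))
            current = []
    if current:  # flush a run that reaches the end of the sentence
        phrases.append("".join(current))
    return phrases


def candidate_pairs(entities: List[str]) -> List[Tuple[str, str]]:
    """Pair all recognized entities two by two."""
    return list(combinations(entities, 2))
```

For the hand-tagged tokens `["上海", "某大学", "邀请", "张某"]` with tags `["ns", "ni", "v", "nh"]`, the first two tokens merge into one phrase and a single candidate pair results.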

Step 3: Match the dependency semantic normal forms. For each candidate entity pair obtained in Step 2, determine whether the syntactic structure mapped by the shortest dependency path between the two entities matches some DSNF.

Step 4: Output the relation triples. After Step 3, if a match is found, the relation words are extracted and the relation triple is output; if not, matching proceeds to the next candidate entity pair.
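Step 3 relies on the shortest dependency path between two entities. In a dependency tree this path runs from one token up to the lowest common ancestor and down to the other token. The sketch below shows one way to compute it; it assumes the parse is given as a list `heads` in which `heads[i]` is the index of the head of token `i` and the root has head `-1`. This representation and the function names are illustrative choices, not taken from the patent.

```python
from typing import List


def path_to_root(heads: List[int], i: int) -> List[int]:
    """Token indices from token i up to the root, inclusive."""
    path = [i]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def shortest_dependency_path(heads: List[int], a: int, b: int) -> List[int]:
    """Token indices on the tree path from a to b, endpoints included."""
    up_a = path_to_root(heads, a)
    up_b = path_to_root(heads, b)
    depth_in_b = {node: k for k, node in enumerate(up_b)}
    for k, node in enumerate(up_a):
        if node in depth_in_b:  # lowest common ancestor reached
            down = up_b[:depth_in_b[node]][::-1]
            return up_a[:k + 1] + down
    raise ValueError("tokens do not belong to the same tree")
```

For a five-token sentence with `heads = [3, 3, 1, -1, 3]`, the path from token 0 to token 2 passes through the root token 3 and then token 1. The dependency labels and parts of speech along such a path are what a normal form is matched against.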

The core of the proposed method is the dependency semantic normal form, which is described in detail below.

Statistical analysis of a large number of relation instances shows that relation triples consistently appear in certain fixed syntactic structures. The syntactic structures that characterize entity relations include subject-predicate relations, verb-object relations, preposition-object relations, coordinate constituents and modification relations. Mapping these structures onto the dependency tree yields the dependency semantic normal forms (DSNFs). A DSNF is a combination of a word sequence, parts of speech, a dependency path and the associated dependency labels. The method divides the set of normal forms into five classes: pre-modification, coordination, verb-related, templated, and other. Each class yields one or more DSNFs, which provide a sound basis for relation extraction.

1. Pre-Modification Class (PreMod)

Pre-modification is a very important type of modification in Chinese phrases. In Chinese linguistics, a relation expressed in the PreMod syntactic class is a modifier-head structure, which forms a modifier-head phrase. A modifier-head phrase consists of a head paired with a modifier, and the attributive is the modifier that precedes the head in a nominal modifier-head phrase. A wide range of constituents can serve as attributives: apart from adverbs and "的"-phrases, all kinds of content words (nouns, verbs and adjectives) and phrases can do so. Attributives are also complex because they can be layered: several attributives, each restricting or describing the head from a different aspect, can be stacked before a single head.

In terms of formal structure, attributives fall into the following two types:

1) Combined attributives are attached directly before the head with no "的" in between, that is, "attributive + head". For example, in "<ORG>高二3班</ORG>班主任<PER>王某</PER>" (Wang, head teacher of Class 3, Senior Grade 2), "高二3班" (Class 3, Senior Grade 2) is the attributive of "班主任" (head teacher), and "高二3班班主任" is the attributive of "王某" (Wang). "班主任" also expresses the semantic relation between the entities "高二3班" and "王某", giving the relation triple (高二3班, 班主任, 王某). Because attributives can be layered, several words may together serve as the relation feature words. For example, from "<ORG>某公司</ORG>首席执行官<PER>赵某</PER>" (Zhao, chief executive officer of a certain company) the relation (某公司, 首席执行官, 赵某) can be extracted, where "首席" (chief) and "执行官" (executive officer) combine to form the relation feature words. PER denotes a person name and ORG an organization name.

In dependency analysis, the combined attributive structure appears as follows: the attributive depends on the head with the dependency relation "attribute" (定中关系); if there are several layers of attributives, an attributive farther from the head depends either on an attributive nearer the head or directly on the head, again with the "attribute" relation. Statistical study shows that, in practice, relation extraction mainly needs to consider structures with two or three layers of attributives, which gives the relation extraction normal form DSNF1: "Entity1+AttWord1(+AttWord2)+Entity2". Its dependency analysis is shown in Figure 2. Part-of-speech restrictions also apply: only attributive words (AttWord1, AttWord2) that are nouns are considered. Relation extraction is performed only if one of two conditions holds: either "AttWord1" is an occupation-related noun (mainly nouns related to organizations and work, such as chairman, general manager or county magistrate), or "AttWord1" is a common noun (as opposed to an occupation-related noun) and "Entity2" is a person entity.
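A DSNF1 check along these lines could look like the following Python sketch. Several details are assumptions made for illustration only: tokens carry LTP-style fields, the "attribute" relation is written with the label `ATT`, person entities carry the tag `nh`, ordinary nouns carry the tag `n`, and `OCCUPATION_NOUNS` is a tiny hypothetical lexicon standing in for whatever list of occupation-related nouns a real system would use.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    word: str
    pos: str     # part-of-speech tag, e.g. "n", "nh", "ni"
    head: int    # index of the head token, -1 for the root
    deprel: str  # dependency label; "ATT" assumed for the attribute relation


# Hypothetical lexicon of occupation-related nouns.
OCCUPATION_NOUNS = {"班主任", "董事长", "总经理", "县长"}


def match_dsnf1(tokens: List[Token], e1: int, e2: int) -> Optional[Tuple[str, str, str]]:
    """Match Entity1+AttWord1(+AttWord2)+Entity2 for entity tokens e1 < e2."""
    if e2 - e1 not in (2, 3):  # one or two attributive words in between
        return None
    for i in range(e1, e2):
        t = tokens[i]
        # Each attributive depends on a nearer attributive or on the head.
        if t.deprel != "ATT" or not (i < t.head <= e2):
            return None
    att_words = tokens[e1 + 1:e2]
    if any(t.pos != "n" for t in att_words):  # attributive words must be nouns
        return None
    is_occupation = att_words[0].word in OCCUPATION_NOUNS
    entity2_is_person = tokens[e2].pos == "nh"
    if not (is_occupation or entity2_is_person):
        return None
    relation = "".join(t.word for t in att_words)
    return (tokens[e1].word, relation, tokens[e2].word)
```

Applied to a hand-built parse of "高二3班班主任王某", in which "高二3班" depends on "班主任" and "班主任" depends on "王某", the function returns the triple (高二3班, 班主任, 王某).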

2) Attributives connected to the head by the structural particle "的", that is, "attributive + 的 + head". For example, from "<PER1>张某</PER1>的妻子<PER2>孙某</PER2>" (Zhang's wife Sun) the relation tuple (张某, 妻子, 孙某) can be extracted. Similarly, "<ORG>某大学</ORG>的<PER>裴某某</PER>老师" and "<ORG>某大学</ORG>的老师<PER>裴某某</PER>" (both: Pei, a teacher at a certain university) differ in structure but express the same meaning. These give two relation extraction normal forms, DSNF2 and DSNF3: "Entity1+的+Noun+Entity2" or "Entity1+的+Entity2+Noun". From either structure the relation triple (Entity1, Noun, Entity2) can be extracted. Their dependency analyses are shown in Figures 3 and 4.
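The two "的" patterns can be illustrated with a surface-level Python sketch. It checks only the word order and parts of speech; the dependency constraints of Figures 3 and 4 are omitted because the figures are not reproduced in the text. The function name `match_de_patterns`, the use of the tag `n` for the noun, and the labeling of the two branches as DSNF2 and DSNF3 in the order the patterns are listed are all assumptions.

```python
from typing import List, Set, Tuple


def match_de_patterns(words: List[str], pos_tags: List[str],
                      entity_idx: Set[int]) -> List[Tuple[str, str, str]]:
    """Find Entity1+的+Noun+Entity2 and Entity1+的+Entity2+Noun sequences."""
    triples: List[Tuple[str, str, str]] = []
    for i in range(len(words) - 3):
        if i not in entity_idx or words[i + 1] != "的":
            continue
        a, b = i + 2, i + 3
        if a not in entity_idx and pos_tags[a] == "n" and b in entity_idx:
            # Entity1 + 的 + Noun + Entity2 (first listed pattern)
            triples.append((words[i], words[a], words[b]))
        elif a in entity_idx and b not in entity_idx and pos_tags[b] == "n":
            # Entity1 + 的 + Entity2 + Noun (second listed pattern)
            triples.append((words[i], words[b], words[a]))
    return triples
```

Both patterns produce a triple of the form (Entity1, Noun, Entity2), so "某大学的裴某某老师" and "某大学的老师裴某某" yield the same result.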

Relation extraction may also encounter modifier-head phrases that contain only one entity noun, such as "刘某某教师游览上海" (Teacher Liu visits Shanghai) or "小明的妻子是小红" (Xiao Ming's wife is Xiao Hong). Such phrases are usually embedded in other relation syntax classes. Here the entity acts as an attributive modifying the head, so in dependency parsing the entity no longer serves directly as subject or object; instead, the head it modifies becomes the main constituent of the syntactic structure. The extraction process takes this case into account by treating the head as a pseudo-entity (Pe) and converting accordingly during dependency analysis. For example, in "<PER>刘某某</PER><Pe-PER>教师</Pe-PER>游览<LOC>上海</LOC>", the relation "游览" (visit) between the pseudo-entity "教师" (teacher) and the entity "上海" (Shanghai) is extracted and then converted to output the relation triple (刘某某, 游览, 上海). This case is not discussed again in the analysis that follows. Pe-PER denotes a pseudo-entity of the person-name type.
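The pseudo-entity conversion can be sketched in Python as a two-step mapping: first record which head nouns stand in for an entity, then map the indices of an extracted triple back to the real entities. The names `find_pseudo_entities` and `restore_entities`, the LTP-style labels (`ATT`, `SBV`, `VOB`, `POB`) and the rule that the head must itself fill a subject or object slot are assumptions made for this illustration.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Token(NamedTuple):
    word: str
    pos: str
    head: int    # index of the head token, -1 for the root
    deprel: str  # LTP-style dependency label (assumed)


ARGUMENT_LABELS = {"SBV", "VOB", "POB"}  # subject, verb object, preposition object


def find_pseudo_entities(tokens: List[Token], entity_idx: Set[int]) -> Dict[int, int]:
    """Map each pseudo-entity index to the entity that modifies it."""
    pseudo: Dict[int, int] = {}
    for i in sorted(entity_idx):
        t = tokens[i]
        if t.deprel != "ATT" or t.head < 0 or t.head in entity_idx:
            continue
        head = tokens[t.head]
        if head.pos == "n" and head.deprel in ARGUMENT_LABELS:
            pseudo[t.head] = i
    return pseudo


def restore_entities(triple_idx: Tuple[int, int, int], tokens: List[Token],
                     pseudo: Dict[int, int]) -> Tuple[str, str, str]:
    """Replace pseudo-entity indices by the real entities and return words."""
    a, rel, b = triple_idx
    return (tokens[pseudo.get(a, a)].word, tokens[rel].word, tokens[pseudo.get(b, b)].word)
```

With a hand-built parse of "刘某某教师游览上海", the head noun "教师" is recorded as a pseudo-entity for "刘某某", and a triple found between "教师" and "上海" is output with "刘某某" as its first entity.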

2. Verbal Class (VERB)

In this class, one of the two related entities is usually in subject position and the other in object position, either as the object of a verb (verb-object structure) or as the object of a preposition (Prep) (preposition-object structure), and the relation between the entities can be expressed directly by a predicate (Pred). Depending on the type of object, the class divides further into the "subject-predicate/verb-object" structure and the "subject-predicate/preposition-object" structure.

1) "Subject-predicate/verb-object" structure. For example, in "<PER>刘某某</PER>游览<LOC>上海</LOC>" (Liu visits Shanghai), "刘某某" is the subject, "上海" is the object, and "游览" is the predicate verb that links the two entities, so the triple (刘某某, 游览, 上海) can be extracted. When this structure is mapped onto the dependency graph, both entities depend on the core verb, with the dependency relations "subject-verb" and "verb-object" respectively. This gives the relation extraction normal form DSNF4: "Entity1+Pred+Entity2", from which the relation triple (Entity1, Pred, Entity2) can be extracted. The dependency analysis is shown in Figure 5. LOC denotes a place name.
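A DSNF4 check reduces to verifying that the two entities hang off the same verb as subject and object. The Python sketch below assumes LTP-style labels `SBV` (subject-verb) and `VOB` (verb-object) and the tag `v` for verbs; the `Token` type and the function name `match_dsnf4` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    word: str
    pos: str
    head: int    # index of the head token, -1 for the root
    deprel: str  # LTP-style dependency label (assumed)


def match_dsnf4(tokens: List[Token], e1: int, e2: int) -> Optional[Tuple[str, str, str]]:
    """Match Entity1+Pred+Entity2: both entities depend on the same verb."""
    subj, obj = tokens[e1], tokens[e2]
    if subj.deprel != "SBV" or obj.deprel != "VOB":
        return None
    if subj.head < 0 or subj.head != obj.head:
        return None
    pred = tokens[subj.head]
    if pred.pos != "v":
        return None
    return (subj.word, pred.word, obj.word)
```

For a hand-built parse of "刘某某游览上海" the function returns (刘某某, 游览, 上海); calling it with the entities in the opposite order returns `None`, since "上海" is not the subject.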

2) For the "subject-predicate/preposition-object" structure, consider "<PER>Liu XX</PER> makes an in-depth tour of <LOC>Shanghai</LOC>" (刘某某对上海进行深度游). The subject is the entity "Liu XX" and the verb "makes" (进行) is the predicate of the sentence; the subject entity depends on the predicate verb with the dependency relation "subject-predicate". "Of Shanghai" (对上海) forms a preposition-object phrase: the entity "Shanghai" depends on the preposition 对 with the dependency relation "preposition-object", and the preposition 对 in turn depends on the predicate verb with the relation "adverbial-head". The noun phrase "in-depth tour" (深度游) is the direct object of the predicate, so the relation tuple (Liu XX, make in-depth tour, Shanghai) can be extracted. Note that because Entity2 sits inside the preposition-object phrase, it is related to the predicate verb only indirectly, through the preposition. To give the extraction result clearer semantics, this document therefore uses the predicate together with its direct object (Dobj) as the relation feature word. The "subject-predicate/preposition-object" structure maps to the relation extraction normal form DSNF5: "Entity1+Prep+Entity2+Pred(+Dobj)", from which the relation triple (Entity1, Pred-Dobj, Entity2) can be extracted. The dependency analysis is shown in Figure 6.

In particular, for the "subject-predicate/preposition-object" structure, if the preposition is a passive marker such as 由 or 被 ("by"), the positions of Entity1 and Entity2 are swapped to form the relation triple (Entity2, Pred-Dobj, Entity1).
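DSNF5 and its passive-voice swap can be sketched together. Everything named here is an assumption for illustration: the function `match_dsnf5`, the labels `SBV`, `POB` (preposition-object), `ADV` (adverbial-head) and `VOB`, the plain string concatenation used for Pred-Dobj, and the passive example sentence, which does not appear in the source and is parsed by hand.

```python
from typing import Optional

# Each token is (text, head_index, relation); head_index is -1 for the root.
Parse = list[tuple[str, int, str]]

# Passive markers named in the text; the set could be extended.
PASSIVE_PREPS = {"由", "被"}

def match_dsnf5(parse: Parse, e1: int, e2: int) -> Optional[tuple[str, str, str]]:
    """Return the DSNF5 triple for the pair (e1, e2), or None if it does not fit."""
    text1, pred_idx, rel1 = parse[e1]
    text2, prep_idx, rel2 = parse[e2]
    if rel1 != "SBV" or rel2 != "POB" or pred_idx < 0 or prep_idx < 0:
        return None
    prep, prep_head, prep_rel = parse[prep_idx]
    # The preposition must attach to the same predicate as the subject.
    if prep_rel != "ADV" or prep_head != pred_idx:
        return None
    pred = parse[pred_idx][0]
    # Optional direct object of the predicate, appended to the relation word.
    dobj = next((t for t, h, r in parse if h == pred_idx and r == "VOB"), "")
    relation = pred + dobj
    if prep in PASSIVE_PREPS:
        return (text2, relation, text1)  # passive: swap the two entities
    return (text1, relation, text2)

# Hand-written parses (hypothetical, not parser output).
active = [("刘某某", 3, "SBV"), ("对", 3, "ADV"), ("上海", 1, "POB"),
          ("进行", -1, "HED"), ("深度游", 3, "VOB")]
passive = [("上海", 3, "SBV"), ("被", 3, "ADV"), ("刘某某", 1, "POB"),
           ("游览", -1, "HED")]

print(match_dsnf5(active, 0, 2))
print(match_dsnf5(passive, 0, 2))
```

A real parser may label the fronted patient of a 被 sentence differently from a plain subject, so the passive branch would need to be checked against the label set actually in use.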

3. Coordination Class (COOR)

Coordination is also very common in Chinese sentences. It expresses a mutual association between sentences or phrases, in which items are listed side by side or actions occur at the same time; coordinated constituents differ only in order, not in rank. The coordinated items may be different but related things, different aspects of the same thing, or different actions of the same subject. A coordinate phrase generally combines two or more nouns, verbs, adjectives, pronouns or quantifiers, and the constituent words are normally required to share the same part of speech. The words stand in a coordinate relation and are usually joined by the enumeration comma (、) or by conjunctions (Conj) such as 和, 及, 又, 与 and 并 (all roughly "and"). Relation extraction mainly considers two kinds: coordinated nouns and coordinated verbs.

For example, in "<PER1>Liu XX</PER1> and <PER2>Peng XX</PER2> tour <ORG>Shanghai</ORG>" (刘某某和彭某某游览上海), "Liu XX" and "Peng XX" are two coordinated nouns. When two entities are coordinated in a noun phrase like this, they perform the same action and act on the same third entity. In the example, the relation triple (Liu XX, tour, Shanghai) can be extracted; at the same time, "Peng XX", which is coordinated with "Liu XX", also stands in the "tour" relation to "Shanghai", so the relation tuple (Peng XX, tour, Shanghai) can be extracted as well. In practice the COOR syntax class exists only in dependence on other syntax classes: in the example above, the tuple (Liu XX, tour, Shanghai) belongs to the VERB syntax class. Because the entity "Peng XX" depends on the entity "Liu XX" with the dependency relation "coordination", any relation that holds for "Liu XX" also holds for "Peng XX". An entity can occupy three main syntactic positions, namely subject, predicate object, and prepositional object, which gives the following cases:

1) When the coordinated nouns serve as the subject, the relation extraction normal form DSNF6 is obtained: "Entity2+Conj+(Entity1++)+Pred+Entity3", where (Entity1++) denotes one or more coordinated entities (likewise below). From the relation triple (Entity2, Pred, Entity3), the triple (Entity1, Pred, Entity3) follows. The dependency relations are shown in Figure 7.

2) When the coordinated nouns serve as the predicate object, the relation extraction normal form DSNF7 is obtained: "Entity2+Pred+Entity3+Conj+(Entity1++)". From the relation triple (Entity2, Pred, Entity3), the triple (Entity2, Pred, Entity1) follows. The dependency relations are shown in Figure 8.

3) When the coordinated nouns serve as the prepositional object, the relation extraction normal form DSNF8 is obtained: "Entity2+Prep+Entity3+Conj+(Entity1++)+Pred(+Dobj)". From the relation triple (Entity2, Pred-Dobj, Entity3), the triple (Entity2, Pred-Dobj, Entity1) follows. The dependency relations are shown in Figure 9.

4) A mixture of the first three types. For example, the sentence "Classmates <PER1>Li XX</PER1> and <PER2>Zhang XX</PER2> together invited classmates <PER3>Zhang XX</PER3> and <PER4>Gao XX</PER4> in <ORG1>Shanghai</ORG1> and <ORG2>Hangzhou</ORG2>, respectively." combines all three.

Coordinated verbs mainly describe two different actions performed simultaneously by the same subject. There are two cases:

1) The first case is serial verbs. In Chinese, when a single verb cannot fully describe an action, two verbs are often used together: the first verb supplements the second, and the second is a transitive verb. The second verb, which is closer to the object, is therefore generally extracted as the relation feature word. For example, in "<PER>Zhang XX</PER> treads through snow to tour <LOC>Mount Lu</LOC>" (张某某踏雪游览庐山), "treads through snow" (踏雪) and "tour" (游览) are coordinated, and the relation (Zhang XX, tour, Mount Lu) can be extracted.

2) The second case is the coordinate compound sentence, in which several clauses of the compound sentence are semantically equal and parallel. If two or more events are simply listed side by side and have no causal connection, they can form a coordinate compound sentence. The clauses are usually separated by a comma and a conjunction such as 并, 还 or 而且 ("and", "also", "moreover"). Take the example "<PER>Gao X</PER>, manager of <ORG1>a certain company</ORG1>, visited <ORG2>the factory building</ORG2> and delivered production guidance in <ORG3>a certain workshop</ORG3>." The comma splits the compound sentence into two clauses that express two events with the same subject, the entity "Gao X", so the two clauses are coordinated. The predicates "visit" (参观) and "deliver" (发表) of the coordinated clauses are themselves coordinated, with the dependency relation "coordination". In terms of dependency syntax: if Entity2 depends as object on predicate verb 2, verb 2 is coordinated with another verb 1 (dependency relation "coordination"), and Entity1 depends as subject on verb 1, then a relation between Entity1 and Entity2 can be inferred, with verb 2 as the relation feature word. This gives the relation extraction normal form DSNF9: "Entity1+Pred1+Pred2+Entity2". The dependency analysis is shown in Figure 10. Normal form DSNF9 covers both of the cases above.
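The dependency condition stated for DSNF9 can be sketched as a small matcher, shown here on the serial-verb example from the first case. The function name `match_dsnf9` and the labels `SBV`, `VOB` and `COO` (coordination) are assumptions for this sketch, and the parse is hand-written.

```python
from typing import Optional

# Each token is (text, head_index, relation); head_index is -1 for the root.
Parse = list[tuple[str, int, str]]

def match_dsnf9(parse: Parse, e1: int, e2: int) -> Optional[tuple[str, str, str]]:
    """Return (Entity1, Pred2, Entity2) if the pair fits DSNF9, else None."""
    text1, verb1, rel1 = parse[e1]
    text2, verb2, rel2 = parse[e2]
    # Entity1 is the subject of verb 1; Entity2 is the object of verb 2.
    if rel1 != "SBV" or rel2 != "VOB" or verb1 < 0 or verb2 < 0:
        return None
    _, verb2_head, verb2_rel = parse[verb2]
    # Verb 2 must be coordinated with verb 1; verb 2 is the relation word.
    if verb2_rel == "COO" and verb2_head == verb1:
        return (text1, parse[verb2][0], text2)
    return None

# Hand-written parse of "张某某 踏雪 游览 庐山" (hypothetical, not parser output).
parse = [("张某某", 1, "SBV"), ("踏雪", -1, "HED"),
         ("游览", 1, "COO"), ("庐山", 2, "VOB")]
print(match_dsnf9(parse, 0, 3))
```

The relation word is taken from the second verb, in line with the rule that the verb nearer the object carries the relation.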

Note that coordination structures exist nested inside other syntax classes. Normal forms DSNF6, DSNF7, DSNF8 and DSNF9 describe only how coordinated nouns behave when they depend on the "subject-predicate/verb-object" structure of the VERB syntax class; other situations are not detailed here. The actual extraction steps are similar. When Entity1 and Entity2 are coordinated: if the triple (Entity2, RelationWord, Entity3) holds, the relation triple (Entity1, RelationWord, Entity3) follows; if the triple (Entity3, RelationWord, Entity2) holds, the relation triple (Entity3, RelationWord, Entity1) follows.
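The propagation rule in the preceding paragraph is independent of which normal form produced the base triple, so it can be sketched as a post-processing step. The function `expand_coordination` and the `coordinated` mapping (entity to the entities coordinated with it) are hypothetical names; the sketch applies one propagation step per base triple and does not try to resolve "respectively" pairings in mixed sentences.

```python
Triple = tuple[str, str, str]

def expand_coordination(triples: list[Triple],
                        coordinated: dict[str, list[str]]) -> list[Triple]:
    """Copy each triple to the entities coordinated with its subject or object."""
    expanded = list(triples)
    for subj, rel, obj in triples:
        # (Entity2, R, Entity3) holds, so (Entity1, R, Entity3) holds too.
        for other in coordinated.get(subj, []):
            expanded.append((other, rel, obj))
        # (Entity3, R, Entity2) holds, so (Entity3, R, Entity1) holds too.
        for other in coordinated.get(obj, []):
            expanded.append((subj, rel, other))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(expanded))

base = [("刘某某", "游览", "上海")]
print(expand_coordination(base, {"刘某某": ["彭某某"]}))
```

Keeping the expansion separate from pattern matching means the VERB-class matchers do not need a coordinated variant of each normal form.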

4. Formulaic Class (FORM)

The FORM class covers expressions that occur frequently in Chinese and cannot be assigned to the preceding relational syntax classes, but that generally follow a fixed format. For example, in "Wang X, professor at a certain university, published ..." (王某，某大学教授，发表……), no connecting word links "Wang X" and "professor at a certain university" and neither directly modifies the other, so the sentence fits none of the types above. Nevertheless, the entity relation triple (Wang X, professor, a certain university) can be extracted from it. This style of expression is very common in Chinese writing. For such special grammatical structures, extracting templates and matching them rigidly is sufficient to obtain good results.
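Rigid template matching of this kind could look like the sketch below. The source does not list its templates, so the appositive template `PER , ORG NOUN ,`, the tag names, and the function `match_appositive` are all hypothetical and serve only to show the idea.

```python
# Hypothetical appositive template: person, comma, organization, title noun, comma.
TEMPLATE = ("PER", "COMMA", "ORG", "NOUN", "COMMA")

def match_appositive(tagged: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Scan (text, tag) pairs and return (person, title, organization) triples."""
    triples = []
    n = len(TEMPLATE)
    for start in range(len(tagged) - n + 1):
        window = tagged[start:start + n]
        # A window matches only if its tag sequence equals the template exactly.
        if tuple(tag for _, tag in window) == TEMPLATE:
            person, _, org, title, _ = (text for text, _ in window)
            triples.append((person, title, org))
    return triples

# Hand-tagged tokens for "王某，某大学教授，发表……" (hypothetical tagging).
tagged = [("王某", "PER"), ("，", "COMMA"), ("某大学", "ORG"),
          ("教授", "NOUN"), ("，", "COMMA"), ("发表", "VERB")]
print(match_appositive(tagged))
```

Because the match is exact, each additional formulaic pattern would need its own template entry.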

5. Other Class

The method assigns all other relation types that cannot currently be distinguished to this class. Because of the uncertainty of this class, this document does not study it in depth.

The invention discloses a Chinese unsupervised open entity relation extraction method based on dependency semantics. It avoids the drawbacks of traditional methods, such as heavy dependence on manual annotation and unreasonable results. Building on the distinctive and flexible syntactic features of Chinese, and taking the mapping between entity relations and the dependency parse tree as its basis, the method mines the dependency semantics contained in the shortest dependency path and uses dependency relations, part-of-speech information and positional relations as constraints to obtain the dependency semantic normal forms (DSNFs). With this set of normal forms, entity relations can be extracted automatically, quickly and accurately from massive data. Extraction is fully automatic and requires no manual work and no model training corpus; computational complexity is low and extraction efficiency is high, which meets demanding real-time requirements. The invention can be widely applied in artificial intelligence fields such as knowledge graphs, intelligent search engines, automatic question answering systems, text mining and machine translation.

The above describes only specific embodiments of the invention, and the scope of protection of the invention is not limited to them. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the invention, and all such modifications or substitutions shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.

Claims

1. A Chinese unsupervised open entity relation extraction method based on dependency semantics, characterized in that the method comprises the following steps:

S1, preprocessing the input text: performing Chinese word segmentation, part-of-speech tagging and dependency parsing on the input text;

S2, performing named entity recognition on the input text;

S3, selecting any two entities from the recognized entities to form a candidate entity pair;

S4, finding the dependency path between the two entities of the candidate entity pair;

S5, analyzing whether the syntactic structure mapped by the dependency path between the two entities of the candidate entity pair matches a normal form in the dependency semantic normal form set; if so, extracting a word or phrase from the remainder of the input text as the relation word according to the matched normal form, the extracted relation word and the candidate entity pair forming a relation triple; otherwise, performing normal form matching on the next candidate entity pair;

S6, outputting the relation triples.

2. The Chinese unsupervised open entity relation extraction method based on dependency semantics according to claim 1, characterized in that the relation triple has the form (Entity1, RelationWords, Entity2), wherein Entity1 and Entity2 are an entity pair between which a relation exists, and RelationWords is the word or phrase describing the semantic relation between the entities.

3. The Chinese unsupervised open entity relation extraction method based on dependency semantics according to claim 1, characterized in that the dependency semantic normal forms comprise a first class, the pre-modifier structure class; a second class, the coordination structure class; a third class, the verb-related class; a fourth class, the template class; and other classes.

4. The Chinese unsupervised open entity relation extraction method based on dependency semantics according to claim 3, characterized in that the first class, the pre-modifier structure class, comprises a compound attributive structure and a structure connected to the head word by the structural particle "的"; the compound attributive structure corresponds to the dependency semantic normal form "Entity1+AttWord1(+AttWord2)+Entity2", and the structure connected to the head word by the structural particle "的" corresponds to the dependency semantic normal form "Entity1+的+Noun+Entity2" or "Entity1+的+Entity2+Noun", wherein Entity1 and Entity2 are an entity pair between which a relation exists, AttWord1 and AttWord2 are different attributive words, and Noun is a noun.

5. The Chinese unsupervised open entity relation extraction method based on dependency semantics according to claim 3, characterized in that the second class, the coordination structure class, comprises a coordinated noun structure and a coordinated verb structure.

6. The Chinese unsupervised open entity relation extraction method based on dependency semantics according to claim 5, characterized in that the coordinated noun structure comprises a structure with coordinated entities as the subject, a structure with coordinated entities as the predicate object, a structure with coordinated entities as the prepositional object, and a mixed structure of the first three; the structure with coordinated entities as the subject corresponds to the dependency semantic normal form "Entity2+Conj+(Entity1++)+Pred+Entity3", the structure with coordinated entities as the predicate object corresponds to the dependency semantic normal form "Entity2+Pred+Entity3+Conj+(Entity1++)", and the structure with coordinated entities as the prepositional object corresponds to the dependency semantic normal form "Entity2+Prep+Entity3+Conj+(Entity1++)+Pred(+Dobj)", wherein Entity2 and Entity3 are an entity pair between which a relation exists, (Entity1++) denotes one or more coordinated entities, Conj is a conjunction, Pred is a predicate, Prep is a preposition, and Dobj is a direct object.

7. The Chinese unsupervised open entity relation extraction method based on dependency semantics according to claim 5, characterized in that the coordinated verb structure comprises a serial-verb structure and a coordinate compound-sentence structure.

8. The Chinese unsupervised open entity relation extraction method based on dependency semantics according to claim 3, characterized in that the third class, the verb-related class, comprises a subject-predicate/verb-object structure and a subject-predicate/preposition-object structure; the subject-predicate/verb-object structure corresponds to the dependency semantic normal form "Entity1+Pred+Entity2", and the subject-predicate/preposition-object structure corresponds to the dependency semantic normal form "Entity1+Prep+Entity2+Pred(+Dobj)", wherein Entity1 and Entity2 are an entity pair between which a relation exists, Pred is a predicate, Prep is a preposition, and Dobj is a direct object.