
CN109472033B - Method and system for extracting entity relationship in text, storage medium and electronic equipment - Google Patents


Info

Publication number
CN109472033B
Authority
CN
China
Prior art keywords
sentence
relationship
entity
entities
training
Prior art date
Legal status
Active
Application number
CN201811376209.2A
Other languages
Chinese (zh)
Other versions
CN109472033A (en)
Inventor
蒋运承
瞿荣
朱星图
郑一东
马文俊
詹捷宇
刘宇东
Current Assignee
South China Normal University
Original Assignee
South China Normal University
Priority date
Filing date
Publication date
Application filed by South China Normal University
Priority to CN201811376209.2A
Publication of CN109472033A
Application granted
Publication of CN109472033B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
      • G06 COMPUTING OR CALCULATING; COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
              • G06F40/205 Parsing
              • G06F40/279 Recognition of textual entities
                • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
                  • G06F40/295 Named entity recognition
            • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a method and a system for extracting entity relationships in text, a storage medium and electronic equipment. The method comprises the following steps: acquiring an entity triple relation set, an entity and entity attribute set, and a concept set; acquiring the sentences of a training text set and the triple relation set of the two entities identified in each sentence; carrying out remote supervision labeling to obtain, for each sentence of the training text set, a relation set comprising the sentence, the two entities identified in it, the concepts corresponding to the two entities and the relationship between them; inputting sentence vectors into an entity relationship extraction model and training the model; and acquiring, for each sentence of a text set to be extracted, a relation set comprising its two entities, the concepts corresponding to them and the relationship between them. The method extracts relationships between entities by using the semantic context information in the text, thereby solving the problem of wrong labeling in the remote supervision process.

Description

Method and system for extracting entity relationship in text, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of text processing and information extraction, in particular to a method and a system for extracting entity relations in texts, a storage medium and electronic equipment.
Background
In the past, large-scale knowledge bases such as Wikipedia and DBpedia have been built from real-world knowledge. These knowledge bases are widely used in artificial intelligence and natural language processing, for example in question-answering systems and information extraction. A knowledge base contains a large number of fact triples, such as (New York, city of, United States), representing the fact that "New York is a city in the United States". However, existing knowledge bases are limited and far from complete, and new facts are generated every day. How to label new facts to supplement the knowledge base is therefore a pressing problem. Labeling fact triples manually is time-consuming and labor-intensive, so much current research has shifted its focus to automatically labeling new facts from complex and diverse Internet resources, and extracting entity relationships from large volumes of text is the core task in this effort. Although existing methods for extracting entity relationships from text achieve good results with the help of remote (distant) supervision, the remote supervision assumption introduces wrong labels. Under this assumption, a pair of entities has only one relationship, and every sentence in which the pair appears is taken to express that relationship. In fact, when two entities appear in the same sentence, the sentence may not express the relationship recorded in the knowledge base; it may express a different relationship or merely reflect a common topic, which can only be determined from the semantic context of the sentence.
Disclosure of Invention
Based on this, the present invention provides a method for extracting entity relationships in text that uses the semantic context information in the text to extract relationships between entities, thereby fundamentally solving the problem of wrong labeling in the remote supervision process.
The invention is realized by the following scheme:
a method for extracting entity relation in text comprises the following steps:
acquiring an entity triple relation set, acquiring an entity and an entity attribute set, and acquiring a concept set;
acquiring a sentence of a training text set and a triple relation set of two entities identified in the sentence;
according to the entity triple relation set, the entity and entity attribute set and the concept set, carrying out remote supervision and labeling on the sentence of the training text set and the triple relation set of the two entities identified in the sentence to obtain the sentence comprising the training text set, the two entities identified in the sentence, concepts respectively corresponding to the two entities and the relationship set of the two entities, and putting the relationship set into a labeling training set;
acquiring vector representation of words in sentences in a training text set according to the labeled training set;
obtaining a sentence vector of each sentence in the training text set according to the vector representation of the words in the sentence;
inputting a sentence vector of each sentence of a training text set into an entity relationship extraction model, and training the entity relationship extraction model according to two entities marked in the sentence, concepts respectively corresponding to the two entities marked in the sentence and the relationship between the two entities marked in the sentence;
obtaining a sentence vector of each sentence in a text set to be extracted;
and inputting the sentence vector of each sentence in the text set to be extracted into the entity relationship extraction model to acquire, for each sentence, a relation set comprising the two entities, the concepts corresponding to each of the two entities, and the relationship between them.
The method for extracting entity relationships in text provided by the invention has the advantage that the concept of each entity in its context represents semantic context information; a multi-concept, multi-relationship entity relationship training set is built from these concepts, and the entity relationship extraction model is constructed from this training set, so that the problem of wrong labeling in the remote supervision process is fundamentally solved.
In one embodiment, performing remote supervision labeling on the sentences of the training text set and the triple relation set of the two entities identified in each sentence, according to the entity triple relation set, the entity and entity attribute set and the concept set, includes:
performing context recognition on the sentences of the training text set to obtain the concepts corresponding to each of the two entities recognized in the sentence.
In one embodiment, after performing context recognition on the sentences of the training text set and obtaining the concepts corresponding to the two recognized entities, the method further includes the following steps:
matching the two entities identified in each sentence of the training text set against the entity triple relation set;
if the matching fails, randomly drawing a relation from the entity triple relation set, generating a relation set comprising the sentence, the two labeled entities, the concepts corresponding to each of them and the randomly drawn relation, and putting this data set into the labeling training set as a negative sample.
In one embodiment, the method further comprises the following steps:
if the matching succeeds, generating a relation set comprising the sentence, the two labeled entities, the concepts corresponding to each of them and the matched relation, and scoring the confidence of the matched relation; if the score exceeds a first set threshold, the data set is put into the labeling training set as a positive sample, and if the score is below the first set threshold, it is put into the labeling training set as a negative sample.
In one embodiment, scoring the confidence of the matched relation comprises:
acquiring the degree of correlation between the matched relation and the context of the sentence according to the proportion in which they co-occur in the corpus, where a higher degree of correlation gives a higher confidence score.
In one embodiment, the method further comprises the following steps:
acquiring, among the generated relation sets, a plurality of relation sets that have the same concepts;
determining the degree of correlation between each relation in the plurality of relation sets and the context of the sentence;
using the relation with the highest degree of correlation as the new relation in the plurality of relation sets.
In one embodiment, after the relation with the highest degree of correlation has been used as the new relation in the plurality of relation sets, the method further includes the following steps:
deleting the plurality of relation sets from the labeling training set;
putting the plurality of relation sets containing the new relation into the labeling training set.
Further, the present invention also provides a system for extracting entity relationships in a text, including:
the first acquisition module is used for acquiring the entity triple relation set, acquiring the entity and the entity attribute set and acquiring the concept set;
the second acquisition module is used for acquiring a sentence of the training text set and a triple relation set of two entities identified in the sentence;
the remote supervision and labeling module is used for carrying out remote supervision and labeling on the sentence of the training text set and the triple relation sets of the two entities identified in the sentence according to the entity triple relation set, the entity and entity attribute set and the concept set, acquiring the sentence comprising the training text set, the two entities identified in the sentence, concepts respectively corresponding to the two entities and the relation sets of the two entities, and putting the relation sets into a labeling training set;
the representation input module is used for acquiring vector representation of words in the sentence of the training text set according to the labeled training set;
the first sentence expression module is used for acquiring a sentence vector of each sentence in the training text set according to the vector expression of the words in the sentence;
the entity relationship extraction model training module is used for inputting the sentence vector of each sentence in the training text set into the entity relationship extraction model, and training the entity relationship extraction model according to the two entities marked in the sentence, the concepts respectively corresponding to the two entities marked in the sentence and the relationship between the two entities marked in the sentence;
the second sentence expression module is used for acquiring a sentence vector of each sentence in the text set to be extracted;
and the entity relationship extraction module is used for inputting the sentence vector of each sentence of the text set to be extracted into the entity relationship extraction model, and acquiring, for each sentence, a relation set comprising the two entities, the concepts corresponding to each of them and the relationship between them.
The system for extracting entity relationships in text provided by the invention has the advantage that the concept of each entity in its context represents semantic context information; an entity relationship training set with multiple concepts and multiple relationships is built from these concepts, and the entity relationship extraction model is constructed from this training set, so that the problem of wrong labeling in the remote supervision process is fundamentally solved.
Further, the present invention also provides a computer readable medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the entity relationship extraction method in the text as described in any one of the above embodiments.
Further, the present invention also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable by the processor, and when the processor executes the computer program, the processor implements the entity relationship extraction method in the text as described in any of the above embodiments.
For a better understanding and practice, the present invention is described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flowchart illustrating a method for extracting entity relationships in a text according to an embodiment;
FIG. 2 is a diagram illustrating a process of matching relationships in a training text set according to an embodiment;
FIG. 3 is a schematic diagram illustrating the remote supervision labeling process in one embodiment;
FIG. 4 is a flow chart illustrating the process of modifying the annotation result according to an embodiment;
FIG. 5 is a diagram of an entity relationship extraction model in one embodiment;
FIG. 6 is a block diagram of an embodiment of a system for extracting entity relationships from text;
FIG. 7 is a schematic structural diagram of an electronic device in one embodiment.
Detailed Description
Referring to fig. 1, in an embodiment, the method for extracting entity relationships in the text of the present invention includes the following steps:
Step S101: acquiring an entity triple relation set, an entity and entity attribute set, and a concept set.
In this example, Freebase is selected as the base knowledge base. Freebase is a large-scale knowledge graph containing 7,300 diverse relationships and over 9 million entities. The Resource Description Framework (RDF) triples (entity 1, relationship, entity 2) in Freebase are collected and stored as the entity triple relation set of this embodiment, denoted R, which includes triples such as (New York, cityOf, United States). In addition, the entities in Freebase and their attribute information are collected and stored as the entity and entity attribute set of this embodiment, denoted E; each entity may have zero or more attributes.
The scheme of this embodiment involves building and using a multi-concept knowledge base, so a concept dictionary must be prepared. A concept is the category of an entity as judged from its context. The knowledge graph base contains millions of concepts and is therefore used as the data source of the concept dictionary in this embodiment. The entities involved in each relationship of the relation set R, together with their corresponding concepts, are organized and stored as the set of entities and their possible concepts of this embodiment, denoted C; an entity may have one or more concepts, for example (IBM, company; corporation; client; organization; vendor; supplier; …).
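A minimal Python sketch of the three S101 sets, assuming simple in-memory containers: R as a set of (entity1, relation, entity2) tuples, E as a dictionary from each entity to its (possibly empty) attributes, and C as a dictionary from each entity occurring in R to its candidate concepts. The function names and the `concept_source` mapping are hypothetical; the text does not specify a storage format, and a real Freebase dump would be streamed rather than held in memory.

```python
# Hypothetical in-memory versions of the sets R, E and C from step S101.

def build_triple_set(rdf_triples):
    """R: set of (entity1, relation, entity2) triples."""
    return {(head, rel, tail) for head, rel, tail in rdf_triples}


def build_entity_attributes(entities, attr_rows):
    """E: entity -> {attribute: [values]}; an entity may have zero attributes."""
    E = {entity: {} for entity in entities}
    for entity, attr, value in attr_rows:
        E.setdefault(entity, {}).setdefault(attr, []).append(value)
    return E


def build_concept_set(R, concept_source):
    """C: every entity that occurs in R -> list of its possible concepts."""
    entities = {h for h, _, _ in R} | {t for _, _, t in R}
    return {e: list(concept_source.get(e, [])) for e in entities}


R = build_triple_set([("New York", "cityOf", "United States")])
E = build_entity_attributes(["New York", "United States", "IBM"],
                            [("IBM", "type", "company")])
C = build_concept_set(R, {"New York": ["city", "place"]})
print(C["New York"])  # ['city', 'place']
```

Restricting C to entities that appear in R mirrors the text, which collects concepts only for entities involved in some relationship of R.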
Step S102: acquiring the sentences of the training text set and the triple relation set of the two entities identified in each sentence.
This embodiment uses the New York Times news text set as the training text set. For each news document d in the training text set D, the start and end of each sentence s are identified by punctuation marks, dividing the document into sentences. To perform the entity relationship extraction task, the entities in s must be identified; the scheme of the invention uses the existing natural language processing tool StanfordNLP for named entity recognition. If the number of entities identified in s is not 2, or an identified entity is not in the set E, the sentence is considered invalid and discarded. Each qualifying sentence s and its two recognized entities e1 and e2 form a triple (s, e1, e2), which is stored in the triple relation set of sentences and the two entities identified in them, denoted SE; for example, (New York is the most popular city in the United States, New York, United States).
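The S102 filtering rule (keep a sentence only if exactly two entities are recognized and both are in E) can be sketched as follows. The regular-expression splitter is a deliberately naive stand-in for punctuation-based segmentation (it will mis-split abbreviations such as "U.S."), and `recognize_entities` is a hypothetical callable standing in for the StanfordNLP named entity recognition step.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Split after '.', '!' or '?' when followed by whitespace (naive segmentation).
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def split_sentences(document: str) -> List[str]:
    return [s.strip() for s in _SENTENCE_BREAK.split(document) if s.strip()]


def build_se(documents: Iterable[str],
             recognize_entities: Callable[[str], List[str]],
             E: Dict[str, dict]) -> List[Tuple[str, str, str]]:
    """Return SE: (sentence, e1, e2) for sentences with exactly two known entities."""
    SE = []
    for doc in documents:
        for s in split_sentences(doc):
            entities = recognize_entities(s)
            # Discard the sentence unless exactly two entities are found and both are in E.
            if len(entities) != 2 or any(e not in E for e in entities):
                continue
            SE.append((s, entities[0], entities[1]))
    return SE
```

Passing the recognizer in as a function keeps the filtering logic independent of which NER tool is actually used.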
Step S103: according to the entity triple relation set, the entity and entity attribute set and the concept set, performing remote supervision labeling on the sentences of the training text set and the triple relation set of the two entities identified in each sentence, obtaining a relation set comprising the sentence, the two entities identified in it, the concepts corresponding to each of the two entities and the relationship between them, and putting the relation set into a labeling training set.
The inputs to remote supervision are the entity triple relation set R, the entity and entity attribute set E, and the concept set C. Remote supervision labeling is performed on the triple set SE (the sentences of the training text set and the two entities identified in each) through three operations in sequence: concept recognition, remote supervision analysis, and relation confidence scoring. Each result is a record (s, (e1, c1, r1, c2, e2)) comprising the sentence s of the training text set, the two entities e1 and e2 identified in it, their corresponding concepts c1 and c2, and their relationship r1; the record is put into the labeling training set. The resulting five-tuple (e1, c1, r1, c2, e2) is also put into the knowledge base KB to be built.
Step S104: acquiring vector representations of the words in the sentences of the training text set according to the labeling training set.
In this step, the inputs are the labeling training set T_train and a Wikipedia text corpus, and the output is a vector representation of each word.
To represent every word appearing in the labeling training set T_train, two operations are required: 1) representing each word with a word vector, and 2) enhancing the word vector with the positional relationship between the word and the two entities in the sentence. Computing word vectors requires a vocabulary; in the scheme of the invention, words appearing more than 100 times in Wikipedia are kept and together form the vocabulary. The open-source word2vec tool is then trained on the context information in the Wikipedia text corpus to obtain a word vector for each word, and the words and their corresponding word vectors are stored as the set W. The word-vector dimension and the context window size are configurable; to keep computation efficient, this embodiment sets the dimension to 50 and the window size to 3. Suppose there is a training sample (s, (e1, c1, r1, c2, e2)) whose sentence s contains n words, s = {w1, w2, …, wn}, two of which correspond to the entities e1 and e2. First, the word vector v of each word is obtained by querying W; then the distances dist1 and dist2 from the word to e1 and e2 are recorded and appended to the end of v, forming a 52-dimensional word vector. Finally, the processed word-vector sequence (v1, v2, …, vn) is used as the input for encoding sentence s as a vector.
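The 52-dimensional word representation described above can be sketched with NumPy. Assumptions not fixed by the text: W is a plain dict from word to a 50-dimensional vector (for example, the output of a word2vec run with dimension 50 and window 3), out-of-vocabulary words get a zero vector, a multi-word entity is treated as a single token, and dist1/dist2 are signed token offsets (i minus the index of e1, i minus the index of e2).

```python
import numpy as np


def encode_words(tokens, e1_idx, e2_idx, W, dim=50):
    """Return an (n, dim + 2) matrix: word vector followed by dist1 and dist2."""
    rows = []
    for i, word in enumerate(tokens):
        v = W.get(word, np.zeros(dim))    # OOV words -> zero vector (assumption)
        dists = [i - e1_idx, i - e2_idx]  # signed offsets to e1 and e2 (assumption)
        rows.append(np.concatenate([v, dists]))
    return np.stack(rows)


W = {"city": np.ones(50)}
X = encode_words(["New_York", "is", "a", "city"], 0, 3, W)
print(X.shape)  # (4, 52)
```

Many PCNN implementations map the offsets to learned position embeddings instead of raw scalars; appending the two raw distances is the simplest reading that yields exactly 52 dimensions.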
Step S105: acquiring a sentence vector for each sentence in the training text set from the vector representations of its words.
In this step, the input is the word vectors of the words in each sentence of the labeling training set T_train, and the output is a sentence vector for each sentence.
Because any word in a sentence may carry important feature information for the entity relationship extraction task, the feature information of all words in the sentence must be combined to represent the sentence jointly, in preparation for subsequently extracting entity relationships from it. The word vector of each word was obtained in step S104; the features contained in the sentence's word vectors now need to be extracted. There are many ways to extract features; the scheme of the invention uses a convolutional neural network (CNN), specifically a piecewise convolutional neural network (PCNN), which can make effective use of the positions of the two entities in the sentence. PCNN mainly comprises 3 steps: 1) convolution, for which the stride and filter size must be set; 2) max pooling, in which the sentence is divided into three segments by the two entity positions and max pooling is applied to each segment separately; and 3) nonlinear activation and output. Through these operations each input sentence is represented as a vector whose dimension is configurable; following prior work, it can be set to 200.
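A NumPy sketch of the three PCNN stages (convolution, piecewise max pooling, tanh activation) for a single sentence. This is a forward pass only, with random, untrained filters; the window size 3, stride 1, zero padding (which assumes an odd window) and the convention that each entity token closes its segment are assumptions. The output has 3·k dimensions for k filters, so reaching a particular size such as 200 depends on the choice of k or an additional projection layer.

```python
import numpy as np


def pcnn_encode(X, e1_idx, e2_idx, filters, bias, window=3):
    """X: (n, d) word features; filters: (k, window * d); bias: (k,). Returns (3k,)."""
    n, d = X.shape
    pad = window // 2
    padded = np.vstack([np.zeros((pad, d)), X, np.zeros((pad, d))])
    # 1) Convolution, stride 1: one window of `window` consecutive words per position.
    windows = np.stack([padded[i:i + window].reshape(-1) for i in range(n)])
    conv = windows @ filters.T + bias  # (n, k)
    # 2) Piecewise max pooling over the three segments delimited by the entities.
    a, b = sorted((e1_idx, e2_idx))
    segments = [conv[:a + 1], conv[a + 1:b + 1], conv[b + 1:]]
    pooled = [seg.max(axis=0) if len(seg) else np.zeros(conv.shape[1])
              for seg in segments]
    # 3) Nonlinear activation of the concatenated segment features.
    return np.tanh(np.concatenate(pooled))


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 52))
filters = rng.normal(size=(4, 3 * 52))
print(pcnn_encode(X, 1, 4, filters, np.zeros(4)).shape)  # (12,)
```

Pooling each segment separately is what lets the model keep track of where a feature occurred relative to the two entities, which a single max over the whole sentence would discard.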
Step S106: inputting the sentence vector of each sentence in the training text set into the entity relationship extraction model, and training the model according to the two entities labeled in the sentence, the concepts corresponding to each of them and the relationship between them.
After each sentence in the labeled training set is represented by a vector, the sentence can be used as the input of the entity relationship extraction model M, and the parameters of the neural network model M are trained according to the entity labeled in each training sample, the concept corresponding to the entity and the relationship of the entity.
Step S107: acquiring a sentence vector for each sentence in the text set to be extracted.
Step S108: inputting the sentence vector of each sentence in the text set to be extracted into the entity relationship extraction model, and acquiring, for each sentence, a relation set comprising the two entities, the concepts corresponding to each of them and the relationship between them.
In one embodiment, performing remote supervision labeling on the sentences of the training text set and the triple relation set of the two entities identified in each sentence, according to the entity triple relation set, the entity and entity attribute set and the concept set, includes:
performing context recognition on the sentences of the training text set to obtain the concepts corresponding to each of the two entities recognized in the sentence.
Here se is an element of the set SE, i.e. a triple consisting of a sentence s from a news document and the two entities contained in it. First, concept recognition is performed for each of the two entities e1 and e2 in sentence s using its context, yielding c1 and c2. Concept recognition is treated as a classification problem and solved with the naive Bayes classification method; all possible concepts of e1 and e2 can be queried from the set C.
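A small multinomial naive Bayes classifier for concept recognition, restricted at prediction time to the candidate concepts looked up in C. The bag-of-context-words features, the add-one (Laplace) smoothing and the class name `ConceptNB` are assumptions made for illustration; the text only states that naive Bayes is used.

```python
import math
from collections import Counter, defaultdict


class ConceptNB:
    """Multinomial naive Bayes over context words, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.concept_counts = Counter()          # training examples per concept
        self.word_counts = defaultdict(Counter)  # concept -> context word counts
        self.vocab = set()

    def fit(self, examples):
        """examples: iterable of (context_words, concept) pairs."""
        for words, concept in examples:
            self.concept_counts[concept] += 1
            self.word_counts[concept].update(words)
            self.vocab.update(words)
        return self

    def log_score(self, words, concept):
        """log P(concept) + sum of log P(word | concept), both smoothed."""
        total = sum(self.concept_counts.values())
        n_concepts = max(1, len(self.concept_counts))
        score = math.log((self.concept_counts[concept] + self.alpha)
                         / (total + self.alpha * n_concepts))
        counts = self.word_counts[concept]
        denom = sum(counts.values()) + self.alpha * max(1, len(self.vocab))
        for w in words:
            score += math.log((counts[w] + self.alpha) / denom)
        return score

    def predict(self, words, candidates):
        """Pick the most probable concept among the candidates taken from C."""
        return max(candidates, key=lambda c: self.log_score(words, c))
```

In the pipeline, the candidate list for e1 would be C[e1] and the context words would come from sentence s; smoothing keeps candidates that were rare in training from receiving zero probability.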
Referring to fig. 2, in an embodiment, after performing context recognition on a sentence in the training text set and obtaining concepts corresponding to two entities recognized by the sentence, the method further includes the following steps:
Step S201: matching the two entities identified in each sentence of the training text set against the entity triple relation set.
Step S202: if the matching fails, randomly drawing a relation from the entity triple relation set, generating a relation set comprising the sentence, the two labeled entities, the concepts corresponding to each of them and the randomly drawn relation, and putting this data set into the labeling training set as a negative sample.
The relation triples in the entity triple relation set R are searched, using e1 and e2 as identifiers to match relation triples (e1, r, e2). If there is no match, the knowledge base is considered to contain no relation between e1 and e2; a relation r_random existing in the triple set R is selected at random, and a labeled record (s, (e1, c1, r_random, c2, e2)) is generated and put into the labeling training set T_train as a negative sample.
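The matching branch of S201/S202 can be sketched as below, assuming R is a set of (e1, r, e2) tuples and SE entries are (s, e1, e2) triples. Matched relations are returned as candidates for the confidence scoring described next; the function name and the linear scan over R are illustrative (an index keyed on (e1, e2) would be needed at Freebase scale).

```python
import random


def match_or_negative(se, c1, c2, R, rng):
    """Return ('matched', record) per matched relation, or one ('negative', record)."""
    s, e1, e2 = se
    matched = sorted(r for (head, r, tail) in R if head == e1 and tail == e2)
    if not matched:
        # No relation in the knowledge base: label with a random relation from R.
        r_random = rng.choice(sorted({r for _, r, _ in R}))
        return [("negative", (s, (e1, c1, r_random, c2, e2)))]
    return [("matched", (s, (e1, c1, r, c2, e2))) for r in matched]
```

Sorting before `rng.choice` makes the draw reproducible for a fixed seed, since iteration order over a Python set of strings is not stable across runs.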
In one embodiment, the method further comprises the following steps:
if the matching succeeds, generating a relation set comprising the sentence, the two labeled entities, the concepts corresponding to each of them and the matched relation, and scoring the confidence of the matched relation; if the score exceeds a first set threshold, the data set is put into the labeling training set as a positive sample, and if the score is below the first set threshold, it is put into the labeling training set as a negative sample.
FIG. 3 is a schematic diagram of the remote supervision labeling process in this embodiment. If a triple (e1, r, e2) is matched, the matched relation r1 is given a confidence score. The score is based on the degree of correlation between r1 and the context of sentence s, computed from co-occurrence; the higher the correlation, the higher the confidence score. When the score exceeds the first set threshold, a five-tuple (e1, c1, r1, c2, e2) is generated, meaning that when the concept of entity e1 is c1 and the concept of entity e2 is c2, the relation r1 holds between e1 and e2, and a labeled record (s, (e1, c1, r1, c2, e2)) is added to the labeling training set T_train as a positive sample. If the score does not exceed the first set threshold, the labeled record (s, (e1, c1, r_random, c2, e2)) is added to T_train as a negative sample.
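One hypothetical reading of the co-occurrence score: the fraction of the sentence's context words that have co-occurred with relation r1 in a corpus at least `min_count` times. The `cooc` statistics table, `min_count` and the threshold value are assumptions; the text does not give the exact formula. The labeling rule itself follows the text: a score above the first threshold yields a positive sample with r1, otherwise a negative sample with a randomly drawn relation.

```python
def confidence(relation, context_words, cooc, min_count=1):
    """Share of context words that co-occur with `relation` in the corpus statistics."""
    if not context_words:
        return 0.0
    seen = cooc.get(relation, {})
    hits = sum(1 for w in context_words if seen.get(w, 0) >= min_count)
    return hits / len(context_words)


def label_matched(record, context_words, cooc, threshold, r_random):
    """Positive sample if the score exceeds the threshold, else a negative with r_random."""
    s, (e1, c1, r1, c2, e2) = record
    if confidence(r1, context_words, cooc) > threshold:
        return "positive", record
    return "negative", (s, (e1, c1, r_random, c2, e2))
```

The statistics table would be collected once over the corpus (relation to context-word counts), so scoring a record is a dictionary lookup per context word.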
Referring to fig. 4, in an embodiment, the method further includes the following steps:
Step S401: acquiring, among the generated relation sets, a plurality of relation sets that have the same concepts.
Step S402: determining the degree of correlation between each relation in the plurality of relation sets and the context of the sentence.
Step S403: using the relation with the highest degree of correlation as the new relation in the plurality of relation sets.
After all triples in SE (the sentences of the training text set and the two entities identified in each) have been labeled, note that every labeled relation comes from Freebase; if the facts contained in Freebase are biased, errors will propagate into subsequent computation. The positive samples in the labeling result are therefore corrected and adjusted to improve the relation labels between entities. For example, for a labeled positive sample (s, (e1, c1, r1, c2, e2)), previous studies assumed that the labeled relation r1 is correct; the scheme of the invention instead assumes that the concepts c1 and c2 are labeled correctly, while the correctness of r1 must be verified and corrected. To reduce computational complexity, this embodiment first screens out a candidate relation set for each label: the relations of labeled records whose two concepts are respectively the same are added to a candidate relation list R1. For example, the record (s1, (e3, c1, r2, c2, e4)) has the same concepts c1 and c2 as the record above, so the relation r2 expressed between e3 and e4 under concepts c1 and c2 is added to R1. Next, the best relation is identified among the candidates: the correlation between each relation r_i in R1 and the context of sentence s is computed, and the relation r_max with the highest correlation is taken as the optimized relation label. The positive sample record (s, (e1, c1, r1, c2, e2)) is deleted from the labeling training set T_train, and the optimized record (s, (e1, c1, r_max, c2, e2)) is added as a new positive sample.
Finally, entities $e_1$ and $e_2$ are added to (or updated in) the knowledge base KB under construction, and the quintuple relationship $(e_1, c_1, r_{max}, c_2, e_2)$ is added to it.
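The label-correction procedure described above (screening candidates by concept pair, selecting $r_{max}$, replacing the positive sample and updating the knowledge base) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the invention's implementation: the `Record` type and all helper names are hypothetical, the input is assumed to contain only positive samples, and `token_overlap` is a made-up stand-in for the context-correlation measure, which this passage does not define.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Record:
    """One labeled positive sample (s, (e1, c1, r, c2, e2))."""
    sentence: str
    e1: str
    c1: str
    relation: str
    c2: str
    e2: str

def token_overlap(relation: str, sentence: str) -> int:
    # Hypothetical relevance measure (an assumption): how many tokens of the
    # relation name, split on "_", occur as words in the sentence.
    words = set(sentence.lower().split())
    return sum(tok in words for tok in relation.lower().split("_"))

def candidate_relations(record: Record, dataset: list) -> set:
    # R1: relations of every record whose concept pair equals (c1, c2).
    return {r.relation for r in dataset if (r.c1, r.c2) == (record.c1, record.c2)}

def correct_sample(record: Record, dataset: list, relevance=token_overlap) -> Record:
    # Sorting makes ties deterministic: max() keeps the first maximal candidate.
    candidates = sorted(candidate_relations(record, dataset))
    r_max = max(candidates, key=lambda r: relevance(r, record.sentence))
    return replace(record, relation=r_max)

def correct_dataset(dataset, kb_entities, kb_facts, relevance=token_overlap):
    corrected = []
    for rec in dataset:  # candidates are always drawn from the uncorrected data
        new = correct_sample(rec, dataset, relevance)
        corrected.append(new)  # takes the place of (s, (e1, c1, r1, c2, e2))
        kb_entities.update({new.e1, new.e2})  # add or update e1 and e2 in KB
        kb_facts.add((new.e1, new.c1, new.relation, new.c2, new.e2))
    return corrected
```

For example, a sample labeled `lived_in` for "Obama was born in Honolulu" would be relabeled `born_in` if another record with the same (person, location) concept pair carries `born_in`, because that name overlaps more with the sentence under the toy measure.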
Fig. 5 illustrates the entity relationship extraction model M according to an embodiment of the present invention. The labeling training set is randomly divided by ratio into three parts, $T_{train}$ (80% of the entire data set), $T_{valid}$ (10%) and $T_{test}$ (10%), which serve as the training, validation and test sets respectively and follow the same data distribution.
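A seeded random split into the 80/10/10 proportions might look like the sketch below. The function name `split_dataset` and the fixed seed are illustrative assumptions; shuffling before slicing is what makes the three subsets follow the same distribution in expectation.

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle the samples and split them into (T_train, T_valid, T_test)."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = int(len(items) * ratios[0])
    n_valid = int(len(items) * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_valid],
            items[n_train + n_valid:])  # remainder goes to the test set
```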
The parameters of the entity relationship extraction model M comprise hyperparameters and ordinary (trainable) parameters. The convolutional neural network has four hyperparameters that must be given initial values: the batch size B = 100, the learning rate of stochastic gradient descent λ = 0.01, the dropout probability of neural network units ρ = 0.5, and the maximum number of times each sample is used n = 10. Once the hyperparameters are set, training of model M begins. The processed positive and negative samples are fed into the convolutional neural network in batches. For each sample, the error between the concept recognition result and the labeled concept categories, and the error between the extracted entity relationship and the labeled entity relationship, are recorded; the combined error of the network is minimized with stochastic gradient descent, and the ordinary parameters of model M are continuously adjusted and saved. To detect problems with the model parameters early and to check the model's generalization ability, the solution of the invention uses the previously prepared validation set $T_{valid}$ after every 5 batches to check whether the current parameter settings of model M are reasonable, and adjusts them promptly if they are not.
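The batching and validation cadence can be sketched as a schedule generator. This is a hypothetical illustration: it reads n, the maximum number of uses per sample, as the number of passes (epochs) over the data, and it leaves out the CNN itself. The `learning_rate` and `dropout` fields are only carried along for the forward pass and stochastic gradient descent update that a real training loop would perform on each "train" event.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class HyperParams:
    batch_size: int = 100        # B
    learning_rate: float = 0.01  # lambda, SGD step size (unused in this sketch)
    dropout: float = 0.5         # rho, unit-drop probability (unused here)
    max_uses: int = 10           # n, assumed to mean passes over the data
    validate_every: int = 5      # check on T_valid after this many batches

def training_schedule(num_samples, hp=HyperParams(), seed=0):
    """Yield ("train", batch_indices) events, plus a ("validate", count)
    event after every hp.validate_every training batches."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    batches_seen = 0
    for _ in range(hp.max_uses):
        rng.shuffle(indices)  # new sample order on every pass
        for start in range(0, num_samples, hp.batch_size):
            yield ("train", indices[start:start + hp.batch_size])
            batches_seen += 1
            if batches_seen % hp.validate_every == 0:
                yield ("validate", batches_seen)
```

With 1,000 samples this produces 10 batches per pass, 100 training batches in total and 20 validation checks.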
After training of the entity relationship extraction model M is complete, the invention evaluates it on two benchmark datasets: 1) the SemEval-2010 Task 8 dataset, which contains 9 bidirectional relations and 1 undirected 'Other' relation, with 10717 labeled samples in total; and 2) the NYT10 dataset, which contains 53 relations in total, one of which, 'NA', indicates that the two entities have no relation, with 20202 labeled samples in total. Model M is run on each dataset, and its precision, recall and F1 score are computed.
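The passage does not say how precision, recall and F1 are aggregated. One common convention, assumed in the sketch below, is micro-averaging in which the no-relation labels ('Other' for SemEval-2010 Task 8, 'NA' for NYT10) count as the negative class. The official SemEval-2010 Task 8 scorer instead reports a macro-averaged F1 over the nine relation types, so figures from this sketch are not directly comparable with it. The function name `prf1` is hypothetical.

```python
def prf1(gold, pred, negative_labels=("NA", "Other")):
    """Micro precision, recall and F1 over aligned gold/predicted relation
    labels, treating the no-relation labels as negatives (assumed convention)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be aligned")
    neg = set(negative_labels)
    tp = sum(g == p and p not in neg for g, p in zip(gold, pred))
    n_pred = sum(p not in neg for p in pred)  # predicted positive relations
    n_gold = sum(g not in neg for g in gold)  # gold positive relations
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```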
By using a multi-concept, multi-relationship knowledge base, the method for extracting entity relationships from text provided by the invention can reduce, and at its root address, the problem of incorrect labeling in the knowledge base. The method also makes effective use of the concept information of entities together with their context to eliminate noisy relationships before relationship extraction, which narrows the search space of relationship extraction and improves its speed and precision.
Fig. 6 is a schematic structural diagram of a system for extracting entity relationships from text according to an embodiment of the present invention. The system 600 includes:
a first obtaining module 601, configured to obtain the entity triple relationship set, the entity and entity attribute set, and the concept set;
a second obtaining module 602, configured to obtain the triple relationship set formed by the sentences of the training text set and the two entities identified in each sentence;
a remote supervision labeling module 603, configured to perform remote supervision labeling on the triple relationship set of the training text set sentences and their two identified entities according to the entity triple relationship set, the entity and entity attribute set, and the concept set, thereby obtaining relationship sets each comprising a sentence of the training text set, the two entities identified in it, the concepts corresponding to the two entities, and the relationship between the two entities, and to place these relationship sets into a labeling training set;
a representation input module 604, configured to obtain, according to the labeling training set, vector representations of the words in the sentences of the training text set;
a first sentence representation module 605, configured to obtain a sentence vector for each sentence of the training text set from the vector representations of its words;
an entity relationship extraction model training module 606, configured to input the sentence vector of each sentence of the training text set into the entity relationship extraction model and to train the model according to the two entities labeled in the sentence, the concepts corresponding to the two labeled entities, and the relationship between the two labeled entities;
a second sentence representation module 607, configured to obtain a sentence vector for each sentence of the text set to be extracted; and
an entity relationship extraction module 608, configured to input the sentence vector of each sentence of the text set to be extracted into the entity relationship extraction model, obtaining for each sentence a relationship set comprising two entities, the concepts corresponding to the two entities, and the relationship between the two entities.
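The text here does not specify how the representation input module 604 turns words into vectors or how the first sentence representation module 605 pools word vectors into a sentence vector. Purely as a placeholder, the sketch below looks words up in a toy embedding table (unknown words map to zero vectors) and mean-pools them; in practice the sentence vector could instead come from the convolutional network used for model M. Both function names are hypothetical.

```python
def word_vectors(tokens, embeddings, dim):
    # Toy lookup table; unknown words map to a zero vector (an assumption).
    zero = [0.0] * dim
    return [embeddings.get(t.lower(), zero) for t in tokens]

def sentence_vector(tokens, embeddings, dim):
    # Mean pooling over word vectors: a placeholder, not the patented encoder.
    vecs = word_vectors(tokens, embeddings, dim)
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]
```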
In an embodiment, the remote supervision labeling module 603 further includes a context recognition unit 6031, configured to perform context recognition on the sentences of the training text set to obtain the concepts corresponding to the two entities identified in each sentence.
In one embodiment, the remote supervision labeling module 603 further includes a matching unit 6032 and a random extraction unit 6033. The matching unit 6032 is configured to match the two entities identified in a sentence of the training text set against the entity triple relationship set. The random extraction unit 6033 is configured, if matching fails, to randomly draw a relationship from the entity triple relationship set, generate a data set comprising the sentence, the two labeled entities, the concepts corresponding to the two labeled entities, and the randomly drawn relationship, and place that data set into the labeling training set as a negative sample.
In an embodiment, the remote supervision labeling module 603 further includes a confidence scoring unit 6034, configured, if matching succeeds, to generate a data set comprising the sentence, the two labeled entities, the concepts corresponding to them, and the matched relationship, and to score the confidence of the matched relationship. If the score exceeds a first set threshold, the data set is placed into the labeling training set as a positive sample; if the score is below the first set threshold, it is placed into the labeling training set as a negative sample.
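The matching, random negative sampling and confidence-threshold logic of units 6032 to 6034 could be sketched as below. Everything here is a hypothetical illustration: the knowledge base is reduced to a dict from entity pair to relation, records are flattened to `(s, e1, c1, r, c2, e2)` tuples, and the confidence score follows one possible reading of claim 4, namely the fraction of the sentence's context words (entities excluded) that co-occur with the matched relation in a corpus, summarized as a precomputed `cooc` table. A score exactly equal to the threshold is treated as negative, a case the text leaves open.

```python
import random

def confidence(relation, sentence, e1, e2, cooc):
    # Assumed scoring: share of the sentence's context words (entities
    # excluded) that co-occur with `relation` in the corpus table `cooc`.
    context = set(sentence.lower().split()) - {e1.lower(), e2.lower()}
    if not context:
        return 0.0
    return len(context & cooc.get(relation, set())) / len(context)

def label_pair(sentence, e1, c1, e2, c2, triples, relation_pool, cooc,
               threshold, rng):
    """Return ((s, e1, c1, r, c2, e2), is_positive) for one entity pair."""
    matched = triples.get((e1, e2))
    if matched is None:
        # Matching failed: attach a randomly drawn relation, negative sample.
        return (sentence, e1, c1, rng.choice(relation_pool), c2, e2), False
    score = confidence(matched, sentence, e1, e2, cooc)
    # Above the first set threshold -> positive sample; otherwise negative.
    return (sentence, e1, c1, matched, c2, e2), score > threshold
```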
In one embodiment, the remote supervision labeling module 603 further comprises:
a relationship set acquisition unit 6035, configured to acquire, from the generated relationship sets, a plurality of relationship sets having the same concepts;
a contextual correlation determination unit 6036, configured to determine the degree of contextual correlation between each relationship in the plurality of relationship sets and the sentence; and
a relationship replacing unit 6037, configured to substitute the relationship with the greatest degree of correlation into the plurality of relationship sets as the new relationship.
In one embodiment, the remote supervision labeling module 603 further comprises:
a relationship set deleting unit 6038, configured to delete the plurality of relationship sets from the labeling training set; and
a relationship set replacing unit 6039, configured to place the plurality of relationship sets containing the new relationship into the labeling training set.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
By using a multi-concept, multi-relationship knowledge base, the system for extracting entity relationships from text provided by the invention can reduce, and at its root address, the problem of incorrect labeling in the knowledge base. The system also makes effective use of the concept information of entities together with their context to eliminate noisy relationships before relationship extraction, which narrows the search space of relationship extraction and improves its speed and precision.
The present invention also provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method for extracting entity relationships from text of any one of the above embodiments.
Referring to Fig. 7, in an embodiment, an electronic device 700 of the present invention includes a memory 710, a processor 720, and a computer program stored in the memory 710 and executable by the processor 720; when the processor 720 executes the computer program, it implements the method for extracting entity relationships from text of any one of the above embodiments.
In this embodiment, the processor 720 may be one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components. The memory 710 may take the form of a computer program product embodied on one or more storage media containing program code, including but not limited to disk storage, CD-ROM and optical storage. Computer-readable storage media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only one logical functional division, and other divisions are possible in practice. Multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The embodiments described above express only several implementations of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, all of which fall within the scope of protection of the present invention.

Claims (8)

1. A method for extracting entity relationships from text, characterized by comprising the following steps:
obtaining an entity triple relationship set, an entity and entity attribute set, and a concept set;
obtaining a triple relationship set of the sentences of a training text set and the two entities identified in each sentence;
performing remote supervision labeling on the triple relationship set of the sentences of the training text set and the two entities identified in each sentence according to the entity triple relationship set, the entity and entity attribute set, and the concept set, specifically by performing context recognition on the sentences of the training text set to obtain the concepts respectively corresponding to the two entities identified in each sentence;
matching the two entities identified in a sentence of the training text set against the entity triple relationship set to obtain relationship sets each comprising the sentence of the training text set, the two entities identified in the sentence, the concepts respectively corresponding to the two entities, and the relationship between the two entities, specifically by obtaining, from the generated relationship sets, a plurality of relationship sets having the same concepts, determining the degree of contextual correlation between each relationship in the plurality of relationship sets and the sentence, substituting the relationship with the greatest degree of correlation into the plurality of relationship sets as the new relationship, and placing the plurality of relationship sets containing the new relationship into a labeling training set;
obtaining, according to the labeling training set, vector representations of the words in the sentences of the training text set;
obtaining a sentence vector of each sentence of the training text set according to the vector representations of the words in the sentence;
inputting the sentence vector of each sentence of the training text set into an entity relationship extraction model, and training the entity relationship extraction model according to the two entities labeled in the sentence, the concepts respectively corresponding to the two labeled entities, and the relationship between the two labeled entities;
obtaining a sentence vector of each sentence of a text set to be extracted; and
inputting the sentence vector of each sentence of the text set to be extracted into the entity relationship extraction model to obtain, for each sentence of the text set to be extracted, a relationship set comprising two entities, the concepts respectively corresponding to the two entities, and the relationship between the two entities.
2. The method for extracting entity relationships from text according to claim 1, characterized in that, after performing context recognition on the sentences of the training text set and obtaining the concepts respectively corresponding to the two entities identified in each sentence, the method further comprises the following step:
if matching fails, randomly drawing a relationship from the entity triple relationship set, generating a data set comprising the sentence, the two labeled entities, the concepts respectively corresponding to the two labeled entities, and the randomly drawn relationship, and placing the data set into the labeling training set as a negative sample.
3. The method for extracting entity relationships from text according to claim 2, characterized by further comprising the following step:
if matching succeeds, generating a data set comprising the sentence, the two labeled entities, the concepts respectively corresponding to the two labeled entities, and the matched relationship, and scoring the confidence of the matched relationship; if the score exceeds a first set threshold, placing the data set into the labeling training set as a positive sample, and if the score is below the first set threshold, placing the data set into the labeling training set as a negative sample.
4. The method for extracting entity relationships from text according to claim 3, characterized in that scoring the confidence of the matched relationship comprises:
obtaining the degree of correlation between the matched relationship and the context of the sentence according to the proportion in which the relationship co-occurs with the context of the sentence in a corpus, wherein the higher the degree of correlation, the higher the confidence score.
5. The method for extracting entity relationships from text according to claim 4, characterized in that, after substituting the relationship with the greatest degree of correlation into the plurality of relationship sets as the new relationship, the method further comprises the following step:
deleting the plurality of relationship sets from the labeling training set.
6. A system for extracting entity relationships from text, characterized by comprising:
a first obtaining module, configured to obtain an entity triple relationship set, an entity and entity attribute set, and a concept set;
a second obtaining module, configured to obtain a triple relationship set of the sentences of a training text set and the two entities identified in each sentence;
a remote supervision labeling module, configured to perform remote supervision labeling on the triple relationship set of the sentences of the training text set and the two entities identified in each sentence according to the entity triple relationship set, the entity and entity attribute set, and the concept set, and specifically configured to perform context recognition on the sentences of the training text set to obtain the concepts respectively corresponding to the two entities identified in each sentence; to match the two entities identified in a sentence of the training text set against the entity triple relationship set to obtain relationship sets each comprising the sentence of the training text set, the two entities identified in the sentence, the concepts respectively corresponding to the two entities, and the relationship between the two entities; to obtain, from the generated relationship sets, a plurality of relationship sets having the same concepts and determine the degree of contextual correlation between each relationship in the plurality of relationship sets and the sentence; to substitute the relationship with the greatest degree of correlation into the plurality of relationship sets as the new relationship; and to place the plurality of relationship sets containing the new relationship into a labeling training set;
a representation input module, configured to obtain, according to the labeling training set, vector representations of the words in the sentences of the training text set;
a first sentence representation module, configured to obtain a sentence vector of each sentence of the training text set according to the vector representations of the words in the sentence;
an entity relationship extraction model training module, configured to input the sentence vector of each sentence of the training text set into an entity relationship extraction model and to train the entity relationship extraction model according to the two entities labeled in the sentence, the concepts respectively corresponding to the two labeled entities, and the relationship between the two labeled entities;
a second sentence representation module, configured to obtain a sentence vector of each sentence of a text set to be extracted; and
an entity relationship extraction module, configured to input the sentence vector of each sentence of the text set to be extracted into the entity relationship extraction model to obtain, for each sentence of the text set to be extracted, a relationship set comprising two entities, the concepts respectively corresponding to the two entities, and the relationship between the two entities.
7. A computer-readable medium having a computer program stored thereon, characterized in that:
when the computer program is executed by a processor, it implements the method for extracting entity relationships from text according to any one of claims 1 to 5.
8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable by the processor, characterized in that:
when the processor executes the computer program, it implements the method for extracting entity relationships from text according to any one of claims 1 to 5.
CN201811376209.2A 2018-11-19 2018-11-19 Method and system for extracting entity relationship in text, storage medium and electronic equipment Active CN109472033B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811376209.2A CN109472033B (en) 2018-11-19 2018-11-19 Method and system for extracting entity relationship in text, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN109472033A CN109472033A (en) 2019-03-15
CN109472033B true CN109472033B (en) 2022-12-06

Family

ID=65673074

Country Status (1)

Country Link
CN (1) CN109472033B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105808525A (en) * 2016-03-29 2016-07-27 国家计算机网络与信息安全管理中心 Domain concept hypernym-hyponym relation extraction method based on similar concept pairs
CN106874261A (en) * 2017-03-17 2017-06-20 中国科学院软件研究所 A domain knowledge graph and query method based on semantic triangles
CN107169079A (en) * 2017-05-10 2017-09-15 浙江大学 A domain text knowledge extraction method based on Deepdive
CN108287911A (en) * 2018-02-01 2018-07-17 浙江大学 A relation extraction method based on constrained remote supervision

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9202176B1 (en) * 2011-08-08 2015-12-01 Gravity.Com, Inc. Entity analysis system

Non-Patent Citations (1)

Title
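A text labeling algorithm based on concept semantic relatedness and LDA; 周春 (Zhou Chun) et al.; Journal of South China Normal University (Natural Science Edition); 2018-08-25; Vol. 50, No. 04; pp. 121-128 *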

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant