CN111814461B - Text processing method, related equipment and readable storage medium - Google Patents
Text processing method, related equipment and readable storage medium
- Publication number
- CN111814461B, CN202010656329.9A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Abstract
The application discloses a text processing method, related equipment, and a readable storage medium. After a text to be processed is acquired, an object set contained in the text to be processed is determined; for each object in the object set, an attribute corresponding to the object is determined, and the attribute is combined with the object to obtain a target object. Identifying target objects through text processing saves manpower and time compared with manual identification. Furthermore, in the application, the different object attributes make explicit what each target object specifically refers to, so the identified target objects can be more accurate.
Description
Technical Field
The present application relates to the field of natural language processing, and more particularly, to a text processing method, a related device, and a readable storage medium.
Background
In some scenarios, it is often necessary to identify certain objects from text. For example, in the judicial field, objects belonging to a judicial case file need to be identified from the documents in that case file (e.g., prosecution opinions, appraisal reports, interrogation records, survey records, identification records, etc.).
At present, objects are mostly identified from text manually. However, manual identification consumes a great deal of manpower and time, is inefficient, and has low accuracy.
Disclosure of Invention
In view of the foregoing, the present application proposes a text processing method, related apparatus, and readable storage medium. The specific scheme is as follows:
A text processing method, comprising:
acquiring a text to be processed;
determining an object set contained in the text to be processed;
and, for each object in the object set, determining an attribute corresponding to the object, and combining the attribute with the object to obtain a target object.
Optionally, the determining the object set contained in the text to be processed includes:
determining a character-level feature of each character in the text to be processed and a text-level feature of the text to be processed;
for each character in the text to be processed, splicing the character-level feature of the character with the text-level feature of the text to be processed to obtain the spliced feature of the character;
recognizing the spliced feature of each character to obtain an object recognition result of each character;
and determining the object set contained in the text to be processed based on the object recognition result of each character.
Optionally, the determining, for each object in the object set, an attribute corresponding to the object includes:
acquiring dependency syntax relationships among the characters in the text to be processed;
for each character in the text to be processed, determining an object attribute feature of the character according to the character-level feature of the character, the object recognition result of the character, and the dependency syntax relationships among the characters in the text to be processed;
and recognizing the object attribute feature of each character in the text to be processed to determine the attribute corresponding to each object in the object set.
Optionally, the determining, for each character in the text to be processed, the object attribute feature of the character according to the character-level feature of the character, the object recognition result of the character, and the dependency syntax relationships among the characters in the text to be processed includes:
generating an object recognition feature of the character according to the character-level feature of the character and the object recognition result of the character;
and determining the object attribute feature of the character according to the object recognition features of the characters in the text to be processed and the dependency syntax relationships among the characters in the text to be processed.
Optionally, there are a plurality of texts to be processed, and the method further comprises:
associating identical objects among the target objects corresponding to the texts to be processed.
Optionally, the associating identical objects among the target objects corresponding to the texts to be processed includes:
determining two target objects to be judged from the target objects corresponding to the texts to be processed, wherein the two target objects to be judged are contained in different texts to be processed;
judging whether the two target objects to be judged match;
and, if the two target objects to be judged match, determining that the two target objects to be judged are the same object.
Optionally, the judging whether the two target objects to be judged match includes:
processing the two target objects to be judged with a matching judgment model to obtain a judgment result, output by the matching judgment model, of whether the two target objects to be judged match, wherein the matching judgment model is trained with target object pairs as training samples and with the judgment result of whether each target object pair matches as the sample label.
Optionally, the processing the two target objects to be judged with the matching judgment model to obtain the judgment result, output by the matching judgment model, of whether the two target objects to be judged match includes:
comparing the two target objects to be judged by using a first matching judgment module of the matching judgment model to obtain a first matching judgment result;
comparing the same object attributes in the two target objects to be judged by using a second matching judgment module of the matching judgment model to obtain a second matching judgment result;
and determining, by using a comprehensive matching judgment module of the matching judgment model, whether the two target objects to be judged match based on the first matching judgment result and the second matching judgment result.
A text processing apparatus, comprising:
an acquisition unit, configured to acquire a text to be processed;
an object set determining unit, configured to determine an object set contained in the text to be processed;
and a target object determining unit, configured to determine, for each object in the object set, an attribute corresponding to the object, and to combine the attribute with the object to obtain a target object.
Optionally, the object set determining unit includes:
a feature determining unit, configured to determine a character-level feature of each character in the text to be processed and a text-level feature of the text to be processed;
a character splicing unit, configured to splice, for each character in the text to be processed, the character-level feature of the character with the text-level feature of the text to be processed to obtain the spliced feature of the character;
a feature recognition unit, configured to recognize the spliced feature of each character to obtain an object recognition result of each character;
and an object set determining subunit, configured to determine the object set contained in the text to be processed based on the object recognition result of each character.
Optionally, the target object determining unit includes:
a dependency syntax relationship acquisition unit, configured to acquire dependency syntax relationships among the characters in the text to be processed;
an object attribute feature determining unit, configured to determine, for each character in the text to be processed, an object attribute feature of the character according to the character-level feature of the character, the object recognition result of the character, and the dependency syntax relationships among the characters in the text to be processed;
and an object attribute feature recognition unit, configured to recognize the object attribute feature of each character in the text to be processed and determine the attribute corresponding to each object in the object set.
Optionally, the object attribute feature determining unit includes:
an object recognition feature determining unit, configured to generate an object recognition feature of the character according to the character-level feature of the character and the object recognition result of the character;
and an object attribute feature determining subunit, configured to determine the object attribute feature of the character according to the object recognition features of the characters in the text to be processed and the dependency syntax relationships among the characters in the text to be processed.
Optionally, there are a plurality of texts to be processed, and the apparatus further includes:
an object association unit, configured to associate identical objects among the target objects corresponding to the texts to be processed.
Optionally, the object association unit includes:
a target object determining unit, configured to determine two target objects to be judged from the target objects corresponding to the texts to be processed, wherein the two target objects to be judged are contained in different texts to be processed;
and a judging unit, configured to judge whether the two target objects to be judged match and, if they match, to determine that the two target objects to be judged are the same object.
Optionally, the judging unit is specifically configured to:
process the two target objects to be judged with a matching judgment model to obtain a judgment result, output by the matching judgment model, of whether the two target objects to be judged match, wherein the matching judgment model is trained with target object pairs as training samples and with the judgment result of whether each target object pair matches as the sample label.
Optionally, the processing the two target objects to be judged with the matching judgment model to obtain the judgment result, output by the matching judgment model, of whether the two target objects to be judged match includes:
comparing the two target objects to be judged by using a first matching judgment module of the matching judgment model to obtain a first matching judgment result;
comparing the same object attributes in the two target objects to be judged by using a second matching judgment module of the matching judgment model to obtain a second matching judgment result;
and determining, by using a comprehensive matching judgment module of the matching judgment model, whether the two target objects to be judged match based on the first matching judgment result and the second matching judgment result.
A text processing device, comprising a memory and a processor;
the memory is used for storing a program;
the processor is configured to execute the program to implement the steps of the text processing method as described above.
A readable storage medium having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the text processing method as described above.
By means of the above technical scheme, the application discloses a text processing method, related equipment, and a readable storage medium. After a text to be processed is acquired, an object set contained in the text to be processed is determined; for each object in the object set, an attribute corresponding to the object is determined, and the attribute is combined with the object to obtain a target object. Identifying target objects through text processing saves manpower and time compared with manual identification. Furthermore, in the application, the different object attributes make explicit what each target object specifically refers to, so the identified target objects can be more accurate.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic flow chart of a text processing method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an object recognition model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of dependency syntax among characters in a text according to an embodiment of the present application;
FIG. 4 is a flow chart of another text processing method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a matching judgment model according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a text processing device according to an embodiment of the present application;
FIG. 7 is a block diagram of a hardware structure of a text processing device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
Next, the text processing method provided by the present application will be described by the following examples.
Referring to FIG. 1, which is a schematic flow chart of a text processing method disclosed in an embodiment of the present application, the method may include the following steps:
Step S101: acquire a text to be processed.
In the present application, the text to be processed may consist of words in any written language (e.g., Chinese, English, etc.). The text to be processed may be a sentence, a paragraph, or a chapter; the present application places no limitation on this.
It should be noted that the text to be processed may be text obtained through techniques such as speech recognition, image recognition, or input-method recognition, or may be a document in a specific format; the present application places no limitation on this.
For ease of understanding, the present application gives the following example of text to be processed:
"Zhang San stole two electric vehicles at a fertilizer plant on March 4, one a red Yadi and the other a black Aima; on March 16 he stole a gray Lenovo computer in the Long Qingxiao area, and then stole a Honda CB400 motorcycle."
Step S102: determine an object set contained in the text to be processed.
In the present application, the object set includes at least one object. An object may be a group of characters in the text that share some commonality; for example, an object may be a physical item appearing in the text, or a person's name, a place name, etc. appearing in the text. The present application is not limited in this regard.
To facilitate an understanding of the object set, the present application gives the following examples:
Assume that the text to be processed is "Zhang San stole two electric vehicles at a fertilizer plant on March 4, one a red Yadi and the other a black Aima; on March 16 he stole a gray Lenovo computer in the Long Qingxiao area, and then stole a Honda CB400 motorcycle." If the objects are physical items, the object set contained in the text to be processed consists of electric vehicle, computer, and motorcycle.
It should be noted that, a specific implementation manner of determining the object set included in the text to be processed will be described in detail through a later embodiment.
Step S103: for each object in the object set, determine an attribute corresponding to the object, and combine the attribute with the object to obtain a target object.
In the present application, different objects have different object attributes; for example, when the object is a physical item, the object attributes may be color, brand, model, and the like. A target object is an object together with its object attributes. For ease of understanding, assume that the text to be processed is "Zhang San stole two electric vehicles at a fertilizer plant on March 4, one a red Yadi and the other a black Aima; on March 16 he stole a gray Lenovo computer in the Long Qingxiao area, and then stole a Honda CB400 motorcycle." The object set contained in the text to be processed consists of electric vehicle, computer, and motorcycle; the attributes corresponding to the electric vehicle are red, Yadi, black, and Aima; the attributes corresponding to the computer are gray and Lenovo; and the attributes corresponding to the motorcycle are Honda and CB400. In the application, combining the objects with their attributes yields the following target objects: red Yadi electric vehicle, black Aima electric vehicle, gray Lenovo computer, and Honda CB400 motorcycle.
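The combination of objects with their attributes in step S103 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `TargetObject` class, the `combine` helper, and the `attributes_by_object` mapping are hypothetical names, the brand names are illustrative renderings of those in the example text, and the grouping of attributes per object instance is assumed to be already available from the attribute-determination step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TargetObject:
    """An object together with the attributes assigned to it (hypothetical type)."""
    name: str
    attributes: list[str] = field(default_factory=list)

    def describe(self) -> str:
        # Attributes precede the object name, as in "red Yadi electric vehicle".
        return " ".join([*self.attributes, self.name])


def combine(obj: str, attribute_groups: list[list[str]]) -> list[TargetObject]:
    """Build one target object per attribute group found for `obj`."""
    return [TargetObject(obj, list(group)) for group in attribute_groups]


# Assumed output of the attribute-determination step for the example text.
attributes_by_object = {
    "electric vehicle": [["red", "Yadi"], ["black", "Aima"]],
    "computer": [["gray", "Lenovo"]],
    "motorcycle": [["Honda", "CB400"]],
}

targets = [
    target.describe()
    for obj, groups in attributes_by_object.items()
    for target in combine(obj, groups)
]
print(targets)
```

Keeping one attribute group per object instance is what lets the single object "electric vehicle" yield two distinct target objects.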
It should be noted that the specific implementation of determining, for each object in the object set, the attribute corresponding to the object and combining the attribute with the object to obtain the target object will be described in detail in the following embodiments.
This embodiment discloses a text processing method. After a text to be processed is acquired, an object set contained in the text to be processed is determined; for each object in the object set, an attribute corresponding to the object is determined, and the attribute is combined with the object to obtain a target object. Identifying target objects through text processing saves manpower and time compared with manual identification. Furthermore, in the application, the different object attributes make explicit what each target object specifically refers to, so the identified target objects can be more accurate.
As an implementation, the present application discloses a specific way of determining the object set contained in the text to be processed, which may include the following steps:
Step S201: determine a character-level feature of each character in the text to be processed, and determine a text-level feature of the text to be processed.
In the application, the character-level feature of each character in the text to be processed may be semantic information of that character, and the text-level feature of the text to be processed may be semantic information of the text as a whole; different texts are unique and express objects in different ways.
Step S202: for each character in the text to be processed, splice the character-level feature of the character with the text-level feature of the text to be processed to obtain the spliced feature of the character.
For ease of understanding, assuming that the character-level feature of the character "electric" is c and the text-level feature of the text to be processed is h, the spliced feature of the character "electric" is c+h, where "+" denotes splicing (concatenation) of the two features.
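The splicing in step S202 can be illustrated with plain Python lists. This is a sketch under the assumption that splicing means vector concatenation; the function name `splice_features` and all feature values and dimensions are made up for illustration.

```python
def splice_features(char_features, text_feature):
    """Append the text-level feature h to every character-level feature c."""
    return [list(c) + list(text_feature) for c in char_features]


# Toy values: three characters with 4-dimensional features c,
# and one 2-dimensional text-level feature h shared by all characters.
char_features = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
text_feature = [9.0, 9.5]

spliced = splice_features(char_features, text_feature)
for row in spliced:
    print(row)
```

Every character receives the same text-level feature, so each spliced vector carries both local (character) and global (text) information.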
Step S203: recognize the spliced feature of each character to obtain the object recognition result of each character.
In the present application, steps S201 to S203 may be performed based on an object recognition model. As an implementation, the text to be processed may be input into the object recognition model, which outputs the object set contained in the text to be processed. The object recognition model is trained with training texts as training samples and with the object sets labeled for the training texts as sample labels.
A detailed description of a specific implementation of determining the set of objects contained in the text to be processed based on the object recognition model is provided below.
Referring to FIG. 2, which is a schematic structural diagram of an object recognition model according to an embodiment of the present application, the object recognition model includes a character-level feature determination module, a text-level feature determination module, a feature splicing module, and a recognition module.
After the text to be processed is input into the object recognition model, the character-level feature determination module processes the text to be processed and outputs the character-level feature of each character in the text. The text-level feature determination module processes the text to be processed and outputs the text-level feature of the text. The character-level feature of each character and the text-level feature of the text are then input into the feature splicing module to obtain the spliced feature of each character in the text to be processed. Finally, the spliced feature of each character is input into the recognition module, which outputs the object recognition result of each character.
The character-level feature determination module may be implemented based on any one of the BERT (Bidirectional Encoder Representations from Transformers) model, the RoBERTa model, the RoBERTa-large Chinese pre-trained model, RoBERTa-wwm-ext, and RoBERTa-wwm-large-ext.
The text-level feature determination module may be implemented based on an LSTM (Long Short-Term Memory) network, which can encode the text to be processed to obtain the text-level feature of the text to be processed. The present application is not limited in this regard.
The recognition module may include a fully connected layer and a binary classification layer, and the object recognition result of each character is the classification result output by the binary classification layer. Finally, the object set contained in the text to be processed is determined based on the classification results output by the binary classification layer.
For ease of understanding, assume that the text to be processed is "Zhang San stole two electric vehicles at a fertilizer plant on March 4, one a red Yadi and the other a black Aima; on March 16 he stole a gray Lenovo computer in the Long Qingxiao area, and then stole a Honda CB400 motorcycle." If the binary classification layer outputs 1 for a character that belongs to a physical item and 0 for a character that does not, the output of the binary classification layer is 0000000000000001110000000000000000000000000000000000011000000000000000111.
Step S204: determine the object set contained in the text to be processed based on the object recognition result of each character.
Based on the output of the binary classification layer, the object set contained in the text to be processed is determined to consist of electric vehicle, computer, and motorcycle.
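Step S204 can be sketched as a decoding of the per-character 0/1 labels: each maximal run of characters labeled 1 is read off as one object. The label string in the example contains three such runs, of lengths 3, 2, and 3. The function name `decode_objects` and the short Chinese fragment below are assumptions made for illustration; the fragment is not the patent's full example sentence.

```python
from itertools import groupby


def decode_objects(chars, labels):
    """Join each maximal run of characters labeled 1 into one object string."""
    if len(chars) != len(labels):
        raise ValueError("chars and labels must have the same length")
    objects = []
    position = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label == 1:
            objects.append("".join(chars[position:position + length]))
        position += length
    return objects


# Hypothetical fragment: "stole two electric vehicles and one computer".
chars = list("偷了两辆电动车和一台电脑")
labels = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1]

print(decode_objects(chars, labels))
```

If the same object string is decoded more than once, the results can be deduplicated to form the object set.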
As an implementation, the present application discloses a specific way of determining, for each object in the object set, the attribute corresponding to the object, which may include the following steps:
Step S301: acquire the dependency syntax relationships among the characters in the text to be processed.
It should be noted that the dependency syntax relationships may be obtained with any commonly used dependency parsing method, which is not described in detail in the present application.
For ease of understanding, assuming that the text to be processed is "Zhang San stole two electric vehicles at a fertilizer plant, one a red Yadi and one a black Aima", the dependency syntax relationships among the characters in the text to be processed are as shown in FIG. 3.
Step S302: for each character in the text to be processed, determine the object attribute feature of the character according to the character-level feature of the character, the object recognition result of the character, and the dependency syntax relationships among the characters in the text to be processed.
As an implementation, the determining, for each character in the text to be processed, the object attribute feature of the character according to the character-level feature of the character, the object recognition result of the character, and the dependency syntax relationships among the characters in the text to be processed includes:
Step S3021: generate the object recognition feature of the character according to the character-level feature of the character and the object recognition result of the character.
In the present application, the character-level feature of each character in the text to be processed may be obtained by processing the text with the character-level feature determination module of the object recognition model, which is not described again here. The object recognition result of the character may be obtained with the recognition module of the object recognition model and indicates whether the character belongs to an object in the object set. As an example, if the character belongs to an object in the object set, the object recognition result of the character is 1; otherwise, the object recognition result of the character is 0.
As an implementation, the character-level feature of the character and the object recognition result of the character may be encoded with a BiLSTM (Bi-directional Long Short-Term Memory) network to obtain the object recognition feature of the character.
Step S3022: determine the object attribute feature of the character according to the object recognition features of the characters in the text to be processed and the dependency syntax relationships among the characters in the text to be processed.
In the application, the object attribute feature of a character may be determined as follows: according to the dependency syntax relationships among the characters in the text to be processed, determine the dependency syntax character corresponding to the character and the dependency syntax feature between the character and its dependency syntax character; then splice the object recognition feature of the character, the object recognition feature of its dependency syntax character, and the dependency syntax feature between the two to obtain the object attribute feature of the character.
For ease of understanding, assume that characters $x_h$ and $x_i$ in the text to be processed have a dependency syntax relationship $r$, so that $x_i$ is the dependency syntax character corresponding to $x_h$. The spliced feature for character $x_h$ is $u_i = [w_i, w_h, v_r]$, where $w_h$ is the object recognition feature of $x_h$, $w_i$ is the object recognition feature of $x_i$, and $v_r$ is the feature of $r$.
There are 14 types of dependency syntax relationships. In the present application, a 14×200 two-dimensional matrix can be preset, in which the feature of each dependency syntax relationship is a 1×200 vector. Indexes (e.g., 0 to 13) of the dependency syntax relationships may also be preset; after it is determined that characters $x_h$ and $x_i$ have the dependency syntax relationship $r$, the dependency syntax feature between $x_h$ and $x_i$ is looked up according to the index of $r$.
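The lookup and splicing described above can be sketched as follows. The matrix shape (14×200) and the ordering [w_i, w_h, v_r] follow the text; everything else is an assumption for illustration: the random matrix values stand in for preset or learned ones, the index assignment for VOB and COO is hypothetical, and the 8-dimensional object recognition features are toy values.

```python
import random

NUM_RELATIONS = 14  # number of dependency relationship types stated in the text
RELATION_DIM = 200  # width of each relationship feature vector stated in the text

rng = random.Random(0)
# Preset 14 x 200 matrix; random values are placeholders (assumption).
relation_matrix = [
    [rng.uniform(-1.0, 1.0) for _ in range(RELATION_DIM)]
    for _ in range(NUM_RELATIONS)
]

# Hypothetical index assignment; the remaining 12 relationship types are omitted.
relation_index = {"VOB": 0, "COO": 1}


def spliced_attribute_feature(w_i, w_h, relation):
    """Return [w_i, w_h, v_r], where v_r is the matrix row for `relation`."""
    v_r = relation_matrix[relation_index[relation]]
    return list(w_i) + list(w_h) + list(v_r)


# Toy 8-dimensional object recognition features for x_i and x_h.
w_i = [1.0] * 8
w_h = [0.0] * 8
u_i = spliced_attribute_feature(w_i, w_h, "VOB")
print(len(u_i))
```

Looking up a row by index keeps the relationship features in one table, so the same vector is reused wherever the same relationship type occurs.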
Step S303: and identifying object attribute characteristics of each character in the text to be processed, and determining the attribute corresponding to each object in the object set.
According to the application, the dependency syntax relationship among the characters in the text to be processed is integrated in the object attribute characteristics of the characters in the text to be processed, and the attribute corresponding to different objects can be determined based on the dependency syntax relationship among the characters. For example, as can be seen from fig. 3, there is an object relationship (indicated by VOB shown in fig. 3) between the thief and the electric vehicle, there is a parallel relationship (indicated by COO shown in fig. 3) between the thief and the yady, it can be determined from the VOB and COO that the yady and the yama are contained by the electric vehicle, and there is a parallel relationship between the yady and the yama.
In this way, the method of the present application may implement the step of "determining, for each object in the object set, an attribute corresponding to the object, and combining the attribute with the object to obtain the target object" based on a target object determination model. The target object determination model is obtained by training with training texts as training samples and with the target objects annotated in the training texts as sample labels. The target object determination model is specifically used to perform steps S301 to S303 described above.
In some scenarios, it is often necessary to identify certain objects from a plurality of texts and to associate with one another the identified objects that are the same object. For example, in the judicial field, the completeness of the stolen-goods chain is one of the requirements in a suspect's case. Therefore, to determine whether the stolen-goods chain of a judicial case file is complete, the stolen goods must be identified from a plurality of documents in the case file (such as a prosecution opinion book, an identification report, an inquiry record, a survey record and a recognition record), and the identified stolen goods that are the same item must be associated with one another.
At present, identifying a certain object from a plurality of texts and associating the identified objects that are the same object is usually done manually. For example, in the judicial field, a judicial practitioner has to identify the stolen goods from the documents in a judicial case file and associate the identified stolen goods that are the same item, in order to determine whether the stolen-goods chain of the case file is complete. However, the manual approach consumes considerable manpower and time and is inefficient.
To solve the above-described problems, another text processing method is disclosed in the present application.
Referring to fig. 4, fig. 4 is a schematic flow chart of another text processing method disclosed in an embodiment of the present application, where the method may include the following steps:
Step S401: acquiring a plurality of texts to be processed.
In the present application, the plurality of texts to be processed may be texts that are related to one another; for example, they may be a plurality of documents in a judicial case file (for example, a prosecution opinion book, an identification report, an inquiry transcript, a survey transcript, a recognition transcript, etc.).
Step S402: for each text to be processed, determining the object set contained in the text, determining, for each object in the object set, the attribute corresponding to the object, and combining the attribute with the object to obtain the target object corresponding to the text to be processed.
In the present application, for the processing of each text to be processed, refer to the description of step S102 and step S103, which is not repeated here.
Step S403: performing same-object association on the target objects corresponding to the texts to be processed.
In the present application, performing same-object association on the target objects corresponding to the texts to be processed determines which target objects in the different texts are the same object; the specific implementation is described in detail in the following embodiments.
As an implementation manner, the present application discloses a specific implementation of performing same-object association on the target objects corresponding to the texts to be processed, which may include the following steps:
Step S501: determining two target objects to be judged from the target objects corresponding to the texts to be processed, wherein the two target objects to be judged are contained in different texts to be processed.
For example, in the judicial field, a complete stolen-goods chain requires that the stolen goods mentioned in the prosecution opinion book also appear in the identification report, the interrogation transcript and the identification transcript; the two target objects to be judged may then be stolen goods contained in the prosecution opinion book and in the identification report, respectively. For ease of understanding, the two target objects to be judged may be "one black 48V yadi electric vehicle" and "one red 48V yadi electric vehicle".
Step S502: judging whether the two target objects to be judged match; if they match, step S503 is performed, and if they do not match, step S504 is performed.
In the present application, judging whether the two target objects to be judged match may be implemented based on a neural network structure. Specifically, the two target objects to be judged may be processed by a matching judgment model to obtain the judgment result, output by the matching judgment model, of whether the two target objects to be judged match. The matching judgment model is obtained by training with target object pairs as training samples and with the judgment result of whether each target object pair matches as the sample label.
Step S503: determining that the two target objects to be judged are the same object.
Step S504: determining that the two target objects to be judged are not the same object.
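The control flow of steps S501 to S504 can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names are hypothetical, and the exact-string matcher merely stands in for the trained matching judgment model so that the example is runnable.

```python
from itertools import combinations


def associate_same_objects(targets_by_text, is_match):
    """Return pairs ((text_a, obj_a), (text_b, obj_b)) judged to be the same object.

    targets_by_text: dict mapping a text name to the target objects found in it.
    is_match: callable that decides whether two target objects match (step S502).
    """
    same_pairs = []
    # S501: candidate pairs always come from two different texts.
    for text_a, text_b in combinations(targets_by_text, 2):
        for obj_a in targets_by_text[text_a]:
            for obj_b in targets_by_text[text_b]:
                if is_match(obj_a, obj_b):  # S502
                    same_pairs.append(((text_a, obj_a), (text_b, obj_b)))  # S503
                # S504: otherwise the two objects are not the same object.
    return same_pairs


def exact_match(obj_a, obj_b):
    """Placeholder matcher (an assumption); the text uses a trained model instead."""
    return obj_a == obj_b


targets = {
    "prosecution opinion book": ["one black 48V yadi electric vehicle"],
    "identification report": [
        "one black 48V yadi electric vehicle",
        "one red 48V yadi electric vehicle",
    ],
}
pairs = associate_same_objects(targets, exact_match)
print(pairs)
```

Passing the matcher in as a parameter keeps the pairing loop independent of how the match decision is actually made.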
In another embodiment of the present application, a specific implementation manner of determining whether the two target objects to be determined match based on the matching determination model is described.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a matching determination model according to an embodiment of the present application. The matching determination model includes a first matching judgment module, a second matching judgment module and a comprehensive matching determination module.
Based on the structure of the above-mentioned matching judgment model, the process of processing the two target objects to be judged by using the matching judgment model to obtain the judgment result of whether the two target objects to be judged are matched, which is output by the matching judgment model, includes:
S601: comparing the two target objects to be judged by using the first matching judgment module of the matching judgment model to obtain a first matching judgment result.
In the application, the characteristics of each target object to be judged can be determined, and the similarity of the characteristics of two target objects to be judged is compared to obtain a first matching judgment result.
For each target object to be determined, a specific implementation manner of determining the characteristics of the target object to be determined may be: determining the characteristic of each character in the target object to be judged, and determining the object attribute characteristic corresponding to each character in the target object to be judged; weighting the object attribute characteristics corresponding to each character in the target object to be judged to obtain weighted characteristics of the object attributes corresponding to each character in the target object to be judged; splicing the characteristics of each character in the target object to be judged and the weighted characteristics of the object attribute corresponding to each character in the target object to be judged to obtain the characteristics of each character in the target object to be judged after splicing; and according to the characteristics of each character spliced in the target object to be judged, obtaining the characteristics of the target object to be judged.
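The weighting and splicing steps above might be sketched as follows. Several points are assumptions, because the text does not fix them: each character's weight is taken to be a given scalar, the per-character spliced features are reduced to one object feature by mean pooling, and the function name and toy vectors are hypothetical.

```python
def target_object_feature(char_feats, attr_feats, attr_weights):
    """Build one feature vector for a target object from per-character features.

    char_feats:   one feature vector per character.
    attr_feats:   one object attribute feature vector per character.
    attr_weights: one scalar weight per character (assumed to be given).
    """
    spliced = []
    for char_vec, attr_vec, weight in zip(char_feats, attr_feats, attr_weights):
        weighted_attr = [weight * value for value in attr_vec]  # weighting step
        spliced.append(list(char_vec) + weighted_attr)          # splicing step
    # Assumption: mean pooling over characters gives the object-level feature.
    dim = len(spliced[0])
    return [sum(row[k] for row in spliced) / len(spliced) for k in range(dim)]


feature = target_object_feature(
    char_feats=[[1.0, 0.0], [0.0, 1.0]],
    attr_feats=[[2.0], [4.0]],
    attr_weights=[0.5, 0.25],
)
print(feature)  # [0.5, 0.5, 1.0]
```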
S602: comparing the same object attributes in the two target objects to be judged by using the second matching judgment module of the matching judgment model to obtain a second matching judgment result.
In the present application, the object attributes that the two target objects to be determined have in common may be determined first; then, for each target object to be determined, the features of the characters corresponding to those common object attributes are determined, and the similarity between these features in the two target objects to be determined is compared to obtain the second matching determination result.
For ease of understanding, assuming that the two target objects to be determined are "one black 48V yadi electric vehicle" and "one red 48V yadi electric vehicle", respectively, the same object attributes in the two target objects to be determined are "48V" and "yadi".
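Finding the attributes that two target objects share can be illustrated with a simple order-preserving intersection. The split of each target object into an attribute list is an assumed decomposition for this example, and the function name is hypothetical.

```python
def shared_attributes(attrs_a, attrs_b):
    """Return the attributes of attrs_a that also occur in attrs_b, in order."""
    in_b = set(attrs_b)
    return [attr for attr in attrs_a if attr in in_b]


# Assumed attribute lists for the two example target objects.
black_vehicle = ["black", "48V", "yadi"]
red_vehicle = ["red", "48V", "yadi"]
common = shared_attributes(black_vehicle, red_vehicle)
print(common)  # ['48V', 'yadi']
```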
In the present application, for each target object to be determined, the features of the characters corresponding to the same object attribute may be determined as follows: the features are determined from the spliced features of the characters of the target object to be determined that correspond to the same object attribute.
S603: determining, by using the comprehensive matching determination module of the matching determination model, whether the two target objects to be determined match based on the first matching determination result and the second matching determination result.
In the present application, weights for the first matching judgment result and the second matching judgment result may be preset, and a final matching judgment result is obtained by combining the two results according to these weights; whether the two target objects to be judged match is then determined from the final matching judgment result and a preset judgment threshold.
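One way to read the comprehensive determination is as a weighted sum compared against a threshold, as in the sketch below. The use of cosine similarity for the two module results, the weighted-sum form, the default weights of 0.5, and the threshold of 0.6 are all illustrative assumptions; the text only states that the weights and the threshold are preset.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm = math.sqrt(sum(x * x for x in vec_a)) * math.sqrt(sum(y * y for y in vec_b))
    return dot / norm if norm else 0.0


def comprehensive_match(first_result, second_result,
                        w_first=0.5, w_second=0.5, threshold=0.6):
    """Combine the two module results with preset weights and apply a threshold."""
    final = w_first * first_result + w_second * second_result
    return final >= threshold, final


# Toy features: identical whole-object features, orthogonal shared-attribute features.
first = cosine_similarity([1.0, 0.0], [1.0, 0.0])   # 1.0
second = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # 0.0
matched, final = comprehensive_match(first, second)
print(matched, final)  # False 0.5
```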
It should further be noted that, after same-object association has been performed on the target objects corresponding to the texts to be processed, other processing may be carried out according to the association result. As one implementation, the target objects missing from each text may be determined according to the association result. For example, in the judicial field, after same-item association has been performed on the stolen goods in the documents of a judicial case file, it can be determined whether the stolen goods in the prosecution opinion book are missing from the identification report, the inquiry record, the investigation record, the identification record, etc.
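Determining which target objects are missing from each text could look like the sketch below. The function names are hypothetical, and the exact-string matcher is only a placeholder for the match decision described earlier.

```python
def find_missing(reference_objects, targets_by_text, is_match):
    """For each text, list the reference objects that have no match in that text."""
    missing = {}
    for text, objects in targets_by_text.items():
        absent = [
            ref for ref in reference_objects
            if not any(is_match(ref, obj) for obj in objects)
        ]
        if absent:
            missing[text] = absent
    return missing


def exact_match(obj_a, obj_b):
    """Placeholder matcher (an assumption), standing in for a trained model."""
    return obj_a == obj_b


# Target objects taken from the prosecution opinion book serve as the reference.
reference = ["one black 48V yadi electric vehicle"]
other_texts = {
    "identification report": ["one black 48V yadi electric vehicle"],
    "inquiry record": ["one red 48V yadi electric vehicle"],
}
missing = find_missing(reference, other_texts, exact_match)
print(missing)
```

In this toy input the black vehicle is found in the identification report but not in the inquiry record, so only the inquiry record is reported.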
The text processing device disclosed in the embodiments of the present application is described below; the text processing device described below and the text processing method described above may be read with reference to each other.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a text processing device according to an embodiment of the present application. As shown in fig. 6, the text processing apparatus may include:
an acquiring unit 11 for acquiring a text to be processed;
an object set determining unit 12, configured to determine an object set contained in the text to be processed;
A target object determining unit 13, configured to determine, for each object in the object set, an attribute corresponding to the object, and combine the attribute with the object to obtain a target object.
Optionally, the object set determining unit includes:
a feature determining unit, configured to determine the character level feature of each character in the text to be processed and the text level feature of the text to be processed;
a character splicing unit, configured to splice, for each character in the text to be processed, the character with the character level feature of the character and the text level feature of the text to be processed to obtain the spliced feature of the character;
a feature recognition unit, configured to recognize the spliced feature of each character to obtain an object recognition result of each character;
an object set determining subunit, configured to determine the object set contained in the text to be processed based on the object recognition result of each character.
Optionally, the target object determining unit includes:
a dependency syntax relationship acquisition unit, configured to acquire the dependency syntax relationship among the characters in the text to be processed;
an object attribute feature determining unit, configured to determine, for each character in the text to be processed, the object attribute feature of the character according to the character level feature of the character, the object recognition result of the character and the dependency syntax relationship among the characters in the text to be processed;
an object attribute feature recognition unit, configured to recognize the object attribute feature of each character in the text to be processed and determine the attribute corresponding to each object in the object set.
Optionally, the object attribute feature determining unit includes:
an object recognition feature determining unit, configured to generate the object recognition feature of the character according to the character level feature of the character and the object recognition result of the character;
an object attribute feature determining subunit, configured to determine the object attribute feature of the character according to the object recognition features of the characters in the text to be processed and the dependency syntax relationship among the characters in the text to be processed.
Optionally, there are a plurality of texts to be processed, and the apparatus further includes:
an object association unit, configured to perform same-object association on the target objects corresponding to the texts to be processed.
Optionally, the object association unit includes:
a target object determining unit, configured to determine two target objects to be determined from the target objects corresponding to the texts to be processed, wherein the two target objects to be determined are contained in different texts to be processed;
a judging unit, configured to judge whether the two target objects to be determined match and, if they match, to determine that the two target objects to be determined are the same object.
Optionally, the judging unit is specifically configured to:
process the two target objects to be determined by using a matching judgment model to obtain the judgment result, output by the matching judgment model, of whether the two target objects to be determined match, wherein the matching judgment model is obtained by training with target object pairs as training samples and with the judgment result of whether each target object pair matches as the sample label.
Optionally, the process of processing the two target objects to be determined by using a matching determination model to obtain a determination result of whether the two target objects to be determined output by the matching determination model match, includes:
comparing the two target objects to be judged by using a first matching judgment module of the matching judgment model to obtain a first matching judgment result;
comparing the same object attributes in the two target objects to be judged by using a second matching judgment module of the matching judgment model to obtain a second matching judgment result;
determining, by using a comprehensive matching determination module of the matching determination model, whether the two target objects to be determined match based on the first matching determination result and the second matching determination result.
Referring to fig. 7, fig. 7 is a block diagram of the hardware structure of a text processing device according to an embodiment of the present application. The hardware structure of the text processing device may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4.
In the embodiment of the application, there is at least one each of the processor 1, the communication interface 2, the memory 3 and the communication bus 4, and the processor 1, the communication interface 2 and the memory 3 communicate with one another through the communication bus 4.
The processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention, etc.
The memory 3 may comprise a high-speed RAM, and may further comprise a non-volatile memory, such as at least one magnetic disk memory.
The memory stores a program, and the processor can invoke the program stored in the memory, the program being configured to:
acquire a text to be processed;
determine an object set contained in the text to be processed;
determine, for each object in the object set, an attribute corresponding to the object, and combine the attribute with the object to obtain a target object.
Optionally, for the refined and extended functions of the program, refer to the description above.
The embodiment of the present application also provides a readable storage medium storing a program adapted to be executed by a processor, the program being configured to:
acquire a text to be processed;
determine an object set contained in the text to be processed;
determine, for each object in the object set, an attribute corresponding to the object, and combine the attribute with the object to obtain a target object.
Optionally, for the refined and extended functions of the program, refer to the description above.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar among the embodiments, the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A text processing method, comprising:
Acquiring a text to be processed;
determining an object set contained in the text to be processed;
determining an attribute corresponding to each object in the object set, and combining the attribute with the object to obtain a target object;
wherein the determining the object set contained in the text to be processed includes:
determining character level characteristics of each character in the text to be processed and text level characteristics of the text to be processed; the character level features are semantic information of characters, and the text level features are semantic information of texts; each character in the text to be processed is spliced with the character level feature of the character and the text level feature of the text to be processed, so that the character spliced feature is obtained; identifying the characteristics of each character after the splicing to obtain an object identification result of each character; determining an object set contained in the text to be processed based on object recognition results of the characters;
the determining, for each object in the set of objects, an attribute corresponding to the object includes:
determining, for each character in the text to be processed, object attribute characteristics of the character according to character level characteristics of the character, the object recognition result of the character and the dependency syntax relationship among the characters in the text to be processed.
2. The method of claim 1, wherein the determining, for each object in the set of objects, an attribute corresponding to the object comprises:
identifying object attribute characteristics of each character in the text to be processed, and determining the attribute corresponding to each object in the object set.
3. The method of claim 1, wherein the determining, for each character in the text to be processed, the object attribute characteristics of the character based on the character-level characteristics of the character, the object recognition results of the character, the dependency syntax relationship between the respective characters in the text to be processed, comprises:
Generating object recognition features of the characters according to the character level features of the characters and the object recognition results of the characters;
and determining object attribute characteristics of the characters according to the object identification characteristics of the characters in the text to be processed and the dependency syntax relationship among the characters in the text to be processed.
4. A method according to any one of claims 1 to 3, wherein there are a plurality of texts to be processed, the method further comprising:
and carrying out association of the same object on the target object corresponding to each text to be processed.
5. The method according to claim 4, wherein the associating the target object corresponding to each text to be processed with the same object includes:
determining two target objects to be judged from target objects corresponding to each text to be processed, wherein the two target objects to be judged are respectively contained in different texts to be processed;
judging whether the two target objects to be judged match;
if the two target objects to be determined match, determining that the two target objects to be determined are the same object.
6. The method of claim 5, wherein said determining whether the two target objects to be determined match comprises:
processing the two target objects to be judged by using a matching judgment model to obtain the judgment result, output by the matching judgment model, of whether the two target objects to be judged match, wherein the matching judgment model is obtained by training with target object pairs as training samples and with the judgment result of whether each target object pair matches as the sample label.
7. The method according to claim 6, wherein the processing the two target objects to be determined using the matching determination model to obtain the determination result of whether the two target objects to be determined output by the matching determination model match, includes:
comparing the two target objects to be judged by using a first matching judgment module of the matching judgment model to obtain a first matching judgment result;
comparing the same object attributes in the two target objects to be judged by using a second matching judgment module of the matching judgment model to obtain a second matching judgment result;
determining, by using a comprehensive matching determination module of the matching determination model, whether the two target objects to be determined match based on the first matching determination result and the second matching determination result.
8. A text processing apparatus, comprising:
The acquisition unit is used for acquiring the text to be processed;
An object set determining unit, configured to determine an object set included in the text to be processed;
a target object determining unit, configured to determine, for each object in the object set, an attribute corresponding to the object, and combine the attribute with the object to obtain a target object;
the object set determining unit is specifically configured to:
determining character level characteristics of each character in the text to be processed and text level characteristics of the text to be processed; the character level features are semantic information of characters, and the text level features are semantic information of texts; each character in the text to be processed is spliced with the character level feature of the character and the text level feature of the text to be processed, so that the character spliced feature is obtained; identifying the characteristics of each character after the splicing to obtain an object identification result of each character; determining an object set contained in the text to be processed based on object recognition results of the characters;
The target object determining unit is specifically configured to:
determining, for each character in the text to be processed, object attribute characteristics of the character according to character level characteristics of the character, the object recognition result of the character and the dependency syntax relationship among the characters in the text to be processed.
9. A text processing device comprising a memory and a processor;
The memory is used for storing programs;
The processor is configured to execute the program to implement the respective steps of the text processing method according to any one of claims 1 to 7.
10. A readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the text processing method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010656329.9A CN111814461B (en) | 2020-07-09 | 2020-07-09 | Text processing method, related equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111814461A (en) | 2020-10-23 |
CN111814461B (en) | 2024-05-31 |
Family
ID=72843145
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010656329.9A Active CN111814461B (en) | 2020-07-09 | 2020-07-09 | Text processing method, related equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111814461B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011159078A (en) * | 2010-01-29 | 2011-08-18 | Fujitsu Ltd | Information processing apparatus, determination program and determination method |
CN102866989A (en) * | 2012-08-30 | 2013-01-09 | 北京航空航天大学 | Viewpoint extracting method based on word dependence relationship |
CN109800414A (en) * | 2018-12-13 | 2019-05-24 | 科大讯飞股份有限公司 | Faulty wording corrects recommended method and system |
CN110069631A (en) * | 2019-04-08 | 2019-07-30 | 腾讯科技(深圳)有限公司 | A kind of text handling method, device and relevant device |
CN110210032A (en) * | 2019-05-31 | 2019-09-06 | 北京神州泰岳软件股份有限公司 | Text handling method and device |
CN110348012A (en) * | 2019-07-01 | 2019-10-18 | 北京明略软件系统有限公司 | Determine method, apparatus, storage medium and the electronic device of target character |
CN110532558A (en) * | 2019-08-29 | 2019-12-03 | 杭州涂鸦信息技术有限公司 | A kind of more intension recognizing methods and system based on the parsing of sentence structure deep layer |
CN110569500A (en) * | 2019-07-23 | 2019-12-13 | 平安国际智慧城市科技股份有限公司 | Text semantic recognition method and device, computer equipment and storage medium |
CN110598206A (en) * | 2019-08-13 | 2019-12-20 | 平安国际智慧城市科技股份有限公司 | Text semantic recognition method and device, computer equipment and storage medium |
CN110597082A (en) * | 2019-10-23 | 2019-12-20 | 北京声智科技有限公司 | Intelligent household equipment control method and device, computer equipment and storage medium |
CN110765235A (en) * | 2019-09-09 | 2020-02-07 | 深圳市人马互动科技有限公司 | Training data generation method and device, terminal and readable medium |
CN111128394A (en) * | 2020-03-26 | 2020-05-08 | 腾讯科技(深圳)有限公司 | Medical text semantic recognition method and device, electronic equipment and readable storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7016895B2 (en) * | 2002-07-05 | 2006-03-21 | Word Data Corp. | Text-classification system and method |
US8655803B2 (en) * | 2008-12-17 | 2014-02-18 | Xerox Corporation | Method of feature extraction from noisy documents |
US10515153B2 (en) * | 2013-05-16 | 2019-12-24 | Educational Testing Service | Systems and methods for automatically assessing constructed recommendations based on sentiment and specificity measures |
Non-Patent Citations (2)
Title |
---|
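Feature selection for text classification based on part of speech filter and synonym merge; Sijun Qin et al.; IEEE; full text *
Web query intent recognition fusing multiple types of features; Wu Dayong; Zhao Shiqi; Liu Ting; Zhang Yu; Pattern Recognition and Artificial Intelligence (03); full text *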
Also Published As
Publication number | Publication date |
---|---|
CN111814461A (en) | 2020-10-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109947909B (en) | Intelligent customer service response method, equipment, storage medium and device | |
CN111708869B (en) | Method and device for man-machine dialogue processing | |
US20210081611A1 (en) | Methods and systems for language-agnostic machine learning in natural language processing using feature extraction | |
CN110472027B (en) | Intent recognition method, apparatus, and computer-readable storage medium | |
CN110069609B (en) | Referee document analysis method, referee document analysis device, computer equipment and storage medium | |
CN107436922A (en) | Text label generation method and device | |
CN113051384B (en) | User portrait extraction method based on dialogue and related device | |
CN112036184A (en) | Entity identification method, device, computer device and storage medium based on BiLSTM network model and CRF model | |
CN113158656B (en) | Ironic content recognition method, ironic content recognition device, electronic device, and storage medium | |
CN114328837B (en) | Sequence labeling method, device, computer equipment, and storage medium | |
CN114218945A (en) | Entity identification method, device, server and storage medium | |
CN110852071B (en) | Knowledge point detection method, device, equipment and readable storage medium | |
CN114661861A (en) | Text matching method and device, storage medium and terminal | |
CN110084105A (en) | Contract documents analysis method, device, computer equipment and storage medium | |
CN111259645A (en) | Referee document structuring method and device | |
CN113095083A (en) | Entity extraction method and device | |
CN114021004A (en) | Method, apparatus, device and readable storage medium for recommending similar questions in science | |
CN118113852A (en) | Financial problem answering method, device, equipment, system, medium and product | |
CN117648618A (en) | Intention recognition method, device, electronic equipment and storage medium | |
CN114117041B (en) | Attribute-level emotion analysis method based on specific attribute word context modeling | |
CN111814461B (en) | Text processing method, related equipment and readable storage medium | |
CN111143515B (en) | Text matching method and device | |
CN115098629B (en) | File processing method, device, server and readable storage medium | |
WO2011013587A1 (en) | Document data processing device | |
CN116701604A (en) | Question and answer corpus construction method and device, question and answer method, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |