CN115809311B - Knowledge graph data processing method and device and computer equipment - Google Patents
- Publication number
- CN115809311B (application number CN202211654507.XA)
- Authority
- CN
- China
- Prior art keywords
- real
- information
- data
- time information
- knowledge graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The disclosure relates to the technical field of knowledge graphs, and particularly discloses a data processing method and device for a knowledge graph and computer equipment, wherein the method comprises the following steps: acquiring historical information of a data producer, and establishing a data file mapping table according to the historical information; importing the data file mapping table into a knowledge graph to form base data of the knowledge graph; acquiring real-time information of the data producer, and sending the updated real-time information to a message middleware when a content update of the real-time information is determined according to the historical information; and updating the base data of the knowledge graph by using the message middleware. In the disclosure, once updated real-time information is identified by comparing the real-time information with the historical information, the updated real-time information is imported into the knowledge graph through the message middleware, so that the knowledge graph can be updated in time, the stability of writing data into the knowledge graph is improved, and the stability of the service provided by the knowledge graph is further ensured.
Description
Technical Field
The disclosure relates to the technical field of knowledge graphs, and in particular relates to a data processing method and device of a knowledge graph and computer equipment.
Background
With the advent of the Internet and the big data era, the volume of data generated by interconnected systems has grown explosively, and these data can serve as effective raw material for analyzing relations. Knowledge graphs are therefore widely used in scenarios that require data mining, data analysis and the like.
In the related art, the data in a knowledge graph is often n+1 data; that is, the full data of the current day is imported at night when traffic is low, and the knowledge graph service is suspended during the import, so that the real-time performance of the knowledge graph data and the stability of the service are greatly limited.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data processing method, apparatus, computer device, storage medium, and computer program product for a knowledge-graph.
In a first aspect, the present disclosure provides a data processing method of a knowledge graph. The method comprises the following steps:
acquiring historical information of a data producer, and establishing a data file mapping table according to the historical information;
importing the data file mapping table into a knowledge graph to form base data of the knowledge graph;
acquiring real-time information of the data producer, and transmitting the updated real-time information to a message middleware under the condition that content update of the real-time information is determined according to the historical information;
and updating the base data of the knowledge graph by using the message middleware.
In one embodiment, the acquiring real-time information of the data producer, and sending the updated real-time information to the message middleware when a content update of the real-time information is determined according to the historical information, includes:
judging whether the information digest of the real-time information is consistent with the information digest of the historical information in the data file mapping table;
and in response to the information digest of the real-time information being inconsistent with the information digest of the historical information in the data file mapping table, determining that the real-time information whose information digest is inconsistent is the updated real-time information.
In one embodiment, the method further comprises:
in response to the information digest of the real-time information being inconsistent with the information digest of the historical information in the data file mapping table, updating the data file mapping table according to the updated real-time information.
In one embodiment, the message middleware includes a plurality of partitions, the acquiring the real-time information of the data producer, and in a case that a content update of the real-time information is determined according to the history information, sending the updated real-time information to the message middleware further includes:
and sending the updated real-time information to the partition corresponding to the message middleware according to the key field of the updated real-time information.
In one embodiment, the updating the base data of the knowledge-graph using the message-middleware includes:
determining an updated write flow threshold of the knowledge graph according to the data read flow of the knowledge graph;
And updating the base data of the knowledge graph according to the updated writing flow threshold.
In one embodiment, importing the data file mapping table into a knowledge graph to form the base data of the knowledge graph includes:
And importing the whole data file mapping table into a database of the knowledge graph through a distributed computing engine.
In a second aspect, the present disclosure further provides a data processing apparatus for a knowledge graph. The device comprises:
the historical data module is used for acquiring historical information of a data producer and establishing a data file mapping table according to the historical information;
the mapping table importing module is used for importing the data file mapping table into a knowledge graph to form base data of the knowledge graph;
the real-time data module is used for acquiring the real-time information of the data producer and sending the updated real-time information to the message middleware under the condition that the content of the real-time information is updated according to the historical information;
And the knowledge graph updating module is used for updating the base data of the knowledge graph by using the message middleware.
In one embodiment, the real-time data module comprises:
the information digest unit is used for judging whether the information digest of the real-time information is consistent with the information digest of the historical information in the data file mapping table;
and the updating determining unit is used for determining, in response to the information digest of the real-time information being inconsistent with the information digest of the historical information in the data file mapping table, that the real-time information whose information digest is inconsistent is the updated real-time information.
In one embodiment, the apparatus further comprises:
the mapping table updating module is used for updating the data file mapping table according to the updated real-time information in response to the information digest of the real-time information being inconsistent with the information digest of the historical information in the data file mapping table.
In one embodiment, the message middleware includes a plurality of partitions, and the update sending unit is further configured to send the updated real-time information to the partition corresponding to the message middleware according to the key field of the updated real-time information.
In one embodiment, the knowledge-graph updating module includes:
the updating and writing flow threshold unit is used for determining the updating and writing flow threshold of the knowledge graph according to the data reading flow of the knowledge graph;
and the updating and writing unit is used for updating the base data of the knowledge graph according to the updating and writing flow threshold.
In one embodiment, the mapping table importing module includes:
And the calculation engine unit is used for importing the whole data file mapping table into the database of the knowledge graph through a distributed calculation engine.
In a third aspect, the present disclosure also provides a computer device. The computer device comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the data processing method of the knowledge graph when executing the computer program.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the data processing method of knowledge-graph described above.
In a fifth aspect, the present disclosure also provides a computer program product. The computer program product comprises a computer program which, when being executed by a processor, implements the steps of the data processing method of the knowledge graph described above.
The data processing method, the device, the computer equipment, the storage medium and the computer program product of the knowledge graph at least comprise the following beneficial effects:
In the present disclosure, data is imported into the knowledge graph through the data file mapping table, which reduces the probability of data disorder caused by writing data directly and makes it convenient to analyze the data later according to the data file mapping table. In addition, once updated real-time information is identified by comparing the real-time information with the historical information, the updated real-time information is imported into the knowledge graph through the message middleware, so that the knowledge graph can be updated in time, the stability of writing data into the knowledge graph is improved, and the stability of the service provided by the knowledge graph is further ensured. Meanwhile, because the updated real-time information is written into the message middleware, non-essential business logic can run asynchronously, which speeds up the response. Under highly concurrent writes the message middleware acts as a buffer and imports the information into the knowledge graph gradually, which avoids abnormal database connections. The message middleware also decouples data production from data consumption, so that writing data into the message middleware and reading data from it do not interfere with each other. Finally, passing the real-time information through the message middleware helps persist it, which facilitates subsequent troubleshooting and analysis.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments or the conventional techniques of the present disclosure, the drawings required for the descriptions of the embodiments or the conventional techniques will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to the drawings without inventive effort to those of ordinary skill in the art.
FIG. 1 is an application environment diagram of a data processing method of a knowledge graph in one embodiment;
FIG. 2 is a flow chart of a method for processing knowledge-graph data in one embodiment;
FIG. 3 is a flow chart of a method for processing knowledge-graph data in another embodiment;
FIG. 4 is a data flow diagram of a data processing method of a knowledge graph in one embodiment;
FIG. 5 is a flow chart of a method for processing knowledge-graph data in another embodiment;
FIG. 6 is a block diagram of a knowledge-graph data processing apparatus in one embodiment;
FIG. 7 is a block diagram of a knowledge-graph data processing apparatus according to another embodiment;
FIG. 8 is a block diagram of a knowledge-graph data processing apparatus in another embodiment;
FIG. 9 is a block diagram of the internal architecture of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, it is not excluded that additional identical or equivalent elements may be present in a process, method, article, or apparatus that comprises a described element. For example, words such as "first" and "second" are used to indicate names and do not denote any particular order.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," and/or the like, specify the presence of stated features, integers, steps, operations, elements, components, or groups thereof, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or groups thereof. Also, in this specification, the term "and/or" includes any and all combinations of the associated listed items.
The data processing method of the knowledge graph provided by the embodiment of the disclosure can be applied to an application environment as shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The server 104 includes a knowledge-graph database, and the server 104 provides knowledge-graph services to the terminal 102. Server 104 may obtain data generated by a public data source and a business system, which may be a Hadoop distributed file system (i.e., HDFS). The server 104 can acquire the data generated by the public data source and the business system in real time, screen out the data that has changed, and import the changed data into the knowledge graph database in time. The knowledge-graph database may be integrated on the server 104 or may be located on the cloud or another network server. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In some embodiments of the present disclosure, as shown in fig. 2, a data processing method of a knowledge graph is provided, and an example of application of the method 200 to the server in fig. 1 is described, including the following steps:
Step 210, acquiring historical information of a data producer, and establishing a data file mapping table according to the historical information.
The data producer may include public data sources and service systems, and the public data sources may refer to public websites, public databases and other data sources that are open to the public. A business system may refer to a system that implements interactions between a business provider and a user for completing a business link required for a particular task.
Illustratively, the server may obtain historical information generated by the public data source and the business system via a data interface and store the historical information in the data file mapping table. The data file mapping table may be built on Hive, a Hadoop-based data warehouse tool for data extraction, transformation, and loading that can store, query, and analyze large-scale data stored in Hadoop. Hive can map structured data files into a Hive table (i.e., the data file mapping table) and provides SQL (Structured Query Language) query functions.
Step 220, importing the data file mapping table into a knowledge graph to form base data of the knowledge graph.
The knowledge graph may be a knowledge graph constructed based on requirements of the service system, and is used for providing knowledge graph service for the service system. The base data can be processed to form a knowledge graph through information extraction, knowledge fusion, knowledge processing and the like.
Illustratively, the established data file mapping table is imported into a knowledge graph database according to the schema of the knowledge graph. The schema explicitly defines the entities, attributes and relations in the knowledge graph and thereby delimits the categories the knowledge graph may contain. Optionally, the knowledge graph database may be a Nebula database, which is a distributed and scalable graph database.
Optionally, before the data file mapping table is imported into the knowledge graph, data cleaning may be performed by means of HQL (Hive SQL), so that the data in the data file mapping table satisfies the data format required by the knowledge graph. The data cleaning may include, but is not limited to, missing value padding, value replacement, data type conversion, data sorting, and duplicate value processing; the quality of the cleaned data directly affects the results of the final data analysis.
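The cleaning operations listed above can be illustrated with a minimal sketch. The disclosure performs cleaning in HQL; the plain-Python version below is only a stand-in, and the function name `clean_rows`, the `id` column and the per-column `defaults` are assumptions made for illustration.

```python
from typing import Any


def clean_rows(rows: list[dict[str, Any]], defaults: dict[str, Any]) -> list[dict[str, Any]]:
    """Clean raw rows before they are loaded as base data (illustrative only)."""
    seen: set[str] = set()
    cleaned: list[dict[str, Any]] = []
    for row in rows:
        fixed: dict[str, Any] = {}
        for col, default in defaults.items():
            value = row.get(col)
            # Missing value padding: fall back to the per-column default.
            if value is None or value == "":
                value = default
            # Data type conversion: cast to the type of the column default.
            fixed[col] = type(default)(value)
        # Duplicate value processing: keep the first row seen for each id.
        if fixed["id"] in seen:
            continue
        seen.add(fixed["id"])
        cleaned.append(fixed)
    # Data sorting: deterministic order by id.
    return sorted(cleaned, key=lambda r: r["id"])
```

For example, calling `clean_rows` with `defaults={"id": "", "age": 0}` fills a missing `age` with `0`, converts `"30"` to `30`, and drops a second row that repeats an existing `id`.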
And step 230, acquiring real-time information of the data producer, and sending the updated real-time information to a message middleware under the condition that content update of the real-time information is determined according to the historical information.
For example, the server may monitor the real-time information generated by the data producer in real time and determine, in combination with the historical information, whether updated real-time information exists. The server may then send the updated real-time information so determined to the message middleware.
Optionally, the server may monitor the real-time information generated by the data producer in several ways, for example via Redis or via manual triggering. Redis (Remote Dictionary Server) is a high-performance key-value database that supports a publish/subscribe mechanism, so the server can subscribe to a channel and receive the complete messages published by the data producer. The message middleware may be Kafka, a high-throughput distributed publish-subscribe messaging system.
Optionally, in the step of sending the updated real-time information to the message middleware, the server may send either only the changed data fields of the real-time information or the entire data object set of the real-time information to the message middleware.
And step 240, updating the base data of the knowledge graph by using the message middleware.
The server may update the base data of the knowledge graph further according to the updated real-time information in the message middleware after sending the updated real-time information to the message middleware, so as to ensure the real-time performance of the knowledge graph.
In the data processing method of the knowledge graph described above, data is imported into the knowledge graph through the data file mapping table, which reduces the probability of data disorder caused by writing data directly and makes it convenient to analyze the data later according to the data file mapping table. In addition, once updated real-time information is identified by comparing the real-time information with the historical information, the updated real-time information is imported into the knowledge graph through the message middleware, so that the knowledge graph can be updated in time, the stability of writing data into the knowledge graph is improved, and the stability of the service provided by the knowledge graph is further ensured. Meanwhile, because the updated real-time information is written into the message middleware, non-essential business logic can run asynchronously, which speeds up the response. Under highly concurrent writes the message middleware acts as a buffer and imports the information into the knowledge graph gradually, which avoids abnormal database connections. The message middleware also decouples data production from data consumption, so that writing data into the message middleware and reading data from it do not interfere with each other. Finally, passing the real-time information through the message middleware helps persist it, which facilitates subsequent troubleshooting and analysis.
In some embodiments of the present disclosure, as shown in fig. 3, step 230 includes:
Step 232, determining whether the information digest of the real-time information is consistent with the information digest of the historical information in the data file mapping table.
For example, the server may compute the information digests of the obtained real-time information and historical information and determine whether the two digests are identical. Optionally, the digests may be computed with the MD5 algorithm (MD5 Message-Digest Algorithm), a cryptographic hash function that produces a 128-bit (16-byte) hash value (i.e., the message digest). If the hash values differ, the data itself has changed.
Step 234, in response to the information digest of the real-time information being inconsistent with the information digest of the historical information in the data file mapping table, determining that the real-time information whose information digest is inconsistent is the updated real-time information.
By way of example, in connection with the data flow diagram of the data processing method of the knowledge graph provided in the embodiment shown in fig. 4, when the server determines that the information digest of the real-time information is inconsistent with the information digest of the historical information in the data file mapping table, the server may determine that the real-time information whose information digest is inconsistent is updated real-time information, and trigger the action of sending the updated real-time information to the message middleware.
In this embodiment, the information digest of the real-time information is compared with the information digest of the historical information, and the real-time information whose digest is inconsistent is determined to be updated real-time information, so that the updated real-time information can be identified more conveniently and efficiently.
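The digest comparison of steps 232 and 234 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: records are assumed to be JSON-serializable dictionaries, and the names `digest` and `find_updates` are hypothetical.

```python
import hashlib
import json


def digest(record: dict) -> str:
    """Return the MD5 digest (32 hex characters, 128 bits) of a record."""
    # Sorting keys makes the digest independent of field order (an assumption).
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def find_updates(history_digests: dict[str, str], realtime: dict[str, dict]) -> dict[str, dict]:
    """Return the real-time records whose digest differs from the stored one."""
    updated: dict[str, dict] = {}
    for key, record in realtime.items():
        # A missing or different digest means the content is new or has changed.
        if history_digests.get(key) != digest(record):
            updated[key] = record
    return updated
```

Only records returned by `find_updates` would be forwarded to the message middleware; unchanged records are skipped without comparing field by field.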
In some embodiments of the present disclosure, the method further comprises:
in response to the information digest of the real-time information being inconsistent with the information digest of the historical information in the data file mapping table, updating the data file mapping table according to the updated real-time information.
When the server judges that the information digest of the real-time information is inconsistent with the information digest of the historical information in the data file mapping table, it also triggers an update of the historical information in the data file mapping table according to the real-time information, so as to keep the data file mapping table current. It should be noted that, each time the step of determining whether the two information digests are consistent is repeated, the comparison is made against the data file mapping table as updated in real time.
Optionally, when the server sends the updated real-time information to the message middleware, the updated real-time information may be synchronously written into the data file mapping table for updating. The data file mapping table may support HDFS system read lookups to analyze real-time information.
In this embodiment, the data file mapping table is updated in time according to updates of the real-time information, and the real-time information is stored in the data file mapping table where it can be read and consulted through the HDFS system. Data correction can thus be performed against the entire data object set when the real-time information is imported into the knowledge graph, which reduces the probability of data disorder caused by real-time writes and facilitates subsequent data tracking and analysis.
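A minimal sketch of this combined behavior follows: when a digest mismatch is detected, the update is both published and written back to the mapping table, so that later comparisons use the refreshed state. The in-memory `dict` standing in for the data file mapping table, the `queue.Queue` standing in for the message middleware, and the names `md5_of` and `handle_realtime` are all assumptions for illustration.

```python
import hashlib
import json
import queue


def md5_of(record: dict) -> str:
    """MD5 digest of a JSON-serializable record, independent of key order."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def handle_realtime(key: str, record: dict, mapping_table: dict, middleware: queue.Queue) -> bool:
    """Publish the record and refresh the mapping table if its content changed."""
    new_digest = md5_of(record)
    entry = mapping_table.get(key)
    if entry is not None and entry["digest"] == new_digest:
        return False  # Content unchanged: nothing to publish.
    # Send the updated real-time information to the (stand-in) middleware.
    middleware.put((key, record))
    # Write the update back so the next comparison sees the current state.
    mapping_table[key] = {"record": record, "digest": new_digest}
    return True
```

Because the mapping table is refreshed in the same call, submitting an identical record a second time returns `False` and publishes nothing.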
In some embodiments of the present disclosure, step 230 further comprises:
and sending the updated real-time information to the partition corresponding to the message middleware according to the key field of the updated real-time information.
The message middleware may include a plurality of partitions, and the server may send the updated real-time information to the corresponding partition of the message middleware according to a preset key field rule and the key field of the updated real-time information, so that updates to the same piece of data enter the same partition. Optionally, the key field may be selected according to characteristics of the data, such as its timing.
In this embodiment, the updated real-time information is sent to the corresponding partition of the message middleware according to the key field, so that updates to the same piece of data enter the same partition, which strengthens data management and improves the efficiency of writing to the knowledge graph.
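Key-based routing of this kind can be sketched with a simple hash-modulo rule. The rule itself is an assumption chosen for illustration; it is not the "preset key field rule" of the disclosure, nor the hashing used by any particular middleware. The names `partition_for` and `route` are hypothetical.

```python
import hashlib


def partition_for(key_field: str, num_partitions: int) -> int:
    """Map a key field to a partition index that is stable across runs."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # A cryptographic hash is used only because it is deterministic everywhere;
    # Python's built-in hash() is randomized per process for strings.
    h = hashlib.md5(key_field.encode("utf-8")).digest()
    return int.from_bytes(h[:4], "big") % num_partitions


def route(updates: list[dict], key_name: str, num_partitions: int) -> list[list[dict]]:
    """Group updates into partitions, preserving arrival order per partition."""
    partitions: list[list[dict]] = [[] for _ in range(num_partitions)]
    for update in updates:
        index = partition_for(str(update[key_name]), num_partitions)
        partitions[index].append(update)
    return partitions
```

Since every update carrying the same key lands in the same partition, and each partition preserves arrival order, successive updates to one piece of data are consumed in the order they were produced.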
In some embodiments of the present disclosure, as shown in fig. 5, step 240 includes:
Step 242, determining an update write flow threshold of the knowledge graph according to the data read flow of the knowledge graph.
After the updated real-time information has been written into the message middleware, the server obtains the real-time data read flow of the knowledge graph and determines the update write flow threshold of the knowledge graph from it. The update write flow threshold ensures that the data read flow of the knowledge graph is not affected by writes, and thereby ensures the stability of the service provided by the knowledge graph.
Step 244, updating the base data of the knowledge graph according to the update write flow threshold.
The update write flow threshold may vary with the real-time data read flow of the knowledge graph. The server writes the updated real-time information held in the message middleware into the knowledge graph database without exceeding the update write flow threshold determined in real time.
According to this embodiment, the update write flow threshold of the knowledge graph is determined from the data read flow of the knowledge graph, and the updated real-time information in the message middleware is written into the knowledge graph database without exceeding that threshold. This ensures the stability of the service provided by the knowledge graph while allowing the updated real-time information to be written into the knowledge graph in a timely manner, which improves the timeliness of the knowledge graph.
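Steps 242 and 244 can be illustrated with a simple budget calculation. The disclosure does not specify how the threshold is derived from the read flow, so the formula below (total capacity minus current read flow, floored at a minimum) is purely an assumption, as are the names `write_threshold` and `drain` and the one-second window.

```python
def write_threshold(read_qps: float, capacity_qps: float,
                    min_write_qps: float = 0.0) -> float:
    """Assumed rule: writes may use whatever capacity reads leave free."""
    return max(min_write_qps, capacity_qps - read_qps)


def drain(queue: list, read_qps: float, capacity_qps: float) -> list:
    """Take at most one window's worth of messages from the queue.

    The returned batch is what would be written to the knowledge graph
    database in the current one-second window; the rest stays queued.
    """
    budget = int(write_threshold(read_qps, capacity_qps))
    batch = queue[:budget]
    del queue[:budget]
    return batch
```

Recomputing the threshold for every window lets the write rate fall when read traffic rises and recover when it drops, while the message middleware buffers whatever cannot yet be written.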
In some embodiments of the present disclosure, step 220 comprises:
Importing the entire data file mapping table into a database of the knowledge graph through a distributed computing engine.
Illustratively, the server may perform the import through a distributed computing engine when importing the data file mapping table into the knowledge graph. Optionally, the distributed computing engine may be Spark, an open-source cluster computing environment similar to Hadoop. Spark supports in-memory distributed datasets, which, in addition to providing interactive queries, can optimize iterative workloads.
According to the embodiment, the Spark distributed computing engine is used for importing the whole data file mapping table into the database of the knowledge graph, so that the whole data file mapping table with large data volume can be imported, and the importing efficiency is improved.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed in sequence, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present disclosure also provides a knowledge graph data processing device for implementing the knowledge graph data processing method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more embodiments of the knowledge graph data processing device provided below, reference may be made to the limitations of the knowledge graph data processing method above, which are not repeated here.
In some embodiments of the present disclosure, as shown in fig. 6, a data processing apparatus of a knowledge-graph is provided. The apparatus 700 includes:
a history data module 710, configured to obtain history information of a data producer, and establish a data file mapping table according to the history information;
a mapping table importing module 720, configured to import the data file mapping table into a knowledge graph to form base data of the knowledge graph;
a real-time data module 730, configured to obtain real-time information of the data producer, and send the updated real-time information to a message middleware when it is determined, according to the history information, that the content of the real-time information has been updated;
and a knowledge graph updating module 740, configured to update the base data of the knowledge graph by using the message middleware.
In some embodiments of the present disclosure, as shown in fig. 7, the real-time data module 730 includes:
an information digest unit 732, configured to determine whether the information digest of the real-time information is consistent with the information digest of the history information in the data file mapping table;
and an update determining unit 734, configured to determine, in response to the information digest of the real-time information being inconsistent with the information digest of the history information in the data file mapping table, that the real-time information whose information digest is inconsistent is the updated real-time information.
In some embodiments of the present disclosure, the apparatus further comprises:
a mapping table updating module, configured to update the data file mapping table according to the updated real-time information in response to the information digest of the real-time information being inconsistent with the information digest of the history information in the data file mapping table.
In some embodiments of the present disclosure, the message middleware includes a plurality of partitions, and the update sending unit 738 is further configured to send the updated real-time information to the partition corresponding to the message middleware according to the key field of the updated real-time information.
In some embodiments of the present disclosure, as shown in fig. 8, the knowledge-graph updating module 740 includes:
an update write flow threshold unit 742, configured to determine an update write flow threshold of the knowledge graph according to the data read flow of the knowledge graph;
and an update writing unit 744, configured to update the base data of the knowledge graph according to the update write flow threshold.
In one embodiment, the mapping table importing module includes:
a computing engine unit, configured to import the entire data file mapping table into the database of the knowledge graph through a distributed computing engine.
The modules in the knowledge graph data processing device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module. It should be noted that the division of modules in the embodiments of the present disclosure is merely a division by logical function, and other divisions are possible in actual implementation.
In another embodiment provided by the present disclosure, a computer device is provided, which may be a server, and whose internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, an input/output (I/O) interface and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and the computer programs stored in the non-volatile storage medium. The database of the computer device is used for storing the base data of the knowledge graph. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a knowledge graph data processing method.
Those skilled in the art will appreciate that the structure shown in fig. 9 is merely a block diagram of part of the structure relevant to the solution of the present disclosure and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, may combine some components, or may have a different arrangement of components.
In another embodiment provided in the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor, implements the steps of the method embodiments described above.
In another embodiment provided in the present disclosure, a computer program product is provided, which includes a computer program that implements the steps of the method embodiments described above when executed by a processor.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that all or part of the above described methods may be implemented by a computer program, which may be stored on a non-transitory computer-readable storage medium and which, when executed, may carry out the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric random access memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. The volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided herein may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
In the description of the present specification, reference to the terms "some embodiments," "other embodiments," "desired embodiments," and the like, means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic descriptions of the above terms do not necessarily refer to the same embodiment or example.
It should be understood that the method embodiments in this specification are described in a progressive manner: identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. For related details, refer to the descriptions of the other method embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above examples represent only a few embodiments of the present disclosure. They are described in relative detail, but they are not to be construed as limiting the scope of the claims. It should be noted that those skilled in the art can make variations and modifications without departing from the spirit of the disclosure, and these fall within the scope of the disclosure. Accordingly, the scope of protection of the present disclosure should be determined by the following claims.
Claims (10)
1. A data processing method of a knowledge graph, the method comprising:
acquiring historical information of a data producer, and establishing a data file mapping table according to the historical information;
importing the data file mapping table into a knowledge graph to form base data of the knowledge graph;
acquiring real-time information of the data producer, and sending the updated real-time information to a message middleware in the case that a content update of the real-time information is determined according to the historical information;
updating the base data of the knowledge graph by using the message middleware;
wherein the acquiring real-time information of the data producer, and sending the updated real-time information to the message middleware in the case that a content update of the real-time information is determined according to the historical information, comprises:
judging whether the information digest of the real-time information is consistent with the information digest of the historical information in the data file mapping table, wherein the information digest of the real-time information and the information digest of the historical information are obtained through an MD5 algorithm;
and in response to the information digest of the real-time information being inconsistent with the information digest of the historical information in the data file mapping table, determining that the real-time information whose information digest is inconsistent is the updated real-time information, and updating the data file mapping table according to the updated real-time information.
2. The method of claim 1, wherein the message middleware comprises a plurality of partitions, and wherein the acquiring real-time information of the data producer, and sending the updated real-time information to the message middleware in the case that a content update of the real-time information is determined according to the historical information, further comprises:
sending the updated real-time information to the corresponding partition of the message middleware according to the key field of the updated real-time information.
3. The method of claim 1, wherein the updating the base data of the knowledge-graph with the message-middleware comprises:
determining an update write flow threshold of the knowledge graph according to the data read flow of the knowledge graph;
and updating the base data of the knowledge graph according to the update write flow threshold.
4. The method of claim 1, wherein the importing the data file mapping table into a knowledge graph to form base data of the knowledge graph comprises:
importing the entire data file mapping table into a database of the knowledge graph through a distributed computing engine.
5. A data processing apparatus for knowledge-graph, the apparatus comprising:
a historical data module, configured to acquire historical information of a data producer and establish a data file mapping table according to the historical information;
a mapping table importing module, configured to import the data file mapping table into a knowledge graph to form base data of the knowledge graph;
a real-time data module, configured to acquire real-time information of the data producer and send the updated real-time information to a message middleware in the case that a content update of the real-time information is determined according to the historical information; the real-time data module comprises: an information digest unit, configured to judge whether the information digest of the real-time information is consistent with the information digest of the historical information in the data file mapping table; and an update determining unit, configured to determine, in response to the information digest of the real-time information being inconsistent with the information digest of the historical information in the data file mapping table, that the real-time information whose information digest is inconsistent is the updated real-time information;
a mapping table updating module, configured to update the data file mapping table according to the updated real-time information in response to the information digest of the real-time information being inconsistent with the information digest of the historical information in the data file mapping table;
and a knowledge graph updating module, configured to update the base data of the knowledge graph by using the message middleware.
6. The apparatus according to claim 5, wherein the message middleware includes a plurality of partitions, and the update sending unit is further configured to send the updated real-time information to the partition corresponding to the message middleware according to a key field of the updated real-time information.
7. The apparatus according to claim 5, wherein the message middleware includes a plurality of partitions, and the update sending unit is further configured to send the updated real-time information to the partition corresponding to the message middleware according to a key field of the updated real-time information.
8. The apparatus of claim 5, wherein the mapping table importing module comprises:
a computing engine unit, configured to import the entire data file mapping table into the database of the knowledge graph through a distributed computing engine.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 4 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 4.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211654507.XA CN115809311B (en) | 2022-12-22 | 2022-12-22 | Knowledge graph data processing method and device and computer equipment |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN115809311A (en) | 2023-03-17 |
| CN115809311B (en) | 2024-08-16 |
Family
ID=85486761
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211654507.XA Active CN115809311B (en) | 2022-12-22 | 2022-12-22 | Knowledge graph data processing method and device and computer equipment |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115809311B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116932779B (en) * | 2023-08-14 | 2024-03-12 | 企查查科技股份有限公司 | Knowledge graph data processing method and device |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114153986A (en) * | 2021-11-29 | 2022-03-08 | 北京达佳互联信息技术有限公司 | A knowledge graph construction method, device, electronic device and storage medium |
| CN114328981A (en) * | 2022-03-14 | 2022-04-12 | 中国电子科技集团公司第二十八研究所 | Knowledge graph establishing and data obtaining method and device based on mode mapping |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110929038B (en) * | 2019-10-18 | 2023-07-21 | 平安科技(深圳)有限公司 | Knowledge graph-based entity linking method, device, equipment and storage medium |
| CN112948566B (en) * | 2021-04-21 | 2024-02-02 | 华东理工大学 | Construction method and device of chemical knowledge graph and intelligent question-answering method and device |
| CN113626616B (en) * | 2021-08-25 | 2024-03-12 | 中国电子科技集团公司第三十六研究所 | Aircraft safety early warning method, device and system |
| CN114238654B (en) * | 2021-12-15 | 2024-10-29 | 科大讯飞股份有限公司 | Knowledge graph construction method and device and computer readable storage medium |
| CN114385833B (en) * | 2022-03-23 | 2023-05-12 | 支付宝(杭州)信息技术有限公司 | Method and device for updating knowledge graph |
| CN115455935B (en) * | 2022-09-14 | 2025-08-12 | 华东师范大学 | Text information intelligent processing system |
Also Published As
| Publication number | Publication date |
|---|---|
| CN115809311A (en) | 2023-03-17 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20220156289A1 (en) | Generating a multi-column index for relational databases by interleaving data bits for selectivity | |
| US20240078229A1 (en) | Generating, accessing, and displaying lineage metadata | |
| CN111209352B (en) | Data processing method and device, electronic equipment and storage medium | |
| US9747349B2 (en) | System and method for distributing queries to a group of databases and expediting data access | |
| US8719254B2 (en) | Efficient querying using on-demand indexing of monitoring tables | |
| AU2017202873A1 (en) | Efficient query processing using histograms in a columnar database | |
| CN113609374A (en) | Data processing method, device and equipment based on content push and storage medium | |
| CN115269877A (en) | Method, system and equipment for constructing domain entity and event double-center knowledge graph | |
| CN115658680A (en) | Data storage method, data query method and related device | |
| CN115809311B (en) | Knowledge graph data processing method and device and computer equipment | |
| CN117453810A (en) | Heterogeneous data processing method, heterogeneous data processing device, computer equipment and storage medium | |
| CN114218211A (en) | Data processing system, method, computer device, and readable storage medium | |
| CN113778996A (en) | Large data stream data processing method and device, electronic equipment and storage medium | |
| CN107430633B (en) | System and method for data storage and computer readable medium | |
| CN114490720B (en) | Killing method, device, computer equipment and storage medium | |
| CN114092064A (en) | Power big data processing batch system and method, device, equipment and medium | |
| US10558647B1 (en) | High performance data aggregations | |
| CN117909550B (en) | Query method, device, computer equipment and storage medium | |
| CN116932779B (en) | Knowledge graph data processing method and device | |
| CN117931747A (en) | Metadata management method, device, system and equipment for data marts | |
| CN115048059B (en) | Data processing method and device | |
| CN117194524A (en) | Offline index data processing method, device, equipment and storage medium | |
| CN119201992A (en) | Business data query method, device, equipment, storage medium and program product | |
| CN117807080A (en) | Text data processing method, apparatus, computer device and storage medium | |
| CN119646033A (en) | Method, apparatus, computer device, computer readable storage medium and computer program product for executing optimization strategy |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB02 | Change of applicant information | Country or region after: China; Address after: No. 8 Huizhi Street, Suzhou Industrial Park, Suzhou Area, China (Jiangsu) Pilot Free Trade Zone, Suzhou City, Jiangsu Province, 215000; Applicant after: Qichacha Technology Co.,Ltd.; Address before: Room 503, 5 / F, C1 building, 88 Dongchang Road, Suzhou Industrial Park, 215000, Jiangsu Province; Applicant before: Qicha Technology Co.,Ltd.; Country or region before: China |
| | GR01 | Patent grant | |