CN115729993A - Multi-source heterogeneous data unified management method and system based on metadata - Google Patents
- Publication number: CN115729993A (application CN202211328572.3A)
- Authority: CN (China)
- Prior art keywords: data, metadata, user, directory, source heterogeneous
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Landscapes: Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a metadata-based method and system for the unified management of multi-source heterogeneous data. The method comprises: in response to a data production user uploading metadata for multi-source heterogeneous data, converting the metadata into a running state, where the metadata is produced by the data production user describing the multi-source heterogeneous data according to preset metadata definition rules; configuring corresponding permissions for users in different roles; in response to a compilation operation on the data catalog by a platform management user and/or a data production user, associating the data catalog with data objects according to that operation to form a classified, hierarchical data catalog; and, in response to a retrieval operation, returning the corresponding data attribute information and content information to the data consumption user. By describing multi-source heterogeneous data with unified metadata and encapsulating it in a data catalog, the method provides standardized, visual data management and meets users' needs for understanding and finding data.
Description
Technical Field
The invention belongs to the technical field of data management, and particularly relates to a multi-source heterogeneous data unified management method and system based on metadata.
Background
The most common data categories today are relational data, file-type data, and message-type data. Relational data is managed in a relational database and presented as two-dimensional logical tables; file-type data stores content in file objects of a specific format; message-type data is produced, exchanged, and consumed through message-handling middleware such as MQ and Kafka. At present these three kinds of heterogeneous data are not well integrated: processing capability usually stays within a single data type, and analysis or comparison across types often depends on manual work. For example, data content may be delivered through an intermediate table in a designated relational database agreed on in advance by both parties; this cannot adapt flexibly to expanding business requirements, and data is used inefficiently. For file-type and message-type data, whether file objects are stored directly at a set location on a server or message objects are subscribed to and consumed directly through mainstream message middleware without storing the message content locally, management is only black-box management, and the specific information of the managed data objects cannot be effectively grasped.
Therefore, it is desirable to provide a method capable of uniformly managing multi-source heterogeneous data to solve the above problems.
Disclosure of Invention
The invention aims to provide a metadata-based method and system for the unified management of multi-source heterogeneous data, to solve the technical problems in the prior art that multi-source heterogeneous data is not well integrated in use and that data is used inefficiently.
In order to achieve the purpose, the invention adopts the following technical scheme:
the first aspect provides a metadata-based unified management method for multi-source heterogeneous data, which includes:
responding to an operation in which a data production user uploads metadata of multi-source heterogeneous data by receiving, auditing, and verifying the metadata, and converting the metadata to a running state after verification passes; the metadata is data obtained by the data production user describing the multi-source heterogeneous data according to preset metadata definition rules, and the multi-source heterogeneous data comprises at least relational data, file-type data, or message-type data;
configuring corresponding permissions for users in different roles so that each role user can perform the corresponding operations on the data catalog within its own permissions; the roles comprise a data production user, a platform management user, and a data consumption user;
responding to a compilation operation on the data catalog by a platform management user and/or a data production user by establishing associations between the data catalog and data objects according to the compilation operation, forming a classified, hierarchical data catalog; wherein a data object is an instantiated metadata object;
and responding to an information retrieval operation on the data catalog by a data consumption user by returning the corresponding data attribute information and content information to the data consumption user.
In one possible design, the metadata includes newly added data and stock (existing) data;
when the metadata is newly added data, before responding to the operation in which the data production user uploads the metadata of the multi-source heterogeneous data, the method further comprises: receiving a data production request from the data production user, generating a data requirement sheet, and associating it with a corresponding batch number;
when the metadata is stock data, before responding to the operation in which the data production user uploads the metadata of the multi-source heterogeneous data, the method further comprises:
analyzing the basic information and content elements of the stock data that need to be described, forming a description template, and issuing the description template.
In one possible design, the preset metadata definition rules define metadata along at least the dimensions of category, location, content, and parsing parameters; parsing parameters are defined in a predefined manner, which constrains extended definitions of content elements.
In one possible design, receiving, reviewing, and verifying the metadata, and transforming the metadata to a runtime state after verification passes, includes:
receiving the metadata, verifying the compliance and consistency of the basic information and content elements defined in the metadata, and entering version release verification after this check passes;
archiving the metadata under a version number and converting it to a running state, wherein each version number covers at least one batch number;
when the metadata is stock data, after archiving the metadata under the version number, the method further comprises:
updating the metadata design table and the release history table.
In one possible design, configuring corresponding permissions for users in different roles so that each role user performs corresponding operations on the data directory based on the own permission, includes:
configuring at least the permissions of catalog configuration management, catalog classification, catalog audit, and catalog extension audit for the platform management user, so that the platform management user can plan the data catalog, create and maintain top-level catalogs, audit the data catalog, and audit extensions to the data catalog within its own permissions;
configuring at least the permission of catalog compilation for the data production user, so that the data production user can extend and compile the data catalog within its own permissions;
and configuring at least the permission of catalog retrieval for the data consumption user, so that the data consumption user can query data catalog information within its own permissions.
In one possible design, the data production user extends and compiles the structure of the data catalog and the mounting of data objects according to preset classification rules, classification short codes, and catalog coding rules.
In one possible design, the associating of the data catalog with the data object according to the compilation operation includes:
acquiring the label information that the data production user has assigned to data objects under the data catalog, and establishing associations between the data catalog and the data objects in the corresponding data sources according to that label information.
In one possible design, after forming the classified, hierarchical data catalog, the method further comprises:
issuing the compiled data catalog and setting its opening mode, scope, and period.
In one possible design, responding to an information retrieval operation on the data catalog by a data consumption user comprises:
responding to the data consumption user's information retrieval operation through a unified query window, so that the data consumption user can query information by data catalog structure and by data object.
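The unified query window can be pictured as a single search entry point that filters first by catalog structure and then by data object. The sketch below is a minimal illustration only: it assumes the catalog is held in memory as a mapping from catalog path to mounted data objects, and the names `DataObject` and `unified_query` are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    # Hypothetical data object: an instantiated metadata object with its attributes.
    name: str
    kind: str  # "relational", "file" or "message"
    attributes: dict = field(default_factory=dict)


def unified_query(catalog, path_prefix="", keyword=""):
    """Search one in-memory catalog by structure (path prefix) and by object (keyword).

    catalog maps a catalog path such as "/finance/orders" to its mounted data objects.
    Returns (path, DataObject) pairs in path order.
    """
    results = []
    for path in sorted(catalog):
        # Structure-based filter: a plain string-prefix match on the catalog path.
        if not path.startswith(path_prefix):
            continue
        for obj in catalog[path]:
            # Object-based filter: case-insensitive match on name and attribute values.
            text = " ".join([obj.name, *map(str, obj.attributes.values())]).lower()
            if keyword.lower() in text:
                results.append((path, obj))
    return results
```

A real deployment would more likely push these filters down into a search index; the in-memory version only shows how both query dimensions combine behind one entry point.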
A second aspect provides a metadata-based unified management system for multi-source heterogeneous data, including:
the metadata release module, configured to respond to an operation in which a data production user uploads metadata of multi-source heterogeneous data by receiving, auditing, and verifying the metadata, and to convert the metadata to a running state after verification passes; the metadata is data obtained by the data production user describing the multi-source heterogeneous data according to preset metadata definition rules, and the multi-source heterogeneous data comprises at least relational data, file-type data, or message-type data;
the permission configuration module, configured to configure corresponding permissions for users in different roles so that each role user can perform the corresponding operations on the data catalog within its own permissions; the roles comprise a data production user, a platform management user, and a data consumption user;
the data catalog publishing module, configured to respond to a compilation operation on the data catalog by a platform management user and/or a data production user by establishing associations among the data catalog, data objects, and data source storage information according to the compilation operation, forming a classified, hierarchical data catalog; wherein a data object is an instantiated metadata object;
and the data query module, configured to respond to an information retrieval operation on the data catalog by a data consumption user by returning the corresponding data attribute information and content information to the data consumption user.
In a third aspect, the present invention provides a computer device, including a memory, a processor, and a transceiver, which are sequentially connected in communication, where the memory is used to store a computer program, the transceiver is used to send and receive a message, and the processor is used to read the computer program and execute the metadata-based multi-source heterogeneous data unified management method as described in any one of the possible designs of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium, which stores instructions for executing the metadata-based multi-source heterogeneous data unified management method as described in any one of the possible designs of the first aspect when the instructions are executed on a computer.
In a fifth aspect, the present invention provides a computer program product comprising instructions that, when executed on a computer, cause the computer to perform a metadata-based multi-source heterogeneous data unified management method as described in any one of the possible designs of the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
the method performs unified metadata description and management of multi-source heterogeneous data according to preset metadata definition rules, making it applicable to business data in many formats from various file-type and/or message-type data sources; it defines data characteristics by providing separate metadata definition modes for newly added data and stock data, yielding standardized data management; it forms classified, hierarchical data assets through abstract encapsulation based on metadata and the data catalog, enabling visual data management and meeting users' needs for understanding and finding data; and once file-type or message-type data is under standard metadata management and classified hierarchically through the data catalog, an enterprise obtains a clear picture of its data assets, can quickly understand their overall state, and gains solid data support for subsequent data applications.
Drawings
FIG. 1 is a flowchart of a metadata-based unified management method for multi-source heterogeneous data in an embodiment of the present application;
FIG. 2 is a logic diagram of a unified management method for multi-source heterogeneous data based on metadata in an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating metadata depiction of newly added data according to an embodiment of the present application;
FIG. 4 is a logic diagram illustrating metadata mapping for inventory data in an embodiment of the present application;
FIG. 5 is a schematic flow chart illustrating data directory abstraction and encapsulation according to an embodiment of the present application;
FIG. 6 is a flow chart illustrating data directory management according to an embodiment of the present application;
FIG. 7 is a diagram illustrating a data directory classification model according to an embodiment of the present application.
Detailed Description
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the invention is briefly described below with reference to the accompanying drawings and the embodiments. The drawings show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort. The description of the embodiments is intended to aid understanding of the invention and does not limit it.
Examples
The method aims to solve the technical problems in the prior art that multi-source heterogeneous data is not well integrated in use and that data is used inefficiently. The embodiments of the present application provide a metadata-based unified management method for multi-source heterogeneous data that describes and manages multi-source heterogeneous data uniformly through preset metadata definition rules, so that it applies to business data in various formats from various file-type and/or message-type data sources; defines data characteristics by providing separate metadata definition modes for newly added data and stock data, yielding standardized data management; forms classified, hierarchical data assets through abstract encapsulation based on metadata and the data catalog, enabling visual data management and meeting users' needs for understanding and finding data; and, once file-type or message-type data is under standard metadata management and classified hierarchically through the data catalog, gives the enterprise a clear picture of its data assets, lets it quickly understand their overall state, and provides solid data support for subsequent data applications.
The metadata-based multi-source heterogeneous data unified management method in the embodiment of the present application is described in detail through specific embodiments below.
As shown in FIGS. 1 to 7, in one aspect, an embodiment of the present application provides a metadata-based unified management method for multi-source heterogeneous data, including but not limited to steps S1 to S4:
S1, responding to an operation in which a data production user uploads metadata of multi-source heterogeneous data by receiving, auditing, and verifying the metadata, and converting the metadata to a running state after it passes verification; the metadata is data obtained by the data production user describing the multi-source heterogeneous data according to preset metadata definition rules, and the multi-source heterogeneous data comprises at least relational data, file-type data, or message-type data;
It should be noted that conventional approaches usually describe metadata models passively and offline, which lacks flexibility. The embodiments of the present application instead offer data production users several methods to choose from according to the application scenario and data management needs, including but not limited to forward description of metadata for newly added data and reverse description of metadata for stock data, so that users can actively describe and manage the data brought under management. Supporting metadata description through multiple modes at different stages of the system improves the flexibility of system use.
Preferably, the preset metadata definition rules define metadata along at least the dimensions of category, location, content, and parsing parameters. The preset rules include definition rules for relational, file-type, and message-type metadata, but in practice a given piece of metadata may be described using only one dimension or a combination of several, without limitation. For example: relational data is defined by category, location, and content; file-type data is defined by category, location, content, and parsing parameters, where parsing parameters are used only when the file content is in CSV format; and message-type data is likewise defined by category, location, and content only. These examples illustrate the application of the preset metadata definition rules and do not limit their scope.
Parsing parameters are defined in a predefined manner to constrain extended definitions of content elements. The category dimension describes the business purpose of a data object, i.e., which business scenario or specific requirement the data serves; the location dimension describes where the data object is obtained, defining information such as the object name; the content dimension explains the specific information contained in the data object and helps convey the business meaning of its content elements; and the parsing dimension explains how content is identified in the target object. More preferably, the content parsing parameters are pre-configured in the background: predefining parsing parameters for file-type and message-type data in each format effectively reduces the implementation cost of the parser and makes the parsing process more flexible.
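The four definition dimensions and the background pre-configuration of parsing parameters can be sketched as a small validated record. This is a minimal illustration under stated assumptions: the lookup table `PREDEFINED_PARSING_PARAMS`, its CSV parameter names, and the class `MetadataDefinition` are hypothetical, and only the CSV file case from the example in the text is configured.

```python
from dataclasses import dataclass, field

# Hypothetical background pre-configuration: the parsing parameters allowed
# for each (data type, content format) pair. Only CSV files are configured here.
PREDEFINED_PARSING_PARAMS = {
    ("file", "csv"): {"delimiter", "quote_char", "header_rows", "encoding"},
}


@dataclass
class MetadataDefinition:
    data_type: str                  # "relational", "file" or "message"
    category: str                   # business purpose of the data object
    location: str                   # where the object is obtained, e.g. its name
    content: list                   # content element names
    content_format: str = ""        # e.g. "csv" or "json"
    parsing_params: dict = field(default_factory=dict)

    def validate(self):
        """Return a list of rule violations (empty when the definition is valid)."""
        errors = []
        if not (self.category and self.location and self.content):
            errors.append("category, location and content are required")
        if self.parsing_params:
            allowed = PREDEFINED_PARSING_PARAMS.get((self.data_type, self.content_format))
            if allowed is None:
                errors.append("no predefined parsing parameters for this type and format")
            else:
                unknown = sorted(set(self.parsing_params) - allowed)
                if unknown:
                    errors.append(f"undefined parsing parameters: {unknown}")
        return errors
```

Because users can only pick from the predefined parameter set, a parser needs to support a fixed vocabulary per format, which is the cost reduction the text describes.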
Preferably, a data production user in the embodiments of the present application is someone with sufficient knowledge of each type of heterogeneous data, who can use the metadata definition function provided by the embodiments to identify and deconstruct the acquired heterogeneous data and describe its metadata, thereby fully and clearly capturing the information needed to understand and process the data, such as its business meaning and parsing parameters. This includes the business semantics of the data, including but not limited to connotation, attribute names, value ranges, attribute types, object categories, and structural composition, and its technical semantics, including but not limited to storage location, field types, lengths, parsing parameters, encoding method, and expressions.
Specifically, in step S1, the metadata includes newly added data and stock data;
when the metadata is newly added data, before responding to the operation in which the data production user uploads the metadata of the multi-source heterogeneous data, the method further comprises: receiving a data production request from the data production user, generating a data requirement sheet, and associating it with a corresponding batch number;
It should be noted that this step applies to describing the metadata of newly onboarded non-relational data, i.e., file-type and message-type data. A data production request from the data production user is received, a data requirement sheet is generated by analyzing the interface specification requirements document, a batch number is generated online, and the requirement sheet is associated with the batch number so that the data can be traced by batch number.
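The association between a requirement sheet and an online batch number is essentially a registry keyed by batch number. The sketch below assumes an in-memory registry; the batch number format `B<yyyymmdd>-<sequence>` and the names `DemandSheet` and `DemandRegistry` are illustrative assumptions, not the patent's format.

```python
import itertools
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DemandSheet:
    sheet_id: str
    producer: str
    spec_document: str  # the interface specification requirements document
    batch_no: str


class DemandRegistry:
    """Generates requirement sheets with online batch numbers and traces them back."""

    def __init__(self):
        self._seq = itertools.count(1)
        self._by_batch = {}

    def create(self, producer, spec_document, on=None):
        on = on or date.today()
        n = next(self._seq)
        # Hypothetical batch number format: B<yyyymmdd>-<4-digit sequence>.
        sheet = DemandSheet(f"DS-{n:04d}", producer, spec_document, f"B{on:%Y%m%d}-{n:04d}")
        self._by_batch[sheet.batch_no] = sheet
        return sheet

    def trace(self, batch_no):
        """Return the requirement sheet associated with a batch number, or None."""
        return self._by_batch.get(batch_no)
```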
For example, the metadata definition interface for file-type data whose content format is JSON is shown in FIG. 3. It is worth noting that, to balance standardization and extensibility when defining object formats, the embodiments define format details according to general IT definitions while also allowing extensions for specific business scenarios. For example, content definitions for JSON-format file objects not only use the standard data formats already defined by JSON, but also introduce the concept of sub-elements and use sub-element types to constrain the input pattern of each data type. As a result, parsing can still proceed in a standardized way strictly according to JSON's existing format definitions, avoiding the compatibility problems that a proprietary processing protocol might introduce.
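One way to read the sub-element idea is a recursive check that uses only the standard JSON type names, with sub-elements describing object members and a sub-element type describing array items. This is a hedged sketch: the spec keys `sub_elements` and `sub_element_type` and the function `check_content` are hypothetical names chosen for illustration, and the input is assumed to be the output of Python's `json.loads`.

```python
# Standard JSON types only; no proprietary type names are introduced.
JSON_TYPES = {"string": str, "boolean": bool, "null": type(None), "object": dict, "array": list}


def _matches(value, type_name):
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, JSON_TYPES[type_name])


def check_content(value, spec, path="$"):
    """Check a parsed JSON value against a content-element spec.

    Objects: {"type": "object", "sub_elements": {name: spec, ...}}
    Arrays:  {"type": "array", "sub_element_type": spec}
    """
    if not _matches(value, spec["type"]):
        return [f"{path}: expected {spec['type']}"]
    errors = []
    if spec["type"] == "object":
        for key, sub_spec in spec.get("sub_elements", {}).items():
            if key not in value:
                errors.append(f"{path}.{key}: missing")
            else:
                errors.extend(check_content(value[key], sub_spec, f"{path}.{key}"))
    elif spec["type"] == "array" and "sub_element_type" in spec:
        for i, item in enumerate(value):
            errors.extend(check_content(item, spec["sub_element_type"], f"{path}[{i}]"))
    return errors
```

Since every type name is a standard JSON type, any ordinary JSON parser can still read the data; the sub-element specs only add constraints on top.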
As shown in FIG. 4, when the metadata is stock data, before responding to the operation in which the data production user uploads the metadata of the multi-source heterogeneous data, the method further comprises:
analyzing the basic information and content elements of the stock data that need to be described, forming a description template, and issuing the description template.
Specifically, the aim is to minimize the intrusiveness and management cost of bringing non-relational data from existing business systems under management. This embodiment therefore adopts active acquisition and import: using the same decomposition applied to managed data objects in forward design, the managed data objects are analyzed and formatted from the existing stock-data interface into a description template table. The data production user then only needs to describe each item of basic information and each content element and can upload and import them in batches into the design library, which reduces intrusiveness and management cost for the data production user.
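The template round trip for stock data can be sketched with Python's standard `csv` module: pre-fill one row per field taken from the existing interface, let the producer fill in the descriptions, then batch-import the result. The column names in `TEMPLATE_COLUMNS` and the rule that a row needs a business meaning to be accepted are assumptions for illustration.

```python
import csv
import io

# Hypothetical template columns; the real description template is defined per interface.
TEMPLATE_COLUMNS = ["element_name", "business_meaning", "attribute_type", "value_range"]


def build_template(interface_fields):
    """Pre-fill one row per field parsed from an existing stock-data interface."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=TEMPLATE_COLUMNS)
    writer.writeheader()
    for name in interface_fields:
        writer.writerow({"element_name": name})  # other columns default to ""
    return buf.getvalue()


def import_template(text):
    """Batch-import a filled template; rows missing a business meaning are rejected."""
    accepted, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        (accepted if row["business_meaning"] else rejected).append(row)
    return accepted, rejected
```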
In step S1, receiving, auditing and verifying the metadata, and converting the metadata to a running state after the verification is passed, including:
receiving the metadata, auditing the compliance and consistency of the basic information and content elements defined in the metadata, and entering version release verification after the auditing is passed;
archiving the metadata under a version number and converting it to a running state, wherein each version number covers at least one batch number;
when the metadata is stock data, after archiving the metadata under the version number, the method further comprises:
updating the metadata design table and the release history table.
Specifically, when the metadata is newly added data, once it has been fully described and uploaded, the system generates a corresponding version number, such as V1.0, based on the metadata design tables. A single version number may cover several batch numbers, i.e., design tables from several batches may be combined and released as one version. When the metadata is stock data, the original metadata design table must also be updated, and the historical design tables are released together with the current one, for example the V1.0 and V2.0 design tables.
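The audit, versioning, and running-state transition can be sketched as a small repository. This is a minimal sketch, assuming the states `submitted`, `audited`, and `running`, a version label of the form `V<n>.0`, and an audit check that simply requires the category, location, and content fields; all of these names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    batch_no: str
    design_table: dict
    state: str = "submitted"  # submitted -> audited -> running


@dataclass
class Release:
    version: str
    batch_nos: list
    design_tables: list


class MetadataRepository:
    REQUIRED = ("category", "location", "content")  # hypothetical audit check

    def __init__(self):
        self.release_history = []  # plays the role of the release history table

    def audit(self, record):
        """Compliance check on the described metadata; marks the record as audited."""
        if all(record.design_table.get(key) for key in self.REQUIRED):
            record.state = "audited"
        return record.state == "audited"

    def release(self, records, is_stock=False):
        """Archive one or more audited batches under a new version and run them."""
        if not records or any(r.state != "audited" for r in records):
            raise ValueError("every batch must pass audit before release")
        tables = [r.design_table for r in records]
        if is_stock and self.release_history:
            # Stock data: release the historical design tables with the current ones.
            tables = self.release_history[-1].design_tables + tables
        release = Release(f"V{len(self.release_history) + 1}.0",
                          [r.batch_no for r in records], tables)
        for r in records:
            r.state = "running"
        self.release_history.append(release)
        return release
```

Grouping several audited batches into one `release` call mirrors the point that one version number may cover multiple batch numbers.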
S2, configuring corresponding permissions for users in different roles so that each role user can perform the corresponding operations on the data catalog within its own permissions; the roles comprise a data production user, a platform management user, and a data consumption user;
in step S2, configuring corresponding permissions for users in different roles, so that each role user performs the corresponding operations on the data catalog within its own permissions, comprises:
configuring at least the permissions of catalog configuration management, catalog classification, catalog audit, and catalog extension audit for the platform management user, so that the platform management user can plan the data catalog, create and maintain top-level catalogs, audit the data catalog, and audit extensions to the data catalog within its own permissions;
configuring at least the permission of catalog compilation for the data production user, so that the data production user can extend and compile the data catalog within its own permissions;
and configuring at least the permission of catalog retrieval for the data consumption user, so that the data consumption user can query data catalog information within its own permissions.
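The role configuration in step S2 maps naturally onto a role-to-permission table. The sketch below encodes only the minimum permissions the text lists for each role ("at least"); the enum `Role`, the permission identifiers, and `authorize` are hypothetical names.

```python
from enum import Enum


class Role(Enum):
    PLATFORM_ADMIN = "platform management user"
    DATA_PRODUCER = "data production user"
    DATA_CONSUMER = "data consumption user"


# Minimum permission sets from the text ("at least"); identifiers are hypothetical.
PERMISSIONS = {
    Role.PLATFORM_ADMIN: frozenset({"catalog_config", "catalog_classify",
                                    "catalog_audit", "catalog_extension_audit"}),
    Role.DATA_PRODUCER: frozenset({"catalog_compile"}),
    Role.DATA_CONSUMER: frozenset({"catalog_retrieve"}),
}


def authorize(role, permission):
    """Raise PermissionError unless the role holds the permission."""
    if permission not in PERMISSIONS.get(role, frozenset()):
        raise PermissionError(f"{role.value} lacks {permission}")
```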
Specifically, this embodiment provides autonomous, classified management of the data catalog based on a federated principle, which specifically includes:
As shown in FIGS. 5 and 6, different roles are assigned according to production, consumption, and management scenarios, and different permissions and functions are opened to different users. The catalog administrator (i.e., the platform administrator) carries out the top-level design of the data catalog; data producers are authorized to focus on their own business data and can extend the catalog and maintain data objects according to data production and consumption needs. Together they build a unified, standardized data catalog that enables data sharing and interaction while keeping the business systems of the tenants (data producers) independent of one another.
Preferably, administrator and user management in this embodiment is configured with two levels of definitions. Data catalog management follows a platform administrator / branch administrator (tenant administrator) / catalog producer (ordinary user) model, providing two levels of management definitions, catalog administrator and data producer, which respectively handle catalog management and control, catalog compilation, auditing of catalog-related changes, and catalog query and consumption according to their management responsibilities, data functions, and usage scenarios. The division of data catalog management roles is shown in the following table:
The catalog administrator plans the data catalog structure from a platform-wide perspective, creates and maintains the top-level catalogs, grants catalog node extension and editing permissions to sub-catalog administrators, and controls catalog extensions, classification applications, and the opening of the data catalog. After a tenant acting as a data producer applies to the administrator for data catalog configuration permission, it can extend sub-catalog levels under the catalog branches authorized by the catalog administrator; extended sub-catalogs inherit the ownership of their parent catalog. Once the administrator approves an extended catalog, data objects can be mounted on it or it can be extended further.
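The federated extension rules (authorized branches, inherited ownership, approval before mounting or further extension) can be sketched as a catalog tree node. This is an illustrative sketch only: the class `CatalogNode`, its fields, and the choice to model approval as a boolean flag set by the administrator are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogNode:
    name: str
    owner: str                                      # attribution of the branch
    approved: bool = True                           # top-level nodes start approved
    authorized_tenants: set = field(default_factory=set)
    children: dict = field(default_factory=dict)
    objects: list = field(default_factory=list)

    def extend(self, tenant, name):
        """A tenant adds a sub-catalog under an approved branch it is authorized on."""
        if not self.approved:
            raise PermissionError(f"{self.name} awaits administrator approval")
        if tenant not in self.authorized_tenants:
            raise PermissionError(f"{tenant} is not authorized on {self.name}")
        # The extension inherits the parent's ownership and starts unapproved.
        child = CatalogNode(name, owner=self.owner, approved=False,
                            authorized_tenants={tenant})
        self.children[name] = child
        return child

    def mount(self, data_object):
        """Mount a data object; only allowed once the node has been approved."""
        if not self.approved:
            raise PermissionError(f"{self.name} awaits administrator approval")
        self.objects.append(data_object)
```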
S3, responding to a compilation operation on the data catalog by a platform management user and/or a data production user by establishing associations between the data catalog and data objects according to the compilation operation, forming a classified, hierarchical data catalog; wherein a data object is an instantiated metadata object;
It should be noted that, as the core of data catalog management, the data catalog consists of a catalog structure and data objects. The catalog structure is the structural information organized according to catalog management needs; a data object is an instantiated metadata object, comprising the metadata object's information together with its data storage configuration and related extended descriptions. Physical information is acquired, logically defined, associated, and classified, and then integrated into a normalized object model that can be compiled into different classifications as required.
Preferably, the data production user extends and compiles the data catalog structure and the mounting of data objects according to preset classification rules, classification short codes, and catalog coding rules.
As shown in FIG. 7, the catalog classification information maintains the global classification rules, classification short codes, and catalog coding rules. It provides the rule basis for building a framework view of the data, organizing data by business, and generating the catalog, and it is the first perspective from which a data consumption user retrieves heterogeneous data. The data catalog interfaces with data storage entities, organizes the various data sources, and classifies and describes data entities from a business perspective; it is the rule basis for mounting and managing data objects. Catalog classification information is drawn from data storage, metadata object attributes, and user-defined classification information, and is defined uniformly by catalog administrators. Data catalog management controls the hierarchical composition of the data catalog and the rules for forming its codes by specifying classifications, which are built from a multi-dimensional model to form classification labels.
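The patent does not spell out its catalog coding rule, so the following is only one plausible rule, stated as an assumption: each level contributes a fixed-width classification short code, and a catalog's code is its parent's code followed by its own segment. The constant `SEGMENT_WIDTH` and the function names are hypothetical.

```python
SEGMENT_WIDTH = 2  # hypothetical: every level contributes a 2-character short code


def catalog_code(short_codes):
    """Build a catalog code by concatenating the classification short codes per level."""
    for code in short_codes:
        if len(code) != SEGMENT_WIDTH or not code.isalnum():
            raise ValueError(f"invalid classification short code: {code!r}")
    return "".join(code.upper() for code in short_codes)


def parent_code(code):
    """Return the parent catalog's code, or None for a top-level catalog."""
    return code[:-SEGMENT_WIDTH] or None


def level(code):
    """Depth of a catalog in the hierarchy, derived from its code length."""
    return len(code) // SEGMENT_WIDTH
```

With fixed-width segments, the hierarchy can be recovered from the code alone, which is one way a coding rule can "control the hierarchical composition" of the catalog.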
Preferably, in step S3, establishing the association between the data directory and the data object according to the compilation operation includes:
acquiring the label information that the data production user has assigned to the data object in the data directory, and establishing, according to that label information, the association between the data directory and the data object in the corresponding data source.
The core step of directory management for multi-source heterogeneous data is describing and defining data objects. On the basis of the metadata, the data source storage information is associated, the related business, technical, and management attributes are supplemented through tags, and a basic object is formed that can provide atomic data services to data consumption users. For a data object with a single structure, the data production user only needs to select the corresponding metadata object, associate the corresponding storage type according to the type-structure information designed in the metadata, and specify a concrete storage source and partitioning rule; the data object can then be created quickly and published to the classification directory with one click.
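The text states only that directories and data objects are associated through the label (tag) information supplied by the data production user. One minimal sketch, assuming a hypothetical matching rule in which each directory node declares a set of required tags and mounts every data object that carries all of them:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class DataObject:
    name: str
    source: str            # storage source the object was instantiated against
    tags: FrozenSet[str]   # business/technical/management tags from the producer


def mount_by_tags(
    directories: Dict[str, FrozenSet[str]], objects: List[DataObject]
) -> Dict[str, List[str]]:
    """Map each directory path to the names of objects whose tags cover its required tags."""
    return {
        path: [obj.name for obj in objects if required <= obj.tags]
        for path, required in directories.items()
    }
```

Because this rule is subset-based, an object with more specific tags appears both in a broad directory and in its more specific child, which suits a classified, hierarchical view. A real system might instead mount each object at exactly one node.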
Specifically, the basic attributes that must be specified when mounting a file-type metadata object include the type of server storing the file, the link address, the storage directory, the storage policy, the update-cycle type of the data, the expected generation time point, and the method used to notify users when the data arrives. The basic attributes that must be specified when mounting a message-type object include the message type, the data source, and the message topic; when the message-type object is not a Kafka object, the corresponding tag classification can also be recorded.
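The following is a minimal sketch of a mount-time completeness check over the attributes listed above. The attribute key names are hypothetical, and a real platform would also validate attribute values rather than only checking that they are present.

```python
from typing import List, Mapping, Tuple

# Required mount attributes from the text; the key names are hypothetical.
FILE_REQUIRED: Tuple[str, ...] = (
    "server_type",        # type of server storing the file
    "link_address",
    "storage_directory",
    "storage_policy",
    "update_cycle",       # update-cycle type of the data
    "generation_time",    # expected time point at which the data is generated
    "arrival_reminder",   # how users are notified when the data arrives
)
MESSAGE_REQUIRED: Tuple[str, ...] = ("message_type", "data_source", "topic")


def missing_mount_attributes(kind: str, attrs: Mapping[str, str]) -> List[str]:
    """Return the required attributes that are absent or empty for a file/message object."""
    if kind == "file":
        required = FILE_REQUIRED
    elif kind == "message":
        required = MESSAGE_REQUIRED
    else:
        raise ValueError(f"unsupported object kind: {kind}")
    return [key for key in required if not attrs.get(key)]


def may_record_tag_classification(attrs: Mapping[str, str]) -> bool:
    """Per the text, a tag classification may be recorded for non-Kafka message objects."""
    return attrs.get("message_type", "").lower() != "kafka"
```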
In step S3, after the classified and hierarchical data directory is formed, the method further includes:
publishing the compiled data directory and setting its opening mode, opening range, and opening time.
Preferably, after the data directory is established, this embodiment also audits it, and more preferably provides a multi-level auditing mechanism together with autonomous platform auditing. The data producer only needs to associate the metadata with the storage information and submit it as specified. The system background then automatically checks the integrity and logical consistency of the configuration against subsequent service-access requirements and security-policy requirements, and determines whether it can be published as a complete data object. After an object is published, this embodiment provides open-policy options that verify the hierarchical ownership permissions of the object and its directory, ensuring that the open range respects the permission containment relationships of the directory hierarchy. When a data directory or data object is opened or deregistered, associated-reference analysis and status checks provide the basis for restricting operations and raising alarms. Because each stage has its own independent approval function, configuration problems of logic, correctness, and integrity are handled without administrator intervention; when an administrator does step into the audit stage, the audit can focus on introducing and judging informed conditions and external policies. This simplifies the user's work in managing the data directory.
Preferably, after the data directory structure and data objects have been built and published, and after the open settings have been configured and authorization verified, they can enter a unified service management stage in which information and data content are shared. Opening a data directory or data object supports a specified opening mode, range, and expiration time. Depending on the opening state of the superior directory, the mode and range can be set to globally open, restricted open (not open to others), or partially open to a specified range of tenants/roles/users. Specifically, in configuration management, when a directory administrator selects a node whose status is published and chooses to open it, the opening mode, opening range, and expiration time of the nearest superior directory node are acquired and verified, as follows:
1) If there is no superior node, the page initializes by default to globally open, never expiring;
2) If the superior node is globally open, the initial options are:
a) Opening mode: only partially open or restricted open can be selected;
b) Tenant/user scope: all tenants/users on the platform;
c) Expiration time: must be no later than the expiration time of the nearest superior directory node;
3) If the superior node is partially open or restricted open, the initial options are:
a) Opening mode: must match the superior node (partially open → partially open; restricted open → restricted open);
b) Tenant/user scope: limited to the range selected for the superior node;
c) Expiration time: must be no later than the expiration time of the nearest superior directory node.
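The three inheritance rules above can be sketched as a validation function. This is a minimal sketch under stated assumptions: the type names, the `NEVER` sentinel for "never expires", and the choice to allow every mode for a root node (rule 1 only fixes the default) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import FrozenSet, Optional


class OpenMode(Enum):
    GLOBAL = "globally open"
    PARTIAL = "partially open"
    RESTRICTED = "restricted open"


NEVER = datetime.max  # hypothetical sentinel for "never expires"


@dataclass(frozen=True)
class OpenPolicy:
    mode: OpenMode
    scope: FrozenSet[str]  # tenant/role/user identifiers; ignored for GLOBAL
    expires_at: datetime


@dataclass(frozen=True)
class Options:
    modes: FrozenSet[OpenMode]
    scope: Optional[FrozenSet[str]]  # None = any tenant/user on the platform
    max_expiry: datetime


ROOT_DEFAULT = OpenPolicy(OpenMode.GLOBAL, frozenset(), NEVER)  # rule 1 default


def allowed_options(parent: Optional[OpenPolicy]) -> Options:
    if parent is None:
        # Rule 1 only fixes the default; allowing every mode here is an assumption.
        return Options(frozenset(OpenMode), None, NEVER)
    if parent.mode is OpenMode.GLOBAL:
        # Rule 2: child may only be partially or restricted open, any platform scope.
        return Options(
            frozenset({OpenMode.PARTIAL, OpenMode.RESTRICTED}), None, parent.expires_at
        )
    # Rule 3: child keeps the parent's mode and stays inside the parent's scope.
    return Options(frozenset({parent.mode}), parent.scope, parent.expires_at)


def is_valid(child: OpenPolicy, parent: Optional[OpenPolicy]) -> bool:
    opts = allowed_options(parent)
    if child.mode not in opts.modes:
        return False
    if opts.scope is not None and not child.scope <= opts.scope:
        return False
    # Expiration must be no later than the nearest superior node's expiration.
    return child.expires_at <= opts.max_expiry
```

Separating `allowed_options` from `is_valid` mirrors the text: the same result can both populate the initial choices on the configuration page and re-verify a submitted setting.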
S4: in response to the data consumption user's information retrieval operation on the data directory, returning the corresponding data attribute information and content information to the data consumption user.
In step S4, responding to the information retrieval operation of the data consumption user on the data directory includes:
responding to the data consumption user's information retrieval operation through a unified query window, so that the data consumption user can query information based on the data directory structure and data objects.
Specifically, this embodiment provides data consumption users with query, detail viewing, and data preview of the full directory information, and provides quick entry points for actions such as subscribing to a target data object. Data consumption users do not need to know the storage routing and access details of an object. From a data object's details they can directly view, from an overall perspective, the basic attributes characterized by the metadata, such as data category, content structure, and business ownership, and can check the supplementary information characterized for the data object, such as data classification, physical storage, storage type, and business tags.
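The following minimal sketch shows a unified query that returns only descriptive attributes and never the physical storage details. It assumes a hypothetical catalog-entry shape and a simple keyword match over names and tags; the `open_to` field stands in for the opening range configured at publication.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    path: str               # position in the data directory
    category: str
    storage_type: str
    tags: FrozenSet[str]
    storage_uri: str        # physical access detail, never returned to consumers
    open_to: FrozenSet[str] = frozenset()  # hypothetical: empty means globally open


def search(entries: List[CatalogEntry], user: str, keyword: str) -> List[Dict[str, str]]:
    """Keyword search over names and tags, limited to entries opened to `user`."""
    kw = keyword.lower()
    results: List[Dict[str, str]] = []
    for e in entries:
        if e.open_to and user not in e.open_to:
            continue  # entry is not opened to this user
        if kw in e.name.lower() or any(kw in t.lower() for t in e.tags):
            results.append({
                "name": e.name,
                "path": e.path,
                "category": e.category,
                "storage_type": e.storage_type,
                "tags": ",".join(sorted(e.tags)),
            })
    return results
```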
Based on the above, in the embodiments of the present application, multi-source heterogeneous data is uniformly characterized and managed through metadata based on preset metadata definition rules, which makes the approach applicable to business data in multiple formats from multiple file-type and/or message-type data sources. Separate metadata definition modes are provided for newly added data and existing (stock) data to define data characteristics and achieve standardized data management. Classified and hierarchical data assets are formed through metadata and abstract encapsulation in the data directory, enabling visual data management and meeting users' needs for understanding and finding data. After file-type or message-type data is brought under standard metadata management, it is classified and managed hierarchically through the data directory. This gives the enterprise a clear view of its data assets, so that it can quickly understand their overall state, and provides solid support for subsequent data applications. For example, the types, quantities, and hot/cold degree (usage) of the assets can be quantified and displayed, so that enterprise managers can see the overall state of their assets and have a scientific data basis for investment and operating decisions. The system also provides general data slicing and processing capabilities, which can significantly improve the efficiency of collaborative analysis across heterogeneous data sources. Acting like a bus, it efficiently aggregates the diverse data circulating within the enterprise and aligns and statistically analyzes that data according to the demand scenario, providing strong data support for business development.
A second aspect provides a metadata-based unified management system for multi-source heterogeneous data, including:
a metadata publishing module, configured to respond to a data production user uploading metadata of multi-source heterogeneous data by receiving, auditing, and verifying the metadata, and to switch the metadata to the running state once verification passes; the metadata is obtained by the data production user characterizing the multi-source heterogeneous data according to preset metadata definition rules, and the multi-source heterogeneous data includes at least relational data, file-type data, or message-type data;
a permission configuration module, configured to assign permissions to users in different roles so that each user can operate on the data directory according to the permissions of their role; the roles include the data production user, the platform management user, and the data consumption user;
a data directory publishing module, configured to respond to a compilation operation on the data directory by a platform management user and/or a data production user, establish, according to the compilation operation, the association between the data directory, the data objects, and the data source storage information, and form a classified and hierarchical data directory; wherein a data object is an instantiated metadata object;
and a data query module, configured to respond to an information retrieval operation on the data directory by a data consumption user and return the corresponding data attribute information and content information to the data consumption user.
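The permission configuration module can be sketched as a role-to-permission table. The minimum permission sets below follow claim 5 ("at least ..."), while the role keys and enum names are hypothetical. This is an illustration, not the system's actual access-control model.

```python
from enum import Enum, auto
from typing import Dict, FrozenSet


class Permission(Enum):
    DIRECTORY_CONFIG_MANAGEMENT = auto()
    DIRECTORY_CLASSIFICATION = auto()
    DIRECTORY_AUDIT = auto()
    DIRECTORY_EXTENSION_AUDIT = auto()
    DIRECTORY_COMPILATION = auto()
    DIRECTORY_RETRIEVAL = auto()


# Minimum permission sets per role, following claim 5; role keys are hypothetical.
ROLE_PERMISSIONS: Dict[str, FrozenSet[Permission]] = {
    "platform_management": frozenset({
        Permission.DIRECTORY_CONFIG_MANAGEMENT,
        Permission.DIRECTORY_CLASSIFICATION,
        Permission.DIRECTORY_AUDIT,
        Permission.DIRECTORY_EXTENSION_AUDIT,
    }),
    "data_production": frozenset({Permission.DIRECTORY_COMPILATION}),
    "data_consumption": frozenset({Permission.DIRECTORY_RETRIEVAL}),
}


def is_allowed(role: str, permission: Permission) -> bool:
    """Check a role's permission; unknown roles get no permissions."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())
```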
In a third aspect, the present invention provides a computer device, including a memory, a processor, and a transceiver that are communicatively connected in sequence. The memory stores a computer program, the transceiver sends and receives messages, and the processor reads the computer program and executes the method according to any possible design of the first aspect.
For example, the memory may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), flash memory, first-in first-out (FIFO) memory, and/or first-in last-out (FILO) memory. The processor may be, but is not limited to, an STM32F105-series microprocessor. The transceiver may be, but is not limited to, a Wi-Fi wireless transceiver, a Bluetooth wireless transceiver, a GPRS (General Packet Radio Service) wireless transceiver, and/or a ZigBee (a low-power local area network protocol based on the IEEE 802.15.4 standard) wireless transceiver. In addition, the computer device may include, but is not limited to, a power module, a display screen, and other necessary components.
For the working process, working details and technical effects of the foregoing computer device provided in the third aspect of this embodiment, reference may be made to the method described in the first aspect or any one of the possible designs of the first aspect, which is not described herein again.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon instructions which, when executed on a computer, perform the method as set forth in any one of the possible designs of the first aspect.
The computer-readable storage medium is a carrier that stores data, and may include, but is not limited to, floppy disks, optical discs, hard disks, flash memory, USB flash drives, and/or Memory Sticks. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
For the working process, working details, and technical effects of the foregoing computer-readable storage medium provided in the fourth aspect of this embodiment, reference may be made to the method described in the first aspect or any one of the possible designs in the first aspect, which is not described herein again.
In a fifth aspect, the present invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method as set forth in any one of the possible designs of the first aspect.
For the working process, the working details and the technical effects of the computer program product containing the instructions provided in the fifth aspect of the present embodiment, reference may be made to the method described in the first aspect or any one of the possible designs of the first aspect, and details are not described herein again.
Finally, it should be noted that: the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A metadata-based unified management method for multi-source heterogeneous data, characterized by comprising the following steps:
in response to a data production user uploading metadata of multi-source heterogeneous data, receiving, auditing, and verifying the metadata, and switching the metadata to the running state after it passes verification; wherein the metadata is obtained by the data production user characterizing the multi-source heterogeneous data according to preset metadata definition rules, and the multi-source heterogeneous data includes at least relational data, file-type data, or message-type data;
configuring permissions for users in different roles so that each user can perform operations on the data directory according to the permissions of their role; wherein the roles include a data production user, a platform management user, and a data consumption user;
in response to a compilation operation on the data directory by the platform management user and/or the data production user, establishing the association between the data directory and data objects according to the compilation operation, and forming a classified and hierarchical data directory; wherein a data object is an instantiated metadata object;
and in response to an information retrieval operation on the data directory by the data consumption user, returning the corresponding data attribute information and content information to the data consumption user.
2. The metadata-based unified management method for multi-source heterogeneous data according to claim 1, wherein the metadata includes newly added data and stock data;
when the metadata is newly added data, before responding to the data production user uploading the metadata of the multi-source heterogeneous data, the method further includes: receiving a data production request from the data production user, generating a data requirement form, and associating the corresponding batch number;
when the metadata is stock data, before responding to the data production user uploading the metadata of the multi-source heterogeneous data, the method further includes:
analyzing the basic information and content elements of the stock data that need to be described, forming a description template, and publishing the description template.
3. The metadata-based unified management method for multi-source heterogeneous data according to claim 1, wherein the preset metadata definition rules define metadata at least along the dimensions of category, location, content, and parsing parameters; and the parsing parameters are defined in a predefined manner so as to constrain the extended definition of content elements.
4. The metadata-based unified management method for multi-source heterogeneous data according to claim 2, wherein receiving, auditing, and verifying the metadata, and switching the metadata to the running state after verification passes, includes:
receiving the metadata, auditing the compliance and consistency of the basic information and content elements defined in the metadata, and proceeding to version release verification after the audit passes;
archiving the metadata in the version under a version number, and switching the metadata to the running state; wherein the version number covers at least one batch number;
when the metadata is stock data, after archiving the metadata in the version under the version number, the method further includes:
updating the metadata design table and the release history table.
5. The metadata-based unified management method for multi-source heterogeneous data according to claim 1, wherein configuring permissions for users in different roles, so that each user performs operations on the data directory according to their own permissions, includes:
configuring for the platform management user at least the permissions of directory configuration management, directory classification, directory audit, and directory extension audit, so that the platform management user can plan the data directory, create and maintain top-level directories, audit the data directory, and audit extensions to the data directory;
configuring for the data production user at least the directory compilation permission, so that the data production user can compile extensions to the data directory;
and configuring for the data consumption user at least the directory retrieval permission, so that the data consumption user can query data directory information.
6. The metadata-based unified management method for multi-source heterogeneous data according to claim 5, wherein the data production user performs extended compilation of the data directory structure and of data object mounting based on preset classification rules, classification short codes, and directory encoding rules.
7. The metadata-based unified management method for multi-source heterogeneous data according to claim 1, wherein establishing the association between the data directory and the data objects according to the compilation operation includes:
acquiring the label information that the data production user has assigned to a data object in the data directory, and establishing, according to the label information, the association between the data directory and the data object in the corresponding data source.
8. The metadata-based unified management method for multi-source heterogeneous data according to claim 1, wherein after the classified and hierarchical data directory is formed, the method further includes:
publishing the compiled data directory and setting its opening mode, opening range, and opening time.
9. The metadata-based unified management method for multi-source heterogeneous data according to claim 1, wherein responding to the information retrieval operation of the data consumption user on the data directory includes:
responding to the data consumption user's information retrieval operation through a unified query window, so that the data consumption user can query information based on the data directory structure and data objects.
10. A metadata-based unified management system for multi-source heterogeneous data, characterized by comprising:
a metadata publishing module, configured to respond to a data production user uploading metadata of multi-source heterogeneous data by receiving, auditing, and verifying the metadata, and to switch the metadata to the running state once verification passes; wherein the metadata is obtained by the data production user characterizing the multi-source heterogeneous data according to preset metadata definition rules, and the multi-source heterogeneous data includes at least relational data, file-type data, or message-type data;
a permission configuration module, configured to assign permissions to users in different roles so that each user can operate on the data directory according to the permissions of their role; wherein the roles include a data production user, a platform management user, and a data consumption user;
a data directory publishing module, configured to respond to a compilation operation on the data directory by the platform management user and/or the data production user, establish, according to the compilation operation, the association between the data directory, the data objects, and the data source storage information, and form a classified and hierarchical data directory; wherein a data object is an instantiated metadata object;
and a data query module, configured to respond to an information retrieval operation on the data directory by the data consumption user and return the corresponding data attribute information and content information to the data consumption user.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211328572.3A CN115729993A (en) | 2022-10-27 | 2022-10-27 | Multi-source heterogeneous data unified management method and system based on metadata |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211328572.3A CN115729993A (en) | 2022-10-27 | 2022-10-27 | Multi-source heterogeneous data unified management method and system based on metadata |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115729993A true CN115729993A (en) | 2023-03-03 |
Family
ID=85294053
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211328572.3A Pending CN115729993A (en) | 2022-10-27 | 2022-10-27 | Multi-source heterogeneous data unified management method and system based on metadata |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115729993A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119205277A (en) * | 2024-11-22 | 2024-12-27 | 南京协守信息科技有限公司 | Retrieval method and system for efficiently collecting accurate global potential customer information |
| CN119884257A (en) * | 2024-12-10 | 2025-04-25 | 华能招采数字科技有限公司 | Metadata-driven data resource directory management method |
- 2022-10-27: CN, CN202211328572.3A (CN115729993A), active, pending
Similar Documents
| Publication | Title |
|---|---|
| CN117521969B (en) | Intelligent park operation index calculation system based on digital twinning |
| US7657534B2 (en) | Order commitment method and system |
| US11496584B2 (en) | Extraction and distribution of content packages in a digital services framework |
| CN114218218A (en) | Data processing method, device and equipment based on data warehouse and storage medium |
| CN103778107A (en) | Method and platform for quickly and dynamically generating form based on EXCEL |
| CN104200402A (en) | Publishing method and system of source data of multiple data sources in power grid |
| CN115729993A (en) | Multi-source heterogeneous data unified management method and system based on metadata |
| CN115617776A (en) | A data management system and method |
| CN104111998A (en) | Method and device for sorting coding and integrated exchange and management of heterogeneous data of enterprise |
| CN111709702A (en) | Product full life cycle management system |
| CN102722769A (en) | Experimental data processing system and method |
| KR102339897B1 (en) | Method for providing business process management system based on automatic report generation |
| US20140229223A1 (en) | Integrated erp based planning |
| CN102306355A (en) | Management system for IT (Information Technology) operation and maintenance configuration |
| CN115658658A (en) | Batch-based data sharing method, device, and storage medium in an enterprise data center |
| CN115309839A (en) | Intelligent forestry system and medium based on data warehouse and construction method |
| CN109597603A (en) | A kind of requirement documents automatic generation method based on document component |
| CN117874384A (en) | Site content publishing management method, device, computer equipment and storage medium |
| CN117371945A (en) | One-stop big data management service platform for environmental industry |
| CN116485169A (en) | Method for supervising product production life cycle based on metadata and flow management |
| CN108470087B (en) | Data bus of ramjet design simulation platform |
| KR102183817B1 (en) | System of template editor for legal document |
| CN113918511A (en) | Multi-factor data analysis processing method, system and storage medium |
| CN112183991A (en) | Power plant data management method and system, electronic equipment and storage medium |
| CN113420996A (en) | Digital logistics data information management system |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |