Detailed Description
The embodiments of the present application will now be described clearly and completely with reference to the accompanying drawings. It is evident that the described embodiments are some, but not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
The flow diagrams depicted in the figures are merely illustrative; not all of the illustrated elements and operations/steps are necessarily included, nor must they be performed in the order described. For example, some operations/steps may be further divided, combined, or partially merged, so that the actual order of execution may change according to the actual situation.
      It is to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
      It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
      The embodiment of the application provides a data quality control method, a data quality control device, computer equipment and a storage medium. The data quality control method can be applied to servers, and the servers can be independent servers or server clusters.
      Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic flow chart of a data quality control method according to an embodiment of the application. The data quality control method can be applied to a server. It automatically executes a quality evaluation task and a data repair task through a timing scheduling tool, intelligently determines a data repair strategy according to the quality evaluation result, and performs quality evaluation and data repair according to the corresponding target quality evaluation rule and data repair strategy, thereby realizing intelligent, automated data quality evaluation and data repair and improving data control efficiency.
      As shown in fig. 1, the data quality control method specifically includes steps S101 to S104.
      S101, acquiring data to be managed and controlled and a target quality evaluation rule corresponding to the data to be managed and controlled;
In one embodiment, the data to be managed may be data that is entered or acquired for the first time, such as business data newly generated and entered after an insurance claim settlement business is completed. When data is first entered or acquired, it needs to be preprocessed, including steps such as data cleaning, data conversion, and data standardization. The preprocessed data is taken as the data to be managed.
      In another embodiment, the data to be managed may also be data already stored in a data warehouse.
Further, acquiring the data to be managed includes: collecting data in real time based on a preset data capture and transmission tool; and preprocessing the data based on a preset data preprocessing tool to obtain the data to be managed.
In one embodiment, data preprocessing is inefficient due to the diversity of data sources and data structures. For example, insurance industry data sources are broad, including customer information, claim records, third-party data, and the like; the data categories are complex, involving structured, semi-structured, and unstructured data; and the data is scattered across different systems and channels, so data integration faces the challenge of heterogeneous sources. The prior art has difficulty achieving efficient integration and unified management of multi-source heterogeneous data, so data consistency is hard to guarantee; it also lacks intelligent, automated tools for processing unstructured data, so cleaning and standardization are inefficient.
To improve data preprocessing efficiency, the embodiment of the application uses the distributed computing capability of a distributed computing framework to efficiently preprocess massive data; the preprocessing includes steps such as data cleaning, data conversion, and data standardization.
Specifically, structured or unstructured data from different sources is collected into a real-time streaming tool by the data capture and transmission tool and preprocessed there. The preprocessed data is stored in a preset data warehouse as the data to be managed.
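By way of a non-limiting illustration, the following Python sketch assumes a Kafka topic as the data capture and transmission tool and pandas for the cleaning, conversion, and standardization steps; the topic name, broker address, and field names are illustrative assumptions rather than part of the application.

```python
import json
import pandas as pd
from kafka import KafkaConsumer  # pip install kafka-python

# Consume raw records from a hypothetical "claims-raw" topic.
consumer = KafkaConsumer(
    "claims-raw",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

def preprocess(records: list[dict]) -> pd.DataFrame:
    df = pd.DataFrame(records)
    df = df.drop_duplicates()                                                 # data cleaning
    df["claim_amount"] = pd.to_numeric(df["claim_amount"], errors="coerce")   # data conversion
    df["claim_date"] = pd.to_datetime(df["claim_date"], errors="coerce")
    df["customer_name"] = df["customer_name"].str.strip()                     # data standardization
    return df

# Take one batch from the stream and store the result as data to be managed.
batch = [msg.value for _, msg in zip(range(1000), consumer)]
data_to_manage = preprocess(batch)
```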
Further, obtaining the target quality evaluation rule corresponding to the data to be managed includes: monitoring the data change condition of the data to be managed; obtaining a data quality list according to the data change condition; and adjusting a preset quality evaluation rule according to the data quality list to obtain the target quality evaluation rule.
In one embodiment, business requirements are continually updated as the data environment changes dynamically. The prior art is not flexible enough in dynamically adjusting data quality evaluation rules and models, and has difficulty monitoring data quality problems in real time and responding quickly. This embodiment therefore dynamically adjusts the quality evaluation rule by monitoring data changes, improving the efficiency and accuracy of data quality control. Dynamic change of the data environment refers to, for example, the growing diversity of data sources (the insurance industry increasingly encounters non-traditional data such as unstructured social media data) and rapid growth in data volume, such as when a sudden event triggers a large number of policy claims.
      The data quality list comprises quality evaluation dimensions and standard indexes corresponding to the quality evaluation dimensions.
Specifically, a real-time stream processing tool is used to read metadata information in the data stream, and the change condition of the data to be managed is analyzed in real time in combination with a data lineage map. From the data change details, the quality evaluation dimensions of the current data to be managed and the standard index of each dimension are derived. The obtained dimensions and indexes are compared with the preset quality evaluation rule, and the dimensions and indexes in the preset rule are adjusted to the currently obtained quality evaluation dimensions and standard indexes to obtain the target quality evaluation rule. For example, in the insurance claim settlement business, claim data is monitored in real time; when data growth occurs, the quality evaluation rule is adjusted according to the specifics of the newly added data to obtain the target quality evaluation rule.
In the above embodiment, by combining metadata management and data lineage tracking, a data quality list can be provided for dynamic adjustment of the quality evaluation rule, and the rule can be dynamically adjusted according to the data quality list, improving the efficiency and accuracy of data quality control.
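By way of a non-limiting illustration, the following sketch shows one way the preset rule might be adjusted with the data quality list; the dictionary structure, dimension names, and field names are illustrative assumptions.

```python
# Preset quality evaluation rule, keyed by evaluation dimension.
preset_rules = {
    "completeness": {"field": "claim_amount", "max_null_rate": 0.01},
    "accuracy":     {"field": "claim_amount", "min": 0, "max": 1_000_000},
}

# Data quality list derived from metadata monitoring and lineage analysis,
# e.g. after newly added data raised the plausible claim-amount range.
quality_list = {
    "accuracy":    {"field": "claim_amount", "min": 0, "max": 2_000_000},
    "consistency": {"fields": ["policy_id"], "reference": "policy_table"},
}

def adjust_rules(preset: dict, quality_list: dict) -> dict:
    """Overwrite or add dimensions from the quality list to obtain the target rule."""
    target = dict(preset)
    target.update(quality_list)
    return target

target_rules = adjust_rules(preset_rules, quality_list)
```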
      S102, based on a preset timing scheduling tool and the target quality evaluation rule, performing a quality evaluation task at regular time to obtain a quality evaluation result of the data to be managed and controlled, wherein the quality evaluation result comprises a quality check result;
In one embodiment, the timing scheduling tool (e.g., the Linkdo scheduling tool) may trigger the data quality evaluation task at preset time intervals (e.g., at 2 a.m. each day), or may trigger it according to the triggering conditions of other data quality evaluation tasks. This ensures that data quality checking is carried out as specified and that no important data quality check point is missed.
In one embodiment, the data quality evaluation task includes a quality check and, when the quality check result is that the check fails, further includes anomaly detection.
The data quality evaluation task is triggered and executed by the timing scheduling tool, and the designated data to be managed is checked according to the target quality evaluation rule. If the data to be managed passes the check, the quality evaluation result is obtained directly; if it does not pass, anomaly detection is performed on the abnormal data to obtain an anomaly detection result.
It can be understood that when the quality check result is that the check passes, the quality evaluation result only includes the quality check result, namely that the check passed. When the quality check result is that the check fails, the quality evaluation result also includes the anomaly detection result, namely that the check failed, together with the anomaly problem and the anomaly level.
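By way of a non-limiting illustration, the timed trigger can be sketched as follows using APScheduler as a stand-in for the Linkdo scheduling tool named above (an assumption for illustration only); the check and detection bodies are placeholders.

```python
from apscheduler.schedulers.blocking import BlockingScheduler  # pip install apscheduler

def detect_anomalies(bad_records: list) -> list[dict]:
    # Placeholder anomaly detection: report a problem and level per failing record.
    return [{"record": r, "problem": "out of range", "level": "high"} for r in bad_records]

def run_quality_evaluation():
    bad = []  # records failing the target quality evaluation rule (placeholder)
    result = {"check_passed": not bad}
    if bad:  # only a failed check triggers anomaly detection
        result["anomalies"] = detect_anomalies(bad)
    print(result)

scheduler = BlockingScheduler()
scheduler.add_job(run_quality_evaluation, "cron", hour=2)  # triggered at 2 a.m. each day
scheduler.start()
```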
S103, when the quality check result is that the check fails, determining a data repair strategy according to the quality evaluation result, and generating a data repair task based on the data repair strategy;
In one embodiment, when the check fails, the quality evaluation result includes an anomaly problem and an anomaly level, and the data repair strategy is determined according to the anomaly problem and the anomaly level.
Specifically, the repair mode corresponding to the anomaly problem and the repair priority corresponding to the anomaly level may be determined by table lookup (using an anomaly-problem repair-mode list and an anomaly-level repair-priority list). Both lists can be freely set by the user according to the actual situation.
In one embodiment, a specific data repair task is created according to the data repair strategy, and the repair target, scope, steps, and repair priority of the task are specified. The execution order of data repair tasks is arranged according to repair priority: high-priority tasks are executed first, and low-priority repair tasks may be executed in batches at a preset time.
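By way of a non-limiting illustration, the table lookup and priority ordering can be sketched as follows; the list contents are illustrative and, as described above, user-configurable.

```python
# Anomaly-problem repair-mode list and anomaly-level repair-priority list.
REPAIR_MODE = {
    "missing value": "auto_fill",
    "wrong date format": "auto_convert",
    "amount exceeds policy limit": "manual_review",
}
REPAIR_PRIORITY = {"high": 0, "medium": 1, "low": 2}

def build_repair_tasks(anomalies: list[dict]) -> list[dict]:
    tasks = [
        {
            "target": a["record_id"],
            "mode": REPAIR_MODE.get(a["problem"], "manual_review"),
            "priority": REPAIR_PRIORITY[a["level"]],
        }
        for a in anomalies
    ]
    # High-priority tasks first; low-priority tasks run in later batches.
    return sorted(tasks, key=lambda t: t["priority"])
```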
And S104, when a preset data repair triggering condition is reached, executing the data repair task at regular time based on the timing scheduling tool and the data repair strategy, and storing the repaired data in a preset data warehouse.
In one embodiment, the timing scheduling tool executes the data repair task when a preset data repair triggering condition is reached. The triggering condition may be that a preset time for executing the repair task is reached, such as 2 a.m. each day, to avoid affecting system performance during service peaks; that the accumulated data to be repaired reaches a certain amount (e.g., 1000 records), triggering batch execution of the repair task; or that the repair task is required to execute immediately, such as a repair task for anomalies in key-field data.
For an automatic repair task, the repair tool performs data cleaning, conversion, and updating according to preset rules, such as automatically filling in missing values or correcting a wrong date format. For tasks requiring manual intervention, the relevant personnel are notified to audit and correct the data.
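By way of a non-limiting illustration, the two automatic repair rules just named (filling missing values, correcting date formats) might look as follows with pandas; the column names and the median-fill choice are illustrative assumptions.

```python
import pandas as pd

def auto_repair(df: pd.DataFrame) -> pd.DataFrame:
    # Fill missing numeric values with the column median (illustrative fill rule).
    df["claim_amount"] = df["claim_amount"].fillna(df["claim_amount"].median())
    # Normalize mixed date formats to ISO dates; unparseable values stay
    # missing (NaT) and are routed to manual review.
    df["claim_date"] = pd.to_datetime(df["claim_date"], errors="coerce").dt.date
    return df
```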
In one embodiment, the repaired data is stored in a data warehouse, where the repair history and lineage of the data are recorded. Information such as which repair operations the data passed through, who performed the repair, and the repair time is noted, facilitating data tracing and auditing. By way of example, the data warehouse may be a Hive-based data warehouse that supports storage and querying of structured and semi-structured data, enhancing data integration and management capabilities.
The embodiment of the application provides a data quality control method, a device, computer equipment, and a storage medium. The method acquires the data to be managed and the target quality evaluation rule corresponding to that data; executes a quality evaluation task at regular time based on a preset timing scheduling tool and the target quality evaluation rule to obtain a quality evaluation result of the data to be managed, the result including a quality check result; when the quality check result is that the check fails, determines a data repair strategy according to the quality evaluation result and generates a data repair task based on that strategy; and, when a preset data repair triggering condition is reached, executes the data repair task at regular time based on the timing scheduling tool and the data repair strategy and stores the repaired data in a preset data warehouse. By automatically executing the quality evaluation task and the data repair task through the timing scheduling tool and intelligently determining the data repair strategy according to the quality evaluation result, quality evaluation and data repair are performed according to the corresponding target quality evaluation rule and data repair strategy, realizing intelligent, automated data quality evaluation and repair and further improving data control efficiency.
Referring to fig. 2, fig. 2 is a schematic flow chart of a data quality control method according to an embodiment of the application. The data quality control method can be applied to a server and uses technologies and means such as a timing scheduling tool, quality evaluation rules, and an anomaly detection model to realize automatic, comprehensive monitoring and evaluation of data quality.
      As shown in fig. 2, the step S102 of the data quality control method specifically includes steps S201 to S203.
      S201, performing quality check on the data to be managed and controlled based on the timing scheduling tool and the target quality evaluation rule to obtain a quality check result;
In one embodiment, the timing scheduling tool (e.g., the Linkdo scheduling tool) may trigger the data quality evaluation task at preset time intervals (e.g., at 2 a.m. each day), or may trigger it according to the triggering conditions of other data quality evaluation tasks. This ensures that data quality checking is carried out as specified and that no important data quality check point is missed.
In one embodiment, the data quality evaluation task includes a quality check and, when the quality check result is that the check fails, further includes anomaly detection.
In one embodiment, the quality evaluation rules include, but are not limited to, data integrity check rules (e.g., whether key fields are missing), data accuracy check rules (e.g., whether data is within a reasonable range), data consistency check rules (e.g., whether data in different systems is consistent), and the like.
      In one embodiment, the data quality assessment task is triggered and executed through the timing scheduling tool, and the specified data to be managed is checked according to the target quality assessment rule.
The data to be managed is checked record by record against the target quality evaluation rule. For example, it is checked whether the "claim amount" field in the insurance claims data is within the claim range specified by the policy, and whether the "identification card number" field in the customer information meets the ID number format requirements. During the quality check, the check result of each record is recorded, indicating whether the record passed the quality check.
After all the data to be managed has been checked, the check results are counted and summarized. The statistics include, but are not limited to, the number of records passing the check, the number failing the check, and the distribution of the various data quality problems.
The quality check results are stored to a designated storage location (e.g., a database or file system) for subsequent querying and analysis. The results can also be displayed through a visual interface of the data quality control platform, so that the persons responsible for the business and the system can intuitively understand the data quality. The presentation includes, but is not limited to, a data quality trend graph and a pie chart of the proportions of the various data quality problems.
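By way of a non-limiting illustration, the per-record checks and the summary statistics can be sketched as follows; the field names and bounds are illustrative, and the ID check assumes the 18-character resident ID format mentioned later in this description.

```python
import re

def check_record(rec: dict) -> list[str]:
    problems = []
    if rec.get("claim_amount") is None:                                # integrity check
        problems.append("claim_amount missing")
    elif not 0 <= rec["claim_amount"] <= rec.get("policy_limit", float("inf")):
        problems.append("claim_amount out of policy range")            # accuracy check
    if not re.fullmatch(r"\d{17}[\dXx]", rec.get("id_number", "")):
        problems.append("id_number format invalid")                    # accuracy check
    return problems

def summarize(records: list[dict]) -> dict:
    results = [(r, check_record(r)) for r in records]
    failed = [(r, p) for r, p in results if p]
    distribution: dict[str, int] = {}
    for _, problems in failed:
        for p in problems:
            distribution[p] = distribution.get(p, 0) + 1
    return {
        "passed": len(results) - len(failed),
        "failed": len(failed),
        "problem_distribution": distribution,   # feeds the trend graph / pie chart
    }
```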
S202, when the quality check result is that the check fails, marking the data to be managed as data to be repaired, calling a preset anomaly detection model, and performing anomaly detection on the data to be repaired to obtain the anomaly problem and anomaly level of the data to be repaired;
In one embodiment, when the quality check result indicates that the data fails the check, the failing data records are automatically marked as "data to be repaired". This distinguishes the abnormal data from other, normal data and facilitates subsequent repair processing.
The data to be repaired is isolated from the original data set and stored in a temporary storage area. The temporary storage area can be an independent database table, folder, or data partition used to centrally store all data records to be repaired, preventing them from interfering with normal data and facilitating their centralized management and processing.
In one embodiment, an anomaly detection model (e.g., a clustering algorithm) may be trained on historical data to identify anomaly problems in the data.
The data to be repaired is passed to the anomaly detection model for analysis. The model performs deep anomaly detection on the data according to its algorithm and rules. In this process, the model may compute feature vectors for the data records, measure their deviation from the centers of normal data patterns, and thereby find potential anomaly problems. For example, the model may find that the claim amount in a claim record is too high, or that the time of occurrence does not meet the usual rules.
The anomaly detection model outputs the specific anomaly problems in the data to be repaired, covering aspects including but not limited to the accuracy, integrity, and consistency of the data. For example, the model may identify anomalies such as "the claim amount exceeds the policy limit" or "the name in the customer information does not match the identification number".
In one embodiment, the anomaly level of the abnormal data is determined according to a preset anomaly level determination rule. Illustratively, the anomaly level of each anomaly problem is determined based on its severity and its impact on the business, and may be classified into three levels: low, medium, and high. For example, "the claim amount exceeds the policy limit" with the excess larger than a threshold is determined to be a high-level anomaly; "the contact phone format in the customer information is incorrect" is determined to be a low-level anomaly; and when a single piece of data has more than a preset number of low-level anomalies, it is determined to be a high-level anomaly. The anomaly level determination rule can be freely set by the user according to actual requirements.
In another embodiment, the anomaly level determination rule may further include trend analysis of the data over multiple consecutive quality evaluation results, showing how data quality changes over time; if the anomaly trend of the data is increasing, the data is determined to be a high-level anomaly.
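By way of a non-limiting illustration, a clustering-based detector of the kind described above, together with a distance-based level rule, might be sketched as follows; the use of K-means, the numeric feature vectors, and the thresholds are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans  # pip install scikit-learn

def train_detector(history: np.ndarray, k: int = 3) -> KMeans:
    """Fit cluster centers of normal patterns from historical feature vectors."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(history)

def detect(model: KMeans, records: np.ndarray, threshold: float) -> list[dict]:
    # Distance of each record to its nearest cluster center.
    dists = np.min(model.transform(records), axis=1)
    anomalies = []
    for rec, d in zip(records, dists):
        if d > threshold:  # deviates from normal data patterns
            level = "high" if d > 2 * threshold else "medium" if d > 1.5 * threshold else "low"
            anomalies.append({
                "record": rec.tolist(),
                "problem": "deviates from normal pattern",
                "level": level,
            })
    return anomalies
```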
S203, generating the quality evaluation result based on the quality check result, the anomaly problem, and the anomaly level.
In one embodiment, the information on the quality check result, the anomaly problems, and the anomaly levels is collected. The quality check result gives the overall quality profile of the data, including the number of records that passed and failed the check; the anomaly problems detail the specific quality issues in the data; and the anomaly levels reflect the severity of the data anomalies. The anomaly problems and levels are sorted and summarized to generate a comprehensive quality evaluation result.
It can be understood that when the quality check result is that the check passes, the quality evaluation result only includes the quality check result, namely that the check passed. When the quality check result is that the check fails, the quality evaluation result also includes the anomaly detection result, namely that the check failed, together with the anomaly problem and the anomaly level.
In one embodiment, the quality evaluation report can take various formats, such as a text report, a table report, or a graphical report, so that the report content is concise, clear, and focused, enabling business personnel and responsible persons to quickly understand the data quality and take corresponding measures.
In this embodiment, automatic, comprehensive monitoring and evaluation of data quality is realized using technologies and means such as the timing scheduling tool, the quality evaluation rule, and the anomaly detection model.
Referring to fig. 3, fig. 3 is a schematic flow chart of a data quality control method according to an embodiment of the application. The data quality control method can be applied to a server and enables users to flexibly access data through a data quality control platform, realizing cross-department collaboration, promoting data sharing and process optimization, and improving the utilization of data resources.
      As shown in fig. 3, the data quality control method specifically includes steps S301 to S304.
      S301, acquiring a data responsibility list based on a preset data quality control platform, and receiving a user access request;
In one embodiment, the data quality control platform is a platform supporting cross-department collaboration and managing tasks related to data quality, including data responsibility list management, user access control, and data lineage tracking. It provides a unified operation interface for data managers and users to ensure the security, accuracy, and reliability of data.
Through the responsibility list of the database tables, responsibility for each table is assigned to the relevant business department; with data lineage added and data query permissions opened, a business problem on one side can be traced back to the relevant data information. For example, the ownership of a policy belongs to the insurance module, but when the client files a claim, the claims side can trace back to the policy's ownership, ensuring data sharing with controllable permissions.
In one embodiment, the data responsibility list specifies the responsible department and responsible person for each table and data field. The list may be read from a background database or a configuration file, so that the management ownership of each part of the data is clearly known when a user requests data. For example, in the insurance industry, the data responsibility list would record information such as "the customer information table is the responsibility of the customer service department; the responsible person is Zhang San".
      S302, analyzing the user access request to obtain user information and request data;
In one embodiment, after the platform receives the user access request, a parsing technique is used to decompose and extract the request content to obtain the user information and the data requested by the user.
The user information typically includes an identity (e.g., user name, employee number), department, role, and the like, which can be used for subsequent permission determination. For example, the user's identity information is parsed from the authentication token in the request header to determine whether the user is an internal employee of the insurance company or an external partner, and what position and department the user holds within the company. The request data indicates the specific data resource the user wants to access.
In another embodiment, the user access request may also include the user's intent, such as a read-only query or a data modification, so that whether the intended operation is allowed can be accurately verified during the permission check.
      S303, determining user access rights based on the user information and the data responsibility list, and checking the user access rights and the request data;
       in one embodiment, the user's access rights are determined based on user information and a data responsibility list. 
The user information is compared with the data responsibility list to determine whether the user belongs to the responsible department for a given data resource or has been authorized by the responsible person. For example, a claims handler may only have access to data related to the claim cases he or she is responsible for. If the data the user requests belongs to a specific department, it is verified whether the user is a member of that department or has obtained permission for cross-department data access.
After the user's access rights are determined, the rights are verified against the request data: it is checked whether the user has access rights to the requested data, including read, write, and modify permission levels. For example, when a user requests to read a customer basic-information table, the platform verifies whether the user's rights include read operations on that table.
In one embodiment, the requested data itself is also checked, including but not limited to whether the data exists, is in an accessible state, and complies with data security and privacy requirements. For example, some sensitive data may not be accessible outside a particular time window, or may need to be desensitized before being provided to the user.
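By way of a non-limiting illustration, checking a request against the data responsibility list might look as follows; the table names, departments, and grant structure are illustrative assumptions.

```python
# Data responsibility list: responsible department and person per table.
RESPONSIBILITY_LIST = {
    "customer_info": {"department": "customer_service", "owner": "Zhang San"},
    "claims_detail": {"department": "claims_dept", "owner": "Li Si"},
}
# Cross-department authorizations granted by the responsible person.
GRANTS = {("user_1001", "claims_detail"): {"read"}}

def check_access(user: dict, table: str, action: str) -> bool:
    entry = RESPONSIBILITY_LIST.get(table)
    if entry is None:
        return False                                  # requested data does not exist
    if user["department"] == entry["department"]:
        return True                                   # member of the responsible department
    return action in GRANTS.get((user["id"], table), set())  # explicit authorization

# An underwriting user reading claims data succeeds only via the explicit grant.
allowed = check_access({"id": "user_1001", "department": "underwriting"},
                       "claims_detail", "read")
```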
And S304, when the rights verification is passed, displaying the request data to the user and recording the data lineage to realize data tracking.
In one embodiment, after the user's access rights are verified, the platform converts the requested data into a suitable format for presentation according to the user's request and preferences. For example, if a user requests to view data in tabular form, the platform converts the query result set into a table, including setting the header and arranging the rows and columns. For data analysts in the insurance industry, the distribution and trend of claim data can be displayed as intuitive statistical charts, facilitating business analysis.
In one embodiment, the data lineage is recorded while the data is provided, including the data source, the processing links passed through, user access information, the data circulation path, and the like. For example, when a user accesses a claim data report that has undergone data cleaning and conversion, the system records which original data table the report originated from, the path along which the data was transmitted, and the current user's access operation.
The data lineage records can be used for tracing data quality problems, impact analysis, and compliance auditing. When data is found to be wrong or needs updating, the affected users and business processes can be rapidly located through the lineage. For example, when errors occur in insurance product data, the departments and personnel using that data for pricing analysis are tracked through the lineage, and the relevant personnel are notified in time to take measures.
In this embodiment, through the data quality control platform, users can flexibly access data, cross-department collaboration is realized, data sharing and process optimization are promoted, and the utilization of data resources is improved.
Further, displaying the request data to the user when the rights verification is passed includes: obtaining the data category of the request data when the rights verification is passed; determining the data encryption rule for the request data according to the data category and the user access rights; encrypting the request data based on the data encryption rule; and displaying the encrypted request data to the user.
In one embodiment, after receiving the user request and passing the permission check, the system determines the data category of the requested data. According to business requirements and sensitivity, data is classified into categories such as public data, internal data, and confidential data; according to content, it can also be classified into name data, age data, date data, and the like. Different categories of data use different encryption rules; for example, confidential data may require a higher-level encryption algorithm.
In one embodiment, the encryption rule is determined based on the identified data category and the user's access rights. For example, for data containing personal sensitive information (e.g., identification card numbers, bank accounts), a strong cryptographic algorithm (e.g., the SM3 cryptographic hash algorithm) should be used regardless of the user's rights. For general business data, a suitable encryption mode can be selected according to the user's permission level, and a user with lower permissions can only see partially encrypted or desensitized data.
By way of example, customer information may be displayed in desensitized, encrypted form when presented externally:
1. Take the first Chinese character of the personal name (if the name is foreign, take the first 3 bytes of its UTF-8 encoding; UTF-8, 8-bit Unicode Transformation Format, is a variable-length character encoding), followed by the identity document number (English letters in the number uniformly converted to uppercase), forming a character string (in UTF-8; taking the resident identity number as an example, 21 bytes);
2. Take the SM3 hash value of the above string, which is a 64-byte string (in lowercase hexadecimal form). SM3 is the cryptographic hash algorithm defined in GB/T 32905-2016, Information security technology - SM3 cryptographic hash algorithm;
3. Take the first 6 bytes of the UTF-8 encoding of the identity document number and append the SM3 hash value, obtaining a 70-byte string, which is the final desensitized value.
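By way of a non-limiting illustration, the three steps above can be sketched in Python using the SM3 implementation of the third-party `gmssl` package (an external dependency assumed for illustration; it is not named by the application); the sample name and ID number are hypothetical.

```python
from gmssl import sm3, func  # pip install gmssl

def desensitize(name: str, id_number: str) -> str:
    id_upper = id_number.upper()                        # English letters uppercased
    prefix = name.encode("utf-8")[:3]                   # first Chinese char = 3 UTF-8 bytes
    payload = prefix + id_upper.encode("utf-8")         # 3 + 18 = 21 bytes for a resident ID
    digest = sm3.sm3_hash(func.bytes_to_list(payload))  # 64-char lowercase hex SM3 hash
    # First 6 bytes of the ID's UTF-8 encoding + 64-char hash = 70-byte value.
    return id_upper.encode("utf-8")[:6].decode() + digest

print(desensitize("张三", "110101199001011234"))  # hypothetical sample data
```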
      In one embodiment, the requested data is encrypted using selected encryption rules, and the encrypted data is presented to the user.
In another embodiment, during data processing and storage, the data is also encrypted using the corresponding encryption rule to ensure data security, preventing the data from being stolen or tampered with during transmission and storage and protecting data privacy.
In this embodiment, the data encryption rule is determined flexibly according to the data category and the user's access rights, avoiding data leakage and improving data security.
Further, after step S304, the method further includes: obtaining the business processing flow corresponding to the request data; verifying the data flow path of the request data based on the business processing flow and the data lineage of the request data to obtain a data path verification result; and, when the data flow path does not conform to the business processing flow, determining that the user has committed a violation and generating a violation alarm based on the violation.
In one embodiment, the business processing flows include the data processing flows under different business scenarios. For example, the insurance claim settlement flow includes links such as reporting, survey, loss assessment, claim review, and claim payment.
According to the data lineage, all circulation links from the generation of the request data to its current state are restored, including the source system of the data, the processing steps it passed through, and the business links in which it was used, yielding the data circulation path.
The restored data circulation path is compared with the business processing flow to check whether the data circulated in the order and according to the rules specified by the flow. For example, in the insurance claim settlement business, claim payment data should enter the payment link only after the preceding links such as survey, loss assessment, and claim review are completed.
If the data circulation path does not match the business flow, for example if the payment data entered the payment operation directly without passing through the claim review link, the platform judges that the user may have committed a violation.
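By way of a non-limiting illustration, the path comparison can be sketched as a check that no prescribed link is skipped; the link names follow the claims example above and are illustrative.

```python
CLAIMS_FLOW = ["report", "survey", "loss_assessment", "claim_review", "payment"]

def path_conforms(flow_path: list[str], business_flow: list[str]) -> bool:
    """Each prescribed link must be reached in order, with none skipped."""
    relevant = [link for link in flow_path if link in business_flow]
    return relevant == business_flow[:len(relevant)]

# Payment reached while claim_review was skipped -> judged a violation.
assert not path_conforms(["report", "survey", "loss_assessment", "payment"], CLAIMS_FLOW)
assert path_conforms(["report", "survey", "loss_assessment", "claim_review", "payment"],
                     CLAIMS_FLOW)
```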
An alarm mechanism is then triggered to generate a violation alarm. The alarm information should describe in detail the nature of the violation, the data involved, the violated links, and so on, for example: "When user A processed claim case B, the claim review link was skipped and the claim payment operation was performed directly, which is a serious violation."
Relevant managers are notified of the violation alarm by mail, short message, or in-platform message, and the details of the alarm, including the alarm time, the violating user, the related data, and the processing state, are recorded in the platform for auditing and handling.
In this embodiment, the data circulation path can be monitored through the data lineage and compared with the business flow to judge whether violations exist, effectively monitoring whether data use and user behavior are compliant and improving the rationality of data access and the data quality.
      Referring to fig. 4, fig. 4 is a schematic block diagram of a data quality control device according to an embodiment of the present application, where the data quality control device is configured to perform the foregoing data quality control method. The data quality control device can be configured on a server.
      As shown in fig. 4, the data quality control apparatus 400 includes:
       the data acquisition module 401 is configured to acquire data to be managed and a target quality evaluation rule corresponding to the data to be managed; 
       A quality evaluation module 402, configured to perform a quality evaluation task at regular time based on a preset timing scheduling tool and the target quality evaluation rule, to obtain a quality evaluation result of the data to be managed, where the quality evaluation result includes a quality check result; 
       The policy determining module 403 is configured to determine a data repair policy according to the quality evaluation result when the quality check result is that the check fails, and generate a data repair task based on the data repair policy; 
       And the data repair module 404 is configured to execute the data repair task at regular time based on the timing scheduling tool and the data repair policy when a preset data repair trigger condition is reached, and store the repaired data in a preset data warehouse. 
      Further, the data acquisition module 401 includes:
       The data quality list obtaining unit is used for monitoring the data change condition of the data to be managed and controlled and obtaining a data quality list according to the data change condition; 
       and the quality evaluation rule adjusting unit is used for adjusting a preset quality evaluation rule according to the data quality list to obtain a target quality evaluation rule. 
      Further, the quality assessment module 402 includes:
       The quality verification unit is used for carrying out quality verification on the data to be managed and controlled based on the timing scheduling tool and the target quality evaluation rule to obtain a quality verification result; 
The anomaly monitoring unit is used for marking the data to be managed as data to be repaired when the quality check result is that the check fails, calling a preset anomaly detection model, and performing anomaly detection on the data to be repaired to obtain the anomaly problem and anomaly level of the data to be repaired;
And a result generation unit configured to generate the quality evaluation result based on the quality check result, the anomaly problem, and the anomaly level.
      Further, the data quality control device 400 further includes a data access control module, where the data access control module includes:
       The request receiving unit is used for acquiring a data responsibility list based on a preset data quality control platform and receiving a user access request; 
       The request analysis unit is used for analyzing the user access request to obtain user information and request data; 
       the permission verification unit is used for determining user access permission based on the user information and the data responsibility list and verifying the user access permission and the request data; 
and the data display unit is used for displaying the request data to a user when the rights verification is passed, and recording the data lineage so as to realize data tracking.
      Further, the data display unit includes:
a data category obtaining subunit, configured to obtain the data category of the request data when the rights verification passes;
       an encryption rule determining subunit, configured to determine a data encryption rule of the requested data according to the data class and the user access right; 
       And the data encryption display subunit is used for encrypting the request data based on the data encryption rule and displaying the encrypted request data to the user. 
      Further, the data quality control apparatus 400 further includes a violation verification module, where the violation verification module includes:
       A service processing flow obtaining unit, configured to obtain a service processing flow corresponding to the request data; 
A path verification result obtaining unit, configured to verify a data flow path of the request data based on the business processing flow and the data lineage of the request data, to obtain a data path verification result;
And the violation alarm generation unit is used for determining that the user has committed a violation when the data path verification result indicates that the data flow path does not conform to the business processing flow, and generating a violation alarm based on the violation.
      Further, the data acquisition module 401 includes:
the data acquisition unit is used for acquiring data in real time based on a preset data capture and transmission tool;
       the data preprocessing unit is used for preprocessing the data based on a preset data preprocessing tool to obtain the data to be managed and controlled. 
      It should be noted that, for convenience and brevity of description, the specific working process of the apparatus and each module described above may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
      The apparatus described above may be implemented in the form of a computer program which is executable on a computer device as shown in fig. 5.
      Referring to fig. 5, fig. 5 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device may be a server.
      With reference to FIG. 5, the computer device includes a processor, memory, and a network interface connected by a system bus, where the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions that, when executed, cause a processor to perform any of the data quality control methods.
      The processor is used to provide computing and control capabilities to support the operation of the entire computer device.
The internal memory provides an environment for the execution of a computer program in the non-volatile storage medium; the computer program, when executed by a processor, causes the processor to perform any of the data quality control methods.
      The network interface is used for network communication such as transmitting assigned tasks and the like. It will be appreciated by those skilled in the art that the structure shown in FIG. 5 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
It should be appreciated that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
      Wherein in one embodiment the processor is configured to run a computer program stored in the memory to implement the steps of:
       Acquiring data to be managed and controlled and a target quality evaluation rule corresponding to the data to be managed and controlled; 
       Based on a preset timing scheduling tool and the target quality evaluation rule, performing a quality evaluation task at regular time to obtain a quality evaluation result of the data to be managed and controlled, wherein the quality evaluation result comprises a quality check result; 
when the quality check result is that the check fails, determining a data repair strategy according to the quality evaluation result, and generating a data repair task based on the data repair strategy; 
And when a preset data repair triggering condition is reached, executing the data repair task at regular time based on the timing scheduling tool and the data repair strategy, and storing the repaired data in a preset data warehouse. 
In one embodiment, when implementing the obtaining of the target quality evaluation rule corresponding to the data to be managed, the processor is configured to implement:
       Monitoring the data change condition of the data to be managed and controlled, and obtaining a data quality list according to the data change condition; 
       And adjusting a preset quality evaluation rule according to the data quality list to obtain a target quality evaluation rule. 
In one embodiment, when implementing the step of performing a quality evaluation task at regular time based on a preset timing scheduling tool and the target quality evaluation rule to obtain a quality evaluation result of the data to be managed, the quality evaluation result including a quality check result, the processor is configured to implement:
       Based on the timing scheduling tool and the target quality evaluation rule, performing quality check on the data to be managed and controlled to obtain the quality check result; 
When the quality check result is that the check fails, marking the data to be managed as data to be repaired, calling a preset anomaly detection model, and performing anomaly detection on the data to be repaired to obtain the anomaly problem and anomaly level of the data to be repaired; 
and generating the quality evaluation result based on the quality check result, the anomaly problem, and the anomaly level. 
In one embodiment, after implementing the step of executing the data repair task at regular time based on the timing scheduling tool and the data repair strategy when a preset data repair triggering condition is reached and storing the repaired data in a preset data warehouse, the processor is further configured to implement:
       based on a preset data quality control platform, acquiring a data responsibility list and receiving a user access request; 
       analyzing the user access request to obtain user information and request data; 
       determining user access rights based on the user information and the data responsibility list, and checking the user access rights and the request data; 
and when the rights verification is passed, displaying the request data to the user, and recording the data lineage for realizing data tracking. 
In one embodiment, when implementing the displaying of the request data to the user when the rights verification is passed, the processor is configured to implement:
       when the authority verification is passed, acquiring the data category of the request data; 
       Determining a data encryption rule of the request data according to the data category and the user access authority; 
       And encrypting the request data based on the data encryption rule, and displaying the encrypted request data to the user. 
In one embodiment, after implementing the displaying of the request data to the user and the recording of the data lineage for data tracking when the rights verification is passed, the processor is further configured to implement:
       acquiring a business processing flow corresponding to the request data; 
based on the business processing flow and the data lineage of the request data, verifying a data flow path of the request data to obtain a data path verification result; 
And when the data path verification result is that the data flow path does not conform to the business processing flow, determining that the user has committed a violation, and generating a violation alarm based on the violation. 
      In one embodiment, the processor, when implementing obtaining the data to be managed, is configured to implement:
       Based on a preset data capturing and transmitting tool, acquiring data in real time; 
       and preprocessing the data based on a preset data preprocessing tool to obtain the data to be managed. 
The embodiment of the application also provides a computer-readable storage medium storing a computer program. The computer program comprises program instructions, and a processor executes the program instructions to implement any of the data quality control methods provided by the embodiments of the application.
The computer-readable storage medium may be an internal storage unit of the computer device of the foregoing embodiment, for example, a hard disk or memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device.
While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.