CN115757384A - A big data-based government data processing method

Info

Publication number: CN115757384A
Application number: CN202211520308.XA
Authority: CN
Inventors: 李娟
Original assignee: Anhui Changzheng Think Tank Management Consulting Co ltd
Current assignee: Anhui Changzheng Think Tank Management Consulting Co ltd
Priority date: 2022-11-30
Filing date: 2022-11-30
Publication date: 2023-03-07

Abstract

The invention relates to data processing, and in particular to a big data-based government affairs data processing method. The method comprises: constructing a government affairs data sharing model for multi-site joint office work, using the model to detect whether abnormal data exist in the government affairs data, and removing the abnormal data; determining a data optimization model for optimizing the government affairs data from which the abnormal data have been removed; performing optimization training on the data optimization model until an optimal data optimization model is obtained; and using the optimal data optimization model to optimize the government affairs data to obtain high-quality government affairs data. The technical solution provided by the invention can effectively overcome two defects of the prior art: abnormal data in government affairs data cannot be effectively removed, and high-quality government affairs data cannot be selected from the government affairs data.

Description

A big data-based government data processing method

Technical field

The invention relates to data processing, and in particular to a method for processing government affairs data based on big data.

Background

Government affairs are the administrative work of government. Once a matter has been deliberated and settled, it must be published so that the public and government staff are informed in time. With the development of Internet technology, publication has expanded from paper documents to the network. Over a long period, government departments have recorded a large amount of government affairs data, which is an important basis for their daily management. This data is large in volume, varied in type, drawn from many sources, and complex in format. As big data and the Internet develop, governments have a growing need to mine the value of the data held by their departments.

In recent years, driven by technical progress and policy, digital government reform has called for breaking down "data silos". Improving administrative efficiency requires government departments to connect their data; increasingly serious security problems require that departmental data can be shared effectively; and upgrading public convenience services requires departments to cooperate. All of this has pushed governments at every level to promote cross-departmental sharing of government affairs data.

However, cross-departmental sharing of government affairs data is a very complex undertaking. How to effectively remove abnormal data from government affairs data, and how to select high-quality government affairs data so that its value can be fully mined, are problems that urgently need solving.

Summary of the invention

(1) Technical problem solved

In view of the above shortcomings of the prior art, the present invention provides a method for processing government affairs data based on big data. It can effectively overcome the defects of the prior art, namely that abnormal data in government affairs data cannot be effectively removed and that high-quality government affairs data cannot be selected from the government affairs data.

(2) Technical solution

To achieve the above object, the present invention is implemented through the following technical solution:

A method for processing government affairs data based on big data, comprising the following steps:

S1. Construct a government affairs data sharing model for multi-site joint office work, use the model to detect whether abnormal data exist in the government affairs data, and remove the abnormal data;

S2. Determine a data optimization model for optimizing the government affairs data from which the abnormal data have been removed;

S3. Perform optimization training on the data optimization model until an optimal data optimization model is obtained;

S4. Use the optimal data optimization model to optimize the government affairs data to obtain high-quality government affairs data.

Preferably, constructing the government affairs data sharing model for multi-site joint office work in S1 includes:

obtaining the government nodes, and using a blockchain consensus algorithm to randomly select one of them as the aggregation node for the current round;

after data dimensionality reduction, constructing an isolation forest based on the aggregation node, and completing node aggregation after the isolated parameter vectors have been removed;

uploading to the blockchain the hash value of the vectors that remain after the isolated parameter vectors are removed, and sending the vector source data to the nodes participating in the next round, until construction of the government affairs data sharing model is complete.
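One round of the scheme above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a seeded random choice merely stands in for the blockchain consensus step, the list `ledger` stands in for the chain, and the helper names `select_aggregator` and `vector_digest`, as well as the choice of SHA-256 over a JSON encoding, are hypothetical.

```python
import hashlib
import json
import random


def select_aggregator(node_ids, round_seed):
    """Pick one node for this round.

    Assumption: a seeded RNG stands in for the consensus algorithm, so every
    participant that knows the seed derives the same aggregation node.
    """
    rng = random.Random(round_seed)
    return rng.choice(sorted(node_ids))


def vector_digest(vector):
    """Return the SHA-256 hex digest of a canonical JSON encoding of the vector."""
    payload = json.dumps(vector, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical node names and a vector that survived outlier removal.
nodes = ["node-a", "node-b", "node-c"]
aggregator = select_aggregator(nodes, round_seed=1)
digest = vector_digest([0.12, 0.5, 0.33])

# Only the digest is recorded "on chain"; the source vector itself would be
# sent directly to the nodes taking part in the next round.
ledger = [{"round": 1, "aggregator": aggregator, "hash": digest}]
```

Recording only the digest keeps the shared record small while still letting a next-round node check that the vector it received matches what was committed.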

Preferably, the data dimensionality reduction includes:

for each dimension of the data vector, the aggregating government node obtains the corresponding values, and reduces dimensionality using the dimensionality reduction graph method.

Preferably, constructing the isolation forest based on the aggregation node includes:

aggregating the government nodes at the aggregation node, constructing an isolation forest of k isolation trees from the dimension-reduced data set, and obtaining the isolated parameter vectors.
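To make the idea of a forest of k isolation trees concrete, here is a small pure-Python sketch of the standard isolation forest technique. It is an illustration only: the source does not give tree depth, sample size, or scoring details, so the function names, the depth limit, the toy data, and the choice to build every tree on the full data set are assumptions.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random dimension at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(point, node, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "dim" not in node:
        return depth + c_factor(node["size"])
    branch = "left" if point[node["dim"]] < node["split"] else "right"
    return path_length(point, node[branch], depth + 1)


def build_forest(points, k, sample_size, seed=0):
    """Build k isolation trees, each on a random sample of the points."""
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(max(size, 2)))
    forest = [build_tree(rng.sample(points, size), 0, max_depth, rng)
              for _ in range(k)]
    return forest, size


def anomaly_score(point, forest, size):
    """Score in (0, 1); values near 1 mean the point is isolated quickly."""
    mean_path = sum(path_length(point, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_path / c_factor(size))


# Toy data (assumption): 63 clustered vectors plus one far-away vector.
data_rng = random.Random(42)
data = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(63)]
data.append([12.0, 12.0])

forest, size = build_forest(data, k=50, sample_size=len(data), seed=7)
scores = [anomaly_score(p, forest, size) for p in data]
```

Vectors whose score is highest are the candidates for the "isolated parameter vectors"; where to put the cut-off is a policy choice the source leaves open.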

Preferably, removing the isolated parameter vectors includes:

computing a box plot function for each dimension of the government nodes' data vectors;

if all function values in a dimension fall within the set range, or more than half of them fall outside it, removing the data vectors in that dimension.
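The rule above can be sketched as follows, read literally. The source does not define the box plot function or the set range, so this sketch assumes Tukey fences (quartiles plus or minus 1.5 times the interquartile range) computed from a hypothetical reference set and applied to the current round's vectors; the function names and sample values are illustrative.

```python
import statistics


def tukey_fences(values, whisker=1.5):
    """Return (low, high) box plot fences for one dimension."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def dimensions_to_drop(reference, current, whisker=1.5):
    """List the dimensions the rule removes.

    A dimension is dropped when every current value lies inside the fences,
    or when more than half of the current values lie outside them.
    """
    drop = []
    for d in range(len(current[0])):
        low, high = tukey_fences([v[d] for v in reference], whisker)
        outside = sum(1 for v in current if not low <= v[d] <= high)
        if outside == 0 or outside > len(current) / 2:
            drop.append(d)
    return drop


# Hypothetical reference vectors (for the fences) and current-round vectors.
reference = [[1, 10, 5], [2, 11, 5], [3, 12, 6], [4, 13, 6], [5, 14, 7]]
current = [[2, 12, 6], [3, 11, 50], [4, 13, 60], [20, 14, 70]]

dropped = dimensions_to_drop(reference, current)
# Dimension 0 has one value outside its fences and is kept; dimension 1 has
# none outside and dimension 2 has three of four outside, so both are dropped.
```

Taking the fences from a separate reference set is what makes the second condition reachable: fences computed from the same values can never exclude more than half of them.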

Preferably, determining in S2 the data optimization model for optimizing the government affairs data from which the abnormal data have been removed includes:

selecting, according to the government affairs data, a model file containing a default network structure and hyperparameters, and determining, according to a preset expected loss value, an algorithm file for the iterative algorithm.

Preferably, performing optimization training on the data optimization model in S3 until the optimal data optimization model is obtained includes:

training the data optimization model with a tuner to obtain a target data optimization model;

evaluating the target data optimization model with an evaluator, based on its model parameters, to obtain a model evaluation result;

initializing the target data optimization model with the tuner based on the model evaluation result, and cyclically optimizing the target data optimization model with the tuner and the evaluator until a preset convergence condition is reached, yielding the optimal data optimization model.
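The tuner and evaluator cycle can be sketched as a simple derivative-free search. Everything concrete here is an assumption made for illustration: the two-parameter "model", the squared-error loss inside `evaluator`, the random-perturbation `tuner`, the step-shrinking rule, and the tolerance used as the preset convergence condition do not come from the source.

```python
import random


def evaluator(params):
    """Hypothetical evaluation: squared distance from a target unknown to the tuner."""
    target = (0.3, -1.2)
    return sum((p - t) ** 2 for p, t in zip(params, target))


def tuner(start, step, rng):
    """Derivative-free proposal: randomly perturb the starting parameters."""
    return tuple(p + rng.uniform(-step, step) for p in start)


def optimize(initial, max_rounds=2000, tolerance=1e-3, seed=0):
    """Alternate tuner and evaluator until the loss meets the tolerance."""
    rng = random.Random(seed)
    best_params, best_loss = initial, evaluator(initial)
    step = 1.0
    for _ in range(max_rounds):
        if best_loss <= tolerance:  # preset convergence condition (assumed)
            break
        candidate = tuner(best_params, step, rng)
        loss = evaluator(candidate)
        if loss < best_loss:
            # Re-initialize the next round from the better parameters.
            best_params, best_loss = candidate, loss
        else:
            # Narrow the search after a failed proposal.
            step = max(step * 0.95, 1e-3)
    return best_params, best_loss


initial = (0.0, 0.0)
best_params, best_loss = optimize(initial)
```

Each accepted candidate becomes the starting point for the next proposal, which mirrors the step where the evaluation result is fed back to initialize the target model.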

Preferably, training the data optimization model with the tuner to obtain the target data optimization model includes:

training the data optimization model with the tuner according to a preset optimization method to obtain the target data optimization model;

wherein the preset optimization methods include reinforcement learning, derivative-free optimization, and heuristic search.

Preferably, evaluating the target data optimization model with the evaluator, based on its model parameters, to obtain the model evaluation result includes:

evaluating the target data optimization model with the evaluator according to a preset evaluation method, based on the model parameters of the target data optimization model, to obtain the model evaluation result.

Preferably, initializing the target data optimization model with the tuner based on the model evaluation result includes:

using an empirical learning algorithm to determine the optimal model parameters corresponding to the model evaluation result, and initializing the target data optimization model with the tuner based on those optimal model parameters.

(3) Beneficial effects

Compared with the prior art, the method for processing government affairs data based on big data provided by the present invention has the following beneficial effects:

1) A government affairs data sharing model for multi-site joint office work is constructed and used to detect whether abnormal data exist in the government affairs data and to remove them. The model thus removes abnormal data from the government affairs data effectively, which safeguards the accuracy of the high-quality government affairs data obtained by the subsequent optimization;

2) A data optimization model is determined for optimizing the government affairs data from which the abnormal data have been removed, and is trained until an optimal data optimization model is obtained. The optimal data optimization model is then used to select high-quality government affairs data from the government affairs data, which provides a data guarantee for fully mining the value of government affairs data.

Description of drawings

To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of the present invention;

Fig. 2 is a schematic flowchart of removing abnormal data from government affairs data in the present invention;

Fig. 3 is a schematic flowchart of selecting high-quality government affairs data in the present invention.

Detailed description

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

A method for processing government affairs data based on big data, as shown in Fig. 1 and Fig. 2: ① construct a government affairs data sharing model for multi-site joint office work, use the model to detect whether abnormal data exist in the government affairs data, and remove the abnormal data.

Constructing the government affairs data sharing model for multi-site joint office work includes:

1) performing data dimensionality reduction, including:

2) constructing an isolation forest based on the aggregation node, including:

3) removing the isolated parameter vectors, including:

In the above technical solution, a government affairs data sharing model for multi-site joint office work is constructed and used to detect whether abnormal data exist in the government affairs data and to remove them. The model thus removes abnormal data from the government affairs data effectively, which safeguards the accuracy of the high-quality government affairs data obtained by the subsequent optimization.

As shown in Fig. 1 and Fig. 3: ② determine a data optimization model for optimizing the government affairs data from which the abnormal data have been removed, specifically including:

③ perform optimization training on the data optimization model until an optimal data optimization model is obtained, specifically including:

1) training the data optimization model with a tuner to obtain a target data optimization model, including:

2) evaluating the target data optimization model with an evaluator, based on its model parameters, to obtain a model evaluation result, including:

3) initializing the target data optimization model with the tuner based on the model evaluation result, including:

④ use the optimal data optimization model to optimize the government affairs data to obtain high-quality government affairs data.

In the above technical solution, a data optimization model is determined for optimizing the government affairs data from which the abnormal data have been removed, and is trained until an optimal data optimization model is obtained. The optimal data optimization model is then used to select high-quality government affairs data from the government affairs data, which provides a data guarantee for fully mining the value of government affairs data.

The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in those embodiments can still be modified, and that some of their technical features can be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for processing government affairs data based on big data, characterized by comprising the following steps:

S1. Construct a government affairs data sharing model for multi-site joint office work, use the model to detect whether abnormal data exist in the government affairs data, and remove the abnormal data;

S2. Determine a data optimization model for optimizing the government affairs data from which the abnormal data have been removed;

S3. Perform optimization training on the data optimization model until an optimal data optimization model is obtained;

S4. Use the optimal data optimization model to optimize the government affairs data to obtain high-quality government affairs data.

2. The method for processing government affairs data based on big data according to claim 1, characterized in that constructing the government affairs data sharing model for multi-site joint office work in S1 includes:

obtaining the government nodes, and using a blockchain consensus algorithm to randomly select one of them as the aggregation node for the current round;

after data dimensionality reduction, constructing an isolation forest based on the aggregation node, and completing node aggregation after the isolated parameter vectors have been removed;

uploading to the blockchain the hash value of the vectors that remain after the isolated parameter vectors are removed, and sending the vector source data to the nodes participating in the next round, until construction of the government affairs data sharing model is complete.

3. The method for processing government affairs data based on big data according to claim 2, characterized in that the data dimensionality reduction includes:

for each dimension of the data vector, the aggregating government node obtains the corresponding values, and reduces dimensionality using the dimensionality reduction graph method.

4. The method for processing government affairs data based on big data according to claim 3, characterized in that constructing the isolation forest based on the aggregation node includes:

aggregating the government nodes at the aggregation node, constructing an isolation forest of k isolation trees from the dimension-reduced data set, and obtaining the isolated parameter vectors.

5. The method for processing government affairs data based on big data according to claim 4, characterized in that removing the isolated parameter vectors includes:

computing a box plot function for each dimension of the government nodes' data vectors;

if all function values in a dimension fall within the set range, or more than half of them fall outside it, removing the data vectors in that dimension.

6. The method for processing government affairs data based on big data according to claim 1, characterized in that determining in S2 the data optimization model for optimizing the government affairs data from which the abnormal data have been removed includes:

selecting, according to the government affairs data, a model file containing a default network structure and hyperparameters, and determining, according to a preset expected loss value, an algorithm file for the iterative algorithm.

7. The method for processing government affairs data based on big data according to claim 1, characterized in that performing optimization training on the data optimization model in S3 until the optimal data optimization model is obtained includes:

training the data optimization model with a tuner to obtain a target data optimization model;

evaluating the target data optimization model with an evaluator, based on its model parameters, to obtain a model evaluation result;

initializing the target data optimization model with the tuner based on the model evaluation result, and cyclically optimizing the target data optimization model with the tuner and the evaluator until a preset convergence condition is reached, yielding the optimal data optimization model.

8. The method for processing government affairs data based on big data according to claim 7, characterized in that training the data optimization model with the tuner to obtain the target data optimization model includes:

training the data optimization model with the tuner according to a preset optimization method to obtain the target data optimization model;

wherein the preset optimization methods include reinforcement learning, derivative-free optimization, and heuristic search.

9. The method for processing government affairs data based on big data according to claim 7, characterized in that evaluating the target data optimization model with the evaluator, based on its model parameters, to obtain the model evaluation result includes:

evaluating the target data optimization model with the evaluator according to a preset evaluation method, based on the model parameters of the target data optimization model, to obtain the model evaluation result.

10. The method for processing government affairs data based on big data according to claim 7, characterized in that initializing the target data optimization model with the tuner based on the model evaluation result includes:

using an empirical learning algorithm to determine the optimal model parameters corresponding to the model evaluation result, and initializing the target data optimization model with the tuner based on those optimal model parameters.