
CN114139720A - Method and device for processing big data of government affairs based on machine learning - Google Patents

Method and device for processing big data of government affairs based on machine learning Download PDF

Info

Publication number
CN114139720A
CN114139720A (application CN202111358382.1A)
Authority
CN
China
Prior art keywords
data processing
model
preset
processing model
government affairs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111358382.1A
Other languages
Chinese (zh)
Inventor
梁明杰
郑鹏
刘志徽
韦静贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Zhongke Shuguang Cloud Computing Co ltd
Pingnan Zhongke Shuguang cloud computing Co.,Ltd.
Original Assignee
Guangxi Zhongke Shuguang Cloud Computing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Zhongke Shuguang Cloud Computing Co ltd filed Critical Guangxi Zhongke Shuguang Cloud Computing Co ltd
Priority to CN202111358382.1A
Publication of CN114139720A
Legal status: Pending

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Primary Health Care (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present application discloses a method and device for processing government affairs big data based on machine learning. Government affairs log data are acquired, and a preset data processing model for preprocessing the government affairs log data is determined in a preset search space, so that the government affairs log data can be processed automatically in the preset search space based on machine learning. Based on tuner technology and evaluator technology, the preset data processing model is optimized and trained until an optimal data processing model is obtained, which avoids the error-prone, inefficient and hard-to-manage traditional manual process. Finally, the optimal data processing model is used to preprocess the government affairs log data to obtain high-quality government affairs data, and the high-quality government affairs data are stored or visualized, improving the data preprocessing capability and the efficiency of real-time batch collection and precise filtering.

Figure 202111358382

Description

Government affair big data processing method and device based on machine learning
Technical Field
The application relates to the technical field of big data, in particular to a government affair big data processing method and device based on machine learning.
Background
With the development of big data technology, government administration data are gradually being aggregated onto smart-city government affairs big data platforms, supporting tools for data acquisition, computation, processing and analysis have been formed, mechanisms such as metadata management, data sharing and data security protection have been established, and innovative data applications are being developed.
However, the smart-city government big data platform also faces challenges: traditional data preprocessing comprises data cleaning, data sampling, data processing, data segmentation and other steps; each step has multiple alternative methods, and the data usually needs to be analyzed before a method can be selected, so the whole preprocessing process is repetitive and time-consuming, and the data preprocessing efficiency of a government affair big data processing system is very low.
Disclosure of Invention
The application provides a government affair big data processing method and device based on machine learning, and aims to solve the technical problem that a government affair big data processing system is low in data preprocessing efficiency.
To solve the above technical problem, in a first aspect, an embodiment of the present application provides a government affair big data processing method based on machine learning, which comprises the following steps:
acquiring government affair log data;
in a preset search space, determining a preset data processing model for preprocessing government affair log data;
performing optimization training on a preset data processing model based on a tuner technology and an evaluator technology until an optimal data processing model is obtained;
preprocessing the government affair log data by using the optimal data processing model to obtain high-quality government affair data;
and storing or visually displaying the high-quality government affair data.
In this embodiment, by acquiring government affair log data and determining, in a preset search space, a preset data processing model for preprocessing the government affair log data, the government affair log data can be processed automatically in the preset search space based on machine learning. Based on the tuner technology and the evaluator technology, the preset data processing model is optimized and trained until the optimal data processing model is obtained, which solves the problems that the traditional manual process is error-prone, inefficient and difficult to manage, as well as the difficulty of adjusting configuration parameters without professional knowledge of configuring and optimizing different algorithms. Finally, the optimal data processing model is used to preprocess the government affair log data to obtain high-quality government affair data, and the high-quality government affair data are stored or visually displayed, so that the data preprocessing capability is improved, and the efficiency of real-time batch collection and accurate filtering is improved.
In one embodiment, in the preset search space, determining a preset data processing model for preprocessing government affair log data includes:
in a preset search space, selecting a model file containing a default network structure and hyper-parameters according to government affair log data;
and determining an algorithm file of an iterative algorithm according to a preset model loss expected value, wherein the preset data processing model comprises a model file and an algorithm file.
In the embodiment, the model files and the algorithm files are automatically determined in the preset search space, so that the automation of model selection and algorithm selection is realized, the model deployment training efficiency is improved, and the data preprocessing efficiency is improved.
In one embodiment, the performing optimization training on the preset data processing model based on the tuner technology and the evaluator technology until obtaining the optimal data processing model includes:
training a preset data processing model by using a preset tuner to obtain a target data processing model, wherein the target data processing model comprises model parameters;
evaluating the target data processing model by using a preset evaluator according to the model parameters to obtain a model evaluation result;
initializing the target data processing model by using the tuner according to the model evaluation result;
and cyclically optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, to obtain an optimal data processing model.
In this embodiment, the model parameters are continuously and cyclically optimized by the tuner and the evaluator to obtain an intelligent model for adjusting the big data collection and filtering processing mechanism, so that automatic parameter adjustment of the model is realized, the errors caused by the tedious steps of large-scale manual parameter adjustment are avoided, time is saved, and labor cost is reduced.
In a preferred embodiment, training the preset data processing model by using the preset tuner to obtain the target data processing model includes:
training the preset data processing model by using the tuner according to a preset optimization mode to obtain the target data processing model, wherein the preset optimization mode includes a heuristic search mode, a derivative-free optimization mode and a reinforcement learning mode.
This embodiment trains the model through preset optimization modes such as heuristic search, derivative-free optimization and reinforcement learning, which require no specific assumed conditions and make model training more efficient.
In a preferred embodiment, the evaluating the target data processing model according to the model parameters by using a preset evaluator to obtain a model evaluation result, including:
and performing auxiliary evaluation on the target data processing model by using the evaluator according to the model parameters with a preset auxiliary evaluation method to obtain a model evaluation result, wherein the preset auxiliary evaluation method includes a sub-sampling method, a parameter reuse method and a proxy evaluation method.
In this embodiment, evaluation is performed by using an auxiliary evaluation method such as sub-sampling, parameter reuse or proxy evaluation, which prevents the load of the evaluation process from growing with the data volume and the number of iterations, and reduces the resource consumption of the evaluation process.
In a preferred embodiment, initializing the target data processing model according to the model evaluation result by using the tuner includes:
determining, by the tuner using an empirical learning algorithm, the optimal model parameters corresponding to the model evaluation result;
and initializing the target data processing model according to the optimal model parameters.
In the embodiment, the parameters are adjusted by introducing machine experience so as to accelerate the training process of the network structure and greatly improve the efficiency of optimization training.
In a second aspect, an embodiment of the present application provides a big government affair data processing device based on machine learning, including:
the acquisition module is used for acquiring government affair log data;
the system comprises a determining module, a searching module and a searching module, wherein the determining module is used for determining a preset data processing model for preprocessing government affair log data in a preset searching space;
the training module is used for carrying out optimization training on a preset data processing model based on the tuner technology and the evaluator technology until an optimal data processing model is obtained;
the processing module is used for preprocessing the government affair log data by utilizing the optimal data processing model to obtain high-quality government affair data;
and the display module is used for storing or visually displaying the high-quality government affair data.
In one embodiment, a training module comprises:
the training unit is used for training a preset data processing model by using a preset tuning device to obtain a target data processing model, and the target data processing model comprises model parameters;
the evaluation unit is used for evaluating the target data processing model according to the model parameters by using a preset evaluator to obtain a model evaluation result;
the initialization unit is used for initializing the target data processing model according to the model evaluation result by using the tuner;
and the loop unit is used for cyclically optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, so as to obtain an optimal data processing model.
In a third aspect, an embodiment of the present application provides a computer device, including a processor and a memory, where the memory is used to store a computer program, and the computer program, when executed by the processor, implements the machine learning-based government affair big data processing method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method for processing government affair big data based on machine learning according to the first aspect.
Please refer to the relevant description of the first aspect for the beneficial effects of the second to fourth aspects, which are not repeated herein.
Drawings
Fig. 1 is a schematic flowchart of a method for processing government affairs big data based on machine learning according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a big government data processing device based on machine learning according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As described in the related art, the smart-city government big data platform faces challenges: traditional data preprocessing comprises data cleaning, data sampling, data processing, data segmentation and other steps; each step has multiple alternative methods, and the data usually needs to be analyzed before a method can be selected, so the whole preprocessing process is repetitive and time-consuming, and the data preprocessing efficiency of a government affair big data processing system is very low.
Therefore, according to the method and device for processing government affair big data based on machine learning provided by the present application, government affair log data are acquired, and a preset data processing model for preprocessing the government affair log data is determined in a preset search space, so that the government affair log data can be processed automatically in the preset search space based on machine learning; based on the tuner technology and the evaluator technology, the preset data processing model is optimized and trained until the optimal data processing model is obtained, which solves the problems that the traditional manual process is error-prone, inefficient and difficult to manage, as well as the difficulty of adjusting configuration parameters without professional knowledge of configuring and optimizing different algorithms; finally, the optimal data processing model is used to preprocess the government affair log data to obtain high-quality government affair data, and the high-quality government affair data are stored or visually displayed, so that the data preprocessing capability is improved, and the efficiency of real-time batch collection and accurate filtering is improved.
Referring to fig. 1, fig. 1 is a schematic flowchart of a government affair big data processing method based on machine learning according to an embodiment of the present application. The method can be applied to a computer device, which includes but is not limited to a smartphone, a tablet computer, a notebook computer, a desktop computer, a physical server or a cloud server. As shown in fig. 1, the machine learning-based government affair big data processing method includes steps S101 to S105, detailed as follows:
step S101, government affair log data are obtained.
In this step, the government affair log data is the log data of the government affair system. Optionally, a Logstash data engine collects the government affair log data and transmits the collected data to the computer device. It can be understood that the Logstash data engine supports dynamic data collection from various data sources, performs operations such as filtering, parsing, enrichment and format normalization on the data, and then stores the data in a preset storage space.
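For illustration only (the patent does not give a concrete pipeline configuration), the filtering and format-normalization step could be sketched in Python roughly as follows; the log layout, field names and the `normalize_log_line` helper are hypothetical:

```python
import json
import re
from datetime import datetime, timezone
from typing import Optional

# Hypothetical illustration of the kind of filtering and format normalization
# the collection stage performs before records reach the preset storage space.
LOG_PATTERN = re.compile(
    r"(?P<time>\S+)\s+(?P<level>INFO|WARN|ERROR)\s+(?P<service>\S+)\s+(?P<message>.*)"
)

def normalize_log_line(line: str) -> Optional[dict]:
    """Parse one raw government-service log line into a uniform JSON-ready record."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None  # filter out lines that do not match the expected layout
    record = match.groupdict()
    record["collected_at"] = datetime.now(timezone.utc).isoformat()
    return record

if __name__ == "__main__":
    raw = "2021-11-16T08:30:00 INFO permit-service application 20211116-001 approved"
    print(json.dumps(normalize_log_line(raw), ensure_ascii=False))
```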
And S102, determining a preset data processing model for preprocessing the government affair log data in a preset search space.
In this step, the preset search space includes model files and algorithm files of a plurality of candidate models, and the model files include model network structures and model hyper-parameters. In this embodiment, determining the preset data processing model comprises determining a model file and an algorithm file.
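A minimal sketch of what such a preset search space might look like is given below; the candidate names, network shapes, hyper-parameter values and the `select_preset_model` selection rule are assumptions for illustration, not part of the patent:

```python
# A minimal sketch (not from the patent) of a preset search space: each candidate
# entry pairs a model file (default network structure and hyper-parameters) with
# the algorithm files that may be used to train it.
PRESET_SEARCH_SPACE = {
    "mlp_small": {
        "model_file": {
            "network": {"hidden_layers": [64, 32], "activation": "relu"},
            "hyper_parameters": {"learning_rate": 0.01, "batch_size": 128},
        },
        "algorithm_files": ["sgd", "l_bfgs", "gradient_descent"],
    },
    "mlp_large": {
        "model_file": {
            "network": {"hidden_layers": [256, 128, 64], "activation": "relu"},
            "hyper_parameters": {"learning_rate": 0.001, "batch_size": 256},
        },
        "algorithm_files": ["sgd", "l_bfgs"],
    },
}

def select_preset_model(log_volume: int, expected_loss: float) -> dict:
    """Pick a candidate model file and algorithm file from the search space.

    The selection rule here is purely illustrative: small log volumes get the
    small network, and a tight loss expectation prefers the faster-converging
    L-BFGS algorithm file.
    """
    key = "mlp_small" if log_volume < 1_000_000 else "mlp_large"
    candidate = PRESET_SEARCH_SPACE[key]
    algorithm = "l_bfgs" if expected_loss < 0.05 else "sgd"
    return {"model_file": candidate["model_file"], "algorithm_file": algorithm}
```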
And S103, carrying out optimization training on the preset data processing model based on the tuner technology and the evaluator technology until an optimal data processing model is obtained.
In this step, the tuner is used to optimize the model through samples, and the evaluator is used for model performance evaluation. Optionally, the model is trained by using the tuner; after training is completed, a validation data set is obtained from the Logstash data engine and the model effect is verified through the evaluator, so that the loss information of each sample of the validation data set is obtained. The tuner then automatically adjusts the network structure and the hyper-parameters through machine learning according to the loss information of each sample of the validation data set, and so on. The model is optimized through continuous machine learning, the optimal model scheme is iterated, and the current optimal model is trained, so that the processing mechanism for collecting and filtering big data is adjusted, the complexity of manual parameter adjustment is eliminated, labor cost is reduced, and the model value is improved.
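The cyclic tuner/evaluator interaction described above can be sketched roughly as follows; the toy objective, perturbation rule and convergence threshold are assumptions rather than the patented mechanism:

```python
import random

# Illustrative-only skeleton of the cyclic tuner/evaluator optimization in step S103.
def evaluate(params: dict, validation_set: list) -> float:
    """Evaluator: return the average per-sample loss on the validation set."""
    return sum(abs(x - params["threshold"]) for x in validation_set) / len(validation_set)

def tune(best_params: dict) -> dict:
    """Tuner: propose a perturbed configuration around the current best one."""
    return {"threshold": best_params["threshold"] + random.uniform(-0.1, 0.1)}

def optimize(validation_set: list, max_rounds: int = 50, tol: float = 1e-3) -> dict:
    best_params = {"threshold": 0.5}                     # initial preset configuration
    best_loss = evaluate(best_params, validation_set)
    for _ in range(max_rounds):
        candidate = tune(best_params)                    # tuner proposes new parameters
        loss = evaluate(candidate, validation_set)       # evaluator scores them
        if loss < best_loss - tol:
            best_params, best_loss = candidate, loss
        else:
            break                                        # preset convergence condition reached
    return best_params
```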
And step S104, preprocessing the government affair log data by using the optimal data processing model to obtain high-quality government affair data.
In the step, the optimal data processing model is utilized to preprocess the government affair log data to obtain high-quality government affair data, and the efficiency of preprocessing the real-time data is improved.
And step S105, storing or visually displaying the high-quality government affair data.
In this step, optionally, data storage is performed through an Elasticsearch distributed search and analytics engine. The engine has characteristics such as high scalability, high reliability and ease of management, is built on Apache Lucene, and can perform near-real-time storage, search and analysis on large volumes of data.
Optionally, the Kibana data analysis and visualization platform, used together with Elasticsearch, searches and analyzes the data in Elasticsearch and displays it as statistical charts, so that the data in Elasticsearch can be presented in multiple dimensions.
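A hedged example of the storage step, assuming a local Elasticsearch cluster and a hypothetical index name (older versions of the official Python client pass the record as `body=` rather than `document=`):

```python
from elasticsearch import Elasticsearch

# Assumed local cluster and index name; adjust for a real deployment.
es = Elasticsearch("http://localhost:9200")

clean_record = {
    "service": "permit-service",
    "level": "INFO",
    "message": "application 20211116-001 approved",
    "collected_at": "2021-11-16T08:30:00Z",
}

# Near-real-time storage of one high-quality government affairs record; Kibana
# can then be pointed at the same index for multi-dimensional visualization.
es.index(index="gov-affairs-logs", document=clean_record)
```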
In an embodiment, based on the embodiment shown in fig. 1, the step S102 includes:
in the preset search space, selecting a model file containing a default network structure and hyper-parameters according to the government affair log data;
and determining an algorithm file of an iterative algorithm according to a preset model loss expected value, wherein the preset data processing model comprises the model file and the algorithm file.
In this embodiment, because there may be multiple alternative models for the same problem, and the hyper-parameters of each model are also unknown, compared with the conventional approach of obtaining an "optimal" result through user professional knowledge and repeated experiments, the present application realizes automatic selection of the model file by presetting multiple alternative models and their corresponding hyper-parameters in the search space and selecting, according to the government affair log data during actual application, the model file containing the default network structure and hyper-parameters.
The purpose of the algorithm file selection is to automatically find an optimization algorithm that balances model efficiency and model performance. Illustratively, if the goal is to minimize a smooth objective function, the computer device may select among a gradient descent algorithm, a stochastic gradient descent algorithm and the L-BFGS algorithm. The gradient descent algorithm has fewer hyper-parameters, but the model converges slowly and each iteration is relatively expensive; L-BFGS consumes more resources but converges faster; each iteration of stochastic gradient descent is cheap, but more iterations are required. If the preset model loss expectation requires fast convergence, the computer device can balance efficiency and performance among the three to select the optimal algorithm.
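As a rough, non-authoritative illustration of this trade-off (the objective, step size and tolerance below are assumptions, not part of the patent), the following Python sketch minimizes a small smooth quadratic once with plain gradient descent and once with SciPy's L-BFGS-B, and compares iteration counts:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative comparison only: an ill-conditioned quadratic objective.
A = np.diag([1.0, 10.0, 100.0])
b = np.array([1.0, 2.0, 3.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b

# Plain gradient descent: cheap iterations, slow convergence.
x = np.zeros(3)
gd_steps = 0
while np.linalg.norm(grad(x)) > 1e-6 and gd_steps < 10_000:
    x -= 0.009 * grad(x)          # fixed step below 2 / largest eigenvalue
    gd_steps += 1

# L-BFGS: more work per iteration, far fewer iterations.
res = minimize(f, np.zeros(3), jac=grad, method="L-BFGS-B", tol=1e-12)

print(f"gradient descent: {gd_steps} steps, L-BFGS: {res.nit} iterations")
```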
In an embodiment, based on the embodiment shown in fig. 1, the step S103 includes:
training the preset data processing model by using a preset tuning device to obtain a target data processing model, wherein the target data processing model comprises model parameters;
evaluating the target data processing model by using a preset evaluator according to the model parameters to obtain a model evaluation result;
initializing the target data processing model by using the tuner according to the model evaluation result;
and cyclically optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, to obtain the optimal data processing model.
Optionally, the training the preset data processing model by using a preset tuner to obtain a target data processing model includes:
training the preset data processing model by using the tuner according to a preset optimization mode to obtain the target data processing model, wherein the preset optimization mode includes a heuristic search mode, a derivative-free optimization mode and a reinforcement learning mode.
In this embodiment, for the tuner technology, the preset optimization modes are sample-based optimization modes, which include the heuristic search mode, the model-based derivative-free optimization mode and the reinforcement learning mode.
Heuristic search mode: inspired by biological behaviors and phenomena, heuristic search is widely applied to non-convex, non-smooth and discontinuous tuning problems. The basic idea is to initialize a population, derive a new population from the tuner and the original population, evaluate the new population, and repeat the process, as in the sketch below.
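A minimal population-based sketch of this idea, with an assumed toy fitness function and mutation rule, might look like this:

```python
import random

# Minimal population-based heuristic search sketch: initialize a population,
# derive a new population from the best members, evaluate, and repeat.
# No gradients or smoothness assumptions are needed.
def fitness(candidate: dict) -> float:
    # Hypothetical stand-in for validating a preprocessing configuration.
    return -abs(candidate["learning_rate"] - 0.01) - abs(candidate["dropout"] - 0.2)

def random_candidate() -> dict:
    return {"learning_rate": 10 ** random.uniform(-4, -1),
            "dropout": random.uniform(0.0, 0.5)}

def mutate(parent: dict) -> dict:
    return {"learning_rate": parent["learning_rate"] * random.uniform(0.5, 2.0),
            "dropout": min(0.5, max(0.0, parent["dropout"] + random.uniform(-0.05, 0.05)))}

population = [random_candidate() for _ in range(20)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    elites = population[:5]                                    # keep the best configurations
    population = elites + [mutate(random.choice(elites)) for _ in range(15)]

best = max(population, key=fitness)
```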
Model-based derivative-free optimization mode: a model is constructed from samples, new samples are then generated according to the evaluation, and this process is iterated repeatedly to achieve targeted search of the space; it can therefore be used for derivative-free space optimization and mainly includes Bayesian optimization, classification-based optimization and simultaneous optimistic optimization.
Bayesian optimization: a probabilistic model (e.g., Gaussian process, tree-based model, deep network) is constructed, an acquisition function (e.g., expected improvement, upper confidence bound) is then defined based on the probabilistic model, and at each iteration a new sample is obtained from the acquisition function and used to update the probabilistic model. Bayesian optimization has the advantage of a high convergence rate.
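As a rough illustration (not the patent's implementation), Bayesian optimization with a Gaussian-process surrogate and an expected-improvement acquisition function could be sketched as follows, using an assumed one-dimensional objective:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Minimal Bayesian-optimization sketch: fit a Gaussian process to the samples seen
# so far, pick the next sample by expected improvement, and repeat.
def objective(x: np.ndarray) -> np.ndarray:
    return np.sin(3 * x) + 0.1 * x ** 2          # assumed stand-in for validation loss

def next_sample(X: np.ndarray, y: np.ndarray, grid: np.ndarray, xi: float = 0.01) -> float:
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X.reshape(-1, 1), y)
    mu, sigma = gp.predict(grid.reshape(-1, 1), return_std=True)
    best = y.min()
    improvement = best - mu - xi                  # we are minimizing
    z = np.divide(improvement, sigma, out=np.zeros_like(sigma), where=sigma > 0)
    ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)
    ei[sigma == 0.0] = 0.0
    return float(grid[np.argmax(ei)])

grid = np.linspace(-2.0, 2.0, 200)
X = np.array([-1.5, 0.0, 1.5])                    # initial samples
y = objective(X)
for _ in range(10):
    x_new = next_sample(X, y, grid)
    X = np.append(X, x_new)
    y = np.append(y, objective(np.array([x_new])))
```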
Optimization based on classification: by training a classifier with the old samples and dividing the search space into positive and negative regions, the samples in the positive region are more likely to obtain excellent results, so that the new samples are obtained from the positive region, and the steps are iterated, so that the method has the advantage of being very efficient.
Simultaneous optimistic optimization is a branch-and-bound optimization algorithm. A tree structure is constructed over the search space, with each leaf node covering a small region; depth and breadth are balanced to find the global optimum.
The reinforcement learning mode is a broad and powerful optimization framework that solves problems through delayed feedback; it differs from other optimization modes in that the delayed feedback adds a notion of time to the learning. It includes policy learning and Q-learning.
Policy learning: a policy is regarded as a function with a single input, the current state, and the action to be performed in the current state is determined from a prior policy. Knowing the policy in advance is not easy, however, as it requires a deep understanding of a complex function mapping states to actions.
Q-learning: unlike policy learning, the Q-learning algorithm has two inputs, the state and the action, and returns a corresponding value for each state-action pair. When faced with a choice, the algorithm computes the expected values of the agent taking different actions so as to select the best result, as sketched below.
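A tabular Q-learning sketch illustrating the two inputs (state and action) and the value update is shown below; the toy action space and learning constants are assumptions:

```python
import random
from collections import defaultdict

# Tabular Q-learning sketch: the value table takes a (state, action) pair as input,
# the agent picks the action with the highest expected value, and that value is
# updated from delayed feedback.
ACTIONS = ["increase_lr", "decrease_lr", "keep_lr"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

q_table = defaultdict(float)                       # (state, action) -> expected value

def choose_action(state: str) -> str:
    if random.random() < EPSILON:                  # exploration
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])  # exploitation

def update(state: str, action: str, reward: float, next_state: str) -> None:
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next            # delayed feedback folded in
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])
```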
Optionally, the evaluating the target data processing model according to the model parameter by using a preset evaluator to obtain a model evaluation result includes:
and performing auxiliary evaluation on the target data processing model according to the model parameters by using the evaluator with a preset auxiliary evaluation method to obtain a model evaluation result, wherein the preset auxiliary evaluation method includes a sub-sampling method, a parameter reuse method and a proxy evaluation method.
In this embodiment, compared with the tuner, the overall resource consumption of the evaluator is much larger. Direct evaluation, in which the model is fully trained and then evaluated, is the simplest method and is accurate, but it is expensive. As the data volume and the number of iterations grow, direct evaluation clearly places a heavy burden on the whole process. To improve its efficiency, this embodiment designs the following methods to assist the direct evaluation method and reduce its consumption.
Sub-sampling: evaluation is performed on a subset of the original samples or features; the less training data used, the faster the evaluation, at the cost of more noise. Early termination: unlike in conventional machine learning, where early stopping is used to prevent overfitting, here the evaluation of a configuration with no prospect can be terminated directly, avoiding unnecessary waste.
Parameter reuse: for configurations with small differences, previously learned parameters can be used as the initial values, which accelerates convergence and can yield better performance.
Proxy evaluation: given that configuration information can be quantified, the performance of a given configuration can be predicted by building a proxy (surrogate) model.
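Combining two of these ideas, a hedged sketch of sub-sampled evaluation with early termination might look like the following; all names, the `per_example_loss` stand-in and the termination threshold are assumptions:

```python
import random

# Sketch of sub-sampled evaluation with early termination: a configuration is
# scored on a fraction of the data, and evaluation stops as soon as the running
# loss shows the configuration has no prospect of beating the current best.
def per_example_loss(config: dict, example: float) -> float:
    """Hypothetical stand-in for scoring one example under a configuration."""
    return abs(example - config.get("threshold", 0.5))

def evaluate_config(config: dict, dataset: list, best_loss_so_far: float,
                    sample_ratio: float = 0.1, check_every: int = 100) -> float:
    subset = random.sample(dataset, max(1, int(len(dataset) * sample_ratio)))
    running_loss, seen = 0.0, 0
    for example in subset:
        running_loss += per_example_loss(config, example)
        seen += 1
        if seen % check_every == 0 and running_loss / seen > 2.0 * best_loss_so_far:
            return float("inf")        # terminate early: configuration has no prospect
    return running_loss / seen
```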
Optionally, the initializing the target data processing model according to the model evaluation result by using the tuner includes:
determining, by the tuner using an empirical learning algorithm, the optimal model parameters corresponding to the model evaluation result;
and initializing the target data processing model according to the optimal model parameters.
In the present embodiment, the empirical learning algorithm improves the efficiency of automated machine learning by reducing the consumption in configuration generation and evaluation. The empirical learning algorithm includes meta learning and transfer learning.
Meta-learning guides learning by extracting meta-information. It first characterizes the learning problem and the learning tools (e.g., statistical features of the data, hyper-parameters of the learning tools), then extracts meta-features from past experience, and finally trains a meta-learner with this meta-knowledge. Meta-learning is important for automated machine learning. On one hand, characterizing learning problems and learning tools can reveal important information, for example data drift (a model becoming less accurate over time), and similar problems are easier to find once they are characterized, so knowledge can be reused and transferred between different problems. On the other hand, the meta-learner encodes past knowledge as guidance for solving future problems. Meta-learning can be applied to the evaluator to reduce the huge consumption caused by training during evaluation: configuration information is fed into a previously trained meta-learner, which predicts the performance or fitness of the configuration; ideally, if all configurations could be enumerated, the meta-learner could directly select the optimal one. Meta-learning can also be applied to the tuner to reduce meaningless consumption during tuning by optimizing the search space: in the configuration generation stage, features of the learning problem are extracted as input, and a meta-learner obtained from previous experience predicts promising configurations. It can likewise be combined with transfer learning, using the configuration whose meta-feature space is most similar to that of a previous task as initialization data to warm-start configuration generation. In addition, meta-learning can be applied to dynamic configuration adaptation: whether concept drift occurs is detected through statistics of the data and features, and once concept drift is found, promising configurations are predicted again to guarantee model availability.
Transfer learning uses previous experience to guide learning; in automated machine learning, a trained optimal surrogate model or search strategy is reused to save consumption. During tuning, the surrogate model can be transferred, and for network-structure problems, because networks are transferable, transfer learning is widely applied to neural architecture search. Transfer learning is used in the evaluator to speed up the evaluation of preselected configurations. For a general optimization problem, transfer learning can transfer model parameters, initializing with the trained optimal parameters. Another idea of transfer learning is to initialize a new network so that it computes the same function as a previously trained model through function-preserving transformations, such as Net2Net, thereby accelerating the training of the network structure and greatly improving efficiency.
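As a simple illustration of this parameter-reuse style of warm starting (the parameter layout and reuse rule below are assumptions, not Net2Net itself), a new model can be initialized from previously trained optimal parameters:

```python
import copy

# Warm-start sketch: the optimal parameters from a previously trained model
# initialize the new target model; only layers whose shapes differ fall back
# to fresh initialization.
def warm_start(new_params: dict, trained_params: dict) -> dict:
    initialized = copy.deepcopy(new_params)
    for name, value in trained_params.items():
        if name in initialized and len(initialized[name]) == len(value):
            initialized[name] = list(value)        # reuse the trained weights
    return initialized

previous_best = {"layer1": [0.2, -0.1, 0.4], "layer2": [0.05, 0.3]}
fresh_model = {"layer1": [0.0, 0.0, 0.0], "layer2": [0.0, 0.0], "layer3": [0.0]}
model_init = warm_start(fresh_model, previous_best)
```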
Corresponding to the government affair big data processing method based on machine learning of the above method embodiment, an embodiment of the present application further provides a government affair big data processing device based on machine learning, which can achieve the corresponding functions and technical effects. Referring to fig. 2, fig. 2 is a block diagram illustrating the structure of a government affair big data processing device based on machine learning according to an embodiment of the present application. For convenience of explanation, only the parts related to the present embodiment are shown. The device includes:
an obtaining module 201, configured to obtain government affair log data;
a determining module 202, configured to determine, in a preset search space, a preset data processing model for preprocessing the government affair log data;
the training module 203 is used for performing optimization training on the preset data processing model based on a tuner technology and an evaluator technology until an optimal data processing model is obtained;
the processing module 204 is configured to utilize the optimal data processing model to preprocess the government affair log data to obtain high-quality government affair data;
and the display module 205 is used for storing or visually displaying the high-quality government affair data.
In one embodiment, the determining module 202 includes:
the selecting unit is used for selecting a model file containing a default network structure and hyper-parameters in the preset search space according to the government affair log data;
and the determining unit is used for determining an algorithm file of an iterative algorithm according to a preset model loss expected value, and the preset data processing model comprises the model file and the algorithm file.
In one embodiment, the training module 203 comprises:
the training unit is used for training the preset data processing model by using a preset tuner to obtain a target data processing model, and the target data processing model comprises model parameters;
the evaluation unit is used for evaluating the target data processing model according to the model parameters by using a preset evaluator to obtain a model evaluation result;
the initialization unit is used for initializing the target data processing model according to the model evaluation result by utilizing the tuner;
and the loop unit is used for cyclically optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, so as to obtain the optimal data processing model.
In a preferred embodiment, the training unit includes:
and the training subunit is used for training the preset data processing model by using the tuner according to a preset optimization mode to obtain a target data processing model, wherein the preset optimization mode comprises a heuristic search mode, a derivative-free optimization mode and a reinforcement learning mode.
In a preferred embodiment, the evaluation unit includes:
and the evaluation subunit is used for performing auxiliary evaluation on the target data processing model according to the model parameters by using the evaluator with a preset auxiliary evaluation method to obtain a model evaluation result, wherein the preset auxiliary evaluation method comprises a sub-sampling method, a parameter reuse method and a proxy evaluation method.
In a preferred embodiment, the initialization unit includes:
the determining subunit is used for determining, by using the tuner with an empirical learning algorithm, the optimal model parameters corresponding to the model evaluation result;
and the initialization subunit is used for initializing the target data processing model according to the optimal model parameters.
The device for processing big government affairs data based on machine learning can implement the method for processing big government affairs data based on machine learning of the method embodiment. The alternatives in the above-described method embodiments are also applicable to this embodiment and will not be described in detail here. The rest of the embodiments of the present application may refer to the contents of the above method embodiments, and in this embodiment, details are not described again.
Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 3, the computer device 3 of this embodiment includes: at least one processor 30 (only one shown in fig. 3), a memory 31, and a computer program 32 stored in the memory 31 and executable on the at least one processor 30, the processor 30 implementing the steps of any of the above-described method embodiments when executing the computer program 32.
The computer device 3 may be a computing device such as a smartphone, a tablet computer, a desktop computer or a cloud server. The computer device may include, but is not limited to, the processor 30 and the memory 31. Those skilled in the art will appreciate that fig. 3 is merely an example of the computer device 3 and does not constitute a limitation of the computer device 3, which may include more or fewer components than shown, or combine certain components, or use different components, such as input/output devices and network access devices.
The Processor 30 may be a Central Processing Unit (CPU), and the Processor 30 may be other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 31 may in some embodiments be an internal storage unit of the computer device 3, such as a hard disk or a memory of the computer device 3. The memory 31 may also be an external storage device of the computer device 3 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the computer device 3. The memory 31 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer program. The memory 31 may also be used to temporarily store data that has been output or is to be output.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in any of the method embodiments described above.
The embodiments of the present application provide a computer program product, which when executed on a computer device, enables the computer device to implement the steps in the above method embodiments.
In several embodiments provided herein, it will be understood that each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-mentioned embodiments are further detailed to explain the objects, technical solutions and advantages of the present application, and it should be understood that the above-mentioned embodiments are only examples of the present application and are not intended to limit the scope of the present application. It should be understood that any modifications, equivalents, improvements and the like, which come within the spirit and principle of the present application, may occur to those skilled in the art and are intended to be included within the scope of the present application.

Claims (10)

1. A machine learning-based government affairs big data processing method, comprising: acquiring government affairs log data; determining, in a preset search space, a preset data processing model for preprocessing the government affairs log data; performing optimization training on the preset data processing model based on a tuner technology and an evaluator technology until an optimal data processing model is obtained; preprocessing the government affairs log data by using the optimal data processing model to obtain high-quality government affairs data; and storing or visually displaying the high-quality government affairs data.
2. The method according to claim 1, wherein the determining, in the preset search space, the preset data processing model for preprocessing the government affairs log data comprises: in the preset search space, selecting, according to the government affairs log data, a model file containing a default network structure and hyper-parameters; and determining an algorithm file of an iterative algorithm according to a preset model loss expectation value, wherein the preset data processing model comprises the model file and the algorithm file.
3. The method according to claim 1, wherein the performing optimization training on the preset data processing model based on the tuner technology and the evaluator technology until the optimal data processing model is obtained comprises: training the preset data processing model by using a preset tuner to obtain a target data processing model, wherein the target data processing model comprises model parameters; evaluating the target data processing model according to the model parameters by using a preset evaluator to obtain a model evaluation result; initializing the target data processing model according to the model evaluation result by using the tuner; and cyclically optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, to obtain the optimal data processing model.
4. The method according to claim 3, wherein the training the preset data processing model by using the preset tuner to obtain the target data processing model comprises: training the preset data processing model by using the tuner according to a preset optimization mode to obtain the target data processing model, wherein the preset optimization mode comprises a heuristic search mode, a derivative-free optimization mode and a reinforcement learning mode.
5. The method according to claim 3, wherein the evaluating the target data processing model according to the model parameters by using the preset evaluator to obtain the model evaluation result comprises: performing auxiliary evaluation on the target data processing model according to the model parameters by using the evaluator with a preset auxiliary evaluation method to obtain the model evaluation result, wherein the preset auxiliary evaluation method comprises a sub-sampling method, a parameter reuse method and a proxy evaluation method.
6. The method according to claim 3, wherein the initializing the target data processing model according to the model evaluation result by using the tuner comprises: determining, by the tuner using an empirical learning algorithm, optimal model parameters corresponding to the model evaluation result; and initializing the target data processing model according to the optimal model parameters.
7. A machine learning-based government affairs big data processing device, comprising: an acquisition module, configured to acquire government affairs log data; a determining module, configured to determine, in a preset search space, a preset data processing model for preprocessing the government affairs log data; a training module, configured to perform optimization training on the preset data processing model based on a tuner technology and an evaluator technology until an optimal data processing model is obtained; a processing module, configured to preprocess the government affairs log data by using the optimal data processing model to obtain high-quality government affairs data; and a display module, configured to store or visually display the high-quality government affairs data.
8. The device according to claim 7, wherein the training module comprises: a training unit, configured to train the preset data processing model by using a preset tuner to obtain a target data processing model, wherein the target data processing model comprises model parameters; an evaluation unit, configured to evaluate the target data processing model according to the model parameters by using a preset evaluator to obtain a model evaluation result; an initialization unit, configured to initialize the target data processing model according to the model evaluation result by using the tuner; and a loop unit, configured to cyclically optimize the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, to obtain the optimal data processing model.
9. A computer device, comprising a processor and a memory, wherein the memory is configured to store a computer program, and the computer program, when executed by the processor, implements the machine learning-based government affairs big data processing method according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the machine learning-based government affairs big data processing method according to any one of claims 1 to 6.
CN202111358382.1A 2021-11-16 2021-11-16 Method and device for processing big data of government affairs based on machine learning Pending CN114139720A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111358382.1A CN114139720A (en) 2021-11-16 2021-11-16 Method and device for processing big data of government affairs based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111358382.1A CN114139720A (en) 2021-11-16 2021-11-16 Method and device for processing big data of government affairs based on machine learning

Publications (1)

Publication Number Publication Date
CN114139720A true CN114139720A (en) 2022-03-04

Family

ID=80390341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111358382.1A Pending CN114139720A (en) 2021-11-16 2021-11-16 Method and device for processing big data of government affairs based on machine learning

Country Status (1)

Country Link
CN (1) CN114139720A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114493379A (en) * 2022-04-08 2022-05-13 金电联行(北京)信息技术有限公司 Enterprise evaluation model automatic generation method, device and system based on government affair data
CN115422176A (en) * 2022-09-01 2022-12-02 安徽省安策智库咨询有限公司 Government affair public big data quality evaluation and processing method
CN115757384A (en) * 2022-11-30 2023-03-07 安徽长正智库管理咨询有限公司 A big data-based government data processing method
WO2025015835A1 (en) * 2023-07-17 2025-01-23 中国电信股份有限公司技术创新中心 Data analysis method and apparatus, computer device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062587A (en) * 2017-12-15 2018-05-22 清华大学 The hyper parameter automatic optimization method and system of a kind of unsupervised machine learning
CN110472119A (en) * 2019-07-17 2019-11-19 广东鼎义互联科技股份有限公司 One kind being applied to government affairs the analysis of public opinion platform
CN111144581A (en) * 2019-12-31 2020-05-12 杭州雅拓信息技术有限公司 Machine learning hyper-parameter adjusting method and system
CN111950738A (en) * 2020-08-10 2020-11-17 中国平安人寿保险股份有限公司 Machine learning model optimization effect evaluation method and device, terminal and storage medium
CN112015962A (en) * 2020-07-24 2020-12-01 北京艾巴斯智能科技发展有限公司 Government affair intelligent big data center system architecture
WO2021007812A1 (en) * 2019-07-17 2021-01-21 深圳大学 Deep neural network hyperparameter optimization method, electronic device and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062587A (en) * 2017-12-15 2018-05-22 清华大学 The hyper parameter automatic optimization method and system of a kind of unsupervised machine learning
CN110472119A (en) * 2019-07-17 2019-11-19 广东鼎义互联科技股份有限公司 One kind being applied to government affairs the analysis of public opinion platform
WO2021007812A1 (en) * 2019-07-17 2021-01-21 深圳大学 Deep neural network hyperparameter optimization method, electronic device and storage medium
CN111144581A (en) * 2019-12-31 2020-05-12 杭州雅拓信息技术有限公司 Machine learning hyper-parameter adjusting method and system
CN112015962A (en) * 2020-07-24 2020-12-01 北京艾巴斯智能科技发展有限公司 Government affair intelligent big data center system architecture
CN111950738A (en) * 2020-08-10 2020-11-17 中国平安人寿保险股份有限公司 Machine learning model optimization effect evaluation method and device, terminal and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114493379A (en) * 2022-04-08 2022-05-13 金电联行(北京)信息技术有限公司 Enterprise evaluation model automatic generation method, device and system based on government affair data
CN115422176A (en) * 2022-09-01 2022-12-02 安徽省安策智库咨询有限公司 Government affair public big data quality evaluation and processing method
CN115757384A (en) * 2022-11-30 2023-03-07 安徽长正智库管理咨询有限公司 A big data-based government data processing method
WO2025015835A1 (en) * 2023-07-17 2025-01-23 中国电信股份有限公司技术创新中心 Data analysis method and apparatus, computer device, and storage medium

Similar Documents

Publication Publication Date Title
CN114139720A (en) Method and device for processing big data of government affairs based on machine learning
CN110390396B (en) Method, device and system for estimating causal relationship between observed variables
CN113344016B (en) Deep transfer learning method, device, electronic device and storage medium
CN111406264B (en) Neural architecture search
WO2022027937A1 (en) Neural network compression method, apparatus and device, and storage medium
US20240054146A1 (en) Selectively identifying and recommending digital content items for synchronization
CN113821657B (en) Image processing model training method and image processing method based on artificial intelligence
WO2016062044A1 (en) Model parameter training method, device and system
CN106648654A (en) Data sensing-based Spark configuration parameter automatic optimization method
CN113392867B (en) Image recognition method, device, computer equipment and storage medium
JP2021022367A (en) Image processing method and information processor
CN111914159A (en) Information recommendation method and terminal
CN108446770A (en) A kind of slow node processing system and method for distributed machines study based on sampling
US20240013061A1 (en) Architecture search method and apparatus for large-scale graph, and device and storage medium
CN111898766A (en) Ether house fuel limitation prediction method and device based on automatic machine learning
CN114492601A (en) Resource classification model training method and device, electronic equipment and storage medium
CN116310578B (en) A method for constructing an image classification model without training in the search phase
CN117744754A (en) Large language model task processing method, device, equipment and medium
JP2020009122A (en) Control program, control method and system
Zafar et al. An Optimization Approach for Convolutional Neural Network Using Non-Dominated Sorted Genetic Algorithm-II.
CN117999560A (en) Hardware-aware progressive training of machine learning models
CN113806579A (en) Text image retrieval method and device
CN112926611B (en) Feature extraction method, device and computer readable storage medium
JP7063397B2 (en) Answer integration device, answer integration method and answer integration program
CN111177015A (en) Application program quality identification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220310

Address after: Room 1119, building 6, Derui garden, 143 Minzu Avenue, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region

Applicant after: Guangxi Zhongke Shuguang cloud computing Co.,Ltd.

Applicant after: Pingnan Zhongke Shuguang cloud computing Co.,Ltd.

Address before: Room 1119, building 6, Derui garden, 143 Minzu Avenue, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region

Applicant before: Guangxi Zhongke Shuguang cloud computing Co.,Ltd.

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载