
CN114139720A - Method and device for processing big data of government affairs based on machine learning - Google Patents

Method and device for processing big data of government affairs based on machine learning Download PDF

Info

Publication number
CN114139720A
CN114139720A (application CN202111358382.1A)
Authority
CN
China
Prior art keywords
data processing
model
preset
processing model
government affairs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111358382.1A
Other languages
Chinese (zh)
Inventor
梁明杰
郑鹏
刘志徽
韦静贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Zhongke Shuguang Cloud Computing Co ltd
Pingnan Zhongke Shuguang cloud computing Co.,Ltd.
Original Assignee
Guangxi Zhongke Shuguang Cloud Computing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Zhongke Shuguang Cloud Computing Co ltd filed Critical Guangxi Zhongke Shuguang Cloud Computing Co ltd
Priority to CN202111358382.1A
Publication of CN114139720A
Legal status: Pending

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Primary Health Care (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present application discloses a method and device for processing government affairs big data based on machine learning. Government affairs log data are acquired, and a preset data processing model for preprocessing the government affairs log data is determined in a preset search space, so that the government affairs log data can be processed automatically in the preset search space based on machine learning. Based on tuner technology and evaluator technology, the preset data processing model is optimized and trained until an optimal data processing model is obtained, which avoids the error-prone, inefficient and hard-to-manage traditional manual process. Finally, the optimal data processing model is used to preprocess the government affairs log data to obtain high-quality government affairs data, and the high-quality government affairs data are stored or visualized, improving the data preprocessing capability and the efficiency of real-time batch collection and precise filtering.

Figure 202111358382

Description

Government affair big data processing method and device based on machine learning
Technical Field
The application relates to the technical field of big data, in particular to a government affair big data processing method and device based on machine learning.
Background
With the development of big data technology, government administration data are gradually being aggregated onto smart-city government affairs big data platforms, supporting tools for data acquisition, computation, processing and analysis have been formed, mechanisms such as metadata management, data sharing and data security protection have been established, and innovative data applications are being developed.
However, the smart-city government big data platform also faces challenges: traditional data preprocessing comprises data cleaning, data sampling, data processing, data segmentation and other steps; each step has multiple alternative methods, and the data usually needs to be analyzed before a method can be selected, so the whole preprocessing process is repetitive and time-consuming, and the data preprocessing efficiency of a government affair big data processing system is very low.
Disclosure of Invention
The application provides a government affair big data processing method and device based on machine learning, and aims to solve the technical problem that a government affair big data processing system is low in data preprocessing efficiency.
To solve the above technical problem, in a first aspect, an embodiment of the present application provides a government affair big data processing method based on machine learning, which comprises the following steps:
acquiring government affair log data;
in a preset search space, determining a preset data processing model for preprocessing government affair log data;
performing optimization training on a preset data processing model based on a tuner technology and an evaluator technology until an optimal data processing model is obtained;
preprocessing the government affair log data by using the optimal data processing model to obtain high-quality government affair data;
and storing or visually displaying the high-quality government affair data.
In this embodiment, by acquiring government affair log data and determining, in a preset search space, a preset data processing model for preprocessing the government affair log data, the government affair log data can be processed automatically in the preset search space based on machine learning. Based on the tuner technology and the evaluator technology, the preset data processing model is optimized and trained until the optimal data processing model is obtained, which solves the problems that the traditional manual process is error-prone, inefficient and difficult to manage, as well as the difficulty of adjusting configuration parameters without professional knowledge of configuring and optimizing different algorithms. Finally, the optimal data processing model is used to preprocess the government affair log data to obtain high-quality government affair data, and the high-quality government affair data are stored or visually displayed, so that the data preprocessing capability is improved, and the efficiency of real-time batch collection and accurate filtering is improved.
In one embodiment, in the preset search space, determining a preset data processing model for preprocessing government affair log data includes:
in a preset search space, selecting a model file containing a default network structure and hyper-parameters according to government affair log data;
and determining an algorithm file of an iterative algorithm according to a preset model loss expected value, wherein the preset data processing model comprises a model file and an algorithm file.
In the embodiment, the model files and the algorithm files are automatically determined in the preset search space, so that the automation of model selection and algorithm selection is realized, the model deployment training efficiency is improved, and the data preprocessing efficiency is improved.
In one embodiment, the performing optimization training on the preset data processing model based on the tuner technology and the evaluator technology until obtaining the optimal data processing model includes:
training a preset data processing model by using a preset tuner to obtain a target data processing model, wherein the target data processing model comprises model parameters;
evaluating the target data processing model by using a preset evaluator according to the model parameters to obtain a model evaluation result;
initializing the target data processing model by using the tuner according to the model evaluation result;
and cyclically optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, to obtain an optimal data processing model.
In this embodiment, the model parameters are continuously and cyclically optimized by the tuner and the evaluator to obtain an intelligent model for adjusting the big data collection and filtering processing mechanism, so that automatic parameter adjustment of the model is realized, the errors caused by the tedious steps of large-scale manual parameter adjustment are avoided, time is saved, and labor cost is reduced.
In a preferred embodiment, training the preset data processing model by using the preset tuner to obtain the target data processing model includes:
training the preset data processing model by using the tuner according to a preset optimization mode to obtain the target data processing model, wherein the preset optimization mode includes a heuristic search mode, a derivative-free optimization mode and a reinforcement learning mode.
This embodiment trains the model through preset optimization modes such as heuristic search, derivative-free optimization and reinforcement learning, which require no specific assumed conditions and make model training more efficient.
In a preferred embodiment, the evaluating the target data processing model according to the model parameters by using a preset evaluator to obtain a model evaluation result, including:
and performing auxiliary evaluation on the target data processing model by using the evaluator according to the model parameters with a preset auxiliary evaluation method to obtain a model evaluation result, wherein the preset auxiliary evaluation method includes a sub-sampling method, a parameter reuse method and a proxy evaluation method.
In this embodiment, evaluation is performed by using an auxiliary evaluation method such as sub-sampling, parameter reuse or proxy evaluation, which prevents the load of the evaluation process from growing with the data volume and the number of iterations, and reduces the resource consumption of the evaluation process.
In a preferred embodiment, initializing the target data processing model according to the model evaluation result by using the tuner includes:
determining, by the tuner using an empirical learning algorithm, the optimal model parameters corresponding to the model evaluation result;
and initializing the target data processing model according to the optimal model parameters.
In the embodiment, the parameters are adjusted by introducing machine experience so as to accelerate the training process of the network structure and greatly improve the efficiency of optimization training.
In a second aspect, an embodiment of the present application provides a big government affair data processing device based on machine learning, including:
the acquisition module is used for acquiring government affair log data;
the system comprises a determining module, a searching module and a searching module, wherein the determining module is used for determining a preset data processing model for preprocessing government affair log data in a preset searching space;
the training module is used for carrying out optimization training on a preset data processing model based on the tuner technology and the evaluator technology until an optimal data processing model is obtained;
the processing module is used for preprocessing the government affair log data by utilizing the optimal data processing model to obtain high-quality government affair data;
and the display module is used for storing or visually displaying the high-quality government affair data.
In one embodiment, a training module comprises:
the training unit is used for training a preset data processing model by using a preset tuning device to obtain a target data processing model, and the target data processing model comprises model parameters;
the evaluation unit is used for evaluating the target data processing model according to the model parameters by using a preset evaluator to obtain a model evaluation result;
the initialization unit is used for initializing the target data processing model according to the model evaluation result by using the tuner;
and the loop unit is used for cyclically optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, so as to obtain an optimal data processing model.
In a third aspect, an embodiment of the present application provides a computer device, including a processor and a memory, where the memory is used to store a computer program, and the computer program, when executed by the processor, implements the machine learning-based government affair big data processing method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method for processing government affair big data based on machine learning according to the first aspect.
Please refer to the relevant description of the first aspect for the beneficial effects of the second to fourth aspects, which are not repeated herein.
Drawings
Fig. 1 is a schematic flowchart of a method for processing government affairs big data based on machine learning according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a big government data processing device based on machine learning according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As described in the related art, the smart-city government big data platform faces challenges: traditional data preprocessing comprises data cleaning, data sampling, data processing, data segmentation and other steps; each step has multiple alternative methods, and the data usually needs to be analyzed before a method can be selected, so the whole preprocessing process is repetitive and time-consuming, and the data preprocessing efficiency of a government affair big data processing system is very low.
Therefore, according to the method and device for processing government affair big data based on machine learning provided by the present application, government affair log data are acquired, and a preset data processing model for preprocessing the government affair log data is determined in a preset search space, so that the government affair log data can be processed automatically in the preset search space based on machine learning; based on the tuner technology and the evaluator technology, the preset data processing model is optimized and trained until the optimal data processing model is obtained, which solves the problems that the traditional manual process is error-prone, inefficient and difficult to manage, as well as the difficulty of adjusting configuration parameters without professional knowledge of configuring and optimizing different algorithms; finally, the optimal data processing model is used to preprocess the government affair log data to obtain high-quality government affair data, and the high-quality government affair data are stored or visually displayed, so that the data preprocessing capability is improved, and the efficiency of real-time batch collection and accurate filtering is improved.
Referring to fig. 1, fig. 1 is a schematic flowchart of a government affair big data processing method based on machine learning according to an embodiment of the present application. The method can be applied to a computer device, which includes but is not limited to a smartphone, a tablet computer, a notebook computer, a desktop computer, a physical server or a cloud server. As shown in fig. 1, the machine learning-based government affair big data processing method includes steps S101 to S105, detailed as follows:
step S101, government affair log data are obtained.
In this step, the government affair log data is the log data of the government affair system. Optionally, a Logstash data engine collects the government affair log data and transmits the collected data to the computer device. It can be understood that the Logstash data engine supports dynamic data collection from various data sources, performs operations such as filtering, parsing, enrichment and format normalization on the data, and then stores the data in a preset storage space.
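For illustration only (the patent does not give a concrete pipeline configuration), the filtering and format-normalization step could be sketched in Python roughly as follows; the log layout, field names and the `normalize_log_line` helper are hypothetical:

```python
import json
import re
from datetime import datetime, timezone
from typing import Optional

# Hypothetical illustration of the kind of filtering and format normalization
# the collection stage performs before records reach the preset storage space.
LOG_PATTERN = re.compile(
    r"(?P<time>\S+)\s+(?P<level>INFO|WARN|ERROR)\s+(?P<service>\S+)\s+(?P<message>.*)"
)

def normalize_log_line(line: str) -> Optional[dict]:
    """Parse one raw government-service log line into a uniform JSON-ready record."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None  # filter out lines that do not match the expected layout
    record = match.groupdict()
    record["collected_at"] = datetime.now(timezone.utc).isoformat()
    return record

if __name__ == "__main__":
    raw = "2021-11-16T08:30:00 INFO permit-service application 20211116-001 approved"
    print(json.dumps(normalize_log_line(raw), ensure_ascii=False))
```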
And S102, determining a preset data processing model for preprocessing the government affair log data in a preset search space.
In this step, the preset search space includes model files and algorithm files of a plurality of candidate models, and the model files include model network structures and model hyper-parameters. In this embodiment, determining the preset data processing model comprises determining a model file and an algorithm file.
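A minimal sketch of what such a preset search space might look like is given below; the candidate names, network shapes, hyper-parameter values and the `select_preset_model` selection rule are assumptions for illustration, not part of the patent:

```python
# A minimal sketch (not from the patent) of a preset search space: each candidate
# entry pairs a model file (default network structure and hyper-parameters) with
# the algorithm files that may be used to train it.
PRESET_SEARCH_SPACE = {
    "mlp_small": {
        "model_file": {
            "network": {"hidden_layers": [64, 32], "activation": "relu"},
            "hyper_parameters": {"learning_rate": 0.01, "batch_size": 128},
        },
        "algorithm_files": ["sgd", "l_bfgs", "gradient_descent"],
    },
    "mlp_large": {
        "model_file": {
            "network": {"hidden_layers": [256, 128, 64], "activation": "relu"},
            "hyper_parameters": {"learning_rate": 0.001, "batch_size": 256},
        },
        "algorithm_files": ["sgd", "l_bfgs"],
    },
}

def select_preset_model(log_volume: int, expected_loss: float) -> dict:
    """Pick a candidate model file and algorithm file from the search space.

    The selection rule here is purely illustrative: small log volumes get the
    small network, and a tight loss expectation prefers the faster-converging
    L-BFGS algorithm file.
    """
    key = "mlp_small" if log_volume < 1_000_000 else "mlp_large"
    candidate = PRESET_SEARCH_SPACE[key]
    algorithm = "l_bfgs" if expected_loss < 0.05 else "sgd"
    return {"model_file": candidate["model_file"], "algorithm_file": algorithm}
```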
And S103, carrying out optimization training on the preset data processing model based on the tuner technology and the evaluator technology until an optimal data processing model is obtained.
In this step, the tuner is used to optimize the model through samples, and the evaluator is used for model performance evaluation. Optionally, the model is trained by using the tuner; after training is completed, a validation data set is obtained from the Logstash data engine and the model effect is verified through the evaluator, so that the loss information of each sample of the validation data set is obtained. The tuner then automatically adjusts the network structure and the hyper-parameters through machine learning according to the loss information of each sample of the validation data set, and so on. The model is optimized through continuous machine learning, the optimal model scheme is iterated, and the current optimal model is trained, so that the processing mechanism for collecting and filtering big data is adjusted, the complexity of manual parameter adjustment is eliminated, labor cost is reduced, and the model value is improved.
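The cyclic tuner/evaluator interaction described above can be sketched roughly as follows; the toy objective, perturbation rule and convergence threshold are assumptions rather than the patented mechanism:

```python
import random

# Illustrative-only skeleton of the cyclic tuner/evaluator optimization in step S103.
def evaluate(params: dict, validation_set: list) -> float:
    """Evaluator: return the average per-sample loss on the validation set."""
    return sum(abs(x - params["threshold"]) for x in validation_set) / len(validation_set)

def tune(best_params: dict) -> dict:
    """Tuner: propose a perturbed configuration around the current best one."""
    return {"threshold": best_params["threshold"] + random.uniform(-0.1, 0.1)}

def optimize(validation_set: list, max_rounds: int = 50, tol: float = 1e-3) -> dict:
    best_params = {"threshold": 0.5}                     # initial preset configuration
    best_loss = evaluate(best_params, validation_set)
    for _ in range(max_rounds):
        candidate = tune(best_params)                    # tuner proposes new parameters
        loss = evaluate(candidate, validation_set)       # evaluator scores them
        if loss < best_loss - tol:
            best_params, best_loss = candidate, loss
        else:
            break                                        # preset convergence condition reached
    return best_params
```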
And step S104, preprocessing the government affair log data by using the optimal data processing model to obtain high-quality government affair data.
In the step, the optimal data processing model is utilized to preprocess the government affair log data to obtain high-quality government affair data, and the efficiency of preprocessing the real-time data is improved.
And step S105, storing or visually displaying the high-quality government affair data.
In this step, optionally, data storage is performed through an Elasticsearch distributed search and analytics engine. The engine has characteristics such as high scalability, high reliability and ease of management, is built on Apache Lucene, and can perform near-real-time storage, search and analysis on large volumes of data.
Optionally, the Kibana data analysis and visualization platform, used together with Elasticsearch, searches and analyzes the data in Elasticsearch and displays it as statistical charts, so that the data in Elasticsearch can be presented in multiple dimensions.
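A hedged example of the storage step, assuming a local Elasticsearch cluster and a hypothetical index name (older versions of the official Python client pass the record as `body=` rather than `document=`):

```python
from elasticsearch import Elasticsearch

# Assumed local cluster and index name; adjust for a real deployment.
es = Elasticsearch("http://localhost:9200")

clean_record = {
    "service": "permit-service",
    "level": "INFO",
    "message": "application 20211116-001 approved",
    "collected_at": "2021-11-16T08:30:00Z",
}

# Near-real-time storage of one high-quality government affairs record; Kibana
# can then be pointed at the same index for multi-dimensional visualization.
es.index(index="gov-affairs-logs", document=clean_record)
```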
In an embodiment, based on the embodiment shown in fig. 1, the step S102 includes:
in the preset search space, selecting a model file containing a default network structure and hyper-parameters according to the government affair log data;
and determining an algorithm file of an iterative algorithm according to a preset model loss expected value, wherein the preset data processing model comprises the model file and the algorithm file.
In this embodiment, because there may be multiple alternative models for the same problem, and the hyper-parameters of each model are also unknown, compared with the conventional approach of obtaining an "optimal" result through user professional knowledge and repeated experiments, the present application realizes automatic selection of the model file by presetting multiple alternative models and their corresponding hyper-parameters in the search space and selecting, according to the government affair log data during actual application, the model file containing the default network structure and hyper-parameters.
The purpose of the algorithm file selection is to automatically find an optimization algorithm that balances model efficiency and model performance. Illustratively, if the goal is to minimize a smooth objective function, the computer device may select among a gradient descent algorithm, a stochastic gradient descent algorithm and the L-BFGS algorithm. The gradient descent algorithm has fewer hyper-parameters, but the model converges slowly and each iteration is relatively expensive; L-BFGS consumes more resources but converges faster; each iteration of stochastic gradient descent is cheap, but more iterations are required. If the preset model loss expectation requires fast convergence, the computer device can balance efficiency and performance among the three to select the optimal algorithm.
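As a rough, non-authoritative illustration of this trade-off (the objective, step size and tolerance below are assumptions, not part of the patent), the following Python sketch minimizes a small smooth quadratic once with plain gradient descent and once with SciPy's L-BFGS-B, and compares iteration counts:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative comparison only: an ill-conditioned quadratic objective.
A = np.diag([1.0, 10.0, 100.0])
b = np.array([1.0, 2.0, 3.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b

# Plain gradient descent: cheap iterations, slow convergence.
x = np.zeros(3)
gd_steps = 0
while np.linalg.norm(grad(x)) > 1e-6 and gd_steps < 10_000:
    x -= 0.009 * grad(x)          # fixed step below 2 / largest eigenvalue
    gd_steps += 1

# L-BFGS: more work per iteration, far fewer iterations.
res = minimize(f, np.zeros(3), jac=grad, method="L-BFGS-B", tol=1e-12)

print(f"gradient descent: {gd_steps} steps, L-BFGS: {res.nit} iterations")
```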
In an embodiment, based on the embodiment shown in fig. 1, the step S103 includes:
training the preset data processing model by using a preset tuning device to obtain a target data processing model, wherein the target data processing model comprises model parameters;
evaluating the target data processing model by using a preset evaluator according to the model parameters to obtain a model evaluation result;
initializing the target data processing model by using the tuner according to the model evaluation result;
and cyclically optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, to obtain the optimal data processing model.
Optionally, the training the preset data processing model by using a preset tuner to obtain a target data processing model includes:
training the preset data processing model by using the tuner according to a preset optimization mode to obtain the target data processing model, wherein the preset optimization mode includes a heuristic search mode, a derivative-free optimization mode and a reinforcement learning mode.
In this embodiment, for the tuner technology, the preset optimization modes are sample-based optimization modes, which include the heuristic search mode, the model-based derivative-free optimization mode and the reinforcement learning mode.
Heuristic search mode: inspired by biological behaviors and phenomena, heuristic search is widely applied to non-convex, non-smooth and discontinuous tuning problems. The basic idea is to initialize a population, derive a new population from the tuner and the original population, evaluate the new population, and repeat the process, as in the sketch below.
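A minimal population-based sketch of this idea, with an assumed toy fitness function and mutation rule, might look like this:

```python
import random

# Minimal population-based heuristic search sketch: initialize a population,
# derive a new population from the best members, evaluate, and repeat.
# No gradients or smoothness assumptions are needed.
def fitness(candidate: dict) -> float:
    # Hypothetical stand-in for validating a preprocessing configuration.
    return -abs(candidate["learning_rate"] - 0.01) - abs(candidate["dropout"] - 0.2)

def random_candidate() -> dict:
    return {"learning_rate": 10 ** random.uniform(-4, -1),
            "dropout": random.uniform(0.0, 0.5)}

def mutate(parent: dict) -> dict:
    return {"learning_rate": parent["learning_rate"] * random.uniform(0.5, 2.0),
            "dropout": min(0.5, max(0.0, parent["dropout"] + random.uniform(-0.05, 0.05)))}

population = [random_candidate() for _ in range(20)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    elites = population[:5]                                    # keep the best configurations
    population = elites + [mutate(random.choice(elites)) for _ in range(15)]

best = max(population, key=fitness)
```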
Model-based derivative-free optimization mode: a model is constructed from samples, new samples are then generated according to the evaluation, and this process is iterated repeatedly to achieve targeted search of the space; it can therefore be used for derivative-free space optimization and mainly includes Bayesian optimization, classification-based optimization and simultaneous optimistic optimization.
Bayesian optimization: a probabilistic model (e.g., Gaussian process, tree-based model, deep network) is constructed, an acquisition function (e.g., expected improvement, upper confidence bound) is then defined based on the probabilistic model, and at each iteration a new sample is obtained from the acquisition function and used to update the probabilistic model. Bayesian optimization has the advantage of a high convergence rate.
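As a rough illustration (not the patent's implementation), Bayesian optimization with a Gaussian-process surrogate and an expected-improvement acquisition function could be sketched as follows, using an assumed one-dimensional objective:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Minimal Bayesian-optimization sketch: fit a Gaussian process to the samples seen
# so far, pick the next sample by expected improvement, and repeat.
def objective(x: np.ndarray) -> np.ndarray:
    return np.sin(3 * x) + 0.1 * x ** 2          # assumed stand-in for validation loss

def next_sample(X: np.ndarray, y: np.ndarray, grid: np.ndarray, xi: float = 0.01) -> float:
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X.reshape(-1, 1), y)
    mu, sigma = gp.predict(grid.reshape(-1, 1), return_std=True)
    best = y.min()
    improvement = best - mu - xi                  # we are minimizing
    z = np.divide(improvement, sigma, out=np.zeros_like(sigma), where=sigma > 0)
    ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)
    ei[sigma == 0.0] = 0.0
    return float(grid[np.argmax(ei)])

grid = np.linspace(-2.0, 2.0, 200)
X = np.array([-1.5, 0.0, 1.5])                    # initial samples
y = objective(X)
for _ in range(10):
    x_new = next_sample(X, y, grid)
    X = np.append(X, x_new)
    y = np.append(y, objective(np.array([x_new])))
```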
Optimization based on classification: by training a classifier with the old samples and dividing the search space into positive and negative regions, the samples in the positive region are more likely to obtain excellent results, so that the new samples are obtained from the positive region, and the steps are iterated, so that the method has the advantage of being very efficient.
Simultaneous optimistic optimization is a branch-and-bound optimization algorithm. A tree structure is constructed over the search space, with each leaf node covering a small region; depth and breadth are balanced to find the global optimum.
The reinforcement learning mode is a broad and powerful optimization framework that solves problems through delayed feedback; it differs from other optimization modes in that the delayed feedback adds a notion of time to the learning. It includes policy learning and Q-learning.
Policy learning: a policy is regarded as a function with a single input, the current state, and the action to be performed in the current state is determined from a prior policy. Knowing the policy in advance is not easy, however, as it requires a deep understanding of a complex function mapping states to actions.
Q-learning: unlike policy learning, the Q-learning algorithm has two inputs, the state and the action, and returns a corresponding value for each state-action pair. When faced with a choice, the algorithm computes the expected values of the agent taking different actions so as to select the best result, as sketched below.
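A tabular Q-learning sketch illustrating the two inputs (state and action) and the value update is shown below; the toy action space and learning constants are assumptions:

```python
import random
from collections import defaultdict

# Tabular Q-learning sketch: the value table takes a (state, action) pair as input,
# the agent picks the action with the highest expected value, and that value is
# updated from delayed feedback.
ACTIONS = ["increase_lr", "decrease_lr", "keep_lr"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

q_table = defaultdict(float)                       # (state, action) -> expected value

def choose_action(state: str) -> str:
    if random.random() < EPSILON:                  # exploration
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])  # exploitation

def update(state: str, action: str, reward: float, next_state: str) -> None:
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next            # delayed feedback folded in
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])
```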
Optionally, the evaluating the target data processing model according to the model parameter by using a preset evaluator to obtain a model evaluation result includes:
and performing auxiliary evaluation on the target data processing model according to the model parameters by using the evaluator with a preset auxiliary evaluation method to obtain a model evaluation result, wherein the preset auxiliary evaluation method includes a sub-sampling method, a parameter reuse method and a proxy evaluation method.
In this embodiment, compared with the tuner, the overall resource consumption of the evaluator is much larger. Direct evaluation, in which the model is fully trained and then evaluated, is the simplest method and is accurate, but it is expensive. As the data volume and the number of iterations grow, direct evaluation clearly places a heavy burden on the whole process. To improve its efficiency, this embodiment designs the following methods to assist the direct evaluation method and reduce its consumption.
Sub-sampling: evaluation is performed on a subset of the original samples or features; the less training data used, the faster the evaluation, at the cost of more noise. Early termination: unlike in conventional machine learning, where early stopping is used to prevent overfitting, here the evaluation of a configuration with no prospect can be terminated directly, avoiding unnecessary waste.
Parameter reuse: for configurations with small differences, previously learned parameters can be used as the initial values, which accelerates convergence and can yield better performance.
Proxy evaluation: given that configuration information can be quantified, the performance of a given configuration can be predicted by building a proxy (surrogate) model.
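Combining two of these ideas, a hedged sketch of sub-sampled evaluation with early termination might look like the following; all names, the `per_example_loss` stand-in and the termination threshold are assumptions:

```python
import random

# Sketch of sub-sampled evaluation with early termination: a configuration is
# scored on a fraction of the data, and evaluation stops as soon as the running
# loss shows the configuration has no prospect of beating the current best.
def per_example_loss(config: dict, example: float) -> float:
    """Hypothetical stand-in for scoring one example under a configuration."""
    return abs(example - config.get("threshold", 0.5))

def evaluate_config(config: dict, dataset: list, best_loss_so_far: float,
                    sample_ratio: float = 0.1, check_every: int = 100) -> float:
    subset = random.sample(dataset, max(1, int(len(dataset) * sample_ratio)))
    running_loss, seen = 0.0, 0
    for example in subset:
        running_loss += per_example_loss(config, example)
        seen += 1
        if seen % check_every == 0 and running_loss / seen > 2.0 * best_loss_so_far:
            return float("inf")        # terminate early: configuration has no prospect
    return running_loss / seen
```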
Optionally, the initializing the target data processing model according to the model evaluation result by using the tuner includes:
determining, by the tuner using an empirical learning algorithm, the optimal model parameters corresponding to the model evaluation result;
and initializing the target data processing model according to the optimal model parameters.
In the present embodiment, the empirical learning algorithm improves the efficiency of automated machine learning by reducing the consumption in configuration generation and evaluation. The empirical learning algorithm includes meta learning and transfer learning.
Meta-learning guides learning by extracting meta-information. It first characterizes the learning problem and the learning tools (e.g., statistical features of the data, hyper-parameters of the learning tools), then extracts meta-features from past experience, and finally trains a meta-learner with this meta-knowledge. Meta-learning is important for automated machine learning. On one hand, characterizing learning problems and learning tools can reveal important information, for example data drift (a model becoming less accurate over time), and similar problems are easier to find once they are characterized, so knowledge can be reused and transferred between different problems. On the other hand, the meta-learner encodes past knowledge as guidance for solving future problems. Meta-learning can be applied to the evaluator to reduce the huge consumption caused by training during evaluation: configuration information is fed into a previously trained meta-learner, which predicts the performance or fitness of the configuration; ideally, if all configurations could be enumerated, the meta-learner could directly select the optimal one. Meta-learning can also be applied to the tuner to reduce meaningless consumption during tuning by optimizing the search space: in the configuration generation stage, features of the learning problem are extracted as input, and a meta-learner obtained from previous experience predicts promising configurations. It can likewise be combined with transfer learning, using the configuration whose meta-feature space is most similar to that of a previous task as initialization data to warm-start configuration generation. In addition, meta-learning can be applied to dynamic configuration adaptation: whether concept drift occurs is detected through statistics of the data and features, and once concept drift is found, promising configurations are predicted again to guarantee model availability.
Transfer learning uses previous experience to guide learning; in automated machine learning, a trained optimal surrogate model or search strategy is reused to save consumption. During tuning, the surrogate model can be transferred, and for network-structure problems, because networks are transferable, transfer learning is widely applied to neural architecture search. Transfer learning is used in the evaluator to speed up the evaluation of preselected configurations. For a general optimization problem, transfer learning can transfer model parameters, initializing with the trained optimal parameters. Another idea of transfer learning is to initialize a new network so that it computes the same function as a previously trained model through function-preserving transformations, such as Net2Net, thereby accelerating the training of the network structure and greatly improving efficiency.
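As a simple illustration of this parameter-reuse style of warm starting (the parameter layout and reuse rule below are assumptions, not Net2Net itself), a new model can be initialized from previously trained optimal parameters:

```python
import copy

# Warm-start sketch: the optimal parameters from a previously trained model
# initialize the new target model; only layers whose shapes differ fall back
# to fresh initialization.
def warm_start(new_params: dict, trained_params: dict) -> dict:
    initialized = copy.deepcopy(new_params)
    for name, value in trained_params.items():
        if name in initialized and len(initialized[name]) == len(value):
            initialized[name] = list(value)        # reuse the trained weights
    return initialized

previous_best = {"layer1": [0.2, -0.1, 0.4], "layer2": [0.05, 0.3]}
fresh_model = {"layer1": [0.0, 0.0, 0.0], "layer2": [0.0, 0.0], "layer3": [0.0]}
model_init = warm_start(fresh_model, previous_best)
```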
Corresponding to the government affair big data processing method based on machine learning of the above method embodiment, an embodiment of the present application further provides a government affair big data processing device based on machine learning, which can achieve the corresponding functions and technical effects. Referring to fig. 2, fig. 2 is a block diagram illustrating the structure of a government affair big data processing device based on machine learning according to an embodiment of the present application. For convenience of explanation, only the parts related to the present embodiment are shown. The device includes:
an obtaining module 201, configured to obtain government affair log data;
a determining module 202, configured to determine, in a preset search space, a preset data processing model for preprocessing the government affair log data;
the training module 203 is used for performing optimization training on the preset data processing model based on a tuner technology and an evaluator technology until an optimal data processing model is obtained;
the processing module 204 is configured to utilize the optimal data processing model to preprocess the government affair log data to obtain high-quality government affair data;
and the display module 205 is used for storing or visually displaying the high-quality government affair data.
In one embodiment, the determining module 202 includes:
the selecting unit is used for selecting a model file containing a default network structure and hyper-parameters in the preset search space according to the government affair log data;
and the determining unit is used for determining an algorithm file of an iterative algorithm according to a preset model loss expected value, and the preset data processing model comprises the model file and the algorithm file.
In one embodiment, the training module 203 comprises:
the training unit is used for training the preset data processing model by using a preset tuner to obtain a target data processing model, and the target data processing model comprises model parameters;
the evaluation unit is used for evaluating the target data processing model according to the model parameters by using a preset evaluator to obtain a model evaluation result;
the initialization unit is used for initializing the target data processing model according to the model evaluation result by utilizing the tuner;
and the loop unit is used for cyclically optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, so as to obtain the optimal data processing model.
In a preferred embodiment, the training unit includes:
and the training subunit is used for training the preset data processing model by using the tuner according to a preset optimization mode to obtain a target data processing model, wherein the preset optimization mode comprises a heuristic search mode, a derivative-free optimization mode and a reinforcement learning mode.
In a preferred embodiment, the evaluation unit includes:
and the evaluation subunit is used for performing auxiliary evaluation on the target data processing model according to the model parameters by using the evaluator with a preset auxiliary evaluation method to obtain a model evaluation result, wherein the preset auxiliary evaluation method comprises a sub-sampling method, a parameter reuse method and a proxy evaluation method.
In a preferred embodiment, the initialization unit includes:
the determining subunit is used for determining, by using the tuner with an empirical learning algorithm, the optimal model parameters corresponding to the model evaluation result;
and the initialization subunit is used for initializing the target data processing model according to the optimal model parameters.
The device for processing big government affairs data based on machine learning can implement the method for processing big government affairs data based on machine learning of the method embodiment. The alternatives in the above-described method embodiments are also applicable to this embodiment and will not be described in detail here. The rest of the embodiments of the present application may refer to the contents of the above method embodiments, and in this embodiment, details are not described again.
Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 3, the computer device 3 of this embodiment includes: at least one processor 30 (only one shown in fig. 3), a memory 31, and a computer program 32 stored in the memory 31 and executable on the at least one processor 30, the processor 30 implementing the steps of any of the above-described method embodiments when executing the computer program 32.
The computer device 3 may be a computing device such as a smartphone, a tablet computer, a desktop computer or a cloud server. The computer device may include, but is not limited to, the processor 30 and the memory 31. Those skilled in the art will appreciate that fig. 3 is merely an example of the computer device 3 and does not constitute a limitation of the computer device 3, which may include more or fewer components than shown, or combine certain components, or use different components, such as input/output devices and network access devices.
The Processor 30 may be a Central Processing Unit (CPU), and the Processor 30 may be other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 31 may in some embodiments be an internal storage unit of the computer device 3, such as a hard disk or a memory of the computer device 3. The memory 31 may also be an external storage device of the computer device 3 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the computer device 3. The memory 31 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer program. The memory 31 may also be used to temporarily store data that has been output or is to be output.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in any of the method embodiments described above.
The embodiments of the present application provide a computer program product, which when executed on a computer device, enables the computer device to implement the steps in the above method embodiments.
In several embodiments provided herein, it will be understood that each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-mentioned embodiments are further detailed to explain the objects, technical solutions and advantages of the present application, and it should be understood that the above-mentioned embodiments are only examples of the present application and are not intended to limit the scope of the present application. It should be understood that any modifications, equivalents, improvements and the like, which come within the spirit and principle of the present application, may occur to those skilled in the art and are intended to be included within the scope of the present application.

Claims (10)

1. A machine learning-based government affairs big data processing method, comprising: acquiring government affairs log data; determining, in a preset search space, a preset data processing model for preprocessing the government affairs log data; performing optimization training on the preset data processing model based on a tuner technology and an evaluator technology until an optimal data processing model is obtained; preprocessing the government affairs log data by using the optimal data processing model to obtain high-quality government affairs data; and storing or visually displaying the high-quality government affairs data.
2. The method according to claim 1, wherein the determining, in the preset search space, the preset data processing model for preprocessing the government affairs log data comprises: in the preset search space, selecting, according to the government affairs log data, a model file containing a default network structure and hyper-parameters; and determining an algorithm file of an iterative algorithm according to a preset model loss expectation value, wherein the preset data processing model comprises the model file and the algorithm file.
3. The method according to claim 1, wherein the performing optimization training on the preset data processing model based on the tuner technology and the evaluator technology until the optimal data processing model is obtained comprises: training the preset data processing model by using a preset tuner to obtain a target data processing model, wherein the target data processing model comprises model parameters; evaluating the target data processing model according to the model parameters by using a preset evaluator to obtain a model evaluation result; initializing the target data processing model according to the model evaluation result by using the tuner; and cyclically optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, to obtain the optimal data processing model.
4. The method according to claim 3, wherein the training the preset data processing model by using the preset tuner to obtain the target data processing model comprises: training the preset data processing model by using the tuner according to a preset optimization mode to obtain the target data processing model, wherein the preset optimization mode comprises a heuristic search mode, a derivative-free optimization mode and a reinforcement learning mode.
5. The method according to claim 3, wherein the evaluating the target data processing model according to the model parameters by using the preset evaluator to obtain the model evaluation result comprises: performing auxiliary evaluation on the target data processing model according to the model parameters by using the evaluator with a preset auxiliary evaluation method to obtain the model evaluation result, wherein the preset auxiliary evaluation method comprises a sub-sampling method, a parameter reuse method and a proxy evaluation method.
6. The method according to claim 3, wherein the initializing the target data processing model according to the model evaluation result by using the tuner comprises: determining, by the tuner using an empirical learning algorithm, optimal model parameters corresponding to the model evaluation result; and initializing the target data processing model according to the optimal model parameters.
7. A machine learning-based government affairs big data processing device, comprising: an acquisition module, configured to acquire government affairs log data; a determining module, configured to determine, in a preset search space, a preset data processing model for preprocessing the government affairs log data; a training module, configured to perform optimization training on the preset data processing model based on a tuner technology and an evaluator technology until an optimal data processing model is obtained; a processing module, configured to preprocess the government affairs log data by using the optimal data processing model to obtain high-quality government affairs data; and a display module, configured to store or visually display the high-quality government affairs data.
8. The device according to claim 7, wherein the training module comprises: a training unit, configured to train the preset data processing model by using a preset tuner to obtain a target data processing model, wherein the target data processing model comprises model parameters; an evaluation unit, configured to evaluate the target data processing model according to the model parameters by using a preset evaluator to obtain a model evaluation result; an initialization unit, configured to initialize the target data processing model according to the model evaluation result by using the tuner; and a loop unit, configured to cyclically optimize the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, to obtain the optimal data processing model.
9. A computer device, comprising a processor and a memory, wherein the memory is configured to store a computer program, and the computer program, when executed by the processor, implements the machine learning-based government affairs big data processing method according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the machine learning-based government affairs big data processing method according to any one of claims 1 to 6.
CN202111358382.1A 2021-11-16 2021-11-16 Method and device for processing big data of government affairs based on machine learning Pending CN114139720A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111358382.1A CN114139720A (en) 2021-11-16 2021-11-16 Method and device for processing big data of government affairs based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111358382.1A CN114139720A (en) 2021-11-16 2021-11-16 Method and device for processing big data of government affairs based on machine learning

Publications (1)

Publication Number Publication Date
CN114139720A true CN114139720A (en) 2022-03-04

Family

ID=80390341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111358382.1A Pending CN114139720A (en) 2021-11-16 2021-11-16 Method and device for processing big data of government affairs based on machine learning

Country Status (1)

Country Link
CN (1) CN114139720A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114493379A (en) * 2022-04-08 2022-05-13 金电联行(北京)信息技术有限公司 Enterprise evaluation model automatic generation method, device and system based on government affair data
CN115422176A (en) * 2022-09-01 2022-12-02 安徽省安策智库咨询有限公司 Government affair public big data quality evaluation and processing method
CN115757384A (en) * 2022-11-30 2023-03-07 安徽长正智库管理咨询有限公司 A big data-based government data processing method
WO2025015835A1 (en) * 2023-07-17 2025-01-23 中国电信股份有限公司技术创新中心 Data analysis method and apparatus, computer device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062587A (en) * 2017-12-15 2018-05-22 清华大学 The hyper parameter automatic optimization method and system of a kind of unsupervised machine learning
CN110472119A (en) * 2019-07-17 2019-11-19 广东鼎义互联科技股份有限公司 One kind being applied to government affairs the analysis of public opinion platform
CN111144581A (en) * 2019-12-31 2020-05-12 杭州雅拓信息技术有限公司 Machine learning hyper-parameter adjusting method and system
CN111950738A (en) * 2020-08-10 2020-11-17 中国平安人寿保险股份有限公司 Machine learning model optimization effect evaluation method and device, terminal and storage medium
CN112015962A (en) * 2020-07-24 2020-12-01 北京艾巴斯智能科技发展有限公司 Government affair intelligent big data center system architecture
WO2021007812A1 (en) * 2019-07-17 2021-01-21 深圳大学 Deep neural network hyperparameter optimization method, electronic device and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062587A (en) * 2017-12-15 2018-05-22 清华大学 The hyper parameter automatic optimization method and system of a kind of unsupervised machine learning
CN110472119A (en) * 2019-07-17 2019-11-19 广东鼎义互联科技股份有限公司 One kind being applied to government affairs the analysis of public opinion platform
WO2021007812A1 (en) * 2019-07-17 2021-01-21 深圳大学 Deep neural network hyperparameter optimization method, electronic device and storage medium
CN111144581A (en) * 2019-12-31 2020-05-12 杭州雅拓信息技术有限公司 Machine learning hyper-parameter adjusting method and system
CN112015962A (en) * 2020-07-24 2020-12-01 北京艾巴斯智能科技发展有限公司 Government affair intelligent big data center system architecture
CN111950738A (en) * 2020-08-10 2020-11-17 中国平安人寿保险股份有限公司 Machine learning model optimization effect evaluation method and device, terminal and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114493379A (en) * 2022-04-08 2022-05-13 金电联行(北京)信息技术有限公司 Enterprise evaluation model automatic generation method, device and system based on government affair data
CN115422176A (en) * 2022-09-01 2022-12-02 安徽省安策智库咨询有限公司 Government affair public big data quality evaluation and processing method
CN115757384A (en) * 2022-11-30 2023-03-07 安徽长正智库管理咨询有限公司 A big data-based government data processing method
WO2025015835A1 (en) * 2023-07-17 2025-01-23 中国电信股份有限公司技术创新中心 Data analysis method and apparatus, computer device, and storage medium

Similar Documents

Publication Publication Date Title
CN114139720A (en) Method and device for processing big data of government affairs based on machine learning
CN110390396B (en) Method, device and system for estimating causal relationship between observed variables
CN113344016B (en) Deep transfer learning method, device, electronic device and storage medium
CN111406264B (en) Neural architecture search
WO2022027937A1 (en) Neural network compression method, apparatus and device, and storage medium
US20240054146A1 (en) Selectively identifying and recommending digital content items for synchronization
CN113821657B (en) Image processing model training method and image processing method based on artificial intelligence
WO2016062044A1 (en) Model parameter training method, device and system
CN106648654A (en) Data sensing-based Spark configuration parameter automatic optimization method
CN113392867B (en) Image recognition method, device, computer equipment and storage medium
JP2021022367A (en) Image processing method and information processor
CN111914159A (en) Information recommendation method and terminal
CN108446770A (en) A kind of slow node processing system and method for distributed machines study based on sampling
US20240013061A1 (en) Architecture search method and apparatus for large-scale graph, and device and storage medium
CN111898766A (en) Ether house fuel limitation prediction method and device based on automatic machine learning
CN114492601A (en) Resource classification model training method and device, electronic equipment and storage medium
CN116310578B (en) A method for constructing an image classification model without training in the search phase
CN117744754A (en) Large language model task processing method, device, equipment and medium
JP2020009122A (en) Control program, control method and system
Zafar et al. An Optimization Approach for Convolutional Neural Network Using Non-Dominated Sorted Genetic Algorithm-II.
CN117999560A (en) Hardware-aware progressive training of machine learning models
CN113806579A (en) Text image retrieval method and device
CN112926611B (en) Feature extraction method, device and computer readable storage medium
JP7063397B2 (en) Answer integration device, answer integration method and answer integration program
CN111177015A (en) Application program quality identification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220310

Address after: Room 1119, building 6, Derui garden, 143 Minzu Avenue, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region

Applicant after: Guangxi Zhongke Shuguang cloud computing Co.,Ltd.

Applicant after: Pingnan Zhongke Shuguang cloud computing Co.,Ltd.

Address before: Room 1119, building 6, Derui garden, 143 Minzu Avenue, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region

Applicant before: Guangxi Zhongke Shuguang cloud computing Co.,Ltd.

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载