WO2021007812A1

WO2021007812A1 - Deep neural network hyperparameter optimization method, electronic device and storage medium

Info

Publication number: WO2021007812A1
Application number: PCT/CN2019/096379
Authority: WO
Inventors: 骆剑平; 陈亮
Original assignee: Shenzhen University
Current assignee: Shenzhen University
Priority date: 2019-07-17
Filing date: 2019-07-17
Publication date: 2021-01-21
Anticipated expiration: 2022-01-17

Abstract

A deep neural network hyperparameter optimization method, an electronic device, and a storage medium. The method comprises: using a multi-task conditional neural network model to replace an existing Gaussian process model to implement a data processing process of hyperparameter optimization; determining parameters, sequentially selecting training sets of a decoder and an encoder according to the specified parameters to perform network training, prediction, screening, and evaluation; adding candidate points of each task obtained after evaluation and target function values into observation sets of corresponding tasks; and carrying out network training, prediction, screening, and evaluation again, repeating the aforementioned steps until the maximum number of iterations is reached, and finding the point, i.e. the hyperparameter, when the target function values in the observation sets of each task are the largest, thereby solving the problems in the prior art of complex covariance calculations and the like when the Gaussian process is used to implement hyperparameter optimization, and simultaneously achieving the hyperparameter optimization of multiple tasks.

Description

A deep neural network hyperparameter optimization method, electronic equipment and storage medium

Technical field

The present invention relates to hyperparameter optimization methods, and in particular to a deep neural network hyperparameter optimization method, an electronic device, and a storage medium.

Background art

Many parameters of a deep learning network model are not learned during training but are set directly before training starts; these are called neural network hyperparameters. Hyperparameter optimization has long been a bottleneck for improving model performance. As neural network models grow more complex and the number of hyperparameter types increases, it often becomes impossible to choose a suitable hyperparameter combination from the relationships among the hyperparameters. The most commonly used hyperparameter optimization method at present is Bayesian optimization, which can find a good combination in the decision space with as few real evaluations as possible. Its main idea is to build a surrogate model of the whole problem from historical data instead of evaluating the real problem, to choose the next sampling point from the uncertainty of the surrogate model's predictions, and to reach an approximately optimal solution through repeated iteration. The main surrogate model in Bayesian optimization today is the Gaussian process (GP), but a GP generally optimizes only a single problem at a time, or is run in parallel at the expense of hardware so that the data are fully used within a given time.

Single-task learning learns each task from scratch and ignores the related information in other, similar tasks that could help it study the data characteristics in depth. It is also often strongly affected by heavy noise, high data dimensionality, or a small amount of data. Training a sufficiently accurate single-task surrogate model usually requires a large amount of observation data, which is hard to obtain in practice, so the trained model is limited and its predictions are not accurate enough. In addition, the computational complexity of the covariance function in a Gaussian process grows rapidly with the amount of data (exact GP inference scales cubically in the number of observations), leading to high computational cost and long running time.

Summary of the invention

To overcome the shortcomings of the prior art, a first object of the present invention is to provide a deep neural network hyperparameter optimization method that solves the prior-art problems of high computational cost and long running time in hyperparameter optimization.

A second object of the present invention is to provide an electronic device that solves the prior-art problems of high computational cost and long running time in hyperparameter optimization.

A third object of the present invention is to provide a computer storage medium that solves the prior-art problems of high computational cost and long running time in hyperparameter optimization.

The first object of the present invention is achieved by the following technical solution:

A deep neural network hyperparameter optimization method, comprising the following steps:

Parameter setting step: set the number of tasks to be trained, the objective function of each task, the number of initial values of each task, and the maximum number of iterations;

Model training step: obtain the observation set of each task from the number of initial values and the objective function of that task, and train the network model on the observation sets of all tasks to obtain a multi-task conditional neural network model;

Prediction step: use the multi-task conditional neural network model to predict any point in the unknown region, obtaining the objective function value of each point for each task;

Screening step: screen the objective function values of each task over all points in the unknown region using a particle swarm algorithm, obtaining the candidate point of each task;

Iteration step: substitute the candidate point of each task into the multi-task conditional neural network model to compute its true value, add the candidate point and the computed true value to the observation set of that task to form a new observation set, and then execute the model training step, the prediction step, the screening step, and the iteration step in sequence on the new observation sets; when the maximum number of iterations is reached, select from the observation set formed by the last iteration the parameter combination with the largest response value as the hyperparameter combination.
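The five steps above form an outer loop that can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: `nearest_neighbor_surrogate` is a hypothetical placeholder that stands in for the multi-task conditional neural network model, the candidate is picked from a random pool by "predicted mean plus uncertainty" instead of by particle swarm search over EI, candidates are scored by calling the task's objective directly, and the two quadratic objectives are invented toy functions.

```python
import random


def nearest_neighbor_surrogate(observations, x):
    """Placeholder surrogate (an assumption): predict the value of the
    nearest observed point and use the distance to it as uncertainty."""
    nearest_x, nearest_y = min(observations, key=lambda p: abs(p[0] - x))
    return nearest_y, abs(nearest_x - x)


def optimize(objectives, n_init, max_iters, lower=0.0, upper=1.0, seed=0):
    rng = random.Random(seed)
    # Parameter setting and initial observation sets (same inputs for every task).
    initial_xs = [rng.uniform(lower, upper) for _ in range(n_init)]
    observation_sets = [[(x, f(x)) for x in initial_xs] for f in objectives]

    for _ in range(max_iters):
        # Sample of the unknown region; the surrogate "training" is implicit here.
        pool = [rng.uniform(lower, upper) for _ in range(200)]
        for task, f in enumerate(objectives):
            def score(x, task=task):
                mean, uncertainty = nearest_neighbor_surrogate(observation_sets[task], x)
                return mean + uncertainty

            # Screening: keep the best-scoring point as this task's candidate.
            candidate = max(pool, key=score)
            # Iteration: evaluate the true objective and extend the observation set.
            observation_sets[task].append((candidate, f(candidate)))

    # Final selection: the observed point with the largest response per task.
    return [max(obs, key=lambda p: p[1]) for obs in observation_sets]


# Toy objectives (assumptions), each with a single maximum inside [0, 1].
objectives = [lambda x: -(x - 0.2) ** 2, lambda x: -(x - 0.7) ** 2]
best = optimize(objectives, n_init=3, max_iters=15)
for task, (x, y) in enumerate(best):
    print(f"task {task}: best x = {x:.3f}, objective = {y:.5f}")
```

Swapping the placeholder surrogate for a trained model and the pool search for a particle swarm search leaves the loop structure unchanged.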

Further, the multi-task conditional neural network model is formed by having the multiple tasks learn from one another, through multi-task learning, while the conditional neural network models are being trained.

Further, the multi-task conditional neural network model is formed by combining the output layers of the conditional neural network models of the multiple tasks through a similarity network layer; the similarity network layer consists of a fully connected network.

Further, the model training step specifically comprises:

Step S11: set the number of model training iterations, the maximum number of algorithm iterations, and the loss function to be minimized; the overall loss function is the sum of the loss functions of all tasks, and the loss function of each task is the negative conditional log probability;

Step S12: randomly select, at a preset ratio, the corresponding initial values from the observation set of each task to generate the decoder training set of that task, and use the observation set of each task as the encoder training set of that task;

Step S13: train the network model on the decoder training set and the encoder training set of each task, while updating the network parameters of each task's conditional neural network model and the parameters of the similarity network layer according to the loss function;

Step S14: once network training has run for the set number of model training iterations, return to and execute steps S12 to S14 in sequence, until the maximum number of algorithm iterations is reached and the multi-task conditional neural network model is obtained.

Further, step S13 also comprises: updating the network parameters of each task's conditional neural network model and the parameters of the similarity network layer according to the loss function, by backpropagation with the Adam optimizer.

Further, the observation set is generated as follows: first, the initial values of each task are generated according to that task's number of initial values and substituted into the task's objective function to obtain the corresponding objective function values; the observation set of each task is then formed from its initial values and objective function values.

Further, the screening step is: compute the acquisition function EI (expected improvement) of each task from the predicted objective function value of each task at each point in the unknown region, use the EI of each task as the fitness function of the particle swarm algorithm, and then use the particle swarm algorithm to select the point with the largest EI value as the candidate point of that task.
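The screening step can be illustrated with a small, self-contained sketch: the standard closed form of expected improvement for a Gaussian prediction, used as the fitness of a basic one-dimensional particle swarm. The inertia and acceleration constants, the swarm size, and `toy_surrogate` (a hypothetical stand-in for the model's predicted mean and standard deviation) are assumptions chosen for illustration only.

```python
import math
import random


def expected_improvement(mean, std, best):
    """EI for maximization, given a Gaussian prediction (mean, std)
    and the best objective value observed so far."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


def pso_maximize(fitness, lower, upper, n_particles=20, n_iters=50, seed=0):
    """Basic particle swarm search for the maximum of a 1-D fitness function."""
    rng = random.Random(seed)
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                          # personal best positions
    best_val = [fitness(p) for p in pos]       # personal best fitness values
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g], best_val[g]    # global best
    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (best_pos[i] - pos[i])
                      + 1.5 * r2 * (g_pos - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lower), upper)  # stay in bounds
            val = fitness(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val > g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val


def toy_surrogate(x):
    """Hypothetical prediction for one task: returns (mean, std)."""
    return -(x - 0.3) ** 2, 0.1 + 0.2 * abs(x - 0.8)


best_observed = -0.05  # assumed best objective value in the observation set


def ei_fitness(x):
    return expected_improvement(*toy_surrogate(x), best_observed)


candidate, ei_value = pso_maximize(ei_fitness, 0.0, 1.0)
print(f"candidate = {candidate:.3f}, EI = {ei_value:.4f}")
```

For several tasks, the same search would be run once per task with that task's own EI as the fitness.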

The second object of the present invention is achieved by the following technical solution:

An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the deep neural network hyperparameter optimization method adopted for the first object of the present invention.

The third object of the present invention is achieved by the following technical solution:

A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the deep neural network hyperparameter optimization method adopted for the first object of the present invention.

Compared with the prior art, the present invention has the following beneficial effects:

The present invention trains the model with a conditional neural network model instead of a Gaussian process model and combines it with multi-task learning to form a multi-task conditional neural network model. Applying the multi-task conditional neural network model within the Bayesian optimization framework achieves multi-task hyperparameter optimization and reduces the computational cost and running time of the optimization process.

Description of the drawings

FIG. 1 is a flow diagram of the deep neural network hyperparameter optimization method provided by the present invention;

FIG. 2 is a network structure diagram of the conditional neural network model provided by the present invention;

FIG. 3 is a network structure diagram of the multi-task conditional neural network model provided by the present invention;

FIG. 4 is a network training diagram of the multi-task conditional neural network model provided by the present invention;

FIG. 5 is a schematic diagram of predicting the point $x^*$ in the multi-task conditional neural network model provided by the present invention;

FIG. 6 is a flowchart of the hyperparameter optimization method provided by the present invention, in which the multi-task conditional neural network model is applied within the Bayesian decision algorithm;

FIG. 7 is a schematic diagram of the augmented data set provided by the present invention.

Detailed description

The present invention is further described below with reference to the drawings and specific embodiments. It should be noted that, provided there is no conflict, the embodiments or technical features described below may be combined arbitrarily to form new embodiments.

Embodiment 1:

The present invention proposes a hyperparameter optimization method that uses conditional neural processes (CNPs) in place of Gaussian processes (GPs). CNPs combine the characteristics of stochastic processes and neural networks: they are inspired by the flexibility of Gaussian processes and are trained as neural networks by gradient descent.

When learning from known observation data, a conditional neural process is parameterized by a neural network and is trained, by randomly sampling the data set and following gradient steps, to maximize the conditional likelihood of a random subset given a random observation set. The result is the conditional neural network model.

General prior-art methods for hyperparameter optimization train the model with a Gaussian process, whereas this patent replaces the traditional Gaussian process with a conditional neural network model that has stronger learning ability, which avoids computing complex covariance functions. Drawing on the advantages of multi-task learning, multi-task conditional neural processes (Multi-task Conditional Neural Processes in One-to-many case, OMc-MTCNPs) are proposed on the basis of the conditional neural network model and applied to the Bayesian decision algorithm to optimize hyperparameters.

With the multi-task conditional neural network model, the small amount of known data can be fully exploited to improve the generalization performance of each single task. Applying the model within the framework of the Bayesian decision algorithm makes it possible to find a better approximately optimal solution during hyperparameter optimization, and addresses the long running time and high computational cost of existing hyperparameter optimization.

For example, as shown in FIG. 1, consider M tasks (each task is a deep network model, and each model has its own deep network hyperparameter optimization training set). Each task has multiple hyperparameters, and the multi-task conditional neural network model iteratively finds a better hyperparameter combination for each task from that task's training set and training results. Once the optimal hyperparameter combination has been found, unknown points can be predicted accurately under that combination.

To better explain how the multi-task conditional neural network model optimizes hyperparameters, the model training process of the conditional neural network model is introduced first, followed by that of the multi-task conditional neural network model.

I. The conditional neural network model (CNPs model):

1. The conditional neural network model learns from the data in two stages to alleviate the problem that a small amount of data cannot train a model effectively: the first stage learns the statistical distribution information of the training data, and the second stage fits a specific function using only part of the training data together with the distribution information learned in the first stage.

For example, suppose we have a data set of $n$ distinct inputs $X = \{x_i\}_{i=1}^{n}$ and a function $f: X \to Y$ whose expression is unknown. Feeding $X$ into $f$ gives the outputs $Y = \{y_i\}_{i=1}^{n}$, where $y_i = f(x_i)$.

First stage: FIG. 2 shows the structure of the conditional neural process. It consists of two parts, an encoder h and a decoder g, both of which are neural networks. The encoder h mainly learns the mapping relationship among the data, that is, it parameterizes the conditional probability; the decoder g uses the learned information for predictive regression, that is, it computes the conditional probability.

Second stage: fit a function $p: X \to Y$; the decoder g ultimately makes $p$ as close as possible to $f$.

One advantage of the conditional neural network model is that no Gaussian prior needs to be specified; the distribution information is obtained directly from the data by the neural network. Suppose we have a set $T$ of $m$ unobserved points. A CNP parameterizes the conditional distribution $P_\theta(f(T) \mid O, T)$ on the observations $O = \{(x_i, y_i)\}_{i=1}^{n}$. This gives up the exact mathematical treatment of a stochastic process but gains flexibility and scalability.

2. Mathematical analysis of the conditional neural network model

Let $P_\theta$ be a distribution over the random variables $f(x)$, $x \in T$, where $\theta$ is the set of all parameters that define $P_\theta$. Let $O'$ and $T'$ be the known observation set $O$ and the unknown observation set $T$ rearranged in some order. Then

$$P_\theta(f(T) \mid O, T) = P_\theta(f(T) \mid O, T') = P_\theta(f(T) \mid O', T),$$

where $T = \{x_1, x_2, x_3, \dots, x_n\}$. Accordingly, $P_\theta$ is factorized as

$$P_\theta(f(T) \mid O, T) = \prod_{x \in T} P_\theta(f(x) \mid O, x) \tag{1}$$

that is, the joint distribution over $T$ is written as a product over the individual $x$ in $T$, so $P_\theta$ can be decomposed and a separate conditional distribution computed for each input $x$.
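A practical consequence of the factorization in formula (1) is that the joint log probability of the targets is simply a sum of per-point log probabilities. The sketch below checks this numerically, assuming (as an illustration, not as a statement of the patent's model) that each per-point conditional distribution is Gaussian with its own mean and variance; the numbers are invented.

```python
import math


def gaussian_log_pdf(y, mean, var):
    """Log density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def factorized_log_prob(ys, means, variances):
    """Joint log probability under formula (1): a sum over the target points."""
    return sum(gaussian_log_pdf(y, m, v) for y, m, v in zip(ys, means, variances))


# Invented target values and per-point predictions, for illustration only.
ys = [0.1, 0.4, 0.9]
means = [0.0, 0.5, 1.0]
variances = [0.04, 0.09, 0.01]

joint_log_prob = factorized_log_prob(ys, means, variances)
product_of_densities = math.prod(
    math.exp(gaussian_log_pdf(y, m, v)) for y, m, v in zip(ys, means, variances)
)
print(math.exp(joint_log_prob), product_of_densities)
```

Working with the sum of log densities rather than the product of densities is what keeps the training loss numerically stable when there are many target points.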

FIG. 2 shows the structure of the conditional neural network model, where

$$r_i = h_\theta(x_i, y_i), \quad \forall (x_i, y_i) \in O \tag{2}$$

$$r = r_1 \oplus r_2 \oplus \cdots \oplus r_{n-1} \oplus r_n \tag{3}$$

$$\phi_i = g_\theta(x_i, r), \quad \forall x_i \in T \tag{4}$$

Here $h_\theta: X \times Y \to \mathbb{R}^d$ and $g_\theta: X \times \mathbb{R}^d \to \mathbb{R}^e$ are both neural networks, and $\phi_i$ denotes the decoder output for $x_i$. The encoder h in formula (2) distills the known data into the representations $r_i$, which formula (3) aggregates into $r$; $r$ then serves as one of the deciding factors when the decoder g predicts each $x$. Formula (4) is the mathematical expression of the prediction made by the decoder g. The symbol $\oplus$ aggregates all of the learned information so that it can be mapped onto a single $x$. In most experiments $\oplus$ denotes averaging, so formula (3) is equivalent to $r = (r_1 + r_2 + \dots + r_{n-1} + r_n)/n$.
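The encoder, averaging, and decoder described above can be sketched as a forward pass in plain Python. Everything concrete here is an assumption made for illustration: single-layer networks with random untrained weights stand in for h and g, the representation size `D` is 4, and the decoder is taken to output a mean and a variance (kept positive with a softplus). The point of the sketch is the data flow and the fact that averaging makes the prediction independent of the order of the observations.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases for one dense layer (illustrative only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, vec):
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def encoder_h(layer, x, y):
    """r_i = h(x_i, y_i): maps one observed pair to a d-dimensional vector."""
    return [math.tanh(z) for z in linear(layer, [x, y])]


def aggregate(representations):
    """r = (r_1 + ... + r_n) / n, computed element-wise."""
    n = len(representations)
    return [sum(column) / n for column in zip(*representations)]


def decoder_g(layer, x, r):
    """Maps (x, r) to a predicted mean and a positive variance."""
    mean, raw = linear(layer, [x] + r)
    variance = math.log1p(math.exp(raw))  # softplus keeps the variance positive
    return mean, variance


def cnp_predict(enc, dec, observations, targets):
    r = aggregate([encoder_h(enc, x, y) for x, y in observations])
    return [decoder_g(dec, x, r) for x in targets]


rng = random.Random(0)
D = 4                                  # assumed representation size
enc = init_layer(rng, 2, D)            # h: (x, y) -> R^D
dec = init_layer(rng, 1 + D, 2)        # g: (x, r) -> (mean, raw variance)

observations = [(0.0, 0.0), (0.5, 0.25), (1.0, 1.0)]
targets = [0.25, 0.75]
predictions = cnp_predict(enc, dec, observations, targets)
shuffled = cnp_predict(enc, dec, list(reversed(observations)), targets)
print(predictions)
```

Because the weights are untrained, the printed values are meaningless as predictions; only the structure is of interest.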

3. Model parameter training and prediction

First, a data set $O$ is chosen. A subset is randomly selected from $O$ at a preset ratio and used as the training set of the encoder h, while the full data set $O$ is used as the training set of the decoder g. In other words, the parameters of $P_\theta$ are updated from the error between the predicted and true target values for the inputs $X$ in $O$, so that the resulting parameters give more accurate predictions. When selecting the subset, one usually sets $N \sim \mathrm{uniform}[1, \dots, n]$; that is, the first $N$ elements of $O$ are taken to form the subset $O_N = \{(x_i, y_i)\}_{i=1}^{N}$, which is used as the training set of the encoder h.

The conditional neural network model is therefore trained on the data set as follows:

Step A1: set parameters such as the learning rate, the number of model training iterations, and the maximum number of algorithm iterations. Before a model is trained, these hyperparameters can be given from past experience so that training can start. The learning rate is itself a hyperparameter: it governs how the neural network model learns, and once optimized it is set on the neural network model to carry out training.

Step A2: randomly select a subset $O_N$ from the data set $O$ at a preset ratio. In general, $N \sim \mathrm{uniform}[1, \dots, n]$; that is, the first $N$ data points of $O$ are taken as the subset $O_N$.

Step A3: use the data set $O$ as the training set of the decoder g and the subset $O_N$ as the training set of the encoder h.

Step A4: use the negative conditional log probability as the loss function to be minimized, that is:

$$\mathcal{L}(\theta) = -\mathbb{E}_{f \sim P}\Big[\mathbb{E}_{N}\big[\log P_\theta(\{y_i\}_{i=1}^{n} \mid O_N, \{x_i\}_{i=1}^{n})\big]\Big]$$

where $O_N = \{(x_i, y_i)\}_{i=1}^{N}$ is the subset selected in step A2. This loss function is obtained by minimizing the negative conditional log probability; its formula is the conventional one, and the present invention does not modify it.

Step A5: update the parameters of the conditional neural network model with the Adam optimizer according to the loss function; after each round of model training iterations, return to steps A2 to A5 and continue selecting subsets, training the model, and updating the parameters.

Step A6: when the maximum number of algorithm iterations is reached, training ends and the conditional neural network model of the task is obtained.

After training, an unknown point $x^*$ is input to the decoder g of the conditional neural network model, which predicts its objective function value, for example the predicted mean and variance of the objective function value.
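Steps A2 to A5 can be illustrated with a deliberately tiny model whose gradients can be written by hand, so that the sketch needs no deep learning library. The assumptions are substantial and should be read as such: the "encoder" is just the mean of the context outputs, the "decoder" is an affine map `a * r + b` with a single learned log-variance `c`, the context is a random sample of size N rather than the first N elements, and the data values, learning rate, and step count are invented. What the sketch does show faithfully is the training pattern: draw a random context size, condition on that subset, score the full data set with a Gaussian negative log-likelihood, and apply an Adam update.

```python
import math
import random


def nll_and_grads(theta, context_y, targets_y):
    """Mean Gaussian negative log-likelihood of the targets given the context,
    with hand-derived gradients for theta = [a, b, c]."""
    a, b, c = theta
    r = sum(context_y) / len(context_y)   # aggregate the context by averaging
    mean = a * r + b                      # predicted mean
    var = math.exp(c)                     # predicted variance
    loss = grad_a = grad_b = grad_c = 0.0
    for y in targets_y:
        err = y - mean
        loss += 0.5 * (math.log(2.0 * math.pi) + c + err * err / var)
        grad_a += -err / var * r
        grad_b += -err / var
        grad_c += 0.5 * (1.0 - err * err / var)
    n = len(targets_y)
    return loss / n, [grad_a / n, grad_b / n, grad_c / n]


def adam_step(theta, grads, m1, m2, t, lr=0.02, beta1=0.9, beta2=0.999, eps=1e-8):
    """One in-place Adam update with bias-corrected moment estimates."""
    for i, g in enumerate(grads):
        m1[i] = beta1 * m1[i] + (1.0 - beta1) * g
        m2[i] = beta2 * m2[i] + (1.0 - beta2) * g * g
        m_hat = m1[i] / (1.0 - beta1 ** t)
        v_hat = m2[i] / (1.0 - beta2 ** t)
        theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def train(data_y, n_steps=400, seed=0):
    rng = random.Random(seed)
    theta, m1, m2 = [0.0, 0.0, 0.0], [0.0] * 3, [0.0] * 3
    for t in range(1, n_steps + 1):
        n_context = rng.randint(1, len(data_y))    # N ~ uniform[1, ..., n]
        context = rng.sample(data_y, n_context)    # encoder sees the subset
        _, grads = nll_and_grads(theta, context, data_y)  # decoder scores all data
        adam_step(theta, grads, m1, m2, t)
    return theta


data_y = [1.8, 2.1, 2.0, 2.2, 1.9]  # invented observations
initial_loss, _ = nll_and_grads([0.0, 0.0, 0.0], data_y, data_y)
theta = train(data_y)
final_loss, _ = nll_and_grads(theta, data_y, data_y)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

In a real conditional neural network model the encoder and decoder would be neural networks and the gradients would come from backpropagation, but the loop around them would look the same.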

II. The multi-task conditional neural network model:

Gaussian processes suit small sample sizes but suffer from the high computational complexity of the covariance function, and the conditional neural network model described above can only be trained on a single task. To train on multiple tasks, the present invention extends the conditional neural network model into the multi-task conditional neural network model (Multi-task Conditional Neural Processes in One-to-many case, OMc-MTCNPs). Multi-task learning not only alleviates the problem of insufficient data but also improves the learning ability of each individual task through mutual learning between tasks, while avoiding the complicated covariance computation.

In other words, the present invention combines multi-task learning with the strong learning and prediction ability of the conditional neural network model to propose the multi-task conditional neural network model, and applies it within the optimization framework of the Bayesian decision algorithm to optimize the hyperparameters and obtain the optimal ones. Both the conditional neural network model and the multi-task conditional neural network model are highly extensible: for simple low-dimensional data, fully connected layers already give good performance, while for complex high-dimensional data, convolutional layers can be used to extract information efficiently.

1. First, the network structure of the multi-task conditional neural network model is given, as shown in FIG. 3.

Suppose there are M tasks, and let $\{(X, Y_l)\}_{l=1}^{M}$ denote the training set. Every task samples the same set $X$, and $Y_l$ is the set of objective function values of the point set $X$ on the $l$-th task.

In other words, the multi-task conditional neural network model groups the conditional neural network models of the multiple tasks together and joins the output layers of those models through the similarity network layer k.

The similarity network layer k consists of a fully connected network. By updating its parameters it learns a similarity measure between the tasks, and it uses this similarity information to form a linear combination of the per-task predictions as the final result. That is, the similarity network layer k connects the mean matrix $\hat{m}$ and the variance matrix $\hat{v}$ output by the individual conditional neural network models through a neural network, so that the mean matrix $m$ and the variance matrix $v$ output by the whole model are given by

$$m = w_m \hat{m} + b_m$$

and

$$v = w_v \hat{v} + b_v$$

respectively, where $w_m$ and $b_m$ are the weight matrix and bias matrix of the mean connection, and $w_v$ and $b_v$ are the weight matrix and bias matrix of the variance connection. After passing through the similarity network layer k, the final output of every task therefore depends on the other tasks.
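The linear combination performed by the similarity network layer can be sketched for a single prediction point as a matrix-vector product over the tasks. The weights below are hand-picked for illustration (in the model they are learned), the three per-task means and variances are invented, and the choice of non-negative weights with zero bias, which keeps the combined variances positive, is an assumption of this sketch rather than something the text specifies.

```python
def similarity_layer(per_task_values, weights, bias):
    """out[l] = sum_j weights[l][j] * per_task_values[j] + bias[l]."""
    return [
        sum(w * value for w, value in zip(row, per_task_values)) + b
        for row, b in zip(weights, bias)
    ]


# Invented per-task predictions at one point, for M = 3 tasks.
task_means = [0.2, 0.5, 0.9]
task_variances = [0.04, 0.01, 0.09]

# Illustrative weights: each task mostly keeps its own prediction and
# borrows a little from the other two.
w_m = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
b_m = [0.0, 0.0, 0.0]
w_v = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
b_v = [0.0, 0.0, 0.0]

m = similarity_layer(task_means, w_m, b_m)
v = similarity_layer(task_variances, w_v, b_v)
print(m, v)
```

With identity weight matrices and zero biases the layer would return each task's own prediction unchanged, which corresponds to training the tasks independently.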

The multi-task conditional neural network model uses the experience learned from similar, related tasks: through an appropriate loss function, the information of one task is propagated effectively into the network models of the other tasks, so that information is shared between tasks. Because of the multi-task structure, the amount of observation data grows by a factor of M, and every task can use the information shared by the other tasks. This prevents a task from learning from scratch on its own data alone, and the experience gained from handling similar problems can also be integrated by updating the network. FIG. 3 shows only a single added fully connected layer representing the correlation between tasks, but for complex problems the network structure or number of layers in the similarity network layer k can be changed flexibly to suit the tasks. When the relationship between tasks is hard to express, more hidden layers can be added so that the similarity network layer k learns the mathematical distribution characteristics of all the data. These characteristics are represented in the hidden nodes, which are activated during forward propagation to make the tasks learn from one another.

2. Mathematical analysis of the multi-task conditional neural network model

Suppose we test M tasks. For the $l$-th task in the multi-task conditional neural network model, there are observation data with $n$ distinct inputs, $O_l = \{(x_{l,i}, y_{l,i})\}_{i=1}^{n}$, and unobserved data $T_l$.

Let $Q_{l\theta}$ be the probability distribution over functions $f_l: X_l \to Y_l$, where $f_l$ is the objective function of the $l$-th task, $O_l = \{X_l, Y_l = f_l(X_l)\}$ is the observation set of the $l$-th task, $L_l$ is the loss function of the $l$-th task, and $\theta$ is the vector of all parameters that define $Q$.

Within the multi-task conditional neural network model, the conditional neural network model of each task has a mathematical structure similar to formulas (2), (3) and (4) above.

For the l-th task, the similarity network layer k uses a parameterized structure whose parameters have a meaning similar to those of the single-task conditional neural network model; its output is the final result of the l-th task after processing by the similarity network layer.

3. The training process of the multi-task conditional neural network model is as follows:

In the multi-task conditional neural network model, the input observation data of every task are the same, that is, X = X_1 = X_2 = … = X_M. The underlying assumption is that the X fed to each task must be identical for the results of the tasks to be related to one another; otherwise noise is introduced. For example, if x1 is fed to task 1 and a different value x2 is fed to task 2, the two results have no definite relationship. For this reason, the input observation data of all tasks are set to be the same in this application.

During training, the ratio σ of encoder training data selected from the training data set is adjusted as appropriate. This lets the multi-task conditional neural network model learn the observation data thoroughly while avoiding overfitting, and preserves the model's uncertainty over the decision space X. Provided that randomness is maintained, repeatedly drawing random subsets of the data set at the set ratio and using them as the encoder training set improves the generalization of the model.
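The random selection of an encoder training set at ratio σ can be sketched as follows. This is only an illustration: the helper name `sample_encoder_set`, the representation of an observation as an `(x, y)` pair, and the rounding rule for the subset size are assumptions, not details given in the patent.

```python
import random
from typing import List, Sequence, Tuple

Observation = Tuple[Sequence[float], float]  # (input point x, objective value y)


def sample_encoder_set(observations: List[Observation],
                       ratio: float,
                       rng: random.Random) -> List[Observation]:
    """Draw a random subset of the observation set for the encoder.

    ratio plays the role of sigma: the fraction of observations to keep.
    At least one observation is always returned (an assumption of this sketch).
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    size = max(1, round(ratio * len(observations)))
    return rng.sample(observations, size)  # sampling without replacement


# Hypothetical observation set with 10 one-dimensional points.
observation_set = [((float(i),), float(i) ** 2) for i in range(10)]
rng = random.Random(0)
for _ in range(3):  # a fresh subset is drawn each time
    print(sample_encoder_set(observation_set, 0.5, rng))
```

Drawing a new subset on every call is what gives the encoder a different view of the data across training rounds.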

As before, training the model and updating its parameters requires minimizing a loss function.

The negative conditional log probability is minimized, with the loss of each task weighted by w_l, where w_l is the weight of the loss function of the l-th task. To balance the contribution of each task, every weight is set to 1.
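The weighted multi-task loss can be sketched as below. The sketch assumes that each task's model predicts a Gaussian (a mean and a standard deviation) at every target point, so that the negative conditional log probability is a Gaussian negative log-likelihood; the function names are hypothetical, and the exact per-task loss used in the patent is given by formulas that are not reproduced here.

```python
import math
from typing import List, Optional


def gaussian_nll(y: float, mean: float, std: float) -> float:
    """Negative log density of y under a normal distribution N(mean, std^2)."""
    var = std * std
    return 0.5 * math.log(2.0 * math.pi * var) + (y - mean) ** 2 / (2.0 * var)


def multitask_loss(targets: List[List[float]],
                   means: List[List[float]],
                   stds: List[List[float]],
                   weights: Optional[List[float]] = None) -> float:
    """Weighted sum over tasks of the mean per-point negative log-likelihood.

    targets[l][i], means[l][i] and stds[l][i] belong to point i of task l.
    weights defaults to 1 for every task, as in the text above.
    """
    n_tasks = len(targets)
    if weights is None:
        weights = [1.0] * n_tasks
    total = 0.0
    for l in range(n_tasks):
        task_nll = sum(
            gaussian_nll(y, m, s)
            for y, m, s in zip(targets[l], means[l], stds[l])
        ) / len(targets[l])
        total += weights[l] * task_nll
    return total


# Hypothetical predictions for two tasks with two target points each.
print(multitask_loss([[1.0, 2.0], [0.5, 0.0]],
                     [[0.9, 2.2], [0.4, 0.1]],
                     [[0.3, 0.3], [0.2, 0.2]]))
```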

Figure 4 shows the training of the multi-task conditional neural network model. When the multi-task model is trained, the conditional neural network model of each task is trained along with it, and each of these models can independently learn relevant data set features from the other tasks.

That is, the conditional neural network model of each task is trained first, and the tasks are then combined and trained as a whole. The parameters of every task's conditional neural network model are updated together by backpropagation under the minimized loss function, which distributes the information of the different tasks into the parameters of each task's model. At the same time, the parameters of the similarity network layer k are also updated, and each of them tends to express the correlation between tasks.

The specific training steps of the multi-task conditional neural network model are as follows:

Step B1: Set network parameters such as the number of model training iterations, the maximum number of algorithm iterations, and the loss function to be minimized. This loss function is the sum of the loss functions of all tasks.

Step B2: According to the preset ratio, select for each task the corresponding decoder training set O and encoder training set O^{σn} for model training.

Step B3: Using the minimized loss function and the Adam optimizer with backpropagation, update the parameters of each task's conditional neural network model and the parameters of the similarity network layer k, that is, the parameters associated with the dashed lines and dashed boxes in Figure 4.

Step B4: Once the number of model training iterations has been reached, return to step B2, randomly re-select the encoder training set, and repeat model training and parameter updating, and so on.

Step B5: When the maximum number of algorithm iterations is reached, network training is complete and the multi-task conditional neural network model is obtained.
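The control flow of steps B1 to B5 can be sketched as two nested loops. This is a skeleton only: `update_step` is a hypothetical callable standing in for one Adam update of all per-task models and the similarity layer, and drawing the encoder subset independently for each task is an assumption of the sketch.

```python
import random
from typing import Callable, List, Sequence, Tuple

Observation = Tuple[Sequence[float], float]  # (input point x, objective value y)
TaskSets = List[List[Observation]]           # one list of observations per task


def train_multitask(observation_sets: TaskSets,
                    ratio: float,
                    algorithm_iters: int,
                    training_iters: int,
                    update_step: Callable[[TaskSets, TaskSets], None],
                    seed: int = 0) -> int:
    """Run the B1-B5 loop and return the number of parameter updates performed."""
    rng = random.Random(seed)
    updates = 0
    for _ in range(algorithm_iters):                 # B5: stop after this many rounds
        # B2: the decoder uses the full observation set of each task,
        # the encoder a random subset drawn at the preset ratio.
        encoder_sets = [
            rng.sample(obs, max(1, round(ratio * len(obs))))
            for obs in observation_sets
        ]
        for _ in range(training_iters):              # B3: repeated parameter updates
            update_step(encoder_sets, observation_sets)
            updates += 1
        # B4: fall through to the next round, which re-samples the encoder sets.
    return updates


# Usage with a stand-in update step that only counts how often it is called.
log = []
data = [[((float(i),), float(i)) for i in range(10)] for _ in range(2)]
train_multitask(data, 0.3, algorithm_iters=4, training_iters=5,
                update_step=lambda enc, dec: log.append(len(enc[0])))
print(len(log), log[0])
```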

Once trained, the multi-task conditional neural network model can make predictions at any unknown point x_*, giving the objective function value of that point on every task, for example the predicted mean and variance on each task.

As shown in Figure 5, the decoder of each task's conditional neural network model takes the unknown point x_* as input, while the encoder still takes the corresponding training set as input; this part reads the information in the training set to support the subsequent prediction. Because information is shared between tasks through the similarity network layer k, the predictions output by each task's conditional neural network model should be clearly better than those of a model trained on a single task.

Therefore, to optimize hyperparameters, the present invention provides a hyperparameter optimization method based on the multi-task conditional neural network model, in which the model is applied within the framework of the Bayesian decision algorithm. As shown in Figure 6, and as described above, the multi-task conditional neural network model has excellent predictive performance; in the present invention it serves as the surrogate model in the Bayesian decision algorithm framework to optimize the hyperparameters.

The method specifically includes the following steps:

Step S1: Set the network parameters, for example the dimension d of the parameters, the number n of initial data points for each task (usually n = 11 × d − 1), the number of tasks M, the maximum number of iterations T, and the objective function f_l of the l-th task.

Step S2: Obtain the initial value set of each task from the dimension of the parameters and the number of initial data points for each task.

For example, suppose the parameter dimension is 2, that is, a two-dimensional parameter [x1, x2], where x1 ranges over [0, 6] and x2 over [6, 10]. To generate 4 initial points, let x1 take the values 2 and 4 and x2 take the values 8 and 9. This gives the initial points [2, 8], [2, 9], [4, 8] and [4, 9]. In other words, the decision space is divided evenly, a point is chosen in each part, and the resulting points form the initial point set.
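The example above amounts to taking every combination of the chosen per-dimension values. A minimal sketch, in which the function name `initial_points` is hypothetical and the per-dimension values are taken as given rather than computed from the ranges:

```python
from itertools import product
from typing import List, Sequence


def initial_points(values_per_dim: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return every combination of the chosen values, one value per dimension."""
    return [list(point) for point in product(*values_per_dim)]


# x1 takes 2 and 4, x2 takes 8 and 9, as in the example.
print(initial_points([[2, 4], [8, 9]]))  # [[2, 8], [2, 9], [4, 8], [4, 9]]
```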

Step S3: Input each initial value in the initial value set of each task into the objective function of the corresponding task to compute the objective function value, and form the observation set of each task from the task's initial values and the corresponding objective function values.
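Forming the observation sets can be sketched as evaluating every task's objective function on the shared initial points. The objective functions below are hypothetical placeholders; in the patent's setting an objective would be, for instance, the accuracy of a network trained with the given hyperparameters.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Observation = Tuple[Point, float]  # (input point x, objective value y)


def build_observation_sets(points: Sequence[Sequence[float]],
                           objectives: Sequence[Callable[[Sequence[float]], float]]
                           ) -> List[List[Observation]]:
    """Evaluate each task's objective on the same points, one set per task."""
    return [[(tuple(x), f(x)) for x in points] for f in objectives]


# Two placeholder objectives evaluated on the initial points of the example.
points = [[2, 8], [2, 9], [4, 8], [4, 9]]
objectives = [lambda x: x[0] + x[1], lambda x: x[0] * x[1]]
for task_set in build_observation_sets(points, objectives):
    print(task_set)
```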

Step S4: According to the preset ratio, select a subset from the observation set of each task. Use the observation set as the decoder training set and the subset as the encoder training set to train the network model, obtaining the multi-task conditional neural network model.

Step S5: Use the multi-task conditional neural network model to predict the value of any point x in the unknown region on each task; that is, input the point into the conditional neural network model of each task to obtain the corresponding objective function value, for example the mean and the variance σ_{l=1:M}(x).

In this way, each task is associated with multiple unknown points and their predicted objective function values, where each unknown point is a parameter combination of the task.

Step S6: Use the particle swarm algorithm to find the optimal parameter combination of each task. Because each task has many candidate parameter combinations, the unknown points are screened with the particle swarm algorithm on the basis of all unknown points of each task and their objective function values, which yields the best point of each task, that is, its optimal parameter combination.
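A particle swarm search over a box-shaped decision space can be sketched as below. This is a generic textbook variant, not the patent's specific configuration: the swarm size, iteration count, inertia and acceleration coefficients, and the clipping of positions to the bounds are all assumptions, and `pso_maximize` is a hypothetical name.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_maximize(fitness: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 30,
                 n_iters: int = 100,
                 inertia: float = 0.7,
                 c_personal: float = 1.5,
                 c_global: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Return the best position found and its fitness (larger is better)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # best position seen by each particle
    best_val = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Keep the particle inside the decision space.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Usage on a toy fitness with its maximum at (1, -2).
best, value = pso_maximize(lambda p: -((p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2),
                           [(-5.0, 5.0), (-5.0, 5.0)])
print(best, value)
```

In the method described here the fitness would be evaluated on the surrogate model's predictions rather than on the true objective, which is what keeps the search cheap.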

Further, the present invention also uses the acquisition function EI (expected improvement) to screen the unknown points. Because EI has good expected-improvement properties, it is used as the fitness function of the particle swarm algorithm. Specifically, the EI of each task is computed from the predicted objective function value of every point in the unknown region on that task and used as the fitness function of the particle swarm algorithm; the particle swarm algorithm then selects the point with the largest EI value as the candidate point of each task. The candidate point of each task is the optimal parameter combination of that task.
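The patent does not spell out the EI formula in this passage. The sketch below uses the standard closed form of expected improvement for a Gaussian prediction in a maximization problem, which is an assumption here; it takes the predicted mean and standard deviation (the square root of the variance) at a point and the best objective value observed so far.

```python
import math


def expected_improvement(mean: float, std: float, best_observed: float) -> float:
    """Standard expected improvement for maximization under N(mean, std^2)."""
    if std <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(mean - best_observed, 0.0)
    z = (mean - best_observed) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_observed) * cdf + std * pdf


# A point predicted slightly below the current best can still have positive EI
# when its predictive uncertainty is large.
print(expected_improvement(mean=0.90, std=0.05, best_observed=0.92))
print(expected_improvement(mean=0.90, std=0.001, best_observed=0.92))
```

Used as a particle swarm fitness function, this value would be computed from the surrogate model's prediction at each particle position.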

Step S6: Input the optimal parameter combination of each task into the objective function of the corresponding task to compute the true objective function value, and add the optimal parameter combination and its objective function value to the observation set of the corresponding task to form a new observation set. Then continue with steps S3 to S6 using the new observation set. For example, for the m-th task, the evaluated point and its objective function value are added to the observation set, after which model training, prediction and screening are carried out again.

As shown in Figure 7, once the optimal parameter combination of each task has been found, it is truly evaluated to compute the corresponding objective function value. The optimal parameter combination of each task and its objective function value are then added to the observation set of that task, and model training, prediction, screening and evaluation are performed again. True evaluation here means that, after the decision variables are obtained, they are substituted into the test problem and computed anew; that is, the optimal parameter combination is input into the objective function of the corresponding task to obtain the corresponding objective function value.

Step S7: Repeat the above steps in a loop. When the maximum number of algorithm iterations T is reached, find in the new observation set of each task the point at which the task's objective function value is largest. This parameter combination is the optimal hyperparameter of the task.
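The outer loop of the method, from candidate selection to the final choice of the best observed point, can be sketched as follows. Model training, prediction and screening are hidden behind `propose_candidate`, a hypothetical callable introduced only for this illustration; the real evaluation and the final arg-max follow the steps described above.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Observation = Tuple[Point, float]  # (parameter combination x, objective value y)


def optimize_hyperparameters(
        objectives: Sequence[Callable[[Point], float]],
        observation_sets: List[List[Observation]],
        max_iters: int,
        propose_candidate: Callable[[int, List[List[Observation]]], Point],
) -> List[Observation]:
    """Grow each task's observation set and return its best observed point."""
    for _ in range(max_iters):
        # Train, predict and screen (stand-in): one candidate point per task.
        candidates = [propose_candidate(l, observation_sets)
                      for l in range(len(objectives))]
        # True evaluation: run the real objective and extend the observation set.
        for l, x in enumerate(candidates):
            observation_sets[l].append((x, objectives[l](x)))
    # Final step: the point with the largest objective value in each task's set.
    return [max(obs, key=lambda pair: pair[1]) for obs in observation_sets]


# Usage with placeholder objectives and a trivial deterministic proposal rule.
objectives = [lambda x: -(x[0] - 1.0) ** 2, lambda x: -(x[0] - 3.0) ** 2]
observations = [[((0.0,), f((0.0,)))] for f in objectives]
result = optimize_hyperparameters(
    objectives, observations, max_iters=5,
    propose_candidate=lambda l, sets: (float(len(sets[l])),))
print(result)
```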

With the hyperparameters of each task optimized as above, the multi-task conditional neural network model can accurately predict the objective function value of any point on each task, for example its mean and variance.

To further illustrate the hyperparameter optimization method, experimental tests are also provided, which demonstrate the effectiveness and accuracy of the present invention relative to prior-art hyperparameter optimization. The details are as follows:

The multi-class classification performance of the LeNet-5 network is tested on the MNIST and Fashion-MNIST data sets.

The experiment defines two hyperparameter problems. In the first, the learning rate and the bias are the two variable hyperparameters, and all other hyperparameters are fixed at their default values; the second adds the dropout value to the first. In the tables, Q1 and Q2 denote these two problems, and the mean and standard deviation (in parentheses) of the accuracy of the three tasks on the test set are reported.

The experiment applies two surrogate models, the conditional neural network model and the multi-task conditional neural network model, to hyperparameter optimization within the Bayesian decision algorithm framework. The goal is to find a hyperparameter combination that gives the network a high prediction accuracy, so that the experimental results can verify the effectiveness and accuracy of the multi-task conditional neural network model described here for hyperparameter optimization. In addition, because a neural network may give different results on each run, every accuracy value is the average over ten runs in order to reduce this variation.

Table 1: Mean and standard deviation of the neural network test accuracy on MNIST

Table 2: Mean and standard deviation of the neural network test accuracy on Fashion-MNIST

According to Tables 1 and 2, hyperparameter optimization based on the multi-task conditional neural network model (the OMc-MTCNPs model) outperforms both hyperparameter optimization based on the conditional neural network model (CNPs) and traditional hyperparameter optimization based on the Gaussian process. The hyperparameter optimization method based on the multi-task conditional neural network model proposed by the present invention is therefore better.

In other words, the present invention uses the conditional neural network model as the surrogate model in the Bayesian decision algorithm framework in place of the traditional Gaussian process, achieving single-task hyperparameter optimization, and extends the conditional neural network model into a multi-task conditional neural network model through the similarity network layer, achieving multi-task hyperparameter optimization. The similarity network layer proposed for multi-task learning differs from a conventional regularization constraint in that it is itself a neural network: it can parameterize the similarity between tasks and share the different information learned by each task to improve the generalization performance of every task.

In addition, the present invention builds the single-task model with a conditional neural network model instead of the traditional Gaussian process. Because the conditional neural network model consists of several layers of a specific network structure, it avoids the complex covariance computation of the Gaussian process. Its ability to learn and fit functions is also stronger than that of the Gaussian process, and no suitable prior distribution needs to be set in advance. (A Gaussian process requires a distribution model, the Gaussian distribution, to be assumed for the data set, whose parameters are then updated from the data, and this model does not necessarily suit the data set at hand. A conditional neural network model only fixes a network structure and updates its parameters, which makes it highly adaptive.) Combining the conditional neural network model with multi-task learning therefore yields a multi-task conditional neural network model with greatly improved learning ability, which achieves hyperparameter optimization while avoiding this computational complexity.

The present invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the deep neural network hyperparameter optimization method described herein.

The present invention also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the deep neural network hyperparameter optimization method described herein.

The above embodiments are only preferred embodiments of the present invention and do not limit its scope of protection. Any insubstantial changes and substitutions made by those skilled in the art on the basis of the present invention fall within the scope of protection claimed by the present invention.

Claims

A method for optimizing hyperparameters of a deep neural network, characterized in that it comprises the following steps:

Parameter setting step: set the number of training tasks, the objective function of each task, the number of initial values of each task, and the maximum number of iterations;

Model training step: obtain the observation set of each task according to the number of initial values and the objective function of each task, and train the network model according to the observation set of each task to obtain a multi-task conditional neural network model;

Prediction step: predict any point in the unknown region according to the multi-task conditional neural network model to obtain the objective function value of each point on each task;

Screening step: screen the objective function values of each task according to the particle swarm algorithm and all points in the unknown region to obtain candidate points for each task;

Iteration step: substitute the candidate points of each task into the multi-task conditional neural network model to calculate the true values, and add the candidate points of each task and the calculated true values to the observation set of each task to form a new observation set for each task; then execute the model training step, the prediction step, the screening step and the iteration step in sequence according to the new observation set until the maximum number of iterations is reached, and select, from the observation set formed by the last iteration, the parameter combination corresponding to the largest response value as the hyperparameter combination.

The method for optimizing hyperparameters of a deep neural network according to claim 1, characterized in that the multi-task conditional neural network model is formed by having the conditional neural network models of the tasks learn from each other during training through multi-task learning.

The method for optimizing hyperparameters of a deep neural network according to claim 2, wherein the multi-task conditional neural network model is formed by combining the output layers of the conditional neural network models of multiple tasks with a similarity network layer; the similarity network layer is composed of a fully connected network.

The method for optimizing hyperparameters of a deep neural network according to claim 1, wherein the model training step is specifically:

Step S11: Set the number of model training iterations, the maximum number of algorithm iterations, and the minimized loss function; the minimized loss function is the sum of the minimized loss functions of all tasks; the minimized loss function of each task is the minimized negative conditional log probability;

Step S12: randomly select the corresponding initial value from the observation set of each task according to the preset ratio, generate the decoder training set of the corresponding task, and use the observation set of each task as the encoder training set of the corresponding task;

Step S13: Perform network model training according to the decoder training set and encoder training set of each task, and update the network parameters of the conditional neural network model of each task and the parameters of the similarity network layer according to the minimized loss function;

Step S14: When the network model training reaches the number of model training iterations, return to perform step S12 to step S14 in sequence; until the maximum number of algorithm iterations is reached, a multi-task conditional neural network model is obtained.

The method for optimizing hyperparameters of a deep neural network according to claim 4, wherein:

The step S13 further includes: updating the network parameters of the conditional neural network model and the similarity network layer of each task according to the minimized loss function and using the Adam optimizer back propagation algorithm.

The method for optimizing hyperparameters of a deep neural network according to claim 1, characterized in that the observation set is generated as follows: first, the objective function values of each task are generated according to the objective function of each task and the number of initial values; the observation set of each task is then obtained from the initial values and the objective function values of each task.

The method for optimizing hyperparameters of a deep neural network according to claim 1, characterized in that the screening step is: calculate the acquisition function EI of each task according to the objective function value of each task at each point in the unknown region, use the acquisition function EI of each task as the fitness function of the particle swarm algorithm, and then select, according to the particle swarm algorithm, the point with the largest value of the acquisition function EI as the candidate point of each task.

An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the deep neural network hyperparameter optimization method according to any one of claims 1-7.

A computer-readable storage medium with a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the deep neural network hyperparameter optimization method according to any one of claims 1-7.