CN109800875A - Chemical process fault detection method based on particle swarm optimization and a denoising sparse autoencoder - Google Patents
- Publication number: CN109800875A
- Application number: CN201910016558.1A
- Authority: CN (China)
- Prior art keywords: training, layer, fault detection, model, matrix
- Legal status: Pending (status assumed by Google; Google has not performed a legal analysis)
Abstract
Description
Technical Field
The present invention relates to the field of chemical process fault detection and diagnosis, and in particular to a chemical fault detection method based on particle swarm optimization and a denoising sparse autoencoder.
Background Art
Chemical process fault detection, one of the most powerful tools for managing abnormal operating conditions in chemical processes, underpins process-safety early warning and prevents large economic losses. According to US regulatory estimates, abnormal operating conditions cost the US petroleum and chemical industries at least about 20 billion US dollars per year, and in the United Kingdom the annual losses from abnormal conditions reach 27 billion US dollars. Developing high-performance chemical process fault detection methods is therefore of critical importance to actual chemical production.
Chemical process data are nonlinear, high-dimensional, and non-Gaussian, which makes extracting fault information during detection more complicated, and many traditional chemical process fault detection methods based on process data have been developed. Such data-driven methods do not require a large amount of expert knowledge in advance: given only the data collected from the chemical process and an appropriate fault detection model, the current state of the system can be predicted, so these methods are widely used in both scientific research and industrial applications. Although popular traditional methods such as PCA, ICA, KPCA, KICA, and MICA can detect certain faults effectively, their detection rates for some disturbance-type faults are extremely low, indicating that the traditional methods still fail to extract the information of these faults fully and accurately; new methods are therefore needed to improve fault detection rates. Because of the nonlinearity, high noise, and non-Gaussian distribution of complex chemical processes, traditional fault detection methods do not deliver good diagnostic performance, and it is necessary to develop fault monitoring methods suited to complex nonlinear chemical processes. The greedy layer-by-layer learning of deep neural networks can learn the features hidden in raw chemical process data more accurately, so that, with an appropriate classification model, monitoring of the chemical process can be realized.
Since deep learning was proposed, many computer scientists have turned to research on deep neural networks. The key difference between deep and shallow neural networks is that deep networks are trained layer by layer in an unsupervised manner; this gives excellent feature-learning ability, and the learned features capture the data more essentially, which benefits classification. Autoencoders have been a popular research direction in deep learning in recent years: because of their excellent feature-learning performance they are widely used in handwriting recognition, image classification, and audio feature extraction, and they are gradually spreading to other fields; in mechanical fault diagnosis, for example, researchers have begun applying autoencoders or improved autoencoders for feature learning. In 2006, the well-known deep learning researcher Hinton and colleagues proposed the autoencoder (AE) deep neural network algorithm and applied it to image recognition with good classification results, and many researchers subsequently proposed improved variants of this basic autoencoder. For example, Bengio et al. proposed the sparse autoencoder (SAE), which adds a sparsity constraint so that the learned features can reconstruct the original data more accurately; Vincent et al. proposed the denoising autoencoder (DAE) in 2008, which injects noise during unsupervised feature learning to make the learned features more robust. The stacked denoising sparse autoencoder (SDSA), a popular improved autoencoder of the last three years, stacks multiple denoising sparse autoencoders for layer-by-layer training and layer-by-layer learning. Taking the denoising sparse autoencoder as its building block, the algorithm trains the weight parameters and bias terms of each layer in an unsupervised, layer-by-layer fashion, building a deep neural network whose learned features cope with highly noisy data and remain sparse. Because training is unsupervised, the algorithm is more automated and intelligent in feature learning than traditional manual feature extraction; and because its loss function is based on minimizing information loss, the learned features are highly accurate and can predict the model's activation values more precisely.
Because the method disclosed in the present invention involves many manually adjustable hyperparameters, and selecting them by manual trial is random and time-consuming, a single-objective discrete optimization over the multiple parameters is performed so that the method tunes itself automatically, making it more intelligent and better-performing. Commonly used parameter optimization algorithms include the genetic algorithm, the artificial fish swarm algorithm, and the particle swarm algorithm. Particle swarm optimization (PSO), proposed by Kennedy et al. in 1995, is a bionic algorithm simulating the foraging behavior of bird flocks; thanks to its fast convergence, simple implementation, and easily understood optimization process, it has been widely applied in science and engineering and has become one of the fastest-growing intelligent parameter optimization algorithms.
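As a rough illustration of the PSO update rule mentioned above (inertia plus a cognitive term toward each particle's personal best and a social term toward the global best), the following is a minimal sketch on a toy objective; it is not the patent's exact optimizer, and all names and parameter values are illustrative:

```python
import numpy as np

def pso_minimize(objective, lb, ub, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization over a box [lb, ub]."""
    rng = np.random.default_rng(seed)
    dim = len(lb)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    x = rng.uniform(lb, ub, size=(n_particles, dim))   # particle positions
    v = np.zeros_like(x)                               # particle velocities
    pbest = x.copy()                                   # personal bests
    pbest_val = np.array([objective(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()             # global best
    g_val = pbest_val.min()
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # velocity update: inertia + cognition (pbest) + social (gbest)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)
        vals = np.array([objective(p) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        if vals.min() < g_val:
            g, g_val = x[np.argmin(vals)].copy(), vals.min()
    return g, g_val

# toy check: the sphere function has its minimum at the origin
best_x, best_f = pso_minimize(lambda p: np.sum(p**2), lb=[-5, -5], ub=[5, 5])
```

In the patent's setting the objective would be a fault detection performance measure evaluated after training the SDSA with a candidate hyperparameter vector, rather than the toy sphere function used here.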
Summary of the Invention
The purpose of the present invention is to address the deficiencies of the prior art by providing a chemical fault detection method based on particle swarm optimization and a denoising sparse autoencoder. The method applies the stacked denoising sparse autoencoder (SDSA) algorithm from deep learning to feature learning for the chemical process, then trains a Softmax classifier model in a supervised manner, and finally fine-tunes the weight parameters of the whole network with the backpropagation (BP) algorithm, while a particle swarm optimization algorithm automatically tunes the key adjustable hyperparameters. The method uses the greedy layer-by-layer training of deep neural networks to adaptively and intelligently learn the knowledge implicit in the raw data and thereby extract fault information; the proposed method is more intelligent, and the automatic optimization saves considerable time compared with manual tuning.
The purpose of the present invention can be achieved by the following technical solution:
A chemical fault detection method based on particle swarm optimization and a denoising sparse autoencoder, the method comprising the following steps:
Step 1. Data collection:
Historical time-series data collected from a simulation system or a DCS system serve as the training sample set X_train, and real-time chemical process data from the DCS system serve as the test sample set X_test. The collected training set X_train contains time-series data under normal conditions and under various fault conditions, and is used to build the intelligent fault detection model of this method; the test set X_test consists of real-time operating data monitored online, also containing time-series data under normal and fault conditions, and is used to verify the diagnostic accuracy of the method or, in actual industry, to run fault detection with the model built by this method.
Step 2. Data preprocessing:
First compute the mean X_mean and standard deviation X_std of each monitored variable over the normal-condition data in the training set X_train; then standardize both X_train and X_test using this mean X_mean and standard deviation X_std. The standardized training set X_trainstd and test set X_teststd are then whitened, yielding the whitened training set X_trainwhite and test set X_testwhite, which completes the preprocessing of the training and test sample sets.
Step 3. Offline training:
The purpose of offline training is to build the intelligent chemical process fault detection model from the preprocessed training samples. The process comprises four parts: unsupervised pre-training of the stacked denoising sparse autoencoder, supervised pre-training of the Softmax classifier, global fine-tuning of the network parameters with the BP algorithm, and particle swarm optimization of the adjustable hyperparameters. First, the stacked denoising sparse autoencoder is pre-trained without supervision: N denoising sparse autoencoders encode the preprocessed training set into feature space layer by layer, and the loss function of each layer is minimized during that layer's training to obtain its model parameters, finally yielding the training-sample features h_N learned by the N hidden layers. Second, the Softmax classifier is pre-trained with supervision: h_N is used as the input of the Softmax classifier model, each learned training-sample feature is given its corresponding condition label (y_i = 1 means the sample is normal, y_i = 2 means the sample is a fault), and the pre-trained Softmax model parameters are obtained by minimizing the Softmax cost function. Third, the BP algorithm fine-tunes the network parameters globally: all model parameters of the whole training process are fine-tuned with supervision, taking the unsupervised pre-trained SDSA parameters and the supervised pre-trained Softmax parameters as initial values and the preprocessed training samples as the input layer; features are first learned layer by layer through the SDSA parameters, giving the feature matrix of the final hidden layer, whose loss function value is computed through the Softmax classifier, and the backpropagation (BP) algorithm then optimizes the global parameters until the loss converges to a minimum, yielding the initially trained fault detection model. Finally, the adjustable hyperparameters are optimized with particle swarm optimization: because the manually adjustable hyperparameters of the whole intelligent fault detection model are not optimal, and improper settings may make the SDSA perform suboptimally, the particle swarm optimization (PSO) algorithm is invoked for hyperparameter tuning, finally determining the optimal hyperparameter values and all corresponding model parameters. This yields the fully trained fault detection model, which can be used for online monitoring of the process state.
Step 4. Online monitoring:
For a chemical process in real-time continuous production, the fault detection model trained in step 3 can effectively predict whether the current process is in a fault state. With the preprocessed test sample X_testwhite as the input layer, the trained fault detection model, with its determined neural network structure, learns the final test features layer by layer; the membership probabilities of the Softmax prediction function are then computed from the learned final features. If the predicted class is 1, the operating condition is normal; if the class is 2, an abnormal condition has occurred, whereupon the connected alarm device issues a warning indicating that a fault has been detected and the process operator or engineer is notified to check system safety and remove the fault in time. Fault monitoring of the current operating condition is thereby achieved.
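The online monitoring pass described above amounts to a forward pass of a preprocessed sample through the trained encoder layers followed by a Softmax membership calculation. A minimal sketch, assuming sigmoid activations and 1-based class labels as in the method's description; the dummy parameters are purely illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def monitor(x_white, weights, biases, theta):
    """Forward a preprocessed sample through the trained encoder stack,
    then score it with the softmax layer; class 1 = normal, 2 = fault."""
    h = x_white
    for W, b in zip(weights, biases):      # layer-by-layer feature learning
        h = sigmoid(W @ h + b)
    scores = theta @ h
    p = np.exp(scores - scores.max())
    p /= p.sum()                           # class-membership probabilities
    label = int(np.argmax(p)) + 1          # 1-based class index
    if label == 2:
        print("ALARM: abnormal condition detected")
    return label, p

# illustrative dummy parameters: one hidden layer, two classes
label, probs = monitor(np.zeros(4),
                       weights=[np.zeros((3, 4))],
                       biases=[np.zeros(3)],
                       theta=np.array([[1.0, 1.0, 1.0],
                                       [0.0, 0.0, 0.0]]))
```

In practice the weights, biases, and theta would come from the offline training of step 3, and the alarm branch would drive the connected warning device rather than printing.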
Further, in step 2, the standardization preprocessing of the training set X_train and the test set X_test is carried out as follows:
(1) The training set X_train is an n x m matrix, where n is the number of samples and m is the number of observed variables. The standardized training set X_trainstd and test set X_teststd are obtained from:

X_std,ij = (X_ij - X_mean,j) / X_std,j  (1)

where X_ij is the value of the j-th variable of the i-th sample in the training set X_train or the test set X_test, X_mean,j is the mean of the j-th variable over the normal-condition data in X_train, and X_std,j is the standard deviation of the j-th variable over the normal-condition data in X_train.
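The standardization of equation (1) can be sketched as follows; as the method specifies, the mean and standard deviation are fitted on normal-condition training data only (here a toy random matrix stands in for that subset):

```python
import numpy as np

def fit_standardizer(X_train_normal):
    """Per-variable mean/std from normal-condition training rows
    (rows = samples, columns = monitored variables)."""
    return X_train_normal.mean(axis=0), X_train_normal.std(axis=0)

def standardize(X, mean, std):
    """Equation (1): apply the training mean/std to any sample set."""
    return (X - mean) / std

rng = np.random.default_rng(0)
X_train = rng.normal(3.0, 2.0, size=(200, 5))   # stand-in process data
mean, std = fit_standardizer(X_train)
X_trainstd = standardize(X_train, mean, std)
```

The same `mean` and `std` would also be applied to the test set X_test, never refitted on it.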
Further, in step 2, the standardization and whitening preprocessing of the training set X_train and the test set X_test is carried out as follows:
(1) The standardized training set X_trainstd and test set X_teststd are whitened. First, the covariance matrix of the standardized training samples is eigendecomposed by the following equations, giving the orthogonal matrix of its eigenvectors and the diagonal matrix of its eigenvalues, from which the whitening matrix W_white is obtained:
Cov = V D V^T  (2)

W_white = V D^(-1/2) V^T  (3)
where Cov is the covariance matrix of the standardized training set X_trainstd, V is the orthogonal matrix of the eigenvectors of the covariance matrix, and D is the diagonal matrix of its eigenvalues;
(2) The whitened training samples X_trainwhite and test samples X_testwhite are then both computed from W_white:

X_trainwhite = X_trainstd W_white, X_testwhite = X_teststd W_white  (4)
where W_white is the whitening matrix of the whitening preprocessing, X_trainwhite is the whitened training set, and X_testwhite is the whitened test set.
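Equations (2)-(4) correspond to ZCA whitening. A sketch using NumPy's symmetric eigendecomposition; the small `eps`, added to guard against near-zero eigenvalues, is an assumption of this sketch and not part of the patent text:

```python
import numpy as np

def zca_whitening_matrix(X_std, eps=1e-8):
    """W_white = V D^(-1/2) V^T from the eigendecomposition of the
    covariance of the standardized training data (equations (2)-(3))."""
    cov = np.cov(X_std, rowvar=False)          # Cov = V diag(d) V^T
    d, V = np.linalg.eigh(cov)
    return V @ np.diag(1.0 / np.sqrt(d + eps)) @ V.T

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))  # correlated data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
W_white = zca_whitening_matrix(X_std)
X_white = X_std @ W_white                      # equation (4)
```

After whitening, the sample covariance of `X_white` is (up to `eps`) the identity matrix, i.e. the variables are decorrelated with unit variance.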
Further, in step 3, the unsupervised pre-training of the stacked denoising sparse autoencoder in the offline training proceeds as follows:
(1) The preprocessed training set X_trainwhite is an n x m matrix, with n samples and m observed variables. Let the stacked denoising sparse autoencoder (SDSA) have N hidden layers, and set the structure of its deep neural network, i.e. mainly the node counts HL_1 to HL_N of the hidden layers. The layers of the deep network are defined as: the input layer (the preprocessed training samples), N hidden layers (learning N levels of features), and the classification layer (the output prediction probabilities), so the global network can be viewed as a neural network with N+2 layers. First the weight matrix parameters of the first hidden layer are initialized, X_trainwhite is taken as the input layer of the SDSA, and the denoising sparse autoencoder of the first hidden layer (DSA_1) is trained. Gaussian noise following a normal distribution is added to the training samples X_trainwhite by the following equation, turning the training set into the noise-corrupted data set Xc:
Xc = X + le * G  (5)
where X is the training set matrix before noise is added (for the first hidden layer, X is the preprocessed training set X_trainwhite); le is the noise level, which can be set manually (its value lies in 0-1, typically 0.1); G is generated Gaussian noise with the same dimensions as the clean training matrix; and Xc is the data set containing the artificial noise;
(2) After the noise has been added, Xc is used as the input for encoding (learning) and decoding (reconstruction). Encoding is essentially the feature-learning stage, while decoding reconstructs the feature data. The encoding and decoding equations are:

h = f(W Xc + b1)  (6)

Y = f(W^T h + b2)  (7)

where h is the learned feature; Y is the information reconstructed from the feature h, and the closer its value is to the clean original data X, the better the model parameters are trained; W is the encoding weight parameter matrix; b1 and b2 are bias vectors; W^T is the decoding weight parameter matrix; and f is the activation function. Through these equations the first hidden layer's feature h1 and its model parameters W1, b11, b21 are learned;
(3) These model parameters are, however, not yet optimal, so the learned first-hidden-layer features cannot yet deeply express the information contained in the original data. It suffices to define a reasonable loss function and then keep minimizing it so that the model performs at its best. The loss function is:

L_total(W, W^T, b1, b2) = (1/n) sum_{i=1..n} (1/2) ||Y_i - X_i||^2 + (lambda_r/2) sum_l sum_{i=1..sl} sum_{j=1..s(l+1)} (W_ij^(l))^2 + beta sum_{j=1..s2} KL(rho || rho_hat_j)  (8)

KL(rho || rho_hat_j) = rho log(rho / rho_hat_j) + (1 - rho) log((1 - rho) / (1 - rho_hat_j))  (9)

rho_hat_j = (1/n) sum_{i=1..n} h_ij  (10)

where L_total(W, W^T, b1, b2) is the loss function value of the current hidden layer's denoising sparse autoencoder; n is the number of samples; Y_i is the reconstructed information vector of the i-th sample; X_i is the clean original data vector of the i-th sample; W_ij^(l) is the value in row i, column j of the weight matrix of layer l; lambda_r is the regularization weight-decay hyperparameter that adjusts the weight of this term; sl is the number of nodes in layer l of a single denoising autoencoder and s(l+1) the number of nodes in layer l+1; beta is the weight hyperparameter controlling the sparsity penalty, which must be tuned manually to a suitable value; s2 is the number of nodes of the current hidden layer, i.e. the number of features learned per sample; KL(rho || rho_hat_j) is the relative entropy, representing the gap between the current average activation and the sparsity constraint; rho is the sparsity parameter expressing the sparsity constraint, normally a small manually set value such as rho = 0.1; rho_hat_j is the average activation of the j-th variable of the feature matrix h; and h_ij is the value of the j-th variable of the i-th sample in the current hidden layer's feature matrix h;
During offline training, the MATLAB toolbox minFunc can be called to minimize the loss function of equation (8), yielding the optimized weight parameter W1 and bias terms b11, b21 of the pre-trained first hidden layer; the trained model parameters are then used for feature learning of the first hidden layer on the training samples;
(4) The three steps above complete the offline model training of the first hidden layer. Since the stacked denoising sparse autoencoder of this method contains N hidden layers, steps (1)-(3) are repeated to train the model parameters of the multi-hidden-layer network until the SDSA pre-training of all N hidden layers is finished.
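Steps (1)-(3) can be condensed into a single loss evaluation implementing equations (5)-(10). The sketch below assumes the tied-weight sigmoid form given in the where-clauses and a row-per-sample layout; the actual method would minimize this loss with minFunc, which is not reproduced here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsa_loss(W, b1, b2, X, Xc, lam_r=1e-4, beta=3.0, rho=0.1):
    """One denoising sparse autoencoder's loss, equations (6)-(10):
    mean reconstruction error + weight decay + KL sparsity penalty.
    Rows of X/Xc are samples; W has shape (hidden, visible)."""
    n = X.shape[0]
    h = sigmoid(Xc @ W.T + b1)                 # encode, eq. (6)
    Y = sigmoid(h @ W + b2)                    # decode with tied W^T, eq. (7)
    recon = np.sum((Y - X) ** 2) / (2.0 * n)   # reconstruction term of eq. (8)
    decay = (lam_r / 2.0) * np.sum(W ** 2)     # weight-decay term of eq. (8)
    rho_hat = h.mean(axis=0)                   # eq. (10): average activation
    kl = np.sum(rho * np.log(rho / rho_hat)    # eq. (9): KL(rho || rho_hat_j)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return recon + decay + beta * kl

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 6))        # clean (preprocessed) samples
Xc = X + 0.1 * rng.standard_normal(X.shape)    # eq. (5) with le = 0.1
W = 0.01 * rng.standard_normal((4, 6))
loss = dsa_loss(W, np.zeros(4), np.zeros(6), X, Xc)
```

An off-the-shelf minimizer (e.g. scipy.optimize.minimize over the flattened parameters, or minFunc in MATLAB as the text states) would then drive this loss to its minimum for each layer in turn.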
Further, step (4) specifically comprises:
After the training of the first hidden layer's denoising sparse autoencoder (DSA_1) is complete, its training sample X_trainwhite is encoded into h1 by the following equation:
h1 = f(W1 X + b11)  (11)
where W1 and b11 are the trained model parameters of DSA_1, and X is the clean training matrix: for the first hidden layer, X is the preprocessed training set X_trainwhite, while for hidden layers 2 to N it is the feature matrix learned by the previous layer. The learned first-hidden-layer feature h1 is then used as the input of the second hidden layer's denoising sparse autoencoder (DSA_2); steps (1)-(3) are called again to train DSA_2, giving the second hidden layer's weight parameters and bias terms W2, b12, b22, and its features are encoded into h2 by equation (11). This process is repeated until DSA_N has been trained, yielding the final features h_N learned by the N hidden layers; the features learned in pre-training are used in the subsequent classification layer, where the model parameters of the Softmax classifier can be trained.
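The layer-by-layer encoding of equation (11) can be sketched as follows (sigmoid activation and a row-per-sample convention are assumed, and all parameters shown are illustrative stand-ins for trained DSA weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode_stack(X, params):
    """Equation (11) applied layer by layer: each trained DSA's clean
    encoding h_l = f(h_{l-1} W_l^T + b_l) feeds the next layer."""
    h = X
    features = []
    for W, b in params:
        h = sigmoid(h @ W.T + b)
        features.append(h)
    return features   # features[-1] is h_N, the input to the Softmax layer

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 6))
params = [(rng.normal(size=(5, 6)), np.zeros(5)),   # hidden layer 1: 6 -> 5
          (rng.normal(size=(3, 5)), np.zeros(3))]   # hidden layer 2: 5 -> 3
feats = encode_stack(X, params)
```

Note that only the clean encoding path is used here; the noise corruption of equation (5) applies during training, not when propagating samples through an already-trained stack.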
Further, in step 3, the supervised pre-training of the Softmax classifier in the offline training proceeds as follows:
(1) The N-th hidden-layer features h_N learned by the N-hidden-layer SDSA are used as the input of the Softmax classifier model, and the corresponding condition labels are attached: y_i = 1 means the i-th sample is normal, y_i = 2 means the i-th sample is a fault. The model parameter matrix theta is first initialized, and the probability of each sample belonging to each class is predicted by the following Softmax prediction function:

h_theta(h_N^(i)) = [p(y_i = 1 | h_N^(i); theta), ..., p(y_i = k | h_N^(i); theta)]^T = (1 / sum_{j=1..k} exp(theta_j^T h_N^(i))) [exp(theta_1^T h_N^(i)), ..., exp(theta_k^T h_N^(i))]^T  (12)

where h_theta(h_N^(i)) is the vector of probabilities that the i-th sample belongs to each class; theta is the model parameter matrix of the Softmax classifier, composed of the vectors theta_1, ..., theta_k; k is the number of classes defined by the classifier, here k = 2; and h_N^(i) is the N-th hidden-layer feature vector of the i-th sample;
(2) Because the class membership probabilities produced by the prediction function are not yet accurate, a classification loss function must be constructed to obtain the optimal model parameters. The Softmax classifier used here includes a regularization term, which effectively prevents the classification model from overfitting during training. The loss function is defined as:
where the indicator term 1{·} returns 1 if the expression in braces is true and 0 otherwise; λsm is the Softmax weight-decay coefficient, with λsm > 0, a manually tunable hyperparameter. This loss function can be minimized with the matlab toolbox minFunc, yielding the optimal parameter matrix θ of the Softmax classifier model.
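A minimal sketch of the Softmax prediction and regularized loss in the spirit of Eqs. (12)-(13). This is an assumed numpy form with 0-indexed labels; the exact regularization constant and indexing of the patent's formulas may differ.

```python
import numpy as np

def softmax_predict(H, theta):
    """Membership probabilities: P[i, j] = prob. that sample i belongs to class j."""
    s = H @ theta                          # theta: (n_features, k) parameter matrix
    s -= s.max(axis=1, keepdims=True)      # shift scores for numerical stability
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)

def softmax_loss(H, y, theta, lam_sm):
    """Indicator-style cross-entropy plus the λ_sm weight-decay term (0-indexed labels y)."""
    m = H.shape[0]
    P = softmax_predict(H, theta)
    ce = -np.log(P[np.arange(m), y]).mean()   # the indicator 1{y_i = j} picks out the true class
    return ce + 0.5 * lam_sm * np.sum(theta ** 2)
```

With k = 2 classes as in the patent, `softmax_predict` returns the two-element probability vector used later for fault/normal prediction.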
Further, in Step 3, the global fine-tuning of the network parameters by the BP algorithm in the offline training proceeds as follows:
All pre-trained model parameters {(W1,b11),(W2,b12),…,(WN,b1N),θ} are taken as initial values, and the preprocessed training samples Xtrainwhite as the input layer. Layer-by-layer feature learning is first performed with Eq. (11) using the initial SDSA parameters to obtain the final hidden-layer feature matrix; this matrix is passed through the classification layer to compute the loss of Eq. (13). The back-propagation algorithm is then used to optimize the global parameters, updating all model parameters once per iteration until the loss of Eq. (13) converges to its minimum, which completes the fine-tuning process.
Further, in Step 3, the particle swarm optimization of the tunable hyperparameters in the offline training proceeds as follows:
(1) The parameters of the method fall into two categories: model parameters and tunable parameters. The method optimizes three key tunable parameters: the regularization weight-decay hyperparameter λr, the sparsity-penalty weight hyperparameter β, and the Softmax weight-decay coefficient λsm. These three parameters form a single PSO particle (λr, β, λsm), so each particle is 3-dimensional. Np particles search simultaneously over K iterations; each particle's initial position and velocity are initialized, and its fitness value is the overall accuracy on the training samples, computed by taking the three key tunable hyperparameters as independent variables and training the fine-tuned fault detection model described above. The overall accuracy of the training samples is defined as:
where pi is the class of the ith training sample predicted by the fault detection model and yi is the class of the ith training sample in the given labels; if the two are equal, 1 is returned, and if not, meaning the model misdiagnoses the sample, 0 is returned. Each particle's best fitness and position, and the swarm's global best fitness and position, are computed via Eq. (14).
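The fitness of Eq. (14) is simply the fraction of correctly classified training samples; a sketch (illustrative numpy form):

```python
import numpy as np

def fitness_accuracy(pred, labels):
    """Eq. (14)-style PSO fitness: the fraction of training samples whose
    predicted class p_i equals the given label y_i."""
    pred, labels = np.asarray(pred), np.asarray(labels)
    return float((pred == labels).mean())

print(fitness_accuracy([1, 2, 2, 1], [1, 2, 1, 1]))  # 0.75
```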
(2) Since the global best fitness and position after a single iteration are not yet the most accurate, the velocity and position of each particle must be updated. Denoting the position of each particle at iteration t and its flight velocity at iteration t accordingly, the PSO algorithm updates each particle's velocity and position with the following formulas:
where Wp is the inertia coefficient; C1 is the acceleration coefficient with which a particle tracks its own historical best, representing the particle's self-cognition; C2 is the acceleration coefficient with which a particle tracks the swarm's best, representing its cognition of group knowledge, i.e. social knowledge, commonly set as C1 = C2 = 2; t is the current iteration number; ξ and η are random numbers uniformly distributed on [0, 1]; the particle-best term is the best-fitness position experienced by particle i up to iteration t; and gbt is the best position experienced by the swarm up to iteration t.
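The velocity and position updates of Eqs. (15)-(16) can be sketched as follows, with Wp, C1, C2, ξ, η as defined above. The default values here are illustrative assumptions, not the patent's settings.

```python
import numpy as np

def pso_step(x, v, pb, gb, Wp=0.7, C1=2.0, C2=2.0, rng=None):
    """One PSO update in the spirit of Eqs. (15)-(16):
    v <- Wp*v + C1*ξ*(pb - x) + C2*η*(gb - x), then x <- x + v."""
    if rng is None:
        rng = np.random.default_rng()
    xi = rng.uniform(size=np.shape(x))    # ξ ~ U[0, 1]
    eta = rng.uniform(size=np.shape(x))   # η ~ U[0, 1]
    v = Wp * v + C1 * xi * (pb - x) + C2 * eta * (gb - x)
    return x + v, v

# a 3-dimensional particle (λr, β, λsm) already at both its own and the swarm's best stays put
x, v = np.ones(3), np.zeros(3)
x2, v2 = pso_step(x, v, np.ones(3), np.ones(3))
print(np.allclose(x2, 1.0))  # True
```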
(3) The unsupervised SDSA pre-training, supervised Softmax pre-training, and BP global fine-tuning of the fault detection model above are retrained to obtain a new global best fitness value and position, and the particle positions and velocities are continually updated with Eqs. (15)-(16). The search terminates when the number of iterations exceeds the defined K, yielding the particle position with the global best fitness, i.e. the optimized tunable hyperparameters under which the detection accuracy on the training samples is highest; this completes the automatic hyperparameter tuning. The optimized tunable hyperparameter values are then set as the SDSA's optimal tunable hyperparameters, and all SDSA model parameters are retrained under the determined network structure and optimal hyperparameters, giving the trained fault detection model, whose parameters can be used for feature learning and classification of the test samples.
Further, in Step 4, the feature learning and membership-probability prediction of the online monitoring proceed as follows:
For a chemical process in continuous real-time production, the fault detection model trained in Step 3 can effectively predict whether the current process is in a fault state. The preprocessed test samples Xtestwhite are taken as the input layer. Given the trained fault detection model and the determined neural network structure, layer-by-layer feature learning is applied: with the trained weight parameters of each denoising sparse autoencoder, {(W1,b11),(W2,b12),…,(WN,b1N)}, Eq. (11) is used for feedforward learning to obtain the test samples' current-hidden-layer features; these features are then used as input together with the next hidden layer's DSA model parameters in the same formula, and so on through all hidden layers to obtain the final learned features. Eq. (12) is then used to predict the membership probabilities. In fault detection there are only two classes, fault and normal, so each sample's predicted probability vector has only two values; the class with the larger probability is taken as the membership class predicted by the overall fault detection model. If the computed membership class is 1, the operating condition is normal; if it is 2, an abnormal condition has occurred, whereupon the connected alarm device issues a warning indicating that a fault has been detected and notifies the process operator or engineer to check system safety and clear the fault in time, thereby enabling fault monitoring of the current operating condition.
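The final decision step — pick the class with the larger membership probability and raise the alarm on class 2 — can be sketched as (hypothetical numpy form):

```python
import numpy as np

def monitor_sample(probs):
    """Pick the class with the larger membership probability:
    class 1 = normal, class 2 = fault (triggers the alarm)."""
    cls = int(np.argmax(probs)) + 1
    return cls, cls == 2

print(monitor_sample(np.array([0.1, 0.9])))  # (2, True): fault detected, raise the alarm
```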
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The present invention combines the stacked denoising sparse autoencoder algorithm from deep learning with the particle swarm parameter optimization algorithm to develop a new fault monitoring method suited to complex nonlinear chemical processes. Since feature learning requires no labeled data and a deep neural network is employed, the method can adaptively and intelligently learn the features of the raw data, saving substantial effort and time compared with traditional shallow neural network methods and manual extraction of features and knowledge; the learned features distinguish normal from abnormal data more deeply, making the clustering of each class of samples more pronounced, so the algorithm is more intelligent.
2. In choosing the manually tunable hyperparameters, the present invention avoids the randomness and time cost of selecting parameters by manual trial, adopting an automatic parameter-tuning scheme: the particle swarm optimization algorithm is integrated into the method to automatically tune the key tunable hyperparameters, saving substantial time and making the method more automated.
3. Compared with traditional chemical process fault detection techniques (such as PCA, ICA, KPCA, and MICA), the present invention uses the layer-by-layer training of a deep neural network, which can adaptively learn the information in nonlinear process data during feature learning; a denoising strategy is also added, making the algorithm more robust to highly noisy data, so the resulting fault detection model is more accurate. The method of the invention offers a markedly improved fault detection rate, a low false alarm rate, fast fault detection, and applicability to big-data modeling, and thus provides good monitoring performance and improves fault detection.
Description of the drawings
FIG. 1 is a flowchart of a chemical fault detection method based on particle swarm optimization and a denoising sparse autoencoder according to an embodiment of the present invention.
FIG. 2 is a process flow diagram of the Tennessee-Eastman (TE) chemical process used in the embodiment of the present invention.
FIG. 3 shows the average fault detection rate and false alarm rate over ten repeated trials of the embodiment of the present invention.
FIG. 4 compares the fault detection rates of the methods on the faults whose detection rates are most improved in the embodiment of the present invention.
FIGS. 5(a) through 5(i) show the fault monitoring results for fault 1, fault 2, fault 4, fault 6, fault 7, fault 8, fault 12, fault 13, and fault 17, respectively.
FIG. 6(a) is a comparison plot of the first three principal components of the preprocessed test samples; FIG. 6(b) shows the first three principal components of the test features learned by the PSO-SDSA method.
Detailed description
The present invention is described in further detail below with reference to the embodiments and the accompanying drawings, but embodiments of the present invention are not limited thereto.
Embodiment:
This embodiment provides a chemical fault detection method based on particle swarm optimization and a denoising sparse autoencoder; its flowchart is shown in FIG. 1. The proposed method is applied to the Tennessee-Eastman (TE) benchmark chemical process for further illustration. The TE process is a computer simulation of a real chemical process published by Downs and Vogel in 1993 in the journal Computers & Chemical Engineering, and it has since become chiefly a benchmark for evaluating the performance of process monitoring methods; its process flow diagram is shown in FIG. 2. The TE process comprises five operating units: a reactor, a condenser, a vapor-liquid separator, a recycle compressor, and a stripper. In the simulated data, 41 observed variables are monitored in total: 22 continuous process variables and 19 composition variables. The TE process also includes 21 preset faults; this embodiment uses the first 20 faults for monitoring, as listed in Table 1 below.
Table 1. The 20 preset faults of the TE process
The chemical process fault detection method comprises the steps of:
Step 1, data acquisition:
Data under the normal operating condition and the 20 faults of the TE process are collected and divided into a training sample set Xtrain and a test sample set Xtest. The training set contains 13,480 normal-condition samples and 480 samples for each fault. The test set contains 960 normal samples and 960 samples for each fault, with each fault sequence entering the fault state at the 161st sample. Since the process monitors 41 variables, the training set forms a 23080×41 matrix and the test set a 20160×41 matrix.
Step 2, data preprocessing:
First, the mean Xmean and standard deviation Xstd of each variable are computed from the 13,480 normal samples in the training data. Both the training and test sets are then standardized with Xmean and Xstd using Eq. (1); the whitening matrix Wwhite of the training set is obtained with Eqs. (2)-(3), and the whitened training set Xtrainwhite and test set Xtestwhite are obtained with Eqs. (4)-(5).
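A sketch of this preprocessing stage fitted on the normal training samples and applied to any data set. ZCA-style whitening via eigendecomposition is assumed here for illustration; the patent's exact Eqs. (1)-(5) are not reproduced.

```python
import numpy as np

def fit_preprocess(X_normal, eps=1e-5):
    """Fit z-score statistics and a whitening matrix on the normal training samples."""
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0)
    Z = (X_normal - mean) / std
    cov = np.cov(Z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    # ZCA whitening matrix; eps guards against near-zero eigenvalues
    W_white = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
    return mean, std, W_white

def apply_preprocess(X, mean, std, W_white):
    """Standardize with the training statistics, then whiten."""
    return ((X - mean) / std) @ W_white
```

Fitting on training data only, as the patent does, keeps the test set untouched by its own statistics.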
Step 3, offline training:
The number of hidden layers is first set to 2, with network structure 41-130-20-2, and the SDSA weight matrices and bias terms are initialized. With the whitened training set Xtrainwhite as the SDSA input layer, greedy layer-by-layer training of the two hidden layers is carried out: the matlab toolbox minFunc is used to minimize the SDSA loss function to obtain the optimized model parameters, and feedforward learning with Eq. (11) yields the training-sample features h2. The features h2 are then used as the input of the Softmax classifier model, and each learned training-sample feature is given the corresponding operating-condition label (yi = 1 means the sample is normal, yi = 2 means it is faulty); the Softmax loss function is minimized with minFunc to obtain the pre-trained Softmax model parameters. The BP algorithm then globally fine-tunes the network parameters: all model parameters of the whole training process are fine-tuned with supervision, taking the unsupervised pre-trained SDSA parameters and the supervised pre-trained Softmax parameters as initial values and the preprocessed training samples as the input layer, and the back-propagation (BP) algorithm optimizes the global parameters until the loss function converges to its minimum, yielding a preliminarily trained fault detection model. Finally, particle swarm optimization is performed on the three key tunable hyperparameters, and the optimal hyperparameter values and all corresponding model parameters are determined, giving the final trained fault detection model, which can be used for online monitoring of the process state.
Step 4, online monitoring:
For the chemical process in continuous real-time production, the fault detection model trained in Step 3 can effectively predict whether the current process is in a fault state. The preprocessed test samples Xtestwhite are taken as the input layer. Given the trained fault detection model and the determined neural network structure, layer-by-layer feature learning is applied: with the trained weight parameters of each denoising sparse autoencoder, {(W1,b11),(W2,b12),…,(WN,b1N)}, Eq. (11) is used for feedforward learning to obtain the test samples' current-hidden-layer features; these features are then used as input together with the next hidden layer's DSA model parameters in the same formula, and so on through all hidden layers to obtain the final learned features. Eq. (12) is then used to predict the membership probabilities. In fault detection there are only two classes, fault and normal, so each sample's predicted probability vector has only two values; the class with the larger probability is taken as the membership class predicted by the overall fault detection model. A computed membership class of 1 indicates a normal condition, while 2 indicates that an abnormal condition has occurred; finally, the detection rate of each fault is computed.
Treating the test sample set as real-time operating data, the above steps allow fault diagnosis to be performed on the real-time data collected from an actual chemical process.
Ten repeated trials were first run with the selected 41-130-20-2 network structure and the tuned hyperparameters. The optimal hyperparameters obtained by the method on the TE process are: regularization weight-decay hyperparameter λr = 0.002, sparsity-penalty weight hyperparameter β = 2.628255, and Softmax weight-decay coefficient λsm = 0.0001, with a global best fitness value of 0.9998. The average test fault detection rate and false alarm rate of each trial are shown in FIG. 3. As the figure shows, over the ten trials the mean test-sample fault detection rate (FDR) is 82.42% with a standard deviation of 0.857%, and the mean false alarm rate (FAR) is 0.64% with a standard deviation of 1.194%, indicating that the method's overall FDR is high and its FAR low; by the usual criteria for evaluating fault detection performance, its diagnostic performance is superior. Moreover, the FDR and FAR differ little across the ten trials, showing that integrating PSO optimization improves the stability of the model: without hyperparameter optimization, inappropriate parameters easily make training difficult and trap it in local optima. Choosing PSO optimization therefore both avoids such problems and lets the model reach better diagnostic performance under optimal parameters; as a mature unconstrained optimization method, PSO also converges quickly and is easy to program. In summary, PSO parameter optimization is both necessary and easy to implement.
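The FDR and FAR reported above can be computed per test sequence as follows. This is an illustrative numpy sketch assuming the TE-style layout described in Step 1, where each fault sequence is normal before the 161st sample.

```python
import numpy as np

def fdr_far(pred, fault_start=160):
    """FDR = fraction of true fault samples flagged as class 2 (fault);
    FAR = fraction of true normal samples wrongly flagged as class 2.
    Assumes the sequence is normal before index `fault_start` (the 161st sample)."""
    pred = np.asarray(pred)
    normal, faulty = pred[:fault_start], pred[fault_start:]
    return float((faulty == 2).mean()), float((normal == 2).mean())

print(fdr_far(np.array([1] * 160 + [2] * 800)))  # (1.0, 0.0): perfect detection, no false alarms
```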
The single trial with the best performance of the method (highest fault detection rate, lowest false alarm rate; trial 9) is selected to present the per-fault detection rates on the test samples and compare them with other methods, as shown in Table 2. As Table 2 shows, the proposed PSO-SDSA method achieves high diagnostic accuracy on faults 1, 2, 4, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 19, and 20. Compared with the PCA, MICA, and KPCA methods, the proposed method (PSO-SDSA) has the best average fault detection rate: its average FDR over all faults reaches 83.48%, while its FAR is only 0.21%. Among previously developed methods, faults 3, 9, and 15 are the hardest to detect in the TE process; because these faults differ little from the normal sample data, many methods achieve extremely low diagnosis rates on them (mostly below 10%). Unlike traditional methods, the deep neural network is built from a large number of normal and fault samples; moreover, because of the greedy layer-by-layer deep training and the added noise modeling, which strictly limit the loss of information, the learned fault features can deeply mine the differences between these small-disturbance faults and normal operation, so they are better distinguished from normal features. The detection rates of faults 3, 9, and 15 are therefore all substantially improved by the proposed method, with fault 3 reaching 50.5%, fault 5 reaching 44.755%, and fault 15 reaching 44.875%, whereas the other methods remain essentially below 10%; this shows to some extent that the proposed PSO-SDSA method effectively improves performance on fault points that are otherwise hard to detect. Although PSO-SDSA is slightly lower than the other methods on a few faults (faults 1 and 14), the method as a whole is modeled with the global error in mind: in seeking a smaller global error, the model may favor reducing the error on hard-to-detect faults while slightly neglecting some easily detected fault samples, so the FDR on a few easily detected faults may be slightly below that of the traditional methods; the resulting detection rates (96.5% for fault 1, 90.375% for fault 14) are entirely acceptable. Overall, the method's performance is greatly improved over the other methods above, making it one of the more advantageous FDD methods in current research. In addition, FIG. 4 compares the detection rates on the faults where the method's FDR improves markedly over the other methods (faults 3, 5, 9, 10, 15, 19, and 20), showing the performance gain more intuitively: every point of the PSO-SDSA curve lies above the curves of the other methods, indicating a substantial improvement in the detection rate of all seven faults.
Table 2. Fault detection rates of the various methods on the TE process
An early-warning speed analysis of the online monitoring was performed on the test results, selecting the nine faults with higher detection rates (faults 1, 2, 4, 6, 7, 8, 12, 13, and 17). The fault samples and their predicted probabilities are plotted in this order in FIG. 5, where the 0.5 warning line is the classification control limit: when a sample's predicted probability exceeds this limit, the model declares a fault state. As FIG. 5 shows, for the nine selected faults, once a fault is detected most subsequent samples are also detected, yielding high detection rates (FDRs all above 90%). Faults 1, 4, 6, 7, 8, and 12 are detected at the 161st sample with a delay of zero points, showing extremely fast detection. Faults 2 and 17 are detected at the 162nd sample; although delayed by one point, detection is still very fast, and the warning performance is entirely acceptable in practice. Fault 13 is detected comparatively later, at the 167th sample, with the alarm starting after a delay of six points. Furthermore, the online monitoring results show that after faults 2, 4, 6, 7, and 12 are detected, very few of their samples fall below the control limit, so their detection rates all exceed 99%, indicating the method's strong ability to capture these faults. Faults 1, 8, 13, and 17 occasionally drop stepwise below the control limit after detection, giving somewhat lower detection rates, but all remain above 94%, so detection performance is still good.
To further investigate why the method achieves both a high fault detection rate and a low false alarm rate, principal component analysis was applied to the features it learns. The first three PCs of the normal-state data and of faults 1, 2, 6, and 7 were compared: FIG. 6(a) shows the first three principal components of the preprocessed test samples, and FIG. 6(b) those of the test features learned by the PSO-SDSA method. Comparing FIG. 6(a) and FIG. 6(b), in the preprocessed test data the samples of faults 1, 2, and 7 overlap severely with the normal samples, showing strong linear inseparability; because the TE process is complex and nonlinear, feeding the data directly to a classification model cannot achieve correct classification. In the principal component analysis of the features learned by the PSO-SDSA method, however, faults 1, 2, 6, and 7 are all well separated, the samples of each fault cluster well, and the normal samples also cluster well without great dispersion, so these faults can be separated from the normal samples almost completely. This yields a high fault diagnosis rate and an extremely low false alarm rate, and shows that the method can learn the knowledge implicit in chemical process data: under the feature learning of the stacked denoising sparse autoencoder, fault information can be deeply distinguished from normal information, giving the learned features excellent clustering properties and thereby improving fault detection performance.
The above is only a preferred embodiment of the present invention patent, but the protection scope of the present invention patent is not limited thereto; any equivalent replacement or modification made by a person skilled in the art, within the scope disclosed by the present invention patent and according to its technical solution and inventive concept, falls within the protection scope of the present invention patent.
Claims (9)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910016558.1A CN109800875A (en) | 2019-01-08 | 2019-01-08 | Chemical industry fault detection method based on particle group optimizing and noise reduction sparse coding machine |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910016558.1A CN109800875A (en) | 2019-01-08 | 2019-01-08 | Chemical industry fault detection method based on particle group optimizing and noise reduction sparse coding machine |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN109800875A true CN109800875A (en) | 2019-05-24 |
Family
ID=66556848
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910016558.1A Pending CN109800875A (en) | 2019-01-08 | 2019-01-08 | Chemical industry fault detection method based on particle group optimizing and noise reduction sparse coding machine |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109800875A (en) |
Cited By (34)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110147648A (en) * | 2019-06-20 | 2019-08-20 | 浙江大学 | Automobile sensor fault detection method based on independent component analysis and sparse denoising self-encoding encoder |
| CN110210569A (en) * | 2019-06-06 | 2019-09-06 | 南京工业大学 | FCM-LSTM-based chemical storage tank anomaly detection algorithm research |
| CN110297480A (en) * | 2019-06-17 | 2019-10-01 | 重庆仲澜科技有限公司 | The TE process failure diagnosis method of deepness belief network model based on parameter optimization |
| CN110412872A (en) * | 2019-07-11 | 2019-11-05 | 中国石油大学(北京) | Reciprocating compressor fault diagnosis and optimization method and device |
| CN110689140A (en) * | 2019-09-27 | 2020-01-14 | 广东毓秀科技有限公司 | Method for intelligently managing rail transit alarm data through big data |
| CN110779745A (en) * | 2019-10-12 | 2020-02-11 | 杭州安脉盛智能技术有限公司 | Heat exchanger early fault diagnosis method based on BP neural network |
| CN110929843A (en) * | 2019-10-29 | 2020-03-27 | 国网福建省电力有限公司 | Abnormal electricity consumption behavior identification method based on improved deep self-coding network |
| CN111028512A (en) * | 2019-12-31 | 2020-04-17 | 福建工程学院 | A real-time traffic prediction method and device based on sparse BP neural network |
| CN111459142A (en) * | 2020-04-22 | 2020-07-28 | 北京航空航天大学 | Aircraft liquid cooling failure fault diagnosis method based on stacked sparse noise reduction self-encoder |
| CN111487950A (en) * | 2020-04-24 | 2020-08-04 | 西安交通大学 | 'prediction-verification-feedback-optimization' closed-loop system for online early warning and offline diagnosis |
| CN111695631A (en) * | 2020-06-12 | 2020-09-22 | 泽恩科技有限公司 | Method, device, equipment and medium for extracting verification fault features based on SAE |
| CN111913461A (en) * | 2019-09-07 | 2020-11-10 | 宁波大学 | Distributed chemical process monitoring method based on regularization GCCA model |
| CN112001440A (en) * | 2020-08-20 | 2020-11-27 | 苏州鸿哲智能科技有限公司 | Fault diagnosis logic algorithm and system |
| CN112198842A (en) * | 2020-08-24 | 2021-01-08 | 华东理工大学 | Tracing diagnosis method for abnormal operation state of PTA (pure terephthalic acid) oxidation unit |
| CN112465042A (en) * | 2020-12-02 | 2021-03-09 | 中国联合网络通信集团有限公司 | Generation method and device of classification network model |
| CN113378468A (en) * | 2021-06-18 | 2021-09-10 | 中国科学院地理科学与资源研究所 | Weight optimization method and system of multidimensional geoscience parameters |
| CN113657025A (en) * | 2021-07-23 | 2021-11-16 | 上海睿而维科技有限公司 | A track structure multi-sensor dynamic matching system |
| CN114021469A (en) * | 2021-11-15 | 2022-02-08 | 浙江大学 | Method for monitoring one-stage furnace process based on mixed sequence network |
| CN114445671A (en) * | 2021-12-22 | 2022-05-06 | 中国科学院信息工程研究所 | A device type-based abnormal flow detection method and device |
| CN114550257A (en) * | 2022-02-23 | 2022-05-27 | 上海商汤智能科技有限公司 | Face recognition network training method and device, electronic equipment and storage medium |
| CN114692667A (en) * | 2020-12-30 | 2022-07-01 | 华为技术有限公司 | Model training method and related device |
| CN114818806A (en) * | 2022-04-25 | 2022-07-29 | 重庆大学 | Gearbox fault diagnosis method based on wavelet packet and depth self-encoder |
| CN114839531A (en) * | 2022-05-25 | 2022-08-02 | 淮阴工学院 | Motor fault detection method based on group sparse autocoding and group intelligence |
| CN115166514A (en) * | 2022-06-24 | 2022-10-11 | 淮阴工学院 | A method and system for motor fault identification based on adaptive spectrum segmentation and denoising |
| CN115184054A (en) * | 2022-05-30 | 2022-10-14 | 深圳技术大学 | Semi-supervised failure detection and analysis method, device, terminal and medium for mechanical equipment |
| CN115268418A (en) * | 2022-10-01 | 2022-11-01 | 深圳市世坤科技实业有限公司 | Electrical control equipment fault alarm system and method |
| CN115438592A (en) * | 2022-11-08 | 2022-12-06 | 成都中科合迅科技有限公司 | Industrial research and development design data modeling method based on system engineering |
| CN115708028A (en) * | 2021-08-19 | 2023-02-21 | 中国石油化工股份有限公司 | Chemical process fault diagnosis method and system, electronic device and storage medium |
| CN116381536A (en) * | 2023-03-07 | 2023-07-04 | 华中科技大学 | Regression element learning-based lithium battery health state prediction method and system |
| CN116567719A (en) * | 2023-07-05 | 2023-08-08 | 北京集度科技有限公司 | Data transmission method, vehicle-mounted system, device and storage medium |
| CN117725480A (en) * | 2023-12-16 | 2024-03-19 | 国网山东省电力公司青岛供电公司 | Intelligent lightning arrester fault detection method and system |
| CN118520355A (en) * | 2024-06-24 | 2024-08-20 | 北京时代启程物联科技有限公司 | Equipment abnormality diagnosis method, system and program product based on artificial intelligence |
| CN118549813A (en) * | 2024-06-20 | 2024-08-27 | 蜂巢能源科技股份有限公司 | Battery fault detection method and system |
| CN119474670A (en) * | 2024-10-28 | 2025-02-18 | 湖南大学 | An intelligent online test process data anomaly detection method |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106323636A (en) * | 2016-08-16 | 2017-01-11 | 重庆交通大学 | Adaptive extraction and diagnosis method for degree features of mechanical fault through stack-type sparse automatic coding depth neural network |
| CN106682688A (en) * | 2016-12-16 | 2017-05-17 | 华南理工大学 | Pile-up noise reduction own coding network bearing fault diagnosis method based on particle swarm optimization |
| CN107436597A (en) * | 2017-07-17 | 2017-12-05 | 华南理工大学 | A kind of chemical process fault detection method based on sparse filtering and logistic regression |
- 2019-01-08: CN CN201910016558.1A patent/CN109800875A/en, status: Pending
Non-Patent Citations (1)
| Title |
|---|
| 旷天亮 (Kuang Tianliang): "Research on Fault Detection of Complex Chemical Processes Based on Deep Neural Networks", China Master's Theses Full-text Database, Engineering Science and Technology I * |
Cited By (48)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110210569A (en) * | 2019-06-06 | 2019-09-06 | 南京工业大学 | FCM-LSTM-based chemical storage tank anomaly detection algorithm research |
| CN110297480A (en) * | 2019-06-17 | 2019-10-01 | 重庆仲澜科技有限公司 | The TE process failure diagnosis method of deepness belief network model based on parameter optimization |
| CN110297480B (en) * | 2019-06-17 | 2022-07-01 | 重庆仲澜科技有限公司 | TE process fault diagnosis method of deep belief network model based on parameter optimization |
| CN110147648B (en) * | 2019-06-20 | 2022-06-17 | 浙江大学 | Automotive sensor fault detection method based on independent component analysis and sparse denoising autoencoder |
| CN110147648A (en) * | 2019-06-20 | 2019-08-20 | 浙江大学 | Automobile sensor fault detection method based on independent component analysis and sparse denoising self-encoding encoder |
| CN110412872A (en) * | 2019-07-11 | 2019-11-05 | 中国石油大学(北京) | Reciprocating compressor fault diagnosis and optimization method and device |
| CN111913461A (en) * | 2019-09-07 | 2020-11-10 | 宁波大学 | Distributed chemical process monitoring method based on regularization GCCA model |
| CN111913461B (en) * | 2019-09-07 | 2022-03-18 | 宁波大学 | Distributed chemical process monitoring method based on regularization GCCA model |
| CN110689140A (en) * | 2019-09-27 | 2020-01-14 | 广东毓秀科技有限公司 | Method for intelligently managing rail transit alarm data through big data |
| CN110779745A (en) * | 2019-10-12 | 2020-02-11 | 杭州安脉盛智能技术有限公司 | Heat exchanger early fault diagnosis method based on BP neural network |
| CN110779745B (en) * | 2019-10-12 | 2021-07-06 | 杭州安脉盛智能技术有限公司 | Heat exchanger early fault diagnosis method based on BP neural network |
| CN110929843A (en) * | 2019-10-29 | 2020-03-27 | 国网福建省电力有限公司 | Abnormal electricity consumption behavior identification method based on improved deep self-coding network |
| CN111028512A (en) * | 2019-12-31 | 2020-04-17 | 福建工程学院 | A real-time traffic prediction method and device based on sparse BP neural network |
| CN111028512B (en) * | 2019-12-31 | 2021-06-04 | 福建工程学院 | Real-time traffic prediction method and device based on sparse BP neural network |
| CN111459142A (en) * | 2020-04-22 | 2020-07-28 | 北京航空航天大学 | Aircraft liquid cooling failure fault diagnosis method based on stacked sparse noise reduction self-encoder |
| CN111487950A (en) * | 2020-04-24 | 2020-08-04 | 西安交通大学 | 'prediction-verification-feedback-optimization' closed-loop system for online early warning and offline diagnosis |
| CN111487950B (en) * | 2020-04-24 | 2021-11-16 | 西安交通大学 | 'prediction-verification-feedback-optimization' closed-loop system for online early warning and offline diagnosis |
| CN111695631A (en) * | 2020-06-12 | 2020-09-22 | 泽恩科技有限公司 | Method, device, equipment and medium for extracting verification fault features based on SAE |
| CN112001440A (en) * | 2020-08-20 | 2020-11-27 | 苏州鸿哲智能科技有限公司 | Fault diagnosis logic algorithm and system |
| CN112198842B (en) * | 2020-08-24 | 2024-04-30 | 华东理工大学 | Tracing diagnosis method for abnormal operation state of PTA oxidation unit |
| CN112198842A (en) * | 2020-08-24 | 2021-01-08 | 华东理工大学 | Tracing diagnosis method for abnormal operation state of PTA (pure terephthalic acid) oxidation unit |
| CN112465042A (en) * | 2020-12-02 | 2021-03-09 | 中国联合网络通信集团有限公司 | Generation method and device of classification network model |
| CN112465042B (en) * | 2020-12-02 | 2023-10-24 | 中国联合网络通信集团有限公司 | Method and device for generating classified network model |
| CN114692667A (en) * | 2020-12-30 | 2022-07-01 | 华为技术有限公司 | Model training method and related device |
| CN113378468A (en) * | 2021-06-18 | 2021-09-10 | 中国科学院地理科学与资源研究所 | Weight optimization method and system of multidimensional geoscience parameters |
| CN113378468B (en) * | 2021-06-18 | 2024-03-29 | 中国科学院地理科学与资源研究所 | Weight optimization method and system for multidimensional geoscience parameters |
| CN113657025A (en) * | 2021-07-23 | 2021-11-16 | 上海睿而维科技有限公司 | A track structure multi-sensor dynamic matching system |
| CN115708028A (en) * | 2021-08-19 | 2023-02-21 | 中国石油化工股份有限公司 | Chemical process fault diagnosis method and system, electronic device and storage medium |
| CN114021469A (en) * | 2021-11-15 | 2022-02-08 | 浙江大学 | Method for monitoring one-stage furnace process based on mixed sequence network |
| CN114445671A (en) * | 2021-12-22 | 2022-05-06 | 中国科学院信息工程研究所 | A device type-based abnormal flow detection method and device |
| CN114445671B (en) * | 2021-12-22 | 2025-08-12 | 中国科学院信息工程研究所 | Abnormal flow detection method and device based on equipment type |
| CN114550257A (en) * | 2022-02-23 | 2022-05-27 | 上海商汤智能科技有限公司 | Face recognition network training method and device, electronic equipment and storage medium |
| CN114818806A (en) * | 2022-04-25 | 2022-07-29 | 重庆大学 | Gearbox fault diagnosis method based on wavelet packet and depth self-encoder |
| CN114839531A (en) * | 2022-05-25 | 2022-08-02 | 淮阴工学院 | Motor fault detection method based on group sparse autocoding and group intelligence |
| CN115184054B (en) * | 2022-05-30 | 2022-12-27 | 深圳技术大学 | Mechanical equipment semi-supervised fault detection and analysis method, device, terminal and medium |
| CN115184054A (en) * | 2022-05-30 | 2022-10-14 | 深圳技术大学 | Semi-supervised failure detection and analysis method, device, terminal and medium for mechanical equipment |
| CN115166514A (en) * | 2022-06-24 | 2022-10-11 | 淮阴工学院 | A method and system for motor fault identification based on adaptive spectrum segmentation and denoising |
| CN115268418A (en) * | 2022-10-01 | 2022-11-01 | 深圳市世坤科技实业有限公司 | Electrical control equipment fault alarm system and method |
| CN115438592A (en) * | 2022-11-08 | 2022-12-06 | 成都中科合迅科技有限公司 | Industrial research and development design data modeling method based on system engineering |
| CN116381536A (en) * | 2023-03-07 | 2023-07-04 | 华中科技大学 | Regression element learning-based lithium battery health state prediction method and system |
| CN116381536B (en) * | 2023-03-07 | 2024-03-19 | 华中科技大学 | A method and system for predicting the health status of lithium batteries based on regression element learning |
| CN116567719B (en) * | 2023-07-05 | 2023-11-10 | 北京集度科技有限公司 | Data transmission method, vehicle-mounted system, device and storage medium |
| CN116567719A (en) * | 2023-07-05 | 2023-08-08 | 北京集度科技有限公司 | Data transmission method, vehicle-mounted system, device and storage medium |
| CN117725480A (en) * | 2023-12-16 | 2024-03-19 | 国网山东省电力公司青岛供电公司 | Intelligent lightning arrester fault detection method and system |
| CN117725480B (en) * | 2023-12-16 | 2025-07-01 | 国网山东省电力公司青岛供电公司 | A lightning arrester fault intelligent detection method and system |
| CN118549813A (en) * | 2024-06-20 | 2024-08-27 | 蜂巢能源科技股份有限公司 | Battery fault detection method and system |
| CN118520355A (en) * | 2024-06-24 | 2024-08-20 | 北京时代启程物联科技有限公司 | Equipment abnormality diagnosis method, system and program product based on artificial intelligence |
| CN119474670A (en) * | 2024-10-28 | 2025-02-18 | 湖南大学 | An intelligent online test process data anomaly detection method |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109800875A (en) | Chemical industry fault detection method based on particle group optimizing and noise reduction sparse coding machine | |
| CN113255848B (en) | Identification method of hydraulic turbine cavitation acoustic signal based on big data learning | |
| CN113188794B (en) | Gearbox fault diagnosis method and device based on improved PSO-BP neural network | |
| CN112070128B (en) | Transformer fault diagnosis method based on deep learning | |
| CN102707256B (en) | Fault diagnosis method based on BP-Ada Boost nerve network for electric energy meter | |
| CN106323636A (en) | Adaptive extraction and diagnosis method for degree features of mechanical fault through stack-type sparse automatic coding depth neural network | |
| CN108875771A (en) | A kind of failure modes model and method being limited Boltzmann machine and Recognition with Recurrent Neural Network based on sparse Gauss Bernoulli Jacob | |
| CN111708343A (en) | An abnormal behavior detection method for on-site process behavior in the manufacturing industry | |
| CN108255656A (en) | A kind of fault detection method applied to batch process | |
| CN106980822A (en) | A kind of rotary machinery fault diagnosis method learnt based on selective ensemble | |
| CN106355030A (en) | Fault detection method based on analytic hierarchy process and weighted vote decision fusion | |
| CN107563414B (en) | A kind of complex device degenerate state recognition methods based on Kohonen-SVM | |
| CN112904810A (en) | Process industry nonlinear process monitoring method based on effective feature selection | |
| CN108875772A (en) | A kind of failure modes model and method being limited Boltzmann machine and intensified learning based on the sparse Gauss Bernoulli Jacob of stacking | |
| CN110909802A (en) | A Fault Classification Method Based on Improved PSO Optimization PNN Smoothing Factor | |
| CN114037001A (en) | Mechanical pump small sample fault diagnosis method based on WGAN-GP-C and metric learning | |
| CN114444620B (en) | Indicator diagram fault diagnosis method based on generating type antagonistic neural network | |
| CN116738868B (en) | Rolling bearing residual life prediction method | |
| CN116562114A (en) | Power transformer fault diagnosis method based on graph convolution neural network | |
| CN112504682A (en) | Chassis engine fault diagnosis method and system based on particle swarm optimization algorithm | |
| CN119030767A (en) | Network security situation factor extraction method and system based on hybrid deep learning | |
| CN115561005A (en) | Fault Diagnosis Method of Chemical Process Based on EEMD Decomposition and Lightweight Neural Network | |
| CN116431966A (en) | Reactor core temperature anomaly detection method of incremental characteristic decoupling self-encoder | |
| CN108830006B (en) | Linear-nonlinear industrial process fault detection method based on linear evaluation factor | |
| CN119150065A (en) | Multi-scale space-time feature fusion chemical process fault diagnosis method |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | ||
| RJ01 | Rejection of invention patent application after publication | | Application publication date: 20190524 |