CN114201853A

CN114201853A - Method for analyzing relevance of corn starch process parameters and starch milk DE values

Info

Publication number: CN114201853A
Application number: CN202111306689.7A
Authority: CN
Inventors: 李义; 周聪聪; 叔谋; 张磊; 佟毅; 都健; 刘颖慰; 赵优; 徐杨; 赵国兴; 刘琳琳; 董亚超; 陶然; 李明鑫
Original assignee: Dalian University of Technology; Cofco Nutrition and Health Research Institute Co Ltd; Cofco Jilin Bio Chemical Technology Co Ltd; Cofco Biotechnology Co Ltd
Current assignee: Dalian University of Technology; Cofco Nutrition and Health Research Institute Co Ltd; Cofco Jilin Bio Chemical Technology Co Ltd; Cofco Biotechnology Co Ltd
Priority date: 2021-11-05
Filing date: 2021-11-05
Publication date: 2022-03-18
Anticipated expiration: 2041-11-05
Also published as: CN114201853B

Abstract

An embodiment of the invention provides a method for analyzing the correlation between corn starch process parameters and the starch milk DE value. The method belongs to the field of big-data process modeling and comprises the following steps: performing feature extraction and dimensionality reduction on initial input data by principal component analysis to obtain the feature vectors and the principal component input data of the initial input data; training a convolutional neural network model on the principal component input data and calculating the total weight matrix of the trained model; combining the feature vectors of the initial input data in sequence into a matrix that serves as the weight matrix of the initial input data; and multiplying the total weight matrix of the convolutional neural network model by the weight matrix of the initial input data, the product serving as the correlation matrix between the corn starch process sites and the DE value of the starch milk product. Based on the correlation between the process sites and the DE value, a process adjustment scheme can be formed quickly for different raw materials so as to raise the starch milk DE value.

Description

Correlation analysis method of corn starch process parameters and starch milk DE value

Technical Field

The invention relates to the field of big-data process modeling, and in particular to a method for analyzing the correlation between corn starch process parameters and the starch milk DE value.

Background

The corn starch industry is the foundation of the carbohydrate-derivative sector, and corn starch is an important raw material for food processing. In terms of national life, the economy, and the industrialization of agriculture, the starch industry is an important part of China's food industry and one of the most important links in corn deep processing.

Corn raw materials from different sources call for different processing parameters. As management becomes more fine-grained, the data collected by corn deep-processing enterprises during production grow increasingly large and complex, and all of the data are time series. Faced with massive data records, production managers relying on experience alone can hardly analyze the relationships among the data accurately. In the corn starch process, the DE value is an important indicator of starch milk product quality; it expresses reducing sugar as a percentage of the dry matter of the syrup. How to use historical process data to identify the process parameters with the greatest influence on the DE value is the technical problem to be solved.

Summary of the Invention

The purpose of the embodiments of the invention is to provide a method for analyzing the correlation between corn starch process parameters and the starch milk DE value. The method mainly determines the degree of correlation between the process parameters and the DE value, so that a process adjustment scheme for raising the starch milk DE value can be formed.

To achieve this purpose, a method for analyzing the correlation between corn starch process parameters and the starch milk DE value comprises:

obtaining corn raw material data associated with sites in the corn starch process and raw production monitoring data from the corn starch process, which together form the initial input data;

performing feature extraction on the initial input data by principal component analysis to obtain the feature vectors of the initial input data, and performing dimensionality reduction on the initial input data by principal component analysis to obtain the principal component input data of the initial input data;

training a convolutional neural network model on the principal component input data and calculating the total weight matrix of the trained convolutional neural network model, and using the matrix formed by combining the feature vectors of the initial input data in sequence as the weight matrix of the initial input data;

multiplying the total weight matrix of the convolutional neural network model by the weight matrix of the initial input data, and using the resulting matrix as the correlation matrix between the corn starch process sites and the DE value of the starch milk product in the corn starch process.

Preferably, obtaining the corn raw material data associated with sites in the corn starch process and the raw production monitoring data from the corn starch process to form the initial input data comprises:

denoising the corn raw material data associated with sites in the corn starch process and the raw production monitoring data from the corn starch process to obtain denoised data;

performing feature selection on the denoised data by Lasso regression analysis, and using the selected data as the initial input data.

Preferably, performing feature extraction on the initial input data by principal component analysis to obtain the feature vectors of the initial input data comprises:

standardizing the initial input data to obtain a standardized matrix Z, and using Z to obtain the correlation coefficient matrix R between Z and its transpose;

solving the characteristic equation of the correlation coefficient matrix R to obtain unit eigenvectors $b_j$, which serve as the feature vectors of the initial input data.

Preferably, performing dimensionality reduction on the initial input data by principal component analysis to obtain the principal component input data of the initial input data comprises:

applying the correlation formula to the standardized matrix Z and the unit eigenvectors $b_j$, and using the resulting matrix as the principal component input data of the initial input data; the correlation formula is:

$$U_{ij} = Z_i\, b_j, \quad j = 1, 2, \ldots, m$$

where $Z_i$ is the i-th row vector of the standardized matrix Z and m is the number of principal components.

Preferably, using the matrix formed by combining the feature vectors of the initial input data in sequence as the weight matrix of the initial input data comprises:

taking the unit eigenvectors $b_1, b_2, \ldots, b_m$ in sequence as column vectors, and using the resulting matrix as the weight matrix $W_p$ of the initial input data:

$$W_p = (b_1, b_2, \ldots, b_m) = \begin{pmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & \ddots & \vdots \\ b_{p1} & \cdots & b_{pm} \end{pmatrix}$$

where m is the number of principal components and p is the number of feature sites.

Preferably, the convolutional neural network model comprises a plurality of one-dimensional convolution kernels and a fully connected neural network; the principal component input data comprise a training set and a validation set;

training the convolutional neural network model on the principal component input data comprises:

applying two convolution operations with the plurality of one-dimensional convolution kernels to the principal component input data of the training set, and feeding the result into the fully connected neural network for iterative training; training is complete when the mean square errors of both the training set and the validation set have converged.

Preferably, calculating the total weight matrix of the trained convolutional neural network model comprises:

1) adding the weights of the same feature at different times in the convolutional layer to obtain the convolutional-layer weight w_jc. In the corresponding formula, s is the number of convolution kernels; p is the size of the convolution kernel; n is the size of convolution kernel c after the convolution operation; wh is a weight coefficient in the convolution kernel; and wq is the weight coefficient of the fully connected layers, wq = wq_1 wq_2 ... wq_a, where a is the number of fully connected layers;

2) using the matrix formed by combining the convolutional-layer weights w_jc as the total weight matrix W_c of the convolutional neural network model: W_c = (w_1c, ..., w_jc).

Preferably, the correlation matrix W between the corn starch process sites and the DE value of the starch milk product is:

W = W_p W_c.

Preferably, the correlation analysis method further comprises:

sorting the absolute values of the elements of the correlation matrix between the corn starch process sites and the DE value of the starch milk product to obtain a ranked sequence, and determining from the order of that sequence the degree to which each corn starch process site influences the DE value.

Preferably, the influence is either a positive correlation or a negative correlation;

when an element of the correlation matrix between the corn starch process sites and the DE value of the starch milk product is negative, the corn starch process site corresponding to that element is determined to be negatively correlated with the starch milk DE value;

when an element of the correlation matrix between the corn starch process sites and the DE value of the starch milk product is positive, the corn starch process site corresponding to that element is determined to be positively correlated with the starch milk DE value.

With the above technical solution, principal component analysis (PCA) reduces the dimensionality of the feature sites in the corn starch process data, which to some extent mitigates the curse of dimensionality. The convolutional neural network (CNN) model additionally captures temporal features, which addresses inconsistencies in the time order of the data and makes the big-data model more accurate and reliable. Finally, the weight connection method (WCM) presents, in an intuitive and easily understood form, how the feature sites relate to the DE value of the starch milk product. Based on these relationships, a process adjustment scheme can be formed quickly for different raw materials to raise the DE value of the starch milk product.

Other features and advantages of the embodiments of the invention are described in detail in the Detailed Description below.

Description of the Drawings

The accompanying drawings provide a further understanding of the embodiments of the invention and form part of the specification. Together with the detailed description below they explain the embodiments of the invention, but they do not limit them. In the drawings:

Fig. 1 is a schematic diagram of the method provided in the embodiment for analyzing the correlation between corn starch process parameters and the starch milk DE value;

Fig. 2 is a schematic diagram of the contribution rate and cumulative contribution rate of the principal components provided in the embodiment;

Fig. 3 is a schematic diagram of the principle of the convolutional neural network model provided in the embodiment;

Fig. 4 is a schematic diagram of the training results of the convolutional neural network model provided in the embodiment;

Fig. 5 shows the measured DE value of the starch milk product before and after adjusting the parameters of the sites identified by the invention, as provided in the embodiment.

Detailed Description

Specific implementations of the embodiments of the invention are described in detail below with reference to the accompanying drawings. The specific implementations described here serve only to illustrate and explain the embodiments of the invention and do not limit them.

Example 1

As shown in Fig. 1, the method provided in this embodiment for analyzing the correlation between corn starch process parameters and the starch milk DE value comprises:

performing feature extraction on the initial input data by principal component analysis to obtain the feature vectors of the initial input data, and performing dimensionality reduction on the initial input data by principal component analysis to obtain the principal component input data of the initial input data; this step is the first round of feature extraction and dimensionality reduction on the initial input data;

training a convolutional neural network model on the principal component input data and calculating the total weight matrix of the trained convolutional neural network model, and using the matrix formed by combining the feature vectors of the initial input data in sequence as the weight matrix of the initial input data; in the convolutional neural network model, the convolution kernels perform a second round of feature extraction before the machine learning step;

multiplying the total weight matrix of the convolutional neural network by the weight matrix of the initial input data, and using the resulting matrix as the correlation matrix between the corn starch process sites and the DE value of the starch milk product. The correlation matrix gives the degree of influence of each site, so the parameters of the feature sites with the greatest influence can be used to optimize the production process parameters and raise the DE value of the starch milk product. The DE value reflects the quality of the starch milk product in the corn starch process.

Preferably, the corn raw material data associated with sites in the corn starch process and the raw production monitoring data from the corn starch process are obtained to form the initial data. The initial data form a matrix whose columns are the physicochemical data from the corn raw material data and the processing data from the raw production monitoring data, in no required order, and whose rows are time points (in chronological order, at 10-minute intervals). Feature selection is then performed on the initial data. Specifically, the corn raw material data associated with sites in the corn starch process and the raw production monitoring data from the corn starch process are denoised to obtain denoised data;

Specifically, denoising may use one or more of the following data processing operations: residence-time adjustment, outlier removal, missing-value filling, noise removal, and feature dimensionality reduction. The individual operations are described in steps 1.1) to 1.4) below:

1.1) The same batch of material spends a long residence time in the different process stages. Simply taking the records of all sites at the same instant would therefore misalign the material batches, correlating "current" input parameters with target parameters that were actually shaped by "past" inputs. The downstream process data are therefore aligned to the corn raw data according to residence time: the starch process data and starch test data are offset by 48 h (corn steeping), the saccharification-section data by 50 h (residence in the liquefaction column), and the fructose OCC data by 98 h (saccharification time). This ensures that, in each time row of the input data matrix, the data in every column correspond to the same batch of corn raw material, keeping the information in the data mutually consistent.
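The residence-time alignment in step 1.1) can be sketched as a row shift. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `hours_to_rows` and `align_by_residence_time` are hypothetical, and the data are assumed to be plain lists sampled every 10 minutes as described above.

```python
SAMPLE_INTERVAL_MIN = 10  # row spacing stated in the embodiment


def hours_to_rows(hours, interval_min=SAMPLE_INTERVAL_MIN):
    """Convert a residence time in hours to a number of matrix rows."""
    return int(hours * 60 // interval_min)


def align_by_residence_time(upstream, downstream, lag_rows):
    """Pair upstream row i with downstream row i + lag_rows.

    Rows without a partner (the tail of `upstream`, the head of
    `downstream`) are dropped. Hypothetical helper for illustration.
    """
    if lag_rows < 0:
        raise ValueError("lag_rows must be non-negative")
    usable = min(len(upstream), len(downstream) - lag_rows)
    return [(upstream[i], downstream[i + lag_rows]) for i in range(max(usable, 0))]


# A 48 h steeping time corresponds to 288 rows at 10-minute spacing.
steeping_lag = hours_to_rows(48)
```

With this convention, the corn raw data in row i would be matched with the starch process record taken `steeping_lag` rows later.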

1.2) Because the sensors on the process equipment are not perfectly stable, the recorded process data may contain outliers. Outliers in the process data are identified, and the data of all sites at the corresponding time are deleted. For example, if column j of the data matrix (site W) is abnormal at the i-th entry (time T), row i of the data matrix is deleted. Outliers are preferably identified with Chauvenet's criterion, whose discriminant is:

$$|x_i - \bar{x}| > \omega_n S_x \qquad (1)$$

$$\omega_n = 1 + 0.4\ln(n) \qquad (2)$$

where $\bar{x}$ is the mean of the data, $S_x$ is the standard deviation of the data, $\omega_n$ is the Chauvenet coefficient, and n is the number of data points. Equation (1) identifies the outliers in the data.
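As a rough sketch of step 1.2), the code below flags a value as an outlier when its distance from the column mean exceeds ω_n times the standard deviation, with ω_n = 1 + 0.4 ln(n), and then drops every row that contains a flagged value. The function names are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption; the source does not say which estimator is used.

```python
import math
import statistics


def chauvenet_coefficient(n):
    """omega_n = 1 + 0.4 * ln(n), as in equation (2)."""
    return 1 + 0.4 * math.log(n)


def outlier_rows(column):
    """Return the row indices whose value violates the criterion in one column."""
    n = len(column)
    mean = statistics.fmean(column)
    s_x = statistics.stdev(column)  # assumption: sample standard deviation
    omega = chauvenet_coefficient(n)
    return [i for i, x in enumerate(column) if abs(x - mean) > omega * s_x]


def drop_outlier_rows(matrix):
    """Delete every time row in which any site (column) holds an outlier."""
    bad = set()
    for j in range(len(matrix[0])):
        bad.update(outlier_rows([row[j] for row in matrix]))
    return [row for i, row in enumerate(matrix) if i not in bad]
```

Deleting the whole row, rather than only the offending cell, keeps every remaining row a complete snapshot of all sites at one time.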

1.3) Corn raw material data are recorded at intervals longer than the set value, unlike the data from the downstream corn starch process operations, so the raw material data must be filled in to keep the data consistent. The gaps are filled by interpolation, as shown in equations (3) and (4):

$$k = \frac{x_{t_m} - x_{t_n}}{m - n} \qquad (3)$$

$$x_{t_i} = x_{t_n} + k\,(i - n) \qquad (4)$$

where $x_{t_n}$ and $x_{t_m}$ are the known data before and after the missing values, k is the per-step increment used to generate the filled-in data, and $x_{t_i}$ is a filled-in value.
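The gap filling in step 1.3) amounts to linear interpolation between the two known neighbours. A minimal sketch, assuming integer row indices n < m for the known values (the function name `fill_gap` is hypothetical):

```python
def fill_gap(x_n, x_m, n, m):
    """Linearly interpolate the values at indices n+1 .. m-1.

    x_n and x_m are the known values at row indices n and m (n < m).
    """
    k = (x_m - x_n) / (m - n)  # slope between the known points
    return [x_n + k * (i - n) for i in range(n + 1, m)]


# Two missing rows between a known 1.0 at index 0 and a known 4.0 at index 3.
filled = fill_gap(1.0, 4.0, 0, 3)
```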

1.4) Because of limits on process stability and measurement technology, the collected data can contain a large amount of noise, which strongly affects model training. The raw process data therefore need to be denoised. Once the outliers in the data matrix have been handled, this step does not need to examine individual sites; all data in the matrix are denoised column by column. A "(2n+1)-point simple moving average" is used as the smoothing filter, as shown in equation (5):

$$x'_i = \frac{1}{2n+1}\sum_{j=-n}^{n} x_{i+j} \qquad (5)$$

where $x'_i$ is the denoised value and $x_{i+j}$ (j = −n, ..., n) are the data points centered on $x_i$, with n points on each side.
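The (2n+1)-point simple moving average of step 1.4) can be sketched as below for a single column. How the first and last n points are treated is not stated in the source; shrinking the window at the edges is an assumption made here, and `moving_average` is a hypothetical name.

```python
def moving_average(values, n):
    """Centered (2n+1)-point simple moving average of one data column.

    Assumption: near the ends the window is truncated to the available
    points instead of padding the series.
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - n)
        hi = min(len(values), i + n + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# 3-point smoothing (n = 1) of a short ramp.
smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], 1)
```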

The feature selection performed on the denoised data by Lasso regression analysis serves to select meaningful features for the subsequent model training. The defining function of Lasso regression is given by equation (6), in which λ is a non-negative regularization parameter that controls the complexity of the model, β is the vector of regression coefficients, and n is the number of features. The larger λ is, the more heavily a linear model with many features is penalized, which yields a model with fewer features. This model uses λ = 0.01.
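To illustrate how an L1 penalty removes features, here is a small coordinate-descent sketch of Lasso with λ = 0.01. The exact objective in the source is not reproduced in this text, so the form assumed here, (1/(2N))·||y − Xβ||² + λ·||β||₁ with N rows, is an assumption, as are the function names; in practice a library solver would normally be used instead.

```python
import numpy as np


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)


def lasso_coordinate_descent(X, y, lam=0.01, n_iter=200):
    """Minimise (1/(2N))*||y - X b||^2 + lam*||b||_1 (assumed objective).

    Assumes no column of X is identically zero.
    """
    n_rows, n_features = X.shape
    beta = np.zeros(n_features)
    for _ in range(n_iter):
        for j in range(n_features):
            # Residual with feature j's current contribution added back.
            residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ residual / n_rows
            z = X[:, j] @ X[:, j] / n_rows
            beta[j] = soft_threshold(rho, lam) / z
    return beta


def select_features(X, y, lam=0.01):
    """Indices of the features whose coefficient is not shrunk to zero."""
    beta = lasso_coordinate_descent(X, y, lam)
    return [j for j, b in enumerate(beta) if b != 0.0]
```

Columns whose coefficients are driven exactly to zero are dropped, and the remaining columns form the input to the PCA step.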

The data after feature selection are standardized to obtain the standardized matrix Z, and Z is used to obtain the correlation coefficient matrix R between Z and its transpose. The standardization is given by equation (7):

$$Z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p \qquad (7)$$

where n is the number of data points along the time dimension, p is the number of feature sites, $\bar{x}_j$ is the mean of the data for feature site j, and $s_j^2$ is their variance. $Z_{ij}$ denotes an element of the standardized matrix Z, with i and j the corresponding row and column indices.

The characteristic equation of the correlation coefficient matrix R is solved to obtain unit eigenvectors $b_j$, which serve as the feature vectors of the initial input data. Specifically, the characteristic equation of the correlation coefficient matrix is:

$$|R - \lambda I_p| = 0 \qquad (8)$$

Solving it yields p characteristic roots λ.

Equations (9) and (10) are then solved to obtain the unit eigenvectors $b_j$:

$$R\,b = \lambda_j b, \quad j = 1, 2, \ldots, m \qquad (10)$$

where m is the number of principal components determined. The contribution rate $K_j$ and the cumulative contribution rate $S_m$ of the principal components are shown in Fig. 2.

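The correlation formula is applied to the standardized matrix Z and the unit eigenvectors $b_j$, and the resulting matrix is used as the principal component input data of the initial input data:

$$U_{ij} = Z_i\, b_j, \quad j = 1, 2, \ldots, m$$

where m is the number of principal components and $Z_i$ is the i-th row vector of the standardized matrix Z.

The unit eigenvectors $b_1, b_2, \ldots, b_m$ are taken in sequence as column vectors, and the resulting matrix is used as the weight matrix $W_p$ of the initial input data:

$$W_p = (b_1, b_2, \ldots, b_m)$$

where m is the number of principal components and p is the number of feature sites, so that $W_p$ has p rows and m columns.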
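The PCA stage described above (standardize, form the correlation matrix, take unit eigenvectors, project) can be sketched with NumPy as follows. This is an illustrative sketch rather than the patent's code: the function name `pca_weights` is hypothetical, and the cumulative-contribution threshold of 0.85 used to pick m is an assumption, since the source only refers to Fig. 2 for the contribution rates.

```python
import numpy as np


def pca_weights(X, cum_threshold=0.85):
    """Return (Z, R, W_p, U) for an n-by-p data matrix X.

    Z   : column-standardized data
    R   : p-by-p correlation coefficient matrix
    W_p : p-by-m matrix whose columns are unit eigenvectors of R
    U   : n-by-m principal component input data, U = Z @ W_p
    """
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = Z.T @ Z / (n - 1)

    # eigh is for symmetric matrices and returns unit-norm eigenvectors
    # with eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Assumed rule: smallest m whose cumulative contribution >= threshold.
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    m = min(int(np.searchsorted(cumulative, cum_threshold)) + 1, p)

    W_p = eigvecs[:, :m]
    U = Z @ W_p
    return Z, R, W_p, U
```

The same `W_p` is reused later when the network weights are mapped back from principal components to the original feature sites.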

Preferably, as shown in Fig. 3, the convolutional neural network model comprises a plurality of one-dimensional convolution kernels and a fully connected neural network; the principal component input data comprise a training set and a validation set;

First, the plurality of one-dimensional convolution kernels are applied to the principal component input data in a convolution operation, and the result is fed into the fully connected neural network. The convolution and the neural network are trained iteratively until the mean square errors of both the training set and the validation set have converged, at which point training is complete. The mean square error is calculated as in equation (12). As shown in Fig. 4, the mean square errors of the training set and the validation set both approach zero, so the model is well trained and shows no overfitting.

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2 \qquad (12)$$

where $y_i$ is the true value and $y'_i$ is the predicted value.

One-dimensional convolution kernels are used in order to capture the influence of time order on the corn starch process. The relative positions of the site data carry no information useful for learning from this process data, and a kernel spanning them would lead the machine learning model to over-mine the features of the data. The fully connected neural network maps the learned "distributed feature representation" to the sample label space; in effect it links the features extracted by the convolution operation to the target, and the patterns implicit in the data are learned through repeated iterative training.
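To make the role of the one-dimensional kernel concrete, the sketch below slides a kernel along the time axis only, treating each principal-component column independently, and also computes the mean square error used as the convergence measure. It is a forward-pass illustration in NumPy under stated assumptions, not the trained model of the embodiment: the function names are hypothetical, no padding or stride is used, and the operation is the cross-correlation that deep learning frameworks conventionally call convolution.

```python
import numpy as np


def conv1d_time(U, kernel):
    """Valid 1-D convolution along the time (row) axis of U.

    U      : array of shape (n_time, m), one column per principal component
    kernel : array of shape (k,), weights over k consecutive time steps
    Returns an array of shape (n_time - k + 1, m).
    """
    k = len(kernel)
    n_out = U.shape[0] - k + 1
    return np.stack([kernel @ U[t:t + k, :] for t in range(n_out)])


def mse(y_true, y_pred):
    """Mean square error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))
```

Because the kernel never mixes neighbouring columns, the arbitrary ordering of the sites in the data matrix has no effect on the extracted features.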

Preferably, the correlation matrix W between the corn starch process sites and the DE value of the starch milk product is: W = W_p W_c.

Sorting the absolute values of the elements of the total weight matrix W_c identifies the feature sites with the greatest influence on the target DE value of the starch milk product and provides guidance for the subsequent tuning of the process parameters. The results are shown in Table 1.

Table 1: Feature site tags and their correlation with the DE value

| Feature site | Correlation with DE value |
|---|---|
| LIA2103_7 | Negative |
| LIA2103_3 | Positive |
| LIA2103_5 | Negative |
| LIA2103_8 | Positive |
| LIA2103_1 | Positive |
| PID1\LIC1401_2_5-PV | Positive |
| LEVEL1\LIA_1473_1 | Positive |
| LEVEL2\LIA_1590 | Negative |
| PID1\LIC_302A-PV | Negative |
| PID1\LIC_502A-PV | Positive |
| PID1\LIC1401_2_12-PV | Negative |
| PID1\LIC_501A-PV | Negative |
| PID1\LIC_301B-PV | Positive |
| PID1\LIC_502B-PV | Negative |
| LEVEL1\LIA_1401_1_5 | Positive |
| LEVEL1\LIA_1684 | Positive |
| LEVEL2\LIA_1401_3_2 | Negative |
| LEVEL1\LIA_1401_1_4 | Negative |
| LIA2103_2 | Positive |
| LEVEL1\LIA_1631 | Positive |
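The final weight-connection step can be sketched as a matrix product followed by a sort. The sketch below forms W = W_p W_c, ranks the sites by the absolute value of their entry in W, and reads the sign as a positive or negative correlation. The function name, the site names in the usage example, and the assumption that W_c is a length-m vector (one weight per principal component) are all illustrative and not taken from the source.

```python
import numpy as np


def rank_sites(site_names, W_p, W_c):
    """Rank feature sites by |W|, where W = W_p @ W_c.

    W_p : (p, m) PCA weight matrix (unit eigenvectors as columns)
    W_c : (m,) total weights of the trained network (assumed shape)
    Returns a list of (site, weight, direction), largest |weight| first.
    """
    W = np.asarray(W_p @ W_c, dtype=float).ravel()
    order = np.argsort(-np.abs(W))
    ranked = []
    for i in order:
        if W[i] > 0:
            direction = "positive"
        elif W[i] < 0:
            direction = "negative"
        else:
            direction = "none"
        ranked.append((site_names[i], float(W[i]), direction))
    return ranked


# Hypothetical three-site, two-component example.
example = rank_sites(
    ["SITE_A", "SITE_B", "SITE_C"],
    np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
    np.array([0.2, -0.6]),
)
```

In this toy example SITE_B ranks first with a negative sign, which would be read as the site with the strongest, and negative, association with the DE value.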

调整影响较大的特征位点，进行生产工艺参数的优化。比如，对淀粉乳产品DE值正负影响较大的特征位点参数，分别相应调整10％，某段时间的结果如图5所示，DE值累计提高21％。Adjust the feature sites with greater influence to optimize the production process parameters. For example, the parameters of the characteristic sites that have a large positive and negative impact on the DE value of starch milk products are adjusted by 10% respectively. The results of a certain period of time are shown in Figure 5, and the DE value has increased by 21%.

从以上技术方案可以看出，本发明具有以下优点：对玉米淀粉原料及工艺数据进行分析，得到针对不同玉米原料，影响淀粉乳产品DE值均较大的特征参数，结果具有很好的普适性。通过对挖掘的特征参数进行调整，提高了产品的DE值，为玉米淀粉加工生产提供了更好的辅助作用。It can be seen from the above technical solutions that the present invention has the following advantages: the corn starch raw materials and process data are analyzed to obtain characteristic parameters that affect the DE value of starch milk products for different corn raw materials, and the results have good universality. sex. By adjusting the characteristic parameters of the excavation, the DE value of the product is improved, which provides a better auxiliary effect for the processing and production of corn starch.

通过以上数据处理、主成分分析、卷积神经网络建模、模型权值提取和确定确定优化特征位点参数五个步骤，进行了玉米淀粉工艺参数与淀粉乳产品DE值的关联度分析。对玉米淀粉生产数据进行了特征选择和降维，不仅避免了大数据建模的维度灾难问题，而且保证了输入机器学习模型数据的独立性，同时进行了两次的特征提取和模型训练，在对工艺数据的时序特征进行挖掘和提取的同时，使此技术方案得到的结果更加可靠。此技术方案同样可用于其它工艺的大数据挖掘，调整生产工艺参数。从而优化产品生产、提高产品质量、维护企业生产安全。Through the above five steps of data processing, principal component analysis, convolutional neural network modeling, model weight extraction and determination of optimal characteristic site parameters, the correlation degree between corn starch process parameters and starch milk products DE value was analyzed. Feature selection and dimensionality reduction for corn starch production data not only avoids the dimensional disaster problem of big data modeling, but also ensures the independence of input machine learning model data. At the same time, feature extraction and model training are performed twice. While mining and extracting the time series features of the process data, the results obtained by this technical solution are more reliable. This technical solution can also be used for big data mining of other processes to adjust production process parameters. Thereby optimizing product production, improving product quality, and maintaining enterprise production safety.

以上结合附图详细描述了本发明实施例的可选实施方式，但是，本发明实施例并不限于上述实施方式中的具体细节，在本发明实施例的技术构思范围内，可以对本发明实施例的技术方案进行多种简单变型，这些简单变型均属于本发明实施例的保护范围。The optional embodiments of the embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the embodiments of the present invention are not limited to the specific details of the above-mentioned embodiments. A variety of simple modifications are made to the technical solution of the invention, and these simple modifications all belong to the protection scope of the embodiments of the present invention.

另外需要说明的是，在上述具体实施方式中所描述的各个具体技术特征，在不矛盾的情况下，可以通过任何合适的方式进行组合。为了避免不必要的重复，本发明实施例对各种可能的组合方式不再另行说明。In addition, it should be noted that each specific technical feature described in the above-mentioned specific implementation manner may be combined in any suitable manner under the circumstance that there is no contradiction. To avoid unnecessary repetition, various possible combinations are not further described in this embodiment of the present invention.

In addition, the different implementations of the embodiments of the present invention may also be combined arbitrarily. As long as such combinations do not depart from the idea of the embodiments of the present invention, they should likewise be regarded as content disclosed by the embodiments of the present invention.

Claims

1. A method for analyzing the degree of correlation between corn starch process parameters and the DE value of starch milk, characterized by comprising the following steps:

obtaining corn raw material data associated with sites in the corn starch process and raw production monitoring data from the corn starch process, which together form the initial input data;

performing feature extraction processing on the initial input data by using a principal component analysis method to obtain the feature vectors of the initial input data; performing dimensionality reduction on the initial input data by using the principal component analysis method to obtain principal component input data of the initial input data;

training a convolutional neural network model on the principal component input data, and calculating a total weight matrix of the trained convolutional neural network model; and using the matrix formed by sequentially combining the feature vectors of the initial input data as a weight matrix of the initial input data;

and multiplying the total weight matrix of the convolutional neural network model by the weight matrix of the initial input data, and taking the resulting matrix as the association matrix between the corn starch process sites and the starch milk DE value.

2. The analysis method according to claim 1, wherein obtaining the corn raw material data associated with sites in the corn starch process and the raw production monitoring data from the corn starch process to jointly form the initial input data comprises:

denoising the corn raw material data associated with sites in the corn starch process and the raw production monitoring data from the corn starch process to obtain denoised data;

and performing feature selection on the denoised data by using a Lasso regression analysis method, and taking the data after feature selection as the initial input data.
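The Lasso-based feature selection in this claim can be illustrated with a minimal sketch. The claim does not specify a solver, a penalty strength, or the denoising method, so the coordinate-descent solver, the parameter `lam`, and the function names below are assumptions made for illustration only; denoising is not covered.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardized and y is centered.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with the contribution of feature j removed.
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w

def select_features(X, y, lam):
    """Return the indices of columns whose Lasso coefficient is non-zero.

    Assumption: no column of X is constant (its standard deviation is non-zero).
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    w = lasso_coordinate_descent(Xs, yc, lam)
    return np.flatnonzero(w != 0.0)
```

Here each column of `X` would be one candidate process site and `y` the measured DE value; the retained columns would then form the initial input data. A larger `lam` removes more sites.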

3. The analysis method according to claim 1 or 2, wherein performing feature extraction processing on the initial input data by using the principal component analysis method to obtain the feature vectors of the initial input data comprises:

standardizing the initial input data to obtain a standardized matrix Z, and computing the correlation coefficient matrix R from the standardized matrix Z and its transpose;

and solving the characteristic equation of the correlation coefficient matrix R to obtain unit eigenvectors, which are used as the feature vectors of the initial input data.
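The steps of this claim follow the standard principal component analysis procedure and can be sketched as below. The function name and the use of the sample standard deviation (`ddof=1`) are assumptions; the claim does not state which normalization is used.

```python
import numpy as np

def pca_unit_eigenvectors(X):
    """Standardize X (rows = samples, columns = process sites), build the
    correlation coefficient matrix R, and return its unit eigenvectors.
    """
    n = X.shape[0]
    # Standardized matrix Z: zero mean and unit sample variance per column.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation coefficient matrix computed from Z and its transpose.
    R = Z.T @ Z / (n - 1)
    # eigh solves the characteristic equation of the symmetric matrix R and
    # returns unit-length eigenvectors as columns, eigenvalues ascending.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # reorder to descending eigenvalue
    return Z, R, eigvals[order], eigvecs[:, order]
```

Sorting by descending eigenvalue puts the directions that explain the most variance first, which is the order in which principal components are usually retained.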

4. The analysis method according to claim 3, wherein performing dimensionality reduction on the initial input data by using the principal component analysis method to obtain the principal component input data of the initial input data comprises:

combining the standardized matrix Z and the unit eigenvectors through a correlation formula, and using the resulting matrix as the principal component input data of the initial input data; the correlation formula is as follows:

[formula not reproduced in the source text]

where m is the number of principal components.

5. The analysis method according to claim 3, wherein sequentially combining the feature vectors of the initial input data into a matrix as the weight matrix of the initial input data comprises:

taking the unit eigenvectors sequentially as column vectors, and using the resulting matrix as the weight matrix $W_p$ of the initial input data, where m is the number of principal components and p is the number of feature sites.
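A minimal sketch of how the principal component input data and the weight matrix $W_p$ could be computed. Because the formula itself is not reproduced in this text, the projection `Z @ W_p` below is the standard PCA score computation and is an assumption about the patent's formula; the function name and the way `m` is supplied are likewise illustrative.

```python
import numpy as np

def principal_components(X, m):
    """Return the weight matrix W_p (p x m) and the principal component
    input data U (n x m) for data X with n samples and p feature sites.
    """
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized matrix
    R = Z.T @ Z / (n - 1)                              # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)
    # Keep the m eigenvectors with the largest eigenvalues, in order.
    order = np.argsort(eigvals)[::-1][:m]
    W_p = eigvecs[:, order]     # unit eigenvectors as column vectors
    U = Z @ W_p                 # assumed projection: standard PCA scores
    return W_p, U
```

Under this reading, $W_p$ has one row per feature site and one column per principal component, so it maps a weight on the m principal components back to the p original sites.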

6. The analysis method according to claim 1, wherein the convolutional neural network model comprises a plurality of one-dimensional convolution kernels and a fully connected neural network; the principal component input data comprises a training set and a validation set;

training the convolutional neural network model on the principal component input data comprises the following steps:

performing two convolution operations on the principal component input data in the training set by using the plurality of one-dimensional convolution kernels, feeding the resulting data into the fully connected neural network, and training iteratively until the mean square errors on the training set and the validation set converge, at which point the training is finished.
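The forward pass of such a network (two one-dimensional convolutions followed by a fully connected layer) can be sketched as follows. The claim fixes neither the kernel sizes, the activation function, nor the shape of the input window, so the ReLU activation, the single scalar output, and all names below are assumptions; the iterative weight update is omitted.

```python
import numpy as np

def conv1d_valid(x, kernels):
    """Slide one-dimensional kernels along the time axis of x, no padding.

    x:       (T, C) array, T time steps and C channels
    kernels: (S, K, C) array, S kernels each covering K time steps
    returns: (T - K + 1, S) array
    """
    T = x.shape[0]
    S, K, _ = kernels.shape
    out = np.empty((T - K + 1, S))
    for t in range(T - K + 1):
        # Contract each kernel with the current (K, C) window.
        out[t] = np.tensordot(kernels, x[t:t + K], axes=([1, 2], [0, 1]))
    return out

def relu(a):
    return np.maximum(a, 0.0)

def forward(x, k1, k2, w_fc, b_fc):
    """Two convolution operations, then one fully connected output unit."""
    h1 = relu(conv1d_valid(x, k1))
    h2 = relu(conv1d_valid(h1, k2))
    return float(h2.ravel() @ w_fc + b_fc)

def mse(pred, target):
    """Mean square error, the quantity monitored on both data sets."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))
```

In a training loop, `mse` would be evaluated on the training set and the validation set after each iteration, and training would stop once both values no longer decrease.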

7. The analysis method according to claim 6, wherein calculating the total weight matrix of the trained convolutional neural network model comprises:

1) summing the convolutional-layer weights of the same feature at different times to obtain the convolutional-layer weight $w_{jc}$:

[formula not reproduced in the source text]

where s is the number of convolution kernels; p is the size of the convolution kernel; n is the size of convolution kernel c after the convolution operation; wh is a weight coefficient in the convolution kernel; wq is the weight coefficient in the fully connected layers, $wq = wq_1 wq_2, \ldots, wq_a$, where a is the number of fully connected layers;

2) using the matrix formed by combining the convolutional-layer weights $w_{jc}$ as the total weight matrix $W_c$ of the convolutional neural network model: $W_c = (w_{1c}, \ldots, w_{jc})$.

8. The analysis method according to claim 7, wherein the association matrix W between the corn starch process sites and the DE value of the starch milk product is:

$W = W_p W_c$.
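The product in this claim can be sketched with NumPy. The shapes are an assumption drawn from the definitions in the other claims: $W_p$ is taken as p x m (p feature sites, m principal components) and $W_c$ as a vector of m convolutional-layer weights, giving one association value per site. The function name is hypothetical.

```python
import numpy as np

def association_matrix(W_p, W_c):
    """Compute W = W_p W_c, mapping principal component weights back to
    the original process sites.
    """
    W_p = np.asarray(W_p, dtype=float)
    W_c = np.asarray(W_c, dtype=float).reshape(-1, 1)  # treat as a column
    if W_p.shape[1] != W_c.shape[0]:
        raise ValueError("W_p needs one column per entry of W_c")
    return W_p @ W_c
```

For example, with three sites and two principal components, `association_matrix([[1, 0], [0, 1], [1, 1]], [2, -3])` yields a 3 x 1 column with one signed value per site.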

9. The analysis method according to claim 1, further comprising:

sorting the absolute values of the elements in the association matrix between the corn starch process sites and the starch milk DE value to obtain a sorted sequence; and determining the degree of influence of each corn starch process site on the starch milk DE value according to the order of the sorted sequence.

10. The analysis method according to claim 9, wherein the degree of influence comprises a positive correlation and a negative correlation;

when an element in the association matrix between the corn starch process sites and the starch milk DE value is negative, determining that the influence of the corresponding corn starch process site on the starch milk DE value is a negative correlation;

when an element in the association matrix between the corn starch process sites and the starch milk DE value is positive, determining that the influence of the corresponding corn starch process site on the starch milk DE value is a positive correlation.
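The ranking and sign rules of claims 9 and 10 can be sketched as follows. The site names and values in the usage note are hypothetical, and the handling of an exactly zero element, which the claims do not address, is an assumption.

```python
def rank_sites(site_names, association):
    """Rank process sites by the absolute value of their association
    element, largest first, and label the direction of influence.
    """
    rows = []
    for name, value in zip(site_names, association):
        if value > 0:
            direction = "positive"
        elif value < 0:
            direction = "negative"
        else:
            direction = "none"  # assumption: zero means no influence
        rows.append((name, value, direction))
    # A larger absolute value is read as a stronger influence on DE.
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)
```

With hypothetical sites `["steeping_temp", "so2_conc", "mill_speed"]` and association values `[0.2, -0.9, 0.5]`, the ranking places `so2_conc` first with a negative correlation, followed by `mill_speed` and `steeping_temp`, both positive.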