CN113743301B

CN113743301B - Solid-state nanopore sequencing electric signal noise reduction processing method based on residual self-encoder convolutional neural network

Info

Publication number: CN113743301B
Application number: CN202111032628.6A
Authority: CN
Inventors: 唐鹏; 王德强; 翁婷; 方绍熙; 何石轩; 谢婉谊; 石彪
Original assignee: Chongqing Institute of Green and Intelligent Technology of CAS
Current assignee: Chongqing Institute of Green and Intelligent Technology of CAS
Priority date: 2021-09-03
Filing date: 2021-09-03
Publication date: 2023-09-26
Anticipated expiration: 2041-09-03
Also published as: CN113743301A

Abstract

The application provides a solid-state nanopore sequencing electrical signal denoising method based on a residual autoencoder convolutional neural network. An autoencoder, residual bottleneck modules, shortcut connections, convolutional layers, batch regularization layers, activation layers and other structures are used to construct a deep neural network model. The model is trained on a solid-state nanopore sequencing electrical signal data set so that it accurately learns the characteristic patterns of the sequencing signal noise and establishes a mapping from noisy signals to clean signals; the learned mapping is then used to predict and estimate the clean signal corresponding to a noisy signal. The disclosed method enhances the neural network's ability to recognize the noise component of the electrical signal, establishes an accurate mapping from the noisy electrical signal to the clean signal, and achieves real-time denoising.

Description

A solid-state nanopore sequencing electrical signal denoising method based on a residual autoencoder convolutional neural network

Technical field

The invention belongs to the field of digital signal processing and specifically relates to a solid-state nanopore sequencing electrical signal denoising method based on a residual autoencoder convolutional neural network, which is applicable to machine learning and digital signal processing.

Background

In recent years, solid-state nanopores have received widespread attention as an emerging nanostructured material. Compared with existing gene sequencing systems, sequencing systems built on solid-state nanopores are stable and offer high throughput and low cost. However, many problems remain to be solved in their development; in particular, noise in the solid-state nanopore sequencing signal seriously affects the accuracy of the final sequencing result. Further research shows that this noise consists mainly of flicker noise in the low-frequency range and thermal noise, dielectric noise, and capacitive noise in the high-frequency range. In essence, the noisy sequencing electrical signal Y is the superposition of the clean sequencing electrical signal X and the additive sequencing system noise N, which can be written as Y = X + N. To improve sequencing accuracy, the noisy electrical signal must be denoised, removing as much of the superimposed noise as possible and restoring the signal to its clean state.

At present, denoising of nanopore sequencing electrical signals relies mainly on low-pass filters, such as Bessel filters, that remove high-frequency noise to improve the signal-to-noise ratio. However, low-pass filtering only brings the noisy electrical signal to a level at which signal analysis becomes possible, and it does so at the expense of the high-frequency features of the sequencing signal. Moreover, this type of processing cannot remove the low-frequency flicker noise in the noisy signal, and its results are unsatisfactory, which greatly inconveniences subsequent analysis of the sequencing signal.

Summary of the invention

In view of the problems of the prior art and the needs of practice and production, the applicant, after substantial investment and long-term research, provides a solid-state nanopore sequencing electrical signal denoising method based on a residual autoencoder convolutional neural network. The method enhances the neural network's ability to recognize the noise component of the electrical signal and establishes an accurate mapping from the noisy electrical signal to the clean signal, thereby achieving real-time denoising of solid-state nanopore electrical signals.

According to the technical solution of the invention, a solid-state nanopore electrical signal denoising method based on a residual autoencoder convolutional neural network is provided. The method uses an autoencoder structure, residual bottleneck modules, shortcut connections, convolutional layers, batch regularization layers, activation layers and other structures to create a deep neural network model. Trained on a purpose-built solid-state nanopore sequencing electrical signal training data set, the model accurately learns the characteristics of the noise in the signal and establishes a mapping from noisy signals to clean signals, so that the learned mapping can be used to predict and estimate the clean signal corresponding to a given noisy signal.

Further, the solid-state nanopore electrical signal denoising method based on a residual autoencoder convolutional neural network includes the following steps:

Step S1: build a residual autoencoder convolutional neural network model;

Step S2: select and create a training data set, and set the training parameters of the residual autoencoder convolutional neural network model;

Step S3: according to the set training parameters, train the residual autoencoder convolutional neural network model with the goal of minimizing the loss function, completing the construction of the neural network model for denoising solid-state nanopore sequencing electrical signals;

Step S4: input the noisy electrical signal to be processed into the solid-state nanopore sequencing electrical signal denoising neural network model, and output the denoised signal.

Preferably, the residual autoencoder convolutional neural network model in step S1 includes an encoder part and a decoder part. The encoder part includes multiple residual coding modules, each of which includes multiple convolutional layers, multiple activation layers and multiple batch regularization layers. The decoder part consists of multiple deconvolution decoding modules, one convolutional layer and one sigmoid activation layer; each deconvolution decoding module consists of multiple deconvolution layers, multiple activation layers and multiple batch regularization layers.

Preferably, the training data set in step S2 consists of clean sequencing electrical signals and the corresponding noisy signals. Further, noise is recorded from a blank solid-state nanopore sequencing system; a simulated translocation signal data set is constructed from a standard library of DNA base signals and superimposed with the recorded system noise to create the training data set; and the training parameters of the residual autoencoder convolutional neural network model are set.

More preferably, the loss function in step S3 is the mean square error function:

$$L(\theta) = \frac{1}{n}\sum_{i=1}^{n}\bigl(F(X_i;\theta) - Y_i\bigr)^2$$

where X_i and Y_i are, respectively, the noisy signal and the clean signal in the created training set, θ denotes the weights, n is the number of data points in the signal, and F(·) is the mapping from noisy signal to clean signal obtained through training.

The initial values of the weights of the neural network model for solid-state nanopore sequencing electrical signal denoising in step S3 are generated by a Gaussian random function, and the weight parameter of the last batch regularization layer of the residual module is initialized to zero.

Compared with the prior art, the beneficial effect of the invention is as follows: the residual autoencoder convolutional neural network model is trained for several different noise variances, yielding a signal denoising neural network model for each noise variance; the base translocation signal to be processed is then denoised by the model whose noise variance corresponds to that of the signal to be processed, and the processing is fast.

Description of the drawings

Figure 1 is a flow chart of the solid-state nanopore sequencing electrical signal denoising method based on a residual autoencoder convolutional neural network according to the invention;

Figure 2 is a schematic diagram of the internal structure of the residual autoencoder convolutional neural network model according to the invention;

Figure 3 is a schematic diagram of the internal structure of the residual coding module according to the invention.

Detailed description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

The solid-state nanopore sequencing electrical signal denoising method based on a residual autoencoder convolutional neural network uses an autoencoder structure, residual bottleneck modules, shortcut connections, convolutional layers, batch regularization layers, activation layers and other structures to create a deep neural network model. Trained on a purpose-built solid-state nanopore sequencing electrical signal training data set, the model accurately learns the characteristics of the noise in the signal and establishes a mapping from noisy signals to clean signals, so that the learned mapping can be used to predict and estimate the clean signal corresponding to a given noisy signal.

Specifically, the method mainly includes the following steps. A residual autoencoder convolutional neural network model is built; the model has two main parts, an encoder and a decoder, each including multiple convolutional layers with an activation layer after each convolutional layer. A solid-state nanopore electrical signal training set is created, and the training parameters of the model are set. Using the model and its training parameters, the model is trained with the goal of minimizing the loss function to form a solid-state nanopore electrical signal denoising model. The solid-state nanopore electrical signal to be processed is input into the denoising model, which outputs the denoised electrical signal.

The invention is further described below with reference to the accompanying drawings and preferred embodiments. As shown in Figure 1, a solid-state nanopore electrical signal denoising method based on a residual autoencoder convolutional neural network includes the following steps:

Step S2: select and create a training data set, and set the initial parameters of the model;

Step S3: according to the set training parameters, train the residual autoencoder convolutional neural network model with the goal of minimizing the loss function, completing the construction of the neural network model for denoising solid-state nanopore sequencing electrical signals;

In step S1, the residual autoencoder neural network model that is built includes an encoder part and a decoder part. The encoder part includes multiple residual coding modules, each of which includes multiple convolutional layers, multiple activation layers and multiple batch regularization layers. The decoder part includes multiple deconvolution layers, multiple activation layers and multiple batch regularization layers. More specifically, the decoder part consists of multiple deconvolution decoding modules, one convolutional layer and one sigmoid activation layer, and each deconvolution decoding module consists of multiple deconvolution layers, multiple activation layers and multiple batch regularization layers.

The residual coding module in the encoder part includes multiple convolutional layers with kernels larger than 1×1 and multiple convolutional layers with 1×1 kernels. More specifically, the residual coding module includes one convolutional layer with a 3×3 kernel and two convolutional layers with 1×1 kernels: the first and third layers of the module use 1×1 kernels and the second layer uses a 3×3 kernel, so that the module forms a bottleneck network structure. In a further embodiment, the residual coding module includes a shortcut connection that bypasses the bottleneck convolutional structure and applies an identity mapping directly between input and output.

In step S2, the training data set is selected and created and the initial parameters of the model are set. The training data set consists of clean sequencing electrical signals and the corresponding noisy signals. More specifically, noise is recorded from a blank solid-state nanopore sequencing system; a simulated translocation signal data set is constructed from a standard library of DNA base signals and superimposed with the recorded system noise to create the training data set; and the training parameters of the residual autoencoder convolutional neural network model are set.
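The construction of noisy/clean training pairs described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `make_training_pairs`, the `noise_scale` parameter and the random choice of a noise window are all hypothetical, and plain Python lists stand in for recorded current traces.

```python
import random


def make_training_pairs(clean_signals, noise_trace, noise_scale=1.0, seed=0):
    """Pair each clean signal with a noisy copy (clean + noise).

    noise_trace stands in for a blank-pore recording. A randomly chosen
    window of it is added to each clean (simulated) signal. noise_scale is
    a hypothetical knob for building data sets with different noise levels.
    """
    rng = random.Random(seed)
    pairs = []
    for clean in clean_signals:
        if len(noise_trace) < len(clean):
            raise ValueError("noise trace is shorter than the signal")
        start = rng.randrange(len(noise_trace) - len(clean) + 1)
        window = noise_trace[start:start + len(clean)]
        noisy = [x + noise_scale * n for x, n in zip(clean, window)]
        pairs.append((noisy, list(clean)))  # (network input, training target)
    return pairs
```

Each pair holds the noisy signal as the network input and the clean signal as the training target, matching the roles the text assigns to the two signals.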

In step S3, the residual autoencoder convolutional neural network model is trained, using the model and its training parameters, with the goal of minimizing the loss function, to form the signal denoising neural network model. The loss function in step S3 is the mean square error function:

$$L(\theta) = \frac{1}{n}\sum_{i=1}^{n}\bigl(F(X_i;\theta) - Y_i\bigr)^2$$

where X_i and Y_i are, respectively, the noisy signal and the clean signal in the created training set, θ denotes the weights, n is the number of data points in the signal, and F(·) is the mapping from noisy signal to clean signal obtained through training. Furthermore, the initial values of the weights of the neural network model in step S3 are generated by a Gaussian random function, and the weight parameter of the last batch regularization layer of the residual module is initialized to zero.
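As a small worked illustration of a mean square error loss, the sketch below averages the squared differences between a model's output and the clean target. The function name `mse_loss` and the use of plain lists are assumptions made for illustration; the 1/n normalization follows the usual definition of mean square error, and the text does not state whether any other constant factor is used.

```python
def mse_loss(predicted, clean):
    """Mean of the squared point-wise errors between two equal-length signals."""
    if len(predicted) != len(clean):
        raise ValueError("signals must have the same number of data points")
    n = len(clean)
    return sum((p - c) ** 2 for p, c in zip(predicted, clean)) / n
```

A prediction identical to the clean signal gives a loss of zero, which is the value training tries to approach.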

In step S4, the noisy electrical signal to be processed is preprocessed and input into the electrical signal denoising neural network model, which outputs the denoised signal.

The residual autoencoder convolutional neural network model built in step S1 is further explained with reference to Figure 2. In the preferred embodiment of the invention, the encoder part consists of eight residual coding modules, each of which includes a three-layer bottleneck convolutional structure and a shortcut connection. The decoder part consists of four deconvolution decoding modules, one convolutional layer and one sigmoid activation layer, where each deconvolution decoding module consists of one deconvolution layer, one activation layer and one batch regularization layer.
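The layer ordering of the preferred embodiment can be summarized in a short outline. The sketch below only lists the layers as text; it is not a trainable model. The channel numbers are the ones given for the embodiment later in this description. The position of the 64-channel first convolutional layer ahead of the residual modules and the order of layers inside each deconvolution module are assumptions, and the names `describe_model`, `ENCODER_MODULE_CHANNELS` and `DECODER_MODULE_CHANNELS` are hypothetical.

```python
# Channel settings quoted from the embodiment in this description.
FIRST_CONV_CHANNELS = 64
ENCODER_MODULE_CHANNELS = [64, 64, 32, 32, 16, 16, 8, 8]  # N_C per residual module
DECODER_MODULE_CHANNELS = [32, 64, 128, 256]


def describe_model():
    """Return a human-readable list of the layers, in assumed forward order."""
    layers = [f"conv, {FIRST_CONV_CHANNELS} channels"]
    for n_c in ENCODER_MODULE_CHANNELS:
        layers.append(
            f"residual module: 1x1 conv ({n_c}) -> 3x3 conv ({n_c}) "
            f"-> 1x1 conv ({4 * n_c}), plus shortcut"
        )
    for channels in DECODER_MODULE_CHANNELS:
        layers.append(
            f"deconv module: deconv ({channels}) -> activation -> batch norm"
        )
    layers.append("conv")  # channel count of this layer is not stated in the text
    layers.append("sigmoid")
    return layers
```

Printing the returned list gives one line per block: one input convolution, eight residual modules, four deconvolution modules, and the closing convolution and sigmoid.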

The residual coding module of the residual autoencoder convolutional neural network model is further explained with reference to Figure 3. The module includes three convolutional layers in total. The first convolutional layer has a 1×1 kernel and N_C channels, where N_C is the module channel number; the second has a 3×3 kernel and N_C channels; the third has a 1×1 kernel and four times the module channel number, i.e. 4N_C channels. A residual coding module built this way is a bottleneck network structure, and its three convolutional layers perform dimension compression, feature extraction and dimension recovery, respectively. The shortcut connection bypasses the bottleneck convolutional structure and applies an identity mapping directly between input and output.
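To make the channel arithmetic of the bottleneck concrete, the following sketch counts the convolution weights of one residual coding module. It is an illustration under assumptions: only kernel weights are counted (biases and batch regularization parameters are ignored), and the input channel count `c_in` is a free parameter because the text does not fix it for every module.

```python
def conv_weights(c_in, c_out, kernel):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def bottleneck_weights(c_in, n_c):
    """Weights of the 1x1 -> 3x3 -> 1x1 bottleneck with module channel number n_c."""
    compress = conv_weights(c_in, n_c, 1)      # 1x1, N_C channels
    extract = conv_weights(n_c, n_c, 3)        # 3x3, N_C channels
    restore = conv_weights(n_c, 4 * n_c, 1)    # 1x1, 4*N_C channels
    return compress + extract + restore
```

For example, with 64 input channels and N_C = 64 the three layers hold 4096 + 36864 + 16384 = 57344 weights, most of them in the 3×3 layer that operates at the reduced channel count.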

Therefore, the residual coding module can be expressed as the following function:

$$Y = \mathcal{F}(X, W) + I(X)$$

where X and Y are the input and output matrices, W is the weight parameter of the bottleneck network inside the residual coding module, $\mathcal{F}(X, W)$ is the mapping learned by the bottleneck network structure, and I(·) is the identity mapping of the shortcut connection. The bottleneck network structure in the residual coding module, i.e. the $\mathcal{F}(X, W)$ term, learns the mapping from the input to the difference between the output and the input.

Furthermore, the output of each convolutional layer passes through a ReLU activation layer and a regularization layer before it enters the next convolutional layer. The ReLU activation layer sets to zero the neurons whose value after the convolution is below 0, which screens out the more meaningful features, improves the stability of the network, and effectively avoids the gradient explosion problem.

The regularization layer uses BatchNormalization to standardize the output of the convolutional layer so that it follows the same distribution. Introducing BatchNormalization makes the function represented by the whole network smoother, which helps stabilize training and subsequent optimization.

The data used by the method come from the blockade current signals generated when DNA passes through a solid-state nanopore, as recorded by a solid-state nanopore sequencing system. The training data set is built from standard-sequence translocation signal data collected in solid-state nanopore sequencing experiments on designed DNA standard sequences. Alternatively, a standard signal database can be built from the standard-sequence translocation signal data, and a large amount of simulated signal data can then be generated from that database for use as training data. The simulated signal data can also be superimposed with different levels of system noise to build signal data sets for different noise conditions, which are used to train neural network models that handle those conditions.
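The ReLU and BatchNormalization processing applied after each convolution can be sketched on a flat list of activations. This is a simplified illustration, not the network's actual layer code: it normalizes over a single list rather than per channel over a mini-batch, and the scale `gamma`, shift `beta` and stabilizer `eps` are the usual batch normalization parameters with assumed default values.

```python
import math


def relu(values):
    """Set every activation below zero to zero."""
    return [max(0.0, v) for v in values]


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def post_conv(values):
    """ReLU followed by normalization, in the order the text lists them."""
    return batch_norm(relu(values))
```

After `post_conv`, the activations have approximately zero mean and unit variance regardless of the scale of the convolution output, which is the "same distribution" property the text refers to.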

Step S2 of this embodiment also includes resampling the noisy base-sequence electrical signal and the clean base-sequence electrical signal and converting them into 20×10 two-dimensional signal data. Dimension conversion combined with convolution makes it easier for the model to mine correlated features along the time dimension of the current data, speeds up training to some extent, and improves denoising performance. Because the input data contain only current amplitude values, the number of input channels is 1. The first convolutional layer is given 64 channels in order to raise the dimensionality of the input data and improve the model's ability to learn data features. The module channel numbers N_C of the eight residual coding modules are set to 64, 64, 32, 32, 16, 16, 8 and 8 in sequence, for mining and extracting features from the signal; this effectively improves the accuracy of model learning. Correspondingly, the channel numbers of the four decoding modules are set to 32, 64, 128 and 256 in sequence, for signal reconstruction. The learning rate of the neural network is set to 0.001 (in other embodiments it can be set to any value from 0.001 to 0.1, chosen according to the actual situation).
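The resampling and dimension conversion step can be illustrated as follows. The text does not state the resampling method or the memory layout, so this sketch assumes linear interpolation to 200 points and a row-major layout of 20 rows by 10 columns; the function names `resample_linear` and `to_2d` are hypothetical.

```python
def resample_linear(signal, target_len):
    """Resample a 1-D signal to target_len points by linear interpolation."""
    if len(signal) < 2 or target_len < 2:
        raise ValueError("need at least two points on both sides")
    step = (len(signal) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * step
        lo = min(int(pos), len(signal) - 1)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


def to_2d(signal, rows=20, cols=10):
    """Resample to rows*cols points and cut the result into rows of cols values."""
    flat = resample_linear(signal, rows * cols)
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]
```

With this layout, neighbouring samples in time sit next to each other within a row, and samples one row apart are separated by ten time steps, so a 3×3 kernel sees both short-range and longer-range temporal context.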

In this embodiment, the initial values of the weights of the neural network model are generated by a Gaussian random function, except that the weight parameter of the last batch regularization layer of each residual module is initialized to zero. This guarantees that the residual branch of each residual module initially outputs zero, so that the residual coding module initially behaves as an identity mapping from input to output.
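The effect of the zero initialization can be checked with a few lines of arithmetic. The sketch below assumes the last batch regularization layer computes `gamma * x_hat + beta` on the normalized branch output `x_hat`, with `beta` also starting at zero (the text only mentions the weight parameter); the function name `residual_output` is hypothetical.

```python
def residual_output(x, normalized_branch, gamma, beta=0.0):
    """Output of a residual module: scaled residual branch plus identity shortcut.

    normalized_branch is the branch output after normalization, before the
    learnable scale (gamma) and shift (beta) of the last normalization layer.
    """
    return [gamma * b + beta + xi for b, xi in zip(normalized_branch, x)]
```

With `gamma = 0` the branch contributes nothing and the module returns its input unchanged, so at the start of training the stack of residual modules behaves like a much shallower network.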

In step S3, the residual autoencoder convolutional neural network model is trained, using the model and its training parameters, with the goal of minimizing the loss function, completing the construction of the signal denoising neural network model.

The Adam optimizer is chosen for training. When computing each update step, Adam takes into account both the first-order momentum of the gradient (the exponential moving average of the gradient) and the second-order momentum (the exponential moving average of the squared gradient). The step size is thus adapted to both quantities at every iteration, which helps keep the model from converging to a local optimum and speeds up optimization overall.
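A single Adam update can be written out explicitly. The sketch below follows the standard published form of the algorithm with bias correction; the learning rate 0.001 is the value given in this embodiment, while `beta1 = 0.9`, `beta2 = 0.999` and `eps = 1e-8` are the commonly used defaults and are assumptions here, since the text does not state them.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step count.

    m and v hold the running averages of the gradient and the squared
    gradient. Returns the updated (params, m, v) as new lists.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g        # first-order momentum
        v_i = beta2 * v_i + (1.0 - beta2) * g * g    # second-order momentum
        m_hat = m_i / (1.0 - beta1 ** t)             # bias correction
        v_hat = v_i / (1.0 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

Because the update divides the averaged gradient by the square root of the averaged squared gradient, the very first step has a magnitude close to the learning rate whatever the gradient's scale.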

In step S4, the noisy electrical signal to be processed is preprocessed and input into the electrical signal denoising neural network model, which outputs the denoised signal.

In step S2, solid-state nanopore sequencing electrical signal data sets affected by different kinds of noise can be created, and a residual autoencoder neural network model targeted at each kind of noise can be trained as described in step S3. In step S4, the noisy electrical signal to be processed is then input into the model that corresponds to its noise condition, which predicts the corresponding clean signal and outputs the denoised sequencing electrical signal.

With the signal denoising method of the invention, the trained residual autoencoder neural network model processes noisy signals very quickly: under most hardware conditions it completes the denoising of a signal in less than 0.1 seconds. This is a great advantage in practical applications, especially where real-time denoising is required.

Compared with the prior art, the solid-state nanopore sequencing electrical signal denoising method provided by the invention combines an autoencoder with a convolutional neural network structure and uses the bottleneck modules of residual networks in the encoder part to build a residual autoencoder neural network. Relying on the learning ability of the convolutional layers and the screening ability of the ReLU activation layers, the network performs deep learning on the characteristics of sequencing electrical signal noise and learns the most representative noise features from the training set. The deconvolution layers and the sigmoid activation layer then convert these features, establishing an accurate mapping from the noisy signal to the clean signal and enabling real-time denoising.

In a further solution, the invention can also have the following beneficial effects. The bottleneck design of the residual bottleneck module effectively increases the depth of the model without sacrificing its time complexity. The identity-mapping shortcut connection effectively avoids the vanishing or exploding gradient problems caused by adding layers, without introducing extra parameters, so the model remains stable while gaining the accuracy improvement of a deep network. In addition, the residual bottleneck module uses convolution kernels of suitable size and achieves excellent denoising without any additional pooling layer, which avoids the loss of model precision and denoising quality that pooling layers cause by reducing the number of model parameters.

The above are only preferred specific embodiments of the invention, and the scope of protection of the invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the embodiments of the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.

Claims

1. A solid-state nanopore electrical signal noise reduction processing method based on a residual self-encoder convolutional neural network, characterized in that: a self-encoder, a residual bottleneck module, a shortcut connection, a convolution layer, a batch regularization layer and an activation layer structure are introduced to construct a deep neural network model; the model is trained with a solid-state nanopore sequencing electrical signal data set so that it accurately learns the characteristic pattern of sequencing electrical signal noise and establishes a mapping from the noise signal to the clean signal; and the learned mapping is finally used to predict and estimate the clean signal corresponding to the noise signal;

the method comprising the following steps:

step S1, constructing a residual self-encoder convolutional neural network model;

step S2, selecting and creating a training data set, and setting training parameters of the residual self-encoder convolutional neural network model;

step S3, training the residual self-encoder convolutional neural network model according to the set training parameters, with minimization of a loss function as the objective, thereby completing construction of a neural network model for noise reduction processing of the solid-state nanopore sequencing electrical signal;

step S4, inputting the noise electrical signal to be processed into the neural network model for noise reduction processing of the solid-state nanopore sequencing electrical signal, and outputting the noise-reduced signal;

wherein the residual self-encoder convolutional neural network model in step S1 comprises an encoder part and a decoder part; the encoder part comprises a plurality of residual coding modules, and each residual coding module comprises a plurality of convolution layers, a plurality of activation layers and a plurality of batch regularization layers; the decoder part consists of a plurality of deconvolution decoding modules, a convolution layer and a sigmoid activation layer, and each deconvolution decoding module consists of a plurality of deconvolution layers, a plurality of activation layers and a plurality of batch regularization layers;

wherein the residual coding module comprises a plurality of convolution layers with convolution kernels larger than 1×1 and a plurality of convolution layers with convolution kernels of 1×1;

wherein the residual coding module comprises one convolution layer with a convolution kernel size of 3×3 and two convolution layers with a convolution kernel size of 1×1; the convolution kernel sizes of the first and third convolution layers of the residual coding module are 1×1, the convolution kernel size of the second convolution layer is 3×3, and the residual coding module so constructed has a bottleneck neural network structure;

wherein the loss function in step S3 is a mean square error function:

$$L(\theta) = \frac{1}{n}\sum_{i=1}^{n}\left\| F(X_i;\theta) - Y_i \right\|^{2}$$

wherein $X_i$ and $Y_i$ are, respectively, the noise signal and the clean signal in the created training set, $\theta$ is the weight, $n$ represents the number of data points in the signal, and $F(\cdot)$ is the mapping from the noise signal to the clean signal obtained after training.

2. The solid-state nanopore electrical signal noise reduction processing method based on the residual self-encoder convolutional neural network according to claim 1, wherein the residual coding module comprises a shortcut connection that bypasses the bottleneck convolutional network structure and maps the input directly to the output as an identity mapping.

3. The solid-state nanopore electrical signal noise reduction processing method based on the residual self-encoder convolutional neural network according to claim 1, wherein the deconvolution decoding module of the decoder part consists of a deconvolution layer, an activation layer and a batch regularization layer.

4. The solid-state nanopore electrical signal noise reduction processing method based on the residual self-encoder convolutional neural network according to claim 1, wherein the training data set in step S2 consists of clean sequencing electrical signals and the corresponding noise signals.

5. The solid-state nanopore electrical signal noise reduction processing method based on the residual self-encoder convolutional neural network according to claim 1, wherein in step S3 the initial values of the weights of the neural network model for noise reduction processing of the solid-state nanopore sequencing electrical signal are generated by a Gaussian random function, and the weight parameter of the last batch regularization layer of each residual module is initially set to zero.
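Two training details recited in the claims, the mean square error loss and the zero initialization of the last batch regularization weight in a residual module, can be illustrated with a minimal pure-Python sketch. It is a toy model under stated assumptions and not the claimed network. The function `toy_residual_unit` is hypothetical: it stands in for the bottleneck path with a single 3-tap convolution and ReLU, and `gamma` plays the role of the scale weight of the last batch regularization layer. The standard deviation 0.01 and the signal values are also assumptions.

```python
import random

def mse_loss(predicted, clean):
    """Mean square error between a predicted signal and the clean signal."""
    if len(predicted) != len(clean):
        raise ValueError("signals must have the same length")
    n = len(clean)
    return sum((p - y) ** 2 for p, y in zip(predicted, clean)) / n

def gaussian_init(n_weights, std=0.01, seed=0):
    """Draw initial weights from a zero-mean Gaussian (std is an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(n_weights)]

def toy_residual_unit(x, weights, gamma):
    """Toy residual unit on a 1-D signal: y = x + gamma * branch(x).

    branch(x) is a zero-padded 3-tap convolution followed by ReLU,
    a stand-in for the bottleneck path (hypothetical simplification).
    """
    padded = [0.0] + list(x) + [0.0]
    branch = [max(0.0, sum(w * padded[i + j] for j, w in enumerate(weights)))
              for i in range(len(x))]
    return [xi + gamma * bi for xi, bi in zip(x, branch)]

# Made-up example signals, already scaled to [0, 1].
clean = [0.2, 0.5, 0.1, 0.9]
noisy = [0.3, 0.4, 0.2, 0.8]

weights = gaussian_init(3)
# With gamma = 0 the branch contributes nothing, so the unit is an identity map.
output = toy_residual_unit(noisy, weights, gamma=0.0)
print(output == noisy)          # True
print(mse_loss(output, clean))  # loss of the untrained identity mapping
```

With the scale weight set to zero, each residual unit starts as an identity mapping through its shortcut, so the initial loss equals the loss of passing the noisy signal through unchanged, and training departs from that stable starting point.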