CN114077755B - Controllable and lightweight federated learning method, system and detection method for privacy protection - Google Patents


Info

Publication number
CN114077755B
Authority
CN
China
Prior art keywords
model
subnet
global
data
snapshot
Prior art date
Legal status
Active
Application number
CN202210057267.9A
Other languages
Chinese (zh)
Other versions
CN114077755A (en)
Inventor
孙知信
徐玉华
赵学健
孙哲
胡冰
宫婧
汪胡青
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202210057267.9A
Publication of CN114077755A
Application granted
Publication of CN114077755B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G06F 21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218: Protecting access to data via a platform to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods


Abstract

The invention discloses a privacy-preserving, controllable and lightweight federated learning method, system and detection method. Each subnet trains the initial model on its local data and computes a compressed snapshot of the local model according to an adjustment factor. The compressed model of the optimal node is taken as the baseline, and a first round of model aggregation and compression is performed according to the snapshots and a set channel recovery threshold to form a global compressed model. The subnets then train the global compressed model on local data, the trained subnet models are aggregated directly, and the process iterates until convergence. The training-set data of the subnets is never collected or stored, so the privacy of local data is guaranteed, while controllable lightweight federated learning reduces both the network communication load and the local computation load.

Description

Controllable and lightweight federated learning method, system and detection method for privacy protection

Technical Field

The invention relates to network security fields such as network traffic detection and network privacy protection, and to big data fields such as federated learning and data model compression; in particular, it relates to a privacy-preserving, controllable and lightweight federated learning method, system and detection method.

Background

With the continuous expansion of network scale, network traffic keeps increasing, and distributed network traffic detection technology keeps advancing. The training data of different subnets are often not independent and identically distributed, and differences among local datasets may lead to poorly trained local and global models, producing large numbers of false alarms. Distributed traffic detection with multiple cooperating detectors is therefore necessary. Cross-domain collaborative detection may require detailed network data from every involved domain, but exchanging network data directly leaks network privacy. Research teams have already applied federated learning to network traffic detection, exchanging model parameters instead of the data itself to protect local network privacy, for example the published patents "Traffic identification method and device based on federated learning" (Chinese Patent Publication No. CN111970277A) and "A traffic classification method and system based on federated learning" (Chinese Patent Announcement No. CN111865815B). However, these studies aggregate the local model parameters directly without any lightweight processing. Deep neural network models are large, and transmitting massive numbers of model parameters directly imposes a heavy network burden and limits the scalability of distributed collaborative traffic detection.

Some teams have also studied model compression in federated learning, for example the patent "Target detection method, apparatus and device based on federated learning" (Chinese Patent Publication No. CN112257774A). That approach, however, first collects the data of each distributed network at a global server in order to compress the model, and this very process may leak the data privacy of each subnet.

Summary of the Invention

In view of the technical problem in the prior art that the data privacy of each subnet may be leaked, the present invention provides a privacy-preserving, controllable and lightweight federated learning method, system and detection method.

To achieve the above object, the present invention provides the following technical solutions:

In a first aspect, the present invention provides a privacy-preserving, controllable and lightweight federated learning method, comprising: each subnet node trains a local model based on a local training set and a preset model and parameters; using the set adjustment factors, the output channels to be pruned in each layer of the local model are computed and a compressed snapshot of the local model is generated;

the optimal subnet node is determined according to the amount of data in each subnet node's training set; taking the compressed snapshot of the optimal subnet node as the baseline snapshot, the global snapshot is determined from the baseline snapshot, a channel recovery threshold, and the compressed snapshots of the subnet nodes other than the optimal one;

the models of the subnet nodes are aggregated with weights to form an aggregated global model; according to the global snapshot, the output channels of each layer of the aggregated global model are pruned to form a global compressed model;

each subnet node trains the global compressed model on its local training set, until the weighted aggregation of the models trained by the subnet nodes converges, yielding the final aggregated model.

Further, the model is a convolutional neural network model.

Still further, computing the output channels to be pruned in each layer of the local model using the set adjustment factors comprises determining the c-th output channel of the i-th layer as a channel to be pruned when it satisfies both of the following conditions (the original formula images are not recoverable; the expressions below are reconstructed from the symbol definitions in the text):

$$APZ_c^i > \alpha \cdot \overline{APZ}^i, \qquad \bar{a}_c^i < \beta \cdot \bar{a}^i$$

where $\bar{a}_c^i$ denotes the average feature-map value output through the activation function by the c-th output channel of the i-th layer of the model; $\bar{a}^i$ denotes the mean of the average feature-map values output through the activation function by all output channels of the i-th layer; $\alpha$ and $\beta$ are adjustment factors, both greater than 0; $APZ_c^i$ denotes the average zero-activation percentage of the c-th output channel of the i-th layer; and $\overline{APZ}^i$ is the mean of the average zero-activation percentages of all output channels of the i-th layer.

Still further, the compressed snapshot records, for each output channel to be pruned, the index of the neural network layer it belongs to and the ID number of the output channel.

Further, taking the compressed snapshot of the optimal subnet node as the baseline snapshot and determining the global snapshot from the baseline snapshot, the channel recovery threshold and the compressed snapshots of the subnet nodes other than the optimal one comprises: scanning all output channels marked for pruning in the baseline snapshot against the compressed snapshots of the other subnet nodes, and checking whether each such channel also appears in those snapshots;

when an output channel does not appear in the compressed snapshot of some other subnet node, that subnet node is recorded; when the number of recorded subnet nodes exceeds the set output-channel recovery threshold, the channel is deleted from the baseline snapshot; the result is the global snapshot.
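The scan-and-recover procedure above can be sketched as follows. This is a minimal illustration, not the claimed implementation: a snapshot is modeled as a dict mapping layer index to the set of output-channel IDs to prune, and all names are assumptions.

```python
def build_global_snapshot(baseline, others, recovery_threshold):
    """Derive the global snapshot from the baseline snapshot (the
    optimal node's) and the other nodes' compressed snapshots.

    A channel stays pruned only if the number of other nodes that do
    NOT also prune it is at or below the recovery threshold; otherwise
    the channel is removed from the baseline, i.e. "recovered"."""
    global_snapshot = {}
    for layer, channels in baseline.items():
        kept = set()
        for ch in channels:
            # Count nodes whose snapshot does not mark this channel.
            missing = sum(1 for snap in others
                          if ch not in snap.get(layer, set()))
            if missing <= recovery_threshold:
                kept.add(ch)  # consensus: keep it pruned
        if kept:
            global_snapshot[layer] = kept
    return global_snapshot
```

For example, with baseline {0: {1, 2}}, two other nodes that only prune channel 1, and a threshold of 1, channel 2 is recovered and only channel 1 remains pruned.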

Further, the optimal subnet node is determined by the following formula (the original formula image is not recoverable; the expression below is one reconstruction consistent with the text, which favours a large data share and a small imbalance):

$$s^{*} = \arg\max_i \frac{\rho_i}{D_i}$$

where $D_i$ is the data imbalance degree of the model of the i-th subnet node, $\rho_i$ is the proportion of the i-th subnet node's local training-set data volume in all training data, and $s^{*}$ is the optimal subnet node.

Further, the method for judging whether the weighted-aggregation model has converged comprises: determining the loss functions of the models trained by the subnet nodes, together with their mean and standard deviation;

when both the mean of the subnet losses and their standard deviation are less than or equal to the set threshold, the weighted aggregation of the models trained by the subnet controllers is judged to have converged.
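The convergence test can be sketched as below. The original wording is ambiguous about how the per-subnet losses are summarized, so applying a single scalar threshold to both the mean loss and its population standard deviation is an assumption.

```python
import statistics

def has_converged(subnet_losses, threshold):
    """Declare convergence when both the mean of the subnet models'
    losses and the spread across subnets are at or below the given
    threshold (population standard deviation; an assumption)."""
    mean_loss = statistics.mean(subnet_losses)
    spread = statistics.pstdev(subnet_losses)
    return mean_loss <= threshold and spread <= threshold
```

Uniformly small losses pass; one outlier subnet with a large loss keeps the loop running.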

In a second aspect, the present invention provides a privacy-preserving, controllable and lightweight federated learning system, comprising a data layer, a subnet control layer and a global control layer;

the data layer is used by each subnet for data-forwarding communication;

the subnet control layer is provided with multiple subnet controllers, and the global control layer is provided with a global controller;

the global controller is used to transmit the preset model and parameters, and the adjustment factors required for model compression, to all subnet controllers;

the subnet controller is used to collect data and extract features to form a local training set; to receive the model, parameters and adjustment factors transmitted by the global controller; to train the local model using the local training set, model and parameters; and, using the set adjustment factors, to compute the output channels to be pruned in each layer of the local model and generate a compressed snapshot of the local model; each subnet controller transmits the amount of data in its local training set, the model and the compressed snapshot to the global controller;

the global controller determines the optimal subnet controller according to the amount of training-set data reported by each subnet controller; taking the compressed snapshot of the optimal subnet controller as the baseline snapshot, it determines the global snapshot from the baseline snapshot, the channel recovery threshold and the compressed snapshots generated by the other subnet controllers;

the global controller aggregates the models trained by the subnet controllers with weights to form an aggregated global model; according to the global snapshot, the output channels of each layer of the aggregated global model are pruned to form a global compressed model.

Further, the subnet controller computing, using the set adjustment factors, the output channels to be pruned in each layer of the local model comprises determining the c-th output channel of the i-th layer as a channel to be pruned when it satisfies both of the following conditions (reconstructed as above from the symbol definitions; the original formula images are not recoverable):

$$APZ_c^i > \alpha \cdot \overline{APZ}^i, \qquad \bar{a}_c^i < \beta \cdot \bar{a}^i$$

where $\bar{a}_c^i$ denotes the average feature-map value output through the activation function by the c-th output channel of the i-th layer of the model; $\bar{a}^i$ denotes the mean of the average feature-map values output through the activation function by all output channels of the i-th layer; $\alpha$ and $\beta$ are adjustment factors, both greater than 0; $APZ_c^i$ denotes the average zero-activation percentage of the c-th output channel of the i-th layer; and $\overline{APZ}^i$ is the mean of the average zero-activation percentages of all output channels of the i-th layer.

The present invention also provides a privacy-preserving, controllable and lightweight federated learning detection method, in which a model is obtained using the privacy-preserving, controllable and lightweight federated learning method provided by any possible implementation of the technical solution of the first aspect;

the collected network traffic data is input, and the finally obtained model is used to perform traffic detection.

The present invention achieves the following beneficial technical effects: no subnet data needs to be collected; instead, each subnet node performs model training and compression locally, and the compressed models of the subnet nodes are then processed and aggregated to form a global compressed model that reflects the characteristics of the global data.

Moreover, during this process the system can control the scale of the global compressed model through parameter adjustment, achieving controllable lightweight federated learning, and the global controller never collects or holds specific private traffic information.

Local traffic is detected with the finally trained global compressed model, and the local training data never has to be transmitted throughout the entire process, protecting the privacy of the local data in the distributed network.

Brief Description of the Drawings

Fig. 1 is a schematic diagram of the framework of the privacy-preserving, controllable and lightweight federated learning system provided by a specific embodiment;

Fig. 2 is a schematic flowchart of the privacy-preserving, controllable and lightweight federated learning method provided by a specific embodiment.

Detailed Description

The present invention is further described below with reference to the accompanying drawings and specific embodiments.

Embodiment 1: a privacy-preserving, controllable and lightweight federated learning method, whose flow is shown in Fig. 2, comprising:

each subnet node trains a local model based on its local training set and the model and parameters preset by the global node; using the set adjustment factors, the output channels to be pruned in each layer of the local model are computed and a compressed snapshot of the local model is generated;

the global node determines the optimal subnet node according to the amount of data in each subnet node's training set; taking the compressed snapshot of the optimal subnet node as the baseline snapshot, it determines the global snapshot from the baseline snapshot, the channel recovery threshold and the compressed snapshots of the other subnet nodes;

the global node aggregates the models of the subnet nodes with weights to form an aggregated global model; according to the global snapshot, the output channels of each layer of the aggregated global model are pruned to form a global compressed model;

each subnet node trains the global compressed model on its local training set, until the weighted aggregation of the models trained by the subnet nodes converges, yielding the final aggregated model.

The switches of the subnet nodes forward data for communication. The global node transmits and deploys to all subnet nodes the initialized convolutional neural network deep learning model, its parameters, and the adjustment factors required for model compression. Each subnet node trains the initial model on its local training data and computes a compressed snapshot of the local model according to the adjustment factors. The global node selects the optimal subnet node among the subnet nodes and takes its compressed snapshot as the baseline snapshot; it then determines the global snapshot from the baseline snapshot, the channel recovery threshold and the compressed snapshots of the subnet nodes;

the global node aggregates the models of the subnet nodes with weights to form an aggregated global model; according to the global snapshot and the channel recovery threshold, the output channels of each layer of the aggregated global model are pruned to form a global compressed model;

the subnet nodes then train the global compressed model on local data, the global node aggregates the trained subnet models directly, and the process iterates until the aggregated model converges. Since the training-set data of the subnets is never collected or stored, the privacy of local data is guaranteed, and controllable lightweight federated learning reduces the network communication burden and the local computation load. The federated learning method provided by this embodiment specifically comprises the following steps:
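The direct weighted aggregation of the trained subnet models can be sketched as follows. The patent only states that models are aggregated with weights, so FedAvg-style weighting by each node's data share is an assumption, and models are flattened to parameter lists for brevity.

```python
def weighted_aggregate(models, weights):
    """Aggregate subnet models into a global model.

    models  : list of models, each a flat list of parameter values
    weights : per-node aggregation weights, e.g. each node's share
              of the total training data (assumed)
    Returns the weighted average of the parameters.
    """
    total = sum(weights)
    n_params = len(models[0])
    return [
        sum(w * m[p] for m, w in zip(models, weights)) / total
        for p in range(n_params)
    ]
```

For example, two equally weighted nodes with parameters [1.0, 2.0] and [3.0, 4.0] aggregate to [2.0, 3.0].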

Step 1: the global node predetermines the initialized deep learning model, its parameters, and the compression parameters that control the model scale, i.e. the adjustment factors.

Step 2: the subnet nodes collect data through their local switches and extract features to form local training sets. Each subnet node trains the initialized model on its local training set. Before the first round of model-aggregation communication, to reduce redundant model parameters and improve communication efficiency, each subnet node computes, according to the preset adjustment factors, the output channels to be pruned in each layer of the locally trained deep learning model, and generates a compressed snapshot that records the specific pruned channels of that subnet's model.

In this embodiment, the model is a convolutional neural network.

In step 2, each subnet node computes the output channels to be pruned and forms the compressed snapshot as follows:

A convolutional neural network uses multiple convolutional layers; each convolutional layer contains multiple convolution kernels, i.e. multiple output channels. The input training-instance data is convolved with the kernels of a convolutional layer, and the resulting values are passed through the ReLU (Rectified Linear Units) activation function to output multiple feature-map values. Controllable compression is performed by comparing each output channel's average post-activation feature-map value and average zero-activation percentage against the corresponding layer-wise means, which serve as thresholds.

The Average Percentage of Zeros (APZ) is defined to measure the percentage of zero activations of each layer's output-channel neurons after the ReLU mapping. Let $O_{c,j}^{(i,k)}$ denote the $j$-th feature-map value output by the $c$-th output channel of the $i$-th layer for the $k$-th training instance after the ReLU function. The average zero-activation percentage $APZ_c^i$ of the $c$-th output channel of the $i$-th layer is then expressed as (reconstructed from the symbol definitions; the original formula image is not recoverable):

$$APZ_c^i = \frac{\sum_{k=1}^{N}\sum_{j=1}^{M} f\!\left(O_{c,j}^{(i,k)}\right)}{N \cdot M}$$

where $f(\cdot)$ equals 1 if the ReLU-mapped value is 0 and equals 0 otherwise, $M$ is the total number of output feature-map values of the channel, and $N$ is the total number of training instances.
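The APZ statistic can be computed directly from a channel's post-ReLU outputs. Below is a minimal sketch in plain Python; the nested-list layout of the feature maps and the function name are assumptions.

```python
def apz(post_relu_maps):
    """Average Percentage of Zeros for one output channel.

    post_relu_maps : one flat list of the channel's post-ReLU
                     feature-map values per training instance
                     (N instances of M values each).
    Returns the fraction of values that are exactly zero.
    """
    total = sum(len(fmap) for fmap in post_relu_maps)
    zeros = sum(1 for fmap in post_relu_maps for v in fmap if v == 0.0)
    return zeros / total
```

A channel whose two instance maps are [0.0, 1.0] and [0.0, 0.0] has APZ 0.75.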

To judge whether the parameter redundancy of an output channel of the $i$-th layer of the convolutional neural network is excessive, its $APZ_c^i$ is compared against the layer's $\overline{APZ}^i$, the mean of the average zero-activation percentages of all output channels of the $i$-th layer, computed as follows, where $H$ is the number of output channels of the layer:

$$\overline{APZ}^i = \frac{1}{H}\sum_{c=1}^{H} APZ_c^i$$

Although the average zero-activation percentage measures the redundancy of each layer's output channels, the contribution of each output channel must also be measured. Therefore the average feature-map value of each output channel after the ReLU function is also computed. $\bar{a}_c^i$ denotes the average feature-map value output through the ReLU function by the $c$-th output channel of the $i$-th layer, expressed as:

$$\bar{a}_c^i = \frac{\sum_{k=1}^{N}\sum_{j=1}^{M} O_{c,j}^{(i,k)}}{N \cdot M}$$

where $O_{c,j}^{(i,k)}$ is the $j$-th feature-map value output by the $c$-th output channel of the $i$-th layer for the $k$-th training instance after the ReLU function. The larger $\bar{a}_c^i$ is, the larger the weight contribution of the output channel and the greater its influence on data classification.

To retain useful output channels, $\bar{a}_c^i$ is compared against the layer's $\bar{a}^i$, the mean of the average feature-map values output through the ReLU function by all output channels of the $i$-th layer, computed as:

$$\bar{a}^i = \frac{1}{H}\sum_{c=1}^{H} \bar{a}_c^i$$

When computing the output channels of the local model to be pruned, channels with a high average zero-activation percentage and a low weight contribution can be pruned; the pruning conditions are as follows, where $\alpha$ and $\beta$ are adjustment factors, both greater than 0, and adjusting their values adjusts the size of the compressed model:

$$APZ_c^i > \alpha \cdot \overline{APZ}^i, \qquad \bar{a}_c^i < \beta \cdot \bar{a}^i$$

The $c$-th output channel of the $i$-th layer that satisfies both of the above conditions is determined as a channel to be pruned.

The output channels of the locally trained model that need pruning are then recorded in a key-value snapshot, where the key records the index of the neural network layer and the value records the ID numbers of the output channels to be pruned.
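The two pruning conditions and the key-value snapshot can be put together as below. This is a sketch: alpha and beta are the adjustment factors, the strictness of the inequalities follows the reconstruction above, and the per-layer statistics are assumed precomputed.

```python
def channels_to_prune(apz_by_channel, act_by_channel, alpha, beta):
    """One layer: prune channel c when its APZ exceeds alpha times the
    layer-mean APZ AND its mean post-ReLU activation falls below beta
    times the layer-mean activation (both factors > 0)."""
    mean_apz = sum(apz_by_channel) / len(apz_by_channel)
    mean_act = sum(act_by_channel) / len(act_by_channel)
    return [
        c
        for c, (p, a) in enumerate(zip(apz_by_channel, act_by_channel))
        if p > alpha * mean_apz and a < beta * mean_act
    ]

def make_snapshot(per_layer_stats, alpha, beta):
    """Key-value snapshot: layer index -> IDs of channels to prune.

    per_layer_stats : list of (apz_by_channel, act_by_channel) pairs,
                      one pair per convolutional layer.
    """
    snapshot = {}
    for layer, (apzs, acts) in enumerate(per_layer_stats):
        pruned = channels_to_prune(apzs, acts, alpha, beta)
        if pruned:
            snapshot[layer] = pruned
    return snapshot
```

Raising alpha or lowering beta makes the conditions harder to satisfy, shrinking the pruned set, which is how the factors make the compression controllable.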

Step 3: the global node computes each subnet's training performance from the per-class data counts of each subnet's training set and selects the optimal subnet node. The compressed snapshot of the optimal subnet node is taken as the baseline snapshot and a channel recovery threshold is set; the compressed snapshots of all other subnet nodes are then scanned. When a pruned output channel in the baseline snapshot does not appear in the snapshots of some other nodes, and the number of such nodes exceeds the set channel recovery threshold, that output channel is deleted from the baseline snapshot, forming the global snapshot. Next, the global controller aggregates the training models of the subnet controllers with weights to form a global model, and then, according to the global snapshot, prunes the output channels of each layer of the aggregated global model to form the global compressed model.

In step 3, the training-set performance of each subnet is computed from the total amount of training data of the subnet nodes and their data imbalance degree, and the optimal subnet node is selected according to that performance. The specific steps are as follows:

The larger the share P_i of the i-th subnet node's local training data in all training data, and the smaller the data imbalance degree IB_i of the i-th subnet node, the higher the accuracy of that subnet node's model is likely to be. P_i and IB_i are computed as follows (the original imbalance formula is embedded as an image; the normalized mean absolute deviation below is one consistent form):

P_i = D_i / Σ_{j=1..K} D_j

IB_i = (1/n) · Σ_{j=1..n} |D_ij − D̄_i| / D̄_i

where D_i is the number of training samples of the i-th subnet node, D_j is the number of training samples of the j-th subnet node, n is the number of data classes, D_ij is the number of samples of the j-th class at the i-th subnet node, and D̄_i is the average per-class sample count at the i-th subnet node.

Therefore, the share of each node's training data in the total training data and its data imbalance degree can be combined into a performance evaluation value V_i for the i-th node's training data; V_i grows with P_i and shrinks with IB_i, and the optimal node is the one maximizing V_i. However, the compressed model structure of the optimal subnet node reflects only the training-data characteristics of its own subnet.
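One way to sketch the node selection is to combine the two quantities as V_i = P_i / (1 + IB_i). This particular combination is an assumption, since the patent's evaluation formula is embedded as an image, but it has the required behavior: V grows with the data share and shrinks with the imbalance.

```python
def select_optimal_node(class_counts):
    """class_counts: dict node_id -> list of per-class sample counts.
    Returns the node with the largest assumed evaluation value V_i."""
    total = sum(sum(c) for c in class_counts.values())
    best_id, best_v = None, -1.0
    for node, counts in class_counts.items():
        d_i = sum(counts)
        p_i = d_i / total                 # share of all training data
        mean = d_i / len(counts)          # average per-class count
        # normalized mean absolute deviation as the imbalance degree
        ib_i = sum(abs(c - mean) for c in counts) / (len(counts) * mean)
        v_i = p_i / (1.0 + ib_i)          # assumed combination of the two
        if v_i > best_v:
            best_id, best_v = node, v_i
    return best_id
```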

In step 3, model aggregation and compression are performed with the compression model of the optimal subnet node as the baseline. The specific steps are as follows:

First, the compressed snapshot of the optimal node is taken as the baseline snapshot. Every output channel marked for pruning in the baseline snapshot is scanned for in the compressed snapshots of the other subnet nodes, counting whether the pruned channel is present there. When a pruned output channel of the baseline snapshot is absent from another subnet node's compressed snapshot, that node is recorded; when the number of recorded subnet nodes exceeds the set channel recovery threshold Z (0 < Z < K, where K is the total number of subnet controller nodes), the output channel is deleted from the baseline snapshot. The result is the global snapshot. The larger the set value of Z, the fewer channels need to be deleted from the baseline snapshot.
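The baseline scan and channel-recovery step can be sketched as follows (snapshots are assumed to be dicts mapping layer index to a list of pruned-channel IDs, as in the key-value snapshot above; names are illustrative):

```python
def build_global_snapshot(baseline, other_snapshots, z):
    """Delete from the baseline every pruned channel that is absent from
    more than z of the other nodes' compressed snapshots (0 < z < K)."""
    global_snap = {}
    for layer, channels in baseline.items():
        kept = []
        for ch in channels:
            # count the other nodes whose snapshot does NOT prune this channel
            absent = sum(1 for s in other_snapshots
                         if ch not in s.get(layer, []))
            if absent <= z:          # too few nodes disagree: keep it pruned
                kept.append(ch)
        if kept:
            global_snap[layer] = kept
    return global_snap
```

A larger z keeps more channels pruned, i.e. fewer deletions from the baseline, matching the description above.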

Then the weights of the models trained by all subnet nodes are aggregated by the weighted average formula to compute the global model.

w_global^(1) = Σ_{i=1..K} ( D_i / Σ_{j=1..K} D_j ) · w_i^(1)

where D_i is the number of training samples of the i-th subnet controller and w_i^(1) denotes the model weights of the i-th subnet controller in the first aggregation round.

Next, according to the global snapshot, the output channels of each layer of the global model are pruned to form the global compressed model.
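The weighted aggregation is a FedAvg-style average weighted by each node's data count; a minimal sketch over flattened per-layer weight lists (the layout is an assumption):

```python
def aggregate(models, data_counts):
    """models: list of dicts layer_name -> list of floats (flattened weights);
    data_counts: training-sample count D_i of each subnet node.
    Returns the data-count-weighted average of the models."""
    total = sum(data_counts)
    agg = {}
    for name in models[0]:
        agg[name] = [
            sum(d / total * m[name][j] for m, d in zip(models, data_counts))
            for j in range(len(models[0][name]))
        ]
    return agg
```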

The system administrator can therefore control the scale of the global compressed model through the set adjustment factors and channel recovery threshold.

Step 4: Each subnet node trains the global compressed model on its local data. After the prescribed number of training iterations, each subnet computes the mean μ_j^k and the standard deviation σ_j^k of the loss values of its t training iterations:

μ_j^k = (1/t) · Σ_{i=1..t} L_i

σ_j^k = sqrt( (1/t) · Σ_{i=1..t} (L_i − μ_j^k)² )

where σ_j^k is the standard deviation of the loss values of the t training iterations of the k-th subnet controller in round j, L_i is the loss of the i-th iteration, and μ_j^k is the mean loss of the t training iterations in round j.

Step 5: The models trained by the subnet nodes are aggregated by weighted averaging; then the loss sum S of the subnet models in this round of aggregated communication and the average σ̄ of their loss standard deviations are computed as follows, where K is the number of subnet nodes:

S = Σ_{k=1..K} μ_j^k

σ̄ = (1/K) · Σ_{k=1..K} σ_j^k

If the loss sum S of the subnet models and the average σ̄ of the loss standard deviations are both less than or equal to their set thresholds, the aggregated global model has converged; otherwise the subnet nodes repeat step 4.
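The round-level convergence test can be sketched as follows (threshold names are illustrative; the population standard deviation matches the 1/t formula above):

```python
import statistics

def converged(loss_histories, s_threshold, sigma_threshold):
    """loss_histories: per-subnet lists of the t loss values of this round.
    Converged when both the sum S of the per-subnet loss means and the
    average of the per-subnet loss standard deviations are within threshold."""
    means = [statistics.fmean(h) for h in loss_histories]
    stds = [statistics.pstdev(h) for h in loss_histories]  # population std, 1/t
    s = sum(means)
    sigma_bar = sum(stds) / len(stds)
    return s <= s_threshold and sigma_bar <= sigma_threshold
```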

Optionally, step 6 is included: each subnet node fine-tunes the converged global model with local data to obtain the latest fine-tuned model.

Embodiment 2: Corresponding to Embodiment 1, this embodiment provides a privacy-preserving, controllable and lightweight federated learning system. As shown in Figure 1, the framework is divided into a data layer, a subnet control layer, and a global control layer. At the data layer, the switches of each subnet forward traffic. The subnet-layer controllers manage the local switches and detect local data (network traffic data in this embodiment); they do not communicate with one another, and no subnet controller transmits local data outward, which prevents privacy leakage. At the global layer, the global controller transmits to all subnet controllers, and deploys, the initialized traffic-detection convolutional neural network deep learning model, its parameters, and the adjustment factors required for model compression. Each subnet controller trains the initial model on its local traffic training data and computes a local model compression snapshot from the adjustment factors. The global controller selects the compressed snapshot of the optimal node as the baseline snapshot; it performs the first round of model aggregation and compression according to the baseline snapshot and the set channel recovery threshold to form the global compressed model, and sends the global compressed model to the subnet controllers. The subnet controllers then train the global compressed model on local data, the global controller aggregates directly, and the process iterates until convergence. Each subnet controller uses the converged model for local detection. Since the global controller never collects or stores any subnet's training-set data, the privacy of local data is guaranteed, while controllable lightweight federated learning reduces the network communication burden and the local computing load. This embodiment comprises the following:

1) Each subnet controller registers with the global controller, which manages the subnet controllers in a unified way; the global controller then transmits to all subnet controllers, and deploys, the initialized traffic-detection deep learning model parameters and the compression parameters that control the model scale.

2) Each subnet controller collects traffic data through its local switches and performs feature extraction to form a local traffic-detection training set, on which it trains the initialized detection model. Before the first round of model-aggregation communication, in order to reduce redundant model parameters and improve communication efficiency, the subnet controller uses the received compression parameters to compute the output channels to be pruned in each layer of the locally trained deep learning model, and generates a compressed snapshot recording the specific pruned output channels of its model. The local controller then transmits the per-class counts of its local training data, its trained model, and its pruning snapshot to the global controller.

The method by which each subnet controller computes the output channels to be pruned and forms a compressed snapshot is as follows. A convolutional neural network uses multiple convolutional layers; each convolutional layer contains multiple convolution kernels, i.e. multiple output channels. The input training-instance data is convolved with the kernels of a convolutional layer, and the resulting values pass through the ReLU (Rectified Linear Units) activation function to output multiple feature map values. Each output channel is compressed controllably by using as thresholds the layer means of the average feature map value and of the average zero-activation percentage output through the activation function.

The average percentage of zeros (APZ) is defined to measure the fraction of zero activations of each layer's channels after the ReLU mapping. Let O(i, c) denote the output of the c-th output channel of the i-th layer after the ReLU function. The average zero-activation percentage of the c-th channel of the i-th layer, APZ(i, c), is expressed by the following formula:

APZ(i, c) = ( Σ_{k=1..N} Σ_{j=1..M} f( O_j^k(i, c) = 0 ) ) / (N · M)

where O_j^k(i, c) is the j-th feature map value output, after the ReLU function, by the c-th output channel of the i-th layer of the model for the k-th training instance; f(·) equals 1 when the ReLU output is 0 and equals 0 otherwise; M is the total number of output feature map values of O(i, c); and N is the total number of training instances.

To judge whether the parameter redundancy of an output channel of the i-th layer of the convolutional neural network is excessive, its APZ is compared against APZ_avg(i), the mean of the average zero-activation percentages of all output channels of the i-th layer, computed as follows, where H is the number of channels in the layer:

APZ_avg(i) = (1/H) · Σ_{c=1..H} APZ(i, c)

Although the average percentage of zeros measures the redundancy of each layer's channels, the contribution of each channel must also be measured. Therefore the average feature map value of each output channel after the ReLU function is also computed.

A(i, c) denotes the average feature map value output through the ReLU function by the c-th output channel of the i-th layer of the model, expressed as follows:

A(i, c) = ( Σ_{k=1..N} Σ_{j=1..M} O_j^k(i, c) ) / (N · M)

where O_j^k(i, c) is the j-th feature map value output, after the ReLU function, by the c-th output channel of the i-th layer of the model for the k-th training instance.

The larger A(i, c) is, the greater the weight contribution of that output channel, and the greater its influence on traffic classification.

To preserve useful output channels, A(i, c) is compared against A_avg(i), the mean of the average feature map values output through the ReLU function by all output channels of the i-th layer, computed as follows:

A_avg(i) = (1/H) · Σ_{c=1..H} A(i, c)
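Under the assumed layout of one channel's ReLU outputs as an N × M nested list, the two per-channel statistics above can be computed directly (the function name is illustrative):

```python
def channel_stats(relu_out):
    """relu_out: N x M nested lists of one channel's ReLU outputs
    (N training instances, M feature-map values each).
    Returns (APZ, average feature-map value) for that channel."""
    n = len(relu_out)
    m = len(relu_out[0])
    zeros = sum(1 for inst in relu_out for v in inst if v == 0)   # f(.) = 1 on zeros
    total = sum(v for inst in relu_out for v in inst)
    return zeros / (n * m), total / (n * m)
```

Averaging these values over the H channels of a layer yields the thresholds APZ_avg(i) and A_avg(i) used in the pruning conditions below.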

When computing the output channels of the local model to be pruned, channels whose average percentage of zeros is high and whose weight contribution is low are pruned. The pruning conditions are the following formulas, where α and β are adjustment factors, both greater than 0; adjusting the values of α and β adjusts the size of the compressed model:

APZ(i, c) > α · APZ_avg(i)

A(i, c) < β · A_avg(i)

The c-th output channel of the i-th layer that satisfies both of the above formulas simultaneously is determined to be a channel that needs pruning.

Then a key-value snapshot records the output channels of the locally trained model that need pruning: the key records the neural network layer index, and the value records the ID numbers of the output channels to be pruned. The subnet controller then transmits the pruning snapshot, the trained model's parameter matrix, the per-class counts of its local training data, and the post-training loss value to the global controller.

3) In the first round of aggregated communication, after the global controller receives the trained model, the snapshot, and the training-data count parameters from each subnet controller, it computes the training performance of each subnet from the per-class counts of each subnet's training set and selects the optimal subnet controller. Taking the compressed snapshot of the optimal subnet controller as the baseline and setting the output-channel recovery threshold, it scans the compressed snapshots of all subnets except the optimal one: when a pruned output channel in the baseline snapshot is absent from the snapshots of other nodes, and the number of such nodes exceeds the set channel recovery threshold, that output channel is deleted from the baseline snapshot, forming the global snapshot. Next, the global controller performs weighted aggregation of the subnet controllers' trained models to form the global model, prunes the output channels of each layer of the aggregated global model according to the global snapshot to form the global compressed model, and finally sends the global compressed model to each subnet controller.

In 3), the global controller receives the total data amounts and data imbalance degrees of all subnet controller nodes, computes the training-set performance of each subnet, and selects the optimal subnet controller node according to that performance. The specific steps are as follows:

The larger the share P_i of a subnet controller node's training data in all training data, and the smaller the data imbalance degree IB_i of the i-th subnet node, the higher the accuracy of that node's subnet model is likely to be. P_i and IB_i are computed as follows (the original imbalance formula is embedded as an image; the normalized mean absolute deviation below is one consistent form):

P_i = D_i / Σ_{j=1..K} D_j

IB_i = (1/n) · Σ_{j=1..n} |D_ij − D̄_i| / D̄_i

where D_i is the number of training samples of the i-th subnet controller; n is the number of data classes; D_ij is the number of samples of the j-th class at the i-th subnet controller; and D̄_i is the average per-class sample count at the i-th subnet controller.

Therefore, the global controller can combine each node's share of the total training data with its data imbalance degree into a performance evaluation value V_i for the i-th node's training data; V_i grows with P_i and shrinks with IB_i, and the optimal node is the one maximizing V_i. However, the subnet compression model structure of the optimal node reflects only the training-data characteristics of its corresponding subnet.

The global controller performs model aggregation and compression with the compression model of the optimal subnet controller as the baseline. The specific steps are as follows:

First, the global controller takes the snapshot of the optimal node as the baseline. Every output channel marked for pruning in the baseline snapshot is scanned for in the snapshots of the other subnet controllers, counting whether the pruned channel is present there. When a pruned output channel of the baseline snapshot is absent from another node's snapshot, that node is recorded; when the number of recorded nodes exceeds the set channel recovery threshold Z (0 < Z < K, where K is the total number of subnet controller nodes), the output channel is deleted from the baseline snapshot. The result is the global snapshot. The larger the set value of Z, the fewer channels need to be deleted from the baseline snapshot.

Then, using the weighted average formula, the global controller aggregates the weights of the models trained by all subnet controllers to compute the global model.

w_global^(1) = Σ_{i=1..K} ( D_i / Σ_{j=1..K} D_j ) · w_i^(1)

where D_i is the number of training samples of the i-th subnet controller and w_i^(1) denotes the model weights of the i-th subnet controller in the first aggregation round.

Next, according to the global snapshot, the output channels of each layer of the global model are pruned to form the global compressed model, which is sent to each subnet controller.

The system administrator can therefore control the scale of the global compressed model by setting the compression parameters and the channel recovery threshold.

4) Each subnet controller trains the global compressed model on its local data. After the prescribed number of training iterations, it computes the mean μ_j^k and the standard deviation σ_j^k of the loss values of its t training iterations:

μ_j^k = (1/t) · Σ_{i=1..t} L_i

σ_j^k = sqrt( (1/t) · Σ_{i=1..t} (L_i − μ_j^k)² )

where σ_j^k is the standard deviation of the loss values of the t training iterations of the k-th subnet controller in round j, L_i is the loss of the i-th iteration, and μ_j^k is the mean loss of the t training iterations in round j. The subnet controller then sends the mean of the loss values, the standard deviation of the loss values, and the locally trained model to the global controller.

5) To bring the convergence of the subnet models into a balanced state, the global controller performs weighted aggregation of the subnet controllers' trained models, and then computes the loss sum S of the subnet training models in this round of aggregated communication and the average σ̄ of their loss standard deviations, where K is the number of subnet controllers:

S = Σ_{k=1..K} μ_j^k

σ̄ = (1/K) · Σ_{k=1..K} σ_j^k

If the loss sum S of the subnet detection models and the average σ̄ of the loss standard deviations are both less than or equal to their set thresholds, the aggregated model and a convergence notification are sent to the subnet controllers; otherwise the aggregated model and a non-convergence notification are sent to each subnet controller, and the subnet controllers repeat step 4).

Optionally, 6) is included: each subnet controller fine-tunes the global model with local data, then uses the latest fine-tuned model to detect local traffic. When abnormal traffic is detected, each subnet controller takes effective measures to mitigate the abnormal attack.

As will be appreciated by those skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in that computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific embodiments described, which are illustrative rather than restrictive. Under the inspiration of the present invention, those of ordinary skill in the art may devise many further forms without departing from the spirit of the present invention and the scope protected by the claims, and all of these fall within the protection of the present invention.

Claims (8)

1. A controllable lightweight federated learning method for protecting privacy is characterized by comprising the following steps:
each subnet node trains a local model based on a local training set and a model and parameters preset by the global node; each sub-network node calculates an output channel needing pruning in each layer of the local model by using a set adjusting factor and generates a compressed snapshot of the local model;
the global node determines an optimal subnet node according to the number of data in the training set of each subnet node and the data imbalance degree, takes the compressed snapshot of the optimal subnet node as a reference snapshot, and determines a global snapshot according to the reference snapshot, a channel recovery threshold and the compressed snapshots of other subnet nodes except the optimal subnet node;
the global node performs weighted aggregation on the models of the sub-network nodes to form an aggregated global model; according to the global snapshot, pruning each layer of output channels of the aggregated global model to form a global compression model;
each subnet node trains the global compression model by using a local training set until the model obtained by weighted aggregation of the subnet-node-trained models converges, and a final aggregated model is obtained;
the calculating, using the set adjustment factors, of the output channels needing pruning in each layer of the local model comprises: determining the c-th output channel of the i-th layer as an output channel to be pruned when it simultaneously satisfies the following formula:

[pruning-condition formula, shown only as an image in the original publication]

where F(i, c) represents the average feature mapping value output through the activation function by the c-th output channel of the i-th layer of the model; F̄(i) represents the average of the average feature mapping values output through the activation function by all output channels of the i-th layer; the two adjustment factors are both greater than 0; Z(i, c) represents the average zero-activation percentage of the c-th output channel of the i-th layer; and Z̄(i) is the average of the average zero-activation percentages of all output channels of the i-th layer.
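The pruning criterion of claim 1 can be illustrated with a minimal Python sketch. The exact inequality is reproduced only as an image in the original publication, so the condition below (low mean activation combined with a high zero-activation rate, scaled by the two adjustment factors) is an assumed reading, and all names are illustrative:

```python
import numpy as np

def channels_to_prune(feature_maps, lam1=0.9, lam2=1.1):
    """Select output channels of one layer to prune.

    feature_maps: array of shape (channels, H, W) holding the layer's
    post-activation outputs averaged over the local training set.
    Assumed condition: a channel is pruned when its average feature
    mapping value is below lam1 times the layer mean AND its average
    zero-activation percentage exceeds lam2 times the layer average.
    lam1 and lam2 stand in for the patent's two adjustment factors.
    """
    mean_act = feature_maps.mean(axis=(1, 2))        # F(i, c)
    layer_mean = mean_act.mean()                     # mean of F(i, c) over c
    apoz = (feature_maps == 0).mean(axis=(1, 2))     # Z(i, c)
    layer_apoz = apoz.mean()                         # mean of Z(i, c) over c
    prune = (mean_act < lam1 * layer_mean) & (apoz > lam2 * layer_apoz)
    return [c for c in range(feature_maps.shape[0]) if prune[c]]
```

Under this reading, a dead channel (all-zero activations) satisfies both parts of the condition and is selected, while a uniformly active channel is kept.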
2. The privacy-protecting controllable light-weight federal learning method as claimed in claim 1, wherein the step of determining the global snapshot according to the reference snapshot, the channel restoration threshold and the compressed snapshots of other subnet nodes except the optimal subnet node by taking the compressed snapshot of the optimal subnet node as the reference snapshot comprises the steps of: scanning all output channels needing to be pruned in the reference snapshot in the compressed snapshots of the other subnet nodes, and counting whether the pruned output channels exist in the compressed snapshots of the other subnet nodes;
when an output channel to be pruned in the reference snapshot is absent from the compressed snapshot of another subnet node, recording that subnet node; when the number of recorded subnet nodes is greater than the set channel recovery threshold, deleting that output channel from the reference snapshot; and finally obtaining the global snapshot.
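The merge rule of claim 2 reduces to a count over the other nodes' snapshots. A minimal Python sketch, assuming a snapshot is represented as a set of (layer, channel ID) pairs (an illustrative representation, not the patent's):

```python
def build_global_snapshot(reference, others, recovery_threshold):
    """reference: pruned channels of the optimal node's compressed snapshot.
    others: compressed snapshots of the remaining subnet nodes.
    A channel of the reference snapshot stays in the global snapshot only
    when the number of other nodes that did NOT also prune it is at or
    below the channel recovery threshold; otherwise the channel is
    recovered, i.e. deleted from the snapshot and not pruned globally.
    """
    global_snapshot = set()
    for channel in reference:
        not_pruned_elsewhere = sum(1 for snap in others if channel not in snap)
        if not_pruned_elsewhere <= recovery_threshold:
            global_snapshot.add(channel)
    return global_snapshot
```

Raising the recovery threshold keeps more channels pruned (a smaller global model); lowering it recovers channels that other nodes still rely on.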
3. The privacy-preserving controllable lightweight federated learning method of claim 2, wherein the compressed snapshot comprises, for each output channel needing pruning, the number of the neural-network layer it belongs to and the ID number of that output channel.
4. The privacy-preserving, controllable, lightweight federated learning method of claim 1, wherein the optimal subnet node is determined by the following formula:

[optimal-node selection formula, shown only as an image in the original publication]

where γ(i) is the data imbalance degree of the i-th subnet node, p(i) is the proportion of the data volume of the i-th subnet node's local training set in all training data, V(i) denotes the performance evaluation value of the training data of the i-th node, and the result is the optimal subnet node.
5. The privacy-preserving, controllable, lightweight federated learning method of claim 4, wherein the proportion p(i) of the data volume of the i-th subnet node's local training set in all training data and the data imbalance degree γ(i) of the i-th subnet node are calculated as follows:

[formulas for p(i) and γ(i), shown only as images in the original publication]

where D(i) is the amount of training data of the i-th subnet controller, D(j) is the amount of training data of the j-th subnet controller, n is the number of data classes, K is the number of subnet controller nodes, D(i, j) is the number of class-j data at the i-th subnet controller, and D̄(i) is the average number of data of each class at the i-th subnet controller.
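The two quantities of claim 5 can be sketched in Python. Both formulas are shown only as images in the original publication, so the mean-absolute-deviation form of the imbalance below is an illustrative choice, not necessarily the patented one:

```python
def data_stats(class_counts_per_node):
    """class_counts_per_node[i][j]: number of class-j samples at node i.
    Returns (p, gamma): p[i] is node i's share of all training data;
    gamma[i] measures how far the node's per-class counts deviate from
    the node's per-class average (assumed form of the imbalance degree).
    """
    totals = [sum(counts) for counts in class_counts_per_node]
    grand_total = sum(totals)
    p = [t / grand_total for t in totals]
    gamma = []
    for counts in class_counts_per_node:
        avg = sum(counts) / len(counts)   # average count per class at the node
        gamma.append(sum(abs(c - avg) for c in counts) / len(counts))
    return p, gamma
```

A node holding data in perfectly equal class proportions gets an imbalance of 0, which, all else being equal, favors it in the optimal-node selection of claim 4.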
6. The privacy-preserving controllable lightweight federated learning method of claim 1, wherein the method of determining whether the weighted-aggregated model has converged comprises: determining the mean and the standard deviation of the loss functions of the models trained by the subnet nodes;
if the mean and the standard deviation of the loss functions of the models trained by the subnet nodes are less than or equal to a set threshold, the model obtained by weighted aggregation of the models trained by the subnet controllers is determined to have converged.
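The convergence test of claim 6 can be read as thresholding both the mean and the standard deviation of the per-node losses; a minimal Python sketch of that reading (the claim wording also admits thresholding their average):

```python
import statistics

def has_converged(node_losses, threshold):
    """node_losses: latest training loss reported by each subnet node.
    Assumed reading of claim 6: converged when the mean loss and the
    (population) standard deviation are both at or below the threshold,
    i.e. losses are uniformly low across nodes, not merely low on average.
    """
    mean_loss = statistics.mean(node_losses)
    std_loss = statistics.pstdev(node_losses)
    return mean_loss <= threshold and std_loss <= threshold
```

Including the standard deviation prevents declaring convergence when a few nodes still have high loss that a low average would mask.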
7. The controllable light-weight federal learning system for protecting privacy is characterized by comprising a data layer, a subnet control layer and a global control layer;
the data layer is used for carrying out data forwarding communication on each subnet;
the subnet control layer is provided with a plurality of subnet controllers, and the global control layer is provided with a global controller;
the global controller is used for transmitting to all the subnet controllers the preset model and parameters, as well as the adjustment factors required for model compression;
the subnet controller is used for acquiring data and extracting features to form a local training set; receiving the model, the parameters and the adjustment factors transmitted by the global controller; training a local model by using a local training set, a model and parameters, calculating output channels needing pruning in each layer of the local model by using set adjustment factors, and generating a compressed snapshot of the local model; each subnet controller transmits the number of data in the local training set, the model and the compressed snapshot to a global controller;
the global controller determines an optimal subnet controller according to the number of data in the training set obtained by each subnet controller and the data imbalance degree, takes the compressed snapshot of the optimal subnet controller as a reference snapshot, and determines the global snapshot according to the reference snapshot, the channel recovery threshold, and the compressed snapshots generated by the subnet controllers other than the optimal subnet controller;
the global controller carries out weighted aggregation on the models trained by the sub-network controllers to form an aggregated global model; according to the global snapshot, pruning each layer of output channels of the aggregated global model to form a global compression model;
the subnet controller calculates, using the set adjustment factors, the output channels needing pruning in each layer of the local model, which comprises: determining the c-th output channel of the i-th layer as a channel to be pruned when it simultaneously satisfies the following formula:

[pruning-condition formula, shown only as an image in the original publication]

where F(i, c) represents the average feature mapping value output through the activation function by the c-th output channel of the i-th layer of the model; F̄(i) represents the average of the average feature mapping values output through the activation function by all output channels of the i-th layer; the two adjustment factors are both greater than 0; Z(i, c) represents the average zero-activation percentage of the c-th output channel of the i-th layer; and Z̄(i) is the average of the average zero-activation percentages of all output channels of the i-th layer.
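The weighted aggregation performed by the global controller is a FedAvg-style average. A minimal Python sketch, assuming each model is a dict of equally shaped numpy arrays (an illustrative representation, not the patent's):

```python
def weighted_aggregate(models, data_counts):
    """models: per-node parameter dicts (name -> numpy array).
    data_counts: number of local training samples per node.
    Each parameter of the aggregated model is the average of the nodes'
    parameters, weighted by each node's share of all training data.
    """
    total = sum(data_counts)
    weights = [n / total for n in data_counts]
    aggregated = {}
    for name in models[0]:
        aggregated[name] = sum(w * m[name] for w, m in zip(weights, models))
    return aggregated
```

The aggregated global model is then pruned per the global snapshot to yield the global compression model.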
8. A privacy-preserving controllable lightweight federated learning traffic data detection method, characterized in that a model is obtained using the privacy-preserving controllable lightweight federated learning method of any one of claims 1-6;
the collected network traffic data are input, and traffic detection is performed using the finally obtained model.
CN202210057267.9A 2022-01-19 2022-01-19 Controllable and lightweight federated learning method, system and detection method for privacy protection Active CN114077755B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210057267.9A CN114077755B (en) 2022-01-19 2022-01-19 Controllable and lightweight federated learning method, system and detection method for privacy protection


Publications (2)

Publication Number Publication Date
CN114077755A CN114077755A (en) 2022-02-22
CN114077755B true CN114077755B (en) 2022-05-31

Family

ID=80284532


Country Status (1)

Country Link
CN (1) CN114077755B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114492847B (en) * 2022-04-18 2022-06-24 奥罗科技(天津)有限公司 Efficient personalized federal learning system and method
CN117592094A (en) * 2023-10-20 2024-02-23 深圳信息职业技术学院 Privacy data set processing method, device, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111355739A (en) * 2020-03-06 2020-06-30 深圳前海微众银行股份有限公司 Data transmission method, device, terminal equipment and medium for horizontal federal learning
CN112070207A (en) * 2020-07-31 2020-12-11 华为技术有限公司 Model training method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200272945A1 (en) * 2019-02-21 2020-08-27 Hewlett Packard Enterprise Development Lp System and method of decentralized model building for machine learning and data privacy preserving using blockchain


Also Published As

Publication number Publication date
CN114077755A (en) 2022-02-22

Similar Documents

Publication Publication Date Title
CN114077755B (en) Controllable and lightweight federated learning method, system and detection method for privacy protection
CN114091356A (en) A federated learning method and device
CN105577685A (en) Autonomous analysis intrusion detection method and system in cloud computing environment
CN110113353A (en) A kind of intrusion detection method based on CVAE-GAN
CN113469376B (en) Blockchain-based federated learning backdoor attack defense method and device
CN112420187A (en) A medical disease analysis method based on transfer federated learning
CN106412912A (en) Node trust assessment method facing car networking
CN108632269A (en) Detecting method of distributed denial of service attacking based on C4.5 decision Tree algorithms
CN105791213A (en) A strategy optimization device and method
CN110022531A (en) A kind of localization difference privacy municipal refuse data report and privacy calculation method
CN104270372A (en) A Parameter Adaptive Network Security Situation Quantitative Evaluation Method
CN116502171B (en) Network security information dynamic detection system based on big data analysis algorithm
CN113904872A (en) A feature extraction method and system for fingerprinting attacks on anonymous service websites
CN111698241A (en) Internet of things cloud platform system, verification method and data management method
US20210051170A1 (en) Method and apparatus for determining a threat using distributed trust across a network
CN116861994A (en) A privacy-preserving federated learning method that resists Byzantine attacks
CN115834409B (en) Secure aggregation method and system for federated learning
CN116597498A (en) A Fair Face Attribute Classification Method Based on Blockchain and Federated Learning
CN119997005B (en) A satellite internet traffic security protection method and system
CN116319437A (en) Network connectivity detection method and device
Aljammal et al. Performance Evaluation of Machine Learning Approaches in Detecting IoT-Botnet Attacks.
CN116132081A (en) Software defined network DDOS attack cooperative defense method based on ensemble learning
KR102561702B1 (en) Method and apparatus for monitoring fault of system
TWI797808B (en) Machine learning system and method
Chen et al. Edge-based protection against malicious poisoning for distributed federated learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant