CN116132081A - Software defined network DDOS attack cooperative defense method based on ensemble learning

Info

Publication number: CN116132081A
Application number: CN202211077052.XA
Authority: CN
Inventors: 陈俊彦; 卢贤涛; 黄雪锋; 谢小兰; 廖岑卉珊
Original assignee: Guilin University of Electronic Technology
Current assignee: Guilin University of Electronic Technology
Priority date: 2022-09-05
Filing date: 2022-09-05
Publication date: 2023-05-16
Anticipated expiration: 2042-09-05
Also published as: CN116132081B

Abstract

The invention relates to the technical field of intrusion detection, and in particular to a software defined network DDOS attack cooperative defense method based on ensemble learning. First, a feature selection algorithm based on Bagging ensembles is proposed, which avoids the tendency of a single feature selection algorithm to ignore the relationships between features and to select redundant features. Second, pairwise diversity measures are introduced for selective ensembling so that a superior heterogeneous ensemble model is selected, and stratified ten-fold cross-validation is used to avoid overfitting. Finally, the base classifier models are combined by a weighted voting mechanism based on the Bagging ensemble algorithm, the resulting model is embedded in an SDN controller, and a detection time interval is set to achieve real-time monitoring.

Description

Collaborative defense method for DDOS attacks in software-defined networks based on ensemble learning

Technical Field

The present invention relates to the technical field of intrusion detection, and in particular to a software defined network DDOS attack collaborative defense method based on ensemble learning.

Background Art

In recent years, with the rapid development of the Internet, networks have grown ever larger and the pressure on traditional networks has increased. Software Defined Networking (SDN) uses programmability to separate forwarding from control, manages network device resources effectively, and embodies a controller-centric management model; it plays an important role in the development of cloud computing networks. However, this model also gives malicious applications opportunities to attack. OpenFlow provides some flow-based security detection methods, but they assume that the applications in the SDN have not been maliciously attacked and that the northbound interface has not been compromised. This offers SDN some protection but is helpless against DDOS attacks. Because SDN controls data flows at fine granularity, the controller can conveniently perform packet filtering, rate limiting and attack tracing during a DDOS attack. However, traditional classification approaches reduce feature dimensionality with a single algorithm, ignore the relationships between features, and evaluate the model with a single algorithm only, which makes the model less stable and lets redundant features waste computing resources. Furthermore, detection algorithms are often built on a single classification algorithm, or use accuracy alone as the weighting indicator of the ensemble classifier, which biases the classification results.

Summary of the Invention

The purpose of the present invention is to provide a software defined network DDOS attack collaborative defense method based on ensemble learning that avoids the situation in existing defense methods where a single feature selection algorithm ignores the relationships between features, and that uses stratified ten-fold cross-validation to avoid overfitting.

To achieve the above object, the present invention provides a software defined network DDOS attack collaborative defense method based on ensemble learning, comprising the following steps:

starting the SDN controller to collect traffic data, and storing the traffic data in a CSV file;

feeding the collected data into the Double-Bagging detection model for training;

embedding the ensemble model generated by training into the SDN controller, and starting the DDOS attack detection module.

The traffic data comprises data for normal traffic and attack traffic and consists of the speed of IP sources, the flow count, the speed of flow table entries, and the traffic comparison value. Each of these four values is lower for normal traffic than for attack traffic.

The Double-Bagging detection model is established through the following steps:

performing data preprocessing and feature dimensionality reduction, and obtaining the optimal feature subset through a feature-subset voting mechanism based on the Bagging ensemble algorithm;

inputting the optimal feature subset into the base classifiers for training, and selecting high-quality heterogeneous ensemble learning base classifier models;

combining the selected base classifier models using a weighted voting mechanism based on the Bagging ensemble algorithm.

Feature dimensionality reduction uses an algorithm that combines the filter method and the embedded method. Two filter-method feature selection algorithms, the chi-square test and mutual information, and one embedded-method algorithm, extremely randomized trees, are used to rank the features by contribution.

The optimal feature subset is obtained as follows: a set threshold determines how many features each feature selection algorithm returns, a result feature subset is generated from each contribution ranking, and a voting strategy selects the feature subset that occurs most often as the optimal feature subset.

When the optimal feature subset is input into the base classifiers for training and high-quality heterogeneous ensemble learning base classifier models are selected, diversity measures from ensemble learning are introduced to evaluate how well the base classifiers combine, the parameters are tuned by Bayesian optimization, and the first round of base classifier filtering is completed by computing each model's accuracy and area under the curve.

The weights of the weighted voting mechanism are determined by the accuracy and the area under the curve, where the area under the curve is computed from the receiver operating characteristic curve of the base classifier.

After the DDOS attack detection module is started, the Double-Bagging model predicts normal traffic as normal; when attack traffic appears, it is immediately detected as a DDOS attack and the port through which it enters is blocked.

The present invention provides a software defined network DDOS attack collaborative defense method based on ensemble learning. First, a feature selection algorithm based on Bagging ensembles is proposed, which avoids the tendency of a single feature selection algorithm to ignore the relationships between features and to select redundant features. Second, pairwise diversity measures are introduced for selective ensembling so that a superior heterogeneous ensemble model is selected, and stratified ten-fold cross-validation is used to avoid overfitting. Finally, the base classifier models are combined by a weighted voting mechanism based on the Bagging ensemble algorithm, the resulting model is embedded in the SDN controller, and a detection time interval is set to achieve real-time monitoring.

Brief Description of the Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow chart of the software defined network DDOS attack collaborative defense method based on ensemble learning according to the present invention.

FIG. 2 is a flow chart of a specific implementation of the software defined network DDOS attack collaborative defense method based on ensemble learning according to the present invention.

FIG. 3 is a flow chart of the Double-Bagging algorithm of the present invention.

FIG. 4 is a flow chart of the Bagging-based feature selection voting mechanism of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.

The following terms are used in the present invention:

Software Defined Networking (SDN);

Receiver operating characteristic (ROC);

Area under the curve (AUC).

Referring to FIG. 1, the present invention provides a software defined network DDOS attack collaborative defense method based on ensemble learning, comprising the following steps:

S1: starting the SDN controller to collect traffic data, and storing the traffic data in a CSV file;

S2: feeding the collected data into the Double-Bagging detection model for training;

S3: embedding the ensemble model generated by training into the SDN controller, and starting the DDOS attack detection module.

The overall design of the method is shown in FIG. 2. The collaborative defense model comprises three modules: traffic collection, algorithm model selection, and detection and defense mechanism design. The model is embedded in the Ryu controller and a detection time interval is set to achieve real-time monitoring. The detailed implementation is as follows:

S1: Start the SDN controller to collect traffic data and store the traffic data in a CSV file. Normal traffic data is collected first, followed by attack traffic data; both are used to train the Double-Bagging model.

The features and parameters to be monitored and collected are as follows:

(1) Speed of IP sources: this feature gives the total number of IP sources entering the network within a specific time interval. It is abbreviated as SSIP and defined by Formula 1:
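Formula 1 is not reproduced in this text. From the definitions of SumIPsrc and T that accompany it, it presumably takes the following form (a reconstruction, not the original rendering):

```latex
\mathrm{SSIP} = \frac{\mathrm{SumIPsrc}}{T}
```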

where SumIPsrc is the total number of IP sources entering in each flow and T is the sampling interval. The detection system monitors traffic and collects data every T seconds and records the number of source IPs seen in that period. The controller needs enough normal and attack traffic data for the machine learning algorithm to predict attacks. SSIP is usually low for normal traffic and high for attack traffic.

(2) Flow count: every flow entering the network has a specific flow count. The count is lower for normal traffic than for DDOS attack traffic.

(3) Speed of flow table entries: the total number of flow table entries Flow_N in the network's switches within a given time interval. It is abbreviated as SFE and defined by Formula 2:
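Formula 2 is likewise missing here. Assuming it mirrors the SSIP definition, with Flow_N counted over the sampling interval T, it would read:

```latex
\mathrm{SFE} = \frac{\mathrm{Flow}_N}{T}
```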

This feature is highly relevant to attack detection, because during a DDOS attack the number of flow table entries created in a fixed time interval rises significantly above the value seen for normal traffic.

(4) Traffic comparison value: the number of interactive IPs divided by the total number of flow entries Flow_N entering the switch during the interval T. It is abbreviated as RPF and defined by Formula 3:
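Formula 3 is not shown in this text. Based on the verbal definition (interactive IPs divided by total flow entries), a plausible form is:

```latex
\mathrm{RPF} = \frac{\mathrm{srcIP}}{\mathrm{Flow}_N}
```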

where srcIP is the total number of collaborative (interactive) IPs in the network flows. In normal traffic, the source IP of the i-th flow equals the destination IP of the j-th flow, and the source IP of the j-th flow equals the destination IP of the i-th flow. Such a pair constitutes an interactive flow, which is not the case for DDOS attack traffic. Under attack, the number of flow entries reaching the target host within the interval T rises rapidly and the target host cannot respond, so the share of interactive flows drops sharply once a DDOS attack begins. Dividing the number of collaborative flows by the total number of flows makes this detection parameter scale to networks under different operating conditions.
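The three rate features can be sketched for a single sampling interval as follows. This is a minimal illustration, not the controller code of the invention: the `Flow` record and `traffic_features` function are hypothetical names, SumIPsrc is assumed to be the number of distinct source IPs, and a flow is assumed to be "interactive" when its reverse flow is also present in the same interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; only the two addresses are needed here."""
    src_ip: str
    dst_ip: str


def traffic_features(flows, interval_s):
    """Return (SSIP, SFE, RPF) for the flow entries seen in one sampling interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    flow_n = len(flows)                              # Flow_N: flow entries in the interval
    sum_ip_src = len({f.src_ip for f in flows})      # assumed: distinct source IPs
    pairs = {(f.src_ip, f.dst_ip) for f in flows}
    # Assumed definition: a flow is interactive when the reverse flow also exists.
    interactive = sum(1 for f in flows if (f.dst_ip, f.src_ip) in pairs)
    ssip = sum_ip_src / interval_s
    sfe = flow_n / interval_s
    rpf = interactive / flow_n if flow_n else 0.0
    return ssip, sfe, rpf
```

Each returned tuple would form part of one row of the CSV file described in step S1, together with the flow count and the normal/attack label.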

The steps for establishing the Double-Bagging detection model are shown in FIG. 3. First, a Bagging-based voting mechanism is proposed to select the optimal feature subset. Second, ten-fold cross-validation is used to detect anomalies in the data set and to select the base classifier models. Finally, the base classifier models are combined by a weighted voting mechanism based on the Bagging ensemble algorithm. For weighting the base classifiers, a combination of accuracy and AUC is used as the weight, which effectively addresses the problem of unbalanced weighting.

Step S2.1: perform data preprocessing and feature dimensionality reduction. The data is first standardized, and feature dimensionality reduction follows. The present invention proposes a dimensionality reduction algorithm that combines the filter method and the embedded method, which avoids feature selection bias as far as possible and compensates for the weaknesses of each individual feature selection algorithm; the framework is shown in FIG. 4. First, the feature contributions (weights) are computed with the two filter-method algorithms, the chi-square test and mutual information, and with the embedded-method algorithm, extremely randomized trees.

The feature contribution for the chi-square test is calculated as shown in Formula 4:
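Formula 4 is missing from this text. Given the symbols c, O and E defined with it, it is presumably the standard chi-square statistic, with the index i (an added symbol) running over the cells of the contingency table:

```latex
\chi^2_c = \sum_i \frac{(O_i - E_i)^2}{E_i}
```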

where c is the degrees of freedom, O the observed value, and E the expected value.

The feature contribution for the mutual information algorithm is calculated as shown in Formula 5:
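Formula 5 is not reproduced here. With p(x), p(y) and p(x,y) as defined alongside it, the standard mutual information between a feature X and the label Y would be:

```latex
I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \, \log \frac{p(x,y)}{p(x)\,p(y)}
```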

where p(x) is the probability that X = x_i, p(y) is the probability that Y = y_i, and p(x,y) is the joint probability that X = x_i and Y = y_i occur together.

The feature contribution for extremely randomized trees is calculated as shown in Formula 6:
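Formula 6 is also missing. Assuming the trees split on Gini impurity (a common default for extremely randomized trees, but an assumption here), the node impurity from which feature contributions are derived would be:

```latex
\mathrm{Gini} = 1 - \sum_{k} P_k^{2}
```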

Here k ranges over the classes and P_k is the sample weight of class k.

The feature contributions (weights) from the three algorithms are then ranked; the larger the weight, the more important the feature. Each feature selection algorithm assigns a weight to every feature according to its own criterion and so produces its own weight distribution. Based on the three rankings, a voting mechanism built on the Bagging ensemble algorithm selects the optimal feature subset: a set threshold determines how many features each method returns, the three rankings yield multiple result feature subsets, and the voting strategy selects the feature subset that occurs most often as the optimal feature subset.
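The voting step can be sketched as below, given contribution scores that have already been computed by each method. This is one possible reading of "most occurrences": a feature is kept when a majority of methods place it in their top-k list. The function name `vote_features` and the `min_votes` parameter are hypothetical and do not come from the patent.

```python
from collections import Counter


def vote_features(scores_by_method, top_k, min_votes=2):
    """Select features that enough methods rank in their top-k.

    scores_by_method: {method name: {feature name: contribution score}}
    top_k: threshold on how many features each method nominates
    min_votes: assumed majority rule; a feature is kept when at least
               this many methods nominate it
    """
    votes = Counter()
    for scores in scores_by_method.values():
        # Rank this method's features by descending contribution.
        ranked = sorted(scores, key=scores.get, reverse=True)
        votes.update(ranked[:top_k])
    return sorted(f for f, n in votes.items() if n >= min_votes)
```

With three methods and `min_votes=2`, a feature survives only if at least two of the chi-square, mutual information and extremely randomized trees rankings nominate it, which filters out features favored by a single criterion.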

Step S2.2: input the optimal feature subset obtained in step S2.1 into the base classifiers for training, introduce diversity measures from ensemble learning to evaluate how well the base classifiers combine, and select high-quality heterogeneous ensemble learning base classifier models. The parameters of the candidate base classifier models are tuned by Bayesian optimization, and the first round of base classifier filtering is completed by computing each model's accuracy (ACC) and area under the curve (AUC).

The model's ACC is calculated as shown in Formula 7:
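Formula 7 is missing from this text. With TP, FP, TN and FN as defined for it, accuracy has the standard form:

```latex
\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}
```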

In Formula 7, TP is the number of samples predicted positive that are actually positive; FP is the number predicted positive that are actually negative; TN is the number predicted negative that are actually negative; FN is the number predicted negative that are actually positive.

AUC is derived from the receiver operating characteristic (ROC) curve and is calculated as shown in Formula 8; an AUC of 1 corresponds to an ideal classifier:
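Formula 8 is not reproduced here. The symbol definitions given for it (sample counts M_p and M_n and the rank of each positive sample) match the usual rank-based AUC, where rank_i is the position of positive sample i when all samples are sorted by ascending score; this reconstruction is an assumption:

```latex
\mathrm{AUC} = \frac{\sum_{i \in \text{positive}} \mathrm{rank}_i - \dfrac{M_p (M_p + 1)}{2}}{M_p \times M_n}
```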

In Formula 8, M_p and M_n are the numbers of positive and negative samples, i is the rank of a positive sample, and the product M_p × M_n (written M*N) is the number of ways to draw one positive and one negative sample.
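Both screening metrics can be computed directly from labels and scores. The sketch below assumes binary labels (1 for attack, 0 for normal) and uses the rank-based form of AUC; the function names are illustrative, and tied scores are given their average rank, a detail the text does not specify.

```python
def accuracy(y_true, y_pred):
    """ACC = (TP + TN) / (TP + TN + FP + FN), labels in {0, 1}."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc_by_rank(y_true, scores):
    """Rank-based AUC; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    m_p = sum(y_true)                        # number of positive samples
    m_n = len(y_true) - m_p                  # number of negative samples
    pos_rank_sum = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (pos_rank_sum - m_p * (m_p + 1) / 2) / (m_p * m_n)
```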

To obtain the best predictive performance, base learners that differ substantially from one another must also be selected, which constitutes the first round of base classifier filtering. For selective ensembling, three pairwise diversity measures from ensemble learning are introduced: the disagreement measure Dis, the Q statistic, and the double-fault measure DF.

The disagreement measure Dis considers the samples on which two classifiers give different results. The more such samples there are, the greater the difference between the two classifiers; the fewer there are, the lower the diversity. Its value range is [0,1]. Dis is calculated as shown in Formula 9, where N denotes the total number of samples.
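Formula 9 is missing from this text. In the notation commonly used for pairwise diversity (an assumption here), N^{10} counts samples that the first classifier gets right and the second gets wrong, and N^{01} the reverse, which gives:

```latex
\mathrm{Dis} = \frac{N^{01} + N^{10}}{N}
```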

The Q statistic reflects the degree of diversity between classifiers and takes values in [-1,1]. If two classifiers are completely independent of each other, their classification behavior is unrelated and the Q statistic is 0. The Q statistic is calculated as shown in Formula 10.
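Formula 10 is not reproduced here. Writing N^{11} for samples both classifiers classify correctly, N^{00} for samples both misclassify, and N^{10}, N^{01} for samples only one of them classifies correctly (standard notation, assumed), the Q statistic is usually defined as:

```latex
Q = \frac{N^{11} N^{00} - N^{01} N^{10}}{N^{11} N^{00} + N^{01} N^{10}}
```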

The double-fault measure DF considers the samples misclassified by both classifiers, and its value range is [0,1]. The more samples both classifiers misclassify, the more the two classifiers tend to err on the same samples. DF is calculated as shown in Formula 11, where N denotes the total number of samples.
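Formula 11 is missing as well. With N^{00} denoting the number of samples misclassified by both classifiers (assumed notation), the double-fault measure would be:

```latex
\mathrm{DF} = \frac{N^{00}}{N}
```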

The present invention computes these measures for every pair of classifiers in the candidate pool and favors combinations whose Dis and Q values tend toward 1 and whose DF value tends toward 0, thereby selecting the optimal heterogeneous ensemble learning base classifier models. In addition, to avoid overfitting, ten-fold cross-validation is used: the training set is split into 10 subsamples, one subsample is held out to validate the model, and the remaining nine are used for training. The procedure is repeated 10 times so that each subsample serves as validation data exactly once, and the 10 results are averaged (or combined in another way) to yield a single estimate of how well the model fits.
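The three pairwise measures can be computed from the predictions of two classifiers on the same validation samples. The sketch below uses the conventional definitions based on the counts of samples that both, one, or neither classifier gets right; the function name is hypothetical, and returning 0 for Q when its denominator vanishes is a convention chosen here, not something the text specifies.

```python
def pairwise_diversity(y_true, pred_a, pred_b):
    """Return (Dis, Q, DF) for two classifiers evaluated on the same samples."""
    n11 = n00 = n10 = n01 = 0
    for t, a, b in zip(y_true, pred_a, pred_b):
        ok_a, ok_b = (a == t), (b == t)
        if ok_a and ok_b:
            n11 += 1          # both correct
        elif ok_a:
            n10 += 1          # only classifier A correct
        elif ok_b:
            n01 += 1          # only classifier B correct
        else:
            n00 += 1          # both wrong
    n = n11 + n00 + n10 + n01
    dis = (n10 + n01) / n
    df = n00 / n
    denom = n11 * n00 + n01 * n10
    # Q is undefined when the denominator is zero; 0.0 is used as a fallback.
    q = (n11 * n00 - n01 * n10) / denom if denom else 0.0
    return dis, q, df
```

Calling this for every pair in the candidate pool gives the table of Dis, Q and DF values from which the base classifier combination is chosen.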

Step S2.3: combine the base classifier models selected in step S2.2 using a weighted voting mechanism based on the Bagging ensemble algorithm. The system obtains the ROC curve of each base classifier, computes the AUC from it, and then uses the accuracy and the AUC as the weights of the weighted voting mechanism. The weighting function for combining the classifiers is shown in Formula 12:

In Formula 12, U_aoc,i denotes the area not covered by the ROC curve of the i-th base learner (U_aoc,i = 1 - AUC), and the formula also uses the maximum and minimum of this uncovered area over all base learners; e_i, e_b and e_w denote, respectively, the accuracy of the i-th classifier, the accuracy of the least accurate classifier in the set, and the accuracy of the most accurate classifier in the set.
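Because Formula 12 itself is not reproduced in this text, the sketch below does not implement it. It only illustrates the general idea of weighted voting with weights that grow with both accuracy and AUC, using a simple normalized average as a stand-in; `ensemble_weights` and `weighted_vote` are hypothetical names.

```python
def ensemble_weights(metrics):
    """metrics: list of (accuracy, auc) per base classifier -> normalized weights.

    The averaging rule is an illustrative stand-in, not Formula 12.
    """
    raw = [(acc + auc) / 2 for acc, auc in metrics]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_vote(predictions, weights):
    """predictions: 0/1 label from each base classifier for one sample.

    Returns 1 (attack) when the attack votes carry more than half of
    the total weight, otherwise 0 (normal).
    """
    attack_score = sum(w for p, w in zip(predictions, weights) if p == 1)
    return 1 if attack_score > 0.5 else 0
```

Since the weights are normalized to sum to 1, the 0.5 threshold amounts to a weighted majority; a stronger classifier can be outvoted only when the others together carry more weight.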

The ensemble model generated in step S2.3 is embedded into the SDN controller, and the DDOS attack detection module is started. Normal traffic is predicted as normal by the Double-Bagging model; when attack traffic appears, it is immediately detected as a DDOS attack and the port through which it enters is blocked. While one port is blocked, the controller still lets normal traffic on other ports pass. After a period of time the controller unblocks the port and resumes detection; if the attack is still active, it is detected again and the port is blocked again. Blocking therefore continues for as long as the attack lasts.
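The block, release and re-detect cycle described above can be sketched as a small piece of bookkeeping that is independent of any controller API. `PortGuard`, `block_seconds` and the explicit `now` argument are hypothetical; a real deployment would install and remove flow rules on the switch instead of updating a dictionary.

```python
class PortGuard:
    """Tracks which ingress ports are blocked and when each block expires."""

    def __init__(self, block_seconds):
        self.block_seconds = block_seconds
        self.blocked_until = {}            # port -> time at which the block lapses

    def is_blocked(self, port, now):
        expiry = self.blocked_until.get(port)
        if expiry is not None and now >= expiry:
            del self.blocked_until[port]   # block lapsed; detection resumes
            expiry = None
        return expiry is not None

    def observe(self, port, is_attack, now):
        """Process one detection result; return True if the port is blocked afterwards."""
        if self.is_blocked(port, now):
            return True
        if is_attack:
            # Attack detected on an open port: block it for a fixed period.
            self.blocked_until[port] = now + self.block_seconds
            return True
        return False
```

Here `is_attack` stands for the ensemble model's prediction at each detection interval. A port whose block has lapsed is re-blocked on the next positive prediction, so a sustained attack keeps the port closed while other ports stay open.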

What is disclosed above is only a preferred embodiment of the present invention and does not limit the scope of its claims. Those of ordinary skill in the art will understand that implementations of all or part of the processes of the above embodiment, and equivalent changes made in accordance with the claims of the present invention, still fall within the scope covered by the invention.

Claims

1. A software defined network DDOS attack cooperative defense method based on ensemble learning, characterized by comprising the following steps:

starting an SDN controller to collect traffic data, and storing the traffic data in a CSV file;

feeding the collected data into a Double-Bagging detection model for training;

embedding the ensemble model generated by training into the SDN controller, and starting a DDOS attack detection module.

2. The software defined network DDOS attack collaborative defense method based on ensemble learning according to claim 1, wherein

the traffic data comprises data for normal traffic and attack traffic and consists of the speed of IP sources, the flow count, the speed of flow table entries and the traffic comparison value, each of which is lower for normal traffic than for attack traffic.

3. The software defined network DDOS attack collaborative defense method based on ensemble learning according to claim 1, wherein

the process of establishing the Double-Bagging detection model comprises the following steps:

performing data preprocessing and feature dimensionality reduction, and obtaining an optimal feature subset through a feature-subset voting mechanism based on the Bagging ensemble algorithm;

inputting the optimal feature subset into base classifiers for training, and selecting high-quality heterogeneous ensemble learning base classifier models;

combining the selected base classifier models using a weighted voting mechanism based on the Bagging ensemble algorithm.

4. The software defined network DDOS attack collaborative defense method based on ensemble learning according to claim 3, wherein

the feature dimensionality reduction uses an algorithm combining the filter method and the embedded method, in which two filter-method feature selection algorithms, the chi-square test and mutual information, and one embedded-method algorithm, extremely randomized trees, are used to rank the features by contribution.

5. The software defined network DDOS attack collaborative defense method based on ensemble learning according to claim 3, wherein

obtaining the optimal feature subset comprises determining, by a set threshold, the number of results of each feature selection algorithm, generating result feature subsets according to the feature contribution rankings, and selecting, by a voting strategy, the feature subset that occurs most often as the optimal feature subset.

6. The software defined network DDOS attack collaborative defense method based on ensemble learning according to claim 3, wherein

in the process of inputting the optimal feature subset into the base classifiers for training and selecting high-quality heterogeneous ensemble learning base classifier models, diversity measures from ensemble learning are introduced to evaluate how well the base classifiers combine, the parameters are tuned by Bayesian optimization, and the first round of base classifier filtering is completed by computing the accuracy and the area under the curve of each model.

7. The software defined network DDOS attack collaborative defense method based on ensemble learning according to claim 3, wherein

the weights of the weighted voting mechanism are determined by the accuracy and the area under the curve, the area under the curve being computed from the receiver operating characteristic curve of the base classifier.

8. The software defined network DDOS attack collaborative defense method based on ensemble learning according to claim 1, wherein

after the DDOS attack detection module is started, the Double-Bagging model predicts normal traffic as normal traffic, and when attack traffic is generated, it is immediately detected as a DDOS attack and the port through which the attack traffic enters is blocked.