
CN120185929A - A network intrusion detection method and system based on reliability sample selection - Google Patents


Info

Publication number
CN120185929A
CN120185929A CN202510645750.2A CN202510645750A CN120185929A CN 120185929 A CN120185929 A CN 120185929A CN 202510645750 A CN202510645750 A CN 202510645750A CN 120185929 A CN120185929 A CN 120185929A
Authority
CN
China
Prior art keywords
samples
reliability
model
sample
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202510645750.2A
Other languages
Chinese (zh)
Other versions
CN120185929B (en)
Inventor
张文翔
刘帅
田波
陈小龙
赵越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Hunan Normal University
Original Assignee
CETC 30 Research Institute
Hunan Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 30 Research Institute and Hunan Normal University
Priority to CN202510645750.2A
Publication of CN120185929A
Application granted
Publication of CN120185929B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses a network intrusion detection method and system based on reliability sample selection, comprising the following steps: performing data augmentation on network intrusion data to generate more representative samples; selecting high-quality samples through reliability evaluation for initial model training; training the model online and, when concept drift is detected, applying an attention-guided reliability sample selection strategy that gives priority to the samples most valuable for model training when updating; when the number of reliability samples is insufficient, selecting samples with high model loss values from the non-reliability samples as supplementary samples; and inputting the samples to be detected into the trained model, which outputs their classification predictions. The performance of the model is evaluated by comparing its predictions with the true labels. Through these steps, the invention can effectively detect abnormal behavior in network intrusions and improve the adaptability and robustness of network intrusion detection systems.

Description

Network intrusion detection method and system based on reliability sample selection
Technical Field
The invention belongs to the field of network information security, and particularly relates to a network intrusion detection method and system based on reliability sample selection.
Background
Intrusion detection systems (IDS) are a key technology for securing networks and are widely used to monitor and identify malicious activity. As network environments grow more complex, conventional rule-based or statistics-based intrusion detection methods struggle to cope with changing attack techniques. To improve the accuracy and adaptability of intrusion detection, continual learning techniques have been introduced into network intrusion detection. Continual learning allows a model to be updated continuously during operation so that it can accommodate new data distributions. However, existing continual learning methods have shortcomings in sample selection and model updating. They typically employ random sampling or representative-sample selection strategies based on the overall distribution, which tend to ignore the core of concept drift, namely the part of the data distribution that is changing. As a result, the model may not adapt efficiently to the new data distribution.
Disclosure of Invention
The invention aims to solve the problem that the prior art cannot effectively cope with concept drift in the network environment in network intrusion detection tasks, and provides a network intrusion detection method and system based on reliability sample selection. The network intrusion detection method based on reliability sample selection comprises the following steps:
Initial training phase:
S1, dividing a network intrusion data set into an original training data set, an online training data set and a test data set;
S2, performing data augmentation on the samples according to the risk and reliability of the original training samples from S1, and selecting high-quality samples from the augmented data;
S3, inputting the high-quality sample data selected in S2 into a model for training to obtain an initial training model;
Online training phase:
S4, based on the input online training data, detecting whether the input samples and the original samples follow the same distribution, so as to judge whether concept drift has occurred;
S5, according to the judgment result of S4, using an attention mechanism to select a certain number of reliability samples from the input samples and updating the training data set. In both cases, a self-attention model is used to calculate the attention weights of the old samples and the new samples, a reliability threshold is determined from the attention weights, samples whose attention weight is higher than the threshold are screened out as candidate samples, the candidate samples are sorted, and the top k candidates are selected as reliability samples. When concept drift has occurred, k samples are also deleted from the old samples; when concept drift has not occurred, no old samples are deleted. The selected reliability samples are then merged into the current training set and the labels of the training set are updated;
S6, when the reliability samples among the input samples are insufficient, selecting samples with high model loss values from the non-reliability samples as supplementary samples;
S7, inputting the training data sets updated in S5 and S6 into the model for training until the online training data have been completely input;
Test phase:
S8, inputting test data into the model, which outputs the classification predictions for the samples, and evaluating the performance of the model by comparing its predictions with the true labels.
Preferably, S1 comprises the following steps:
S1.1, inputting the original training data into a pre-trained model to generate pseudo-labels and a score map for the data, where the original training data set consists of samples indexed by i [the symbols for the data set, the i-th sample and the pre-trained model, and the formulas for the model structure, the pseudo-labels and the score map, are not reproduced in this text];
S1.2, computing the risk and reliability of each sample in the original training data set from the generated pseudo-labels and score map [formula not reproduced in this text], where g(·) and h(·) are the calculation functions for risk and reliability, respectively.
Preferably, S2 comprises the following steps:
S2.1, based on the risk and reliability of the samples from S1.2, augmenting the training data by genetic programming over t iterations; in each iteration, samples and their corresponding labels are randomly selected, and a mutation or crossover operation is applied to each sample according to its risk and reliability:
Mutation operation: [formula not reproduced in this text], where the mask is binary and each of its elements is 1 with probability equal to the mutation rate μ and 0 otherwise;
Crossover operation: [formula not reproduced in this text], where the mask is binary and each of its elements is 1 with probability equal to the crossover rate γ and 0 otherwise; this yields the augmented data set, whose size is the number of samples after augmentation;
S2.2, re-evaluating the reliability of each sample in the augmented data from S2.1, calculating the average reliability of all samples, and selecting the samples whose reliability is greater than the average as high-quality samples, forming a high-quality sample set;
Preferably, in S4, the Kolmogorov-Smirnov (KS) test is used to detect the distribution difference between the new samples and the old samples: given the input new samples and the old samples, the KS statistic S and the p-value P are calculated, and when P is smaller than the concept drift decision threshold drift_threshold, concept drift is deemed to have occurred;
Preferably, in S5, according to the concept drift determination result, the reliability samples are selected and updated in two cases:
S5.1, when concept drift occurs, the attention weights of the old samples and the new samples are calculated using a multi-head self-attention model; the model output, each head and the attention weights are given by formulas [not reproduced in this text] that involve the dimension of each head and learnable weight matrices;
the reliability sample threshold θ is determined from the attention weights, samples whose attention weight is higher than θ are screened out as candidate samples, the candidate samples are sorted, the top k candidates are selected as reliability samples, and all other samples are treated as non-reliability samples;
k samples are then deleted from the old samples, the selected reliability samples are merged into the current training set, and the labels of the training set are updated;
S5.2, when no concept drift occurs, the attention weights of the old samples and the new samples are calculated using the self-attention model, the reliability threshold θ is determined from the attention weights, samples whose attention weight is higher than θ are screened out as candidate samples, the candidate samples are sorted, the top k candidates are selected as reliability samples, the selected reliability samples are merged into the current training set, and the labels of the training set are updated;
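The formulas for the multi-head self-attention model in S5.1 are missing from this text. For orientation only, the widely used multi-head form is written out below; it is an assumption that the patent follows it. The symbols Q, K, V, W_i^Q, W_i^K, W_i^V, W^O, A_i and d_k are the conventional ones and are not taken from the source, which also does not say how the queries, keys and values are built from the old and new samples.

```latex
\begin{align}
\mathrm{MultiHead}(Q, K, V) &= \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O} \\
\mathrm{head}_i &= A_i \,\bigl(V W_i^{V}\bigr) \\
A_i &= \mathrm{softmax}\!\left( \frac{\bigl(Q W_i^{Q}\bigr)\bigl(K W_i^{K}\bigr)^{\top}}{\sqrt{d_k}} \right)
\end{align}
```

Here d_k would be the dimension of each head and the W matrices the learnable weight matrices mentioned in S5.1; A_i plays the role of the attention weights that are compared with the threshold θ.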
Preferably, in S6, when the number n of new reliability samples is smaller than the set number k, the loss value of each non-reliability sample is calculated [formula not reproduced in this text], the non-reliability samples are ranked by loss value, the k-n samples with the highest loss values are selected as supplementary samples and merged into the current training set, and the labels of the training set are updated;
Preferably, in S8, the test data set is input into the model, which outputs the classification predictions; the performance of the model is evaluated by comparing the predictions with the true labels, using metrics including accuracy, precision, recall and F1 score.
The embodiment of the application provides a network intrusion detection system based on reliability sample selection, which comprises a processor and a memory;
a memory for storing a computer program;
and a processor for implementing any of the method steps when executing the program stored on the memory.
The invention carries out data enhancement on network intrusion data through genetic programming to generate a more representative sample. The high-quality sample is selected for initial model training through reliability evaluation, so that the quality and the representativeness of training data are ensured, the initial performance of the model is improved, and the problems of uneven sample quality and unbalanced data distribution in network intrusion detection data can be effectively solved;
When the concept drift is detected during online training of the model, the invention adopts a reliability sample selection strategy guided by an attention mechanism. This strategy can preferentially select the most valuable samples for model training to update online, thereby quickly adapting to new data distributions. Through an attention mechanism, the model can dynamically adjust the selection of samples, so that the model can better adapt to the change of data distribution, and the adaptability problem of the model when facing the change of data distribution is effectively solved;
When the number of the reliability samples is insufficient, the invention selects the sample with high model loss value from the non-reliability samples as the supplementary sample. The strategy can effectively utilize the uncertainty information of the model, select samples which are most helpful to the improvement of the model performance, and effectively solve the problem of how to effectively utilize the existing data to improve the model performance when the number of the samples is limited.
Drawings
FIG. 1 is a flow chart of a network intrusion detection method based on reliability sample selection in an embodiment of the invention;
FIG. 2 compares the confusion matrices of the baseline model and of the network intrusion detection method based on reliability sample selection in an embodiment of the invention, where (a) shows the results of the baseline model and (b) shows the results of the method based on reliability sample selection;
FIG. 3 compares the confusion matrices of the initial training model in an embodiment of the invention, where (a) shows the results of the baseline model and (b) shows the results obtained with the high-quality sample selection strategy.
Detailed Description
The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the network intrusion detection problem, an attacker tries to bypass the detection system by various means (such as tampering with data packets or disguising attacks as normal traffic) so that the network system performs illegal operations or leaks sensitive information. In this embodiment, a sample of normal behavior in network traffic is defined as a normal sample, and a sample containing attack behavior is defined as an attack sample. The system aims to distinguish normal samples from attack samples accurately by analyzing network traffic, so that potential network attacks are discovered and prevented in time;
As shown in FIG. 1, the network intrusion detection method based on reliability sample selection provided in this embodiment includes the following steps.
S1, the network intrusion data set NSL-KDD is divided. The NSL-KDD data set comprises a training data set and a test data set. The training data set contains 125973 samples in total and is divided into an original training data set of 25194 samples and an online training data set of 100779 samples; the test data set contains 22544 samples. Each sample comprises 41 features and a label of either normal or anomaly; the features include basic features, content features and traffic-based features;
The risk and reliability of the original training samples are then evaluated. The original training data are input into a pre-trained model to generate pseudo-labels and a score map for the data, where the original training data set consists of samples indexed by i [the symbols for the data set, the i-th sample and the pre-trained model, and the formulas for the model structure, the pseudo-labels and the score map, are not reproduced in this text];
The risk and reliability of each sample are computed from the generated pseudo-labels and score map [formula not reproduced in this text], where g(·) and h(·) are the calculation functions for risk and reliability, respectively.
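The division in S1 can be sketched as follows. This is a minimal illustration under stated assumptions: the text gives only the sizes of the two training subsets (25194 and 100779 out of 125973) and not how the samples are assigned, so the seeded random shuffle and the helper name `split_training_set` are hypothetical.

```python
import random


def split_training_set(n_total, n_original, seed=0):
    """Return disjoint index lists for the original and online training sets.

    Assumption: samples are assigned by a seeded random shuffle; the source
    states only the resulting sizes, not the assignment rule.
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[:n_original], indices[n_original:]


# Sizes reported for the NSL-KDD training data in this embodiment.
original_idx, online_idx = split_training_set(125973, 25194)
print(len(original_idx), len(online_idx))
```

The two index lists can then be used to slice the feature matrix and the label vector consistently.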
S2, data augmentation is performed on the samples according to the risk and reliability of the original training samples from S1, and high-quality samples are selected from the augmented data;
In particular, according to the risk and reliability of each sample, genetic programming is used to augment the training data over t iterations; in each iteration, samples and their corresponding labels are randomly selected, and a mutation or crossover operation is applied to each sample according to its risk and reliability:
Mutation operation: [formula not reproduced in this text], where the mask is binary and each of its elements is 1 with probability equal to the mutation rate μ and 0 otherwise, with μ = 0.1;
Crossover operation: [formula not reproduced in this text], where the mask is binary and each of its elements is 1 with probability equal to the crossover rate γ and 0 otherwise, with γ = 0.5; this yields the augmented data set, whose size is the number of samples after augmentation;
The reliability of each sample in the augmented data is then re-evaluated, the average reliability of all samples is calculated, and the samples whose reliability is greater than the average are selected as high-quality samples, forming a high-quality sample set;
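The mask-based operations in S2 might look like the sketch below. The mutation and crossover formulas are missing from this text, so the concrete forms here are assumptions: mutation adds Gaussian noise where the mask is 1, and crossover copies the partner's feature where the mask is 1. The names `bernoulli_mask`, `mutate`, `crossover`, `select_high_quality` and the parameter `noise_scale` are hypothetical, and the rule by which risk and reliability choose between the two operations is not modelled because the source does not state it.

```python
import random


def bernoulli_mask(length, rate, rng):
    """Binary mask whose elements are 1 with probability `rate`, else 0."""
    return [1 if rng.random() < rate else 0 for _ in range(length)]


def mutate(x, mu, rng, noise_scale=0.05):
    """Assumed mutation: perturb the features selected by the mask."""
    mask = bernoulli_mask(len(x), mu, rng)
    return [xi + m * rng.gauss(0.0, noise_scale) for xi, m in zip(x, mask)]


def crossover(x_a, x_b, gamma, rng):
    """Assumed crossover: take x_b's feature where the mask is 1, else x_a's."""
    mask = bernoulli_mask(len(x_a), gamma, rng)
    return [b if m else a for a, b, m in zip(x_a, x_b, mask)]


def select_high_quality(samples, reliabilities):
    """Keep the samples whose reliability exceeds the mean reliability."""
    mean_rel = sum(reliabilities) / len(reliabilities)
    return [s for s, r in zip(samples, reliabilities) if r > mean_rel]


rng = random.Random(0)
child = crossover([0.0] * 4, [1.0] * 4, gamma=0.5, rng=rng)
mutant = mutate([0.2, 0.4, 0.6, 0.8], mu=0.1, rng=rng)
kept = select_high_quality(["a", "b", "c"], [0.2, 0.9, 0.7])
```

With the rates given in this embodiment, μ = 0.1 changes roughly one feature in ten per mutated sample, while γ = 0.5 mixes the two parents evenly on average.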
S3, the high-quality samples from S2 are input into the model, which is trained with a stochastic gradient descent (SGD) optimizer at a learning rate of 0.001; initial model training is complete after 250 iterations. Training is carried out with the PyTorch framework on an NVIDIA GTX 3060 GPU.
S4, based on the input online training data, it is detected whether the input samples and the original samples follow the same distribution, so as to judge whether concept drift has occurred. The Kolmogorov-Smirnov (KS) test is used to detect the distribution difference between the new samples and the old samples: the KS statistic and the p-value are calculated, and when the p-value is smaller than the concept drift decision threshold, concept drift is deemed to have occurred;
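The drift check in S4 can be illustrated with a dependency-free two-sample KS test. This is a sketch under stated assumptions: the test is applied to a single numeric feature (the text does not say how the 41 features are combined), the p-value uses the standard asymptotic Kolmogorov series with a common small-sample correction rather than an exact computation, and the default `drift_threshold=0.05` is a placeholder because the source gives no value. The function names are hypothetical.

```python
import math


def kolmogorov_sf(lam, terms=100):
    """Asymptotic survival function Q(lam) of the Kolmogorov distribution."""
    if lam < 0.2:
        return 1.0  # Q(lam) is numerically indistinguishable from 1 here
    total = 0.0
    for k in range(1, terms + 1):
        term = (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(1.0, max(0.0, 2.0 * total))


def ks_two_sample(new, old):
    """Return (KS statistic, approximate p-value) for two 1-D samples."""
    a, b = sorted(new), sorted(old)
    n, m = len(a), len(b)
    i = j = 0
    stat = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:  # step both empirical CDFs past x
            i += 1
        while j < m and b[j] <= x:
            j += 1
        stat = max(stat, abs(i / n - j / m))
    en = math.sqrt(n * m / (n + m))
    p_value = kolmogorov_sf((en + 0.12 + 0.11 / en) * stat)
    return stat, p_value


def concept_drift(new, old, drift_threshold=0.05):
    """Flag drift when the p-value falls below the decision threshold."""
    _, p_value = ks_two_sample(new, old)
    return p_value < drift_threshold


old_feature = [float(v) for v in range(100)]
shifted_feature = [v + 100.0 for v in old_feature]
print(concept_drift(old_feature, old_feature))      # identical samples
print(concept_drift(shifted_feature, old_feature))  # fully shifted samples
```

Where SciPy is available, `scipy.stats.ks_2samp` returns the same statistic together with a more carefully computed p-value.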
S5, according to the judgment result of S4, reliability samples are selected from the input samples and the model is updated:
When concept drift occurs, the attention weights of the old samples and the new samples are calculated using a multi-head self-attention model; the model output, each head and the attention weights are given by formulas [not reproduced in this text] that involve the dimension of each head and learnable weight matrices;
The reliability sample threshold is determined from the attention weights, samples whose attention weight is higher than the threshold are screened out as candidate samples, the candidate samples are sorted, the top k candidates are selected as reliability samples, and all other samples are treated as non-reliability samples;
k samples are then deleted from the old samples, the selected reliability samples are merged into the current training set, and the labels of the training set are updated;
When no concept drift occurs, the attention weights of the old samples and the new samples are calculated using the self-attention model, the reliability threshold is determined from the attention weights, samples whose attention weight is higher than the threshold are screened out as candidate samples, the candidate samples are sorted, the top k candidates are selected as reliability samples, the selected reliability samples are merged into the current training set, and the labels of the training set are updated;
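The selection and update logic of S5 can be sketched as below, treating the attention weights as already computed. Several details are assumptions rather than statements from the text: each new sample is given one scalar attention weight, the threshold `theta` is passed in (the source derives it from the attention weights by a rule that is not given here), and under drift the k oldest training samples are the ones deleted, since the text does not say which k are removed. The function names are hypothetical.

```python
def select_reliable(attention_weights, theta, k):
    """Indices of the top-k samples whose attention weight exceeds theta."""
    candidates = [(w, i) for i, w in enumerate(attention_weights) if w > theta]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in candidates[:k]]


def update_training_set(train_set, new_samples, attention_weights,
                        theta, k, drift):
    """Merge the selected reliability samples; drop k old samples on drift."""
    reliable_idx = select_reliable(attention_weights, theta, k)
    selected = [new_samples[i] for i in reliable_idx]
    if drift:
        # Assumption: the oldest k samples sit at the front of train_set.
        train_set = train_set[k:]
    return train_set + selected, reliable_idx


weights = [0.1, 0.9, 0.5, 0.7]
new_batch = ["n0", "n1", "n2", "n3"]
old_set = ["o1", "o2", "o3"]

with_drift, picked = update_training_set(
    old_set, new_batch, weights, theta=0.4, k=2, drift=True)
without_drift, _ = update_training_set(
    old_set, new_batch, weights, theta=0.4, k=2, drift=False)
```

When fewer than k candidates pass the threshold, `select_reliable` returns a shorter list, which is the situation that the supplementary step S6 handles.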
S6, when the reliability samples among the input samples are insufficient, samples with high model loss values are selected from the non-reliability samples as supplementary samples: when the number n of new reliability samples is smaller than the set number k, the loss value of each non-reliability sample is calculated [formula not reproduced in this text], the k-n samples with the highest loss values are selected as supplementary samples and merged into the current training set, and the labels of the training set are updated;
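The supplementary step S6 amounts to a top-(k-n) selection by loss, as in the sketch below. The per-sample loss values are taken as given, because the loss formula is missing from this text; the function name `supplement_by_loss` is hypothetical.

```python
def supplement_by_loss(unreliable_samples, losses, n_reliable, k):
    """Pick the k - n_reliable non-reliability samples with the highest loss."""
    needed = k - n_reliable
    if needed <= 0:
        return []  # enough reliability samples, nothing to add
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return [unreliable_samples[i] for i in order[:needed]]


extra = supplement_by_loss(["a", "b", "c", "d"], [0.2, 1.5, 0.9, 0.1],
                           n_reliable=1, k=3)
print(extra)
```

High-loss samples are the ones the current model handles worst, which is why the text describes this step as exploiting the model's uncertainty.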
S7, the training data sets updated in S5 and S6 are input into the model for training until the online training data set has been completely input; the samples of the online training data set are fed in segments of 5000 samples each, and model training is complete after 20 iterations.
S8, the test data set is input into the model, which outputs the classification predictions; the performance of the model is evaluated by comparing the predictions with the true labels, using metrics including accuracy, precision, recall and F1 score.
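The four metrics named in S8 follow from the counts of a binary confusion matrix. The sketch below assumes that the attack class (label 1 here) is the positive class, which the text does not state explicitly; the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


scores = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                        [1, 1, 0, 0, 0, 1, 1, 0])
print(scores)
```

Because F1 is the harmonic mean of precision and recall, the reported precision of 91.2% and recall of 96.1% imply an F1 of about 93.6%, which matches the value reported for the method.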
The method of this embodiment is compared with other classical continual learning methods, namely SSF, AOC-IDS, EWC and LwF. The results are shown in Table 1:
TABLE 1 Performance of the inventive method compared with other continual learning methods
Method           Accuracy (%)  Precision (%)  Recall (%)  F1 (%)
Proposed method  92.5          91.2           96.1        93.6
SSF              90.5          89.2           94.7        91.9
AOC-IDS          81.7          78.9           92.5        85.2
EWC              81.7          89.1           77.4        82.7
LwF              82.7          89.2           79.1        83.8
As Table 1 shows, the invention performs the network intrusion detection task better than the compared methods. Relative to the strongest compared method, SSF, accuracy improves by 2.0 percentage points (92.5% versus 90.5%), meaning that normal and intrusion samples are identified more accurately; precision improves by 2.0 percentage points (91.2% versus 89.2%), meaning that predicted intrusions are more reliable and the false alarm rate is lower; recall improves by 1.4 percentage points (96.1% versus 94.7%), meaning that intrusion samples are detected more comprehensively and fewer attacks are missed; and the F1 score improves by 1.7 percentage points (93.6% versus 91.9%), meaning that a better balance is reached between precision and recall and overall performance is better;
In addition, FIG. 2 compares heat maps of the confusion matrices for the baseline algorithm and for the network intrusion detection method based on reliability sample selection. With the method based on reliability sample selection, the number of samples correctly predicted as attack traffic increases and the number of attack samples incorrectly predicted as normal traffic decreases, showing a better ability to identify attack traffic; likewise, the number of samples correctly identified as normal traffic increases and the number incorrectly judged to be attack traffic decreases, showing a better ability to identify normal traffic.
TABLE 2 Performance of the inventive method as the sample selection strategies are added in sequence
Method              Accuracy (%)  Precision (%)  Recall (%)  F1 (%)
Baseline            89.3          89.1           92.4        90.7
Baseline+HS         91.5          88.8           97.5        92.9
Baseline+HS+AS      92.1          89.3           97.7        93.4
Baseline+HS+AS+URS  92.5          91.2           96.1        93.6
Note: HS denotes high-quality sample selection, AS denotes attention-guided reliability sample selection, and URS denotes the uncertain-sample supplement strategy;
As Table 2 shows, each of the three strategies proposed by the invention improves on the baseline algorithm. High-quality sample selection raises accuracy by 2.2 percentage points and the F1 score by 2.2 percentage points; precision falls by 0.3 percentage points, but recall rises by 5.1 percentage points. Adding the attention-guided reliability sample selection strategy yields results higher than the baseline on every evaluation metric. Adding the uncertain-sample supplement strategy helps the model reach a better balance between precision and recall;
In addition, FIG. 3 compares the effect of the high-quality sample selection strategy on initial model training. When the strategy is adopted, the number of samples correctly predicted as attack traffic increases and the number of attack samples incorrectly predicted as normal traffic decreases, so the initial training model identifies attack traffic better.
TABLE 3 Performance of the inventive method for different numbers k of selected reliability samples
k    Accuracy (%)  Precision (%)  Recall (%)  F1 (%)
50   91.5          89.7           96.1        92.8
150  92.1          89.5           97.3        93.3
250  92.2          89.4           97.7        93.4
350  92.5          91.2           96.1        93.6
450  92.4          90.5           96.9        93.5
As Table 3 shows, network intrusion detection performance was evaluated for k = 50, 150, 250, 350 and 450. Selecting a larger number of reliability samples generally improves accuracy and recall but can reduce precision. With k = 350, the method achieves an F1 score of 93.6% and an accuracy of 92.5%, which are 0.8 and 1.0 percentage points higher than the results for k = 50, and strikes a better balance between precision and recall; k = 350 is therefore used in the experiments.
The embodiment of the disclosure also provides a network intrusion detection system based on the reliability sample selection, which comprises a processor and a memory;
a memory for storing a computer program;
a processor for implementing the steps of any of the above network intrusion detection methods based on reliability sample selection when executing the program stored in the memory;
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, or alternatives falling within the spirit and principles of the invention.

Claims (8)

1. A network intrusion detection method based on reliability sample selection, characterized in that it comprises the following steps:
Initial training phase:
S1: dividing a network intrusion data set, the network intrusion data set comprising an original training data set, an online training data set and a test data set; and evaluating the risk and reliability of the original training samples;
S2: performing data augmentation on the samples according to the risk and reliability of the original training samples from S1, and selecting high-quality samples from the augmented data;
S3: inputting the high-quality sample data selected in S2 into a model for training to obtain an initial training model;
Online training phase:
S4: based on the input online training data, detecting whether the input samples and the original samples follow the same distribution, so as to judge whether concept drift has occurred;
S5: according to the judgment result of S4, using an attention mechanism to select a certain number of reliability samples from the input samples and updating the training data set, including: when concept drift occurs, using a self-attention model to calculate the attention weights of the old samples and the new samples, determining a reliability threshold from the attention weights, screening out the samples whose attention weight is higher than the threshold as candidate samples, sorting the candidate samples, selecting the top k candidates as reliability samples, and deleting k samples from the old samples; when concept drift does not occur, using the self-attention model to calculate the attention weights of the old samples and the new samples, determining a reliability threshold from the attention weights, screening out the samples whose attention weight is higher than the threshold as candidate samples, sorting the candidate samples, and selecting the top k candidates as reliability samples, without deleting old samples; and merging the selected reliability samples into the current training set and updating the labels of the training set;
S6: when the reliability samples among the input samples are insufficient, selecting samples with high model loss values from the non-reliability samples as supplementary samples;
S7: inputting the training data sets updated in S5 and S6 into the model for training until the online training data set has been completely input;
Test phase:
S8: inputting test data into the model, which outputs the classification predictions for the samples; and evaluating the performance of the model by comparing its predictions with the true labels.
2. The network intrusion detection method based on reliability sample selection according to claim 1, characterized in that S1 comprises the steps of:
S1.1: inputting the original training data into a pre-trained model to generate pseudo-labels and a score map for the data, where the original training data set consists of samples indexed by i [the symbols for the data set, the i-th sample and the pre-trained model, and the formulas for the model structure, the pseudo-labels and the score map, are not reproduced in this text];
S1.2: computing the risk and reliability of each sample in the original training data set from the generated pseudo-labels and score map [formula not reproduced in this text], where g(·) and h(·) are the calculation functions for risk and reliability, respectively.
3. The network intrusion detection method based on reliability sample selection according to claim 2, characterized in that S2 comprises the steps of:
S2.1: based on the risk and reliability of the samples from S1.2, augmenting the training data by genetic programming over t iterations; in each iteration, samples and their corresponding labels are randomly selected, and a mutation or crossover operation is applied to each sample according to its risk and reliability;
Mutation operation: [formula not reproduced in this text], where the mask is binary and each of its elements is 1 with probability equal to the mutation rate μ and 0 otherwise;
Crossover operation: [formula not reproduced in this text], where the mask is binary and each of its elements is 1 with probability equal to the crossover rate γ and 0 otherwise; this yields the augmented training data set, whose size is the number of samples after augmentation;
S2.2: re-evaluating the reliability of each sample in the augmented data from S2.1, calculating the average reliability of all samples [formula not reproduced in this text], and selecting the samples whose reliability is greater than the average as high-quality samples, forming a high-quality sample set.
4. The network intrusion detection method based on reliability sample selection according to claim 3, characterized in that, in S4, the Kolmogorov-Smirnov test is used to detect the distribution difference between the new samples and the old samples: given the input new samples and the old samples, the KS statistic and the p-value are calculated, and when the p-value is smaller than the concept drift decision threshold, concept drift is deemed to have occurred.
5. The network intrusion detection method based on reliability sample selection according to claim 4, characterized in that, in S5, according to the concept drift determination result, the reliability samples are selected and updated in two cases:
S5.1: when concept drift occurs, the attention weights of the old samples and the new samples are calculated using a self-attention model, the output of which is expressed as follows:
The output of the adopted self-attention model is expressed as follows: ; 其中,每个头和注意力权重的计算公式为:Among them, the calculation formula for each head and attention weight is: ; ; 其中, 是每个头的维度, 是可学习的权重矩阵;in, is the dimension of each head, , and is a learnable weight matrix; ; ; 并根据注意力权重进行可靠性阈值判断,筛选出注意力权重高于该阈值的样本作为候选样本,对候选样本进行排序,选择前k个候选样本作为可靠性样本,并将其他样本作为非可靠性样本The reliability threshold is judged according to the attention weight, and the samples with attention weight higher than the threshold are selected as candidate samples. The candidate samples are sorted and the top k candidate samples are selected as reliability samples. , and treat the other samples as non-reliability samples ; ; ; 然后从中删除k个样本;将选择的可靠性样本合并到当前训练集中,并更新训练集的标签;Then from Delete k samples from ;Merge the selected reliability samples into the current training set and update the label of the training set; ; S5.2:未发生概念漂移时,使用自注意力模型计算旧样本和新样本的注意力权重,并根据注意力权重进行可靠性阈值判断,筛选出注意力权重高于该阈值的样本作为候选样本,对候选样本进行排序,选择前k个候选样本作为可靠性样本;将选择的可靠性样本合并到当前训练集中,并更新训练集的标签;S5.2: When concept drift does not occur, the self-attention model is used to calculate the attention weights of old and new samples, and the reliability threshold is judged based on the attention weights. Samples with attention weights higher than the threshold are selected as candidate samples, and the candidate samples are sorted. The top k candidate samples are selected as reliability samples; the selected reliability samples are merged into the current training set, and the label of the training set is updated; . 6.根据权利要求5所述的一种基于可靠性样本选择的网络入侵检测方法,其特征在于,S6中,当新可靠性样本n小于设定样本数k时,计算非可靠性样本的损失值,其中为非可靠性样本数量,根据损失值排序,选择k-n个样本作为补充样本,然后将补充样本合并到当前训练集中,并更新训练集的标签;6. 
A network intrusion detection method based on reliability sample selection according to claim 5, characterized in that in S6, when the new reliability sample n is less than the set sample number k, the loss value of the non-reliability sample is calculated ,in is the number of unreliable samples, sorted by loss value, select kn samples as supplementary samples, then merge the supplementary samples into the current training set, and update the label of the training set; ; . 7. 根据权利要求6所述的一种基于可靠性样本选择的网络入侵检测方法,其特征在于,S8中,将测试数据集输入到模型中,输出分类预测结果, 通过比较模型的预测结果与真实标签 ,采用包括准确率、精确率、召回率和F1分数指标来评估模型的性能。7. The network intrusion detection method based on reliability sample selection according to claim 6 is characterized in that, in S8, the test data set Input to model Output the classification prediction results , by comparing the prediction results of the model With the true label , the performance of the model is evaluated using indicators including accuracy, precision, recall and F1 score. 8.一种基于可靠性样本选择的网络入侵检测系统,其特征在于,包括处理器、存储器:8. A network intrusion detection system based on reliability sample selection, characterized by comprising a processor and a memory: 存储器,用于存放计算机程序;Memory, used to store computer programs; 处理器,用于执行存储器上所存放的程序时,实现权利要求1-7中任一所述的方法步骤。A processor, for implementing the method steps described in any one of claims 1 to 7 when executing a program stored in a memory.
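The drift check of S4 (claim 4) can be sketched as follows. This is a minimal illustration, not the patented implementation: it applies a two-sample Kolmogorov-Smirnov test to a single numeric feature and uses the standard large-sample (asymptotic) approximation for the P value. The function names `ks_two_sample` and `drift_detected`, the one-feature simplification, and the default threshold `alpha=0.05` are assumptions; the claims do not fix a threshold value or say how multi-feature samples are compared.

```python
import math
from typing import Sequence, Tuple


def ks_two_sample(old: Sequence[float], new: Sequence[float]) -> Tuple[float, float]:
    """Return (KS statistic D, asymptotic p-value) for two 1-D samples."""
    a, b = sorted(old), sorted(new)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the two ECDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic Kolmogorov distribution; only a rough guide for small samples.
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.3:
        # The alternating series is unreliable here; the p-value is ~1.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (t - 1) * math.exp(-2.0 * t * t * lam * lam) for t in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


def drift_detected(old: Sequence[float], new: Sequence[float], alpha: float = 0.05) -> bool:
    """Concept drift is flagged when the p-value falls below the threshold.

    ASSUMPTION: alpha=0.05 is a hypothetical default, not a value from the claims.
    """
    _, p = ks_two_sample(old, new)
    return p < alpha
```

In practice a library routine such as SciPy's `scipy.stats.ks_2samp` would normally be preferred; the hand-written version above only keeps the sketch dependency-free.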
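The selection and update logic of S5 and S6 (claims 5 and 6) can be sketched as below, under several stated assumptions. The attention weights and loss values are taken as precomputed per-sample numbers for the new samples (the self-attention model itself is not reproduced here). The claims say that k old samples are deleted when drift occurs but do not say which ones; this sketch drops the k oldest, which is purely an assumption. All function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_reliable(
    weights: Sequence[float], threshold: float, k: int
) -> Tuple[List[int], List[int]]:
    """Return (indices of reliability samples, indices of non-reliability samples)."""
    # Candidates are samples whose attention weight exceeds the threshold.
    candidates = [i for i, w in enumerate(weights) if w > threshold]
    # Sort candidates by weight, highest first, and keep the top k.
    candidates.sort(key=lambda i: weights[i], reverse=True)
    reliable = candidates[:k]
    chosen = set(reliable)
    unreliable = [i for i in range(len(weights)) if i not in chosen]
    return reliable, unreliable


def supplement(unreliable: Sequence[int], losses: Sequence[float], need: int) -> List[int]:
    """Pick the `need` non-reliability samples with the highest loss (S6)."""
    ranked = sorted(unreliable, key=lambda i: losses[i], reverse=True)
    return ranked[: max(need, 0)]


def update_training_set(
    train: List[T],
    new_samples: Sequence[T],
    weights: Sequence[float],
    losses: Sequence[float],
    k: int,
    threshold: float,
    drift: bool,
) -> List[T]:
    """Merge selected new samples into the training set (S5 plus S6)."""
    reliable, unreliable = select_reliable(weights, threshold, k)
    # If fewer than k reliability samples were found (n < k), add k - n high-loss ones.
    extra = supplement(unreliable, losses, k - len(reliable))
    # ASSUMPTION: on drift, the k oldest training samples are the ones removed.
    kept = train[k:] if drift else list(train)
    return kept + [new_samples[i] for i in reliable + extra]
```

For example, with `k=3`, threshold `0.5`, weights `[0.9, 0.2, 0.6, 0.1]` and losses `[0.1, 0.8, 0.2, 0.5]`, two new samples pass the threshold and one high-loss sample is added to reach three.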
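The evaluation in S8 (claim 7) names accuracy, precision, recall, and F1 score. A minimal sketch of these metrics is shown below for the binary case, assuming label 1 means "attack" and label 0 means "benign". That label encoding and the function name `binary_metrics` are assumptions; the claims do not state whether the classification is binary or multi-class.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Ratios with an empty denominator are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Calling `binary_metrics([1, 1, 0, 0], [1, 0, 0, 1])` gives one true positive, one true negative, one false positive, and one false negative, so all four metrics equal 0.5.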
CN202510645750.2A 2025-05-20 2025-05-20 A network intrusion detection method and system based on reliability sample selection Active CN120185929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510645750.2A CN120185929B (en) 2025-05-20 2025-05-20 A network intrusion detection method and system based on reliability sample selection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510645750.2A CN120185929B (en) 2025-05-20 2025-05-20 A network intrusion detection method and system based on reliability sample selection

Publications (2)

Publication Number Publication Date
CN120185929A (en) 2025-06-20
CN120185929B (en) 2025-07-15

Family

ID=96042947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510645750.2A Active CN120185929B (en) 2025-05-20 2025-05-20 A network intrusion detection method and system based on reliability sample selection

Country Status (1)

Country Link
CN (1) CN120185929B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114154820A (en) * 2021-11-22 2022-03-08 南京航空航天大学 Production bottleneck prediction method based on increment simple cycle unit and double attention
KR20230099132A (en) * 2021-12-27 2023-07-04 광운대학교 산학협력단 Concept drift detecting apparatus, adaptive prediction system using the same and adaptive prediction method thereof
CN116668139A (en) * 2023-06-08 2023-08-29 南京邮电大学 A Detection Method for Concept Drift in Maliciously Encrypted DoH Traffic

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wen Yimin et al.: "A survey of semi-supervised classification of data streams with concept drift", Journal of Software (《软件学报》), no. 04, 31 December 2022 (2022-12-31) *

Also Published As

Publication number Publication date
CN120185929B (en) 2025-07-15

Similar Documents

Publication Publication Date Title
CN112491796A (en) Intrusion detection and semantic decision tree quantitative interpretation method based on convolutional neural network
CN109902024A (en) A program path-sensitive gray box testing method and device
CN114565002B (en) Abnormal behavior detection method and system based on behavior and attention mechanism
Zhao et al. Suzzer: A vulnerability-guided fuzzer based on deep learning
CN114140246A (en) Model training method, fraud transaction identification method, device and computer equipment
CN115225336A (en) Vulnerability availability calculation method and device for network environment
CN111160797A (en) Wind control model construction method and device, storage medium and terminal
CN111047173A (en) Community credibility evaluation method based on improved D-S evidence theory
Vermetten et al. Is there anisotropy in structural bias?
CN118333397A (en) Prediction method for severity of marine traffic accident
Setiawan et al. Comparison of LSTM architecture for malware classification
CN113179276A (en) Intelligent intrusion detection method and system based on explicit and implicit feature learning
Marabad Credit card fraud detection using machine learning
CN119313338A (en) A transaction risk identification method and device based on AI intelligence
Parihar et al. IDS with deep learning techniques
CN120185929B (en) A network intrusion detection method and system based on reliability sample selection
Suhaimi et al. Network intrusion detection system using immune-genetic algorithm (IGA)
Paulraj Machine learning approaches for credit card fraud detection: A comparative analysis and the promise of 1D convolutional neural networks
Umaru et al. AN ENHANCED HYBRID MODEL COMBINING LSTM, RESNET, AND AN ATTENTION MECHANISM FOR CREDIT CARD FRAUD DETECTION
CN114022277A (en) Abnormal transaction detection method and device, computer equipment and storage medium
CN114519187A (en) Multi-dimensional hybrid feature-based Android malicious application detection method and system
Shanthakumara A Comparative Analysis of Supervised Classifiers for Detecting Credit Card Frauds
CN114493858A (en) A kind of illegal fund transfer suspicious transaction monitoring method and related components
CN119903901B (en) A computer virus detection method and system based on improved clone selection algorithm
CN114996256B (en) Data cleaning method based on class balance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant