
CN111027069A - Malware family detection method, storage medium and computing device - Google Patents

Malware family detection method, storage medium and computing device

Info

Publication number
CN111027069A
CN111027069A (application CN201911202586.9A; granted as CN111027069B)
Authority
CN
China
Prior art keywords
malware
sample
layer
tested
training
Prior art date
Legal status
Granted
Application number
CN201911202586.9A
Other languages
Chinese (zh)
Other versions
CN111027069B (en)
Inventor
孙玉霞
宋涛
赵晶晶
Current Assignee
Jinan University
Original Assignee
Jinan University
Priority date
Filing date
Publication date
Application filed by Jinan University filed Critical Jinan University
Priority to CN201911202586.9A priority Critical patent/CN111027069B/en
Publication of CN111027069A publication Critical patent/CN111027069A/en
Application granted granted Critical
Publication of CN111027069B publication Critical patent/CN111027069B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G06F21/56: Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566: Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computer Hardware Design (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Virology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a malware family detection method, a storage medium and a computing device. The method comprises: extracting features from all malware training samples of each class in a malware training set to obtain a plurality of corresponding feature vectors; converting the feature vectors into feature images, generating image pairs from the feature images, constructing a twin (Siamese) network model and training it with the image pairs; taking samples to be tested from the malware test set and computing, with the trained twin network model, a similarity score between each sample to be tested and the malware training samples; and calculating a threshold and, according to the threshold, classifying each sample to be tested as belonging to a known malware family or a new malware family. The method can correctly detect the family to which malware belongs and has a good classification effect.

Description

Malicious software family detection method, storage medium and computing device
Technical Field
The invention relates to the technical field of software security, in particular to a malicious software family detection method, a storage medium and a computing device.
Background
Malicious software is implanted into a victim's computer by hackers or attackers through security holes in the operating system or application software, where it interferes with the user's normal operations and performs malicious actions such as collecting sensitive information and stealing superuser privileges. Mainstream malware generally includes viruses, exploits, backdoors, worms, Trojan horses, spyware, rootkits, and the like, as well as combinations or variants of these types. Malware spreads rapidly through the many channels provided by the Internet, affecting the normal functioning of the network. In recent years, the amount of malware has grown exponentially, making it difficult for malware analysts and antivirus software vendors to extract useful information from such large-scale data for analysis.
The emergence of new malware families brings new threats and deserves the attention of security researchers. Meanwhile, most existing research is dedicated to classifying malware with similar behaviors or characteristics into known malware families; such classification methods cannot distinguish new malware families, because samples of new families do not participate in the training process. How to correctly and effectively detect a new family of malware is therefore an important research problem.
Progress in deep learning has influenced the solution of problems in fields such as natural language processing and computer vision, freeing many tasks from the dependence on feature engineering, making them easier, and in some cases exceeding human performance. A general classification task must satisfy the condition that the classes of the test-set samples are consistent with the classes of the training-set samples. By learning to distinguish samples of all known classes in the training set, the model gains the ability to determine the class to which a test sample belongs. However, deep neural networks have a known shortcoming: when presented with a sample of a class absent from the training set, they may output an "overconfident" value. This is because the class probabilities output by the neural network sum to 1; when a sample of an unknown class is input, the network still outputs per-class probabilities summing to 1, so it appears "overconfident" about something it has never seen. Thus, if the sample under test belongs to an unknown class (i.e., a class it was not trained on), the neural network cannot output the correct result, causing misclassification. The same problem exists in malware classification research. One might assume that samples of every existing malware family can be collected for training, but because of the natural adversarial dynamics of the malware field, malware authors continuously release new families. In a relatively open malware classification environment, a sample to be tested may therefore belong either to a known family in the training set or to a new family absent from it, and a traditional classification approach may misclassify it.
In view of the above problems, it is necessary to develop a new malware family detection technique, that is, to detect a sample to be detected that does not belong to all known families in the training set, and label the detected sample as a new malware family.
In practice, many new malware families go unrecorded or even unnoticed. At the same time, it is important for security researchers to quickly understand samples of a new malware family. Once a piece of malware is detected as belonging to a new malicious family, they can examine that file with priority and manually analyze its behavior (e.g., network activity, system calls, etc.); only once they understand the malware's behavior can they remove it effectively. In short, detecting new malware families can, to some extent, mitigate new threats to cyberspace security.
In summary, it is of great importance to research a new family of malware detection technology and apply the technology to a relatively open malware detection environment. On one hand, the method can avoid the misclassification of the new malware family into the known malware family, and on the other hand, can help security researchers to pay attention to the new malware family in time.
Disclosure of Invention
The first purpose of the present invention is to overcome the drawbacks and deficiencies of the prior art, and to provide a malware family detection method, which can correctly detect the category to which malware belongs, and has a good classification effect.
A second object of the present invention is to provide a storage medium.
It is a third object of the invention to provide a computing device.
The first purpose of the invention is realized by the following technical scheme: a malware family detection method comprises the following steps:
s1, feature extraction: respectively extracting features of all the malware training samples of each class in the malware training set to obtain a plurality of corresponding feature vectors;
s2, twin network design: respectively converting the plurality of feature vectors into feature images, generating an image pair according to the feature images, constructing a twin network model and training the model by using the image pair;
s3, novelty measure: taking out samples to be tested from the malware test set, and counting the similarity score of each sample to be tested and a malware training sample by using the trained twin network model;
and calculating a threshold value, and distinguishing the sample to be detected as a known malware family or a new malware family according to the threshold value.
Preferably, in step S1, feature extraction is performed on the malware training samples to obtain corresponding feature vectors, and the process is as follows:
preprocessing a malware training sample: performing behavior analysis on each malicious software training sample to generate a corresponding report file, extracting all keywords in the report file, removing duplication, and storing the report file as a text file;
traversing all the text files stored with the keywords, constructing a dictionary according to the keywords in the text files, counting the occurrence frequency of each keyword, and deleting the keywords with the occurrence frequency equal to the sample number in the dictionary;
sorting the dictionary in descending order of keyword occurrence count, and taking the N keywords with the highest counts as a new dictionary;
initializing an N-dimensional vector, wherein the N dimensions of the vector respectively correspond to N different keywords, traversing all the text files stored with the keywords again, judging whether the keywords appear in a new dictionary or not,
if yes, setting the corresponding dimension of the vector as 1; if not, setting the corresponding dimension of the vector as 0;
and the traversed N-dimensional binary vector is taken as a feature vector.
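The keyword-to-vector step above can be sketched as follows; the function name and the toy dictionary contents are illustrative assumptions, not taken from the patent.

```python
# Sketch of the keyword-to-vector step (names are illustrative, not from the patent).

def build_feature_vector(keywords, dictionary):
    """Map a sample's de-duplicated keywords onto an N-dimensional binary vector.

    `dictionary` is the ordered list of the N most frequent keywords;
    dimension i of the vector is 1 iff dictionary[i] appears in the sample.
    """
    keyword_set = set(keywords)
    return [1 if kw in keyword_set else 0 for kw in dictionary]

# Toy dictionary with N = 4 (the embodiment described later uses N = 20000)
dictionary = ["CreateFileW", "DeleteFileW", "connect", "RegOpenKeyExW"]
sample_keywords = ["DeleteFileW", "connect", "Sleep"]
print(build_feature_vector(sample_keywords, dictionary))  # [0, 1, 1, 0]
```

Keywords in the sample but not in the dictionary (here, "Sleep") are simply ignored, which matches the step of checking only whether each dictionary keyword appears.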
Furthermore, the sandbox is used for preprocessing the malware training samples, specifically, the malware training samples are submitted to the sandbox to be operated, and the sandbox generates a text file containing a behavior analysis report for each piece of malware.
Furthermore, the extracted keywords are unigrams, and the report file is a json report file.
Preferably, in step S2, the feature vectors are converted into feature images, an image pair is generated from the feature images, a twin network model is constructed, and the model is trained by using the image pair, as follows:
calculating the pixel value of each bit in the feature vector: mapping a bit value of 0 to a pixel value of 0 and a bit value of 1 to a pixel value of 255;
converting the N-dimensional feature vector into an X × Y pixel matrix, where N = X·Y, X is the number of rows of the pixel matrix, and Y is the number of columns;
converting the pixel matrix into a characteristic image;
pairing the characteristic images pairwise to form a large number of image pairs, wherein the image pairs comprise similar image pairs and dissimilar image pairs;
constructing a twin network model: selecting a sub-network type of the twin network, and determining parameter configuration of a twin network model;
taking the image pair as an input to train the twin network model, and outputting the similarity of the two characteristic vectors by the twin network model;
calculating the loss function L(x1, x2, y), whose formula is as follows:

L(x1, x2, y) = -(y·log p(x1, x2) + (1 - y)·log(1 - p(x1, x2))) + λ||w||^2

where x1 and x2 are the two feature images of an image pair; p(x1, x2) is the similarity output by the twin network model; y is the pair label; λ||w||^2 is the L2 weight-decay term, λ is the weight-decay coefficient, and w is the weights of the sub-network;
minimizing the loss function so that the error between the output and the target output keeps decreasing until the twin network model converges; training ends when the configured number of training rounds is reached.
Furthermore, the sub-network is a convolutional neural network, and the twin network model comprises an input layer, 4 convolutional layers, 3 pooling layers, 3 fully-connected layers and an output layer.
The input layer has 2 inputs. The 4 convolutional layers are the first, second, third and fourth convolutional layers, with 32, 64 and 128 convolution kernels respectively; the convolution kernels are 5 × 5 in size and the activation function is ReLU. The 3 pooling layers are the first, second and third pooling layers; all use max pooling with a 2 × 2 window. The 3 fully-connected layers are the first, second and third fully-connected layers, with 4096, 2048 and 1 neurons respectively.
The input layer, first convolutional layer, first pooling layer, second convolutional layer, second pooling layer, third convolutional layer, third pooling layer, fourth convolutional layer, the 3 fully-connected layers and the output layer are connected in sequence. The output of the fourth convolutional layer is fully connected to 4096 neurons with ReLU activation, then to 2048 neurons with ReLU activation, mapping each input feature image to a 2048-dimensional feature vector (h1 and h2 for the two branches). The absolute difference of h1 and h2 is the input to the third fully-connected layer, whose output is converted to a probability by a sigmoid function, i.e., normalized to [0, 1].
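As a hedged sketch, the described architecture might be expressed in PyTorch as below. The patent lists kernel counts of 32, 64 and 128 for four convolutional layers; using 128 kernels in the fourth layer, 'same'-style padding, and a 1-channel 200 × 100 input (from the embodiment) are our assumptions.

```python
import torch
import torch.nn as nn

class SubNetwork(nn.Module):
    """CNN sub-network shared by both branches of the twin network.

    Assumptions not fixed by the text: 128 kernels in the fourth conv layer,
    'same'-style padding, and a 1-channel 200 x 100 input from the embodiment.
    """
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 128, 5, padding=2), nn.ReLU(),
        )
        self.fc = nn.Sequential(          # first and second fully-connected layers
            nn.Flatten(),
            nn.Linear(128 * 25 * 12, 4096), nn.ReLU(),  # 200x100 pooled 3x -> 25x12
            nn.Linear(4096, 2048), nn.ReLU(),
        )

    def forward(self, x):
        return self.fc(self.features(x))  # 2048-dimensional embedding

class SiameseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.subnet = SubNetwork()      # weights shared between the two inputs
        self.head = nn.Linear(2048, 1)  # third fully-connected layer (1 neuron)

    def forward(self, x1, x2):
        h1, h2 = self.subnet(x1), self.subnet(x2)
        # absolute difference of the embeddings, squashed to a similarity in [0, 1]
        return torch.sigmoid(self.head(torch.abs(h1 - h2)))
```

A forward pass on a pair of 1 × 200 × 100 feature images yields one similarity value per pair.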
Preferably, in step S3, the samples to be tested are taken out from the malware test set, and the trained twin network model is used to calculate the similarity score between each sample to be tested and the malware training sample, where the process is as follows:
step 1, taking out a sample to be tested from a malware test set;
step 2, aiming at each sample to be tested, calculating the similarity mean value of the sample to be tested and all the malware training samples in each class in the malware training set by using the trained twin network model;
step 3, taking the maximum value in the similarity mean value as the similarity score of the sample to be detected;
and step 4, repeating steps 1 to 3 until all samples to be tested in the malware test set have been processed, obtaining a similarity score for each sample to be tested.
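Steps 1 to 4 above can be sketched as follows; `similarity` stands in for the trained twin network model's output, and all names and toy values are ours.

```python
# Sketch of the similarity-score computation (function names are ours).
# `similarity(a, b)` stands in for the trained twin network model's output.

def similarity_score(test_sample, families, similarity):
    """Mean similarity against each family's training samples, then the max mean."""
    means = [
        sum(similarity(test_sample, s) for s in samples) / len(samples)
        for samples in families.values()
    ]
    return max(means)

# Toy usage with scalar "samples" and a dummy similarity in [0, 1]:
sim = lambda a, b: 1.0 - abs(a - b)
families = {"famA": [0.1, 0.2], "famB": [0.8, 0.9]}
print(similarity_score(0.85, families, sim))  # highest mean similarity is with famB
```

Taking the maximum of the per-family means matches step 3: the score reflects the closest known family.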
Preferably, a threshold is calculated, and the sample to be tested is distinguished as a known malware family or a new malware family according to the threshold, specifically as follows:
taking out a plurality of verification samples from the malware verification set, and counting the similarity score of each verification sample and a malware training sample by using a trained twin network model;
generating a sequence of candidate values between the lowest and highest similarity scores, increasing by a fixed step (the tolerance), and computing the F1 score of the corresponding validation set with each candidate used as a temporary threshold;
selecting a temporary threshold with the highest F1 score as a final threshold;
distinguishing the class of the sample to be tested according to a threshold value, and marking the class of the sample to be tested as a new malware family when the class of the sample to be tested does not belong to the known malware family in the training set;
the discrimination formula used is as follows:

ND(X) = known family, if score > τ; new family, otherwise

where X is the sample to be detected and ND is the new-malware-family detector; score is the similarity score; τ is the chosen threshold; "known family" means X is assigned to a known malware family; "new family" means X is flagged as a new malware family; "otherwise" denotes score ≤ τ.
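The discrimination rule can be written as a one-line function; the string labels are illustrative.

```python
# Sketch of the discrimination rule; the returned labels are illustrative strings.

def detect_family(score, tau):
    """score > tau: assign to a known family; otherwise flag as a new family."""
    return "known family" if score > tau else "new family"

print(detect_family(0.9, 0.5))  # known family
print(detect_family(0.3, 0.5))  # new family
```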
The second purpose of the invention is realized by the following technical scheme: a storage medium stores a program that, when executed by a processor, implements the malware family detection method according to the first object of the present invention.
The third purpose of the invention is realized by the following technical scheme: a computing device comprising a processor and a memory for storing processor-executable programs, the processor, when executing the programs stored in the memory, implementing the malware family detection method of the first object of the present invention.
Compared with the prior art, the invention has the following advantages and effects:
(1) the invention relates to a malicious software family detection method, which comprises the steps of firstly, respectively extracting the characteristics of all malicious software training samples of each class in a malicious software training set to obtain a plurality of corresponding characteristic vectors; respectively converting the plurality of feature vectors into feature images, generating an image pair according to the feature images, constructing a twin network model and training the model by using the image pair; taking out samples to be tested from the malware test set, and counting the similarity score of each sample to be tested and a malware training sample by using the trained twin network model; and calculating a threshold value, and distinguishing the sample to be detected as a known malware family or a new malware family according to the threshold value. The detection method realizes the detection of the category of the malicious software through three steps of feature extraction, twin network design and novelty measurement, and has high detection accuracy and good classification effect; malware that does not belong to a known malware family in the training set is detected and labeled as a new malware family, and new threats to cyber-space security can be mitigated to some extent.
(2) The malware family detection method combines the twin network with feature images, achieving higher precision, recall, F1 score and accuracy, and lower false-positive and false-negative rates.
(3) In the malware family detection method, the features are not hand-crafted but are extracted automatically from the run-time behavior of the malware; no samples distributed differently from the training samples need to be added during training, and no extra models need to be trained to extract the features of new families. The process is simple and has high popularization value.
Drawings
FIG. 1 is a flow chart of the malware family detection method of the present invention.
Fig. 2 is a flow chart of a feature vector generation process.
Fig. 3 is a flowchart of the feature image generation process.
Fig. 4 is a structural diagram of a twin network model.
FIG. 5 is a flow chart of a novelty measure.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Example 1
The embodiment discloses a malware family detection method, as shown in fig. 1, including the following steps:
s1, feature extraction: respectively performing feature extraction on all malware training samples of each class in the malware training set to obtain a plurality of corresponding feature vectors, as shown in fig. 2, the process is as follows:
s11, preprocessing the malware training samples: performing behavior analysis on each malware training sample to generate a corresponding report file, extracting all keywords in the report file, removing duplicates, and saving the result as a text file. Each malware sample corresponds to one text file.
In this embodiment, a sandbox is used to preprocess a malware training sample, specifically, the malware training sample is submitted to the sandbox to be run, and the sandbox generates a text file containing a behavior analysis report for each malware. The sandbox may be a Cuckoo sandbox, which is a special system environment that records the behavior of programs running therein, such as API function calls, parameters passed, files created or deleted, websites and ports accessed, etc.
In this embodiment, the extracted keywords are unigrams, and the report file is a json report file. All unigrams are extracted and de-duplicated. For example, given the json fragment "api": "DeleteFileW", the extracted unigrams are "api": and "DeleteFileW",. The json report file is saved as a txt text file.
S12, traversing all text files in which keywords are stored, constructing a dictionary from the keywords in those files, counting the occurrences of each keyword, and deleting from the dictionary the keywords whose occurrence count equals the number of samples, i.e., deleting common keywords. For example, this embodiment deletes unigrams that carry no useful information, such as common json field names.
And S13, sorting the dictionary in descending order of keyword occurrence count and taking the N keywords with the highest counts as the new dictionary. In this embodiment N = 20000, so the dictionary stores the top-20000 keywords across all malware samples.
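Steps S12 and S13 can be sketched with `collections.Counter`; the toy samples and N are illustrative (the embodiment uses N = 20000).

```python
from collections import Counter

# Sketch of steps S12-S13: count keyword occurrences across all samples, drop
# keywords that appear in every sample, keep the N most frequent. The toy
# samples and N are illustrative; the embodiment uses N = 20000.

def build_dictionary(samples_keywords, n):
    counts = Counter()
    for kws in samples_keywords:
        counts.update(set(kws))  # keywords are de-duplicated per sample
    num_samples = len(samples_keywords)
    # delete keywords whose occurrence count equals the number of samples
    for kw in [k for k, c in counts.items() if c == num_samples]:
        del counts[kw]
    return [kw for kw, _ in counts.most_common(n)]

samples = [["api", "DeleteFileW", "connect"],
           ["api", "connect"],
           ["api", "CreateFileW"]]
print(build_dictionary(samples, 2))  # "api" occurs in all 3 samples, so it is dropped
```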
S14, initializing a vector with N dimensions, wherein the N dimensions of the vector respectively correspond to N different keywords, traversing all the text files stored with the keywords again, judging whether the keywords appear in the new dictionary or not,
if yes, setting the corresponding dimension of the vector as 1; if not, setting the corresponding dimension of the vector as 0;
and the traversed N-dimensional binary vector is taken as a feature vector.
S2, twin network design: respectively converting the plurality of feature vectors into feature images, generating an image pair according to the feature images, constructing a twin network model and training the model by using the image pair, wherein the process comprises the following steps:
s21, as shown in fig. 3, calculating the pixel value of each bit in the feature vector: a bit value of 0 is mapped to a pixel value of 0 and a bit value of 1 is mapped to a pixel value of 255.
The N-dimensional feature vector is converted into an X × Y pixel matrix, where N = X·Y, X is the number of rows of the pixel matrix, and Y is the number of columns. This embodiment specifically converts the 20000-dimensional feature vector into a 200 × 100 pixel matrix.
The pixel matrix is converted into a feature image.
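Step S21 plus the reshape above can be sketched as follows; the toy 2 × 3 matrix stands in for the embodiment's 20000-dimensional vector mapped to 200 × 100.

```python
import numpy as np

# Sketch of step S21 plus the reshape: bit 0 -> pixel 0, bit 1 -> pixel 255,
# then the N-dimensional vector becomes an X-by-Y matrix (N = X * Y). The toy
# 2 x 3 example stands in for the embodiment's 20000 -> 200 x 100 conversion.

def vector_to_image(bits, rows, cols):
    assert len(bits) == rows * cols
    return (np.asarray(bits, dtype=np.uint8) * 255).reshape(rows, cols)

img = vector_to_image([0, 1, 1, 0, 1, 0], 2, 3)
print(img)
```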
And S22, pairing the characteristic images in pairs to form a plurality of image pairs, wherein the image pairs comprise similar image pairs and dissimilar image pairs. Similar image pairs may also be referred to as positive sample pairs and dissimilar image pairs may also be referred to as negative sample pairs.
S23, constructing a twin network model: selecting the sub-network type of the twin network, and determining the parameter configuration of the twin network model.
In this embodiment, the twin network model has a structure as shown in fig. 4, the sub-network is a convolutional neural network CNN, and the twin network model includes an input layer, 4 convolutional layers, 3 pooling layers, 3 fully-connected layers, and an output layer.
Parameter configuration is shown in Table 1. The input layer has 2 inputs, x1 and x2. The 4 convolutional layers are the first, second, third and fourth convolutional layers, with 32, 64 and 128 convolution kernels respectively; the convolution kernels are 5 × 5 in size and the activation function is ReLU. The 3 pooling layers are the first, second and third pooling layers; all use max pooling with a 2 × 2 window. The first fully-connected layer has 4096 neurons, the second 2048, and the third 1.
The input layer, first convolutional layer, first pooling layer, second convolutional layer, second pooling layer, third convolutional layer, third pooling layer, fourth convolutional layer, the 3 fully-connected layers and the output layer are connected in sequence. The output of the fourth convolutional layer is fully connected to 4096 neurons with ReLU activation, then to 2048 neurons with ReLU activation, mapping each input feature image to a 2048-dimensional feature vector (h1 and h2 for the two branches). The absolute difference of h1 and h2 is the input to the third fully-connected layer, whose output is converted to a probability by a sigmoid function, i.e., normalized to [0, 1].
Table 1 (parameter configuration of the twin network model; the table is rendered as an image in the original document and is not reproduced here)
And S24, training the twin network model by taking the image pair as input, and outputting the similarity of the two feature vectors by the twin network model.
S25, calculating the loss function L(x1, x2, y), which is the binary cross-entropy between the prediction and the target plus a regularization term:

L(x1, x2, y) = -(y·log p(x1, x2) + (1 - y)·log(1 - p(x1, x2))) + λ||w||^2

where x1 and x2 are the two feature images of an image pair; p(x1, x2) is the similarity output by the twin network model; y is the pair label; λ||w||^2 is the L2 weight-decay term, λ is the weight-decay coefficient, and w is the weights of the sub-network;
The loss function is minimized so that the error between the output and the target output keeps decreasing until the twin network model converges; training ends when the configured number of training rounds is reached. The number of training rounds is set to 20 in this embodiment.
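The loss in S25 can be written out directly to mirror the formula; in a real training loop the λ||w||^2 term is usually delegated to the optimizer's weight_decay argument. The toy tensors below are illustrative.

```python
import torch

# Writing the S25 loss out explicitly to mirror the formula; in practice the
# λ||w||^2 term is usually delegated to the optimizer's weight_decay argument.
# The tensors below are illustrative toy values.

def pair_loss(p, y, weights, lam):
    """Binary cross-entropy over pair similarities plus an L2 weight-decay term."""
    bce = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))
    l2 = lam * sum((w ** 2).sum() for w in weights)
    return bce.mean() + l2

p = torch.tensor([0.9, 0.2])     # predicted similarities for two image pairs
y = torch.tensor([1.0, 0.0])     # 1 = similar pair, 0 = dissimilar pair
w = [torch.tensor([0.5, -0.5])]  # stand-in for the sub-network weights
print(pair_loss(p, y, w, lam=0.01))
```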
S3, novelty measure: as shown in fig. 5, samples to be tested are taken out from the malware test set, and the trained twin network model is used to count the similarity score between each sample to be tested and the malware training sample;
and calculating a threshold value, and distinguishing the sample to be detected as a known malware family or a new malware family according to the threshold value.
Wherein, the calculation process of the similarity score is as follows:
step 1, taking out a sample to be tested from a malware test set;
step 2, aiming at each sample to be tested, calculating the similarity mean value of the sample to be tested and all the malware training samples in each class in the malware training set by using the trained twin network model;
step 3, taking the maximum value in the similarity mean value as the similarity score of the sample to be detected;
and step 4, repeating steps 1 to 3 until all samples to be tested in the malware test set have been processed, obtaining a similarity score for each sample to be tested.
Calculating a threshold value, and distinguishing whether the sample to be detected is a known malware family or a new malware family according to the threshold value, wherein the method specifically comprises the following steps:
(1) taking out a plurality of verification samples from the malware verification set, and counting the similarity score of each verification sample and a malware training sample by using a trained twin network model;
(2) generating a sequence of candidate values between the lowest and highest similarity scores, increasing by a fixed step (the tolerance), and computing the F1 score of the corresponding validation set with each candidate used as a temporary threshold. The F1 score is computed with the usual F1 formula. The fixed step in this embodiment is 0.1.
(3) And selecting the temporary threshold with the highest F1 score as the final threshold.
(4) And distinguishing the classes of the samples to be detected according to a threshold value, and marking the classes of the samples to be detected as a new malware family when the classes of the samples to be detected do not belong to the known malware family in the training set.
The discrimination formula used is as follows:

ND(X) = known family, if score > τ; new family, otherwise

where X is the sample to be detected and ND (novelty detector) is the new-malware-family detector; score is the similarity score; τ is the chosen threshold; "known family" means X is assigned to a known malware family; "new family" means X is flagged as a new malware family; "otherwise" denotes score ≤ τ.
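The threshold search of steps (1)-(3) combined with this discrimination rule can be sketched as follows; treating "new family" (score ≤ τ) as the positive class for F1 is our assumption, as are the toy validation scores and labels.

```python
# Sketch of the threshold search in steps (1)-(3). F1 here treats "new family"
# (score <= tau) as the positive class, which the patent does not state
# explicitly; the validation scores and labels are toy values.

def f1(preds, truths):
    tp = sum(p and t for p, t in zip(preds, truths))
    fp = sum(p and not t for p, t in zip(preds, truths))
    fn = sum(not p and t for p, t in zip(preds, truths))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(scores, is_new, step=0.1):
    """Sweep temporary thresholds from min(scores) to max(scores) in fixed steps
    and return the one with the highest validation-set F1 score."""
    candidates = []
    tau = min(scores)
    while tau <= max(scores):
        preds = [s <= tau for s in scores]  # score <= tau -> flag as new family
        candidates.append((f1(preds, is_new), tau))
        tau += step
    return max(candidates)[1]

scores = [0.95, 0.9, 0.2, 0.3]       # similarity scores of validation samples
is_new = [False, False, True, True]  # ground truth: belongs to a new family?
print(best_threshold(scores, is_new))
```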
Example 2
This embodiment discloses a storage medium storing a program which, when executed by a processor, implements the malware family detection method of embodiment 1, specifically as follows:
S1, feature extraction: performing feature extraction on all the malware training samples of each class in the malware training set to obtain a plurality of corresponding feature vectors;
S2, twin network design: converting the plurality of feature vectors into feature images respectively, generating image pairs from the feature images, constructing a twin network model, and training the model with the image pairs;
S3, novelty measurement: taking samples to be tested from the malware test set, and computing with the trained twin network model the similarity score between each sample to be tested and the malware training samples;
calculating a threshold, and distinguishing according to the threshold whether the sample to be tested belongs to a known malware family or a new malware family.
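The S2 stage can be sketched as follows: each N-dimensional binary feature vector is mapped to an X×Y grayscale image (bit 0 → pixel 0, bit 1 → pixel 255, N = X·Y), and the twin network is trained with the binary cross-entropy loss plus L2 weight decay given in the description. This is a minimal sketch; the function names (`vector_to_image`, `siamese_loss`) are hypothetical, and the network itself is omitted.

```python
import numpy as np

def vector_to_image(vec, rows, cols):
    """Map each bit of the N-dimensional binary feature vector to a
    pixel (0 -> 0, 1 -> 255) and reshape to a rows-by-cols matrix."""
    vec = np.asarray(vec)
    assert vec.size == rows * cols, "N must equal X * Y"
    return (vec.reshape(rows, cols) * 255).astype(np.uint8)

def siamese_loss(p, y, weights, lam=1e-4):
    """Training objective from the description: binary cross-entropy on
    the predicted pair similarity p, plus L2 weight decay lam*||w||^2."""
    bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float(bce + lam * np.sum(np.asarray(weights) ** 2))

img = vector_to_image([1, 0, 0, 1], 2, 2)
print(img.tolist())  # [[255, 0], [0, 255]]
```

Converting the sparse keyword vector into an image lets the convolutional sub-networks of the twin model learn local co-occurrence patterns between behaviourally related keywords.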
The storage medium in this embodiment may be a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), a USB flash drive, a removable hard disk, or other media.
Example 3
This embodiment discloses a computing device comprising a processor and a memory storing a program executable by the processor; when the processor executes the program stored in the memory, the malware family detection method of embodiment 1 is implemented, specifically as follows:
S1, feature extraction: performing feature extraction on all the malware training samples of each class in the malware training set to obtain a plurality of corresponding feature vectors;
S2, twin network design: converting the plurality of feature vectors into feature images respectively, generating image pairs from the feature images, constructing a twin network model, and training the model with the image pairs;
S3, novelty measurement: taking samples to be tested from the malware test set, and computing with the trained twin network model the similarity score between each sample to be tested and the malware training samples;
calculating a threshold, and distinguishing according to the threshold whether the sample to be tested belongs to a known malware family or a new malware family.
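The S1 feature-extraction stage can be sketched as follows, assuming keywords have already been extracted from the sandbox behaviour reports. This is an illustrative sketch under simplifications: the name `build_feature_vectors` is hypothetical, and "occurrence count" is taken here as the number of samples a keyword appears in, so keywords present in every sample (no discriminative power) are dropped as the description specifies.

```python
from collections import Counter

def build_feature_vectors(sample_keywords, top_n):
    """Build a keyword dictionary across all samples, drop keywords that
    appear in every sample, keep the top_n most frequent, then encode
    each sample as a binary vector over that dictionary."""
    counts = Counter(k for kws in sample_keywords for k in set(kws))
    n_samples = len(sample_keywords)
    kept = [(k, c) for k, c in counts.items() if c < n_samples]
    # descending by frequency, keep the top_n keywords as the new dictionary
    vocab = [k for k, _ in sorted(kept, key=lambda kc: -kc[1])[:top_n]]
    vectors = [[1 if k in set(kws) else 0 for k in vocab]
               for kws in sample_keywords]
    return vocab, vectors

samples = [["open", "write", "net"], ["open", "net"], ["open", "reg"]]
vocab, vecs = build_feature_vectors(samples, top_n=2)
print(vocab, vecs)  # ['net', 'write'] [[1, 1], [1, 0], [0, 0]]
```

The resulting N-dimensional binary vectors are exactly the input expected by the image-conversion step of S2.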
The computing device described in this embodiment may be a desktop computer, a notebook computer, a smartphone, a PDA handheld terminal, a tablet computer, or another terminal device with processing capability.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the protection scope of the present invention.

Claims (10)

1. A malware family detection method, characterized in that the steps are as follows:
S1, feature extraction: performing feature extraction on all malware training samples of each class in a malware training set to obtain a plurality of corresponding feature vectors;
S2, twin network design: converting the plurality of feature vectors into feature images respectively, generating image pairs from the feature images, constructing a twin network model, and training the model with the image pairs;
S3, novelty measurement: taking samples to be tested from a malware test set, and computing with the trained twin network model the similarity score between each sample to be tested and the malware training samples;
calculating a threshold, and distinguishing according to the threshold whether the sample to be tested belongs to a known malware family or a new malware family.

2. The malware family detection method according to claim 1, wherein in step S1, feature extraction is performed on the malware training samples to obtain the corresponding feature vectors as follows:
preprocessing the malware training samples: performing behavior analysis on each malware training sample, generating a corresponding report file, extracting all keywords in the report file and removing duplicates, and saving the report file as a text file;
traversing all text files containing keywords, building a dictionary from the keywords in the text files, counting the number of occurrences of each keyword, and deleting from the dictionary the keywords whose occurrence count equals the number of samples;
sorting the dictionary in descending order of keyword occurrence count, and taking the N keywords with the highest counts as a new dictionary;
initializing an N-dimensional vector whose N dimensions correspond respectively to the N different keywords, traversing all text files containing keywords again, and judging whether each keyword appears in the new dictionary; if so, setting the corresponding dimension of the vector to 1; if not, setting it to 0;
the N-dimensional binary vector obtained after the traversal is the feature vector.

3. The malware family detection method according to claim 2, wherein a sandbox is used to preprocess the malware training samples, specifically by submitting the malware training samples to the sandbox to run, the sandbox generating for each malware sample a text file containing a behavior analysis report.

4. The malware family detection method according to claim 2, wherein the extracted keywords are unigrams and the report file is a json report file.

5. The malware family detection method according to claim 1, wherein in step S2, the plurality of feature vectors are converted into feature images respectively, image pairs are generated from the feature images, and a twin network model is constructed and trained with the image pairs, as follows:
calculating the pixel value of each bit of the feature vector: mapping bit value 0 to pixel value 0 and bit value 1 to pixel value 255;
converting the N-dimensional feature vector into an X×Y pixel matrix, where N = X·Y, X is the number of rows of the pixel matrix, and Y is the number of columns;
converting the pixel matrix into a feature image;
pairing the feature images two by two into a large number of image pairs, the image pairs comprising similar image pairs and dissimilar image pairs;
constructing the twin network model: selecting the sub-network type of the twin network, and determining the parameter configuration of the model;
training the twin network model with the image pairs as input, the model outputting the similarity of the two feature vectors;
calculating the loss function L(x1, x2, y) according to the following formula:
L(x1, x2, y) = -(y log p(x1, x2) + (1-y) log(1-p(x1, x2))) + λ||w||²;
wherein x1 and x2 are the two feature images of an image pair; p(x1, x2) is the similarity output by the twin network model; y is the label; λ||w||² is the L2 weight decay term; λ is the weight decay coefficient; w is the weight of the sub-network;
minimizing the loss so that the error between the output and the target output decreases until the twin network model converges; when the model reaches the set number of training rounds, training is complete.

6. The malware family detection method according to claim 5, wherein the sub-network is a convolutional neural network, and the twin network model comprises an input layer, 4 convolutional layers, 3 pooling layers, 3 fully connected layers, and an output layer;
the input layer has 2 input dimensions; the 4 convolutional layers are the first to fourth convolutional layers with 32, 64, 64, and 128 convolution kernels respectively, all kernels of size 5×5, with ReLU as the activation function; the 3 pooling layers are the first to third pooling layers, all using max pooling with a window size of 2×2; the 3 fully connected layers are the first to third fully connected layers with 4096, 2048, and 1 neurons respectively;
the input layer, the first convolutional layer, the first pooling layer, the second convolutional layer, the second pooling layer, the third convolutional layer, the third pooling layer, the fourth convolutional layer, the 3 fully connected layers, and the output layer are connected in sequence, wherein the fourth convolutional layer is fully connected to the 4096 neurons with ReLU activation, then fully connected to the 2048 neurons with ReLU activation; the input feature images are mapped into two 2048-dimensional feature vectors h1 and h2, the absolute difference of h1 and h2 is taken as the input of the third fully connected layer, and the sigmoid function in the third fully connected layer converts the output into a probability, i.e., normalizes the output to [0, 1].

7. The malware family detection method according to claim 1, wherein in step S3, samples to be tested are taken from the malware test set, and the trained twin network model is used to compute the similarity score between each sample to be tested and the malware training samples, as follows:
step 1, taking one sample to be tested from the malware test set;
step 2, for each sample to be tested, calculating with the trained twin network model the mean similarity between the sample to be tested and all malware training samples of each class in the malware training set;
step 3, taking the maximum of the mean similarities as the similarity score of the sample to be tested;
step 4, repeating steps 1-3 until all samples to be tested in the malware test set have been processed, each sample to be tested obtaining a corresponding similarity score.

8. The malware family detection method according to claim 1, wherein the threshold is calculated, and the sample to be tested is distinguished according to the threshold as belonging to a known malware family or a new malware family, as follows:
taking a plurality of verification samples from a malware verification set, and computing with the trained twin network model the similarity score between each verification sample and the malware training samples;
between the lowest and highest similarity scores, starting from the lowest score and incrementing by a fixed step, generating a plurality of candidate scores; using these scores as temporary thresholds, calculating the corresponding F1 score of the verification set for each;
selecting the temporary threshold with the highest F1 score as the final threshold;
distinguishing the class of the sample to be tested according to the threshold, and marking the sample as a new malware family when its class does not belong to a known malware family in the training set;
the discrimination formula used is as follows:

ND(X) = known family, if score > τ; new family, otherwise;

wherein X is the sample to be tested; ND is the new-malware-family detector; score is the similarity score; τ is the selected threshold; known family denotes a known malware family; new family denotes a new malware family; otherwise denotes score ≤ τ.

9. A storage medium storing a program, characterized in that when the program is executed by a processor, the malware family detection method according to any one of claims 1 to 8 is implemented.

10. A computing device comprising a processor and a memory for storing a program executable by the processor, characterized in that when the processor executes the program stored in the memory, the malware family detection method according to any one of claims 1 to 8 is implemented.
CN201911202586.9A 2019-11-29 2019-11-29 Malicious software family detection method, storage medium and computing device Active CN111027069B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911202586.9A CN111027069B (en) 2019-11-29 2019-11-29 Malicious software family detection method, storage medium and computing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911202586.9A CN111027069B (en) 2019-11-29 2019-11-29 Malicious software family detection method, storage medium and computing device

Publications (2)

Publication Number Publication Date
CN111027069A true CN111027069A (en) 2020-04-17
CN111027069B CN111027069B (en) 2022-04-08

Family

ID=70203636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911202586.9A Active CN111027069B (en) 2019-11-29 2019-11-29 Malicious software family detection method, storage medium and computing device

Country Status (1)

Country Link
CN (1) CN111027069B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783093A (en) * 2020-06-28 2020-10-16 南京航空航天大学 A Soft Dependency-Based Malware Classification and Detection Method
CN111984780A (en) * 2020-09-11 2020-11-24 深圳市北科瑞声科技股份有限公司 Multi-intention recognition model training method, multi-intention recognition method and related device
CN112001424A (en) * 2020-07-29 2020-11-27 暨南大学 Malicious software open set family classification method and device based on countermeasure training
CN112000954A (en) * 2020-08-25 2020-11-27 莫毓昌 A Malware Detection Method Based on Feature Sequence Mining and Reduction
CN112329786A (en) * 2020-12-02 2021-02-05 深圳大学 Method, device and equipment for detecting copied image and storage medium
CN112347479A (en) * 2020-10-21 2021-02-09 北京天融信网络安全技术有限公司 False alarm correction method, device, equipment and storage medium for malicious software detection
CN112764791A (en) * 2021-01-25 2021-05-07 济南大学 Incremental updating malicious software detection method and system
WO2021151343A1 (en) * 2020-09-09 2021-08-05 平安科技(深圳)有限公司 Test sample category determination method and apparatus for siamese network, and terminal device
CN113392399A (en) * 2021-06-23 2021-09-14 绿盟科技集团股份有限公司 Malicious software classification method, device, equipment and medium
CN113886821A (en) * 2021-09-01 2022-01-04 浙江大学 Malicious process identification method and device based on twin network, electronic equipment and storage medium
CN114139153A (en) * 2021-11-02 2022-03-04 武汉大学 A Malware Interpretability Classification Method Based on Graph Representation Learning
CN114266342A (en) * 2021-12-21 2022-04-01 中国科学院信息工程研究所 A method and system for detecting insider threats based on twin network
CN114462040A (en) * 2022-01-30 2022-05-10 全球能源互联网研究院有限公司 Malicious software detection model training method, malicious software detection method and malicious software detection device
CN114596454A (en) * 2022-01-25 2022-06-07 北京理工大学 A feature matching localization method and system based on Siamese convolutional neural network
CN114611102A (en) * 2022-02-23 2022-06-10 西安电子科技大学 Visual malicious software detection and classification method and system, storage medium and terminal
WO2022171067A1 (en) * 2021-02-09 2022-08-18 北京有竹居网络技术有限公司 Video processing method and apparatus, and storage medium and device
CN116010950A (en) * 2022-12-22 2023-04-25 广东工业大学 Malicious software detection method and system based on ViT twin neural network
CN116561630A (en) * 2023-05-11 2023-08-08 李亚康 Model training method, software detection method and device
CN118555149A (en) * 2024-07-30 2024-08-27 大数据安全工程研究中心(贵州)有限公司 Abnormal behavior safety analysis method based on artificial intelligence

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803039A (en) * 2016-12-30 2017-06-06 北京神州绿盟信息安全科技股份有限公司 The homologous decision method and device of a kind of malicious file
CN108256325A (en) * 2016-12-29 2018-07-06 中移(苏州)软件技术有限公司 A kind of method and apparatus of the detection of malicious code mutation
CN109145605A (en) * 2018-08-23 2019-01-04 北京理工大学 A kind of Android malware family clustering method based on SinglePass algorithm
CN109670304A (en) * 2017-10-13 2019-04-23 北京安天网络安全技术有限公司 Recognition methods, device and the electronic equipment of malicious code family attribute
US10491627B1 (en) * 2016-09-29 2019-11-26 Fireeye, Inc. Advanced malware detection using similarity analysis

Non-Patent Citations (2)

Title
CORDONSKY I et al.: "DeepOrigin: End-to-End Deep Learning for Detection of New Malware Families", 2018 International Joint Conference on Neural Networks (IJCNN) *
SHEN Yan et al.: "Classifier Based on Improved Deep Siamese Network and Its Application", Computer Engineering and Applications *

Cited By (25)

Publication number Priority date Publication date Assignee Title
CN111783093A (en) * 2020-06-28 2020-10-16 南京航空航天大学 A Soft Dependency-Based Malware Classification and Detection Method
CN112001424B (en) * 2020-07-29 2023-05-23 暨南大学 Malicious software open set family classification method and device based on countermeasure training
CN112001424A (en) * 2020-07-29 2020-11-27 暨南大学 Malicious software open set family classification method and device based on countermeasure training
CN112000954A (en) * 2020-08-25 2020-11-27 莫毓昌 A Malware Detection Method Based on Feature Sequence Mining and Reduction
CN112000954B (en) * 2020-08-25 2024-01-30 华侨大学 Malicious software detection method based on feature sequence mining and simplification
WO2021151343A1 (en) * 2020-09-09 2021-08-05 平安科技(深圳)有限公司 Test sample category determination method and apparatus for siamese network, and terminal device
CN111984780A (en) * 2020-09-11 2020-11-24 深圳市北科瑞声科技股份有限公司 Multi-intention recognition model training method, multi-intention recognition method and related device
CN112347479A (en) * 2020-10-21 2021-02-09 北京天融信网络安全技术有限公司 False alarm correction method, device, equipment and storage medium for malicious software detection
CN112347479B (en) * 2020-10-21 2021-08-24 北京天融信网络安全技术有限公司 False alarm correction method, device, equipment and storage medium for malicious software detection
CN112329786A (en) * 2020-12-02 2021-02-05 深圳大学 Method, device and equipment for detecting copied image and storage medium
CN112329786B (en) * 2020-12-02 2023-06-16 深圳大学 A duplicate image detection method, device, equipment and storage medium
CN112764791A (en) * 2021-01-25 2021-05-07 济南大学 Incremental updating malicious software detection method and system
CN112764791B (en) * 2021-01-25 2023-08-08 济南大学 Incremental update malicious software detection method and system
WO2022171067A1 (en) * 2021-02-09 2022-08-18 北京有竹居网络技术有限公司 Video processing method and apparatus, and storage medium and device
CN113392399A (en) * 2021-06-23 2021-09-14 绿盟科技集团股份有限公司 Malicious software classification method, device, equipment and medium
CN113886821A (en) * 2021-09-01 2022-01-04 浙江大学 Malicious process identification method and device based on twin network, electronic equipment and storage medium
CN114139153A (en) * 2021-11-02 2022-03-04 武汉大学 A Malware Interpretability Classification Method Based on Graph Representation Learning
CN114266342A (en) * 2021-12-21 2022-04-01 中国科学院信息工程研究所 A method and system for detecting insider threats based on twin network
CN114596454A (en) * 2022-01-25 2022-06-07 北京理工大学 A feature matching localization method and system based on Siamese convolutional neural network
CN114462040A (en) * 2022-01-30 2022-05-10 全球能源互联网研究院有限公司 Malicious software detection model training method, malicious software detection method and malicious software detection device
CN114611102A (en) * 2022-02-23 2022-06-10 西安电子科技大学 Visual malicious software detection and classification method and system, storage medium and terminal
CN116010950A (en) * 2022-12-22 2023-04-25 广东工业大学 Malicious software detection method and system based on ViT twin neural network
CN116010950B (en) * 2022-12-22 2025-07-11 广东工业大学 A malware detection method and system based on ViT twin neural network
CN116561630A (en) * 2023-05-11 2023-08-08 李亚康 Model training method, software detection method and device
CN118555149A (en) * 2024-07-30 2024-08-27 大数据安全工程研究中心(贵州)有限公司 Abnormal behavior safety analysis method based on artificial intelligence

Also Published As

Publication number Publication date
CN111027069B (en) 2022-04-08

Similar Documents

Publication Publication Date Title
CN111027069B (en) Malicious software family detection method, storage medium and computing device
CN112329016B (en) A visual malware detection device and method based on deep neural network
CN108737406B (en) Method and system for detecting abnormal flow data
TWI673625B (en) Uniform resource locator (URL) attack detection method, device and electronic device
CN110704840A (en) Convolutional neural network CNN-based malicious software detection method
CN109302410B (en) Method and system for detecting abnormal behavior of internal user and computer storage medium
Lu et al. An efficient combined deep neural network based malware detection framework in 5G environment
RU2708356C1 (en) System and method for two-stage classification of files
CN111915437A (en) RNN-based anti-money laundering model training method, device, equipment and medium
CN104715194B (en) Malware detection method and apparatus
Kakisim et al. Sequential opcode embedding-based malware detection method
CN112437053B (en) Intrusion detection method and device
CN113221112B (en) Malicious behavior identification method, system and medium based on weak correlation integration strategy
Widiono et al. Phishing website detection using bidirectional gated recurrent unit model and feature selection
CN118400152A (en) Network intrusion detection method
CN107944273A (en) A kind of malice PDF document detection method based on TF IDF algorithms and SVDD algorithms
Alazab et al. Detecting malicious behaviour using supervised learning algorithms of the function calls
CN117633811A (en) A code vulnerability detection method based on multi-view feature fusion
CN111400713B (en) Malicious software population classification method based on operation code adjacency graph characteristics
CN113901514A (en) Defense method for data attack and training method and system for model thereof
CN119030787B (en) Security protection method, device and storage medium based on network threat intelligence analysis
CN112632541B (en) Method, device, computer equipment and storage medium for determining malicious degree of behavior
CN111783088B (en) Malicious code family clustering method and device and computer equipment
CN113259369A (en) Data set authentication method and system based on machine learning member inference attack
Waghmare et al. A review on malware detection methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载