
CN104077527A - Method and device for generating virus detection machine and method and device for virus detection - Google Patents


Info

Publication number
CN104077527A
Authority
CN
China
Prior art keywords
virus
sample
infected
detection
virus sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410281468.2A
Other languages
Chinese (zh)
Other versions
CN104077527B (en)
Inventor
薛小昊
姚辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Seal Interest Technology Co Ltd
Original Assignee
Zhuhai Juntian Electronic Technology Co Ltd
Application filed by Zhuhai Juntian Electronic Technology Co Ltd
Priority to CN201410281468.2A
Publication of CN104077527A
Application granted
Publication of CN104077527B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/561Virus type analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method and device for generating a virus detection machine and a method and device for virus detection. The method for generating the virus detection machine comprises the steps of: obtaining a plurality of normal files and infecting the normal files with a virus sample to generate a plurality of infected files; obtaining behavior characteristics of the virus sample while it runs, and obtaining a classification result of the virus sample according to the behavior characteristics, the normal files and the infected files; obtaining virus type characteristics corresponding to the virus sample according to the classification result; and generating the virus detection machine according to the virus type characteristics. The method improves virus detection accuracy and reduces the complexity of virus detection work.

Description

Generation method and device of virus detection machine and virus detection method and device
Technical Field
The present invention relates to the field of network security technologies, and in particular, to a method and an apparatus for generating a virus detection machine, and a method and an apparatus for detecting a virus.
Background
With the development of computer technology, the variety of computer viruses keeps increasing. Infectious viruses share some general characteristics; for example, the section attribute of an infected file is writable. Therefore, when detecting infectious viruses, these general characteristics can be used to determine whether a target file is infected with an infectious virus.
Specifically, a large number of infectious virus samples must first be analyzed to extract their common characteristics, and a tester then formulates a detection rule for computer viruses from these common characteristics based on experience. Next, the target file is analyzed to extract its features. Finally, the features of the target file are checked against the established detection rule to determine whether the target file is infected.
However, the prior art has the following problems: analyzing a large number of infectious virus samples requires a large amount of work, and a detection rule formulated from a tester's experience has low accuracy and a large detection error.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the above-mentioned problems in the prior art. Therefore, an object of the present invention is to provide a method and an apparatus for generating a virus detection machine, and a method and an apparatus for detecting a virus, which have high detection accuracy and small detection workload.
The first aspect of the embodiments of the present invention provides a method for generating a virus detection machine, including: acquiring a plurality of normal files, and infecting the normal files through a virus sample to generate a plurality of infected files; acquiring behavior characteristics of the virus sample during operation, and acquiring a classification result of the virus sample according to the behavior characteristics, the plurality of normal files and the plurality of infected files; acquiring virus type characteristics corresponding to the virus sample according to the classification result; and generating a virus detection machine according to the virus type characteristics.
In the embodiment of the invention, the behavior characteristics of the virus sample during operation are acquired, the virus sample is classified according to the behavior characteristics, and the virus type characteristics corresponding to the virus sample are extracted according to the classification result to generate the final virus detection machine. This avoids formulating virus detection rules from experience, and the virus detection machine replaces manual work in virus detection. Therefore, the accuracy of virus detection is improved, and the complexity of virus detection work is reduced.
In an embodiment of the present invention, the obtaining the classification result of the virus sample according to the behavior feature, the normal files, and the infected files specifically includes: comparing the plurality of normal files with the plurality of infected files respectively, and acquiring a comparison result; acquiring the behavior characteristics of the virus sample during operation according to the comparison result; and classifying the virus sample according to the behavior characteristics of the virus sample to obtain a classification result of the virus sample.
In an embodiment of the present invention, the classifying the virus sample according to the behavior characteristics of the virus sample to obtain the classification result of the virus sample specifically includes: classifying the virus sample as an infectious virus sample or a non-infectious virus sample according to the behavior characteristics of the virus sample, wherein if the number of import functions of the non-infectious virus sample is less than a preset import function number threshold, the non-infectious virus sample is classified as a non-infectious packed virus sample; if the number of import functions of the non-infectious virus sample is greater than or equal to the preset import function number threshold, the non-infectious virus sample is classified as a non-infectious non-packed virus sample; if the entry point of the infected file corresponding to the infectious virus sample differs from the entry point of the normal file corresponding to that infected file, the infectious virus sample is classified as an entry-point-modifying infectious virus sample; and if the entry point of the infected file is the same as the entry point of the corresponding normal file, the infectious virus sample is classified as an infectious virus sample that does not modify the entry point.
In an embodiment of the present invention, the obtaining the virus type characteristics corresponding to the virus sample according to the classification result specifically includes: extracting the corresponding virus type characteristics from a preset infection type characteristic set according to the classification result.
In one embodiment of the present invention, the generating a virus detection machine according to the virus type characteristic includes: loading the virus type features using a machine learning machine to generate an initial virus detection machine; detecting a plurality of infected sample files by using the initial virus detector, and calculating the number of correct detection results, wherein if the number of the correct detection results is greater than a detection rate threshold value, the initial virus detector is used as a final virus detector; and if the number of correct detection results is less than or equal to the detection rate threshold, adjusting the initial virus detector to generate a final virus detector.
In an embodiment of the present invention, the adjusting the initial virus detector to generate the final virus detector further includes: increasing the number of the virus type characteristics loaded by the initial virus detection machine until the number of the correct detection results is greater than or equal to the detection rate threshold.
Preferably, in one embodiment of the invention, the machine learning machine is a support vector machine, a neural network or a carbama-cark spectral algorithm.
A second aspect of the embodiments of the present invention provides a generating apparatus of a virus detector, including: the infected file generation module is used for acquiring a plurality of normal files and infecting the normal files through virus samples to generate a plurality of infected files; the classification result acquisition module is used for acquiring the behavior characteristics of the virus sample during operation and acquiring the classification result of the virus sample according to the behavior characteristics, the normal files and the infected files; the virus type characteristic module is used for acquiring the virus type characteristics corresponding to the virus sample according to the classification result; and the virus detection machine generation module is used for generating a virus detection machine according to the virus type characteristics.
In the embodiment of the invention, the behavior characteristics of the virus sample during operation are acquired, the virus sample is classified according to the behavior characteristics, and the virus type characteristics corresponding to the virus sample are extracted according to the classification result to generate the final virus detection machine. This avoids formulating virus detection rules from experience, and the virus detection machine replaces manual work in virus detection. Therefore, the accuracy of virus detection is improved, and the complexity of virus detection work is reduced.
In a specific embodiment of the present invention, the classification result obtaining module specifically includes: the file comparison submodule is used for comparing the plurality of normal files with the plurality of infected files respectively and acquiring a comparison result; a behavior characteristic obtaining submodule, configured to obtain the behavior characteristic of the virus sample during operation according to the comparison result; and the classification result acquisition submodule is used for classifying the virus sample according to the behavior characteristics of the virus sample so as to acquire the classification result of the virus sample.
In a specific embodiment of the present invention, the classification result obtaining sub-module specifically includes: a primary classification submodule for classifying the virus sample as an infectious virus sample or a non-infectious virus sample according to the behavior characteristics of the virus sample; and a secondary classification submodule, configured to perform secondary classification on the infectious virus sample and the non-infectious virus sample according to the behavior characteristics of the virus sample, where the secondary classification specifically includes: if the number of import functions of the non-infectious virus sample is less than a preset import function number threshold, classifying the non-infectious virus sample as a non-infectious packed virus sample; if the number of import functions of the non-infectious virus sample is greater than or equal to the preset import function number threshold, classifying the non-infectious virus sample as a non-infectious non-packed virus sample; if the entry point of the infected file corresponding to the infectious virus sample differs from the entry point of the normal file corresponding to that infected file, classifying the infectious virus sample as an entry-point-modifying infectious virus sample; and if the entry point of the infected file is the same as the entry point of the corresponding normal file, classifying the infectious virus sample as an infectious virus sample that does not modify the entry point.
In an embodiment of the present invention, the virus type feature module specifically includes: the infection type feature set presetting submodule is used for generating a preset infection type feature set according to a plurality of infection type features; and the virus type feature extraction submodule is used for extracting the corresponding virus type features from the preset infection type feature set according to the classification result.
In an embodiment of the present invention, the virus detection machine generation module specifically includes: a machine learning machine for loading the virus type features to generate an initial virus detection machine; the detection result counting submodule is used for detecting a plurality of infected sample files by using the initial virus detector and calculating the number of correct detection results; and the detector correcting submodule is used for adjusting the initial virus detector according to the correct number of the detection results and the detection rate threshold value so as to generate a final virus detector.
In a preferred embodiment of the present invention, the virus detection machine generation module is specifically configured to increase the number of the virus type features loaded by the initial virus detection machine until the number of correct detection results is greater than or equal to the detection rate threshold.
Preferably, in one embodiment of the invention, the machine learning machine is a support vector machine, a neural network or a carbama-cark spectral algorithm.
A third aspect of the embodiments of the present invention provides a virus detection method, including: acquiring a plurality of normal files, and infecting the normal files through a virus sample to generate a plurality of infected files; acquiring behavior characteristics of the virus sample during operation, and acquiring a classification result of the virus sample according to the behavior characteristics, the plurality of normal files and the plurality of infected files; acquiring virus type characteristics corresponding to the virus sample according to the classification result; and carrying out virus detection on the target file according to the virus type characteristics.
A fourth aspect of the embodiments of the present invention provides a virus detection apparatus, including: the infected file generation module is used for acquiring a plurality of normal files and infecting the normal files through virus samples to generate a plurality of infected files; the classification result acquisition module is used for acquiring the behavior characteristics of the virus sample during operation and acquiring the classification result of the virus sample according to the behavior characteristics, the normal files and the infected files; the virus type characteristic module is used for acquiring the virus type characteristics corresponding to the virus sample according to the classification result; and the virus detection machine generation module is used for carrying out virus detection according to the virus type characteristics.
Drawings
FIG. 1 is a flow chart of a method of generating a virus detection machine according to an embodiment of the invention;
FIG. 2 is a flow chart of virus sample classification for a method of generating a virus detection machine according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of virus sample classification for a method of generating a virus detection machine according to an embodiment of the invention;
FIG. 4 is a schematic structural diagram of a generating device of a virus detection machine according to an embodiment of the invention;
FIG. 5 is a schematic flow chart of extracting virus type features by a virus detection machine according to an embodiment of the present invention; and
FIG. 6 is a flow chart of a virus detection method according to an embodiment of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
In the description of the present application, "a plurality" means two or more unless specifically limited otherwise. Further, the specific meanings of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
The following describes a generation method and a generation device of a virus detection machine, a virus detection method and a virus detection device according to an embodiment of the present application with reference to the drawings.
Fig. 1 is a flowchart of a method for generating a virus detector according to an embodiment of the present invention.
As shown in fig. 1, in an embodiment of the present invention, a method for generating a virus detection machine includes:
S101, acquiring a plurality of normal files, and infecting the normal files through virus samples to generate a plurality of infected files. In one embodiment of the invention, the plurality of normal files are passed through
S102, acquiring behavior characteristics of the virus sample during operation, and acquiring a classification result of the virus sample according to the behavior characteristics, the plurality of normal files and the plurality of infected files. Specifically, the behavior characteristics of the virus sample during running include: modifying the readable, writable and executable attributes of the section containing the entry point of the Portable Executable file (PE file for short); changing the section containing the entry point of the PE file to a resource section; changing the section containing the entry point of the PE file to the last section; obfuscating the code at the entry point of the PE file; adding a jump to the entry code of the PE file; inserting virus code into the gaps between the sections of the PE file; adding extra sections to the PE file; modifying the section attributes of the PE file; and modifying the readable, writable and executable attributes of the resource section or the data section of the PE file.
Fig. 2 is a flowchart of virus sample classification of a method of generating a virus detection machine according to an embodiment of the present invention. Specifically, as shown in fig. 2, step S102 includes:
S1021, comparing the plurality of normal files with the plurality of infected files respectively, and obtaining the comparison result.
S1022, the behavior characteristic of the virus sample in operation is obtained according to the comparison result.
S1023, classifying the virus sample according to the behavior characteristics of the virus sample to obtain a classification result of the virus sample.
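Steps S1021 and S1022 can be sketched as follows. This is only a minimal illustration, assuming each PE file has already been parsed into a small summary record; the `PEInfo` and `Section` types, their fields, and the feature names are hypothetical and do not come from the patent text, and only a few of the listed behavior characteristics are covered.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Section:
    name: str
    writable: bool
    executable: bool


@dataclass(frozen=True)
class PEInfo:
    """Hypothetical summary of an already-parsed PE file (not a parser)."""
    entry_point: int        # address of the entry point
    entry_section: str      # name of the section containing the entry point
    sections: List[Section]


def behavior_features(normal: PEInfo, infected: PEInfo) -> Set[str]:
    """Compare a normal file with its infected copy (S1021) and derive
    behavior characteristics of the virus sample (S1022)."""
    feats: Set[str] = set()
    if infected.entry_point != normal.entry_point:
        feats.add("entry_point_modified")
    if len(infected.sections) > len(normal.sections):
        feats.add("section_added")
    # The entry point now lies in the last section although it did not before.
    if (infected.entry_section == infected.sections[-1].name
            and normal.entry_section != normal.sections[-1].name):
        feats.add("entry_in_last_section")
    # Any pre-existing section whose writable/executable flags changed.
    before = {s.name: s for s in normal.sections}
    for s in infected.sections:
        old = before.get(s.name)
        if old and (old.writable, old.executable) != (s.writable, s.executable):
            feats.add("section_attributes_modified")
    return feats


normal = PEInfo(0x1000, ".text",
                [Section(".text", False, True), Section(".data", True, False)])
infected = PEInfo(0x5000, ".virus",
                  [Section(".text", True, True), Section(".data", True, False),
                   Section(".virus", True, True)])
features = behavior_features(normal, infected)
```

In practice the summary records would come from a real PE parser, and one feature set would be collected per normal/infected file pair.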
Fig. 3 is a schematic diagram of virus sample b classification of a generation method of a virus detection machine according to an embodiment of the present invention.
As shown in fig. 3, step S1023 specifically includes: classifying the virus sample b as a non-infectious virus sample b1 or an infectious virus sample b2 according to the behavior characteristic e of the virus sample b, wherein if the number m of import functions of the non-infectious virus sample b1 is less than a preset import function number threshold, the non-infectious virus sample b1 is classified as a non-infectious packed virus sample b101; if the number m of import functions of the non-infectious virus sample b1 is greater than or equal to the preset import function number threshold, the non-infectious virus sample b1 is classified as a non-infectious non-packed virus sample b102; if the entry point n of the infected file c corresponding to the infectious virus sample b2 differs from the entry point n of the normal file a corresponding to the infected file c, the infectious virus sample b2 is classified as an entry-point-modifying infectious virus sample b201; and if the entry point n of the infected file c is the same as the entry point n of the corresponding normal file a, the infectious virus sample b2 is classified as an infectious virus sample b202 that does not modify the entry point.
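The two-level decision of fig. 3 can be written as a short function. The sketch below is illustrative only: the enum names, the function signature and the default threshold of 10 are assumptions, since the patent only says the threshold is "preset". The enum values reuse the reference labels b101, b102, b201 and b202.

```python
from enum import Enum


class VirusClass(Enum):
    NON_INFECTIOUS_PACKED = "b101"
    NON_INFECTIOUS_NON_PACKED = "b102"
    INFECTIOUS_ENTRY_MODIFIED = "b201"
    INFECTIOUS_ENTRY_UNMODIFIED = "b202"


def classify_sample(is_infectious: bool,
                    import_count: int,
                    normal_entry: int,
                    infected_entry: int,
                    import_threshold: int = 10) -> VirusClass:
    """Primary split (infectious or not), then the secondary split.

    import_threshold is an assumed value, not taken from the patent.
    """
    if not is_infectious:
        # Few import functions suggests a packed (shelled) sample.
        if import_count < import_threshold:
            return VirusClass.NON_INFECTIOUS_PACKED
        return VirusClass.NON_INFECTIOUS_NON_PACKED
    # Infectious: compare the entry point n of file c with that of file a.
    if infected_entry != normal_entry:
        return VirusClass.INFECTIOUS_ENTRY_MODIFIED
    return VirusClass.INFECTIOUS_ENTRY_UNMODIFIED
```

For example, `classify_sample(True, 0, 0x1000, 0x5000)` yields the b201 class, because the entry point changed after infection.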
S103, acquiring virus type characteristics corresponding to the virus samples according to the classification result. In an embodiment of the present invention, the virus type feature is extracted from a predetermined infection type feature set according to the classification result.
As shown in fig. 3, in an embodiment of the present invention, the manner of obtaining the feature code of a virus sample according to its virus type specifically includes:
(1) Obtaining the feature code of a non-infectious non-packed virus sample. The compiler-generated character string that follows the entry point of the non-infectious non-packed virus sample is skipped, one or more character strings are extracted, and their position information is recorded; the character strings and their position information are used as the feature code of the non-infectious non-packed virus sample.
(2) Obtaining the feature code of a non-infectious packed virus sample. One or more character strings of the non-infectious packed virus sample are extracted from a set extraction position, a hash is calculated over the character strings, and the hashed character strings are used as the feature code of the non-infectious packed virus sample.
(3) Obtaining the feature code of an entry-point-modifying infectious virus sample. After a plurality of infected files infected by the entry-point-modifying infectious virus sample are acquired, the character strings following the entry point of each infected file are compared, the similarity of these character strings is calculated using a similarity algorithm, and the identical parts of the character strings whose similarity is greater than a preset threshold are extracted as a common character string. The differing parts of those character strings are replaced with wildcards, and the common character string together with the wildcards is used as the feature code of the entry-point-modifying infectious virus sample.
(4) Obtaining the feature code of an infectious virus sample that does not modify the entry point. Each normal file is compared with the corresponding infected file infected by the infectious virus sample that does not modify the entry point, the similarity of the character strings added to the infected files relative to the normal files is calculated using a similarity algorithm, and the identical parts of the character strings whose similarity is greater than a preset threshold are extracted as a common character string. The differing parts of those character strings are replaced with wildcards, and the common character string together with the wildcards is used as the feature code of the infectious virus sample that does not modify the entry point.
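Items (2) to (4) can be illustrated with a small sketch. It rests on several assumptions that are not in the patent: the strings are equal-length byte strings aligned at the same offset, similarity is the fraction of matching positions (the patent does not name the similarity algorithm), the threshold is 0.6, and SHA-256 stands in for the unspecified hash. All function names are hypothetical.

```python
import hashlib
from typing import List, Optional, Sequence

Signature = List[Optional[int]]  # None marks a wildcard byte


def similarity(a: bytes, b: bytes) -> float:
    """Fraction of positions holding the same byte (assumed measure)."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_signature(strings: Sequence[bytes],
                    threshold: float = 0.6) -> Optional[Signature]:
    """Keep bytes common to all sufficiently similar strings and
    replace the differing positions with wildcards (items 3 and 4)."""
    first = strings[0]
    similar = [s for s in strings if similarity(first, s) > threshold]
    if len(similar) < 2:
        return None  # nothing similar enough to generalize from
    return [first[i] if all(s[i] == first[i] for s in similar) else None
            for i in range(len(first))]


def matches(signature: Signature, data: bytes, offset: int = 0) -> bool:
    """Check a wildcard signature against data at the given offset."""
    window = data[offset:offset + len(signature)]
    if len(window) < len(signature):
        return False
    return all(want is None or want == got
               for want, got in zip(signature, window))


def hashed_feature(chunk: bytes) -> str:
    """Item (2): hash of an extracted string (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk).hexdigest()


# Two infected files whose entry code differs only in one operand byte.
a = b"\xe8\x10\x00\x00\x00\x55\x8b\xec"
b = b"\xe8\x24\x00\x00\x00\x55\x8b\xec"
sig = build_signature([a, b])
```

Here the second byte differs between the two samples, so it becomes a wildcard while the remaining seven bytes form the common string.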
S104, generating a virus detection machine according to the virus type characteristics.
Specifically, in the embodiment of the present invention, step S104 includes: loading the virus type features using a machine learning machine to generate an initial virus detection machine; detecting a plurality of infected sample files by using the initial virus detector, and calculating the number of correct detection results, wherein if the number of the correct detection results is greater than a detection rate threshold value, the initial virus detector is used as a final virus detector; and if the number of correct detection results is less than or equal to the detection rate threshold, adjusting the initial virus detector to generate a final virus detector.
In an embodiment of the present invention, a Support Vector Machine (SVM) is first used to perform statistical analysis on known training feature sets (an infectious virus feature set and a normal file feature set) to obtain the difference between the normal file feature set and the infectious virus file feature set, and the difference is recorded in a training file. In the embodiment of the invention, the support vector machine learns from black and white samples, namely infected file samples and normal file samples, so as to learn how virus files differ from normal files. Then, a plurality of normal files and the corresponding infected files are used to run a prediction test on the support vector machine loaded with the virus type characteristics, that is, the false positives and false negatives of the support vector machine are checked and the number of correct detection results is counted. Next, according to a preset detection rate threshold, if the number of correct detection results is smaller than the detection rate threshold, the virus type characteristics loaded by the support vector machine are adjusted appropriately, and the false positives and false negatives of the support vector machine are checked again, until the number of correct detection results is greater than or equal to the preset detection rate threshold. If the number of virus type characteristics degrades virus detection efficiency too much, the Principal Component Analysis algorithm (PCA algorithm for short) is used to reduce the number of virus type characteristics, thereby improving virus detection efficiency.
Specifically, virus type characteristics with low discrimination are deleted and virus type characteristics with high discrimination are added until the false negatives and false positives of the support vector machine meet the expected result, that is, until the number of correct detection results is greater than or equal to the preset detection rate threshold. Finally, the adjusted support vector machine is used as the final virus detection machine to carry out virus detection.
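The adjust-and-retest loop can be sketched as below. To keep the example free of dependencies, a nearest-centroid classifier stands in for the support vector machine, and "discrimination" is approximated by the gap between class means; both choices, along with every function name, are assumptions made for illustration rather than the patent's method. Labels are 1 for infected files and 0 for normal files.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def class_means(X: Sequence[Vector], y: Sequence[int], label: int) -> List[float]:
    rows = [x for x, t in zip(X, y) if t == label]
    return [sum(col) / len(rows) for col in zip(*rows)]


def rank_features(m0: List[float], m1: List[float]) -> List[int]:
    """Feature indices ordered from most to least discriminating."""
    return sorted(range(len(m0)), key=lambda i: abs(m1[i] - m0[i]), reverse=True)


def predict(x: Vector, m0: List[float], m1: List[float], active: List[int]) -> int:
    """Nearest centroid over the active features (stand-in for an SVM)."""
    d0 = sum((x[i] - m0[i]) ** 2 for i in active)
    d1 = sum((x[i] - m1[i]) ** 2 for i in active)
    return 1 if d1 < d0 else 0


def build_detector(train_X, train_y, test_X, test_y,
                   threshold: int) -> Tuple[List[int], int]:
    """Load more features until the number of correct results on the
    test files reaches the detection rate threshold."""
    m0 = class_means(train_X, train_y, 0)
    m1 = class_means(train_X, train_y, 1)
    ranked = rank_features(m0, m1)
    active: List[int] = []
    correct = 0
    for k in range(1, len(ranked) + 1):
        active = ranked[:k]
        correct = sum(predict(x, m0, m1, active) == t
                      for x, t in zip(test_X, test_y))
        if correct >= threshold:
            break
    return active, correct


train_X = [[1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.2]]
train_y = [1, 1, 0, 0]
test_X = [[0.4, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
test_y = [1, 0, 1]
active, correct = build_detector(train_X, train_y, test_X, test_y, threshold=3)
```

With this toy data a single feature misclassifies the first test file, so the loop loads a second feature before the threshold of three correct results is met.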
In the embodiment of the invention, the behavior characteristics of the virus sample during operation are acquired, the virus sample is classified according to the behavior characteristics, and the virus type characteristics corresponding to the virus sample are extracted according to the classification result to generate the final virus detection machine. This avoids formulating virus detection rules from experience, and the virus detection machine replaces manual work in virus detection. Therefore, the accuracy of virus detection is improved, and the complexity of virus detection work is reduced.
Fig. 4 is a schematic structural diagram of a generation device of a virus detection machine according to an embodiment of the present invention.
As shown in fig. 4, in an embodiment of the present invention, a generating device of a virus detection machine includes: an infected file generation module 10, a classification result acquisition module 20, a virus type feature module 30 and a virus detection machine generation module 40. The infected file generating module 10 is configured to obtain a plurality of normal files a, and infect the plurality of normal files a with a virus sample b to generate a plurality of infected files c. The classification result obtaining module 20 is configured to obtain a behavior feature e of the virus sample b during operation, and obtain a classification result f of the virus sample b according to the behavior feature e, the normal files a, and the infected files c. The virus type feature module 30 is configured to obtain a virus type feature h corresponding to the virus sample b according to the classification result f. The virus detector generating module 40 is configured to generate a virus detector l according to the virus type characteristic h.
In the embodiment of the invention, the behavior characteristics of the virus sample during operation are acquired, the virus sample is classified according to the behavior characteristics, and the virus type characteristics corresponding to the virus sample are extracted according to the classification result to generate the final virus detection machine. This avoids formulating virus detection rules from experience, and the virus detection machine replaces manual work in virus detection. Therefore, the accuracy of virus detection is improved, and the complexity of virus detection work is reduced.
As shown in fig. 4, in an embodiment of the present invention, the classification result obtaining module 20 specifically includes: a file comparison sub-module 201, a behavior characteristic acquisition sub-module 202 and a classification result acquisition sub-module 203. The file comparison submodule 201 is configured to compare the plurality of normal files a with the plurality of infected files c, and obtain the comparison result d. The behavior feature obtaining sub-module 202 is configured to obtain the behavior feature e of the virus sample b during operation according to the comparison result d. The classification result obtaining sub-module 203 is configured to classify the virus sample b according to the behavior feature e of the virus sample b to obtain a classification result f of the virus sample b.
In an embodiment of the present invention, the classification result obtaining sub-module 203 specifically includes a primary classification sub-module 2031 and a secondary classification sub-module 2032. The primary classification sub-module 2031 is configured to classify the virus sample b into a non-infectious virus sample b1 and an infectious virus sample b2 according to the behavior feature e of the virus sample b. The secondary classification sub-module 2032 is configured to perform secondary classification on the non-infectious virus sample b1 and the infectious virus sample b2 according to the behavior feature e of the virus sample b. The secondary classification specifically comprises the following: if the number m of import functions of the non-infectious virus sample b1 is less than a preset import function number threshold, the non-infectious virus sample b1 is classified as a non-infectious packed virus sample b101; if the number m of import functions of the non-infectious virus sample b1 is greater than or equal to the preset import function number threshold, the non-infectious virus sample b1 is classified as a non-infectious non-packed virus sample b102; if the entry point n of the infected file c corresponding to the infectious virus sample b2 is different from the entry point n of the normal file a corresponding to the infected file c, the infectious virus sample b2 is classified as an infectious virus sample b201 that modifies the entry point; and if the entry point n of the infected file c corresponding to the infectious virus sample b2 is the same as the entry point n of the normal file a corresponding to the infected file c, the infectious virus sample b2 is classified as an infectious virus sample b202 that does not modify the entry point.
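The two-level classification described above can be sketched in Python as follows. The enum values reuse the labels b101, b102, b201 and b202 from the text, and the enum names use "packed" for the shell-added (enveloped) class. The boolean `infects_files` input, the function signature and the threshold value of 10 are assumptions; the source only states that the threshold is preset and that the primary split is made from the behavior feature e.

```python
from enum import Enum


class VirusClass(Enum):
    NON_INFECTIOUS_PACKED = "b101"
    NON_INFECTIOUS_UNPACKED = "b102"
    INFECTIOUS_MODIFIED_ENTRY = "b201"
    INFECTIOUS_UNMODIFIED_ENTRY = "b202"


# Assumed value for illustration; the source only says the threshold is preset.
IMPORT_COUNT_THRESHOLD = 10


def classify(infects_files: bool,
             import_count: int,
             normal_entry_point: int,
             infected_entry_point: int,
             threshold: int = IMPORT_COUNT_THRESHOLD) -> VirusClass:
    """Primary split (infectious or not), then the secondary split from the text."""
    if not infects_files:
        # Non-infectious sample b1: split on the number m of import functions.
        if import_count < threshold:
            return VirusClass.NON_INFECTIOUS_PACKED
        return VirusClass.NON_INFECTIOUS_UNPACKED
    # Infectious sample b2: split on whether the entry point n was changed.
    if infected_entry_point != normal_entry_point:
        return VirusClass.INFECTIOUS_MODIFIED_ENTRY
    return VirusClass.INFECTIOUS_UNMODIFIED_ENTRY
```

For example, `classify(False, 3, 0, 0)` falls in the packed class because 3 is below the assumed threshold, while `classify(True, 50, 4096, 8192)` falls in the modified-entry-point class.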
Specifically, in one embodiment of the present invention, the virus type feature module 30 includes an infection type feature set presetting sub-module 301 and a virus type feature extraction sub-module 302. The infection type feature set presetting sub-module 301 is configured to generate a preset infection type feature set g according to a plurality of infection type features. The virus type feature extraction sub-module 302 is configured to extract the corresponding virus type feature h from the preset infection type feature set g according to the classification result f.
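One simple way to read this step is as a lookup keyed by the classification result. The Python sketch below assumes that the preset infection type feature set g is a mapping from class label to a list of feature names; the feature names themselves are invented placeholders, since the source does not enumerate any infection type features.

```python
from typing import Dict, List

# Hypothetical preset infection type feature set g.
# Keys are the class labels from the text; feature names are illustrative only.
PRESET_INFECTION_TYPE_FEATURES: Dict[str, List[str]] = {
    "b101": ["low_import_count"],
    "b102": ["import_table_profile"],
    "b201": ["entry_point_changed"],
    "b202": ["entry_point_unchanged", "code_patched_in_place"],
}


def extract_type_features(classification: str,
                          preset: Dict[str, List[str]] = PRESET_INFECTION_TYPE_FEATURES
                          ) -> List[str]:
    """Return the virus type features h for classification result f (empty if unknown)."""
    # Copy the list so callers cannot mutate the preset set by accident.
    return list(preset.get(classification, []))
```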
FIG. 5 is a schematic flow chart of extracting virus type features by a virus detection machine according to an embodiment of the present invention.
As shown in fig. 4 and fig. 5, in an embodiment of the present invention, the virus detection machine generation module 40 specifically includes a machine learning machine 401, a detection result statistics submodule 402 and a detector correction submodule 403. The machine learning machine 401 is configured to load the virus type feature h to generate an initial virus detection machine i. The detection result statistics submodule 402 is configured to detect a plurality of infected sample files j using the initial virus detection machine i and to count the number of correct detection results. The detector correction submodule 403 is configured to adjust the initial virus detection machine i according to the number of correct detection results and the detection rate threshold k to generate a final virus detection machine l. The virus detection machine generation module 40 is further configured to increase the number of the virus type features h loaded by the initial virus detection machine i until the number of correct detection results is greater than or equal to the detection rate threshold k.
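The adjust-until-threshold loop can be sketched in Python as follows. This is a toy stand-in, not the learner named in the source: instead of a support vector machine or neural network, the "detector" here simply flags a sample when it exhibits any loaded feature. The function names, the representation of samples as feature sets, and the rule of adding one feature per iteration are assumptions.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

Detector = Callable[[FrozenSet[str]], bool]


def build_detector(features: Sequence[str]) -> Detector:
    """Toy stand-in for machine learning machine 401: flag any sample showing a loaded feature."""
    loaded = frozenset(features)
    return lambda sample_features: bool(loaded & sample_features)


def count_correct(detect: Detector, infected_samples: Sequence[FrozenSet[str]]) -> int:
    """Count infected sample files j that the detector flags (correct detection results)."""
    return sum(1 for sample in infected_samples if detect(sample))


def generate_detector(candidate_features: Sequence[str],
                      infected_samples: Sequence[FrozenSet[str]],
                      threshold_k: int,
                      initial_count: int = 1) -> Tuple[Detector, int, int]:
    """Load more virus type features h until the correct count reaches threshold k."""
    count = min(initial_count, len(candidate_features))
    while True:
        detect = build_detector(candidate_features[:count])
        correct = count_correct(detect, infected_samples)
        # Stop when the threshold is met or no further features remain to load.
        if correct >= threshold_k or count >= len(candidate_features):
            return detect, count, correct
        count += 1
```

The loop also stops when the candidate features run out, a termination condition the source does not discuss; without it the sketch could loop forever on an unreachable threshold.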
Preferably, in one embodiment of the present invention, the machine learning machine 401 is a support vector machine, a neural network, or a carbama-cark spectral algorithm. Of course, the machine learning machine 401 may be other algorithms that can perform machine learning.
FIG. 6 is a flow chart of a virus detection method according to an embodiment of the invention.
As shown in fig. 6, a virus detection method according to an embodiment of the present invention includes:
S201, acquiring a plurality of normal files, and infecting the normal files through virus samples to generate a plurality of infected files.
S202, acquiring the behavior characteristics of the virus sample during operation, and acquiring the classification result of the virus sample according to the behavior characteristics, the plurality of normal files and the plurality of infected files.
S203, acquiring the virus type characteristics corresponding to the virus sample according to the classification result.
S204, carrying out virus detection on the target file according to the virus type characteristics.
Further, a virus detection apparatus according to an embodiment of the present invention includes an infected file generation module 10, a classification result acquisition module 20, a virus type feature module 30 and a virus detection module 40. The infected file generation module 10 is configured to obtain a plurality of normal files a and to infect the plurality of normal files a with a virus sample b to generate a plurality of infected files c. The classification result acquisition module 20 is configured to obtain a behavior feature e of the virus sample b during operation, and to obtain a classification result f of the virus sample b according to the behavior feature e, the normal files a, and the infected files c. The virus type feature module 30 is configured to obtain a virus type feature h corresponding to the virus sample b according to the classification result f. The virus detection module 40 is configured to perform virus detection according to the virus type feature h.
In this embodiment of the invention, the behavior features of the virus sample during operation are acquired, the virus sample is classified according to those behavior features, and the virus type features corresponding to the virus sample are extracted according to the classification result to generate the final virus detection machine. This avoids formulating virus detection rules from experience, and the virus detection machine replaces manual work in carrying out virus detection. As a result, the accuracy of virus detection is improved and the complexity of virus detection work is reduced.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and various embodiments or examples described in this specification can be combined by those skilled in the art.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (28)

1. A method for generating a virus detection machine, comprising:
acquiring a plurality of normal files, and infecting the normal files through a virus sample to generate a plurality of infected files;
acquiring behavior characteristics of the virus sample during operation, and acquiring a classification result of the virus sample according to the behavior characteristics, the normal files and the infected files;
acquiring virus type characteristics corresponding to the virus samples according to the classification result; and
generating a virus detection machine according to the virus type characteristics.
2. The method for generating a virus detection machine according to claim 1, wherein the obtaining the classification result of the virus sample according to the behavior feature, the plurality of normal files, and the plurality of infected files specifically includes:
comparing the plurality of normal files with the plurality of infected files respectively, and acquiring the comparison result;
acquiring the behavior characteristics of the virus sample during operation according to the comparison result; and
classifying the virus samples according to the behavior characteristics of the virus samples to obtain classification results of the virus samples.
3. The method of claim 2, wherein the classifying the virus sample according to the behavior feature of the virus sample to obtain the classification result of the virus sample specifically comprises:
classifying the virus sample into an infectious virus sample and a non-infectious virus sample according to the behavior characteristics of the virus sample; wherein,
if the number of the import functions of the non-infectious virus sample is smaller than a preset import function number threshold, classifying the non-infectious virus sample as a non-infectious packed virus sample;
if the number of the import functions of the non-infectious virus sample is greater than or equal to the preset import function number threshold, classifying the non-infectious virus sample as a non-infectious non-packed virus sample;
if the entry point of the infected file corresponding to the infectious virus sample is different from the entry point of the normal file corresponding to the infected file, classifying the infectious virus sample as an infectious virus sample that modifies the entry point; and
if the entry point of the infected file corresponding to the infectious virus sample is the same as the entry point of the normal file corresponding to the infected file, classifying the infectious virus sample as an infectious virus sample that does not modify the entry point.
4. The method for generating a virus detection machine according to claim 1, wherein the obtaining of the virus type characteristics corresponding to the virus samples according to the classification result specifically includes:
extracting the corresponding virus type characteristics from a preset infection type characteristic set according to the classification result.
5. The method of generating a virus inspection machine according to claim 1, wherein the generating a virus inspection machine according to the virus type characteristics includes:
loading the virus type features using a machine learning machine to generate an initial virus detection machine;
detecting a plurality of infected sample files using the initial virus detector and counting the number of correct detection results, wherein,
if the number of the correct detection results is larger than a detection rate threshold value, taking the initial virus detector as a final virus detector; and
if the number of correct detection results is less than or equal to the detection rate threshold, adjusting the initial virus detection machine to generate a final virus detection machine.
6. The method of claim 5, wherein the adjusting the initial virus detector to generate the final virus detector further comprises:
increasing the number of the virus type characteristics loaded by the initial virus detection machine until the number of correct detection results is greater than or equal to the detection rate threshold.
7. The method of claim 5, wherein the machine learning machine is a support vector machine, a neural network, or a Cammarsard-Ka spectral algorithm.
8. An apparatus for generating a virus detector, comprising:
the infected file generation module is used for acquiring a plurality of normal files and infecting the normal files through virus samples to generate a plurality of infected files;
the classification result acquisition module is used for acquiring the behavior characteristics of the virus sample during operation and acquiring the classification result of the virus sample according to the behavior characteristics, the normal files and the infected files;
the virus type characteristic module is used for acquiring virus type characteristics corresponding to the virus samples according to the classification result; and
the virus detection machine generation module is used for generating a virus detection machine according to the virus type characteristics.
9. The apparatus for generating a virus detection machine according to claim 8, wherein the classification result obtaining module specifically includes:
the file comparison submodule is used for comparing the plurality of normal files with the plurality of infected files respectively and acquiring comparison results;
the behavior characteristic acquisition submodule is used for acquiring the behavior characteristic of the virus sample during operation according to the comparison result; and
the classification result acquisition submodule is used for classifying the virus samples according to the behavior characteristics of the virus samples so as to acquire the classification results of the virus samples.
10. The apparatus for generating a virus detection machine according to claim 9, wherein the classification result obtaining sub-module specifically includes:
a primary classification submodule for classifying the virus sample into an infectious virus sample and a non-infectious virus sample according to the behavioral characteristics of the virus sample;
a secondary classification sub-module for secondary classification of the infectious virus sample and the non-infectious virus sample according to the behavioral characteristics of the virus samples, wherein,
the secondary classification specifically includes:
if the number of the import functions of the non-infectious virus sample is smaller than a preset import function number threshold, classifying the non-infectious virus sample as a non-infectious packed virus sample;
if the number of the import functions of the non-infectious virus sample is greater than or equal to the preset import function number threshold, classifying the non-infectious virus sample as a non-infectious non-packed virus sample;
if the entry point of the infected file corresponding to the infectious virus sample is different from the entry point of the normal file corresponding to the infected file, classifying the infectious virus sample as an infectious virus sample that modifies the entry point; and
if the entry point of the infected file corresponding to the infectious virus sample is the same as the entry point of the normal file corresponding to the infected file, classifying the infectious virus sample as an infectious virus sample that does not modify the entry point.
11. The apparatus for generating a virus detection machine according to claim 8, wherein the virus type feature module specifically comprises:
the infection type feature set presetting submodule is used for generating a preset infection type feature set according to a plurality of infection type features; and
the virus type feature extraction submodule is used for extracting the corresponding virus type features from the preset infection type feature set according to the classification result.
12. The virus detection machine generation apparatus according to claim 8, wherein the virus detection machine generation module specifically includes:
a machine learning machine to load the virus type features to generate an initial virus detection machine;
the detection result counting submodule is used for detecting a plurality of infected sample files by using the initial virus detector and calculating the number of correct detection results; and
the detector correction submodule is used for adjusting the initial virus detector according to the number of correct detection results and the detection rate threshold so as to generate a final virus detector.
13. The virus detector generation apparatus of claim 8, wherein the virus detector generation module is specifically configured to increase the number of the virus-type features loaded by the initial virus detector until the number of correct detection results is greater than or equal to the detection rate threshold.
14. The apparatus of claim 8, wherein the machine learning machine is a support vector machine, a neural network, or a karma-karman algorithm.
15. A method for detecting a virus, comprising:
acquiring a plurality of normal files, and infecting the normal files through a virus sample to generate a plurality of infected files;
acquiring behavior characteristics of the virus sample during operation, and acquiring a classification result of the virus sample according to the behavior characteristics, the normal files and the infected files;
acquiring virus type characteristics corresponding to the virus samples according to the classification result; and
carrying out virus detection on the target file according to the virus type characteristics.
16. The virus detection method according to claim 15, wherein the obtaining of the behavior feature of the virus sample during the running and the obtaining of the classification result of the virus sample according to the behavior feature, the plurality of normal files, and the plurality of infected files specifically comprises:
comparing the plurality of normal files with the plurality of infected files respectively, and acquiring the comparison result;
acquiring the behavior characteristics of the virus sample during operation according to the comparison result; and
classifying the virus samples according to the behavior characteristics of the virus samples to obtain classification results of the virus samples.
17. The method of claim 16, wherein the classifying the virus sample according to the behavior feature of the virus sample to obtain the classification result of the virus sample specifically comprises:
classifying the virus sample into an infectious virus sample and a non-infectious virus sample according to the behavior characteristics of the virus sample; wherein,
if the number of the import functions of the non-infectious virus sample is smaller than a preset import function number threshold, classifying the non-infectious virus sample as a non-infectious packed virus sample;
if the number of the import functions of the non-infectious virus sample is greater than or equal to the preset import function number threshold, classifying the non-infectious virus sample as a non-infectious non-packed virus sample;
if the entry point of the infected file corresponding to the infectious virus sample is different from the entry point of the normal file corresponding to the infected file, classifying the infectious virus sample as an infectious virus sample that modifies the entry point; and
if the entry point of the infected file corresponding to the infectious virus sample is the same as the entry point of the normal file corresponding to the infected file, classifying the infectious virus sample as an infectious virus sample that does not modify the entry point.
18. The virus detection method according to claim 16, wherein the obtaining of the virus type characteristics corresponding to the virus sample according to the classification result specifically comprises:
extracting the corresponding virus type characteristics from a preset infection type characteristic set according to the classification result.
19. The virus detection method of claim 16, wherein the generating a virus detection engine based on the virus type signature specifically comprises:
loading the virus type features using a machine learning machine to generate an initial virus detection machine;
detecting a plurality of infected sample files by using the initial virus detector, and calculating the number of correct detection results;
if the number of the correct detection results is larger than a detection rate threshold value, taking the initial virus detector as a final virus detector; and
if the number of correct detection results is less than or equal to the detection rate threshold, adjusting the initial virus detection machine to generate a final virus detection machine.
20. The virus detection method of claim 16, wherein the adjusting the initial virus detector to generate a final virus detector if the detection rate threshold is greater than the number of correct detection results further comprises:
increasing the number of the virus type features loaded by the initial virus detector until the number of correct detection results is greater than or equal to the detection rate threshold.
21. The virus detection method of claim 16, wherein the machine learning machine is a support vector machine, a neural network, or a carbama-cark spectral algorithm.
22. A virus detection device, comprising:
the infected file generation module is used for acquiring a plurality of normal files and infecting the normal files through virus samples to generate a plurality of infected files;
the classification result acquisition module is used for acquiring the behavior characteristics of the virus sample during operation and acquiring the classification result of the virus sample according to the behavior characteristics, the normal files and the infected files;
the virus type characteristic module is used for acquiring virus type characteristics corresponding to the virus samples according to the classification result; and
the virus detection machine generation module is used for carrying out virus detection according to the virus type characteristics.
23. The virus detection device according to claim 22, wherein the classification result obtaining module specifically comprises:
the file comparison submodule is used for comparing the plurality of normal files with the plurality of infected files respectively and acquiring comparison results;
the behavior characteristic acquisition submodule is used for acquiring the behavior characteristic of the virus sample during operation according to the comparison result; and
the classification result acquisition submodule is used for classifying the virus samples according to the behavior characteristics of the virus samples so as to acquire the classification results of the virus samples.
24. The virus detection device according to claim 22, wherein the classification result obtaining submodule specifically includes:
a primary classification submodule for classifying the virus sample into an infectious virus sample and a non-infectious virus sample according to the behavioral characteristics of the virus sample;
a secondary classification sub-module for secondary classification of the infectious virus sample and the non-infectious virus sample according to the behavioral characteristics of the virus samples, wherein,
the secondary classification specifically includes:
if the number of the import functions of the non-infectious virus sample is smaller than a preset import function number threshold, classifying the non-infectious virus sample as a non-infectious packed virus sample;
if the number of the import functions of the non-infectious virus sample is greater than or equal to the preset import function number threshold, classifying the non-infectious virus sample as a non-infectious non-packed virus sample;
if the entry point of the infected file corresponding to the infectious virus sample is different from the entry point of the normal file corresponding to the infected file, classifying the infectious virus sample as an infectious virus sample that modifies the entry point; and
if the entry point of the infected file corresponding to the infectious virus sample is the same as the entry point of the normal file corresponding to the infected file, classifying the infectious virus sample as an infectious virus sample that does not modify the entry point.
25. The virus detection device according to claim 22, wherein the virus type feature module specifically comprises:
the infection type feature set presetting submodule is used for generating a preset infection type feature set according to a plurality of infection type features; and
the virus type feature extraction submodule is used for extracting the corresponding virus type features from the preset infection type feature set according to the classification result.
26. The virus detection device according to claim 22, wherein the virus detection module specifically comprises:
a machine learning machine to load the virus type features to generate an initial virus detection machine;
the detection result counting submodule is used for detecting a plurality of infected sample files by using the initial virus detector and calculating the number of correct detection results;
the detector correction submodule is used for adjusting the initial virus detector according to the number of correct detection results and the detection rate threshold so as to generate a final virus detector; and
the virus detection submodule is used for detecting viruses by using the final virus detector.
27. The virus detection apparatus of claim 22, wherein the virus detection machine generation module is specifically configured to increase the number of the virus-type features loaded by the initial virus detection machine until the number of correct detection results is greater than or equal to the detection rate threshold.
28. The virus detection apparatus of claim 22, wherein the machine learning machine is a support vector machine, a neural network, or a carbama-cark spectral algorithm.
CN201410281468.2A 2014-06-20 2014-06-20 Method and device for generating virus detection machine and method and device for virus detection Active CN104077527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410281468.2A CN104077527B (en) 2014-06-20 2014-06-20 Method and device for generating virus detection machine and method and device for virus detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410281468.2A CN104077527B (en) 2014-06-20 2014-06-20 Method and device for generating virus detection machine and method and device for virus detection

Publications (2)

Publication Number Publication Date
CN104077527A true CN104077527A (en) 2014-10-01
CN104077527B CN104077527B (en) 2017-12-19

Family

ID=51598777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410281468.2A Active CN104077527B (en) 2014-06-20 2014-06-20 Method and device for generating virus detection machine and method and device for virus detection

Country Status (1)

Country Link
CN (1) CN104077527B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899510A (en) * 2015-05-11 2015-09-09 国网甘肃省电力公司电力科学研究院 Virus detecting and killing method for removable storage devices
CN106709350A (en) * 2016-12-30 2017-05-24 腾讯科技(深圳)有限公司 Virus detection method and device
CN107315954A (en) * 2016-04-27 2017-11-03 腾讯科技(深圳)有限公司 A kind of file type identification method and server
CN112084500A (en) * 2020-09-15 2020-12-15 腾讯科技(深圳)有限公司 Method and device for clustering virus samples, electronic equipment and storage medium
CN112580037A (en) * 2019-09-30 2021-03-30 奇安信安全技术(珠海)有限公司 Method, device and equipment for repairing virus file data
CN113434863A (en) * 2021-06-25 2021-09-24 上海观安信息技术股份有限公司 Method and device for realizing remote control of host based on PE file structure

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804924A (en) * 2018-06-15 2018-11-13 深信服科技股份有限公司 A kind of method for detecting virus, system and relevant apparatus based on sandbox

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050132184A1 (en) * 2003-12-12 2005-06-16 International Business Machines Corporation Apparatus, methods and computer programs for controlling performance of operations within a data processing system or network
CN102346830A (en) * 2011-09-23 2012-02-08 重庆大学 Gradient histogram-based virus detection method

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899510A (en) * 2015-05-11 2015-09-09 国网甘肃省电力公司电力科学研究院 Virus detecting and killing method for removable storage devices
CN107315954A (en) * 2016-04-27 2017-11-03 腾讯科技(深圳)有限公司 A kind of file type identification method and server
CN107315954B (en) * 2016-04-27 2020-06-12 腾讯科技(深圳)有限公司 File type identification method and server
CN106709350A (en) * 2016-12-30 2017-05-24 腾讯科技(深圳)有限公司 Virus detection method and device
CN112580037A (en) * 2019-09-30 2021-03-30 奇安信安全技术(珠海)有限公司 Method, device and equipment for repairing virus file data
CN112580037B (en) * 2019-09-30 2023-12-12 奇安信安全技术(珠海)有限公司 Method, device and equipment for repairing virus file data
CN112084500A (en) * 2020-09-15 2020-12-15 腾讯科技(深圳)有限公司 Method and device for clustering virus samples, electronic equipment and storage medium
CN112084500B (en) * 2020-09-15 2025-08-05 腾讯科技(深圳)有限公司 Virus sample clustering method, device, electronic device and storage medium
CN113434863A (en) * 2021-06-25 2021-09-24 上海观安信息技术股份有限公司 Method and device for realizing remote control of host based on PE file structure
CN113434863B (en) * 2021-06-25 2023-11-24 上海观安信息技术股份有限公司 Method and device for realizing remote control of host based on PE file structure

Also Published As

Publication number Publication date
CN104077527B (en) 2017-12-19

Similar Documents

Publication Publication Date Title
CN104077527A (en) Method and device for generating virus detection machine and method and device for virus detection
US11783034B2 (en) Apparatus and method for detecting malicious script
US9864956B1 (en) Generation and use of trained file classifiers for malware detection
US11048798B2 (en) Method for detecting libraries in program binaries
JP6698956B2 (en) Sample data generation device, sample data generation method, and sample data generation program
US11475133B2 (en) Method for machine learning of malicious code detecting model and method for detecting malicious code using the same
CN113935033B (en) Feature fusion malicious code family classification method, device and storage medium
JP4711949B2 (en) Method and system for detecting malware in macros and executable scripts
RU2708356C1 (en) System and method for two-stage classification of files
US10032021B2 (en) Method for detecting a threat and threat detecting apparatus
CN106131071A (en) A kind of Web method for detecting abnormality and device
CN110119620A (en) System and method of the training for detecting the machine learning model of malice container
US20170193230A1 (en) Representing and comparing files based on segmented similarity
CN107229563A (en) A kind of binary program leak function correlating method across framework
CN106357618A (en) Web abnormality detection method and device
KR102192196B1 (en) An apparatus and method for detecting malicious codes using ai based machine running cross validation techniques
CN104123501B (en) A kind of viral online test method based on many assessor set
CN109670318B (en) Vulnerability detection method based on cyclic verification of nuclear control flow graph
CN105224600A (en) A kind of detection method of Sample Similarity and device
JP2018160172A (en) Malware determining method, malware determining apparatus, malware determining program
CN112861127A (en) Malicious software detection method and device based on machine learning and storage medium
Ugarte-Pedrero et al. On the adoption of anomaly detection for packed executable filtering
KR20180133726A (en) Appratus and method for classifying data using feature vector
KR102318991B1 (en) Method and device for detecting malware based on similarity
CN111200576A (en) Method for realizing malicious domain name recognition based on machine learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20181214

Address after: 519030 Room 105-53811, No. 6 Baohua Road, Hengqin New District, Zhuhai City, Guangdong Province

Patentee after: Zhuhai Seal Interest Technology Co., Ltd.

Address before: 519070, six level 601F, 10 main building, science and technology road, Tangjia Bay Town, Zhuhai, Guangdong.

Patentee before: Zhuhai Juntian Electronic Technology Co.,Ltd.
