KR101930293B1

KR101930293B1 - Apparatus and Method for Identifying Variety Malicious Code Using Static Analysis and Dynamic Analysis

Info

Publication number: KR101930293B1
Application number: KR1020170116412A
Authority: KR
Inventors: 권태경; 유정빈; 박래현
Original assignee: 연세대학교 산학협력단
Priority date: 2017-09-12
Filing date: 2017-09-12
Publication date: 2018-12-18
Anticipated expiration: 2037-09-12

Abstract

A method for identifying variant malicious code according to an embodiment of the present invention may include: primarily classifying a plurality of incoming malicious codes into existing malicious code or variant malicious code through static analysis; for secondary classification, either classifying the primarily classified existing malicious code based on previously learned training data, or selecting at least one representative malicious code from the data on the primarily classified variant malicious code through dynamic analysis; and identifying the relationship between the secondarily classified existing malicious code and the representative malicious code according to a preset reference value.

Description

Apparatus and method for identifying variant malicious code using static analysis and dynamic analysis

The technical field of the embodiments of the present invention relates to an apparatus and method for identifying malicious code and, more particularly, to an apparatus and method for identifying variant malicious code using static analysis and dynamic analysis.

Recently, as a variety of computing-based services have been offered for user convenience, important information such as accredited certificates, card information, and login information is increasingly stored on computing devices. Cases in which this information is exploited occur frequently, so interest in and the need for information protection are growing.

A typical abnormal behavior of a computing device is behavior caused by malicious code. Malicious code is a generic term for all software created to harm a computing device.

Conventional malicious code analysis techniques have relied either on static analysis, which examines the code without actually executing the malicious code, or on dynamic analysis, which identifies the behavior of the malicious code by executing it directly in a controllable virtual environment.

However, static analysis and dynamic analysis each have limitations when used on their own. Static analysis has a low time cost but copes poorly with variant malicious code, whereas dynamic analysis copes well with variant malicious code but has a high time cost.

Furthermore, as malicious code identification technology evolves, the techniques used by malicious code authors are also becoming more sophisticated, so accurate identification is difficult with existing analysis techniques.

Korean Laid-Open Patent Publication No. 10-2011-0124918

To solve the above problems, an object of the present invention is to provide an apparatus and method for identifying variant malicious code that identify the origin of incoming variant malicious code, and that do so faster and more accurately than conventional identification apparatuses and methods using static or dynamic analysis.

To achieve the above object, a method for identifying variant malicious code according to an embodiment of the present invention may include: primarily classifying a plurality of incoming malicious codes into existing malicious code or variant malicious code through static analysis; for secondary classification, either classifying the primarily classified existing malicious code based on previously learned training data, or selecting at least one representative malicious code from the data on the primarily classified variant malicious code through dynamic analysis; and identifying the relationship between the secondarily classified existing malicious code and the representative malicious code according to a preset reference value.

The method may further include, according to the result of the relationship identification, merging the secondarily classified existing malicious code and the variant malicious code into a single first group or creating a new second group.

The method may further include updating the training data by training on the merged first group or the created second group as the training data for identifying incoming variant malicious code.

In the step of identifying the relationship between the secondarily classified existing malicious code and the representative malicious code, the relationship may be identified by calculating the similarity between the two according to a preset algorithm.

In the step of merging into the first group or creating the second group, the merge or the creation may be decided by comparing the calculated similarity with the reference value.

In the step of merging into the first group or creating the second group, if the calculated similarity is greater than or equal to the reference value, the primarily classified existing malicious code and the variant malicious code may be merged into one group; if the calculated similarity is smaller than the reference value, a new group may be created for the representative malicious code.

In the step of selecting the at least one representative malicious code, at least one variant cluster may be generated in order to cluster the primarily classified variant malicious codes according to a preset criterion, and a representative malicious code may be selected for each generated variant cluster.

The step of selecting the at least one representative malicious code may further include: selecting at least one centroid in order to generate the variant clusters; calculating the distances between the centroid of each variant cluster and the primarily classified variant malicious codes; and selecting, according to the calculation result, the variant malicious code located closest to the centroid as the representative malicious code.

In the primary classification step, the similarity among the plurality of incoming malicious codes may be analyzed, and the malicious codes may be classified as existing malicious code or variant malicious code according to the analysis result.

An apparatus for identifying variant malicious code according to another embodiment of the present invention may include: a primary classification module that primarily classifies a plurality of incoming malicious codes into existing malicious code or variant malicious code through static analysis; a secondary classification module that, for secondary classification, either classifies the primarily classified existing malicious code based on previously learned training data or selects at least one representative malicious code from the data on the primarily classified variant malicious code through dynamic analysis; and a relationship identification unit that calculates the similarity between the secondarily classified existing malicious code and the representative malicious code according to a preset algorithm and identifies their relationship based on the calculated similarity.

The apparatus may further include a group integration unit that, according to the result of the relationship identification, merges the secondarily classified existing malicious code and the variant malicious code into a single first group, and a group generation unit that creates a new second group.

The apparatus may further include a supervised learning unit that trains on the merged first group or the created second group as the training data for identifying incoming variant malicious code.

The group integration unit may merge the secondarily classified existing malicious code and the variant malicious code into one group if the calculated similarity is greater than or equal to the reference value, and the group generation unit may create a new group for the representative malicious code if the calculated similarity is smaller than the reference value.

The secondary classification module may generate at least one variant cluster in order to cluster the data on the primarily classified variant malicious code according to a preset criterion, and may select a representative malicious code for each generated variant cluster.

The apparatus and method for identifying variant malicious code of the present invention allow the static and dynamic approaches to compensate for each other's weaknesses and, through machine learning, allow unknown variant malicious code that arrives later to be analyzed more accurately and flexibly.

FIG. 1 is a block diagram schematically illustrating the configuration of a malicious code identification apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating, in chronological order, a method for identifying variant malicious code according to an embodiment of the present invention.

To fully understand the present invention, its operational advantages, and the objects achieved by practicing it, reference should be made to the accompanying drawings, which illustrate preferred embodiments of the invention, and to the content described in them.

Hereinafter, the present invention is described in detail by explaining preferred embodiments with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described. To describe the invention clearly, parts unrelated to the description are omitted, and the same reference numerals in the drawings denote the same members.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "unit", "module", and "block" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings. In describing the invention, detailed descriptions of related known configurations or functions may be omitted where they would obscure the gist of the invention.

The configuration of the malicious code identification apparatus according to an embodiment of the present invention is described in detail below with reference to the relevant drawings. For example, the apparatus is preferably installed on a standalone user device such as a computer. The apparatus may also be implemented as an application installed on a user terminal.

FIG. 1 is a block diagram schematically illustrating the configuration of a malicious code identification apparatus according to an embodiment of the present invention. As shown in FIG. 1, the apparatus may include a primary classification module 110, a secondary classification module 120, a relationship identification unit 130, a group integration unit 141, a group generation unit 142, and a supervised learning unit 150. Although not shown in the drawing, the apparatus may further include a download unit that downloads files from outside or downloads web page sources from a web server, and a database that stores executable files. The apparatus analyzes the executable files stored in the database statically and dynamically and identifies malicious code based on the analyzed data.

The present invention is a technique for detecting and identifying cases in which abnormal behavior is caused by software with malicious intent (that is, malicious code) inside or outside an application. Abnormal behavior inside an application means that malicious code that has infected the application runs within it. Abnormal behavior outside an application means that other code unrelated to the application accesses resources of the application that it is not permitted to access.

The primary classification module 110 according to an embodiment of the present invention may primarily classify, through static analysis, a plurality of malicious codes that arrive via the download unit and the database as existing malicious code or variant malicious code. Here, static analysis means analyzing computer software without executing it. Tools commonly used for static analysis include ssdeep, which identifies similarity by hashing a binary block by block; PEiD, which can detect whether packing has been applied to malicious code; and the commercial static analysis tool IDA Pro. The apparatus and method according to an embodiment of the present invention are described with ssdeep as the primary tool.

The primary classification module 110 may analyze the similarity among the plurality of incoming malicious codes and classify them as existing malicious code or variant malicious code according to the analysis result. More specifically, it identifies the similarity among the incoming malicious codes and performs the classification based on a preset threshold. For example, if the threshold is 80% and the ssdeep similarity between incoming malicious code A and malicious code B is 81%, A and B may be classified into the same group.

A variant, however, is produced by applying obfuscation, packing, and similar techniques to existing malicious code, so it is unlikely to fall into the same group as that existing malicious code. In other words, a variant will form a singleton group that contains only that one malicious code. Accordingly, after the primary classification module 110 performs primary classification, malicious code that forms a singleton group is judged to be variant (new) malicious code, and malicious code that does not form a singleton group is judged to be existing malicious code.
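The threshold grouping and singleton rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `similarity` function is a stand-in built on Python's `difflib` (it is not ssdeep), the names `first_stage_classify` and `threshold` are hypothetical, and treating a score equal to the threshold as a match is an assumption, since the text only gives the 81% versus 80% example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Stand-in similarity score in [0, 100]. This is NOT ssdeep."""
    return 100.0 * SequenceMatcher(None, a, b, autojunk=False).ratio()


def first_stage_classify(samples, threshold=80.0):
    """Group samples whose pairwise score reaches the threshold.

    samples maps a sample name to its raw bytes.
    Returns (existing, variants): names that share a group with at least
    one other sample, and names left alone in a singleton group.
    """
    parent = {name: name for name in samples}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if similarity(samples[a], samples[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), []).append(name)

    existing = sorted(n for g in groups.values() if len(g) > 1 for n in g)
    variants = sorted(n for g in groups.values() if len(g) == 1 for n in g)
    return existing, variants
```

In practice the score would come from a fuzzy-hash comparison; only the variants returned here would go on to the more expensive dynamic analysis.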

After the primary classification module 110 has classified the incoming malicious codes into existing malicious code and variant malicious code, the secondary classification module 120 either secondarily classifies the primarily classified existing malicious code based on previously learned training data, or selects at least one representative malicious code from among the primarily classified variant malicious codes through dynamic analysis.

As described above, the apparatus first performs primary classification through the primary classification module 110 using the static features of the incoming malicious codes, separating existing malicious code that can be identified from variant malicious code that cannot. This minimizes the time cost that can arise when the secondary classification module 120 performs dynamic analysis.

Although not shown in the drawing, the secondary classification module 120 according to an embodiment may further include a static analysis unit and a dynamic analysis unit. The static analysis unit uses an analysis method based on supervised learning and classifies against training data identified in advance. In the present invention, the static analysis unit analyzes the existing malicious code primarily classified by the primary classification module 110 and labels it. Here, the training data refers to the malicious code used for training, and classification algorithms such as Random Forest and Decision Tree may be used for the static analysis.

The dynamic analysis unit uses an analysis method based on unsupervised learning and analyzes the variant malicious code primarily classified by the primary classification module 110 in order to classify it. Unlike existing malicious code, variant malicious code has obfuscation and similar techniques applied, so static features alone are of limited use for identifying it. Therefore, malicious code identified as variant by the primary classification module 110 undergoes dynamic analysis in the dynamic analysis unit to determine its actual behavior.

More specifically, because the variant malicious codes primarily classified by the primary classification module 110 are difficult to identify from training data, the dynamic analysis unit according to an embodiment of the present invention clusters them using a clustering algorithm. In one embodiment, the dynamic analysis unit generates k clusters, chosen randomly or from prior knowledge, and uses a distance-based function to assign each data point to a cluster so as to minimize the variance of the distances to the clusters. It repeats the clustering so that similarity within each resulting cluster is maximized and similarity between different clusters is minimized.

In more detail, given k preset centroids, the dynamic analysis unit assigns each primarily classified variant malicious code to the nearest of the k centroids. After this assignment, it moves each centroid to the center of gravity of its cluster, recalculates the distances between the moved centroids and the variant malicious codes, and reassigns each variant malicious code to its nearest cluster. The dynamic analysis unit does not perform this operation only once; it may repeat it until the shape of the clusters no longer changes.
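The assign, move, and repeat loop described above matches the standard k-means (Lloyd) iteration, which can be sketched in plain Python. The sketch assumes each variant has already been reduced to a numeric feature vector of dynamic features; the function name `kmeans`, the `max_iter` cap, and the use of Euclidean distance are illustrative assumptions rather than details given in the text.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Cluster points around k preset centroids.

    points: list of equal-length numeric tuples (hypothetical dynamic features).
    initial_centroids: list of k starting centroids.
    Returns (labels, centroids) once assignments stop changing.
    """
    centroids = [list(c) for c in initial_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # the clusters no longer change
        labels = new_labels
        # Move each centroid to the center of gravity of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

The stopping rule mirrors the text: the loop ends when a full pass leaves every assignment unchanged.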

Next, to select the representative malicious code, the dynamic analysis unit extracts the variant malicious code closest to the centroid of each variant cluster formed as described above. The variant malicious code closest to the centroid may be selected as the representative malicious code, which is then used to identify the relationship with the existing malicious code secondarily classified by the static analysis unit. A clustering algorithm is suitable for the dynamic analysis unit; for example, K-Means or Agglomerative Clustering may be used.
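Selecting the member nearest each centroid can be sketched as below. The inputs (named feature vectors, cluster labels, and centroids) are assumed to come from an earlier clustering step; the function name `select_representatives` and the Euclidean distance are illustrative choices, not details stated in the text.

```python
import math


def select_representatives(samples, labels, centroids):
    """Pick, for each cluster, the sample closest to that cluster's centroid.

    samples: list of (name, feature_vector) pairs.
    labels: cluster index for each sample, in the same order.
    centroids: list of centroid vectors indexed by cluster.
    Returns a dict mapping cluster index to the representative's name.
    """
    best = {}
    for (name, vec), label in zip(samples, labels):
        d = math.dist(vec, centroids[label])
        # Keep the first sample seen at the smallest distance.
        if label not in best or d < best[label][0]:
            best[label] = (d, name)
    return {label: name for label, (_, name) in best.items()}
```

Only these representatives, rather than every variant in a cluster, are then compared against the labeled existing malicious code.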

After the static analysis unit and the dynamic analysis unit of the secondary classification module 120 have classified the existing malicious code and the variant malicious code in this way, the relationship identification unit 130 according to an embodiment of the present invention may identify the relationship between the existing malicious code secondarily classified by the secondary classification module 120 and the representative malicious code by calculating the similarity between them according to a preset algorithm.

The relationship identification unit 130 may use both the static and the dynamic features of the malicious codes used as training data for identifying incoming existing malicious code, in order to identify the relationship between the existing malicious code labeled through secondary classification by the static analysis unit and the representative malicious code selected by the dynamic analysis unit. More specifically, the relationship identification unit 130 uses the training data to identify the dynamic features of the existing malicious codes labeled through secondary classification by the static analysis unit. It then calculates the similarity between the secondarily classified existing malicious code, whose dynamic features have now been identified, and the representative malicious code selected by the dynamic analysis unit. Here, the relationship identification unit 130 may calculate the similarity between the two malicious codes using a clustering algorithm or a similarity comparison algorithm (for example, cosine or correlation) and thereby identify the relationship.
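The two similarity measures named above, cosine and correlation, can be sketched for a pair of feature vectors as follows. The vectors are assumed to be numeric dynamic-feature vectors of equal length (for instance, hypothetical counts of observed behaviors); how the features are actually extracted is not specified in the text.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson_correlation(u, v):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_similarity(
        [a - mean_u for a in u],
        [b - mean_v for b in v],
    )
```

Cosine ignores vector length, so two samples with the same behavior profile but different activity volumes still score close to 1.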

As described above, the apparatus according to an embodiment of the present invention performs static analysis using previously learned training data, reclassifies the existing and variant malicious code sorted by the primary classification module 110 through the static analysis unit and the dynamic analysis unit of the secondary classification module 120, and then identifies the relationship between the finally classified malicious codes through the relationship identification unit 130. As a result, it can identify the group of existing malicious code from which a variant originates, and it can do so faster and more accurately than conventional identification methods that use only static analysis or only dynamic analysis.

According to the relationship identification result produced by the relationship identification unit 130, the group integration unit 141 according to an embodiment of the present invention may merge the secondarily classified existing malicious code and the variant malicious code into a single first group, and the group generation unit 142 may create a second group, which is a new group.

In one embodiment, if the similarity calculated by the relationship identification unit 130 between the existing malicious code and the representative malicious code is greater than or equal to a preset reference value, the group integration unit 141 judges the representative malicious code to be derived from the existing malicious code and merges the two malicious codes into one group. Conversely, if the calculated similarity is smaller than the reference value, the group generation unit 142 judges the representative malicious code to be distinct from all previously existing malicious code, treats it as new malicious code, and creates a new group for it.
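The merge-or-create decision above can be sketched as a single function. The data layout (a dict of group label to member names, and a dict of per-group similarity scores), the default `reference` of 0.8, and the `new-N` label scheme are all hypothetical; only the comparison rule, merge when the similarity is greater than or equal to the reference value and create otherwise, comes from the text.

```python
def assign_group(rep_name, scores, groups, reference=0.8):
    """Merge a representative into its best-matching group or create a new one.

    rep_name: name of the representative malicious code.
    scores: dict mapping existing group label to similarity with rep_name.
    groups: dict mapping group label to list of member names (mutated in place).
    Returns the label of the group that now contains rep_name.
    """
    best = max(scores, key=scores.get, default=None)
    if best is not None and scores[best] >= reference:
        # Similarity >= reference: treat the variant as derived from this group.
        groups[best].append(rep_name)
        return best
    # Similarity < reference for every group: treat it as new malicious code.
    index = len(groups)
    while f"new-{index}" in groups:
        index += 1
    new_label = f"new-{index}"
    groups[new_label] = [rep_name]
    return new_label
```

The updated `groups` mapping is the kind of labeled result that could later be fed back as training data.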

Once the origin of the incoming malicious codes has been determined from the relationships identified through the relationship identification unit 130, the group integration unit 141, and the group generation unit 142, the supervised learning unit 150 according to an embodiment of the present invention may train on the information about the merged first group and the created second group as training data for identifying variant malicious code that arrives later.

The supervised learning unit 150 of the present invention can update the learning data for identifying malicious code by retraining on the labeled malicious code information, including static features and dynamic features, obtained through the group integration unit 141 and the group generation unit 142.

FIG. 2 is a flowchart illustrating, in chronological order, a variant malicious code identification method according to an embodiment of the present invention.

Referring to FIG. 2, first, in step S200, the primary classification module 110 of the present invention performs static analysis on a plurality of incoming malicious codes. More specifically, the primary classification module 110 compares the incoming malicious codes with one another, calculates the similarity of at least two compared malicious codes, and, based on the calculated similarity, classifies the incoming malicious codes into existing malicious code and variant malicious code in step S210. Here, the primary classification module 110 may judge any incoming malicious code that cannot be identified by static analysis to be existing malicious code to which a mutation engine has been applied to make identification difficult, and may classify it as variant malicious code.
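One possible reading of this primary classification is sketched below in Python. The specification does not name the static similarity measure, so the Jaccard similarity over byte n-grams, the comparison against a set of known samples, the threshold of 0.6, and all function names are assumptions made for illustration only.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """Set of all byte n-grams in a sample (assumed static feature)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def primary_classify(samples: dict, known: dict, threshold: float = 0.6) -> dict:
    """Label each incoming sample as 'existing' or 'variant'.

    A sample whose best static similarity to any known sample falls below
    the threshold is treated as not identifiable by static analysis and is
    therefore labeled 'variant'.
    """
    known_grams = [ngrams(v) for v in known.values()]
    result = {}
    for name, data in samples.items():
        grams = ngrams(data)
        best = max((jaccard(grams, kg) for kg in known_grams), default=0.0)
        result[name] = "existing" if best >= threshold else "variant"
    return result
```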

Next, once the primary classification module 110 has finished primarily classifying all of the incoming malicious codes into existing malicious code and variant malicious code in steps S200 and S201, the secondary classification module 120 of the present invention, in step S220, labels the primarily classified existing malicious code by performing static analysis based on supervised learning. More specifically, the secondary classification module 120 identifies the primarily classified existing malicious code by using the static features of the learning data and a supervised learning algorithm.

Meanwhile, in step S230, the malicious codes primarily classified as variant malicious code by the primary classification module 110 undergo dynamic analysis through the secondary classification module 120. In step S231, the malicious codes classified as variant malicious code are dynamically analyzed using at least one virtual machine and are identified using unsupervised learning. That is, based on the analysis performed through the virtual machine, the primarily classified variant malicious codes are clustered by forming a preset number k of clusters according to a clustering algorithm (S232).
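The clustering step can be sketched as follows. This passage only states that a clustering algorithm forms a preset number k of clusters, so the use of k-means (Lloyd's algorithm), the numeric feature vectors standing in for dynamic features extracted in the virtual machine, and the choice of the first k points as initial centroids are all assumptions of this sketch.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster feature vectors into k groups with plain Lloyd iterations.

    Returns (labels, centroids). Initial centroids are simply the first k
    points, which keeps the sketch deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In practice the feature vectors might encode, for example, counts of observed API calls, but the specification leaves the concrete dynamic features open at this point.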

In step S233, the secondary classification module 120 of the present invention selects, from among the variant malicious codes clustered as described above, a representative variant malicious code for each cluster.
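Representative selection can be illustrated with the rule recited in the claims: compute the center point of a cluster and pick the member closest to it. The sketch below assumes Euclidean distance and numeric feature vectors; the function name and the dictionary layout are hypothetical.

```python
import math


def select_representative(cluster: dict) -> str:
    """Return the name of the member closest to the cluster centroid.

    `cluster` maps a sample name to its (assumed numeric) feature vector.
    """
    vectors = list(cluster.values())
    centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
    return min(cluster, key=lambda name: math.dist(cluster[name], centroid))
```

For a cluster `{"a": (0.0, 0.0), "b": (1.0, 1.0), "c": (5.0, 5.0)}` the centroid is (2, 2), so `"b"` is selected as the representative.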

Once the representative variant malicious code has been selected, the relationship identification unit 130 according to an embodiment of the present invention loads, using the learning data, the dynamic features of the existing malicious code secondarily classified by the secondary classification module, for use in relationship identification (S221).

Accordingly, in step S240, the relationship identification unit 130 of the present invention identifies the relationship between the loaded, primarily classified existing malicious code and the selected representative variant malicious code. That is, the relationship identification unit 130 uses the dynamic features of the primarily classified existing malicious code to identify the relationship between the existing malicious code and the representative (variant) malicious code. As the relationship identification method, the mutual relationship can be identified by calculating, according to a preset algorithm, the similarity between the existing malicious code secondarily classified by the secondary classification module 120 and the representative malicious code.
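The specification refers only to "a preset algorithm" for the similarity calculation. As one illustrative stand-in, the sketch below computes cosine similarity between two API-call frequency profiles; the choice of cosine similarity, the use of API-call counts as dynamic features, and the function name are assumptions, not the method disclosed here.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two call-frequency profiles (0.0 to 1.0)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty profile shares nothing with any other profile.
        return 0.0
    return dot / (norm_a * norm_b)
```

The resulting value would then be compared with the preset reference value to decide how the representative malicious code relates to the existing malicious code.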

In the relationship identification step S240, the calculated similarity is compared with the preset reference value. If the similarity is greater than or equal to the reference value, the group integration unit 141 of the present invention integrates the representative (variant) malicious code and the existing malicious code used for relationship identification into one group, the first group (S250). If the similarity is smaller than the reference value, the group generation unit 142 of the present invention determines that the representative (variant) malicious code is not derived from the existing malicious code and generates a second group, which is a new group for the representative malicious code (S251).

As described above, once the incoming malicious codes have been finally classified into groups through steps S250 and S251, the supervised learning unit 150, in step S260, learns the information of the first group and the second group, thereby updating the learning data so that it can be used to identify malicious code that arrives later.
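The update of the learning data can be sketched as adding the newly labeled group members to a training set and retraining. The specification does not fix the supervised learning algorithm, so the nearest-centroid classifier below, the `LearningData` class, and its method names are hypothetical placeholders chosen to keep the example self-contained.

```python
import math
from collections import defaultdict


class LearningData:
    """Hypothetical labeled training set with a nearest-centroid classifier."""

    def __init__(self):
        self.by_label = defaultdict(list)

    def add_group(self, label, feature_vectors):
        """Add the members of a first or second group under their label."""
        self.by_label[label].extend(feature_vectors)

    def centroids(self):
        """Recompute one centroid per label from the current data."""
        return {
            label: [sum(col) / len(vectors) for col in zip(*vectors)]
            for label, vectors in self.by_label.items()
        }

    def predict(self, vector):
        """Label a later-arriving sample by its nearest group centroid."""
        cents = self.centroids()
        return min(cents, key=lambda label: math.dist(vector, cents[label]))
```

After a new second group is added, samples close to it are labeled with the new group instead of being forced into an existing family, which is the effect the update step aims for.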

In addition, once steps S250 and S251 are complete, it is determined whether to continue the malicious code identification analysis. If the analysis is to continue, the process returns to step S200, in which static analysis is performed on subsequently incoming malicious code; if there is no incoming malicious code, the process ends.

Although all of the components constituting the embodiments of the present invention have been described above as being combined into one or operating in combination, the present invention is not necessarily limited to such embodiments. That is, within the scope of the object of the present invention, one or more of the components may be selectively combined and operated. In addition, although each of the components may be implemented as an independent piece of hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. Such a computer program may be stored in a computer-readable recording medium (Computer Readable Media) such as a USB memory, a CD, or a flash memory, and may be read and executed by a computer to implement the embodiments of the present invention. The recording medium of the computer program may include a magnetic recording medium, an optical recording medium, and the like.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications, changes, and substitutions without departing from the essential characteristics of the present invention. Accordingly, the embodiments disclosed herein and the accompanying drawings are intended to describe, not to limit, the technical idea of the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments or the accompanying drawings. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present invention.

Claims

A method for identifying variant malicious code performed by a variant malicious code identification device, the method comprising:
performing static analysis on each of a plurality of malicious codes to be identified, thereby primarily classifying each of the malicious codes as an existing malicious code or a variant malicious code;
when a malicious code is classified as the existing malicious code by the primary classification, secondarily classifying the existing malicious code based on learning data learned in advance for classification according to static features of the existing malicious code, and when a malicious code is classified as the variant malicious code by the primary classification, secondarily classifying the variant malicious code by clustering in consideration of dynamic features of the malicious codes classified as the variant malicious code; and
when at least one of the existing malicious code and the variant malicious code has been secondarily classified, calculating a similarity between the secondarily classified existing malicious code and the variant malicious code in consideration of their static features and dynamic features, and identifying a relationship between the secondarily classified existing malicious code and the variant malicious code by comparing the calculated similarity with a preset reference value.

The method according to claim 1,
further comprising, according to the relationship identification result, integrating the secondarily classified variant malicious code into a first group to which the secondarily classified existing malicious code belongs, or generating a new second group for the secondarily classified variant malicious code.

3. The method of claim 2,
further comprising updating the learning data by learning the integrated first group or the generated second group as learning data for identifying variant malicious code that arrives after the integration into the first group or the generation of the second group.

delete

The method of claim 2,
wherein the step of integrating into the first group or generating the second group comprises:
if the calculated similarity is greater than or equal to the reference value, integrating the secondarily classified existing malicious code and the variant malicious code into the first group; and
if the calculated similarity is smaller than the reference value, generating the second group for the variant malicious code.

The method according to claim 1,
wherein secondarily classifying the variant malicious code comprises:
generating at least one variant cluster by performing clustering on the malicious codes classified as the variant malicious code; and
selecting a representative malicious code that represents the at least one variant cluster.

8. The method of claim 7,
wherein selecting the representative malicious code comprises:
selecting a center point (centroid) of the at least one variant cluster;
calculating a distance between the center point of the generated variant cluster and at least one variant malicious code belonging to the variant cluster; and
selecting, according to the calculation result, the variant malicious code located closest to the center point as the representative malicious code.

The method according to claim 1,
wherein the primary classification comprises
analyzing the similarity among the plurality of malicious codes to be identified and classifying the malicious codes as existing malicious code or variant malicious code according to the analysis result.

A variant malicious code identification device comprising:
a primary classification module that performs static analysis on each of a plurality of malicious codes to be identified, thereby primarily classifying each of the malicious codes as an existing malicious code or a variant malicious code;
a secondary classification module that, when a malicious code is classified as the existing malicious code by the primary classification, secondarily classifies the existing malicious code based on learning data learned in advance for classification according to features of the existing malicious code, and, when a malicious code is classified as the variant malicious code by the primary classification, secondarily classifies the variant malicious code by clustering in consideration of features of the malicious codes classified as the variant malicious code; and
a relationship identification unit that, when at least one of the existing malicious code and the variant malicious code has been secondarily classified, calculates a similarity between the secondarily classified existing malicious code and the variant malicious code in consideration of their static features and dynamic features, and compares the calculated similarity with a preset reference value to identify a relationship between the secondarily classified existing malicious code and the variant malicious code.

11. The device of claim 10,
further comprising a group integration unit that, according to the relationship identification result, integrates the secondarily classified variant malicious code into a first group to which the secondarily classified existing malicious code belongs, and a group generation unit that generates a new second group for the secondarily classified variant malicious code.

12. The device of claim 11,
further comprising a supervised learning unit that learns the integrated first group or the generated second group as the learning data for identifying variant malicious code that arrives after the integration into the first group or the generation of the second group.

The device of claim 11,
wherein, if the calculated similarity is greater than or equal to the reference value, the group integration unit integrates the secondarily classified existing malicious code and the variant malicious code into the first group, and, if the calculated similarity is smaller than the reference value, the group generation unit generates the second group for the variant malicious code.

The device of claim 10, wherein the secondary classification module
generates at least one variant cluster by performing clustering on the malicious codes classified as the variant malicious code, and selects a representative malicious code that represents the at least one variant cluster.

A computer program stored on a computer readable medium for causing a computer to carry out a method of identifying variant malicious codes according to any one of claims 1 to 3 and 6 to 9.