KR20210011822A

KR20210011822A - Method of detecting abnormal log based on artificial intelligence and system implementing thereof

Info

Publication number: KR20210011822A
Application number: KR1020190089196A
Authority: KR
Inventors: 김혜란; 최영수; 안석준
Original assignee: 주식회사 엑셈
Priority date: 2019-07-23
Filing date: 2019-07-23
Publication date: 2021-02-02

Abstract

The present invention relates to a method for detecting abnormal logs based on artificial intelligence and a system implementing the same. A system for detecting abnormal logs based on artificial intelligence according to an embodiment of the present invention comprises: a log collector which collects log data generated during operation of the system; a data preprocessor which preprocesses and vectorizes the collected log data to generate structured log data; an AI engine which automatically labels data by clustering the structured log data and classifies the labeled data into normal log data and abnormal log data; a black filter which compares the normal log data with a blacklist and reclassifies some of the normal log data as abnormal log data; and a white filter which compares the abnormal log data with a whitelist and reclassifies some of the abnormal log data as normal log data. By detecting abnormal log data, the present invention can automatically identify error conditions of the system.

Description

Artificial intelligence-based abnormal log detection method and system implementing it {METHOD OF DETECTING ABNORMAL LOG BASED ON ARTIFICIAL INTELLIGENCE AND SYSTEM IMPLEMENTING THEREOF}

The present invention relates to a method of detecting abnormal logs based on artificial intelligence and a system implementing the same.

In a computer system or network system, data such as logs are generated in response to each operation. These data show which operations or which errors have occurred in the system.

However, these data are unstructured: the system outputs them as text composed of particular letters and numbers. Logs are therefore conventionally analyzed by a person checking them manually or extracting specific logs by keyword search.

When many kinds of logs are generated, or the number of logs is large, manual log analysis reaches its limits. Accordingly, a method and apparatus are described below for processing and classifying data such as logs so that only the important content is selected and handled.

An object of the present invention is to provide a system and method capable of automatically identifying an error condition of a system by detecting abnormal log data.

Another object of the present invention is to increase the anomaly detection efficiency of a system by automatically classifying the various types of log data generated in various systems.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned herein will be clearly understood by those skilled in the art from the following description.

A system for detecting abnormal logs based on artificial intelligence according to an embodiment of the present invention includes: a log collector that collects log data generated during operation of the system; a data preprocessor that preprocesses and vectorizes the collected log data to generate structured log data; an AI engine that clusters the structured log data to label the data automatically and classifies it into normal log data and abnormal log data; a black filter that compares the normal log data with a blacklist and reclassifies some of the normal log data as abnormal log data; and a white filter that compares the abnormal log data with a whitelist and reclassifies some of the abnormal log data as normal log data.

A method of detecting abnormal logs based on artificial intelligence according to an embodiment of the present invention includes: collecting, by a log collector, log data generated during operation of the system; preprocessing and vectorizing, by a data preprocessor, the collected log data to generate structured log data; clustering, by an AI engine, the structured log data to label the data automatically and classifying it into normal log data and abnormal log data; comparing, by a black filter, the normal log data with a blacklist and reclassifying some of the normal log data as abnormal log data; and comparing, by a white filter, the abnormal log data with a whitelist and reclassifying some of the abnormal log data as normal log data.

When the present invention is implemented, abnormal log data can be detected so that error conditions of the system are identified automatically.

When the present invention is implemented, the various types of log data generated in various systems can be classified automatically, increasing the anomaly detection efficiency of the system.

The effects provided by the present invention are not limited to those mentioned above, and other effects not mentioned herein will be clearly understood by those skilled in the art from the following description.

FIG. 1 shows the overall workflow according to an embodiment of the present invention.
FIG. 2 shows the configuration of a system according to an embodiment of the present invention.
FIG. 3 shows a training process according to an embodiment of the present invention.
FIG. 4 shows a process in which the system detects anomalies in log data according to an embodiment of the present invention.
FIG. 5 shows an analysis process according to an embodiment of the present invention.
FIG. 6 shows a process of processing log data according to an embodiment of the present invention.
FIG. 7 shows an example of updating a blacklist according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the invention pertains are fully informed of its scope; the invention is defined only by the scope of the claims. The same reference numerals refer to the same elements throughout the specification.

To describe the present invention clearly, parts unrelated to the description are omitted, and the same or similar components are given the same reference numerals throughout the specification. Some embodiments of the present invention are described in detail with reference to exemplary drawings. In assigning reference numerals to the elements of each drawing, the same elements are given the same numerals wherever possible, even when they appear in different drawings. In addition, where a detailed description of a related known configuration or function could obscure the subject matter of the present invention, that description may be omitted.

In describing the components of the present invention, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another and do not limit the nature, sequence, order, or number of the components. When a component is described as being "connected", "coupled", or "joined" to another component, it may be directly connected or joined to that component, but it should be understood that another component may be "interposed" between them, or that the components may be "connected", "coupled", or "joined" through other components.

The embodiments of the present invention are described with log data as the example of unstructured data. Log data is data generated to record an error, a change of state, or the result of an operation while a system is running. That is, information about the operations (processes) of the system or of its individual components, the occurrence of particular events, and the occurrence of errors are all recorded as log data, which the system operator can inspect.

This specification provides an artificial intelligence (AI)-based abnormal log detection system to solve the problem of having people monitor and evaluate log data, the vast amount of unstructured data that accumulates in a system. More specifically, to detect abnormal logs based on artificial intelligence, the system of this specification can cluster unstructured log data to select a blacklist.

When an embodiment of the present invention is applied, only the log lines that may point to a problem in the running system are selected and presented. This reduces the number of lines a person must analyze and allows suspicious system activity to be identified quickly from the logs.

In other words, the system promptly presents the logs that caused a problem or resulted from one, reducing both the effort required for log file analysis and the fatigue of the human analyst.

The log anomaly detection system is a general-purpose system that can be applied to logs generated by any system, such as a WAS (Web Application Server), a web application, or a database (DB). It can therefore detect anomalies by learning the raw log data as it is, with minimal data preprocessing and without domain knowledge of the particular system.

An anomalous or abnormal log is defined with reference to rare data sets that do not occur frequently. In one embodiment, it is a new or rare log that has not appeared in the past or has not occurred frequently in the past.

The process of detecting anomalies in unstructured data based on artificial intelligence is described below, with log data as the example of unstructured data.

FIG. 1 shows the overall workflow according to an embodiment of the present invention. Detecting anomalies based on artificial intelligence first requires a training process. To this end, the system collects log data (S1) and preprocesses it (S2) so that it is in a form suitable for training and anomaly detection.

The preprocessed log data is then learned using machine learning. A machine learning component, placed either within the log anomaly detection system or in a separate system, receives the log data and performs training (S3). Once sufficiently trained, the machine learning network can be deployed in the system.

The sufficiently trained system then analyzes subsequently input log data to detect anomalies (S4) and analyzes the detected anomalies (S5). The analysis includes checking whether an actual problem occurred for a log flagged as abnormal, and whether a problem occurred even though no log was flagged as abnormal. These checks can be performed automatically by the system; their results are fed back into the machine learning network, which can then perform retraining or supplementary training.

FIG. 2 shows the configuration of a system according to an embodiment of the present invention.

The log collector 200 collects the logs generated by multiple devices. When one system includes a plurality of computers, the log collector 200 can collect logs from all of them. When the system includes a single computer, it collects the logs generated on that computer.

The log collector 200 may be placed outside the system 100 or inside it.

Raw log data generated by the WAS, web applications, DB, and so on running on the system 100 or on the computers within it is collected by the log collector 200, which serves as the collection engine, and stored in the data lake 110.

The scheduler 120 runs at each training cycle, which is either set by the user or preset in the system, and fetches log data from the data lake 110 for training. In addition, when the system 100 performs real-time detection, the scheduler fetches the log data generated in real time every minute or at another specified interval and delivers it to the data queue 130.

At this time, the scheduler 120 may add to the data information indicating whether it is intended for training or for real-time detection (for example, a distinguishing ID) before delivering it to the data queue 130. The data queue 130 serves as a pipeline that carries both training data and real-time data.
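The tagging described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify the queue technology or the form of the distinguishing ID, so `LogBatch`, `enqueue`, `dispatch`, and the tag values `"train"` and `"detect"` are hypothetical names, and Python's in-process `queue.Queue` stands in for the data queue 130.

```python
import queue
from dataclasses import dataclass


@dataclass
class LogBatch:
    purpose: str       # hypothetical tag: "train" or "detect"
    lines: list[str]   # raw log lines fetched by the scheduler


def enqueue(data_queue: queue.Queue, lines: list[str], purpose: str) -> None:
    """Tag a batch with its purpose and put it on the shared queue."""
    if purpose not in ("train", "detect"):
        raise ValueError(f"unknown purpose: {purpose!r}")
    data_queue.put(LogBatch(purpose, lines))


def dispatch(data_queue: queue.Queue, on_train, on_detect) -> None:
    """Drain the queue, routing each batch to the handler named by its tag."""
    while True:
        try:
            batch = data_queue.get_nowait()
        except queue.Empty:
            return
        handler = on_train if batch.purpose == "train" else on_detect
        handler(batch.lines)
```

Carrying the tag with the batch lets a single queue serve both the training path and the real-time detection path, which matches the pipeline role the description gives the data queue.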

When log data for training arrives, the data preprocessor 145 fetches the data and performs preprocessing that vectorizes the log strings. For example, the data preprocessor 145 performs minimal preprocessing, such as removing time attributes like the field recording when the log was generated, and then vectorizes the log string. In other words, it converts unstructured log data into a structured form suitable for training or detection.
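A minimal sketch of this step, under assumptions the patent does not state: timestamps are assumed to look like `2019-07-23 10:15:02,123`, and a hashed bag-of-words stands in for the unspecified vectorization. The names `TIMESTAMP_RE`, `strip_timestamp`, and `vectorize`, and the dimension of 32, are hypothetical.

```python
import re
import zlib

# Assumed timestamp layout; the source does not specify a log format.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?")


def strip_timestamp(line: str) -> str:
    """Remove time attributes and collapse the remaining whitespace."""
    return " ".join(TIMESTAMP_RE.sub(" ", line).split())


def vectorize(line: str, dim: int = 32) -> list[float]:
    """Hashed bag-of-words: each token increments one of `dim` buckets."""
    vec = [0.0] * dim
    tokens = re.findall(r"[A-Za-z_]+|\d+", strip_timestamp(line).lower())
    for token in tokens:
        # crc32 is deterministic across runs, unlike the built-in str hash.
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec
```

With the time field removed, two occurrences of the same event at different times map to the same vector, which is what lets later clustering group them.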

When log data for training arrives, the training manager 141 fetches the data, has the data preprocessor 145 vectorize the log strings, and then labels the data automatically through the AI engine 150 using a machine learning clustering technique.

The training manager 141 takes the labeled data together with the previous training data stored in the data mart 190 and trains and generates a classification model by supervised learning. The automatically labeled data is stored in the data mart 190 so that it can be used in the next training cycle.

Because newly generated logs are related to recent logs, not all past data is used for training. Only data from the last few days, or at least the last few months, is used, and this window can be set automatically by the system or manually by a user or analyst.
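Restricting training to recent data can be sketched as a simple time-window filter. The record layout (a timestamp paired with a payload), the function name `recent_records`, and the 30-day default are assumptions for illustration; the patent leaves the window length to the system or the analyst.

```python
from datetime import datetime, timedelta


def recent_records(records, now: datetime, window: timedelta = timedelta(days=30)):
    """Keep only (timestamp, payload) records no older than `window` before `now`."""
    cutoff = now - window
    return [record for record in records if record[0] >= cutoff]
```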

When logs generated in real time arrive, the serving manager 143 has the data preprocessor 145 vectorize the log strings and then detects abnormal logs through the classification model of the AI engine 150. The threshold filter 163 filters out results whose prediction score is below the set threshold, and the white filter 165 removes logs that were classified as abnormal but are not actually abnormal. The abnormal logs that remain are stored in the data mart 190, and a notification is delivered to the user through the dashboard 180.

The AI engine 150 includes learning models and performs two functions: clustering and classification. In one embodiment, the clustering is machine learning clustering, which groups log data with similar patterns.

In this process, data that does not belong to any cluster, or that belongs to a rare cluster, is labeled "abnormal" and is then filtered against the whitelist by the white filter 165.

The remaining data is labeled "normal", and the "normal" data is filtered against the blacklist by the black filter 161.

All labeled data is stored in the data mart 190.

The classification model performs classification using deep learning. The AI engine 150 applies deep learning supervised classification: it combines the data labeled automatically through clustering with the previously labeled data stored in the data mart, then trains and generates a model for classifying abnormal logs. The generated model can be packaged as a software module and deployed on another system.

The black filter 161 extracts, from data classified as normal, logs that are problematic, that is, logs related to an abnormal state. The blacklist resulting from this extraction is either generated automatically by the system in advance or compiled into a database beforehand by a user or analyst, and lists the logs that must be detected. The black filter 161 filters the "normal" portion of the training data against the blacklist.

The black filter 161 can also update itself automatically.

The white filter 165 filters out whitelisted entries from the detected abnormal logs. The whitelist is generated automatically by the system or compiled into a database beforehand by an analyst or user, and lists logs that should not be judged abnormal. The white filter 165 can also update itself automatically.

The threshold filter 163 sets a threshold on the prediction score, which indicates how accurately the learning model made its prediction, and filters out logs at or below that score. Together with the white filter, it prevents excessive notifications caused by false positives.

The REST API 170 communicates with the dashboard 180 over a specific protocol, sending and receiving user requests and system-generated data. In one embodiment, the protocol is HTTP.

The configuration of FIG. 2 can be summarized as follows. The system of FIG. 2 may optionally include the log collector 200, which collects log data generated during operation of the system. The data preprocessor 145 preprocesses and vectorizes the collected log data to generate structured log data. The AI engine 150 clusters the structured log data to label the data automatically and classifies it into normal log data and abnormal log data.

The black filter 161 compares the normal log data with the blacklist and reclassifies some of the normal log data as abnormal log data, and the white filter 165 compares the abnormal log data with the whitelist and reclassifies some of the abnormal log data as normal log data.
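The two reclassification rules can be sketched as below. The patent does not say how a log is compared with a list entry, so plain substring matching is an assumption here, as are the function name `apply_filters` and the `(line, label)` record layout.

```python
def apply_filters(records, blacklist, whitelist):
    """Reclassify (line, label) pairs using a blacklist and a whitelist.

    Matching is by substring, which is an assumption of this sketch.
    """
    result = []
    for line, label in records:
        if label == "normal" and any(pattern in line for pattern in blacklist):
            label = "abnormal"   # black filter: must-detect log found among normal data
        elif label == "abnormal" and any(pattern in line for pattern in whitelist):
            label = "normal"     # white filter: known-benign log found among abnormal data
        result.append((line, label))
    return result
```

Because the two branches are exclusive, a log promoted to "abnormal" by the blacklist is not demoted again by the whitelist in the same pass; this precedence is a design choice of the sketch, not something the source specifies.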

FIG. 3 shows a training process according to an embodiment of the present invention. It presents steps S1, S2, and S3 of FIG. 1 in more detail.

First, the data collection step (S11). The system collects log data for training from raw data that has accumulated so far or is still accumulating (S11). This includes fetching training data from the data lake 110, where the source log data is stored.

That is, the log collector 200 collects logs in two ways: fetching log data at each preset training cycle, and fetching data in real time for real-time detection. Both can be carried out by the scheduler 120. Step S11 also applies to real-time log data collection in the later detection process.

Next, the data preprocessing step (S12). The data preprocessor 145 operates not only in the training process of FIG. 3 but also in the detection process described later. It takes the data delivered by the scheduler 120 through the data queue 130, performs minimal preprocessing, and vectorizes the log strings.

The training stage then begins. The AI engine 150 clusters the preprocessed log data using a machine learning clustering technique (S13).
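The patent does not name a clustering algorithm. Purely as an illustration of grouping log vectors with similar patterns, the sketch below uses a simple leader-style clustering with cosine similarity; this is a swapped-in technique, and `cosine`, `leader_cluster`, and the 0.8 similarity threshold are hypothetical.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_cluster(vectors, min_similarity: float = 0.8) -> list[int]:
    """Assign each vector to the first cluster whose leader is similar enough,
    otherwise start a new cluster with this vector as its leader."""
    leaders = []
    cluster_ids = []
    for vec in vectors:
        for cid, leader in enumerate(leaders):
            if cosine(vec, leader) >= min_similarity:
                cluster_ids.append(cid)
                break
        else:
            leaders.append(vec)
            cluster_ids.append(len(leaders) - 1)
    return cluster_ids
```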

The AI engine 150 labels log data that is an outlier or follows a rare pattern as "abnormal" (abnormal log data) (S14, S17). Log data that is neither an outlier nor a rare pattern is labeled "normal" (normal log data) (S14, S15).

Because the volume of logs is enormous, labels such as "normal" or "abnormal" are not added by hand one at a time; instead, the system labels the log data automatically through clustering.

In one embodiment, outliers and rare patterns are determined from the clustering as follows. If a log does not belong to any cluster, or if the density or share of its cluster is below a certain fraction (for example, 5%) of the total data, the AI engine 150 judges the log to be abnormal and labels it "abnormal". The remaining logs are labeled "normal".
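The labeling rule can be sketched as follows, using only the cluster's share of the total data (the density criterion mentioned above is not modeled). The input is assumed to be one cluster ID per log, with `-1` marking logs that belong to no cluster; that convention, the function name `auto_label`, and the parameter names are assumptions, while the 5% default comes from the example in the text.

```python
from collections import Counter


def auto_label(cluster_ids: list[int], min_share: float = 0.05, noise_id: int = -1) -> list[str]:
    """Label each log "abnormal" if it is unclustered or its cluster is rare."""
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    labels = []
    for cid in cluster_ids:
        is_rare = cid == noise_id or counts[cid] / total < min_share
        labels.append("abnormal" if is_rare else "normal")
    return labels
```

For 100 logs split 60/37/2 across three clusters plus one unclustered log, the two logs in the 2% cluster and the unclustered log are labeled "abnormal" and the rest "normal".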

Here, the system, more specifically the white filter 165, may add to the whitelist the log data that makes up the cluster with the highest density or share among all the structured log data. For example, if some log data is produced repeatedly and in large numbers across the structured log data, that log data is added to the whitelist.

When clusters are formed from the full set of log data, the system may judge a cluster to be dense if the similarity among the log data within it is high. Likewise, a cluster made up of the most frequently produced log data is likely to be unrelated to errors. The white filter can therefore add such log data to the whitelist.
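One way to picture this whitelist update is to take the cluster with the largest share and collect its log lines. The sketch below uses share only, not density, and assumes one cluster ID per line with `-1` for unclustered logs; `whitelist_candidates` is a hypothetical name.

```python
from collections import Counter


def whitelist_candidates(lines: list[str], cluster_ids: list[int], noise_id: int = -1) -> set[str]:
    """Return the distinct log lines of the most populous cluster."""
    counts = Counter(cid for cid in cluster_ids if cid != noise_id)
    if not counts:
        return set()
    top_cluster, _ = counts.most_common(1)[0]
    return {line for line, cid in zip(lines, cluster_ids) if cid == top_cluster}
```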

After S15, the system 100 applies the black filter 161 to the "normal" data to filter it against the blacklist (S16). The blacklist is a list of abnormal logs that must be detected.

After S17, the system 100 applies the white filter 165 to the "abnormal" data to filter it against the whitelist (S18). The whitelist is a list of logs that are not regarded as abnormal.

Steps S13 to S17 may be performed repeatedly for each item of log data.

Thereafter, the system 100 extracts previously stored labeled data from the data mart 190 and merges it with the new data (S19). The newly labeled data is also stored in the data mart 190 (S20).

The system 100 trains a classification model on the labeled data by supervised learning and generates the model (S21, S22). Because the labeled data is stored in the data mart 190, it can also be used in the next training run.
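The merge-then-train flow can be illustrated with a deliberately small stand-in. The patent specifies a deep learning classifier; the nearest-centroid model below is a swapped-in technique chosen only so the sketch stays self-contained, and `train_centroids`, `predict`, and the way the score is computed are all assumptions.

```python
import math


def train_centroids(vectors, labels):
    """Supervised training: average the vectors of each label into a centroid."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, vec):
    """Return (label, score); the score in (0, 1] is a softmax over negative distances."""
    dists = {label: math.dist(vec, centroid) for label, centroid in model.items()}
    nearest = min(dists.values())
    # Subtracting the smallest distance keeps the exponentials numerically stable.
    weights = {label: math.exp(-(d - nearest)) for label, d in dists.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())
```

In use, the previously stored labeled data and the newly labeled data would be concatenated before calling `train_centroids`, mirroring steps S19 to S22.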

도 4는 본 발명의 일 실시예에 의한 시스템이 로그 데이터에서 이상을 탐지하는 과정을 보여준다. 4 shows a process of detecting an abnormality in log data by the system according to an embodiment of the present invention.

탐지 단계에서도 전술한 실시간 로그 데이터 수집(S31)과 데이터 전처리 및 벡터화(S32)를 수행한다. 시스템(100)은 실시간으로 발생하는 로그 데이터를 전처리를 거친 후 학습 단계에서 생성된 분류 모델을 통해 이상 로그를 탐지한다(S33, S34).Also in the detection step, the above-described real-time log data collection (S31) and data preprocessing and vectorization (S32) are performed. The system 100 pre-processes log data generated in real time and then detects an abnormal log through the classification model generated in the learning step (S33, S34).

The detected abnormal logs pass through two stages of filtering. First, the system 100 applies the previously set prediction score threshold and filters out data whose prediction score is at or below the threshold (S35). The prediction score indicates how accurately the model predicted the abnormal log. It ranges from a minimum to a maximum value; in one embodiment the range may be set from 0 to 1.
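The threshold step (S35) can be illustrated with a minimal sketch. The `Detection` record, the `threshold_filter` function, the sample messages, and the threshold value 0.80 are all hypothetical and chosen only for illustration; the source states only that data at or below the threshold is filtered out and that scores may range from 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """A log flagged by the classifier (hypothetical record type)."""
    message: str
    score: float  # model prediction score, assumed to lie in [0, 1]


def threshold_filter(detections, threshold):
    """Drop detections whose score is at or below the threshold (S35)."""
    return [d for d in detections if d.score > threshold]


detections = [
    Detection("disk read error on /dev/sda", 0.97),
    Detection("user login succeeded", 0.41),
    Detection("connection reset by peer", 0.80),
]

# A score equal to the threshold is filtered out, matching "at or below".
kept = threshold_filter(detections, threshold=0.80)
print([d.message for d in kept])
```

Raising the threshold trades missed anomalies for fewer notifications, so the value would in practice be tuned per system.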

Next, the system 100 filters out of the detected data any data that is on the whitelist (S36). The whitelist is a list of logs that are not regarded as abnormal. The system 100 stores the abnormal log data that remains after all filtering in the data mart 190 (S37) and notifies the user through the dashboard 180 (S38).

For example, the data mart 190 may store the detection results finally output as abnormal log data after filtering, either in real time or at fixed time intervals.

Real-time detection is the process of determining whether log data generated in real time is abnormal or normal, using an AI engine that contains an already trained classification model.

Accordingly, the log collector 200 can collect log data from raw data accumulated in real time. In addition, the threshold filter 163 calculates a prediction score for the log data, which gives the likelihood that the classified log data is related to an abnormal state of the system.

That is, the classification model of the AI engine 150 receives standardized log data and outputs whether it is abnormal log data. The result filtered according to the prediction score calculated by the threshold filter 163 is then saved as the detection result. The detection result is stored in the data mart 190 so that monitoring staff can check it visually through the dashboard 180.

FIG. 5 shows an analysis process according to an embodiment of the present invention.

When an abnormal log is detected in the process of FIG. 4, a notification is raised through the dashboard 180 (S41), and the system analyzes the abnormal log (S42). That is, the system 100 can use the detected log to find the cause of the system problem and take action.

In addition, the system 100 can place detected logs that may safely be ignored on a whitelist, and logs that must be detected on a blacklist.

For example, if the abnormal logs include a white log (S43), it is added to the whitelist (S44). A white log is a log that was identified as abnormal but was produced at a time when no actual error or problem occurred in the system; the system 100 can judge such a log to be a white log and add it to the whitelist.

In addition, for a log produced when a problem or error occurred in the system, the system 100 judges the log to be a black log even if it was found not to be abnormal in FIG. 4 (S45). The system 100 then adds the black log to the blacklist, labels it "abnormal", and stores it in the data mart 190 (S46).

Through the process of FIG. 5, logs that indicate a problem can be identified more clearly, and as the analysis is repeated the logs added to the data mart 190 as blacklist entries become more precise, which can raise the accuracy of anomaly detection.

When the embodiments described above are applied, the system 100 can automatically label each item of log data for supervised learning without separate data about the domain of the system that generates the logs. While training the AI engine's model, large and varied volumes of logs can be quickly sorted into "normal" and "abnormal".

In addition, the system 100 stores the automatically labeled training data in the data mart 190 and accumulates it. As a result, the accuracy of anomaly detection can improve as the training data grows during deep learning.

In addition, by providing several filters (161, 163, 165), the system can add abnormal log data that would otherwise be missed and can prevent unnecessary notifications for normal log data.

FIG. 6 shows a process of processing log data according to an embodiment of the present invention. The raw log (51) refers to the log data generated by the system. It may be stored in a database or created as files. Log data may be fed into data preprocessing in real time (51b) or in batches (51a).

In one embodiment of the data preprocessing (S52), the data preprocessing unit 145 vectorizes the log data by embedding the words that make up the log data in a vector space. In one embodiment, the data preprocessing unit 145 may apply the Word2Vec algorithm.

The AI engine 150 clusters the log data vectorized in step S52 (S53). Clustering automates the task of labeling training data for supervised machine learning.

In one embodiment, the clustering of S53 uses the K-Means algorithm and groups similar log data by measuring the similarity between log data on the basis of NLP (Natural Language Processing). The similarity can be calculated from the Euclidean distance between the vectors of the log data.
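As a rough illustration of K-Means clustering with Euclidean distance, the following sketch clusters a handful of two-dimensional vectors in plain Python. It is not the patented implementation: the function names, the toy vectors, the choice of k, and the deterministic initialization (the first k vectors become the starting centroids) are assumptions made so that the example is small and reproducible. Real log vectors would come from the embedding step (S52) and have many more dimensions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(vectors, k, iterations=20):
    """Plain K-Means; returns (cluster index per vector, centroids)."""
    # Assumption for reproducibility: seed centroids with the first k vectors.
    centroids = [tuple(v) for v in vectors[:k]]
    assignment = []
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignment, centroids


# Toy stand-ins for vectorized log lines: two well-separated groups.
vectors = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (0.1, 0.2), (5.2, 4.9)]
labels, centers = kmeans(vectors, k=2)
print(labels)
```

Production code would normally use a library implementation with a better initialization scheme; the loop above only shows the assign-and-update cycle.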

In the clustering process, each group is called a cluster, and clusters can be divided into two kinds. The AI engine 150 regards a sparse cluster, that is, a group of logs that appears infrequently, as abnormal logs, and labels the log data in the sparse cluster "abnormal". The AI engine 150 labels the log data in all other clusters "normal".
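The automatic labeling rule can be sketched as follows. The source does not say how "sparse" is measured, so the `rare_ratio` cutoff (a cluster holding less than 10% of all logs counts as sparse) and the function name are assumptions made for illustration only.

```python
from collections import Counter


def label_by_cluster_rarity(cluster_ids, rare_ratio=0.1):
    """Label each log "abnormal" if its cluster is rare, else "normal".

    rare_ratio is a hypothetical cutoff: clusters whose share of all logs
    is below it are treated as sparse.
    """
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    rare = {c for c, n in counts.items() if n / total < rare_ratio}
    return ["abnormal" if c in rare else "normal" for c in cluster_ids]


# 20 logs: cluster 0 has 12, cluster 1 has 7, cluster 2 has a single log.
cluster_ids = [0] * 12 + [1] * 7 + [2] * 1
labels = label_by_cluster_rarity(cluster_ids)
print(Counter(labels))
```

Only the single log in cluster 2 (5% of the data) falls below the cutoff, so it alone is labeled "abnormal".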

When the pattern classification and automatic labeling of S53 are complete, machine learning is performed (S54). During this process, filtering may be performed temporarily (S55). As described above, the filtering may apply the black filter and the white filter automatically, or may apply only the black filter automatically.

For example, a log classified as "normal" is reclassified as "abnormal" if it is highly similar to a log on the predetermined blacklist (60). It is likewise reclassified as "abnormal" if an error occurred in the system at the time the log was generated.

Conversely, a log classified as "abnormal" is reclassified as "normal" if it is highly similar to a log on the predetermined whitelist. It is likewise reclassified as "normal" if the system was operating normally around the time the log was generated.

In the machine learning process, a DNN (deep neural network) is trained by supervised learning on training data with two classes, "abnormal" and "normal", produced through clustering.

When training is complete, detection (S56) is performed. Detection uses the log information generated and vectorized in S52: each log is fed into the network trained in S54, which outputs whether the log is abnormal or normal.

Scoring (S57) is performed in this process, and the model's prediction score is used for detection scoring. The threshold filter selects, from the logs detected by the detection model, those whose confidence score is higher than a preset threshold.

Meanwhile, the blacklist (60) may include "abnormal" logs supplied in advance for the system. Alternatively, among the logs output around the time a problem occurred during system operation, those that are not routinely output may be included in the blacklist.

Logs on the blacklist are reclassified as "abnormal" even if the clustering process classified them as "normal", which removes noise from the "normal" class.

The blacklist can be updated in real time. Logs produced around the time of an abnormal situation are likely to be problematic, so among them the logs that are produced only intermittently, rather than routinely, are automatically added to the blacklist.

The logs labeled "normal" in the clustering step are then compared for similarity against the blacklist, and any log that is identical or highly similar to a blacklisted log is set to "abnormal".

Meanwhile, the whitelist can also be generated automatically. For example, the system can hold information about logs that it outputs regularly and generates while in a normal state. Logs corresponding to normal operation are therefore added to the whitelist, and log data later detected as abnormal is compared with the whitelist; any log that is identical or highly similar to a whitelisted log is set to "normal".

The similarity criterion can be set differently for the blacklist and the whitelist. The blacklist reclassifies part of the "normal" data as "abnormal", and the similarity threshold used when applying it is called BLACK_PRO. That is, the black filter 161 can set BLACK_PRO as the similarity reference value for reclassification when comparing log data with the logs on the blacklist.

If the similarity calculated by comparing the log data with the logs on the blacklist is lower than BLACK_PRO, the black filter 161 determines that the log data differs from the blacklisted logs.

Similarly, the whitelist reclassifies part of the "abnormal" data as "normal", and the similarity threshold used when applying it is called WHITE_PRO. That is, the white filter 165 can set WHITE_PRO as the similarity reference value for reclassification when comparing log data with the logs on the whitelist.

If the similarity calculated by comparing the log data with the logs on the whitelist is lower than WHITE_PRO, the white filter 165 determines that the log data differs from the whitelisted logs.

The system can keep BLACK_PRO lower than WHITE_PRO. For the sake of system stability, this makes it easier to reclassify part of the "normal" data as "abnormal" (BLACK_PRO) and harder to reclassify part of the "abnormal" data as "normal" (WHITE_PRO), which maximizes the chance of responding when a problem occurs.

According to another embodiment, when too many logs are classified as "abnormal", BLACK_PRO may be kept higher than or equal to WHITE_PRO for the sake of system efficiency.
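The asymmetric thresholds can be illustrated with a small sketch. The source does not fix a similarity measure (it mentions sentence or key word comparison), so the Jaccard overlap of word sets used here, the threshold values 0.5 and 0.8, the helper names, and the sample log messages are all assumptions for illustration.

```python
BLACK_PRO = 0.5  # hypothetical value: lower, so "normal" -> "abnormal" is easy
WHITE_PRO = 0.8  # hypothetical value: higher, so "abnormal" -> "normal" is hard


def jaccard(a, b):
    """Word-set overlap in [0, 1]; an assumed stand-in for log similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def max_similarity(message, entries):
    """Highest similarity between the message and any list entry."""
    return max((jaccard(message, e) for e in entries), default=0.0)


def apply_filters(message, label, blacklist, whitelist):
    """Reclassify a labeled log using the black and white filters."""
    if label == "normal" and max_similarity(message, blacklist) >= BLACK_PRO:
        return "abnormal"
    if label == "abnormal" and max_similarity(message, whitelist) >= WHITE_PRO:
        return "normal"
    return label  # similarity below the threshold: treated as different


blacklist = ["out of memory killing process"]
whitelist = ["scheduled backup completed successfully"]

# Similarity 5/6 >= BLACK_PRO, so the "normal" label is overturned.
a = apply_filters("out of memory killing process 4312", "normal", blacklist, whitelist)
# Similarity 3/4 < WHITE_PRO, so the "abnormal" label is kept.
b = apply_filters("scheduled backup completed", "abnormal", blacklist, whitelist)
# Exact match (similarity 1.0), so the log is reclassified as "normal".
c = apply_filters("scheduled backup completed successfully", "abnormal", blacklist, whitelist)
print(a, b, c)
```

With BLACK_PRO below WHITE_PRO, a partial match is enough to escalate a log but not enough to dismiss one, which is the bias toward caution described above.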

The black filter 161 filters the "normal" data by applying the blacklist. The blacklist can be generated or extended automatically. In one embodiment, logs whose model prediction score in the scoring step is 0.999 or higher are saved to the blacklist (the threshold is configurable per system and can be set strictly, to a value close to 1). This blacklist is used to remove the noise contained in the "normal" class when labeling the training data.

Blacklist filtering proceeds as follows. The logs in the clusters classified as "normal" in the clustering step are checked for any that are on the blacklist. Similarity can be calculated by comparing sentences or key words. If a log is identical or highly similar to a blacklisted log, it is labeled as "abnormal" data.

The final scoring result is output through an interface on the system (S58), so that the user can check it or use it for monitoring.

FIG. 7 shows an example of updating the blacklist according to an embodiment of the present invention.

The arrow in FIG. 7 represents the passage of time as the system, or the devices that make up the system, operates. Suppose an abnormal situation occurs at time t1. In this case, logs generated within a certain period before or after t1 can become blacklist candidates.

In other words, log data is included in the blacklist on the basis of an abnormal situation of the system that occurred within a certain period before or after the time the log data was produced.

Accordingly, among the log data generated between t0 and t2, the log data that remains after excluding data clearly classified as normal is set as blacklist candidates. If an abnormal situation occurs again later, log data that recurs among the existing blacklist candidates can be added to the blacklist.

Accordingly, a blacklist update can add to the blacklist 1) log data generated during an abnormal situation that is not normal log data, and 2) log data produced repeatedly across abnormal situations.
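The two-step update around an incident can be sketched as follows. This is a simplified reading of FIG. 7 under stated assumptions: timestamps are plain numbers, the window is symmetric around t1 (so t0 = t1 - window and t2 = t1 + window), logs are matched by exact message text, and the function and variable names are hypothetical.

```python
def update_blacklist(logs, incident_time, window, known_normal,
                     candidates, blacklist):
    """Update candidates and blacklist in place for one incident.

    logs: iterable of (timestamp, message) pairs.
    """
    t0, t2 = incident_time - window, incident_time + window
    # Logs inside [t0, t2] that are not clearly normal are suspects.
    suspects = {
        msg for ts, msg in logs
        if t0 <= ts <= t2 and msg not in known_normal
    }
    # A suspect already seen around an earlier incident is promoted.
    blacklist |= suspects & candidates
    # All suspects become candidates for future incidents.
    candidates |= suspects


known_normal = {"heartbeat ok"}
candidates, blacklist = set(), set()

# First incident at t1 = 100; the log at t = 150 is outside the window.
logs_1 = [(95, "heartbeat ok"), (98, "replica lag exceeded"),
          (103, "retrying connection"), (150, "replica lag exceeded")]
update_blacklist(logs_1, 100, 10, known_normal, candidates, blacklist)
after_first = set(blacklist)

# Second incident at t1 = 500; "replica lag exceeded" recurs.
logs_2 = [(497, "replica lag exceeded"), (505, "cache miss")]
update_blacklist(logs_2, 500, 10, known_normal, candidates, blacklist)
print(sorted(blacklist))
```

Requiring recurrence across incidents before promotion keeps one-off coincidental logs in the candidate pool instead of on the blacklist.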

Furthermore, when the system, more specifically the black filter 161, judges whether an individual item of log data is abnormal, it can add that log data to the blacklist for the sake of system stability if the log data is an outlier and was produced at a time close to an abnormal situation of the system or device.

Although all the components of the embodiments of the present invention have been described as being combined into one or operating in combination, the present invention is not necessarily limited to these embodiments; within the scope of the object of the invention, the components may be selectively combined into one or more units and operate. Each component may be implemented as a piece of independent hardware, but some or all of the components may also be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. The code and code segments making up the computer program can readily be inferred by those skilled in the art. Such a computer program can implement the embodiments of the present invention by being stored on computer-readable media and read and executed by a computer. Storage media for the computer program include magnetic recording media, optical recording media, and semiconductor recording devices. A computer program implementing the embodiments of the present invention also includes program modules transmitted in real time through an external device.

The embodiments described above should be understood as illustrative in all respects and not restrictive, and the scope of the present invention is indicated by the claims below rather than by the foregoing detailed description. The meaning and scope of the claims, and all changes and modifications derived from their equivalents, should be construed as falling within the scope of the present invention.

100: system
110: data lake
120: scheduler
130: data queue
145: data preprocessing unit
150: AI engine
161: black filter
165: white filter

Claims

A system for detecting abnormal logs based on artificial intelligence, comprising:
a log collector that collects log data produced during operation of a system;
a data preprocessing unit that preprocesses and vectorizes the collected log data to generate standardized log data;
an AI engine that clusters the standardized log data to label it automatically and classifies it into normal log data and abnormal log data;
a black filter that compares the normal log data with a blacklist and reclassifies some of the normal log data as abnormal log data; and
a white filter that compares the abnormal log data with a whitelist and reclassifies some of the abnormal log data as normal log data.

The system of claim 1,
wherein the log collector collects log data from accumulated raw data at each preset learning period, and
the AI engine labels outliers in the standardized log data as abnormal log data, labels the remaining log data as normal log data, and then trains a classification model by supervised learning using the labeled data.

The system of claim 2,
wherein the white filter adds to the whitelist the log data constituting the cluster with the highest density or composition ratio among the standardized log data.

The system of claim 1,
wherein the black filter includes the log data in the blacklist on the basis of an abnormal situation of the system that occurred within a predetermined time before or after the time at which the log data was produced.

The system of claim 1,
wherein the log collector collects log data from raw data accumulated in real time,
the system further comprises a threshold filter that calculates a prediction score for the log data,
the classification model of the AI engine receives the standardized log data and outputs whether it is abnormal log data, and
the result filtered according to the prediction score calculated by the threshold filter is stored as a detection result.

The system of claim 1,
wherein the black filter sets BLACK_PRO as the similarity reference value for reclassification when comparing the log data with the logs on the blacklist,
the white filter sets WHITE_PRO as the similarity reference value for reclassification when comparing the log data with the logs on the whitelist, and
the system sets the value of BLACK_PRO lower than the value of WHITE_PRO.

A method of detecting abnormal logs based on artificial intelligence, comprising:
collecting, by a log collector, log data produced during operation of a system;
generating, by a data preprocessing unit, standardized log data by preprocessing and vectorizing the collected log data;
clustering, by an AI engine, the standardized log data to label it automatically and classify it into normal log data and abnormal log data;
reclassifying, by a black filter, some of the normal log data as abnormal log data by comparing the normal log data with a blacklist; and
reclassifying, by a white filter, some of the abnormal log data as normal log data by comparing the abnormal log data with a whitelist.

The method of claim 7, further comprising:
collecting, by the log collector, log data from accumulated raw data at each preset learning period; and
labeling, by the AI engine, outliers in the standardized log data as abnormal log data and the remaining log data as normal log data, and training a classification model by supervised learning using the labeled data.

The method of claim 8,
further comprising adding, by the white filter, the log data constituting the cluster with the highest density or composition ratio among the standardized log data to the whitelist.

The method of claim 7,
further comprising including, by the black filter, the log data in the blacklist on the basis of an abnormal situation of the system that occurred within a predetermined time before or after the time at which the log data was produced.

The method of claim 7, further comprising:
collecting, by the log collector, log data from raw data accumulated in real time;
calculating, by a threshold filter, a prediction score for the log data;
receiving, by the classification model of the AI engine, the standardized log data and outputting whether it is abnormal log data; and
storing, by a data mart, the result filtered according to the prediction score calculated by the threshold filter as a detection result.

The method of claim 7, further comprising:
setting, by the black filter, BLACK_PRO as the similarity reference value for reclassification when comparing the log data with the logs on the blacklist;
setting, by the white filter, WHITE_PRO as the similarity reference value for reclassification when comparing the log data with the logs on the whitelist; and
setting, by the system, the value of BLACK_PRO lower than the value of WHITE_PRO.