KR101772955B1

KR101772955B1 - Record processing method using index data structure in distributed processing system based on mapreduce

Info

Publication number: KR101772955B1
Application number: KR1020160087887A
Authority: KR
Inventors: 김진수; 이영식
Original assignee: 성균관대학교산학협력단
Priority date: 2016-07-12
Filing date: 2016-07-12
Publication date: 2017-08-31
Anticipated expiration: 2036-07-12

Abstract

맵리듀스 기반의 분산 처리 시스템에서 인덱스를 이용하여 레코드를 처리하는 방법은 분산 처리 시스템의 분산 노드가 입력 파일에서 분석 대상인 레코드들을 분류하는 단계, 상기 분산 노드가 각 레코드의 키(key) 및 저장 위치를 지시하는 복수의 인덱스를 갖는 자료 구조를 생성하는 단계, 상기 분산 노드가 상기 자료 구조에서 상기 키의 순서대로 상기 복수의 인덱스에 접근하면서 레코드의 키, 레코드의 저장 위치 및 레코드가 소속되었던 자료구조의 식별자를 지시하는 새로운 인덱스를 갖는 새로운 인덱스 자료 구조를 생성하는 단계, 상기 분산 노드가 새로운 자료구조에서 키의 순서대로 인덱스에 접근하면서 동일한 키 값을 갖는 인덱스가 지시하는 레코드의 저장 위치 및 자료 구조의 식별자를 기준으로 저장된 레코드에 리듀스 함수를 적용하는 단계를 포함한다.A method of processing a record using an index in a distributed processing system based on MapReduce includes the steps of classifying records to be analyzed in an input file by a distributed node of the distributed processing system, Generating a data structure having a plurality of indexes indicating a plurality of indexes in the data structure, accessing the plurality of indexes in the order of the keys in the data structure, Generating a new index data structure having a new index indicating an identifier of the new data structure, storing the new data structure in the new data structure, To a stored record based on the identifier of And a step.

Description

TECHNICAL FIELD [0001] The present invention relates to a method of processing records using indexes in a distributed processing system based on MapReduce,

이하 설명하는 기술은 맵리듀스 프레임워크에서 레코드를 처리하는 기법에 관한 것이다.The techniques described below relate to techniques for processing records in the MapReduce framework.

2004년 OSDI 컨퍼런스에서 구글은 "MapReduce: Simplified Data Processing on Large Cluster"란 논문을 발표한다. 이후 맵리듀스(MapReduce)에 기반한 다양한 시스템이 개발되었다. 특히 빅데이터 처리를 위한 분산 처리 플랫폼으로 아파치 재단의 하둡 (Apache Hadoop; 이하 하둡)이 주목받고 있다. At the OSDI conference in 2004, Google will publish a paper called "MapReduce: Simplified Data Processing on Large Clusters." Various systems based on MapReduce have been developed since then. The Apache Foundation's Apache Hadoop (Hadoop) is getting attention as a distributed processing platform for big data processing.

다만 종래 맵리듀스 프레임워크는 맵(Map) 함수와 리듀스(Reduce) 함수에서 병합정렬을 사용한다. 알려진 바와 같이 병합정렬은 물리적인 저장 장치를 활용한 기법으로 필연적으로 저장 장치에 여러 번 접근을 해야하는 단점이 있다.However, the conventional MapReduce framework uses merge sorting in Map functions and Reduce functions. As is known, merge sorting is a technique utilizing physical storage devices, which inevitably requires access to the storage device several times.

이하 설명하는 기술은 인덱스 기반의 자료 구조를 이용하는 맵리듀스 프레임워크를 제공하고자 한다.The technique described below is intended to provide a MapReduce framework using an index-based data structure.

맵리듀스 기반의 분산 처리 시스템에서 인덱스를 이용하여 레코드를 처리하는 방법은 분산 처리 시스템의 분산 노드가 입력 파일에서 분석 대상인 레코드들을 분류하는 단계, 상기 분산 노드가 각 레코드의 키(key) 및 저장 위치를 지시하는 인덱스를 갖는 자료 구조를 생성하는 단계, 상기 분산 노드가 상기 자료 구조에서 상기 키의 순서대로 상기 인덱스에 접근하고, 접근한 인덱스가 지시하는 저장 위치에 저장된 레코드들을 전송하는 단계 및 상기 분산 노드가 상기 전송된 레코드들에 대한 리듀스(reduce) 작업을 수행하는 단계를 포함한다.A method of processing a record using an index in a distributed processing system based on MapReduce includes the steps of classifying records to be analyzed in an input file by a distributed node of the distributed processing system, The distributed node accessing the index in the order of the keys in the data structure and transmitting the records stored in the storage location indicated by the accessed index, And the node performing a reduce operation on the transmitted records.

이하 설명하는 기술은 인덱스 기반의 자료 구조를 사용하여 저장 장치에 대한 접근 횟수를 줄인다. 따라서 이하 설명하는 기술은 맵리듀스 프레임워크를 이용한 빅데이터 응용의 수행 시간을 단축할 수 있다. The techniques described below use index based data structures to reduce the number of accesses to storage devices. Therefore, the technique described below can shorten the execution time of the big data application using the MapReduce framework.

도 1은 하둡 프레임워크에 대한 구성을 도시한 예이다.
도 2는 맵 리듀스에 기반한 분산 처리 시스템에서 레코드 처리 과정을 도시한 예이다.
도 3은 맵 리듀스에 기반한 분산 처리 시스템에서 인덱스를 이용하여 레코드를 처리하는 과정에 대한 예이다.
도 4는 맵 리듀스에 기반한 분산 처리 시스템에서 인덱스를 이용하여 레코드를 처리하는 과정에 대한 다른 예이다.1 shows an example of a configuration of a Hadoop framework.
2 shows an example of a record processing process in a distributed processing system based on map reduction.
3 is an example of a process of processing a record using an index in a distributed processing system based on map reduction.
4 is another example of a process of processing a record using an index in a distributed processing system based on map reduction.

이하 설명하는 기술은 다양한 변경을 가할 수 있고 여러 가지 실시례를 가질 수 있는 바, 특정 실시례들을 도면에 예시하고 상세하게 설명하고자 한다. 그러나, 이는 이하 설명하는 기술을 특정한 실시 형태에 대해 한정하려는 것이 아니며, 이하 설명하는 기술의 사상 및 기술 범위에 포함되는 모든 변경, 균등물 내지 대체물을 포함하는 것으로 이해되어야 한다.The following description is intended to illustrate and describe specific embodiments in the drawings, since various changes may be made and the embodiments may have various embodiments. However, it should be understood that the following description does not limit the specific embodiments, but includes all changes, equivalents, and alternatives falling within the spirit and scope of the following description.

제1, 제2, A, B 등의 용어는 다양한 구성요소들을 설명하는데 사용될 수 있지만, 해당 구성요소들은 상기 용어들에 의해 한정되지는 않으며, 단지 하나의 구성요소를 다른 구성요소로부터 구별하는 목적으로만 사용된다. 예를 들어, 이하 설명하는 기술의 권리 범위를 벗어나지 않으면서 제1 구성요소는 제2 구성요소로 명명될 수 있고, 유사하게 제2 구성요소도 제1 구성요소로 명명될 수 있다. 및/또는 이라는 용어는 복수의 관련된 기재된 항목들의 조합 또는 복수의 관련된 기재된 항목들 중의 어느 항목을 포함한다.The terms first, second, A, B, etc., may be used to describe various components, but the components are not limited by the terms, but may be used to distinguish one component from another . For example, without departing from the scope of the following description, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component. And / or < / RTI > includes any combination of a plurality of related listed items or any of a plurality of related listed items.

본 명세서에서 사용되는 용어에서 단수의 표현은 문맥상 명백하게 다르게 해석되지 않는 한 복수의 표현을 포함하는 것으로 이해되어야 하고, "포함한다" 등의 용어는 설시된 특징, 개수, 단계, 동작, 구성요소, 부분품 또는 이들을 조합한 것이 존재함을 의미하는 것이지, 하나 또는 그 이상의 다른 특징들이나 개수, 단계 동작 구성요소, 부분품 또는 이들을 조합한 것들의 존재 또는 부가 가능성을 배제하지 않는 것으로 이해되어야 한다.As used herein, the singular " include "should be understood to include a plurality of representations unless the context clearly dictates otherwise, and the terms" comprises & , Parts or combinations thereof, and does not preclude the presence or addition of one or more other features, integers, steps, components, components, or combinations thereof.

도면에 대한 상세한 설명을 하기에 앞서, 본 명세서에서의 구성부들에 대한 구분은 각 구성부가 담당하는 주기능 별로 구분한 것에 불과함을 명확히 하고자 한다. 즉, 이하에서 설명할 2개 이상의 구성부가 하나의 구성부로 합쳐지거나 또는 하나의 구성부가 보다 세분화된 기능별로 2개 이상으로 분화되어 구비될 수도 있다. 그리고 이하에서 설명할 구성부 각각은 자신이 담당하는 주기능 이외에도 다른 구성부가 담당하는 기능 중 일부 또는 전부의 기능을 추가적으로 수행할 수도 있으며, 구성부 각각이 담당하는 주기능 중 일부 기능이 다른 구성부에 의해 전담되어 수행될 수도 있음은 물론이다.Before describing the drawings in detail, it is to be clarified that the division of constituent parts in this specification is merely a division by main functions of each constituent part. That is, two or more constituent parts to be described below may be combined into one constituent part, or one constituent part may be divided into two or more functions according to functions that are more subdivided. In addition, each of the constituent units described below may additionally perform some or all of the functions of other constituent units in addition to the main functions of the constituent units themselves, and that some of the main functions, And may be carried out in a dedicated manner.

또, 방법 또는 동작 방법을 수행함에 있어서, 상기 방법을 이루는 각 과정들은 문맥상 명백하게 특정 순서를 기재하지 않은 이상 명기된 순서와 다르게 일어날 수 있다. 즉, 각 과정들은 명기된 순서와 동일하게 일어날 수도 있고 실질적으로 동시에 수행될 수도 있으며 반대의 순서대로 수행될 수도 있다.Also, in performing a method or an operation method, each of the processes constituting the above method may occur in a different order than that described in the context without explicitly specifying a specific order in the context. That is, each process may occur in the same order as described, may be performed substantially concurrently, or may be performed in the opposite order.

이하 설명하는 기술은 맵리듀스(MapReduce) 프레임워크를 사용하는 분산 처리 시스템에서 입력받은 레코드(데이터)를 처리하는 방법에 관한 것이다. 맵리듀스에 기반한 대표적인 분산 처리 시스템은 하둡(Hadoop)이 있다. 하둡은 빅데이터 처리에 매우 효율적인 시스템 아키텍처를 제공한다. 이하 설명하는 기술은 하둡에만 적용되는 것은 아니지만, 설명의 편의를 위해 하둡을 기준으로 설명하고자 한다.The technique described below relates to a method for processing records (data) input in a distributed processing system using the MapReduce framework. A typical distributed processing system based on MapReduce is Hadoop. Hadoop provides a very efficient system architecture for big data processing. Although the technique described below is not applied only to Hadoop, it will be described on the basis of Hadoop for convenience of explanation.

도 1은 하둡 프레임워크에 대한 구성을 도시한 예이다. 하둡 프레임워크는 데이터를 저장하는 분산파일시스템인 HDFS(Hadoop Distributed File System, 하둡 분산 파일 시스템)(101)과 데이터 분석을 수행하는 MapReduce 프레임워크(102)로 구성되어 있다.1 shows an example of a configuration of a Hadoop framework. The Hadoop framework comprises a Hadoop Distributed File System (HDFS) 101, which is a distributed file system for storing data, and a MapReduce framework 102, which performs data analysis.

HDSF(101)는 일반적인 분산 파일 시스템과 같이 마스터 노드(110)와 슬레이브 노드(120)로 구성된다. The HDSF 101 is composed of a master node 110 and a slave node 120 like a general distributed file system.

마스터 노드(110)는 네임노드(name node)라 불리며, 데이터노드(data node)로 불리는 슬레이브 노드(120)의 동작 상태를 실시간으로 관리하고 최대 수천 대의 데이터노드에 분산 저장되어 있는 데이터에 대한 메타데이터를 관리한다. The master node 110 is called a name node and manages the operation state of the slave node 120 called a data node in real time and stores the meta data Manage data.

데이터는 여러 개의 블록으로 나누어지고, 복수의 슬레이브 노드(120)가 하나의 블록을 복제하여 보관하는 방식으로 분산 저장된다. 마스터 노드(110)는 메타데이터를 이용하여 특정 블록이 어떤 슬레이브 노드(120)에 저장되어 있는지 여부를 알 수 있다.The data is divided into a plurality of blocks, and a plurality of slave nodes 120 are distributedly stored in such a manner that one block is duplicated and stored. The master node 110 can know whether a specific block is stored in which slave node 120 using the metadata.

HDFS(101)는 수십 PB에 이르는 대용량 데이터를 수천 대의 서버를 이용하여 빠르게 처리할 수 있도록 설계되었다. 다른 분산 파일 시스템에 비해 HDFS(101)는 메타데이터에 접근하거나 데이터를 변경하는 작업의 대기시간을 희생하는 대신 데이터를 읽어오는 작업의 처리량 (throughput)을 높여 큰 데이터를 한번에 빠르게 가져올 수 있도록 설계되었다. 예컨대, HDFS(101)에서 한 블록의 크기는 기본적으로 64MB로 설정되어 있다. 이는 일반적인 파일 시스템의 블록 사이즈인 수십 KB와는 큰 차이를 보인다. 하둡은 수집된 데이터의 배치 처리에 최적화된 플랫폼이다. HDFS(101)는 한 번 쓰기 완료된 데이터는 수정이 불가능하고 오직 이어쓰기만이 가능하도록 설계되었다. 이처럼 데이터 저장 방식이 간단해지면 전체 시스템의 관리가 간편해져 수천 대의 서버로 구성된 클러스터도 무리 없는 운영이 가능해진다. 또한 후술할 맵리듀스의 구현도 간단해진다.The HDFS 101 is designed to process large volumes of data, ranging from tens of PBs, to thousands of servers. Compared to other distributed file systems, HDFS (101) is designed to speed up the throughput of data read operations at a time, instead of sacrificing latency in accessing or changing metadata . For example, the size of one block in the HDFS 101 is basically set to 64 MB. This is significantly different from the typical block size of a file system, which is several tens of KB. Hadoop is a platform optimized for batch processing of collected data. The HDFS (101) is designed so that once written data can not be modified and only write-once is possible. The simplicity of data storage simplifies the management of the entire system, enabling clusters of thousands of servers to run smoothly. The implementation of the MapReduce described below is also simplified.

맵리듀스(MapReduce,102)는 HDFS(101) 상에서 동작하는 데이터 분석 프레임워크이다. 맵리듀스는 일반 프로그래밍방법과는 다른 데이터 중심 프로그래밍 모형을 제공한다. 일반적인 분산 환경에서의 프로그래밍은 대개의 프로그래머가 익숙한, 단일 서버에서의 프로그래밍과 달리 분산된 작업의 스케줄링이나 일부 서버의 고장, 서버 간 네트워크 구성 등 많은 문제를 고려해야한다. 맵리듀스에서는 이런 복잡한 문제들이 플랫폼 차원에서 단순화되어 프로그래머는 데이터의 배치 처리를 위한 맵 (mapper)과 리듀스 (reducer) 함수만을 작성하면 되도록 구현되어 있다. 맵리듀스 프레임워크는 맵 합수를 이용한 맵(Map) 단계와 리듀스 함수를 이용한 리듀스(Reduce) 단계를 통해 데이터를 처리한다. MapReduce 102 is a data analysis framework that operates on the HDFS 101. MapReduce provides a data-driven programming model that differs from traditional programming methods. Programming in a typical distributed environment requires many considerations, such as scheduling of distributed work, breakdown of some servers, and network configuration between servers, unlike programming on a single server, which most programmers are familiar with. In MapReduce, these complex problems are simplified at the platform level, so programmers are only required to write mapper and reducer functions for batch processing of data. The MapReduce framework processes data through the Map step using the map sum and the Reduce step using the Reduce function.

마스터 노드(110)에서 잡 트래커(Job Tracker)는 슬레이브 노드(120)의 태스크 트래커(Task Tracker)가 수행할 태스크의 스케줄링을 수행한다. 슬레이브 노드(120)의 태스크 트래커는 일정한 태스크를 수행하고, 경과를 잡 트래커에 보고한다. 태스크 트래커가 수행하는 태스크가 맵 단계 및 리듀스 단계에 해당한다.In the master node 110, a job tracker performs scheduling of a task to be performed by a task tracker of the slave node 120. The task tracker of the slave node 120 performs a certain task and reports the progress to the job tracker. The tasks performed by the task tracker correspond to the map step and the reduce step.

맵리듀스 프레임워크에서는 데이터를 표현하기 위해 키(key)와 값(value)으로 이루어진 레코드(record)를 사용한다. 맵 단계는 데이터를 키와 값을 갖는 레코드 형태로 연관성 있는 데이터를 분류하는 과정이다. 리듀스 단계는 같은 키를 가지는 레코드를 대상으로 리듀스 함수를 적용하여 최종 결과를 가지는 키와 값을 생성한다. 리듀스 단계는 일반적으로 중복된 데이터를 제거하고, 원하는 데이터 내지 정보를 추출한다.The MapReduce framework uses a record consisting of a key and a value to represent data. A map step is a process of sorting data that is related to data in the form of records having keys and values. The Reduce step applies the Reduce function to a record with the same key to generate a key and value with the final result. The Reduce step generally removes redundant data and extracts the desired data or information.

맵 함수는 HDFS(101)에서 불러온 데이터를 가공하여 새로운 <키, 값> 집합을 출력한다. 맵리듀스 시스템에서는 같은 키를 갖는 값들을 묶어 <키, (값1, 값2,...)> 식의 새로운 <키, 값> 쌍의 집합을 만든다. 리듀스 함수는 여기에 집계 연산을 수행하여 또 다른 <키, 값> 쌍의 집합을 생성하고 이를 HDFS에 저장한다.The map function processes the data loaded from the HDFS 101 and outputs a new set of < key, value >. In the MapReduce system, we create a set of new <key, value> pairs of <key, (value 1, value 2, ...)> expressions by grouping values with the same key. The Reduce function performs an aggregation operation on it to create another set of <key, value> pairs and stores them in HDFS.

하둡은 기본적으로 여러 대의 서버로 구성되어 있는 완전 분산 모드 (fully distributed mode)에서 작동한다. 나아가 하둡은 모든 프로세스가 한 대의 서버에서 동작하는 의사 분산 모드 (pseudo-distributed mode)로 동작할 수도 있다. 후자의 경우 물리적으로는 하나의 서버에서 모든 과정이 수행된다.Hadoop basically operates in fully distributed mode, which consists of several servers. Furthermore, Hadoop can operate in pseudo-distributed mode, where all processes run on a single server. In the latter case, the entire process is performed physically on one server.

도 2는 맵리듀스에 기반한 분산 처리 시스템에서 레코드 처리 과정을 도시한 예이다. 도 2는 종래 맵리듀스 프레임워크에서 맵 단계와 리듀스 단계를 수행하는 과정이다. 도 2는 하둡 프레임워크를 예로 도시한다. 하둡 프레임워크에서 맵리듀스를 수행하는 장치는 각 슬레이브 노드(120)에 해당한다. 슬레이브 노드(120)는 일정한 데이터를 처리하고, 데이터를 통신망을 통해 송수신할 수 있는 컴퓨터 장치(서버 등)를 말한다. 이하 컴퓨터 장치가 맵리듀스를 수행한다고 설명한다. 이하 설명에서 (a) 내지 (e)로 표시하는 부분은 도면에서 동일한 부호로 표시된 과정을 의미한다. FIG. 2 shows an example of a record processing process in a distributed processing system based on MapReduce. 2 is a flowchart illustrating a process of performing a map step and a re-de-step step in a conventional maple deuce framework. 2 shows an example of the Hadoop framework. In Hadoop framework, a device that performs map de-miss corresponds to each slave node 120. The slave node 120 refers to a computer device (server or the like) that processes certain data and can transmit and receive data through a communication network. Hereinafter, it will be explained that the computer apparatus performs MapReduce. In the following description, parts denoted by (a) to (e) are denoted by the same reference numerals in the drawings.

먼저 컴퓨터 장치는 하둡 파일 시스템(HDFS)으로부터 입력 데이터를 받는다. 컴퓨터 장치는 입력 데이터에 맵 함수를 적용하여 버퍼에 저장한다. 이때 리듀스 작업이 여러 개라면 파티션을 나누어서 저장한다. 도 2는 2개의 리듀스 작업이 있는 경우를 가정한 것이다. 버퍼의 크기가 일정 크기를 넘어서면, 컴퓨터 장치는 해당 버퍼를 퀵 정렬(quick sort)을 이용하여 정렬하고 로컬 저장 장치에 저장하고 버퍼를 비운다(a 과정). 컴퓨터 장치가 로컬 저장 장치에 저장하는 파일을 스필 파일(spill files)이라고 한다. 컴퓨터 장치는 맵 작업에 할당된 입력데이터를 모두 처리할 때까지 스필 파일을 로컬 저장 장치에 저장한다.First, the computer device receives input data from the Hadoop file system (HDFS). The computer device applies the map function to the input data and stores it in a buffer. If there are multiple redo operations, the partition is saved. Fig. 2 assumes that there are two reduction tasks. When the size of the buffer exceeds a predetermined size, the computer device arranges the buffer using a quick sort, stores the buffer in a local storage device, and empties the buffer (step a). The files that the computer device stores on the local storage device are called spill files. The computer device stores the spill file on the local storage device until it has processed all of the input data assigned to the map job.

여러 개의 조각(spill) 파일이 생성된 경우 병합 정렬을 이용하여 하나의 파일로 만든다(b 과정). 병합 정렬된 하나의 파일이 맵 과정의 출력 자료(Map output)에 해당한다. 이 때 생성된 파일은 리듀스 작업의 개수만큼 파티션으로 나누어져 있다. If multiple spill files are created, they are merged into a single file (step b). The merged and sorted file corresponds to the map output of the map process. The generated files are partitioned by the number of redo operations.

컴퓨터 장치는 맵의 출력 파일(병합 정렬된 레코드)을 리듀스 작업에 전달하기 위해 셔플 파일(shuffle files)을 생성한다(c 과정). 이 과정을 셔플링(shuffling)이라고 한다. 복수의 리듀스 작업이 있는 경우 파티션 별로 해당되는 리듀스 작업에 데이터를 보낸다. 셔플링은 맵 합수의 결과를 취합하여 리듀스 함수로 전달하는 역할을 한다. The computer device generates shuffle files to pass the map's output file (merge-aligned records) to the redess task (step c). This process is called shuffling. If there are multiple redo operations, the data is sent to the corresponding redo operations for each partition. Shuffling collects the result of the map sum and transfers it to the reduction function.

컴퓨터 장치는 여러 개의 맵 작업으로부터 데이터를 받아서 다시 병합 정렬을 수행한다(d 과정). 병합 정렬하여 생성된 파일(Reduce input으로 표시)은 리듀스 작업에 사용된다.The computer device receives data from multiple map operations and performs merge sorting again (step d). Files created by merge sorting (denoted by Reduce input) are used for redescing operations.

컴퓨터 장치는 d 과정에서 생성된 파일에 리듀스 함수를 적용하여 최종 결과를 작성한 뒤, HDFS에 그 결과를 저장한다(e 과정).The computer device applies the Reduce function to the file generated in step d, creates the final result, and stores the result in the HDFS (step e).

맵 출력 자료(Map output)을 생성하는 과정과 리듀스를 위한 입력 자료(Reduce input)를 생성하는 과정에서 외부 병합 정렬을 수행한다. 이 과정에서 여러 번의 READ/WRITE가 발생하게 된다. 스필 파일을 작성할 때 WRITE가 발생하고, 맵 출력 자료(Map output)을 생성하는 과정에서에서 병합 정렬을 수행하기 위해 스필 파일에 대한 READ가 발생하고, 맵 출력을 저장하기 위한 WRITE가 발생한다. Outer merge sorting is performed in the process of generating the map output data and generating the reduce input for the reduction. In this process, several READ / WRITE occur. When creating a spill file, a WRITE occurs. In the process of generating a map output, a READ for the spill file occurs to perform the merge sort, and a WRITE for storing the map output occurs.

그리고 셔플 파일 생성 과정에서 맵 출력을 읽어서 리듀스 작업에 보내주고 저장하기 위해 READ, WRITE가 발생한다. 또한 리듀스 작업에 대한 입력 자료(Reduce input)를 생성하는 과정에서 다시 병합정렬을 하기 위해 셔플 파일에 접근하는 READ가 발생한다. 종합적으로 READ 3번, WRITE 3번 발생하게 된다. 또한 맵 출력 자료(Map output)을 생성하는 과정과 리듀스를 위한 입력 자료(Reduce input)를 생성하는 과정에서 병합이 필요한 파일 개수가 많을 경우 여러 번의 병합 과정을 거치게 되는데, 그 만큼 READ/WRITE가 더 많이 발생하게 된다.In the shuffle file creation process, READ and WRITE occur to read map output, send it to the redo job, and save it. In addition, during the process of generating the reduce input for the redo operation, a READ accessing the shuffle file occurs again for merge sorting. Generally, READ 3 times, WRITE 3 times. In addition, in the process of generating map output and generating reduce input for redundancy, if there are many files that need to be merged, they are merged several times, so READ / WRITE And more.

특히 맵리듀스 과정에서 병합 정렬을 이용하기 때문에 저장 장치에 대한 접근 회수가 많아지고, 결과적으로 데이터 처리에 지연을 가져올 수 있다. 이하 설명하는 기술은 종래 맵리듀스 과정에서 병합 정렬 대신에 인덱스 자료 구조를 사용하고자 한다. 이를 통해 컴퓨터 장치가 로컬 저장 장치에 접근하는 횟수를 줄이고자 한다.In particular, since the merge sorting is used in the mapping process, the number of accesses to the storage device increases, resulting in a delay in data processing. The technique described below tries to use an index data structure instead of merge sorting in the conventional maple deuce process. Thereby reducing the number of times a computer device accesses a local storage device.

키(key)의 크기가 레코드 크기보다 매우 작을 경우, 인덱스를 이용한 정렬은 외부 병합 정렬보다 디스크 접근 횟수를 상당히 줄일 수 있다. 정렬의 대상이 되는 레코드들은 입력 데이터 그대로 디스크에 저장하고, 저장 위치와 키값만을 인덱스 구조로 관리한다. 그리고 정렬 결과가 필요할 때 인덱스 구조를 통해 키 순서대로 레코드를 접근하여 정렬 결과를 얻는다. 이 때 다양한 자료구조가 사용가능하다. 예컨대, B 트리 내지 B+ 트리와 같은 인덱스 자료 구조를 이용하면 자연스럽게 키 순서대로 정렬된 결과를 얻을 수 있다. If the size of the key is much smaller than the record size, indexed sorting can significantly reduce disk access times compared to external merge sorting. Records to be sorted are stored in the disk as input data, and only the storage location and key value are managed in an index structure. Then, when the sort result is needed, the records are accessed in key order through the index structure to obtain the sort result. Various data structures are available at this time. For example, using an index data structure such as a B-tree or a B + -type tree, results naturally sorted in key order can be obtained.

도 3은 맵리듀스에 기반한 분산 처리 시스템에서 인덱스를 이용하여 레코드를 처리하는 과정에 대한 예이다. 도 3은 맵 출력 자료(Map output)를 생성하는 과정에 인덱스 자료 구조를 사용한 예이다.FIG. 3 shows an example of a process of processing a record using an index in a distributed processing system based on MapReduce. FIG. 3 shows an example using an index data structure in the process of generating map output data.

컴퓨터 장치는 파일 시스템(HDFS)으로부터 입력 데이터를 받는다. 컴퓨터 장치는 입력 데이터에 맵 함수를 적용하여 버퍼에 저장한다. 이때 리듀스 작업이 여러 개라면 파티션을 나누어서 저장한다. 도 3은 2개의 리듀스 작업이 있는 경우를 가정한 것이다. 버퍼의 크기가 일정 크기를 넘어서면, 컴퓨터 장치는 해당 버퍼를 정렬하고 로컬 저장 장치에 저장하고 버퍼를 비운다(a 과정). 컴퓨터 장치가 로컬 저장 장치에 저장하는 파일을 스필 파일(spill files)이라고 한다. 컴퓨터 장치는 맵 작업에 할당된 입력데이터를 모두 처리할 때까지 스필 파일을 로컬 저장 장치에 저장한다. 스필 파일은 현재 정렬되는 않은 상태(unsorted)이다. 또는 각각의 스필 파일을 저장할 때 퀵 정렬(quick sort)을 이용하여 정렬을 수행할 수 있다. 다만 이 경우에도 여러 개의 스필 파일을 합치기 위한 작업은 필요하다. A computer device receives input data from a file system (HDFS). The computer device applies the map function to the input data and stores it in a buffer. If there are multiple redo operations, the partition is saved. FIG. 3 assumes that there are two redundancy operations. When the size of the buffer exceeds a certain size, the computer device sorts the buffer, stores it in the local storage device, and empties the buffer (step a). The files that the computer device stores on the local storage device are called spill files. The computer device stores the spill file on the local storage device until it has processed all of the input data assigned to the map job. The spill file is currently unsorted. Alternatively, you can use the quick sort to perform the sort when you save each spill file. In this case, however, it is necessary to combine several spill files.

이 때 컴퓨터 장치는 각 레코드 별로 키 값과 디스크 저장 위치를 인덱스로 관리한다(b 과정). 예컨대, 인덱스 자료 구조는 키에 대한 순차 처리가 용이한 B+ 트리를 이용할 수 있다. 컴퓨터 장치는 인덱스 자료 구조를 메모리에서 관리한다.At this time, the computer device manages the index value of the key value and the disk storage location for each record (step b). For example, an index data structure can use a B + tree that is easy to process sequential keys. The computer device manages the index data structure in memory.

컴퓨터 장치는 맵 작업의 결과를 리듀스 작업으로 보낼 때, 인덱스를 키 순서대로 순차 접근하여 각 레코드별 디스크 저장위치를 알아낸다. 그리고 컴퓨터 장치는 해당 위치에 저장된 레코드를 리듀스 작업으로 보내준다(c 과정). 이 과정은 도 2에서의 셔플 파일 생성에 해당한다.When the computer device sends the result of the map operation as a redisplay operation, it sequentially accesses the indexes in the order of the keys to find the disk storage location for each record. Then, the computer device sends the record stored in the corresponding position to the reuse process (step c). This process corresponds to the generation of the shuffle file in Fig.

컴퓨터 장치는 여러 개의 맵 작업으로부터 데이터를 받아서 다시 병합 정렬을 수행한다(d 과정). 병합 정렬하여 생성된 파일(Reduce input)은 리듀스 작업에 사용된다.The computer device receives data from multiple map operations and performs merge sorting again (step d). Files created by merge sorting (Reduce input) are used for resume operations.

도 3에서 최초 스필 파일을 저장하는 a과정에서 WRITE가 발생하고, 인덱스에 기반하여 리듀스에 입력할 파일을 전달하는 c과정에서 READ가 발생하고, 리듀스 작업에서 처리하기 위해 WRITE가 발생한다. 그리고 리듀스 단계의 병합 정렬을 위해 READ가 발생한다. 종합적으로 READ 2번, WRITE 2번으로 도 2의 방법보다 적은 횟수의 디스크 접근이 발생하게 된다.In FIG. 3, a WRITE occurs in a process of storing an initial spill file, a READ occurs in a process c for transferring a file to be input to a redess based on the index, and a WRITE occurs for processing in a redess operation. And a READ occurs for the merge alignment of the reduction step. Comprehensively, READ 2 and WRITE 2 occur less frequently than the method of FIG. 2.

도 4는 맵리듀스에 기반한 분산 처리 시스템에서 인덱스를 이용하여 레코드를 처리하는 과정에 대한 다른 예이다. 도 4는 맵 출력 자료(Map output)와 리듀스 입력 자료(Reduce input)을 생성하는 과정에 모두 인덱스 자료 구조를 사용한 예이다.4 is another example of a process of processing a record using an index in a distributed processing system based on MapReduce. FIG. 4 shows an example in which an index data structure is used in the process of generating map output data and reduce input data.

이 때 컴퓨터 장치는 각 레코드 별로 키 값과 디스크 저장 위치를 인덱스로 관리한다(b 과정). 예컨대, 인덱스 자료 구조는 키에 대한 순차 처리가 용이한 B+ 트리를 이용할 수 있다. 컴퓨터 장치는 인덱스 자료 구조를 메모리에서 관리한다. 인덱스 자료 구조는 스필 파일에 하나씩 생성될 수 있다. 도 4에서는 하나의 파티션에 각각 2개의 인덱스 자료 구조(I₁, I₂, I₃, I₄)가 생성된 예를 도시한다.At this time, the computer device manages the index value of the key value and the disk storage location for each record (step b). For example, an index data structure can use a B + tree that is easy to process sequential keys. The computer device manages the index data structure in memory. Index data structures can be created in the spill file, one by one. In FIG. 4, two index data structures (I ₁ , I ₂ , I ₃ , I ₄ ) are generated in one partition.

컴퓨터 장치는 리듀스 작업을 수행할 때 여러 개의 맵 작업에 대한 결과를 받고 합쳐서 리듀스 입력 자료(Reduce input)을 만든다. 이를 위해 여러 개의 맵 작업으로부터 만들어진 인덱스를 받아서 새로운 큰 인덱스를 만든다(c 과정). 이 경우 인덱스만을 리듀스 작업 노드로 전달하기 때문에 도 2의 셔플 파일은 생성되지 않는다.When a computer device performs a redox operation, it receives the results of several map operations and combines them to produce a Reduce input. To do this, we take an index created from several map operations and create a new large index (c). In this case, the shuffle file of Fig. 2 is not generated because only the index is transferred to the redo job node.

새로운 인덱스를 만드는 과정은 메모리에서 수행된다. 큰 인덱스를 만드는 과정은 다음과 같다. (1) 맵의 인덱스를 순차 접근하여 키와 디스크 저장 위치를 알아낸다. (2) 키와 디스크 저장 위치, 맵의 식별자를 새로운 인덱스에 넣어 관리한다. 일반적으로 복수의 맵 출력 자료가 사용되기 때문에 맵의 식별자라는 새로운 항목을 사용한다. 새로운 인덱스는 키, 저장 위치, 맵의 식별자를 지시한다. 맵의 식별자는 맵 출력 자료에 대한 인덱스 자료 구조의 식별자에 해당한다. 예컨대, 도 4에 표시한 I₁, I₂, I₃, I₄가 맵 식별자(인덱스 자료 구조의 식별자)에 해당할 수 있다. (3) 모든 맵의 모든 레코드에 대하 전술한 (1)과 (2) 과정을 반복한다. 도 4에서는 이러한 과정을 거쳐 두 개의 새로운 인덱스(I₅, I₆)가 생성된 예를 도시한다.The process of creating a new index is performed in memory. The process of creating a large index is as follows. (1) The indexes of the map are sequentially accessed to find the key and disk storage location. (2) Manage the key, disk storage location, and map identifier in a new index. Generally, multiple map output data is used, so a new entry called map identifier is used. The new index indicates the key, the storage location, and the identifier of the map. The identifier of the map corresponds to the identifier of the index data structure for the map output data. For example, I ₁ , I ₂ , I ₃ , and I ₄ shown in FIG. ₄ may correspond to a map identifier (an identifier of an index data structure). (3) Repeat steps (1) and (2) for all records in all maps. FIG. 4 shows an example in which two new indexes I ₅ and I ₆ are generated through this process.

컴퓨터 장치는 새로 생성된 큰 인덱스를 순차 접근하여 같은 키값을 가지는 레코드를 가져온다. 이 때 실제 자료는 맵 작업을 수행한 노드에 저장되어 있으므로, 맵 식별자와 디스크 저장 위치를 이용하여 키값이 동일한 레코드에 접근한다(d 과정).The computer device sequentially accesses the newly created large indexes and fetches the records having the same key value. In this case, since the actual data is stored in the node performing the map operation, the record having the same key value is accessed using the map identifier and the disk storage location (step d).

컴퓨터 장치는 같은 키값을 가지는 레코드들을 대상으로 리듀스 함수를 적용하고 최종 결과를 만들어 HDFS에 저장한다(e 과정).The computer device applies the reduction function to the records having the same key value, creates the final result, and stores it in the HDFS (e process).

도 4에서 최초 스필 파일을 저장하는 a과정에서 WRITE가 발생하고, 인덱스에 기반하여 리듀스에 입력할 파일을 전달하는 d과정에서 READ가 발생한다. 종합적으로 READ 1번, WRITE 1번만의 디스크 접근으로 맵과 리듀스 단계 수행이 가능한 것이다.In FIG. 4, a WRITE occurs in a process of storing an initial spill file, and a READ is generated in a process of transferring a file to be input to a redisplay based on an index. Comprehensively, it is possible to perform the map and the reduction step by accessing only the READ 1 and WRITE 1 discs.

도 3이나 도 4에서 컴퓨터 장치는 키 값을 기준으로 인덱스에 순차 접근하여 레코드를 가져온다. 따라서 컴퓨터 장치는 스필 파일을 임의 접근을 잘 처리할 수 있는 SSD와 같은 저장장치에 저장하는 것이 바람직하다. 또한 인덱스를 통해 디스크 접근 위치를 미리 알아낼 수 있으므로, libAIO와 같은 비동기 방식을 사용하면 더 빠르게 데이터를 가져 올수 있다.In FIG. 3 or 4, the computer device sequentially accesses the index based on the key value to fetch the record. It is therefore desirable that the computer device store the spill file in a storage device such as an SSD that can handle random access. In addition, indexes can be used to determine disk access locations ahead of time, so asynchronous methods such as libAIO can be used to fetch data faster.

전술한 설명에서 인덱스 자료 구조로 B 트리 내지 B+ 트리를 사용한다고 설명하였다. 그러나 인덱스 자료 구조는 반드시 B 트리를 사용해야 하는 것은 아니다. 기본적으로 메모리 상에서 키를 기준으로 순차 접근이 가능한 다양한 자료 구조를 사용할 수 있다.In the foregoing description, it has been explained that B-tree or B + -tree is used as the index data structure. However, index data structures do not necessarily have to use a B-tree. Basically, you can use a variety of data structures that can be sequentially accessed based on keys in memory.

도 3 및 도 4에서 컴퓨터 장치는 스필 파일을 생성한다고 설명하였다. 그러나 만일 맵 작업을 위한 입력데이터가 같은 노드의 HDFS에 존재하고 레코드별로 접근 가능한 경우, 데이터 저장을 위한 WRITE 작업을 수행할 필요가 없다. 컴퓨터 장치는 입력 데이터를 읽어서 키 값과 저장위치를 알아내고, 이 정보를 인덱스로 관리할 수 있다.In Figures 3 and 4, the computer device has been described as generating a spill file. However, if the input data for the map operation exists in the HDFS of the same node and is accessible by record, there is no need to perform a WRITE operation for data storage. The computer device reads the input data, finds the key value and storage location, and manages this information as an index.

컴퓨터 장치는 스필 파일로부터 인덱스 자료 구조를 만드는 과정에서 레코드에 count, sum, min/max 등과 같은 통계 함수를 적용할 수 있다. 즉, 컴퓨터 장치가 인덱스를 기준으로 레코드를 읽을 때 특정한 연산이 바로 가능한 경우 연산 결과를 더 지시하는 인덱스를 생성할 수 있다. 이 경우 인덱스 자료 구조는 키, 저장 위치 및 연산 결과를 포함한다. 이후 리듀스 함수에서 이와 같은 통계함수만을 이용하는 경우 레코드의 실제 데이터를 읽을 필요 없이 계산 결과값을 인덱스로부터 얻어올 수 있다.The computer device can apply statistical functions such as count, sum, min / max, etc. to a record in the process of creating an index data structure from a spill file. That is, when a computer device reads a record based on an index, it is possible to generate an index that further indicates the result of the operation when a particular operation is immediately possible. In this case, the index data structure includes the key, the storage location, and the operation result. If you use only these statistical functions in the Reduce function, you can get the result of the calculation from the index without having to read the actual data of the record.

전술한 맵 식별자는 어느 노드로부터 데이터를 읽어올지 판단하는 용도이다. 컴퓨터 장치가 스필 자료로부터 인덱스를 만들 때 맵 식별자를 인덱스에 같이 포함하여 관리할 수도 있다. The above-described map identifier is used for determining from which node data is to be read. When the computer device creates an index from the spill data, the map identifier may be managed by including it in the index.

본 실시례 및 본 명세서에 첨부된 도면은 전술한 기술에 포함되는 기술적 사상의 일부를 명확하게 나타내고 있는 것에 불과하며, 전술한 기술의 명세서 및 도면에 포함된 기술적 사상의 범위 내에서 당업자가 용이하게 유추할 수 있는 변형 예와 구체적인 실시례는 모두 전술한 기술의 권리범위에 포함되는 것이 자명하다고 할 것이다.The present embodiment and drawings attached hereto are only a part of the technical idea included in the above-described technology, and it is easy for a person skilled in the art to easily understand the technical idea included in the description of the above- It will be appreciated that variations that may be deduced and specific embodiments are included within the scope of the foregoing description.

101 : HDSF
102 : 맵리듀스 프레임워크
110 : 마스터 노드
120 : 슬레이브 노드101: HDSF
102: MapReduce Framework
110: master node
120: Slave node

Claims

A distributed node of a distributed processing system based on MapReduce classifies input data into records to be analyzed using a map function and stores the classified data in a storage device of the distributed node; The records are each composed of a pair of a key and a value,
Generating an index structure in which the distributed node is configured with indices that together indicate the key and the storage location in which each record is stored in the storage device; The indices corresponding to the records, respectively,
The distributed node accesses the indices according to the order of the keys in the index structure to identify each storage location for the records, and stores the records stored in each identified storage location in a partition ; And
Wherein the distributed node performs the redesing operation on the records classified by the partition by using a reduce function, and in the distributed processing system based on MapReduce, How to process.

The method according to claim 1,
Wherein the index structure is located in a memory of the distributed node and is sequentially accessible based on the key.

delete

The method according to claim 1,
The performing of the reduction operation may include performing a merge sort on the sorted records, and removing redundant data based on the merge sorted records. In the distributed processing system based on the MapReduce, How to process records using.

delete

The method according to claim 1,
In the step of creating the index structure
Wherein the distributed node generates the index structure further including a result of calculating a value of a record based on the key, in a distributed processing system based on MapReduce.

The method according to claim 6,
Wherein the calculation is performed using at least one of a summation for a key having the same value, a number of keys having the same value, a minimum value among values for the keys having the same value, and a maximum value among the values for the key having the same value, A method for processing a record using an index in a distributed processing system based on XML.

A distributed node in a distributed processing system based on MapReduce is provided with a first key for each first record and a first index indicating a first storage location of each first record in the storage device of the distributed node Generating a first index structure comprising a plurality of index structures;
A second index structure consisting of second indexes for indicating, for the second records, a key of each second record and a storage location in which each second record is stored in the storage device of the distributed node, &Lt; / RTI >
Wherein the distributed node accesses the first indexes in the order of the keys in the first index structure and accesses the second indexes in the order of keys in the second index structure, Generating a third index structure including all of the indexes included in the second index structure, the third indexes indicating the keys of the respective records, the storage locations of the respective records, and third indexes indicating the identifiers of the index structures to which the respective records belong; ; And
The distributed node accesses the third indexes according to the order of the keys in the third index structure and accesses the storage nodes based on the storage location of the record indicated by the third index having the same key value and the identifier of the data structure A method for processing a record using an index in a distributed processing system based on a MapReduce, the method comprising: applying a reduction function to a record.

delete