KR20100073154A

KR20100073154A - Method for data processing and asymmetric clustered distributed file system using the same

Info

Publication number: KR20100073154A
Application number: KR1020080131747A
Authority: KR
Inventors: 김영철; 김영균; 남궁한
Original assignee: 한국전자통신연구원
Priority date: 2008-12-22
Filing date: 2008-12-22
Publication date: 2010-07-01

Abstract

The present invention relates to a method for distributing access to data that clients access intensively in an asymmetric cluster distributed file system. A data server splits a data chunk judged to be hot data and replicates the pieces to other data servers. This distributes client access requests for the hot data while minimizing the cost of replication, and it allows the distributed hot data chunks to be managed efficiently and to be shrunk or expanded easily.

Description

Metadata server, data server processing method and asymmetric clustered distributed file system using the same

The present invention relates to a method for distributing access to data that clients access intensively in an asymmetric cluster distributed file system.

The present invention is derived from research conducted as part of the IT Growth Engine Technology Development Project of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement [Project management number: 2007-S-016-02, Project title: Development of a low-cost, large-scale global Internet service solution].

Recently, when video services such as UCC (user-created content) are provided over the Internet, accesses from many clients to particular videos frequently surge at a given moment, either temporarily or persistently.

Data that receives a markedly high volume of access, such as these popular videos, is called hot data. If hot data cannot be served efficiently, service may be delayed or interrupted, and service for other data may also be affected.

To serve hot data efficiently, a method is needed that, when the frequency of client requests for hot data rises, distributes the hot data efficiently across multiple servers and ensures that service for the hot data is not delayed.

An asymmetric cluster distributed file system stores and manages a file's metadata separately from its actual data. The metadata is managed by a metadata server, and the actual data is distributed across multiple data servers. The metadata managed by the metadata server includes information about the data servers on which the actual data is stored.

The metadata server and the data servers are connected over a network and form a distributed structure. A client therefore accesses a file's metadata and its data over separate paths. That is, to access a file, the client first accesses the file's metadata on the metadata server to obtain information about the data servers holding the actual data. Input/output of the actual data is then performed through those data servers.

In an asymmetric cluster distributed file system, file data is divided into fixed-size data chunks that are distributed across the data servers for storage.
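The fixed-size chunking described above can be sketched as follows. This is a minimal illustration only: the chunk size, the server names, and the round-robin placement policy are assumptions made for the example and are not specified in the source.

```python
from typing import Dict, List

CHUNK_SIZE = 8  # bytes; a tiny illustrative value, real chunks are far larger


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut file data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(chunks: List[bytes], servers: List[str]) -> Dict[str, List[int]]:
    """Assign chunk indexes to data servers round-robin (hypothetical policy)."""
    placement: Dict[str, List[int]] = {s: [] for s in servers}
    for index in range(len(chunks)):
        placement[servers[index % len(servers)]].append(index)
    return placement


chunks = split_into_chunks(b"0123456789abcdefghij")
placement = place_chunks(chunks, ["ds1", "ds2"])
```

Here a 20-byte file yields three chunks, and the two hypothetical servers hold chunk indexes 0 and 2, and 1, respectively.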

In addition, so that data input/output remains possible when a server or the network fails, replicas of each chunk are created and stored on other data servers. In view of storage cost, about three replicas are typically maintained.

Keeping replicas on multiple data servers also makes it possible to distribute the access load from users.

However, most clients accessing a video file tend to concentrate on its famous or distinctive parts. Access to a particular file data chunk can therefore surge, which raises the problem of efficiently serving such a chunk, that is, a hot data chunk.

Current asymmetric cluster distributed file systems respond to a hot data event by replicating the data again to multiple data servers, thereby distributing access to the hot data.

However, replicating hot data to other data servers can burden the data server that is serving the hot data. Moreover, an asymmetric cluster distributed file system generally divides file data into units much larger than the blocks of an ordinary file system, in order to reduce the cost of contacting the metadata server for every file data input/output and to reduce the amount of metadata the metadata server must maintain. As a result, the cost of replicating a data chunk itself can also be a heavy burden.

For this reason, video data expected to be popular is sometimes replicated to several data servers in advance. Because such predictions are not always correct, however, this can waste storage space.

Accordingly, a method is needed that can quickly distribute access to hot data chunks.

The present invention makes it possible to distribute access to hot data efficiently in an asymmetric cluster distributed file system.

The present invention splits hot data so that access to it can be distributed effectively.

The present invention allows client requests for hot data to be distributed across the data blocks into which the hot data has been split.

The present invention allows data to be managed efficiently when demand for hot data falls, by recombining the data blocks that were split.

The present invention allows the number of data blocks to be reduced, by recombining the split hot data blocks, when access to the hot data decreases.

The present invention allows hot data to be replicated faster by replicating in units of hot data blocks rather than the hot data as a whole.

The present invention allows hot data to be recovered using hot data blocks.

The present invention provides a metadata server of an asymmetric cluster distributed file system, including a data storage unit that stores: metadata for a data file; metadata for each data chunk divided from the data file; and metadata for the data chunk blocks divided from the data chunk.

In the metadata server of the asymmetric cluster distributed file system according to the present invention, the metadata for the data file may include a chunk identifier identifying the data chunk; the metadata for the data chunk may include a file identifier identifying the data file and a data chunk block identifier identifying the data chunk block; and the metadata for the data chunk block may include a data chunk identifier identifying the data chunk.

In the metadata server of the asymmetric cluster distributed file system according to the present invention, the data storage unit may additionally store metadata for child data chunk blocks divided from the data chunk block; the metadata for the data chunk block may further include a child data chunk block identifier identifying the child data chunk block; and the metadata for the child data chunk block may include a parent data chunk block identifier identifying the data chunk block.
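One way to picture the cross-referencing identifiers described above is as a set of linked records. The sketch below is an assumption-laden illustration: the class names, field names, and identifier values are invented for the example, and only the direction of each reference follows the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileMeta:
    file_id: str
    chunk_ids: List[str] = field(default_factory=list)   # file -> its chunks


@dataclass
class ChunkMeta:
    chunk_id: str
    file_id: str                                          # chunk -> owning file
    block_ids: List[str] = field(default_factory=list)   # chunk -> its chunk blocks


@dataclass
class ChunkBlockMeta:
    block_id: str
    chunk_id: str                                         # block -> chunk it was split from
    parent_block_id: Optional[str] = None                 # set only for child blocks
    child_block_ids: List[str] = field(default_factory=list)


# Hypothetical identifiers, chosen only for illustration.
file_meta = FileMeta("f1", ["c1"])
chunk_meta = ChunkMeta("c1", "f1", ["b1"])
block = ChunkBlockMeta("b1", "c1", child_block_ids=["b1.1"])
child = ChunkBlockMeta("b1.1", "c1", parent_block_id="b1")
```

Because each level stores identifiers pointing both down and up, a lookup can start from a file and descend to a child block, or start from a child block and climb back to its file.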

The present invention also provides a data processing method of a data server in an asymmetric cluster distributed file system, including: dividing a data chunk of a data file into a plurality of data chunk blocks and storing them; receiving a data access request from a client; looking up the cumulative number of access requests for the data chunk block targeted by the access request; returning the requested data if the cumulative number of access requests is at or below a maximum reference value; replicating the data chunk block to another data server if the cumulative number of access requests is at or above the maximum reference value; and returning an identifier of the replicated data chunk block to the client.

The data processing method of the data server in the asymmetric cluster distributed file system according to the present invention may further include, after the step of replicating the data chunk block, transmitting information about the data chunk block to the metadata server.

The data processing method of the data server in the asymmetric cluster distributed file system according to the present invention may further include dividing the replicated data chunk block into a plurality of child data chunk blocks and storing them.

In addition, the data processing method of the data server in the asymmetric cluster distributed file system according to the present invention may further include, before the step of looking up the cumulative number of access requests, checking whether an identifier exists for a copy of the targeted data chunk block replicated on another data server. If such an identifier exists, the identifier of that data chunk block is returned to the client; if no such identifier exists, the step of looking up the cumulative number of access requests is executed.
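The request-handling steps of the method can be sketched as a small in-memory model. Everything concrete here is an assumption for illustration: the class and method names, the threshold value, the replica identifier format, and the choice to count each request on arrival and to treat a count that reaches the maximum as hot.

```python
from typing import Dict, Tuple

MAX_COUNT = 3  # hypothetical maximum reference value


class DataServer:
    def __init__(self, name: str, blocks: Dict[str, bytes]) -> None:
        self.name = name
        self.blocks = dict(blocks)
        self.counts: Dict[str, int] = {b: 0 for b in blocks}
        self.replica_of: Dict[str, Tuple[str, str]] = {}  # block -> (server, replica id)

    def handle_access(self, block_id: str, target: "DataServer") -> Tuple[str, object]:
        # 1. If the block was already replicated elsewhere, return the replica identifier.
        if block_id in self.replica_of:
            return ("redirect", self.replica_of[block_id])
        # 2. Update and check the cumulative access count for this block.
        self.counts[block_id] += 1
        if self.counts[block_id] < MAX_COUNT:
            return ("data", self.blocks[block_id])
        # 3. Hot block: copy only this block to another data server.
        replica_id = f"{block_id}@{target.name}"
        target.blocks[replica_id] = self.blocks[block_id]
        target.counts[replica_id] = 0
        self.replica_of[block_id] = (target.name, replica_id)
        return ("redirect", self.replica_of[block_id])


ds1 = DataServer("ds1", {"211a": b"AAAA"})
ds4 = DataServer("ds4", {})
results = [ds1.handle_access("211a", ds4) for _ in range(4)]
```

In this run the first two requests are served locally, the third triggers replication of the single hot block, and the fourth is answered from the stored replica identifier without consulting the counter again.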

Further, in the data processing method of the data server in the asymmetric cluster distributed file system according to the present invention, the data server may decrease the cumulative number of data access requests when that number is at or below the maximum reference value, and may recombine the divided data chunk blocks into a data chunk when the cumulative number falls to or below a minimum reference value.

The present invention further provides an asymmetric cluster distributed file system including: a data server that divides a data chunk of a data file into a plurality of data chunk blocks and stores them; a metadata server that stores metadata of the data file, metadata of the data chunk, and metadata of the data chunk blocks; and a client.

In the asymmetric cluster distributed file system according to the present invention, the data server, in response to a data access request from the client, may return the requested data if the cumulative number of access requests for the targeted data chunk block is at or below the maximum reference value, and, if that number is at or above the maximum reference value, may replicate the data chunk block to another data server and return an identifier of the replicated data chunk block to the client. The metadata server may receive replication information for the data chunk block from the data server, or from the data server that received the replicated data chunk block, and may generate metadata for the replicated data chunk block and the identifiers linking the data chunk and the data chunk block.

Further, in the asymmetric cluster distributed file system according to the present invention, the data server may decrease the cumulative number of access requests when that number is at or below the maximum reference value, and may recombine the divided data chunk blocks into a data chunk when the cumulative number falls to or below a minimum reference value.

In addition, in the asymmetric cluster distributed file system according to the present invention, the data server that received a replicated data chunk block from the data server may divide the received data chunk block into a plurality of data chunk blocks and store them, replicate it to other data servers up to a configured number of copies, and transmit the replication information for the data chunk block to the metadata server.

According to the present invention, data chunks that clients of an asymmetric cluster distributed file system request intensively are split and then replicated, so client access can be distributed effectively while the load and cost of data replication are minimized.

Also, according to the present invention, because replication is performed in units of hot data blocks rather than on the hot data as a whole, both data replication and the distribution of access load can be carried out faster.

In addition, according to the present invention, data can be managed efficiently when demand for hot data falls, by recombining the data blocks that were split.

The asymmetric cluster distributed file system according to the present invention consists of a metadata server, which stores and manages metadata, and data servers, which store and manage file data, connected by a network. The metadata server maintains information about the data servers that store file data, handles metadata access requests from clients, and selects the data servers on which file data is to be stored.

The data servers of the asymmetric cluster distributed file system according to the present invention divide file data into file data chunks of a set size, store them in a distributed manner, and process data access requests from clients. When intensive access requests from clients produce hot data, a data server splits the hot data and distributes it to other data servers, thereby spreading the clients' access requests.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

<Overview of the Asymmetric Cluster Distributed File System>

In the asymmetric cluster distributed file system according to the present invention, a client receives the metadata for the file it wants to access from the metadata server and then requests the file data from a data server. On receiving the request, the data server determines whether the target data is, or has just become, hot data, that is, data for which access requests have surged beyond a set threshold. If so, the data server splits the data into data blocks, replicates them to other data servers, and distributes client access requests to the data servers holding the replicated blocks. Later, when access requests for the hot data decline, the blocks of hot data are recombined to keep data storage management efficient.

FIG. 1 schematically illustrates the configuration of an asymmetric cluster distributed file system according to the present invention.

As shown in FIG. 1, the asymmetric cluster distributed file system includes a client 101, a metadata server 102, and data servers 103.

The client 101 runs a client application, accessing file metadata on the metadata server 102 and reading and writing file data on the data servers 103.

The metadata server 102 stores and manages the metadata for every file in the file system and manages status information for all data servers 103.

In the asymmetric cluster distributed file system, file data is divided into chunks of a set size and created across the data servers 103. The metadata server 102 selects the data server on which each file data chunk is created and also the data servers that store its replicas. Each file data chunk is distinguished by a unique identifier, and the identifiers of created chunks are kept on the metadata server 102 as part of the file's metadata.

A data server 103 stores and manages file data chunks. Each data server 103 periodically reports its status to the metadata server 102.

The client 101, the metadata server 102, and the data servers 103 are interconnected by a network.
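The separation between the metadata path and the data path can be sketched with three toy components. The class names, method names, file name, and chunk identifiers below are hypothetical and stand in for whatever interfaces a real system would expose.

```python
from typing import Dict, List, Tuple


class MetadataServer:
    def __init__(self) -> None:
        # file name -> ordered list of (chunk id, data server name)
        self.files: Dict[str, List[Tuple[str, str]]] = {}

    def lookup(self, name: str) -> List[Tuple[str, str]]:
        return self.files[name]


class DataServer:
    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def read(self, chunk_id: str) -> bytes:
        return self.chunks[chunk_id]


def client_read(name: str, mds: MetadataServer, servers: Dict[str, DataServer]) -> bytes:
    """Step 1: ask the metadata server where the chunks are.
    Step 2: read each chunk directly from the data server that holds it."""
    locations = mds.lookup(name)
    return b"".join(servers[server].read(chunk_id) for chunk_id, server in locations)


mds = MetadataServer()
mds.files["video.mp4"] = [("c1", "ds1"), ("c2", "ds2")]
ds1, ds2 = DataServer(), DataServer()
ds1.chunks["c1"] = b"hello "
ds2.chunks["c2"] = b"world"
content = client_read("video.mp4", mds, {"ds1": ds1, "ds2": ds2})
```

The metadata server is consulted once per file, while the bulk data flows only between the client and the data servers.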

<Data Server - Split Replication of Data>

FIG. 2 schematically illustrates how a data server of the asymmetric cluster distributed file system according to the present invention splits hot data into hot data blocks and distributes them.

Replicas of a file data chunk are created and stored on several data servers, to guard against data loss from server or network failures and to balance load across data servers.

In general, about three copies of a data chunk, counting replicas, are maintained. That is, the data chunk is stored on three separate data servers 201, 202, and 203.

When the data server 201 receives an access request for file data from a client, it determines whether the data is hot data.

More specifically, with reference to the drawing, in the present invention the data server 201 stores a file data chunk 211 divided into data chunk blocks 211a, 211b, and 211c. Here, a data chunk block is a unit obtained by subdividing the data of a data chunk, and is distinct from a data block, which denotes a unit of storage space.

When an access request arrives, the data server 201 counts access requests per data chunk block. When the cumulative number of access requests for a data chunk block exceeds a preset reference count, the data server judges that block to be a hot data chunk block and replicates it to another data server.

For example, when the cumulative number of access requests for data chunk block 211a exceeds the preset reference count, only that chunk block 211a is replicated to another data server 204. As more data servers come to hold the data chunk block, the access request load on the data server 201 can be distributed.

The data server 204 that received the copy of data chunk block 211a may in turn divide the replicated data chunk block 213 into data chunk blocks 213a, 213b, and 213c and store them.

Furthermore, when the number of replicas to be maintained is configured in advance, replication to other data servers is performed in succession accordingly. For example, if three replicas in total are to be maintained, the data server 204 that receives the data chunk block replicates it onward to other data servers in turn (data server-61 → data server-62 → data server-63), so that three replicas of the hot data chunk block are maintained.
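The successive replication in the example can be sketched as a simple chain. This is a rough model under stated assumptions: each server is represented by a plain dictionary, the function name and the block identifier are invented, and the replica count of three is taken from the example rather than being a fixed property of the system.

```python
from typing import Dict, List

REPLICA_COUNT = 3  # configured number of copies, as in the example above


def replicate_in_chain(block_id: str, data: bytes, servers: List[Dict[str, bytes]],
                       copies: int = REPLICA_COUNT) -> List[int]:
    """Pass the block from one server to the next until `copies` servers hold it."""
    holders: List[int] = []
    payload = data
    for index, store in enumerate(servers[:copies]):
        store[block_id] = payload   # server `index` stores its copy...
        payload = store[block_id]   # ...and forwards what it stored to the next one
        holders.append(index)
    return holders


servers: List[Dict[str, bytes]] = [{}, {}, {}, {}]
holders = replicate_in_chain("211a", b"hot block", servers)
```

Chaining the copies means the server that detected the hot block sends it only once; each later copy is made by the server that just received it.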

The procedure for detecting and replicating hot data chunk blocks is also carried out on the other data servers 202 and 203.

When replication takes place, the data server holding the original data block, the data server that received the replica, or both transmit the corresponding information to the metadata server.

Access requests for a replicated hot data chunk block may keep arriving, so that the replica itself becomes hot data. In that case the data server can again split the data chunk block and replicate it in a distributed manner to spread the access requests. In preparation for this, a replicated data chunk block may be stored already divided into several blocks.

For example, the data server 205 may store the received data chunk block 213 divided into three data blocks 213a, 213b, and 213c, and accumulate the number of access requests for each of these new child data chunk blocks. If the cumulative number of access requests for a particular child data chunk block 213a then exceeds the reference count, that data chunk block 213a is copied to another data server 208.

When a divided data chunk block is itself divided into data chunk blocks that are replicated and stored in a distributed manner, information identifying the parent data chunk block and the child data chunk blocks is transmitted to the metadata server as well.
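Splitting a replicated block again and reporting the parent-child relationship could look like the sketch below. The dotted child identifiers, the equal-size split, and the shape of the report records are assumptions for illustration; the source only states that identifying information for parent and child blocks is sent.

```python
from typing import Dict, List


def split_block(block_id: str, data: bytes, parts: int) -> Dict[str, bytes]:
    """Split a block into `parts` child blocks of near-equal size."""
    size = -(-len(data) // parts)  # ceiling division
    return {f"{block_id}.{i}": data[i * size:(i + 1) * size] for i in range(parts)}


def split_report(block_id: str, children: Dict[str, bytes]) -> List[Dict[str, str]]:
    """Build the parent/child records a data server could send to the metadata server."""
    return [{"block_id": child, "parent_block_id": block_id} for child in children]


children = split_block("213", b"abcdefghi", 3)
report = split_report("213", children)
```

With the report in hand, the metadata server can link each child record back to its parent and keep the division hierarchy navigable in both directions.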

Of course, even after a hot data chunk block has been split and replicated from the data server 201 to another data server 204, if access requests for that data chunk block remain above a set threshold, the data server 201 may replicate the data chunk block again to yet another data server 205.

If access requests for a hot data chunk block do not exceed the reference value for a set period of time, demand for that data chunk block is declining. The data chunk blocks that were replicated to several data servers (205, 207, and so on) may then be combined on a data server (209, for example), gradually reducing the number of hot data chunk blocks and reclaiming storage space.
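The consolidation step can be sketched in two parts: selecting blocks whose demand has cooled, and recombining child blocks. The threshold value, the per-window counters, and the function names are hypothetical, and the sketch does not model moving data between servers.

```python
from typing import Dict, List

MIN_COUNT = 2  # hypothetical reference value for one observation window


def blocks_to_merge(window_counts: Dict[str, int], min_count: int = MIN_COUNT) -> List[str]:
    """Return blocks whose access count in the last window did not exceed the reference."""
    return sorted(b for b, n in window_counts.items() if n <= min_count)


def merge_blocks(parts: Dict[str, bytes], order: List[str]) -> bytes:
    """Recombine child blocks, in their original order, into one block."""
    return b"".join(parts[block_id] for block_id in order)


cold = blocks_to_merge({"213a": 0, "213b": 5, "213c": 1})
merged = merge_blocks({"213a": b"ab", "213b": b"cd"}, ["213a", "213b"])
```

Merging gradually, one window at a time, avoids discarding replicas that a brief lull would otherwise make look unnecessary.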

<Metadata Server and Data Server - Managing Information about Data>

FIG. 3 schematically illustrates the metadata stored and managed by the metadata server of the asymmetric cluster distributed file system according to the present invention.

Referring to FIG. 3, the metadata 301 for a data file holds, in addition to various metadata about the file, identifier information for identifying the data chunks of the file data.

Specifically, the metadata 302 for a file data chunk holds identifiers of the replica data chunks, the addresses of the data servers storing the chunk, and information about its data chunk blocks.

The metadata 303 for a data chunk block holds the identifier of the data chunk from which the block was divided, the identifiers of replica data chunk blocks, the addresses of the data servers storing the data chunk block, and the identifiers of its child data chunk blocks.

The metadata 304 for a lowest-level data chunk block holds the identifier of its parent data chunk block, the identifiers of replica data chunk blocks, the addresses of the data servers storing the data chunk block, and so on.

These metadata records are linked to one another by parent-child relationships, that is, by division relationships.

Data chunk blocks are not always stored in divided form; they may also be stored combined. In that case, the identifiers of the individual data chunk blocks in the metadata 302 for the file data chunk point to the metadata 305 of a single data chunk block that stores the combined data of two or more data chunk blocks.

Moreover, the lookup of the data server holding a child data chunk block (304) need not follow the full chain, that is, from the data file metadata 301 through the chunk identifier to the data chunk metadata 302, from there through the block identifier to the data chunk block metadata 303, and finally to the child data chunk block metadata 304. Instead, all of this information may be managed directly in the data file metadata 301.
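The parent-child linkage among the metadata records 301 to 304 can be sketched as a few record types plus a traversal. This is a minimal Python sketch, not the patented implementation: the names `FileMeta`, `ChunkMeta`, `BlockMeta`, and `leaf_servers` are hypothetical, and the combined-block metadata 305 and the flattened variant kept directly in 301 are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class BlockMeta:
    """Metadata for a data chunk block (303) or a lowest-level block (304)."""
    block_id: str
    server: str                 # address of the data server storing the block
    parent_id: str              # chunk or parent block this block was split from
    replica_ids: List[str] = field(default_factory=list)
    child_ids: List[str] = field(default_factory=list)  # empty for a lowest-level block


@dataclass
class ChunkMeta:
    """Metadata for a file data chunk (302)."""
    chunk_id: str
    server: str
    replica_ids: List[str] = field(default_factory=list)
    block_ids: List[str] = field(default_factory=list)


@dataclass
class FileMeta:
    """Metadata for a data file (301)."""
    file_id: str
    chunk_ids: List[str] = field(default_factory=list)


def leaf_servers(
    file_meta: FileMeta,
    chunks: Dict[str, ChunkMeta],
    blocks: Dict[str, BlockMeta],
) -> List[Tuple[str, str]]:
    """Follow 301 -> 302 -> 303 -> 304 and list (block_id, server) for leaf blocks."""
    found = []
    for chunk_id in file_meta.chunk_ids:
        # Depth-first walk that preserves the left-to-right order of the blocks.
        stack = list(reversed(chunks[chunk_id].block_ids))
        while stack:
            block = blocks[stack.pop()]
            if block.child_ids:
                stack.extend(reversed(block.child_ids))
            else:
                found.append((block.block_id, block.server))
    return found
```

Keeping the identifiers as plain strings mirrors the description, in which each record refers to its neighbours by identifier rather than by direct reference.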

FIG. 4 schematically illustrates the data structure that a data server of the asymmetric cluster distributed file system according to the present invention stores in order to manage information about file data chunks and chunk blocks.

As shown in FIG. 4, the information 401 about a data chunk stored in the data server includes the data chunk identifier, the identifiers of its data chunk blocks, and the cumulative number of access requests for each data chunk block. The metadata 402 for a data chunk block and the metadata 403 for a child data chunk block likewise include the chunk block identifier, the number of access requests, and so on.

Using this information, the data server can determine whether hot data has occurred, and by accessing the metadata on the metadata server it can look up the split relationships between data chunks and data chunk blocks.
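The per-chunk bookkeeping on the data server can be illustrated as follows. This is a hedged Python sketch: `StoredChunkInfo`, `record_access`, and `hot_blocks` are hypothetical names, and the threshold is simply passed in as a parameter because the description does not say where it is configured.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredChunkInfo:
    """Record kept on a data server for one chunk (401): block IDs and access counts."""
    chunk_id: str
    access_counts: Dict[str, int] = field(default_factory=dict)

    def record_access(self, block_id: str) -> int:
        """Increment and return the cumulative access count of one chunk block."""
        self.access_counts[block_id] = self.access_counts.get(block_id, 0) + 1
        return self.access_counts[block_id]

    def hot_blocks(self, threshold: int) -> List[str]:
        """Return the blocks whose cumulative count exceeds the given threshold."""
        return sorted(b for b, n in self.access_counts.items() if n > threshold)
```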

Hot data generation in response to client access requests

FIG. 5 is a flowchart of the processing performed when hot data occurs while a client attempts to access data in the asymmetric cluster distributed file system according to the present invention.

The client first determines whether the metadata for the data chunk or data chunk block it wants to access is already cached (step S501).

If it is cached, the client uses this metadata to send a read request for the data to the data server (step S504).

If the metadata is not cached, the client requests the metadata for the data chunk or data chunk block to be accessed from the metadata server (step S502).

The client then stores the received metadata in its cache (step S503).

Using the metadata, the client determines which data server holds the data chunk to be read and sends a read request for the data chunk to that data server (step S504).

The client receives the data of the data chunk from the data server to which it sent the read request or, if the data chunk is a hot data chunk, the identifier of a data chunk block that has been replicated to another data server (step S505).

If the client receives a data chunk block identifier from the data server ('Yes' in step S506), it uses that identifier to request the metadata of the data chunk block from the metadata server again (step S502).

If the client does not receive a data chunk block identifier from the data server ('No' in step S506), it has received the requested data, so processing of the read request ends.
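The client-side flow of steps S501 to S506 can be sketched as a lookup followed by a redirect loop. This Python sketch rests on several assumptions: the stub classes, the `lookup` and `read` method names, the `(data, replica_id)` return shape, and the `max_redirects` guard are all hypothetical and do not come from the specification.

```python
class StubMetadataServer:
    """Hypothetical metadata server mapping a chunk or block ID to a server address."""

    def __init__(self, locations):
        self.locations = locations
        self.lookups = 0

    def lookup(self, target_id):
        self.lookups += 1
        return self.locations[target_id]


class StubDataServer:
    """Hypothetical data server: returns data, or a replica ID for hot data."""

    def __init__(self, data, redirects=None):
        self.data = data
        self.redirects = redirects or {}

    def read(self, target_id):
        if target_id in self.redirects:
            return None, self.redirects[target_id]
        return self.data[target_id], None


class Client:
    def __init__(self, metadata_server, data_servers):
        self.mds = metadata_server
        self.servers = data_servers      # server address -> data server object
        self.cache = {}                  # chunk or block ID -> server address

    def read(self, target_id, max_redirects=8):
        if target_id not in self.cache:                          # S501
            self.cache[target_id] = self.mds.lookup(target_id)   # S502, S503
        for _ in range(max_redirects + 1):
            server = self.servers[self.cache[target_id]]
            data, replica_id = server.read(target_id)            # S504, S505
            if replica_id is None:                               # S506 'No'
                return data
            # S506 'Yes': fetch metadata for the replicated block and retry.
            target_id = replica_id
            self.cache[target_id] = self.mds.lookup(target_id)   # S502, S503
        raise RuntimeError("too many redirects")
```

The redirect limit is only a safety net for the sketch; the flowchart itself simply returns to step S502 whenever an identifier is received.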

Data processing on the data server

FIG. 6 is a flowchart schematically illustrating how a data server of the asymmetric cluster distributed file system according to the present invention detects a hot data chunk and replicates a hot data chunk block.

The client sends the data server a read request for the data chunk to be accessed (step S601).

In response, the data server increments the cumulative number of access requests for the data chunk block corresponding to the portion requested by the client (step S602).

The data server determines whether this cumulative number of access requests exceeds a preset reference cumulative number (step S603).

If the cumulative number of access requests for the data chunk block does not exceed the reference cumulative number ('No' in step S603), the data server decreases the cumulative number of access requests for the data chunk block (step S606) and returns the requested data to the client (step S607).

If the cumulative number of access requests for the data chunk block exceeds the reference cumulative number ('Yes' in step S603), the data chunk block is hot data, so the data server asks the metadata server to allocate a data block on another data server to which the data chunk block will be replicated (step S606).

Once the data block has been allocated, the data server replicates the data chunk block whose cumulative number of access requests exceeded the reference cumulative number into the allocated data block (step S607).

When replication of the data chunk block is complete, the data server that performed the replication, the data server that received the replica, or both send information about the replication to the metadata server (step S608). The data server that received the replica in turn replicates it to a preset number of other data servers, so that the prescribed number of replicas is maintained in the asymmetric cluster distributed file system.

The data server generates an identifier for the replicated chunk block and returns it to the client, thereby distributing access requests (step S609).

Alternatively, the data server may first check whether an identifier has already been generated for the data chunk block targeted by the read request (step S602). If an identifier for the data chunk block already exists, the data has already been judged to be a hot data chunk, and the data chunk blocks of that chunk have been replicated and distributed to other data servers. It also means that the access request load on this data server has increased significantly, so to distribute the load the data server passes the identifier of the replicated data chunk block to the client.
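The data server side of FIG. 6 can be sketched in Python as a single read handler. Everything named here is hypothetical (`AllocatingMetadataServer`, `allocate`, `report`, `handle_read`, the replica ID format). The description says the count is decreased in the below-threshold branch but does not say by how much or on what schedule, so this sketch models the decrease as a separate `decay` method, which is an assumption. Further replication to a preset number of servers is omitted.

```python
import itertools


class AllocatingMetadataServer:
    """Hypothetical metadata server: picks a target server and records replicas."""

    def __init__(self):
        self.spare = []      # data servers available as replication targets
        self.replicas = {}   # replica_id -> (source block_id, target server name)

    def allocate(self, block_id):
        return self.spare.pop(0)

    def report(self, block_id, replica_id, target_name):
        self.replicas[replica_id] = (block_id, target_name)


class DataServer:
    def __init__(self, name, mds, threshold):
        self.name = name
        self.mds = mds
        self.threshold = threshold   # preset reference cumulative number
        self.blocks = {}             # block_id -> payload
        self.counts = {}             # block_id -> cumulative access requests
        self.replica_of = {}         # block_id -> identifier of its replica
        self._seq = itertools.count(1)

    def store(self, block_id, payload):
        self.blocks[block_id] = payload
        self.counts.setdefault(block_id, 0)

    def handle_read(self, block_id):
        """Return (data, None) normally, or (None, replica_id) for hot data."""
        if block_id in self.replica_of:               # optional pre-check
            return None, self.replica_of[block_id]
        self.counts[block_id] += 1                    # S602
        if self.counts[block_id] <= self.threshold:   # S603 'No'
            return self.blocks[block_id], None
        target = self.mds.allocate(block_id)          # request allocation
        replica_id = f"{block_id}@r{next(self._seq)}"
        target.store(replica_id, self.blocks[block_id])      # replicate
        self.mds.report(block_id, replica_id, target.name)   # S608
        self.replica_of[block_id] = replica_id
        return None, replica_id                       # S609

    def decay(self):
        """Assumed periodic decrease of every counter (not specified in the source)."""
        for block_id in self.counts:
            self.counts[block_id] = max(0, self.counts[block_id] - 1)
```

With a threshold of 2, the third read of a block triggers replication, and every later read is answered with the replica identifier by the pre-check without touching the counter.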

Specific embodiments of the present invention have been described above with reference to the drawings. This description is intended to help those of ordinary skill in the art to which the invention pertains understand it easily, not to limit the technical scope of the invention. The description given above with reference to the drawings may be varied or modified within the scope of the technical idea of the present invention.

FIG. 3 schematically illustrates the metadata stored and managed by the metadata server of the asymmetric cluster distributed file system according to the present invention.

Claims

Metadata for the data file;

Metadata for each data chunk divided in the data file; And

It includes a data storage for storing metadata for the data chunk block divided from the data chunk,

Metadata server in an asymmetric cluster distributed file system.

The method of claim 1,

Metadata for the data file,

A chunk identifier identifying the data chunk,

Metadata for the data chunk,

A file identifier identifying the data file; And

A data chunk block identifier identifying the data chunk block,

Metadata for the data chunk block,

A data chunk identifier identifying the data chunk,

Metadata server in an asymmetric cluster distributed file system.

The method of claim 2,

The data storage unit,

Additionally storing metadata about a child data chunk block partitioned from the data chunk block,

Metadata for the data chunk block,

Further comprising a child data chunk block identifier identifying the child data chunk block,

Metadata for the child data chunk block,

A parent data chunk block identifier identifying the data chunk block;

Metadata server in an asymmetric cluster distributed file system.

Receiving a data access request from a client;

Retrieving a cumulative number of access requests for the data chunk block that is the target of the access request;

If the cumulative number of access requests is equal to or less than a maximum reference value, returning target data of the access request;

If the cumulative number of access requests is greater than the maximum reference value, replicating the data chunk block to another data server; And

Generating an identifier for the replicated data chunk block and returning it to a client,

A method for processing data in a data server in an asymmetric cluster distributed file system.

The method of claim 4, wherein

After executing the replicating step of the data chunk block,

Transmitting information on the data chunk block to a metadata server;

The method of claim 4, wherein

The method may further include dividing and storing the replicated data chunk block into a plurality of child data chunk blocks.

The method of claim 4, wherein

Before the cumulative number retrieval step of the access request,

Searching for the data chunk block targeted for the data access request, whether there is an identifier of the data chunk block replicated to another data server,

If there is an identifier of the data chunk block replicated to another data server, the identifier of the data chunk block is returned to the client,

If there is no identifier of the data chunk block replicated to another data server, executing the step of retrieving the cumulative number of access requests,

The method of claim 4, wherein

The data server,

If the cumulative number of data access requests is less than or equal to a reference cumulative number, the cumulative number of data access requests is decreased,

When the cumulative number of the data access request is equal to or less than a reference cumulative number of times during a reference time, the divided data chunk blocks are combined into data chunks.

A data server for dividing and storing data chunks of the data file into a plurality of data chunk blocks;

A metadata server for storing metadata of the data file, metadata of the data chunk and metadata of the data chunk block; And

Clients

Asymmetric cluster distributed file system containing.

The method of claim 9,

The data server,

In response to the data access request of the client, if the cumulative number of access requests for the data chunk block targeted for the access request is less than or equal to a reference cumulative number, the target data of the access request is returned.

If the cumulative number of access requests is greater than the reference cumulative number, the data chunk block is replicated to another data server, and an identifier for the replicated data chunk block is returned to the client.

The metadata server,

Receiving the replication information of the data chunk block from the data server or from the data server that received the replicated data chunk block,

Generating metadata information about the replicated data chunk block and an identifier between the data chunk and the data chunk block,

Asymmetric Cluster Distributed File System.

The method of claim 10,

The data server,

If the cumulative number of access requests is less than or equal to a reference cumulative number, the cumulative number of access requests is decreased.

When the cumulative number of access requests is equal to or less than a reference cumulative number of times during a reference time, the divided data chunk blocks are combined into data chunks.

Asymmetric Cluster Distributed File System.

The method of claim 10,

The data server receiving the data chunk block from the data server,

The replicated data chunk block,

Is split into multiple data chunk blocks and stored, and is replicated to a set number of data servers,

Transmitting the replication information of the data chunk block to a metadata server,

Asymmetric Cluster Distributed File System.