KR20200072666A

KR20200072666A - Selective data processing method of convolution layer and neural network processor using thereof

Info

Publication number: KR20200072666A
Application number: KR1020180160495A
Authority: KR
Inventors: 하순회
Original assignee: 서울대학교산학협력단
Priority date: 2018-12-13
Filing date: 2018-12-13
Publication date: 2020-06-23
Anticipated expiration: 2038-12-13
Also published as: KR102167211B1

Abstract

In the selective convolutional neural processor, the convolution layer includes a multiplication operation unit that takes as inputs the output of a feature map buffer holding a feature map and the output of a filter buffer holding a filter and multiplies them in parallel, an accumulation operation unit that accumulates the output of the multiplication operation unit, an addition tree operation unit that sums the output of the multiplication operation unit in addition-tree form, and an operation selection unit that selects either the accumulation operation unit or the addition tree operation unit and forwards the output of the multiplication operation unit to the selected unit as its input.

Description

Selective data processing method of convolutional layer and neural processor using the same{SELECTIVE DATA PROCESSING METHOD OF CONVOLUTION LAYER AND NEURAL NETWORK PROCESSOR USING THEREOF}

The present invention relates to a selective data processing method for a convolution layer and to a neural processor that uses the method. More specifically, it relates to a selective data processing method for a convolution layer that can selectively perform the detailed operations of the convolution layer, and to a neural processor that uses the method.

In general, a neural processor is a processor for accelerating computation-heavy machine learning algorithms, and it goes by various names such as neural engine, neuro processor, neuro computer, hardware accelerator, and computation accelerator. Neural processors are mainly used to accelerate CNNs (Convolutional Neural Networks), a convolution-based form of deep learning. A CNN is a type of DNN (Deep Neural Network) that is widely used in machine learning applications such as image classification and object recognition. Because the network demands a very large amount of computation, the computing performance of the neural processor, the hardware accelerator that speeds up this computation, is a very important factor for a CNN.

The core operation of a CNN is convolution, which filters an input, and one network consists of multiple convolution layers. A typical CNN computes convolutions by applying a plurality of filters to one input, where a single convolution multiplies the elements of the input and the filter and sums the products. There are many ways to parallelize a convolution layer, and existing neural processors select one parallelization scheme to accelerate convolution.

The ways in which existing neural processors parallelize the convolution layer can be classified from two viewpoints. From the viewpoint of parallelizing multiplication, structures differ in their degree of pixel-level parallelism, which sets how many output pixels are computed simultaneously, and in their degree of multiplication-level parallelism, which sets how many multiplications are performed in parallel to compute one output pixel. From the viewpoint of the datapath that performs multiplication and addition, structures are divided into those that use MAC (Multiply and Accumulate) units, which perform multiplication and accumulation together, and those that sum a number of parallel multiplications with an addition tree.

As shown in FIG. 4, the existing addition-tree structure requires, for each feature map buffer, as many filter buffers as there are output pixels, and each filter buffer requires one parallel multiplier and one addition tree connected to it. Here, a parallel multiplier may be a set of a plurality of multipliers. In this structure, the input feature map and its channels are arranged and stored in the buffer so that multiplication is performed in the channel direction first. Some widely known CNNs use depthwise convolution, which sets the filter channel size to 1 and applies a different filter to each channel of the input feature map. When the filter channel size is 1, only one multiplication can be performed in parallel, while the number of multiplications needed to compute one output pixel equals the filter size. An addition-tree-based neural processor therefore becomes an inefficient structure whose utilization is very low. A structure that uses existing MAC units has a similar problem: when its operators are configured to multiply in parallel in the channel direction, it cannot exploit channel-direction multiplication parallelism if the channel size is 1. In addition, the existing addition-tree structure cannot support an operation that adds an output feature map and an input feature map pixel by pixel, so improvement has been required.
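The utilization problem described above can be illustrated with a short Python sketch. It assumes a hypothetical multiplier array of `width` lanes that is fed in the channel direction, and it defines utilization as the fraction of lane slots doing useful work. Neither the function name nor this definition comes from the patent; both are assumptions made for illustration.

```python
import math


def lane_utilization(channels: int, width: int) -> float:
    """Fraction of multiplier lane slots that do useful work when `channels`
    values are fed, in the channel direction, to an array of `width` lanes.

    Illustrative model only: the definition of utilization is an assumption.
    """
    if channels < 1 or width < 1:
        raise ValueError("channels and width must be positive")
    rows = math.ceil(channels / width)  # buffer rows needed to cover all channels
    return channels / (rows * width)    # busy lane slots / available lane slots


if __name__ == "__main__":
    print(lane_utilization(1, 16))   # depthwise case: filter channel size is 1
    print(lane_utilization(16, 16))  # channels fill every lane
```

Under this model, a filter channel size of 1 leaves all but one lane idle, which matches the low utilization the text attributes to channel-direction structures.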

Korean Patent Publication No. 10-2018-0052063 (2018.05.17)

The technical problem of the present invention was conceived in view of the above, and an object of the present invention is to provide a selective data processing method for a convolution layer that can selectively perform the detailed operations of the convolution layer.

Another object of the present invention is to provide a neural processor that uses a selective data processing method for a convolution layer capable of selectively performing detailed operations. In the present invention, a neural processor that uses the selective data processing method for a convolution layer is called a selective convolutional neural processor.

In a selective convolutional neural processor for realizing the above object of the present invention, the convolution layer includes a multiplication operation unit that takes as inputs the output of the feature map buffer holding a feature map and the output of the filter buffer holding a filter and multiplies them in parallel, an accumulation operation unit that accumulates the output of the multiplication operation unit, an addition tree operation unit that sums the output of the multiplication operation unit in addition-tree form, and an operation selection unit that selects either the accumulation operation unit or the addition tree operation unit and forwards the output of the multiplication operation unit to the selected unit as its input.

In one embodiment of the present invention, there may be a plurality of each of the feature map buffer, the filter buffer, the multiplication operation unit, the operation selection unit, the accumulation operation unit, and the addition tree operation unit.

In one embodiment of the present invention, the number of feature map buffers may be the same as the number of filter buffers.

In one embodiment of the present invention, the convolution layer performs channel-wise convolution, and the multiplication operation unit may perform multiplication in the channel direction of the feature map.

In one embodiment of the present invention, the convolution layer performs channel-wise convolution and pixel-wise convolution, and the multiplication operation unit performs multiplication in the channel direction of the feature map. When channel-wise convolution is performed, the operation selection unit may forward the output of the multiplication operation unit to the input of the accumulation operation unit, and when pixel-wise convolution is performed, it may forward the output of the multiplication operation unit to the input of the addition tree operation unit.
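The routing rule in this embodiment can be sketched in Python as follows. This is a minimal software model, not the patented hardware: the function names, the mode strings `"channel_wise"` and `"pixel_wise"`, and the use of integer lists for buffer rows are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def multiply_lanes(features: Sequence[int], weights: Sequence[int]) -> List[int]:
    """Multiply one feature-buffer row by one filter-buffer row, lane by lane."""
    if len(features) != len(weights):
        raise ValueError("feature row and filter row must have the same width")
    return [f * w for f, w in zip(features, weights)]


def tree_sum(values: Sequence[int]) -> int:
    """Sum values pairwise, level by level, the way an addition tree would."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an odd element passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def select_and_reduce(products: Sequence[int], mode: str,
                      accumulators: List[int]) -> Optional[int]:
    """Hypothetical operation selector.

    "channel_wise": each lane's product is added to that lane's accumulator.
    "pixel_wise":   all lane products are summed by the addition tree.
    """
    if mode == "channel_wise":
        if len(accumulators) != len(products):
            raise ValueError("need one accumulator per lane")
        for lane, p in enumerate(products):
            accumulators[lane] += p
        return None
    if mode == "pixel_wise":
        return tree_sum(products)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    acc = [0, 0, 0, 0]
    products = multiply_lanes([1, 2, 3, 4], [10, 20, 30, 40])
    select_and_reduce(products, "channel_wise", acc)
    print(acc)
    print(select_and_reduce(products, "pixel_wise", acc))
```

The same lane products feed either path, so the only per-layer decision in this model is the mode passed to the selector.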

In one embodiment of the present invention, the selective convolutional neural processor may further include a convolution selection unit that selects either the multiplication operation unit or another operation unit and forwards the output of the feature map buffer and the output of the filter buffer to the selected unit as its input.

In one embodiment of the present invention, the other operation unit may be an addition operation unit that takes the output of the feature map buffer and the output of the filter buffer as inputs and adds them together, or an ALU (Arithmetic and Logic Unit) that takes the output of the feature map buffer and the output of the filter buffer as inputs and performs arithmetic and logic operations.
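A convolution selection unit of this kind can be modelled in a few lines of Python. The sketch assumes that the two buffer rows have equal width and that the "other operation unit" is a lane-wise adder; the names `convolution_select`, `multiply_unit`, `add_unit`, and the target strings are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Row = Sequence[int]


def multiply_unit(a: Row, b: Row) -> List[int]:
    """Lane-wise multiplication, as used for convolution."""
    return [x * y for x, y in zip(a, b)]


def add_unit(a: Row, b: Row) -> List[int]:
    """Lane-wise addition, standing in for the 'other operation unit'."""
    return [x + y for x, y in zip(a, b)]


# Hypothetical table of units the selector can route to.
UNITS: Dict[str, Callable[[Row, Row], List[int]]] = {
    "multiply": multiply_unit,
    "add": add_unit,
}


def convolution_select(feature_row: Row, filter_row: Row, target: str) -> List[int]:
    """Route both buffer outputs to exactly one unit chosen by `target`."""
    if len(feature_row) != len(filter_row):
        raise ValueError("both buffer rows must have the same width")
    if target not in UNITS:
        raise ValueError(f"unknown target unit: {target}")
    return UNITS[target](feature_row, filter_row)


if __name__ == "__main__":
    print(convolution_select([1, 2, 3], [4, 5, 6], "multiply"))
    print(convolution_select([1, 2, 3], [4, 5, 6], "add"))
```

With the adder selected, the second operand need not be filter weights. It could be another feature map, which is how a pixel-by-pixel sum of two feature maps would fit this model.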

In one embodiment of the present invention, the operation selection unit may be a demultiplexer (Demux) having a selection input.

In one embodiment of the present invention, filters may be stored in the filter buffer in an interleaved manner.

In one embodiment of the present invention, the number of accumulation operation units may be equal to the width of the filter buffer, and the number of inputs of the addition tree operation unit may be equal to the width of the filter buffer.

A selective data processing method for a convolution layer for realizing the above object of the present invention includes a step in which a multiplication operation unit takes as inputs the output of a feature map buffer holding a feature map and the output of a filter buffer holding a filter and multiplies them in parallel, an operation selection step in which an operation selection unit selects either an accumulation operation unit or an addition tree operation unit and forwards the output of the multiplication operation unit to the selected unit as its input, a step in which the accumulation operation unit accumulates the output of the multiplication operation unit, and a step in which the addition tree operation unit sums the output of the multiplication operation unit in addition-tree form.

In one embodiment of the present invention, there is a plurality of each of the feature map buffer, the filter buffer, the multiplication operation unit, the operation selection unit, the accumulation operation unit, and the addition tree operation unit, and the parallel multiplication step, the operation selection step, the accumulation step, and the addition-tree summing step may be performed independently in each feature map buffer, filter buffer, multiplication operation unit, operation selection unit, accumulation operation unit, and addition tree operation unit.

In one embodiment of the present invention, the convolution layer performs channel-wise convolution, and in the parallel multiplication step the multiplication operation unit may perform multiplication in the channel direction of the feature map.

In one embodiment of the present invention, the convolution layer performs channel-wise convolution and pixel-wise convolution, and in the parallel multiplication step the multiplication operation unit performs multiplication in the channel direction of the feature map. In the operation selection step, the operation selection unit may forward the output of the multiplication operation unit to the input of the accumulation operation unit when channel-wise convolution is performed, and may forward the output of the multiplication operation unit to the input of the addition tree operation unit when pixel-wise convolution is performed.

In one embodiment of the present invention, the selective data processing method for a convolution layer may further include a convolution selection step in which a convolution selection unit selects either the multiplication operation unit or another operation unit and forwards the output of the feature map buffer and the output of the filter buffer to the selected unit as its input.

In one embodiment of the present invention, the method may further include a step in which the filter buffer stores filters in an interleaved manner.

According to embodiments of the present invention, the selective data processing method for a convolution layer and the neural processor using the method include an operation selection unit that selects either the accumulation operation unit or the addition tree operation unit and forwards the output of the multiplication operation unit to the selected unit as its input. Therefore, when general convolution, channel-wise convolution, or pixel-wise convolution is performed, the operator suited to that operation can be selected and the operation can be accelerated efficiently.

In addition, the selective data processing method for a convolution layer and the neural processor using the method include a convolution selection unit that selects either the multiplication operation unit or another operation unit and forwards the output of the feature map buffer and the output of the filter buffer to the selected unit as its input. Therefore, even a neural processor with an addition-tree datapath structure can provide operations other than convolution.

FIG. 1 is a block diagram showing a selective convolutional neural processor according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a selective convolutional neural processor according to an embodiment of the present invention.
FIG. 3 is a diagram showing an implementation example of a selective convolutional neural processor according to an embodiment of the present invention.
FIG. 4 is a diagram showing an implementation example of a general neural processor.
FIG. 5 is a block diagram showing a selective convolutional neural processor according to an embodiment of the present invention.
FIG. 6 is a block diagram showing a selective convolutional neural processor according to an embodiment of the present invention.
FIG. 7 is a block diagram showing a selective convolutional neural processor according to an embodiment of the present invention.
FIG. 8 is a block diagram showing a selective convolutional neural processor according to an embodiment of the present invention.
FIG. 9 is a flowchart showing a selective data processing method for a convolution layer according to an embodiment of the present invention.
FIG. 10 is a flowchart showing a selective data processing method for a convolution layer according to an embodiment of the present invention.
FIG. 11 is a flowchart showing a selective data processing method for a convolution layer according to an embodiment of the present invention.
FIG. 12 is a diagram showing a feature map and filters according to an embodiment of the present invention.
FIG. 13 is a diagram showing a pixel-wise convolution operation and a channel-wise convolution operation according to an embodiment of the present invention.
FIG. 14 is a diagram showing the data layout of a feature map stored in a feature map buffer according to an embodiment of the present invention.
FIG. 15 is a diagram showing the data layout of filters stored in a filter buffer according to an embodiment of the present invention.
FIG. 16 is a diagram showing a channel-wise convolution operation and a pixel-wise convolution.
FIG. 17 is a block diagram showing a selective convolutional neural processor according to an embodiment of the present invention that includes a general neural processor structure.

The present invention can be modified in various ways and can take various forms, and embodiments are described in detail in the text. This is not intended to limit the present invention to a specific disclosed form, and it should be understood to include all modifications, equivalents, and substitutes that fall within the spirit and technical scope of the present invention. In describing the drawings, similar reference numerals are used for similar components. Terms such as first and second may be used to describe various components, but the components should not be limited by these terms.

These terms are used only to distinguish one component from another. The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise.

In this application, terms such as "comprise" or "consist of" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification exists, and should be understood not to exclude in advance the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention pertains. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in this application.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the drawings.

A neural processor that includes a general convolution layer includes an input unit that receives data requiring machine learning, a feature map unit that extracts features of the input data to generate and store a feature map, a filter unit that generates and stores a filter for the input data, a convolution layer that performs convolution on the feature map and the filter, and an output unit that outputs the result of the convolution layer as output pixels. Here, the feature map unit may include a feature map buffer for transmitting the feature map to the convolution layer in a specific order, at a specific time interval, and in a specific size, and the filter unit may likewise include a filter buffer for transmitting the filter to the convolution layer in a specific order, at a specific time interval, and in a specific size.

The operation by which the input unit, the feature map unit, the filter unit, the convolution layer, and the output unit of a neural processor obtain the value of one output pixel from the input is described using a convolutional neural network (CNN) as an example. As shown in FIG. 12, the input unit receives Ci two-dimensional images of size W×H, and the feature map unit stores these inputs as a three-dimensional feature map whose sizes along the x, y, and z axes are W, H, and Ci. The convolution layer applies Co three-dimensional filters of size K×K×Ci, stored in the filter unit, to the feature map and can generate an output feature map in which Co different features have been extracted. Here, the coordinate in the z direction may be called the channel. As an example of the operations the convolution layer performs to obtain each coordinate of the output feature map, obtaining one coordinate takes K*K*Ci multiplications and the same number of additions, as shown in FIG. 13. Computing all coordinates of the output feature map, which has H*W*Co coordinates, therefore takes H*W*Co*K*K*Ci multiplications and the same number of additions.
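The operation count H*W*Co*K*K*Ci can be checked with a direct Python implementation of the convolution. The sketch assumes stride 1, an odd filter size K, and zero padding so that the output keeps the H×W size stated in the text. The index order `fmap[y][x][c]` and `filters[o][ky][kx][c]` and the function name `conv_layer` are choices made here, not taken from the patent.

```python
from typing import List, Tuple

FeatureMap = List[List[List[int]]]      # fmap[y][x][c]
Filters = List[List[List[List[int]]]]   # filters[o][ky][kx][c]


def conv_layer(fmap: FeatureMap, filters: Filters) -> Tuple[FeatureMap, int]:
    """Direct convolution that also counts the multiplications it performs.

    Assumes stride 1, odd K, and zero padding (output is H x W x Co).
    Multiplications by padded zeros are counted, as a fixed datapath would do.
    """
    H, W, Ci = len(fmap), len(fmap[0]), len(fmap[0][0])
    Co, K = len(filters), len(filters[0])
    pad = K // 2
    out = [[[0] * Co for _ in range(W)] for _ in range(H)]
    muls = 0
    for y in range(H):
        for x in range(W):
            for o in range(Co):
                acc = 0
                for ky in range(K):
                    for kx in range(K):
                        yy, xx = y + ky - pad, x + kx - pad
                        inside = 0 <= yy < H and 0 <= xx < W
                        for c in range(Ci):
                            value = fmap[yy][xx][c] if inside else 0
                            acc += value * filters[o][ky][kx][c]
                            muls += 1
                out[y][x][o] = acc
    return out, muls


if __name__ == "__main__":
    H, W, Ci, Co, K = 2, 3, 2, 2, 3
    fmap = [[[1] * Ci for _ in range(W)] for _ in range(H)]
    filters = [[[[1] * Ci for _ in range(K)] for _ in range(K)] for _ in range(Co)]
    out, muls = conv_layer(fmap, filters)
    print(muls, H * W * Co * K * K * Ci)  # the two counts agree
```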

The selective convolutional neural processor of the present invention is a neural processor that includes a feature map buffer, a filter buffer, and a convolution layer. The configuration and features of the present invention can be applied to the structure of any neural processor that includes a feature map buffer, a filter buffer, and a convolution layer, and can include that structure. For example, as shown in FIG. 17, the selective convolutional neural processor of the present invention may include an input unit, a feature map unit including a feature map buffer, a filter unit including a filter buffer, a convolution layer, and an output unit. A detailed description of the configuration and structure of a general neural processor is therefore omitted, and only the parts characteristic of the present invention are described in detail.

FIG. 1 is a block diagram showing the convolution layer of a selective convolutional neural processor according to an embodiment of the present invention. FIG. 2 is a block diagram showing a selective convolutional neural processor according to an embodiment of the present invention. FIG. 3 is a diagram showing an implementation example of a selective convolutional neural processor according to an embodiment of the present invention.

Referring to FIGS. 1 to 3, the convolution layer of the selective convolutional neural processor according to an embodiment of the present invention includes a multiplication operation unit, an operation selection unit, an accumulation operation unit, and an addition tree operation unit.

The feature map buffer may receive the feature map, store it row by row, and output it. For example, the feature map may be information representing features that express regions of image data that stand out to human vision. The data output row by row from the feature map buffer may be transmitted to the convolution layer. The feature map buffer may be a shift buffer that stores and outputs its input row by row, but the present invention is not limited to this, and various types of buffers that store and output the feature map row by row may be used.

The feature map may be stored in the feature map buffer in the channel direction. For example, as shown in FIG. 14, the feature map is three-dimensional data with x, y, and z axes, and the data along the z axis of this three-dimensional feature map may be stored sequentially in one row of the feature map buffer. The feature map buffer can therefore transmit the feature map, stored in the channel direction, as an input of the convolution layer.
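One possible reading of this channel-direction layout is sketched below in Python. Each buffer row holds consecutive z-axis (channel) values of a single pixel. The raster order over pixels, the zero padding of a partially filled row, and the name `channel_direction_rows` are assumptions, since FIG. 14 is not reproduced here.

```python
from typing import List

FeatureMap = List[List[List[int]]]  # fmap[y][x][c]


def channel_direction_rows(fmap: FeatureMap, width: int) -> List[List[int]]:
    """Pack a 3-D feature map into buffer rows of `width` elements.

    Each row holds consecutive channel values of one pixel. Pixels are visited
    in raster order and short rows are zero-padded; both are assumptions.
    """
    if width < 1:
        raise ValueError("width must be positive")
    rows: List[List[int]] = []
    for line in fmap:
        for pixel in line:
            for start in range(0, len(pixel), width):
                chunk = list(pixel[start:start + width])
                chunk += [0] * (width - len(chunk))  # pad the last chunk of a pixel
                rows.append(chunk)
    return rows


if __name__ == "__main__":
    fmap = [[[1, 2, 3, 4], [5, 6, 7, 8]]]  # H=1, W=2, Ci=4
    for row in channel_direction_rows(fmap, 2):
        print(row)
```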

The filter buffer may store the filter row by row and output it. For example, the filter may be a weight filter holding weight information. For example, the input of the selective convolutional neural processor is two-dimensional image data, the feature map is data expressing the regions of the image data that stand out to human vision, and the filter may be weight data for the feature map. For example, when an accumulator or an ALU is used in the datapath instead of the addition tree, the filter may be a feature map output in a previous step. The data output row by row from the filter buffer may be transmitted to the convolution layer. The filter buffer may be a shift buffer that stores and outputs its input row by row, but the present invention is not limited to this, and various types of buffers that store and output the filter row by row may be used.

Filters may be stored in the filter buffer in an interleaved manner. Interleaving arranges data by slotting the elements of different items in turn. For example, as shown in FIG. 16, a filter buffer whose width (the number of elements in one row) is L may store the elements of L filters one at a time in rotation. For example, one filter may be laid out along the depth direction of the filter buffer. The filter buffer may have as many rows as the size of the filter. For example, when the filter size is K*K, the filter is stored in the depth direction across K*K rows of the filter buffer, transmission from the filter buffer completes in K*K clock cycles, and the final value of the accumulation operation unit connected to the filter buffer may be the output pixel value.
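As a minimal sketch of the interleaved layout described above, assuming each filter is given as a flat list of K*K weights (the function name `interleave_filters` is hypothetical):

```python
# Illustrative sketch (assumed names, not from the patent).
# L filters of K*K weights become K*K buffer rows of width L:
# row r holds element r of every filter, so each filter runs
# along the depth direction of the buffer.

def interleave_filters(filters):
    size = len(filters[0])
    assert all(len(f) == size for f in filters), "filters must share one size"
    return [[f[r] for f in filters] for r in range(size)]

# L = 3 filters, K = 2 (so K*K = 4 weights per filter)
filters = [
    [1, 2, 3, 4],
    [10, 20, 30, 40],
    [100, 200, 300, 400],
]
buffer_rows = interleave_filters(filters)
```

With this layout, shifting out one row per clock cycle feeds one weight of each filter to the multipliers, so all K*K rows are consumed in K*K cycles.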

The number of feature map buffers may be equal to the number of filter buffers. For example, when n filters are applied to a feature map buffer, the n filters may be stored in a single filter buffer in an interleaved manner.

The multiplication operation unit may take the output of the feature map buffer containing the feature map and the output of the filter buffer containing the filter as inputs and multiply them in parallel. For example, the feature map buffer is a shift buffer with L elements (the buffer width), the filter buffer is likewise a shift buffer with L elements (the buffer width), and the multiplication operation unit may be a plurality of multipliers connected one-to-one to the elements of the feature map buffer and the filter buffer. Accordingly, the number of multipliers in the multiplication operation unit may equal the width of the feature map buffer or the width of the filter buffer.

The accumulation operation unit may accumulate the output of the multiplication operation unit. The number of accumulation operation units may equal the width (L) of the filter buffer, and may likewise equal the width of the feature map buffer. The accumulation operation unit may compute the partial sum of each output pixel. Here, an output pixel is a pixel of the output feature map, which is the output of the output unit.

The accumulation operation unit may include a plurality of accumulators. The output of each accumulator may be one output pixel. The output of the accumulation operation unit may be transmitted to the output unit. The accumulation operation unit may include as many accumulators as the multiplication operation unit has multipliers. For example, the multiplication operation unit includes as many multipliers as the width of the feature map buffer and the width (L) of the filter buffer, the accumulation operation unit includes as many accumulators as there are multipliers, the outputs of the multipliers may each be connected to the input of a corresponding accumulator, and the operation selection unit may be placed between the outputs of the multipliers and the inputs of the accumulators. A data bus may also be placed between the outputs of the multipliers and the inputs of the accumulators, but the present invention does not restrict whether a data bus is present.

The addition tree operation unit may add the outputs of the multiplication operation unit in the form of an addition tree. The number of inputs of the addition tree operation unit may equal the width (L) of the filter buffer, and may likewise equal the width (L) of the feature map buffer.

The output of the addition tree operation unit may be fed back to its input until the unit has received every element of the filter as input. The addition results can thus be accumulated for each filter. For example, the final output of the addition tree operation unit, which is not fed back, may be the per-filter partial sum for the filter buffer. Here, the filter may refer to the set of filters stored in the filter buffer in interleaved form.
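The feedback behaviour described above can be modelled roughly as follows. This is a sketch under assumptions: the pairwise reduction in `tree_sum` stands in for the levels of the addition tree, and the feedback is modelled simply as adding the previous tree output to the next one. Both function names are hypothetical.

```python
# Illustrative sketch (assumed names and feedback model, not from the patent).

def tree_sum(values):
    """Reduce a row of products pairwise, level by level, like an adder tree."""
    level = list(values)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd element passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0] if level else 0

def adder_tree_with_feedback(product_rows):
    """Feed the tree output back until every row of products is consumed."""
    running = 0
    for row in product_rows:
        running = tree_sum(row) + running   # feedback of the previous output
    return running                          # final, non-fed-back output

total = adder_tree_with_feedback([[1, 2, 3, 4], [5, 6, 7, 8]])
```

In this model the value returned after the last row corresponds to the final output that is not fed back.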

The output of the addition tree operation unit may be one output pixel. The output of the addition tree operation unit may be transmitted to the output unit. The addition tree operation unit may include as many input datapaths as the multiplication operation unit has multipliers. For example, the multiplication operation unit includes as many multipliers as the width of the feature map buffer and the width (L) of the filter buffer, the addition tree operation unit includes as many input datapaths as there are multipliers, the outputs of the multipliers may each be connected to a corresponding input datapath of the addition tree operation unit, and the operation selection unit may be placed between the outputs of the multipliers and the input datapaths of the addition tree operation unit. A data bus may also be placed between the outputs of the multipliers and the input datapaths of the addition tree operation unit, but the present invention does not restrict whether a data bus is present.

The addition tree operation unit generates one output pixel value from L multiplications, whereas the accumulation operation unit may compute L distinct output pixel values from L multiplications.

The operation selection unit may select either the accumulation operation unit or the addition tree operation unit and transmit the output of the multiplication operation unit to the input of the selected unit. The operation selection unit may include a control input that determines which of the two units receives the output of the multiplication operation unit. For example, the operation selection unit may be a demultiplexer (Demux) with a select input, and the control input may be that select input. A demultiplexer is a device that receives one input signal and outputs it on one of several output datapaths, and its select input is the signal that chooses among those output datapaths. The operation selection unit may include a plurality of demultiplexers.
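A rough software model of this selection might look like the following. The class name `Datapath`, the select encodings `ACCUMULATE` and `ADDER_TREE`, and the use of Python's built-in `sum` in place of an explicit tree are assumptions made for illustration only.

```python
# Illustrative sketch (assumed names and encodings, not from the patent).
ACCUMULATE, ADDER_TREE = 0, 1   # hypothetical values of the select input

class Datapath:
    def __init__(self, width):
        self.acc = [0] * width   # one accumulator per multiplier
        self.tree_total = 0      # addition tree output, including feedback

    def step(self, fmap_row, filter_row, select):
        # L multipliers working in parallel, one per buffer element
        products = [a * w for a, w in zip(fmap_row, filter_row)]
        if select == ACCUMULATE:
            # demux routes each product to its own accumulator
            self.acc = [s + p for s, p in zip(self.acc, products)]
        else:
            # demux routes all products to the addition tree
            self.tree_total += sum(products)

dp = Datapath(3)
dp.step([1, 2, 3], [1, 1, 1], ACCUMULATE)
dp.step([4, 5, 6], [2, 2, 2], ACCUMULATE)
acc_out = dp.acc                 # three independent partial sums

dp2 = Datapath(3)
dp2.step([1, 2, 3], [1, 1, 1], ADDER_TREE)
tree_out = dp2.tree_total        # a single sum of all three products
```

The same L products therefore yield L separate partial sums on the accumulator path and one combined sum on the addition tree path.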

When channel-wise convolution is performed, the operation selection unit may transmit the output of the multiplication operation unit to the input of the accumulation operation unit. Channel-wise convolution means performing the convolution separately for each channel. Examples include depthwise convolution, which uses an independent filter for each channel. In a channel-wise convolution filter, the channel size may be 1. The multiplication operation unit may perform multiplication along the channel direction of the feature map. For example, the convolution layer performs channel-wise convolution, the multiplication operation unit multiplies along the channel direction of the feature map, and the accumulation operation unit accumulates the output of the multiplication operation unit to generate output pixels in parallel.

The convolution layer may perform channel-wise convolution and pixel-wise convolution, and the multiplication operation unit may perform multiplication along the channel direction of the feature map. The operation selection unit may transmit the output of the multiplication operation unit to the input of the accumulation operation unit when channel-wise convolution is performed, and to the input of the addition tree operation unit when pixel-wise convolution is performed. Pixel-wise convolution (pointwise convolution) is a method of performing the convolution one-to-one on a single input pixel. For example, as shown in FIG. 16, the results of the depthwise convolution are multiplied one-to-one by a weight filter and the products are all summed, so that one output pixel is generated per filter. The convolution layer may perform a depthwise separable convolution, which performs channel-wise convolution and pixel-wise convolution in sequence.
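For reference, the arithmetic of a depthwise separable convolution at a single output position can be sketched as below. The function names and the data layout (`patch[c][k]` for the K*K window of channel c) are assumptions for the example, and the loops only model what the hardware would do in parallel.

```python
# Illustrative sketch (assumed names and layout, not from the patent).

def depthwise_pixel(patch, dw_filters):
    """Channel-wise step: one accumulator per channel, K*K cycles."""
    channels = len(patch)
    acc = [0] * channels
    for k in range(len(patch[0])):      # one filter element per clock cycle
        for c in range(channels):       # channels run in parallel in hardware
            acc[c] += patch[c][k] * dw_filters[c][k]
    return acc

def pointwise_pixel(pixel, pw_filters):
    """Pixel-wise step: products over channels are summed per filter."""
    return [sum(x * w for x, w in zip(pixel, f)) for f in pw_filters]

patch = [[1, 2, 3, 4], [1, 1, 1, 1]]    # C = 2 channels, K = 2
dw = [[1, 0, 0, 1], [2, 2, 2, 2]]       # one K*K filter per channel
mid = depthwise_pixel(patch, dw)

pw = [[1, 1], [1, -1], [0, 2]]          # three 1x1 filters over C channels
out = pointwise_pixel(mid, pw)
```

The first function corresponds to the accumulator path (one result per channel) and the second to the addition tree path (one result per filter).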

Referring to FIG. 2, there may be a plurality of feature map buffers, filter buffers, multiplication operation units, operation selection units, accumulation operation units, and addition tree operation units.

The plurality of feature map buffers may store the feature map divided into a plurality of parts. For example, there are N feature map buffers, and they may store the feature map divided into N equal parts along the channel direction. When the channel size of the feature map is smaller than the product of the width (W) of the feature map and the number (N) of feature map buffers, the feature map may instead be divided along the X or Y direction and stored in the feature map buffers. Accordingly, as many output pixels as the product of the width (W) of the feature map and the number (N) of feature map buffers can be computed simultaneously in parallel.
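A minimal sketch of the N-way channel split, assuming contiguous equal chunks and a channel count divisible by N (the patent states only that the feature map is divided into N equal parts along the channel direction; the helper name `split_channels` is hypothetical):

```python
# Illustrative sketch (assumed chunking scheme, not from the patent).

def split_channels(pixel_channels, n_buffers):
    """Divide the channel values of one pixel into n_buffers equal chunks."""
    size = len(pixel_channels)
    assert size % n_buffers == 0, "channel count must divide evenly"
    chunk = size // n_buffers
    return [pixel_channels[i * chunk:(i + 1) * chunk] for i in range(n_buffers)]

parts = split_channels(list(range(8)), 4)   # 8 channels over N = 4 buffers
```

Each chunk would then be handled by its own buffer, multiplication operation unit, and selection path, independently of the others.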

The numbers of feature map buffers, filter buffers, multiplication operation units, operation selection units, accumulation operation units, and addition tree operation units may all be equal. For example, one feature map buffer and one filter buffer are connected to one multiplication operation unit, that multiplication operation unit is connected to one operation selection unit, and that operation selection unit may be connected to one accumulation operation unit and one addition tree operation unit. In this case, the one operation selection unit may include a plurality of demultiplexers, and the one accumulation operation unit may include a plurality of accumulators.

FIG. 5 is a block diagram showing a selective convolutional neural processor according to an embodiment of the present invention. FIG. 6 is a block diagram showing a selective convolutional neural processor according to an embodiment of the present invention. FIG. 7 is a block diagram showing a selective convolutional neural processor according to an embodiment of the present invention. FIG. 8 is a block diagram showing a selective convolutional neural processor according to an embodiment of the present invention.

Except for the convolution selection unit, the addition operation unit, and the ALU, the selective convolutional neural processor according to this embodiment is substantially the same as the selective convolutional neural processor of FIGS. 1 to 3. Components identical to those of the selective convolutional neural processor of FIGS. 1 to 3 are therefore given the same reference numerals, and repeated descriptions are omitted.

Referring to FIGS. 5 to 8, the selective convolutional neural processor according to an embodiment of the present invention includes a convolution selection unit and includes an addition operation unit or an ALU.

The addition operation unit may take the output of the feature map buffer and the output of the filter buffer as inputs and add them together. The addition operation unit may perform addition between pixels. For example, the convolution layer performs an operation in which the output feature map, which is the output of the output unit, is routed to the filter buffer and added to the feature map of the feature map unit, and the addition operation unit may perform the addition of the output feature map and the feature map.

The convolution selection unit may select either the multiplication operation unit or the addition operation unit and transmit the output of the feature map buffer and the output of the filter buffer to the input of the selected unit. There may be a plurality of convolution selection units. The convolution selection units may include a first convolution selection unit, which transmits the output of the feature map buffer to the input of either the multiplication operation unit or the addition operation unit, and a second convolution selection unit, which transmits the output of the filter buffer to the input of either the multiplication operation unit or the addition operation unit. For example, the convolution layer performs an operation in which the output feature map, which is the output of the output unit, is routed to the feature map unit and added to the feature map of the feature map unit, the first convolution selection unit transmits the output of the feature map buffer to the addition operation unit, the second convolution selection unit transmits the output of the filter buffer to the addition operation unit, and the addition operation unit may add the output of the feature map buffer and the output of the filter buffer. In this case, the output feature map may be stored in the filter buffer.
However, the present invention is not limited thereto, and the filter buffer may store various data that need to be added to the feature map.
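The choice between the multiplication path and the addition path can be illustrated with a short sketch. The select encodings `MULTIPLY` and `ADD` and the function name `route` are hypothetical, and the example assumes that the filter buffer holds an earlier output feature map, as in the case described above.

```python
# Illustrative sketch (assumed names and encodings, not from the patent).
MULTIPLY, ADD = 0, 1   # hypothetical values of the convolution selection input

def route(fmap_row, filter_row, select):
    """Send both buffer outputs to the adder or to the multipliers."""
    if select == ADD:
        # element-wise addition of two feature maps (pixel-to-pixel add)
        return [a + b for a, b in zip(fmap_row, filter_row)]
    # default: element-wise products for the convolution datapath
    return [a * b for a, b in zip(fmap_row, filter_row)]

current = [1, 2, 3]        # row from the feature map buffer
earlier = [10, 20, 30]     # earlier output feature map held in the filter buffer
added = route(current, earlier, ADD)
multiplied = route(current, earlier, MULTIPLY)
```

Reusing the same two buffers for both operations is what lets the adder share the existing datapath instead of requiring separate storage.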

The ALU (Arithmetic and Logic Unit) may take the output of the feature map buffer and the output of the filter buffer as inputs and perform arithmetic and logic operations.

The convolution selection unit may select either the multiplication operation unit or the ALU and transmit the output of the feature map buffer and the output of the filter buffer to the input of the selected unit. There may be a plurality of convolution selection units. The convolution selection units may include a third convolution selection unit, which transmits the output of the feature map buffer to the input of either the multiplication operation unit or the ALU, and a fourth convolution selection unit, which transmits the output of the filter buffer to the input of either the multiplication operation unit or the ALU.

In addition to the addition operation unit or the ALU, the convolution selection unit may be connected to various other operators to implement selective operations.

There may be a plurality of feature map buffers, filter buffers, multiplication operation units, operation selection units, accumulation operation units, addition tree operation units, convolution selection units, addition operation units, and ALUs. This structure is substantially the same as that described with reference to FIG. 2, except that the convolution selection unit and the addition operation unit or the ALU are added, so repeated descriptions are omitted.

The number of convolution selection units may be equal to the number of feature map buffers. Here, a convolution selection unit means one group of convolution selection units consisting of either the first and second convolution selection units or the third and fourth convolution selection units. The number of addition operation units may be equal to the number of feature map buffers. The number of ALUs may be equal to the number of feature map buffers.

FIG. 9 is a flowchart showing a selective data processing method of a convolution layer according to an embodiment of the present invention. FIG. 10 is a flowchart showing a selective data processing method of a convolution layer according to an embodiment of the present invention. FIG. 11 is a flowchart showing a selective data processing method of a convolution layer according to an embodiment of the present invention.

The selective data processing method of the convolution layer according to this embodiment is performed in the selective convolutional neural processor and, differing only in category, is substantially the same as the selective convolutional neural processor of FIGS. 1 to 3 and FIGS. 5 to 8. Components identical to those of the selective convolutional neural processor of FIGS. 1 to 3 and FIGS. 5 to 8 are therefore given the same reference numerals, repeated descriptions are omitted, and only the relationships between the steps are additionally described.

Referring to FIGS. 9 to 11, the selective data processing method of the convolution layer according to an embodiment of the present invention includes a step of multiplying in parallel (S100), an operation selection step (S200), an accumulating step (S300), and a step of adding in addition tree form (S400).

In the step of multiplying in parallel (S100), the multiplication operation unit may take the output of the feature map buffer containing the feature map and the output of the filter buffer containing the filter as inputs and multiply them in parallel.

In the operation selection step (S200), the operation selection unit may select either the accumulation operation unit or the addition tree operation unit and transmit the output of the multiplication operation unit to the input of the selected unit.

In the accumulating step (S300), the accumulation operation unit may accumulate the output of the multiplication operation unit.

In the step of adding in addition tree form (S400), the addition tree operation unit may add the outputs of the multiplication operation unit in the form of an addition tree.

The convolution layer may perform channel-wise convolution and pixel-wise convolution. In the step of multiplying in parallel (S100), the multiplication operation unit performs multiplication along the channel direction of the feature map. In the operation selection step (S200), the operation selection unit may transmit the output of the multiplication operation unit to the input of the accumulation operation unit when channel-wise convolution is performed, and to the input of the addition tree operation unit when pixel-wise convolution is performed. Accordingly, the accumulating step (S300) is performed for channel-wise convolution, and the step of adding in addition tree form (S400) may be performed for pixel-wise convolution.

The step of multiplying in parallel (S100) is substantially the same as the operation performed by the multiplication operation unit of the selective convolutional neural processor, and the operation selection step (S200), the accumulating step (S300), and the step of adding in addition tree form (S400) are substantially the same as the operations performed by the operation selection unit, the accumulation operation unit, and the addition tree operation unit of the selective convolutional neural processor, respectively. Repeated descriptions are therefore omitted.

The convolution layer may perform channel-wise convolution, and in the step of multiplying in parallel (S100), the multiplication operation unit may perform multiplication along the channel direction of the feature map. The channel-wise convolution and the way it is performed are the same as described for the channel-wise convolution of the selective convolutional neural processor, so repeated descriptions are omitted.

There may be a plurality of feature map buffers, filter buffers, multiplication operation units, operation selection units, accumulation operation units, and addition tree operation units, and the step of multiplying in parallel (S100), the operation selection step (S200), the accumulating step (S300), and the step of adding in addition tree form (S400) may be performed independently in each feature map buffer, filter buffer, multiplication operation unit, operation selection unit, accumulation operation unit, and addition tree operation unit.

The number of feature map buffers may be equal to the number of filter buffers. The operation selection unit may be a demultiplexer (Demux) with a select input. The number of accumulation operation units may equal the width of the filter buffer, and the number of inputs of the addition tree operation unit may equal the width of the filter buffer. The method may further include a step (not shown) in which the filter buffer stores the filters in an interleaved manner.

The selective data processing method of the convolution layer according to an embodiment of the present invention may further include a convolution selection step (S90). In the convolution selection step (S90), the convolution selection unit may select either the multiplication operation unit or another operation unit and transmit the output of the feature map buffer and the output of the filter buffer to the input of the selected unit.

The other operation unit may be an addition operation unit, which takes the output of the feature map buffer and the output of the filter buffer as inputs and adds them together, or an ALU (Arithmetic and Logic Unit), which takes the output of the feature map buffer and the output of the filter buffer as inputs and performs arithmetic and logic operations.

The selective data processing method of the convolution layer according to an embodiment of the present invention may further include an adding step (S500), in which the output of the feature map buffer and the output of the filter buffer are taken as inputs and added together. The adding step (S500) is the same as described for the addition operation unit of the selective convolutional neural processor, so repeated descriptions are omitted.

The selective data processing method of the convolution layer according to an embodiment of the present invention may further include a step of performing arithmetic and logic operations (S600), in which the output of the feature map buffer and the output of the filter buffer are taken as inputs and arithmetic and logic operations are performed. The step of performing arithmetic and logic operations (S600) is the same as described for the ALU of the selective convolutional neural processor, so repeated descriptions are omitted.

The relationships between the steps of the selective data processing method of the convolution layer according to an embodiment of the present invention are described with reference to FIG. 10. The operation selection step (S200) may be performed after the step of multiplying in parallel (S100). The accumulating step (S300) and the step of adding in addition tree form (S400) may each be performed when selected in the operation selection step (S200).

The relationships between the steps of the selective data processing method of the convolution layer according to an embodiment of the present invention are described with reference to FIG. 10. The adding step (S500) and the step of multiplying in parallel (S100) may each be performed when selected in the convolution selection step (S90). The operation selection step (S200) may be performed after the step of multiplying in parallel (S100). The accumulating step (S300) and the step of adding in addition tree form (S400) may each be performed when selected in the operation selection step (S200).

Referring to FIG. 11, the relationship between the steps of the selective data processing method of the convolutional layer according to an embodiment of the present invention is as follows. The step of performing the arithmetic logic operation (S600) and the step of multiplying in parallel (S100) may each be performed when selected in the convolution operation selection step (S90). The operation selection step (S200) may be performed after the step of multiplying in parallel (S100). The accumulating step (S300) and the step of adding in an addition tree format (S400) may each be performed when selected in the operation selection step (S200).
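The routing performed in the convolution operation selection step (S90) can be sketched in software as follows. This is only an illustrative model: the function name, the boolean `use_multiplier` flag, and the particular set of ALU operations in `ALU_OPS` are assumptions, since the text does not enumerate which arithmetic logic operations are supported.

```python
import operator
from typing import Callable, Dict, List, Sequence

# Hypothetical set of ALU operations; the supported set is an assumption.
ALU_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": operator.add,
    "sub": operator.sub,
    "max": max,
    "and": operator.and_,
}


def select_convolution_path(
    features: Sequence[int],
    weights: Sequence[int],
    use_multiplier: bool,
    alu_op: str = "add",
) -> List[int]:
    """S90: send both buffer outputs to the multiplier (S100) or the ALU (S600)."""
    if use_multiplier:
        # S100: lane-by-lane multiplication, later routed by S200
        return [f * w for f, w in zip(features, weights)]
    # S600: lane-by-lane arithmetic logic operation on the same inputs
    op = ALU_OPS[alu_op]
    return [op(f, w) for f, w in zip(features, weights)]


print(select_convolution_path([1, 2], [3, 4], use_multiplier=True))           # [3, 8]
print(select_convolution_path([1, 2], [3, 4], use_multiplier=False))          # [4, 6]
print(select_convolution_path([1, 2], [3, 4], False, alu_op="max"))           # [3, 4]
```

Only the multiplier branch feeds the operation selection step (S200); the ALU branch produces its result directly from the two buffer outputs.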

The step of storing the filters in an interleaved manner (not shown) may be performed before the step of multiplying in parallel (S100) or before the convolution operation selection step (S90).

According to embodiments of the present invention, the data path of the convolutional layer can be determined flexibly: the operation selection unit selects between the accumulation operation unit and the addition tree operation unit, and the convolution selection unit selects between the multiplication operation unit and another operation unit. Accordingly, an effective data path for the convolutional layer can be selected as needed to improve hardware acceleration, and selective data paths for various operations can be provided to increase hardware utilization.

Although the invention has been described above with reference to embodiments, those of ordinary skill in the art will understand that the present invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

100: multiplication operation unit
200: operation selection unit
300: accumulation operation unit
400: addition tree operation unit
500: convolution selection unit

Claims

A selective convolutional neural processor comprising a feature map buffer, a filter buffer, and a convolutional layer,
wherein the convolutional layer comprises:
a multiplication operation unit that receives, as inputs, an output of the feature map buffer containing a feature map and an output of the filter buffer containing a filter, and multiplies them in parallel;
an accumulation operation unit that accumulates the output of the multiplication operation unit;
an addition tree operation unit that adds the output of the multiplication operation unit in an addition tree format; and
an operation selection unit that selects either the accumulation operation unit or the addition tree operation unit and transmits the output of the multiplication operation unit to the selected unit as input.

The selective convolutional neural processor according to claim 1, wherein the feature map buffer, the filter buffer, the multiplication operation unit, the operation selection unit, the accumulation operation unit, and the addition tree operation unit are each provided in plurality.

The selective convolutional neural processor according to claim 1, wherein the number of feature map buffers is equal to the number of filter buffers.

The selective convolutional neural processor according to claim 1, wherein the convolutional layer performs channel-wise convolution, and
the multiplication operation unit performs multiplication in the channel direction of the feature map.

The selective convolutional neural processor according to claim 1, wherein the convolutional layer performs channel-wise convolution and pixel-wise convolution,
the multiplication operation unit performs multiplication in the channel direction of the feature map, and
the operation selection unit transmits the output of the multiplication operation unit to the input of the accumulation operation unit when the channel-wise convolution is performed, and transmits the output of the multiplication operation unit to the input of the addition tree operation unit when the pixel-wise convolution is performed.

The selective convolutional neural processor, further comprising a convolution selection unit that selects either the multiplication operation unit or another operation unit and transmits the output of the feature map buffer and the output of the filter buffer to the selected unit as input.

The selective convolutional neural processor, wherein the other operation unit is
an addition operation unit that adds the output of the feature map buffer and the output of the filter buffer as inputs, or an arithmetic logic unit (ALU) that performs an arithmetic logic operation using the output of the feature map buffer and the output of the filter buffer as inputs.

The selective convolutional neural processor according to claim 1, wherein the operation selection unit is a demultiplexer (Demux) including a selection input.

The selective convolutional neural processor according to claim 1, wherein filters are stored in the filter buffer in an interleaving manner.

The selective convolutional neural processor according to claim 1, wherein the number of accumulation operation units is equal to the width of the filter buffer, and
the number of inputs of the addition tree operation unit is equal to the width of the filter buffer.

A selective data processing method of a convolutional layer, performed in a selective convolutional neural processor including a feature map buffer, a filter buffer, and a convolutional layer, the method comprising:
multiplying in parallel, by a multiplication operation unit, an output of the feature map buffer containing a feature map and an output of the filter buffer containing a filter, received as inputs;
an operation selection step in which an operation selection unit selects either an accumulation operation unit or an addition tree operation unit and transmits the output of the multiplication operation unit to the selected unit as input;
accumulating, by the accumulation operation unit, the output of the multiplication operation unit; and
adding, by the addition tree operation unit, the output of the multiplication operation unit in an addition tree format.

The method of claim 11, wherein the feature map buffer, the filter buffer, the multiplication operation unit, the operation selection unit, the accumulation operation unit, and the addition tree operation unit are each provided in plurality, and
the step of multiplying in parallel, the operation selection step, the accumulating step, and the step of adding in an addition tree format are performed independently for each of the feature map buffers, the filter buffers, the multiplication operation units, the operation selection units, the accumulation operation units, and the addition tree operation units.

The method of claim 11, wherein the number of feature map buffers is equal to the number of filter buffers.

The selective data processing method of a convolutional layer, wherein the convolutional layer performs channel-wise convolution, and
in the step of multiplying in parallel, the multiplication operation unit performs multiplication in the channel direction of the feature map.

The method of claim 14,
wherein the convolutional layer performs channel-wise convolution and pixel-wise convolution,
in the step of multiplying in parallel, the multiplication operation unit performs multiplication in the channel direction of the feature map, and
in the operation selection step, the operation selection unit transmits the output of the multiplication operation unit to the input of the accumulation operation unit when the channel-wise convolution is performed, and transmits the output of the multiplication operation unit to the input of the addition tree operation unit when the pixel-wise convolution is performed.

The method of claim 11, further comprising a convolution operation selection step in which a convolution selection unit selects either the multiplication operation unit or another operation unit and transmits the output of the feature map buffer and the output of the filter buffer to the selected unit as input.

The method of claim 11, wherein the other operation unit is an addition operation unit that adds the output of the feature map buffer and the output of the filter buffer as inputs, or an arithmetic logic unit (ALU) that performs an arithmetic logic operation using the output of the feature map buffer and the output of the filter buffer as inputs.

The method of claim 11, wherein the operation selection unit is a demultiplexer (Demux) including a selection input.

The method of claim 11, further comprising storing, by the filter buffer, the filters in an interleaved manner.

The method of claim 11, wherein the number of accumulation operation units is equal to the width of the filter buffer, and
the number of inputs of the addition tree operation unit is equal to the width of the filter buffer.