JP2001084171A

JP2001084171A - Image processing device

Info

Publication number: JP2001084171A
Application number: JP25801699A
Authority: JP
Inventors: Shunichi Ishiwatari (石渡 俊一)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-09-10
Filing date: 1999-09-10
Publication date: 2001-03-30

Abstract

PROBLEM TO BE SOLVED: To reduce the size of the apparatus by sharing, among a plurality of processors, the circuitry that performs several different cut-out operations. SOLUTION: In this image processing device, part of the data read from a DRAM 4, an external memory connected to a shared bus 2 that connects processors 1A and 1B in parallel, is cut out by a funnel shifter 31 serving as a first cut-out circuit. The cut-out data is then cut out further by a second cut-out circuit and written to local memories 7A and 7B through local buses 6A and 6B in the processors.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an image processing apparatus that performs operations on stored image data to make it suitable for screen display, and in particular to one used in image processing apparatuses that compress and decompress images according to international standards such as MPEG.

[0002]

2. Description of the Related Art

A known basic configuration for an image processing apparatus of this kind is shown, for example, in FIG. 8. To achieve high computational performance, the apparatus in FIG. 8 uses a multiprocessor configuration. So that data can easily be shared among the processors, a plurality of processor cores 101A and 101B are connected by a shared bus 102, and a DRAM 104 serving as a large-capacity main memory is connected to the bus 102 through a DRAM controller 103. In addition, so that data transfers can proceed efficiently in parallel with the operation of the processor cores 101A and 101B, bridges 105A and 105B are inserted between the shared bus 102 and the processor cores 101A and 101B, respectively; local buses 106A and 106B are connected to the bridges 105A and 105B; small-capacity but high-speed local memories 107A and 107B are connected to the local buses 106A and 106B; and DMA (direct memory access) controllers A and B control DMA data transfers between the local memories 107A and 107B and the DRAM 104 serving as the main memory. Here, as an example, the external DRAM 104 and the shared bus 102 are both 8 bytes wide, and one word is defined as 8 bytes.

[0003] The image processing apparatus configured as above performs the following processing. First, as shown in FIG. 9, one or more processors repeatedly cut out 18 consecutive bytes, not necessarily aligned with a word boundary, from three words read from the DRAM 104, and write them sequentially to the 8-byte-wide local memory 107A or 107B. Similarly, as shown in FIG. 10, one or more processors repeatedly cut out 8 consecutive bytes, not necessarily aligned with a word boundary, from two words read from the DRAM 104, and write them to the 8-byte-wide local memory 107A or 107B. Some processors perform both operations, while others perform only the latter.

[0004] In this processing, reading 18 consecutive bytes requires first reading a total of three words (24 bytes) from three consecutive addresses of the DRAM 104 and then cutting out only the 18 required consecutive bytes from that data. Likewise, reading from the DRAM 104 eight consecutive bytes that are not aligned with a word boundary requires reading a total of two words (16 bytes) from two consecutive addresses of the DRAM 104 and then cutting out only the eight required consecutive bytes.
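The word-granular read followed by a cut-out can be sketched in Python as below. This is an illustrative model only, assuming a byte-addressed view of the DRAM and the 8-byte word of [0002]; the function name `read_unaligned` is hypothetical. It returns the cut-out bytes together with the number of whole words that had to be fetched.

```python
from typing import Tuple

WORD = 8  # bytes per DRAM word, as in the example of [0002]


def read_unaligned(dram: bytes, addr: int, length: int) -> Tuple[bytes, int]:
    """Hypothetical helper: fetch the whole words that cover
    [addr, addr + length) and cut out only the requested bytes.

    Returns the cut-out bytes and the number of words that were read."""
    first_word = addr // WORD
    last_word = (addr + length - 1) // WORD
    # Word-granular read: the memory only delivers complete 8-byte words.
    fetched = dram[first_word * WORD:(last_word + 1) * WORD]
    # Cut-out step: discard the bytes in front of and behind the request.
    start = addr - first_word * WORD
    return fetched[start:start + length], last_word - first_word + 1


dram = bytes(range(64))  # byte i holds the value i
print(read_unaligned(dram, 2, 18))  # the FIG. 9 case: 18 bytes from 3 words
print(read_unaligned(dram, 5, 8))   # the FIG. 10 case: 8 bytes from 2 words
```

An aligned 8-byte request needs only one word, which is why the unaligned case is the one that needs a cut-out circuit.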

[0005] When such processing is performed by the image processing apparatus of FIG. 8, the conventional approach, shown in FIG. 11, provides separate circuits: a cut-out circuit 108A that performs the processing of FIG. 9 is inserted between the local bus 106A and the local memory 107A, and a separate cut-out circuit 108B that performs the processing of FIG. 10 is inserted between the local bus 106B and the local memory 107B.

[0006]

Problems to Be Solved by the Invention

As described above, in a conventional image processing apparatus that performs several different cut-out operations on image data, the operations differ from one another, so a separate circuit suited to each operation is provided and each cut-out operation is performed by its own circuit. This increases the size of the apparatus.

[0007] The present invention was made in view of the above, and its object is to provide an image processing apparatus whose size can be reduced by sharing, among a plurality of processors, the circuitry that performs several different cut-out operations.

[0008]

Means for Solving the Problems

To achieve the above object, a first means is an image processing apparatus in which a plurality of processors, each comprising a processor core that controls data transfer, a local memory connected to a local bus, and a bridge that connects the processor core to the local bus, are connected in parallel to a shared bus through their respective bridges; a main memory is connected to the shared bus through a memory controller for the main memory; and data is transferred between the local memories and the main memory in parallel with the operation of the processor cores in order to process image data. The apparatus comprises a first cut-out circuit, provided in the memory controller for the main memory, that receives data read from the main memory, cuts out part of that data, and outputs the cut-out data to the shared bus; and a second cut-out circuit, provided in each of the bridges, that receives the data output from the first cut-out circuit to the shared bus, cuts out part of that data, and supplies the cut-out data to the local memory through the local bus.

[0009] A second means is the first means in which the first cut-out circuit is a funnel shifter made of a plurality of cascaded shifter stages, each of which shifts its input data by a predetermined shift amount.

[0010] A third means is the second means in which the funnel shifter cuts out, from a plurality of consecutive data units, consecutive data that is not aligned with the boundaries of those units.

[0011] A fourth means is the first, second, or third means in which the second cut-out circuit is a byte conversion circuit comprising a shifter or a multiplexer.

[0012] A fifth means is the first, second, third, or fourth means in which the shared bus is 8 bytes wide, at least one of the local buses is 9 bytes wide, and at least one local bus other than the 9-byte-wide local bus is 8 bytes wide.

[0013]

Modes for Carrying Out the Invention

Embodiments of the present invention are described below with reference to the drawings.

[0014] FIG. 1 shows the configuration of an image processing apparatus according to one embodiment of the present invention. As in the basic configuration of FIG. 8, the image processing apparatus of this embodiment is a multiprocessor system comprising processor cores 1A and 1B, a shared bus 2, a DRAM controller 3 serving as the memory controller, a DRAM 4 functioning as the main memory, bridges 5A and 5B, local buses 6A and 6B, local memories 7A and 7B, and DMA controllers A and B. It differs from the configuration of FIG. 8 in the DRAM controller 3 (the main-memory controller), the bridge 5A, and the 9-byte-wide local bus 6A; the rest of the configuration is the same. The apparatus performs the two kinds of cut-out processing described above. It is implemented as a single-chip IC, or everything except the DRAM 4 is implemented as a single-chip IC. The main memory is not limited to the DRAM 4 and may be another type of memory.

[0015] The DRAM controller 3 functions as the memory controller for the main memory and includes a funnel shifter 31 serving as the first cut-out circuit. As shown in FIG. 2, the funnel shifter 31 consists of a 4-byte shifter made of eleven multiplexers (MUX) 32, a 2-byte shifter made of nine multiplexers 33, and a 1-byte shifter made of eight multiplexers 34. The funnel shifter 31 takes its input from an 8-byte-wide register 35, which receives and holds the data supplied from the DRAM 4, and from an 8-byte-wide register 36, which receives and holds the data held in the register 35. It shifts this data by a shift amount specified by the processor core 1A or 1B and stores the shifted data in an 8-byte-wide register 37.
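The behaviour of the funnel shifter 31 can be modelled in Python as follows. This is a sketch under stated assumptions, not the circuit of FIG. 2: byte 0 is taken to be the leftmost (lowest-address) byte, each stage either passes its input through or drops `step` bytes from the left, and each stage keeps only as many bytes as the later stages may still need. Under those assumptions the stage output widths come out as 11, 9, and 8 bytes, which agrees with the counts given for the multiplexers 32, 33, and 34 if each output byte is taken to be one 2-to-1 byte multiplexer. The function name `funnel_shift` is hypothetical.

```python
from typing import Tuple

WORD = 8  # width of registers 35, 36, 37 and of the shared bus, in bytes


def funnel_shift(reg36: bytes, reg35: bytes, shift: int) -> Tuple[bytes, int]:
    """Cut 8 consecutive bytes out of reg36 + reg35, starting `shift` bytes in.

    Also returns the number of 2:1 byte multiplexers the assumed 4-2-1
    cascade needs (one per output byte of each stage)."""
    assert len(reg36) == WORD and len(reg35) == WORD and 0 <= shift < WORD
    data = reg36 + reg35  # register 36 holds the earlier (left-hand) word
    mux_count = 0
    for step in (4, 2, 1):
        # Later stages can still shift by up to step - 1 more bytes,
        # so this stage must keep WORD + step - 1 bytes.
        out_width = WORD + step - 1
        skip = step if shift & step else 0  # this stage's bit of the shift amount
        data = data[skip:skip + out_width]
        mux_count += out_width
    return data, mux_count


left, right = bytes(range(0, 8)), bytes(range(8, 16))
print(funnel_shift(left, right, 2))  # bytes 2..9, as in cycle 2 of FIG. 3
```

Decomposing the shift into binary stages is what keeps the multiplexer count low: three 2-to-1 stages replace one 8-to-1 selector per output byte.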

[0016] The bridge 5A includes a shifter 51 serving as the second cut-out circuit, which performs byte conversion of the data transferred between the shared bus 2 and the local bus 6A. As shown in FIG. 4, the shifter 51 is a 1-byte shifter made of nine multiplexers. Its inputs are part of the data held in an 8-byte-wide register 52, which receives data from the shared bus 2, and the data held in an 8-byte-wide register 53, which receives the data held in the register 52. It converts the 8-byte data into 9-byte data and outputs the converted data to the local bus 6A.
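The 8-to-9-byte conversion performed by the shifter 51 amounts to choosing one of two 9-byte windows over the concatenation of the registers 53 and 52. The Python sketch below models it as nine 2-to-1 byte multiplexers; the function name `bridge_shift` and the `select_right` flag are hypothetical, and the meaning of the left and right selections is taken from the description of cycles 5 and 6 in [0025] and [0026].

```python
LOCAL = 9  # width of the local bus 6A and local memory 7A, in bytes


def bridge_shift(reg53: bytes, reg52: bytes, select_right: bool) -> bytes:
    """Model of shifter 51 as nine 2:1 byte multiplexers.

    Multiplexer i outputs byte i of reg53 + reg52 when its left input is
    selected, or byte i + 1 (a one-byte left shift) when its right input is."""
    window = reg53 + reg52  # 16 bytes; only the first 10 are ever used
    offset = 1 if select_right else 0
    return bytes(window[i + offset] for i in range(LOCAL))


reg53, reg52 = bytes(range(2, 10)), bytes(range(10, 18))
print(list(bridge_shift(reg53, reg52, select_right=False)))  # bytes 2..10
```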

[0017] With this configuration, the processing that reads three words of data from the DRAM 4 and cuts out of them 18 consecutive bytes not necessarily aligned with a word boundary, as shown in FIG. 9, proceeds as follows. First, three consecutive words (24 bytes) are read from the DRAM 4; the funnel shifter 31 built into the DRAM controller 3 cuts out the required 18 bytes, which are transferred through the shared bus 2 to the bridge 5A. The bridge 5A then converts the data to a 9-byte width and transfers it sequentially to the local memory 7A through the local bus 6A.

[0018] In the process of cutting out 18 bytes from three words (24 bytes), assume that the DRAM 4 delivers one word per cycle, so reading three words (24 bytes) takes three cycles. The cut-out proceeds as follows: one word (8 bytes) is cut out once the first two words have been read from the DRAM 4, the next word (8 bytes) is cut out once the following word has been read, and the last two bytes are cut out in the cycle after that. The rate at which data is read from the DRAM 4 and the rate at which cut-out results are output to the shared bus 2 are then both three cycles per row, so no extra buffer is needed. In addition, because at most 8 bytes are transferred per cycle, the shared bus 2 can be kept 8 bytes wide.

[0019] However, the local memory 7A to which the data is finally written is 9 bytes wide and must be written 9 bytes per cycle, so the data must be converted along the way from 8 bytes per cycle to 9 bytes per cycle. The shifter 51 in the bridge 5A performs this conversion. When the first two words (16 bytes) of a row have arrived over the shared bus 2, the required 9 bytes are cut out of them and output to the local bus 6A. Next, when the last word (containing only 2 valid bytes) arrives, those 2 bytes and the 7 bytes left over from the previous cycle, 9 bytes in total, are output to the local bus 6A.

[0020] The cut-out processing is now described in detail with reference to FIGS. 3 and 4, taking as an example the case in which 18 consecutive bytes starting two bytes to the right of a word boundary are cut out and written sequentially to the 9-byte-wide local memory 7A. The numbers in the squares in FIGS. 3 and 4 indicate byte positions in the DRAM 4; the example writes the 18 bytes numbered 2 through 19 to the local memory 7A.

[0021] First, in cycle 0 (not shown in FIG. 3), one word (8 bytes) of data is read from the DRAM 4 and placed in the right-hand 8-byte-wide register 35 in the DRAM controller 3. This produces the state shown for cycle 1 in FIG. 3. In cycle 1, the next word is read from the DRAM 4 and placed in the right-hand register 35; at the same time, the data in the right-hand register 35 is transferred to the left-hand 8-byte-wide register 36. As a result, the funnel shifter input is in the state shown for cycle 2 in FIG. 3.

[0022] In cycle 2, the shift amount of the funnel shifter 31 is set to 2, so its output is the 8 bytes numbered 2 through 9. The next word is also read from the DRAM 4 into the right-hand register 35, and the data in the right-hand register 35 is transferred to the left-hand register 36. As a result, the data output to the shared bus 2 and the funnel shifter input are in the state shown for cycle 3 in FIG. 3.

[0023] In cycle 3, the shift amount of the funnel shifter 31 is again set to 2, so its output is the 8 bytes numbered 10 through 17. The bridge 5A also takes in the data from the shared bus 2 and stores it in the right-hand 8-byte-wide register 52 in the bridge. Reading from the DRAM 4 and storing the read data in the registers proceed in the same way in every cycle. As a result, the input of the funnel shifter 31, the shared bus 2, and the buffers in the bridge 5A (registers 52 and 53) are in the state shown for cycle 4 in FIG. 4.

[0024] In cycle 4, the shift amount of the funnel shifter 31 is set to 2, and, as shown in FIG. 4, the funnel shifter 31 outputs the 6 bytes numbered 18 through 23. The bridge 5A takes in the data from the shared bus 2 into the right-hand register 52 and, at the same time, transfers the data in the right-hand register 52 to the left-hand 8-byte-wide register 53. This produces the state shown for cycle 5 in FIG. 4.

[0025] In cycle 5, the multiplexers making up the shifter 51 in the bridge select their left-hand inputs, so the 9 bytes numbered 2 through 10 are output to the local bus 6A. The bridge also takes in data from the shared bus 2 into the right-hand register 52 and, at the same time, transfers the data in the right-hand register 52 to the left-hand register 53. This produces the state shown for cycle 6 in FIG. 4.

[0026] In cycle 6, the multiplexers in the bridge select their right-hand inputs, which is equivalent to shifting left by one byte. As a result, the 9 bytes numbered 11 through 19 are output to the local bus 6A.

[0027] In this way, the 18 consecutive bytes numbered 2 through 19, which are not aligned with a word boundary, are written to the local memory 7A 9 bytes at a time. The throughput is three cycles per row, which matches the rate at which one row of data is read from the DRAM 4.
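The cycle-by-cycle walk-through of [0021] to [0026] can be reproduced with the small register-transfer model below. It is a sketch, not the patent's circuit: all registers (35, 36, 37 in the DRAM controller; 52, 53 in the bridge) are assumed to update together at the end of each cycle, register 37 stands for the value driven onto the shared bus 2, the bridge selections in cycles 5 and 6 are hard-coded from the text, and words read after the end of the row are assumed to be zero filler. The model is only valid for start offsets 0 to 6, where the 18 bytes fit inside the three words read. The name `simulate_row` is hypothetical.

```python
from typing import List, Tuple

WORD, LOCAL = 8, 9  # shared-bus word and local-bus width, in bytes


def simulate_row(dram_words: List[bytes], shift: int) -> List[Tuple[int, bytes]]:
    """Return (cycle, data) for every 9-byte write onto the local bus 6A."""
    r35 = r36 = r37 = r52 = r53 = None
    writes = []
    for cycle in range(7):
        # Combinational logic, evaluated from the current register contents.
        funnel_out = None
        if r36 is not None and r35 is not None:
            funnel_out = (r36 + r35)[shift:shift + WORD]  # funnel shifter 31
        if cycle == 5:  # shifter 51 selects its left-hand inputs
            writes.append((cycle, (r53 + r52)[0:LOCAL]))
        if cycle == 6:  # shifter 51 selects its right-hand inputs
            writes.append((cycle, (r53 + r52)[1:1 + LOCAL]))
        # Clock edge: every register loads its new value at once.
        incoming = dram_words[cycle] if cycle < len(dram_words) else bytes(WORD)
        r53, r52 = r52, r37        # bridge 5A registers
        r37 = funnel_out           # value placed on the shared bus 2
        r36, r35 = r35, incoming   # DRAM controller registers
    return writes


dram = bytes(range(24))  # byte numbers as in FIGS. 3 and 4
words = [dram[i:i + WORD] for i in range(0, 24, WORD)]
for cycle, data in simulate_row(words, shift=2):
    print(cycle, list(data))  # cycle 5: bytes 2..10, cycle 6: bytes 11..19
```

Only two of the three cycles per row carry a write, but each write moves 9 bytes, so the 18 bytes of a row drain at the same three-cycle rate at which they are read.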

[0028] The cut-out processing shown in FIG. 10, on the other hand, which cuts out 8 consecutive bytes not necessarily aligned with a word boundary from two words read from the DRAM 4 and writes them to the 8-byte-wide local memory 7B, proceeds as follows. Two words of data are read sequentially from the DRAM 4 and held in turn in the registers 35 and 36 of the DRAM controller 3; the funnel shifter 31 shifts the held data by the required amount to cut out 8 bytes, which are held in the register 37; and the held data is written to the local memory 7B through the shared bus 2, the bridge 5B, and the local bus 6B.
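The point of the arrangement is that this 8-byte path to the local memory 7B reuses the same funnel shifter 31 as the 18-byte path, with no conversion in the bridge 5B. A minimal Python sketch, assuming the same left-to-right byte order as in the earlier models and a hypothetical helper `cut_8_bytes`:

```python
WORD = 8  # bytes per DRAM word and per shared-bus transfer


def funnel_shift(reg36: bytes, reg35: bytes, shift: int) -> bytes:
    """Funnel shifter 31: 8 consecutive bytes of reg36 + reg35."""
    return (reg36 + reg35)[shift:shift + WORD]


def cut_8_bytes(dram: bytes, addr: int) -> bytes:
    """Read the two words covering addr and cut out 8 bytes for local memory 7B."""
    base = addr - addr % WORD
    reg36 = dram[base:base + WORD]             # first word read
    reg35 = dram[base + WORD:base + 2 * WORD]  # second word read
    reg37 = funnel_shift(reg36, reg35, addr % WORD)
    return reg37  # passed unchanged via shared bus 2, bridge 5B, local bus 6B


dram = bytes(range(32))
print(list(cut_8_bytes(dram, 13)))  # bytes 13..20
```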

[0029] Thus, in the above embodiment, the circuitry needed for cut-out is shared as far as possible between a processor designed on the assumption that it reads 18 consecutive bytes not necessarily aligned with a word boundary, and a processor designed on the assumption that it may read 8 consecutive unaligned bytes but never 18 consecutive bytes, while the second cut-out circuit needed only by the former is kept to a minimum. This makes it possible to reduce the size of the apparatus.

[0030] Next, another embodiment of the present invention is described.

[0031] This embodiment differs in that a funnel shifter 62 configured as shown in FIG. 5 replaces the funnel shifter 31 shown in FIG. 2, and a multiplexer 67 shown in FIG. 5 replaces the shifter 51 of the bridge 5A shown in FIG. 2. The rest of the configuration is the same as in FIGS. 1 and 2.

[0032] The funnel shifter 62 consists of four shifters 63, 64, 65, and 66, each made of eight multiplexers.

[0033] Next, the same cut-out processing as above is described with reference to FIGS. 6 and 7. In FIGS. 6 and 7, the numbers in the squares indicate byte positions in the DRAM 4; the example writes the 18 bytes numbered 2 through 19 to the local memory 7A.

[0034] First, in cycle 0 (not shown in FIG. 6), one word (8 bytes) of data is read from the DRAM 4 and placed in the right-hand 8-byte-wide register 35 in the DRAM controller 3. This produces the state shown for cycle 1 in FIG. 6.

[0035] In cycle 1, the next word is read from the DRAM 4 and placed in the right-hand register 35; at the same time, the data in the right-hand register 35 is transferred to the left-hand 8-byte-wide register 36. As a result, the funnel shifter input is in the state shown for cycle 2 in FIG. 6.

[0036] In cycle 2, the shift amount of the funnel shifter 62 is set to 2, so its output is the 8 bytes numbered 2 through 9. The next word is also read from the DRAM 4 into the right-hand register 35, and the data in the right-hand register 35 is transferred to the left-hand register 36. As a result, the data output to the shared bus 2 and the input of the funnel shifter 62 are in the state shown for cycle 3 in FIG. 6.

[0037] In cycle 3, the shift amount of the funnel shifter 62 is increased by one from the previous cycle, to 3. In addition, for byte 10 only, the third multiplexer from the left in the top stage of the funnel shifter 62 selects its left-hand input. As a result, the output of the funnel shifter 62 is the 8 bytes numbered 10 through 17, in the arrangement shown in FIG. 6. The bridge 5A also takes in the data from the shared bus 2 and stores it in the right-hand 8-byte-wide register 52 in the bridge 5A. Reading from the DRAM 4 and storing the read data in the registers proceed in the same way in every cycle. As a result, the funnel shifter input, the shared bus 2, and the buffers in the bridge are in the state shown for cycle 4 in FIG. 7.

[0038] In cycle 4, the shift amount of the funnel shifter 62 is increased by one more, to 4. In addition, for the two bytes numbered 18 and 19, the third and fourth multiplexers from the left in the top stage of the funnel shifter 62 select their left-hand inputs. As a result, the funnel shifter 62 outputs the 6 bytes numbered 18 through 23, in the arrangement shown in FIG. 7. The bridge 5A takes in the data from the shared bus 2 into the right-hand 8-byte-wide register 52 and, at the same time, transfers the data in the right-hand register 52 to the left-hand 8-byte-wide register 53. This produces the state shown for cycle 5 in FIG. 7.

[0039] In cycle 5, the multiplexer 67 in the bridge 5A selects its left-hand input, so the 9 bytes numbered 2 through 10 are output to the local bus 6A. The bridge also takes in data from the shared bus 2 into the right-hand register 52 and, at the same time, transfers the data in the right-hand register 52 to the left-hand register 53. This produces the state shown for cycle 6 in FIG. 7.

[0040] In cycle 6, the multiplexer 67 in the bridge 5A selects its right-hand input, so the 9 bytes numbered 11 through 19 are output to the local bus 6A.

[0041] In this way, the 18 consecutive bytes numbered 2 through 19, which are not aligned with a word boundary, are written to the local memory 7A 9 bytes at a time. The throughput is three cycles per row, which matches the rate at which one row is read from the DRAM 4.

[0042] The cut-out processing shown in FIG. 10, on the other hand, which cuts out 8 consecutive bytes not necessarily aligned with a word boundary from two words read from the DRAM 4 and writes them to the 8-byte-wide local memory 7B, is performed in the same way as in the previous embodiment.

[0043] Comparing the two embodiments, the funnel shifter 31 and the shifter 51 of the bridge 5A shown in FIG. 2 together use 37 multiplexers, whereas the corresponding circuitry shown in FIG. 5 uses 33. Thus, compared with the previous embodiment, which uses a single shift amount and is therefore easy to control, this embodiment requires several different shift amounts but allows the apparatus to be made even smaller.
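As a check on the first figure, the multiplexer counts given in [0015] and [0016] add up to the stated total:

$$\underbrace{11 + 9 + 8}_{\text{funnel shifter 31}} + \underbrace{9}_{\text{shifter 51}} = 37$$

For the second embodiment, the four eight-multiplexer shifters 63 to 66 of [0032] account for $4 \times 8 = 32$ of the 33.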

[0044]

Effects of the Invention

As described above, according to the present invention, most of the hardware of the cut-out circuits can be shared by a plurality of processors, so the circuit scale can be reduced compared with the conventional approach. As a result, the apparatus can be made smaller.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an image processing apparatus according to an embodiment of the present invention.

FIG. 2 is a diagram showing the configuration of the DRAM controller and the bridge shown in FIG. 1.

FIG. 3 is a diagram showing operation timing in the image processing apparatus according to the embodiment of the present invention.

FIG. 4 is a diagram showing operation timing in the image processing apparatus according to the embodiment of the present invention.

FIG. 5 is a diagram showing the configuration of the DRAM controller and the bridge in an image processing apparatus according to another embodiment of the present invention.

FIG. 6 is a diagram showing operation timing in the image processing apparatus according to the other embodiment of the present invention.

FIG. 7 is a diagram showing operation timing in the image processing apparatus according to the other embodiment of the present invention.

FIG. 8 is a block diagram showing one configuration of a conventional image processing apparatus.

FIG. 9 is a diagram illustrating the cut-out of 18 bytes not aligned with word boundaries.

FIG. 10 is a diagram illustrating the cut-out of 8 bytes not aligned with a word boundary.

FIG. 11 is a block diagram showing the configuration of a conventional image processing apparatus with cut-out circuits.

[Explanation of symbols]

1A, 1B: processor cores
2: shared bus
3: DRAM controller
4: DRAM
5A, 5B: bridges
6A, 6B: local buses
7A, 7B: local memories
31, 62: funnel shifters
32, 33, 34, 51, 63 to 66: shifters
35, 36, 37, 52, 53: registers
67: multiplexer

Claims

[Claims]

1. An image processing apparatus in which a plurality of processors, each comprising a processor core that controls data transfer, a local memory connected to a local bus, and a bridge that connects the processor core to the local bus, are connected in parallel to a shared bus through their respective bridges, a main memory is connected to the shared bus through a memory controller for the main memory, and data is transferred between the local memories and the main memory in parallel with the operation of the plurality of processor cores to process image data, the image processing apparatus comprising: a first cut-out circuit, provided in the memory controller for the main memory, that receives data read from the main memory, cuts out part of the data, and outputs the cut-out data to the shared bus; and a second cut-out circuit, provided in each of the bridges, that receives the data output from the first cut-out circuit to the shared bus, cuts out part of the data, and supplies the cut-out data to the local memory through the local bus.

2. The image processing apparatus according to claim 1, wherein the first cut-out circuit comprises a funnel shifter made of a plurality of cascaded shifter stages, each of which shifts input data by a predetermined shift amount.

3. The image processing apparatus according to claim 2, wherein the funnel shifter cuts out, from a plurality of consecutive data units, consecutive data that is not aligned with the boundaries of the data units.

4. The image processing apparatus according to claim 1, 2, or 3, wherein the second cut-out circuit comprises a byte conversion circuit having a shifter or a multiplexer.

5. The image processing apparatus according to claim 1, 2, 3, or 4, wherein the shared bus is 8 bytes wide, at least one of the local buses is 9 bytes wide, and at least one local bus other than the 9-byte-wide local bus is 8 bytes wide.