US20020135502A1 - Method and apparatus for convolution encoding and viterbi decoding of data that utilize a configurable processor to configure a plurality of re-configurable processing elements
- Publication number
- US20020135502A1 (application US09/818,746)
- Authority
- US
- United States
- Prior art keywords
- processing elements
- configurable
- configurable processing
- digital signal
- algorithm
- Prior art date
- Legal status (assumed, not a legal conclusion): Granted
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/65—Purpose and implementation aspects
- H03M13/6508—Flexibility, adaptability, parametrability and configurability of the implementation
- H03M13/6511—Support of multiple decoding rules, e.g. combined MAP and Viterbi decoding
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/03—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words
- H03M13/23—Error detection or forward error correction by redundancy in data representation, i.e. code words containing more digits than the source words using convolutional codes, e.g. unit memory codes
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/37—Decoding methods or techniques, not specific to the particular type of coding provided for in groups H03M13/03 - H03M13/35
- H03M13/39—Sequence estimation, i.e. using statistical methods for the reconstruction of the original codes
- H03M13/3961—Arrangements of methods for branch or transition metric calculation
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/37—Decoding methods or techniques, not specific to the particular type of coding provided for in groups H03M13/03 - H03M13/35
- H03M13/39—Sequence estimation, i.e. using statistical methods for the reconstruction of the original codes
- H03M13/41—Sequence estimation, i.e. using statistical methods for the reconstruction of the original codes using the Viterbi algorithm or Viterbi processors
- H03M13/4161—Sequence estimation, i.e. using statistical methods for the reconstruction of the original codes using the Viterbi algorithm or Viterbi processors implementing path management
- H03M13/4169—Sequence estimation, i.e. using statistical methods for the reconstruction of the original codes using the Viterbi algorithm or Viterbi processors implementing path management using traceback
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M13/00—Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
- H03M13/65—Purpose and implementation aspects
- H03M13/6563—Implementations using multi-port memories
Description
- 1. Field of the Invention
- The present invention relates to digital signal processing, and more particularly to the mapping of a convolution encoder and a Viterbi decoder onto a dynamically re-configurable two-dimensional single instruction multiple data (SIMD) processor array architecture.
- 2. Description of Related Art
- The field of digital signal processing (DSP) has grown dramatically in recent years and has quickly become a key component in many consumer, communications, medical, and industrial products. DSP technology involves the analyzing and processing of digital data in the form of sequences of ones and zeros. In the field of communications, analog signals are converted to such digital sequences for processing and transmission. During transmission, however, these digital sequences may be easily distorted by noise. In order to address this problem, digital data is often encoded before transmission. One form of encoding, known as convolution encoding, is widely used in digital communication and signal processing to protect transmitted data against noise, and its efficiency is well known in terms of error correction quality. In general, convolution encoding is a coding scheme that associates at least one encoded data element with each source data element to be encoded, this encoded data element being obtained by the modulo-two summation of this source data element with at least one of the previous source data elements. Thus, each encoded symbol is a linear combination of the source data element to be encoded and the previous source data elements.
- In FIG. 1A, a schematic diagram of a standard convolution encoder with a code rate of one half is shown. For this type of encoder, two encoding outputs, a(t) and b(t), are transmitted for every input u(t). The encoder is shown to be comprised of two delay elements, 10 and 12, and three exclusive-OR Boolean operators 20, 22, and 24. As illustrated, an input u(t) is connected to a first delay element 10, a first exclusive-OR operator 20, and a second exclusive-OR operator 22. The output u(t-1) of the first delay element 10 is connected to the input of the second delay element 12 and to the second exclusive-OR operator 22. The output u(t-2) of the second delay element 12 is then connected to the first exclusive-OR operator 20 and to the third exclusive-OR operator 24. The encoding outputs, a(t) and b(t), are then respectively taken from the outputs of the first exclusive-OR operator 20 and the third exclusive-OR operator 24. It should be appreciated that there are four possible binary states of the encoder (u(t-1), u(t-2)), including state zero (00), state one (01), state two (10), and state three (11).
- The encoding process of the described encoder may also be characterized by the finite state machine illustrated in FIG. 1B. In this diagram, each circle is labeled with a binary representation of one of the four binary states of the encoder. In particular, this diagram provides binary representations for state zero 40, state one 44, state two 42, and state three 46. This diagram is further comprised of several arrows representing the respective transition paths taken into each particular state. In this example, a total of eight transition paths 30, 31, 32, 33, 34, 35, 36, and 37 are illustrated. Each transition path also includes an input/output pair (u(t)/a(t), b(t)) uniquely identifying the conditions needed for that particular transition to occur.
- For example, beginning at state zero 40, there are two possible transition paths, including path 30 and path 31. Path 30 depicts an input u(t) of zero that produces respective outputs a(t), b(t) of zero, zero (0/00), thereby causing the finite state machine to remain at state zero 40 (or 00). Path 31 depicts an input u(t) of one and respective outputs a(t), b(t) of one, one (1/11), thereby causing the finite state machine to transition to state two 42 (or 10). From state two 42, there are two possible transition paths, including path 32 and path 37. Path 32 depicts an input u(t) of one that produces respective outputs a(t), b(t) of one, zero (1/10), thereby causing the finite state machine to transition to state three 46 (or 11). Path 37 depicts an input u(t) of zero and respective outputs a(t), b(t) of zero, one (0/01), thereby causing the finite state machine to transition to state one 44 (or 01). The remaining transition paths follow in like manner.
- In order to depict how the described encoder evolves over time, a trellis diagram is shown in FIG. 1C. As illustrated, this diagram is comprised of several nodes (denoted by dots) and transition paths (denoted by solid lines). Each column of nodes represents all states at a particular instant. In this particular example, five instants are described (corresponding to t=1 through t=5). Therefore, this trellis diagram can be regarded as illustrating the sequence of all possible state transition paths over five instants (where it is assumed that the initial state is state zero 40). As a result, any given stream of input bits u(t) can be uniquely determined directly from its corresponding sequence of outputs, a(t) and b(t), and information derived from the encoder's trellis diagram. For example, if after four instants the observed noiseless outputs {a1(t)/b1(t), a2(t)/b2(t), a3(t)/b3(t), a4(t)/b4(t)} at a receiver are {11, 10, 10, 00}, then the corresponding input sequence {u1(t), u2(t), u3(t), u4(t)} is {1, 1, 0, 1} according to the trellis diagram shown in FIG. 1C. In this example, it should be clear that the number of decoded input bits is determined directly from the number of instants traced back in a given trellis diagram. In practice, two trace-back approaches are used. In the first approach, the number of instants traced back in a trellis diagram is equal to the total number of bits in the entire bit stream (resulting in the decoding of the entire bit stream at once). In the second approach, a pre-determined number of instants is used, resulting in the decoding of partial bit streams at a time.
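- As a concrete illustration of the encoder just described, the following minimal C sketch implements the connections of FIG. 1A (so a(t) = u(t) XOR u(t-2) and b(t) = u(t) XOR u(t-1) XOR u(t-2)); the function and variable names are illustrative only, not taken from the patent. Applied to the example input {1, 1, 0, 1}, it reproduces the noiseless output sequence {11, 10, 10, 00} used above.
```c
#include <stdio.h>

/* Rate-1/2 convolution encoder of FIG. 1A.
 * State is (u(t-1), u(t-2)); outputs are
 *   a(t) = u(t) ^ u(t-2)            (exclusive-OR 20)
 *   b(t) = u(t) ^ u(t-1) ^ u(t-2)   (exclusive-OR 22 feeding exclusive-OR 24)
 */
void encode_half_rate(const int *u, int n, int out_a[], int out_b[])
{
    int d1 = 0, d2 = 0;               /* delay elements 10 and 12, initially zero */
    for (int t = 0; t < n; t++) {
        out_a[t] = u[t] ^ d2;
        out_b[t] = u[t] ^ d1 ^ d2;
        d2 = d1;                      /* shift the delay line */
        d1 = u[t];
    }
}

int main(void)
{
    int u[4] = { 1, 1, 0, 1 };        /* example input sequence */
    int a[4], b[4];
    encode_half_rate(u, 4, a, b);
    for (int t = 0; t < 4; t++)
        printf("%d%d ", a[t], b[t]);  /* prints: 11 10 10 00 */
    printf("\n");
    return 0;
}
```
- Equivalently, the two taps correspond to the generator polynomials 1 + D^2 and 1 + D + D^2, which is the conventional way this rate-one-half encoder is specified.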
- In general, noise will occur during transmission. For example, if the observed output sequence is {10, 10, 10, 00}, the corresponding input sequence is unclear. Thus, in practical applications, statistical decoding methods that account for such noise must be implemented.
- In the presence of noise, the most commonly used approach to decode convolution codes is via the Viterbi algorithm. In particular, the Viterbi algorithm gives a binary estimation of each input u(t) coded at transmission. This estimation is determined by finding the most likely transition path of a given trellis with respect to the noisy output data (X(t), Y(t)) received by a decoder, respectively corresponding to the originally encoded output data (a(t), b(t)). Each node of the trellis used during decoding contains an information element on the survivor path of the two possible paths ending at that particular node. The basic principle of the Viterbi algorithm consists in considering, at each node, only the most probable path, so as to enable easy trace-back operations on the trellis and hence to determine an a posteriori estimation of the value received several reception instants earlier.
- The Viterbi algorithm involves the execution of a particular set of operations. First, a computation is made of the distances, also called branch metrics, between the received noisy output data (X(t), Y(t)) and the symbols (a(t), b(t)) corresponding to the required noiseless outputs of a particular state transition path. In particular these branch metric units are defined as:
- Branch(a_s, b_s) = a_s·X_k + b_s·Y_k
- where (a_s, b_s) represent the required noiseless outputs of a particular state transition path and (X_k, Y_k) represent the noisy output received at time k (it should be noted that, in the modulation scheme described herein, zero logic values are replaced by negative ones in the right-hand side of the above formula). For example, suppose a set of incoming data is defined as (X_0, Y_0), which corresponds to a particular output (a_0, b_0) of an encoder for a certain input u_0 with a code rate of one half. If the trellis shown in FIG. 1C is used (where it is assumed that state zero 40 is the initial state), then the procedure begins by calculating branch metric units for state transition paths 30 and 31:
- Branch(0, 0) = −X_0 − Y_0
- Branch(1, 1) = X_0 + Y_0
- where Branch(0, 0) describes the branch metric needed to transition from state zero 40 to state zero 40 (where a_s=0 and b_s=0), and Branch(1, 1) describes the branch metric needed to transition from state zero 40 to state two 42 (where a_s=1 and b_s=1). A cumulative branch metric is then determined at each node after each instant. In particular, a cumulative branch metric P(s, t) is defined for each node, where s represents the state of the node and t represents the instant, as:
- P(j, t) = P(i, t−1) + Branch_ij
- where P(j, t) represents the cumulative branch metric of state j at instant t, P(i, t−1) represents the cumulative branch metric of a state i preceding state j at instant (t−1), and Branch_ij represents the branch metric needed to transition from state i to state j. The most likely path M(j, t) coming into state j at time instant t is then defined as:
- M(j, t) = max_{i*} [ M(i*, t−1) + Branch_{i*j} ]
- where {i*} represents the set of states having transitions into state j. It should be noted that the above formula is only needed when there are two possible state transition paths into a particular node (otherwise, the most likely path M(j, t) into state j is simply P(j, t)). In the current example, it should thus be clear that this calculation is not needed until the fourth instant (t=4). It should also be noted that, in the current example, it is assumed that all cumulative branch metrics are initially zero. Therefore, P(0, 1) and M(0, 1) are both initialized to zero at the first instant (t=1).
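- The two formulas above translate directly into code. The short C fragment below is a hedged sketch (the names are invented): the expected logic values a_s, b_s are mapped from {0, 1} to {−1, +1} inside the function, and the received values x_k, y_k are assumed to be mapped already. For a received pair (+1, +1), it yields Branch(0, 0) = −2 and Branch(1, 1) = +2, consistent with the formulas above.
```c
/* Branch(a_s, b_s) = a_s*X_k + b_s*Y_k, with a_s and b_s mapped from {0, 1}
 * to {-1, +1}; x_k and y_k are assumed to be in {-1, +1} already. */
int branch_metric(int a_s, int b_s, int x_k, int y_k)
{
    return (a_s ? 1 : -1) * x_k + (b_s ? 1 : -1) * y_k;
}

/* Cumulative metric update: P(j, t) = P(i, t-1) + Branch_ij. */
int cumulative_metric(int p_i_prev, int branch_ij)
{
    return p_i_prev + branch_ij;
}
```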
- In the next instant (t=2), four branch metric calculations are needed. Namely, the following branches are needed:
- Branch(0, 0) = −X_0 − Y_0
- Branch(0, 1) = −X_0 + Y_0
- Branch(1, 0) = X_0 − Y_0
- Branch(1, 1) = X_0 + Y_0
- The cumulative branch metrics corresponding to the two possible paths for each state are then compared in order to determine the paths most likely taken at this particular instant. The selected paths and the cumulative branch metrics of each state are then both stored in memory until the next instant.
- After a pre-determined number of instants, a trace-back operation is made in order to determine the optimal cumulative path taken. In particular, the path with the largest cumulative path metric is chosen as the optimal path (although some implementations use the smallest cumulative path metric). This optimal path is then used to decode the original coded bit stream of information according to the procedure described earlier for noiseless conditions.
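- To make the add-compare-select and trace-back steps concrete, the following self-contained C sketch decodes the four-state, rate-one-half code of FIGS. 1A-1C. It is an illustrative reconstruction under stated assumptions (hard decisions mapped to ±1, the trellis forced to start in state zero, the largest cumulative metric chosen as optimal), not the patent's implementation, and all names are invented. Given the received sequence {11, 10, 10, 00}, it recovers the input {1, 1, 0, 1} from the earlier example.
```c
#include <stdio.h>
#include <string.h>

#define NSTATES 4   /* states (u(t-1), u(t-2)) of the FIG. 1A encoder */
#define NEG_INF (-1000000)

/* Expected outputs and next state for a transition from state s on input u. */
static void step(int s, int u, int *a, int *b, int *next)
{
    int u1 = (s >> 1) & 1, u2 = s & 1;   /* u(t-1), u(t-2) */
    *a = u ^ u2;
    *b = u ^ u1 ^ u2;
    *next = (u << 1) | u1;
}

int main(void)
{
    /* Received hard symbols (X, Y) for four instants: 11 10 10 00 */
    int rx[4][2] = { {1,1}, {1,0}, {1,0}, {0,0} };
    int T = 4;
    int metric[NSTATES], next_metric[NSTATES];
    int pred[4][NSTATES];                 /* survivor predecessor per stage */
    int t, s, u, j;

    for (s = 0; s < NSTATES; s++) metric[s] = NEG_INF;
    metric[0] = 0;                        /* trellis starts in state zero */

    for (t = 0; t < T; t++) {
        int x = rx[t][0] ? 1 : -1;        /* 0 -> -1 mapping */
        int y = rx[t][1] ? 1 : -1;
        for (j = 0; j < NSTATES; j++) next_metric[j] = NEG_INF;
        for (s = 0; s < NSTATES; s++) {
            if (metric[s] == NEG_INF) continue;
            for (u = 0; u < 2; u++) {
                int a, b, nxt;
                step(s, u, &a, &b, &nxt);
                int bm = (a ? 1 : -1) * x + (b ? 1 : -1) * y;
                int m = metric[s] + bm;
                if (m > next_metric[nxt]) {   /* add-compare-select */
                    next_metric[nxt] = m;
                    pred[t][nxt] = s;         /* remember survivor path */
                }
            }
        }
        memcpy(metric, next_metric, sizeof(metric));
    }

    /* Trace back from the state with the largest cumulative metric. */
    int best = 0;
    for (s = 1; s < NSTATES; s++) if (metric[s] > metric[best]) best = s;
    int bits[4];
    for (t = T - 1; t >= 0; t--) {
        bits[t] = best >> 1;              /* input bit that led into 'best' */
        best = pred[t][best];
    }
    for (t = 0; t < T; t++) printf("%d", bits[t]);  /* prints: 1101 */
    printf("\n");
    return 0;
}
```
- With soft-decision inputs, rx[][] would simply hold the received values instead of ±1; the rest of the sketch is unchanged.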
- The Viterbi algorithm has been implemented in the prior art using either hardware or software systems. Software implementations of the Viterbi algorithm adapted to run on general purpose digital signal processors have the advantage of better flexibility than hardware implementations, since the software can be readily reprogrammed. Conversely, hardware implementations of the Viterbi algorithm using application specific integrated circuits (ASICs) can achieve higher performance than the software implementations in terms of lower power consumption, higher decoding rates, etc., but cannot be easily modified.
- It would therefore be advantageous to develop a method and apparatus for convolution encoding and Viterbi decoding that addresses these limitations of known hardware and software implementations. More specifically, it would be advantageous to develop a method and apparatus for convolution encoding and Viterbi decoding that has the flexibility of the software implementations, with the superior performance of the hardware implementations.
- A method and apparatus for convolution encoding and Viterbi decoding utilizes a flexible, digital signal processing architecture that comprises a core processor and a plurality of re-configurable processing elements arranged in a two-dimensional array. The present invention therefore enables the convolution encoding and Viterbi decoding functions to be mapped onto this flexible architecture, thereby overcoming the disadvantages of conventional hardware and software solutions.
- In an embodiment of the invention, the core processor is operable to configure the re-configurable processing elements to perform data encoding and data decoding functions. A received data input is encoded by configuring one of the re-configurable processing elements to emulate a convolution encoding algorithm and applying the received data input to the convolution encoding algorithm. A received encoded data input is decoded by configuring the plurality of re-configurable processing elements to emulate a Viterbi decoding algorithm wherein the plurality of re-configurable processing elements is configured to accommodate every data state of the convolution encoding algorithm. The core processor initializes the re-configurable processing elements by assigning register values to registers that define parameters such as constraint length and code rate for the convolution encoding algorithm.
- More particularly, the encoding function further comprises generating a multiple output sequence corresponding to the received data input. Essentially, the encoding function comprises performing a modulo-two addition of selected taps of a serially time-delayed sequence of the received data input. The decoding function further comprises mapping a trellis diagram onto the plurality of re-configurable processing elements. The re-configurable processing elements calculate cumulative branch metric units for each node of the trellis diagram, and the core processor selects a most probable state transition path of the trellis diagram based on the branch metric units.
- A more complete understanding of the method and apparatus for convolution encoding and Viterbi decoding will be afforded to those skilled in the art, as well as a realization of additional advantages and objects thereof, by a consideration of the following detailed description of the preferred embodiment. Reference will be made to the appended sheets of drawings which will first be described briefly.
- FIG. 1A is a schematic diagram of a convolution encoder having a code rate of one half;
- FIG. 1B is a schematic diagram of a finite state machine of an encoder having a code rate of one half;
- FIG. 1C is a trellis diagram illustrating the possible state transitions of encoded data having a code rate of one half;
- FIG. 2 is a block diagram of a preferred embodiment of the invention;
- FIG. 3A is a schematic diagram illustrating the internal quadrants of the RC array;
- FIG. 3B is a schematic diagram illustrating the internal express lanes of the RC array;
- FIG. 3C is a schematic diagram illustrating the internal data-bus connections of the RC array;
- FIG. 4A is a schematic diagram of a convolution encoder having a code rate of one third and constraint length of nine;
- FIG. 4B is a trellis diagram illustrating the possible state transitions of encoded data having a code rate of one third and constraint length of nine;
- FIG. 5 is a diagram illustrating the various registers allocated for encoding in an RC;
- FIG. 6 is a flow chart illustrating the steps for encoding one bit of information according to a preferred embodiment of the invention;
- FIG. 7 is a flow chart illustrating the steps for decoding a bit stream of information according to a preferred embodiment of the invention;
- FIG. 8 is a diagram illustrating the state transition mapping of a Viterbi decoder for encoded data having a code rate of one third and a constraint length of nine;
- FIG. 9 is a diagram illustrating the branch metric mapping of a Viterbi decoder for encoded data having a code rate of one third and a constraint length of nine; and
- FIG. 10 is a schematic diagram demonstrating the data collapse procedure for writing path information into the frame buffer.
- DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
- The present invention is directed towards a method and apparatus for convolution encoding and Viterbi decoding. In particular, this invention provides a unique re-configurable architecture that addresses the performance limitations currently known in the art by simultaneously achieving the flexibility of software pertaining to general-purpose processors and sustaining the high performance pertaining to hardware implementations of application-specific circuits. In the detailed description that follows, it should be appreciated that like element numerals are used to describe like elements illustrated in one or more of the figures.
- An embodiment of the invention shown in FIG. 2 comprises an architecture including a dynamically re-configurable two-dimensional SIMD processor array 200. In particular, this architecture is comprised of a core processor 210, a re-configurable cell (RC) array 260, a row context memory 240, a column context memory 250, a frame buffer 230, and a direct memory access (DMA) controller 220. As illustrated, the core processor 210 communicates with the core processor external memory unit 110, while the DMA controller 220 communicates with the DMA external memory unit 120. It should be appreciated that instructions and data for both the core processor 210 and the DMA controller 220 are respectively provided by external memory units 110 and 120. Reconfiguration, for example, is achieved by caching several context words from the DMA external memory unit 120 onto the row and column context memories, 240 and 250, of the processor array 260.
- In a preferred embodiment of this invention, the core processor 210 and the DMA controller 220 respectively communicate with external memory units 110 and 120 through parallel data-buses (e.g., 32-bit). A parallel data-bus (e.g., 32-bit) also connects the core processor 210 with the frame buffer 230, and the DMA controller 220 with both the row context memory 240 and the column context memory 250. Another parallel data-bus (e.g., 128-bit) connects the DMA controller 220 with the frame buffer 230, as well as the frame buffer 230 and the RC array 260. The row context memory 240 and the column context memory 250 are then both connected to the RC array 260 through a parallel context-bus (e.g., 256-bit) in both column and row direction.
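- Purely as an illustrative model of the FIG. 2 block diagram (the structure, C names, and array sizes below are assumptions; only the blocks and bus widths restate the text), the architecture can be summarized as:
```c
/* Illustrative model of the FIG. 2 architecture; all names and sizes are
 * assumptions, not definitions from the patent. */
struct rc_cell  { unsigned int reg_file[16]; };   /* one re-configurable cell (RC) 262 */
struct rc_array { struct rc_cell cell[8][8]; };   /* 8x8 RC array 260                  */

struct simd_architecture {
    struct rc_array rc_array;            /* 260: two-dimensional SIMD datapath             */
    unsigned char   frame_buffer[8192];  /* 230: acts as a data cache for the RC array     */
    unsigned int    row_context[128];    /* 240: row context memory ("instruction cache")  */
    unsigned int    col_context[128];    /* 250: column context memory                     */
    /* The core processor 210 and DMA controller 220 sit outside this struct; per the
     * text they use 32-bit buses to external memories 110/120, a 128-bit bus to the
     * frame buffer, and 256-bit row/column context buses to the RC array. */
};
```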
- In FIG. 3A, a diagram illustrating the internal connections of the RC array 260 is provided. In particular, RC's 262 in the RC array 260 are connected in two levels of hierarchy. First, cells are grouped into four quadrants, quad one 270, quad two 272, quad three 274, and quad four 276, in which RC's 262 of a particular quadrant are directly connected to each RC 262 in the row or column of that quadrant. Furthermore, cells in adjacent quadrants are connected via express lanes 264 that enable a cell in a quadrant to broadcast its results to the cells in the adjacent quadrant, as illustrated in FIG. 3B. Each RC 262 of a particular row (i.e., eight RC's 262 per row in this particular embodiment) is also further comprised of two sixteen-bit connections allowing it to communicate with the frame buffer 230 both via a one-hundred-twenty-eight-bit operand bus 266 and a one-hundred-twenty-eight-bit result bus 268, as illustrated in FIG. 3C.
- Returning to the architecture illustrated in FIG. 2, the function of each component is now described. The processing element of this invention is called the re-configurable cell (RC) 262. In this particular embodiment, a total of sixty-four RC's 262 are grouped into an eight by eight matrix, called the RC array 260. It should be noted that alternative embodiments of this RC array 260 can be created by grouping a total of m RC's 262 into an n×n matrix (where m is an arbitrary number of RC's defined by the product of n times n). The function of the frame buffer 230 is analogous to an internal data cache for the RC array 260. The row context memory 240 and the column context memory 250 are then both used to locally store the configuration contexts of the RC array 260, thus making their function analogous to an instruction cache for the RC array 260. The core processor 210 ultimately controls operation of the RC array 260. It initiates all data transfers to and from the frame buffer 230 and configures the loading of the row and column context memories, 240 and 250, through the DMA controller 220. It should be noted, however, that the core processor 210, rather than the RC array 260, performs some of the computations. For example, the core processor 210 computes the trace-back procedure of the Viterbi decoder, as will be described later.
- The programmability of this architecture is derived from context words that are broadcast to the rows or columns of the RC array 260 by either the row context memory 240 or the column context memory 250. Depending on the context word, each RC 262 can access the output of any other RC 262 in its column or row, select an input from its own register file, or access data from the frame buffer 230. The context word thus provides functional programmability by configuring each RC 262 to perform specific operations.
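- The context word therefore acts as a broadcast micro-instruction. The fragment below sketches one plausible, entirely hypothetical way such a word could be interpreted by an RC; the patent does not define a bit layout or operation set, so every field here is an assumption.
```c
/* Hypothetical context-word interpretation for one RC; the layout and the
 * operation set are assumed for illustration only. */
enum rc_op { RC_NOP, RC_XOR, RC_AND, RC_ADD, RC_COMPARE_SELECT, RC_ONES_COUNT };

struct context_word {
    enum rc_op op;         /* operation every addressed RC performs this cycle     */
    int from_neighbor;     /* 1: operand comes from another RC in this row/column  */
    int src_index;         /* which neighboring RC, own register, or buffer word   */
    int dst_reg;           /* destination register in the RC's own register file   */
};

/* Broadcasting one context word to a whole row or column makes every RC in it
 * execute the same operation on its own data, which is the SIMD behavior
 * described above. */
```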
- A method in accordance with an embodiment of this invention is described for the case of a standard convolutional code, with a constraint length of nine and a code rate of one third, obtained by means of an exemplary coder shown in FIG. 4A. It should be understood that the decoding method and apparatus presented by this invention may be applied to all convolutional codes having code rates of η=1/K (where K is an integer >1) and varying constraint lengths, by a simple generalization of the described method. In general, convolution encoding involves the modulo-two addition of selected taps of a serially time-delayed data sequence. As illustrated in FIG. 4A, an input u(t) is passed through a series of eight delay elements 50, 51, 52, 53, 54, 55, 56, and 57, each of which is appropriately summed by several exclusive-OR operators 60, 61, 62, 63, 64, 65, 70, 71, 72, 73, 74, 80, 81, 82, and 83. Consequently, this operation generates a three-output sequence, X(t), Y(t), and Z(t), corresponding to a particular input u(t).
- The dynamics of this coder are described by the diagram of the trellis shown in FIG. 4B and are well known in the art. For this particular example, it is shown that for each of the two hundred fifty six possible current states, there are two potential state transition paths that can be taken into the next state. For example, if a zero input u(t) is passed through the coder when the current state is zero (S0), the resultant output (X0, Y0, Z0) is (0, 0, 0) and the resultant next state is state zero (S0). In this same example, if an input u(t) of one is passed through the coder, the resultant output is (1, 1, 1) and the resultant next state is state one hundred twenty-eight (S128). It should be noted that, for simplicity, the trellis shown in FIG. 4B corresponds to only one of several trellis stages (namely, only one set of state transitions).
- In a preferred embodiment of the present invention, only one RC 262 is needed for convolution encoding. During this time, all other RC's 262 are shut off in order to conserve power. FIG. 5 provides a schematic diagram illustrating how internal memory space is allocated for one third code rate encoding in the single functional RC 262. In particular, various registers 300, 305, 310, 315, 320, 325, 330, 335, 340, and 345 are used to perform this encoding operation. Registers 300, 305, and 310 are reserved for polynomial values X, Y, and Z corresponding to the respective outputs X(t), Y(t), and Z(t) of the encoder shown in FIG. 4A. It should be noted that these polynomial values are usually programmed into these registers according to industry standards for convolution encoders. For example, conventional 3G wireless standards define these values as being 557 (octal), 663 (octal), and 711 (octal), for X, Y, and Z, respectively. Register 315 is reserved for the current eight-bit state of the encoder (corresponding to the eight delay elements 50, 51, 52, 53, 54, 55, 56, and 57 of FIG. 4A), and register 320 is reserved for the actual data to be encoded (entered sixteen bits at a time). Registers 325 and 330 are used as masks to respectively extract the most and least significant bits from other registers. Registers 335 and 340 are then used to temporarily store intermediate values ascertained during the encoding procedure. Finally, register 345 is used to store the three-output sequence (X(t), Y(t), Z(t)) of encoded values.
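- The register allocation of FIG. 5 can be summarized as a structure. The C names below are invented; the register numerals 300-345 and the 3G generator values 557, 663, and 711 (octal) come from the text.
```c
/* Illustrative register map for rate-1/3 encoding in a single RC (FIG. 5). */
struct encoder_registers {
    unsigned int poly_x;     /* 300: generator polynomial X, e.g. 0557 octal  */
    unsigned int poly_y;     /* 305: generator polynomial Y, e.g. 0663 octal  */
    unsigned int poly_z;     /* 310: generator polynomial Z, e.g. 0711 octal  */
    unsigned int state;      /* 315: current 8-bit encoder state              */
    unsigned int data;       /* 320: data to encode, loaded 16 bits at a time */
    unsigned int msb_mask;   /* 325: mask used to extract the MSB             */
    unsigned int lsb_mask;   /* 330: mask used to extract the LSB             */
    unsigned int temp_a;     /* 335: intermediate value                       */
    unsigned int temp_b;     /* 340: intermediate value                       */
    unsigned int output;     /* 345: packed output sequence (X(t), Y(t), Z(t))*/
};
```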
- In FIG. 6, a flow chart describing the encoding procedure for one bit of data is provided. Encoding begins at step 400 and continues with the core processor 210 getting encoding instructions from external memory unit 110 at step 405. The core processor 210 then proceeds by initializing the RC array 260 for the encoding procedure at step 410. This initialization step includes allocating the internal memory space described previously (here, it is assumed that a code rate of one third is desired). At step 415, register values are appropriately loaded into each of the registers illustrated in FIG. 5. Next, the most significant bit (MSB) is taken from the data register 320 at step 420 and temporarily stored in temporary register 335 (where it is understood that the MSB is extracted from the data register 320 through a simple logic operation with the MSB mask stored in register 325) at step 425. The stored MSB value is then concatenated with the value stored in the state register 315 at step 430. The value derived at step 430 is then temporarily stored back into temporary register 335 at step 435. At step 440, a bit-wise AND operation is performed between the value stored in temporary register 335 and the appropriate value representing polynomial i stored in either register 300, 305, or 310 (where it is understood that this step will alternate these three values at each respective iteration). The result of the operation performed at step 440 is then stored in temporary register 340 at step 445. The RC 262 then performs a “ones” counter operation on the value stored in temporary register 340 at step 450 and stores this value back into temporary register 340 at step 455. The least significant bit (LSB) is then extracted from the value stored in temporary register 340 at step 460 using the LSB mask stored in register 330. The LSB found at step 460 represents the encoded output corresponding to the polynomial used at step 440. This value is then stored in the output register 345 at step 465. At step 468, it is then determined whether encoding for this particular bit is complete (i.e., whether there are three encoded values). If, at step 468, it is determined that encoding for this particular bit is complete, then the data register is left-shifted by one at step 470 in preparation for encoding the next bit; otherwise, encoding of the current bit continues by returning the procedure to step 440, where calculations are made according to the next polynomial value. After left-shifting the data register at step 470, the core processor 210 determines at step 475 whether the entire encoding process is complete (i.e., there is no further data to be encoded). If, at step 475, it is determined that encoding is complete, then the encoded stream of values is provided to the frame buffer 230 at step 480; otherwise, the procedure returns to step 420, where it proceeds in determining the next encoded set of values.
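- The bit-serial procedure of FIG. 6 can be sketched in C as follows. The sketch mirrors the core of steps 430-465 (concatenate the input bit with the state, AND with each generator polynomial, count the ones, keep the LSB of the count) and then shifts the new bit into the state; the register-level bookkeeping of the remaining steps is folded into ordinary C variables. Function names and the exact bit ordering are assumptions. From state zero with an input bit of one it produces the output (1, 1, 1) and next state S128, matching the FIG. 4B example.
```c
#include <stdio.h>

/* Generator polynomials (octal) for the rate-1/3, constraint-length-9 code. */
static const unsigned int POLY[3] = { 0557, 0663, 0711 };

/* "Ones" count of steps 450-460, reduced to its least significant bit. */
static int parity(unsigned int v)
{
    int ones = 0;
    while (v) { ones += v & 1; v >>= 1; }
    return ones & 1;
}

/* Encode one input bit u against the current 8-bit state. */
static void encode_bit(unsigned int *state, int u, int out[3])
{
    unsigned int reg = ((unsigned int)u << 8) | (*state & 0xFF);  /* step 430 */
    for (int i = 0; i < 3; i++)
        out[i] = parity(reg & POLY[i]);   /* steps 440-465, one polynomial each */
    *state = (reg >> 1) & 0xFF;           /* shift the new bit into the state   */
}

int main(void)
{
    unsigned int state = 0;               /* encoder starts in state zero (S0) */
    int out[3];
    encode_bit(&state, 1, out);
    printf("output = (%d, %d, %d), next state = S%u\n",
           out[0], out[1], out[2], state); /* prints: (1, 1, 1), S128 */
    return 0;
}
```
- Applying the same loop to each successive bit of the data register yields the encoded stream that step 480 writes to the frame buffer.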
- In FIG. 7, a flow chart illustrating the steps for decoding a bit-stream of encoded data is shown. For simplicity, the mapping of the Viterbi decoder onto the aforementioned RC array 260 is herein described for encoded data with a constraint length of nine (corresponding to 2^8 states) and a code rate of one-third, which correspond to typical standards used in the art. However, it should be noted that the following mapping methods can be easily adapted to Viterbi decoders with different constraint lengths and different code rates through minor software modifications. This flexibility, therefore, enables the present invention to reconfigure itself without any hardware modifications. Decoding begins at step 500 and continues with the reception of an encoded stream of data that is temporarily stored in the external memory unit 120 at step 505. The DMA controller 220 then transfers this encoded data from the external memory unit 120 to the frame buffer 230 at step 510. The core processor 210 then determines the format of the incoming data (e.g., code rate, constraint length, etc.) and initializes the RC array 260 according to this format at step 515. For this particular example, the RC array 260 must be initialized for data having a constraint length of nine and a code rate of one-third. Since these specifications result in a total of two hundred fifty-six states, each RC 262 is assigned trellis information for four states, as shown in FIG. 8. At step 520, a particular instruction is selected from the row context memory 240, enabling the first encoded packet of data (X0, Y0, Z0) to be loaded into each RC 262 of the RC array 260.
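The figure of four states per RC follows directly from dividing the 256 trellis states over the eight-by-eight array of RCs. A small bookkeeping sketch is shown below; the linear state-to-RC assignment is assumed for illustration only, since FIG. 8 may distribute the states differently.

```c
/* Illustrative bookkeeping for mapping 2^8 = 256 trellis states onto an
 * 8-by-8 array of RCs, four states per RC.  A simple linear assignment is
 * assumed here; FIG. 8 may distribute the states differently. */
#define ROWS           8
#define COLS           8
#define NUM_RCS        (ROWS * COLS)            /* 64 RCs              */
#define NUM_STATES     256                      /* constraint length 9 */
#define STATES_PER_RC  (NUM_STATES / NUM_RCS)   /* 4 states per RC     */

/* Which RC (row, column) holds a given state, and which local slot it uses. */
static void state_to_rc(int state, int *row, int *col, int *slot)
{
    int rc = state / STATES_PER_RC;   /* RC index 0..63    */
    *row  = rc / COLS;                /* array row 0..7    */
    *col  = rc % COLS;                /* array column 0..7 */
    *slot = state % STATES_PER_RC;    /* local slot 0..3   */
}
```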
- Once this first packet of data is loaded into the RC array 260, branch metric calculations may begin at step 525. According to the branch metric assignments shown in FIG. 9, each RC 262 will calculate its respective branch metrics (two branch metrics per RC 262) and store them in its local memory. It should be noted that, in general:
- Branch(a_s, b_s, c_s) = −Branch(−a_s, −b_s, −c_s),
where −a_s, −b_s, and −c_s are the respective binary inverses of a_s, b_s, and c_s. This simplification is well known in the art and is implemented as shown in FIG. 9.
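The practical effect of this symmetry is that only half of the branch metrics need to be computed per received symbol; the other half are obtained by negation. The C sketch below illustrates this, assuming a simple correlation-style soft metric (each expected code bit contributes the received value with a positive or negative sign); the patent does not specify the exact metric, so the form of branch_metric is an assumption.

```c
/* Illustrative branch-metric computation for one received rate-1/3 symbol
 * (x, y, z).  A correlation-style soft metric is assumed: each expected code
 * bit contributes +value when it is 1 and -value when it is 0.  Because
 * Branch(a,b,c) = -Branch(~a,~b,~c), only four of the eight metrics are
 * computed directly; the other four are obtained by negation. */
static int branch_metric(int x, int y, int z, int a, int b, int c)
{
    return (a ? x : -x) + (b ? y : -y) + (c ? z : -z);
}

static void all_branch_metrics(int x, int y, int z, int bm[8])
{
    for (int i = 0; i < 4; i++) {                 /* expected bits abc = 000..011 */
        int a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        bm[i]     = branch_metric(x, y, z, a, b, c);
        bm[7 - i] = -bm[i];                        /* binary-inverse counterpart  */
    }
}
```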
The procedure continues at step 530 by selecting the most probable path for each state at this particular trellis stage. Namely, at step 530, each RC 262 sums the calculated branch metric from step 525 with the cumulative branch metric of the corresponding state from the previous trellis stage and compares its two possible paths (as shown in FIG. 4B). Since each state has only two possible paths, one bit can be used to describe which path was chosen. The calculated sum corresponding to the most probable path of each state is then assigned to the next state respectively described by each of these paths. These cumulative branch metrics are then stored locally in each RC 262 until the next trellis stage. Thus, for each node of the trellis, both a cumulative branch metric value and a path-defining value are stored.
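This sum-compare-select step is the classical add-compare-select (ACS) of Viterbi decoding. A sequential C sketch of one trellis stage is given below for orientation; the predecessor-state and input-bit conventions (new input bit shifted into the low end of the state) are assumptions, since the patent defers those details to FIG. 4B, and the bm callback stands in for the branch metrics computed at step 525.

```c
#include <stdint.h>

#define NUM_STATES 256   /* constraint length nine */

/* One add-compare-select pass over a trellis stage (step 530).  bm(prev, u)
 * stands in for the branch metric of the transition from state prev on input
 * bit u, i.e. the values computed at step 525.  The predecessor convention
 * (new input bit shifted into the low end of the state) is an assumption. */
static void acs_stage(const int32_t cur[NUM_STATES], int32_t next[NUM_STATES],
                      uint8_t decision[NUM_STATES],
                      int32_t (*bm)(int prev, int u))
{
    for (int s = 0; s < NUM_STATES; s++) {
        int u  = s & 1;                         /* input bit that leads into s   */
        int p0 = s >> 1;                        /* predecessor with old MSB = 0  */
        int p1 = (s >> 1) | (NUM_STATES >> 1);  /* predecessor with old MSB = 1  */

        int32_t m0 = cur[p0] + bm(p0, u);       /* add     */
        int32_t m1 = cur[p1] + bm(p1, u);

        if (m0 >= m1) {                         /* compare */
            next[s]     = m0;                   /* select  */
            decision[s] = 0;                    /* one bit records the chosen path */
        } else {
            next[s]     = m1;
            decision[s] = 1;
        }
    }
}
```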
- Next, the selected path is recorded and written back to the frame buffer 230 at step 535. Since each RC 262 has four bits of data that need to be stored in the frame buffer 230, each column of the RC array 260 will have a total of thirty-two bits requiring storage in the frame buffer 230. In order to pass this data through the sixteen-bit result bus 268, a data collapse mechanism is implemented at each column by broadcasting particular instructions from either the row context memory 240 or the column context memory 250. This mechanism merges the first two bits of each RC 262 into a single sixteen-bit word and then takes the remaining two bits of each RC 262 and merges them into another sixteen-bit word. In FIG. 10, this mechanism is described for one of the eight columns of the RC array 260.
- As illustrated, this process begins by taking the first two bits of each RC 262 and merging them with the first two bits of a neighboring RC 262 to form a set of four four-bit words. In particular, the first two bits of rows zero and one, two and three, four and five, and six and seven are respectively merged in order to create this set of four-bit words. Each four-bit word is then respectively stored in one particular RC 262 of the aforementioned RC 262 pairs. In the example shown, these four-bit words are respectively stored in rows zero, two, four, and six. A similar mechanism then follows in order to merge this set of four four-bit words into a set of two eight-bit words. In particular, the two four-bit words in rows zero and two merge to form the eight-bit word shown in row zero, while the two four-bit words in rows four and six merge to form the eight-bit word shown in row four. The two eight-bit words are then merged to form the sixteen-bit word shown in row zero. The sixteen-bit word is then sent to the frame buffer 230 via the result bus 268. Once this first sixteen-bit word is stored in the frame buffer 230, operations may begin to create the second sixteen-bit word through the same procedure.
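Functionally, the collapse is a three-stage pairwise merge. The C sketch below models one column as an array of eight four-bit decision values and reproduces the pairing order described above (rows 0-1, 2-3, 4-5, 6-7, then 0-2 and 4-6, then 0-4); the ordering of bits inside each merged word is an assumption, as FIG. 10 may pack them differently.

```c
#include <stdint.h>

/* Software model of the column data-collapse of FIG. 10 for the first
 * sixteen-bit word.  rc_bits[r] holds the four path-decision bits of the RC
 * in row r of one column; only its lower two bits are used here.  The pairing
 * order follows the description above; the bit ordering inside each merged
 * word is assumed. */
static uint16_t collapse_first_word(const uint8_t rc_bits[8])
{
    uint8_t w4[8] = {0};
    uint8_t w8[8] = {0};

    /* stage 1: pairs of two-bit values become four-bit words in rows 0, 2, 4, 6 */
    for (int r = 0; r < 8; r += 2)
        w4[r] = (uint8_t)(((rc_bits[r] & 0x3u) << 2) | (rc_bits[r + 1] & 0x3u));

    /* stage 2: pairs of four-bit words become eight-bit words in rows 0 and 4 */
    w8[0] = (uint8_t)((w4[0] << 4) | w4[2]);
    w8[4] = (uint8_t)((w4[4] << 4) | w4[6]);

    /* stage 3: the two eight-bit words become the sixteen-bit word in row 0,
       ready for the sixteen-bit result bus 268 */
    return (uint16_t)(((uint16_t)w8[0] << 8) | w8[4]);
}
```

The second sixteen-bit word would be produced the same way from the remaining two bits of each RC once the first word has been written to the frame buffer 230.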
- Returning to the flow chart illustrated in FIG. 7, a re-ordering of the state metrics is then made at step 540. The purpose of this step is to prepare the RC array 260 for the next trellis stage. In order for this to occur, the branch metric values calculated and assigned to each "next state" at step 530 must be updated so that they are labeled "current state" branch metric values in the following trellis stage. It should be noted that the core processor 210 catalyzes this state re-ordering procedure by broadcasting particular instructions from either the row context memory 240 or the column context memory 250. By way of these instructions, cumulative branch metric values are easily communicated from one RC 262 to another.
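In a sequential model, this relabeling amounts to treating the freshly computed "next" metrics as the "current" metrics of the following stage; in the RC array it requires actual inter-RC transfers driven by the broadcast instructions. A trivial software counterpart is sketched below for orientation only.

```c
#include <stdint.h>

/* Minimal software counterpart of the re-ordering at step 540: the metrics
 * computed for the "next" states become the "current" metrics of the
 * following trellis stage.  In the RC array this relabeling is carried out by
 * broadcast context instructions that move metrics between RCs; sequentially
 * it reduces to exchanging the two metric buffers. */
static void advance_stage(int32_t **cur, int32_t **next)
{
    int32_t *tmp = *cur;
    *cur  = *next;
    *next = tmp;
}
```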
- After updating these branch metric values at step 540, an internal criterion algorithm determines whether an additional trellis stage is needed at step 545 (where it is understood that either of the two aforementioned trace-back approaches may be used). If, at step 545, it is determined that an additional trellis stage is needed, the procedure returns to step 520 and repeats the above iteration for the following trellis stage; otherwise, the procedure initiates its trace-back operation at step 550. Once this trace-back operation is initiated, the core processor 210 selects the optimal path from the set of candidate paths stored in the frame buffer 230. In a known way, the core processor 210 then takes this optimal path and determines which bit stream was most likely transmitted by the encoder. This decoded bit stream is then output to the frame buffer 230 at step 555.
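The trace-back itself is conventional: starting from the best end state, the stored one-bit path decisions are followed backwards, emitting one decoded bit per trellis stage. A C sketch is given below; the conventions (a set decision bit selects the high-bit predecessor, and the input bit that led into a state is its LSB) match the ACS sketch above and are assumptions rather than the patent's exact wiring.

```c
#include <stdint.h>

#define NUM_STATES 256

/* Conventional trace-back over the stored one-bit path decisions (step 550).
 * decisions[t][s] is the bit written back at step 535 for state s at trellis
 * stage t, and start_state is the best end state chosen by the core processor
 * 210.  The predecessor and input-bit conventions are assumptions. */
static void trace_back(const uint8_t *const *decisions, int num_stages,
                       int start_state, uint8_t *decoded /* one bit per stage */)
{
    int s = start_state;
    for (int t = num_stages - 1; t >= 0; t--) {
        decoded[t] = (uint8_t)(s & 1);   /* input bit that led into state s */
        s = (s >> 1) | (decisions[t][s] ? (NUM_STATES >> 1) : 0);  /* step back */
    }
}
```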
- Having thus described a preferred embodiment of the method and apparatus for convolution encoding and Viterbi decoding, it should be apparent to those skilled in the art that certain advantages of the system described herein have been achieved. It should also be appreciated that various modifications, adaptations, and alternative embodiments thereof may be made within the scope and spirit of the present invention. The invention is further defined by the following claims.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/818,746 US6448910B1 (en) | 2001-03-26 | 2001-03-26 | Method and apparatus for convolution encoding and viterbi decoding of data that utilize a configurable processor to configure a plurality of re-configurable processing elements |
Publications (2)
Publication Number | Publication Date |
---|---|
US6448910B1 (en) | 2002-09-10 |
US20020135502A1 true US20020135502A1 (en) | 2002-09-26 |
Family
ID=25226306
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/818,746 Expired - Lifetime US6448910B1 (en) | 2001-03-26 | 2001-03-26 | Method and apparatus for convolution encoding and viterbi decoding of data that utilize a configurable processor to configure a plurality of re-configurable processing elements |
Country Status (1)
Country | Link |
---|---|
US (1) | US6448910B1 (en) |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2370380B (en) | 2000-12-19 | 2003-12-31 | Picochip Designs Ltd | Processor architecture |
US20030123563A1 (en) * | 2001-07-11 | 2003-07-03 | Guangming Lu | Method and apparatus for turbo encoding and decoding |
US7043682B1 (en) * | 2002-02-05 | 2006-05-09 | Arc International | Method and apparatus for implementing decode operations in a data processor |
US7729898B1 (en) * | 2002-10-17 | 2010-06-01 | Altera Corporation | Methods and apparatus for implementing logic functions on a heterogeneous programmable device |
US7260154B1 (en) * | 2002-12-30 | 2007-08-21 | Altera Corporation | Method and apparatus for implementing a multiple constraint length Viterbi decoder |
US20050094551A1 (en) * | 2003-09-25 | 2005-05-05 | Broadcom Corporation | Processor instruction for DMT encoding |
US7305608B2 (en) * | 2003-09-25 | 2007-12-04 | Broadcom Corporation | DSL trellis encoding |
US7580412B2 (en) * | 2003-09-26 | 2009-08-25 | Broadcom Corporation | System and method for generating header error control byte for Asynchronous Transfer Mode cell |
US7751557B2 (en) * | 2003-09-26 | 2010-07-06 | Broadcom Corporation | Data de-scrambler |
US7734041B2 (en) * | 2003-09-26 | 2010-06-08 | Broadcom Corporation | System and method for de-scrambling and bit-order-reversing payload bytes in an Asynchronous Transfer Mode cell |
US7903810B2 (en) | 2003-09-26 | 2011-03-08 | Broadcom Corporation | Single instruction for data scrambling |
US7756273B2 (en) * | 2003-09-26 | 2010-07-13 | Broadcom Corporation | System and method for bit-reversing and scrambling payload bytes in an asynchronous transfer mode cell |
US7072320B2 (en) * | 2003-11-12 | 2006-07-04 | Morpho Technologies | Apparatus for spreading, scrambling and correlation in a reconfigurable digital signal processor |
US20050141784A1 (en) * | 2003-12-31 | 2005-06-30 | Ferriz Rafael M. | Image scaling using an array of processing elements |
JP2007174561A (en) * | 2005-12-26 | 2007-07-05 | Fujitsu Ltd | Viterbi decoder |
US7865025B2 (en) * | 2006-08-01 | 2011-01-04 | Ching-Wei Yeh | Data processing method in embedded block coding with optimized truncation module |
GB2454865B (en) | 2007-11-05 | 2012-06-13 | Picochip Designs Ltd | Power control |
GB2457310B (en) * | 2008-02-11 | 2012-03-21 | Picochip Designs Ltd | Signal routing in processor arrays |
GB2470037B (en) | 2009-05-07 | 2013-07-10 | Picochip Designs Ltd | Methods and devices for reducing interference in an uplink |
GB2470771B (en) | 2009-06-05 | 2012-07-18 | Picochip Designs Ltd | A method and device in a communication network |
GB2470891B (en) | 2009-06-05 | 2013-11-27 | Picochip Designs Ltd | A method and device in a communication network |
GB2474071B (en) | 2009-10-05 | 2013-08-07 | Picochip Designs Ltd | Femtocell base station |
GB2482869B (en) | 2010-08-16 | 2013-11-06 | Picochip Designs Ltd | Femtocell access control |
GB2489919B (en) | 2011-04-05 | 2018-02-14 | Intel Corp | Filter |
GB2489716B (en) | 2011-04-05 | 2015-06-24 | Intel Corp | Multimode base system |
GB2491098B (en) | 2011-05-16 | 2015-05-20 | Intel Corp | Accessing a base station |
US10075290B2 (en) * | 2013-12-20 | 2018-09-11 | Koninklijke Philips N.V. | Operator lifting in cryptographic algorithm |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS62233933A (en) * | 1986-04-03 | 1987-10-14 | Toshiba Corp | Viterbi decoding method |
FR2675968B1 (en) | 1991-04-23 | 1994-02-04 | France Telecom | METHOD FOR DECODING A CONVOLUTIVE CODE WITH MAXIMUM LIKELIHOOD AND WEIGHTING OF DECISIONS, AND CORRESPONDING DECODER. |
JPH0555932A (en) * | 1991-08-23 | 1993-03-05 | Matsushita Electric Ind Co Ltd | Error correction coding and decoding device |
FR2716588B1 (en) | 1994-02-18 | 1996-03-29 | Alcatel Telspace | Convolutional coding and decoding system of viterbi transparent to the phase jumps of pi and pi / 2, applicable in particular to TDMA transmissions. |
JPH0837467A (en) * | 1994-07-26 | 1996-02-06 | Sony Corp | Viterbi decoder and viterbi decoding method |
FR2730883B1 (en) | 1995-02-17 | 1997-04-04 | Alcatel Telspace | DEVICE FOR INITIALIZING A VITERBI DECODER COMPRISED IN A RECEIVER OF SIGNALS TRANSMITTED IN THE FORM OF PACKETS TRANSMITTER, RECEIVER AND CORRESPONDING METHOD |
GB2309867A (en) * | 1996-01-30 | 1997-08-06 | Sony Corp | Reliability data in decoding apparatus |
JPH10285653A (en) * | 1997-04-10 | 1998-10-23 | Sony Corp | Transmission rate estimate device and transmission rate estimate method |
JP3266182B2 (en) * | 1997-06-10 | 2002-03-18 | 日本電気株式会社 | Viterbi decoder |
JP3277856B2 (en) * | 1997-08-29 | 2002-04-22 | 日本電気株式会社 | Viterbi decoder |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060073804A1 (en) * | 2004-10-04 | 2006-04-06 | Hiroshi Tanaka | Semiconductor integrated circuit and a software radio device |
US7756505B2 (en) * | 2004-10-04 | 2010-07-13 | Hitachi, Ltd. | Semiconductor integrated circuit and a software radio device |
US20070101244A1 (en) * | 2005-09-21 | 2007-05-03 | Nils Haustein | Apparatus, system, and method for converting between serial data and encoded holographic data |
US7549111B2 (en) | 2005-09-21 | 2009-06-16 | International Business Machines Corporation | Apparatus, system, and method for converting between serial data and encoded holographic data |
US7933277B1 (en) * | 2006-05-12 | 2011-04-26 | Xilinx, Inc. | Method and apparatus for processing scalable content |
KR100980090B1 (en) * | 2008-05-28 | 2010-09-03 | 한국해양연구원 | Reconfigurable Convolutional Coding Method and Viterbi Decoding Method and Apparatus Using General Purpose Signal Processor |
US10869108B1 (en) | 2008-09-29 | 2020-12-15 | Calltrol Corporation | Parallel signal processing system and method |
US9660882B2 (en) | 2014-01-07 | 2017-05-23 | International Business Machines Corporation | Selective convolution encoding of data transmitted over degraded links |
US20220342669A1 (en) * | 2021-04-23 | 2022-10-27 | Nxp B.V. | Processor and instruction set |
US11995442B2 (en) * | 2021-04-23 | 2024-05-28 | Nxp B.V. | Processor having a register file, processing unit, and instruction sequencer, and operable with an instruction set having variable length instructions and a table that maps opcodes to register file addresses |
Also Published As
Publication number | Publication date |
---|---|
US6448910B1 (en) | 2002-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6448910B1 (en) | Method and apparatus for convolution encoding and viterbi decoding of data that utilize a configurable processor to configure a plurality of re-configurable processing elements | |
JP3677257B2 (en) | Convolution decoding device | |
US5471500A (en) | Soft symbol decoding | |
US7127664B2 (en) | Reconfigurable architecture for decoding telecommunications signals | |
US20150003541A1 (en) | Method and system for reconfigurable channel coding | |
US6865710B2 (en) | Butterfly processor for telecommunications | |
KR20090009892A (en) | Radix-4 Viterbi decoding | |
JP3281868B2 (en) | Viterbi decoding method | |
KR100779782B1 (en) | High Speed ACS Unit for Viterbi Decoder | |
US20020162074A1 (en) | Method and apparatus for path metric processing in telecommunications systems | |
KR100336246B1 (en) | Integrated circuit with digital processor and co-processor | |
KR100785671B1 (en) | Method and apparatus for effectively reading and storing state metrics in memory for high speed AC Viterbi decoder implementation | |
CN101145790B (en) | Decoder, add-compare-select unit and method thereof | |
JP2010529764A (en) | Decoding recursive convolutional codes with a decoder for nonrecursive convolutional codes | |
US20070201586A1 (en) | Multi-rate viterbi decoder | |
US20070205921A1 (en) | Four-Symbol Parallel Viterbi Decoder | |
KR20010067413A (en) | Viterbi decoder with reduced number of bits in branch metric calculation processing | |
US6910177B2 (en) | Viterbi decoder using restructured trellis | |
KR100928861B1 (en) | Turbo decoding method and apparatus for wireless communication | |
Campos et al. | A runtime reconfigurable architecture for Viterbi decoding | |
EP1192719A1 (en) | Viterbi decoder | |
CN112290956B (en) | A CTC encoder and encoding method based on pipeline structure | |
US7260154B1 (en) | Method and apparatus for implementing a multiple constraint length Viterbi decoder | |
JPH0730438A (en) | Viterbi decoding method | |
KR20010071792A (en) | Fast metric calculation for viterbi decoder implementation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: MORPHO TECHNOLOGIES, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LU, GUANGMING;REEL/FRAME:013135/0059 Effective date: 20020722 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner names: SMART TECHNOLOGY VENTURES III SBIC, L.P. (California); BRIDGEWEST, LLC (California); ELLUMINA, LLC (California); AMIRRA INVESTMENTS LTD. (Saudi Arabia); AMIR MOUSSAVIAN (California); MILAN INVESTMENTS, LP (California); LIBERTEL, LLC (California). Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: MORPHO TECHNOLOGIES; REEL/FRAME: 015550/0970; Effective date: 20040615. The same seven owners are also recorded with free format text: SECURITY INTEREST; ASSIGNOR: MORPHO TECHNOLOGIES; REEL/FRAME: 015550/0970; Effective date: 20040615 |
|
AS | Assignment |
Owner name: MORPHO TECHNOLOGIES, INC., CALIFORNIA Free format text: RELEASE OF SECURITY AGREEMENT;ASSIGNORS:SMART TECHNOLOGY VENTURES III SBIC, L.P.;BRIDGE WEST, LLC;ELLUMINA, LLC;AND OTHERS;REEL/FRAME:016863/0843 Effective date: 20040608 |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
REMI | Maintenance fee reminder mailed | ||
FEPP | Fee payment procedure |
Free format text: PAT HOLDER NO LONGER CLAIMS SMALL ENTITY STATUS, ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: STOL); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
AS | Assignment |
Owner name: FINLASIN TECHNOLOGY LLC, DELAWARE Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MORPHO TECHNOLOGIES, INC.;REEL/FRAME:021876/0560 Effective date: 20081009 |
|
FEPP | Fee payment procedure |
Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
FPAY | Fee payment |
Year of fee payment: 8 |
|
FPAY | Fee payment |
Year of fee payment: 12 |
|
AS | Assignment |
Owner name: NYTELL SOFTWARE LLC, DELAWARE Free format text: MERGER;ASSIGNOR:FINLASIN TECHNOLOGY LLC;REEL/FRAME:037392/0653 Effective date: 20150826 |
|
AS | Assignment |
Owner name: MORPHO TECHNOLOGIES, CALIFORNIA Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE RECEIVING PARTY AND EXECUTION DATES PREVIOUSLY RECORDED AT REEL: 016863 FRAME: 0843. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY AGREEMENT;ASSIGNORS:SMART TECHNOLOGY VENTURES III SBIC, L.P.;BRIDGEWEST, LLC;ELLUMINA, LLC;AND OTHERS;REEL/FRAME:048210/0534 Effective date: 20050802 |
|
AS | Assignment |
Owner name: M-RED INC., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTELLECTUAL VENTURES ASSETS 108 LLC;REEL/FRAME:048662/0318 Effective date: 20190315 |
|
AS | Assignment |
Owner name: INTELLECTUAL VENTURES ASSETS 108 LLC, DELAWARE Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NYTELL SOFTWARE LLC;REEL/FRAME:048680/0705 Effective date: 20190222 |