WO2007067320A2

WO2007067320A2 - Low-complexity audio matrix decoder

Info

Publication number: WO2007067320A2
Application number: PCT/US2006/044447
Authority: WO
Inventors: Ching-Wei Chen; Christophe Chabanne
Original assignee: Dolby Laboratories Licensing Corporation
Priority date: 2005-12-02
Filing date: 2006-11-16
Publication date: 2007-06-14
Also published as: TWI420918B; WO2007067320A3; EP1964443A2; CN101336563B; TW200746872A; HK1123663A1; EP1964443B1; CN101336563A

Abstract

Deriving n audio output signals from m audio input signals, where m and n are positive whole integers and the n audio output signals are derived using an adaptive matrix or matrixing process responsive to one or more control signals, which matrix or matrixing process produces n audio signals in response to m audio signals, and (b) deriving a plurality of time- varying control signals from the m audio input signals, wherein the control signals are derived from the m input audio signals using (i) a processor or process that produces a plurality of directional dominance signals in response to the m audio input signals, at least one directional dominance signal relating to a first directional axis and at least one other directional dominance signal relating to a second directional axis, and (iÊ) a processor or process that produces the control signals in response to the directional dominance signals.

Description

Low-Complexity Audio Matrix Decoder

Technical Filed

The invention relates to audio signal processing. More particularly, the invention relates to a low-complexity adaptive audio matrix decoder or decoding process usable for decoding both encoded and non-encoded input signals. Although usable as a stand-alone decoder or decoding process, the decoder or decoding process may be advantageously used in combination with a "virtualizer" or "virtualization" process such that the decoder or decoding process provides multichannel inputs to the virtualizer or

virtualization process. The invention also relates to computer programs, stored on a computer-readable medium, for causing a computer to perform a decoding process or decoding and virtualization process according to aspects of the invention.

Incorporation by Reference

Each of the patents, published patent applications and references cited herein are hereby incorporated by reference in their entirety.

Background Art

"Virtual headphone" and "virtual loudspeaker" audio processors ("virtualizers") typically encode multichannel audio signals, each associated with a direction, into two encoded channels so that, when the encoded channels are applied to a pair of transducers such as a pair of headphones or a pair of loudspeakers, a listener suitably located with respect to the transducers perceives the audio signals as coming from locations that may be different from the location of the transducers, desirably the directions associated with the directions of the multichannel audio signals. Headphone virtualizers typically result in a listener perceiving that the sounds are "out- of-head" rather then inside the head. Both virtual headphone and virtual loudspeaker processors involve the application of head-related-transfer- functions (HRTFs) to multichannel audio signals applied to them. Virtual headphone and virtual loudspeaker processors are well known in the art and are similar to each other (a virtual loudspeaker processor may differ from a virtual headphone processor, for example, by including a "crosstalk canceller").

Examples of headphone and loudspeaker virtualizers include

virtualizers sold under the trademarks "Dolby Headphone" and "Dolby Virtual Speaker." "Dolby", "Dolby Headphone", and "Dolby Virtual Speaker" are trademarks of Dolby Laboratories Licensing Corporation.

Patents and an application relating to Dolby Headphone and Dolby Virtual Speaker include U.S. Patents 6,370,256; 6,574,649; and 6,741,706 and published International Application WO 99/14983. Other "virtualizers" include, for example, those described in U.S. Patent 6,449,368 and published International Patent Application WO 2003/053099.

Dolby Headphone and Dolby Virtual Speaker provide, respectively, the impression of multichannel surround sound using a pair of standard headphones or a pair of standard loudspeakers. Recently, low-complexity versions of Dolby Headphone and Dolby Virtual Speaker were introduced that are useful, for example, in a wide variety of new, low-cost products, such as multimedia mobile phones, portable media players, portable game consoles, and low-cost television sets. However, such low-cost products typically are two-channel stereophonic ("stereo") devices; whereas a virtualizer requires a multichannel surround sound input.

Although existing matrix decoders, for example Dolby Pro Logic II and its predecessor Pro Logic, are useful in matching the two-channel stereo audio output of low-cost devices to the multichannel surround sound input of a Dolby Headphone virtualizer, existing matrix decoders typically may be more complex and resource intensive than desirable for use with some low- cost devices. "Dolby Pro Logic" and "Dolby Pro Logic II" are a trademarks of Dolby Laboratories Licensing Corporation. Aspects of Dolby Pro Logic II are set forth in U.S. Patents 6,920,223 and 6,970,567 and in published International Patent Application WO 2002/019768. Aspects of Dolby Pro Logic are set forth in U.S. Patents 4,799,260; 4,941,177; and 5,046,098.

Thus, there is a need for a low-complexity matrix decoder, in

particular one intended and optimized for use with virtualizers, particularly virtualizers such as Dolby Headphone and Dolby Virtual Speaker. Ideally, such a new matrix decoder should minimize complexity in every stage of the process, while obtaining performance similar to a Dolby Pro Logic II decoder.

Disclosure of the Invention

The invention relates to a method for processing audio signals by (1) deriving n audio output signals from m audio input signals, where m and n are positive whole integers and the n audio output signals are derived using an adaptive matrix or matrixing process responsive to one or more control signals, which matrix or matrixing process produces n audio signals in response to m audio signals, (2) deriving a plurality of time-varying control signals from the m audio input signals, wherein the control signals are derived from the m input audio signals using (a) a processor or process that produces a plurality of directional dominance signals in response to the m audio input signals, at least one directional dominance signal relating to a first directional axis and at least one other directional dominance signal relating to a second directional axis, and (b) a processor or process that produces the control signals in response to the directional dominance signals.

The adaptive matrix or matrixing process may include (1) a passive matrix or matrixing process that produces n audio signals in response to m audio signals, and (2) amplitude sealers or amplitude scaling processes, each of which amplitude scales one of the audio signals produced by the passive matrix or matrixing process in response to a time-varying amplitude-scale- factor control signal to produce the n audio output signals, wherein the plurality of time-varying control signals are n time-varying amplitude scale factor control signals, one for amplitude scaling each of the audio signals produced by the passive matrix or matrixing process.

The value m may be 2 and the value n may be 4 or 5.

The processor or process that produces directional dominance signals may use (1) a passive matrix or matrixing process that produces pairs of signals in response to the m audio input signals, a first pair of signals representing signal strength in opposing directions along a first directional axis and a second pair of signals representing signal strength in opposing directions along a second directional axis, and (2) a processor or process that produces, in response to the two pairs of signals, the plurality of directional dominance signals, at least one relating to each of the first and second directional axes.

The processor or process that produces a plurality of directional dominance signals may use linear amplitude domain subtractors or

subtraction processes that obtain a positive or negative difference between the magnitudes of each pair of signals, an amplifier or amplification process that amplifies each of the differences, a clipper or clipping process that limits each of the amplified differences substantially at a positive clipping level and a negative clipping level, and a smoother or smoothing process that time-averages each of the amplified and limited differences.

subtraction processes that obtain a positive or negative difference between the magnitudes of each pair of signals, a clipper or clipping process that limits each of the differences substantially at a positive clipping level and a negative clipping level, an amplifier or amplification process that amplifies each of the limited differences, and a smoother or smoothing process that time-averages each of the limited and amplified differences.

The relationship between the amplification factor of the amplifier or amplification process and the clipping level at which the clipper or clipping function limits the amplified difference may constitute a positive and negative threshold for magnitudes below which the limited and amplified difference may have an amplitude between zero and substantially the clipping level, and above which the limited and amplified difference may have an amplitude substantially at the clipping level.

For uncorrelated audio input signals the directional dominance signal may approximate a directional dominance signal based on a ratio of signal pairs comparison and for correlated audio input signals the directional dominance signal may tend toward the negative or positive clipping level.

The transfer function of the limited and amplified difference with respect to the difference may be substantially linear between the thresholds.

A difference above the positive threshold may indicate a positive dominance along a directional axis, a difference below the negative threshold may indicate a negative dominance along a directional axis, and a difference between the positive and negative threshold may indicate non-dominance along a directional axis.

The processor or process that produces a plurality of directional dominance signals may also modify the amplified and limited difference signal prior to or after smoothing so that the derived directional dominance signal is biased along the axis to which the directional dominance signal relates.

The processor or process that produces a plurality of directional dominance signals may modify the amplified and limited difference signal differently when there is non-dominance along a directional axis than when there is positive or negative dominance.

The processor or process that produces the control signals in response to the plurality of directional dominance signals may apply at least one panning function to each of the plurality of directional dominance signals.

In another aspect, the invention may derive p audio signals from the n audio output signals, wherein p is two and the p audio signals are derived from the n audio signals using a virtualizer or virtualization process such that, when the p audio signals are applied to a pair of transducers, a listener suitably located with respect to the transducers perceives the n audio signals as coming from locations that may be different from the location of the transducers. The virtualizer or virtualization process may include the application of one or more head-related-transfer-functions to ones of the n audio output signals. The transducers may be a pair of headphones or a pair of loudspeakers.

Although aspects of the present invention are usable with other types of matrix decoders, in an exemplary embodiment a fixed-matrix-variable- gains approach is employed because of its low complexity compared to the variable-matrix approach. The excellent isolation of single sound sources occurring with use of a variable gains decoder may be acceptable, if not preferable, for game audio where single audio events may be common.

When working with virtualizers, it is desirable to reduce inter-channel leakage as much as possible, because of the interactions and cancellations between and among the Head Related Transfer Functions (HRTFs) of different channels. The variable gains approach allows turning off certain channels completely, keeping inter-channel leakage to a minimum.

Furthermore, "pumping" side-effects that may occur under certain signal conditions when using a variable gains decoder are not as

objectionable when used in conjunction with a virtualizer. This is because of the nature of virtualizers to produce two channels of output for every one input channel. Although a variable gains matrix decoder may cause certain speakers to turn off completely, neither of the two outputs of a virtualizer turn off completely as long as at least one of its inputs is active.

As explained further below, optimizations may be made to deal with another known disadvantage of the variable-gains approach— the loss of non-dominant signals, resulting in a decoder with the best of both worlds.

Also, because one use of the matrix decoder according to aspects of the invention is to derive multichannel content for virtualizers, the number of outputs may be restricted to four: Left, Right, Left Surround, and Right Surround. Indeed, the main goal of virtualizers is to convey a good sense directionality all around the listener; this may be achieved using only four channels, omitting the center channel, the inclusion of which would have significantly increased the processing execution time, while marginally enhancing the perception of directionality.

Because destructive interferences occur when Head Related Transfer Functions (HRTFs) are summed together; it is preferable to avoid

correlations between and among channels. In other words, virtualizers perform better when sources are steered as much as possible toward one speaker at a time. However, achieving such a result should be balanced against compromising the overall soundstage.

Description of the Drawings

FIG. 1 is a schematic functional block diagram showing an example of a processor or process according to aspects of the present invention for deriving pairs of intermediate control signals from a plurality of audio input signals, the pairs of intermediate controls signals representing signal strength in opposing directions along a directional axis. In this example, which may be designated "Stage 1," there are two audio input signals, Lin and Rin and there are two pairs of intermediate control signals, L-R and F-B. FIG. 2 is a schematic functional block diagram showing an example of a processor or process according to aspects of the present invention for deriving a plurality of directional dominance signals, at least one such signal for every pair of intermediate control signals. In this example, which may be designated "Stage 2," there are two pairs of intermediate control signals, L-R and F-B and two directional dominance signals, LR and FB.

FIG. 3 shows an example of a notional or theoretical directional dominance vector in a two-dimensional plane based on orthogonal LR and FB axes.

FIG. 4 is an idealized plot of signal amplitude versus time showing the absolute values L and R, respectively, of a two-channel stereo signal in which the Left input channel (Lin) before taking its absolute value is a 50 Hz sine wave with a peak amplitude of 0.4, and the Right input channel (Rin) before taking its absolute value is a sine wave with frequency of (50 * V2) Hz and peak amplitude of 1.0. The frequencies of the sine waves are uncorrelated, while the level of the Left channel is 0.4 times the level of the Right channel.

FIG. 5 is an idealized plot of signal amplitude versus time showing both the result of subtracting L from R and the result of multiplying the difference and then clipping at -1.0 and +1.0 to provide a quasi-rectangular wave.

FIG. 6 is an idealized plot of signal amplitude versus time showing a smoothed LR intermediate control signal resulting from feeding the quasi- rectangular wave of FIG. 5 through a smoother filter, illustrating that, for substantially non-correlated signal inputs, the directional dominance signal approaches a value close to a value that would result from a ratio-based comparison of signal strengths along the directional axis to which the LR intermediate control signal relates. FIG. 7 is a schematic functional block diagram showing an example of a modification of the processor or process according to aspects of the present invention shown in FIG. 2. In this example, which may also be designated "Stage 2," the amplified and clipped FB difference is limited to values less than zero in order to bias the FB dominance signal towards the back.

FIG. 8 is an idealized plot of gain versus angle in radians showing a common pan-law between Left (L) and Right (R) audio channels, a

sine/cosine pan-law where L=cos(x)*input, and R=sin(x)*input, with x varying from 0 to π/2.

FIG. 9a is an idealized plot of gain versus directional dominant signal level for panL and panR when the same Sine/Cosine pan-law of FIG. 8 is applied to the LR axis, panL and panR representing the gain contribution, respectively, from Left and Right.

FIG. 9b is an idealized plot of gain versus directional dominant signal level for panB and panF when the same Sine/Cosine pan-law of FIG. 8 is applied to the FB axis, panB and panF representing the gain contribution, respectively, from Back and Front.

FIG. 10 is an idealized plot showing a quasi-3 -dimensional

representation of the LGain equation (the axes being normalized gain, and the values of FB and LR).

FIG. 11 is an idealized plot showing a quasi-3 -dimensional

representation of the LGain, RGain, LsGain and RsGain equations (the axes being normalized gain, and the values of FB and LR).

FIG. 12 is an idealized plot showing a cosine curve and a second-order polynomial approximation of a cosine curve between 0 and π/2, showing that the approximation, y = (1 - x² ) is reasonably close to y = cos (x * π/2) within the range 0<x<l. The lower curve is the approximation.

FIG. 13 is an idealized plot showing a quasi-3 -dimensional

representation of a modification to the LGain, RGain, LsGain and RsGain equations (the axes being normalized gain, and the values of FB and LR) in which the LR panning component is not employed when calculating LGain and RGain.

FIG. 14 is a schematic functional block diagram showing an example of a processor or process according to aspects of the present invention for deriving a plurality of control signals from the plurality of directional dominance signals. In this example, which may be designated "Stage 3," four control signals LGain, RGain, LsGain and RsGain are derived from two directional dominance signals LR and FB.

FIG. 15 is a schematic functional block diagram showing an example of an adaptive matrix processor or process according to aspects of the present invention for deriving a plurality of audio output signals from the input audio signals and a plurality of control signals. In this example, which may be designated "Stage 4," a pair of audio input signals Lin and Rin are applied to a passive matrix and the level of each matrix output is controlled by a respective one of the four control signals LGain, RGain, LsGain and RsGain to produce four audio output signals LOut, ROut, LsOut and RsOut.

FIG. 16 is a schematic functional block diagram showing an overview of all four Stages of the example, indicating their inter-relationships.

Best Mode for Carrying Out the Invention

Aspects of the present invention may be better understood in connection with an exemplary embodiment, which embodiment may be broken into four "stages" for convenience in description. The overall relationship of the four stages in the context of an adaptive matrix audio decoder or decoding process receiving m input audio signals, two signals, Lin and Rin in this example, and outputting n audio signals, four signals, LOut (left out), ROut (right out), LsOut (left-surround out), and RsOut (right-surround out), in this example, is shown in FIG. 16. The decoder or decoding process has a control path that includes Stages 1, 2 and 3 and a signal path that includes an adaptive matrix or matrixing process in Stage 4. A plurality of time-varying control signals, four control signals in this example, are generated by the control path and are applied to the adaptive matrix or matrixing process.

Stage 1

Turning first to Stage 1, shown in FIG. 1, m audio input signals, Lin and Rin in this example, are applied to a processor or process that derives pairs of signals in response to the m audio input signals, a first pair of signals, L and R in this example, representing signal strength in opposing directions along a first directional axis, an L-R or Left-Right axis in this example, and a second pair of signals, F and B in this example, representing signal strength in opposing directions along a second directional axis, an F-B or Front-Back axis in this example. Although this example employs two directional axes that are orthogonal, there may be more than two directional axes (and, hence, more than two pairs of signals representing signal strength in opposing directions along respective ones of additional directional axes) and the axes need not be orthogonal (see, e.g., said U.S. Patent 6,970,567). The processor or process of Stage 1 may be viewed as a passive matrix or matrixing process. In this example, a simple passive matrix computes Left, Right, Sum and Difference signals, and their absolute values are used as intermediate control signals L, R, F, and B. More specifically, the passive matrix or passive matrixing process of this example may be characterized by the following equations:

L = I Lin I

R = \ Rin I

F = \ (0.5 * Lm) + (0.5 * Rin) \

B = I (0.5 * Lin) - (0.5 * Rin) \ Stage 2

Turning next to Stage 2, shown in FIG. 2, the plurality of pairs of signals, each pair representing a signal strength in opposing directions along a directional axis, are applied to a processor or process that produces a plurality of directional dominance signals. In this example, there are two pairs of signals L-R and F-B applied to Stage 2 and two directional

dominance signals , LR and FB are produced by Stage 2. In principle, as mentioned above, there may be more than two directional axes (and, hence, more than two pairs of signals and more than two directional dominance signals). It is also possible to produce more directional dominance signals than there are pairs of signals and related axes. This may be accomplished by processing a pair of applied signals in more than one way so as to produce multiple directional dominance signals in response to a particular pair of applied signals. Before turning to details of a Stage 2 example, it is useful to explain the operational rationale of Stage 2.

Having obtained a measure of the signal's strength in each of the four directions (L, R, F, B), one would like to compare the strength in one direction against the strength in the opposite direction (L against R, and F against B) to provide a measure of the dominance along that directional axis. Because the four directions of this example provide two directional axes at 90-degrees to each other (orthogonal axes), such a pair of dominances may be interpreted as a single dominance vector on a 2-dimensional LR/FB plane. Such a notional or theoretical dominance vector may be shown as in the example of FIG. 3. Although such a dominance vector is implicit in the operation of a matrix decoder or decoding process in accordance with aspects of the invention, such a dominance vector need not be explicitly calculated.

A negative value along the LR axis may indicate dominance towards the Left, while a positive LR value may indicate dominance towards the Right. Similarly, a negative FB value may indicate dominance towards the Back, while a positive FB value may indicate dominance towards the Front. Interpreting the two dominance values as components of a 2D vector, one may visualize the dominance of a signal as lying anywhere on the LR/FB plane.

In most modern matrix decoders, including Dolby Pro Logic and Dolby Pro Logic II, the dominance in the LR direction is computed using the ratio of L and R, and the dominance in the FB direction is computed using the ratio of F and B. Because a ratio is independent of the magnitude of the two signals being compared, it provides a steady dominant direction throughout the natural amplitude variations found in real audio signals.

Unfortunately, if implemented by a computer program controlling a digital signal processor ("DSP"), such an approach requires case statements in the program to choose the numerator and denominator, as well as to assign the sign to the dominance value. More importantly, common methods of deriving a ratio, such as a division, or a subtraction in the log domain, require significant computational resources. A more simplistic approach of subtracting the two numbers in the linear amplitude domain (e.g., not the logarithmic-domain) is certainly more efficient to compute, but such subtraction produces dominance signals that change rapidly with natural variances in signal amplitudes.

To reduce the complexity of implementation, aspects of the present invention retain much of the amplitude independence of the ratio-based comparison, but require much less computation.

The processor or process of Stage 2 produces a plurality of directional dominance signals using linear-amplitude-domain subtractors or subtraction processes that obtain a positive or negative difference between the

magnitudes of each pair of applied signals. Such subtraction may be implemented with very low computation resources. The result of each subtraction is amplified by an amplifier or amplification process and the amplified difference is applied to a clipper or clipping process that limits each of the amplified differences substantially at a positive clipping level and a negative clipping level. Alternatively, the order of the

amplifier/amplification process and the clipper/clipping process may be reversed, using appropriate clipping levels in order to produce an equivalent result. A smoother or smoothing process may time average each of the amplified and limited differences to provide a directional dominance signal.

The relationship between the amplification factor of the amplifier or amplification process and the clipping level at which the clipper or clipping function limits the amplified difference constitutes a positive and negative threshold for magnitudes below which the limited and amplified difference has an amplitude between zero and substantially the clipping level, and above which the limited and amplified difference has an amplitude substantially at the clipping level. Although the particular transfer function is not critical and may take many forms, a transfer function in which the limited and amplified difference with respect to the difference is

substantially linear between the thresholds has very low computational requirements and is suitable.

The processor or process of Stage 2 may include modifications to an amplified and limited difference signal prior to or after smoothing during its processing so that the derived directional dominance signal is "biased" along the axis to which the directional dominance signal relates. The bias may be fixed or adaptive. For example, a difference signal after amplification and clipping may be scaled in amplitude and/or shifted in amplitude {i.e., offset) and/or restricted in amplitude or sign in a fixed manner or, for example, as a function of the magnitude, sign, or magnitude and sign of the amplified and clipped difference signal. The result, for example, may include the application of less bias to non-dominant signals than to dominant signals (dominance and non-dominance are explained further below). An example of applying "bias" to a directional dominance is described below in

connection with FIG. 7.

In the Stage 2 example of FIG. 2, two pair of signals, L-R and F-B₅ are applied in order to produce two directional dominance signals LR and FB. Given the four intermediate directionality signals (L, R, F, B), as described above, one would like to derive two dominance signal components, LR and FB, by comparing the directionality along each axis. According to aspects of this invention, this is accomplished by subtracting R from L, and B from F (or vice-versa in each case), to provide a magnitude difference signal along each axis. Heavy gain is applied to the difference signals, and the amplified difference is clipped (hard limited) to -1.0 and +1.0. The clipped difference signal is then applied to a time-smoothing filter.

By applying heavy gain and clipping to the difference signals, essentially any amount of dominance in a direction is treated as an absolute dominance in that direction. For signals where instantaneous directionality changes from one polarity to another, the result of this operation is similar to a rectangular wave with varying frequency and duty-cycle. The time- smoothing filter averages out the mostly rectangular wave to provide a continuous curve that approximates a ratio of the original directionality signals to one another. Although the exact filter used is a design choice, the filter may be implemented efficiently, for example, as a first order digital IIR lowpass filter having a time constant of about 40 ms.

In addition to detecting the dominant direction along each axis, it may be advantageous to represent "non-dominance." For example, a purely Left- steered input signal should exhibit a strong dominance on the Left-Right axis, but should have absolutely no dominance along the Front-Back axis. Another example is for extremely low level signals such as background noise, which one would prefer not to cause any steering effects. In accordance with aspects of the invention, a general approach to doing this is to choose a threshold value, and assign differences with a magnitude greater than the threshold a value of -1.0 or 1.0 (depending on the sign of the difference), and assign differences with magnitudes smaller than this threshold some value in between the two extremes. One possibility is to assign a value of 0.0 to all difference values below the threshold. To implement this in a program-controlled DSP would require some case statements and numerical comparisons. A better approach from the

standpoint of low complexity is to amplify the difference by a large gain such that the output of values below the threshold follow a linear function from -1.0 to +1.0. The gain is the inverse of the threshold. This approach is very efficient - both the gain and clipping stages may be implemented in a program-controlled DSP as an arithmetic left shift (for gains that are a power of 2) with the DSP's ''saturation logic" set (i.e., set a control register/bit in the DSP so that when the ALU overflows, the result is set to the maximum positive value or minimum negative value represented by the platform, depending on the sign). Gains that are not a power of two may be

implemented with only a slight increase in processing complexity.

A three-regioned dominance signal (negative dominance, positive dominance, and non-dominance) permits distinguishing between dominance and non-dominance along a directional axis before smoothing.

Distinguishing dominance and non-dominance facilitates the adaptive application of "bias" to a directional dominance signal, as mentioned above and an example of which is given below in connection with FIG. 7. For example, as shown below, it is useful in aspects of the present invention for distinguishing, before smoothing, a solely Left-steered signal from a Left Surround-steered signal, and a solely Right-steered signal from a Right Surround-steered signal. In a practical embodiment of the invention, to determine the minimum gain necessary to distinguish a side (Left or Right)-steered signal from a Surround (Left Surround or Right Surround)-steered signal, musical material encoded with a Dolby Pro Logic II matrix encoder was decoded. The average (F-B) difference signal was measured for a Left Surround- or Right Surround-steered input and this was used as an estimate of the maximum threshold (minimum gain) that would maintain a clear distinction between Left and Left Surround (or Right and Right Surround). In this practical embodiment of a decoder according to aspects of the invention, a gain factor of 1024 was used, equivalent to a threshold of approximately 0.001 for signals normalized to [-1 +1]. Thresholds smaller than 0.001 produce marginal audible improvement, while larger thresholds reduce the separation between the sides (Left and Right) and the surrounds (Left Surround and Right Surround) to unacceptable levels. In general, the threshold level is not critical.

To illustrate this technique, consider a two-channel stereo signal where the Left input channel (Lin) is a 50 Hz sine wave with a peak amplitude of 0.4, and the Right input channel (Rin) is a sine wave with frequency of (50 * V2) Hz and peak amplitude of 1.0. Such signals are shown in FIG. 4. The frequencies of the sine waves are uncorrelated, while the level of the Left channel is 0.4 times the level of the Right channel.

Using a ratio-based comparison as described above, this provides a dominance in the Right direction (defined as positive here) of 0.6. As shown in Stage 1, the L and R intermediate signals are the magnitudes of the input signals Lin and Rin.

After subtracting L from R, the difference is multiplied, for example, by 1024 (implemented as an arithmetic left shift of 10 bits), and then clipped at -1.0 and +1.0 to provide a quasi-rectangular wave. FIG. 5 shows the difference signal before and after clipping. Feeding the quasi-rectangular wave through a smoother filter that provides the LR directional dominance signal. In this example in which the input signals have constant levels, the directional dominance signal eventually reaches and oscillates around a value of 0.65, as shown in FIG. 6, close to the dominance value computed using a ratio-based comparison. The smoothness of the oscillation is a function of the order and characteristics of the smoother filter.

This example is representative of audio material that has significant amounts of uncorrelated signals in each input, such as un-encoded two- channel stereo music, where the polarity of the clipped amplified difference signal is inverted very often. Under these input conditions, the

subtract/amplify/clip derived dominance control signal produces results close to those obtained from a ratio-based comparison.

However, for material with common (i.e., correlated) signals in both channels, such as a steered mono sound source contained in matrix-encoded content, the clipped difference signal does not contain many zero crossings. In such cases, even the smoothed control signal tends to "lock" to one of the two extremes (i.e., +1.0 and -1.0), with a smoothed transition across to the other extreme if and when the polarity of the difference signal eventually inverts. Such "locking" of one dominance component may be thought of as pulling a 2-dimensional dominance vector out along the edges of the LR/FB plane. When both components are "locked", the dominance vector is pulled to one of the four comers of the LR/FB plane. According to aspects of the present invention, such hard-panning improves the spatial imaging of matrix- encoded content, by providing a more discrete, single channel of input to a virtualizer.

Front-Back Dominance Bias

A shortcoming of the variable gain approach is that non-dominant signals may be lost in the decoded output. This is apparent in musical sound sources, where there are a large number of sound sources mixed together with many different level and phase differences. Often, there are a few main instruments and vocals mixed equally in both Left and Right, while there are still many other less dominant, out-of-phase sounds that add to the overall space and ambience of the soundfϊeld. Because the decoder uses only the direction of the most dominant sound component, a traditional variable gains approach on such material may result in almost no output of the out-of-phase material from the rear decoder outputs (the Left Surround and Right

Surround outputs in the example).

According to an aspect of the present invention, this problem is mitigated by biasing the FB dominance signal towards the back, assuring that out-of-phase material is not completely removed from the surround outputs. One way to accomplish this is to limit the FB signal to negative values before the smoother filter. This is shown in the example of FIG. 7. For a pure rectangular wave between -1.0 and 1.0, this is equivalent to scaling down by half the output of the smoother filter followed by a fixed offset of -0.5. Thus, such a modification may be imposed either before or after the smoother filter. However, the clipped difference signal may not be a pure rectangular wave. Rather, it may contain in-between values when the difference signal falls below the threshold value, indicating non-dominance along a particular axis. When the magnitude of the clipped difference signal is smaller than 1.0, the process of limiting FB to negative values results in a smaller to negligible effective bias after smoothing. Thus, being able to distinguish non-dominance from positive or negative dominance before smoothing in this way allows pure Left and Right-steered signals to maintain high separation from the surrounds, while giving most other signals a significant bias towards the back. Stage 3

The processor or process of Stage 3 produces control signals for controlling the adaptive matrix or matrixing process in response to the plurality of directional dominance signals by applying one or more panning functions (a panning function is a transfer function representing an

interchannel "panning" characteristic) to each of the directional dominance signals. One or more of the panning functions may implement one or more of:

a trigonometric transfer function (such as a sine or cosine transfer function),

a logarithmic transfer function,

a linear transfer function, and

a mathematically simplified approximation of a

trigonometric transfer function.

The goal of Stage 3, in the example, is to take the LR and FB

dominance signals computed in the previous Stage, and derive the gain factors that are applied to the outputs of the passive matrix to produce the decoded outputs.

The general approach for the matrix decoder or decoding process according to aspects of the present invention is this: having detected a certain dominant directionality in the input, emphasize the output channels closest to that dominant location, and de-emphasize the outputs furthest from the dominant location. Between the two outputs closest to the dominant location, the problem may be reduced to a pair- wise pan, which may be expressed as a panning function.

Sine/Cosine Pan-law

The most common pan-law between two channels is the sine/cosine pan-law where L=cos(x)*input, and R=sin(x)*input, with x varying from 0 to π/2. See FIG. 8. 2-Dimensional Sine/Cosine Pan-law

The gain for each decoder output channel must be expressed as a function of LR and FB:

LGain =f_L (LR, FB)

RGain =f_R (LR, FB)

LsGain =f_£s (LR, FB)

RsGain =f_Rs (LR, FB)

One may apply to the LR and FB axes the same Sine/Cosine pan-law described above, and obtain the panning curves shown in FIGS. 9a and 9b, where panL, panR, panB, and panF represent the gain contribution from respectively Left, Right, Back and Front.

Recognizing that the sine function is a cosine with a phase shift, one may obtain the following panning equations using only cosine functions:

panL = cos ( (LR + I) / 2 * π/2)

panR = sin ((LR + 1) / 2 * π/2) = cos ((LR - l) /2 * π/2) panB = cos ( (FB + 1) * π/2 )

panF = sin ( (FB + 1) * π/2) = cos (FB * π/2)

By the nature of the left channel location on the LR/FB plane (see FIG. 3), LGain should be maximum only when both panL and panF are maximum, and should decrease as the dominance gets farther away on both, or either of the axis. This may be achieved by multiplying panL with panF. The same principle may be applied to RGain, LsGain and RsGain, and the final equations for all gains become:

LGain—panL *panF

RGain—panR * panF

LsGain—panL *panB

RsGain =panR *panB

The use of a multiplication may also be seen as a mutual scaling of the two Sine/Cosine amplitude-panning functions, where the smallest value of the two components becomes the largest value that the overall gain can reach.

FIG. 10 shows the 3 -dimensional representation of the LGain equation, and FIG. 11 the 3 -dimensional representation of all four gains superimposed.

Polynomial Approximation of a Cosine Function As shown in FIG. 8, the pan-law is composed of two curves: cos(x) and sin(x). (The sin function can be replaced by a cos function with the appropriate phase shift.) In order to avoid complex computations or using large lookup tables, in accordance with an aspect of the present invention, a second-order polynomial approximation of a cosine curve between 0 and π/2 may be used instead. The equation y = (1 - x² ) is reasonably close to y = cos (x * π/2) within the range 0<x<l . (see FIG. 12 in which the lower curve is the approximation). There may be little to no audible difference resulting from the use of this approximation.

Front Panning Adjustment

Because the anticipated audio input source is two-channel stereo, which is already mixed to pan naturally between L and R, it is an aspect of the present invention not to consider the LR panning component when calculating LGain and RGain. The additional left-right panning in the variable gains would not significantly improve separation in this case, since L and R are already well separated. In addition to saving some computation, it also allows a more stable soundfield in the front, by avoiding unnecessary gain riding. Removing the LR component, one arrives at these equations:

LGain =panF

RGain =panF

LsGain =panL * panB

RsGain =panR *panB The 3 -dimensional representation of these new equations is shown in FIG. 13.

Note that a similar simplification may be applied to the Ls gain and Rs gain equations, whereby no additional LR panning is used, and the natural panning within the source signal is used to create separation between the two surround channels. However, in such a case, the Ls and Rs separation is limited by the performance of the passive decoding taking place in Stage 4. A passive decoding matrix or matrixing process, such as forms part of aspects of the present invention, can only achieve a 3 dB separation between Ls and Rs, therefore making this simplification unacceptable from a channel separation standpoint. In order to maintain a higher degree of separation, the LR component in the equations of LsGain and RsGain is retained.

Final Gain Equations

Substituting the polynomial approximation for the cosine in each panning term, one may derive the final equation for each gain factor:

LGain = l - FB²

When FB = 0 ->LGain = 1

When FB = -l ^LGain = 0

RGain = l - FB²

When FB = 0 ->RGain = 1

When FB = -l ->RGain = 0

LsGain = [ 1 - ( (LR + 1) / 2 )² ] * [ 1 - (FB + I)² ] When FB = O ^LsGain = 0

When FB = -1 and LR = -1 ^LsGain = 1

When FB = -1 and LR = 1 -^LsGain = 0

RsGain = [ 1 - ( (LR - 1) / 2 )²] * [ 1 - (FB + I)² ] When FB = 0 -^Rs gain = 0

When FB = -1 and LR = -1 ^RsGain = 0

When FB = -1 and LR = 1 ^RsGain = 1 Referring to FIG. 14, the control signals LGain, RGain, LsGain, and RsGain are derived from the application of a panning function to a

directional dominance signal, and/or the product of the application of a panning function to one directional dominance signal and the application of a panning function to another directional dominance signal, wherein each panning function may be different from ones or all of the other pamiing functions. The panning functions are panning functions that are not inherent in the n input audio signals. In this example, one of the directional axes is a left/right axis and the panning functions are panning functions that do not include a left/right panning component. The following applies to this example. The LR directional dominance signal is applied to a panL panning function and to a panR panning function. The FB directional dominance signal (either without biasing as in FIG. 2 or with biasing as in FIG. 7) is applied to a panF panning function and to a panB panning function. The result of applying the panF function to the FB dominance signal is applied as both the LGain and as the RGain to the Stage 4 passive decoder or decoding process. The result of applying the panB function to the FB dominance signal is multiplied by the result of applying the panL function to the LR dominance signal and is applied as the LsGain to the Stage 4 passive decoder or decoding process. The result of applying the panR function to the LR dominance signals is multiplied by the result of applying the panB function to the FB dominance signal and is applied as the RsGain to the Stage 4 passive decoder or decoding process.

Stage 4

FIG. 15 shows a passive matrix or matrixing process that produces n audio signals in response to m audio signals, and amplitude sealers or amplitude scaling processes, each of which amplitude scales one of the audio signals produced by the passive matrix or matrixing process in response to a time-varying amplitude-scale-factor control signal to produce the n audio output signals, wherein the plurality of time-varying control signals are n time-varying amplitude scale factor control signals, one for amplitude scaling each of the audio signals produced by the passive matrix or matrixing process. In the example of FIG. 14, the are two input audio signals, Lin and Rin, four audio output signals LOut, ROut, LsOut and RsOut, and four scale- factor control signals LGain, RGain, LsGain, and RsGain (from Stage 3).

In the example of FIG. 15, four audio output signals may be

characterized by the following equations:

LOut = LGain * (a *Lin + b * Rin)

ROut = RGain * (c * Lin + d * Rin)

LsOut - LsGain * (e * Lin +f *Rin)

RsOut = RsGain * (g * Rin + h * Lin)

where a through h are matrix coefficients, as indicated in FIG. 15. The coefficients a through h may be chosen to match those used in the Dolby Pro Logic II encode/decode system, where:

a = 1.0, b ^ 0.0,

c = 0.0, d = 1.0,

e = 0.8710, f ^'= -0.4898,

g = -0.4898, h = 0.8710

This provides the final equations:

LOut = LGain * Lin

ROut = RGain * Rin

LsOut = LsGain * (0.8710 * Lin - 0.4898 * Rin)

RsOut = RsGain * (0.8710 * Rin - 0.4898 * Lin)

FIG. 16 shows an overview of all four Stages of the example, indicating their inter-relationships.

Implementation

The invention may be implemented in hardware or software, or a combination of both {e.g., programmable logic arrays). Unless otherwise specified, the algorithms included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines, such as digital signal processors, may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs executing on one or more programmable computer systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.

Each such program may be implemented in any desired computer language (including machine, assembly, or high level procedural, logical, or object oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language.

Each such computer program is preferably stored on or downloaded to a storage media or device (e.g., solid state memory or media, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer system to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein. A practical embodiment of the present invention embodied in a computer program suitable for controlling a digital signal processor has been implemented with under 30 lines of C code, running at an estimated 3 MIPS₅ and using virtually no memory. This is approximately 15% of the estimated MIPS usage of a Dolby Pro Logic II decoder. Processing may remain entirely in the time domain and be performed on a sample per sample basis (no block processing). In order to minimize the execution time for every sample, implementations may avoid the use of branches and mathematical functions such as square root, sine, cosine, and divide. Implementations may also avoid the use of lookup tables and look-ahead delays, which increase memory requirements and increase execution time. Thus aspects of the invention may be implemented with very simple computer programs and very basic digital signal processors. Particularly in view of their simplicity, aspects of the present invention may also be implemented using analog circuitry.

A number of embodiments of the invention have been described.

Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, some of the steps described herein may be order independent, and thus can be performed in an order different from that described.

Claims

1. A method for processing audio signals, comprising

deriving n audio output signals from m audio input signals, where m and n are positive whole integers and the n audio output signals are derived using an adaptive matrix or matrixing process responsive to one or more control signals, which matrix or matrixing process produces n audio signals in response to m audio signals,

deriving a plurality of time-varying control signals from said m audio input signals, wherein the control signals are derived from said m input audio signals using

a processor or process that produces a plurality of directional dominance signals in response to the m audio input signals, at least one directional dominance signal relating to a first directional axis and at least one other directional dominance signal relating to a second directional axis, and

a processor or process that produces said control signals in response to said directional dominance signals.

2. A method according to claim 1 wherein said adaptive matrix or matrixing process comprises

a passive matrix or matrixing process that produces n audio signals in response to m audio signals, and

amplitude sealers or amplitude scaling processes, each of which amplitude scales one of the audio signals produced by the passive matrix or matrixing process in response to a time-varying amplitude-scale-factor control signal to produce said n audio output signals,

wherein said plurality of time-varying control signals are n time- varying amplitude scale factor control signals, one for amplitude scaling each of the audio signals produced by the passive matrix or matrixing process.

3. A method according to claim 1 or claim 2 wherein m is 2 and n is 4 or 5.

4. A method according to claim 1, 2 or 3 wherein the processor or process that produces directional dominance signals uses

a passive matrix or matrixing process that produces pairs of signals in response to the m audio input signals, a first pair of signals representing signal strength in opposing directions along a first directional axis and a second pair of signals representing signal strength in opposing directions along a second directional axis, and a processor or process that produces, in response to the two pairs of signals, said plurality of directional dominance signals, at least one relating to each of said first and second directional axes.

5. A method according to claim 4 wherein the processor or process that produces a plurality of directional dominance signals uses linear amplitude domain subtractors or subtraction processes that obtain a positive or negative difference between the magnitudes of each pair of signals, an amplifier or amplification process that amplifies each of said differences, a clipper or clipping process that limits each of the amplified differences substantially at a positive clipping level and a negative clipping level, and a smoother or smoothing process that time-averages each of the amplified and limited differences.

6. A method according to claim 4 wherein the processor or process that produces a plurality of directional dominance signals uses linear amplitude domain subtracters or subtraction processes that obtain a positive or negative difference between the magnitudes of each pair of signals, a clipper or clipping process that limits each of the differences substantially at a positive clipping level and a negative clipping level, an amplifier or amplification process that amplifies each of said limited differences, and a smoother or smoothing process that time-averages each of the limited and amplified differences.

7. A method according to claim 5 or claim 6 wherein the relationship between the amplification factor of the amplifier or amplification process and the clipping level at which the clipper or clipping function limits the amplified difference constitutes a positive and negative threshold for magnitudes below which the limited and amplified difference has an amplitude between zero and substantially the clipping level, and above which the limited and amplified difference has an amplitude substantially at the clipping level.

8. A method according to claim 7 wherein for uncorrelated audio input signals the directional dominance signal approximates a directional dominance signal based on a ratio of signal pairs comparison and for correlated audio input signals the directional dominance signal tends toward the negative or positive clipping level.

9. A method according to claim 7 wherein the transfer function of the limited and amplified difference with respect to the difference is

substantially linear between the thresholds.

10. A method according to claim 7 or claim 9 wherein a difference above the positive threshold indicates a positive dominance along a directional axis, a difference below the negative threshold indicates a negative dominance along a directional axis, and a difference between the positive and negative threshold indicates non-dominance along a directional axis.

11. A method according to any one of claims 5, 6, 7, 9 and 10 wherein the processor or process that produces a plurality of directional dominance signals also modifies the amplified and limited difference signal prior to or after smoothing so that the derived directional dominance signal is biased along the axis to which the directional dominance signal relates.

12. A method according to claim 11 as dependent on claim 10 wherein the processor or process that produces a plurality of directional dominance signals modifies the amplified and limited difference signal differently when there is non-dominance along a directional axis than when there is positive or negative dominance.

13. A method according to any one of claims 5, 6, 7, 9 and 10 wherein the processor or process that produces a plurality of directional dominance signals also restricts either the positive or negative magnitude of the output of a clipper or clipping process prior to a smoother or smoothing process.

14. A method according to claim 13 wherein the processor or process that produces a plurality of directional dominance signals restricts the positive magnitude of the output of at least one of the clippers or clipping processes prior to the smoother or smoothing process.

15. A method according to claim 14 wherein the first directional axis is a front/back axis and the processor or process that produces a plurality of directional dominance signals restricts the positive magnitude of the output of the clipper or clipping process that processes a front/back axis directional dominance signal.

16. A method according to any one of claims 4-15 wherein the processor or process that produces said control signals in response to said plurality of directional dominance signals applies at least one panning function to each of said plurality of directional dominance signals.

17. A method according to claim 16 wherein one or more of the panning functions implement a trigonometric transfer function.

18. A method according to claim 16 wherein one or more of the panning functions implement a logarithmic transfer function.

19. A method according to claim 16 wherein one or more of the panning functions implement a linear transfer function.

20. A method according to claim 16 wherein one or more of the panning functions implement a mathematically simplified approximation of a trigonometric transfer function.

21. A method according to any one of claims 16-20 wherein the control signals are derived from

the application of a panning function to a directional dominance signal, and/or the product of the application of a panning function to one directional dominance signal and the application of a panning function to another directional dominance signal,

wherein each panning function may be different from ones or all of the other panning functions.

22. A method according to any one of claims 16-20 wherein the panning functions are panning functions that are not inherent in the n input audio signals.

23. A method according to claim 22 wherein one of the directional axes is a left/right axis and the panning functions are panning functions that do not include a left/right panning component.

24. A method according to any one of claims 16-23 wherein at least some of said n time-varying scale factor signals are derived from the application of a single panning function to a directional dominance signal and others of said n time- varying scale factor signals are derived from the products of the application of a single panning function to a directional dominance signal and the application of another single panning function to another directional dominance signal.

25. A method according to claim 24 wherein the directional axis of one of the directional dominance signals is a left/right axis and the

directional axis of another of the directional dominance signals is a front/back axis, wherein at least some of said n time-varying scale factor signals are derived from the application of a single panning function to the front/back directional dominance signal and at least some of said n time- varying scale factor signals are derived from the products of the application of a single panning function to the left/right directional dominance signal and the application of another single panning function to the front/back directional dominance signal.

26. A method according to any one of claims 1-25 further comprising deriving p audio signals from said n audio output signals, wherein p is two and said p audio signals are derived from said n audio signals using a virtualizer or virtualization process such that, when the p audio signals are applied to a pair of transducers, a listener suitably located with respect to the transducers perceives the n audio signals as coming from locations that may be different from the location of the transducers.

27. A method according to 26 wherein the virtualizer or virtualization process includes the application of one or more head-related-transfer- functions to ones of said n audio output signals.

28. A method according to claim 26 or claim 27 wherein the pair of transducers is a pair of headphones.

29. A method according to claim 26 or claim 27 wherein the pair of transducers is a pair of loudspeakers.

30. Apparatus adapted to perform the methods of any one of claims 1 through 29.

31. A computer program, stored on a computer-readable medium for causing a computer to perform the methods of any one of claims 1 through 29.