HK1233037B

HK1233037B - Residual encoding in an object-based audio system

Info

Publication number: HK1233037B
Application number: HK17106461.8A
Authority: HK
Inventors: A．考克; G.塞鲁西
Original assignee: Dts（英属维尔京群岛）有限公司
Priority date: 2014-03-20
Filing date: 2015-03-04
Publication date: 2021-03-05

Description

Residual Coding in Object-Based Audio Systems

Related Applications

This application claims priority to U.S. Provisional Patent Application No. 61/968,111, filed on March 20, 2014, and entitled "Residual Coding in Object-Based Audio Systems," and to U.S. Non-Provisional Patent Application No. 14/620,544, filed on February 12, 2015, and entitled "Residual Coding in Object-Based Audio Systems."

Technical Field

The present invention relates generally to lossy, multi-channel audio compression and decompression, and more particularly to compressing and decompressing downmixed multi-channel audio signals in a manner that facilitates upmixing of the received, decompressed multi-channel audio signals.

Background Art

Audio and audiovisual entertainment systems have progressed from humble beginnings, when they could reproduce only monophonic audio through a single speaker. Modern surround sound systems can record, transmit, and reproduce multiple channels through multiple speakers in a listener's environment (which may be a public theater or a more private "home theater"). Various surround sound speaker setups are available, with designations such as "5.1 surround," "7.1 surround," and even "20.2 surround" (where the number to the right of the decimal point indicates the number of low-frequency effects channels). For each such configuration, various physical arrangements of speakers are possible; in general, however, the best results are achieved when the rendering geometry resembles the geometry assumed by the audio engineers who mixed and mastered the recorded channels.

Because rendering environments and geometries beyond those anticipated by the mixing engineer are possible, and because the same content may be played back in a variety of listening configurations or environments, the diversity of surround sound configurations presents numerous challenges to an engineer or artist who wishes to deliver a faithful listening experience. Either a "channel-based" or (more recently) an "object-based" approach can be used to deliver a surround sound listening experience.

In a channel-based approach, each channel is recorded with the intention that it be rendered on a corresponding loudspeaker during playback. The physical arrangement of the intended loudspeakers is predetermined, or at least approximately assumed, during mixing. In an object-based approach, by contrast, multiple independent audio objects are recorded, stored, and transmitted separately, preserving their synchronization relationship but independent of any assumption about the configuration or geometry of the intended playback loudspeakers or environment. Examples of audio objects include a single musical instrument, an ensemble section such as a viola section treated as a unified musical voice, a human voice, or a sound effect. To preserve spatial relationships, the digital data representing the audio objects includes, for each object, certain data ("metadata") symbolizing information associated with the particular sound source: for example, the vector direction, proximity, loudness, motion, and extent of the sound source can be symbolically encoded (preferably in a manner that permits time variation), and this information is transmitted or recorded along with the particular sound signal. An independent sound source waveform together with its associated metadata constitutes an audio object (stored as an audio object file). This approach has the advantage that it can be flexibly rendered in many different configurations; however, it places a burden on the rendering processor ("engine") to compute an appropriate mix based on the geometry and configuration of the playback speakers and environment.

In both channel-based and object-based approaches to audio, it is frequently desirable to transmit a downmixed signal (A plus B) in such a way that the two independent channels (or objects, A and B) can be separated ("upmixed") during playback. One motivation for transmitting a downmix may be to maintain backward compatibility, so that the downmixed program can be played in mono, in conventional two-channel stereo, or (more generally) on a system having fewer speakers than the number of channels or objects in the recorded program. To recover the larger plurality of channels or objects, an upmixing process is applied. For example, if one transmits the sum C = A + B of signals A and B, and also transmits B, then the receiver can easily construct A = (A + B) - B. Alternatively, one can transmit the composite signals (A + B) and (A - B) and then recover A and B by taking linear combinations of the transmitted composite signals. Many existing systems use variations of this "matrix mixing" approach, and they are reasonably successful at recovering discrete channels or objects. However, when a large number of channels, or especially objects, are summed, it becomes difficult to reproduce the individual discrete objects or channels adequately without artifacts or impractically high bandwidth requirements. Because object-based audio often involves a very large number of independent audio objects, effective upmixing to recover discrete objects from a downmixed signal is very difficult, especially where the data rate (or, more generally, the bandwidth) is constrained.
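By way of illustration only, the sum-and-difference ("matrix mixing") scheme described above can be sketched in Python as follows. The function names and sample values are hypothetical and are not part of the disclosure; integer samples are assumed so that the recovery is exact.

```python
def matrix_encode(a, b):
    """Return the composite signals (A + B) and (A - B), sample by sample."""
    total = [x + y for x, y in zip(a, b)]
    diff = [x - y for x, y in zip(a, b)]
    return total, diff


def matrix_decode(total, diff):
    """Recover A and B as linear combinations of the two composites."""
    # (A + B) + (A - B) = 2A and (A + B) - (A - B) = 2B, so both are even.
    a = [(s + d) // 2 for s, d in zip(total, diff)]
    b = [(s - d) // 2 for s, d in zip(total, diff)]
    return a, b


a = [3, -1, 4, 1]   # hypothetical samples of signal A
b = [2, 7, -1, 8]   # hypothetical samples of signal B
total, diff = matrix_encode(a, b)
rec_a, rec_b = matrix_decode(total, diff)
```

A legacy receiver would play `total` directly, whereas a capable receiver would use both composites to separate A and B. The recovery is exact here only because no lossy coding is applied to either composite.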

In most practical systems for transmitting or recording digital audio, some method of data compression is highly desirable. Data rates are always subject to some constraint, and it is always desirable to transmit audio more efficiently. This consideration becomes increasingly important when a large number of channels is used, whether as discrete channels or upmixed. In this application, the term "compression" refers to a method of reducing the data requirements for transmitting or recording an audio signal, whether the result is a reduced data rate or a reduced file size. (This definition should not be confused with dynamic range compression, which is sometimes also called "compression" in other audio contexts not relevant here.)

Existing methods for compressing downmixed signals generally adopt one of two approaches: lossless coding or redundant description. Either approach can facilitate upmixing after decompression, but both have drawbacks.

Lossless and lossy coding:

Assume that A, B₁, B₂, ..., B_m are independent signals (objects) that are encoded in a codestream and sent to a renderer. The distinguished object A will be called the base object, and B = B₁, B₂, ..., B_m will be called the regular objects. In an object-based audio system, we are interested in rendering the objects simultaneously but independently, so that, for example, each object can be rendered at a different spatial location.

Backward compatibility is desirable: in other words, the coded stream must be interpretable by legacy systems that are neither object-based nor object-aware, or that can handle only fewer channels. Such systems can only render the composite object or channel C = A + B₁ + B₂ + ··· + B_m from an encoded (compressed) version E(C) of C. The codestream must therefore include a transmitted E(C), followed by descriptions of the individual objects, which legacy systems ignore. Thus, the codestream can include E(C) followed by the descriptions E(B₁), E(B₂), ..., E(B_m) of the regular objects. The base object A is then recovered by decoding these descriptions and setting A = C - B₁ - B₂ - ··· - B_m. It should be noted, however, that most audio codecs used in practice are lossy, meaning that the decoded version Q(X) = D(E(X)) of an encoded object E(X) is only an approximation of X and is not necessarily identical to it. The accuracy of the approximation generally depends on the choice of codec and on the bandwidth (or storage space) available for the codestream. Although lossless coding, for which Q(X) = X, is possible, it generally requires much more bandwidth or storage space than lossy coding. Lossy coding, on the other hand, can still provide a high-quality reproduction that is perceptually indistinguishable from the original object.
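The relationship Q(X) = D(E(X)) can be illustrated with a toy codec. In the following Python sketch, a uniform quantizer stands in for a lossy audio codec; the step size, function names, and sample values are assumptions made for illustration and do not correspond to any codec named in this document.

```python
STEP = 8  # hypothetical quantizer step size, standing in for codec coarseness


def encode(x, step=STEP):
    """E: map each integer sample to a quantizer index (information is lost here)."""
    return [(s + step // 2) // step for s in x]


def decode(codes, step=STEP):
    """D: map quantizer indices back to sample values."""
    return [c * step for c in codes]


def q(x, step=STEP):
    """Q(X) = D(E(X)): the decoded approximation of X."""
    return decode(encode(x, step), step)


x = [3, 12, -5, 100, 37]   # hypothetical samples of an object X
qx = q(x)
max_err = max(abs(u - v) for u, v in zip(qx, x))
```

For this toy codec, Q(X) differs from X, but each sample is off by at most half a step. A real perceptual codec distributes its error very differently, so the sketch conveys only the structure of the approximation.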

Redundant description:

An alternative approach is to include in the codestream an explicit encoding of a certain privileged object A, so that the codestream comprises E(C), E(A), E(B₁), E(B₂), ..., E(B_m). Assuming E is lossy, this approach may be more economical than lossless coding, but it is still not an efficient use of bandwidth. The approach is redundant because E(C) is clearly correlated with the individually encoded objects E(A), E(B₁), E(B₂), ..., E(B_m).

Summary of the Invention

Lossy compression and transmission of a downmixed composite signal having multiple tracks and objects (including a downmixed signal) is accomplished in a manner that reduces the bit-rate requirement, as compared with redundant transmission or lossless compression, while also reducing upmixing artifacts. A compressed residual signal is generated and transmitted along with a compressed total mix and at least one compressed audio object. In its reception and upmixing aspect, the invention decompresses the downmixed signal and the other compressed objects, calculates an approximate upmix signal, and corrects a specific base signal derived from the upmix by subtracting a decompressed residual signal. The invention thus allows lossy compression to be used in combination with downmixed audio signals for transmission through a communication channel (or for storage). Upon later reception and upmixing, an additional base signal is recoverable in capable systems that provide multi-object capability (while legacy systems can easily decode the total mix without upmixing). The method and apparatus of the invention have two aspects: a) an audio compression and downmixing aspect, and b) an audio decompression/upmixing aspect. Here, compression should be understood to denote a method of bit-rate reduction (or file-size reduction); downmixing denotes a reduction in the channel or object count; and upmixing denotes an increase in the channel count obtained by recovering and separating previously downmixed channels or objects.

In its decompression and upmixing aspect, the invention includes a method for decompressing and upmixing a compressed, downmixed composite audio signal. The method includes the following steps: receiving a compressed representation of a total mix signal C, a set of compressed representations of a corresponding set of object signals {Bi} (the set having at least one member), and a compressed representation of a residual signal Δ; decompressing the compressed representation of the total mix signal C, the set of compressed representations of the object signals {Bi}, and the compressed representation of the residual signal Δ to obtain, respectively, an approximate total mix signal C', a set of approximate object signals {Bi'}, and a reconstructed residual signal Δ'; subtractively mixing the approximate total mix signal C' with the complete set of approximate object signals {Bi'} to obtain an approximation R' of a base signal R; and subtractively mixing the reconstructed residual signal Δ' with the approximation R' of the base signal R to yield a corrected base signal A". In a preferred embodiment, at least one of the compressed representation of at least one Bi and the compressed representation of C is prepared by a lossy compression method.
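The two subtractive mixing steps of the decompression and upmixing method can be sketched in Python as follows. The sketch assumes that decompression has already produced C', {Bi'}, and Δ' as lists of integer samples; the function names and the sample values are hypothetical.

```python
def subtract(x, y):
    """Subtractive mix: sample-by-sample difference x - y."""
    return [u - v for u, v in zip(x, y)]


def upmix_corrected_base(c_dec, objects_dec, delta_dec):
    """Return the corrected base signal A'' from C', {Bi'} and Δ'."""
    r_approx = c_dec
    for b_dec in objects_dec:              # R' = C' - B1' - ... - Bm'
        r_approx = subtract(r_approx, b_dec)
    return subtract(r_approx, delta_dec)   # A'' = R' - Δ'


c_dec = [10, 20, 30]                       # hypothetical decompressed total mix C'
objects_dec = [[1, 2, 3], [4, 4, 4]]       # hypothetical decompressed objects {Bi'}
delta_dec = [1, -1, 0]                     # hypothetical reconstructed residual Δ'
a_corrected = upmix_corrected_base(c_dec, objects_dec, delta_dec)
```

A legacy decoder would stop after obtaining C' and never call the upmixing function, which is how backward compatibility is preserved in this sketch.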

In its compression and downmixing aspect, the invention includes a method of compressing a composite audio signal comprising a total mix signal C, a set of at least one object signal {Bi} (the set having at least one member Bi), and a base signal A, wherein the total mix signal C comprises the base signal A mixed with the set of at least one object signal {Bi}. The method includes the following steps: compressing the total mix signal C and the set of at least one object signal {Bi} by a lossy compression method to produce, respectively, a compressed total mix signal E(C) and a set of compressed object signals E({Bi}); decompressing the compressed total mix signal E(C) and the set of compressed object signals E({Bi}) to obtain a reconstructed signal Q(C) and a set of reconstructed object signals Q({Bi}); subtractively mixing the reconstructed signal Q(C) with the complete set of reconstructed object signals Q({Bi}) to produce an approximate base signal Q'(A); and subtracting a reference signal from the approximate base signal to yield a residual signal Δ, and then compressing the residual signal Δ to obtain a compressed residual signal E_c(Δ). The compressed total mix signal E(C), the set of (at least one) compressed object signals E({Bi}), and the compressed residual signal E_c(Δ) are preferably transmitted (or, equivalently, stored or recorded).

In one embodiment of the compression and downmixing aspect, the reference signal comprises the base signal A. In an alternative embodiment, the reference signal is an approximation of the base signal A, obtained by compressing the base signal A with a lossy method to form a compressed signal E(A) and then decompressing the compressed signal E(A) to obtain the reference signal.

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claims. As used in this application, unless the context clearly requires otherwise, the term "set" denotes a set having at least one member, but not necessarily more than one. This sense is commonly used in mathematical contexts and should not lead to ambiguity. These and other features and advantages of the invention will be apparent to those skilled in the art from the following detailed description of preferred embodiments, taken together with the accompanying drawings, in which:

Brief Description of the Drawings

FIG. 1 is a high-level block diagram depicting a generalized system, known in the prior art, for compressing and transmitting a composite signal including mixed audio signals in a backward-compatible manner;

FIG. 2 is a flow chart showing the steps of a method for compressing a composite audio signal according to a first embodiment of the present invention;

FIG. 3 is a flow chart showing the steps of a method for decompressing and upmixing an audio signal according to the decompression aspect of the present invention;

FIG. 4 is a flow chart showing the steps of a method for compressing a composite audio signal according to an alternative embodiment of the present invention;

FIG. 5 is a schematic block diagram of an apparatus for compressing a composite audio signal consistently with the method of FIG. 2, according to an alternative embodiment of the present invention; and

FIG. 6 is a schematic block diagram of an apparatus for compressing a composite audio signal consistently with the method of FIG. 4, according to a first embodiment of the present invention.

Detailed Description

The methods described herein concern the processing of signals, and in particular the processing of audio signals that represent physical sound. These signals can be represented by digital electronic signals. In this discussion, continuous mathematical formulations may be shown or discussed to illustrate the concepts; it should be understood, however, that some embodiments operate in the context of a time series of digital bytes or words that form a discrete approximation of an analog signal or (ultimately) a physical sound. The discrete digital signal corresponds to a digital representation of a periodically sampled audio waveform. In an embodiment, a sampling rate of approximately 48,000 samples per second can be used. Higher sampling rates, such as 96 kHz, can alternatively be used. The quantization scheme and bit resolution can be chosen to satisfy the requirements of a particular application. The techniques and apparatus described herein can be applied interdependently across a number of channels. For example, they can be used in the context of a surround audio system having more than two channels.

As used herein, a "digital audio signal" or "audio signal" does not describe a mere mathematical abstraction; in addition to its ordinary meaning, it denotes information embodied in or carried by a non-transitory physical medium capable of detection by a machine or apparatus. The term includes recorded or transmitted signals, and should be understood to include conveyance by any form of encoding, including but not limited to pulse code modulation (PCM). Outputs or inputs can be encoded or compressed by any of various known methods, including MPEG, ATRAC, AC3, or the proprietary methods of DTS described in U.S. Patents 5,974,380, 5,978,762, and 6,487,535. Some modification of the calculations may be required to accommodate a particular compression or encoding method.

Overview

FIG. 1 shows, at a high level of generality, the general environment in which the invention operates. As in the prior art, an encoder 110 receives a plurality of independent audio signals, arbitrarily designated A and B, downmixes them into a total mix signal C (= A + B) using a mixer 120, compresses the downmixed signal using a compressor 130, and then transmits (or records) the downmixed signal in a manner that allows a reasonable approximation of the signals to be reconstructed at a decoder 160. Although only one signal B is shown in the figure (for simplicity), the invention can be used with a plurality of independent signals or objects B1, B2, ..., Bm. Similarly, in the following description we refer to a set of objects B1, B2, ..., Bm; it should be understood that the set includes at least one object (that is, m ≥ 1) and is not limited to any particular number of objects.

In addition to the encoder 110 and the decoder 160, FIG. 1 shows a generalized transmission channel 150, which should be understood to include any means of transmission, recording, or storage, in particular recording onto a non-transitory machine-readable storage medium. In the context of the invention, and more generally in communication theory, recording or storage combined with later playback can be regarded as a special case of information transmission or communication, it being understood that playback corresponds to receiving and decoding the encoded information, typically at a later time and optionally at a different spatial location. Thus, the term "transmit" can denote recording on a storage medium; "receive" can denote reading from a storage medium; and "channel" can include information storage on a medium.

It is important that the signals be sent through the transmission channel in a multiplexed format, so that the synchronization relationship among the signals (A, B, C) is maintained and preserved. The multiplexer and demultiplexer can include bit-packing and data-formatting methods known in the art. The transmission channel can also include other layers of information coding or processing, such as error correction, parity checking, or other techniques appropriate to the channel or physical layers described in, for example, the OSI layer model.

As shown, the decoder receives the compressed, downmixed audio signals, demultiplexes them, and decompresses them in an inventive manner that allows an acceptable reconstruction of the upmix so as to reproduce a plurality of independent signals (or audio objects). The signals are then preferably upmixed to recover the original signals (or as close an approximation as possible).

Principle of operation:

Assume that A, B₁, B₂, ..., B_m are independent signals (objects) that are encoded in a codestream and sent to a renderer. The distinguished object A will be called the base object, and B = B₁, B₂, ..., B_m will be called the regular objects. We refer to a set of objects B₁, B₂, ..., B_m; it should be understood, however, that the set includes at least one object (that is, m ≥ 1) and is not limited to any particular number of objects. In an object-based audio system, we are interested in rendering the objects simultaneously but independently, so that, for example, each object can be rendered at a different spatial location.

For backward compatibility, the coded stream must be interpretable by legacy systems that are neither object-based nor object-aware. Such systems can only render the composite object C = A + B₁ + B₂ + ··· + B_m from an encoded version E(C) of C. The transmitted codestream must therefore include E(C), followed by descriptions of the individual objects, which legacy systems ignore. In the prior art approach, the codestream would include E(C) followed by the descriptions E(B₁), E(B₂), ..., E(B_m) of the regular objects. The base object A is then recovered by decoding these descriptions and setting A = C - B₁ - B₂ - ··· - B_m. It should be noted, however, that most audio codecs used in practice are lossy, meaning that the decoded version Q(X) = D(E(X)) of an encoded object E(X) is only an approximation of X and is not necessarily identical to it. The accuracy of the approximation generally depends on the choice of codec {E, D} and on the bandwidth (or storage space) available for the codestream.

It follows that, when a lossy encoder is used, the decoder does not have access to the objects C, B₁, B₂, ..., B_m themselves, but only to the approximate versions Q(C), Q(B₁), Q(B₂), ..., Q(B_m), and can only estimate A as

Q'(A) = Q(C) - Q(B₁) - Q(B₂) - ··· - Q(B_m)

Such an approximation suffers from the accumulation of the errors of the individual lossy encodings. In practice, this often leads to objectionable, perceptible artifacts. In particular, Q'(A) may be a much worse approximation of A than Q(A), and its artifacts may be statistically correlated with the other objects, which is not the case for Q(A). With lossy compression, the residual C - B₁ - B₂ - ··· will in practice be audibly correlated with B₁ + B₂ + ···. The human ear can pick up correlations that are difficult to detect algorithmically.
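The accumulation of coding errors can be illustrated numerically. In the following Python sketch, a uniform quantizer with a hypothetical step size stands in for the lossy codec {E, D}, and the sample values of A, B₁, and B₂ are hand-picked so that the individual rounding errors partly align; none of these choices comes from the disclosure.

```python
STEP = 8  # hypothetical quantizer step size standing in for a lossy codec


def q(x, step=STEP):
    """Q(X) = D(E(X)) for a toy uniform quantizer."""
    return [((s + step // 2) // step) * step for s in x]


def max_abs_err(approx, exact):
    """Largest absolute sample error between an approximation and the original."""
    return max(abs(u - v) for u, v in zip(approx, exact))


a = [6, 14, 25, -9, 40]      # base object A
b1 = [3, 11, -6, 20, 7]      # regular object B1
b2 = [3, -13, 18, 5, -21]    # regular object B2
c = [x + y + z for x, y, z in zip(a, b1, b2)]   # C = A + B1 + B2

# Q'(A) = Q(C) - Q(B1) - Q(B2): what a decoder can compute without E(A).
q_prime_a = [u - v - w for u, v, w in zip(q(c), q(b1), q(b2))]

err_direct = max_abs_err(q(a), a)        # error of coding A directly
err_derived = max_abs_err(q_prime_a, a)  # error of the derived estimate
```

With these values, coding A directly is off by at most 2, whereas the derived estimate Q'(A) is off by as much as 10, because it carries the rounding errors of C, B₁, and B₂ at once. For this toy quantizer the derived error can approach (m + 1) half-steps.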

According to the present invention, some of the redundancy noted in connection with the prior methods is avoided, while an acceptable reconstruction of A is still permitted. Instead of including (the redundant signal) Q(A), we include in the codestream an encoding E_c(Δ), where Δ is the residual signal

Δ = Q'(A) - A

and E_c is a lossy encoder for Δ (not necessarily the same as E). Let D_c be the decoder for E_c, and let

R(Δ) = D_c(E_c(Δ))

On the decoder side, an approximation of A is obtained as

Q_c(A) = Q'(A) - R(Δ)
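Substituting Q'(A) = A + Δ into the last expression gives Q_c(A) - A = Δ - R(Δ); that is, the only error remaining after correction is the coding error of the residual itself. The following Python sketch checks this identity numerically. The values of A and Q'(A) are illustrative, and the residual codec {E_c, D_c} is modelled, purely as an assumption, by a uniform quantizer with a finer step than the main codec.

```python
RESIDUAL_STEP = 2  # hypothetical: E_c is assumed to be finer than E


def r(delta, step=RESIDUAL_STEP):
    """R(Δ) = D_c(E_c(Δ)), modelled as a uniform quantizer round trip."""
    return [((d + step // 2) // step) * step for d in delta]


a = [6, 14, 25, -9, 40]            # original base object A (illustrative)
q_prime_a = [16, 24, 32, -16, 40]  # uncorrected decoder estimate Q'(A) (illustrative)

delta = [qp - x for qp, x in zip(q_prime_a, a)]           # Δ = Q'(A) - A
r_delta = r(delta)                                        # R(Δ)
q_c_a = [qp - rd for qp, rd in zip(q_prime_a, r_delta)]   # Q_c(A) = Q'(A) - R(Δ)

remaining_err = [qc - x for qc, x in zip(q_c_a, a)]        # Q_c(A) - A
residual_coding_err = [d - rd for d, rd in zip(delta, r_delta)]  # Δ - R(Δ)
```

In this example the uncorrected estimate is off by up to 10, while the corrected estimate is off by at most 1, which matches the coding error of the residual sample for sample.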

Method of the first embodiment:

1. Encoder

The encoding method described mathematically above can be described procedurally as a sequence of actions, as shown in FIG. 2. As described above, at least one distinguished object A will be called the base object, and B₁, B₂, ..., B_m will be called the regular objects. For brevity, we may refer to the regular objects collectively as B in what follows, with the understanding that the complete set of (at least one) regular objects B₁, B₂, ..., B_m can be designated {Bi}; by contrast, B = B₁ + B₂ + ··· + B_m denotes the mix of the regular objects B₁, B₂, ..., B_m. The method begins with the mixed signal C = A + B. It should be clear that the mixing of A and B can be performed as a preliminary step, or the signal can be provided already mixed. The signal A is also required; it can be received separately or reconstructed by subtracting B from C. The set of (at least one) regular objects {Bi} is also required and is used by the encoder in the manner described below.

First, the encoder compresses (step 210) signals A, {Bi}, and C separately using a lossy encoding method to obtain corresponding compressed signals denoted E(A), {E(Bi)}, and E(C), respectively. (The notation {E(Bi)} denotes the set of encoded objects, each corresponding to a respective original object in the set {Bi}, with each object signal encoded individually by E.) The encoder then decompresses (step 220) E(C) and {E(Bi)} using a method complementary to the one used to compress C and {Bi}, producing reconstructed signals Q(C) and {Q(Bi)}. These signals approximate the original C and {Bi} (they differ because they were compressed and then decompressed using a lossy method). Next, {Q(Bi)} is subtracted from Q(C) in subtractive mixing step 230 to produce a modified upmix signal Q'(A), which approximates the original A but differs from it because of the errors introduced by the lossy coding prior to mixing. Then, in a second mixing step 240, signal A (the reference signal) is subtracted from the modified upmix signal Q'(A) to obtain the residual signal Δ = Q'(A) - A (step 130). The residual signal Δ is then compressed (step 250) by a compression method we designate E_c, where E_c need not be the same compression method or device as E (used in step 210 to compress signals A, {Bi}, or C). Preferably, to reduce bandwidth requirements, E_c should be a lossy encoder for Δ chosen to match the characteristics of Δ. However, in an alternative embodiment less optimized for bandwidth, E_c can be a lossless compression method.
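Steps 210 through 250 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the codec of the invention: a uniform scalar quantizer stands in for the lossy codec E and its decoder, a finer quantizer stands in for E_c, and the names `lossy_roundtrip`, `encode_with_residual`, `step`, and `residual_step` are hypothetical.

```python
def lossy_roundtrip(signal, step):
    """Toy stand-in for D(E(x)): uniform quantization, then reconstruction."""
    return [round(v / step) * step for v in signal]


def encode_with_residual(a, objects, step=0.25, residual_step=0.03125):
    """Follow steps 210-250 for base signal `a` and regular objects `objects`."""
    n = len(a)
    # Mixed signal C = A + B1 + ... + Bm.
    c = [a[i] + sum(b[i] for b in objects) for i in range(n)]
    # Steps 210 and 220: compress, then decompress, C and each Bi.
    q_c = lossy_roundtrip(c, step)
    q_objs = [lossy_roundtrip(b, step) for b in objects]
    # Step 230: Q'(A) = Q(C) - sum of Q(Bi).
    q_prime_a = [q_c[i] - sum(q[i] for q in q_objs) for i in range(n)]
    # Step 240: residual = Q'(A) - A.
    delta = [q_prime_a[i] - a[i] for i in range(n)]
    # Step 250: toy E_c (an assumption), a finer quantizer giving integer indices.
    ec_delta = [round(d / residual_step) for d in delta]
    return q_c, q_objs, delta, ec_delta


# Hypothetical sample values, chosen only for illustration.
a = [0.30, -0.72, 0.11, 0.95, -0.48]
b1 = [0.61, 0.27, -0.83, 0.05, 0.40]
q_c, q_objs, delta, ec_delta = encode_with_residual(a, [b1])
```

With a single regular object, each residual sample in this toy setup is bounded in magnitude by `step`, because it is the difference of two quantization errors of at most `step / 2` each.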

Note that the method described above requires successive compression and decompression steps 210 and 220 (as applied to signals {Bi} and C). In these steps, and in the alternative method described below, computational complexity and time can in some cases be reduced by performing only the lossy portions of the compression (and decompression). For example, many lossy decompression methods, such as the DTS codec described in U.S. Patent 5,974,380, require successive application of both lossy steps (filtering into subbands, bit allocation, requantization in subbands) and subsequent lossless steps (application of a codebook, entropy reduction). In such cases, it is sufficient to omit the lossless steps in both encoding and decoding and perform only the lossy steps. The reconstructed signal will still exhibit all the effects of lossy transmission, but many computational steps are saved.
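The shortcut can be illustrated with a toy two-stage codec. In this sketch (an assumption made for illustration, not the DTS codec), a uniform quantizer plays the lossy stage and `zlib` plays the lossless stage; the function names are hypothetical. Because the lossless stage is exactly invertible, skipping it yields the same reconstruction.

```python
import struct
import zlib


def quantize(signal, step):
    """Lossy stage: map each sample to an integer quantizer index."""
    return [round(v / step) for v in signal]


def dequantize(indices, step):
    """Inverse of the lossy stage, up to quantization error."""
    return [i * step for i in indices]


def full_roundtrip(signal, step):
    """Lossy stage followed by a lossless stage, then both undone."""
    indices = quantize(signal, step)
    fmt = f"<{len(indices)}i"
    packed = zlib.compress(struct.pack(fmt, *indices))   # lossless encode
    restored = struct.unpack(fmt, zlib.decompress(packed))  # lossless decode
    return dequantize(restored, step)


def lossy_only_roundtrip(signal, step):
    """Same reconstruction with the lossless stage omitted."""
    return dequantize(quantize(signal, step), step)


samples = [0.30, -0.72, 0.11, 0.95, -0.48]
same = full_roundtrip(samples, 0.25) == lossy_only_roundtrip(samples, 0.25)
```

Here `same` is true for any input, since only the quantizer changes the data.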

The encoder then transmits (step 260) R = Ec(Δ), E(C), and {E(Bi)}. Preferably, the encoding method also includes the optional step of multiplexing or reformatting the three signals into a multiplexed package for transmission or recording. Any known multiplexing method can be used, provided that some means is used to preserve or reconstruct the time synchronization of the three separate but related signals. It should be borne in mind that a different quantization scheme can be used for each of the three signals, and that bandwidth can be allocated among them. Any of many known lossy audio compression methods can be used for E, including MP3, AAC, WMA, or DTS (among others).
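One simple way to keep the three signals synchronized is to pack the payloads for the same time frame into a single length-prefixed record. The layout below is a hypothetical sketch; the text does not prescribe a container format, and the names `mux_frame` and `demux_frame` and the header fields are assumptions.

```python
import struct

HEADER = "<IH"  # frame index (uint32) and payload count (uint16), little-endian


def mux_frame(frame_index, ec_delta, e_c, e_objs):
    """Pack Ec(delta), E(C), and each E(Bi) for one frame into one record."""
    payloads = [ec_delta, e_c, *e_objs]
    out = struct.pack(HEADER, frame_index, len(payloads))
    for payload in payloads:
        out += struct.pack("<I", len(payload)) + payload
    return out


def demux_frame(data):
    """Recover the frame index and the three groups of payloads."""
    frame_index, count = struct.unpack_from(HEADER, data, 0)
    offset = struct.calcsize(HEADER)
    payloads = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", data, offset)
        offset += 4
        payloads.append(data[offset:offset + length])
        offset += length
    ec_delta, e_c, *e_objs = payloads
    return frame_index, ec_delta, e_c, e_objs


record = mux_frame(7, b"res", b"mix", [b"obj1", b"obj2"])
frame_index, ec_delta, e_c, e_objs = demux_frame(record)
```

Carrying the frame index in every record lets a receiver realign the streams after a dropped record.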

This method provides at least the following advantages. First, the "error" signal Δ is expected to have lower power and entropy than the original objects. Because its power is reduced compared with A, the error signal Δ can be encoded with fewer bits than the object A that it helps reconstruct. The proposed method is therefore expected to be more economical than the redundant description method discussed above (in the Background section). Second, the encoder E can be any audio encoder (e.g., MP3, AAC, WMA, etc.); note in particular that the encoder can be, and in the preferred embodiment is, a lossy encoder employing psychoacoustic principles. (The corresponding decoder will, of course, be a corresponding lossy decoder.) Third, the encoder E_c need not be a standard audio encoder and can be optimized for the signal Δ, which is not a standard audio signal. Indeed, the perceptual considerations in the design and optimization of E_c will differ from those in the design of a standard audio codec. For example, perceptual audio codecs do not always seek to maximize SNR in all parts of the signal; instead, they sometimes seek a more "constant" instantaneous SNR regime, in which larger errors are tolerated when the signal is stronger. In fact, this is the main source of the artifacts caused by the B_i that are found in Q'(A). With E_c, we seek to eliminate as many of these artifacts as possible, so straightforward instantaneous SNR maximization appears more appropriate in this case.
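The expectation that Δ carries little power can be made concrete with a short derivation, offered here as an illustrative aside rather than as part of the original disclosure. It uses only C = A + ΣBi and the definitions of Q'(A) and Δ given above:

```latex
\begin{aligned}
\Delta &= Q'(A) - A \\
       &= \Bigl(Q(C) - \sum_i Q(B_i)\Bigr) - \Bigl(C - \sum_i B_i\Bigr) \\
       &= \bigl(Q(C) - C\bigr) - \sum_i \bigl(Q(B_i) - B_i\bigr)
\end{aligned}
```

The residual is therefore the coding error of the mix minus the summed coding errors of the regular objects; the base signal A itself does not appear in it. As long as those coding errors are small compared with A, Δ has correspondingly lower power than A.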

The decoding method according to the present invention is illustrated in FIG. 3. As a preliminary, optional step 300, the decoder must receive and demultiplex the data stream to recover Ec(Δ), {E(Bi)}, and E(C). First (step 310), the decoder receives the compressed data streams (or files) Ec(Δ), {E(Bi)}, and E(C). The decoder then decompresses (step 320) each of the data streams (or files) Ec(Δ), {E(Bi)}, and E(C) to obtain the reconstructed representations {Q(Bi)}, Q(C), and Rc(Δ) = Dc(Ec(Δ)), where Dc is the decompression method that is the inverse of the compression method Ec, and where the decompression methods used for {E(Bi)} and E(C) are complementary to the compression methods used for {Bi} and C. Signals Q(C) and {Q(Bi)} are subtractively mixed (step 330) to recover Q'(A) = Q(C) - ΣQ(Bi). This signal Q'(A) is an approximation of A that differs from the original A because it is reconstructed from a subtractive mix of Q(C) and {Q(Bi)}, both of which were transmitted using lossy codec methods. In the decoding and upmixing method of the present invention, the approximation Q'(A) is then improved by subtracting (step 340) the reconstructed residual Rc(Δ) to obtain Qc(A) = Q'(A) - Rc(Δ). The recovered replica signals Qc(A), Q(C), and {Q(Bi)} can then be reproduced or output for reproduction as the upmix (A, {Bi}) (step 350). For systems with fewer channels, the downmix signal Q(C) is also available for output (or as an option based on consumer control or preference).
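Decoder steps 320 through 340 can be sketched as follows. This is a minimal illustration that assumes a toy uniform quantizer in place of the real codecs; `decode_upmix`, `lossy_roundtrip`, and the step sizes are hypothetical names and values. The short demo builds the decoder inputs the way the encoder would and compares the error in A before and after the residual correction.

```python
def lossy_roundtrip(signal, step):
    """Toy stand-in for D(E(x)): uniform quantization, then reconstruction."""
    return [round(v / step) * step for v in signal]


def decode_upmix(q_c, q_objs, residual_indices, residual_step):
    """Return (Q'(A), Qc(A)) from the decoded mix, objects, and residual."""
    # Step 320 for the residual: Rc(delta) = Dc(Ec(delta)).
    r_delta = [i * residual_step for i in residual_indices]
    # Step 330: Q'(A) = Q(C) - sum of Q(Bi).
    q_prime_a = [q_c[n] - sum(q[n] for q in q_objs) for n in range(len(q_c))]
    # Step 340: Qc(A) = Q'(A) - Rc(delta).
    q_c_a = [qp - r for qp, r in zip(q_prime_a, r_delta)]
    return q_prime_a, q_c_a


# Demo with hypothetical sample values and one regular object.
a = [0.30, -0.72, 0.11, 0.95, -0.48]
b = [0.61, 0.27, -0.83, 0.05, 0.40]
step, residual_step = 0.25, 0.03125
q_c = lossy_roundtrip([x + y for x, y in zip(a, b)], step)
q_b = lossy_roundtrip(b, step)
delta = [qc - qb - x for qc, qb, x in zip(q_c, q_b, a)]
residual_indices = [round(d / residual_step) for d in delta]

q_prime_a, q_c_a = decode_upmix(q_c, [q_b], residual_indices, residual_step)
err_before = max(abs(p - x) for p, x in zip(q_prime_a, a))
err_after = max(abs(p - x) for p, x in zip(q_c_a, a))
```

In this toy setup the corrected error `err_after` is limited by the residual quantizer (`residual_step / 2`) rather than by the coarser quantizer used for the mix and the objects.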

It should be recognized that the method of the present invention does require the transmission of some redundant data. However, the file size (or bit rate requirement) of the method of the present invention is smaller than that required either a) to use lossless coding for all channels, or b) to transmit a redundant description consisting of the lossy-coded objects plus the lossy-coded upmix. In one experiment, the method of the present invention was used to transmit the upmix A+B (for a single object B) together with the base channel A. The results are shown in Table 1. As can be seen, the redundant description (prior art) method would require 309 KB to transmit the mix; by comparison, the method of the present invention would require only 251 KB for the same information (plus some minimal overhead for multiplexing and header fields). This experiment does not represent the limit of the improvement obtainable by further optimizing the compression methods.
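The saving implied by the two figures quoted above can be worked out directly. The snippet uses only the 309 KB and 251 KB values from the text; the variable names are illustrative.

```python
redundant_kb = 309  # redundant description (prior art), from the text
residual_kb = 251   # residual coding method, from the text

saved_kb = redundant_kb - residual_kb
saved_fraction = saved_kb / redundant_kb
print(f"{saved_kb} KB saved ({saved_fraction:.1%} of the redundant description)")
```

That is a reduction of 58 KB, a little under one fifth of the redundant description size, before the multiplexing and header overhead mentioned above.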

As shown in FIG. 4, in an alternative embodiment of the method, the encoding differs in that the residual signal Δ is derived from the difference between Q'(A) = D(E(C)) - ΣD(E(Bi)) and Q(A) (instead of A). This embodiment is particularly suitable in applications where the reconstruction of A is desired and expected to be of approximately the same quality as the reconstructions of B and C (with no effort made to achieve a higher-fidelity reconstruction of A). This is often the case in audio entertainment systems.

Note that in the alternative embodiment, Q'(A) is the signal reproduced by taking the difference between a) the encoded and then decoded version of the downmix C, and b) the reconstructed base objects {Q(Bi)} reproduced by decoding the lossy-coded base mix B.

Referring now to FIG. 4, in the alternative method the encoder compresses (step 410) signals A, {Bi}, and C separately using a lossy encoding method to obtain three corresponding compressed signals, denoted E(A), {E(Bi)}, and E(C), respectively. The encoder then decompresses E(A) using a method complementary to the one used to compress A, producing Q(A), an approximation of A (it differs because it was compressed and then decompressed using a lossy method). The alternative method then decompresses (step 430) both E(C) and {E(Bi)} using respective methods complementary to those used to encode C and {Bi}. The resulting reconstructed signals Q(C) and {Q(Bi)} are approximations of the original C and {Bi}, differing because of the imperfections introduced by the lossy encoding and decoding methods. The alternative method then subtracts ΣQ(Bi) from Q(C) in step 440 to obtain the difference signal Q'(A). Q'(A) is another approximation of A, differing because lossy compression is used for the transmitted downmix. The residual signal Δ is obtained by subtracting Q(A) from Q'(A) (step 450).
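The alternative residual computation (steps 410 through 450) differs from the first embodiment only in its reference signal. The sketch below again assumes a toy uniform quantizer in place of the lossy codec; `alt_residual` and `lossy_roundtrip` are hypothetical names.

```python
def lossy_roundtrip(signal, step):
    """Toy stand-in for D(E(x)): uniform quantization, then reconstruction."""
    return [round(v / step) * step for v in signal]


def alt_residual(a, objects, step):
    """Return (Q(A), Q'(A), delta) with delta = Q'(A) - Q(A)."""
    n = len(a)
    c = [a[i] + sum(b[i] for b in objects) for i in range(n)]
    # Step 410 plus decompression of E(A): the reference is Q(A), not A.
    q_a = lossy_roundtrip(a, step)
    # Steps 410 and 430: round-trip C and each Bi.
    q_c = lossy_roundtrip(c, step)
    q_objs = [lossy_roundtrip(b, step) for b in objects]
    # Step 440: Q'(A) = Q(C) - sum of Q(Bi).
    q_prime_a = [q_c[i] - sum(q[i] for q in q_objs) for i in range(n)]
    # Step 450: delta = Q'(A) - Q(A).
    delta = [q_prime_a[i] - q_a[i] for i in range(n)]
    return q_a, q_prime_a, delta


# Hypothetical sample values, chosen only for illustration.
a = [0.30, -0.72, 0.11, 0.95, -0.48]
b1 = [0.61, 0.27, -0.83, 0.05, 0.40]
q_a, q_prime_a, delta = alt_residual(a, [b1], step=0.25)
```

Subtracting `delta` from `q_prime_a` returns `q_a`, so a decoder that applies this residual converges on the lossy reconstruction Q(A) rather than on A itself.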

The residual signal Δ is then compressed (step 460) using an encoding method Ec (which may differ from E). As in the first embodiment described above, Ec is preferably a lossy codec suited to the characteristics of the residual signal. The encoder then transmits (step 470) R = Ec(Δ), E(C), and {E(Bi)} over a transmission channel, with their synchronization relationship preserved. Preferably, the encoding method also includes multiplexing or reformatting the three signals into a multiplexed package for transmission or recording. Any known multiplexing method can be used, provided that some means is used to preserve or reconstruct the time synchronization of the three separate but related signals. It should be borne in mind that a different quantization scheme can be used for each of the three signals, and that bandwidth can be allocated among them. Any of many known audio compression methods can be used for E, including MP3, AAC, WMA, or DTS (among others).

Signals encoded by the alternative encoding method can be decoded using the same decoding method described above in connection with FIG. 3. The decoder subtracts the reconstructed residual signal to improve the approximation of the upmix signal, Q(A), thereby reducing the difference between the reconstructed replica signal Q(A) and the original signal A. The two embodiments of the invention are united by the following generality: each generates, at the encoder, a residual or error signal Δ that represents the difference expected after the signals are decoded and upmixed to extract the privileged object A. In both embodiments, the error signal Δ is compressed and transmitted (or, equivalently, recorded and/or stored). In both embodiments, the decoder decompresses the compressed error signal and subtracts it from the reconstructed upmix signal, which approximates the privileged object A.
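Why a single decoder serves both embodiments can be seen from a short derivation, given here as an illustrative aside. The symbols Ref and ε are introduced for this purpose and do not appear in the original: Ref denotes the reference signal (A in the first embodiment, Q(A) in the alternative), and ε denotes the coding error of the residual, so that Rc(Δ) = Δ + ε.

```latex
\begin{aligned}
\Delta &= Q'(A) - \mathrm{Ref}, \qquad R_c(\Delta) = \Delta + \varepsilon \\
Q_c(A) &= Q'(A) - R_c(\Delta) = \mathrm{Ref} - \varepsilon
\end{aligned}
```

The decoder output thus equals the encoder's reference signal up to the residual coding error, whichever reference the encoder chose, and the decoder performs the same subtraction in both cases.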

The method of the alternative embodiment may have some perceptual advantages in certain applications. In practice, which of the embodiments is preferable may depend on the specific parameters of the system and on the specific optimization goals.

In another aspect, the present invention includes an apparatus for compressing or encoding a mixed audio signal, as shown in FIG. 5. In a first embodiment of the apparatus, signals C (= A+B object mix) and B are provided at inputs 510 and 512, respectively. Signal C is encoded by encoder 520 to produce an encoded signal E(C); signals {Bi} are encoded by encoder 530 to produce second encoded signals {E(Bi)}. E(C) and {E(Bi)} are then decoded by decoders 540 and 550, respectively, to produce reconstructed signals Q(C) and {Q(Bi)}. The reconstructed signals Q(C) and {Q(Bi)} are subtractively mixed in mixer 560 to produce a difference signal Q'(A). This difference signal differs from the original signal A because it is obtained by mixing the reconstructed total mix Q(C) and the reconstructed objects {Q(Bi)}; artifacts or errors are introduced both because encoder 520 is a lossy encoder and because the signal is derived by subtraction (in mixer 560). The reconstructed signal Q'(A) is then subtracted from signal A (input at 570), and the difference Δ is compressed by a second encoder 580 to produce the compressed residual signal Ec(Δ); in a preferred embodiment, the second encoder 580 operates using a method different from that of compressor 520.

As shown in FIG. 6, in an alternative embodiment of the encoder apparatus, signals C (= A+B object mix) and B are provided at inputs 510 and 512, respectively. Signal C is encoded by encoder 520 to produce an encoded signal E(C); signals {Bi} are encoded by encoder 530 to produce second encoded signals {E(Bi)}. E(C) and {E(Bi)} are then decoded by decoders 540 and 550, respectively, to produce reconstructed signals Q(C) and {Q(Bi)}. The reconstructed signals Q(C) and {Q(Bi)} are subtractively mixed in mixer 560 to produce a difference signal Q'(A). This difference signal differs from the original signal A because it is obtained by mixing the reconstructed total mix Q(C) and the reconstructed objects {Q(Bi)}. Artifacts or errors are introduced both because encoder 520 is a lossy encoder and because the signal is derived by subtraction (in mixer 560). Up to this point, the alternative embodiment is similar to the first embodiment.

In the alternative embodiment of the apparatus, the signal A received at input 570 is encoded by encoder 572 (which may be the same encoder as lossy encoders 520 and 530, or may operate on the same principles), and the encoded output of 572 is then decoded by a complementary decoder 574 to produce a reconstructed approximation Q(A), which differs from A because of the lossy nature of encoder 572. The reconstructed signal Q(A) is then subtracted from Q'(A) in mixer 560, and the resulting residual signal is encoded by second encoder 580 (using a method different from that used in lossy encoders 520 and 530). The outputs E(C), {E(Bi)}, and Ec(Δ) are then made available for transmission or recording, preferably in some multiplexed format or by any other method that permits synchronization.

It will be apparent that content encoded by either the first or the alternative method or encoding apparatus (FIG. 6) can be decoded by the decoder of FIG. 3. The decoder requires the compressed error signal but need not be sensitive to the manner in which the error was calculated. This leaves room for future improvements to the codec without changing the decoder design.

The methods described herein can be implemented in consumer electronic devices such as general-purpose computers, digital audio workstations, DVD or BD players, TV tuners, CD players, handheld players, Internet audio/video devices, game consoles, mobile phones, headphones, and the like. A consumer electronic device may include a central processing unit (CPU), which may represent one or more types of processor, such as an IBM PowerPC or an Intel Pentium (x86) processor. Random access memory (RAM) temporarily stores the results of data processing operations performed by the CPU and is typically interconnected with the CPU via a dedicated memory channel. The consumer electronic device may also include permanent storage devices, such as a hard drive, which also communicate with the CPU over an I/O bus. Other types of storage devices, such as tape drives or optical disk drives, may also be connected. A graphics card may also be connected to the CPU via a video bus and transmits signals representing display data to a display monitor. Peripheral data input devices, such as a keyboard or mouse, may be connected to the audio reproduction system via a USB port. A USB controller translates data and instructions to and from the CPU for peripheral devices connected to the USB port. Additional devices, such as printers, microphones, speakers, and headphones, may be connected to the consumer electronic device.

The consumer electronic device may utilize an operating system having a graphical user interface (GUI), such as WINDOWS from Microsoft Corporation of Redmond, Washington, MAC OS from Apple Inc. of Cupertino, CA, various versions of mobile GUIs designed for mobile operating systems such as Android, and so forth. The consumer electronic device may execute one or more computer programs. Generally, the operating system and computer programs are tangibly embodied in a non-transitory computer-readable medium, e.g., one or more fixed and/or removable data storage devices, including the hard drive. Both the operating system and the computer programs may be loaded from the aforementioned data storage devices into RAM for execution by the CPU. The computer programs may comprise instructions which, when read and executed by the CPU, cause the CPU to perform the steps needed to execute the steps or features of the embodiments described herein.

The embodiments described herein can have many different configurations and architectures. Any such configuration or architecture can be readily substituted. Those skilled in the art will recognize that the sequences described above are the most commonly used in computer-readable media, but that other existing sequences can be substituted.

The elements of an embodiment may be implemented in hardware, firmware, software, or any combination thereof. When implemented in hardware, the embodiments described herein may be employed on a single audio signal processor or distributed among various processing components. When implemented in software, the elements of an embodiment may include code segments that perform the necessary tasks. The software may include the actual code that carries out the operations described in an embodiment, or code that emulates or simulates those operations. The program or code segments can be stored in a processor-accessible or machine-accessible medium, or transmitted over a transmission medium by a computer data signal embodied in a carrier wave or by a signal modulated by a carrier. A processor-readable or processor-accessible medium, or a machine-readable or machine-accessible medium, may include any medium that can store, transmit, or transfer information. In contrast, a computer-readable storage medium or non-transitory computer memory may include physical computing machine storage devices but does not include signals.

Examples of processor-readable media include electronic circuits, semiconductor memory devices, read-only memory (ROM), flash memory, erasable ROM (EROM), floppy disks, compact disk (CD) ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and the like. A computer data signal may include any signal that can propagate over a transmission medium such as electronic network channels, optical fibers, air, electromagnetic waves, RF links, and the like. Code segments may be downloaded via computer networks such as the Internet, an intranet, and the like. A machine-accessible medium may be embodied in an article of manufacture. A machine-accessible medium may include data that, when accessed by a machine, cause the machine to perform the operations described below. The term "data," in addition to its ordinary meaning, refers here to any type of information that is encoded for machine-readable purposes. It may therefore include programs, code, files, and the like.

All or part of the various embodiments may be implemented by software executing in a machine, such as a hardware processor comprising digital logic circuitry. The software may have several modules coupled to one another. The hardware processor may be a programmable digital microprocessor, a specialized programmable digital signal processor (DSP), a field-programmable gate array, an ASIC, or another digital processor. For example, in one embodiment, all of the steps of a method according to the invention (on either the encoder side or the decoder side) can suitably be carried out by one or more programmable digital computers executing all of the steps sequentially under software control. A software module may be coupled to another module to receive variables, parameters, arguments, pointers, etc., and/or to generate or pass results, updated variables, pointers, etc. A software module may also be a software driver or interface that interacts with the operating system running on the platform. A software module may also include a hardware driver used to configure, set up, or initialize a hardware device, or to send data to and receive data from it.

Various embodiments may be described as one or more processes, which may be depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a block diagram may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process terminates when its operations are completed. A process may correspond to a method, a program, a procedure, and so on.

Throughout this application, frequent reference is made to adding, subtracting, or "subtractively mixing" signals. It will be readily recognized that signals can be mixed in various ways with equivalent results. For example, to subtract an arbitrary signal F from G (G-F), one can subtract directly using differential inputs or, equivalently, invert one of the signals and then add (e.g., G+(-F)). Other equivalent operations can be envisioned, some of which involve introducing phase shifts. Terms such as "subtracting" or "subtractively mixing" are intended to encompass such equivalent variations. Similarly, variant methods of adding signals are possible and are contemplated as "mixing."
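The equivalence between direct subtraction and invert-then-add can be checked on sampled signals. The sketch below is purely illustrative; the function names and sample values are hypothetical.

```python
def subtract_direct(g, f):
    """Compute G - F sample by sample."""
    return [x - y for x, y in zip(g, f)]


def subtract_by_inversion(g, f):
    """Invert F, then add it to G: G + (-F)."""
    inverted = [-y for y in f]
    return [x + y for x, y in zip(g, inverted)]


g = [0.91, -0.45, -0.72, 1.00]
f = [0.61, 0.27, -0.83, 0.05]
equivalent = subtract_direct(g, f) == subtract_by_inversion(g, f)
```

For floating-point samples the two forms give identical results, because negation is exact and `x + (-y)` rounds the same way as `x - y`.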

While several illustrative embodiments of the invention have been shown and described, numerous variations and alternative embodiments will occur to those skilled in the art. Such variations and alternative embodiments are contemplated and can be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method for decompressing and upmixing a compressed and downmixed composite audio signal, comprising the following steps:

receiving a compressed representation of a total mixed signal C, a compressed representation of a residual signal Δ, and a set of compressed representations {Bi} of corresponding object signals, wherein the set of compressed representations of at least one object signal comprises at least one compressed representation of a corresponding object signal Bi, wherein Bi is one of the set of compressed representations {Bi} of the corresponding object signals, and the total mixed signal C is a mixture of the set of object signals {Bi} and a base signal A;

decompressing the compressed representation of the total mixed signal to obtain an approximate total mixed signal C';

decompressing the compressed representation of the residual signal Δ to obtain a reconstructed residual signal;

decompressing the set of compressed representations of the object signals so as to obtain a complete set of approximated object signals {Bi'}, said set having as members one or more approximated object signals Bi';

subtractively mixing the approximated total mixed signal C' and the complete set of approximated object signals {Bi'} to obtain a first approximation A' of the base signal; and

subtractively mixing the reconstructed residual signal with the first approximation A' of the base signal to obtain an improved approximation of the base signal.

2. The method of claim 1, wherein the set of compressed representations of object signals comprises one compressed representation of the corresponding object signal.

3. The method of claim 1, wherein at least one of the compressed representations is prepared by a lossy compression method.

4. The method of claim 3, wherein the compressed representation of the residual signal Δ is prepared by:

subtractively mixing a reference signal R with a reconstructed approximation A' of the base signal A to obtain a residual signal Δ representing the difference; and

compressing the residual signal Δ.

5. The method of claim 4, wherein the reference signal comprises a base signal A.

6. The method of claim 4, wherein the reference signal comprises an approximation of the base signal A.

7. The method of claim 1, further comprising:

reproducing as sound at least one of the corrected base signal A", the reconstructed set of object signals Q({Bi}) and the approximate total mixed signal C'.

8. The method of claim 1, wherein

the step of decompressing the set of compressed representations of the corresponding object signals comprises decompressing a plurality of compressed representations so as to obtain the complete set of approximated object signals {Bi'}; and

the step of subtractively mixing the approximated total mixed signal C' and the complete set of object signals comprises subtracting the complete set of approximated object signals {Bi'} from C' to obtain a first approximation of the base signal.

9. The method of claim 8, wherein at least one of the compressed representations is prepared by a lossy compression method.

10. The method of claim 9, wherein the compressed representation of the residual signal Δ is prepared by:

subtractively mixing the reference signal R with the reconstructed approximation A' of the base signal A to obtain a residual signal Δ representing the difference; and

compressing the residual signal Δ.

11. The method of claim 10, wherein the reference signal comprises a base signal A.

12. The method of claim 10, wherein the reference signal comprises an approximation of the base signal A.

13. The method of claim 8, further comprising:

14. A method for compressing a composite audio signal comprising a total mixed signal C, a set of compressed representations {Bi} of corresponding object signals, and a base signal A, wherein the total mixed signal C comprises the base signal A mixed with the set of compressed representations {Bi} of corresponding object signals, the set of compressed representations {Bi} of corresponding object signals having at least one member object signal Bi, the method comprising the following steps:

compressing the total mixed signal C and the set of compressed representations {Bi} of the corresponding object signals using a lossy compression method to produce a compressed total mixed signal E(C) and a compressed set of object signals E({Bi}), respectively;

Decompressing the compressed total mixed signal E(C) and the compressed set of object signals E({Bi}) to obtain a reconstructed signal Q(C) and a reconstructed set of object signals Q({Bi});

subtractively mixing the reconstructed signal Q(C) and a complete mixture of the set of reconstructed signals Q({Bi}) to produce an approximate basis signal Q'(A);

subtracting a reference signal from the approximated base signal Q'(A) to generate a residual signal Δ; and

compressing the residual signal Δ to obtain a compressed residual signal Ec(Δ).
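The encoding steps of claim 14 can be sketched as follows. This is only an illustrative sketch, not the claimed implementation: the uniform quantizer `quantize` stands in for an unspecified lossy codec, the step sizes and helper names (`mix`, `subtract`, `encode`) are hypothetical, and the reference signal is assumed to be the base signal A itself.

```python
def quantize(signal, step):
    """Stand-in lossy codec (assumption): round each sample to a multiple of step."""
    return [round(x / step) * step for x in signal]


def mix(*signals):
    """Additive mix: sample-wise sum of equal-length signals."""
    return [sum(samples) for samples in zip(*signals)]


def subtract(x, y):
    """Subtractive mix: sample-wise x - y."""
    return [a - b for a, b in zip(x, y)]


def encode(base, objects, coarse_step=0.25, fine_step=0.01):
    """Return (Q(C), Q({Bi}), quantized residual) for base A and objects {Bi}."""
    total_mix = mix(base, *objects)                    # C = A + sum of Bi
    q_mix = quantize(total_mix, coarse_step)           # Q(C): compress, then decompress
    q_objects = [quantize(b, coarse_step) for b in objects]  # Q({Bi})
    approx_base = subtract(q_mix, mix(*q_objects))     # Q'(A) = Q(C) - sum of Q(Bi)
    residual = subtract(approx_base, base)             # residual = Q'(A) - reference
    q_residual = quantize(residual, fine_step)         # residual coded with a finer step
    return q_mix, q_objects, q_residual
```

In this sketch the residual is coded with a finer step than the mix, so subtracting it from Q'(A) at the receiver recovers the base signal to within half of `fine_step` per sample.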

15. The method of claim 14, wherein the set of compressed representations {Bi} of respective object signals comprises only one object signal.

16. The method of claim 15, further comprising the steps of:

transmitting a composite signal including the compressed total mixed signal E(C), the compressed set of object signals E({Bi}), and the compressed residual signal Ec(Δ).

17. The method of claim 15, wherein the reference signal comprises the base signal A.

18. The method of claim 15, wherein the reference signal comprises an approximation of a base signal A, wherein the approximation of the base signal A is obtained by compressing the base signal A using a lossy compression method and then decompressing it to obtain an approximation Q(A) of the base signal.

19. The method of claim 15, wherein the step of compressing the residual signal comprises compressing the residual signal using a method different from the method used to compress the total mixed signal C.

20. The method of claim 14, wherein the set of compressed representations {Bi} of corresponding object signals comprises a plurality of object signals.

21. The method of claim 20, wherein the reference signal comprises the base signal A.

22. The method of claim 20, wherein the reference signal comprises an approximation of a base signal A, wherein the approximation of the base signal A is obtained by compressing the base signal A using a lossy compression method and then decompressing it to obtain an approximation Q(A) of the base signal.

23. The method of claim 20, wherein the step of compressing the residual signal comprises compressing the residual signal using a method different from the method used to compress the total mixed signal C.

24. A method for improving digital audio reproduction by refining an approximation of an audio base signal A derived from an approximated total mixed signal C′ and a complete set of approximated object signals {Bi′} having at least one member signal Bi′, the method comprising the steps of:

decompressing the compressed representation of the residual signal Δ to obtain a reconstructed residual signal Δ′;

subtractively mixing the approximated total mixed signal C' and the complete set of approximated object signals {Bi'} to obtain a first approximation A' of the audio base signal A; and

subtractively mixing the first approximation A' of the audio base signal A with the reconstructed residual signal Δ' to obtain an improved approximation of the audio base signal A.
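The refinement of claim 24 amounts to two sample-wise subtractions. The following is a minimal sketch under stated assumptions: the signals are already decompressed into equal-length lists of samples, and the function names (`mix`, `subtract`, `refine_base`) are hypothetical.

```python
def mix(*signals):
    """Additive mix: sample-wise sum of equal-length signals."""
    return [sum(samples) for samples in zip(*signals)]


def subtract(x, y):
    """Subtractive mix: sample-wise x - y."""
    return [a - b for a, b in zip(x, y)]


def refine_base(approx_mix, approx_objects, decoded_residual):
    """Improve the base-signal estimate using the decoded residual.

    approx_mix       -- C', the approximated total mixed signal
    approx_objects   -- {Bi'}, a non-empty list of approximated object signals
    decoded_residual -- the reconstructed residual signal
    """
    # First approximation: A' = C' - sum of Bi'
    first_approx = subtract(approx_mix, mix(*approx_objects))
    # Improved approximation: A' minus the reconstructed residual
    return subtract(first_approx, decoded_residual)
```

The sign convention assumes the residual was formed at the encoder as the first approximation minus the reference signal, so subtracting it moves the estimate back toward the reference.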

25. The method of claim 24, wherein the compressed representation of the residual signal Δ is prepared by:

subtractively mixing the reconstructed approximation A' of the audio base signal A with the reference signal R to obtain a residual signal representing the difference; and

compressing the residual signal to obtain a compressed representation of the residual signal Δ.

26. The method of claim 25, wherein the reference signal comprises the audio base signal A.

27. The method of claim 25, wherein the reference signal comprises an approximation A' of the base signal, the approximation A' being prepared by compressing the audio base signal A using a lossy method and then decompressing it to obtain the reference signal R.