WO2001003119A1 - Audio coding and decoding including non-harmonic components of the signal
(Codage et décodage audio incluant des composantes non harmoniques du signal)
- Publication number
- WO2001003119A1 (international application PCT/FR2000/001907; also referenced as WO 01/03119 A1)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- spectral
- audio signal
- spectrum
- cepstral
- data
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
Definitions
- The present invention relates to the field of audio signal coding. It applies in particular, but not exclusively, to speech coding, in narrowband or in wideband, over various ranges of coding bit rates.
- The design of an audio codec mainly aims to provide a good compromise between the bit rate of the stream transmitted by the coder and the quality of the audio signal which the decoder is capable of reconstructing from this stream.
- The coder estimates a fundamental frequency of the signal, representing its pitch.
- The spectral analysis consists in determining parameters representing the harmonic structure of the signal at frequencies which are integer multiples of this fundamental frequency.
- A modeling of the non-harmonic, or unvoiced, component can also be carried out in the spectral domain.
- The parameters transmitted to the decoder typically represent the modulus of the spectrum of the voiced and unvoiced components. Added to this is information representing either voiced/unvoiced decisions relating to different portions of the spectrum, or information on the probability of voicing of the signal, allowing the decoder to determine in which portions of the spectrum it must use the voiced component or the unvoiced component.
- Such coder families include MBE (multi-band excitation) type coders.
- An object of the present invention is to allow, in a coding scheme with analysis in the spectral domain, a coding of the non-harmonic or unvoiced component which presents a good compromise between the quality of representation of this component over the range of the spectrum and the required bit rate.
- Another aim is to propose a modeling of the non-harmonic component which is homogeneous with that of the harmonic component, thus allowing an adequate mixing of the harmonic and non-harmonic components.
- The invention thus proposes a method for coding an audio signal, in which a fundamental frequency of the audio signal is estimated, a spectrum of the audio signal is determined by a transform of a frame of the audio signal into the frequency domain, and data representative of spectral amplitudes associated with at least some of the frequencies of the spectrum are included in a digital output stream.
- The data included in the digital output stream comprise data for coding a non-harmonic component of the audio signal, including data representative of spectral amplitudes associated with frequencies located in regions of the spectrum that are intermediate with respect to the frequencies that are multiples of the estimated fundamental frequency.
- Another aspect of the present invention relates to a method for decoding an input digital stream representing an encoded audio signal, in which a spectral estimate of a harmonic component of the audio signal is generated over the spectrum of the audio signal on the basis of first coding data included in the digital input stream, a spectral estimate of a non-harmonic component of the audio signal is generated over the spectrum of the audio signal on the basis of second coding data included in the digital input stream, and the spectral estimates of the harmonic and non-harmonic components are combined to form an overall spectral estimate which is transformed into the time domain to produce a decoded version of the audio signal.
- the invention also provides an audio coder and decoder comprising means for implementing the above methods.
- FIG. 1 is a block diagram of an audio coder according to the invention;
- Figures 2 and 3 are diagrams illustrating the formation of audio signal frames in the encoder of Figure 1;
- FIGS. 4 and 5 are graphs showing an example of the audio signal spectrum and illustrating the extraction of the upper and lower envelopes of this spectrum;
- FIG. 6 is a block diagram of an example of quantization means usable in the encoder of Figure 1;
- FIG. 7 is a block diagram of means used to extract parameters relating to the phase of the non-harmonic component in a variant of the encoder of Figure 1;
- FIG. 8 is a block diagram of an audio decoder corresponding to the encoder of Figure 1;
- FIG. 9 is a flow diagram of an example of a procedure for smoothing spectral coefficients and extracting minimum phases implemented in the decoder of FIG. 8;
- FIG. 10 is a block diagram of modules for analysis and spectral mixing of the harmonic and non-harmonic components of the audio signal;
- FIGS. 14 and 15 are diagrams illustrating one way of proceeding with the temporal synthesis of the signal frames in the decoder of FIG. 8;
- FIGS. 16 and 17 are graphs showing windowing functions usable in the synthesis of the frames according to FIGS. 14 and 15;
- FIGS. 18 and 19 are block diagrams of interpolation means usable in an alternative embodiment of the coder and the decoder;
- FIG. 20 is a block diagram of interpolation means usable in another alternative embodiment of the encoder.
- FIGS. 21 and 22 are diagrams illustrating another way of proceeding with the temporal synthesis of the signal frames in the decoder of FIG. 8, using an interpolation of parameters.
- The coder and the decoder described below are digital circuits which can, as is usual in the field of audio signal processing, be produced by programming a digital signal processor (DSP) or an application-specific integrated circuit (ASIC).
- The audio coder represented in FIG. 1 processes an input audio signal x which, in the non-limiting example considered below, is a speech signal.
- the signal x is available in digital form, for example at a sampling frequency F e of 8 kHz. It is for example delivered by an analog-digital converter processing the amplified output signal of a microphone.
- the input signal x can also be formed from another version, analog or digital, coded or not, of the speech signal.
- The encoder comprises a module 1 which forms successive audio signal frames for the various processing operations carried out, and an output multiplexer 6 which delivers an output stream containing, for each frame, sets of quantization parameters from which a decoder will be able to synthesize a decoded version of the audio signal.
- Each frame contains N = 256 samples, i.e. 32 ms at the 8 kHz sampling frequency.
- the module 1 multiplies the samples of each frame 2 by a windowing function f A , preferably chosen for its good spectral properties.
- the coder in FIG. 1 analyzes the audio signal in the spectral domain. It includes a module 3 which calculates the fast Fourier transform (TFR) of each signal frame.
- From the TFR module 3, the signal spectrum is obtained for each frame; its modulus and its phase are denoted X and φ_X respectively in what follows.
- a fundamental frequency detector 4 estimates for each signal frame a value of the fundamental frequency F 0 .
- The detector 4 can apply any known method of analysis of the speech signal of the frame to estimate the fundamental frequency F0, for example a method based on the autocorrelation function or on the AMDF function, possibly preceded by a whitening by linear prediction.
- the estimation can also be performed in the spectral domain or in the cepstral domain.
- Another possibility is to evaluate the time intervals between the consecutive breaks in the speech signal attributable to glottal closures of the speaker occurring during the frame.
- Well-known methods which can be used to detect such micro-ruptures are described in the following articles: M.
- The estimated fundamental frequency F0 is quantized, for example by scalar quantization, by a module 5, which supplies the output multiplexer 6 with a fundamental-frequency quantization index iF for each frame of the signal.
- the encoder uses cepstral parametric models to represent an upper envelope and a lower envelope of the spectrum of the audio signal.
- The first step of the cepstral transformation consists in applying a spectral compression function, which can be a logarithmic or root function, to the modulus of the signal spectrum.
- The module 8 of the coder thus applies, for each value X(i) of the signal spectrum (0 ≤ i ≤ N), the transformation LX(i) = X(i)^γ,
- γ being an exponent between 0 and 1.
- the compressed spectrum LX of the audio signal is processed by a module 9 which extracts spectral amplitudes associated with the harmonics of the signal corresponding to the multiples of the estimated fundamental frequency F0. These amplitudes are then interpolated by a module 10 in order to obtain a compressed upper envelope denoted LX_sup.
- the spectral compression could be carried out in an equivalent manner after the determination of the amplitudes associated with the harmonics. It could also be done after interpolation, which would only change the form of the interpolation functions.
- The maxima extraction module 9 takes account of the possible variation of the fundamental frequency over the analysis frame, of the errors that the detector 4 can make, as well as of the inaccuracies linked to the discrete nature of the frequency sampling. For this reason, the search for the amplitudes of the spectral peaks does not simply consist in taking the values LX(i) corresponding to the indices i such that i·Fe/2N is the frequency closest to a harmonic of frequency k·F0 (k ≥ 1).
- the spectral amplitude retained for a harmonic of order k is a local maximum of the modulus of the spectrum in the vicinity of the frequency kF 0 (this amplitude is obtained directly in compressed form when the spectral compression 8 is carried out before the extraction of the maxima 9).
- FIGS. 4 and 5 show an example of the shape of the compressed spectrum LX, where it can be seen that the maximum amplitudes of the harmonic peaks do not necessarily coincide with the amplitudes corresponding to the integer multiples of the estimated fundamental frequency F 0 .
- The sides of the peaks being quite steep, a small positioning error of the fundamental frequency F0, amplified by the harmonic index k, can strongly distort the estimated upper envelope of the spectrum and cause poor modeling of the formant structure of the signal.
- the interpolation is carried out between points whose abscissa is the frequency corresponding to the maximum of the amplitude of a spectral peak, and whose ordinate is this maximum, before or after compression.
- the interpolation performed to calculate the upper envelope LX_sup is a simple linear interpolation.
- another form of interpolation could be used (for example polynomial or spline).
- the interpolation is carried out between points whose abscissa is a frequency kF 0 multiple of the fundamental frequency (in fact the closest frequency in the discrete spectrum) and whose ordinate is the maximum amplitude, before or after compression, of the spectrum in the vicinity of this multiple frequency.
- The search interval for the maximum amplitude associated with a harmonic of rank k is centered on the index i of the TFR frequency closest to k·F0.
- The width of this search interval depends on the sampling frequency Fe, the size 2N of the TFR and the range of possible variation of the fundamental frequency; it is typically of the order of ten frequency bins with the example values considered previously. It can be made adjustable as a function of the value F0 of the fundamental frequency and of the rank k of the harmonic; a simple illustration of this peak search and interpolation is sketched below.
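As a rough illustration of modules 8-10 (compression, peak search near each harmonic, linear interpolation), here is a minimal numpy sketch. The function name, the exponent gamma and the fixed half_width of the search window are illustrative assumptions, not values taken from the patent; the interpolation is anchored at the peak bins, which is the first of the two variants described above.

```python
import numpy as np

def upper_envelope(spectrum_mag, f0, fe=8000.0, gamma=0.5, half_width=5):
    """Compressed upper envelope LX_sup: local maximum of the compressed
    modulus near each harmonic k*F0, then linear interpolation between
    these maxima. gamma and half_width are illustrative values only."""
    n_bins = len(spectrum_mag)                 # bins 0..N of a 2N-point FFT
    lx = spectrum_mag ** gamma                 # spectral compression (module 8)
    bin_hz = fe / (2.0 * (n_bins - 1))         # frequency step of the TFR
    peak_bins, peak_vals = [], []
    k = 1
    while k * f0 < fe / 2.0:
        center = int(round(k * f0 / bin_hz))   # bin closest to harmonic k*F0
        lo = max(0, center - half_width)
        hi = min(n_bins, center + half_width + 1)
        j = lo + int(np.argmax(lx[lo:hi]))     # local maximum near the harmonic (module 9)
        peak_bins.append(j)
        peak_vals.append(lx[j])
        k += 1
    # linear interpolation between the retained maxima (module 10)
    return np.interp(np.arange(n_bins), peak_bins, peak_vals)
```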
- A non-linear distortion of the frequency scale is applied to the compressed upper envelope by a module 12 before the module 13 performs the inverse fast Fourier transform (TFRI) providing the cepstral coefficients cx_sup.
- the non-linear distortion makes it possible to minimize the modeling error more effectively. It is for example carried out according to a Mel or Bark type frequency scale. This distortion may possibly depend on the estimated fundamental frequency F 0 .
- Figure 1 illustrates the case of the Mel scale. The relationship between the frequencies F of the linear spectrum, expressed in hertz, and the frequencies F' of the Mel scale is as follows: F' = (1000 / log 2) · log(1 + F / 1000).
- The number NCS of cepstral coefficients retained to represent the upper envelope can be equal to 16; a sketch of the Mel warping followed by the cepstral transformation is given below.
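A minimal sketch of modules 12-13: the compressed upper envelope is resampled on a grid that is uniform on the Mel scale given above, then an inverse FFT yields the NCS cepstral coefficients. The uniform Mel grid, the even spectral extension and the function name are assumptions made for illustration.

```python
import numpy as np

def mel_cepstrum(lx_sup, fe=8000.0, ncs=16):
    """Warp the compressed upper envelope LX_sup onto a Mel scale (module 12),
    then take NCS real cepstral coefficients by inverse FFT (module 13)."""
    n = len(lx_sup)                                          # bins 0..N of the half-spectrum
    lin_hz = np.linspace(0.0, fe / 2.0, n)                   # linear frequency axis (Hz)
    mel = lambda f: (1000.0 / np.log(2.0)) * np.log1p(f / 1000.0)
    mel_axis = np.linspace(0.0, mel(fe / 2.0), n)            # uniform sampling on the Mel scale
    hz_of_mel = 1000.0 * (2.0 ** (mel_axis / 1000.0) - 1.0)  # inverse Mel mapping
    warped = np.interp(hz_of_mel, lin_hz, lx_sup)            # envelope on the warped scale
    sym = np.concatenate([warped, warped[-2:0:-1]])          # even extension (real, even spectrum)
    cx_sup = np.fft.ifft(sym).real                           # TFRI: real cepstrum
    return cx_sup[:ncs]                                      # keep orders 0 .. NCS-1
```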
- a post-filtering in the cepstral domain is applied by a module 15 to the compressed upper envelope LX_sup.
- This post-liftering corresponds to a manipulation of the cepstral coefficients cx_sup delivered by the TFRI module 13, which corresponds approximately to a post-filtering of the harmonic part of the signal by a transfer function having the classical form H(z) = (1 − μ·z⁻¹) · A(z/γ1) / A(z/γ2), where:
- A(z) is the transfer function of a linear prediction filter of the audio signal,
- γ1 and γ2 are coefficients between 0 and 1, and
- μ is a pre-emphasis coefficient, which may be zero.
- A normalization module 16 further modifies the cepstral coefficients by imposing the constraint of exact modeling of one point of the initial spectrum, preferably the most energetic point among the spectral maxima extracted by the module 9. In practice, this normalization only modifies the value of the coefficient c_p(0).
- The normalization module 16 operates as follows: it recalculates a value of the synthesized spectrum at the frequency of the maximum indicated by the module 9, by Fourier transform of the truncated and post-liftered cepstral coefficients, taking into account the non-linear distortion of the frequency axis; it determines a normalization gain g_N as the logarithmic difference between the value of the maximum provided by the module 9 and this recalculated value; and it adds the gain g_N to the post-liftered cepstral coefficient c_p(0). This normalization can be seen as part of the post-liftering.
- The post-liftered and normalized cepstral coefficients are quantized by a module 18, which transmits the corresponding quantization indices icxs to the output multiplexer 6 of the coder.
- The module 18 can operate by vector quantization of cepstral vectors formed from the post-liftered and normalized coefficients, denoted here cx[n] for the signal frame of rank n.
- The cepstral vector cx[n] of NCS = 16 cepstral coefficients cx[n,0], cx[n,1], ..., cx[n,NCS−1] is split into four cepstral sub-vectors each containing four coefficients of consecutive orders.
- The cepstral vector cx[n] can be processed by the means shown in FIG. 6, which form part of the quantization module 18.
- rcx_q[n−1] denotes the quantized residual vector for the frame of rank n−1, whose components are respectively denoted rcx_q[n−1,0], rcx_q[n−1,1], ..., rcx_q[n−1,NCS−1].
- The numerator of relation (10) is obtained by a subtractor 20, and the components of its output vector are divided by the quantities 2 − α(i) at 21.
- The residual vector rcx[n] is subdivided into four sub-vectors, corresponding to the subdivision into four cepstral sub-vectors.
- The unit 22 proceeds to the vector quantization of each sub-vector of the residual vector rcx[n]. This quantization can consist, for each sub-vector srcx[n], in selecting from the dictionary the quantized sub-vector srcx_q[n] which minimizes the quadratic error with respect to srcx[n].
- Unit 22 also delivers the values of the quantized residual sub-vectors, which form the vector rcx_q[n]. This vector is delayed by one frame at 23, and its components are multiplied by the coefficients α(i) at 24 to supply the vector applied to the negative input of subtractor 20. This latter vector is also supplied to an adder 25, the other input of which receives a vector formed by the components of the quantized residue rcx_q[n] respectively multiplied by the quantities 1 − α(i) at 26. The adder 25 thus delivers the quantized cepstral vector cx_q[n] that the decoder will recover.
- the prediction coefficient ⁇ (i) can be optimized separately for each of the cepstral coefficients.
- the quantization dictionaries can also be optimized separately for each of four cepstral sub-vectors.
- The above scheme for quantizing the cepstral coefficients may be applied for only some of the frames. For example, it is possible to provide a second quantization mode, together with a process for selecting whichever of the two modes minimizes a least-squares criterion with respect to the cepstral coefficients to be quantized, and to transmit with the quantization indices of the frame a bit indicating which of the two modes has been selected. A sketch of the predictive quantization scheme of FIG. 6 is given below.
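The following sketch follows the signal flow of FIG. 6 as described above (subtractor 20, scaling 21, sub-vector search 22, delay 23, multipliers 24 and 26, adder 25). The exact scaling of relation (10) is not fully legible in this text (it appears as 2 − α(i)); the sketch assumes 1 − α(i), which makes the reconstruction at the adder consistent when quantization is exact, and the codebooks are illustrative placeholders.

```python
import numpy as np

def quantize_frame(cx, rcx_q_prev, alpha, codebooks):
    """Predictive VQ of one cepstral vector. 'codebooks' is a list of four
    illustrative dictionaries (each of shape entries x 4); 'alpha' holds the
    per-coefficient prediction coefficients alpha(i). The residual scaling
    below is an assumption; relation (10) in the patent may differ."""
    rcx = (cx - alpha * rcx_q_prev) / (1.0 - alpha)     # subtractor 20 + scaling (assumed)
    rcx_q = np.empty_like(rcx)
    indices = []
    for s, book in enumerate(codebooks):                # four sub-vectors of 4 coefficients
        sub = rcx[4 * s: 4 * s + 4]
        err = np.sum((book - sub) ** 2, axis=1)         # quadratic error vs. each entry
        best = int(np.argmin(err))                      # unit 22: nearest-neighbour search
        indices.append(best)
        rcx_q[4 * s: 4 * s + 4] = book[best]
    cx_q = alpha * rcx_q_prev + (1.0 - alpha) * rcx_q   # multipliers 24, 26 and adder 25
    return indices, cx_q, rcx_q                         # rcx_q feeds the next frame (delay 23)
```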
- The adaptation module 29 controls the post-liftering 15 so as to minimize a modulus deviation between the spectrum of the audio signal and the corresponding modulus values calculated at 28.
- This modulus deviation can be expressed by a sum of absolute values of amplitude differences, compressed or not, corresponding to one or more of the harmonic frequencies. This sum can be weighted according to the spectral amplitudes associated with these frequencies.
- The modulus deviation taken into account in the adaptation of the post-liftering could take into account all the harmonics of the spectrum.
- Alternatively, the module 28 can resynthesize the spectral amplitudes only for one or more frequencies that are multiples of the fundamental frequency F0, selected on the basis of the magnitude of the spectrum modulus.
- The adaptation module 29 can for example consider the three most intense spectral peaks in the calculation of the modulus deviation to be minimized.
- In another variant, the adaptation module 29 estimates a spectral masking curve of the audio signal by means of a psychoacoustic model, and the frequencies taken into account in the calculation of the modulus deviation to be minimized are selected on the basis of the level of the spectrum modulus relative to the masking curve (for example, the three frequencies for which the spectrum modulus exceeds the masking curve the most can be taken).
- Different conventional methods can be used to calculate the masking curve from the audio signal.
- the module 29 can use a filter identification model.
- A simpler method consists in predefining a set of post-liftering parameter sets, i.e. a set of pairs (γ1, γ2) in the case of a post-liftering according to relation (8), in carrying out the operations incumbent on the modules 15, 16, 18 and 28 for each of these sets of parameters, and in retaining the set of parameters which leads to the minimum modulus deviation between the signal spectrum and the recalculated values.
- the quantization indexes provided by the module 18 are then those which relate to the best set of parameters.
- the coder determines coefficients cx_inf representing a compressed lower envelope LX_inf.
- A module 30 extracts from the compressed spectrum LX spectral amplitudes associated with frequencies located in zones of the spectrum that are intermediate with respect to the frequencies that are multiples of the estimated fundamental frequency F0.
- In the example considered, the spectral amplitude retained for the zone between two consecutive harmonics k·F0 and (k+1)·F0 simply corresponds to the modulus of the spectrum at the frequency (k + 1/2)·F0 located in the middle of the interval separating the two harmonics. In another embodiment, this amplitude could be an average of the spectrum modulus over a small range surrounding this frequency (k + 1/2)·F0.
- A module 31 then performs an interpolation, for example linear, of the spectral amplitudes associated with the frequencies located in the intermediate zones, in order to obtain the compressed lower envelope LX_inf (a simple sketch is given below).
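A minimal sketch of modules 30-31, parallel to the upper-envelope sketch above: the compressed spectrum is sampled at the mid-harmonic frequencies (k + 1/2)·F0 and interpolated linearly. Function name and edge handling (np.interp holds the first and last values outside the sampled range) are assumptions.

```python
import numpy as np

def lower_envelope(lx, f0, fe=8000.0):
    """Compressed lower envelope LX_inf: sample the compressed spectrum LX at
    the frequencies (k + 1/2)*F0 between consecutive harmonics (module 30),
    then interpolate linearly (module 31)."""
    n_bins = len(lx)
    bin_hz = fe / (2.0 * (n_bins - 1))
    mid_bins, mid_vals = [], []
    k = 1
    while (k + 0.5) * f0 < fe / 2.0:
        j = int(round((k + 0.5) * f0 / bin_hz))   # middle of the inter-harmonic interval
        mid_bins.append(j)
        mid_vals.append(lx[j])
        k += 1
    return np.interp(np.arange(n_bins), mid_bins, mid_vals)
```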
- the cepstral transformation applied to this compressed lower envelope LX_inf is carried out according to a frequency scale resulting from a non-linear distortion applied by a module 32.
- the TFRI module 33 calculates a cepstral vector of NCI cepstral coefficients cx_inf of orders 0 to NCI-1 representing the lower envelope.
- The non-linear transformation of the frequency scale for the cepstral transformation of the lower envelope can be carried out towards a scale that is finer at high frequencies than at low frequencies, which advantageously makes it possible to model well the unvoiced components of the signal at high frequencies.
- However, it may be preferable to adopt in module 32 the same scale as in module 12 (Mel in the example considered).
- The cepstral coefficients cx_inf representing the compressed lower envelope are quantized by a module 34, which can operate in the same way as the module 18 quantizing the cepstral coefficients representing the compressed upper envelope.
- the vector thus formed is subjected to a vector quantization of prediction residue, carried out by means identical to those represented in FIG. 6 but without subdivision into sub-vectors.
- The coder shown in FIG. 1 does not include any particular device for coding the phases of the spectrum at the harmonics of the audio signal.
- On the other hand, it includes means 36-40 for coding temporal information linked to the phase of the non-harmonic component represented by the lower envelope.
- a spectral decompression module 36 and a TFRI module 37 form a temporal estimate of the frame of the non-harmonic component.
- The module 36 applies, to the compressed lower envelope LX_inf produced by the interpolation module 31, a decompression function reciprocal to the compression function applied by the module 8 (that is to say an exponential or a power 1/γ function). This provides the modulus of the estimated frame of the non-harmonic component, whose phase is taken equal to the phase φ_X of the spectrum X of the signal over the frame.
- the inverse Fourier transform performed by the module 37 provides the estimated frame of the non-harmonic component.
- The module 38 subdivides this estimated frame of the non-harmonic component into several time segments, eight segments of 4 ms in the example considered.
- For each segment, the module 38 calculates the energy, equal to the sum of the squares of its samples, and forms a vector E1 of eight positive real components equal to the eight calculated energies.
- The largest of these eight energies, denoted EM, is also determined and supplied, together with the vector E1, to a normalization module 39.
- The latter divides each component of the vector E1 by EM, so that the normalized vector Emix is formed of eight components between 0 and 1. It is this normalized vector Emix, or weighting vector, which is quantized by the module 40, which can perform a vector quantization with a dictionary determined by prior training (see the sketch below).
- the quantization index iEm is supplied by the module 40 to the output multiplexer 6 of the coder.
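A short sketch of modules 38-39 as described above: per-segment energies of the estimated non-harmonic frame, normalized by the largest of them. The function name and the generic segment splitting are illustrative.

```python
import numpy as np

def emix_vector(nonharmonic_frame, n_segments=8):
    """Weighting vector Emix: energy of each time segment (sum of squared
    samples), divided by the maximum energy EM so that all components lie
    between 0 and 1."""
    segs = np.array_split(np.asarray(nonharmonic_frame, dtype=float), n_segments)
    e1 = np.array([np.sum(s ** 2) for s in segs])   # energies of the 4 ms segments (module 38)
    em = e1.max()                                   # largest energy EM
    return e1 / em if em > 0 else e1                # normalization (module 39)
```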
- FIG. 7 shows an alternative embodiment of the means used by the coder of FIG. 1 to determine the vector Emix of energy weighting of the frame of the non-harmonic component.
- the spectral decompression and TFRI modules 36, 37 operate like those which have the same references in FIG. 1.
- A selection module 42 is added to determine the value of the modulus of the spectrum subjected to the inverse Fourier transform 37. On the basis of the estimated fundamental frequency F0, the module 42 identifies harmonic regions and non-harmonic regions of the spectrum of the audio signal.
- A frequency is considered to belong to a harmonic region if it lies in a frequency interval centered on a harmonic k·F0 and of width corresponding to that of a synthesized spectral line, and to a non-harmonic region otherwise.
- In the harmonic regions, the complex signal subjected to the TFRI 37 is taken equal to the value of the spectrum, that is to say that its modulus and its phase correspond to those of the spectrum X.
- In the non-harmonic regions, this complex signal has the same phase φ_X as the spectrum and a modulus given by the lower envelope after spectral decompression 36. This procedure according to FIG. 7 provides a more precise modeling of the non-harmonic regions.
- In the decoder of FIG. 8, extraction modules 46-49 include quantization dictionaries similar to those of modules 5, 18, 34 and 40 of FIG. 1, in order to restore the values of the quantized parameters.
- The modules 47 and 48 have dictionaries allowing them to form the quantized prediction residues rcx_q[n], from which they deduce the quantized cepstral vectors cx_q[n] using elements identical to the elements 23-26 of FIG. 6. These quantized cepstral vectors cx_q[n] provide the cepstral coefficients cx_sup_q and cx_inf_q processed by the decoder.
- A module 51 calculates the fast Fourier transform of the cepstral coefficients cx_sup_q for each signal frame.
- The frequency scale of the resulting compressed spectrum is modified non-linearly by a module 52 applying the non-linear transformation reciprocal to that of module 12 in FIG. 1, which provides the estimate LX_sup of the compressed upper envelope.
- A spectral decompression of LX_sup, performed by a module 53, provides the upper envelope X_sup comprising the estimated values of the spectrum modulus at the frequencies that are multiples of the fundamental frequency F0.
- The module 54 synthesizes the spectral estimate X_v of the harmonic component of the audio signal, as a sum of spectral lines centered on the frequencies that are multiples of the fundamental frequency F0 and whose amplitudes (in modulus) are those given by the upper envelope X_sup.
- the decoder in FIG. 8 is capable of extracting information on this phase from cepstral coefficients cx_sup_q representing the compressed upper envelope. This phase information is used to assign a phase ⁇ (k) to each of the spectral lines determined by the module 54 in the estimation of the harmonic component of the signal.
- the speech signal can be considered to be at minimum phase.
- the minimum phase information can easily be deduced from a cepstral modeling. This minimum phase information is therefore calculated for each harmonic frequency.
- the minimum phase assumption means that the energy of the synthesized signal is localized at the start of each period of the fundamental frequency F 0 .
- This post-liftering is for example of the form (8).
- The module 56 thus delivers post-liftered and smoothed cepstral coefficients, from which the minimum phase assigned to each spectral line representing a harmonic peak of the spectrum is deduced.
- the operations performed by the modules 56, 57 for smoothing and extracting the minimum phase are illustrated by the flowchart in FIG. 9.
- The module 56 examines the variations of the cepstral coefficients in order to apply a weaker smoothing in the presence of sudden variations than in the presence of slow variations. For this, it performs the smoothing of the cepstral coefficients by means of a forgetting factor λ_c chosen as a function of a comparison between a threshold d_th and a distance d between two successive sets of post-liftered cepstral coefficients.
- The threshold d_th is itself adapted as a function of the variations of the cepstral coefficients.
- The first step 60 consists in calculating the distance d between the two successive vectors relating to the frames n−1 and n. These vectors, denoted here cxp[n−1] and cxp[n], correspond for each frame to the set of NCS post-liftered cepstral coefficients representing the compressed upper envelope.
- the distance used can in particular be the Euclidean distance between the two vectors or even a quadratic distance.
- Two smoothings are first carried out, by means of forgetting factors λ_min and λ_max respectively, to determine a minimum distance d_min and a maximum distance d_max.
- The forgetting factors λ_min and λ_max are themselves each selected from two distinct values, respectively λ_min1, λ_min2 and λ_max1, λ_max2, between 0 and 1, the values λ_min1 and λ_max1 each being substantially closer to 0 than the values λ_min2 and λ_max2. If d > d_min (test 61), the forgetting factor λ_min is taken equal to λ_min1 (step 62); otherwise it is taken equal to λ_min2 (step 63).
- In step 64, the minimum distance d_min is updated as d_min = λ_min·d_min + (1 − λ_min)·d. If d > d_max (test 65), the forgetting factor λ_max is taken equal to λ_max1 (step 66); otherwise it is taken equal to λ_max2 (step 67).
- In step 68, the maximum distance d_max is updated as d_max = λ_max·d_max + (1 − λ_max)·d.
- When the distance d exceeds the threshold d_th, a value λ_c1 relatively close to 0 is adopted for the forgetting factor λ_c (step 72).
- The corresponding signal is considered to be of the non-stationary type, so that there is no need to keep a large memory of the previous cepstral coefficients.
- In the opposite case, a value λ_c2 less close to 0 is adopted for the forgetting factor λ_c (step 73), in order to smooth the cepstral coefficients more strongly. A sketch of this adaptive smoothing is given below.
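A minimal sketch of steps 60-73 of FIG. 9. The derivation of the threshold d_th from d_min and d_max is not reproduced in this text, so the midpoint used below is an assumption, as are the numeric forgetting factors and the class name.

```python
import numpy as np

class CepstralSmoother:
    """Adaptive smoothing of the post-liftered cepstral coefficients with a
    forgetting factor chosen from the distance between successive vectors."""
    def __init__(self, lmin=(0.1, 0.9), lmax=(0.1, 0.9), lc=(0.1, 0.8)):
        self.d_min = self.d_max = 0.0
        self.lmin, self.lmax, self.lc = lmin, lmax, lc
        self.prev = None       # previous post-liftered cepstral vector cxp[n-1]
        self.state = None      # smoothed cepstral vector cxl[n]

    def update(self, cxp):
        cxp = np.asarray(cxp, dtype=float)
        if self.prev is None:
            self.prev = cxp.copy()
            self.state = cxp.copy()
            return self.state
        d = float(np.sum((cxp - self.prev) ** 2))                   # step 60 (quadratic distance)
        lam_min = self.lmin[0] if d > self.d_min else self.lmin[1]  # tests 61-63
        self.d_min = lam_min * self.d_min + (1 - lam_min) * d       # step 64
        lam_max = self.lmax[0] if d > self.d_max else self.lmax[1]  # tests 65-67
        self.d_max = lam_max * self.d_max + (1 - lam_max) * d       # step 68
        d_th = 0.5 * (self.d_min + self.d_max)                      # assumed form of d_th
        lam_c = self.lc[0] if d > d_th else self.lc[1]              # steps 72 / 73
        self.state = lam_c * self.state + (1 - lam_c) * cxp         # smoothed coefficients cxl
        self.prev = cxp
        return self.state
```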
- The module 57 then calculates the minimum phases φ(k) associated with the harmonics k·F0. In known manner, the minimum phase for a harmonic of order k is given by: φ(k) = −2 · Σ (for m = 1 to NCS−1) cxl[n,m] · sin(2π·m·k·F0/Fe), where cxl[n,m] denotes the post-liftered and smoothed cepstral coefficients.
- In step 75, the harmonic index k is initialized to 1.
- The phase φ(k) and the cepstral index m are initialized respectively to 0 and 1 in step 76.
- In step 77, the module 57 adds to the phase φ(k) the quantity −2·cxl[n,m]·sin(2π·m·k·F0/Fe).
- The cepstral index m is incremented in step 78 and compared with NCS in step 79. Steps 77 and 78 are repeated as long as m < NCS.
- When m reaches NCS, the calculation of the minimum phase is completed for the harmonic k, and the index k is incremented in step 80.
- The minimum-phase calculation 76-79 is repeated for the next harmonic as long as k·F0 < Fe/2 (test 81). A compact sketch of this calculation is given below.
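The loop of steps 75-81 can be written directly as follows; the function name and the dictionary output are illustrative choices.

```python
import numpy as np

def minimum_phases(cxl, f0, fe=8000.0):
    """Minimum phase phi(k) for each harmonic k*F0 below Fe/2, computed from
    the smoothed, post-liftered cepstral coefficients cxl (module 57)."""
    cxl = np.asarray(cxl, dtype=float)
    ncs = len(cxl)
    phases = {}
    k = 1
    while k * f0 < fe / 2.0:                  # test 81
        m = np.arange(1, ncs)                 # cepstral orders 1 .. NCS-1 (steps 76-79)
        phases[k] = -2.0 * np.sum(cxl[m] * np.sin(2.0 * np.pi * m * k * f0 / fe))
        k += 1                                # step 80
    return phases
```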
- The module 54 takes account of a constant phase over the width of each spectral line, equal to the minimum phase φ(k) supplied for the corresponding harmonic k by the module 57.
- the estimate X v of the harmonic component is synthesized by summing spectral lines positioned at the harmonic frequencies of the fundamental frequency F 0 .
- The spectral lines can be positioned on the frequency axis with a resolution greater than that of the Fourier transform. For that, a reference spectral line is precalculated once and for all at the higher resolution. This calculation can consist of a Fourier transform of the analysis window f_A with a transform size of 16384 points, providing a resolution of 0.5 Hz per point.
- Each harmonic line is then produced by the module 54 by positioning the high-resolution reference line on the frequency axis, and by sub-sampling this reference spectral line in order to come back to the 15.625 Hz resolution of the 512-point Fourier transform. This makes it possible to position each spectral line precisely.
- The TFR module 85 of the decoder of FIG. 8 receives the NCI quantized cepstral coefficients cx_inf_q of orders 0 to NCI−1, and it advantageously supplements them with the NCS−NCI cepstral coefficients cx_sup_q of orders NCI to NCS−1 representing the upper envelope. Indeed, it can be considered as a first approximation that the rapid variations of the compressed lower envelope are well reproduced by those of the compressed upper envelope. In another embodiment, the TFR module 85 could consider only the NCI cepstral coefficients cx_inf_q.
- The module 86 converts the frequency scale in the manner reciprocal to the conversion performed by the module 32 of the coder, in order to restore the estimate LX_inf of the compressed lower envelope, which is subjected to the spectral decompression module 87.
- The decoder then has a lower envelope X_inf comprising the values of the spectrum modulus in the valleys located between the harmonic peaks.
- This envelope X_inf will modulate the spectrum of a noise frame whose phase is treated as a function of the quantized weighting vector Emix extracted by the module 49.
- a generator 88 delivers a normalized noise frame whose segments of 4 ms are weighted in a module 89 in accordance with the normalized components of the Emix vector supplied by the module 49 for the current frame.
- This noise is a high-pass filtered white noise, to take account of the low level that the unvoiced component has, in principle, at low frequencies.
- the Fourier transform of the resulting frame is calculated by the TFR module 91.
- the spectral estimate X uv of the non-harmonic component is determined by the spectral synthesis module 92 which performs frequency-by-frequency weighting. This weighting consists in multiplying each complex spectral value supplied by the TFR module 91 by the value of the lower envelope X_inf obtained for the same frequency by the spectral decompression module 87.
- The spectral estimates X_v and X_uv of the harmonic and non-harmonic components are combined by a mixing module 95 controlled by a module 96 for analyzing the degree of harmonicity (or voicing) of the signal.
- The analysis module 96 comprises a unit 97 for estimating a frequency-dependent degree of voicing W, from which four frequency-dependent gains are calculated, namely two gains g_v, g_uv controlling the relative importance of the harmonic and non-harmonic components in the synthesized signal, and two gains g_v_φ, g_uv_φ used to add noise to the phase of the harmonic component.
- The degree of voicing W(i) is a continuously variable value between 0 and 1, determined for each frequency index i (0 ≤ i ≤ N) as a function of the upper envelope X_sup(i) and the lower envelope X_inf(i) obtained for this frequency i by the decompression modules 53, 87.
- The degree of voicing W(i) is estimated by the unit 97 for each frequency index i corresponding to a harmonic of the fundamental frequency F0, by comparing the dynamics between the upper envelope and the lower envelope at this frequency with a threshold Vth(F0).
- the threshold Vth (F 0 ) corresponds to the average dynamics calculated on a synthetic spectrum purely voiced at the fundamental frequency. It is advantageously chosen depending on the fundamental frequency F 0 .
- the degree of voicing W (i) for a frequency other than the harmonic frequencies is obtained simply as being equal to that estimated for the nearest harmonic.
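The exact expression for W(i) is not reproduced in this text. The sketch below is purely illustrative: it assumes that W grows with the log-dynamics between the upper and lower envelopes, saturating at the threshold Vth(F0), which matches the description above only in spirit.

```python
import numpy as np

def voicing_degree(x_sup, x_inf, vth):
    """Illustrative degree of voicing W(i) in [0, 1], assumed to increase with
    the dynamics between the upper envelope X_sup and the lower envelope X_inf
    relative to the threshold Vth(F0). Not the patent's exact formula."""
    eps = 1e-12
    dyn = np.log(np.maximum(x_sup, eps)) - np.log(np.maximum(x_inf, eps))
    return np.clip(dyn / vth, 0.0, 1.0)
```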
- the gain g v (i), which depends on the frequency, is obtained by applying a non-linear function to the degree of voicing W (i) (block 98).
- The phase φ'_v of the mixed harmonic component is the result of a linear combination of the phases φ_v, φ_uv of the harmonic and non-harmonic components X_v, X_uv synthesized by the modules 54, 92.
- The gains g_v_φ, g_uv_φ respectively applied to these phases are calculated from the degree of voicing W and are also weighted as a function of the frequency index i, since noising the phase is only really useful beyond a certain frequency.
- a first gain g v1 is calculated by applying a non-linear function to the degree of voicing W (i), as shown diagrammatically by block 100 in FIG. 10.
- This non-linear function can have the form represented in FIG. 12:
- a multiplier 101 multiplies for each index frequency i the gain g v1 by another gain g v2 depending only on the frequency index i, to form the gain g v (i).
- g_v2(i) = G2 for i ≥ i2, the indices i1 and i2 being such that 0 < i1 < i2 < N, and the minimum gain G2 being between 0 and 1.
- The complex spectrum Y of the synthesized signal is produced by the mixing module 95, which implements the following mixing relation, for 0 ≤ i ≤ N:
- Y(i) = g_v(i) · |X_v(i)| · exp[j·φ'_v(i)] + g_uv(i) · X_uv(i)   (17)
- with φ'_v(i) = g_v_φ(i)·φ_v(i) + g_uv_φ(i)·φ_uv(i)   (18), where φ_v(i) denotes the argument of the complex number X_v(i) supplied by the module 54 for the frequency of index i (block 104 of FIG. 10), and φ_uv(i) denotes the argument of the complex number X_uv(i) supplied by the module 92 (block 105 of FIG. 10). This combination is carried out by the multipliers 106-110 and the adders 111-112 shown in FIG. 10.
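A minimal sketch of relations (17)-(18) as reconstructed above. It assumes the harmonic magnitude in (17) is |X_v(i)| combined with the mixed phase; the function name is illustrative and all gain arrays are taken as given.

```python
import numpy as np

def mix_spectra(x_v, x_uv, g_v, g_uv, g_v_phi, g_uv_phi):
    """Spectral mixing of module 95: all inputs are arrays indexed by the
    frequency index i; x_v and x_uv are the complex spectral estimates from
    modules 54 and 92."""
    phi_v = np.angle(x_v)                                    # block 104
    phi_uv = np.angle(x_uv)                                  # block 105
    phi_mix = g_v_phi * phi_v + g_uv_phi * phi_uv            # relation (18)
    return g_v * np.abs(x_v) * np.exp(1j * phi_mix) + g_uv * x_uv   # relation (17)
```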
- the frames successively obtained in this way are finally processed by the time synthesis module 116 which forms the decoded audio signal x.
- the time synthesis module 116 performs an overlap sum of frames modified with respect to those successively evaluated at the output of module 115.
- the modification can be seen in two stages illustrated respectively in FIGS. 14 and 15.
- The first step (FIG. 14) consists in multiplying each frame 2' delivered by the TFRI module 115 by a window 1/f_A, the inverse of the analysis window f_A used by the module 1 of the coder.
- Each sample of the decoded audio signal x thus obtained is assigned a uniform overall weight, equal to A.
- This overall weight comes from the contribution of a single frame if the sample has, in this frame, a rank i such that L ≤ i < N − L, and comprises the summed contributions of two successive frames if 0 ≤ i < L or N − L ≤ i < N. It is thus possible to perform the time synthesis in a simple manner even if, as in the case considered, the overlap L between two successive frames is smaller than half the size N of these frames.
- FIG. 16 shows the appearance of the compound window f c in the case where the analysis window f A is a Hamming window and the synthesis window f s has the form given by the relations (19) to (21) .
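The overlap-add of module 116 can be sketched as follows. The synthesis window f_s of relations (19)-(21) is not reproduced in this text, so it is passed in as a parameter; the function name and the hop value are illustrative.

```python
import numpy as np

def overlap_add(frames, f_a, f_s, hop):
    """Time synthesis (FIGS. 14-15): each synthesized frame is multiplied by
    the compound window f_c = f_s / f_a (FIG. 16) and overlap-added with a
    step 'hop' = N - L. f_a must be nonzero everywhere (e.g. a Hamming
    window), since the first step divides by the analysis window."""
    n = len(f_a)
    out = np.zeros(hop * (len(frames) - 1) + n)
    fc = f_s / f_a                                  # compound window f_c
    for idx, frame in enumerate(frames):
        out[idx * hop: idx * hop + n] += frame * fc
    return out
```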
- the coder in FIG. 1 can increase the rate of formation and analysis of the frames, in order to transmit more quantization parameters to the decoder.
- A frame of N = 256 samples (32 ms) is formed every 20 ms.
- the frames for which an interpolation is carried out can be those of rank half-integer n + 1/2 which are offset by 10 ms relative to the frames of the subset.
- The notations cx_q[n−1] and cx_q[n] designate quantized cepstral vectors determined, for two successive frames of whole rank, by the quantization module 18 and/or by the quantization module 34. These vectors include, for example, four consecutive cepstral coefficients each, and could also include more cepstral coefficients.
- a module 120 performs an interpolation of these two cepstral vectors cx_q [n-1] and cx_q [n], in order to estimate an intermediate value cx_i [n-1/2].
- the interpolation performed by the module 120 can be a simple arithmetic mean of the vectors cx_q [n-1] and cx_q [n].
- the module 120 could apply a more sophisticated interpolation formula, for example polynomial, also based on the cepstral vectors obtained for frames prior to frame n-1.
- the interpolation takes account of the relative position of each interpolated frame.
- In parallel, the coder uses the means described above to calculate the cepstral coefficients cx[n−1/2] relating to the frame of half-integer rank.
- For the upper envelope, these cepstral coefficients are those provided by the TFRI module 13 after post-liftering 15 (for example with the same post-liftering coefficients as for the previous frame n−1) and normalization 16.
- For the lower envelope, the cepstral coefficients cx[n−1/2] are those delivered by the TFRI module 33.
- a subtractor 121 forms the difference ecx [n-1/2] between the cepstral coefficients cx [n-1/2] calculated for the half-integer row frame and the coefficients cx_i [n-1/2] estimated by interpolation.
- This difference is supplied to a quantization module 122 which addresses quantization indices icx [n-1/2] to the output multiplexer 6 of the coder.
- The module 122 operates for example by vector quantization of the interpolation errors ecx[n−1/2] successively determined for the half-integer rank frames.
- This quantization of the interpolation error can be carried out by the coder for each of the NCS + NCI cepstral coefficients used by the decoder, or only for some of them, typically those of the smallest orders (see the sketch below).
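A minimal sketch of modules 120-122 on the coder side, using the simple arithmetic-mean interpolation mentioned above. The error codebook is an illustrative placeholder, and the last return value shows the reconstruction that the decoder-side adder 126 would perform.

```python
import numpy as np

def encode_half_rank_frame(cx_prev_q, cx_curr_q, cx_half, err_codebook):
    """Half-rank frame coding: interpolate the two surrounding quantized
    cepstral vectors (module 120), form the interpolation error (subtractor
    121) and quantize it by nearest-neighbour search (module 122)."""
    cx_i = 0.5 * (cx_prev_q + cx_curr_q)            # module 120: arithmetic mean
    ecx = cx_half - cx_i                            # subtractor 121
    dists = np.sum((err_codebook - ecx) ** 2, axis=1)
    icx = int(np.argmin(dists))                     # quantization index icx[n-1/2]
    return icx, cx_i + err_codebook[icx]            # index + decoder-side reconstruction
```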
- the corresponding means of the decoder are illustrated in FIG. 19.
- the decoder functions essentially like that described with reference to FIG. 8 to determine the signal frames of whole rank.
- An interpolation module 124, identical to the module 120 of the coder, estimates the intermediate coefficients cx_i[n−1/2] from the quantized coefficients cx_q[n−1] and cx_q[n] supplied by the module 47 and/or the module 48 from the indices icxs, icxi extracted from the stream.
- A parameter extraction module 125 receives the quantization index icx[n−1/2] from the input demultiplexer 45 of the decoder, and deduces therefrom the quantized interpolation error ecx_q[n−1/2] using the same quantization dictionary as that used by the module 122 of the coder.
- An adder 126 sums the cepstral vectors cx_i [n-1/2] and ecx_q [n-1/2] in order to provide the cepstral coefficients cx [n-1/2] which will be used by the decoder (modules 51 - 57, 95, 96, 115 and / or modules 85-87, 92, 95, 96, 115) to form the interpolated frame of rank n-1/2. If only some of the cepstral coefficients have been the subject of an interpolation error quantification, the others are determined by the decoder by a simple interpolation, without correction.
- the decoder can also interpolate the other parameters F 0 , Emix used to synthesize the signal frames.
- the fundamental frequency F 0 can be interpolated linearly, either in the time domain, or (preferably) directly in the frequency domain.
- the interpolation should be carried out after denormalization and of course taking account of the time offsets between frames.
- the coder uses the cepstral vectors cx_q [n], cx_q [n-1], ..., cx_q [nr] and cx_q [n-1/2] calculated for the last frames passed (r> 1) to identify an optimal interpolator filter which, when subject to the quantized cepstral vectors cx_q [nr], ..., cx_q [n] relating to frames of whole rank, delivers an interpolated cepstral vector cx_i [n -1/2] which presents a minimum distance with the vector cx [n-1/2] calculated for the last frame of rank half-integer.
- this interpolator filter 128 is present in the coder, and a subtractor 129 subtracts its output cx_i [n-1/2] from the calculated cepstral vector cx [n-1/2].
- A minimization module 130 determines the set of parameters {P} of the interpolator filter 128 for which the interpolation error ecx[n−1/2] delivered by the subtractor 129 has a minimum norm. This set of parameters {P} is addressed to a quantization module 131 which provides a corresponding quantization index iP to the output multiplexer 6 of the coder.
- From the quantization indices iP of the parameters {P} obtained in the bit stream, the decoder reconstructs the interpolator filter 128 (apart from quantization errors), and processes the cepstral vectors cx_q[n−r], ..., cx_q[n] in order to estimate the cepstral coefficients cx[n−1/2] used to synthesize the half-integer rank frames.
- The decoder can thus use a simple interpolation method (without transmission of parameters from the coder for the half-integer rank frames), an interpolation method taking into account a quantized interpolation error (according to FIGS. 18 and 19), or an interpolation method with an optimal interpolator filter (according to FIG. 20), to evaluate the half-integer rank frames in addition to the whole rank frames evaluated directly as explained with reference to FIGS. 8 to 13.
- the temporal synthesis module 116 can then combine all of these evaluated frames to form the synthesized signal x in the manner explained below with reference to FIGS. 14, 21 and 22.
- the module 116 performs an overlap sum of modified frames with respect to those successively evaluated at the output of the module 115, and this modification can be seen in two stages, the first of which is identical to that previously described with reference to FIG. 14 (divide the samples of the frame 2 'by the analysis window fA).
- f_s(i) + f_s(i + M/p) = A for N/2 − M/p ≤ i < N/2   (25)
- The synthesis window f_s(i) increases gradually over a transition interval.
- Over this interval, the synthesis window f_s can be a Hamming window (as shown in FIG. 21) or a Hanning window.
- FIG. 21 shows the successive frames 2'' repositioned in time by the module 116.
- The hatching indicates the portions eliminated from the frames (synthesis window equal to 0). It can be seen that, by performing the overlapping sum of the samples of the successive frames, the property (25) ensures a homogeneous weighting of the samples of the synthesized signal.
- the interpolated frames can be the subject of a reduced transmission of coding parameters, as described above, but this is not compulsory.
- This embodiment makes it possible to maintain a relatively large interval M between two analysis frames, and therefore to limit the required transmission bit rate, while limiting the discontinuities likely to appear because of the size of this interval relative to the time scales typical of the variations of the parameters of the audio signal, in particular the cepstral coefficients and the fundamental frequency.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU62921/00A AU6292100A (en) | 1999-07-05 | 2000-07-04 | Audio encoding and decoding including non harmonic components of the audio signal |
EP00949622A EP1192620A1 (fr) | 1999-07-05 | 2000-07-04 | Codage et decodage audio incluant des composantes non harmoniques du signal |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR9908636A FR2796192B1 (fr) | 1999-07-05 | 1999-07-05 | Procedes et dispositifs de codage et de decodage audio |
FR99/08636 | 1999-07-05 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001003119A1 true WO2001003119A1 (fr) | 2001-01-11 |
Family
ID=9547705
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FR2000/001907 WO2001003119A1 (fr) | 1999-07-05 | 2000-07-04 | Codage et decodage audio incluant des composantes non harmoniques du signal |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1192620A1 (fr) |
AU (1) | AU6292100A (fr) |
FR (1) | FR2796192B1 (fr) |
WO (1) | WO2001003119A1 (fr) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5754974A (en) * | 1995-02-22 | 1998-05-19 | Digital Voice Systems, Inc | Spectral magnitude representation for multi-band excitation speech coders |
US5826222A (en) * | 1995-01-12 | 1998-10-20 | Digital Voice Systems, Inc. | Estimation of excitation parameters |
WO1999010719A1 (fr) * | 1997-08-29 | 1999-03-04 | The Regents Of The University Of California | Procede et appareil de codage hybride de la parole a 4kbps |
Also Published As
Publication number | Publication date |
---|---|
FR2796192A1 (fr) | 2001-01-12 |
EP1192620A1 (fr) | 2002-04-03 |
FR2796192B1 (fr) | 2001-10-05 |
AU6292100A (en) | 2001-01-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0782128B1 (fr) | Procédé d'analyse par prédiction linéaire d'un signal audiofréquence, et procédés de codage et de décodage d'un signal audiofréquence en comportant application | |
EP1692689B1 (fr) | Procede de codage multiple optimise | |
EP1593116A1 (fr) | Procede pour le traitement numerique differencie de la voix et de la musique, le filtrage de bruit, la creation d'effets speciaux et dispositif pour la mise en oeuvre dudit procede | |
WO2005106852A1 (fr) | Procede et systeme ameliores de conversion d'un signal vocal | |
EP0428445B1 (fr) | Procédé et dispositif de codage de filtres prédicteurs de vocodeurs très bas débit | |
FR2784218A1 (fr) | Procede de codage de la parole a bas debit | |
WO2004088633A1 (fr) | Procede d'analyse d'informations de frequence fondamentale et procede et systeme de conversion de voix mettant en oeuvre un tel procede d'analyse | |
FR2653557A1 (fr) | Appareil et procede pour le traitement de la parole. | |
EP1192619B1 (fr) | Codage et decodage audio par interpolation | |
EP0616315A1 (fr) | Dispositif de codage et de décodage numérique de la parole, procédé d'exploration d'un dictionnaire pseudo-logarithmique de délais LTP, et procédé d'analyse LTP | |
WO2023165946A1 (fr) | Codage et décodage optimisé d'un signal audio utilisant un auto-encodeur à base de réseau de neurones | |
EP1192621B1 (fr) | Codage audio avec composants harmoniques | |
EP1192618B1 (fr) | Codage audio avec liftrage adaptif | |
EP1190414A1 (fr) | Codage et decodage audio avec composantes harmoniques et phase minimale | |
EP1194923B1 (fr) | Procedes et dispositifs d'analyse et de synthese audio | |
EP1192620A1 (fr) | Codage et decodage audio incluant des composantes non harmoniques du signal | |
FR2773653A1 (fr) | Dispositifs de codage/decodage de donnees, et supports d'enregistrement memorisant un programme de codage/decodage de donnees au moyen d'un filtre de ponderation frequentielle | |
WO2013135997A1 (fr) | Modification des caractéristiques spectrales d'un filtre de prédiction linéaire d'un signal audionumérique représenté par ses coefficients lsf ou isf | |
FR2737360A1 (fr) | Procedes de codage et de decodage de signaux audiofrequence, codeur et decodeur pour la mise en oeuvre de tels procedes | |
WO2002029786A1 (fr) | Procede et dispositif de codage segmental d'un signal audio | |
FR2739482A1 (fr) | Procede et dispositif pour l'evaluation du voisement du signal de parole par sous bandes dans des vocodeurs | |
FR2980620A1 (fr) | Traitement d'amelioration de la qualite des signaux audiofrequences decodes |
Legal Events
- AK: Designated states (kind code of ref document: A1). Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW
- AL: Designated countries for regional patents (kind code of ref document: A1). Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG
- 121: EP — the EPO has been informed by WIPO that EP was designated in this application
- DFPE: Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
- WWE: WIPO information, entry into national phase. Ref document numbers: 2000949622 (country: EP); 10019914 (country: US)
- WWP: WIPO information, published in national office. Ref document number: 2000949622 (country: EP)
- REG: Reference to national code. Country: DE; legal event code: 8642
- NENP: Non-entry into the national phase. Country: JP
- WWW: WIPO information, withdrawn in national office. Ref document number: 2000949622 (country: EP)