CN112562699A

CN112562699A - Voice processing method and device

Info

Publication number: CN112562699A
Application number: CN201910918220.5A
Authority: CN
Inventors: 陈昭纶; 李安正; 黄立维
Original assignee: Acer Inc
Current assignee: Acer Inc
Priority date: 2019-09-26
Filing date: 2019-09-26
Publication date: 2021-03-26
Anticipated expiration: 2039-09-26
Also published as: CN112562699B

Abstract

The invention provides a voice processing method and a device thereof. The method comprises the following steps: in a Multi-Excitation Linear Prediction (MELP) speech coding system, obtaining a speech sample signal frame and estimating the signal quality of the speech sample signal frame; determining a particular LPC order used by a Linear Prediction Coding (LPC) circuit based on the signal quality; controlling the LPC circuit to convert the voice sampling signal frame into line spectrum pair parameters based on the specific LPC order; replacing the speech signal spectrum of the speech sample signal frame with the line spectrum pair parameters to generate a predicted speech signal; and performing a speech encoding operation and a signal synthesizing operation of the MELP speech encoding system based on the predicted speech signal.

Description

Voice processing method and device

Technical Field

The present invention relates to a speech processing method and a device thereof, and more particularly, to a speech processing method and a device thereof for adaptively adjusting an order of Linear Prediction Coding (LPC).

Background

The development of 5th-generation (5G) mobile communication has driven related industrial applications of the Internet of Things (IoT), especially low-power, low-data-rate applications.

A Multi-Excitation Linear Prediction (MELP) speech coding system is a set of low bit rate speech coding and decoding systems, which is widely applied to a plurality of digital broadcasting, wireless communication and network systems. However, for mobile communication and related applications of the internet of things, the MELP speech coding system does not take into account the signal quality in the actual environment, which results in poor speech synthesis effect due to excessive noise influence when reconstructing and synthesizing speech signals. Moreover, the distortion rate caused by this method also has a negative effect on the voice quality.

Disclosure of Invention

In view of the above, the present invention provides a speech processing method and apparatus thereof, which can solve the above technical problems.

The invention provides a voice processing method, which comprises the following steps: in a multi-excitation linear prediction speech coding system, obtaining a speech sample signal frame and estimating a signal quality of the speech sample signal frame, wherein the multi-excitation linear prediction speech coding system comprises a linear prediction coding circuit; determining a specific linear predictive coding order used by the linear predictive coding circuit based on the signal quality; controlling a linear predictive coding circuit to convert the speech sample signal frame into a line spectrum pair parameter based on a specific linear predictive coding order; replacing a speech signal spectrum of the speech sample signal frame with the line spectrum pair parameters to generate a predicted speech signal; and performing a speech encoding operation and a signal synthesis operation of the multi-excitation linear predictive speech coding system based on the predicted speech signal.

The invention provides a voice processing device, which comprises a multi-excitation linear prediction voice coding system, a storage circuit and a processor. The storage circuit stores a plurality of modules. The processor is coupled to the storage circuit and accesses the modules to execute the following steps: in a multi-excitation linear prediction speech coding system, obtaining a speech sample signal frame and estimating a signal quality of the speech sample signal frame, wherein the multi-excitation linear prediction speech coding system comprises a linear prediction coding circuit; determining a specific linear predictive coding order used by the linear predictive coding circuit based on the signal quality; controlling a linear predictive coding circuit to convert the speech sample signal frame into a line spectrum pair parameter based on a specific linear predictive coding order; replacing a speech signal spectrum of the speech sample signal frame with the line spectrum pair parameters to generate a predicted speech signal; and performing a speech encoding operation and a signal synthesis operation of the multi-excitation linear predictive speech coding system based on the predicted speech signal.

Based on the above, the method and the apparatus of the present invention can adaptively determine the LPC order according to the signal quality of the speech sample frame, thereby improving the subsequent speech coding and signal synthesis effects and improving the audio quality.

In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.

Drawings

FIG. 1 is a schematic diagram of a speech processing apparatus according to an embodiment of the present invention;

FIG. 2 is a flow diagram illustrating a method of speech processing according to an embodiment of the present invention;

FIG. 3 is a graph illustrating the spectral distortion caused by an LPC circuit operating based on a fixed LPC order, according to an embodiment of the present invention.

Description of reference numerals:

100: speech processing device

102: memory circuit

104: MELP speech coding system

106: processor

311-314: curves

S210 to S250: steps

Detailed Description

FIG. 1 is a schematic diagram of a speech processing apparatus according to an embodiment of the invention. As shown in FIG. 1, the speech processing apparatus 100 includes a memory circuit 102, a MELP speech coding system 104 and a processor 106. In various embodiments, the speech processing apparatus 100 is, for example, an Internet of Things device (e.g., a Narrowband IoT (NB-IoT) device) that can receive voice signals and perform the desired signal processing operations on them, or a portable mobile communication device that can perform low bit rate, low power audio encoding and decoding, but the invention is not limited thereto.

In various embodiments, the Memory circuit 102 is, for example, any type of fixed or removable Random Access Memory (RAM), Read-Only Memory (ROM), Flash Memory (Flash Memory), hard disk, or the like, or a combination thereof, and can be used to record a plurality of program codes or modules.

The processor 106 is coupled to the memory circuit 102 and the MELP speech coding system 104, and may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, a controller, a microcontroller, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), any other type of integrated circuit, a state machine, an Advanced RISC Machine (ARM) based processor, or the like.

In the embodiment of the present invention, the processor 106 can access the modules and program codes recorded in the memory circuit 102 to implement the voice processing method of the present invention. Briefly, the speech processing apparatus 100 of the present invention can use the MELP speech coding system 104 to process the received speech signal, but the LPC order adopted by the LPC circuit in the MELP speech coding system 104 is adaptively determined based on the signal quality of the speech signal. This improves the subsequent speech coding and synthesis operations and, in turn, the audio quality. The detailed description is as follows.

Referring to fig. 2, a flowchart of a voice processing method according to an embodiment of the invention is shown. The method of the present embodiment can be executed by the speech processing apparatus 100 of fig. 1, and details of steps in fig. 2 will be described below with reference to elements shown in fig. 1.

First, in step S210, in the MELP speech coding system 104, the processor 106 may obtain a speech sample signal frame and estimate its signal quality. In this embodiment, the speech sample signal frame may include a plurality of sample signals generated by the processor 106 sampling an analog speech signal input by a user. The signal quality of the speech sample signal frame may be estimated by a signal quality estimation unit disposed in the MELP speech coding system 104 and may be characterized, for example, as the signal-to-interference-plus-noise ratio (SINR) of the speech sample signal frame, but the present invention is not limited thereto.
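The text does not say how the signal quality estimation unit computes the SINR. As a minimal sketch only, the function below assumes that a separate estimate of the interference-plus-noise power is available (for example, measured during non-speech frames); the name `estimate_sinr_db` and the `noise_power` parameter are hypothetical and do not come from the patent.

```python
import math


def estimate_sinr_db(frame, noise_power):
    """Estimate the SINR (in dB) of one speech sample signal frame.

    Assumption: `noise_power` is an externally supplied estimate of the
    mean interference-plus-noise power; the patent does not specify this.
    """
    if noise_power <= 0:
        raise ValueError("noise_power must be positive")
    # Mean power of the observed frame (speech plus interference and noise).
    frame_power = sum(x * x for x in frame) / len(frame)
    # Remove the noise contribution; clamp so the logarithm stays defined.
    signal_power = max(frame_power - noise_power, 1e-12)
    return 10.0 * math.log10(signal_power / noise_power)
```

For example, a frame with unit mean power and a noise power estimate of 0.01 yields roughly 20 dB.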

Thereafter, in step S220, the processor 106 may determine a specific LPC order used by the LPC circuit based on the signal quality. In this embodiment, the designer may define a plurality of preset signal quality intervals, each corresponding to a different LPC order, where a higher preset signal quality interval corresponds to a higher LPC order than a lower one. In this case, the processor 106 may find, among the preset signal quality intervals, the specific signal quality interval to which the signal quality belongs, and use the LPC order corresponding to that interval as the specific LPC order.

In an embodiment, the preset signal quality intervals and the corresponding LPC orders are exemplified in Table 1 below.

Preset signal quality interval	LPC order
SINR (dB) > 25	20
16 < SINR (dB) < 25	16
11 < SINR (dB) < 15	10
SINR (dB) < 10	8

TABLE 1

As shown in table 1, if the SINR of the speech sample frame is greater than 25dB, the LPC order corresponding to the frame is, for example, 20; if the SINR of the frame of the speech sample is between 16 and 25dB, the LPC order corresponding to the frame is, for example, 16; if the SINR of the frame of the speech sample is between 11 and 15dB, the LPC order corresponding to the frame is, for example, 10; if the SINR of the frame of the speech sample is less than 10dB, the LPC order corresponding to the frame is, for example, 8, but the invention is not limited thereto.

Therefore, in various embodiments, if the SINR of the speech sample signal frame is greater than 25dB, the processor 106 may determine the specific LPC order of the LPC circuit to be 20 based on Table 1; if the SINR is between 16 and 25dB, the processor 106 may determine the specific LPC order to be 16; if the SINR is between 11 and 15dB, the processor 106 may determine the specific LPC order to be 10; and if the SINR is less than 10dB, the processor 106 may determine the specific LPC order to be 8, but the invention is not limited thereto.
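The lookup of step S220 can be sketched as a small function. Table 1 leaves some values unassigned (for example, between 10 and 11 dB and between 15 and 16 dB), so the sketch closes those gaps by treating each interval's lower edge as an exclusive threshold. That choice, and the name `select_lpc_order`, are assumptions of this sketch and are not specified in the text.

```python
def select_lpc_order(sinr_db):
    """Map a frame's SINR (dB) to an LPC order following Table 1.

    Assumption: values that fall between the intervals of Table 1 are
    assigned to the next lower interval's neighbor by simple thresholds.
    """
    if sinr_db > 25:
        return 20
    if sinr_db > 15:
        return 16
    if sinr_db > 10:
        return 10
    return 8
```

A frame at 30 dB would use order 20, a frame at 12 dB order 10, and a frame at 5 dB order 8, matching the rows of Table 1.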

In step S230, the processor 106 may control the LPC circuit to convert the speech sample signal frame into line spectrum pair (LSP) parameters based on the specific LPC order.
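For readers unfamiliar with line spectrum pairs, the following is a minimal sketch of one textbook way to turn a set of prediction coefficients into line spectral frequencies by root finding. It assumes the coefficients are already available and follow the convention A(z) = 1 - sum(a_k z^-k); the function name `lpc_to_lsf` and the tolerance `eps` are illustrative, and the patent's own circuit may work differently.

```python
import numpy as np


def lpc_to_lsf(a):
    """Convert prediction coefficients a_1..a_P to line spectral frequencies.

    Assumes A(z) = 1 - sum_k a_k z^-k. Returns P angles in (0, pi), sorted.
    """
    a = np.asarray(a, dtype=float)
    # Coefficients of A(z), padded with one zero to reach degree P + 1.
    A = np.concatenate(([1.0], -a, [0.0]))
    p_poly = A + A[::-1]  # symmetric (sum) polynomial
    q_poly = A - A[::-1]  # antisymmetric (difference) polynomial
    roots = np.concatenate((np.roots(p_poly), np.roots(q_poly)))
    angles = np.angle(roots)
    eps = 1e-6  # drops the trivial roots at z = 1 and z = -1
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])
```

With all coefficients zero (a flat spectrum) and order 2, the result is the evenly spaced pair pi/3 and 2*pi/3.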

In one embodiment, the processor 106 may determine whether the signal quality of the speech sample signal frame is higher than a preset threshold. If so, the processor 106 may control the LPC circuit to convert the speech sample signal frame into line spectrum pair parameters based on a first scheme; otherwise, the processor 106 may control the LPC circuit to convert the speech sample signal frame into line spectrum pair parameters based on a second scheme, where the first scheme and the second scheme differ in how the prediction error is generated.
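The threshold test can be sketched as follows. The claims describe the first scheme as subtracting the estimated signal from the frame and the second scheme as adding it, which is what `prediction_error` does. The default threshold of 15 dB is only an illustrative value, and the function names and the string labels "first" and "second" are assumptions of this sketch.

```python
def select_scheme(sinr_db, threshold_db=15.0):
    """Return 'first' when the SINR is above the threshold, else 'second'."""
    return "first" if sinr_db > threshold_db else "second"


def prediction_error(frame, estimate, scheme):
    """Compute e(n) from the frame s(n) and an estimated signal.

    first scheme:  e(n) = s(n) - estimate(n)
    second scheme: e(n) = s(n) + estimate(n)
    """
    if scheme == "first":
        return [s - s_hat for s, s_hat in zip(frame, estimate)]
    if scheme == "second":
        return [s + s_hat for s, s_hat in zip(frame, estimate)]
    raise ValueError("scheme must be 'first' or 'second'")
```

A frame whose SINR equals the threshold is not "higher than" it, so it falls under the second scheme.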

In various embodiments, the predetermined threshold may be determined by a designer according to the requirement. For convenience of illustration, the predetermined threshold is assumed to be 15dB, but it is only for example and not intended to limit the possible embodiments of the present invention. Accordingly, table 1 can be correspondingly adjusted to the examples of table 2 below.

TABLE 2

If the processor 106 controls the LPC circuit to convert the speech sample signal frame into line spectrum pair parameters based on the first scheme, the processor 106 may first obtain an estimated signal corresponding to the speech sample signal frame and subtract the estimated signal from the speech sample signal frame (denoted as s(n)) to produce a prediction error (denoted as e(n)).
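Written out under the conventional linear-prediction form, this subtraction reads as below. This is the textbook expression, offered as a sketch; the symbol $\hat{s}(n)$ for the estimated signal is an assumption, while the prediction coefficients $a_k$ and the LPC order $P$ follow the description of the first scheme. It is not necessarily the exact expression used in the embodiment.

```latex
\hat{s}(n) = \sum_{k=1}^{P} a_k \, s(n-k),
\qquad
e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{P} a_k \, s(n-k)
```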

In one embodiment, the estimated signal in the first scheme may be characterized as:

where a_k are the prediction coefficients, P is the specific LPC order, and −∞ < n < +∞. In this case, the prediction error may be characterized as

Furthermore, in another embodiment, the estimated signal in the second scheme may be characterized as:

where the terms indexed by k are the prediction coefficients, P is the specific LPC order, and −∞ < n < +∞. In this case, the prediction error may be characterized as

The processor 106 may then employ the Levinson-Durbin algorithm to generate the line spectrum pair parameters based on the prediction error and the specific LPC order. In the present embodiment, the details of the Levinson-Durbin algorithm corresponding to the first scheme and the second scheme are summarized in Table 3 below.

TABLE 3

In Table 3, E⁽⁰⁾ is, for example, the minimum mean square error, and G and R_i (0 ≤ i ≤ P) are, for example, gain parameters, but the present invention is not limited thereto.
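As general background, the following is a minimal sketch of the textbook Levinson-Durbin recursion, which solves for the prediction coefficients from a frame's autocorrelation values. It is not the scheme-specific variant of Table 3. The function names, the use of NumPy, and the convention that the estimated signal is sum(a_k s(n-k)) are assumptions of this sketch.

```python
import numpy as np


def autocorrelation(frame, order):
    """Return autocorrelation values R_0..R_order of one frame."""
    x = np.asarray(frame, dtype=float)
    return np.array([np.dot(x[: len(x) - i], x[i:]) for i in range(order + 1)])


def levinson_durbin(r, order):
    """Textbook Levinson-Durbin recursion.

    Returns (a, err): a[k-1] is the coefficient a_k and err is the final
    prediction error energy.
    """
    a = np.zeros(order)
    err = float(r[0])
    for i in range(order):
        if err <= 0.0:
            break  # silent or degenerate frame: keep remaining coefficients at 0
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        previous = a[:i].copy()
        a[:i] = previous - k * previous[::-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

In a pipeline, the order passed to `levinson_durbin` would be the specific LPC order chosen from the signal quality, and the frame's autocorrelation would be computed up to that same order.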

Next, in step S240, the processor 106 may replace the speech spectrum of the speech sample signal frame with the line spectrum pair parameters to generate a predicted speech signal. Also, in step S250, the processor 106 may perform a speech encoding operation and a signal synthesis operation of the MELP speech encoding system based on the predicted speech signal. In the embodiment of the present invention, step S250 may refer to the related specification of the MELP speech coding system in the prior art, which is not described herein again.

As can be seen from the above, since the LPC order (which is positively related to the signal quality of the speech sample frame) adopted in the present invention can be adaptively determined according to the signal quality of the speech sample frame, the subsequent speech coding and signal synthesis effects can be improved, thereby improving the audio quality.

From another perspective, the concept of the present invention can be broadly understood as adjusting the LPC circuit of a conventional MELP speech coding system so that it operates adaptively with the LPC order corresponding to the signal quality, rather than with a fixed LPC order, while the other circuits of the MELP speech coding system may follow the conventional design. These other circuits include, for example, a prefilter, a pitch search circuit, a bandpass voicing analysis circuit, a gain calculation circuit, a final pitch and voicing decision circuit, a line spectrum frequency quantization circuit, a gain/pitch/voicing/jitter quantization circuit, a Fourier magnitude calculation circuit, a forward error correction circuit, and the like, and the LPC circuit of the present invention may be disposed between the gain calculation circuit and the final pitch and voicing decision circuit, but is not limited thereto. Therefore, if the signal quality of the speech sample signal frame is poor, the present invention can adopt a correspondingly lower LPC order, which avoids the degradation of audio quality caused by excessive interpolation noise during operation of the LPC circuit and also reduces the related computation. If, on the other hand, the signal quality of the speech sample signal frame is good, the present invention can adopt a correspondingly higher LPC order, which improves the subsequent audio quality (e.g., lower spectral distortion).

Furthermore, in the embodiment using the second scheme for the Levinson-Durbin algorithm, the prediction error is characterized as

Therefore, the absolute value operation, which is computationally expensive, can be avoided in subsequent processing. The overall computation and the resulting delay can thus be reduced.

In addition, in order to demonstrate the effect of the present invention, the following is further illustrated with the aid of FIG. 3. Referring to fig. 3, a graph illustrating the spectral distortion caused by the LPC circuit operating based on a fixed LPC order is shown according to an embodiment of the present invention. In the present embodiment, the curves 311-314 correspond to LPC orders of 20, 16, 10, and 8, respectively. As can be seen from fig. 3, when the SINR is low (e.g., less than 11dB), using a higher LPC order will result in higher spectral distortion due to excessive interpolation noise, while using a lower LPC order may achieve lower spectral distortion. Moreover, when the SINR is high (for example, greater than 11dB), using a higher LPC order will result in lower spectral distortion due to better learning effect, while using a lower LPC order will result in higher spectral distortion due to poor learning effect.
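The text does not state how the spectral distortion plotted in FIG. 3 is measured. One common definition is the root-mean-square difference between two log spectra, sketched below for LPC envelopes. The function names, the FFT size, and the convention A(z) = 1 - sum(a_k z^-k) are assumptions of this sketch, not details taken from the patent.

```python
import numpy as np


def lpc_envelope_db(a, n_fft=512):
    """Log-magnitude envelope 1/|A(e^jw)| in dB for coefficients a_1..a_P."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    # Evaluate A on a uniform frequency grid from 0 to pi.
    magnitude = np.abs(np.fft.rfft(A, n_fft))
    return -20.0 * np.log10(magnitude)


def spectral_distortion_db(env_a_db, env_b_db):
    """Root-mean-square difference (in dB) between two log spectra."""
    diff = np.asarray(env_a_db) - np.asarray(env_b_db)
    return float(np.sqrt(np.mean(diff ** 2)))
```

Comparing the envelope obtained at each candidate LPC order against a reference envelope, for frames at different SINR values, would give curves of the kind the figure describes.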

Therefore, it can be seen that the fixed LPC order alone will not produce better spectrum distortion performance corresponding to various signal qualities. In contrast, the method and apparatus of the present invention can adaptively adopt different LPC orders according to the signal quality, thereby resulting in better spectrum distortion performance.

Taking FIG. 3 as an example, the designer may set the preset signal quality interval with SINR greater than 11dB to correspond to a higher LPC order (e.g., 20 and/or 16), and set the preset signal quality interval with SINR less than 11dB to correspond to a lower LPC order (e.g., 10 and/or 8). Thus, the present invention can employ a lower LPC order (e.g., 10 and/or 8) when the SINR is low (e.g., less than 11dB) and a higher LPC order (e.g., 20 and/or 16) when the SINR is high (e.g., greater than 11dB), thereby providing better audio quality across different signal qualities.

In summary, the LPC order (which is positively related to the signal quality of the speech sample frame) adopted in the present invention can be adaptively determined according to the signal quality of the speech sample frame, so that the subsequent speech coding and signal synthesis effects can be improved, thereby improving the audio quality.

Moreover, the invention can further select the first scheme or the second scheme to execute the Levinson-Durbin algorithm to obtain the line spectrum pair parameters according to the signal quality, thereby further reducing the operation amount and reducing the delay required by the operation.

Although the present invention has been described with reference to the above embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention.

Claims

1. A method of speech processing, comprising:

in a multi-excitation linear prediction speech coding system, obtaining a speech sample signal frame and estimating the signal quality of the speech sample signal frame, wherein the multi-excitation linear prediction speech coding system comprises a linear prediction coding circuit;

determining a particular linear prediction coding order used by the linear prediction coding circuit based on the signal quality;

controlling the linear predictive coding circuit to convert the speech sample signal frame into line spectrum pair parameters based on the specific linear predictive coding order;

replacing the speech signal spectrum of the speech sample signal frame with the line spectrum pair parameters to generate a predicted speech signal; and

performing a speech encoding operation and a signal synthesis operation of the multi-excitation linear prediction speech encoding system based on the predicted speech signal.

2. The method of claim 1, wherein the signal quality is characterized as a signal-to-interference-plus-noise ratio of the speech sample signal frame.

3. The method of claim 1, wherein deciding the particular linear prediction coding order used by the linear prediction coding circuit based on the signal quality comprises:

determining a specific signal quality interval to which the signal quality belongs in a plurality of preset signal quality intervals, wherein the preset signal quality intervals correspond to different linear prediction coding orders, and the linear prediction coding order corresponding to a higher one of the preset signal quality intervals is higher than another lower one of the preset signal quality intervals; and

taking the linear predictive coding order corresponding to the specific signal quality interval as the specific linear predictive coding order.

4. The method of claim 3, wherein the step of controlling the linear prediction coding circuit to convert the speech sample signal frame into the line spectrum pair parameters based on the specific linear prediction coding order comprises:

in response to determining that the signal quality of the speech sample signal frame is above a predetermined threshold, controlling the linear predictive coding circuit to convert the speech sample signal frame into the line spectrum pair parameter based on a first scheme;

and in response to determining that the signal quality of the speech sample signal frame is not higher than the preset threshold, controlling the linear predictive coding circuit to convert the speech sample signal frame into the line spectrum pair parameters based on a second scheme, wherein the first scheme and the second scheme are different in a manner of generating a prediction error.

5. The method of claim 4, wherein the step of controlling the linear prediction coding circuit to convert the speech sample signal frame into the line spectrum pair parameters based on the first scheme comprises:

obtaining an estimation signal corresponding to the speech sample signal frame, and subtracting the estimation signal from the speech sample signal frame to generate the prediction error; and

employing a Levinson-Durbin algorithm to generate the line spectrum pair parameters based on the prediction error and the specific linear prediction coding order.

6. The method of claim 4, wherein the step of controlling the linear prediction coding circuit to convert the speech sample signal frame into the line spectrum pair parameters based on the second scheme comprises:

obtaining an estimation signal corresponding to the frame of speech samples, and adding the estimation signal to the frame of speech samples to generate the prediction error; and

7. A speech processing apparatus, comprising:

a multi-excitation linear predictive speech coding system;

a storage circuit that stores a plurality of modules; and

a processor coupled to the storage circuit and accessing the plurality of modules to perform the following steps:

in the multi-excitation linear prediction speech coding system, acquiring a speech sampling signal frame and estimating the signal quality of the speech sampling signal frame, wherein the multi-excitation linear prediction speech coding system comprises a linear prediction coding circuit;

determining a specific linear predictive coding order used by the linear predictive coding circuit based on the signal quality;

replacing a speech signal spectrum of the speech sample signal frame with the line spectrum pair parameters to generate a predicted speech signal; and