WO2007041789A1 - Front-end processing of speech signals - Google Patents
Front-end processing of speech signals
- Publication number
- WO2007041789A1 (PCT/AU2006/001498)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frame
- frames
- speech
- speech signal
- noise
- Prior art date: 2005-10-11
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/20—Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
Definitions
- The invention concerns a method of processing speech signals, for example (but not limited to) processing speech signals as part of a speech recognition, speaker verification, speech enhancement or speech coding system.
- The invention also concerns software and a computer system to perform the method of processing speech signals.
- Speech recognition systems comprise a few basic components, as shown in Figure 1.
- The 'front-end' 10 is the first stage of the system and uses signal processing techniques to derive a compact representation of a frame (a single short segment) of speech.
- Three broad classes of techniques can be used to improve the overall recognition performance:
- Model-space techniques: these are based in the back-end 12 and can produce some useful improvement; however, they remain limited by the quality of the features received from the front-end.
- The invention provides a method for front-end processing of speech signals comprising the following steps: dividing the speech signal into frames; filtering the frames of the speech signal into frequency bands to produce filtered outputs for each frame; and deriving a noise estimate for each frequency band of the speech signal and weighting the filtered outputs of each frame with a function derived from the filtered outputs and noise estimates, to emphasise outputs that are less affected by noise.
- This invention provides good performance in recognising speech spoken in noisy environments at a reduced processing load, making deployment in many practical situations (e.g. handheld devices) feasible.
- The frequency filtering is based on the Mel frequency scale.
- The step of dividing the speech signal into frames may comprise calculating an energy function value for candidate frames within a predetermined time window and selecting a candidate frame whose energy function value meets predetermined criteria to be a frame for further processing, such as feature extraction of the speech signal.
- The method may further comprise the steps of: applying a discrete cosine transformation to frames having weighted frequency filtered outputs; and mapping the discrete cosine transformed frames to a predetermined probability density function and disregarding the mapped frames from a tail region of the distribution in further processing of the speech signal.
- Selecting the frame may be based on the position in time of the previous frame.
- The frames may be in time order.
- The predetermined time window may have predetermined minimum and maximum times at which the candidate frames may start.
- The given time window may be based on the position of the previous frame.
- The time differences between candidate frames may be predetermined. Two or more of the selected frames of the speech signal may overlap in time.
- The predetermined criterion may be that the candidate frame has the optimum energy function value; this may be a minimum or a maximum value.
- The predetermined criterion may be that the candidate frame has the largest absolute difference in energy function value from the previous frame.
- The energy function may be the log energy of the frame. The energy of each candidate frame can be computed from the energy value of the previous candidate frame, which makes the calculation of the energy values of all the candidate frames computationally inexpensive.
- The noise estimate of the speech signal is determined from a part of the speech signal that does not include any speech.
- The filtered outputs of a frame comprise a magnitude value for every frequency band in the frame, and the filtered noise estimates comprise a magnitude value for every frequency band in the noise estimate.
- The noise estimate may be derived for each frame.
- The step of weighting the filtered outputs may comprise subtracting, from the magnitude value of a frequency band of a filtered frame, the magnitude value of the noise estimate in that frequency band.
- The function used for weighting may include a first weighting factor that is dependent on the filtered outputs and noise estimates of multiple frequency bands, and may be based on the ratio of the signal-to-noise ratio of a particular filterbank output to the sum of the signal-to-noise ratios of all the filterbank outputs.
- The noise estimate for each frequency band may be derived dynamically for each frame.
- The step of weighting frequency filtered outputs may comprise scaling the magnitude value of a frequency filtered frame and adding an offset to the scaled value.
- The step of weighting frequency filtered frames after logarithmic compression may comprise weighting by a function that is dependent on the signal-to-noise ratios of frequency filtered outputs at multiple frequency bands.
- The weighting function is calculated dynamically for different frames of speech.
- The tail region from which mapped frames are removed may be the left tail region of the distribution.
- The method may be performed at the front-end of a speech recognition system. This has the distinct advantage of being cepstral-based, meaning that it fits well into the paradigm of distributed speech recognition (DSR), where international standards are available for leveraging the application of speech recognition on mobile and handheld devices.
- The invention also provides a method of dividing a speech signal into frames for further front-end processing of the speech signals, the method comprising the following steps: calculating an energy function value for candidate frames within a predetermined time window and selecting a candidate frame whose energy function value meets predetermined criteria to be a frame for further processing.
- The invention provides software to perform the method described above.
- The invention provides a computer system programmed to perform the method described above.
- The computer system may be a distributed system.
- The invention provides software to perform the method described in any one of the preceding claims.
- Figure 1 is a block diagram of a speech recognition system (prior art);
- Figure 2 is a flowchart of the method of processing speech signals;
- Figure 3 schematically shows candidate frames within a given time window;
- Figure 4 shows the calculated Mel-filterbank values for a frame;
- Figure 5 is a schematic diagram of a computer system used for pre-processing of speech signals in accordance with the present invention; and
- Figure 6 shows graphically some experimental results of the invention.
- A speech signal y(t) 30 is provided as input to the 'front-end' of the speech recognition system. Pre-processing and dividing the signal into frames (i.e. framing) are then applied to the speech signal y(t) according to an energy-search variable frame rate (VFR) analysis 32.
- Energy-search VFR analysis seeks to estimate the optimum relative position in time of the next frame of speech, $\hat{k}$, by maximising the difference in log energy between the current frame and the possible next frame.
- The optimum position of the next frame is determined by an energy search. It is predetermined that the next frame will start somewhere in a time window defined by $K_{min}$ 52 and $K_{max}$ 54. The time increment between possible candidate frames is also predetermined. An energy search is then conducted on the time interval between $K_{min}$ 52 and $K_{max}$ 54 by calculating the energy $E_{l+1}(k)$ of each candidate frame that starts between $K_{min}$ 52 and $K_{max}$ 54, in order to determine the next frame of speech $\hat{k}$.
- The energy search is performed according to:

  $\hat{k} = \arg\max_{K_{min} \le k \le K_{max}} \left| \log E_{l+1}(k) - \log E_l \right|$ (1)

- $k$ is the candidate frame advance relative to the current frame position, in samples
- $E_l$ is the energy value of the current frame
- $E_{l+1}(k)$ is the energy value of the candidate next frame
- $l$ is the frame index
- $K_{min}$ and $K_{max}$ are the minimum and maximum admissible values of frame advance, in samples.
- The energy is calculated according to the usual sum-of-squares formula, except that the energy of the next frame is dependent upon the candidate frame advance k, i.e. $E_{l+1}(k) = \sum_{n=1}^{N} x^2(n+k)$, where N is the number of sample points in a frame and k is defined relative to the beginning of the current (l-th) frame, as shown in Figure 3.
- $E_{l+1}(k+1) = E_{l+1}(k) + x^2(N+k+1) - x^2(k)$ (3)
- Equation (1) might alternatively comprise some other function of the frame energies.
- The estimated next frame $\hat{k}$ is usually the candidate frame with the highest average amplitude of the speech signal within that frame, compared to the other candidate frames.
- The energy-search VFR analysis (1) is then repeated, with the selected next frame $\hat{k}$ becoming the current (l-th) frame position.
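- To make the framing procedure concrete, the following Python sketch implements the energy-search VFR analysis described above. It is a minimal illustration under stated assumptions: the frame length, window bounds and candidate step size are example values (not from the patent), and the function and variable names are hypothetical.

```python
import numpy as np

def vfr_next_frame(x, cur_start, N=400, k_min=80, k_max=400, step=16):
    """Energy-search VFR framing: choose the candidate frame advance k
    in [k_min, k_max] that maximises the absolute difference in log
    energy from the current (l-th) frame, per equation (1)."""
    cur = x[cur_start:cur_start + N]
    log_E_cur = np.log(np.sum(cur ** 2) + 1e-12)

    # Energy of the first candidate frame; subsequent candidates are
    # updated recursively (equation (3)) rather than re-summed, which is
    # what makes the search computationally inexpensive.
    E = np.sum(x[cur_start + k_min:cur_start + k_min + N] ** 2)

    best_k, best_diff = k_min, -np.inf
    for k in range(k_min, k_max + 1, step):
        diff = abs(np.log(E + 1e-12) - log_E_cur)
        if diff > best_diff:
            best_k, best_diff = k, diff
        # slide the candidate window forward by `step` samples
        for j in range(step):
            lo = cur_start + k + j          # sample leaving the window
            hi = lo + N                     # sample entering the window
            if hi < len(x):
                E += x[hi] ** 2 - x[lo] ** 2
    return cur_start + best_k
```

Because each candidate's energy is obtained from the previous candidate's by adding one sample and removing another, scanning the whole $[K_{min}, K_{max}]$ window costs little more than computing a single frame energy.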
- A Fast Fourier Transform (FFT) and Mel-frequency filtering 36 are then applied to each frame; that is, each frame is filtered by different frequency filters to produce, for each frequency band j in the frame, a magnitude (amplitude) value $Y_j$.
- Mel-filtering output weighting 38 is then applied to $Y_j$ for each estimated frame.
- Mel-filtering output weighting aims to adjust the $Y_j$ values in order to compensate for noise and improve the quality of the speech signal. The result is that, for the output magnitude of each j-th Mel filterbank, an enhanced value $m_j$ is determined.
- $\alpha_j$, $\beta_j$ and $\gamma_j$, all $\in (0,1)$, are parameters to adjust the noise compensation
- $Y_j$ is the output magnitude of the j-th Mel-filterbank
- $N_j$ is the noise estimate of the j-th Mel-filterbank output
- MAX[.] is a function which returns the maximum value of its arguments.
- $\alpha_j$ is basically calculated as the ratio of the SNR of a particular filterbank output to the sum of the SNRs of all the filterbank outputs.
- All the weighting factors are calculated dynamically, frame by frame, based on the noise estimates, e.g. derived from the first 10 frames of each speech utterance. A further enhancement updates the noise estimates dynamically based on online speech/non-speech classification of each frame of data.
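- A minimal sketch of this weighting stage is given below. The patent's exact weighting formula is not fully reproduced in this text, so the combination used here (spectral subtraction floored via the MAX[.] operation, followed by the SNR-ratio weight $\alpha_j$ described above) is an assumption consistent with the description, and the parameter names and values are hypothetical.

```python
import numpy as np

def noise_estimate(Y_frames, n_init=10):
    """Per-band noise magnitudes N_j, estimated from the first n_init
    frames of the utterance (assumed to contain no speech)."""
    return np.mean(Y_frames[:n_init], axis=0)

def weight_filterbank_outputs(Y, N, beta=0.1, eps=1e-12):
    """Weight the Mel-filterbank magnitudes Y_j of one frame.

    Sketch assumptions: (1) the noise estimate is subtracted from each
    band, floored at beta*Y_j by the MAX[.] operation; (2) each band is
    then scaled by alpha_j, the ratio of its SNR to the sum of the SNRs
    of all bands, so bands less affected by noise are emphasised."""
    m = np.maximum(Y - N, beta * Y)        # noise subtraction with floor
    snr = Y / (N + eps)                    # per-band signal-to-noise ratio
    alpha = snr / (np.sum(snr) + eps)      # SNR-ratio weighting factor
    return alpha * m

# usage, with Y_frames of shape (num_frames, num_bands):
# N = noise_estimate(Y_frames)
# m = weight_filterbank_outputs(Y_frames[0], N)
```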
- A Discrete Cosine Transform (DCT) 40 is then applied to the $m_j$ values in a linear form.
- Cumulative Distribution Mapping (CDM) 42 is then applied to the enhanced output features.
- The main idea of this method 42 is to map the distribution of the noisy speech features onto a target distribution with a pre-defined probability density function (PDF).
- $F_v(v)$ is the corresponding cumulative distribution function (CDF) of a given set of speech features
- $F_z(z)$ is the target CDF, so each feature value $v$ is mapped to $z = F_z^{-1}(F_v(v))$
- $f_v(v)$ and $h(z)$ are the respective PDFs.
- The empirical CDF is estimated as $F_v(v_0) = T_{v_0}/T$, where $T_{v_0}$ is the number of frames whose corresponding feature values are less than a particular $v_0$ in an utterance and $T$ is the total number of frames in the utterance.
- A truncated Gaussian is used as the target distribution.
- The left tail region of the distribution of a normalized feature may not be very useful, as it represents mainly the range of noisier features.
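- The sketch below illustrates cumulative distribution mapping of a single feature dimension onto a truncated Gaussian target. The empirical CDF estimate (the fraction of frames below a given value) follows the description; the truncation points and the use of `scipy.stats.truncnorm` are illustrative assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

def cdm(feature, a=-2.0, b=4.0):
    """Map one feature dimension (one value per frame) onto a truncated
    standard Gaussian via CDF matching: z = F_z^{-1}(F_v(v)).

    a and b are the (assumed) truncation points of the target, in
    standard deviations; a finite a truncates the left tail, where the
    noisiest features lie."""
    T = len(feature)
    # empirical CDF F_v(v0): rank of each value among all T frames,
    # shifted into the open interval (0, 1)
    ranks = np.argsort(np.argsort(feature))
    F_v = (ranks + 0.5) / T
    # inverse CDF of the truncated Gaussian target
    return truncnorm(a, b).ppf(F_v)
```

Frames whose mapped values fall in the left tail region of the target can then be disregarded in further processing, as described above.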
- The front-end 80 includes an analog-to-digital (A/D) converter 84, a storage buffer 86, a central processing unit 88, a memory module 90, a data output interface 92 and a system bus 94.
- The memory module is a computer readable storage medium.
- Memory module 90 stores the hardwired or programmable code for a pre-processing and framing module 96, an energy-search variable frame rate (VFR) analysis module 98, a fast Fourier transform (FFT) and Mel-filtering module 100, a Mel-output weighting module 102, a discrete cosine transform (DCT) module 104 and a cumulative distribution mapping (CDM) module 106.
- A/D converter 84 receives an input speech signal 82 and converts the signal into digital data.
- The digital speech data are then stored in buffer 86, which in turn provides the data to the various component modules via system bus 94.
- CPU 88 accesses the speech data on system bus 94 and then processes the data according to the software instructions stored in memory module 90.
- CPU 88 filters the digital speech data and then divides the speech data into frames, preferably 25 ms long and overlapping by 10 ms.
- For each frame of speech data, CPU 88 decides whether the frame should be discarded according to the instructions and calculations stored in VFR analysis module 98 (step 32). If the speech frame is retained, CPU 88 estimates the frequency contents of this frame of speech data according to the instructions stored in FFT and Mel-filtering module 100 (step 36). CPU 88 then weights each Mel-filter output according to the instructions stored in Mel-output weighting module 102 (step 38) and applies de-correlation processing according to the instructions and parameters stored in DCT module 104 (step 40) to generate MFCCs.
- CPU 88 normalizes the MFCCs according to the instructions stored in CDM module 106 (step 42) and instructs output interface 92 to transmit the enhanced features, preferably wirelessly, to backend 12 where pattern matching occurs.
- CPU 88 repeats the above processing steps until all speech frames have been processed, then instructs output interface 92 to send a control signal to backend 12 to indicate the completion of front-end processing. Backend 12 responds to this control signal and sends recognition results 110 to an application.
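- Putting the stages together, the data flow through the front-end might be sketched as follows. This reuses the hypothetical helpers sketched earlier (`vfr_next_frame`, `noise_estimate`, `weight_filterbank_outputs`, `cdm`) and assumes a precomputed Mel filterbank matrix and conventional log-then-DCT cepstral extraction, so it illustrates the pipeline rather than the exact patented implementation.

```python
import numpy as np
from scipy.fftpack import dct

def frontend(x, mel_fbank, n_ceps=13, frame_len=400):
    """Illustrative pipeline: VFR framing -> FFT magnitude -> Mel
    filtering -> output weighting -> DCT -> CDM.  mel_fbank is a
    (num_bands, n_fft//2 + 1) matrix of Mel filter weights."""
    # 1. energy-search VFR framing (step 32)
    starts, pos = [], 0
    while pos + 2 * frame_len < len(x):
        starts.append(pos)
        pos = vfr_next_frame(x, pos, N=frame_len)

    # 2. FFT magnitude and Mel-filterbank outputs Y_j (step 36)
    Y = np.array([mel_fbank @ np.abs(np.fft.rfft(x[s:s + frame_len]))
                  for s in starts])

    # 3. Mel-output weighting, noise from the first 10 frames (step 38)
    N = noise_estimate(Y)
    m = np.array([weight_filterbank_outputs(frame, N) for frame in Y])

    # 4. log compression and DCT to cepstral coefficients (step 40)
    ceps = dct(np.log(m + 1e-12), type=2, axis=1, norm='ortho')[:, :n_ceps]

    # 5. cumulative distribution mapping per dimension (step 42)
    return np.stack([cdm(ceps[:, i]) for i in range(n_ceps)], axis=1)
```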
- This method is used at the 'front-end' of speech recognition, speech coding and speech identification systems.
- The likely end users are current users of speech recognition or speaker verification technology, particularly people whose occupation or physical abilities require, or are greatly enhanced by, this technology, as well as users of telephone spoken dialog systems (particularly where such users regularly phone in from noisy environments) and of other handheld devices that use speech recognition.
- The front-end method described above (i.e. the portion to the left of the dotted line in Fig. 1 marked 'front-end' 10) could be implemented with good computational efficiency on a standard mobile phone processor (e.g. an XScale processor), for example.
- The system could optionally be distributed, for example by performing the back-end operations (i.e. the portion to the right of the dotted line in Fig. 1 marked 'back-end' 12) on a central server, so that the complex back-end computations are not performed by the mobile phone processor.
- The features extracted by the front-end on the phone would be sent via a wireless network protocol to a central telephone exchange, where the back-end server is located.
- The cepstral-based front-end is by far the most popular choice in the field of speech recognition, which is the main reason why ETSI adopted this type of front-end into its standards.
- The 'Advanced Front-End': this provides substantially improved performance over the ETSI standard front-end, but with a very large (threefold) increase in computational complexity. This configuration is indicative of the state of the art in robust speech recognition at very high complexity.
- Each of the above methods 32, 38, 42 produces improvements over ETSI.
- The combined addition of the above methods 32, 38 and 42 creates only a very slight increase in computational complexity over the ETSI standard configuration.
- The combined system produces substantial improvements over ETSI, closely approaching the performance of the advanced front-end.
- The combined system creates only a slight increase in computational complexity over the ETSI configuration.
- The above method provides speech recognition accuracies exceeding the ETSI standard MFCC front-end at reasonable complexity, and very closely approaching the state of the art (see Table 1), for a reduction in computational complexity of at least threefold.
- The Aurora noisy digit database is a standard, very large, difficult recognition task used widely in the research community.
- Table 1: Average digit accuracies (%) for Aurora test sets, proposed front-end compared with ETSI MFCC front-ends, clean HMM set
- Processing blocks of the proposed front-end can be distributed across different physical locations, depending on the requirements of an application.
- One example is to have the static cepstral coefficients generated on the client side and then apply CDM on the server side upon receiving the static features.
- Compared with signal-space and model-space methods for noisy speech recognition, the proposed front-end has the following advantages:
- It is cepstral-based, and so fits well into the paradigm of distributed speech recognition (DSR).
- The ETSI advanced front-end represents the current state of the art in the field of noise robustness.
- The proposed front-end has a much lighter computational load and is very nearly as noise robust as the advanced front-end (see Table 1).
- Any suitable frequency scale known in the art can be used for filtering, such as the Bark frequency scale or a linear frequency scale.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2006301933A AU2006301933A1 (en) | 2005-10-11 | 2006-10-11 | Front-end processing of speech signals |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU200505604 | 2005-10-11 | ||
AU00505604 | 2005-10-11 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2007041789A1 true WO2007041789A1 (en) | 2007-04-19 |
Family
ID=37942224
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/AU2006/001498 WO2007041789A1 (en) | 2005-10-11 | 2006-10-11 | Front-end processing of speech signals |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2007041789A1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2450886A (en) * | 2007-07-10 | 2009-01-14 | Motorola Inc | Voice activity detector that eliminates from enhancement noise sub-frames based on data from neighbouring speech frames |
WO2010075789A1 (en) * | 2008-12-31 | 2010-07-08 | 华为技术有限公司 | Signal processing method and apparatus |
WO2015165264A1 (en) * | 2014-04-29 | 2015-11-05 | 华为技术有限公司 | Signal processing method and device |
CN108053837A (en) * | 2017-12-28 | 2018-05-18 | 深圳市保千里电子有限公司 | A kind of method and system of turn signal voice signal identification |
CN111462757A (en) * | 2020-01-15 | 2020-07-28 | 北京远鉴信息技术有限公司 | Data processing method and device based on voice signal, terminal and storage medium |
CN117388835A (en) * | 2023-12-13 | 2024-01-12 | 湖南赛能环测科技有限公司 | Multi-spelling fusion sodar signal enhancement method |
CN118016049A (en) * | 2022-11-10 | 2024-05-10 | 唯思电子商务(深圳)有限公司 | A closed-loop OTP verification system based on voice verification code |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4630304A (en) * | 1985-07-01 | 1986-12-16 | Motorola, Inc. | Automatic background noise estimator for a noise suppression system |
US5406635A (en) * | 1992-02-14 | 1995-04-11 | Nokia Mobile Phones, Ltd. | Noise attenuation system |
US6230122B1 (en) * | 1998-09-09 | 2001-05-08 | Sony Corporation | Speech detection with noise suppression based on principal components analysis |
US6411925B1 (en) * | 1998-10-20 | 2002-06-25 | Canon Kabushiki Kaisha | Speech processing apparatus and method for noise masking |
US6826528B1 (en) * | 1998-09-09 | 2004-11-30 | Sony Corporation | Weighted frequency-channel background noise suppressor |
2006
- 2006-10-11 WO PCT/AU2006/001498 patent/WO2007041789A1/en active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4630304A (en) * | 1985-07-01 | 1986-12-16 | Motorola, Inc. | Automatic background noise estimator for a noise suppression system |
US5406635A (en) * | 1992-02-14 | 1995-04-11 | Nokia Mobile Phones, Ltd. | Noise attenuation system |
US6230122B1 (en) * | 1998-09-09 | 2001-05-08 | Sony Corporation | Speech detection with noise suppression based on principal components analysis |
US6826528B1 (en) * | 1998-09-09 | 2004-11-30 | Sony Corporation | Weighted frequency-channel background noise suppressor |
US6411925B1 (en) * | 1998-10-20 | 2002-06-25 | Canon Kabushiki Kaisha | Speech processing apparatus and method for noise masking |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2450886A (en) * | 2007-07-10 | 2009-01-14 | Motorola Inc | Voice activity detector that eliminates from enhancement noise sub-frames based on data from neighbouring speech frames |
GB2450886B (en) * | 2007-07-10 | 2009-12-16 | Motorola Inc | Voice activity detector and a method of operation |
US8909522B2 (en) | 2007-07-10 | 2014-12-09 | Motorola Solutions, Inc. | Voice activity detector based upon a detected change in energy levels between sub-frames and a method of operation |
WO2010075789A1 (en) * | 2008-12-31 | 2010-07-08 | 华为技术有限公司 | Signal processing method and apparatus |
CN101770775B (en) * | 2008-12-31 | 2011-06-22 | 华为技术有限公司 | Signal processing method and device |
RU2688259C2 (en) * | 2014-04-29 | 2019-05-21 | Хуавэй Текнолоджиз Ко., Лтд. | Method and device for signal processing |
US11081121B2 (en) | 2014-04-29 | 2021-08-03 | Huawei Technologies Co., Ltd. | Signal processing method and device |
US12249339B2 (en) | 2014-04-29 | 2025-03-11 | Huawei Technologies Co., Ltd. | Signal processing method and device |
RU2656812C2 (en) * | 2014-04-29 | 2018-06-06 | Хуавэй Текнолоджиз Ко., Лтд. | Signal processing method and device |
US10186271B2 (en) | 2014-04-29 | 2019-01-22 | Huawei Technologies Co., Ltd. | Signal processing method and device |
WO2015165264A1 (en) * | 2014-04-29 | 2015-11-05 | 华为技术有限公司 | Signal processing method and device |
US10347264B2 (en) | 2014-04-29 | 2019-07-09 | Huawei Technologies Co., Ltd. | Signal processing method and device |
US10546591B2 (en) | 2014-04-29 | 2020-01-28 | Huawei Technologies Co., Ltd. | Signal processing method and device |
US11881226B2 (en) | 2014-04-29 | 2024-01-23 | Huawei Technologies Co., Ltd. | Signal processing method and device |
US9837088B2 (en) | 2014-04-29 | 2017-12-05 | Huawei Technologies Co., Ltd. | Signal processing method and device |
US11580996B2 (en) | 2014-04-29 | 2023-02-14 | Huawei Technologies Co., Ltd. | Signal processing method and device |
CN108053837A (en) * | 2017-12-28 | 2018-05-18 | 深圳市保千里电子有限公司 | A kind of method and system of turn signal voice signal identification |
CN111462757A (en) * | 2020-01-15 | 2020-07-28 | 北京远鉴信息技术有限公司 | Data processing method and device based on voice signal, terminal and storage medium |
CN111462757B (en) * | 2020-01-15 | 2024-02-23 | 北京远鉴信息技术有限公司 | Voice signal-based data processing method, device, terminal and storage medium |
CN118016049A (en) * | 2022-11-10 | 2024-05-10 | 唯思电子商务(深圳)有限公司 | A closed-loop OTP verification system based on voice verification code |
CN117388835A (en) * | 2023-12-13 | 2024-01-12 | 湖南赛能环测科技有限公司 | Multi-spelling fusion sodar signal enhancement method |
CN117388835B (en) * | 2023-12-13 | 2024-03-08 | 湖南赛能环测科技有限公司 | Multi-spelling fusion sodar signal enhancement method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108281146B (en) | Short voice speaker identification method and device | |
JP4218982B2 (en) | Audio processing | |
US7725314B2 (en) | Method and apparatus for constructing a speech filter using estimates of clean speech and noise | |
EP2431972B1 (en) | Method and apparatus for multi-sensory speech enhancement | |
US6253175B1 (en) | Wavelet-based energy binning cepstal features for automatic speech recognition | |
EP1500087B1 (en) | On-line parametric histogram normalization for noise robust speech recognition | |
US7181390B2 (en) | Noise reduction using correction vectors based on dynamic aspects of speech and noise normalization | |
JP3154487B2 (en) | A method of spectral estimation to improve noise robustness in speech recognition | |
EP1569422A2 (en) | Method and apparatus for multi-sensory speech enhancement on a mobile device | |
US7027979B2 (en) | Method and apparatus for speech reconstruction within a distributed speech recognition system | |
US20060253285A1 (en) | Method and apparatus using spectral addition for speaker recognition | |
WO2002029782A1 (en) | Perceptual harmonic cepstral coefficients as the front-end for speech recognition | |
WO2007041789A1 (en) | Front-end processing of speech signals | |
CN108922543A (en) | Model library method for building up, audio recognition method, device, equipment and medium | |
CN108053842B (en) | Short wave voice endpoint detection method based on image recognition | |
US20110066426A1 (en) | Real-time speaker-adaptive speech recognition apparatus and method | |
Siam et al. | A novel speech enhancement method using Fourier series decomposition and spectral subtraction for robust speaker identification | |
US20020010578A1 (en) | Determination and use of spectral peak information and incremental information in pattern recognition | |
EP1465153A2 (en) | Method and apparatus for formant tracking using a residual model | |
KR20170088165A (en) | Method and apparatus for speech recognition using deep neural network | |
El-Henawy et al. | Recognition of phonetic Arabic figures via wavelet based Mel Frequency Cepstrum using HMMs | |
Faycal et al. | Comparative performance study of several features for voiced/non-voiced classification | |
Ondusko et al. | Blind signal-to-noise ratio estimation of speech based on vector quantizer classifiers and decision level fusion | |
AU2006301933A1 (en) | Front-end processing of speech signals | |
Park et al. | Noise robust feature for automatic speech recognition based on mel-spectrogram gradient histogram. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2006301933 Country of ref document: AU |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2006301933 Country of ref document: AU Date of ref document: 20061011 Kind code of ref document: A |
|
WWP | Wipo information: published in national office |
Ref document number: 2006301933 Country of ref document: AU |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 06790368 Country of ref document: EP Kind code of ref document: A1 |