WO1997009712A2 - Method and system for processing auditory signals - Google Patents

Info

Publication number
WO1997009712A2
Authority
WO
WIPO (PCT)
Prior art keywords
time
leading edge
maximum
signal
Prior art date
Application number
PCT/DK1996/000370
Other languages
French (fr)
Other versions
WO1997009712A3 (en
WO1997009712B1 (en
Inventor
Frank Uldall Leonhard
Original Assignee
Frank Uldall Leonhard
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Frank Uldall Leonhard
Priority to AU67856/96A (AU6785696A)
Priority to EP96928357A (EP0850472A2)
Publication of WO1997009712A2
Publication of WO1997009712A3
Publication of WO1997009712B1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band

Definitions

  • the present invention relates to a method and system for signal processing, by which method and system features representing distinct sound pictures in auditory signals are extracted from transients in auditory signals.
  • the result of the processing may be used for identification of sound or of speech signals or for quality measurement of audio products or systems, such as loudspeakers, hearing instruments or hearing aids, telecommunication systems, or for quality measurement of acoustic conditions.
  • the method of the present invention may also be used in connection with speech compression and decompression in narrow band telecommunication or speech storing systems.
  • the human ear has the ability to catch fast sound signals, detect sound frequency with great accuracy and differentiate between sound signals in complicated sound environments. For instance it is possible to understand what a singer is singing in an accompaniment of musical instruments.
  • transient component in an auditory signal in this invention may be interpreted as a fast change of the energy in an auditory signal, where the rise time of the energy change is at the most 3 ms, and a slower change of the energy level may be interpreted as a change of the quasi steady state component of an auditory signal.
  • the transient and the quasi steady state component in an auditory signal may be defined as follows:
  • the transient component in an auditory signal is the fast energy changes that may be detected by means of an envelope detection using a lowpass filter with a relatively high cutoff frequency in the range 50-1500 Hz, and preferably in the range 300-1500 Hz.
  • the quasi steady state component in an auditory signal is the energy level that may be detected by means of an envelope detection using a lowpass filter with a relatively low cutoff frequency in the range below 400 Hz, and preferably below 150 Hz.
  • the fast energy changes in the auditory signal may also be detected without the use of envelope detection or without the use of a low pass filter.
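The two envelope detections defined above can be illustrated with a minimal sketch. It assumes one-way rectification followed by a first-order lowpass filter; the filter order, the 16 kHz sampling rate, the 2 kHz test tone and the function names are illustrative assumptions and are not taken from the patent.

```python
import math

def one_pole_lowpass(x, cutoff_hz, fs_hz):
    """First-order IIR lowpass (the filter order is an assumption)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)
    y, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)
        y.append(state)
    return y

def envelope(x, cutoff_hz, fs_hz):
    """One-way (half-wave) rectification followed by lowpass filtering."""
    rectified = [max(s, 0.0) for s in x]
    return one_pole_lowpass(rectified, cutoff_hz, fs_hz)

fs = 16000  # assumed sampling rate in Hz
# 10 ms of silence, then a 2 kHz tone that switches on abruptly
signal = [0.0] * 160 + [math.sin(2 * math.pi * 2000 * n / fs) for n in range(480)]

transient = envelope(signal, 700.0, fs)     # relatively high cutoff (50-1500 Hz range)
quasi_steady = envelope(signal, 100.0, fs)  # relatively low cutoff (below 150 Hz)
```

With the higher cutoff the envelope follows the abrupt onset within a few samples, whereas the lower cutoff only tracks the slowly varying energy level.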
  • the nerve pulses launched from the cochlea are synchronised to the frequency of a sine tone if the frequency is less than about 1.4 kHz. If the frequency of the tone is higher than about 1.4 kHz the pulses are launched randomly and less than once per period. Therefore the auditory perceptive faculty is tone oriented in the range up to about 1.4 kHz and transient oriented above.
  • the frequency spectra of speech signals from human beings contain energy bands, called formants. These formants are carriers of outstanding transients, and if the formants are selected for transient analyses an important noise suppression may be obtained.
  • In WO 94/25958 it is described how the information held in the shape of pulses representing the fast energy changes in auditory signals is used for identifying distinct sound pictures, and in a preferred embodiment the shape of the leading edge of a pulse is determined by determining the pulse rise time or determining the slope variation. It is further preferred that the shape of the top part of the leading edge is determined, the top part starting at the point of the edge where the slope is maximum.
  • if the rise time of a pulse provided as an input to a filter is faster than the rise time of the impulse response of the filter, then the rise time of the output of the filter generated in response to the input pulse will be substantially equal to the rise time of the impulse response of the filter.
  • if the rise time of the input pulse is slower than the rise time of the impulse response of the filter, the rise time of the output of the filter generated in response to the input pulse will be substantially equal to the rise time of the input pulse.
  • the signal processing of sound signals in the cochlea may be simulated by a filter bank comprising a set of bandpass filters with different centre frequencies and that the bandwidths of these filters increase with increasing centre frequencies which again means that the rise times of the impulse responses of the filters increase with increasing centre frequencies.
  • the rise time of an output pulse generated by a corresponding filter of the filter bank will be substantially equal to the rise time of the impulse response of the filter when the rise time of the input pulse is faster than the rise time of the impulse response of the filter, and substantially equal to the rise time of the input pulse when the rise time of the input pulse is slower than the rise time of the impulse response of the filter.
  • the rise time of the input pulse may be determined by determination of the two filters A and B of the filter bank having the narrowest bandwidths of the filters of the bank generating output pulses in response to the input pulse with substantially identical rise times, as the rise time of the input pulse must be within the rise time range between the rise time of the impulse response of the filter A, B with the narrowest bandwidth and the rise time of the impulse response of the filter with the largest bandwidth that is also lower than the bandwidths of the filters A, B.
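The bracketing idea described above can be sketched as follows. This is an idealised model, not the patent's implementation: the output rise time is taken as exactly the larger of the input rise time and the filter's impulse-response rise time, and the function names, the tolerance `tol` and the example rise times are assumptions made for illustration.

```python
def output_rise_time(input_rise_ms, impulse_rise_ms):
    """Idealised rule: the slower of input and impulse response dominates."""
    return max(input_rise_ms, impulse_rise_ms)

def bracket_input_rise_time(output_rises_ms, impulse_rises_ms, tol=1e-9):
    """Bound the input rise time from per-filter output rise times.

    Filters whose outputs share the fastest rise time are following the
    input pulse; the remaining filters are limited by their own impulse
    response. Returns (lower, upper) in ms, or None when fewer than two
    filters agree, in which case no bracket can be formed.
    """
    fastest = min(output_rises_ms)
    pairs = list(zip(impulse_rises_ms, output_rises_ms))
    tracking = [r for r, o in pairs if abs(o - fastest) <= tol]
    limited = [r for r, o in pairs if abs(o - fastest) > tol]
    if len(tracking) < 2:
        return None
    upper = min(limited) if limited else None
    return max(tracking), upper

# Hypothetical impulse-response rise times of four filters, in ms
bank = [0.2, 0.4, 0.8, 1.6]
outputs = [output_rise_time(0.5, r) for r in bank]
bounds = bracket_input_rise_time(outputs, bank)
```

In this example the two filters with rise times 0.2 ms and 0.4 ms produce identical outputs, so the input rise time is bracketed between 0.4 ms and 0.8 ms.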
  • This rise time detection principle may be utilized by the auditory organs of living beings and this could explain why the bandwidths of the filters simulating cochlea sound processing are increasing with increasing centre frequencies.
  • sound speech signals may be generated by modulation of pulses in filters that modulate the shape of the pulses as described above.
  • Pulses to be modulated correspond to speech signals generated in the articulation channel, e.g. by the vocal cords, and the processing in the filters corresponds to the modulation performed by adjustment of the articulation channel according to the phoneme processed, whereby the filters modulate the shape of the pulses.
  • the time between pulses to be modulated should be sufficiently long to ensure that there is no interference between output pulses generated in response to different input pulses.
  • This object is accomplished by providing a method of processing an auditory signal to facilitate identification of abrupt energy changes within the auditory signal, which abrupt energy changes have a rise time of at the most 3 ms, and which abrupt energy changes can be perceived by an animal ear such as a human ear as representing a distinct sound picture.
  • the method comprises: deriving, from the auditory signal, a first signal comprising transient pulses corresponding to at least part of the abrupt energy changes, tracing or monitoring pulses in the first transient signal, determining local maxima of the transient pulses, and generating a second transient signal wherein the value of at least one determined local maximum of a pulse in the first transient signal is held at said maximum value for a predetermined period of time t_rfpr, thereby generating a corresponding pulse in the second transient signal, said predetermined period of time t_rfpr being of at the most 5 ms.
  • when pulses in a train of two or more successive pulses in the first transient signal are subjected to the above described holding procedure, and one or more of the pulses is/are located at a distance in time from a preceding pulse which is shorter than the predetermined period of time t_rfpr and has/have a local maximum greater than the local maximum of said preceding pulse, the hold of the local maximum of said preceding pulse is maintained until the occurrence of the subsequent, greater local maximum and is replaced by said subsequent, greater local maximum.
  • the predetermined period of time t_rfpr is shorter than or equal to 3 ms, or shorter than or equal to 2 ms. It is even more preferred that t_rfpr is shorter than or equal to 1 ms, or about 0.7 ms.
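The holding procedure described above can be sketched as a sample-by-sample loop. This is a minimal illustration under stated assumptions: the first transient signal is a list of samples, the hold time is expressed as a number of samples (`hold_samples`, a hypothetical parameter corresponding to t_rfpr multiplied by the sampling rate), and a local maximum is a sample that exceeds its left neighbour and is not smaller than its right neighbour.

```python
def hold_local_maxima(first, hold_samples):
    """Generate the second transient signal by holding local maxima.

    A local maximum is held for `hold_samples` samples. A later, greater
    local maximum arriving during the hold replaces the held value and
    restarts the hold; smaller ones are ignored until the hold expires.
    """
    second = []
    held, remaining = 0.0, 0
    for i, v in enumerate(first):
        is_peak = 0 < i < len(first) - 1 and first[i - 1] < v >= first[i + 1]
        if is_peak and (remaining == 0 or v > held):
            held, remaining = v, hold_samples
        if remaining > 0:
            second.append(held)
            remaining -= 1
        else:
            second.append(v)
    return second

single = hold_local_maxima([0, 1, 3, 1, 0, 0, 0, 0], hold_samples=3)
train = hold_local_maxima([0, 2, 0, 5, 0, 0, 0, 0], hold_samples=4)
```

At an assumed sampling rate of 16 kHz, a hold time of about 0.7 ms would correspond to roughly 11 samples.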
  • the shape of a pulse in the second transient pulse signal is an important feature for identification of the pulse.
  • the shapes of pulses in the second transient pulse signal are determined or identified, and preferably one or more distinct sound pictures is/are identified from the determined shape.
  • the shape of a pulse may be characterized by the pulse rise time, the form of the leading edge, the duration of the pulse, and/or the fall time or the form of the lagging edge, and it is preferred that the form of the leading edge is determined by determining rise time, slope and/or slope variation of at least part of the leading edge.
  • the frequency of the auditory signal is determined from the second transient signal based on the distance in time between succeeding leading edges of pulses in the signal.
  • the method includes selecting pulses where the shape of the leading edge has a maximum slope greater than a predetermined minimum value, thereby discarding pulses with a rather small maximum slope, which pulses may be considered as representing noise components in the process of identification or representation of distinct sound pictures of the auditory signal.
  • This object is accomplished by providing a method for selecting leading edges of transient pulses in a transient signal, said transient signal being derived from an auditory signal having abrupt energy changes with a rise time of at the most 3 ms, and which abrupt energy changes can be perceived by an animal ear such as a human ear as representing a distinct sound picture.
  • the method comprises: determining or measuring the maximum slope of a leading edge of a pulse in the transient signal, comparing the obtained maximum slope with a predetermined lower threshold value for maximum slopes of leading edges, and if the obtained maximum slope is equal to or greater than the predetermined lower threshold value, selecting said leading edge as a candidate to the leading edge of a pulse.
  • several leading pulse edges being candidates for a selected leading edge may be observed within a short period of time. Thus, it is preferred that if the transient signal comprises one or more subsequent pulse or pulses, the leading edge or edges of which is/are located within a distance in time from the selected candidate, which distance in time is shorter than a predetermined period of time, t_s, of at the most 4 ms, then the method further comprises: determining or measuring the maximum slope or slopes of the leading edge or edges of said subsequent pulse or pulses in the transient signal, comparing the obtained maximum slope or slopes of the subsequent leading edge or edges and the obtained maximum slope of the selected candidate with one another, determining which of said leading edges has the largest maximum slope, and selecting the leading edge with the largest maximum slope as the leading edge of a
  • the predetermined period of time t_s is shorter than or equal to 3.3 ms, or shorter than or equal to 2 ms, or even shorter than or equal to 1 ms.
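The candidate selection described above can be sketched as follows, assuming the leading edges have already been reduced to (time, maximum slope) pairs sorted by time. The data layout, the function name and the example values are illustrative assumptions.

```python
def select_leading_edge(edges, lower_threshold, t_s_ms):
    """Pick a leading edge from (time_ms, max_slope) pairs sorted by time.

    The first edge whose maximum slope reaches the lower threshold becomes
    the candidate. Among the candidate and all later edges closer than
    t_s_ms to it, the edge with the largest maximum slope is selected.
    """
    for i, (t0, slope0) in enumerate(edges):
        if slope0 >= lower_threshold:
            window = [e for e in edges[i:] if e[0] - t0 < t_s_ms]
            return max(window, key=lambda e: e[1])
    return None

# Hypothetical edges: (time in ms, normalised maximum slope)
edges = [(0.0, 0.01), (1.0, 0.2), (1.8, 0.5), (4.5, 0.9)]
selected = select_leading_edge(edges, lower_threshold=0.025, t_s_ms=2.0)
```

Here the edge at 1.0 ms is the first candidate, and the steeper edge at 1.8 ms replaces it because it lies within the 2 ms search time; the edge at 4.5 ms is outside the window and is left for a later selection.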
  • the method for selecting the leading edge of a second pulse in the transient signal further comprises: determining or measuring the maximum slope or slopes of the leading edge or edges of a pulse or pulses in the transient signal subsequent to the selected leading edge of the first pulse within a distance in time from the leading edge of the first pulse which is shorter than a predetermined period of time, t_ep, of at the most 4 ms, said time period t_ep being longer than or equal to the predetermined time period t_s, comparing the obtained maximum slope or slopes of the subsequent leading edge or edges with the obtained maximum slope of the leading edge of the fir
  • the method for selecting the leading edge of the second pulse in the transient signal further comprises: determining or measuring the maximum slope or slopes of one or more leading pulse edges located at a distance in time from the leading edge of the first pulse which is longer than or equal to the predetermined period of time, t_ep, reducing the required threshold value of the maximum slope below the maximum slope of the leading edge of the first pulse, and selecting the first leading edge with a maximum slope greater than the required threshold value as the leading edge of a second pulse, which second pulse may correspond to an abrupt energy change representing a distinct sound picture.
  • the required threshold value for the maximum slope is decreased as a function of time from the maximum slope of the leading edge of the first pulse down to the predetermined lower threshold value.
  • the required threshold value is decreased exponentially with a predetermined time constant t_c.
  • the predetermined period of time t_ep is shorter than or equal to 3.3 ms, or shorter than or equal to 2 ms, or even shorter than or equal to 1 ms.
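The time-dependent threshold described above can be sketched as follows. The text states only that the threshold stays at the previous maximum slope during t_ep and then decreases exponentially with time constant t_c down to the lower threshold; the exact decay law used here (a plain exponential clamped at the lower threshold) and all names are assumptions.

```python
import math

def required_threshold(dt_ms, d_m, d_em, t_ep_ms, t_c_ms):
    """Slope threshold dt_ms after the first selected edge.

    During t_ep the threshold equals the first edge's maximum slope d_m.
    Afterwards it decays exponentially (assumed form) and is never
    allowed to fall below the lower threshold d_em.
    """
    if dt_ms < t_ep_ms:
        return d_m
    decayed = d_m * math.exp(-(dt_ms - t_ep_ms) / t_c_ms)
    return max(decayed, d_em)

def select_next_edge(first_edge, later_edges, d_em, t_ep_ms, t_c_ms):
    """Return the first later (time_ms, max_slope) edge above the threshold."""
    t1, d_m = first_edge
    for t, slope in later_edges:
        if slope > required_threshold(t - t1, d_m, d_em, t_ep_ms, t_c_ms):
            return (t, slope)
    return None

later = [(1.0, 0.5), (2.5, 0.3), (4.0, 0.3)]
second = select_next_edge((0.0, 0.8), later, d_em=0.02, t_ep_ms=2.0, t_c_ms=1.0)
```

In this example the edges at 1.0 ms and 2.5 ms are rejected because the threshold is still above their slopes, and the edge at 4.0 ms is accepted once the threshold has decayed.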
  • the shape of a selected leading edge of a pulse may represent an important feature for identification or representation of the corresponding distinct sound picture. Thus, it is preferred that the shape of the selected leading edges of pulses is determined, and/or a distinct sound picture is identified from the determined shape.
  • the shape of the selected leading edge of a pulse is determined by the obtained maximum slope of the selected leading edge.
  • the rise time of a selected leading edge of a pulse may also represent an important feature for identification or representation of the corresponding distinct sound picture.
  • the shape of a selected leading edge of a pulse is characterised by the rise time of the edge where the rise time is determined as the time period from t_b to t_e, or by the shape of the leading edge in the time period from t_b to t_e, where t_b is the point in time where the slope of the leading edge has reached a threshold value for the beginning of the edge, d_b, the ratio of said threshold value d_b to the obtained maximum slope being predetermined, and t_e is the point in time where the slope of the leading edge has decreased from the maximum value to a threshold value for the end of the edge, d_e, the ratio of said threshold value d_e
  • the value of d_b is in the range of 30-100% of the obtained maximum slope, and the value of d_e is in the range of 30-90% of the obtained maximum slope.
  • the value of d_b may even more preferably be substantially equal to 50% or 100% of the obtained maximum slope, and the value of d_e may even more preferably be substantially equal to 70% of the obtained maximum slope.
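The rise-time measure from t_b to t_e can be sketched on a sampled leading edge. The sketch assumes that the slope is approximated by first differences, that the edge is given as a list of amplitude samples, and that the default ratios (d_b equal to 100% and d_e equal to 70% of the maximum slope) are used; function and parameter names are hypothetical.

```python
def edge_rise_time(amplitude, fs_hz, d_b_ratio=1.0, d_e_ratio=0.7):
    """Rise time in seconds between t_b and t_e of one leading edge.

    t_b: earliest sample, walking back from the maximum slope, where the
         slope is still at least d_b_ratio times the maximum slope.
    t_e: first sample after the maximum where the slope has fallen to
         d_e_ratio times the maximum slope or below.
    """
    slope = [b - a for a, b in zip(amplitude, amplitude[1:])]
    i_max = max(range(len(slope)), key=slope.__getitem__)
    d_m = slope[i_max]
    i_b = i_max
    while i_b > 0 and slope[i_b - 1] >= d_b_ratio * d_m:
        i_b -= 1
    i_e = i_max
    while i_e < len(slope) - 1 and slope[i_e] > d_e_ratio * d_m:
        i_e += 1
    return (i_e - i_b) / fs_hz

edge = [0, 1, 3, 7, 10, 11, 11]          # hypothetical edge samples
rise_full = edge_rise_time(edge, fs_hz=1000)                 # d_b = 100% of d_m
rise_half = edge_rise_time(edge, fs_hz=1000, d_b_ratio=0.5)  # d_b = 50% of d_m
```

Lowering d_b moves t_b earlier and therefore lengthens the measured rise time, which is why the ratio has to be fixed in advance for the measure to be comparable between edges.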
  • the transient signal from which the leading edge or edges is/are selected is a transient signal generated in accordance with one of the embodiments referring to generation of the second transient signal.
  • the system comprises means for deriving, from the auditory signal, a first signal comprising transient pulses corresponding to at least part of the abrupt energy changes, and means for generating a second transient signal from said first transient signal, said second signal generation means being adapted to hold the value of at least one local maximum of a pulse in the first transient signal at said maximum value for a predetermined period of time, t_rfpr, thereby generating a corresponding pulse in the second transient signal, said predetermined period of time t_rfpr being of at the most 5 ms.
  • t_rfpr is of at the most 1 ms or about 0.7 ms.
  • the invention also relates to a system for selecting leading edges of pulses in a transient signal, which signal represents abrupt energy changes within an auditory signal.
  • the system comprises means for determining or measuring the maximum slope of a leading edge of a pulse in the transient signal, means for comparing the obtained maximum slope with a predetermined lower threshold value for maximum slopes of leading edges, and means for, based on the result of said comparison, selecting a candidate to the leading edge of a pulse.
  • the means for determining or measuring the maximum slope of a leading edge of a pulse are further adapted to determine or measure the maximum slope or slopes of a leading edge or edges of one or more pulses subsequent to the selected candidate
  • the comparing means are further adapted for comparing the obtained maximum slope or slopes of the subsequent leading edge or edges and the obtained maximum slope of the selected candidate with one another
  • the selecting means are further adapted for, based on the result of said comparison, selecting the leading edge with the largest maximum slope.
  • any of the systems which comprises means for generating the second transient signal further comprises means for selecting leading edges of pulses in a transient signal in accordance with an embodiment of the present invention, the leading edges being selected from the second transient signal.
  • Fig. 1 shows a filter bank with N bandpass filters
  • Figs. 2 and 3 show transient detection signals of the speech signal "softkey" for two filters having different centre frequencies in a filter bank
  • Fig. 4 shows the transient detection signals of Fig. 3 of the vowel "i" as in key
  • Fig. 5 shows transient detection signals corresponding to the speech signal of Fig. 4, with the speech signal being processed according to a preferred embodiment of refractoriness period processing,
  • Fig. 6 shows transient detection signals corresponding to the speech signal of Fig. 4, with the speech signal being processed according to another preferred embodiment of refractoriness period processing,
  • Fig. 7 illustrates selection of a leading edge of a transient pulse according to a preferred embodiment of the invention
  • Fig. 8 illustrates the principles of determination of maximum slope and rise time of a leading edge of a transient pulse
  • Fig. 9 shows transient detection signals, including an edge signal and a measure of the pitch period, corresponding to the speech signal "softkey" pronounced by a female
  • Fig. 10 shows transient detection signals, including an edge signal and a measure of the pitch period, corresponding to the vowel "i" as in key,
  • Fig. 11 shows the edge signal of Fig. 10 filtered by a bandpass filter
  • Fig. 12 is a flow diagram illustrating a preferred embodiment of refractoriness period processing
  • Fig. 13 is a flow diagram illustrating a preferred embodiment of detection of a leading edge
  • Fig. 14 is a plot of the bandwidths of cochlea bandpass filters as a function of centre frequency
  • Fig. 15 is a plot of rise times of input and output pulses of a bandpass filter and of the impulse response of the filter.
  • the cochlea in the human ear can be regarded as an infinite number of bandpass filters, IBP, within the frequency range of the human ear.
  • a filter bank may be employed for detecting formants and thereby detecting the transient conditions that hold the most well qualified information with a substantial suppression of noise.
  • the bandwidth of the bandpass filters is chosen to be the same for all filters in order to obtain the same envelope.
  • Another choice might be to scale the bandwidth of the filters in accordance with the Bark scale or Mel scale.
  • Fig. 1 shows a filter bank with N bandpass filters, BP_1-BP_N, followed by an envelope detection performed by use of rectification means, R_1-R_N, and lowpass filters, LP_1-LP_N.
  • the rectification means are preferably one-way rectification means.
  • the filter bank has to cover the transient oriented frequency range, and the centre frequency of the bandpass filters has therefore to be from about 1.4 kHz and upwards. To be able to detect sufficiently fast transients the bandwidth has to be about 1.4 kHz.
  • In Figs. 2 and 3 the transient detection by means of a filter bank is illustrated.
  • Figs. 2 and 3 show processed curves for the word "softkey" pronounced by a female and detected by means of two different bandpass filters.
  • the abscissas represent a time interval of 1 s and the ordinates in Figs. 2a, 2b, 3a and 3b represent the sound pressure of the corresponding speech signal whereas the ordinates of Figs. 2c and 3c represent the energy of the corresponding speech signal.
  • the bandpass filters are Butterworth filters of 6th order with a bandwidth of 1.4 kHz.
  • the centre frequency is about 1.5 kHz with a lower cutoff frequency at about 0.8 kHz and an upper cutoff frequency at about 2.2 kHz.
  • the centre frequency is about 2.8 kHz with a lower cutoff frequency at about 2.1 kHz and an upper cutoff frequency at about 3.5 kHz.
  • the lowpass filter is a 1st order Butterworth filter with a cutoff frequency at 700 Hz, and the pretransient signal is the output signal from the bandpass filter.
  • the vowel "o” is very outstanding in the transient signal, but the other phonemes are very indistinct.
  • In Fig. 3c the vowel "o" is less outstanding but the other phonemes are much more distinct.
  • the conclusion may be drawn that the vowel "o" should preferably be detected from the transient signal processed by the bandpass filter with a centre frequency at 1.5 kHz, and the remaining phonemes should preferably be detected from the transient signal processed by the bandpass filter with a centre frequency at 2.8 kHz.
  • each branch can be regarded as a TSD (Transient Signal Detector).
  • the number of branches in the system depends on the demand on the system, but the number should be in the range of 2-40.
  • TSD1: the TSD used in connection with the results of Fig. 2 having a centre frequency at 1.5 kHz
  • TSD2: the TSD used in connection with the results of Fig. 3 having a centre frequency at 2.8 kHz
  • Fig. 1 then illustrates a TSD bank.
  • Important features of fast energy changes of an auditory signal for identifying or representing features that can be perceived by a human ear as representing a distinct sound picture may be the shape of the leading edge and the period between the leading edges.
  • This period is called the refractoriness period.
  • the nerve pulses launched from the cochlea are synchronized to the frequency of a sine tone if the frequency is less than about 1.4 kHz but not above this frequency.
  • This means that the refractoriness period of interest may be about 0.7 ms.
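The figure of about 0.7 ms is consistent with the period of a tone at the 1.4 kHz synchronisation limit mentioned above. As a quick check, assuming the refractoriness period is taken as one period of that tone:

```latex
T = \frac{1}{f} = \frac{1}{1.4\ \text{kHz}} \approx 0.71\ \text{ms}
```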
  • the refractoriness period may be used for simplifying the process of detecting the leading edge of a transient pulse in the transient component.
  • Fig. 4 shows part of the curves of Fig. 3 processed by TSD2. The curves shown in Fig. 4 represent the signals obtained for the vowel "i" as in key.
  • the transient signal of Fig. 4c is processed without a refractoriness period.
  • Figs. 5a and 6a are identical to Fig. 4a, and Figs. 5b and 6b are identical to Fig. 4b.
  • the transient signal of Fig. 5c which represents the energy of the corresponding speech signal is obtained from the bandpass filtered pretransient signal in Fig. 5b by way of a rectification and by using a refractoriness period of 1 ms.
  • the signal of Fig. 5c has not been subject to a lowpass filtration. It is preferred that the implementation of the refractoriness period is performed by using a software algorithm which is described below in connection with Fig. 12.
  • Fig. 6c shows a transient signal which represents the energy of the corresponding speech signal and which is obtained by performing a lowpass filtration on the signal of Fig. 5c.
  • All the signals of Figs. 4 and 5 hold the speech information and may easily be perceived by a human ear, although some noise is introduced during the process of transient detection resulting in the signals of Figs. 5c and 6c.
  • the abscissas represent a time interval of 50 ms.
  • the refractoriness period may be about 0.5 ms or longer but preferably less than the minimum pitch period, that means less than about 3.3 ms.
  • the shape of the leading edge may be one of the important features for representing a sound picture, and the maximum slope of the leading edge may be an important feature for the edge.
  • the maximum slope of the leading edge may be the basis for detecting the important features for identifying or representing a distinct sound picture.
  • In Fig. 7 the abscissa represents a time interval of 50 ms, and the signals of Figs. 7a, b and c correspond to the signals of Figs. 6a, b and c, whereas in Fig. 7d the differentiated signal of the signal of Fig. 7c, called differential signal, is shown.
  • d_em: a predetermined minimum value
  • the size of d_em may depend on how the signal is normalised.
  • the signals of Figs. 2-7 are normalised to the maximum numerical value in the whole signal, and d_em is preferably set to 2.5% of the maximum detected slope value.
  • d_em may be selected otherwise, and preferably higher.
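The normalisation and the choice of d_em described above can be sketched as follows. The sketch assumes that the differential signal is a first difference of the normalised samples and that the 2.5% figure is applied to the largest positive slope; both points, and the function names, are assumptions.

```python
def normalise(signal):
    """Scale the signal by its maximum numerical (absolute) value."""
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal] if peak else list(signal)

def differential(signal):
    """First difference, used here as a stand-in for the differential signal."""
    return [b - a for a, b in zip(signal, signal[1:])]

def lower_slope_threshold(signal, fraction=0.025):
    """d_em as a fraction of the maximum slope of the normalised signal."""
    slopes = differential(normalise(signal))
    return fraction * max(slopes)

d_em = lower_slope_threshold([0, 2, 4, 3, -1])
```

Because the threshold is tied to the normalised signal, rescaling the input by a constant factor leaves d_em unchanged.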
  • the maximum slope may be detected by finding a maximum greater than the threshold d_em and selecting this as a candidate to be the maximum slope of a leading edge, called d_m. If there is a greater maximum slope within a given search time, t_s, then this point is chosen as having the maximum slope of a leading edge; otherwise the candidate is chosen.
  • the search time t_s may be selected to be less than the minimum pitch period which means less than about 3.3 ms, but preferably around 2 ms.
  • the following leading edge may be detected as illustrated in Fig. 7d.
  • When the point for the maximum slope for a leading edge is detected, then for a time period, t_ep, only a maximum slope greater than the previous maximum slope will be accepted, in other words, in this time period the threshold for accepting a leading edge is equal to the previous maximum slope.
  • the threshold may be exponentially decreased with a time constant t_c, which is also illustrated in Fig. 7d.
  • the time period for t_ep may be less than the minimum pitch period, that means less than about 3.3 ms, but preferably between 1 and 2 ms. However, t_ep should be longer than or equal to the search time t_s.
  • the edge of a leading edge may be described as beginning at a point in time, t_b, where the slope has the maximum slope, or a point in time before the point with the maximum slope, where the slope has reached a threshold value, d_b, having a predetermined ratio to the maximum slope, and ending at the point, t_e, after the point with the maximum slope, where the slope has decreased to a threshold value, d_e, having a predetermined ratio to the maximum slope.
  • This principle is illustrated in Fig. 8, where the amplitude of the leading edge is shown as A in Fig. 8a, and the differential of the leading edge is shown as D in Fig. 8b.
  • In Figs. 9 and 10 an edge detection following the above defined edge detector principles is illustrated.
  • the abscissas in Fig. 9 represent a time interval of 1 s, while a time interval of 50 ms of the signals in Fig. 9 is represented in Fig. 10, in which time interval the signals for the vowel "i" in the word key are shown.
  • the transient signal of Figs. 9c and 10c has been processed in accordance with the signal presented in Fig. 6c, and a leading edge signal named edge signal, see Figs. 9d and 10d, has been obtained by determining the rise time of selected leading edges.
  • a graph of the pitch period between the selected edges is shown, Figs. 9e and 10e. If the pitch period is longer than 15 ms it is set equal to 15 ms. A low resolution is obtained in the printout of Fig. 9d due to a limited printer resolution.
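The pitch-period measure with its 15 ms cap can be sketched in a few lines, assuming the selected leading edges are available as a sorted list of times in milliseconds (the function name and the example times are hypothetical):

```python
def pitch_periods_ms(edge_times_ms, cap_ms=15.0):
    """Distance between successive selected edges, limited to cap_ms."""
    pairs = zip(edge_times_ms, edge_times_ms[1:])
    return [min(later - earlier, cap_ms) for earlier, later in pairs]

# Three voiced pitch pulses 5 ms apart, then a 30 ms pause before the next edge
periods = pitch_periods_ms([0.0, 5.0, 10.0, 40.0])
```

The cap keeps long pauses between voiced segments from dominating the plotted pitch-period curve.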
  • the transient signal detector TSD2 is used when processing the signals of Figs. 9 and 10.
  • the maximum slopes of pulses in the transient signal, Figs. 9c and 10c, are determined, and for the selected leading edges the starting point in time, t_b, of the edge is set equal to the point in time where the maximum slope is detected, i.e. d_b is equal to d_m
  • t_e is equal to the point in time where the slope has decreased to 70% of d_m, i.e. d_e is equal to 70% of d_m.
  • the part of the leading edge of a pulse in the transient signal corresponding to the time interval of t_b to t_e is represented as the leading edge of a pulse in the edge signal, Figs. 9d and 10d.
  • the edge signal holds the full speech information and may easily be perceived by a human ear, although some noise may be introduced during the processing.
  • the leading edge may be defined as beginning at a leading threshold value, d_b, greater than 50% of the maximum slope, but preferably equal to the maximum slope, and ending at a lagging threshold value, d_e, greater than 50% of the maximum slope, but preferably 70% of the maximum slope.
  • the rise time of the leading edge may be defined as the time period between t_b and t_e, and may in a preferred embodiment be used as representing a measure for the shape of the leading edge, and thus forming the basis for identification of a distinct sound picture.
  • the pulses of the edge signal may also be chosen as the basis for identification of a distinct sound picture.
  • the edge detector can be used as a pitch detector, but known techniques for pitch detection can also be applied.
  • the ⁇ hape of the leading edge of a ⁇ peech ⁇ ignal which ⁇ ignal may be a phoneme, may be considered a conclusive feature for narrow band communication. Therefore, only infor- mation about the leading edge, unvoiced or voiced, and/or pitch period, and/or loudnes ⁇ of the speech signal should need to be transmitted. Thu ⁇ , it ⁇ hould not be nece ⁇ ary to tran ⁇ mit information concerning the vocal filter, thereby ⁇ aving bandwidth.
  • Information about a ⁇ peech signal being unvoiced or voiced, and/or the pitch period and/or loudnes ⁇ of the speech signal may be compressed and decompres ⁇ ed by mean ⁇ of known tech ⁇ nology, in which ⁇ peech ⁇ ignals are framed in time periods of 20-40 ms, and only the change in the parameters need to be tran ⁇ mitted.
  • the leading edge may be compressed by identify ⁇ ing and representing the edge according to one of the embodi ⁇ ments of the present invention, for time frames of 20-40 m ⁇ by mean ⁇ of a template identification from a library or a book.
  • the speech signal may be decompres ⁇ ed by mean ⁇ of a library or book of edge template ⁇ with corresponding standard filters, which filters should be excited by the edge tem ⁇ plate. Otherwise the speech signal may be decompressed by mean ⁇ of a library or book, with ⁇ tandard wave form ⁇ iden- tified by means of the edge template identification.
  • Fig. 11 shows the edge signal of Fig. 10d filtered with the same bandpass filter used for processing the pretransient signal, Fig. 10b, i.e. the centre frequency is about 2.8 kHz with a lower cutoff frequency about 2.1 kHz and an upper cutoff frequency about 3.5 kHz.
  • the sound quality of the signal represented in Fig. 11 is improved when compared to the signal of Fig. 10d.
  • the signal of Fig. 11 may be compared with the pretransient signal of Fig. 10b.
  • the edge signal may be processed by means of a filter with another filter characteristic or by means of waveform decoding.
  • Fig. 12 shows a preferred embodiment of implementation of the refractoriness period.
  • the definitions of the flow chart variables of the process of Fig. 12 are given as follows:
  • PrvSi: value of previous input sample (Si(n-1), n > 0).
  • LeadingEdge: a Boolean variable; it is true if the sample is in a leading edge or in a refractoriness period, else it is false.
  • Fig. 13a shows a preferred embodiment of implementation of the edge detection principle.
  • d: differentiated transient signal (differential signal).
  • n: index for samples of the differential signal.
  • dprv: a help variable and mostly the previous sample of the differential signal.
  • dem: relative minimum threshold for the differential signal.
  • dm: maximum slope for the edge.
  • ts: search time in samples for the greatest local maximum of the slope greater than dm.
  • tm: sample no. for the detected maximum slope.
  • k: index for the detected edge.
  • thr: predetermined ratio of the threshold value for the slope at the beginning of the edge, db, to the maximum slope dm.
  • thrc: predetermined ratio of the threshold value for the slope at the end of the edge, de, to the maximum slope dm.
  • Fig. 15 illustrates that if the rise time of a pulse provided as an input to a filter is slower than the rise time of the impulse response of the filter, then the rise time of the output of the filter generated in response to the input pulse will be substantially equal to the rise time of the input pulse.
  • Signal processing of sound signals in the cochlea may be simulated by a filter bank comprising a set of bandpass filters with different centre frequencies, wherein the bandwidths of these filters increase with increasing centre frequencies, which again means that the rise times of the impulse responses of the filters increase with increasing centre frequencies.
  • the rise time of an output pulse generated by a corresponding filter of the filter bank will be substantially equal to the rise time of the impulse response of the filter when the rise time of the input pulse is faster than the rise time of the impulse response of the filter, and substantially equal to the rise time of the input pulse when the rise time of the input pulse is slower than the rise time of the impulse response of the filter.
  • the rise time of the input pulse may be determined by determination of the two filters A and B of the filter bank having the narrowest bandwidths of the filters generating output pulses in response to the input pulse with substantially identical rise times, as the rise time of the input pulse must be within the rise time range between the rise time of the impulse response of the filter A, B with the narrowest bandwidth and the rise time of the impulse response of the filter with the largest bandwidth that is also lower than the bandwidths of the filters A, B.
  • speech signals may be generated by modulation of pulses in a filter that modulates the shape of the pulses as described above.
  • Pulses to be modulated correspond to sound signals generated in the articulation channel, e.g. by the vocal cords, and the processing in the filters corresponds to the modulation performed by adjustment of the articulation channel according to the phoneme processed, whereby the filters modulate the shape of the pulses.
  • the time between pulses to be modulated should be sufficiently long to ensure that there is no interference between output pulses generated in response to different input pulses.
  • the shape of the leading edge and the rise time may both be conclusive features.
  • the leading edge may be detected as described above, and in a preferred embodiment the edge detection is based on a transient signal processed with a refractoriness period either without a lowpass filtering as shown in Fig. 5, or with a lowpass filter as shown in Fig. 6.
  • a phoneme may be identified by means of features, such as a classification of the shape of the leading edges, mean pitch period, variation of pitch periods, and/or dynamic trend of the edge height in a time frame of 10-100 ms.
  • the present invention is preferably implemented utilizing a programmed processor such as a microcomputer for real time applications but this is not to be limiting.
  • the present invention may also be implemented using a dedicated hardware processor if desired or by a more powerful mainframe computer without departing from the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention relates to a method and a system for processing an auditory signal to facilitate identification of abrupt energy changes within the auditory signal, which abrupt energy changes have a rise time of at the most 3 ms, and which abrupt energy changes can be perceived by a human ear as representing a distinct sound picture. The abrupt energy changes representing the distinct sound picture can be a phoneme. When processing the auditory signal a first signal comprising transient pulses corresponding to at least part of the abrupt energy changes is generated, and a second transient signal is generated by monitoring pulses in the first transient signal, determining local maxima of the transient pulses, and holding the value of at least one determined local maximum of a pulse in the first transient signal at said maximum value for a predetermined period of time thereby generating a corresponding pulse in the second transient signal. It is preferred that the predetermined period of time equals the refractoriness period of nerve pulses launched from the cochlea of the human ear. The shape of the pulses of the second signal may be used for identification of the corresponding distinct sound picture. The invention further relates to a method and a system for selecting leading edges of transient pulses derived from the abrupt energy changes in the auditory signal.

Description

METHOD AND SYSTEM FOR PROCESSING AUDITORY SIGNALS
The present invention relates to a method and system for signal processing, by which method and system features representing distinct sound pictures in auditory signals are extracted from transients in auditory signals. The result of the processing may be used for identification of sound or of speech signals or for quality measurement of audio products or systems, such as loudspeakers, hearing instruments or hearing aids, telecommunication systems, or for quality measurement of acoustic conditions. The method of the present invention may also be used in connection with speech compression and decompression in narrow band telecommunication or speech storing systems.
The human ear has the ability to catch fast sound signals, detect sound frequency with great accuracy and differentiate between sound signals in complicated sound environments. For instance it is possible to understand what a singer is singing to an accompaniment of musical instruments.
In WO 94/25958 the inventor of the present invention has argued that this is only possible because the human ear is able to detect very fast energy changes in auditory signals, e.g. transient pulses with a very short rise time, and to use the information held in the shape of pulses representing these fast energy changes for identifying distinct sound pictures. The present invention is based on the definitions given in WO 94/25958, which is hereby incorporated by reference.
In known technology the interpretation of a transient in a speech signal is a change of the signal between two stable sounds detected either as a slower change in the energy level of the signal, a technique disclosed in WO 83/01526, or as a change in the spectrum of the signal, a technique disclosed in GB 2213623. The term transient component in an auditory signal in this invention may be interpreted as a fast change of the energy in an auditory signal, where the rise time of the energy change is at the most 3 ms, and a slower change of the energy level may be interpreted as a change of the quasi steady state component of an auditory signal.
The transient and the quasi steady state component in an auditory signal may be defined as follows:
The transient component in an auditory signal is the fast energy changes that may be detected by means of an envelope detection using a lowpass filter with a relatively high cutoff frequency in the range 50-1500 Hz, and preferably in the range 300-1500 Hz.
The quasi steady state component in an auditory signal is the energy level that may be detected by means of an envelope detection using a lowpass filter with a relatively low cutoff frequency in the range below 400 Hz, and preferably below 150 Hz.
Here it should be noted that the fast energy changes in the auditory signal may also be detected without the use of envelope detection or without the use of a low pass filter.
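The two definitions above differ only in the cutoff frequency of the envelope detector. A minimal sketch of that difference follows, assuming one-way rectification and a first-order lowpass; the filter type, the 16 kHz sampling rate, the 2 kHz test tone and the function names are illustrative assumptions and are not taken from the description.

```python
import math

def envelope(samples, fs_hz, cutoff_hz):
    """One-way rectification followed by a first-order lowpass (filter type assumed)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)
    out, y = [], 0.0
    for x in samples:
        rectified = x if x > 0.0 else 0.0   # keep the positive half-wave only
        y += alpha * (rectified - y)        # one-pole smoothing
        out.append(y)
    return out

def first_index_at_or_above(values, level):
    """Index of the first sample reaching `level`, or None if it is never reached."""
    for n, v in enumerate(values):
        if v >= level:
            return n
    return None

FS_HZ = 16000
# Hypothetical test signal: a 2 kHz tone switched on abruptly after 10 ms.
signal = [0.0 if n < 160 else math.sin(2.0 * math.pi * 2000.0 * n / FS_HZ)
          for n in range(640)]

transient_env = envelope(signal, FS_HZ, 700.0)   # relatively high cutoff
steady_env = envelope(signal, FS_HZ, 100.0)      # relatively low cutoff
```

With these assumed settings the high-cutoff envelope follows the abrupt onset within a few samples, whereas the low-cutoff envelope reaches the same level noticeably later, which is the distinction the two definitions draw.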
Today it is known that the nerve pulses launched from the cochlea are synchronised to the frequency of a sine tone if the frequency is less than about 1.4 kHz. If the frequency of the tone is higher than about 1.4 kHz the pulses are launched randomly and less than once per period. Therefore the auditory perceptive faculty is tone oriented in the range up to about 1.4 kHz and transient oriented above.
The frequency spectra of speech signals from human beings contain energy bands, called formants. These formants are carriers of outstanding transients, and if the formants are selected for transient analyses an important noise suppression may be obtained.
In WO 94/25958 it is described how the information held in the shape of pulses representing the fast energy changes in auditory signals is used for identifying distinct sound pictures, and in a preferred embodiment the shape of the leading edge of a pulse is determined by determining the pulse rise time or determining the slope variation. It is further preferred that the shape of the top part of the leading edge is determined, the top part starting at the point of the edge where the slope is maximum.
However, when examining transient signals derived from the auditory signal, a number of pulses which are not holding any substantial information may be observed. Thus, in order to accurately determine or select the pulses representing the fast energy changes, new efficient noise suppression techniques may be useful. Furthermore, new noise suppression techniques may also be useful when extracting information from the pulses in order to identify corresponding distinct sound pictures.
Further, it is noted that if the rise time of a pulse provided as an input to a filter is faster than the rise time of the impulse response of the filter, then the rise time of the output of the filter generated in response to the input pulse will be substantially equal to the rise time of the impulse response of the filter.
Likewise, if the rise time of a pulse provided as an input to a filter is slower than the rise time of the impulse response of the filter, then the rise time of the output of the filter generated in response to the input pulse will be substantially equal to the rise time of the input pulse.
It is well-known that the signal processing of sound signals in the cochlea may be simulated by a filter bank comprising a set of bandpass filters with different centre frequencies and that the bandwidths of these filters increase with increasing centre frequencies, which again means that the rise times of the impulse responses of the filters increase with increasing centre frequencies.
When a pulse with a specific rise time is provided as an input to the filter bank, the rise time of an output pulse generated by a corresponding filter of the filter bank will be substantially equal to the rise time of the impulse response of the filter when the rise time of the input pulse is faster than the rise time of the impulse response of the filter, and substantially equal to the rise time of the input pulse when the rise time of the input pulse is slower than the rise time of the impulse response of the filter. Thus, the rise time of the input pulse may be determined by determination of the two filters A and B of the filter bank having the narrowest bandwidths of the filters of the bank generating output pulses in response to the input pulse with substantially identical rise times, as the rise time of the input pulse must be within the rise time range between the rise time of the impulse response of the filter A, B with the narrowest bandwidth and the rise time of the impulse response of the filter with the largest bandwidth that is also lower than the bandwidths of the filters A, B.
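The rise time rule and the bracketing argument above may be sketched as follows. This is an idealised model only: the output rise time is taken as exactly the larger of the input rise time and the rise time of the impulse response, and the function names, the tolerance and the example rise times are hypothetical.

```python
def output_rise_time(input_rise_ms, filter_rise_ms):
    """Idealised rule: the slower of the input edge and the impulse response dominates."""
    return max(input_rise_ms, filter_rise_ms)

def bracket_input_rise_time(filter_rises_ms, output_rises_ms, tol_ms=0.01):
    """Return (lower, upper) bounds on the input rise time, or None if undecidable.

    Filters whose outputs share substantially the same (smallest) rise time are
    assumed to pass the input edge unchanged; the remaining filters are limited
    by their own impulse response.
    """
    common = min(output_rises_ms)
    pairs = list(zip(filter_rises_ms, output_rises_ms))
    passing = [f for f, o in pairs if abs(o - common) <= tol_ms]
    limited = [f for f, o in pairs if abs(o - common) > tol_ms]
    if len(passing) < 2:
        return None                       # no two filters agree; no bracket
    lower = max(passing)                  # slowest filter still passing the edge
    upper = min(limited) if limited else None
    return lower, upper

# Hypothetical bank: impulse response rise times in ms, and an input edge of 0.5 ms.
bank_ms = [0.2, 0.4, 0.8, 1.6]
outputs_ms = [output_rise_time(0.5, f) for f in bank_ms]
bounds = bracket_input_rise_time(bank_ms, outputs_ms)
```

In this example the two fastest filters produce identical output rise times, so the input rise time is bracketed between 0.4 ms and 0.8 ms.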
This rise time detection principle may be utilized by the auditory organs of living beings and this could explain why the bandwidths of the filters simulating cochlea sound processing are increasing with increasing centre frequencies.
Correspondingly, speech signals may be generated by modulation of pulses in filters that modulate the shape of the pulses as described above. Pulses to be modulated correspond to speech signals generated in the articulation channel, e.g. by the vocal cords, and the processing in the filters corresponds to the modulation performed by adjustment of the articulation channel according to the phoneme processed, whereby the filters modulate the shape of the pulses. Preferably, the time between pulses to be modulated should be sufficiently long to ensure that there is no interference between output pulses generated in response to different input pulses.
It is an object of the present invention to provide an improved method for processing an auditory signal in order to obtain a reduction of signal components which may be considered as representing noise components in the process of identification or representation of distinct sound pictures of the auditory signal.
This object is accomplished by providing a method of processing an auditory signal to facilitate identification of abrupt energy changes within the auditory signal, which abrupt energy changes have a rise time of at the most 3 ms, and which abrupt energy changes can be perceived by an animal ear such as a human ear as representing a distinct sound picture. The method comprises:
deriving, from the auditory signal, a first signal comprising transient pulses corresponding to at least part of the abrupt energy changes,
tracing or monitoring pulses in the first transient signal,
determining local maxima of the transient pulses, and
generating a second transient signal wherein the value of at least one determined local maximum of a pulse in the first transient signal is held at said maximum value for a predetermined period of time trfpr, thereby generating a corresponding pulse in the second transient signal, said predetermined period of time trfpr being of at the most 5 ms.
However, within the predetermined time period trfpr, several pulses, each with its own local maximum, may be observed in the first transient signal. Thus, it is preferred that if pulses in a train of two or more successive pulses in the first transient signal are subjected to the above described holding procedure, and one or more of the pulses is/are located at a distance in time from a preceding pulse which is shorter than the predetermined period of time trfpr and has/have a local maximum greater than the local maximum of said preceding pulse, the hold of the local maximum of said preceding pulse is maintained until the occurrence of the subsequent, greater local maximum and is replaced by said subsequent, greater local maximum.
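The holding procedure, including the replacement of a held maximum by a greater one occurring within trfpr, may be sketched as follows. This is not the flow chart of Fig. 12: the local-maximum test, the behaviour on plateaus, and the choice to let the output follow the input whenever it exceeds the held value are assumptions, as is the function name. The hold length is given in samples, e.g. about 11 samples for 0.7 ms at an assumed 16 kHz sampling rate.

```python
def refractory_hold(x, hold_samples):
    """Hold each selected local maximum of x for hold_samples samples.

    A greater local maximum arriving during a hold replaces the held value
    and restarts the hold; smaller maxima during a hold are ignored.
    """
    y = []
    held = 0.0
    remaining = 0                                   # samples left in the current hold
    for n, v in enumerate(x):
        prev = x[n - 1] if n > 0 else float("-inf")
        nxt = x[n + 1] if n + 1 < len(x) else float("-inf")
        is_local_max = prev < v >= nxt
        if is_local_max and (remaining == 0 or v > held):
            held, remaining = v, hold_samples       # start or restart the hold
        if remaining > 0:
            y.append(max(v, held))                  # never output below the held value
            remaining -= 1
        else:
            y.append(v)
    return y

# Two pulses; the second, greater maximum arrives before the first hold expires.
first_signal = [0, 1, 3, 1, 0, 2, 5, 2, 0, 0, 0, 0, 0]
second_signal = refractory_hold(first_signal, 6)
```

In the example the maximum 3 is held until the greater maximum 5 occurs, which then replaces it and is itself held for the full hold length.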
It is preferred that the predetermined period of time trfpr is shorter than or equal to 3 ms, or shorter than or equal to 2 ms. It is even more preferred that trfpr is shorter than or equal to 1 ms, or about 0.7 ms.
It should be noted that the shape of a pulse in the second transient pulse signal is an important feature for identification of the pulse. Thus, in a preferred embodiment of the invention the shape of pulses in the second transient pulse signal is determined or identified, and preferably one or more distinct sound pictures is/are identified from the determined shape. The shape of a pulse may be characterized by the pulse rise time, the form of the leading edge, the duration of the pulse, and/or the fall time or the form of the lagging edge, and it is preferred that the form of the leading edge is determined by determining rise time, slope and/or slope variation of at least part of the leading edge.
In one embodiment of the invention the frequency of the auditory signal is determined from the second transient signal based on the distance in time between succeeding leading edges of pulses in the signal.
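As a small illustration of this embodiment, the frequency may be estimated as the reciprocal of the mean distance in time between succeeding leading edges. Averaging over all edges is an assumption made here for the sketch, and the function name and edge times are hypothetical.

```python
def frequency_from_edges(edge_times_s):
    """Estimate frequency (Hz) as 1 / mean spacing of succeeding leading edges."""
    periods = [b - a for a, b in zip(edge_times_s, edge_times_s[1:])]
    if not periods:
        raise ValueError("at least two leading edges are needed")
    return len(periods) / sum(periods)

# Leading edges detected every 5 ms correspond to a frequency of about 200 Hz.
estimate_hz = frequency_from_edges([0.000, 0.005, 0.010, 0.015])
```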
It is another object of the present invention to provide a method for selecting pulses in a transient signal, which pulses represent distinct sound pictures in an auditory signal. The method includes selecting pulses where the shape of the leading edge has a maximum slope greater than a predetermined minimum value, thereby discarding pulses with a rather small maximum slope, which pulses may be considered as representing noise components in the process of identification or representation of distinct sound pictures of the auditory signal.
This object is accomplished by providing a method for selecting leading edges of transient pulses in a transient signal, said transient signal being derived from an auditory signal having abrupt energy changes with a rise time of at the most 3 ms, and which abrupt energy changes can be perceived by an animal ear such as a human ear as representing a distinct sound picture. The method comprises:
determining or measuring the maximum slope of a leading edge of a pulse in the transient signal,
comparing the obtained maximum slope with a predetermined lower threshold value for maximum slopes of leading edges, and
if the obtained maximum slope is equal to or greater than the predetermined lower threshold value, selecting said leading edge as a candidate to the leading edge of a pulse.
However, several leading pulse edges being candidates for a selected leading edge may be observed within a short period of time. Thus, it is preferred that if the transient signal comprises one or more subsequent pulse or pulses, the leading edge or edges of which is/are located within a distance in time from the selected candidate, which distance in time is shorter than a predetermined period of time, ts, of at the most 4 ms, then the method further comprises:
determining or measuring the maximum slope or slopes of the leading edge or edges of said subsequent pulse or pulses in the transient signal,
comparing the obtained maximum slope or slopes of the subsequent leading edge or edges and the obtained maximum slope of the selected candidate with one another,
determining which of said leading edges has the largest maximum slope, and
selecting the leading edge with the largest maximum slope as the leading edge of a first pulse.
The selected first pulse may correspond to an abrupt energy change representing a distinct sound picture.
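The candidate selection and the subsequent search within ts may be sketched as follows. Each leading edge is represented here as a hypothetical (time in ms, maximum slope) pair, sorted by time; this data layout, the function name and the example values are assumptions made for illustration.

```python
def select_first_edge(edges, min_slope, ts_ms):
    """Pick the leading edge of the first pulse.

    edges     : list of (time_ms, max_slope), sorted by time
    min_slope : predetermined lower threshold for maximum slopes
    ts_ms     : search time after the candidate
    """
    for i, (t0, s0) in enumerate(edges):
        if s0 >= min_slope:                                  # candidate found
            window = [(t, s) for t, s in edges[i:] if t - t0 < ts_ms]
            return max(window, key=lambda e: e[1])           # largest maximum slope wins
    return None                                              # no edge reaches the threshold

edges = [(0.0, 0.2), (1.0, 1.5), (2.5, 3.0), (6.0, 4.0)]
first = select_first_edge(edges, min_slope=1.0, ts_ms=3.3)
```

Here the edge at 1.0 ms is the candidate, the steeper edge at 2.5 ms lies within ts and is selected, and the edge at 6.0 ms falls outside the search time.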
It is preferred that the predetermined period of time ts is shorter than or equal to 3.3 ms, or shorter than or equal to 2 ms, or even shorter than or equal to 1 ms.
The requirements which should be fulfilled by a following leading edge in order to be selected may depend on the characteristics of the first selected leading edge. In a preferred embodiment, the method for selecting the leading edge of a second pulse in the transient signal further comprises:
determining or measuring the maximum slope or slopes of the leading edge or edges of a pulse or pulses in the transient signal subsequent to the selected leading edge of the first pulse within a distance in time from the leading edge of the first pulse which is shorter than a predetermined period of time, tep, of at the most 4 ms, said time period tep being longer than or equal to the predetermined time period ts,
comparing the obtained maximum slope or slopes of the subsequent leading edge or edges with the obtained maximum slope of the leading edge of the first pulse, and
selecting the first leading edge with a maximum slope greater than a threshold slope equal to the maximum slope of the leading edge of the first pulse as the leading edge of a second pulse.
The selected second pulse may correspond to an abrupt energy change representing a distinct sound picture, which distinct sound picture may correspond to the sound picture of the first selected pulse.
If no maximum slope greater than the maximum slope of the leading edge of the first pulse is obtained within the time period tep, then the method for selecting the leading edge of the second pulse in the transient signal further comprises:
determining or measuring the maximum slope or slopes of one or more leading pulse edges located at a distance in time from the leading edge of the first pulse which is longer than or equal to the predetermined period of time, tep,
reducing the required threshold value of the maximum slope below the maximum slope of the leading edge of the first pulse, and
selecting the first leading edge with a maximum slope greater than the required threshold value as the leading edge of a second pulse, which second pulse may correspond to an abrupt energy change representing a distinct sound picture.
It is preferred that the required threshold value for the maximum slope is decreased as a function of time from the maximum slope of the leading edge of the first pulse down to the predetermined lower threshold value. Preferably, the required threshold value is decreased exponentially with a predetermined time constant tc.
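A minimal sketch of the time-dependent threshold follows. The exact decay law is not given in the text above; it is assumed here that the threshold equals the maximum slope of the first edge until tep, then decays as exp(-(t - tep)/tc) and is clipped at the predetermined lower threshold. The value tc = 5 ms, the function names and the (time in ms, maximum slope) edge representation are hypothetical.

```python
import math

def required_slope(dt_ms, first_slope, min_slope, tep_ms, tc_ms):
    """Threshold a later edge must exceed, dt_ms after the first selected edge."""
    if dt_ms < tep_ms:
        return first_slope                                   # no reduction before tep
    decayed = first_slope * math.exp(-(dt_ms - tep_ms) / tc_ms)
    return max(min_slope, decayed)                           # never below the lower threshold

def select_second_edge(edges, first, min_slope, tep_ms, tc_ms):
    """Return the first later edge whose maximum slope exceeds the required threshold."""
    t0, s0 = first
    for t, s in edges:
        if t > t0 and s > required_slope(t - t0, s0, min_slope, tep_ms, tc_ms):
            return (t, s)
    return None

first_edge = (0.0, 4.0)
later_edges = [(1.0, 3.0), (2.0, 3.9), (5.0, 2.5), (8.0, 3.5)]
second = select_second_edge(later_edges, first_edge, min_slope=1.0, tep_ms=3.3, tc_ms=5.0)
```

With these assumed numbers the edges at 1, 2 and 5 ms are rejected, and the edge at 8 ms is the first one to exceed the decayed threshold.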
It is preferred that the predetermined period of time tep is shorter than or equal to 3.3 ms, or shorter than or equal to 2 ms, or even shorter than or equal to 1 ms.
The shape of a selected leading edge of a pulse may represent an important feature for identification or representation of the corresponding distinct sound picture. Thus, it is preferred that the shape of the selected leading edges of pulses is determined, and/or a distinct sound picture is identified from the determined shape. Preferably, the shape of the selected leading edge of a pulse is determined by the obtained maximum slope of the selected leading edge.
The rise time of a selected leading edge of a pulse may also represent an important feature for identification or representation of the corresponding distinct sound picture. Thus, it is preferred that the shape of a selected leading edge of a pulse is characterised by the rise time of the edge, where the rise time is determined as the time period from tb to te, or by the shape of the leading edge in the time period from tb to te, where tb is the point in time where the slope of the leading edge has reached a threshold value for the beginning of the edge, db, the ratio of said threshold value db to the obtained maximum slope being predetermined, and te is the point in time where the slope of the leading edge has decreased from the maximum value to a threshold value for the end of the edge, de, the ratio of said threshold value de to the obtained maximum slope being predetermined.
Preferably, the value of db is in the range of 30-100% of the obtained maximum slope, and the value of de is in the range of 30-90% of the obtained maximum slope. The value of db may even more preferably be substantially equal to 50% or 100% of the obtained maximum slope, and the value of de may even more preferably be substantially equal to 70% of the obtained maximum slope.
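The determination of tb, te and the rise time from the slope thresholds db and de may be sketched as follows. The slope is approximated by the first difference of a sampled pulse, and the default ratios follow the preferred values stated above (db equal to 100% and de equal to 70% of the maximum slope). The function name, the sample values and the sampling rate are illustrative assumptions.

```python
def rise_time(x, fs_hz, ratio_b=1.0, ratio_e=0.7):
    """Rise time (s) of the leading edge of sampled pulse x, from tb to te."""
    d = [b - a for a, b in zip(x, x[1:])]            # slope between neighbouring samples
    n_m = max(range(len(d)), key=d.__getitem__)      # index of the maximum slope dm
    d_m = d[n_m]
    n_b = n_m
    while n_b > 0 and d[n_b - 1] >= ratio_b * d_m:   # walk back while slope >= db
        n_b -= 1
    n_e = n_m
    while n_e < len(d) - 1 and d[n_e] > ratio_e * d_m:   # walk on until slope <= de
        n_e += 1
    return (n_e - n_b) / fs_hz

pulse = [0.0, 1.0, 3.0, 6.0, 8.5, 10.0, 10.5, 10.5]
rt_preferred = rise_time(pulse, fs_hz=1000.0)                 # db = 100 % of dm
rt_half = rise_time(pulse, fs_hz=1000.0, ratio_b=0.5)         # db = 50 % of dm
```

Lowering db to 50% moves tb one sample earlier in this example, so the measured rise time grows from two sample periods to three.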
It is also an object of the present invention to combine the above mentioned generation of the second transient signal with the method of selection of a leading edge. So, it is preferred that the transient signal from which the leading edge or edges is/are selected is a transient signal generated in accordance with one of the embodiments referring to generation of the second transient signal.
It is a further object of the present invention to provide a system for processing an auditory signal, which processing system facilitates identification of abrupt energy changes within the auditory signal, or reduces the bandwidth of the signal with substantial retention of the information of the signal. The system comprises
means for deriving, from the auditory signal, a first signal comprising transient pulses corresponding to at least part of the abrupt energy changes, and
means for generating a second transient signal from said first transient signal, said second signal generation means being adapted to hold the value of at least one local maximum of a pulse in the first transient signal at said maximum value for a predetermined period of time, trfpr, thereby generating a corresponding pulse in the second transient signal, said predetermined period of time trfpr being of at the most 5 ms. Preferably trfpr is of at the most 1 ms or about 0.7 ms.
The invention also relates to a system for selecting leading edges of pulses in a transient signal, which signal represents abrupt energy changes within an auditory signal. The system comprises
means for determining or measuring the maximum slope of a leading edge of a pulse in the transient signal,
means for comparing the obtained maximum slope with a predetermined lower threshold value for maximum slopes of leading edges, and
means for, based on the result of said comparison, selecting a candidate to the leading edge of a pulse.
It is preferred that
the means for determining or measuring the maximum slope of a leading edge of a pulse are further adapted to determine or measure the maximum slope or slopes of a leading edge or edges of one or more pulses subsequent to the selected candidate,
the comparing means are further adapted for comparing the obtained maximum slope or slopes of the subsequent leading edge or edges and the obtained maximum slope of the selected candidate with one another, and
the selecting means are further adapted for, based on the result of said comparison, selecting the leading edge with the largest maximum slope.
In another preferred embodiment of the invention, any of the systems which comprises means for generating the second transient signal further comprises means for selecting leading edges of pulses in a transient signal in accordance with an embodiment of the present invention, the leading edges being selected from the second transient signal.
Embodiments and details of the method and system according to the present invention appear from the claims and the detailed discussion of embodiments of the system given in connection with the accompanying drawing.
Fig. 1 shows a filter bank with N bandpass filters,
Figs. 2 and 3 show transient detection signals of the speech signal "softkey" for two filters having different centre frequencies in a filter bank,
Fig. 4 shows the transient detection signals of Fig. 3 of the vowel "i" as in key,
Fig. 5 shows transient detection signals corresponding to the speech signal of Fig. 4, with the speech signal being processed according to a preferred embodiment of refractoriness period processing,
Fig. 6 shows transient detection signals corresponding to the speech signal of Fig. 4, with the speech signal being processed according to another preferred embodiment of refractoriness period processing,
Fig. 7 illustrates selection of a leading edge of a transient pulse according to a preferred embodiment of the invention,
Fig. 8 illustrates the principles of determination of maximum slope and rise time of a leading edge of a transient pulse,
Fig. 9 shows transient detection signals, including an edge signal and a measure of the pitch period, corresponding to the speech signal "softkey" pronounced by a female,
Fig. 10 shows transient detection signals, including an edge signal and a measure of the pitch period, corresponding to the vowel "i" as in key,
Fig. 11 shows the edge signal of Fig. 10 filtered by a bandpass filter,
Fig. 12 is a flow diagram illustrating a preferred embodiment of refractoriness period processing,
Fig. 13 is a flow diagram illustrating a preferred embodiment of detection of a leading edge,
Fig. 14 is a plot of the bandwidths of cochlea bandpass filters as a function of centre frequency, and
Fig. 15 is a plot of rise times of input and output pulses of a bandpass filter and of the impulse response of the filter.
In prior art methods of signal analysiε and in the method of the present invention it is asεumed that the cochlea in the human ear can be regarded as an infinite number of bandpass filters, IBP, within the frequency range of the human ear.
In WO 94/25958 it iε εhown that under the aεεumption that the bandwidth iε identical for all filterε in the IBP, the impulεe reεponεe will reεult in the same envelope for all filters.
In the present invention a filter bank may be employed for detecting formants and thereby detecting the transient con¬ ditionε that hold the most well qualified information with a subεtantial suppresεion of noiεe. In the following analyses the bandwidth of the bandpasε filterε is chosen to be the same for all filterε in order to obtain the εame envelope. Another choice might be to scale the bandwidth of the filters in accordance with the Bark εcale or Mel εcale.
Fig. 1 shows a filter bank with N bandpass filters, BP1-BPN, followed by an envelope detection performed by use of rectification means, R1-RN, and lowpass filters, LP1-LPN. The rectification means are preferably one-way rectification means. The filter bank has to cover the transient oriented frequency range, and the centre frequency of the bandpass filters therefore has to be from about 1.4 kHz and upwards. To be able to detect sufficiently fast transients the bandwidth has to be about 1.4 kHz.
In Figs. 2 and 3 the transient detection by means of a filter bank is illustrated. Figs. 2 and 3 show processed curves for the word "softkey" pronounced by a female and detected by means of two different bandpass filters. In Figs. 2 and 3 the abscissas represent a time interval of 1 s and the ordinates in Figs. 2a, 2b, 3a and 3b represent the sound pressure of the corresponding speech signal whereas the ordinates of Figs. 2c and 3c represent the energy of the corresponding speech signal.
The bandpass filters are Butterworth filters of 6th order with a bandwidth of 1.4 kHz. In Fig. 2 the centre frequency is about 1.5 kHz with a lower cutoff frequency at about 0.8 kHz and an upper cutoff frequency at about 2.2 kHz. In Fig. 3 the centre frequency is about 2.8 kHz with a lower cutoff frequency at about 2.1 kHz and an upper cutoff frequency at about 3.5 kHz. In both Figs. 2 and 3 the lowpass filter is a 1st order Butterworth filter with a cutoff frequency at 700 Hz, and the pretransient signal is the output signal from the bandpass filter. In Fig. 2c the vowel "o" is very prominent in the transient signal, but the other phonemes are very indistinct. In Fig. 3c the vowel "o" is less prominent but the other phonemes are much more distinct. The conclusion may be drawn that the vowel "o" should preferably be detected from the transient signal processed by the bandpass filter with a centre frequency at 1.5 kHz, and the remaining phonemes should preferably be detected from the transient signal processed by the bandpass filter with a centre frequency at 2.8 kHz.
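The envelope detection stage of one branch (rectification followed by lowpass filtration) may be sketched as follows. This is a minimal illustration only, not the implementation of the invention: the function names, the 16 kHz sampling rate and the choice of a first-order lowpass section designed by the bilinear transform are assumptions made for the example, and the bandpass stage that produces the pretransient signal is not shown.

```python
import math

def half_wave_rectify(x):
    """One-way rectification: keep positive samples, set the rest to zero."""
    return [v if v > 0.0 else 0.0 for v in x]

def lowpass_first_order(x, cutoff_hz, fs_hz):
    """First-order Butterworth lowpass designed with the bilinear transform.

    Assumed design for this sketch; unity gain at DC.
    """
    k = math.tan(math.pi * cutoff_hz / fs_hz)
    b = k / (k + 1.0)            # b0 and b1 are equal for this section
    a1 = (k - 1.0) / (k + 1.0)
    y, x_prev, y_prev = [], 0.0, 0.0
    for v in x:
        out = b * v + b * x_prev - a1 * y_prev
        y.append(out)
        x_prev, y_prev = v, out
    return y

def envelope(pretransient, cutoff_hz=700.0, fs_hz=16000.0):
    """Transient signal of one branch: rectify, then lowpass at 700 Hz."""
    return lowpass_first_order(half_wave_rectify(pretransient), cutoff_hz, fs_hz)
```

A bank of such branches would apply `envelope` to the output of each bandpass filter in turn.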
In Fig. 1 each branch can be regarded as a TSD (Transient Signal Detector). The number of branches in the system depends on the demand on the system, but the number should be in the range of 2-40. In the following examples the TSD used in connection with the results of Fig. 2 having a centre frequency at 1.5 kHz is referred to as TSD1, and the TSD used in connection with the results of Fig. 3 having a centre frequency at 2.8 kHz is referred to as TSD2. Fig. 1 then illustrates a TSD bank.
Important features of fast energy changes of an auditory signal for identifying or representing features that can be perceived by a human ear as representing a distinct sound picture may be the shape of the leading edge and the period between the leading edges.
It is known that when a nerve has launched a pulse it takes some time before it can launch a pulse again. This period is called the refractoriness period. As mentioned above the nerve pulses launched from the cochlea are synchronized to the frequency of a sine tone if the frequency is less than about 1.4 kHz but not above this frequency. This means that the refractoriness period of interest may be about 0.7 ms.
The refractoriness period may be used for simplifying the process of detecting the leading edge of a transient pulse in the transient component. Fig. 4 shows part of the curves of Fig. 3 processed by TSD2. The curves shown in Fig. 4 represent the signals obtained for the vowel "i" as in key. The transient signal of Fig. 4c is processed without a refractoriness period. Figs. 5a and 6a are identical to Fig. 4a, and Figs. 5b and 6b are identical to Fig. 4b.
However, the transient signal of Fig. 5c, which represents the energy of the corresponding speech signal, is obtained from the bandpass filtered pretransient signal in Fig. 5b by way of a rectification and by using a refractoriness period of 1 ms. The signal of Fig. 5c has not been subject to a lowpass filtration. It is preferred that the implementation of the refractoriness period is performed by using a software algorithm which is described below in connection with Fig. 12.
From Fig. 5c it may be observed that notches of the pulses in Fig. 4c are smoothed away, thereby resulting in a signal having fewer local pulses, which may be more easily identified.
Fig. 6c shows a transient signal which represents the energy of the corresponding speech signal and which is obtained by performing a lowpass filtration on the signal of Fig. 5c. All the signals of Figs. 4 and 5 hold the speech information and may easily be perceived by a human ear, although some noise is introduced during the process of transient detection resulting in the signals of Figs. 5c and 6c. In Figs. 4, 5 and 6 the abscissas represent a time interval of 50 ms.
The refractoriness period may be about 0.5 ms or longer but preferably less than the minimum pitch period, that means less than about 3.3 ms.
It has been recognized by the inventor that the shape of the leading edge may be one of the important features for representing a sound picture, and the maximum slope of the leading edge may be an important feature for the edge. Thus, the maximum slope of the leading edge may be the basis for detecting the important features for identifying or representing a distinct sound picture.
In Fig. 7 the abscissa represents a time interval of 50 ms, and the signals of Figs. 7a, b and c correspond to the signals of Figs. 6a, b and c, whereas in Fig. 7d the differentiated signal of the signal of Fig. 7c, called differential signal, is shown. To be accepted as a leading edge the maximum slope has to be greater than a predetermined minimum value, called dem. The size of dem may depend on how the signal is normalised.
The signals of Figs. 2-7 are normalised to the maximum numerical value in the whole signal, and dem is preferably selected to be 2.5% of the maximum detected slope value. In systems with automatic gain control (AGC) dem may be selected otherwise, and preferably higher.
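As an illustration of the normalisation, the differential signal and the minimum threshold dem, a short sketch is given here. The function names are hypothetical, and the use of a simple first difference as the differentiation is an assumption of the example; the 2.5% ratio is the preferred value stated above.

```python
def normalise(signal):
    """Scale the signal by its maximum numerical (absolute) value."""
    peak = max(abs(v) for v in signal)
    return [v / peak for v in signal] if peak > 0 else list(signal)

def differentiate(signal):
    """First difference of the transient signal (assumed differentiator).

    The first output sample is set to 0 so that input and output
    have the same length.
    """
    return [0.0] + [signal[n] - signal[n - 1] for n in range(1, len(signal))]

def minimum_slope_threshold(d, ratio=0.025):
    """dem as a fixed ratio (2.5% by default) of the maximum detected slope."""
    return ratio * max(d)
```

For example, the transient signal `[0, 2, 4, 2]` normalises to `[0.0, 0.5, 1.0, 0.5]`, and its differential signal has a maximum slope of 0.5 per sample.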
The maximum slope may be detected by finding a maximum greater than the threshold dem and selecting this as a candidate to be the maximum slope of a leading edge, called dm. If there is a greater maximum slope within a given search time, ts, then this point is chosen as having the maximum slope of a leading edge; otherwise the candidate is chosen. The search time ts may be selected to be less than the minimum pitch period, which means less than about 3.3 ms, but preferably around 2 ms.
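The candidate selection described above may be sketched as follows. The function name `find_max_slope` and its argument layout are hypothetical, and ts is expressed in samples. As a simplification, the look-ahead accepts the greatest sample value within the search window rather than verifying that it is itself a local maximum.

```python
def find_max_slope(d, start, dem, ts):
    """Return (tm, dm) for the next leading edge in d, or None if none is found.

    d     : differential signal (list of slopes)
    start : sample index at which the search begins
    dem   : minimum threshold for the slope
    ts    : search time in samples
    """
    n = start
    # Skip samples that do not exceed the minimum threshold.
    while n < len(d) and d[n] <= dem:
        n += 1
    if n >= len(d):
        return None
    # Climb to the local maximum; this is the candidate.
    while n + 1 < len(d) and d[n + 1] > d[n]:
        n += 1
    tm, dm = n, d[n]
    # Within the next ts samples, prefer any greater slope.
    for i in range(n + 1, min(n + 1 + ts, len(d))):
        if d[i] > dm:
            tm, dm = i, d[i]
    return tm, dm
```

With a short search time the first local maximum is kept; with a longer one a later, steeper edge inside the window replaces it.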
The following leading edge may be detected as illustrated in Fig. 7d. When the point for the maximum slope for a leading edge is detected, then for a time period, tep, only a maximum slope greater than the previous maximum slope will be accepted; in other words, in this time period the threshold for accepting a leading edge is equal to the previous maximum slope. After the time period tep the threshold may be exponentially decreased with a time constant tc, which is also illustrated in Fig. 7d. The time period tep may be less than the minimum pitch period, that means less than about 3.3 ms, but preferably between 1 and 2 ms. However, tep should be longer than or equal to the search time ts.
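The time-dependent acceptance threshold may be illustrated by the following sketch. The function name and parameter order are hypothetical. Limiting the decay at the minimum threshold dem is an assumption of the example, chosen to agree with the lower threshold used for candidate selection.

```python
import math

def acceptance_threshold(elapsed, dm_prev, dem, tep, tc):
    """Threshold for accepting a new leading edge.

    elapsed : time since the previously accepted maximum slope
    dm_prev : maximum slope of the previously accepted edge
    dem     : minimum threshold (assumed floor of the decay)
    tep     : hold period during which the threshold equals dm_prev
    tc      : time constant of the exponential decay after tep
    All times must be given in the same unit.
    """
    if elapsed < tep:
        return dm_prev
    decayed = dm_prev * math.exp(-(elapsed - tep) / tc)
    return max(decayed, dem)
```

A new edge would then be accepted when its maximum slope exceeds `acceptance_threshold(...)` evaluated at the time elapsed since the previous edge.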
The edge of a leading edge may be described as beginning at a point in time, tb, where the slope has the maximum slope, or a point in time before the point with the maximum slope, where the slope has reached a threshold value, db, having a predetermined ratio to the maximum slope, and ending at the point, te, after the point with the maximum slope, where the slope has decreased to a threshold value, de, having a predetermined ratio to the maximum slope. This principle is illustrated in Fig. 8, where the amplitude of the leading edge is shown as A in Fig. 8a, and the differential of the leading edge is shown as D in Fig. 8b.
In Figs. 9 and 10 an edge detection following the above defined edge detector principles is illustrated. The speech signal is the word "softkey" pronounced by a female. The abscissas in Fig. 9 represent a time interval of 1 s, while a time interval of 50 ms of the signals in Fig. 9 is represented in Fig. 10, in which time interval the signals for the vowel "i" in the word key are shown. The transient signal of Figs. 9c and 10c has been processed in accordance with the signal presented in Fig. 6c, and a leading edge signal named edge signal, see Figs. 9d and 10d, has been obtained by determining the rise time of selected leading edges.
Below the edge signal in Figs. 9d and 10d a graph of the pitch period between the selected edges is shown, Figs. 9e and 10e. If the pitch period is longer than 15 ms it is set equal to 15 ms. A low resolution is obtained in the printout of Fig. 9d due to a limited printer resolution.
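The pitch period graph may be computed from the sample positions of the selected edges, for instance as in the sketch below. The function name and the sampling rate used in the example are hypothetical; the 15 ms limit is the one stated above.

```python
def pitch_periods_ms(edge_samples, fs_hz, cap_ms=15.0):
    """Periods in ms between successive selected edges, limited to cap_ms."""
    periods = []
    for prev, cur in zip(edge_samples, edge_samples[1:]):
        period = (cur - prev) * 1000.0 / fs_hz
        periods.append(min(period, cap_ms))
    return periods
```

With edges at samples 0, 80, 160 and 480 and an assumed sampling rate of 16 kHz, the periods are 5 ms, 5 ms and 20 ms, the last of which is set to 15 ms.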
The transient signal detector TSD2 is used when processing the signals of Figs. 9 and 10. The maximum slopes of pulses in the transient signal, Figs. 9c and 10c, are determined, and for the selected leading edges the starting point in time, tb, of the edge is set equal to the point in time where the maximum slope is detected, i.e. db is equal to dm, and te is equal to the point in time where the slope has decreased to 70% of dm, i.e. de is equal to 70% of dm. The part of the leading edge of a pulse in the transient signal corresponding to the time interval of tb to te is represented as the leading edge of a pulse in the edge signal, Figs. 9d and 10d. The edge signal holds the full speech information and may easily be perceived by a human ear, although some noise may be introduced during the processing.
The leading edge may be defined as beginning at a leading threshold value, db, greater than 50% of the maximum slope, but preferably equal to the maximum slope, and ending at a lagging threshold value, de, greater than 50% of the maximum slope, but preferably 70% of the maximum slope.
The rise time of the leading edge may be defined as the time period between tb and te, and may in a preferred embodiment be used as representing a measure for the shape of the leading edge, and thus forming the basis for identification of a distinct sound picture. However, as illustrated in Figs. 9 and 10, the pulses of the edge signal may also be chosen as the basis for identification of a distinct sound picture.
From Figs. 10d and 10e it can be seen that the edge detector can be used as a pitch detector, but known techniques for pitch detection can also be applied.
As the shape of the leading edge is a feature that can be perceived by a human ear as representing a distinct sound picture, the shape of the leading edge of a speech signal, which signal may be a phoneme, may be considered a conclusive feature for narrow band communication. Therefore, only information about the leading edge, unvoiced or voiced, and/or pitch period, and/or loudness of the speech signal should need to be transmitted. Thus, it should not be necessary to transmit information concerning the vocal filter, thereby saving bandwidth.
Information about a speech signal being unvoiced or voiced, and/or the pitch period and/or loudness of the speech signal may be compressed and decompressed by means of known technology, in which speech signals are framed in time periods of 20-40 ms, and only the change in the parameters needs to be transmitted. The leading edge may be compressed by identifying and representing the edge according to one of the embodiments of the present invention, for time frames of 20-40 ms by means of a template identification from a library or a book. The speech signal may be decompressed by means of a library or book of edge templates with corresponding standard filters, which filters should be excited by the edge template. Otherwise the speech signal may be decompressed by means of a library or book, with standard wave forms identified by means of the edge template identification.
As an example Fig. 11 shows the edge signal of Fig. 10d filtered with the same bandpass filter used for processing the pretransient signal, Fig. 10b, i.e. the centre frequency is about 2.8 kHz with a lower cutoff frequency at about 2.1 kHz and an upper cutoff frequency at about 3.5 kHz. The sound quality of the signal represented in Fig. 11 is improved when compared to the signal of Fig. 10d. The signal of Fig. 11 may be compared with the pretransient signal of Fig. 10b. If the filtered edge signal should look and sound more like the original speech signal also containing lower frequencies, the edge signal may be processed by means of a filter with another filter characteristic or by means of waveform decoding.
Refractoriness period processing
Fig. 12 shows a preferred embodiment of implementation of the refractoriness period. The definitions of the flow chart variables of the process of Fig. 12 are given as follows:
RfrPr refractoriness period in numbers of samples.
Rfr number of samples left of the refractoriness.
Si(n) input sample signal.
So(n) processed output signal.
PrvSi value of previous input sample (Si(n-1), n > 0).
Smax maximum value for the local leading edge.
nMax total number of samples to be processed.
LeadingEdge a Boolean variable; it is true if the sample is in a leading edge or in a refractoriness period, else it is false.
After initialisation of the process and selection of the first input signal, the sequence of the process is as follows:
Is the input sample greater than the previous input sample or is LeadingEdge true?
If no: The sample is in a lagging edge. So(n)=Si(n), PrvSi=Si(n) and Smax=Si(n). If there are more samples then go to the beginning, else end.
If yes: Is Si(n) then greater than Smax or Rfr equal to 0?
  If yes: The sample is in a leading edge and the process is tracing the leading edge. Rfr=RfrPr, So(n)=Si(n), Smax=Si(n), PrvSi=Si(n) and LeadingEdge=true. If there are more samples then go to the beginning, else end.
  If no: The sample is in a refractoriness period and the process is holding the output at the maximum value. So(n)=Smax, PrvSi=Si(n) and Rfr is decreased by one. Is Rfr equal to 0?
    If yes: The refractoriness period is finished and LeadingEdge=false. If there are more samples then go to the beginning, else end.
    If no: If Rfr is not equal to 0 the next sample is also in the refractoriness period. If there are more samples then go to the beginning, else end.
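The sequence above may be transcribed into software as in the following sketch, which keeps the flow chart variable names in lower case. The initialisation is not specified in the text, so the starting values chosen here (PrvSi and Smax equal to the first sample, Rfr equal to 0, LeadingEdge false) are assumptions of the example.

```python
def apply_refractoriness(si, rfr_pr):
    """Hold each local maximum of si for rfr_pr samples (cf. Fig. 12).

    si     : input sample signal, e.g. the rectified pretransient signal
    rfr_pr : refractoriness period in numbers of samples
    Returns the processed output signal So as a list.
    """
    if not si:
        return []
    so = []
    prv_si = si[0]        # assumed initial value
    smax = si[0]          # assumed initial value
    rfr = 0
    leading_edge = False
    for s in si:
        if s > prv_si or leading_edge:
            if s > smax or rfr == 0:
                # Tracing the leading edge; restart the refractoriness period.
                rfr = rfr_pr
                so.append(s)
                smax = s
                leading_edge = True
            else:
                # Refractoriness period: hold the output at the maximum value.
                so.append(smax)
                rfr -= 1
                if rfr == 0:
                    leading_edge = False
        else:
            # Lagging edge: the output follows the input.
            so.append(s)
            smax = s
        prv_si = s
    return so
```

For the input `[0, 1, 3, 2, 1, 0, 0, 0]` and a refractoriness period of 2 samples the maximum 3 is held for two further samples. A greater maximum arriving inside the hold replaces the held value and restarts the period.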
Principles of detection of a leading edge
Fig. 13a shows a preferred embodiment of implementation of the edge detection principle.
The definitions of the flow chart variables of the process of Fig. 13a are given as follows:
d : Differentiated transient signal (differential signal).
n : Index for samples of the differential signal.
dprv : A help variable and mostly the previous sample of the differential signal.
dem : Relative minimum threshold for the differential signal.
dm : Maximum slope for the edge.
ts : Search time in samples for the greatest local maximum of the slope greater than dm.
tm : Sample no. for the detected maximum slope.
k : Index for the detected edge.
The process is executed for n equal to 0 to n less than or equal to the number of samples of the signal. After initialisation of the process, the sequence of the process is as follows:
Search for the next d greater than the minimum threshold dem.
Is d greater than the previous sample dprv?
If yes: Set dprv equal to d(n). Get next d and compare again.
If no: dprv is a local maximum. Set dm equal to dprv.
Search for the greatest maximum of d for the next ts samples greater than dm.
Set the local counters i and t equal to zero.
Begin the search.
Is d(n+i) greater than dprv?
If no: Increase i and compare again.
If yes: Set dprv equal to d(n+i) and t equal to i. Increase i and compare again.
When the search is completed:
Is dprv greater than dm?
If no: dm is the maximum slope of the edge. Set the sample no. for the maximum slope tm(k) equal to n. Increase k and go to step 2.
If yes: dprv is the maximum slope of interest. Set dm equal to dprv and tm(k) equal to n + t. Increase k and go to step 2.
Fig. 13b shows a preferred embodiment of implementation of detection of the beginning and the end of the edge.
The definitions of the flow chart variables of the process of Fig. 13b are given as follows:
db : Threshold value for the slope at the beginning of the edge.
de : Threshold value for the slope at the end of the edge.
thrb : Predetermined ratio of the threshold value for the slope at the beginning of the edge, db, to the maximum slope dm.
thre : Predetermined ratio of the threshold value for the slope at the end of the edge, de, to the maximum slope dm.
tb : Sampling no. for the beginning of the edge.
te : Sampling no. for the end of the edge.
tep : Minimum pitch period between edges.
tc : Time constant for calculating the threshold for accepting an edge.
The sequence of the process is as follows:
Calculate db and de from thrb, thre and dm.
Search for the first sample of d less than or equal to db in the previous samples of d, and select this as a candidate for the beginning sample of the edge.
Search for the first sample of d less than or equal to de in the following samples of d, and select this as a candidate for the end of the edge.
Is the time period in samples from the previous edge less than the minimum pitch period tep?
If yes: Is the slope greater than the slope for the previous edge?
  If yes: Then accept the edge and go to search for the next edge.
  If no: Then ignore the edge and go to search for the next edge.
If no: Is the maximum slope greater than the threshold calculated from the maximum slope for the previous edge and the period of time in samples between the edges?
  If yes: Then accept the edge and go to search for the next edge.
  If no: Then ignore the edge and go to search for the next edge.
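The first three steps of this sequence, i.e. locating the beginning and the end of an edge around a detected maximum slope, may be sketched as follows. The function names are hypothetical. The default ratios (thrb equal to 1 and thre equal to 0.7) are the values used for Figs. 9 and 10, and starting both searches at the sample tm itself is an interpretation made for the example. The accept/ignore decision of the remaining steps is not included.

```python
def edge_bounds(d, tm, thrb=1.0, thre=0.7):
    """Return (tb, te), the sample numbers of the beginning and end of an edge.

    d    : differential signal
    tm   : sample no. of the detected maximum slope
    thrb : ratio of the beginning threshold db to the maximum slope dm
    thre : ratio of the end threshold de to the maximum slope dm
    """
    dm = d[tm]
    db, de = thrb * dm, thre * dm
    # Walk backwards to the first sample at or below db.
    tb = tm
    while tb > 0 and d[tb] > db:
        tb -= 1
    # Walk forwards to the first sample at or below de.
    te = tm
    while te < len(d) - 1 and d[te] > de:
        te += 1
    return tb, te

def rise_time_ms(tb, te, fs_hz):
    """Rise time of the edge, i.e. the period from tb to te, in ms."""
    return (te - tb) * 1000.0 / fs_hz
```

With thrb equal to 1 the beginning of the edge coincides with the point of maximum slope, as in the processing of Figs. 9 and 10.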
Persons skilled in the art will recognize other ways of implementing the functions of Figs. 12 and 13.
The upper plot of Fig. 15 illustrates the phenomenon already mentioned that if the rise time of a pulse provided as an input to a filter is faster than the rise time of the impulse response of the filter, then the rise time of the output of the filter generated in response to the input pulse will be substantially equal to the rise time of the impulse response of the filter.
Likewise, the lower plot of Fig. 15 illustrates that if the rise time of a pulse provided as an input to a filter is slower than the rise time of the impulse response of the filter, then the rise time of the output of the filter generated in response to the input pulse will be substantially equal to the rise time of the input pulse.
Signal processing of sound signals in the cochlea may be simulated by a filter bank comprising a set of bandpass filters with different centre frequencies and wherein the bandwidths of these filters increase with increasing centre frequencies, which in turn means that the rise times of the impulse responses of the filters decrease, i.e. the impulse responses become faster, with increasing centre frequencies. Fig. 14 is a plot of the bandwidths of cochlea bandpass filters as a function of centre frequency from Finn Agerkvist, "Time-frequency analysis and auditory models", Ph.D. thesis, Technical University of Denmark, 1994.
When a pulse with a specific rise time is provided as an input to the filter bank, the rise time of an output pulse generated by a corresponding filter of the filter bank will be substantially equal to the rise time of the impulse response of the filter when the rise time of the input pulse is faster than the rise time of the impulse response of the filter, and substantially equal to the rise time of the input pulse when the rise time of the input pulse is slower than the rise time of the impulse response of the filter. Thus, the rise time of the input pulse may be determined by determination of the two filters A and B of the filter bank having the narrowest bandwidths of the filters generating output pulses in response to the input pulse with substantially identical rise times, as the rise time of the input pulse must be within the rise time range between the rise time of the impulse response of the filter A, B with the narrowest bandwidth and the rise time of the impulse response of the filter with the largest bandwidth that is also lower than the bandwidths of the filters A, B.
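The bracketing principle may be illustrated with an idealised model in which the output rise time of a filter is simply the larger of the input rise time and the rise time of the filter's impulse response. The sketch below is an assumption-laden illustration only: the function names, the 5% tolerance used for "substantially identical" and the bandwidth and rise time figures in the example are all hypothetical.

```python
def simulate_output_rise(input_rise, impulse_rise):
    """Idealised output rise time of one filter: the slower of the two."""
    return max(input_rise, impulse_rise)

def bracket_input_rise(bank, tol=0.05):
    """Bracket the input rise time from measured output rise times.

    bank : list of (bandwidth_hz, impulse_rise_s, output_rise_s) tuples
    tol  : relative tolerance for 'substantially identical' rise times
    Returns (lower, upper); upper is None if no narrower filter exists,
    and None is returned if no matching pair of filters is found.
    """
    bank = sorted(bank, key=lambda f: f[0])   # narrowest bandwidth first
    for i in range(len(bank) - 1):
        a, b = bank[i], bank[i + 1]
        if abs(a[2] - b[2]) <= tol * max(a[2], b[2]):
            # a and b are the two narrowest filters with identical outputs.
            lower = a[1]
            upper = bank[i - 1][1] if i > 0 else None
            return lower, upper
    return None
```

For a hypothetical bank with impulse response rise times of 2, 1, 0.5 and 0.25 ms and an input rise time of 0.7 ms, the two widest filters both deliver 0.7 ms, and the input rise time is bracketed between 0.5 ms and 1 ms.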
This rise time detection principle may be utilized by the auditory organs and this could explain why the bandwidths of the filters simulating cochlea sound processing are increasing with increasing centre frequencies. Correspondingly, speech signals may be generated by modulation of pulses in a filter that modulates the shape of the pulses as described above. Pulses to be modulated correspond to sound signals generated in the articulation channel, e.g. by the vocal chord, and the processing in the filters corresponds to the modulation performed by adjustment of the articulation channel according to the phoneme processed whereby the filters modulate the shape of the pulses. Preferably, the time between pulses to be modulated should be sufficiently long to ensure that there is no interference between output pulses generated in response to different input pulses.
For speech recognition the shape of the leading edge and the rise time may both be conclusive features. According to the present invention the leading edge may be detected as described above, and in a preferred embodiment the edge detection is based on a transient signal processed with a refractoriness period either without a lowpass filtering as shown in Fig. 5, or with a lowpass filter as shown in Fig. 6.
A phoneme may be identified by means of features, such as a classification of the shape of the leading edges, mean pitch period, variation of pitch periods, and/or dynamic trend of the edge height in a time frame of 10-100 ms.
The present invention is preferably implemented utilizing a programmed processor such as a microcomputer for real time applications but this is not to be limiting. The present invention may also be implemented using a dedicated hardware processor if desired or by a more powerful mainframe computer without departing from the present invention.
The performance of the method and system of the present invention is described in the time domain. It is however to be understood that the transient signals, components and/or pulses being described in the time domain could also be given a corresponding description in the frequency domain, which would naturally be within the scope of the invention.
It is also to be noted that the methods of signal processing described above could be performed either digitally, electronically by use of analog components, mechanically, or by any combination thereof. Such methods of processing would also be within the scope of the invention. Those skilled in the art will appreciate that many variations of the implementation of the present invention are possible.

Claims

1. A method of processing an auditory signal to facilitate identification of abrupt energy changes within the auditory signal, which abrupt energy changes have a rise time of at the most 3 ms, and which abrupt energy changes can be perceived by an animal ear such as a human ear as representing a distinct sound picture, said method comprising
1) deriving, from the auditory signal, a first signal comprising transient pulses corresponding to at least part of the abrupt energy changes,
2) tracing or monitoring pulses in the first transient signal,
3) determining local maxima of the transient pulses, and
4) generating a second transient signal wherein the value of at least one determined local maximum of a pulse in the first transient signal is held at said maximum value for a predetermined period of time trfpr thereby generating a corresponding pulse in the second transient signal, said predetermined period of time trfpr being of at the most 5 ms.
2. A method according to claim 1, wherein, if pulses in a train of two or more successive pulses are subjected to the holding procedure in step 4) in claim 1, and one or more of the pulses is/are located at a distance in time from a preceding pulse which is shorter than the predetermined period of time and has/have a local maximum greater than the local maximum of said preceding pulse, the hold of the local maximum of said preceding pulse is maintained until the occurrence of the subsequent, greater local maximum and is replaced by said subsequent, greater local maximum.
3. A method according to claim 1 or 2, wherein the predetermined period of time trfpr is shorter than or equal to 3 ms, or preferably shorter than or equal to 2 ms.
4. A method according to claim 1 or 2, wherein the predetermined period of time trfpr is shorter than or equal to 1 ms, preferably about 0.7 ms.
5. A method according to any of the preceding claims, wherein the shape of pulses in the second transient pulse signal is determined, and/or one or more distinct sound pictures is/are identified from the determined shape.
6. A method according to claim 5, wherein the distance in time between succeeding leading edges of pulses in the second transient pulse signal is determined, and the frequency of the distinct sound picture is identified from the measured distance.
7. A method for selecting leading edges of transient pulses in a transient signal, said transient signal being derived from an auditory signal having abrupt energy changes with a rise time of at the most 3 ms, and which abrupt energy changes can be perceived by an animal ear such as a human ear as representing a distinct sound picture, the method comprising
1) determining or measuring the maximum slope of a leading edge of a pulse in the transient signal,
2) comparing the obtained maximum slope with a predetermined lower threshold value for maximum slopes of leading edges, and
3) if the obtained maximum slope is equal to or greater than the predetermined lower threshold value, selecting said leading edge as a candidate to the leading edge of a pulse.
8. A method according to claim 7, wherein the transient signal comprises one or more subsequent pulse or pulses, the leading edge or edges of which is/are located within a distance in time from the selected candidate, which distance in time is shorter than a predetermined period of time, ts, of at the most 4 ms, said method further comprising
1) determining or measuring the maximum slope or slopes of the leading edge or edges of said subsequent pulse or pulses in the transient signal,
2) comparing the obtained maximum slope or slopes of the subsequent leading edge or edges and the obtained maximum slope of the selected candidate with one another,
3) determining which of said leading edges has the largest maximum slope, and
4) selecting the leading edge with the largest maximum slope as the leading edge of a first pulse corresponding to an abrupt energy change representing a distinct sound picture.
9. A method according to claim 8, wherein the predetermined period of time ts is shorter than or equal to 3.3 ms, or preferably shorter than or equal to 2 ms, and even more preferably shorter than or equal to 1 ms.
10. A method according to any of the claims 7-9, wherein the leading edge of a second pulse in the transient signal is selected, said method further comprising
1) determining or measuring the maximum slope or slopes of the leading edge or edges of a pulse or pulses in the transient signal subsequent to the selected leading edge of the first pulse within a distance in time from the leading edge of the first pulse which is shorter than a predetermined period of time, tep, of at the most 4 ms, said time period tep being longer than or equal to the predetermined time period ts,
2) comparing the obtained maximum slope or slopes of the subsequent leading edge or edges with the obtained maximum slope of the leading edge of the first pulse, and
3) selecting the first leading edge with a maximum slope greater than a threshold slope equal to the maximum slope of the leading edge of the first pulse as the leading edge of a second pulse corresponding to an abrupt energy change representing a distinct sound picture, or
4) if no maximum slope greater than the maximum slope of the leading edge of the first pulse is obtained within the time period tep, determining or measuring the maximum slope or slopes of one or more leading pulse edges located at a distance in time from the leading edge of the first pulse which is longer than or equal to the predetermined period of time, tep,
5) reducing the required threshold value of the maximum slope below the maximum slope of the leading edge of the first pulse, and
6) selecting the first leading edge with a maximum slope greater than the required threshold value as the leading edge of a second pulse corresponding to an abrupt energy change representing a distinct sound picture.
11. A method according to claim 10, wherein the required threshold value is decreased as a function of time down to the predetermined lower threshold value.
12. A method according to claim 11, wherein the required threshold value is decreased exponentially with a predetermined time constant tc.
13. A method according to any of the claims 10-12, wherein the predetermined period of time tep is shorter than or equal to 3.3 ms, or preferably shorter than or equal to 2 ms, and even more preferably shorter than or equal to 1 ms.
14. A method according to any of the claims 7-13, wherein the shape of a selected leading edge of a pulse is determined, and/or a distinct sound picture is identified from the determined shape.
15. A method according to claim 14, wherein the shape of a selected leading edge of a pulse is determined by the obtained maximum slope of said selected leading edge.
16. A method according to claim 14, wherein the shape of a selected leading edge of a pulse is determined by determining the rise time of the leading edge, said rise time being determined as the time period from tb to te, or by determining the shape of the leading edge in the time period from tb to te, where tb is the point in time where the slope of the leading edge has reached a threshold value for the beginning of the edge, db, the ratio of said threshold value db to the obtained maximum slope being predetermined, and te is the point in time where the slope of the leading edge has decreased from the maximum value to a threshold value for the end of the edge, de, the ratio of said threshold value de to the obtained maximum slope being predetermined.
17. A method according to claim 16, wherein the value of db is in the range of 30-100% of the obtained maximum slope, and the value of de is in the range of 30-90% of the obtained maximum slope.
18. A method according to claim 16 or 17, wherein the value of db is substantially equal to 50% of the obtained maximum slope, and/or the value of de is substantially equal to 70% of the obtained maximum slope.
19. A method according to claim 16 or 17, wherein the value of db is substantially equal to the obtained maximum slope.
20. A method according to any of the claims 7-19, wherein the distance in time between selected succeeding leading edges of pulses is determined, and the frequency of a distinct sound picture is identified from the measured distance.
21. A method according to claim 7, wherein the transient signal from which the leading edge or edges is/are selected is a transient signal generated in accordance with the second transient signal in any of the claims 1-4.
22. A method according to any of the claims 8-20, wherein the transient signal from which the leading edge or edges is/are selected is a transient signal generated in accordance with the second transient signal in any of the claims 1-4, said time period trfpr being shorter than the time period ts.
23. A method according to any of the preceding claims, further comprising determination of the rise time of an input pulse by
provision of the input pulse to the input of a filter bank comprising a set of bandpass filters, each bandpass filter of the set having a centre frequency that is different from the centre frequencies of the other filters of the set and a bandwidth that is larger than or equal to the bandwidth of filters of the set with a lower centre frequency than the filter in question,
determination of the two filters of the set with the narrowest bandwidths of the filters that generate response pulses in response to the input pulse with substantially identical rise times, and
utilization of the determined narrowest bandwidths for determination of the rise time of the input pulse.
24. A method for determination of the rise time of an input pulse, comprising
provision of the input pulse to the input of a filter bank comprising a set of bandpass filters, each bandpass filter of the set having a centre frequency that is different from the centre frequencies of the other filters of the set and a bandwidth that is larger than or equal to the bandwidth of filters of the set with a lower centre frequency than the filter in question,
determination of the two filters of the set with the narrowest bandwidths of the filters that generate response pulses in response to the input pulse with substantially identical rise times, and
utilization of the determined narrowest bandwidths for determination of the rise time of the input pulse.
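The selection step of claim 24 might be sketched as follows, assuming the rise time of each filter's response pulse has already been measured. The function name `narrowest_matching_pair`, the relative tolerance used for "substantially identical", and the scan over neighbouring bandwidths are all assumptions for illustration. The claim does not state how the two bandwidths are then converted into a rise time, so the sketch stops at the selection.

```python
def narrowest_matching_pair(responses, rel_tol=0.1):
    """Pick the two narrowest filters whose response rise times agree.

    responses: list of (bandwidth_hz, response_rise_time_s), one per filter.
    Returns the two bandwidths (narrower first), or None if no pair agrees.

    ASSUMPTION: response rise times shrink as bandwidth grows and then level
    off, so checking neighbours in bandwidth order finds the narrowest pair.
    """
    ordered = sorted(responses)  # narrowest bandwidth first
    for (bw_a, rt_a), (bw_b, rt_b) in zip(ordered, ordered[1:]):
        # "Substantially identical" is modelled as a relative tolerance.
        if abs(rt_a - rt_b) <= rel_tol * max(rt_a, rt_b):
            return bw_a, bw_b
    return None
```

With hypothetical measurements in which the 800 Hz and 1600 Hz filters both respond with a 1 ms rise time while narrower filters respond more slowly, the function would return those two bandwidths.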
25. A method of generating signals simulating speech, comprising provision of pulses corresponding to sound signals generated in the articulation channel, e.g. by the vocal chord, to the input of a set of filters, the processing in the set of filters corresponding to modulation performed by adjustment of the articulation channel of a living being according to phonemes processed, the set of filters modulating the shape of the pulses.
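As a rough illustration of claim 25, the sketch below feeds a train of unit pulses through a cascade of two-pole resonators so that the filters reshape each pulse. The resonator design, the cascade arrangement, and all names and numeric values are assumptions chosen for illustration; the claim does not specify the filter type or its parameters.

```python
import math


def pulse_train(n_samples, period):
    """Unit pulses every `period` samples, standing in for the source pulses."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, centre_hz, bandwidth_hz, sample_rate_hz):
    """Two-pole resonator: y[n] = x[n] + 2r*cos(theta)*y[n-1] - r^2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)  # pole radius < 1
    theta = 2.0 * math.pi * centre_hz / sample_rate_hz      # pole angle
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def shape_pulses(signal, resonances, sample_rate_hz):
    """Pass the pulses through each (centre_hz, bandwidth_hz) resonator in turn."""
    for centre_hz, bandwidth_hz in resonances:
        signal = resonator(signal, centre_hz, bandwidth_hz, sample_rate_hz)
    return signal
```

A call such as `shape_pulses(pulse_train(160, 80), [(700, 100), (1200, 150)], 8000)` uses hypothetical resonance values; changing them changes the shape of every output pulse, which is the effect the claim attributes to the set of filters.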
26. A system for processing an auditory signal to facilitate identification of abrupt energy changes within the auditory signal, which abrupt energy changes have a rise time of at the most 3 ms, and which abrupt energy changes can be perceived by an animal ear such as a human ear as representing a distinct sound picture, said system comprising means for deriving, from the auditory signal, a first signal comprising transient pulses corresponding to at least part of the abrupt energy changes, and means for generating a second transient signal from said first transient signal, said second signal generation means being adapted to hold the value of at least one local maximum of a pulse in the first transient signal at said maximum value for a predetermined period of time, trfpr, thereby generating a corresponding pulse in the second transient signal, said predetermined period of time trfpr being of at the most 5 ms.
27. A system for processing an auditory signal to reduce the bandwidth of the signal with substantial retention of the information of the signal, said auditory signal having abrupt energy changes with a rise time of at the most 3 ms, said system comprising means for deriving, from the auditory signal, a first signal comprising transient pulses corresponding to at least part of the abrupt energy changes, and means for generating a second transient signal from said first transient signal, said second signal generation means being adapted to hold the value of at least one local maximum of a pulse in the first transient signal at said maximum value for a predetermined period of time, trfpr, thereby generating a corresponding pulse in the second transient signal, said predetermined period of time trfpr being of at the most 5 ms.
28. A system according to claim 26 or 27, wherein the predetermined period of time trfpr is shorter than or equal to 3 ms, or preferably shorter than or equal to 2 ms.
29. A system according to claim 26 or 27, wherein the predetermined period of time trfpr is shorter than or equal to 1 ms, preferably about 0.7 ms.
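The hold operation described in claims 26-29 can be sketched on a sampled signal as below. This is a minimal sketch under stated assumptions: the function name `hold_local_maxima`, the three-point test for a local maximum, and the rule that a held value is only replaced by an equal or larger maximum are illustrative choices, not details given in the claims. The default hold time of 0.7 ms is the preferred value from claim 29.

```python
def hold_local_maxima(signal, sample_rate_hz, hold_s=0.0007):
    """Hold each local maximum of `signal` for about `hold_s` seconds.

    ASSUMPTION: the hold period counts the maximum sample itself, and a
    sample is a local maximum if it exceeds its predecessor and is not
    exceeded by its successor.
    """
    hold_samples = max(1, round(hold_s * sample_rate_hz))
    out = []
    held_value = 0.0
    remaining = 0  # samples for which held_value is still in force
    for i, x in enumerate(signal):
        prev_x = signal[i - 1] if i > 0 else x
        next_x = signal[i + 1] if i + 1 < len(signal) else x
        is_local_max = x > prev_x and x >= next_x
        if is_local_max and (remaining == 0 or x >= held_value):
            held_value = x
            remaining = hold_samples
        if remaining > 0:
            out.append(max(x, held_value))
            remaining -= 1
        else:
            out.append(x)
    return out
```

At a 1 kHz sample rate with a 3 ms hold, the input `[0, 1, 3, 1, 0, 0, 0, 0]` becomes `[0, 1, 3, 3, 3, 0, 0, 0]`: the peak value is kept for three samples and the signal then returns to the input.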
30. A system according to any of the claims 26-29, wherein the means for deriving the first transient signal comprise a bandpass filter or a highpass filter.
31. A system according to any of the claims 26-30, wherein the means for deriving the first transient signal further comprise rectification means, preferably one-way rectification means.
32. A system according to any of the claims 26-31, wherein the means for generating the second transient signal comprises a lowpass filter.
33. A system according to any of the claims 30-32, wherein the lower cutoff frequency of the bandpass or highpass filter is in the range of 800-3000 Hz, and/or the upper cutoff frequency of the bandpass filter is in the range between 2 and 7 kHz.
34. A system according to claim 32 or 33, wherein the cutoff frequency of the lowpass filter is in the range of 400-1200 Hz, preferably about 700 Hz.
35. A system according to claim 32 or 33, wherein the cutoff frequency of the lowpass filter is in the range of 50-1500 Hz, preferably in the range of 300-1500 Hz.
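The filter chain named in claims 30-35 (highpass filter, one-way rectification, lowpass filter) might look like the following sketch. First-order filters are an assumption made to keep the example short; the claims give cutoff ranges but not the filter order or design. The 1000 Hz highpass cutoff is a hypothetical value inside the 800-3000 Hz range of claim 33, the 700 Hz lowpass cutoff is the preferred value of claim 34, and the hold step of claim 26 is left out here.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, sample_rate_hz):
    """First-order lowpass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    y = 0.0
    out = []
    for x in signal:
        y += a * (x - y)
        out.append(y)
    return out


def one_pole_highpass(signal, cutoff_hz, sample_rate_hz):
    """First-order highpass, formed as the input minus its lowpassed copy."""
    low = one_pole_lowpass(signal, cutoff_hz, sample_rate_hz)
    return [x - l for x, l in zip(signal, low)]


def half_wave_rectify(signal):
    """One-way rectification: keep positive values, zero the rest."""
    return [x if x > 0.0 else 0.0 for x in signal]


def transient_front_end(signal, sample_rate_hz,
                        highpass_hz=1000.0, lowpass_hz=700.0):
    """Highpass, rectify, then lowpass (cutoff defaults are illustrative)."""
    first = half_wave_rectify(
        one_pole_highpass(signal, highpass_hz, sample_rate_hz))
    return one_pole_lowpass(first, lowpass_hz, sample_rate_hz)
```

Fed with a step input, this chain produces a short positive pulse at the step and decays afterwards, so abrupt energy changes are emphasised while steady levels are suppressed.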
36. A system according to any of the claims 26-35, wherein the means for deriving the first transient pulse and the means for generating the second transient pulse comprise a filter bank.
37. A system according to any of the claims 26-36, further comprising means for determining the shape of pulses in the second transient signal, and/or means for identifying one or more distinct sound pictures from the determined shape.
38. A system for selecting leading edges of transient pulses in a transient signal, said transient signal being derived from an auditory signal having abrupt energy changes with a rise time of at the most 3 ms, and which abrupt energy changes can be perceived by an animal ear such as a human ear as representing a distinct sound picture, said system comprising means for determining or measuring the maximum slope of a leading edge of a pulse in the transient signal, means for comparing the obtained maximum slope with a predetermined lower threshold value for maximum slopes of leading edges, and means for, based on the result of said comparison, selecting a candidate to the leading edge of a pulse.
39. A system according to claim 38, wherein said means for determining or measuring the maximum slope of a leading edge of a pulse are further adapted to determine or measure the maximum slope or slopes of a leading edge or edges of one or more pulses subsequent to the selected candidate, said comparing means are further adapted for comparing the obtained maximum slope or slopes of the subsequent leading edge or edges and the obtained maximum slope of the selected candidate with one another, and said selecting means are further adapted for, based on the result of said comparison, selecting the leading edge with the largest maximum slope.
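The selection logic of claims 38 and 39 can be sketched as two small functions: one measures the maximum slope of a sampled leading edge, the other picks a candidate whose maximum slope reaches the lower threshold and then prefers a subsequent edge if its maximum slope is larger. The function names and the `lookahead` parameter (how many subsequent edges are compared) are assumptions for illustration; the claims in this section do not say how many subsequent pulses are considered.

```python
def max_slope(edge_samples, sample_rate_hz):
    """Largest sample-to-sample rise along one leading edge, per second."""
    if len(edge_samples) < 2:
        raise ValueError("need at least two samples")
    steepest = max(b - a for a, b in zip(edge_samples, edge_samples[1:]))
    return steepest * sample_rate_hz


def select_leading_edge(max_slopes, lower_threshold, lookahead=2):
    """Return the index of the selected leading edge, or None.

    max_slopes: maximum slope of each leading edge, in time order.
    ASSUMPTION: `lookahead` subsequent edges are compared with the candidate.
    """
    for i, slope in enumerate(max_slopes):
        if slope >= lower_threshold:  # candidate, as in claim 38
            window = range(i, min(i + 1 + lookahead, len(max_slopes)))
            # Among the candidate and the subsequent edges, keep the steepest
            # (claim 39); ties go to the earliest edge.
            return max(window, key=lambda j: max_slopes[j])
    return None
```

For maximum slopes `[50, 120, 300, 90, 500]` with a threshold of 100 and a lookahead of 2, the candidate is the second edge and the selected edge is the third, since 300 is the largest slope among the edges compared.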
40. A system according to claim 38 or 39, further comprising means for determining the shape of selected leading edges of pulses, and/or means for identifying one or more distinct sound pictures from the determined shape.
41. A system according to any of the claims 26-35, further comprising the means of any of the claims 37-40 for selecting leading edges from the second transient signal.
42. A system according to any of the claims 26-41, further comprising
a set of filters comprising a set of bandpass filters in which set each bandpass filter has a centre frequency that is different from the centre frequencies of the other filters of the set and a bandwidth that is larger than or equal to the bandwidth of filters of the set with a lower centre frequency than the centre frequency of the filter in question,
means for selecting the two filters of the set that have the narrowest bandwidths A, B of the filters of the set that generate response pulses in response to an input pulse provided to the input of the set of filters with substantially identical rise times, and
means for determination of the rise time of the input pulse by utilization of the determined narrowest bandwidths A, B.
43. A system for determination of the rise time of an input pulse, comprising
a set of filters comprising a set of bandpass filters in which set each bandpass filter has a centre frequency that is different from the centre frequencies of the other filters of the set and a bandwidth that is larger than or equal to the bandwidth of filters of the set with a lower centre frequency than the centre frequency of the filter in question,
means for selecting the two filters of the set that have the narrowest bandwidths A, B of the filters of the set that generate response pulses in response to the input pulse provided to the input of the set of filters with substantially identical rise times, and
means for determination of the rise time of the input pulse by utilization of the determined narrowest bandwidths A, B.
44. A system for generating signals simulating speech, comprising
means for generating pulses corresponding to sound signals generated in the articulation channel, e.g. by the vocal chord, and
a set of filters, the pulses being provided to the input of the set of filters and the processing in the set of filters corresponding to the modulation performed by adjustment of the articulation channel of a living being according to phonemes processed, the set of filters modulating the shape of the pulses.
PCT/DK1996/000370 1995-09-05 1996-09-04 Method and system for processing auditory signals WO1997009712A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU67856/96A AU6785696A (en) 1995-09-05 1996-09-04 Method and system for processing auditory signals
EP96928357A EP0850472A2 (en) 1995-09-05 1996-09-04 Method and system for processing auditory signals

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DK97495 1995-09-05
DK0974/95 1995-09-05

Publications (3)

Publication Number Publication Date
WO1997009712A2 (en) 1997-03-13
WO1997009712A3 (en) 1997-04-10
WO1997009712B1 (en) 1997-05-22

Family

ID=8099600

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/DK1996/000370 WO1997009712A2 (en) 1995-09-05 1996-09-04 Method and system for processing auditory signals

Country Status (3)

Country Link
EP (1) EP0850472A2 (en)
AU (1) AU6785696A (en)
WO (1) WO1997009712A2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002025997A1 (en) * 2000-09-20 2002-03-28 Leonhard Research A/S Quality control of electro-acoustic transducers
WO2002080618A1 (en) * 2001-03-30 2002-10-10 Leonhard Research A/S Noise suppression in measurement of a repetitive signal
EP1293961A1 (en) * 1998-03-13 2003-03-19 LEONHARD, Frank Uldall A signal processing method to analyse transients of a speech signal
WO2010086194A3 (en) * 2009-01-30 2011-09-29 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer program for manipulating an audio signal comprising a transient event

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3564142A (en) * 1967-08-03 1971-02-16 Ibm Method of multiplex speech synthesis
FR2274101A1 (en) * 1974-06-04 1976-01-02 Fuji Xerox Co Ltd VOICE RECOGNITION PROCESS AND DEVICE IMPLEMENTING THIS PROCESS
US4382164A (en) * 1980-01-25 1983-05-03 Bell Telephone Laboratories, Incorporated Signal stretcher for envelope generator
WO1994025958A2 (en) * 1993-04-22 1994-11-10 Frank Uldall Leonhard Method and system for detecting and generating transient conditions in auditory signals

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DATABASE INSPEC INSTITUTE OF ELECTRICAL ENGINEERS, STEVENAGE, GB Inspec No. 5087386, KUMAR ET AL.: "Level crossing time interval circuit for micro-power analog VLSI auditory processing" XP002021408 & PROCEEDINGS OF 1995 IEEE WORKSHOP ON NEURAL NETWORKS FOR SIGNAL PROCESSING, 1 August 1995 - 2 September 1995, CAMBRIDGE, MA, US, pages 581-590, *
DATABASE INSPEC INSTITUTE OF ELECTRICAL ENGINEERS, STEVENAGE, GB Inspec No. 5230761, VILLA ET AL.: "New perspectives in auditory coding: bases for a new cochlear behavioural model" XP002021409 & INTERNATIONAL WORKSHOP ON ARTIFICIAL NEURAL NETWORKS, PROCEEDINGS OF IWANN 95, 7 June 1996 - 9 June 1995, MALAGA-TORREMOLINOS, ES, pages 121-129, *
IBM TECHNICAL DISCLOSURE BULLETIN, vol. 6, no. 7, December 1963, NEW YORK, US, pages 83-84, XP002021407 ANONYMOUS: "Speech Recognition System Using Formant Transient Detection. December 1963." *
PROCEEDINGS OF THE NATIONAL AEROSPACE AND ELECTRONICS CONFERENCE (NAECON), vol. 1, 21 - 25 May 1990, DAYTON, OH, US, pages 57-63, XP000301963 AHN ET AL.: "Cochlear modeling using a general purpose digital signal processor" *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1293961A1 (en) * 1998-03-13 2003-03-19 LEONHARD, Frank Uldall A signal processing method to analyse transients of a speech signal
WO2002025997A1 (en) * 2000-09-20 2002-03-28 Leonhard Research A/S Quality control of electro-acoustic transducers
WO2002080618A1 (en) * 2001-03-30 2002-10-10 Leonhard Research A/S Noise suppression in measurement of a repetitive signal
WO2010086194A3 (en) * 2009-01-30 2011-09-29 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer program for manipulating an audio signal comprising a transient event
US9230557B2 (en) 2009-01-30 2016-01-05 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus, method and computer program for manipulating an audio signal comprising a transient event

Also Published As

Publication number Publication date
EP0850472A2 (en) 1998-07-01
AU6785696A (en) 1997-03-27
WO1997009712A3 (en) 1997-04-10

Similar Documents

Publication Publication Date Title
US5884260A (en) Method and system for detecting and generating transient conditions in auditory signals
US3855416A (en) Method and apparatus for phonation analysis leading to valid truth/lie decisions by fundamental speech-energy weighted vibratto component assessment
US8488800B2 (en) Segmenting audio signals into auditory events
CA2448182C (en) Segmenting audio signals into auditory events
JP4763965B2 (en) Split audio signal into auditory events
AU2002252143A1 (en) Segmenting audio signals into auditory events
EP0182989B1 (en) Normalization of speech signals
WO1990011593A1 (en) Method and apparatus for speech analysis
US5960373A (en) Frequency analyzing method and apparatus and plural pitch frequencies detecting method and apparatus using the same
JPH0431898A (en) Voice/noise separating device
US5483617A (en) Elimination of feature distortions caused by analysis of waveforms
Smith A phoneme detector
WO1997009712A2 (en) Method and system for processing auditory signals
EP1293961B1 (en) A signal processing method to analyse transients of a speech signal
US4982433A (en) Speech analysis method
Kajita et al. Subband-autocorrelation analysis and its application for speech recognition
RU2174714C2 (en) Method for separating the basic tone
KR100359988B1 (en) real-time speaking rate conversion system
Niederjohn et al. Computer recognition of the continuant phonemes in connected English speech
WO1997009712B1 (en) Method and system for processing auditory signals
Kiukaanniemi et al. Long-term speech spectra: A computerized method of measurement and a comparative study of Finnish and English data
WO1993009531A1 (en) Processing of electrical and audio signals
JPS61126600A (en) Sound wave input processing
JPS61273599A (en) Voice recognition equipment
JPH0462598B2 (en)

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AL AM AT AT AU AZ BB BG BR BY CA CH CN CU CZ CZ DE DE DK DK EE EE ES FI FI GB GE HU IL IS JP KE KG KP KR KZ LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK TJ TM TR TT UA UG US UZ VN AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT

AK Designated states

Kind code of ref document: A3

Designated state(s): AL AM AT AT AU AZ BB BG BR BY CA CH CN CU CZ CZ DE DE DK DK EE EE ES FI FI GB GE HU IL IS JP KE KG KP KR KZ LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SK TJ TM TR TT UA UG US UZ VN AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 1996928357

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1996928357

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase in:

Ref country code: CA

WWW Wipo information: withdrawn in national office

Ref document number: 1996928357

Country of ref document: EP
