US20170365271A1 - Automatic speech recognition de-reverberation - Google Patents
- Publication number: US20170365271A1
- Authority: US (United States)
- Prior art keywords: audio stream, filter, GWPE, reverberation, term
- Prior art date
- Legal status: Abandoned (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/02—Casings; Cabinets; Supports therefor; Mountings therein
- H04R1/04—Structural association of microphone with electric circuitry therefor
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0324—Details of processing therefor
- G10L21/034—Automatic adjustment
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/40—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
- H04R1/406—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R31/00—Apparatus or processes specially adapted for the manufacture of transducers or diaphragms therefor
- H04R31/006—Interconnection of transducer parts
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/22—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired frequency characteristic only
- H04R1/28—Transducer mountings or enclosures modified by provision of mechanical or acoustic impedances, e.g. resonator, damping means
- H04R1/2869—Reduction of undesired resonances, i.e. standing waves within enclosure, or of undesired vibrations, i.e. of the enclosure itself
- H04R1/2876—Reduction of undesired resonances, i.e. standing waves within enclosure, or of undesired vibrations, i.e. of the enclosure itself by means of damping material, e.g. as cladding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2201/00—Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
- H04R2201/40—Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups
- H04R2201/401—2D or 3D arrays of transducers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2201/00—Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
- H04R2201/40—Details of arrangements for obtaining desired directional characteristic by combining a number of identical transducers covered by H04R1/40 but not provided for in any of its subgroups
- H04R2201/403—Linear arrays of transducers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2420/00—Details of connection covered by H04R, not provided for in its groups
- H04R2420/07—Applications of wireless loudspeakers or wireless microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
Definitions
- Embodiments described herein generally relate to automatic speech recognition (ASR) and more specifically to ASR de-reverberation.
- ASR involves a machine-based collection of techniques to understand human languages.
- ASR is interdisciplinary, often involving microphone, analog-to-digital conversion, frequency processing, database, and artificial intelligence technologies to convert the spoken word into textual or machine readable representations of not only what was said (e.g., a transcript) but also what was meant (e.g., semantic understanding) by a human speaker.
- Far field ASR involves techniques to decrease the word error rate (WER) in utterances made at a greater distance from a microphone, or microphone array, than traditionally accounted for in ASR processing pipelines. Such distance often decreases the signal to noise ratio (SNR) and thus increases WER in traditional ASR systems.
- Far field ASR generally involves distances of more than half a meter from the microphone.
- FIG. 1 is an example of a smart home gateway housing, according to an embodiment.
- FIG. 2 is a block diagram of an example high accuracy (HA) real-time de-reverberation device, according to an embodiment.
- FIG. 3 is a block diagram of an example low latency (LL) real-time de-reverberation device, according to an embodiment.
- FIG. 4 is a block diagram of an example low complexity and high accuracy (LH) real-time de-reverberation device, according to an embodiment.
- FIG. 5 is an example of a method for automatic speech recognition de-reverberation, according to an embodiment.
- FIG. 6 is a block diagram illustrating an example of a machine upon which one or more embodiments may be implemented.
- Embodiments and examples herein generally describe a number of systems, devices, and techniques for ASR de-reverberation. It is understood, however, that the systems, devices, and techniques are examples illustrating the underlying concepts.
- Reverb removal is not a trivial task because its characteristics depend on many factors, such as reverberation time (RT), distance between the sound source (e.g., user) and the microphone array, microphone array/device position, among other factors. Accordingly, even if a reverberation characteristic is known at one point in a room, for example, the characteristics may not be helpful to remove reverberation when a signal is captured in a different location (even if the location changed only a few millimeters).
- As used herein, GWPE refers to generalized weighted prediction error and RTF refers to the real-time factor.
- Examples are able to run in real-time with, for example, an eight microphone array (such as that illustrated in FIG. 1). Some of these techniques improve WER at the five meter distance from 17.2% to 9.3% in clean conditions (e.g., without noise) and from 51.5% to 33.2% in noisy conditions. Removing reverberation helps improve ASR engine performance, beam-former techniques, and estimation of the signal's source position, thus generally increasing the performance of ASR systems.
- FIG. 1 is an example of a device 100 including a smart home gateway housing 105 , according to an embodiment. As illustrated, the circles atop the housing are lumens 110 behind which are housed microphones (as illustrated there are eight microphones). The dashed lines illustrate microphones in a linear arrangement 115 as well as in a circular arrangement 120 .
- the device 100 includes a sampler 125, a signal processor 130, a multiplexer 140, and an interlink 145. In an example, the device includes a data store 135 to provide one or more buffers. All of these components are implemented in electronic hardware, such as that described below (e.g., circuits).
- the sampler 125 is arranged to obtain a portion of an audio stream, the portion of the audio stream being a proper subset of the audio stream.
- the sampler 125 may receive the portion or may retrieve the portion from, for example, a microphone.
- obtaining the portion of the audio stream includes buffering the audio stream for a fixed time period (i.e., the buffer length).
- the fixed time period is an audio frame.
- the audio frame length is thirty-two milliseconds. The sampler 125 thus operates to obtain discrete audio samples.
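- As an informal sketch of such fixed-period buffering (the function name and generator interface are illustrative, not from the patent), a one-second portion at a 16 kHz sample rate may be obtained as follows:

```python
def buffered_portions(stream, period_ms=1000, sample_rate=16000):
    """Yield successive fixed-time-period portions of an audio stream.

    period_ms=1000 matches the one-second estimation buffer used by the
    HA example below; period_ms=32 would correspond to a single audio
    frame. The generator interface is an assumption of this sketch.
    """
    n = int(sample_rate * period_ms / 1000)
    for start in range(0, len(stream) - n + 1, n):
        yield stream[start:start + n]
```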
- the signal processor 130 is arranged to create a filter by applying GWPE to the portion of the audio stream.
- the signal processor 130 is arranged to calculate the frequency domain data by applying an overlap and add (OLA) procedure combined with a Fast Fourier Transform (FFT) prior to creating the filter.
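- For illustration, a minimal sketch of the analysis step under the parameters given in the GWPE description below (16 kHz sample rate, 32 ms frames, 8 ms shift); the Hann window is an assumption of this sketch:

```python
import numpy as np

SAMPLE_RATE = 16000           # per the GWPE description below
FRAME_LEN = 512               # 32 ms frames at 16 kHz
FRAME_SHIFT = 128             # 8 ms shift
N_BINS = FRAME_LEN // 2 + 1   # 257 independent frequency bins

def analysis_stft(x):
    """Window the signal into overlapping frames and FFT each frame.

    Returns complex spectra of shape (n_frames, 257). The Hann window
    is an assumption; the patent does not name a window function.
    """
    window = np.hanning(FRAME_LEN)
    n_frames = 1 + (len(x) - FRAME_LEN) // FRAME_SHIFT
    frames = np.stack([
        x[t * FRAME_SHIFT:t * FRAME_SHIFT + FRAME_LEN] * window
        for t in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)
```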
- a description of GWPE and its application in a number of examples is provided below with respect to FIGS. 2-4 .
- GWPE is applied only to the portion obtained by the sampler 125 (e.g., a buffered signal). Reducing the size of the buffered segment of the signal reduces the computational complexity of traditional GWPE while still maintaining enough de-reverberation performance to benefit ASR processing.
- creating the filter occurs in a first pipeline and applying the filter occurs in a second pipeline.
- the first and second pipelines are arranged to execute in parallel on the device 100 .
- Parallel execution here means actual simultaneous (e.g., not sequential as in a single core processor) execution of the pipelines.
- the signal processor 130 is also arranged to repeatedly create and apply the filter with a subsequent fixed time period (e.g., buffered portions of the signal).
- the signal processor 130 creates and applies filters for respective samples obtained by the sampler 125 .
- creating the filter includes the signal processor 130 to combine a current GWPE application to the audio stream with a previously created filter.
- GWPE application refers to the result of applying GWPE to the audio stream
- combining the current GWPE application to the audio stream with a previously created filter includes adding the current GWPE application as a first term to the previously created filter as a second term.
- combining the current GWPE application to the audio stream with a previously created filter includes applying a first scaling factor to the first term and a second scaling factor to the second term prior to the adding.
- the second scaling factor is between zero and one.
- the first scaling factor is one minus the second scaling factor.
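- A minimal sketch of this combination, where `beta` plays the role of the second scaling factor; the default value of 0.95 is illustrative, not taken from the patent:

```python
def combine_filters(g_current, g_previous, beta=0.95):
    """Blend the current GWPE filter with the previously created filter.

    beta is the second scaling factor (between zero and one); the first
    scaling factor is one minus beta, as described above. beta=0.95 is
    an illustrative value, not taken from the patent.
    """
    if g_previous is None:  # first update: no previous filter to blend
        return g_current
    return (1.0 - beta) * g_current + beta * g_previous
```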
- the data store 135 provides a buffer to introduce a delay to the audio stream prior to applying the filter. This delay permits the processing buffer to fill with the required amount of signal. In an example, the delay is higher than 40 milliseconds and depends inter alia on the processor compute power.
- the multiplexer 140 is arranged to apply the filter to the audio stream to remove reverberation.
- the multiplexer 140 accepts both the filter and the complex audio spectrum as inputs.
- the multiplexer 140 then applies the filter to the audio stream, calculates an inverse FFT (IFFT), and performs the OLA procedure to produce a signal with some (or all) of the reverberations removed.
- This signal may be referred to as a clean audio signal, filtered audio signal, or the like.
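- Continuing the analysis sketch above, the synthesis side (filtered spectra through an IFFT and the OLA procedure) might look like the following; synthesis-window normalization is omitted for brevity:

```python
import numpy as np

def synthesis_ola(spectra, frame_len=512, frame_shift=128):
    """Inverse-FFT each filtered frame and overlap-add to a waveform.

    `spectra` is (n_frames, 257) complex, e.g., the output of
    analysis_stft above after de-reverberation filtering. A matching
    synthesis window / COLA normalization is omitted for brevity.
    """
    frames = np.fft.irfft(spectra, n=frame_len, axis=1)
    out = np.zeros(frame_len + (len(frames) - 1) * frame_shift)
    for t, frame in enumerate(frames):
        out[t * frame_shift:t * frame_shift + frame_len] += frame
    return out
```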
- the interlink 145 is arranged to provide a filtered version of the audio stream to an audio stream consumer.
- the interlink 145 interfaces the present pipeline to other processing as part of a far field, or other, ASR system.
- far field ASR may greatly benefit from the device 100
- the device 100 may also be helpful in other ASR situations, such as near field, to reduce reverberations.
- FIGS. 2-4 illustrate several examples of configurations for a variety of the operations described above. These examples describe changes to a traditional GWPE application to meet the real-time processing used by many current devices while still maintaining a high degree of reverberation removal for ASR processing applications.
- GWPE was originally designed in such a way that it required a whole recording of an utterance to properly design the de-reverberation filter. Thus, GWPE was not appropriate for real-time signal processing because processing must begin prior to an utterance's completion, denying the knowledge contained in the end of the recording that was used in the traditional GWPE implementation.
- the first example is here labeled high accuracy (HA),
- the second example is here labeled low latency (LL), and
- the third example is here labeled low complexity with high accuracy (LH).
- the first and third examples (HA and LH) estimate the de-reverberation filter in parallel to the channel processing.
- the role of the main thread is different between HA and LH.
- LL performs all calculations in a single (e.g., main) thread but uses estimates of the signal statistics instead of using the signal's actual (e.g., real) values.
- GWPE operates in the frequency domain. GWPE operates on a signal with a sample rate of 16 kHz and uses 32 ms frames with an 8 ms shift. GWPE treats every frequency bin independently; thus, GWPE processes 257 independent frequency bins. As used herein, a frequency bin is represented by $l$ and the number of a frame (e.g., the index of the frame in a sample) by $t$. GWPE takes M channels as input and provides M channels of output. GWPE is a blind de-reverberation technique because all of the statistics needed by GWPE are obtained directly from the input signal. The GWPE de-reverberation operation is defined by the following equation: $$\hat{x}_{t,l} = y_{t,l} - \vec{g}_l^{\,H}\,\bar{y}_{t-\Delta,l},$$ where $y_{t,l}$ collects the M channel values of bin $l$ at frame $t$, $\bar{y}_{t-\Delta,l}$ stacks the delayed past frames used by the filter, and $\vec{g}_l$ is the de-reverberation filter for bin $l$.
- The filter $\vec{g}_l$ may be estimated using the following: $$R_l = \sum_t \frac{\bar{y}_{t-\Delta,l}\,\bar{y}_{t-\Delta,l}^{H}}{\lambda_{t,l}}, \qquad r_l = \sum_t \frac{\bar{y}_{t-\Delta,l}\,y_{t,l}^{H}}{\lambda_{t,l}}, \qquad \vec{g}_l = R_l^{-1}\,r_l,$$ where $\lambda_{t,l}$ is the time-varying power of bin $l$ at frame $t$.
- every filter update involves solving 257 linear equation systems, each involving a 32-by-32 matrix ($R_l$) and a 32-by-8 vector ($r_l$). From the perspective of signal quality it is important to update the filter frequently enough to keep up (e.g., align) with reverberation characteristic changes.
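- To make the system sizes concrete, the following single-pass sketch estimates and applies the per-bin filter, assuming M = 8 channels and K = 4 taps per channel (so K·M = 32, matching the 32-by-32 R and 32-by-8 r above); the prediction delay of two frames and the single-iteration structure are assumptions, since full GWPE alternates this solve with re-estimation of the frame powers:

```python
import numpy as np

M = 8       # microphone channels
K = 4       # filter taps per channel, so K * M = 32 coefficients
DELAY = 2   # prediction delay in frames (an assumed value)

def stacked_past(Y, t):
    """Stack the K delayed frames of all M channels into one 32-vector."""
    return np.concatenate([Y[t - DELAY - k] for k in range(K)])

def gwpe_filter_one_bin(Y, eps=1e-8):
    """Estimate the de-reverberation filter for a single frequency bin.

    Y: (n_frames, M) complex STFT values of one bin. R is 32-by-32 and
    r is 32-by-8, matching the system sizes described above. A full
    GWPE implementation alternates this solve with re-estimation of
    the frame powers lambda; one pass is shown here.
    """
    lam = np.maximum(np.mean(np.abs(Y) ** 2, axis=1), eps)
    R = np.zeros((K * M, K * M), dtype=complex)
    r = np.zeros((K * M, M), dtype=complex)
    for t in range(DELAY + K - 1, Y.shape[0]):
        y_bar = stacked_past(Y, t)
        R += np.outer(y_bar, y_bar.conj()) / lam[t]
        r += np.outer(y_bar, Y[t].conj()) / lam[t]
    return np.linalg.solve(R, r)  # one 32x32 system (per bin, 257 total)

def dereverb_one_bin(Y, G):
    """Subtract the predicted late reverberation from each frame."""
    X = Y.copy()
    for t in range(DELAY + K - 1, Y.shape[0]):
        X[t] = Y[t] - G.conj().T @ stacked_past(Y, t)
    return X
```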
- GWPE updates the filter for an entire utterance.
- GWPE is thus inappropriate for real-time working solutions because, as noted above, the traditional filter design requires the complete utterance and because frequent filter updates are computationally expensive.
- FIG. 2 is a block diagram of an HA real-time de-reverberation device 205 , according to an embodiment.
- HA de-reverberation uses a short time span for filter estimation.
- the time for estimation (T_est) may be equal to one second (1000 ms)
- the filter is updated every T_est ms.
- the elements of the HA device 205 include a main thread 215 and a parallel thread 210 .
- the main thread 215 includes a storage device 235 , a signal delay buffer 240 , and a switcher 245 .
- the parallel thread 210 includes a signal statistics calculator 220 , a filter calculator 225 , and a reverberation removal block 230 .
- These components are all implemented in electronic hardware (e.g., circuits). The operations of the components proceed as follows: the parallel thread 210 iteratively calculates the signal statistics, calculates the de-reverberation filter, and removes reverberation, while the main thread 215 buffers and delays the incoming signal and switches in the output of the most recently estimated filter.
- In this way, the processing proceeds with the latest (e.g., last) GWPE de-reverberation without imposing a delay for the entire utterance.
- An advantage of this technique includes its high accuracy of reverberation reduction.
- the delay introduced depends on T_est, the time span used for the filter estimation, and on the RTF obtained from the GWPE implementation on the sample.
- RTF approached 0.3 for this example.
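- As an illustrative decomposition of that delay (this reading is not spelled out in the text): with T_est equal to one second and an RTF near 0.3, the device buffers roughly 1000 ms of signal and spends roughly 0.3 × 1000 ms = 300 ms estimating the filter, for an overall delay on the order of 1300 ms.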
- FIG. 3 is a block diagram of an example LL real-time de-reverberation device 305 , according to an embodiment.
- This additional example is here called low-latency real-time de-reverberation. It uses an estimate of a weighted correlation matrix and vector ($\hat{R}_l$ and $\hat{r}_l$) and does not wait for filter convergence.
- the device 305 includes a single thread 310 rather than the parallel threads described with respect to the HA real-time de-reverberation device 205 described above.
- the single thread includes a signal statistics estimation block 220 rather than the signal statistics calculation block described above, but otherwise also includes a filter calculator 320 and a reverberation removal block 325. All of the illustrated processing elements are implemented in electronic hardware. Operations of the device 305 proceed as follows:
- $\vec{g}_l \leftarrow \hat{R}_l^{-1}\,\hat{r}_l$, for $t \geq T_{min}$
- $\alpha$ is a smoothing factor (typically in a range from 0.9 to 0.999) and T_min is an initialization time span (typically in a range from 300 to 500 milliseconds).
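- A sketch of this recursive estimation for one bin, reusing M and K from the batch sketch above; the (1 − α) weighting and the diagonal loading in the solve are common implementation conventions assumed here, not details from the patent:

```python
import numpy as np

class StreamingGwpeBin:
    """Per-bin state for the low-latency variant: exponentially smoothed
    statistics instead of batch sums.

    alpha is the smoothing factor (0.9-0.999 per the text). The
    (1 - alpha) weighting and the diagonal loading in the solve are
    common implementation conventions assumed by this sketch.
    """

    def __init__(self, alpha=0.99, t_min_frames=50):  # ~400 ms at 8 ms/frame
        self.alpha = alpha
        self.t_min = t_min_frames
        self.t = 0
        self.R = np.zeros((K * M, K * M), dtype=complex)
        self.r = np.zeros((K * M, M), dtype=complex)
        self.G = np.zeros((K * M, M), dtype=complex)

    def update(self, y_bar, y_t, eps=1e-8):
        """Fold one new frame into R-hat and r-hat, then re-solve."""
        lam = max(np.mean(np.abs(y_t) ** 2), eps)
        self.R = self.alpha * self.R + (1 - self.alpha) * np.outer(y_bar, y_bar.conj()) / lam
        self.r = self.alpha * self.r + (1 - self.alpha) * np.outer(y_bar, y_t.conj()) / lam
        self.t += 1
        if self.t >= self.t_min:
            # small diagonal loading keeps the early solves well conditioned
            self.G = np.linalg.solve(self.R + eps * np.eye(K * M), self.r)
        return y_t - self.G.conj().T @ y_bar
```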
- This technique may introduce delay caused by the fast Fourier transform (FFT) and the overlap-add (OLA) procedures. For example, the delay is 40 ms when the frame size equals 32 ms and the frame shift equals 8 ms.
- An additional benefit of this example includes lower memory requirements because calculating new values of $\hat{R}_l$ and $\hat{r}_l$ requires only the last T signal frames to be buffered, whereas the HA example may require T_est + T frames to be buffered.
- Because the filter is updated for every new frame, the precision of the de-reverberation filter may be better than in the HA example.
- a WER for clean speech of 10.5% was achieved for the HA example and a WER of 9.3% for the LL example.
- FIG. 4 is a block diagram of an example LH real-time de-reverberation device 405 , according to an embodiment.
- This example uses a thread division similar to that of the HA example (FIG. 2), but uses the LL de-reverberation procedure, albeit in a parallel thread.
- the delay introduced by this example may be set arbitrarily depending on the available computing power.
- the device 405 includes a main thread 430 and a parallel filter estimation thread 410 .
- the main thread 430 includes a signal estimation block 435 which buffers results in a storage device 425 .
- the main thread includes another storage area 445 (which may be the same physical device as storage device 425 in an example) to buffer the audio signal for use by the reverberation removal block 420 of the parallel thread 410 .
- the main thread 430 may include a delay 440 block to delay the audio signal into the switcher 450 to allow the filter to be processed prior to outputting the filtered signal.
- the parallel thread 410 includes the filter calculator 415, similar to that of the LL example, which operates on the signal statistic estimates to produce the de-reverberation filter. This is then provided to the reverberation removal block 420 to perform the filtering.
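- The two-thread split shared by the HA and LH examples can be sketched as follows; all names and the placeholder bodies are illustrative, not the patent's:

```python
import queue
import threading

# An illustrative two-thread skeleton of the HA/LH split: the parallel
# thread re-estimates the filter from queued statistics while the main
# thread keeps applying the most recent filter.

latest_filter = None
filter_lock = threading.Lock()
stats_queue = queue.Queue(maxsize=8)

def estimate_filter(stats):
    return stats  # placeholder for the GWPE solve sketched earlier

def apply_filter(g, frame):
    return frame  # placeholder for the per-bin de-reverberation step

def filter_estimation_thread():
    global latest_filter
    while True:
        stats = stats_queue.get()       # blocks until new statistics arrive
        if stats is None:               # sentinel value: shut down
            return
        new_filter = estimate_filter(stats)
        with filter_lock:
            latest_filter = new_filter  # the switcher picks this up next

def process_frame(frame):
    """Main-thread path: apply the latest available filter."""
    with filter_lock:
        g = latest_filter
    # Until the first filter converges, audio passes through; the delay
    # buffer described above hides this start-up period.
    return frame if g is None else apply_filter(g, frame)
```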
- FIG. 5 is an example of a method 500 for automatic speech recognition de-reverberation, according to an embodiment.
- the operations of the method 500 are implemented and executed upon electronic hardware, such as that described above and below (e.g., circuits).
- a portion of an audio stream is obtained.
- the portion of the audio stream is a proper subset of the audio stream.
- obtaining the portion of the audio stream includes buffering the audio stream for a fixed time period.
- the fixed time period is a second.
- the fixed time period is an audio frame.
- the audio frame length is thirty-two milliseconds.
- the method 500 may be extended to include repeating (e.g., repeatedly) creating the filter with a subsequent fixed time period.
- a filter is created by applying GWPE to the portion of the audio stream.
- creating the filter occurs in a first pipeline and applying the filter occurs in a second pipeline.
- the first and second pipelines execute in parallel on a device.
- creating the filter includes combining a current GWPE application to the audio stream with a previously created filter.
- combining the current GWPE application to the audio stream with a previously created filter includes adding the current GWPE application as a first term to the previously created filter as a second term.
- combining the current GWPE application to the audio stream with a previously created filter includes applying a first scaling factor to the first term and a second scaling factor to the second term prior to the adding.
- the second scaling factor is between zero and one.
- the first scaling factor is one minus the second scaling factor.
- the filter is applied to the audio stream to remove reverberation from the audio stream to produce a filtered version of the audio stream.
- the filtered version of the audio stream is provided to an audio stream consumer.
- the operations of the method 500 may be optionally extended to include introducing a delay to the audio stream prior to applying the filter.
- the delay is 40 milliseconds.
- FIG. 6 illustrates a block diagram of an example machine 600 upon which any one or more of the techniques (e.g., methodologies) discussed herein may perform.
- the machine 600 may operate as a standalone device or may be connected (e.g., networked) to other machines.
- the machine 600 may operate in the capacity of a server machine, a client machine, or both in server-client network environments.
- the machine 600 may act as a peer machine in a peer-to-peer (P2P) (or other distributed) network environment.
- the machine 600 may be a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a mobile telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions (sequential or otherwise) that specify actions to be taken by that machine.
- The term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein, such as cloud computing, software as a service (SaaS), or other computer cluster configurations.
- Circuitry is a collection of circuits implemented in tangible entities that include hardware (e.g., simple circuits, gates, logic, etc.). Circuitry membership may be flexible over time and underlying hardware variability. Circuitries include members that may, alone or in combination, perform specified operations when operating. In an example, hardware of the circuitry may be immutably designed to carry out a specific operation (e.g., hardwired).
- the hardware of the circuitry may include variably connected physical components (e.g., execution units, transistors, simple circuits, etc.) including a computer readable medium physically modified (e.g., magnetically, electrically, moveable placement of invariant massed particles, etc.) to encode instructions of the specific operation.
- the instructions enable embedded hardware (e.g., the execution units or a loading mechanism) to create members of the circuitry in hardware via the variable connections to carry out portions of the specific operation when in operation.
- the computer readable medium is communicatively coupled to the other components of the circuitry when the device is operating.
- any of the physical components may be used in more than one member of more than one circuitry.
- execution units may be used in a first circuit of a first circuitry at one point in time and reused by a second circuit in the first circuitry, or by a third circuit in a second circuitry at a different time.
- Machine 600 may include a hardware processor 602 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a hardware processor core, or any combination thereof), a main memory 604 and a static memory 606 , some or all of which may communicate with each other via an interlink (e.g., bus) 608 .
- the machine 600 may further include a display unit 610 , an alphanumeric input device 612 (e.g., a keyboard), and a user interface (UI) navigation device 614 (e.g., a mouse).
- the display unit 610 , input device 612 and UI navigation device 614 may be a touch screen display.
- the machine 600 may additionally include a storage device (e.g., drive unit) 616, a signal generation device 618 (e.g., a speaker), a network interface device 620, and one or more sensors 621, such as a global positioning system (GPS) sensor, compass, accelerometer, or other sensor.
- the machine 600 may include an output controller 628, such as a serial (e.g., universal serial bus (USB)), parallel, or other wired or wireless (e.g., infrared (IR), near field communication (NFC), etc.) connection to communicate with or control one or more peripheral devices (e.g., a printer, card reader, etc.).
- the storage device 616 may include a machine readable medium 622 on which is stored one or more sets of data structures or instructions 624 (e.g., software) embodying or utilized by any one or more of the techniques or functions described herein.
- the instructions 624 may also reside, completely or at least partially, within the main memory 604 , within static memory 606 , or within the hardware processor 602 during execution thereof by the machine 600 .
- one or any combination of the hardware processor 602 , the main memory 604 , the static memory 606 , or the storage device 616 may constitute machine readable media.
- While the machine readable medium 622 is illustrated as a single medium, the term "machine readable medium" may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) configured to store the one or more instructions 624.
- The term "machine readable medium" may include any medium that is capable of storing, encoding, or carrying instructions for execution by the machine 600 and that cause the machine 600 to perform any one or more of the techniques of the present disclosure, or that is capable of storing, encoding, or carrying data structures used by or associated with such instructions.
- Non-limiting machine readable medium examples may include solid-state memories, and optical and magnetic media.
- a massed machine readable medium comprises a machine readable medium with a plurality of particles having invariant (e.g., rest) mass. Accordingly, massed machine-readable media are not transitory propagating signals.
- massed machine readable media may include: non-volatile memory, such as semiconductor memory devices (e.g., Electrically Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM)) and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
- the instructions 624 may further be transmitted or received over a communications network 626 using a transmission medium via the network interface device 620 utilizing any one of a number of transfer protocols (e.g., frame relay, internet protocol (IP), transmission control protocol (TCP), user datagram protocol (UDP), hypertext transfer protocol (HTTP), etc.).
- Example communication networks may include a local area network (LAN), a wide area network (WAN), a packet data network (e.g., the Internet), mobile telephone networks (e.g., cellular networks), Plain Old Telephone (POTS) networks, and wireless data networks (e.g., Institute of Electrical and Electronics Engineers (IEEE) 802.11 family of standards known as Wi-Fi®, IEEE 802.16 family of standards known as WiMax®), IEEE 802.15.4 family of standards, peer-to-peer (P2P) networks, among others.
- the network interface device 620 may include one or more physical jacks (e.g., Ethernet, coaxial, or phone jacks) or one or more antennas to connect to the communications network 626 .
- the network interface device 620 may include a plurality of antennas to wirelessly communicate using at least one of single-input multiple-output (SIMO), multiple-input multiple-output (MIMO), or multiple-input single-output (MISO) techniques.
- The term "transmission medium" shall be taken to include any intangible medium that is capable of storing, encoding, or carrying instructions for execution by the machine 600, and includes digital or analog communications signals or other intangible media to facilitate communication of such software.
- Example 1 is a system for automatic speech recognition de-reverberation, the system comprising: a sampler to obtain a portion of an audio stream, the portion of the audio stream being a proper subset of the audio stream; a signal processor to create a filter by applying Generalized Weighted Prediction Error (GWPE) to the portion of the audio stream; a multiplexer to apply the filter to the audio stream to remove reverberation; and an interlink to provide a filtered version of the audio stream to an audio stream consumer.
- In Example 2, the subject matter of Example 1 optionally includes wherein the processor is in a first pipeline to create the filter and the multiplexer is in a second pipeline to apply the filter, the first and second pipelines arranged to execute in parallel.
- In Example 3, the subject matter of any one or more of Examples 1-2 optionally includes wherein, to obtain the portion of the audio stream, the sampler buffers the audio stream for a fixed time period.
- In Example 4, the subject matter of Example 3 optionally includes wherein the fixed time period is a second.
- In Example 5, the subject matter of any one or more of Examples 3-4 optionally includes wherein the signal processor includes a loop to repetitively create the filter with subsequent fixed time periods.
- In Example 6, the subject matter of any one or more of Examples 3-5 optionally includes wherein the fixed time period is an audio frame.
- In Example 7, the subject matter of Example 6 optionally includes wherein the audio frame length is thirty-two milliseconds.
- In Example 8, the subject matter of any one or more of Examples 1-7 optionally includes wherein, to create the filter, the signal processor combines a current GWPE application to the audio stream with a previously created filter.
- In Example 9, the subject matter of Example 8 optionally includes wherein, to combine the current GWPE application to the audio stream with a previously created filter, the signal processor adds the current GWPE application as a first term to the previously created filter as a second term.
- In Example 10, the subject matter of Example 9 optionally includes wherein, to combine the current GWPE application to the audio stream with a previously created filter, the signal processor applies a first scaling factor to the first term and a second scaling factor to the second term prior to the adding.
- In Example 11, the subject matter of Example 10 optionally includes wherein the second scaling factor is between zero and one and wherein the first scaling factor is one minus the second scaling factor.
- In Example 12, the subject matter of any one or more of Examples 1-11 optionally includes a buffer to introduce a delay to the audio stream prior to applying the filter.
- In Example 13, the subject matter of Example 12 optionally includes wherein the delay is eight milliseconds.
- Example 14 is at least one machine readable medium including instructions for automatic speech recognition de-reverberation, the instructions, when executed by a machine, cause the machine to perform operations comprising: obtaining a portion of an audio stream, the portion of the audio stream being a proper subset of the audio stream; creating a filter by applying Generalized Weighted Prediction Error (GWPE) to the portion of the audio stream; applying the filter to the audio stream to remove reverberation; and providing a filtered version of the audio stream to an audio stream consumer.
- In Example 15, the subject matter of Example 14 optionally includes wherein creating the filter occurs in a first pipeline and applying the filter occurs in a second pipeline, the first and second pipelines executing in parallel on a device.
- In Example 16, the subject matter of any one or more of Examples 14-15 optionally includes wherein obtaining the portion of the audio stream includes buffering the audio stream for a fixed time period.
- In Example 17, the subject matter of Example 16 optionally includes wherein the fixed time period is a second.
- In Example 18, the subject matter of any one or more of Examples 16-17 optionally includes wherein the operations include repeating creating the filter with a subsequent fixed time period.
- In Example 19, the subject matter of any one or more of Examples 16-18 optionally includes wherein the fixed time period is an audio frame.
- In Example 20, the subject matter of Example 19 optionally includes wherein the audio frame length is thirty-two milliseconds.
- In Example 21, the subject matter of any one or more of Examples 14-20 optionally includes wherein creating the filter includes combining a current GWPE application to the audio stream with a previously created filter.
- In Example 22, the subject matter of Example 21 optionally includes wherein combining the current GWPE application to the audio stream with a previously created filter includes adding the current GWPE application as a first term to the previously created filter as a second term.
- In Example 23, the subject matter of Example 22 optionally includes wherein combining the current GWPE application to the audio stream with a previously created filter includes applying a first scaling factor to the first term and a second scaling factor to the second term prior to the adding.
- In Example 24, the subject matter of Example 23 optionally includes wherein the second scaling factor is between zero and one and wherein the first scaling factor is one minus the second scaling factor.
- In Example 25, the subject matter of any one or more of Examples 14-24 optionally includes introducing a delay to the audio stream prior to applying the filter.
- In Example 26, the subject matter of Example 25 optionally includes wherein the delay is eight milliseconds.
- Example 27 is a device for automatic speech recognition de-reverberation, the device comprising: means for obtaining a portion of an audio stream, the portion of the audio stream being a proper subset of the audio stream; means for creating a filter by applying Generalized Weighted Prediction Error (GWPE) to the portion of the audio stream; means for applying the filter to the audio stream to remove reverberation; and means for providing a filtered version of the audio stream to an audio stream consumer.
- In Example 28, the subject matter of Example 27 optionally includes wherein the means for creating the filter occurs in a first pipeline and the means for applying the filter occurs in a second pipeline, the first and second pipelines executing in parallel on the device.
- In Example 29, the subject matter of any one or more of Examples 27-28 optionally includes wherein the means for obtaining the portion of the audio stream includes means for buffering the audio stream for a fixed time period.
- In Example 30, the subject matter of Example 29 optionally includes wherein the fixed time period is a second.
- In Example 31, the subject matter of any one or more of Examples 29-30 optionally includes means for repeating creating the filter with a subsequent fixed time period.
- In Example 32, the subject matter of any one or more of Examples 29-31 optionally includes wherein the fixed time period is an audio frame.
- In Example 33, the subject matter of Example 32 optionally includes wherein the audio frame length is thirty-two milliseconds.
- In Example 34, the subject matter of any one or more of Examples 27-33 optionally includes wherein the means for creating the filter includes means for combining a current GWPE application to the audio stream with a previously created filter.
- In Example 35, the subject matter of Example 34 optionally includes wherein the means for combining the current GWPE application to the audio stream with a previously created filter includes means for adding the current GWPE application as a first term to the previously created filter as a second term.
- In Example 36, the subject matter of Example 35 optionally includes wherein the means for combining the current GWPE application to the audio stream with a previously created filter includes means for applying a first scaling factor to the first term and a second scaling factor to the second term prior to the adding.
- In Example 37, the subject matter of Example 36 optionally includes wherein the second scaling factor is between zero and one and wherein the first scaling factor is one minus the second scaling factor.
- In Example 38, the subject matter of any one or more of Examples 27-37 optionally includes wherein the means for applying the filter to the audio stream includes means for introducing a delay to the audio stream prior to applying the filter.
- In Example 39, the subject matter of Example 38 optionally includes wherein the delay is eight milliseconds.
- Example 40 is a method for automatic speech recognition de-reverberation, the method comprising: obtaining a portion of an audio stream, the portion of the audio stream being a proper subset of the audio stream; creating a filter by applying Generalized Weighted Prediction Error (GWPE) to the portion of the audio stream; applying the filter to the audio stream to remove reverberation; and providing a filtered version of the audio stream to an audio stream consumer.
- In Example 41, the subject matter of Example 40 optionally includes wherein creating the filter occurs in a first pipeline and applying the filter occurs in a second pipeline, the first and second pipelines executing in parallel on a device.
- In Example 42, the subject matter of any one or more of Examples 40-41 optionally includes wherein obtaining the portion of the audio stream includes buffering the audio stream for a fixed time period.
- In Example 43, the subject matter of Example 42 optionally includes wherein the fixed time period is a second.
- In Example 44, the subject matter of any one or more of Examples 42-43 optionally includes repeating creating the filter with a subsequent fixed time period.
- In Example 45, the subject matter of any one or more of Examples 42-44 optionally includes wherein the fixed time period is an audio frame.
- In Example 46, the subject matter of Example 45 optionally includes wherein the audio frame length is thirty-two milliseconds.
- In Example 47, the subject matter of any one or more of Examples 40-46 optionally includes wherein creating the filter includes combining a current GWPE application to the audio stream with a previously created filter.
- In Example 48, the subject matter of Example 47 optionally includes wherein combining the current GWPE application to the audio stream with a previously created filter includes adding the current GWPE application as a first term to the previously created filter as a second term.
- In Example 49, the subject matter of Example 48 optionally includes wherein combining the current GWPE application to the audio stream with a previously created filter includes applying a first scaling factor to the first term and a second scaling factor to the second term prior to the adding.
- In Example 50, the subject matter of Example 49 optionally includes wherein the second scaling factor is between zero and one and wherein the first scaling factor is one minus the second scaling factor.
- In Example 51, the subject matter of any one or more of Examples 40-50 optionally includes introducing a delay to the audio stream prior to applying the filter.
- In Example 52, the subject matter of Example 51 optionally includes wherein the delay is eight milliseconds.
- Example 53 is a system comprising means to perform any of the methods 40-52.
- Example 54 is at least one machine readable medium including instructions that, when executed by a machine, cause the machine to perform any of methods 40-52.
- Example 55 is at least one machine readable medium including instructions for de-reverberation of an audio signal, the instructions, when executed by a machine, causing the machine to perform operations comprising: performing Generalized Weighted Prediction Error (GWPE) in a first pipeline; and performing signal processing in a second pipeline, the second pipeline and first pipeline executing in parallel, the second pipeline applying the output of the first pipeline to remove reverberation in an audio signal processed by the second pipeline.
- In Example 56, the subject matter of Example 55 optionally includes buffering the audio signal in a buffer; providing contents of the buffer every second to the first pipeline; and clearing the buffer after providing the contents.
- In Example 57, the subject matter of any one or more of Examples 55-56 optionally includes wherein the first pipeline includes iteratively: calculating signal statistics; calculating a de-reverb filter; and applying the de-reverb filter to remove reverberation.
- Example 58 is a method for de-reverberation of an audio signal, the method comprising: performing Generalized Weighted Prediction Error (GWPE) in a first pipeline; and performing signal processing in a second pipeline, the second pipeline and first pipeline executing in parallel, the second pipeline applying the output of the first pipeline to remove reverberation in an audio signal processed by the second pipeline.
- In Example 59, the subject matter of Example 58 optionally includes buffering the audio signal in a buffer; providing contents of the buffer every second to the first pipeline; and clearing the buffer after providing the contents.
- In Example 60, the subject matter of any one or more of Examples 58-59 optionally includes wherein the first pipeline iteratively includes: calculating signal statistics; calculating a de-reverb filter; and applying the de-reverb filter to remove reverberation.
- Example 61 is a system comprising means to perform any of the methods 58-60.
- Example 62 is at least one machine readable medium including instructions that, when executed by a machine, cause the machine to perform any of methods 58-60.
- Example 63 is a system for de-reverberation of an audio signal, the system comprising: means for performing Generalized Weighted Prediction Error (GWPE) in a first pipeline; and means for performing signal processing in a second pipeline, the second pipeline and first pipeline executing in parallel, the second pipeline applying the output of the first pipeline to remove reverberation in an audio signal processed by the second pipeline.
- In Example 64, the subject matter of Example 63 optionally includes means for buffering the audio signal in a buffer; means for providing contents of the buffer every second to the first pipeline; and means for clearing the buffer after providing the contents.
- In Example 65, the subject matter of any one or more of Examples 63-64 optionally includes wherein the first pipeline includes means for iteratively: calculating signal statistics; calculating a de-reverb filter; and applying the de-reverb filter to remove reverberation.
- Example 66 is at least one machine readable medium including instructions for de-reverberation of an audio signal, the instructions, when executed by a machine, causing the machine to perform operations comprising: estimating signal statistics for an audio signal; performing Generalized Weighted Prediction Error (GWPE) using the estimated signal statistics; estimating a spatial correlation matrix; creating weighted matrix and vector inputs from the spatial correlation matrix estimation; and updating a de-reverb filter with the weighted matrix and vector.
- In Example 67, the subject matter of Example 66 optionally includes wherein the operations are performed inline to other audio signal processing.
- In Example 68, the subject matter of any one or more of Examples 66-67 optionally includes wherein only one signal frame is buffered at a time to input into the operations.
- Example 69 is a method for de-reverberation of an audio signal, the method comprising: estimating signal statistics for an audio signal; performing Generalized Weighted Prediction Error (GWPE) using the estimated signal statistics; estimating a spatial correlation matrix; creating weighted matrix and vector inputs from the spatial correlation matrix estimation; and updating a de-reverb filter with the weighted matrix and vector.
- In Example 70, the subject matter of Example 69 optionally includes wherein the operations are performed inline to other audio signal processing.
- In Example 71, the subject matter of any one or more of Examples 69-70 optionally includes wherein only one signal frame is buffered at a time to input into the operations.
- Example 72 is a system comprising means to perform any of the methods 69-71.
- Example 73 is at least one machine readable medium including instructions that, when executed by a machine, cause the machine to perform any of methods 69-71.
- Example 74 is a system for de-reverberation of an audio signal, the system comprising: means for estimating signal statistics for an audio signal; means for performing Generalized Weighted Prediction Error (GWPE) using the estimated signal statistics; means for estimating a spatial correlation matrix; means for creating weighted matrix and vector inputs from the spatial correlation matrix estimation; and means for updating a de-reverb filter with the weighted matrix and vector.
- GWPE Generalized Weighted Prediction Error
- Example 75 the subject matter of Example 74 optionally includes are performed inline to other audio signal processing.
- Example 76 the subject matter of any one or more of Examples 74-75 optionally include wherein only one signal frame is buffered at a time to input into the operations.
- Example 77 is at least one machine readable medium including instructions for de-reverberation of an audio signal, the instructions, when executed by a machine, causing the machine to perform operations comprising: performing de-reverb filter updating in a first pipeline; and performing signal processing in a second pipeline, the second pipeline and first pipeline executing in parallel, the second pipeline applying the output of the first pipeline to remove reverberation in an audio signal processed by the second pipeline.
- Example 78 the subject matter of Example 77 optionally includes wherein the de-reverb filter updating includes: estimating signal statistics for the audio signal; performing Generalized Weighted Prediction Error (GWPE) using the estimated signal statistics; estimating a spatial correlation matrix; creating weighted matrix and vector inputs from the spatial correlation matrix estimation; and updating a de-reverb filter with the weighted matrix and vector.
- GWPE Generalized Weighted Prediction Error
- Example 79 the subject matter of any one or more of Examples 77-78 optionally include wherein only one signal frame is buffered at a time to input into the first pipeline.
- Example 80 is a method for de-reverberation of an audio signal, the method comprising: performing de-reverb filter updating in a first pipeline; and performing signal processing in a second pipeline, the second pipeline and first pipeline executing in parallel, the second pipeline applying the output of the first pipeline to remove reverberation in an audio signal processed by the second pipeline.
- Example 81 the subject matter of Example 80 optionally includes wherein the de-reverb filter updating includes: estimating signal statistics for the audio signal; performing Generalized Weighted Prediction Error (GWPE) using the estimated signal statistics; estimating a spatial correlation matrix; creating weighted matrix and vector inputs from the spatial correlation matrix estimation; and updating a de-reverb filter with the weighted matrix and vector.
- GWPE Generalized Weighted Prediction Error
- Example 82 the subject matter of any one or more of Examples 80-81 optionally include wherein only one signal frame is buffered at a time to input into the first pipeline.
- Example 83 is a system comprising means to perform any of the methods 80-82.
- Example 84 is at least one machine readable medium including instructions that, when executed by a machine, cause the machine to perform any of methods 80-82.
- Example 85 is a system for de-reverberation of an audio signal, the system comprising: means for performing dc-reverb filter updating in a first pipeline; and means for performing signal processing in a second pipeline, the second pipeline and first pipeline executing in parallel, the second pipeline applying the output of the first pipeline to remove reverberation in an audio signal processed by the second pipeline.
- Example 86 the subject matter of Example 85 optionally includes wherein the de-reverb filter updating includes: means for estimating signal statistics for the audio signal; means for performing Generalized Weighted Prediction Error (GWPE) using the estimated signal statistics; means for estimating a spatial correlation matrix; means for creating weighted matrix and vector inputs from the spatial correlation matrix estimation; and means for updating a de-reverb filter with the weighted matrix and vector.
- GWPE Generalized Weighted Prediction Error
- Example 87 the subject matter of any one or more of Examples 85-86 optionally include wherein only one signal frame is buffered at a time to input into the first pipeline.
- the terms “a” or “an” are used, as is common in patent documents, to include one or more than one, independent of any other instances or usages of “at least one” or “one or more.”
- the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated.
Abstract
Systems and techniques for automatic speech recognition de-reverberation are described herein. A portion of an audio stream may be obtained. Here, the portion of the audio stream is a proper subset of the audio stream. A filter may be created by applying Generalized Weighted Prediction Error (GWPE) to the portion of the audio stream. The filter may be applied to the audio stream to remove reverberation. The filtered version of the audio stream may then be provided to an audio stream consumer.
Description
- This patent application claims the benefit of priority, under 35 U.S.C. §119, to U.S. Provisional Application Ser. No. 62/350,507, titled “FAR FIELD AUTOMATIC SPEECH RECOGNITION” and filed on Jun. 15, 2016, the entirety of which is hereby incorporated by reference herein.
- Embodiments described herein generally relate to automatic speech recognition (ASR) and more specifically to ASR de-reverberation.
- ASR involves a machine-based collection of techniques to understand human languages. ASR is interdisciplinary, often involving microphone, analog to digital conversion, frequency processing, database, and artificial intelligence technologies to convert the spoken word into textual or machine readable representations of not only what was said (e.g., a transcript) but also what was meant (e.g., semantic understanding) by a human speaker. Far field ASR involves techniques to decrease the word error rate (WER) in utterances made at a greater distance from a microphone, or microphone array, than traditionally accounted for in ASR processing pipelines. Such distance often decreases the signal to noise ratio (SNR) and thus increases WER in traditional ASR systems. As used herein, far field ASR involves distances of more than one half meter from the microphone.
- In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. Like numerals having different letter suffixes may represent different instances of similar components. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed in the present document.
- FIG. 1 is an example of a smart home gateway housing, according to an embodiment.
- FIG. 2 is a block diagram of an example high accuracy (HA) real-time de-reverberation device, according to an embodiment.
- FIG. 3 is a block diagram of an example low latency (LL) real-time de-reverberation device, according to an embodiment.
- FIG. 4 is a block diagram of an example low complexity and high accuracy (LH) real-time de-reverberation device, according to an embodiment.
- FIG. 5 is an example of a method for automatic speech recognition de-reverberation, according to an embodiment.
- FIG. 6 is a block diagram illustrating an example of a machine upon which one or more embodiments may be implemented.
- Embodiments and examples herein generally describe a number of systems, devices, and techniques for ASR de-reverberation. It is understood, however, that the systems, devices, and techniques are examples illustrating the underlying concepts.
- Automatic speech recognition (ASR) often performs poorly if the user (or other sound source) is changing positions in the so-called far field (e.g., between one and five meters away from the microphone array). For example, for distances changing from 0.5 to 5 m, the ASR word error rate (WER) for traditional techniques often grows from 4% to 18.5%. Due to issues of environmental acoustic conditions (e.g., reverberation or noise) that are exacerbated in far field ASR, near field (e.g., distances less than one half meter) ASR processing techniques perform poorly in the far field. Often, far field conditions cause signals captured by microphones to be reverberant. In fact, when dealing with far field ASR, reverberations may become the main parasitic factor decreasing ASR performance.
- Reverb removal is not a trivial task because its characteristics depend on many factors, such as reverberation time (RT), distance between the sound source (e.g., user) and the microphone array, microphone array/device position, among other factors. Accordingly, even if a reverberation characteristic is known at one point in a room, for example, the characteristics may not be helpful to remove reverberation when a signal is captured in a different location (even if the location changed only a few millimeters).
- In a reverberant scenario, many near-field pre-processing techniques that are used for acceptable ASR performance start to fail. What is needed is a de-reverberation technique to facilitate far field ASR performance improvements, such as helping beam-formers to work properly in reverberant conditions.
- A framework for reducing reverberation is the generalized weighted prediction error (GWPE) de-reverberation technique. Although GWPE may lead to large WER improvements, it is expensive (e.g., in time, power, or device complexity) to compute. In tests, GWPE's real-time factor (RTF) is greater than 2.8 for a modern Intel® i7 processor on an eight microphone array. Here, the RTF is a ratio between the required time of computation and computed signal duration. Thus, when the RTF is higher than 1.0, the measured processing will not be real-time and the computation will take longer than the signal lasts. Because GWPE does not have an RTF less than or equal to one on most consumer devices, GWPE generally cannot be used in real-time solutions.
- To address the issues with GWPE in real-time processing, several examples of systems and techniques are presented herein. Examples are able to run in real-time with, for example, an eight microphone array (such as that illustrated in FIG. 1). Some of these techniques improve WER at the five meter distance from 17.2% to 9.3% in clean conditions (e.g., without noises) and from 51.5% to 33.2% in noisy conditions. Removing reverberation helps improve ASR engine performance, beam-former techniques, and estimation of a signal's source position, thus generally increasing the performance of ASR systems.
- The devices or systems described below allow processing of a signal of eight or fewer channels in real-time (e.g., RTF<=1.0). Further, in an example, some optimizations in the GWPE technique result in reduced computational complexity, allowing wider use of GWPE in a variety of devices with different computational capabilities.
-
FIG. 1 is an example of a device 100 including a smart home gateway housing 105, according to an embodiment. As illustrated, the circles atop the housing are lumens 110 behind which are housed microphones (as illustrated, there are eight microphones). The dashed lines illustrate microphones in a linear arrangement 115 as well as in a circular arrangement 120. The device 100 includes a sampler 125, a signal processor 130, a multiplexer 140, and an interlink 145. In an example, the device includes a data store 135 to provide one or more buffers. All of these components are implemented in electronic hardware, such as that described below (e.g., circuits).
- The sampler 125 is arranged to obtain a portion of an audio stream. Here, the portion of the audio stream is a proper subset of the audio stream. To obtain the portion, the sampler 125 may receive the portion or may retrieve the portion from, for example, a microphone. In an example, obtaining the portion of the audio stream includes buffering the audio stream for a fixed time period. In an example, the fixed time period (i.e., the buffer length) is a second. In an example, the fixed time period is an audio frame. In an example, the audio frame length is thirty two milliseconds. The sampler 125 thus operates to obtain discrete audio samples.
- The signal processor 130 is arranged to create a filter by applying GWPE to the portion of the audio stream. In an example, the signal processor 130 is arranged to calculate the frequency domain data by applying an overlap and add (OLA) procedure combined with a Fast Fourier Transform (FFT) prior to creating the filter. A description of GWPE and its application in a number of examples is provided below with respect to FIGS. 2-4. Thus, instead of operating on an entire utterance, GWPE is applied only to the portion obtained by the sampler 125 (e.g., a buffered signal). Reducing the size of the buffered segment of the signal reduces the computational complexity of traditional GWPE while still maintaining enough de-reverberation performance to benefit ASR processing. In an example, creating the filter occurs in a first pipeline and applying the filter occurs in a second pipeline. In an example, the first and second pipelines are arranged to execute in parallel on the device 100. Parallel execution here means actual simultaneous (e.g., not sequential as in a single core processor) execution of the pipelines. In an example, the signal processor 130 is also arranged to repeatedly create and apply the filter with subsequent fixed time periods (e.g., buffered portions of the signal). Thus, the signal processor 130 creates and applies filters for respective samples obtained by the sampler 125.
- In an example, creating the filter includes the signal processor 130 combining a current GWPE application to the audio stream with a previously created filter. Here, GWPE application refers to the result of applying GWPE to the audio stream. In an example, combining the current GWPE application to the audio stream with a previously created filter includes adding the current GWPE application as a first term to the previously created filter as a second term. In an example, combining the current GWPE application to the audio stream with a previously created filter includes applying a first scaling factor to the first term and a second scaling factor to the second term prior to the adding. In an example, the second scaling factor is between zero and one. In an example, the first scaling factor is one minus the second scaling factor.
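- To make the frame handling concrete, the following is a minimal sketch of the OLA and FFT analysis/synthesis pair implied above, using the parameters given in this document (16 kHz sample rate, 32 ms frames, 8 ms shift, 257 frequency bins). It is written in Python with NumPy; the function names and the windowing choice are illustrative assumptions, not elements of the embodiments.

    import numpy as np

    SAMPLE_RATE = 16000   # Hz, per the text
    FRAME_LEN = 512       # 32 ms at 16 kHz
    FRAME_SHIFT = 128     # 8 ms at 16 kHz; rfft of 512 samples yields 257 bins

    def analyze(x):
        """Window the signal into overlapping frames and move to the frequency domain."""
        window = np.hanning(FRAME_LEN)
        n_frames = 1 + (len(x) - FRAME_LEN) // FRAME_SHIFT
        frames = np.stack([x[i * FRAME_SHIFT:i * FRAME_SHIFT + FRAME_LEN] * window
                           for i in range(n_frames)])
        return np.fft.rfft(frames, axis=1)          # shape (n_frames, 257)

    def synthesize(spec, out_len):
        """Inverse FFT each frame and overlap-add back into a time-domain signal."""
        window = np.hanning(FRAME_LEN)
        out = np.zeros(out_len)
        norm = np.zeros(out_len)
        frames = np.fft.irfft(spec, n=FRAME_LEN, axis=1)
        for i, frame in enumerate(frames):
            start = i * FRAME_SHIFT
            out[start:start + FRAME_LEN] += frame * window
            norm[start:start + FRAME_LEN] += window ** 2
        return out / np.maximum(norm, 1e-12)        # normalize the overlapped window energy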
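- The scaling-factor combination described above reduces to a short exponential blend. In the sketch below, the name beta and its default value are illustrative; the text only fixes that the second scaling factor lies between zero and one and that the first is one minus the second.

    import numpy as np

    def blend_filters(g_current, g_previous, beta=0.9):
        # First term: the filter from the current GWPE application, scaled by (1 - beta).
        # Second term: the previously created filter, scaled by beta.
        return (1.0 - beta) * np.asarray(g_current) + beta * np.asarray(g_previous)

With beta near one, the previously created filter dominates and the estimate changes slowly; with beta near zero, the current GWPE application dominates.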
- The data store 135 provides a buffer to introduce a delay to the audio stream prior to applying the filter. This delay permits the processing buffer to fill with the required amount of signal. In an example, the delay is higher than 40 milliseconds and depends, inter alia, on the processor compute power.
- The multiplexer 140 is arranged to apply the filter to the audio stream to remove reverberation. Thus, the multiplexer 140 accepts both the filter and the complex audio spectrum as inputs. The multiplexer 140 then applies the filter to the audio stream, calculates an inverse FFT (IFFT), and performs the OLA procedure to produce a signal with some (or all) of the reverberations removed. This signal may be referred to as a clean audio signal, filtered audio signal, or the like.
- The interlink 145 is arranged to provide a filtered version of the audio stream to an audio stream consumer. Thus, the interlink 145 interfaces the present pipeline to other processing as part of a far field, or other, ASR system. Although far field ASR may greatly benefit from the device 100, the device 100 may also be helpful in other ASR situations, such as near field, to reduce reverberations.
- FIGS. 2-4 illustrate several example configurations for the operations described above. These examples describe changes to a traditional GWPE application to meet the real-time processing used by many current devices while still maintaining a high degree of reverberation removal for ASR processing applications.
- To illustrate how these three examples improve on GWPE, the GWPE technique is introduced and then modifications to GWPE in each of the examples are explained.
- GWPE operates in the frequency domain. GWPE operates on a signal with sample rate of 16 kHz and uses 32 ms frames with an 8 ms shift. GWPE treats every frequency bin independently. Thus, GWPE processes 257 independent frequency bins. As used herein, a frequency bin is represented by/and a number of the frame (e.g., index of the frame in a sample) by t. GWPE takes as an input of M channels and provides M channels of output. GWPE is a blind de-reverberation technique because all of the statistics needed by GWPE are obtained directly from the input signal. The GWPE de-reverberation operation is defined by the following equation:
-
$$\hat{X}_l(t) = Y_l(t) - \sum_{\tau=\Delta}^{\Delta+K_l-1} \hat{G}^*_l(\tau)\, Y_l(t-\tau) \tag{1}$$

- where:
- $\hat{X}_l(t)$ is an estimate of a dry signal (e.g., without reverberations) for the time frame t from timespan T and frequency bin l;
- $Y_l(t)$ is the signal captured by a microphone array;
- $\hat{G}^*_l(\tau)$ is an estimate of the de-reverberation filter;
- $K_l$ is a filter length;
- $\Delta$ is a time delay; and
- $\hat{X}_l(t)$, $Y_l(t)$, and $\hat{G}^*_l(\tau)$ are vectors of length M.
- $\hat{G}^*_l(\tau)$ may be estimated using the following:
- 1. initialization: $\hat{G}^*_l(\tau) = 0$ for all $\tau$ with $\Delta \le \tau \le (\Delta + K_l - 1)$.
- 2. de-reverberation using equation (1).
- 3. spatial correlation matrix estimation:

$$\hat{\Phi}_l(t) = \hat{X}_l(t)\,\hat{X}_l(t)^H \tag{2}$$

- 4. weighted correlation matrix/vector calculation, using the stacked delayed observation $\bar{Y}_l(t) = [Y_l(t-\Delta)^T, \ldots, Y_l(t-\Delta-K_l+1)^T]^T$ and the scalar weight $\lambda_l(t) = \operatorname{tr}(\hat{\Phi}_l(t))/M$:

$$\hat{R}_l = \sum_t \lambda_l(t)^{-1}\, \bar{Y}_l(t)\, \bar{Y}_l(t)^H \tag{3}$$

$$\hat{r}_l = \sum_t \lambda_l(t)^{-1}\, \bar{Y}_l(t)\, Y_l(t)^H \tag{4}$$

- 5. filter update:

$$\hat{g}_l = \hat{R}_l^{-1}\, \hat{r}_l \tag{5}$$

- 6. convergence check.
- The complexity of the above technique is high. Given an eight channel input signal, the filter length is four. With 257 frequency bins, every filter update involves solving 257 linear equation systems, each with a 32-by-32 matrix ($\hat{R}_l$) and a 32-by-8 right-hand side ($\hat{r}_l$). From the perspective of signal quality, it is important to update the filter frequently enough to keep up (e.g., align) with reverberation characteristic changes.
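- For illustration, a compact sketch of the six estimation steps for one frequency bin follows, assuming the scalar per-frame weighting and stacked-observation layout used in equations (3) and (4) above; the fixed iteration count stands in for the convergence check, and the prediction delay value is an assumption.

    import numpy as np

    def gwpe_bin(Y, K=4, delta=2, n_iter=3, eps=1e-10):
        """De-reverberate one frequency bin. Y: complex array (M channels, T frames)."""
        M, T = Y.shape
        # Stacked delayed observations: barY[:, t] = [Y(t-delta); ...; Y(t-delta-K+1)].
        barY = np.zeros((M * K, T), dtype=complex)
        for k in range(K):
            tau = delta + k
            barY[k * M:(k + 1) * M, tau:] = Y[:, :T - tau]
        G = np.zeros((M * K, M), dtype=complex)                     # step 1: initialization
        for _ in range(n_iter):
            X = Y - G.conj().T @ barY                               # step 2: de-reverberation, equation (1)
            lam = np.maximum(np.mean(np.abs(X) ** 2, axis=0), eps)  # step 3: per-frame statistics
            R = (barY / lam) @ barY.conj().T                        # step 4: weighted matrix (32x32 for M=8, K=4)
            r = (barY / lam) @ Y.conj().T                           # step 4: weighted vector (32x8)
            G = np.linalg.solve(R + eps * np.eye(M * K), r)         # step 5: filter update
            # step 6: convergence check, replaced here by the fixed n_iter loop
        return Y - G.conj().T @ barY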
- GWPE updates the filter for an entire utterance. Thus, GWPE is inappropriate for real-time solutions due to:
- the latency it will introduce;
- the misalignment between the input signal and the filter in a case when the user moves while speaking the utterance; and
- the resource requirements related to the memory needed to buffer the signal.
-
FIG. 2 is a block diagram of an HA real-time de-reverberation device 205, according to an embodiment. To address some of the GWPE issues noted above, HA de-reverberation uses a short time span for filter estimation. For example, the time for estimation (T_est) may be equal to one second (1000 ms). Here, the filter is updated every T_est ms.
- The elements of the HA device 205 include a main thread 215 and a parallel thread 210. The main thread 215 includes a storage device 235, a signal delay buffer 240, and a switcher 245. The parallel thread 210 includes a signal statistics calculator 220, a filter calculator 225, and a reverberation removal block 230. These components are all implemented in electronic hardware (e.g., circuits). The operations of the components proceed as follows:
- 1. The
main thread 215 buffers the input signal in thestorage device 235 anddelays 240 samples at the beginning of processing. - 2. When the buffer is filled (e.g., when 1000 ms of data has been collected), the GWPE procedure is performed in the
parallel thread 210 using the technique described above, including calculating thesignal statistics 220 and calculating thefilter 225. - 3. When the GWPE procedure is finished, the
reverberation removal block 230 provides a signal with the reverberation removed to theswitcher 245 to output. The filter update (e.g., blocks 220 and 225) may then be started for a new data chunk.
- 1. The
- Thus, the processing proceeds with the latest (e.g., last) GWPE de-reverberation without imposing a delay for the entire utterance. An advantage of this technique includes its high accuracy of reverberation reduction. In an example, the delay introduced is:
-
$$\Delta T = (1 + \mathrm{RTF})\, T_{est} \tag{6}$$
-
FIG. 3 is a block diagram of an example LL real-time de-reverberation device 305, according to an embodiment. This additional example is here called the low-latency real-time de-reverberation. It uses an estimate of a weighted correlation matrix and vector ($\hat{R}_l$ and $\hat{r}_l$) and does not wait for filter convergence. The device 305 includes a single thread 310 rather than the parallel threads described with respect to the HA real-time de-reverberation device 205 described above. The single thread includes a signal statistics estimation block 315 rather than the signal statistics calculation block 220 described above, but otherwise also includes a filter calculator 320 and a reverberation removal block 325. All of the illustrated processing elements are implemented in electronic hardware. Operations of the device 305 proceed as follows:
- 1. Initialization: $\hat{G}_l(\tau) = 0$ for all $\tau$ values with $\Delta \le \tau \le \Delta + K_l - 1$
- 2. De-reverberation using equation (1) (e.g., block 325)
- 3. Spatial correlation matrix estimation using equation (2) (e.g., block 315)
- 4. Weighted correlation matrix/vector calculation (e.g., block 320):

$$\hat{R}_l \leftarrow \alpha\, \hat{R}_l + (1 - \alpha)\, \lambda_l(t)^{-1}\, \bar{Y}_l(t)\, \bar{Y}_l(t)^H$$

$$\hat{r}_l \leftarrow \alpha\, \hat{r}_l + (1 - \alpha)\, \lambda_l(t)^{-1}\, \bar{Y}_l(t)\, Y_l(t)^H$$

- 5. Filter update (e.g., block 320):

$$\hat{g}_l = \hat{R}_l^{-1}\, \hat{r}_l, \quad \text{for } t \ge T_{min}$$
-
FIG. 4 is a block diagram of an example LH real-time de-reverberation device 405, according to an embodiment. This example uses a foundation similar to that illustrated above between the threads of the HA example (FIG. 2), but uses the LL de-reverberation procedure, albeit in a parallel thread. The delay introduced by this example may be set arbitrarily depending on the available computing power. Thus, the device 405 includes a main thread 430 and a parallel filter estimation thread 410. The main thread 430 includes a signal estimation block 435 which buffers results in a storage device 425. The main thread includes another storage area 445 (which may be the same physical device as the storage device 425 in an example) to buffer the audio signal for use by the reverberation removal block 420 of the parallel thread 410. Additionally, the main thread 430 may include a delay block 440 to delay the audio signal into the switcher 450 to allow the filter to be processed prior to outputting the filtered signal. The parallel thread 410 includes the filter calculator 415, similar to that of the LL example, which operates on the signal statistic estimates to produce the de-reverberation filter. This is then provided to the reverberation removal block 420 to perform the filtering.
FIG. 3 . -
FIG. 5 is an example of a method 500 for automatic speech recognition de-reverberation, according to an embodiment. The operations of the method 500 are implemented and executed upon electronic hardware, such as that described above and below (e.g., circuits).
operation 505, a portion of an audio stream is obtained. In an example, the portion of the audio stream is a proper subset of the audio stream. In an example, obtaining the portion of the audio stream includes buffering the audio stream for a fixed time period. In an example, the fixed time period is a second. In an example, the fixed time period is an audio frame. In an example, the audio frame length is thirty two milliseconds. In an example, themethod 500 may be extended to include repeating (e.g., repeatedly) creating the filter with a subsequent fixed time period. - At
operation 510, a filter is created by applying GWPE to the portion of the audio stream. In an example, creating the filter occurs in a first pipeline and applying the filter occurs in a second pipeline. In an example, the first and second pipelines execute in parallel on a device. - In an example, creating the filter includes combining a current GWPE application to the audio stream with a previously created filter. In an example, combining the current GWPE application to the audio stream with a previously created filter includes adding the current GWPE application as a first term to the previously created filter as a second term. In an example, combining the current GWPE application to the audio stream with a previously created filter includes applying a first scaling factor to the first term and a second scaling factor to the second term prior to the adding. In an example, the second scaling factor is between zero and one. In an example, the first scaling factor is one minus the second scaling factor.
- At
operation 515, the filter is applied to the audio stream to remove reverberation from the audio stream to produce a filtered version of the audio stream. - At
operation 520, the filtered version of the audio stream is provided to an audio stream consumer. - The operations of the
method 500 may be optionally extended to include introducing a delay to the audio stream prior to applying the filter. In an example, the delay is 40 milliseconds. -
FIG. 6 illustrates a block diagram of an example machine 600 upon which any one or more of the techniques (e.g., methodologies) discussed herein may perform. In alternative embodiments, the machine 600 may operate as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine 600 may operate in the capacity of a server machine, a client machine, or both in server-client network environments. In an example, the machine 600 may act as a peer machine in a peer-to-peer (P2P) (or other distributed) network environment. The machine 600 may be a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a mobile telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein, such as cloud computing, software as a service (SaaS), or other computer cluster configurations.
- Machine (e.g., computer system) 600 may include a hardware processor 602 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a hardware processor core, or any combination thereof), a
main memory 604 and astatic memory 606, some or all of which may communicate with each other via an interlink (e.g., bus) 608. Themachine 600 may further include adisplay unit 610, an alphanumeric input device 612 (e.g., a keyboard), and a user interface (UI) navigation device 614 (e.g., a mouse). In an example, thedisplay unit 610,input device 612 andUI navigation device 614 may be a touch screen display. Themachine 600 may additionally include a storage device (e.g., drive unit) 616, a signal generation device 618 (e.g., a speaker), anetwork interface device 620, and one ormore sensors 621, such as a global positioning system (UPS) sensor, compass, accelerometer, or other sensor. Themachine 600 may include anoutput controller 628, such as a serial (e.g., universal serial bus (USB), parallel, or other wired or wireless (e.g., infrared (IR), near field communication (NFC), etc) connection to communicate or control one or more peripheral devices (e.g., a printer, card reader, etc.). - The
storage device 616 may include a machinereadable medium 622 on which is stored one or more sets of data structures or instructions 624 (e.g., software) embodying or utilized by any one or more of the techniques or functions described herein. Theinstructions 624 may also reside, completely or at least partially, within themain memory 604, withinstatic memory 606, or within thehardware processor 602 during execution thereof by themachine 600. In an example, one or any combination of thehardware processor 602, themain memory 604, thestatic memory 606, or thestorage device 616 may constitute machine readable media. - While the machine
readable medium 622 is illustrated as a single medium, the term “machine readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) configured to store the one ormore instructions 624. - The term “machine readable medium” may include any medium that is capable of storing, encoding, or carrying instructions for execution by the
machine 600 and that cause themachine 600 to perform any one or more of the techniques of the present disclosure, or that is capable of storing, encoding or carrying data structures used by or associated with such instructions. Non-limiting machine readable medium examples may include solid-state memories, and optical and magnetic media. In an example, a massed machine readable medium comprises a machine readable medium with a plurality of particles having invariant (e.g., rest) mass. Accordingly, massed machine-readable media are not transitory propagating signals. Specific examples of massed machine readable media may include: non-volatile memory, such as semiconductor memory devices (e.g., Electrically Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM)) and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. - The
instructions 624 may further be transmitted or received over acommunications network 626 using a transmission medium via thenetwork interface device 620 utilizing any one of a number of transfer protocols (e.g., frame relay, internet protocol (IP), transmission control protocol (TCP), user datagram protocol (UDP), hypertext transfer protocol (HTTP), etc.). Example communication networks may include a local area network (LAN), a wide area network (WAN), a packet data network (e.g., the Internet), mobile telephone networks (e.g., cellular networks), Plain Old Telephone (POTS) networks, and wireless data networks (e.g., Institute of Electrical and Electronics Engineers (IEEE) 802.11 family of standards known as Wi-Fi®, IEEE 802.16 family of standards known as WiMax®), IEEE 802.15.4 family of standards, peer-to-peer (P2P) networks, among others. In an example, thenetwork interface device 620 may include one or more physical jacks (e.g., Ethernet, coaxial, or phone jacks) or one or more antennas to connect to thecommunications network 626. In an example, thenetwork interface device 620 may include a plurality of antennas to wirelessly communicate using at least one of single-input multiple-output (SIMO), multiple-input multiple-output (MIMO), or multiple-input single-output (MISO) techniques. The term “transmission medium” shall be taken to include any intangible medium that is capable of storing, encoding or carrying instructions for execution by themachine 600, and includes digital or analog communications signals or other intangible medium to facilitate communication of such software. - Example 1 is a system for automatic speech recognition de-reverberation, the system comprising: a sampler to obtain a portion of an audio stream, the portion of the audio stream being a proper subset of the audio stream; a signal processor to create a filter by applying Generalized Weighted Prediction Error (GWPE) to the portion of the audio stream; a multiplexer to apply the filter to the audio stream to remove reverberation; and an interlink to provide a filtered version of the audio stream to an audio stream consumer.
- In Example 2, the subject matter of Example 1 optionally includes wherein the processor is in a first pipeline to create the filter and the multiplexer is in a second pipeline to apply the filter, the first and second pipelines arranged to execute in parallel.
- In Example 3, the subject matter of any one or more of Examples 1-2 optionally include wherein, to obtain the portion of the audio stream, the sampler buffers the audio stream for a fixed time period.
- In Example 4, the subject matter of Example 3 optionally includes wherein the fixed time period is a second.
- In Example 5, the subject matter of any one or more of Examples 3-4 optionally include wherein the signal processor includes a loop to repetitively create the filter with subsequent fixed time periods.
- In Example 6, the subject matter of any one or more of Examples 3-5 optionally include wherein the fixed time period is an audio frame.
- In Example 7, the subject matter of Example 6 optionally includes wherein the audio frame length is thirty two milliseconds.
- In Example 8, the subject matter of any one or more of Examples 1-7 optionally include wherein, to create the filter, the signal processor combines a current GWPE application to the audio stream with a previously created filter.
- In Example 9, the subject matter of Example 8 optionally includes wherein, to combine the current GWPE application to the audio stream with a previously created filter, the signal processor adds the current GWPE application as a first term to the previously created filter as a second term.
- In Example 10, the subject matter of Example 9 optionally includes wherein, to combine the current GWPE application to the audio stream with a previously created filter, the signal processor applies a first scaling factor to the first term and a second scaling factor to the second term prior to the adding.
- In Example 11, the subject matter of Example 10 optionally includes wherein the second scaling factor is between zero and one and wherein the first scaling factor is one minus the second scaling factor.
- In Example 12, the subject matter of any one or more of Examples 1-11 optionally include a buffer to introduce a delay to the audio stream prior to applying the filter.
- In Example 13, the subject matter of Example 12 optionally includes wherein the delay is eight milliseconds.
- Example 14 is at least one machine readable medium including instructions for automatic speech recognition de-reverberation, the instructions, when executed by a machine, cause the machine to perform operations comprising: obtaining a portion of an audio stream, the portion of the audio stream being a proper subset of the audio stream; creating a filter by applying Generalized Weighted Prediction Error (GWPE) to the portion of the audio stream; applying the filter to the audio stream to remove reverberation; and providing a filtered version of the audio stream to an audio stream consumer.
- In Example 15, the subject matter of Example 14 optionally includes wherein creating the filter occurs in a first pipeline and applying the filter occurs in a second pipeline, the first and second pipelines executing in parallel on a device.
- In Example 16, the subject matter of any one or more of Examples 14-15 optionally include wherein obtaining the portion of the audio stream includes buffering the audio stream for a fixed time period.
- In Example 17, the subject matter of Example 16 optionally includes wherein the fixed time period is a second.
- In Example 18, the subject matter of any one or more of Examples 16-17 optionally include wherein the operations include repeating creating the filter with a subsequent fixed time period.
- In Example 19, the subject matter of any one or more of Examples 16-18 optionally include wherein the fixed time period is an audio frame.
- In Example 20, the subject matter of Example 19 optionally includes wherein the audio frame length is thirty two milliseconds.
- In Example 21, the subject matter of any one or more of Examples 14-20 optionally include wherein creating the filter includes combining a current GWPE application to the audio stream with a previously created filter.
- In Example 22, the subject matter of Example 21 optionally includes wherein combining the current GWPE application to the audio stream with a previously created filter includes adding the current GWPE application as a first term to the previously created filter as a second term.
- In Example 23, the subject matter of Example 22 optionally includes wherein combining the current GWPE application to the audio stream with a previously created filter includes applying a first scaling factor to the first term and a second scaling factor to the second term prior to the adding.
- In Example 24, the subject matter of Example 23 optionally includes wherein the second scaling factor is between zero and one and wherein the first scaling factor is one minus the second scaling factor.
- In Example 25, the subject matter of any one or more of Examples 14-24 optionally include introducing a delay to the audio stream prior to applying the filter.
- In Example 26, the subject matter of Example 25 optionally includes wherein the delay is eight milliseconds.
- Example 27 is a device for automatic speech recognition de-reverberation, the device comprising: means for obtaining a portion of an audio stream, the portion of the audio stream being a proper subset of the audio stream; means for creating a filter by applying Generalized Weighted Prediction Error (GWPE) to the portion of the audio stream; means for applying the filter to the audio stream to remove reverberation; and means for providing a filtered version of the audio stream to an audio stream consumer.
- In Example 28, the subject matter of Example 27 optionally includes wherein the means for creating the filter occurs in a first pipeline and the means for applying the filter occurs in a second pipeline, the first and second pipelines executing in parallel on the device.
- In Example 29, the subject matter of any one or more of Examples 27-28 optionally include wherein the means for obtaining the portion of the audio stream includes means for buffering the audio stream for a fixed time period.
- In Example 30, the subject matter of Example 29 optionally includes wherein the fixed time period is a second.
- In Example 31, the subject matter of any one or more of Examples 29-30 optionally include means for repeating creating the filter with a subsequent fixed time period.
- In Example 32, the subject matter of any one or more of Examples 29-31 optionally include wherein the fixed time period is an audio frame.
- In Example 33, the subject matter of Example 32 optionally includes wherein the audio frame length is thirty two milliseconds.
- In Example 34, the subject matter of any one or more of Examples 27-33 optionally include wherein the means for creating the filter includes means for combining a current GWPE application to the audio stream with a previously created filter.
- In Example 35, the subject matter of Example 34 optionally includes wherein the means for combining the current GWPE application to the audio stream with a previously created filter includes means for adding the current GWPE application as a first term to the previously created filter as a second term.
- In Example 36, the subject matter of Example 35 optionally includes wherein the means for combining the current GWPE application to the audio stream with a previously created filter includes means for applying a first scaling factor to the first term and a second scaling factor to the second term prior to the adding.
- In Example 37, the subject matter of Example 36 optionally includes wherein the second scaling factor is between zero and one and wherein the first scaling factor is one minus the second scaling factor.
- In Example 38, the subject matter of any one or more of Examples 27-37 optionally include wherein the means for applying the filter to the audio stream includes means for introducing a delay to the audio stream prior to applying the filter.
- In Example 39, the subject matter of Example 38 optionally includes wherein the delay is eight milliseconds.
- Example 40 is a method for automatic speech recognition de-reverberation, the method comprising: obtaining a portion of an audio stream, the portion of the audio stream being a proper subset of the audio stream; creating a filter by applying Generalized Weighted Prediction Error (GWPE) to the portion of the audio stream; applying the filter to the audio stream to remove reverberation; and providing a filtered version of the audio stream to an audio stream consumer.
- In Example 41, the subject matter of Example 40 optionally includes wherein creating the filter occurs in a first pipeline and applying the filter occurs in a second pipeline, the first and second pipelines executing in parallel on a device.
- In Example 42, the subject matter of any one or more of Examples 40-41 optionally include wherein obtaining the portion of the audio stream includes buffering the audio stream for a fixed time period.
- In Example 43, the subject matter of Example 42 optionally includes wherein the fixed time period is a second.
- In Example 44, the subject matter of any one or more of Examples 42-43 optionally include repeating creating the filter with a subsequent fixed time period.
- In Example 45, the subject matter of any one or more of Examples 42-44 optionally include wherein the fixed time period is an audio frame.
- In Example 46, the subject matter of Example 45 optionally includes wherein the audio frame length is thirty two milliseconds.
- In Example 47, the subject matter of any one or more of Examples 40-46 optionally include wherein creating the filter includes combining a current GWPE application to the audio stream with a previously created filter.
- In Example 48, the subject matter of Example 47 optionally includes wherein combining the current GWPE application to the audio stream with a previously created filter includes adding the current GWPE application as a first term to the previously created filter as a second term.
- In Example 49, the subject matter of Example 48 optionally includes wherein combining the current GWPE application to the audio stream with a previously created filter includes applying a first scaling factor to the first term and a second scaling factor to the second term prior to the adding.
- In Example 50, the subject matter of Example 49 optionally includes wherein the second scaling factor is between zero and one and wherein the first scaling factor is one minus the second scaling factor.
- In Example 51, the subject matter of any one or more of Examples 40-50 optionally include introducing a delay to the audio stream prior to applying the filter.
- In Example 52, the subject matter of Example 51 optionally includes wherein the delay is eight milliseconds.
- Example 53 is a system comprising means to perform any of the methods 40-52.
- Example 54 is at least one machine readable medium including instructions that, when executed by a machine, cause the machine to perform any of methods 40-52.
- Example 55 is at least one machine readable medium including instructions for de-reverberation of an audio signal, the instructions, when executed by a machine, causing the machine to perform operations comprising: performing Generalized Weighted Prediction Error (GWPE) in a first pipeline; and performing signal processing in a second pipeline, the second pipeline and first pipeline executing in parallel, the second pipeline applying the output of the first pipeline to remove reverberation in an audio signal processed by the second pipeline.
- In Example 56, the subject matter of Example 55 optionally includes buffering the audio signal in a buffer; providing contents of the buffer every second to the first pipeline; and clearing the buffer after providing the contents.
- In Example 57, the subject matter of any one or more of Examples 55-56 optionally include wherein the first pipeline includes iteratively: calculating signal statistics; calculating a de-reverb filter; and applying the de-reverb filter to remove reverberation.
- Example 58 is a method for de-reverberation of an audio signal, the method comprising: performing Generalized Weighted Prediction Error (GWPE) in a first pipeline; and performing signal processing in a second pipeline, the second pipeline and first pipeline executing in parallel, the second pipeline applying the output of the first pipeline to remove reverberation in an audio signal processed by the second pipeline.
- In Example 59, the subject matter of Example 58 optionally includes buffering the audio signal in a buffer; providing contents of the buffer every second to the first pipeline; and clearing the buffer after providing the contents.
- In Example 60, the subject matter of any one or more of Examples 58-59 optionally include wherein the first pipeline iteratively includes: calculating signal statistics; calculating a de-reverb filter; and applying the de-reverb filter to remove reverberation.
- Example 61 is a system comprising means to perform any of the methods 58-60.
- Example 62 is at least one machine readable medium including instructions that, when executed by a machine, cause the machine to perform any of methods 58-60.
- Example 63 is a system for de-reverberation of an audio signal, the system comprising: means for performing Generalized Weighted Prediction Error (GWPE) in a first pipeline; and means for performing signal processing in a second pipeline, the second pipeline and first pipeline executing in parallel, the second pipeline applying the output of the first pipeline to remove reverberation in an audio signal processed by the second pipeline.
- In Example 64, the subject matter of Example 63 optionally includes means for buffering the audio signal in a buffer; means for providing contents of the buffer every second to the first pipeline; and means for clearing the buffer after providing the contents.
- In Example 65, the subject matter of any one or more of Examples 63-64 optionally include wherein the first pipeline includes means for iteratively: calculating signal statistics; calculating a de-reverb filter; and applying the de-reverb filter to remove reverberation.
- Example 66 is at least one machine readable medium including instructions for de-reverberation of an audio signal, the instructions, when executed by a machine, causing the machine to perform operations comprising: estimating signal statistics for an audio signal; performing Generalized Weighted Prediction Error (GWPE) using the estimated signal statistics; estimating a spatial correlation matrix; creating weighted matrix and vector inputs from the spatial correlation matrix estimation; and updating a de-reverb filter with the weighted matrix and vector.
- In Example 67, the subject matter of Example 66 optionally includes wherein the operations are performed inline to other audio signal processing.
- In Example 68, the subject matter of any one or more of Examples 66-67 optionally include wherein only one signal frame is buffered at a time to input into the operations.
- Example 69 is a method for de-reverberation of an audio signal, the method comprising: estimating signal statistics for an audio signal; performing Generalized Weighted Prediction Error (GWPE) using the estimated signal statistics; estimating a spatial correlation matrix; creating weighted matrix and vector inputs from the spatial correlation matrix estimation; and updating a de-reverb filter with the weighted matrix and vector.
- In Example 70, the subject matter of Example 69 optionally includes wherein the operations are performed inline to other audio signal processing.
- In Example 71, the subject matter of any one or more of Examples 69-70 optionally include wherein only one signal frame is buffered at a time to input into the operations.
- Example 72 is a system comprising means to perform any of the methods 69-71.
- Example 73 is at least one machine readable medium including instructions that, when executed by a machine, cause the machine to perform any of methods 69-71.
- Example 74 is a system for de-reverberation of an audio signal, the system comprising: means for estimating signal statistics for an audio signal; means for performing Generalized Weighted Prediction Error (GWPE) using the estimated signal statistics; means for estimating a spatial correlation matrix; means for creating weighted matrix and vector inputs from the spatial correlation matrix estimation; and means for updating a de-reverb filter with the weighted matrix and vector.
- In Example 75, the subject matter of Example 74 optionally includes wherein the operations are performed inline to other audio signal processing.
- In Example 76, the subject matter of any one or more of Examples 74-75 optionally include wherein only one signal frame is buffered at a time to input into the operations.
- Example 77 is at least one machine readable medium including instructions for de-reverberation of an audio signal, the instructions, when executed by a machine, causing the machine to perform operations comprising: performing de-reverb filter updating in a first pipeline; and performing signal processing in a second pipeline, the second pipeline and first pipeline executing in parallel, the second pipeline applying the output of the first pipeline to remove reverberation in an audio signal processed by the second pipeline.
- In Example 78, the subject matter of Example 77 optionally includes wherein the de-reverb filter updating includes: estimating signal statistics for the audio signal; performing Generalized Weighted Prediction Error (GWPE) using the estimated signal statistics; estimating a spatial correlation matrix; creating weighted matrix and vector inputs from the spatial correlation matrix estimation; and updating a de-reverb filter with the weighted matrix and vector.
- In Example 79, the subject matter of any one or more of Examples 77-78 optionally include wherein only one signal frame is buffered at a time to input into the first pipeline.
- Example 80 is a method for de-reverberation of an audio signal, the method comprising: performing de-reverb filter updating in a first pipeline; and performing signal processing in a second pipeline, the second pipeline and first pipeline executing in parallel, the second pipeline applying the output of the first pipeline to remove reverberation in an audio signal processed by the second pipeline.
- In Example 81, the subject matter of Example 80 optionally includes wherein the de-reverb filter updating includes: estimating signal statistics for the audio signal; performing Generalized Weighted Prediction Error (GWPE) using the estimated signal statistics; estimating a spatial correlation matrix; creating weighted matrix and vector inputs from the spatial correlation matrix estimation; and updating a de-reverb filter with the weighted matrix and vector.
- In Example 82, the subject matter of any one or more of Examples 80-81 optionally include wherein only one signal frame is buffered at a time to input into the first pipeline.
- Example 83 is a system comprising means to perform any of the methods 80-82.
- Example 84 is at least one machine readable medium including instructions that, when executed by a machine, cause the machine to perform any of methods 80-82.
- Example 85 is a system for de-reverberation of an audio signal, the system comprising: means for performing de-reverb filter updating in a first pipeline; and means for performing signal processing in a second pipeline, the second pipeline and first pipeline executing in parallel, the second pipeline applying the output of the first pipeline to remove reverberation in an audio signal processed by the second pipeline.
- In Example 86, the subject matter of Example 85 optionally includes wherein the de-reverb filter updating includes: means for estimating signal statistics for the audio signal; means for performing Generalized Weighted Prediction Error (GWPE) using the estimated signal statistics; means for estimating a spatial correlation matrix; means for creating weighted matrix and vector inputs from the spatial correlation matrix estimation; and means for updating a de-reverb filter with the weighted matrix and vector.
- In Example 87, the subject matter of any one or more of Examples 85-86 optionally include wherein only one signal frame is buffered at a time to input into the first pipeline.
- The above detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show, by way of illustration, specific embodiments that may be practiced. These embodiments are also referred to herein as “examples.” Such examples may include elements in addition to those shown or described. However, the present inventors also contemplate examples in which only those elements shown or described are provided. Moreover, the present inventors also contemplate examples using any combination or permutation of those elements shown or described (or one or more aspects thereof), either with respect to a particular example (or one or more aspects thereof), or with respect to other examples (or one or more aspects thereof) shown or described herein.
- All publications, patents, and patent documents referred to in this document are incorporated by reference herein in their entirety, as though individually incorporated by reference. In the event of inconsistent usages between this document and those documents so incorporated by reference, the usage in the incorporated reference(s) should be considered supplementary to that of this document; for irreconcilable inconsistencies, the usage in this document controls.
- In this document, the terms “a” or “an” are used, as is common in patent documents, to include one or more than one, independent of any other instances or usages of “at least one” or “one or more.” In this document, the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.” Also, in the following claims, the terms “including” and “comprising” are open-ended, that is, a system, device, article, or process that includes elements in addition to those listed after such a term in a claim is still deemed to fall within the scope of that claim. Moreover, in the following claims, the terms “first,” “second,” and “third,” etc. are used merely as labels, and are not intended to impose numerical requirements on their objects.
- The above description is intended to be illustrative, and not restrictive. For example, the above-described examples (or one or more aspects thereof) may be used in combination with each other. Other embodiments may be used, such as by one of ordinary skill in the art upon reviewing the above description. The Abstract is to allow the reader to quickly ascertain the nature of the technical disclosure and is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Also, in the above Detailed Description, various features may be grouped together to streamline the disclosure. This should not be interpreted as intending that an unclaimed disclosed feature is essential to any claim. Rather, inventive subject matter may lie in less than all features of a particular disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment. The scope of the embodiments should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
Claims (24)
1. A system for automatic speech recognition de-reverberation, the system comprising:
a sampler to obtain a portion of an audio stream, the portion of the audio stream being a proper subset of the audio stream;
a signal processor to create a filter by applying Generalized Weighted Prediction Error (GWPE) to the portion of the audio stream;
a multiplexer to apply the filter to the audio stream to remove reverberation; and
an interlink to provide a filtered version of the audio stream to an audio stream consumer.
2. The system of claim 1, wherein the signal processor is in a first pipeline to create the filter and the multiplexer is in a second pipeline to apply the filter, the first and second pipelines arranged to execute in parallel.
3. The system of claim 1, wherein, to obtain the portion of the audio stream, the sampler buffers the audio stream for a fixed time period.
4. The system of claim 3, wherein the signal processor includes a loop to repetitively create the filter with subsequent fixed time periods.
5. The system of claim 3, wherein the fixed time period is an audio frame.
6. The system of claim 1, wherein, to create the filter, the signal processor combines a current GWPE application to the audio stream with a previously created filter.
7. The system of claim 6, wherein, to combine the current GWPE application to the audio stream with a previously created filter, the signal processor adds the current GWPE application as a first term to the previously created filter as a second term.
8. The system of claim 7, wherein, to combine the current GWPE application to the audio stream with a previously created filter, the signal processor applies a first scaling factor to the first term and a second scaling factor to the second term prior to the adding.
9. At least one machine readable medium including instructions for automatic speech recognition de-reverberation, the instructions, when executed by a machine, cause the machine to perform operations comprising:
obtaining a portion of an audio stream, the portion of the audio stream being a proper subset of the audio stream;
creating a filter by applying Generalized Weighted Prediction Error (GWPE) to the portion of the audio stream;
applying the filter to the audio stream to remove reverberation; and
providing a filtered version of the audio stream to an audio stream consumer.
10. The at least one machine readable medium of claim 9, wherein creating the filter occurs in a first pipeline and applying the filter occurs in a second pipeline, the first and second pipelines executing in parallel on a device.
11. The at least one machine readable medium of claim 9, wherein obtaining the portion of the audio stream includes buffering the audio stream for a fixed time period.
12. The at least one machine readable medium of claim 11, wherein the operations include repeating creating the filter with a subsequent fixed time period.
13. The at least one machine readable medium of claim 11, wherein the fixed time period is an audio frame.
14. The at least one machine readable medium of claim 9, wherein creating the filter includes combining a current GWPE application to the audio stream with a previously created filter.
15. The at least one machine readable medium of claim 14, wherein combining the current GWPE application to the audio stream with a previously created filter includes adding the current GWPE application as a first term to the previously created filter as a second term.
16. The at least one machine readable medium of claim 15, wherein combining the current GWPE application to the audio stream with a previously created filter includes applying a first scaling factor to the first term and a second scaling factor to the second term prior to the adding.
17. A method for automatic speech recognition de-reverberation, the method comprising:
obtaining a portion of an audio stream, the portion of the audio stream being a proper subset of the audio stream;
creating a filter by applying Generalized Weighted Prediction Error (GWPE) to the portion of the audio stream;
applying the filter to the audio stream to remove reverberation; and
providing a filtered version of the audio stream to an audio stream consumer.
18. The method of claim 17, wherein creating the filter occurs in a first pipeline and applying the filter occurs in a second pipeline, the first and second pipelines executing in parallel on a device.
19. The method of claim 17, wherein obtaining the portion of the audio stream includes buffering the audio stream for a fixed time period.
20. The method of claim 19, comprising repeating creating the filter with a subsequent fixed time period.
21. The method of claim 19, wherein the fixed time period is an audio frame.
22. The method of claim 17, wherein creating the filter includes combining a current GWPE application to the audio stream with a previously created filter.
23. The method of claim 22, wherein combining the current GWPE application to the audio stream with a previously created filter includes adding the current GWPE application as a first term to the previously created filter as a second term.
24. The method of claim 23, wherein combining the current GWPE application to the audio stream with a previously created filter includes applying a first scaling factor to the first term and a second scaling factor to the second term prior to the adding.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/388,323 US20170365271A1 (en) | 2016-06-15 | 2016-12-22 | Automatic speech recognition de-reverberation |
PCT/US2017/032932 WO2017218129A1 (en) | 2016-06-15 | 2017-05-16 | Automatic speech recognition de-reverberation |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662350507P | 2016-06-15 | 2016-06-15 | |
US15/388,323 US20170365271A1 (en) | 2016-06-15 | 2016-12-22 | Automatic speech recognition de-reverberation |
Publications (1)
Publication Number | Publication Date |
---|---|
US20170365271A1 true US20170365271A1 (en) | 2017-12-21 |
Family
ID=60659998
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/388,275 Abandoned US20170366897A1 (en) | 2016-06-15 | 2016-12-22 | Microphone board for far field automatic speech recognition |
US15/388,323 Abandoned US20170365271A1 (en) | 2016-06-15 | 2016-12-22 | Automatic speech recognition de-reverberation |
US15/388,147 Abandoned US20170365255A1 (en) | 2016-06-15 | 2016-12-22 | Far field automatic speech recognition pre-processing |
US15/388,107 Expired - Fee Related US10657983B2 (en) | 2016-06-15 | 2016-12-22 | Automatic gain control for speech recognition |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/388,275 Abandoned US20170366897A1 (en) | 2016-06-15 | 2016-12-22 | Microphone board for far field automatic speech recognition |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/388,147 Abandoned US20170365255A1 (en) | 2016-06-15 | 2016-12-22 | Far field automatic speech recognition pre-processing |
US15/388,107 Expired - Fee Related US10657983B2 (en) | 2016-06-15 | 2016-12-22 | Automatic gain control for speech recognition |
Country Status (4)
Country | Link |
---|---|
US (4) | US20170366897A1 (en) |
EP (1) | EP3472834A4 (en) |
CN (1) | CN109074816B (en) |
WO (2) | WO2017218128A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170365255A1 (en) * | 2016-06-15 | 2017-12-21 | Adam Kupryjanow | Far field automatic speech recognition pre-processing |
EP3561807A1 (en) * | 2018-04-25 | 2019-10-30 | Comcast Cable Communications LLC | Microphone array beamforming control |
US10573301B2 (en) | 2018-05-18 | 2020-02-25 | Intel Corporation | Neural network based time-frequency mask estimation and beamforming for speech pre-processing |
CN111341345A (en) * | 2020-05-21 | 2020-06-26 | 深圳市友杰智新科技有限公司 | Control method and device of voice equipment, voice equipment and storage medium |
CN113963712A (en) * | 2020-07-21 | 2022-01-21 | 华为技术有限公司 | Method, electronic device, and computer-readable storage medium for filtering echoes |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107290711A (en) * | 2016-03-30 | 2017-10-24 | 芋头科技(杭州)有限公司 | A kind of voice is sought to system and method |
US11037330B2 (en) * | 2017-04-08 | 2021-06-15 | Intel Corporation | Low rank matrix compression |
US10403299B2 (en) * | 2017-06-02 | 2019-09-03 | Apple Inc. | Multi-channel speech signal enhancement for robust voice trigger detection and automatic speech recognition |
CN109979476B (en) * | 2017-12-28 | 2021-05-14 | 电信科学技术研究院 | Method and device for removing reverberation of voice |
USD920137S1 (en) * | 2018-03-07 | 2021-05-25 | Intel Corporation | Acoustic imaging device |
US10313786B1 (en) * | 2018-03-20 | 2019-06-04 | Cisco Technology, Inc. | Beamforming and gainsharing mixing of small circular array of bidirectional microphones |
US20190324117A1 (en) * | 2018-04-24 | 2019-10-24 | Mediatek Inc. | Content aware audio source localization |
US10667071B2 (en) * | 2018-05-31 | 2020-05-26 | Harman International Industries, Incorporated | Low complexity multi-channel smart loudspeaker with voice control |
KR102497468B1 (en) * | 2018-08-07 | 2023-02-08 | 삼성전자주식회사 | Electronic device including a plurality of microphones |
CN110491403B (en) | 2018-11-30 | 2022-03-04 | 腾讯科技(深圳)有限公司 | Audio signal processing method, device, medium and audio interaction equipment |
US11902758B2 (en) * | 2018-12-21 | 2024-02-13 | Gn Audio A/S | Method of compensating a processed audio signal |
CN109524004B (en) * | 2018-12-29 | 2022-03-08 | 思必驰科技股份有限公司 | Method for realizing parallel transmission of multi-channel audio and data, external voice interaction device and system |
CN109767769B (en) * | 2019-02-21 | 2020-12-22 | 珠海格力电器股份有限公司 | Voice recognition method and device, storage medium and air conditioner |
GB201902812D0 (en) * | 2019-03-01 | 2019-04-17 | Nokia Technologies Oy | Wind noise reduction in parametric audio |
CN110310655B (en) * | 2019-04-22 | 2021-10-22 | 广州视源电子科技股份有限公司 | Microphone signal processing method, device, equipment and storage medium |
KR20200132613A (en) * | 2019-05-16 | 2020-11-25 | 삼성전자주식회사 | Method and apparatus for speech recognition with wake on voice |
KR20210017252A (en) * | 2019-08-07 | 2021-02-17 | 삼성전자주식회사 | Method for processing audio sound based on multi-channel and an electronic device |
CN112887877B (en) * | 2021-01-28 | 2023-09-08 | 歌尔科技有限公司 | Audio parameter setting method and device, electronic equipment and storage medium |
EP4482173A1 (en) * | 2023-06-22 | 2024-12-25 | GN Audio A/S | Multimicrophone audio system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080201137A1 (en) * | 2007-02-20 | 2008-08-21 | Koen Vos | Method of estimating noise levels in a communication system |
US20100073216A1 (en) * | 2008-09-22 | 2010-03-25 | Denso Corporation | Radar device |
US20120023061A1 (en) * | 2008-09-30 | 2012-01-26 | Rockwell Automation Technologies, Inc. | Asymmetrical process parameter control system and method |
US20120288100A1 (en) * | 2011-05-11 | 2012-11-15 | Samsung Electronics Co., Ltd. | Method and apparatus for processing multi-channel de-correlation for cancelling multi-channel acoustic echo |
US20140286497A1 (en) * | 2013-03-15 | 2014-09-25 | Broadcom Corporation | Multi-microphone source tracking and noise suppression |
US20150126255A1 (en) * | 2012-04-30 | 2015-05-07 | Creative Technology Ltd | Universal reconfigurable echo cancellation system |
US9390723B1 (en) * | 2014-12-11 | 2016-07-12 | Amazon Technologies, Inc. | Efficient dereverberation in networked audio systems |
US9754605B1 (en) * | 2016-06-09 | 2017-09-05 | Amazon Technologies, Inc. | Step-size control for multi-channel acoustic echo canceller |
Family Cites Families (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07109560B2 (en) * | 1990-11-30 | 1995-11-22 | 富士通テン株式会社 | Voice recognizer |
US6453289B1 (en) * | 1998-07-24 | 2002-09-17 | Hughes Electronics Corporation | Method of noise reduction for speech codecs |
US6314396B1 (en) * | 1998-11-06 | 2001-11-06 | International Business Machines Corporation | Automatic gain control in a speech recognition system |
JP3180786B2 (en) * | 1998-11-27 | 2001-06-25 | 日本電気株式会社 | Audio encoding method and audio encoding device |
US6314394B1 (en) * | 1999-05-27 | 2001-11-06 | Lear Corporation | Adaptive signal separation system and method |
US6122331A (en) * | 1999-06-14 | 2000-09-19 | Atmel Corporation | Digital automatic gain control |
KR100304666B1 (en) * | 1999-08-28 | 2001-11-01 | 윤종용 | Speech enhancement method |
US6636829B1 (en) * | 1999-09-22 | 2003-10-21 | Mindspeed Technologies, Inc. | Speech communication system and method for handling lost frames |
EP1413167A2 (en) | 2001-07-20 | 2004-04-28 | Koninklijke Philips Electronics N.V. | Sound reinforcement system having an multi microphone echo suppressor as post processor |
WO2003010996A2 (en) | 2001-07-20 | 2003-02-06 | Koninklijke Philips Electronics N.V. | Sound reinforcement system having an echo suppressor and loudspeaker beamformer |
JP3984842B2 (en) * | 2002-03-12 | 2007-10-03 | 松下電器産業株式会社 | Howling control device |
DK174558B1 (en) | 2002-03-15 | 2003-06-02 | Bruel & Kjaer Sound & Vibratio | Transducers two-dimensional array, has set of sub arrays of microphones in circularly symmetric arrangement around common center, each sub-array with three microphones arranged in straight line |
CA2388439A1 (en) * | 2002-05-31 | 2003-11-30 | Voiceage Corporation | A method and device for efficient frame erasure concealment in linear predictive based speech codecs |
US6798380B2 (en) * | 2003-02-05 | 2004-09-28 | University Of Florida Research Foundation, Inc. | Robust capon beamforming |
US7039200B2 (en) | 2003-03-31 | 2006-05-02 | Microsoft Corporation | System and process for time delay estimation in the presence of correlated noise and reverberation |
DE60325699D1 (en) | 2003-05-13 | 2009-02-26 | Harman Becker Automotive Sys | Method and system for adaptive compensation of microphone inequalities |
US7415117B2 (en) | 2004-03-02 | 2008-08-19 | Microsoft Corporation | System and method for beamforming using a microphone array |
JP2005333180A (en) | 2004-05-18 | 2005-12-02 | Audio Technica Corp | Boundary microphone |
WO2009009568A2 (en) | 2007-07-09 | 2009-01-15 | Mh Acoustics, Llc | Augmented elliptical microphone array |
US8243951B2 (en) * | 2005-12-19 | 2012-08-14 | Yamaha Corporation | Sound emission and collection device |
NO345590B1 (en) | 2006-04-27 | 2021-05-03 | Dolby Laboratories Licensing Corp | Audio amplification control using specific volume-based hearing event detection |
EP1885154B1 (en) | 2006-08-01 | 2013-07-03 | Nuance Communications, Inc. | Dereverberation of microphone signals |
CN101192862B (en) * | 2006-11-30 | 2013-01-16 | 昂达博思公司 | Method and device for automatic gain control in wireless communication system |
US8488803B2 (en) * | 2007-05-25 | 2013-07-16 | Aliphcom | Wind suppression/replacement component for use with electronic systems |
KR100905586B1 (en) * | 2007-05-28 | 2009-07-02 | 삼성전자주식회사 | Performance Evaluation System and Method of Microphone for Remote Speech Recognition in Robots |
JP5075664B2 (en) * | 2008-02-15 | 2012-11-21 | 株式会社東芝 | Spoken dialogue apparatus and support method |
CN101986386B (en) * | 2009-07-29 | 2012-09-26 | 比亚迪股份有限公司 | Method and device for eliminating voice background noise |
US20110188671A1 (en) * | 2009-10-15 | 2011-08-04 | Georgia Tech Research Corporation | Adaptive gain control based on signal-to-noise ratio for noise suppression |
JP5423370B2 (en) | 2009-12-10 | 2014-02-19 | 船井電機株式会社 | Sound source exploration device |
TWI415117B (en) | 2009-12-25 | 2013-11-11 | Univ Nat Chiao Tung | Dereverberation and noise redution method for microphone array and apparatus using the same |
US8861756B2 (en) * | 2010-09-24 | 2014-10-14 | LI Creative Technologies, Inc. | Microphone array system |
US8798278B2 (en) * | 2010-09-28 | 2014-08-05 | Bose Corporation | Dynamic gain adjustment based on signal to ambient noise level |
US9100734B2 (en) | 2010-10-22 | 2015-08-04 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for far-field multi-source tracking and separation |
FR2976710B1 (en) * | 2011-06-20 | 2013-07-05 | Parrot | DEBRISING METHOD FOR MULTI-MICROPHONE AUDIO EQUIPMENT, IN PARTICULAR FOR A HANDS-FREE TELEPHONY SYSTEM |
IN2015DN00484A (en) | 2012-07-27 | 2015-06-26 | Sony Corp | |
US9584642B2 (en) | 2013-03-12 | 2017-02-28 | Google Technology Holdings LLC | Apparatus with adaptive acoustic echo control for speakerphone mode |
JP6253031B2 (en) * | 2013-02-15 | 2017-12-27 | パナソニックIpマネジメント株式会社 | Calibration method |
EP3047483B1 (en) | 2013-09-17 | 2018-12-05 | Intel Corporation | Adaptive phase difference based noise reduction for automatic speech recognition (asr) |
EP2866465B1 (en) * | 2013-10-25 | 2020-07-22 | Harman Becker Automotive Systems GmbH | Spherical microphone array |
US9571930B2 (en) * | 2013-12-24 | 2017-02-14 | Intel Corporation | Audio data detection with a computing device |
US9124234B1 (en) * | 2014-04-11 | 2015-09-01 | Entropic Communications, LLC. | Method and apparatus for adaptive automatic gain control |
JP2016169358A (en) * | 2014-07-24 | 2016-09-23 | セントラル硝子株式会社 | Curable silicone resin composition and cured product of the same, and optical semiconductor device using them |
US9456276B1 (en) * | 2014-09-30 | 2016-09-27 | Amazon Technologies, Inc. | Parameter selection for audio beamforming |
US9997170B2 (en) | 2014-10-07 | 2018-06-12 | Samsung Electronics Co., Ltd. | Electronic device and reverberation removal method therefor |
US20160150315A1 (en) * | 2014-11-20 | 2016-05-26 | GM Global Technology Operations LLC | System and method for echo cancellation |
US9860635B2 (en) | 2014-12-15 | 2018-01-02 | Panasonic Intellectual Property Management Co., Ltd. | Microphone array, monitoring system, and sound pickup setting method |
US9800279B2 (en) * | 2015-02-16 | 2017-10-24 | Samsung Electronics Co., Ltd. | Method and apparatus for automatic gain control in wireless receiver |
CN104810021B (en) * | 2015-05-11 | 2017-08-18 | 百度在线网络技术(北京)有限公司 | The pre-treating method and device recognized applied to far field |
US9894428B1 (en) * | 2015-05-22 | 2018-02-13 | Amazon Technologies, Inc. | Electronic device with seamless fabric assembly |
US10028051B2 (en) * | 2015-08-31 | 2018-07-17 | Panasonic Intellectual Property Management Co., Ltd. | Sound source localization apparatus |
CN105355210B (en) * | 2015-10-30 | 2020-06-23 | 百度在线网络技术(北京)有限公司 | Preprocessing method and device for far-field speech recognition |
US20170366897A1 (en) | 2016-06-15 | 2017-12-21 | Robert Azarewicz | Microphone board for far field automatic speech recognition |
- 2016
  - 2016-12-22 US US15/388,275 patent/US20170366897A1/en not_active Abandoned
  - 2016-12-22 US US15/388,323 patent/US20170365271A1/en not_active Abandoned
  - 2016-12-22 US US15/388,147 patent/US20170365255A1/en not_active Abandoned
  - 2016-12-22 US US15/388,107 patent/US10657983B2/en not_active Expired - Fee Related
- 2017
  - 2017-05-16 WO PCT/US2017/032913 patent/WO2017218128A1/en unknown
  - 2017-05-16 WO PCT/US2017/032932 patent/WO2017218129A1/en active Application Filing
  - 2017-05-16 CN CN201780029587.0A patent/CN109074816B/en active Active
  - 2017-05-16 EP EP17813753.5A patent/EP3472834A4/en not_active Withdrawn
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080201137A1 (en) * | 2007-02-20 | 2008-08-21 | Koen Vos | Method of estimating noise levels in a communication system |
US20100073216A1 (en) * | 2008-09-22 | 2010-03-25 | Denso Corporation | Radar device |
US20120023061A1 (en) * | 2008-09-30 | 2012-01-26 | Rockwell Automation Technologies, Inc. | Asymmetrical process parameter control system and method |
US20120288100A1 (en) * | 2011-05-11 | 2012-11-15 | Samsung Electronics Co., Ltd. | Method and apparatus for processing multi-channel de-correlation for cancelling multi-channel acoustic echo |
US20150126255A1 (en) * | 2012-04-30 | 2015-05-07 | Creative Technology Ltd | Universal reconfigurable echo cancellation system |
US20140286497A1 (en) * | 2013-03-15 | 2014-09-25 | Broadcom Corporation | Multi-microphone source tracking and noise suppression |
US9390723B1 (en) * | 2014-12-11 | 2016-07-12 | Amazon Technologies, Inc. | Efficient dereverberation in networked audio systems |
US9754605B1 (en) * | 2016-06-09 | 2017-09-05 | Amazon Technologies, Inc. | Step-size control for multi-channel acoustic echo canceller |
Non-Patent Citations (1)
Title |
---|
Woudenberg et al., "A Block Least Squares Approach to Acoustic Echo Cancellation", ICASSP '99, Proceedings, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, April 1999, Pages 869 to 872. (Year: 1999) * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170365255A1 (en) * | 2016-06-15 | 2017-12-21 | Adam Kupryjanow | Far field automatic speech recognition pre-processing |
US10657983B2 (en) | 2016-06-15 | 2020-05-19 | Intel Corporation | Automatic gain control for speech recognition |
EP3561807A1 (en) * | 2018-04-25 | 2019-10-30 | Comcast Cable Communications LLC | Microphone array beamforming control |
US10586538B2 (en) | 2018-04-25 | 2020-03-10 | Comcast Cable Communications, LLC | Microphone array beamforming control |
US11437033B2 (en) | 2018-04-25 | 2022-09-06 | Comcast Cable Communications, Llc | Microphone array beamforming control |
US10573301B2 (en) | 2018-05-18 | 2020-02-25 | Intel Corporation | Neural network based time-frequency mask estimation and beamforming for speech pre-processing |
CN111341345A (en) * | 2020-05-21 | 2020-06-26 | 深圳市友杰智新科技有限公司 | Control method and device of voice equipment, voice equipment and storage medium |
CN113963712A (en) * | 2020-07-21 | 2022-01-21 | 华为技术有限公司 | Method, electronic device, and computer-readable storage medium for filtering echoes |
Also Published As
Publication number | Publication date |
---|---|
EP3472834A4 (en) | 2020-02-12 |
US20170365255A1 (en) | 2017-12-21 |
US10657983B2 (en) | 2020-05-19 |
CN109074816A (en) | 2018-12-21 |
US20170366897A1 (en) | 2017-12-21 |
WO2017218129A1 (en) | 2017-12-21 |
US20170365274A1 (en) | 2017-12-21 |
CN109074816B (en) | 2023-11-28 |
WO2017218128A1 (en) | 2017-12-21 |
EP3472834A1 (en) | 2019-04-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20170365271A1 (en) | Automatic speech recognition de-reverberation | |
US10403299B2 (en) | Multi-channel speech signal enhancement for robust voice trigger detection and automatic speech recognition | |
CN108464015B (en) | Microphone array signal processing system | |
US8996372B1 (en) | Using adaptation data with cloud-based speech recognition | |
CN111418010A (en) | Multi-microphone noise reduction method and device and terminal equipment | |
US10930298B2 (en) | Multiple input multiple output (MIMO) audio signal processing for speech de-reverberation | |
US11798574B2 (en) | Voice separation device, voice separation method, voice separation program, and voice separation system | |
EP3047483A1 (en) | Adaptive phase difference based noise reduction for automatic speech recognition (asr) | |
EP2788980A1 (en) | Harmonicity-based single-channel speech quality estimation | |
CN112634880B (en) | Method, apparatus, device, storage medium and program product for speaker identification | |
US9390723B1 (en) | Efficient dereverberation in networked audio systems | |
JPWO2009069662A1 (en) | Voice detection system, voice detection method, and voice detection program | |
US9570088B2 (en) | Signal processor and method therefor | |
BR112014009647B1 (en) | NOISE Attenuation APPLIANCE AND NOISE Attenuation METHOD | |
JP5669036B2 (en) | Parameter estimation device for signal separation, signal separation device, parameter estimation method for signal separation, signal separation method, and program | |
WO2016077557A1 (en) | Adaptive interchannel discriminitive rescaling filter | |
JP2025503325A (en) | Method and system for speech signal enhancement with reduced latency - Patents.com | |
JP4729534B2 (en) | Reverberation apparatus, dereverberation method, dereverberation program, and recording medium thereof | |
CN110648681B (en) | Speech enhancement method, device, electronic equipment and computer-readable storage medium | |
JP6221258B2 (en) | Signal processing apparatus, method and program | |
US20240170003A1 (en) | Audio Signal Enhancement with Recursive Restoration Employing Deterministic Degradation | |
CN114299916B (en) | Speech enhancement method, computer device and storage medium | |
US20240161762A1 (en) | Full-band audio signal reconstruction enabled by output from a machine learning model | |
US20240185830A1 (en) | Method, device, and computer program product for text to speech | |
WO2022247427A1 (en) | Signal filtering method and apparatus, storage medium and electronic device |
Legal Events
Date | Code | Title | Description
---|---|---|---
| AS | Assignment | Owner name: INTEL CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KUPRYJANOW, ADAM;MAZIEWSKI, PRZEMYSLAW;KURYLO, LUKASZ;AND OTHERS;SIGNING DATES FROM 20170111 TO 20170112;REEL/FRAME:040989/0615
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED
| STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION