WO1996017309A1 - Blind signal processing system employing information maximization to recover unknown signals through unsupervised minimization of output redundancy
- Publication number
- WO1996017309A1 (PCT application PCT/US1995/015981)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2134—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis
- G06F18/21342—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis using statistical independence, i.e. minimising mutual information or maximising non-gaussianity
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L25/00—Baseband systems
- H04L25/02—Details ; arrangements for supplying electrical power along data transmission lines
- H04L25/03—Shaping networks in transmitter or receiver, e.g. adaptive shaping networks
- H04L25/03006—Arrangements for removing intersymbol interference
- H04L25/03165—Arrangements for removing intersymbol interference using neural networks
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03H—IMPEDANCE NETWORKS, e.g. RESONANT CIRCUITS; RESONATORS
- H03H21/00—Adaptive networks
- H03H21/0012—Digital adaptive filters
- H03H21/0025—Particular filtering methods
- H03H2021/0034—Blind source separation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L25/00—Baseband systems
- H04L25/02—Details ; arrangements for supplying electrical power along data transmission lines
- H04L25/03—Shaping networks in transmitter or receiver, e.g. adaptive shaping networks
- H04L25/03006—Arrangements for removing intersymbol interference
- H04L2025/03433—Arrangements for removing intersymbol interference characterised by equaliser structure
- H04L2025/03439—Fixed structures
- H04L2025/03445—Time domain
- H04L2025/03464—Neural networks
Definitions
- Barlow ("Unsupervised Learning", Neural Computation 1 (1989) 295-311) also examines this issue and shows that "minimum entropy coding" in a biological sensory system operates to reduce the troublesome mutual information component even at the expense of suboptimal symbol frequency distribution. Barlow shows that the mutual information component of redundancy can be minimized in a neural network by feeding each neuron output back to other neuron inputs through anti-Hebbian synapses to discourage correlated output activity. This "redundancy reduction" principle is offered to explain how unsupervised perceptual learning occurs in animals.
- Blind source separation and blind deconvolution are related problems in signal processing.
- The blind source separation problem can be succinctly stated as X(t) = [A]S(t), where a set of unknown source signals S(t) = (S_1(t), ..., S_N(t)) is mixed through an unknown invertible matrix [A] to produce the known sensor signals X(t), and the separation task is to recover S(t) by learning a matrix [W] that approximates [A]^(-1).
- The blind deconvolution task is to recover S(t) by finding and convolving X(t) with a tapped delay-line filter W_1, ..., W_I having the impulse response W(t) that reverses the effect of the unknown filter A(t).
- In one case, source signals are corrupted by the superposition of other source signals; in the other, a single source signal is corrupted by superposition of time-delayed versions of itself.
- In both cases, unsupervised learning is required because no error signals are available and no training signals are provided.
- Second-order statistics alone are inadequate to solve the more general problem. For instance, a second-order decorrelation technique such as that proposed by Barlow et al.
- Second-order decorrelation techniques based on the autocorrelation function are phase-blind and do not offer sufficient information to estimate the phase characteristics of the corrupting filter A(t) when applied to the more general blind deconvolution problem.
- Thus, both blind signal processing problems require the use of higher-order statistics as well as certain assumptions regarding source signal statistics.
- For blind separation, the sources are assumed to be statistically independent and non-Gaussian. With this assumption, the problem of learning [W] becomes the ICA problem described by Comon.
- For blind deconvolution, the original signal S(t) is assumed to be a "white" process consisting of independent symbols.
- The blind deconvolution problem then becomes the problem of removing from the measured signal X(t) any statistical dependencies across time that are introduced by the corrupting filter A(t). This process is sometimes denominated the "whitening" of X(t).
- In this sense, both the ICA procedure and the "whitening" of a time series are denominated "redundancy reduction".
- The first class of techniques uses some type of explicit estimation of cumulants and polyspectra, which can be appreciated with reference to Haykin and to Hatzinakos et al. Disadvantageously, such "brute force" techniques are computationally intensive for high numbers of sources or taps and may be inaccurate when cumulants higher than fourth order are ignored, as they usually must be.
- The second class of techniques uses static non-linear functions, the Taylor series expansions of which yield higher-order terms. Iterative learning rules containing such terms are expected to be somehow sensitive to the particular higher-order statistics necessary for accurate redundancy reduction. This reasoning is used by Comon et al. to explain the HJ network and by Bellini to explain the Bussgang deconvolver.
- This invention solves the above problem by introducing a new class of unsupervised learning procedures for a neural network that solve the general blind signal processing problem by maximizing joint input/output entropy through gradient ascent to minimize mutual information in the outputs.
- The network of this invention arises from the unexpectedly advantageous observation that a particular type of non-linear signal transform creates learning signals with the higher-order statistics needed to separate unknown source signals by minimizing mutual information among neural network output signals.
- This invention also arises from the second unexpectedly advantageous discovery that mutual information among neural network outputs can be minimized by maximizing joint output entropy when the learning transform is selected to match the signal probability distributions of interest.
- The process of this invention can be appreciated as a generalization of the infomax principle to non-linear units with arbitrarily distributed inputs uncorrupted by any known noise sources. It is a feature of the system of this invention that each measured input signal is passed through a predetermined sigmoid function to adaptively maximize information transfer by optimal alignment of the monotonic sigmoid slope with the input signal peak probability density. It is an advantage of this invention that redundancy is minimized among a multiplicity of outputs merely by maximizing total information throughput, thereby producing the independent components needed to solve the blind separation problem.
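This alignment effect can be checked numerically. The following sketch is an illustration added for clarity, not part of the patent text; it assumes Python with numpy, and all names and parameter values are hypothetical. It passes a Gaussian input through a logistic sigmoid at several bias settings and estimates the output entropy from a histogram; the estimate peaks when the steepest region of the sigmoid sits on the input mode, exactly the alignment described above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100_000)  # input density with mode at 2.0

def output_entropy(w, w0, x, bins=100):
    """Histogram estimate of H(y) for y = g(w*x + w0), g = logistic sigmoid."""
    y = 1.0 / (1.0 + np.exp(-(w * x + w0)))
    p, _ = np.histogram(y, bins=bins, range=(0.0, 1.0), density=True)
    p = p[p > 0]
    return -np.sum(p * np.log(p)) / bins  # Riemann sum of -f(y) ln f(y)

for w0 in (-4.0, -2.0, 0.0, 2.0):
    print(f"w0 = {w0:+.1f}   H(y) ~ {output_entropy(1.0, w0, x):.3f}")
# H(y) is largest near w0 = -2.0, where the sigmoid's steepest point
# (at w*x + w0 = 0) is aligned with the input mode at x = 2.0.
```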
- Figs. 1A-1D illustrate the feature of sigmoidal transfer function alignment for optimal information flow in a sigmoidal neuron from the prior art
- Figs. 2A-2C illustrate the blind source separation and blind deconvolution problems from the prior art
- Figs. 3A-3C provide graphical diagrams illustrating a joint entropy maximization example where maximizing joint entropy fails to produce statistically independent output signals because of improper selection of the non-linear transforming function;
- Fig. 4 shows the theoretical relationship between the several entropies and mutual information from the prior art
- Fig. 5 shows a functional block diagram of an illustrative embodiment of the source separation network of this invention
- Fig. 6 is a functional block diagram of an illustrative embodiment of the blind decorrelating network of this invention
- Fig. 7 is a functional block diagram of an illustrative embodiment of the combined blind source separation and blind decorrelation network of this invention
- Figs. 8A-8C show typical probability density functions for speech, rock music and Gaussian white noise
- Figs. 9A-9B show typical spectra of a speech signal before and after decorrelation is performed according to the procedure of this invention
- Fig. 10 shows the results of a blind source separation experiment performed using the procedure of this invention;
- Figs. 11A-11L show time domain filter charts illustrating the results of the blind deconvolution of several different corrupted human speech signals according to the procedure of this invention.
- This invention arises from the unexpectedly advantageous observation that a class of unsupervised learning rules for maximizing information transfer in a neural network solves the blind signal processing problem by minimizing redundancy in the network outputs.
- This class of new learning rules is now described in information theoretic terms, first for a single input and then for a multiplicity of unknown input signals.
- I(y,x) = H(y) − H(y|x) [Eqn. 1], where H(y) is the entropy of the output signal, H(y|x) is that portion of the output signal entropy that did not come from the input signal, and I(y,x) is the mutual information.
- Eqn. 1 can be appreciated with reference to Fig. 4, which illustrates the well-known relationship between input signal entropy H(x), output signal entropy H(y) and mutual information I(y,x).
- As shown in Fig. 1A, when a single input x is passed through a transforming function g(x) to give an output variable y, both I(y,x) and H(y) are maximized when the high density portion (mode) of the input probability density function f_x(x) is aligned with the steepest sloping portion of the non-linear transforming function g(x).
- This is equivalent to the alignment of a neuron input-output function to the expected distribution of incoming signals that leads to optimal information flow in sigmoidal neurons shown in Figs. 1C-1D.
- Fig. 1D shows a zero-mode distribution matched to the sigmoid function in Fig. 1C.
- The input x having a probability distribution f_x(x) is passed through the non-linear sigmoidal function g(x) to produce the output signal y having a probability distribution f_y(y).
- The information in the probability density function f_y(y) varies responsive to the alignment of the mean and variance of x with respect to the threshold w_0 and slope w of g(x).
- H(y) = E[ln |∂y/∂x|] − E[ln f_x(x)] [Eqn. 5]
- The second term on the right side of Eqn. 5 is simply the unknown input signal entropy H(x), which cannot be affected by any changes in the parameter w that defines the non-linear function g(x). Therefore, only the first term on the right side of Eqn. 5 need be maximized to maximize the output signal entropy H(y).
- This first term is the average logarithm of the effect of input signal x on output signal y and may be maximized by considering the input signals as a "training set" with density f_x(x) and deriving an online, stochastic gradient ascent learning rule expressed as Δw ∝ (∂/∂w) ln |∂y/∂x| [Eqn. 6].
- Eqn. 6 defines a scaling measure Δw for changing the parameter w to adjust the log of the slope of the sigmoid function.
- Any sigmoid function can be used to specify the measure Δw, such as the widely-used logistic transfer function y = 1/(1 + e^(−u)), where u = wx + w_0.
- The hyperbolic tangent function is a member of the general class of functions g(x), each representing a solution to the partial differential equation ∂g(x)/∂x = (1 − g(x))^l (1 + g(x))^r [Eqn. 8] with a boundary condition of g(0) = 0.
- For the logistic function this gradient yields the scaling measure Δw = η(x(1 − 2y) + w^(−1)) [Eqn. 11], where η > 0 is a learning rate.
- The bias measure Δw_0 operates to align the steepest part of the sigmoid curve g(x) with the peak of f_x(x), thereby matching input density to output slope in the manner suggested intuitively by Eqn. 3.
- The scaling measure Δw operates to align the edges of the sigmoid curve slope to the particular width (proportional to variance) of f_x(x). Thus, narrow probability density functions lead to sharply-sloping sigmoid functions.
- The scaling measure of Eqn. 11 defines an "anti-Hebbian" learning rule with a second "anti-decay" term.
- The first anti-Hebbian term prevents the uninformative solutions where output signal y saturates at 0 or 1, but such an unassisted anti-Hebbian rule alone allows the slope w to collapse to zero.
- The second anti-decay term (1/w) forces output signal y away from the other uninformative situation in which the slope w is so flat that output signal y stabilizes at 0.5 (Fig.
- When the hyperbolic tangent function is selected instead, the bias measure Δw_0 becomes proportional to −2y and the scaling measure Δw becomes proportional to w^(−1) − 2xy, as sketched below.
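A minimal sketch of this single-unit procedure, assuming the logistic non-linearity and the Eqn. 11-style updates quoted above (the Laplacian input, learning rate and sample count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = rng.laplace(loc=1.0, scale=0.5, size=50_000)  # super-Gaussian input

w, w0, eta = 1.0, 0.0, 0.01  # scaling weight, bias weight, learning rate
for x in x_train:
    y = 1.0 / (1.0 + np.exp(-(w * x + w0)))      # logistic output
    w  += eta * (1.0 / w + x * (1.0 - 2.0 * y))  # scaling measure (Eqn. 11 form)
    w0 += eta * (1.0 - 2.0 * y)                  # bias measure
print(f"learned w = {w:.3f}, w0 = {w0:.3f}")
# w0 drifts so the sigmoid midpoint tracks the input mode, and w grows
# until the sloping region of the sigmoid spans the width of the density.
```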
- In the multivariate case, the joint probability density functions are related by f_Y(Y) = f_X(X)/|J| [Eqn. 13], where J = det[∂y_i/∂x_j] is the Jacobian of the transformation and det[.] denotes the determinant of a square matrix.
- The method of this invention maximizes the natural log of the Jacobian to maximize output entropy H(Y) for a given input entropy H(X), as can be appreciated with reference to Eqn. 5.
- |J| represents the volume of space in [Y] into which points in [X] are mapped. Maximizing this quantity attempts to spread the training set of input points evenly over [Y].
- In the resulting multivariate learning rule, the first anti-Hebbian term has become an outer product of vectors and the second anti-decay term has generalized to an "anti-redundancy" term in the form of the inverse of the transpose of the weight matrix [W].
- Eqn. 15 can be written for an individual weight W_ij as ΔW_ij = η(cof[W_ij]/det[W] + x_j(1 − 2y_i)) [Eqn. 17], where cof[W_ij] denotes the cofactor of element W_ij, which is known to be (−1)^(i+j) times the determinant of the matrix obtained by removing the i-th row and the j-th column from the square weight matrix [W], and η is the learning rate.
- The i-th bias measure ΔW_i0 can be expressed as ΔW_i0 = η(1 − 2y_i) [Eqn. 18].
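Collecting Eqns. 17-18 over the whole weight matrix gives ΔW = η([[W]^T]^(−1) + (1 − 2y)x^T) and Δw_0 = η(1 − 2y), because cof[W_ij]/det[W] is exactly the (i,j) element of the inverse transpose. A toy sketch of this separation rule under the logistic non-linearity follows; the 2×2 mixing matrix, Laplacian sources and parameter values are illustrative assumptions, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 2, 100_000
S = rng.laplace(size=(N, T))               # unknown super-Gaussian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])     # unknown mixing matrix [A]
X = A @ S                                  # known sensor mixtures [X]

W = np.eye(N)           # scaling weights
w0 = np.zeros((N, 1))   # bias weights
eta, batch = 0.001, 100
for t in range(0, T, batch):
    x = X[:, t:t + batch]
    y = 1.0 / (1.0 + np.exp(-(W @ x + w0)))   # logistic outputs
    # anti-redundancy term inv(W.T) plus batch-averaged anti-Hebbian term
    W  += eta * (np.linalg.inv(W.T) + (1.0 - 2.0 * y) @ x.T / batch)
    w0 += eta * np.mean(1.0 - 2.0 * y, axis=1, keepdims=True)
print("W @ A (approaches a scaled permutation as training proceeds):")
print(W @ A)  # a slow-converging sketch; more passes may be needed
```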
- Figs. 2B-2C illustrate the blind deconvolution problem.
- Fig. 2C shows an unobserved data sequence S(t) entering an unknown channel A(t), which responsively produces the measured signal X(t) that can be blindly equalized through a causal filter
- Fig. 2B shows the time series X(t), which is presumed to have a length of J samples (not shown).
- X(t) is convolved with a causal filter having I weighted taps W_1, ..., W_I and impulse response W(t).
- The causal filter output signal U(t) is then passed through a non-linear sigmoid function g(.) to create the training signal Y(t) (not shown).
- In matrix form, [Y] = g([U]) = g([W][X]) [Eqn. 22], where [Y], [U] and [X] are signal sample vectors having J samples.
- The vector ordering need not be temporal.
- [W] is a banded lower triangular J × J square matrix [Eqn. 23], with the leading filter weight along its diagonal and the remaining tap weights on the subdiagonals.
- The joint probability distribution functions f_Y([Y]) and f_X([X]) are related by the Jacobian of the Eqn. 22 transformation according to Eqn. 13.
- The ensemble can be "created" from a single time series by breaking the series into sequences of length I, which reduces [W] in Eqn. 23 to an I × I lower triangular matrix, leading to the deconvolution rule sketched below.
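Because det[W] for the banded lower triangular matrix of Eqn. 23 is simply the leading weight raised to the J-th power, only the leading tap picks up an anti-decay term, while every tap receives an anti-Hebbian term. A sketch of the resulting deconvolution rule, again assuming the logistic non-linearity; the echo filter, signal statistics and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
T, L = 50_000, 5
s = rng.laplace(size=T)                  # white super-Gaussian source S(t)
a = np.array([1.0, 0.0, 0.0, 0.0, 0.8])  # unknown corrupting filter A(t)
x = np.convolve(s, a)[:T]                # measured signal X(t)

w, eta = np.zeros(L), 0.001
w[0] = 1.0                               # leading tap weight
for t in range(L, T):
    past = x[t - np.arange(L)]           # x_t, x_{t-1}, ..., x_{t-L+1}
    u = w @ past                         # causal filter output U(t)
    y = 1.0 / (1.0 + np.exp(-u))         # logistic training signal Y(t)
    grad = past * (1.0 - 2.0 * y)        # anti-Hebbian term for every tap
    grad[0] += 1.0 / w[0]                # anti-decay term, leading tap only
    w += eta * grad
print("learned filter w:", np.round(w, 3))
print("w * a:", np.round(np.convolve(w, a), 3))  # should approach a delta function
```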
- Other sigmoidal functions may be used to generate similarly useful learning rules.
- Consider a network with two outputs y_1 and y_2, which may be either two output channels from a blind source separation network or two signal samples at different points in time from a blind deconvolution filter.
- The joint entropy satisfies H(y_1,y_2) = H(y_1) + H(y_2) − I(y_1,y_2), so it can be maximized by maximizing the individual entropies while minimizing the mutual information I(y_1,y_2) shared between the two.
- Both the ICA and the "whitening" approach to deconvolution are examples of pair-wise minimization of mutual information I(y_1,y_2) for all pairs y_1 and y_2.
- This process is variously denominated factorial code learning, predictability minimization, independent component analysis (ICA) and redundancy reduction.
- The process of this invention is a stochastic gradient ascent procedure that maximizes the joint entropy H(y_1,y_2), thereby differing sharply from these "whitening" and ICA procedures known for minimizing mutual information I(y_1,y_2).
- The system of this invention rests on the unexpectedly advantageous discovery of the general conditions under which maximizing joint entropy operates to reduce mutual information.
- Fig. 3C shows one pathological example where a joint entropy maximum fails to produce statistically independent output signals because of improper selection of the non-linear transforming function.
- The sigmoidal function is not limited to the usual two functions and indeed can be tailored to the particular class of probability distribution functions expected by the process of this invention. Any function that is a member of the class of solutions to the partial differential Eqn. 8 provides a sigmoidal function suitable for use with the process of this invention. It can be shown that this general class of sigmoidal functions leads to the two learning rules of this invention (Eqns. 26-27), in which the anti-redundancy term takes the element-wise form cof[W_ij]/det[W].
- Fig. 8A shows a typical speech probability distribution function, Fig. 8B shows the probability distribution function for rock music and Fig. 8C shows a typical Gaussian white noise distribution.
- The inventor has found that joint entropy maximization for sigmoidal networks always minimizes the mutual information between the network outputs for all super-Gaussian signal distributions tested.
- Special sigmoid functions can be selected that are suitable for accomplishing the same result for sub-Gaussian signal distributions.
- Table 1 provides the anti-Hebbian terms from the learning rules resulting from several interesting transfer functions.
- In each case, the information-maximization rule consists of an anti-redundancy term, which always has the form [[W]^T]^(−1), and an anti-Hebbian term that keeps the unit from saturating; an example for the hyperbolic tangent function is sketched below.
- The other functions use the net input u_i as the output variable rather than using the actual transformed output y_i. Tests performed by the inventor show that
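For instance, with the hyperbolic tangent function the anti-Hebbian entry reduces to −2y_i x_j (since d/du ln(1 − y²) = −2y for y = tanh(u)), so one update step of the separation network can be sketched as follows; the function name and shapes are illustrative.

```python
import numpy as np

def infomax_step_tanh(W, x, eta=0.001):
    """One information-maximization update with g = tanh.

    W : (N, N) scaling weight matrix; x : (N, 1) mixture sample.
    inv(W.T) is the anti-redundancy term; -2*y*x.T is the anti-Hebbian
    term that keeps the units away from saturation.
    """
    y = np.tanh(W @ x)
    return W + eta * (np.linalg.inv(W.T) - 2.0 * y @ x.T)
```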
- Fig. 5 shows a functional block diagram illustrating an exemplary embodiment of a four-port blind signal separation network according to this invention.
- The input signals {X_i} represent "sensor" output signals, such as the electrical signal received from a microphone at a "cocktail party" or an antenna output signal.
- Each of the four network output signals {U_i} is related to the four input signals by the weights so that U_i = Σ_j W_ij X_j + W_i0.
- The four bias weights {W_i0} are updated regularly according to the learning rule of Eqn. 18 discussed above and each of the sixteen scaling weights {W_ij} is updated regularly according to the learning rule of Eqn. 17 discussed above. These updates can occur after every signal sample or may be accumulated over many signal samples for updating in a global mode.
- Each of the weight elements in Fig. 5, exemplified by element 18, includes the logic necessary to produce and accumulate the ΔW update according to the applicable learning rule.
- The separation network in Fig. 5 can also be used to remove interfering signals from a received signal merely by, for example, isolating the interferer as output signal U_1 and then subtracting U_1 from the received signal of interest, such as receive signal X_1.
- Used in this manner, the network shown in Fig. 5 is herein denominated an "interference cancelling" network.
- Fig. 6 shows a functional block diagram illustrating a simple causal filter operated according to the method of this invention for blind deconvolution.
- A time-varying signal is presented to the network at input 22.
- The five spaced taps {T_i} are separated by a time-delay interval T in the manner well-known in the art for transversal filters.
- The five weight factors {W_i} are established and updated by internal logic (not shown) according to the learning rules shown in Eqns. 26-27 discussed above.
- The five weighted tap signals {U_i} are summed at a summation device 24 to produce the single time-varying output signal U(t).
- Fig. 7 shows a functional block diagram illustrating the combination of blind source separation network and blind deconvolution filter systems of this invention.
- The blind separation learning rules and the blind deconvolution rules discussed above can be easily combined in the form exemplified by Fig. 7.
- The objective is to maximize the natural logarithm of a Jacobian with local lower triangular structure, which yields the expected learning rule that forces the leading weights {W_ijk} in the filters to follow the blind separation rules and all others to follow a decorrelation rule, except that the tapped weights {W_ijk} are interposed between a delayed input and an output.
- Here g(.) denotes the selected sigmoidal transfer function. If the hyperbolic tangent function is selected as the sigmoidal non-linearity, the following training rules are used in the system of this invention:
- Each of the source separation planes, exemplified by plane 24, operates substantially as discussed above in connection with Fig. 5 for the three input signals.
- Plane 24 contains the lead weights for the 16 individual causal filters formed by the network.
- The inventor conducted experiments using three-second segments of speech recorded from various speakers, with only one speaker per recording. All speech segments were sampled at 8,000 Hz from the output of the auxiliary microphone of a Sparc-10 workstation. No special post-processing was performed on the waveforms other than the normalization of amplitudes to a common interval [-3,3] to permit operation with the equipment used.
- The network was trained using the stochastic gradient ascent procedure described above.
- Unsupervised learning in a neural network may proceed either continuously or in a global mode. Continuous learning consists of slightly modifying the weights after each propagation of an input vector through the network. This kind of learning is useful for
- The mixing matrix [A] was used to generate the several mixed time series [X_j] from the original sources [S_i].
- The blind separation procedure of this invention was found to fail only when: (a) more than one unknown source is Gaussian white noise, and (b) the mixing matrix [A] is almost singular.
- The convolving filters used in Figs. 11A, 11E and 11I contained some zero values.
- Fig. 11E, for example, represents the filter [0.8, 0, 0, 0, 1].
- The taps were sometimes adjacent to each other, as in Figs. 11A-11D, and sometimes spaced apart in time, as in Figs. 11I-11L.
- The leading weight of each filter is the right-most bar in each histogram, exemplified by bar 30 in Fig.
- A whitening experiment is shown in Figs. 11A-11D, a barrel-effect experiment in Figs. 11E-11H and a multiple-echo experiment in Figs. 11I-11L.
- In each case, the time domain characteristics of the convolving filter [A] are shown, followed by those of the ideal deconvolving filter [W_ideal], those of the filter produced by the process of this invention [W], and the time domain pattern produced by convolution of [W] and [A].
- The convolution [W]*[A] should be a delta-function consisting of only a single high value at the right-most position of the leading weight when [W] correctly inverts [A], as illustrated below.
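Assuming numpy, this delta-function criterion is easy to check; the filter values below are illustrative, chosen to mimic the barrel-effect example, and the small trailing term is the truncation artifact discussed below.

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0, 0.0, 0.8])   # corrupting echo filter [A]
w = np.array([1.0, 0.0, 0.0, 0.0, -0.8])  # truncated deconvolving filter [W]
print(np.round(np.convolve(w, a), 3))
# -> [1. 0. 0. 0. 0. 0. 0. 0. -0.64]: a delta function except for the
#    small residual term caused by truncating the infinite ideal inverse.
```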
- The first whitening example shows what happens when "deconvolving" a speech signal that has not been corrupted (the convolving filter [A] is a delta-function). If the tap
- Fig. 9A shows the spectrum of the speech sequence before decorrelation.
- Fig. 9B shows the speech spectrum after deconvolution by the filter shown in Fig. 11C. Whitened speech sounds like a clear, sharp version of the original signal because the phase structure is preserved.
- In effect, the system is maximizing information throughput in the channel.
- When the original source signal is not white, the deconvolving filter of this invention will recover a whitened version of it rather than the exact original.
- When the filter taps are spaced further apart, as in Figs. 11E-11L, there is less opportunity for simple whitening.
- Fig. 11E shows a 6.25 ms echo added to the speech signal. This creates a mild audible barrel effect. Because filter 11E is finite
- Figs. 11I-11L show a set of exponentially-decaying echoes spread out over 275 ms that may be inverted by the two-point filter shown in Fig. 11J, with a small decaying correction on the left, which is an artifact of the truncation of the convolving filter shown in Fig. 11I.
- The learned filter corresponds almost exactly to the ideal filter in Fig. 11J and the deconvolution in Fig. 11L is almost perfect.
- This result demonstrates the sensitivity of the blind processing method of this invention in cases where the tap-spacing is great enough (100 sample intervals) that simple whitening cannot interfere noticeably with the deconvolution process.
Abstract
A neural network system and unsupervised learning process for separating unknown source signals from their received mixtures by solving the Independent Components Analysis (ICA) problem. The unsupervised learning procedure solves the general blind signal processing problem by maximizing joint output entropy through gradient ascent to minimize mutual information in the outputs. The neural network system can separate a multiplicity of unknown source signals from measured mixture signals where the mixture characteristics and the original source signals are both unknown. The system can be easily adapted to solve the related blind deconvolution problem that extracts an unknown source signal from the output of an unknown reverberating channel.
Description
BLIND SIGNAL PROCESSING SYSTEM EMPLOYING INFORMATION
MAXIMIZATION TO RECOVER UNKNOWN SIGNALS THROUGH
UNSUPERVISED MINIMIZATION OF OUTPUT REDUNDANCY
REFERENCE TO GOVERNMENT RIGHTS
The U. S. Government has rights in the invention disclosed and claimed herein pursuant to Office of Naval Research contract no. .
TECHNICAL FIELD This invention relates generally to systems for recovering the original unknown signals subjected to transfer through an unknown multichannel system by processing the known output signals therefrom and relates specifically to an information-maximizing neural network that uses unsupervised learning to recover each of a multiplicity of unknown source signals in a multichannel system having reverberation.
BACKGROUND ART
Blind Signal Processing: In many signal processing applications, the sample signals provided by the sensors are mixtures of many unknown sources. The "separation of sources" problem is to extract the original unknown signals from these known mixtures. Generally, the signal sources as well as their mixture characteristics are unknown. Without knowledge of the signal sources other than the general statistical assumption of source independence, this signal processing problem is known in the art as the "blind source separation problem". The separation is "blind" because nothing is known about the statistics of the independent source signals and nothing is known about the mixing process. The blind separation problem is encountered in many familiar forms. For instance, the well-known "cocktail party" problem refers to a situation where the unknown (source) signals are sounds generated in a room and the known (sensor) signals are the outputs of several microphones. Each of the source signals is delayed and attenuated in some (time varying) manner during transmission from source to microphone, where it is then mixed with other independently delayed and attenuated
source signals, including multipath versions of itself (reverberation), which are delayed versions arriving from different directions.
This signal processing problem arises in many contexts other than the simple situation where each of two unknown mixtures of two speaking voices reaches one of two microphones. Other examples involving many sources and many receivers include the separation of radio or radar signals sensed by an array of antennas, the separation of odors in a mixture by a sensor array, the parsing of the environment into separate objects by our biological visual system, and the separation of biomagnetic sources by a superconducting quantum interference device (SQUID) array in magnetoencephalography. Other important examples of the blind source separation problem include sonar array signal processing and signal decoding in cellular telecommunication systems.
The blind source separation problem is closely related to the more familiar "blind deconvolution" problem, where a single unknown source signal is extracted from a known mixed signal that includes many time-delayed versions of the source originating from unknown multipath distortion or reverberation (self-convolution). The need for blind deconvolution or "blind equalization" arises in a number of important areas such as data transmission, acoustic reverberation cancellation, seismic deconvolution and image restoration. For instance, high-speed data transmission over a telephone communication channel relies on the use of adaptive equalization, which can operate either in a traditional training mode that transmits a known training sequence to establish deconvolution parameters or in a blind mode. The class of communication systems that may need blind equalization capability includes high-capacity line-of-sight digital radio (cellular telecommunications). Such a channel suffers from anomalous propagation conditions arising from natural causes, which can degrade digital radio performance by causing the transmitted signal to propagate along several paths of different electrical length (multipath fading). Severe multipath fading requires a blind equalization scheme to recover channel operation.
In reflection seismology, a reflection coefficient sequence can be blindly extracted from the received signal, which includes echoes produced at the different reflection points of the unknown geophysical model. The traditional linear-predictive
seismic deconvolution method used to remove the source waveform from a seismogram ignores valuable phase information contained in the reflection seismogram. This limitation is overcome by using blind deconvolution to process the received signal by assuming only a general statistical geological reflection coefficient model. Blind deconvolution can also be used to recover unknown images that are blurred by transmission through unknown systems.
Blind Separation Methods: Because of the fundamental importance of both the blind separation and blind deconvolution signal processing problems, practitioners have proposed several classes of methods for solving the problems. The blind separation problem was first addressed in 1986 by Jutten and Herault ("Blind separation of sources,
Part I: An adaptive algorithm based on neuromimetic architecture", Signal Processing 24 (1991) 1-10), who disclose the HJ neural network with backward connections that can usually solve the simple two-element blind source separation problem. Disadvantageously, the HJ network iterations may not converge to a proper solution in some cases, depending on the initial state and on the source statistics. When convergence is possible, the HJ network appears to converge in two stages, the first of which quickly decorrelates the two output signals and the second of which more slowly provides the statistical independence necessary to recover the two unknown sources. Comon et al. ("Blind separation of sources, Part II: Problems statement", Signal Processing 24 (1991) 11-20) show that the HJ network can be viewed as an adaptive process for cancelling higher-order cumulants in the output signals, thereby achieving some degree of statistical independence by minimizing higher-order statistics among the known sensor signals.
Other practitioners have attempted to improve the HJ network to remove some of the disadvantageous features. For instance, Sorouchyari ("Blind separation of sources,
Part III: Stability analysis", Signal Processing 24 (1991) 21-29) examines higher-order non-linear transforming functions other than the simple first and third order functions proposed by Jutten et al. but concludes that the higher-order functions cannot improve implementation of the HJ network. In U. S. Patent 5,383,164, granted on January 17, 1995 and fully incorporated herein by this reference, Li et al. describe a blind source separation system based on the HJ neural network model that employs
linear beamforming to improve HJ network separation performance. Also, John C. Platt et al. ("Networks For The Separation of Sources That Are Superimposed and Delayed", Advances in Neural Information Processing Systems, vol. 4, Morgan-Kaufmann, San Mateo, 1992) propose extending the original magnitude-optimizing HJ network to estimate a matrix of time delays in addition to the HJ magnitude mixing matrix. Platt et al. observe that their modified network is disadvantaged by multiple stable states and unpredictable convergence.
Pierre Comon ("Independent component analysis, a new concept?" Signal Processing 36 (1994) 287-314) provides a detailed discussion of Independent Component Analysis (ICA), which defines a class of closed form techniques useful for solving the blind identification and deconvolution problems. As is known in the art, ICA searches for a transformation matrix to minimize the statistical dependence among components of a random vector. This is distinguished from Principal Components Analysis (PCA), which searches for a transformation matrix to minimize statistical correlation among components of a random vector, a solution that is inadequate for the blind separation problem. Thus, PCA can be applied to minimize second order cross-moments among a vector of sensor signals while ICA can be applied to minimize sensor signal joint probabilities, which offers a solution to the blind separation problem. Comon suggests that although mutual information is an excellent measure of the contrast between joint probabilities, it is not practical because of computational complexity. Instead, Comon teaches the use of the fourth-order cumulant tensor (thereby ignoring fifth-order and higher statistics) as a preferred measure of contrast because the associated computational complexity increases only as the fifth power of the number of unknown signals.
Similarly, Gilles Burel ("Blind separation of sources: A nonlinear neural algorithm", Neural Networks 5 (1992) 937-947) asserts that the blind source separation problem is nothing more than the Independent Components Analysis (ICA) problem. However, Burel proposes an iterative scheme for ICA employing a back propagation neural network for blind source separation that handles non-linear mixtures through iterative minimization of a cost function. Burel's network differs from the HJ network, which does not minimize any cost function. Like the HJ network, Burel's system can separate the source signals in the presence of noise without attempting noise reduction
(no noise hypotheses are assumed). Also, like the HJ system, practical convergence is not guaranteed because of the presence of local minima and computational complexity.
Burel's system differs sharply from traditional supervised back-propagation applications because his cost function is not defined in terms of the difference between measured and desired outputs (the desired outputs are unknown). His cost function is instead based on output signal statistics alone, which permits "unsupervised" learning in his network.
Blind Deconvolution Methods: The blind deconvolution art can be appreciated with reference to the text edited by Simon Haykin (Blind Deconvolution,
Prentice-Hall, New Jersey, 1994), which discusses four general classes of blind deconvolution techniques, including Bussgang processes, higher-order cumulant equalization, polyspectra and maximum likelihood sequence estimation. Haykin neither considers nor suggests specific neural network techniques suitable for application to the blind deconvolution problem.
Blind deconvolution is an example of "unsupervised" learning in the sense that it learns to identify the inverse of an unknown linear time-invariant system without any physical access to the system input signal. This unknown system may be a nonminimum phase system having one or more zeroes outside the unit circle in the frequency domain. The blind deconvolution process must identify both the magnitude and the phase of the system transfer function. Although identification of the magnitude component requires only the second-order statistics of the system output signal, identification of the phase component is more difficult because it requires the higher-order statistics of the output signal. Accordingly, some form of non-linearity is needed to extract the higher-order statistical information contained in the magnitude and phase components of the output signal. Such non-linearity is useful only for unknown source signals having non-Gaussian statistics. There is no solution to the problem when the input source signal is Gaussian-distributed and the channel is nonminimum-phase because all polyspectra of Gaussian processes of order greater than two are identical to zero.
Classical adaptive deconvolution methods are based almost entirely on second order statistics, and thus fail to operate correctly for nonminimum-phase channels unless the input source signal is accessible. This failure stems from the inability of second-order statistics to distinguish minimum-phase information from maximum-phase
information of the channel. A minimum phase system (having all zeroes within the unit circle in the frequency domain) exhibits a unique relationship between its amplitude response and phase response so that second order statistics in the output signal are sufficient to recover both amplitude and phase information for the input signal. In a nonminimum-phase system, second-order statistics of the output signal alone are insufficient to recover phase information and, because the system does not exhibit a unique relationship between its amplitude response and phase response, blind recovery of source signal phase information is not possible without exploiting higher-order output signal statistics. These require some form of non-linear processing because linear processing is restricted to the extraction of second-order statistics.
Bussgang techniques for blind deconvolution can be viewed as iterative polyspectral techniques, where rationales are developed for choosing the polyspectral orders with which to work and their relative weights by subtracting a source signal estimate from the sensor signal output. The Bussgang techniques can be understood with reference to Sandro Bellini ("Chapter 2: Bussgang Techniques For Blind
Deconvolution and Equalization", Blind Deconvolution, S. Haykin (ed.), Prentice Hall, Englewood Cliffs, NJ, 1994), who characterizes the Bussgang process as a class of processes having an auto-correlation function equal to the cross-correlation of the process with itself as it exits from a zero-memory non-linearity. Polyspectral techniques for blind deconvolution lead to unbiased estimates of the channel phase without any information about the probability distribution of the input source signals. The general class of polyspectral solutions to the blind decorrelation problem can be understood with reference to a second Simon Haykin textbook ("Ch. 20: Blind Deconvolution", Adaptive Filter Theory, Second Ed., Simon Haykin (ed.), Prentice Hall, Englewood Cliffs, NJ, 1991) and to Hatzinakos et al. ("Ch. 5: Blind Equalization
Based on Higher Order Statistics (HOS)", Blind Deconvolution, Simon Haykin (ed.), Prentice Hall, Englewood Cliffs, NJ, 1994).
Thus, the approaches in the art to the blind separation and deconvolution problems can be classified as those using non-linear transforming functions to spin off higher-order statistics (Jutten et al. and Bellini) and those using explicit calculation of higher-order cumulants and polyspectra (Haykin and Hatzinakos et al.). The HJ network
does not reliably converge even for the simplest two-source problem and the fourth-order cumulant tensor approach does not reliably converge because of truncation of the cumulant expansion. There is accordingly a clearly-felt need for blind signal processing methods that can reliably solve the blind processing problem for significant numbers of source signals.
Unsupervised Learning Methods: In the biological sensory system arts, practitioners have formulated neural training optimality criteria based on studies of biological sensory neurons, which are known to solve blind separation and deconvolution problems of many kinds. The class of supervised learning techniques normally used with artificial neural networks is not useful for these problems because supervised learning requires access to the source signals for training purposes. Unsupervised learning instead requires some rationale for internally creating the necessary teaching signals without access to the source signals. Practitioners have proposed several rationales for unsupervised learning in biological sensory systems. For instance, Linsker ("An Application of the Principle of Maximum Information Preservation to Linear Systems", Advances in Neural Information Processing Systems 1, D.S. Touretzky (ed.),
Morgan-Kaufmann, 1989) shows that his well-known "infomax" principle (first proposed in 1987) explains why biological sensor systems operate to minimize information loss between neural layers in the presence of noise. In a later work ("Local Synaptic Learning Rules Suffice to Maximize Mutual Information in a Linear Network",
Neural Computation 4 (1992) 691-702) Linsker describes a two-phase learning algorithm for maximizing the mutual information between two layers of a neural network. However, Linsker assumes a linear input-output transforming function and multivariate Gaussian statistics for both source signals and noise components. With these assumptions, Linsker shows that a "local synaptic" (biological) learning rule is sufficient to maximize mutual information but he neither considers nor suggests solutions to the more general blind processing problem of recovering non-Gaussian source signals in a non-linear transforming environment.
Simon Haykin ("Ch. 11: Self-Organizing Systems III: Information-Theoretic Models", Neural Networks: A Comprehensive Foundation, S. Haykin (ed.) MacMillan,
New York, 1994) discusses Linsker's "infomax" principle, which is independent of the
neural network learning rule used in its implementation. Haykin also discusses other well-known principles such as the "minimization of information loss" principle suggested in 1988 by Plumbley et al. and Barlow's "principle of minimum redundancy", first proposed in 1961, either of which can be used to derive a class of unsupervised learning rules.
Joseph Atick ("Could information theory provide an ecological theory of sensory processing?", Network 3 (1992) 213-251) applies Shannon's information theory to the neural processes seen in biological optical sensors. Atick observes that information redundancy is useful only in noise and includes two components: (a) unused channel capacity arising from suboptimal symbol frequency distribution and (b) intersymbol redundancy or mutual information. Atick suggests that optical neurons apparently evolved to minimize the troublesome intersymbol redundancy (mutual information) component of redundancy rather than to minimize overall redundancy. H. B. Barlow ("Unsupervised Learning", Neural Computation 1 (1989) 295-31 1 ) also examines this issue and shows that "minimum entropy coding" in a biological sensory system operates to reduce the troublesome mutual information component even at the expense of suboptimal symbol frequency distribution. Barlow shows that the mutual information component of redundancy can be minimized in a neural network by feeding each neuron output back to other neuron inputs through anti-Hebbian synapses to discourage correlated output activity. This "redundancy reduction" principle is offered to explain how unsupervised perceptual learning occurs in animals.
S. Laughlin ("A Simple Coding Procedure Enhances a Neuron's Information Capacity", Z. Naturforsch 36 (1981) 910-912) proves that the optical neuron of a blowfly optimizes information capacity through equalization of the probability distribution for each neural code value (minimizing the unused channel capacity component of redundancy), thereby confirming Barlow's "minimum redundancy" principle. J. J. Hopfield ("Olfactory computation and object perception", Proc. Natl. Acad. Sci. USA 88 (Aug. 1991) 6462-6466) examines the separation of odor source solution in neurons using the HJ neuron model for minimizing output redundancy. Becker et al. ("Self-organizing neural network that discovers surfaces in random- dot stereograms", Nature, vol. 355, pp. 161-163. January 9, 1992) propose a standard
back-propagation neural network learning model modified to replace the external teacher (supervised learning) by internally-derived teaching signals (unsupervised learning). Becker et al. use non-linear networks to maximize mutual information between different sets of outputs, contrary to the blind signal recovery requirement. By increasing redundancy, their network discovers invariance in separate groups of inputs, which can be selected out of information passed forward to improve processing efficiency.
Thus, it is known in the neural network arts that anti-Hebbian mutual interaction can be used to explain the decorrelation or minimization of redundancy observed in biological vision systems. This can be appreciated with reference to H. B. Barlow et al. ("Adaptation and Decorrelation in the Cortex", The Computing Neuron, R. Durbin et al. (eds.), Addison-Wesley, 1989) and to Schraudolph et al. ("Competitive Anti-Hebbian Learning of Invariance", Advances in Neural Information Processing Systems 4, J. E. Moody et al. (eds.), Morgan-Kaufmann, 1992). In fact, practitioners have suggested that Linsker's "infomax" principle and Barlow's "minimum redundancy" principle may both yield the same neural network learning procedures. Until now, however, non-linear versions of these procedures applicable to the blind signal processing problem have been unknown in the art.
The Blind Processing Problem: As mentioned above, blind source separation and blind deconvolution are related problems in signal processing. The blind source separation problem can be succinctly stated as follows: a set of unknown source signals S_1(t),...,S_I(t) are mixed together linearly by an unknown matrix [A_ij]. Nothing is known about the sources or the mixing process, both of which may be time-varying, although the mixing process is assumed to vary slowly with respect to the sources. The blind separation task is to recover the original source signals from the J>I measured superpositions of them, X_1(t),...,X_J(t), by finding a square matrix [W_ij] that is a permutation of the inverse of the unknown matrix [A_ij]. The blind deconvolution problem can be similarly stated: a single unknown signal S(t) is convolved with an unknown tapped delay-line filter A_1,...,A_I, producing the corrupted measured signal X(t) = A(t) * S(t), where A(t) is the impulse response of the unknown (perhaps slowly time-varying) filter. The blind deconvolution task is to recover S(t) by finding and convolving X(t) with a tapped delay-line filter W_1,...,W_I having the impulse response W(t) that reverses the effect of the unknown filter A(t).
There are many similarities between the two problems. In one, source signals are corrupted by the superposition of other source signals and, in the other, a single source signal is corrupted by superposition of time-delayed versions of itself. In both cases, unsupervised learning is required because no error signals are available and no training signals are provided. In both cases, second-order statistics alone are inadequate to solve the more general problem. For instance, a second-order decorrelation technique such as that proposed by Barlow et al. would find uncorrelated (linearly independent) projections [Y_i] of the input sensor signals [X_j] when attempting to separate unknown source signals {S_i} but is limited to discovering a symmetric decorrelation matrix that cannot reverse the effects of the mixing matrix [A_ij] if the mixing matrix is asymmetric. Similarly, second-order decorrelation techniques based on the autocorrelation function, such as prediction-error filters, are phase-blind and do not offer sufficient information to estimate the phase characteristics of the corrupting filter A(t) when applied to the more general blind deconvolution problem.
Thus, both blind signal processing problems require the use of higher-order statistics as well as certain assumptions regarding source signal statistics. For the blind separation problem, the sources are assumed to be statistically independent and non-Gaussian. With this assumption, the problem of learning [W_ij] becomes the ICA problem described by Comon. For blind deconvolution, the original signal S(t) is assumed to be a "white" process consisting of independent symbols. The blind deconvolution problem then becomes the problem of removing from the measured signal X(t) any statistical dependencies across time that are introduced by the corrupting filter A(t). This process is sometimes denominated the "whitening" of X(t).
As used herein, both the ICA procedure and the "whitening" of a time series are denominated "redundancy reduction". The first class of techniques uses some type of explicit estimation of cumulants and polyspectra, which can be appreciated with reference to Haykin and Hatzinakos et al. Disadvantageously, such "brute force" techniques are computationally intensive for high numbers of sources or taps and may be inaccurate when cumulants higher than fourth order are ignored, as they usually must be. The second class of techniques uses static non-linear functions, the Taylor series expansions of which yield higher-order terms. Iterative learning rules containing such terms are expected to be somehow sensitive to the particular higher-order statistics necessary for accurate redundancy reduction. This reasoning is used by Comon et al. to explain the HJ network and by Bellini to explain the Bussgang deconvolver.
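The sensitivity of such static non-linearities to higher-order statistics can be made concrete with a small numerical check; the Laplacian and Gaussian test variables below are assumptions of this sketch, not signals from the cited works. Because tanh(u) = u - u^3/3 + 2u^5/15 - ..., any learning term built from x·tanh(u) mixes fourth- and higher-order moments of the data:

import numpy as np

rng = np.random.default_rng(1)
n = 200_000
gauss = rng.standard_normal(n)                  # kurtosis 3
lap = rng.laplace(size=n) / np.sqrt(2.0)        # unit variance, kurtosis 6

# A second-order statistic cannot tell the two ensembles apart...
print(np.var(gauss), np.var(lap))               # both near 1
# ...but the anti-Hebbian term passed through the static non-linearity can.
print(np.mean(gauss * np.tanh(gauss)), np.mean(lap * np.tanh(lap)))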
Disadvantageously, there is no assurance that the particular higher-order statistics yielded by the (heuristically) selected non-linear function are weighted in the manner necessary for achieving statistical independence. Recall that the known approach to attempting improvement of the HJ network is to test various heuristically-selected non-linear functions, and that no improvement over the original functions has yet been demonstrated in the art.
Accordingly, there is a need in the art for an improved blind processing method, such as some method of rigorously linking a static non-linearity to a learning rule that performs gradient ascent in some parameter guaranteed to be usefully related to statistical dependency. Until now, this was believed to be practically impossible because of the infinite number of higher-order statistics associated with statistical dependency.
The related unresolved problems and deficiencies are clearly felt in the art and are solved by this invention in the manner described below.
DISCLOSURE OF INVENTION
This invention solves the above problem by introducing a new class of unsupervised learning procedures for a neural network that solve the general blind signal processing problem by maximizing joint input/output entropy through gradient ascent to minimize mutual information in the outputs. The network of this invention arises from the unexpectedly advantageous observation that a particular type of non-linear signal transform creates learning signals with the higher-order statistics needed to separate unknown source signals by minimizing mutual information among neural network output signals. This invention also arises from the second unexpectedly advantageous discovery that mutual information among neural network outputs can be minimized by maximizing joint output entropy when the learning transform is selected to match the signal probability distributions of interest.
The process of this invention can be appreciated as a generalization of the infomax principle to non-linear units with arbitrarily distributed inputs uncorrupted by any known noise sources. It is a feature of the system of this invention that each measured input signal is passed through a predetermined sigmoid function to adaptively maximize information transfer by optimal alignment of the monotonic sigmoid slope with the input signal peak probability density. It is an advantage of this invention that redundancy is minimized among a multiplicity of outputs merely by maximizing total information throughput, thereby producing the independent components needed to solve the blind separation problem. The foregoing, together with other objects, features and advantages of this invention, can be better appreciated with reference to the following specification, claims and the accompanying drawing.
BRIEF DESCRIPTION OF DRAWING
The objects, advantages and features of this invention will be more readily appreciated from the following detailed description, when read in conjunction with the accompanying drawing, in which:
Figs. 1A-1D illustrate the feature of sigmoidal transfer function alignment for optimal information flow in a sigmoidal neuron from the prior art;
Figs. 2A-2C illustrate the blind source separation and blind deconvolution problems from the prior art;
Figs. 3A-3C provide graphical diagrams illustrating a joint entropy maximization example where maximizing joint entropy fails to produce statistically independent output signals because of improper selection of the non-linear transforming function;
Fig. 4 shows the theoretical relationship between the several entropies and mutual information from the prior art;
Fig. 5 shows a functional block diagram of an illustrative embodiment of the source separation network of this invention;
Fig. 6 is a functional block diagram of an illustrative embodiment of the blind decorrelating network of this invention;
Fig. 7 is a functional block diagram of an illustrative embodiment of the combined blind source separation and blind decorrelation network of this invention;
Figs. 8A-8C show typical probability density functions for speech, rock music and Gaussian white noise; Figs. 9A-9B show typical spectra of a speech signal before and after decorrelation is performed according to the procedure of this invention;
Fig. 10 shows the results of a blind source separation experiment performed using the procedure of this invention; and
Figs. 11A-11L show time domain filter charts illustrating the results of the blind deconvolution of several different corrupted human speech signals according to the procedure of this invention.
BEST MODE FOR CARRYING OUT THE INVENTION
This invention arises from the unexpectedly advantageous observation that a class of unsupervised learning rules for maximizing information transfer in a neural network solves the blind signal processing problem by minimizing redundancy in the network outputs. This class of new learning rules is now described in information theoretic terms, first for a single input and then for a multiplicity of unknown input signals.
Information Maximization For a Single Source
In a single-input network, the mutual information that the output y of a network contains about its input x can be expressed as:
I(y,x) = H(y) - H(y|x) [Eqn. 1] where H(y) is the entropy of the output signal, H(y|x) is that portion of the output signal entropy that did not come from the input signal and I(y,x) is the mutual information. Eqn. 1 can be appreciated with reference to Fig. 4, which illustrates the well-known relationship between input signal entropy H(x), output signal entropy H(y) and mutual information I(y,x).
When there is no noise or when the noise is treated as merely another unknown input signal, the mapping between input x and output y is deterministic and conditional entropy H(y|x) has its lowest possible value, diverging to minus infinity. This divergence is a consequence of the generalization of information theory to continuous
random variables. The output entropy H(y) is really the "differential" entropy of output signal y with respect to some reference, such as the noise level or the granularity of the discrete representation of the variables in x and y. These theoretical complexities can be avoided by restricting the network to the consideration of the gradient of information theoretic quantities with respect to some parameter w. Such gradients are as well- behaved as are discrete-variable entropies because the reference terms involved in the definition of differential entropies disappear. In particular, Eqn. 1 can be differentiated to obtain the corresponding gradients as follows:
∂I(y,x)/∂w = ∂H(y)/∂w [Eqn. 2]
because, in the noiseless case, H(y|x) does not depend on w and its differential disappears. Thus, for continuous deterministic mappings, the mutual information between network input and network output can be maximized by maximizing the gradient of the entropy of the output alone, which is an unexpectedly advantageous consequence of treating noise as another unknown source signal. This permits the discussion to continue without knowledge of the input signal statistics.
Referring to Fig. 1A, when a single input x is passed through a transforming function g(x) to give an output variable y, both I(y,x) and H(y) are maximized when the high density portion (mode) of the input probability density function f_x(x) is aligned with the steepest sloping portion of the non-linear transforming function g(x). This is equivalent to the alignment of a neuron input-output function to the expected distribution of incoming signals that leads to optimal information flow in sigmoidal neurons, shown in Figs. 1C-1D. Fig. 1D shows a zero-mode distribution matched to the sigmoid function in Fig. 1C. In Fig. 1A, the input x having a probability distribution f_x(x) is passed through the non-linear sigmoidal function g(x) to produce output signal y having a probability distribution f_y(y). The information in the probability density function f_y(y) varies responsive to the alignment of the mean and variance of x with respect to the threshold w_0 and slope w of g(x). When g(x) is monotonically increasing or decreasing (thereby having a unique inverse), the output signal probability density function f_y(y) can be written as a function of the input signal probability density function f_x(x) as follows:
f_y(y) = f_x(x) / |dy/dx| [Eqn. 3]
where | · | denotes absolute value.
Eqn. 3 leads to the unexpected discovery of an advantageous gradient ascent process because the output signal entropy can be expressed in terms of the output signal probability density function as follows:
H(y) = -E[ln f_y(y)] = -∫ f_y(y) ln f_y(y) dy [Eqn. 4]
where E[·] denotes expected value. Substituting Eqn. 3 into Eqn. 4 produces the following:
H(y) = E[ln |dy/dx|] - E[ln f_x(x)] [Eqn. 5]
The second term on the right side of Eqn. 5 is simply the unknown input signal entropy H(x), which cannot be affected by any changes in the parameter w that defines the non-linear function g(x). Therefore, only the first term on the right side of Eqn. 5 need be maximized to maximize the output signal entropy H(y). This first term is the average logarithm of the effect of input signal x on output signal y and may be maximized by considering the input signals as a "training set" with density f_x(x) and deriving an online, stochastic gradient ascent learning rule expressed as:
Δw ∝ ∂H(y)/∂w = E[∂/∂w ln |dy/dx|] = E[(∂(dy/dx)/∂w) / (dy/dx)] [Eqn. 6]
Eqn. 6 defines a scaling measure Δw for changing the parameter w to adjust the log of the slope of the sigmoid function. Any sigmoid function can be used to specify the measure Δw, such as the widely-used logistic transfer function
y = (1 + e^-u)^-1, where u = wx + w_0 [Eqn. 7]
in which the input x is first aligned with the sigmoid function through multiplication by a scaling weight w and addition of a bias weight w_0 to create an aligned signal u, which is then non-linearly transformed by the logistic transfer function to create signal y. Another useful sigmoid function is the hyperbolic tangent function expressed as y = tanh(u). The hyperbolic tangent function is a member of the general class of functions g(x) each representing a solution to the partial differential equation
(d/dx) g(x) = 1 - |g(x)|^r [Eqn. 8]
with a boundary condition of g(0) = 0. The parameter r should be selected appropriately for the assumed kurtosis of the input probability distribution. For kurtosis above 3, either the hyperbolic tangent function (r=2) or the non-member logistic transfer function is well suited for the process of this invention.
For the logistic transfer function (Eqn. 7), the terms in Eqn. 6 can be expressed as:
dy/dx = wy(1 - y) [Eqn. 9]
∂/∂w (dy/dx) = y(1 - y)(1 + wx(1 - 2y)) [Eqn. 10]
Dividing Eqn. 10 by Eqn. 9 produces a scaling measure Δw for the scaling weight learning rule of this invention based on the logistic function:
Δw = ε·(x(1 - 2y) + 1/w) [Eqn. 11]
where ε > 0 is a learning rate.
Similar reasoning leads to a bias measure Δw_0 for the bias weight learning rule of this invention based on the logistic transfer function, expressed as:
Δw_0 = ε·(1 - 2y) [Eqn. 12]
These two learning rules (Eqns. 11-12) are implemented by adjusting the respective w or w_0 at a "learning rate" (ε), which is usually less than one percent (ε < 0.01), as is known in the neural network arts. Referring to Figs. 1A-1C, if the input probability density function f_x(x) is Gaussian, then the bias measure Δw_0 operates to align the steepest part of the sigmoid curve g(x) with the peak of f_x(x), thereby matching input density to output slope in the manner suggested intuitively by Eqn. 3.
The scaling measure Δw operates to align the edges of the sigmoid curve slope to the particular width (proportional to variance) of f_x(x). Thus, narrow probability density functions lead to sharply-sloping sigmoid functions. The scaling measure of Eqn. 11 defines an "anti-Hebbian" learning rule with a second "anti-decay" term. The first anti-Hebbian term prevents the uninformative solutions where output signal y saturates at 0 or 1, but such an unassisted anti-Hebbian rule alone allows the slope w to decay to zero. The second anti-decay term (1/w) forces output signal y away from the other uninformative situation where slope w is so flat that output signal y stabilizes at 0.5 (Fig. 1A).
The effect of these two balanced terms is to produce an output probability density function f_y(y) that is close to the flat unit distribution, which is known to be the maximum entropy distribution for a random variable bounded between 0 and 1. Fig. 1B shows a family of sigmoid output distributions, with the most informative one occurring at sigmoid slope w_opt. Using the logistic transfer function as the non-linear sigmoid transformation, the learning rule in Eqn. 11 eventually brings the slope w to w_opt, thereby maximizing entropy in output signal y. The bias rule in Eqn. 12 centers the mode in the sloping region at w_0 (Fig. 1A).
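For illustration, a minimal stochastic-gradient sketch of Eqns. 7, 11 and 12 follows; the Laplacian input with non-zero mean, the initial weights and the learning rate are assumptions of this sketch rather than values prescribed by the specification.

import numpy as np

rng = np.random.default_rng(2)
w, w0, eps = 0.1, 0.0, 0.005                    # scaling weight, bias weight, learning rate

for _ in range(100_000):
    x = 2.0 + rng.laplace()                     # sharply peaked input centered at 2
    y = 1.0 / (1.0 + np.exp(-(w * x + w0)))     # Eqn. 7: logistic transfer
    w += eps * (x * (1.0 - 2.0 * y) + 1.0 / w)  # Eqn. 11: anti-Hebbian plus anti-decay terms
    w0 += eps * (1.0 - 2.0 * y)                 # Eqn. 12: bias measure

print(w, -w0 / w)   # slope matched to the input width; -w0/w settles near the input mode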
If the hyperbolic tangent sigmoid function is used, the bias measure Δw_0 then becomes proportional to -2y and the scaling measure Δw becomes proportional to -2xy + 1/w, such that Δw_0 = ε·(-2y) and Δw = ε·(-2xy + 1/w), where ε is the learning rate. These learning rules offer the same general features and advantages of the learning rules discussed above in connection with Eqns. 11-12 for the logistic transfer function. In general, any sigmoid function in the class of solutions to Eqn. 8 selected for parametric suitability to a particular input probability distribution can be used in accordance with the process of this invention to solve the blind signal processing problem. These unexpectedly advantageous learning rules can be generalized to the multi-dimensional case.
Joint Entropy Maximization for Multiple Sources
To appreciate the multiple-signal blind processing method of this invention, consider the general network diagram shown in Fig. 2A where the measured input signal vector [X] is transformed by way of the weight matrix [W] to produce a monotonically transformed output vector [Y] = g([W][X] + [W_0]). By analogy to Eqn. 3, the multivariate probability density function of [Y] can be expressed as
f_Y(Y) = f_X(X) / |J| [Eqn. 13]
where |J| is the absolute value of the Jacobian of the transformation that produces output vector [Y] from input vector [X]. As is well-known in the art, the Jacobian is the determinant of the matrix of partial derivatives:
J = det[∂Y_i/∂X_j] [Eqn. 14]
where det[·] denotes the determinant of a square matrix.
By analogy to the single-input case discussed above, the method of this invention maximizes the natural log of the Jacobian to maximize output entropy H(Y) for a given input entropy H(X), as can be appreciated with reference to Eqn. 5. The quantity ln|J| represents the volume of space in [Y] into which points in [X] are mapped. Maximizing this quantity attempts to spread the training set of input points evenly in [Y].
For the commonly-used logistic transfer function, the resulting learning rules can be proven to be as follows:
[ΔW] = ε·(([1] - 2[Y])[X]^T + [[W]^T]^-1) [Eqn. 15]
[ΔW_0] = ε·([1] - 2[Y]) [Eqn. 16]
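In matrix form these rules are straightforward to implement. The sketch below assumes logistic units, a batch-averaged anti-Hebbian term and a plain matrix inversion for the anti-redundancy term; it is illustrative rather than a definitive implementation. Note that the anti-redundancy term [[W]^T]^-1 is the same for every sample, so only the data-dependent term is averaged over the batch.

import numpy as np

def infomax_update(W, W0, X, eps=0.005):
    """One global-mode application of Eqns. 15-16 to a batch of column samples X (n x batch)."""
    Y = 1.0 / (1.0 + np.exp(-(W @ X + W0)))                    # [Y] = g([W][X] + [W0])
    batch = X.shape[1]
    dW = eps * ((1.0 - 2.0 * Y) @ X.T / batch                  # anti-Hebbian outer product
                + np.linalg.inv(W.T))                          # anti-redundancy term
    dW0 = eps * np.mean(1.0 - 2.0 * Y, axis=1, keepdims=True)  # bias measure, Eqn. 16
    return W + dW, W0 + dW0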
In Eqn. 15, the first anti-Hebbian term has become an outer product of vectors and the second anti-decay term has generalized to an "anti-redundancy" term in the form of the inverse of the transpose of the weight matrix [W]. Eqn. 15 can be written, for an individual weight W_ij, as follows:
ΔW_ij = ε·(cof[W_ij]/det[W] + X_j(1 - 2Y_i)) [Eqn. 17]
where cof[W_ij] denotes the cofactor of element W_ij, which is known to be (-1)^(i+j) times the determinant of the matrix obtained by removing the i-th row and the j-th column from the square weight matrix [W], and ε is the learning rate. Similarly, the i-th bias measure ΔW_i0 can be expressed as follows:
ΔW_i0 = ε·(1 - 2Y_i) [Eqn. 18]
The rules shown in Eqns. 17-18 are the same as those for the single unit mapping (Eqns. 11-12) except that the instability occurs at det[W] = 0 instead of w = 0. Thus, any degenerate weight matrix leads to instability because any weight matrix having a zero determinant is degenerate. This fact enables different outputs Y_i to learn to represent different things about the inputs X_j. When the weight vectors entering two different outputs become too similar, det[W] becomes small and the natural learning process forces these approaching weight vectors apart. This effect is mediated by the numerator cof[W_ij], which approaches zero to indicate degeneracy in the weight matrix of the rest of the layer not associated with input X_j or output Y_i. Other sigmoidal transformations yield other training rules that are similarly advantageous as discussed above in connection with Eqn. 8. For instance, the hyperbolic tangent function yields rules very similar to those of Eqns. 17-18:
ΔW_ij = ε·(cof[W_ij]/det[W] - 2X_j·Y_i) [Eqn. 19]
ΔW_i0 = ε·(-2Y_i) [Eqn. 20]
The usefulness of these blind source separation network learning rules can be appreciated with reference to the discussion below in connection with Fig. 5.
Blind Deconvolution in a Causal Filter
Figs. 2B-2C illustrate the blind deconvolution problem. Fig. 2C shows an unobserved data sequence S(t) entering an unknown channel A(t), which responsively produces the measured signal X(t) that can be blindly equalized through a causal filter
W(t) to produce an output signal U(t) approximating the original unobserved data
sequence S(t). Fig. 2B shows the time series X(t), which is presumed to have a length of J samples (not shown). X(t) is convolved with a causal filter having I weighted taps, W_1,...,W_I, and impulse response W(t). The causal filter output signal U(t) is then passed through a non-linear sigmoid function g(·) to create the training signal Y(t) (not shown).
This system can be expressed either as a convolution (Eqn. 21) or as a matrix equation (Eqn. 22) as follows:
Y(t) = g(W(t) * X(t)) [Eqn. 21]
[Y] = g([W][X]) [Eqn. 22]
in which [Y] = g([U]) and [X] are signal sample vectors having J samples. Of course, the vector ordering need not be temporal. For causal filtering, [W] is a banded lower triangular J x J square matrix expressed as:

[W] =
| W_1  0    0    ...  0    0   |
| W_2  W_1  0    ...  0    0   |
| W_3  W_2  W_1  ...  0    0   |
| ...  ...  ...  ...  ...  ... |
| 0    ...  W_I  ...  W_2  W_1 | [Eqn. 23]
Assuming an ensemble of time series, the joint probability distribution functions f_Y([Y]) and f_X([X]) are related by the Jacobian of the Eqn. 22 transformation according to Eqn. 13. The ensemble can be "created" from a single time series by breaking the series into sequences of length I, which reduces [W] in Eqn. 23 to an I x I lower triangular matrix. The Jacobian of the transformation is then written as follows:
J = det[W] · Π_t g'(U_t) [Eqn. 24]
which may be decomposed into the determinant of the weight matrix [W] of Eqn. 23 and the product of the slopes of the sigmoidal squashing function for all times t. Because [W] is lower-triangular, its determinant is merely the product of the diagonal values, which is W_1^J. As before, the output signal entropy H(Y) is maximized by maximizing the logarithm of the Jacobian, which may be written as:
ln|J| = J·ln|W_1| + Σ_t ln|g'(U_t)| [Eqn. 25]
If the hyperbolic tangent is selected as the non-linear sigmoid function, then differentiation with respect to the filter weights W(t) provides the following two simple learning rules:
ΔW_1 = ε·Σ_j (1/W_1 - 2X_j·Y_j) [Eqn. 26]
ΔW_i = ε·Σ_j (-2X_(j-i+1)·Y_j), where i > 1 [Eqn. 27]
In Eqns. 26-27, W_1 is the "leading weight" and W_i (i = 2,...,I) represent the remaining weights in a delay line having I weighted taps linking the input signal sample X_(j-i+1) to the output signal sample Y_j. The leading weight W_1 therefore adapts like a weight connected to a neuron with only that one input (Eqn. 11 above). The other tap weights {W_i} attempt to decorrelate the past input from the present output. Thus, the leading weight W_1 keeps the causal filter from "shrinking".
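A minimal sketch of one pass of these tap-weight rules over a block of samples follows; the unit tap spacing, the zero padding at the block edge and the block-summed updates are assumptions of this sketch.

import numpy as np

def deconv_update(w, x, eps=0.001):
    """Apply the tanh tap-weight rules (Eqns. 26-27) once over a signal block x.

    w[0] is the leading weight W_1; w[i] multiplies the input delayed by i samples.
    """
    I = len(w)
    xpad = np.concatenate([np.zeros(I - 1), x])
    u = np.array([w @ xpad[j:j + I][::-1] for j in range(len(x))])   # causal filter output U
    y = np.tanh(u)                                                   # training signal Y
    dw = np.empty(I)
    dw[0] = eps * np.sum(1.0 / w[0] - 2.0 * x * y)                   # Eqn. 26: leading weight
    for i in range(1, I):
        delayed = xpad[I - 1 - i:I - 1 - i + len(x)]                 # X_(j-i+1)
        dw[i] = eps * np.sum(-2.0 * delayed * y)                     # Eqn. 27: decorrelation
    return w + dw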
Other sigmoidal functions may be used to generate similarly useful learning rules, as discussed above in connection with Eqn. 8. The equivalent rules for the logistic transfer function discussed above can be easily deduced to be:
ΔW_1 = ε·Σ_j (1/W_1 + X_j·(1 - 2Y_j)) [Eqn. 28]
ΔW_i = ε·Σ_j X_(j-i+1)·(1 - 2Y_j), where i > 1 [Eqn. 29]
The usefulness of these causal filter learning rules can be appreciated with reference to the discussion below in connection with Figs. 6 and 7.
Information Maximization v. Statistical Dependence
The process of this invention relies on the unexpectedly advantageous observation
that, under certain conditions, the maximization of the mutual information I(Y,X) operates to minimize the mutual information between separate outputs {U_i} in a multiple source network, thereby performing the redundancy reduction required to solve the blind signal processing problem. The usefulness of this relationship was unsuspected until now. When limited to the usual logistic transfer or hyperbolic tangent sigmoid functions, this invention appears to be limited to the general class of super-Gaussian signals having kurtosis greater than 3. This limitation can be understood by considering the following example shown in Figs. 3A-3C.
Referring to Fig. 3A, consider a network with two outputs y_1 and y_2, which may be either two output channels from a blind source separation network or two signal samples at different times for a blind deconvolution network. The joint entropy of these two variables can be written as:
H(y_1,y_2) = H(y_1) + H(y_2) - I(y_1,y_2) [Eqn. 30]
Thus, the joint entropy can be maximized by maximizing the individual entropies while minimizing the mutual information I(y_1,y_2) shared between the two. When the mutual information I(y_1,y_2) is zero, the two variables y_1 and y_2 are statistically independent and the joint probability density function is equal to the product of the individual probability density functions, so that f_y1y2(y_1,y_2) = f_y1(y_1)·f_y2(y_2). Both the ICA and the "whitening" approach to deconvolution are examples of pair-wise minimization of mutual information I(y_1,y_2) for all pairs y_1 and y_2. This process is variously denominated factorial code learning, predictability minimization, independent component analysis (ICA) and redundancy reduction.
The process of this invention is a stochastic gradient ascent procedure that maximizes the joint entropy H(y_1,y_2), thereby differing sharply from these "whitening" and ICA procedures known for minimizing mutual information I(y_1,y_2). The system of this invention rests on the unexpectedly advantageous discovery of the general conditions under which maximizing joint entropy operates to reduce mutual information (redundancy), thereby reducing the statistical dependence of the two outputs y_1 and y_2.
Under many conditions, maximizing joint entropy H(y_1,y_2) does not guarantee minimization of mutual information I(y_1,y_2) because of interference from the individual entropy terms H(y_i) in Eqn. 30. Fig. 3C shows one pathological example where a "diagonal" projection of two independent, uniformly-distributed variables x_1 and x_2 is preferred over the "independent" projection shown in Fig. 3B when joint entropy is maximized. This occurs because of a mismatch between the requisite alignment of input probability distribution function and sigmoid slope discussed above in connection with Figs. 1A-1C and Eqn. 8. The learning procedure of this invention achieves the higher value of joint entropy shown in Fig. 3C than the desired value shown in Fig. 3B because of the higher individual output entropy values H(y_i) arising from the triangular probability distribution functions of (x_1 + x_2) and (x_1 - x_2) of Fig. 3C, which more closely match the sigmoid slope (not shown). This interferes with the minimization of mutual information I(y_1,y_2) because the increases in individual entropy H(y_i) offset or mask undesired increases in mutual information to provide the higher joint entropy H(y_1,y_2) sought by the process.
The inventor believes that such interference has little significant effect in most
practical situations, however. As mentioned above in connection with Eqn. 8, the sigmoidal function is not limited to the usual two functions and indeed can be tailored to the particular class of probability distribution functions expected by the process of this invention. Any function that is a member of the class of solutions to the partial differential Eqn. 8 provides a sigmoidal function suitable for use with the process of this invention. It can be shown that this general class of sigmoidal functions leads to the following two learning rules according to this invention:
ΔW_ij = ε·(cof[W_ij]/det[W] - r·X_j·|Y_i|^(r-1)·sgn(Y_i)) [Eqn. 31]
ΔW_i0 = ε·(-r·|Y_i|^(r-1)·sgn(Y_i)) [Eqn. 32]
where sgn(Y_i) = +1 for Y_i > 0, 0 for Y_i = 0, and -1 for Y_i < 0, and where the parameter r is chosen appropriately for the presumed kurtosis of the probability distribution function of the source signals [S_i]. This formalism can be extended to cover skewed and multimodal input distributions by extending Eqn. 8 to produce an increasingly complex polynomial in g(x) such that (d/dx) g(x) = G(g(x)).
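For illustration, the parametric rules can be implemented directly for the closed-form members of the Eqn. 8 family: r = 2 gives tanh, and r = 1 gives g(u) = sgn(u)·(1 - e^-|u|), while other values of r require solving Eqn. 8 numerically. The batch averaging and the bias measure without an input factor (following Eqn. 32 as given above) are assumptions of this sketch.

import numpy as np

def g_eqn8(u, r):
    """Closed-form solutions of Eqn. 8 for r = 1 and r = 2."""
    if r == 2:
        return np.tanh(u)
    if r == 1:
        return np.sign(u) * (1.0 - np.exp(-np.abs(u)))
    raise NotImplementedError("other r need a numerical solution of Eqn. 8")

def general_r_update(W, W0, X, r, eps=0.005):
    """One batch application of Eqns. 31-32; X holds column samples."""
    Y = g_eqn8(W @ X + W0, r)
    hot = -r * np.abs(Y) ** (r - 1) * np.sign(Y)       # higher-order-statistics term
    n = X.shape[1]
    dW = eps * (np.linalg.inv(W.T) + (hot @ X.T) / n)  # Eqn. 31 in matrix form
    dW0 = eps * np.mean(hot, axis=1, keepdims=True)    # Eqn. 32
    return W + dW, W0 + dW0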
Even with the usual logistic transfer function (Eqn. 7) and the hyperbolic tangent
function (r=2), it appears that the problem of individual entropy interference is limited
to sub-Gaussian probability distribution functions having a kurtosis less than 3.
Advantageously, many actual analog signals, including the speech signals used in the experimental verification of the system of this invention, are super-Gaussian in distribution. They have longer tails and are more sharply peaked than the Gaussian distribution, as may be appreciated with reference to the three distribution functions
shown in Figs. 8A-8C. Fig. 8A shows a typical speech probability distribution function,
Fig. 8B shows the probability distribution function for rock music and Fig. 8C shows a typical Gaussian white noise distribution. The inventor has found that joint entropy maximization for sigmoidal networks always minimizes the mutual information between the network outputs for all super-Gaussian signal distributions tested. Special sigmoid functions can be selected that are suitable for accomplishing the same result for sub-
Gaussian signal distributions as well, although the precise learning rules must be selected in accordance with the parametric learning rules of Eqns. 31-32.
Different sigmoid non-linearities provide different anti-Hebbian terms. Table 1 provides the anti-Hebbian terms from the learning rules resulting from several interesting
non-linear transformation functions. The information-maximization rule consists of an anti-redundancy term, which always has the form [[W]^T]^-1, and an anti-Hebbian term that keeps the unit from saturating.
Table 1

Function y_i = g(u_i):    Slope dy_i/du_i:       Anti-Hebbian term:
1/(1 + e^-u_i)            y_i(1 - y_i)           x_j(1 - 2y_i)
tanh(u_i)                 1 - |y_i|^2            -2x_j·y_i
Eqn. 8 solution           1 - |y_i|^r            -r·x_j·|y_i|^(r-1)·sgn(y_i)
arctan(u_i)               1/(1 + u_i^2)          -2x_j·u_i/(1 + u_i^2)
erf(u_i)                  (2/√π)·e^(-u_i^2)      -2x_j·u_i
e^(-u_i^2) (Gaussian)     -2u_i·y_i              x_j(1/u_i - 2u_i)
Table 1 shows that only the Eqn. 8 solutions (including the hyperbolic tangent function for r=2) and the logistic transfer function produce anti-Hebbian terms that can yield higher-order statistics. The other functions use the net input u_i as the output variable rather than using the actual transformed output y_i. Tests performed by the inventor show that the erf function is unsuitable for blind separation. In fact, stable weight matrices using the -2x_j·u_i term can be calculated from the covariance matrix of the inputs alone. The learning rule for a Gaussian radial basis function node is interesting because it contains u_i in both the numerator and denominator. The denominator term limits the usefulness of such a rule because data points near the radial basis function center would cause instability. Radial transfer functions are generally appropriate only when input distributions are annular.
Illustrative Networks
Fig. 5 shows a functional block diagram illustrating an exemplary embodiment of a four-port blind signal separation network according to this invention. Each of the four input signals {X_j} represents a "sensor" output signal, such as the electrical signal received from a microphone at a "cocktail party" or an antenna output signal. Each of the four network output signals {U_i} is related to the four input signals by weights so that [U_i] = [W_ij][X_j] + [W_i0].
The four bias weights {W_i0} are updated regularly according to the learning rule of Eqn. 18 discussed above and each of the sixteen scaling weights {W_ij} is updated regularly according to the learning rule of Eqn. 17 discussed above. These updates can occur after every signal sample or may be accumulated over many signal samples for updating in a global mode. Each of the weight elements in Fig. 5, exemplified by element 18, includes the logic necessary to produce and accumulate the ΔW update according to the applicable learning rule.
The separation network in Fig. 5 can also be used to remove interfering signals from a receive signal merely by, for example, isolating the interferer as output signal U_1 and then subtracting U_1 from the receive signal of interest, such as receive signal X_1. In such a configuration, the network shown in Fig. 5 is herein denominated an "interference cancelling" network.
Fig. 6 shows a functional block diagram illustrating a simple causal filter operated according to the method of this invention for blind deconvolution. A time-varying signal is presented to the network at input 22. The five spaced taps {T_i} are separated by a time-delay interval τ in the manner well-known in the art for transversal filters. The five weight factors {W_i} are established and updated by internal logic (not shown) according to the learning rules shown in Eqns. 26-27 discussed above. The five weighted tap signals {U_i} are summed at a summation device 24 to produce the single time-varying output signal U_1. Because input signal X_1 includes an unknown linear combination of time-delayed versions of an unknown source signal S_1, the system of this invention adjusts the tap weights {W_i} such that output signal U_1 approximates the unknown source signal S_1.
Fig. 7 shows a functional block diagram illustrating the combination of the blind source separation network and blind deconvolution filter systems of this invention. The blind separation learning rules and the blind deconvolution rules discussed above can be easily combined in the form exemplified by Fig. 7. The objective is to maximize the natural logarithm of a Jacobian with local lower triangular structure, which yields the expected learning rule that forces the leading weights {W_1jk} in the filters to follow the blind separation rules and all others to follow a decorrelation rule, except that tapped weights {W_ijk} are interposed between a delayed input and an output.
The outputs {U_j} are used to produce a set of training signals {Y_j} given by Eqn. 33:
Y_j(t) = g(U_j(t)) = g(Σ_i Σ_k W_ijk·X_k(t - (I-i+1)τ)) [Eqn. 33]
where g(·) denotes the selected sigmoidal transfer function. If the hyperbolic tangent function is selected as the sigmoidal non-linearity, the following training rules are used in the system of this invention:
ΔW_1jk = ε·(cof[W_1jk]/det[W_1] - 2X_k·Y_j) [Eqn. 34]
ΔW_ijk = ε·(-2X_k·Y_j) when i > 1 [Eqn. 35]
where W_1jk are the elements of the "lead" plane [W_1] and ε is the learning rate.
In Fig. 7, each of the three input signals {X_j} contains multipath distortion that requires blind deconvolution as well as an unknown mixture of up to three unknown source signals {S_k}. Each of the source separation planes, exemplified by plane 24, operates substantially as discussed above in connection with Fig. 5 for the three input signals {X_j}, by providing three output contributions to the summing elements exemplified by summing circuit 26. Plane 24 contains the lead weights for the 16 individual causal filters formed by the network. Preliminary experiments performed by the inventor with speech signals, in which signals were simultaneously separated and deconvolved using the learning rule discussed above, resulted in recovery of apparently perfect speech.
Experimental Results
The inventor conducted experiments using three-second segments of speech recorded from various speakers with only one speaker per recording. All speech segments were sampled at 8,000 Hz from the output of the auxiliary microphone of a Sparc-10 workstation. No special post-processing was performed on the waveforms other than the normalization of amplitudes to a common interval [-3,3] to permit operation with the equipment used. The network was trained using the stochastic gradient ascent procedure
of this invention.
Unsupervised learning in a neural network may proceed either continuously or in a global mode. Continuous learning consists in slightly modifying the weights after each propagation of an input vector through the network. This kind of learning is useful for
signals that arrive in real time or when local storage capacity is restricted. In a global learning mode, a multiplicity of samples are propagated through the network and the
results stored locally. Statistics are computed exactly on these data and the weights are modified only after accumulating and processing the multiplicity of signal samples.
To reduce computational overhead, these experiments were performed using the global learning mode. To ensure that the input ensemble is stationary in time, random points were selected from the three-second window to generate the appropriate input vectors. Various learning rates were tested, with 0.005 preferred. As used herein, the learning rate ε establishes the actual weight adjustment such that W_ij = W_ij + ε·ΔW_ij, as is known in the art. The inventor found that reducing the learning rate over the learning process was useful.
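For illustration, the following sketch reproduces the flavor of this global-mode procedure with synthetic stand-ins: Laplacian "sources" in place of the recorded speech, a random mixture, and an assumed block size, step count and rate-halving schedule. It applies the logistic rules of Eqns. 15-16 (elementwise, Eqns. 17-18) to randomly selected sample blocks.

import numpy as np

rng = np.random.default_rng(3)
n, length, block = 5, 24_000, 200
S = rng.laplace(size=(n, length))                 # super-Gaussian stand-ins for speech
A = rng.uniform(-1, 1, size=(n, n))               # unknown mixing matrix
X = A @ S                                         # measured mixtures

W, W0, eps = np.eye(n), np.zeros((n, 1)), 0.005
for step in range(2_000):
    cols = rng.integers(0, length, size=block)    # random points from the window
    xb = X[:, cols]
    Y = 1.0 / (1.0 + np.exp(-(W @ xb + W0)))
    W += eps * ((1.0 - 2.0 * Y) @ xb.T / block + np.linalg.inv(W.T))
    W0 += eps * np.mean(1.0 - 2.0 * Y, axis=1, keepdims=True)
    if step % 500 == 499:
        eps *= 0.5                                # reduce the learning rate as training proceeds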
Blind Separation Results: The network architecture shown in Figs. 2A and 5 together with the learning rules in Eqns. 17-18 were found to be sufficient to perform blind separation of at least seven unknown source signals. A random mixing matrix [A] was generated with values usually in the interval [-1,1]. The mixing matrix [A] was used to generate the several mixed time series [X_j] from the original sources [S_i]. The unmixing matrix [W] and the bias vector [W_0] were then trained according to the rules in Eqns. 17-18. Fig. 10 shows the results of the attempted separation of five source signals. The mixtures [X_j] formed an incomprehensible babble that could not be penetrated by the human ear. The unmixed solutions shown as [Y_i] were obtained after presenting about 500,000 time samples, equivalent to 20 passes through the complete three-second series. Any residual interference in the output vector elements [Y_i] is inaudible to the human ear. This can be appreciated with reference to the permutation structure of the product of the final weight matrix [W] and the initial mixing matrix [A]:
final weight matrix [W] and the initial mixing matrix [A]:
-4.09 0.13 0.09 -0.07 -0.01
0.07 -2.92 0.00 0.02 -0.06
[W] [A] = 0.02 -0.02 -0.06 -0.08 -2.20
0.02 0.03 0.00 1.97 0.02
-0.07 0.14 -3.50 -0.01 0.04
As can be seen, the residual interference factors are only a few percent of the single substantial entry in each row and column, thereby demonstrating that weight matrix [W] substantially removes all effects of mixing matrix [A] from the signals.
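A simple numerical check of this permutation structure can be coded directly; the percent-residual criterion below is an assumption of this illustrative helper, not a test prescribed by the specification.

import numpy as np

def permutation_error(W, A):
    """Report how far [W][A] departs from a permutation of a diagonal matrix.

    For each row, everything other than the single dominant entry is residual
    interference; the worst residual-to-dominant ratio is returned in percent.
    """
    P = W @ A
    dominant = np.max(np.abs(P), axis=1)
    residual = np.sum(np.abs(P), axis=1) - dominant
    return 100.0 * np.max(residual / dominant)

# e.g., after training: print(permutation_error(W, A))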
In a second experiment, seven source signals, including five speaking voices, a rock music selection and white noise, were successfully separated, although the separation was still slowly improving after 2.5 million iterations, equivalent to 100 passes through
the three-second data. For two sources, convergence is normally achieved in less than one pass through the three seconds of data by the system of this invention.
The blind separation procedure of this invention was found to fail only when: (a) more than one unknown source is Gaussian white noise, and (b) when the mixing matrix
[A] is nearly singular. Both weaknesses are understandable because no procedure can separate independent Gaussian sources and, if [A] is nearly singular, then any proper unmixing matrix [W] must also be nearly singular, making the expression in Eqn. 17 quite unstable in the vicinity of a solution.
In contrast with these results, experience with similar tests of the HJ network shows it occasionally fails to converge for two sources and rarely converges for three sources.
Blind Deconvolution Results: Speech signals were convolved with various filters and the learning rules in Eqns. 26-27 were used to perform blind deconvolution. Some results are shown in Figs. 11A-11L. The convolving filter time domains shown in Figs. 11A, 11E and 11I contained some zero values. For example, Fig. 11E represents the filter [0.8,0,0,0,1]. Moreover, the taps were sometimes adjacent to each other, as in Figs. 11A-11D, and sometimes spaced apart in time, as in Figs. 11I-11L. The leading weight of each filter is the right-most bar in each histogram, exemplified by bar 30 in Fig. 11I and bar 32 in Fig. 11G.
A whitening experiment is shown in Figs. 11A-11D, a barrel-effect experiment in Figs. 11E-11H and a multiple-echo experiment in Figs. 11I-11L. For each of these three experiments, the time domain characteristics of the convolving filter [A] are shown, followed by those of the ideal deconvolving filter [W_ideal], those of the filter produced by the process of this invention [W] and the time domain pattern produced by convolution of [W] and [A]. Ideally, the convolution [W]*[A] should be a delta-function consisting of only a single high value at the right-most position of the leading weight when [W] correctly inverts [A].
The first whitening example shows what happens when "deconvolving" a speech signal that has not been corrupted (convolving filter [A] is a delta-function). If the tap spacing is close enough, as in this case where the tap spacing is identical to the sample interval, the process of this invention learns the whitening filter shown in Fig. 11C that flattens the amplitude spectrum of the speech up to the Nyquist limit (equivalent to half of the sampling frequency). Fig. 9A shows the spectrum of the speech sequence before deconvolution and Fig. 9B shows the speech spectrum after deconvolution by the filter shown in Fig. 11C. Whitened speech sounds like a clear sharp version of the original signal because the phase structure is preserved. By using all available frequency levels equally, the system is maximizing information throughput in the channel. Thus, when the original signal is not white, the deconvolving filter of this invention will recover a whitened version of it rather than the exact original. However, when the filter taps are spaced further apart, as in Figs. 11E-11L, there is less opportunity for simple whitening.
In the second "barrel-effect" example shown in Fig. 1 IE, a 6.25 ms echo is added to the speech signal. This creates a mild audible barrel effect. Because filter 1 IE is finite
in length, its inverse is infinite in length but is shown in Fig. 1 IF as truncated. The inverting filter learned in Fig. 11G resembles Fig. 1 IF although the resemblance tails off toward the left side because the process of this invention actually learns an optimal filter of finite length instead of a truncated infinite optimal filter. The resulting deconvolution shown in Fig. 1 IH is very good.
The best results from the blind deconvolution process of this invention are seen when the ideal deconvolving filter is of finite length, as in the third example shown in Figs. 11I-11L. Fig. 11I shows a set of exponentially-decaying echoes spread out over 275 ms that may be inverted by the two-point filter shown in Fig. 11J with a small decaying correction on the left, which is an artifact of the truncation of the convolving filter shown in Fig. 11I. As seen in Fig. 11K, the learned filter corresponds almost exactly to the ideal filter in Fig. 11J and the deconvolution in Fig. 11L is almost perfect. This result demonstrates the sensitivity of the blind processing method of this invention in cases where the tap-spacing is great enough (100 sample intervals) that simple whitening cannot interfere noticeably with the deconvolution process.
Clearly, other embodiments and modifications of this invention may occur readily to those of ordinary skill in the art in view of these teachings. Therefore, this invention is to be limited only by the following claims, which include all such embodiments and
modifications when viewed in conjunction with the above specification and accompanying drawing.
Claims
1. In a neural network having input means for accepting a plurality J of input signals {X_j} and output means for producing a plurality I of output signals {U_i}, each said output signal U_i representing a combination of said input signals {X_j} weighted by a plurality I of bias weights {W_i0} and a plurality I^2 of scaling weights {W_ij} such that [U_i] = [W_ij][X_j] + [W_i0], a method for minimizing the information redundancy among said output signals {U_i}, wherein 0<i≤I>1 and 0<j≤J>1 are integers, said method comprising the steps of:
(a) selecting initial values for said bias weights {W_i0} and said scaling weights {W_ij};
(b) producing a plurality I of training signals {Y_i} responsive to a transformation of said input signals {X_j} such that Y_i = g(U_i), wherein g(x) is a nonlinear function and the Jacobian of said transformation is J = det[∂Y_i/∂X_j] when J=I; and
(c) adjusting said bias weights {W_i0} and said scaling weights {W_ij} responsive to one or more samples of said training signals {Y_i} such that each said bias weight W_i0 is changed proportionately to a corresponding bias measure ΔW_i0 accumulated over said one or more samples and each said scaling weight W_ij is changed proportionately to a corresponding scaling measure ΔW_ij = ε·∂(ln|J|)/∂W_ij accumulated over said one or more samples, wherein ε>0 is a learning rate.
2. The method of claim 1 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of the solutions to the equation (d/dx) g(x) = 1 - |g(x)|^r and said adjusting step (c) comprises the step of:
(c) adjusting said bias weights {W_i0} and said scaling weights {W_ij} responsive to one or more samples of said training signals {Y_i} such that each said bias weight W_i0 is changed proportionately to a corresponding bias measure ΔW_i0 = ε·(-r·|Y_i|^(r-1)·sgn(Y_i)) accumulated over said one or more samples and each said scaling weight W_ij is changed proportionately to a corresponding scaling measure ΔW_ij = ε·((cof[W_ij]/det[W_ij]) - r·X_j·|Y_i|^(r-1)·sgn(Y_i)) accumulated over said one or more samples.
3. The method of claim 1 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of g_1(x) = tanh(x) and g_2(x) = (1 + e^-x)^-1 and said adjusting step (c) comprises the step of:
(c) adjusting said bias weights {W_i0} and said scaling weights {W_ij} responsive to one or more samples of said training signals {Y_i} such that each said bias weight W_i0 is changed proportionately to a corresponding bias measure ΔW_i0 selected from the group consisting essentially of Δ_1W_i0 = ε·(-2Y_i) and Δ_2W_i0 = ε·(1-2Y_i) accumulated over said one or more samples and each said scaling weight W_ij is changed proportionately to a corresponding scaling measure ΔW_ij selected from the group consisting essentially of Δ_1W_ij = ε·((cof[W_ij]/det[W_ij]) - 2X_j·Y_i) and Δ_2W_ij = ε·((cof[W_ij]/det[W_ij]) + X_j·(1-2Y_i)) accumulated over said one or more samples.
4. A method for recovering one or more of a plurality I of independent source signals {S_i} from a plurality J > I of sensor signals {X_j}, each including a combination of at least some of said source signals {S_i}, wherein 0<i≤I>1 and 0<j≤J>I are integers, said method comprising the steps of:
(a) selecting a plurality I of bias weights {W_i0} and a plurality I^2 of scaling weights {W_ij};
(b) adjusting said bias weights {W_i0} and said scaling weights {W_ij} by repeatedly performing the steps of:
(b.1) producing a plurality I of estimation signals {U_i} responsive to said sensor signals {X_j} such that [U_i] = [W_ij][X_j] + [W_i0],
(b.2) producing a plurality I of training signals {Y_i} responsive to a transformation of said sensor signals {X_j} such that Y_i = g(U_i), wherein g(x) is a nonlinear function and the Jacobian of said transformation is J = det[∂Y_i/∂X_j] when J=I, and
(b.3) adjusting each said bias weight W_i0 and each said scaling weight W_ij responsive to one or more samples of said training signals {Y_i} such that said each bias weight W_i0 is changed proportionately to a bias measure ΔW_i0 accumulated over said one or more samples and said each scaling weight W_ij is changed proportionately to a corresponding scaling measure ΔW_ij = ε·∂(ln|J|)/∂W_ij accumulated over said one or more samples, wherein ε>0 is a learning rate; and
(c) producing said estimation signals {U_i} to represent said one or more recovered source signals {S_i}.
5. The method of claim 4 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of the solutions to the equation (d/dx) g(x) = 1 - |g(x)|^r and said adjusting step (c) comprises the step of:
(c) adjusting said bias weights {W_i0} and said scaling weights {W_ij} responsive to one or more samples of said training signals {Y_i} such that each said bias weight W_i0 is changed proportionately to a corresponding bias measure ΔW_i0 = ε·(-r·|Y_i|^(r-1)·sgn(Y_i)) accumulated over said one or more samples and each said scaling weight W_ij is changed proportionately to a corresponding scaling measure ΔW_ij = ε·((cof[W_ij]/det[W_ij]) - r·X_j·|Y_i|^(r-1)·sgn(Y_i)) accumulated over said one or more samples.
6. The method of claim 4 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of g1(x) = tanh(x) and g2(x) = (1 + e^(−x))^(−1) and said adjusting step (c) comprises the step of:
(c) adjusting said bias weights {W_i0} and said scaling weights {W_ij} responsive to one or more samples of said training signals {Y_i} such that each said bias weight W_i0 is changed proportionately to a corresponding bias measure ΔW_i0 selected from the group consisting essentially of Δ1W_i0 = ε·(−2·Y_i) and Δ2W_i0 = ε·(1 − 2·Y_i) accumulated over said one or more samples and each said scaling weight W_ij is changed proportionately to a corresponding scaling measure ΔW_ij selected from the group consisting essentially of Δ1W_ij = ε·((cof[W_ij]/det[W_ij]) − 2·X_j·Y_i) and Δ2W_ij = ε·((cof[W_ij]/det[W_ij]) + X_j·(1 − 2·Y_i)) accumulated over said one or more samples.
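[Editorial sketch of the "accumulated over said one or more samples" language: the measures are summed over a block of samples and the weights are then changed proportionately, once. The block size, the choice of the Δ1 (tanh) measures, and the square case J = I are our assumptions for illustration.]

```python
import numpy as np

def infomax_batch(W, w0, X_block, lr=0.005):
    """Accumulate the Δ1 (tanh) measures over X_block, then apply once.

    X_block : (samples, I) array of sensor-signal samples.
    """
    d_W = np.zeros_like(W)
    d_w0 = np.zeros_like(w0)
    for x in X_block:
        y = np.tanh(W @ x + w0)                    # training signals
        d_W += np.linalg.inv(W).T - 2.0 * np.outer(y, x)
        d_w0 += -2.0 * y
    return W + lr * d_W, w0 + lr * d_w0            # one proportional change
```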
7. In a transversal filter having an input for accepting a sensor signal X that includes a combination of multipath reverberations of a source signal S and having a plurality I of delay line tap output signals {T_i} distributed at intervals of one or more time delays τ, said source signal S and said sensor signal X varying with time over a plurality J ≥ I of said time delay intervals τ such that said sensor signal X has a value X_j at time τ(j−1) and each said delay line tap output signal T_i has a value X_{j+1−i} representing said sensor signal value X_j delayed by a time interval τ(i−1), wherein τ > 0 is a predetermined constant and 0 < i ≤ I > 1 and 0 < j ≤ J ≥ I are integers, a method for recovering said source signal S from said sensor signal X comprising the steps of:
(a) selecting a plurality I of filter weights {W_i};
(b) adjusting said filter weights {W_i} by repeatedly performing the steps of:
(b.1) producing a plurality K = I of weighted tap output signals {V_k} by combining said delay line tap output signals {T_i} such that [V_k] = [F_ki][T_i], wherein 0 < k ≤ K = I > 1 are integers, and wherein F_ki = W_{k+1−i} when 1 ≤ k+1−i ≤ I and F_ki = 0 otherwise,
(b.2) summing said plurality K = I of weighted tap signals {V_k} to produce an estimation signal U = Σ_{k=1}^{K} V_k, wherein said estimation signal U has a value U_j at time τ(j−1),
(b.3) producing a plurality J of training signals {Y_j} responsive to a transformation of said sensor signal values {X_j} such that Y_j = g(U_j), wherein g(x) is a nonlinear function and the Jacobian of said transformation is J = det[∂Y_j/∂X_j] when J = I, and
(b.4) adjusting each said filter weight W_i responsive to one or more samples of said training signals {Y_j} such that said each filter weight W_i is changed proportionately to a corresponding leading measure ΔW_i accumulated over said one or more samples when i = 1 and a corresponding scaling measure ΔW_i = ε·∂(ln|J|)/∂W_i accumulated over said one or more samples otherwise; and
(c) producing said estimation signal U to represent said recovered source signal S.
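[Editorial sketch of the claim-7 blind-deconvolution update over one pass of a record. The tanh nonlinearity is an assumption here (claims 8 and 9 recite the specific measure families); the helper name, rate, and zero-padding of samples before the record starts are ours.]

```python
import numpy as np

def deconv_epoch(w, x, lr=1e-4):
    """One pass of the claim-7 filter update with g = tanh.

    w : (I,) causal filter taps, w[0] being the leading weight W_1;
    x : (J,) sensor-signal samples, J >= I.
    """
    I, J = len(w), len(x)
    d_w = np.zeros_like(w)
    for j in range(J):
        # tap i sees the sensor value delayed by i intervals (X_{j+1-i})
        taps = np.array([x[j - i] if j - i >= 0 else 0.0 for i in range(I)])
        y = np.tanh(w @ taps)          # training signal Y_j = g(U_j)
        d_w += -2.0 * y * taps         # scaling measure for every tap ...
        d_w[0] += 1.0 / w[0]           # ... plus the 1/W_1 leading term
    return w + lr * d_w
```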
8. The method of claim 7 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of g1(x) = tanh(x) and g2(x) = (1 + e^(−x))^(−1) and said adjusting step (b.4) comprises the step of:
(b.4) adjusting each said filter weight W_i responsive to one or more samples of said training signals {Y_j} such that said each filter weight W_i is changed proportionately to a corresponding leading measure ΔW_1 selected from the group consisting essentially of Δ1W_1 = ε·Σ_{j=1}^{J}((1/W_1) − 2·X_j·Y_j) and Δ2W_1 = ε·Σ_{j=1}^{J}((1/W_1) + X_j·(1 − 2·Y_j)) accumulated over said one or more samples when i = 1 and a corresponding scaling measure ΔW_i selected from the group consisting essentially of Δ1W_i = ε·Σ_{j=1}^{J}(−2·X_{j+1−i}·Y_j) and Δ2W_i = ε·Σ_{j=1}^{J}(X_{j+1−i}·(1 − 2·Y_j)) accumulated over said one or more samples otherwise.
9. The method of claim 7 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of the solutions to the equation (d/dx)g(x) = 1 − |g(x)|^r and said adjusting step (b.4) comprises the step of:
(b.4) adjusting each said filter weight W_i responsive to one or more samples of said training signals {Y_j} such that said each filter weight W_i is changed proportionately to a corresponding leading measure ΔW_1 = ε·Σ_{j=1}^{J}((1/W_1) − r·X_j·|Y_j|^(r−1)·sgn(Y_j)) accumulated over said one or more samples when i = 1 and a corresponding scaling measure ΔW_i = ε·Σ_{j=1}^{J}(−r·X_{j+1−i}·|Y_j|^(r−1)·sgn(Y_j)) accumulated over said one or more samples otherwise.
10. A neural network for recovering a plurality of source signals from a plurality of mixtures of said source signals, said neural network comprising:
input means for accepting a plurality J of input signals {X_j} each including a combination of at least some of a plurality I of independent source signals {S_i}, wherein 0 < i ≤ I > 1 and 0 < j ≤ J ≥ I are integers;
weight means coupled to said input means for storing a plurality I of bias weights {W_i0} and a plurality I² of scaling weights {W_ij};
output means coupled to said weight means for producing a plurality I of output signals {U_i} responsive to said input signals {X_j} such that [U_i] = [W_ij][X_j] + [W_i0];
training means coupled to said output means for producing a plurality I of training signals {Y_i} responsive to a transformation of said input signals {X_j} such that Y_i = g(U_i), wherein g(x) is a nonlinear function and the Jacobian of said transformation is J = det[∂Y_i/∂X_j]; and
adjusting means coupled to said training means and said weight means for adjusting said bias weights {W_i0} and said scaling weights {W_ij} responsive to one or more samples of said training signals {Y_i} such that each said bias weight W_i0 is changed proportionately to a corresponding bias measure ΔW_i0 accumulated over said one or more samples and each said scaling weight W_ij is changed proportionately to a corresponding scaling measure ΔW_ij = ε·∂(ln|J|)/∂W_ij accumulated over said one or more samples, wherein ε > 0 is a learning rate.
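[Editorial sketch pairing claim 10's "means" with a tiny module. The tanh choice for g, the class name, and the per-sample update are our assumptions; the independent claim leaves g and the accumulation block open.]

```python
import numpy as np

class InfomaxNetwork:
    """Minimal module mirroring claim 10's weight/output/training/adjusting means."""

    def __init__(self, I, lr=0.005):
        self.W = np.eye(I)              # weight means: I^2 scaling weights
        self.w0 = np.zeros(I)           # ... and I bias weights
        self.lr = lr                    # learning rate ε

    def outputs(self, x):               # output means: U = W x + w0
        return self.W @ x + self.w0

    def train_sample(self, x):          # training + adjusting means (Δ1 form)
        y = np.tanh(self.outputs(x))    # training signals Y = g(U)
        self.W += self.lr * (np.linalg.inv(self.W).T - 2.0 * np.outer(y, x))
        self.w0 += self.lr * (-2.0 * y)
```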
11. The neural network of claim 10 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of the solutions to the equation (d/dx)g(x) = 1 − |g(x)|^r and said bias measure ΔW_i0 = ε·(−r·|Y_i|^(r−1)·sgn(Y_i)) and said scaling measure ΔW_ij = ε·((cof[W_ij]/det[W_ij]) − r·X_j·|Y_i|^(r−1)·sgn(Y_i)).
12. The neural network of claim 10 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of g1(x) = tanh(x) and g2(x) = (1 + e^(−x))^(−1) and said bias measure ΔW_i0 is selected from a group consisting essentially of Δ1W_i0 = −2·Y_i and Δ2W_i0 = 1 − 2·Y_i and said scaling measure ΔW_ij is selected from a group consisting essentially of Δ1W_ij = (cof[W_ij]/det[W_ij]) − 2·X_j·Y_i and Δ2W_ij = (cof[W_ij]/det[W_ij]) + X_j·(1 − 2·Y_i).
13. A system for adaptively cancelling one or more interferer signals {S_n} comprising:
input means for accepting a plurality J of input signals {X_j} each including a combination of at least some of a plurality I of independent source signals {S_i} that includes said one or more interferer signals {S_n}, wherein 0 < i ≤ I > 1, 0 < j ≤ J ≥ I and 0 < n ≤ N ≥ 1 are integers;
weight means coupled to said input means for storing a plurality I of bias weights {W_i0} and a plurality I² of scaling weights {W_ij};
output means coupled to said weight means for producing a plurality I of output signals {U_i} responsive to said input signals {X_j} such that [U_i] = [W_ij][X_j] + [W_i0];
training means coupled to said output means for producing a plurality I of training signals {Y_i} responsive to a transformation of said input signals {X_j} such that Y_i = g(U_i), wherein g(x) is a nonlinear function and the Jacobian of said transformation is J = det[∂Y_i/∂X_j];
adjusting means coupled to said training means and said weight means for adjusting said bias weights {W_i0} and said scaling weights {W_ij} responsive to one or more samples of said training signals {Y_i} such that each said bias weight W_i0 is changed proportionately to a corresponding bias measure ΔW_i0 accumulated over said one or more samples and each said scaling weight W_ij is changed proportionately to a corresponding scaling measure ΔW_ij = ε·∂(ln|J|)/∂W_ij accumulated over said one or more samples, wherein ε > 0 is a learning rate; and
feedback means coupled to said output means and said input means for selecting one or more said output signals {U_n} representing said one or more interferer signals {S_n} for combination with said input signals {X_j}, thereby cancelling said interferer signals {S_n}.
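[Editorial sketch of the claim-13 feedback means. The claim says the selected outputs are combined with the inputs to cancel the interferers but does not fix the combination; the least-squares projection below is our assumption, as are the function name and array shapes.]

```python
import numpy as np

def cancel_interferers(X, U, interferer_idx):
    """Subtract the separated interferer outputs from the sensor inputs.

    X : (J, T) input signals; U : (I, T) separated output signals;
    interferer_idx : indices of the outputs judged to be interferers.
    """
    Un = U[interferer_idx]                        # (N, T) interferer outputs
    # gains G minimizing ||X - G Un||^2 : G = X Un^T (Un Un^T)^(-1)
    G = X @ Un.T @ np.linalg.inv(Un @ Un.T)
    return X - G @ Un                             # interferer-cancelled inputs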
14. The system of claim 13 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of the solutions to the equation (d/dx)g(x) = 1 − |g(x)|^r and said bias measure ΔW_i0 = ε·(−r·|Y_i|^(r−1)·sgn(Y_i)) and said scaling measure ΔW_ij = ε·((cof[W_ij]/det[W_ij]) − r·X_j·|Y_i|^(r−1)·sgn(Y_i)).
15. The system of claim 13 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of g1(x) = tanh(x) and g2(x) = (1 + e^(−x))^(−1) and said bias measure ΔW_i0 is selected from a group consisting essentially of Δ1W_i0 = −2·Y_i and Δ2W_i0 = 1 − 2·Y_i and said scaling measure ΔW_ij is selected from a group consisting essentially of Δ1W_ij = (cof[W_ij]/det[W_ij]) − 2·X_j·Y_i and Δ2W_ij = (cof[W_ij]/det[W_ij]) + X_j·(1 − 2·Y_i).
16. A system for separating a plurality I of independent source signals {S_i} wherein 0 < i ≤ I > 1 are integers, said system comprising:
sensor means having a plurality J of source sensors {s_j} for receiving said source signals {S_i} wherein 0 < j ≤ J ≥ I are integers;
transducer means in each said source sensor s_j for creating an input signal X_j responsive to said source signals {S_i} arriving at each said source sensor s_j;
weight means coupled to said sensor means for storing a plurality I of bias weights {W_i0} and a plurality I² of scaling weights {W_ij};
output means coupled to said weight means for producing a plurality I of output signals {U_i} responsive to said input signals {X_j} such that [U_i] = [W_ij][X_j] + [W_i0] wherein said output signals {U_i} each represent one of said source signals {S_i};
training means coupled to said output means for producing a plurality I of training signals {Y_i} responsive to a transformation of said input signals {X_j} such that Y_i = g(U_i), wherein g(x) is a nonlinear function and the Jacobian of said transformation is J = det[∂Y_i/∂X_j] when J = I; and
adjusting means coupled to said training means and said weight means for adjusting said bias weights {W_i0} and said scaling weights {W_ij} responsive to one or more samples of said training signals {Y_i} such that each said bias weight W_i0 is changed proportionately to a corresponding bias measure ΔW_i0 accumulated over said one or more samples and each said scaling weight W_ij is changed proportionately to a corresponding scaling measure ΔW_ij = ε·∂(ln|J|)/∂W_ij accumulated over said one or more samples, wherein ε > 0 is a training rate.
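[Editorial end-to-end toy run of the separation system using the infomax_step sketch given after claim 3. The source model, mixing matrix, and learning rate are all illustrative assumptions, not values from the patent.]

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 20000))            # two independent toy sources
A = np.array([[1.0, 0.6], [0.7, 1.0]])      # unknown mixing: X = A S
X = A @ S                                   # sensor/input signals

W, w0 = np.eye(2), np.zeros(2)
for x in X.T:                               # one pass, sample by sample
    W, w0 = infomax_step(W, w0, x, lr=0.002, g="tanh")
U = W @ X + w0[:, None]                     # recovered up to order and scale
print(np.round(W @ A, 2))                   # ~ scaled permutation if separated
```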
17. The system of claim 16 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of the solutions to the equation (d/dx)g(x) = 1 − |g(x)|^r and said bias measure ΔW_i0 = ε·(−r·|Y_i|^(r−1)·sgn(Y_i)) and said scaling measure ΔW_ij = ε·((cof[W_ij]/det[W_ij]) − r·X_j·|Y_i|^(r−1)·sgn(Y_i)).
18. The system of claim 16 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of g1(x) = tanh(x) and g2(x) = (1 + e^(−x))^(−1) and said bias measure ΔW_i0 is selected from a group consisting essentially of Δ1W_i0 = −2·Y_i and Δ2W_i0 = 1 − 2·Y_i and said scaling measure ΔW_ij is selected from a group consisting essentially of Δ1W_ij = (cof[W_ij]/det[W_ij]) − 2·X_j·Y_i and Δ2W_ij = (cof[W_ij]/det[W_ij]) + X_j·(1 − 2·Y_i).
19. A system for separating a plurality I of source signals {S_i} in an underwater acoustic communication system, wherein 0 < i ≤ I > 1 are integers, said system comprising:
sensor means having a plurality J of acoustic sensors {s_j} for receiving said source signals {S_i} wherein 0 < j ≤ J ≥ I are integers;
transducer means in each said acoustic sensor s_j for creating an input signal X_j responsive to said source signals {S_i} arriving at said each acoustic sensor s_j;
weight means coupled to said sensor means for storing a plurality I of bias weights {W_i0} and a plurality I² of scaling weights {W_ij};
output means coupled to said weight means for producing a plurality I of output signals {U_i} responsive to said input signals {X_j} such that [U_i] = [W_ij][X_j] + [W_i0] wherein said output signals {U_i} each represent one of said source signals {S_i};
training means coupled to said output means for producing a plurality I of training signals {Y_i} responsive to a transformation of said input signals {X_j} such that Y_i = g(U_i), wherein g(x) is a nonlinear function and the Jacobian of said transformation is J = det[∂Y_i/∂X_j] when J = I; and
adjusting means coupled to said training means and said weight means for adjusting said bias weights {W_i0} and said scaling weights {W_ij} responsive to one or more samples of said training signals {Y_i} such that each said bias weight W_i0 is changed proportionately to a corresponding bias measure ΔW_i0 accumulated over said one or more samples and each said scaling weight W_ij is changed proportionately to a corresponding scaling measure ΔW_ij = ε·∂(ln|J|)/∂W_ij accumulated over said one or more samples, wherein ε > 0 is a training rate.
20. The system of claim 19 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of the solutions to the equation (d/dx)g(x) = 1 − |g(x)|^r and said bias measure ΔW_i0 = ε·(−r·|Y_i|^(r−1)·sgn(Y_i)) and said scaling measure ΔW_ij = ε·((cof[W_ij]/det[W_ij]) − r·X_j·|Y_i|^(r−1)·sgn(Y_i)).
21. The system of claim 19 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of g1(x) = tanh(x) and g2(x) = (1 + e^(−x))^(−1) and said bias measure ΔW_i0 is selected from a group consisting essentially of Δ1W_i0 = −2·Y_i and Δ2W_i0 = 1 − 2·Y_i and said scaling measure ΔW_ij is selected from a group consisting essentially of Δ1W_ij = (cof[W_ij]/det[W_ij]) − 2·X_j·Y_i and Δ2W_ij = (cof[W_ij]/det[W_ij]) + X_j·(1 − 2·Y_i).
22. A system for separating a plurality I of source signals {S_i} in a cellular telecommunications system, wherein 0 < i ≤ I > 1 are integers, said system comprising:
sensor means having a plurality J of antenna elements {e_j} for receiving said source signals {S_i} wherein 0 < j ≤ J ≥ I are integers;
transducer means in each said antenna element e_j for creating an input signal X_j responsive to said source signals {S_i} arriving at said each antenna element e_j;
weight means coupled to said sensor means for storing a plurality I of bias weights {W_i0} and a plurality I² of scaling weights {W_ij};
output means coupled to said weight means for producing a plurality I of output signals {U_i} responsive to said input signals {X_j} such that [U_i] = [W_ij][X_j] + [W_i0] wherein said output signals {U_i} each represent one of said source signals {S_i};
training means coupled to said output means for producing a plurality I of training signals {Y_i} responsive to a transformation of said input signals {X_j} such that Y_i = g(U_i), wherein g(x) is a nonlinear function and the Jacobian of said transformation is J = det[∂Y_i/∂X_j] when J = I; and
adjusting means coupled to said training means and said weight means for adjusting said bias weights {W_i0} and said scaling weights {W_ij} responsive to one or more samples of said training signals {Y_i} such that each said bias weight W_i0 is changed proportionately to a corresponding bias measure ΔW_i0 accumulated over said one or more samples and each said scaling weight W_ij is changed proportionately to a corresponding scaling measure ΔW_ij = ε·∂(ln|J|)/∂W_ij accumulated over said one or more samples, wherein ε > 0 is a training rate.
23. The system of claim 22 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of the solutions to the equation (d/dx)g(x) = 1 − |g(x)|^r and said bias measure ΔW_i0 = ε·(−r·|Y_i|^(r−1)·sgn(Y_i)) and said scaling measure ΔW_ij = ε·((cof[W_ij]/det[W_ij]) − r·X_j·|Y_i|^(r−1)·sgn(Y_i)).
24. The system of claim 22 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of g1(x) = tanh(x) and g2(x) = (1 + e^(−x))^(−1) and said bias measure ΔW_i0 is selected from a group consisting essentially of Δ1W_i0 = −2·Y_i and Δ2W_i0 = 1 − 2·Y_i and said scaling measure ΔW_ij is selected from a group consisting essentially of Δ1W_ij = (cof[W_ij]/det[W_ij]) − 2·X_j·Y_i and Δ2W_ij = (cof[W_ij]/det[W_ij]) + X_j·(1 − 2·Y_i).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU45965/96A AU4596596A (en) | 1994-11-29 | 1995-11-28 | Blind signal processing system employing information maximization to recover unknown signals through unsupervised minimization of output redundancy |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/346,535 | 1994-11-29 | ||
US08/346,535 US5706402A (en) | 1994-11-29 | 1994-11-29 | Blind signal processing system employing information maximization to recover unknown signals through unsupervised minimization of output redundancy |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1996017309A1 true WO1996017309A1 (en) | 1996-06-06 |
Family
ID=23359855
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1995/015981 WO1996017309A1 (en) | 1994-11-29 | 1995-11-28 | Blind signal processing system employing information maximization to recover unknown signals through unsupervised minimization of output redundancy |
Country Status (3)
Country | Link |
---|---|
US (1) | US5706402A (en) |
AU (1) | AU4596596A (en) |
WO (1) | WO1996017309A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8165373B2 (en) | 2009-09-10 | 2012-04-24 | Rudjer Boskovic Institute | Method of and system for blind extraction of more pure components than mixtures in 1D and 2D NMR spectroscopy and mass spectrometry combining sparse component analysis and single component points |
US8224427B2 (en) | 2007-04-25 | 2012-07-17 | Ruder Boscovic Institute | Method for real time tumour visualisation and demarcation by means of photodynamic diagnosis |
CN112101280A (en) * | 2020-09-25 | 2020-12-18 | 北京百度网讯科技有限公司 | Face image recognition method and device |
Families Citing this family (104)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2722631B1 (en) * | 1994-07-13 | 1996-09-20 | France Telecom Etablissement P | METHOD AND SYSTEM FOR ADAPTIVE FILTERING BY BLIND EQUALIZATION OF A DIGITAL TELEPHONE SIGNAL AND THEIR APPLICATIONS |
FR2730881A1 (en) * | 1995-02-22 | 1996-08-23 | Philips Electronique Lab | SYSTEM FOR ESTIMATING SIGNALS RECEIVED IN THE FORM OF MIXED SIGNALS |
US6208295B1 (en) * | 1995-06-02 | 2001-03-27 | Trw Inc. | Method for processing radio signals that are subject to unwanted change during propagation |
US6359582B1 (en) | 1996-09-18 | 2002-03-19 | The Macaleese Companies, Inc. | Concealed weapons detection system |
US5995733A (en) * | 1997-01-27 | 1999-11-30 | Lucent Technologies, Inc. | Method and apparatus for efficient design and analysis of integrated circuits using multiple time scales |
JP3444131B2 (en) * | 1997-02-27 | 2003-09-08 | ヤマハ株式会社 | Audio encoding and decoding device |
US5959966A (en) * | 1997-06-02 | 1999-09-28 | Motorola, Inc. | Methods and apparatus for blind separation of radio signals |
AU740617C (en) | 1997-06-18 | 2002-08-08 | Clarity, L.L.C. | Methods and apparatus for blind signal separation |
US6185309B1 (en) * | 1997-07-11 | 2001-02-06 | The Regents Of The University Of California | Method and apparatus for blind separation of mixed and convolved sources |
FR2768818B1 (en) * | 1997-09-22 | 1999-12-03 | Inst Francais Du Petrole | STATISTICAL METHOD FOR CLASSIFYING EVENTS RELATED TO PHYSICAL PROPERTIES OF A COMPLEX ENVIRONMENT SUCH AS THE BASEMENT |
US6968342B2 (en) * | 1997-12-29 | 2005-11-22 | Abel Wolman | Energy minimization for data merging and fusion |
US6993186B1 (en) * | 1997-12-29 | 2006-01-31 | Glickman Jeff B | Energy minimization for classification, pattern recognition, sensor fusion, data compression, network reconstruction and signal processing |
US6691073B1 (en) | 1998-06-18 | 2004-02-10 | Clarity Technologies Inc. | Adaptive state space signal separation, discrimination and recovery |
US6269334B1 (en) * | 1998-06-25 | 2001-07-31 | International Business Machines Corporation | Nongaussian density estimation for the classification of acoustic feature vectors in speech recognition |
US6343268B1 (en) * | 1998-12-01 | 2002-01-29 | Siemens Corporation Research, Inc. | Estimator of independent sources from degenerate mixtures |
EP1018854A1 (en) * | 1999-01-05 | 2000-07-12 | Oticon A/S | A method and a device for providing improved speech intelligibility |
US6735482B1 (en) | 1999-03-05 | 2004-05-11 | Clarity Technologies Inc. | Integrated sensing and processing |
US6768515B1 (en) | 1999-03-05 | 2004-07-27 | Clarity Technologies, Inc. | Two architectures for integrated realization of sensing and processing in a single device |
US6856271B1 (en) | 1999-05-25 | 2005-02-15 | Safe Zone Systems, Inc. | Signal processing for object detection system |
US7450052B2 (en) * | 1999-05-25 | 2008-11-11 | The Macaleese Companies, Inc. | Object detection method and apparatus |
US7167123B2 (en) * | 1999-05-25 | 2007-01-23 | Safe Zone Systems, Inc. | Object detection method and apparatus |
US6342696B1 (en) * | 1999-05-25 | 2002-01-29 | The Macaleese Companies, Inc. | Object detection method and apparatus employing polarized radiation |
US6424960B1 (en) | 1999-10-14 | 2002-07-23 | The Salk Institute For Biological Studies | Unsupervised adaptation and classification of multiple classes and sources in blind signal separation |
US6704703B2 (en) * | 2000-02-04 | 2004-03-09 | Scansoft, Inc. | Recursively excited linear prediction speech coder |
US6775646B1 (en) * | 2000-02-23 | 2004-08-10 | Agilent Technologies, Inc. | Excitation signal and radial basis function methods for use in extraction of nonlinear black-box behavioral models |
AU2001243424A1 (en) * | 2000-02-29 | 2001-09-12 | Hrl Laboratories, Llc | Cooperative mobile antenna system |
US6654719B1 (en) * | 2000-03-14 | 2003-11-25 | Lucent Technologies Inc. | Method and system for blind separation of independent source signals |
CN1436436A (en) * | 2000-03-31 | 2003-08-13 | 克拉里提有限公司 | Method and apparatus for voice signal extraction |
US6490573B1 (en) * | 2000-04-11 | 2002-12-03 | Philip Chidi Njemanze | Neural network for modeling ecological and biological systems |
JP3449348B2 (en) * | 2000-09-29 | 2003-09-22 | 日本電気株式会社 | Correlation matrix learning method and apparatus, and storage medium |
JP4028680B2 (en) * | 2000-11-01 | 2007-12-26 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Signal separation method for restoring original signal from observation data, signal processing device, mobile terminal device, and storage medium |
JP3725418B2 (en) * | 2000-11-01 | 2005-12-14 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Signal separation method, image processing apparatus, and storage medium for restoring multidimensional signal from image data mixed with a plurality of signals |
US6870962B2 (en) | 2001-04-30 | 2005-03-22 | The Salk Institute For Biological Studies | Method and apparatus for efficiently encoding chromatic images using non-orthogonal basis functions |
US6622117B2 (en) * | 2001-05-14 | 2003-09-16 | International Business Machines Corporation | EM algorithm for convolutive independent component analysis (CICA) |
KR100383594B1 (en) * | 2001-06-01 | 2003-05-14 | 삼성전자주식회사 | Method and apparatus for downlink joint detector in communication system |
US7941205B2 (en) * | 2001-07-05 | 2011-05-10 | Sigmed, Inc. | System and method for separating cardiac signals |
US8055333B2 (en) * | 2001-07-05 | 2011-11-08 | Jeng-Ren Duann | Device and method for detecting cardiac impairments |
GB0123772D0 (en) * | 2001-10-03 | 2001-11-21 | Qinetiq Ltd | Apparatus for monitoring fetal heartbeat |
US6954494B2 (en) * | 2001-10-25 | 2005-10-11 | Siemens Corporate Research, Inc. | Online blind source separation |
US6701170B2 (en) * | 2001-11-02 | 2004-03-02 | Nellcor Puritan Bennett Incorporated | Blind source separation of pulse oximetry signals |
US6728396B2 (en) * | 2002-02-25 | 2004-04-27 | Catholic University Of America | Independent component imaging |
US6993440B2 (en) * | 2002-04-22 | 2006-01-31 | Harris Corporation | System and method for waveform classification and characterization using multidimensional higher-order statistics |
US6711528B2 (en) * | 2002-04-22 | 2004-03-23 | Harris Corporation | Blind source separation utilizing a spatial fourth order cumulant matrix pencil |
US7167568B2 (en) * | 2002-05-02 | 2007-01-23 | Microsoft Corporation | Microphone array signal enhancement |
CA2436400A1 (en) | 2002-07-30 | 2004-01-30 | Abel G. Wolman | Geometrization for pattern recognition, data analysis, data merging, and multiple criteria decision making |
US7366564B2 (en) | 2002-08-23 | 2008-04-29 | The United States Of America As Represented By The Secretary Of The Navy | Nonlinear blind demixing of single pixel underlying radiation sources and digital spectrum local thermometer |
AU2003296976A1 (en) * | 2002-12-11 | 2004-06-30 | Softmax, Inc. | System and method for speech processing using independent component analysis under stability constraints |
GB0229473D0 (en) * | 2002-12-18 | 2003-01-22 | Qinetiq Ltd | Signal separation system and method |
GB0306629D0 (en) * | 2003-03-22 | 2003-04-30 | Qinetiq Ltd | Monitoring electrical muscular activity |
US7187326B2 (en) * | 2003-03-28 | 2007-03-06 | Harris Corporation | System and method for cumulant-based geolocation of cooperative and non-cooperative RF transmitters |
US6993460B2 (en) | 2003-03-28 | 2006-01-31 | Harris Corporation | Method and system for tracking eigenvalues of matrix pencils for signal enumeration |
FR2853427B1 (en) * | 2003-04-01 | 2005-06-17 | Thales Sa | METHOD OF BLINDLY IDENTIFYING MIXTURES OF SOURCES WITH HIGHER ORDERS |
US7430546B1 (en) | 2003-06-07 | 2008-09-30 | Roland Erwin Suri | Applications of an algorithm that mimics cortical processing |
US7099821B2 (en) * | 2003-09-12 | 2006-08-29 | Softmax, Inc. | Separation of target acoustic signals in a multi-transducer arrangement |
US20100265139A1 (en) * | 2003-11-18 | 2010-10-21 | Harris Corporation | System and method for cumulant-based geolocation of cooperative and non-cooperative RF transmitters |
US7574333B2 (en) * | 2004-02-05 | 2009-08-11 | Honeywell International Inc. | Apparatus and method for modeling relationships between signals |
US7363200B2 (en) * | 2004-02-05 | 2008-04-22 | Honeywell International Inc. | Apparatus and method for isolating noise effects in a signal |
US10721405B2 (en) | 2004-03-25 | 2020-07-21 | Clear Imaging Research, Llc | Method and apparatus for implementing a digital graduated filter for an imaging apparatus |
US8331723B2 (en) | 2004-03-25 | 2012-12-11 | Ozluturk Fatih M | Method and apparatus to correct digital image blur due to motion of subject or imaging device |
US9826159B2 (en) | 2004-03-25 | 2017-11-21 | Clear Imaging Research, Llc | Method and apparatus for implementing a digital graduated filter for an imaging apparatus |
US20060034531A1 (en) * | 2004-05-10 | 2006-02-16 | Seiko Epson Corporation | Block noise level evaluation method for compressed images and control method of imaging device utilizing the evaluation method |
US7333850B2 (en) * | 2004-05-28 | 2008-02-19 | University Of Florida Research Foundation, Inc. | Maternal-fetal monitoring system |
US7231227B2 (en) * | 2004-08-30 | 2007-06-12 | Kyocera Corporation | Systems and methods for blind source separation of wireless communication signals |
WO2006044699A2 (en) * | 2004-10-13 | 2006-04-27 | Softmax, Inc. | Method and system for cardiac signal decomposition |
JP4449871B2 (en) * | 2005-01-26 | 2010-04-14 | ソニー株式会社 | Audio signal separation apparatus and method |
US7464029B2 (en) * | 2005-07-22 | 2008-12-09 | Qualcomm Incorporated | Robust separation of speech signals in a noisy environment |
US7330801B2 (en) * | 2005-07-29 | 2008-02-12 | Interdigital Technology Corporation | Signal separation using rank deficient matrices |
DE102005039621A1 (en) * | 2005-08-19 | 2007-03-01 | Micronas Gmbh | Method and apparatus for the adaptive reduction of noise and background signals in a speech processing system |
US8874439B2 (en) * | 2006-03-01 | 2014-10-28 | The Regents Of The University Of California | Systems and methods for blind source signal separation |
JP2009529699A (en) * | 2006-03-01 | 2009-08-20 | ソフトマックス,インコーポレイテッド | System and method for generating separated signals |
TW200811204A (en) * | 2006-05-19 | 2008-03-01 | Nissan Chemical Ind Ltd | Hyper-branched polymer and process for producing the same |
US8160273B2 (en) * | 2007-02-26 | 2012-04-17 | Erik Visser | Systems, methods, and apparatus for signal separation using data driven techniques |
CN101622669B (en) * | 2007-02-26 | 2013-03-13 | 高通股份有限公司 | Systems, methods, and apparatus for signal separation |
WO2008109859A1 (en) * | 2007-03-07 | 2008-09-12 | The Macaleese Companies, Inc. D/B/A Safe Zone Systems | Object detection method and apparatus |
US8112372B2 (en) * | 2007-11-20 | 2012-02-07 | Christopher D. Fiorello | Prediction by single neurons and networks |
US8175291B2 (en) * | 2007-12-19 | 2012-05-08 | Qualcomm Incorporated | Systems, methods, and apparatus for multi-microphone based speech enhancement |
US7869863B2 (en) * | 2008-01-10 | 2011-01-11 | The Johns Hopkins University | Apparatus and method for non-invasive, passive fetal heart monitoring |
US8321214B2 (en) * | 2008-06-02 | 2012-11-27 | Qualcomm Incorporated | Systems, methods, and apparatus for multichannel signal amplitude balancing |
WO2009151578A2 (en) * | 2008-06-09 | 2009-12-17 | The Board Of Trustees Of The University Of Illinois | Method and apparatus for blind signal recovery in noisy, reverberant environments |
WO2010058230A2 (en) * | 2008-11-24 | 2010-05-27 | Institut Rudjer Boskovic | Method of and system for blind extraction of more than two pure components out of spectroscopic or spectrometric measurements of only two mixtures by means of sparse component analysis |
US20100138010A1 (en) * | 2008-11-28 | 2010-06-03 | Audionamix | Automatic gathering strategy for unsupervised source separation algorithms |
US20100174389A1 (en) * | 2009-01-06 | 2010-07-08 | Audionamix | Automatic audio source separation with joint spectral shape, expansion coefficients and musical state estimation |
CN101478374B (en) * | 2009-01-19 | 2013-06-12 | 北京航空航天大学 | Physical layer network code processing method |
DE102010006956B4 (en) * | 2010-02-02 | 2012-03-29 | Technische Universität Berlin | Method and apparatus for measuring oxygen saturation in blood |
US9470551B2 (en) | 2011-12-20 | 2016-10-18 | Robert Bosch Gmbh | Method for unsupervised non-intrusive load monitoring |
JP6362363B2 (en) * | 2014-03-10 | 2018-07-25 | キヤノン株式会社 | Image estimation method, program, recording medium, and image estimation apparatus |
CN105050114B (en) * | 2015-06-26 | 2018-10-02 | 哈尔滨工业大学 | The Volterra prediction techniques that high band frequency spectrum occupies |
US10387778B2 (en) | 2015-09-29 | 2019-08-20 | International Business Machines Corporation | Scalable architecture for implementing maximization algorithms with resistive devices |
US10325006B2 (en) | 2015-09-29 | 2019-06-18 | International Business Machines Corporation | Scalable architecture for analog matrix operations with resistive devices |
US10565496B2 (en) * | 2016-02-04 | 2020-02-18 | Nec Corporation | Distance metric learning with N-pair loss |
US11270798B2 (en) | 2016-07-13 | 2022-03-08 | Koninklijke Philips N.V. | Central signal segregation system |
EP3684463A4 (en) | 2017-09-19 | 2021-06-23 | Neuroenhancement Lab, LLC | Method and apparatus for neuroenhancement |
WO2019094562A1 (en) * | 2017-11-08 | 2019-05-16 | Google Llc | Neural network based blind source separation |
US11717686B2 (en) | 2017-12-04 | 2023-08-08 | Neuroenhancement Lab, LLC | Method and apparatus for neuroenhancement to facilitate learning and performance |
US11478603B2 (en) | 2017-12-31 | 2022-10-25 | Neuroenhancement Lab, LLC | Method and apparatus for neuroenhancement to enhance emotional response |
US12280219B2 (en) | 2017-12-31 | 2025-04-22 | NeuroLight, Inc. | Method and apparatus for neuroenhancement to enhance emotional response |
US11364361B2 (en) | 2018-04-20 | 2022-06-21 | Neuroenhancement Lab, LLC | System and method for inducing sleep by transplanting mental states |
EP3849410A4 (en) | 2018-09-14 | 2022-11-02 | Neuroenhancement Lab, LLC | SLEEP ENHANCEMENT SYSTEM AND METHOD |
US11786694B2 (en) | 2019-05-24 | 2023-10-17 | NeuroLight, Inc. | Device, method, and app for facilitating sleep |
CN111126199B (en) * | 2019-12-11 | 2023-05-30 | 复旦大学 | Signal characteristic extraction and data mining method based on echo measurement data |
CN113095464B (en) * | 2021-04-01 | 2022-08-02 | 哈尔滨工程大学 | Blind source separation method based on quantum mucormycosis search mechanism under strong impact noise |
CN113259283B (en) * | 2021-05-13 | 2022-08-26 | 侯小琪 | Single-channel time-frequency aliasing signal blind separation method based on recurrent neural network |
US20230075595A1 (en) * | 2021-09-07 | 2023-03-09 | Biosense Webster (Israel) Ltd. | Weighting projected electrophysiological wave velocity with sigmoid curve |
CN115856215B (en) * | 2022-12-02 | 2025-03-04 | 国网福建省电力有限公司泉州供电公司 | A Blind Source Separation Gas CO and H2 Detection System Based on Enhanced Channel |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4965732A (en) * | 1985-11-06 | 1990-10-23 | The Board Of Trustees Of The Leland Stanford Junior University | Methods and arrangements for signal reception and parameter estimation |
US5272656A (en) * | 1990-09-21 | 1993-12-21 | Cambridge Signal Technologies, Inc. | System and method of producing adaptive FIR digital filter with non-linear frequency resolution |
IL101556A (en) * | 1992-04-10 | 1996-08-04 | Univ Ramot | Multi-channel signal separation using cross-polyspectra |
US5383164A (en) * | 1993-06-10 | 1995-01-17 | The Salk Institute For Biological Studies | Adaptive system for broadband multisignal discrimination in a channel with reverberation |
-
1994
- 1994-11-29 US US08/346,535 patent/US5706402A/en not_active Expired - Fee Related
-
1995
- 1995-11-28 AU AU45965/96A patent/AU4596596A/en not_active Abandoned
- 1995-11-28 WO PCT/US1995/015981 patent/WO1996017309A1/en active Application Filing
Non-Patent Citations (4)
Title |
---|
CICHOCKI A ET AL: "Robust learning algorithm for blind separation of signals", ELECTRONICS LETTERS, 18 AUG. 1994, UK, vol. 30, no. 17, ISSN 0013-5194, pages 1386 - 1387, XP002001214 * |
FU L ET AL: "Sensitivity analysis for input vector in multilayer feedforward neural networks", PROCEEDINGS OF 1993 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS (ICNN '93), SAN FRANCISCO, CA, USA, 28 MARCH-1 APRIL 1993, ISBN 0-7803-0999-5, 1993, NEW YORK, NY, USA, IEEE, USA, pages 215 - 218 vol.1, XP002001213 * |
KARHUNEN J ET AL: "Representation and separation of signals using nonlinear PCA type learning", NEURAL NETWORKS, 1994, USA, vol. 7, no. 1, ISSN 0893-6080, pages 113 - 127, XP002001215 * |
OBRADOVIC D ET AL: "LINEAR FEATURE EXTRACTION IN NETWORKS WITH LATERAL CONNECTIONS", INTERNATIONAL CONFERENCE ON NEURAL NETWORKS/ WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE, ORLANDO, JUNE 27 - 29, 1994, vol. 2, 27 June 1994 (1994-06-27), INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, pages 686 - 691, XP000532128 * |
Also Published As
Publication number | Publication date |
---|---|
US5706402A (en) | 1998-01-06 |
AU4596596A (en) | 1996-06-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5706402A (en) | Blind signal processing system employing information maximization to recover unknown signals through unsupervised minimization of output redundancy | |
Torkkola | Blind separation of delayed sources based on information maximization | |
Amari et al. | Adaptive blind signal processing-neural network approaches | |
Bell et al. | Blind separation and blind deconvolution: an information-theoretic approach | |
Lee et al. | A unifying information-theoretic framework for independent component analysis | |
Miao et al. | Fast subspace tracking and neural network learning by a novel information criterion | |
Zaknich | Principles of adaptive filters and self-learning systems | |
AU705518B2 (en) | Methods and apparatus for blind separation of delayed and filtered sources | |
US7711553B2 (en) | Methods and apparatus for blind separation of multichannel convolutive mixtures in the frequency domain | |
Choi et al. | Adaptive blind separation of speech signals: Cocktail party problem | |
Smaragdis | Information theoretic approaches to source separation | |
Woo et al. | Neural network approach to blind signal separation of mono-nonlinearly mixed sources | |
EP1088394B1 (en) | Adaptive state space signal separation, discrimination and recovery architectures and their adaptations for use in dynamic environments | |
Fiori | Blind separation of circularly distributed sources by neural extended APEX algorithm | |
Girolami et al. | Stochastic ICA contrast maximisation using Oja's nonlinear PCA algorithm | |
Chuah et al. | Robust CDMA multiuser detection using a neural-network approach | |
Fiori | Extended Hebbian learning for blind separation of complex-valued sources | |
Choi et al. | Blind signal deconvolution by spatio-temporal decorrelation and demixing | |
Grant | Artificial neural network and conventional approaches to filtering and pattern recognition | |
Mansour et al. | Blind separation for instantaneous mixture of speech signals: Algorithms and performances | |
Douglas et al. | Convolutive blind source separation for audio signals | |
Hyvarinen et al. | A neuron that learns to separate one signal from a mixture of independent sources | |
Zhang et al. | Geometrical structures of FIR manifold and multichannel blind deconvolution | |
Kawamoto et al. | Blind separation for convolutive mixtures of non-stationary signals. | |
Mei et al. | Convolutive blind source separation based on disjointness maximization of subband signals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AM AT AU BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU IS JP KE KG KP KR KZ LK LR LT LU LV MD MG MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TT UA UG US UZ VN |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): KE LS MW SD SZ UG AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
122 | Ep: pct application non-entry in european phase |