Quasi-Nonparametric Blind Inversion of Wiener Systems

An efficient procedure for the blind inversion of a nonlinear Wiener system is proposed. We show that the problem can be expressed as a problem of blind source separation in nonlinear mixtures, for which a solution has been recently proposed. Based on a quasi-nonparametric relative gradient descent, the proposed algorithm performs efficiently even in the presence of hard distortions.


I. Introduction
When linear models fail, nonlinear models, because of their better approximation capabilities, appear to be powerful tools for modeling practical situations. Examples of actual nonlinear systems include digital satellite and microwave channels, which are composed of a linear filter followed by a memoryless nonlinear travelling-wave tube amplifier [6], magnetic recording channels, etc. Identification of such systems is, therefore, of great theoretical and practical interest.
Much research has been devoted to the identification and/or the inversion of such systems. Traditional nonlinear system identification methods have been based on higher-order input-output cross-correlations [2], or on the application of the Bussgang and Price theorems [3], [10] for nonlinear systems with Gaussian inputs.
These methods assume the availability of the nonlinear system input. However, in a real-world situation, one often does not have access to the system input: blind identification of the nonlinearity then becomes a necessary and powerful tool. Although higher-order statistics of the output signal have been used for the detection of nonlinearities [11], [12], or for the identification of cascades of the form linear system - memoryless polynomial nonlinearity - linear system [13], blind identification of nonlinear systems has remained an intractable problem, except for a very restricted class of inputs, mainly Gaussian.
It is well known that, unlike the case of linear systems, prior knowledge of the model is necessary for nonlinear system identification [14]. This paper is concerned with a particular class of nonlinear systems, composed of a linear time-invariant subsystem (a filter h) followed by a memoryless nonlinear distortion (a function f), see Figure 1. This class of nonlinear systems, also known as Wiener systems, is not only a mathematically attractive model, but also a model found in various areas, such as biology (study of the visual system [5], relation between muscle length and tension [8]), industry (description of a distillation plant [1]), sociology and psychology. See also [9] and the references therein. Despite its interest, to our knowledge, no blind procedure exists for the inversion of such systems.
The paper is organized as follows. In section II, the model, notations and assumptions are detailed. The cost function, based on mutual information, is proposed in section III. The derivation of the cost function optimization algorithm is exposed in section IV. In section V, practical issues for implementing the algorithm are addressed. Finally, section VI illustrates the good behavior of the proposed algorithm in a hard situation, and its efficiency by means of Monte-Carlo simulations.

II. Model and assumptions
We suppose that the input of the system, S = {s_t}, is an unknown non-Gaussian independent and identically distributed (iid) process, and that both subsystems (h, f) are unknown and invertible. We are concerned with the restoration of s_t from the observation of the system's output only. This implies the blind design of an inverse structure (Figure 2), composed of the same subsystems as the Wiener system, i.e. a linear filter and a memoryless nonlinearity, but in the reverse order, i.e. a Hammerstein system. The nonlinear part g compensates the distortion f without access to its input, while the linear part w is a deconvolution filter.
The following notation will be adopted throughout the paper. For each process Z = {z_t}, z denotes a vector of infinite dimension whose t-th entry is z_t. Following this notation, the unknown input-output transfer can be written as:

e = f(H s)   (1)

where H denotes a square Toeplitz matrix of infinite dimension and represents the action of the filter h on s_t. This matrix is nonsingular provided that the filter h is invertible, i.e. h^{-1} exists and satisfies h^{-1} * h = h * h^{-1} = δ_0, where δ_0 is the Dirac impulse at t = 0. One can recognize in equation (1) the post-nonlinear (pnl) model [15]. However, this model has been studied only in the finite-dimensional case, in which it has been shown that, under mild conditions, the system is separable provided that the input s has independent components, and that the matrix H has at least two nonzero entries per row or per column.
Here, we conjecture that this remains true in the infinite-dimension case. The first separability condition is fulfilled since s has independent components due to the iid assumption. Moreover, due to the particular structure of the matrix H, the second condition of separability always holds, except if h is proportional to a pure delay.
The output of the inversion structure can be written in the same way as (1):

y = W x   (3)

with x_t = g(e_t). Following [15], the inverse system (g, w) is estimated by minimizing the output mutual information. In fact, this criterion is always positive and vanishes iff y has independent components, i.e. y_t is an iid process.
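As a concrete sketch of the model and of its inverse structure, the following fragment simulates a toy Wiener system e = f(h * s) and applies the ideal Hammerstein inverse. The filter h, the distortion f, the sample size and the truncation length are illustrative choices of ours, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Wiener system: linear filter h followed by a memoryless nonlinearity f ---
h = np.array([1.0, 0.5])                         # illustrative FIR filter
f = lambda u: np.tanh(2.0 * u)                   # invertible distortion
s = rng.uniform(-np.sqrt(3), np.sqrt(3), 1000)   # iid non-Gaussian input
e = f(np.convolve(s, h)[:len(s)])                # observed output e = f(h * s)

# --- Inverse (Hammerstein) structure: nonlinearity g, then filter w ---
# With the subsystems known, the ideal inverse is g = f^{-1}, w = h^{-1}:
g = lambda v: np.arctanh(v) / 2.0                # compensates the distortion
w = (-0.5) ** np.arange(20)                      # truncated inverse of h
x = g(e)                                         # nonlinear compensation
y = np.convolve(x, w)[:len(s)]                   # deconvolution: y recovers s
```

In the blind setting considered by the paper, g and w are of course unknown and must be estimated from e alone; this sketch only illustrates the two-stage inverse structure.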

III. Cost function
The mutual information of a random vector z of dimension n is defined by:

I(z) = Σ_{i=1}^{n} H(z_i) − H(z_1, z_2, ..., z_n)   (4)

Since we are interested in the mutual information of infinite-dimension random vectors, a natural question is how this quantity grows with n. The answer comes by using the notion of entropy rates of stochastic processes [4]. The entropy rate of the stochastic process Z = {z_t} is defined as:

H(Z) = lim_{T→∞} 1/(2T+1) H(z_{−T}, ..., z_T)   (5)

if the limit exists. Theorem 4.2.1 of [4] states that this limit exists for a stationary stochastic process. We then define the mutual information rate of a stationary stochastic process by:

I(Z) = H(z_t) − lim_{T→∞} 1/(2T+1) H(z_{−T}, ..., z_T) = H(z) − H(Z)   (6)

Here t is arbitrary due to the stationarity assumption. Note that I(Z) is always positive and vanishes iff Z is iid. Now, since S is stationary, and since h and w are time-invariant filters, Y is also stationary, and I(Y) is defined by:

I(Y) = H(y) − H(Y)   (7)

We need the following lemma, proof of which is given in Appendix A. To derive the optimization algorithm, we need the derivatives of I(Y) (equation (11)) with respect to the linear part w and with respect to the nonlinear function g.
A. Linear subsystem

For the linear subsystem, this is quite easy since the filter w is well parameterized by its coefficients. For the coefficient w_t, corresponding to the t-th lag, we have:

∂H(y)/∂w_t = −E[ (∂y_τ/∂w_t) ψ_{y_τ}(y_τ) ] = −E[ x_{τ−t} ψ_{y_τ}(y_τ) ]   (12)

where ψ_y(u) = (log p_y)'(u). Since, by stationarity, ψ_{y_τ} does not depend on τ, it will in the following be denoted simply ψ_y. The second term of interest is (see Appendix B):

∂/∂w_t [ 1/(2π) ∫ log|ŵ(ω)| dω ] = w^{-1}_{−t}   (13)

where one recognizes the {−t}-th coefficient of the inverse of the filter w, which we denote w^{-1}_{−t}.
The derivatives of the remaining terms of (11) with respect to the coefficients of w are null, which leads, by combining (12) and (13), to:

∂I(Y)/∂w_t = −E[ x_{τ−t} ψ_y(y_τ) ] − w^{-1}_{−t}
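The gradient above can be estimated from finite samples. The sketch below is our own minimal implementation, assuming an estimator of the score function ψ_y is available (see section V); the function name and the FFT-based inversion of w are illustrative choices, not the paper's:

```python
import numpy as np

def iy_gradient_wrt_w(x, y, psi_y, w, lags, n_fft=1024):
    """Sample estimate of dI(Y)/dw_t = -E[x_{tau-t} psi_y(y_tau)] - w^{-1}_{-t}.

    x, y   : outputs of the nonlinear and linear subsystems (1-D arrays)
    psi_y  : callable estimate of the score function (log p_y)'
    w      : current filter coefficients (lag 0 first)
    lags   : integer lags t at which the gradient is evaluated
    """
    T = len(y)
    py = psi_y(y)
    # coefficients of the inverse filter w^{-1}, obtained in the frequency domain
    w_inv = np.real(np.fft.ifft(1.0 / np.fft.fft(w, n_fft)))
    grad = np.empty(len(lags))
    for i, t in enumerate(lags):
        tau = np.arange(max(t, 0), min(T, T + t))   # indices where x_{tau-t} exists
        grad[i] = -np.mean(x[tau - t] * py[tau]) - w_inv[-t % n_fft]
    return grad
```

Note that the frequency-domain inverse assumes ŵ has no zeros on the unit circle; the circular indexing of w^{-1} is a further simplification.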

B. Nonlinear subsystem
For this subsystem, we choose a nonparametric approach, implying no parametric restriction on its functional form. In consequence, and since the family of all possible characteristics is so wide, the only possible parametrization of g is g itself. This may seem confusing, but it will appear clearly in subsection V-D. Consider a small¹ relative deviation of g, expressed as the composition of g with a "small" function ε:

g → g + ε ∘ g   (19)

Then, the variations of the two terms H(y) and E[log g'(e)] in (11) are considered. It is thus sufficient to take ℜ{Q̂} ≥ 0 to ensure the descent condition. The algorithm update then writes as in equation (29).

V. Practical issues

It is clear that (18) and (29) are unusable in practice. This section is concerned with adapting these algorithms to an actual situation. We then consider a finite discrete sample E = {e_1, e_2, ..., e_T}. We assume that we have already computed the output of the inversion system, i.e. X = {x_1, x_2, ..., x_T} and Y = {y_1, y_2, ..., y_T}. The first question of interest is the estimation of the quantities involved in equations (18) and (29).

A. Estimation of y
Two approaches can be used to estimate ψ_y. The first consists in estimating the pdf of y: through ψ_y = (log p_y)', an estimate of p_y yields an estimate of ψ_y. A possible choice of an estimator of p_y consists in using nonparametric approaches, in particular methods based on kernels [7].
Kernel density estimators are easy to implement and have a very flexible form. However, they suffer from the difficulty of choosing the kernel bandwidth B. In fact, this choice strongly depends on the criterion used to evaluate the performance of the estimation.
Formally, using normalized kernels², one estimates p_y by:

p̂_y(u) = 1/(T B) Σ_{t=1}^{T} K( (u − y_t) / B )

In our experiments, the kernel K has a Gaussian shape; however, many other kernel shapes can be good candidates (see [7] for more details). A "quick and dirty" method

² ∫_R K(u) du = 1.
for the choice of the bandwidth consists in using the rule of thumb B = 1.06 σ̂_y T^{−1/5}, which is based on the minimum asymptotic mean integrated squared error criterion. Probably better choices could be used, but experimentally we noticed that the proposed estimator works fine.
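A direct implementation of this first approach (Gaussian kernel plus the rule-of-thumb bandwidth) can be sketched as follows; the O(T²) pairwise evaluation is a naive choice of ours, not the FFT-based implementation alluded to later in the text:

```python
import numpy as np

def score_estimate(y, u=None):
    """Kernel estimate of psi_y = (log p_y)' with a Gaussian kernel and the
    rule-of-thumb bandwidth B = 1.06 * std(y) * T**(-1/5) quoted above."""
    T = len(y)
    B = 1.06 * np.std(y) * T ** (-0.2)
    if u is None:
        u = y                               # evaluate at the samples themselves
    d = (u[:, None] - y[None, :]) / B       # pairwise scaled differences
    K = np.exp(-0.5 * d**2)                 # Gaussian kernel (normalization cancels)
    p = K.sum(axis=1)                       # proportional to p_y(u)
    dp = (-d / B * K).sum(axis=1)           # proportional to p_y'(u)
    return dp / p                           # psi_y(u) = p_y'(u) / p_y(u)
```

The common normalization constants of p̂_y and p̂_y' cancel in the ratio, which is why the kernel is left unnormalized here.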
A second approach for the estimation of ψ_y is based on the direct mean square minimization of:

E(ψ̂_y) = E[ (ψ̂_y(y) − ψ_y(y))² ]   (32)

which can be done without knowing the true function ψ_y. In fact, using:

E[ ψ̂_y(y) ψ_y(y) ] = −E[ ψ̂_y'(y) ]

equation (32) reduces to:

E(ψ̂_y) = E[ ψ̂_y(y)² + 2 ψ̂_y'(y) ] + E[ ψ_y(y)² ]   (33)

in which the true function ψ_y appears only in a constant term. ψ̂_y being nonlinear, it could be modeled, for example, by a neural network [16] and trained using the backpropagation algorithm. This approach has not been used in this paper, where we prefer the nonparametric approach.
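When ψ̂_y is linear in its parameters, criterion (33) is quadratic and even has a closed-form minimizer. The polynomial-basis sketch below is our own simplification of the neural-network suggestion, not something used in the paper:

```python
import numpy as np

def score_matching_poly(y, degree=3):
    """Minimize E[psi(y)^2 + 2 psi'(y)]  (equation (33), the true-score term
    being a constant) over psi(u) = sum_k c_k u^k, k = 1..degree.
    The minimizer of c^T G c + 2 b^T c is c = -G^{-1} b."""
    powers = np.arange(1, degree + 1)
    Phi = y[:, None] ** powers                   # basis phi_k(y) = y^k
    dPhi = powers * y[:, None] ** (powers - 1)   # derivatives phi_k'(y) = k y^{k-1}
    G = Phi.T @ Phi / len(y)                     # sample E[phi phi^T]
    b = dPhi.mean(axis=0)                        # sample E[phi']
    c = np.linalg.solve(G, -b)
    return lambda u: (u[:, None] ** powers) @ c
```

For a Gaussian sample, the fitted score is close to the true ψ_y(u) = −u/σ², which makes the estimator easy to sanity-check.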

B. Estimation of E[x_{τ−t} ψ_y(y_τ)]
According to the previous subsection, it is assumed that one has an estimator of ψ_y. We are now interested in estimating the expectations involved in the gradients. This can be explained by the low-pass filtering effect of Q on J.

F. Indeterminacies
The output of the nonlinear subsystem, x_t, t = 1, ..., T, should be centered and normalized. In fact, the inverse of the nonlinear distortion can be restored only up to a scale and a shift factor. For the linear subsystem, the output y_t, t = 1, ..., T should also be normalized, due to the scale indeterminacy on w. One can use the same normalization scheme for y_t.
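A minimal sketch of this normalization step (the convention of zero mean for x and unit variance for both outputs is our reading of the text):

```python
import numpy as np

def fix_indeterminacies(x, y):
    """Center and normalize the nonlinear-subsystem output x, and normalize the
    linear-subsystem output y, removing the scale/shift indeterminacies."""
    x = (x - x.mean()) / x.std()   # g is recovered up to scale and shift
    y = y / y.std()                # w is recovered up to scale
    return x, y
```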

VI. Experimental results

A. Hard experiment
To test the previous algorithm, we simulate a hard situation. The iid input sequence s_t, shown in Figure 5, is generated by applying a cubic distortion to an iid Gaussian sequence. The filter h is FIR, with coefficients:

h = (0.826, −0.165, 0.851, 0.163, 0.810)

Its frequency response is shown in Figure 3. Figure 4 shows that h has two zeros outside the unit circle, which indicates that h is non-minimum phase.
The nonlinear distortion is a hard saturation f(u) = tanh(10u). The observed sequence is shown in Figure 5.
The algorithm was provided with a sample of size T = 1000. The size of the impulse response of w was set to 51, with equal sizes for the causal and anti-causal parts. Estimation results, shown in Figures 5, 6 and 7, demonstrate the good behavior of the proposed algorithm. The phase of w (Figure 6) is composed of a linear part, which corresponds to an arbitrary, uncontrolled but constant delay, and of a nonlinear part, which cancels the phase of h.

B. Performance analysis
The performance of the proposed algorithm is evaluated by means of Monte-Carlo simulations. Since we are interested in restoring the original sequence, a good performance index consists in evaluating the error between the original signal s_t and the restored signal y_t.
The performance index then writes as the mean square error between s_t and y_t, in which it is supposed that y_t has been properly delayed and rescaled.
A signal uniformly distributed in the range [−√3, √3] is chosen for the simulations. The filter h is a simple low-pass filter with coefficients (1, 0.5), and the nonlinear distortion is still f(u) = tanh(10u). Figure 8 shows the results of the simulations, where each point is the average of 50 experiments. As one can expect, the performance index decreases when the sample size T increases.
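Since the text leaves the exact index implicit, a plausible implementation is the normalized mean square error after compensating the delay and scale indeterminacies; the cross-correlation alignment below is our assumption, not necessarily the paper's procedure:

```python
import numpy as np

def performance_index(s, y):
    """Normalized MSE between the original s and the restored y, after the
    delay and scale of y have been compensated as required by the text."""
    # find the delay maximizing the cross-correlation between y and s
    xc = np.correlate(y - y.mean(), s - s.mean(), mode="full")
    delay = int(np.argmax(np.abs(xc))) - (len(s) - 1)
    y_al = np.roll(y, -delay)
    # optimal scale in the least-squares sense (also fixes a possible sign flip)
    a = np.dot(y_al, s) / np.dot(y_al, y_al)
    return np.mean((a * y_al - s) ** 2) / np.mean(s**2)
```

The `np.roll` alignment wraps around at the edges, which is harmless when the delay is small compared to the sample size.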

C. Real data
The previous procedure has been tested on real data. Figure 9 shows a seismic refraction profile recorded using a vessel and an Ocean Bottom Seismometer (OBS) over the Beaulieu plateau in France.
Each wave corresponds to the signal recorded by the unique sensor, located on the sea bottom (the OBS), in response to the shots which are produced at regular time intervals by the vessel, which moves at constant speed.
In this experiment, the proposed procedure provides an estimate of the nonlinearity g which is far from a linear characteristic. The output of the nonlinear processing is shown in Figure 10.
Figure 11 compares a purely linear deconvolution with the proposed nonlinear method, which assumes the existence of a nonlinear memoryless distortion. We can notice that the proposed procedure provides an enhanced version of the purely linear deconvolution, revealing hidden structures not emphasized by the linear deconvolution. However, these results are currently not well understood, since seismic experts usually consider purely linear models. The only thing that can be said is that the linear model does not hold, since our procedure revealed the existence of a nonlinearity.

VII. Final remarks and conclusion
In this paper, a blind procedure for the inversion of a nonlinear Wiener system was proposed. Contrary to other blind identification procedures, the system input is not assumed to be Gaussian. Moreover, the nonlinear subsystem is unknown or not directly identifiable. The inversion procedure is based on a quasi-nonparametric relative gradient descent of the mutual information rate of the inversion system output. It is inspired from recent results on blind source separation in nonlinear mixtures.
One may notice that some quantities involved in the algorithm can be efficiently estimated by resorting to the FFT and to interpolation, which dramatically reduces the computational cost. The estimation of g is done implicitly: only the values x_t = g(e_t), t = 1, ..., T are estimated. One can further use any regression algorithm based on these data to estimate g, e.g. neural networks, splines, etc.
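For example, given the sample pairs (e_t, x_t) produced by the algorithm, an explicit estimate of g can be obtained with any regression; the plain polynomial least-squares fit below is our stand-in for the neural networks or splines mentioned above:

```python
import numpy as np

def fit_g_explicit(e, x, degree=5):
    """Recover an explicit estimate of g from the implicit samples x_t = g(e_t)
    via polynomial least squares.  Degree is an illustrative choice."""
    coeffs = np.polyfit(e, x, degree)   # least-squares fit of x against e
    return np.poly1d(coeffs)            # callable estimate of g
```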
The proposed procedure performs well, even in hard situations (hard nonlinearity and non-minimum-phase filter). Extension to complex channels is quite easy and is currently under investigation.
This approach could be promising in many application areas, as suggested by the preliminary results on seismic data. These results show that the method reveals novel information that cannot be obtained by purely linear methods.
H(h ∗ X + h ∗ Z) − H(X + Z) = H(h ∗ X) − H(X)   (52)

for any Z. This expression shows that the difference between the output and input entropies is independent of the input distribution. We can then write:

Lemma 1: Let X be a stationary stochastic process, and let h be an invertible linear time-invariant filter; then:

H(h ∗ X) = H(X) + 1/(2π) ∫_{−π}^{π} log |ĥ(ω)| dω

Since y_t = (w ∗ x)_t, one can write, applying Lemma 1, H(Y) = H(X) + 1/(2π) ∫ log|ŵ(ω)| dω. Moreover, x_t = g(e_t), and by using the stationarity of E = {e_t}, one can write:

I(Y) = H(y) − 1/(2π) ∫_{−π}^{π} log |ŵ(ω)| dω − E[log g'(e)] − H(E)   (11)

IV. Theoretical derivation of the inversion algorithm

¹ Small enough to ensure the validity of the first-order variation approximation.

This deviation leads to a variation of I(Y):

ΔI(Y) = −E[ ψ_y(y) (w ∗ (ε ∘ x)) ] − E[ ε'(x) ]   (22)

Now let us write:

ΔI(Y) = −∫_R ( E[ ψ_y(y) {w ∗ δ}(x − v) ] + E[ δ'(x − v) ] ) ε(v) dv   (25)

where the bracketed term is denoted J(v). To make a gradient descent, we may take:

ε(v) = μ (Q ∗ J)(v)   (26)

where μ is a small positive real constant, and Q is any function such that I(Y) decreases. Using the Parseval equality, this last condition becomes:

∫_R |Ĵ(ω)|² ℜ{Q̂(ω)} dω ≥ 0   (28)

where Ĵ and Q̂ are the Fourier transforms of J and Q respectively, and ℜ denotes the real part.
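A Gaussian Q satisfies (28), since its Fourier transform is real and positive, and additionally smooths J. One resulting update step on g can be sketched as follows; the grid size, μ, σ and the interface of the J estimator are our own illustrative choices:

```python
import numpy as np

def relative_step_on_g(x, J_at, mu=0.1, sigma=0.5):
    """One relative-gradient step on g, equation (26): epsilon = mu * (Q * J),
    applied to the current samples x_t of the nonlinear-subsystem output.
    Q is Gaussian, so the descent condition (28) holds.  J_at(v) evaluates the
    functional gradient J at points v (an estimator assumed available)."""
    grid = np.linspace(x.min(), x.max(), 256)   # evaluation grid for Q * J
    dv = grid[1] - grid[0]
    J = J_at(grid)
    Q = np.exp(-0.5 * (grid - grid.mean()) ** 2 / sigma**2)
    QJ = np.convolve(J, Q, mode="same") * dv    # discretized (Q * J)(grid)
    eps = mu * np.interp(x, grid, QJ)           # epsilon evaluated at x_t
    return x + eps                              # g <- g + eps o g, at the samples
```

Evaluating Q ∗ J on a coarse grid and interpolating back to the samples is exactly the cost-reduction trick described at the end of the paper.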
For example, the mean and variance of x_t are estimated by:

m̂_x = 1/T Σ_{t=1}^{T} x_t,   σ̂_x² = 1/T Σ_{t=1}^{T} (x_t − m̂_x)²

Fig. 9. Original seismic data.

In order to reduce the computational load when T is large, equation (38) can be simplified. Instead of computing Q ∗ J at each sample x_t, it is computed only at N ≪ T equally-spaced points in the range [x_min, x_max] of the input signal x_t. The values of Q ∗ J at the samples x_t are then obtained by linear interpolation. Equation (37) is in fact quite robust to this sampling.

Acknowledgement: This work has been in part supported by the Direcció General de Recerca de la Generalitat de Catalunya. The authors are very grateful to the nonlinear research group of the LIS laboratory for fruitful discussions, and to F. Glangeaud for providing the seismic data.