Parametric approach to blind deconvolution of nonlinear channels

Abstract A parametric procedure for blind inversion of nonlinear channels is proposed, based on a recent method of blind source separation in nonlinear mixtures. Two parametric models are developed: a polynomial model and a neural model. The method, based on the minimization of the output mutual information, requires knowledge of the score function (the log-derivative of the density) of the input distribution. Each algorithm consists of three adaptive blocks: one for estimating the score function, and two for estimating the inverses of the linear and nonlinear parts of the channel. Experiments show that the algorithms perform efficiently, even under hard nonlinear distortion.


Introduction
When linear models fail, nonlinear models appear to be powerful tools for modelling practical situations. Much research has been devoted to the identification and/or the inversion of nonlinear systems. Most of it assumes that both the input and the output of the distortion are available, and is based on higher-order input/output cross-correlation [2] or on the application of the Bussgang and Price theorems [3,8] for nonlinear systems with Gaussian inputs. In real-world situations, however, one often has no access to the distortion input, and blind identification of the nonlinearity becomes the only way to solve the problem. This paper is concerned with a particular class of nonlinear systems, composed of a linear subsystem followed by a memoryless nonlinear distortion (Figure 1). This class of nonlinear systems, also known as Wiener systems, is not only a mathematically attractive model, but also an actual model used in various areas, such as biology [6], industry [1], sociology and psychology (see also [7] and the references therein). Despite its interest, no completely blind procedure for inverting such systems exists today.

We suppose that the input of the system S = {s(t)} is an unknown non-Gaussian independent and identically distributed (i.i.d.) process, and that the subsystems h and f are respectively a linear filter and a memoryless nonlinear function, both unknown and invertible. We would like to estimate s(t) by observing only the system's output. This implies the blind estimation of the inverse structure (Figure 1), composed of similar subsystems: a memoryless nonlinear function g followed by a linear filter w. Such a system is known as a Hammerstein system. Let s and e be the vectors of infinite dimension defined from the processes S = {s(t)} and E = {e(t)} respectively, whose t-th entries are s(t) and e(t).
The unknown input-output transfer can then be written as:

e = f(Hs)    (1)

where H is a Toeplitz matrix of infinite dimension, with (t, k) entry h(t - k), which represents the action of the filter h on the signal s(t), and f acts componentwise. The matrix H is non-singular provided that the filter h is invertible, i.e. satisfies h⁻¹ * h = h * h⁻¹ = δ₀, where δ₀ is the Dirac impulse at t = 0. Equation (1) corresponds to a post-nonlinear (pnl) model [12]. This model has recently been studied in nonlinear source separation, but only in the finite-dimensional case. In fact, with the above parametrization, the i.i.d. nature of s(t) implies the spatial independence of the components of the infinite vector s. Similarly, the output of the inversion structure can be written y = Wg(e), where W is the Toeplitz matrix associated with the filter w and g acts componentwise. Following [12,13], the inverse system (g, w) can be estimated by minimizing the output mutual information, i.e. by enforcing the spatial independence of y, which is equivalent to the i.i.d. nature of y(t).
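As a minimal numerical sketch of the two structures (the filter, nonlinearity and input distribution below are toy choices of ours, not those of the experiments later in the paper): if g = f⁻¹ and w = h⁻¹, the inverse cascade restores s.

```python
import numpy as np

def wiener_system(s, h, f):
    """Direct model: linear filter h followed by memoryless distortion f."""
    return f(np.convolve(s, h, mode="same"))

def hammerstein_inverse(e, g, w):
    """Inverse structure: memoryless function g followed by linear filter w."""
    return np.convolve(g(e), w, mode="same")

rng = np.random.default_rng(0)
s = rng.uniform(-0.9, 0.9, 5000)        # non-Gaussian i.i.d. input
h = np.array([1.0])                     # trivial filter, so h^{-1} = h
f = np.tanh                             # invertible memoryless distortion
e = wiener_system(s, h, f)
y = hammerstein_inverse(e, np.arctanh, np.array([1.0]))
print(np.max(np.abs(y - s)) < 1e-10)    # True: s is restored
```

With a non-trivial h the inverse filter w must of course be estimated, which is precisely the blind problem addressed below.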

Cost function
The mutual information of a random vector of dimension n, defined by:

I(Z) = Σ_{i=1}^{n} H(z_i) - H(Z)    (4)

is extended to a vector of infinite dimension, using the notion of entropy rate of stationary stochastic processes [4]:

I(Z) = H(z(τ)) - H̄(Z),  with  H̄(Z) = lim_{T→∞} H(z(-T), …, z(T)) / (2T + 1)    (5)

where τ is arbitrary due to the stationarity assumption. Note that I(Z) is always non-negative and vanishes iff Z is i.i.d. Now, since S is stationary, and since h and w are time-invariant filters, Y is also stationary, and I(Y) is defined by:

I(Y) = H(y(τ)) - H̄(Y).

The entropy rate of Y decomposes through the two stages of the inverse structure, the memoryless function g:

H̄(X) = H̄(E) + E[log |g′(e(τ))|]    (6)

and the linear filter w:

H̄(Y) = H̄(X) + (1/2π) ∫_{-π}^{π} log |W(ω)| dω    (7)

Combining (6) and (7) in (5) leads finally to:

I(Y) = H(y(τ)) - E[log |g′(e(τ))|] - (1/2π) ∫_{-π}^{π} log |W(ω)| dω - H̄(E)    (8)
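The parameter-dependent terms of this criterion can be estimated directly from data. A minimal sketch, with estimator choices (histogram entropy, FFT grid for the filter log-gain) that are ours rather than the paper's:

```python
import numpy as np

def marginal_entropy(y, bins=64):
    """Histogram estimate of the marginal entropy H(y(tau)), in nats."""
    p, edges = np.histogram(y, bins=bins, density=True)
    widths = np.diff(edges)
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask]) * widths[mask])

def filter_log_gain(w, nfft=4096):
    """(1/2pi) * integral of log|W(omega)| d omega, on an FFT grid."""
    return np.mean(np.log(np.abs(np.fft.fft(w, nfft))))

# Sanity checks: a unit Gaussian has entropy 0.5*log(2*pi*e) ~ 1.419 nats,
# and a pure gain w = [2] contributes log(2) ~ 0.693.
rng = np.random.default_rng(1)
print(marginal_entropy(rng.normal(size=200_000)))  # close to 1.419
print(filter_log_gain([2.0]))                      # ~ 0.693
```

The remaining terms (the Jacobian of g and the constant H̄(E)) are handled analytically by the gradients derived next.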

Theoretical derivation of the inversion algorithm
To derive the optimization algorithm, we need the derivatives of I(Y) (8) with respect to the parameters of the linear part w and of the nonlinear function g.

Linear subsystem
The linear subsystem is naturally parameterised by the coefficients of the filter w. The derivative of I(Y) with respect to the coefficient w(t), corresponding to the t-th lag, is easily computed from (8):

∂I(Y)/∂w(t) = -E[ψ_Y(y(τ)) x(τ - t)] - w⁻¹(-t)    (9)

where ψ_Y = (log p_Y)′ is the score function of the output and x(τ) = g(e(τ)) is the input of the filter w. It leads to the following gradient descent algorithm (see [13] for details):

w(t) ← w(t) + μ (E[ψ_Y(y(τ)) x(τ - t)] + w⁻¹(-t))    (10)
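Numerically, this gradient can be sketched as follows; the circular-FFT approximation of w⁻¹ and the Gaussian score ψ(u) = -u used in the check are illustrative assumptions:

```python
import numpy as np

def inverse_filter_coeff(w, t, nfft=4096):
    """Lag-t coefficient of the inverse filter w^{-1} (circular FFT sketch)."""
    v = np.fft.ifft(1.0 / np.fft.fft(w, nfft)).real
    return v[t % nfft]

def grad_I_w(y, x, t, psi, w):
    """dI(Y)/dw(t) = -E[psi_Y(y(tau)) x(tau - t)] - w^{-1}(-t)."""
    T = len(y)
    tau = np.arange(max(t, 0), T + min(t, 0))    # indices where x[tau - t] exists
    data_term = -np.mean(psi(y[tau]) * x[tau - t])
    return data_term - inverse_filter_coeff(w, -t)

# Check against w = [1, 0.5]: its inverse has impulse response (-0.5)^k, k >= 0.
print(inverse_filter_coeff([1.0, 0.5], 0))   # ~ 1.0
print(inverse_filter_coeff([1.0, 0.5], 1))   # ~ -0.5
```

In practice the expectation is replaced by an empirical mean over the training block, and ψ_Y comes from the score-estimation block.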

Nonlinear subsystem
For this subsystem, we propose two different models for the function g. Both methods are parametric; the first is based on a polynomial model of g, and the second on a multilayer perceptron.

Polynomial parameterization
The function g is modelled as a polynomial, g(u) = Σ_n a_n uⁿ. The gradient descent algorithm for g requires the derivatives of I(Y) (8) with respect to the coefficients a_n of the polynomial. The derivatives of the right-hand-side terms of (8) are:

∂H(y(τ))/∂a_n = -E[ψ_Y(y(τ)) Σ_k w(k) e(τ - k)ⁿ]    (12)

and

∂E[log |g′(e(τ))|]/∂a_n = E[n e(τ)ⁿ⁻¹ / g′(e(τ))]    (13)

Combining (12) and (13) leads to:

∂I(Y)/∂a_n = -E[ψ_Y(y(τ)) Σ_k w(k) e(τ - k)ⁿ] - E[n e(τ)ⁿ⁻¹ / g′(e(τ))]    (14)

Equation (14) is the gradient of I(Y) with respect to the polynomial coefficients a_n, and will be used to provide a gradient descent algorithm for estimating g.
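A direct numerical transcription of this polynomial gradient can be sketched as follows; the score function ψ is assumed given (in practice it comes from the score-estimation block), and the model order and data are toy choices:

```python
import numpy as np

def grad_I_poly(e, w, a, psi):
    """Gradient of I(Y) w.r.t. the coefficients a_n of
    g(u) = sum_n a[n] * u**n (a[0] is the constant term)."""
    x = np.polyval(a[::-1], e)                    # x(tau) = g(e(tau))
    y = np.convolve(x, w, mode="same")
    gp = np.polyval(np.polyder(a[::-1]), e)       # g'(e(tau))
    grads = np.empty(len(a))
    for n in range(len(a)):
        xn = np.convolve(e**n, w, mode="same")    # sum_k w(k) e(tau-k)^n
        entropy_term = -np.mean(psi(y) * xn)
        jac_term = 0.0 if n == 0 else -np.mean(n * e**(n - 1) / gp)
        grads[n] = entropy_term + jac_term
    return grads

rng = np.random.default_rng(0)
e = rng.uniform(-1.0, 1.0, 2000)
g = grad_I_poly(e, [1.0], np.array([0.0, 1.0, 0.0, 0.1]), lambda u: -u)
print(g.shape)   # (4,)
```

Note the division by g′(e): the polynomial must stay locally invertible on the data range during adaptation, which matches the invertibility assumption on g.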

Neural network parametrization
In this subsection, we model g using a multilayer perceptron with one hidden layer:

g(u) = Σ_{j=1}^{J} m_j tanh(n_j u + θ_j)    (15)

The gradient descent algorithm for g requires the derivatives of I(Y) (8) with respect to the network coefficients m_j, n_j and θ_j. Writing z_j(τ) = n_j e(τ) + θ_j, so that g′(u) = Σ_j m_j n_j sech²(n_j u + θ_j), the chain rule gives:

∂I(Y)/∂m_j = -E[ψ_Y(y(τ)) Σ_k w(k) tanh(z_j(τ - k))] - E[n_j sech²(z_j(τ)) / g′(e(τ))]    (16)

∂I(Y)/∂n_j = -E[ψ_Y(y(τ)) Σ_k w(k) m_j e(τ - k) sech²(z_j(τ - k))] - E[m_j sech²(z_j(τ)) (1 - 2 n_j e(τ) tanh(z_j(τ))) / g′(e(τ))]    (17)

∂I(Y)/∂θ_j = -E[ψ_Y(y(τ)) Σ_k w(k) m_j sech²(z_j(τ - k))] + E[2 m_j n_j sech²(z_j(τ)) tanh(z_j(τ)) / g′(e(τ))]    (18)

Equations (16), (17) and (18) are the gradients of I(Y) with respect to the network coefficients, and will be used for deriving a gradient descent algorithm for g.
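The m_j gradient can be transcribed directly (the n_j and θ_j cases only change the inner derivative factors); the tanh hidden units, hidden-layer size and data below are illustrative choices of ours:

```python
import numpy as np

def grad_I_m(e, w, m, n, th, psi):
    """Gradient of I(Y) w.r.t. the output weights m of
    g(u) = sum_j m[j] * tanh(n[j]*u + th[j])."""
    z = np.outer(e, n) + th                  # (T, J) hidden pre-activations
    t = np.tanh(z)
    s2 = 1.0 - t**2                          # sech^2(z)
    gp = s2 @ (m * n)                        # g'(e(tau))
    y = np.convolve(t @ m, w, mode="same")   # y(tau)
    # entropy term: -E[psi(y(tau)) * sum_k w(k) tanh(z_j(tau-k))]
    tw = np.column_stack([np.convolve(t[:, j], w, mode="same")
                          for j in range(t.shape[1])])
    entropy_term = -np.mean(psi(y)[:, None] * tw, axis=0)
    # Jacobian term: -E[n_j * sech^2(z_j(tau)) / g'(e(tau))]
    jac_term = -np.mean((n * s2) / gp[:, None], axis=0)
    return entropy_term + jac_term

rng = np.random.default_rng(0)
e = rng.uniform(-1.0, 1.0, 2000)
g = grad_I_m(e, [1.0], np.array([1.0, 0.5]), np.array([1.0, 2.0]),
             np.array([0.0, 0.1]), lambda u: -u)
print(g.shape)   # (2,)
```

As with the polynomial model, g′(e) must stay bounded away from zero on the data for the Jacobian term to be well defined.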

Experimental results
To demonstrate the efficiency of the previous algorithms, we simulate a difficult situation, where the input sequence s(t) is an i.i.d. random sequence, filtered by a non-minimum-phase FIR filter h = [-0.082, 0, -0.1793, 0, 0.6579, 0, 0.1793, 0, -0.082] and followed by the nonlinear distortion f(u) = 0.1u + tanh(5u) (see figure 2). Note especially, in Fig. 2 (right), the saturation due to the function tanh(.). The frequency response of h, shown in figure 3, confirms that h is a non-minimum-phase filter. The algorithm was trained with a sample size of T = 1000. The length of the impulse response of w is set to 21, with equal lengths for the causal and anti-causal parts. Estimation results, shown in figures 4 and 5, demonstrate the efficiency of the two algorithms, with parametric models, either polynomial or neural, for estimating the inverse of the nonlinear function f.
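The simulated channel can be reproduced directly; the uniform input distribution below is our illustrative choice (the method only requires a non-Gaussian i.i.d. input):

```python
import numpy as np

# Channel of the experiment: non-minimum-phase FIR filter, saturating distortion.
h = np.array([-0.082, 0.0, -0.1793, 0.0, 0.6579, 0.0, 0.1793, 0.0, -0.082])
f = lambda u: 0.1 * u + np.tanh(5.0 * u)

rng = np.random.default_rng(0)
T = 1000                                    # sample size used for training
s = rng.uniform(-1.0, 1.0, T)               # non-Gaussian i.i.d. input
e = f(np.convolve(s, h, mode="same"))       # observed channel output

# f is invertible since f'(u) = 0.1 + 5*sech^2(5u) >= 0.1 > 0.
u = np.linspace(-3.0, 3.0, 1001)
print(np.all(np.diff(f(u)) > 0))            # True
# Some zeros of h lie outside the unit circle => non-minimum phase.
print(np.max(np.abs(np.roots(h))) > 1.0)    # True
```

The strict positivity of f′ guarantees that a single-valued inverse g exists, while the zeros of h outside the unit circle rule out a causal stable inverse, hence the two-sided (causal and anti-causal) impulse response chosen for w.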
As a quantitative performance measure, we use the output S/N, where N is the error power and S is the estimated signal power. After adequate processing (delaying and re-scaling of y(t)), one obtains S/N ≈ 18 dB with both the polynomial and neural models.
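The delay-and-scale compensation can be made explicit: align the best delay by cross-correlation and the scale by least squares before measuring S/N. This alignment procedure is our sketch, not necessarily the authors' exact one:

```python
import numpy as np

def output_snr_db(y, s):
    """Output S/N in dB: S = power of the aligned, rescaled estimated
    signal, N = power of the residual error."""
    c = np.correlate(y, s, mode="full")
    lag = int(np.argmax(np.abs(c))) - (len(s) - 1)   # y ~ a * s delayed by lag
    if lag >= 0:
        y2, s2 = y[lag:], s[:len(s) - lag]
    else:
        y2, s2 = y[:len(y) + lag], s[-lag:]
    a = np.dot(y2, s2) / np.dot(s2, s2)              # least-squares scale
    err = y2 - a * s2
    return 10.0 * np.log10(np.sum((a * s2) ** 2) / np.sum(err ** 2))

# Toy check: a delayed, rescaled copy of s with a small additive error.
rng = np.random.default_rng(0)
s = rng.uniform(-1.0, 1.0, 1000)
y = np.zeros_like(s)
y[3:] = 2.0 * s[:-3]
y += 0.01 * rng.normal(size=1000)
print(output_snr_db(y, s))     # around 40 dB
```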

Conclusions
In this paper, two blind parametric procedures for the inversion of a nonlinear channel (Wiener system) were proposed. Contrary to other blind identification procedures, the system input is not assumed to be Gaussian. If the input is not i.i.d., but is a linear filtering of an i.i.d. noise (the so-called innovation process), the output restores the innovation; recovering the input then requires more information, for instance its distribution or covariance matrix. Moreover, the nonlinear subsystem is unknown and cannot be directly estimated, because its input is not available. The inversion procedure, in both cases, is based on the minimization of the mutual information rate of the inverse system output. The estimation of g is done according to a parametric model, using either a polynomial or a neural network. Both models lead to good results, even in difficult situations (hard nonlinearity and non-minimum-phase filter).