Features Extraction Based on the Discrete Hartley Transform for Closed Contour

In this paper the authors propose a new closed contour descriptor that can be seen as a Feature Extractor for closed contours based on the Discrete Hartley Transform (DHT). Its main characteristic is that it uses only half of the coefficients required by Elliptical Fourier Descriptors (EFD) to obtain a contour approximation with a similar error measure. The proposed closed contour descriptor provides an excellent capability of information compression, useful for a great number of AI applications. Moreover, it can provide scale, position and rotation invariance and, last but not least, both the parameterization and the shape reconstructed from the compressed set can be computed very efficiently by the fast DHT algorithm. This Feature Extractor can be useful when the application calls for reversible features and when the user needs an easy measure of the quality for a given level of compression, scalable from low to very high quality.


Introduction
Feature extraction is one of the common steps in a good number of AI applications, including machine learning, pattern recognition, data mining and computer vision. The goal of this step is to represent the information available in the data while minimizing its redundancy and hence its dimensionality. Some approaches to feature extraction, like Principal Components Analysis (PCA), are valid for many kinds of data; in other cases, however, feature extraction can provide a better compression ratio if the features are oriented to a specific kind of data or to a specific use of these data, e.g. the SIFT descriptors for images and computer vision [17,18]. In this paper we focus on a specific kind of data: silhouettes of objects [30], represented by a raw sequence of numbers extracted from the (x, y) coordinates of the objects' outline. All the examples we use have a silhouette with a single closed contour. A similar approach could be used for silhouettes represented with multiple closed contours, or even with one or more open contours, but this issue is out of the scope of this text. These sequences of (x, y) coordinates are usually the raw data for further quantitative shape analysis, which is often required in many applied fields [19] such as agronomy, medicine, genetics, ecology or taxonomy. In the analysis of biological shapes, the raw data are usually extracted from the silhouette contours of images [15,19]. In many applications the contours of the silhouettes are used to quantify or classify the data automatically: for example, [7] and [31] classified soybeans and flowers from the shape of their leaves or petals, and [5] distinguished fish species or populations from their respective otoliths.
One of the major problems when quantifying sets of contours automatically is the large amount of data involved in describing the biological shape. With a suitable feature extraction method, the most relevant information for a particular purpose can be represented with a reduced number of coefficients, and hence with fewer dimensions than the original data. Different feature extractors and descriptors for contours or silhouettes have been used: chain codes [25], radial descriptors [30], Zernike moments [11], skeletons and Medial Axis Transforms [14] and many others [32]. However, none of them fulfils all the desirable characteristics in such a diverse scope of applications. Sometimes it is useful for these features to be scale invariant, rotation invariant and translation invariant. Moreover, some applications need the extracted features to be able to reproduce the original contour, because a human expert needs to verify which characteristics of the shape are preserved by the set of selected features.
Elliptical Fourier Descriptors (EFD) are one of the reference feature extractors for biological shape contours [15]. They are especially suited to describing shapes with high detail and to differentiating relatively similar shapes, a task in which other shape descriptors fail [32]. Another reason for their intensive use in this field is the popularization of tools like SHAPE [10]. Moreover, they fulfil all the previous requirements and can be extracted with a fast algorithm, by means of the Fast Fourier Transform (FFT). However, EFD are not appropriate for non-rigid shapes or for shapes with occlusions; other feature extractors, like [13], perform better in these cases. EFD were first proposed by [12], and one of the reasons for their wide acceptance is that they can represent all kinds of closed curves while preserving the original shape information, when shape reconstruction is required, using only a limited number of coefficients. They also provide intuitive information about the number of coefficients required to preserve a given level of detail in the shapes: the quality of the shape scales easily with the number of coefficients used. The first coefficients describe the basic ellipses, and the higher-order coefficients add the details of the contour. Some examples applied to the characterization of biological contours of animals and plants can be found in [9,23,28,29]. Concerning the practical uses of EFDs, although the reconstruction of any discrete contour can be perfect with the appropriate number of EFD coefficients, in real applications it is mandatory to achieve a good balance between the preservation of the relevant shape information and the reduction of the data dimensionality; this is done by taking the first coefficients of the EFD. In some automatic classification problems this feature extraction step is the first data dimensionality reduction applied to the data [19].

Purpose and contributions of this paper
In this work a new 2D contour feature extractor is presented. It is based on the Discrete Hartley Transform (DHT) applied to the (x, y) outline of a silhouette with a single closed contour. The proposed descriptors, which we name Hartley Contour Descriptors (HCD), maintain the same good properties of interpretation and reconstruction as EFDs and, like EFDs, can be applied to all kinds of closed curves. They can be made robust to rotation, translation and scaling of the silhouette if the application requires it, and they provide a quality-scalable feature extractor depending on the number of coefficients used. The new parameterization, however, far exceeds the information compression power of EFDs, needing approximately half the number of coefficients to represent the same level of contour detail. To show these results we have established an error measure based on the Euclidean distance from the original shape to its reconstructed counterpart. With this error measure we can analytically verify the quality of the contour approximations and compare their performance with that of the EFDs for a given number of coefficients, although the differences are so clear that they are easily checked by direct visual inspection. As in the case of EFD, there are fast algorithms available for its calculation.
The work is organized as follows. Section 2 gives an EFD overview. Section 3 presents the new contour descriptors. Section 4 introduces a distance measure to evaluate the performance of the HCD. Section 5 introduces a classification problem with fish otoliths to validate the proposed Feature Extractor. Section 6 evaluates the performance of HCD with respect to EFD directly on the shapes and compares the results of EFD and HCD in a test classification problem. Finally, in Section 7 some conclusions are reported.
This paper is a significant extension of an earlier and much shorter version [21] presented at the CCIA'2013 Congress. The extension consists, first, of a complete rewriting of the Introduction, specially focused on an overview of existing methods in AI-related fields, mainly visual processing, image classification, preprocessing, pattern recognition and visual computing; second, of the inclusion of Section 5; and finally, of the extension of Section 6, which adds the results of the comparison between different shapes and the classification results with a Test Dataset of fish otoliths.

Elliptical Fourier Descriptors extractor overview
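As is well known, a continuous closed contour of a silhouette can be defined by the evolution of the outline coordinates x(t) and y(t) as t varies with period T; the periodicity is due to the nature of the closed contour, since the outline returns to its starting point. Moreover, due to their periodic nature, the contour coordinates can be expanded using the Fourier series and can be written in their equivalent real or complex forms. In real form, Eqs (1) and (2) read:

\[ x(t) = a_0 + \sum_{k=1}^{\infty} \left[ a_k \cos\left(\frac{2\pi k t}{T}\right) + b_k \sin\left(\frac{2\pi k t}{T}\right) \right] \tag{1} \]

\[ y(t) = c_0 + \sum_{k=1}^{\infty} \left[ c_k \cos\left(\frac{2\pi k t}{T}\right) + d_k \sin\left(\frac{2\pi k t}{T}\right) \right] \tag{2} \]

where:

\[ a_0 = \frac{1}{T} \int_0^T x(t)\, dt, \qquad c_0 = \frac{1}{T} \int_0^T y(t)\, dt \]

with:

\[ a_k = \frac{2}{T} \int_0^T x(t) \cos\left(\frac{2\pi k t}{T}\right) dt, \qquad b_k = \frac{2}{T} \int_0^T x(t) \sin\left(\frac{2\pi k t}{T}\right) dt \]

\[ c_k = \frac{2}{T} \int_0^T y(t) \cos\left(\frac{2\pi k t}{T}\right) dt, \qquad d_k = \frac{2}{T} \int_0^T y(t) \sin\left(\frac{2\pi k t}{T}\right) dt \]

The real coefficients a_k, b_k, c_k and d_k are an alternative way to describe the outline of the silhouette perfectly and are known as Elliptic Fourier coefficients. From [3] it can be seen that the coefficients a_0 and c_0 only represent the position of the centroid of the shape. If a_0 and c_0 take the value zero, the contour is centred at the origin, and the features are then independent of the position in the coordinate space. The contour approximation based on the EFDs is achieved by selecting a reduced set of coefficients, that is, by limiting the number of harmonics to K as in Eqs (3) and (4):

\[ x_K(t) = a_0 + \sum_{k=1}^{K} \left[ a_k \cos\left(\frac{2\pi k t}{T}\right) + b_k \sin\left(\frac{2\pi k t}{T}\right) \right] \tag{3} \]

\[ y_K(t) = c_0 + \sum_{k=1}^{K} \left[ c_k \cos\left(\frac{2\pi k t}{T}\right) + d_k \sin\left(\frac{2\pi k t}{T}\right) \right] \tag{4} \]

The approximation of x_K(t) and y_K(t) to x(t) and y(t) improves as K increases.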
The contours of digital 2D silhouettes have a discrete nature, and what we really have are the discrete signals x_n and y_n, which can be thought of as sampled versions of x(t) and y(t) at the instants t = nT/N, where n goes from 0 to N − 1. In practice, then, we have N pairs of points (x_n, y_n) in a fundamental period. The discrete versions of Eqs (1)-(4) are obtained, first, by replacing t by the discrete values nT/N (n = 0, ..., N − 1) and, second, by taking into account that the lowest discrete frequency that can be represented, with N as the fundamental period, is ω = 2π/N, and that the rest of the frequencies are multiples of it: ω_k = 2πk/N. The algorithms to compute the discrete coefficients are well known and can be found in [15]. Figure 1 shows the contour of a butterfly and its x and y coordinates. Figure 2 shows a reconstruction of that contour and its coordinates from a reduced set of EFD coefficients.

Schematic steps for the EFD Feature Extractor
The basic algorithm of the EFD Feature Extractor for a digital silhouette represented by a closed contour can be summarized in 3 steps:
• Outline from the silhouette.
• EFD of the contour.
• Quality measure: selection of the K first coefficients.
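The last two steps can be illustrated with a minimal sketch. It assumes that the outline is already available as N points sampled uniformly in the parameter t (t = nT/N), that the discrete coefficients are computed by direct sums rather than by the FFT, and that K < N/2. The function names efd_coefficients and efd_reconstruct are hypothetical and are not taken from [15].

```python
import math


def efd_coefficients(xs, ys, K):
    """Discrete Fourier coefficients of a closed contour, harmonics 1..K.

    Assumes uniform sampling in t and K < N/2 (illustrative sketch only).
    Returns (a0, c0, [(a_k, b_k, c_k, d_k), ...]).
    """
    N = len(xs)
    a0 = sum(xs) / N  # x coordinate of the centroid
    c0 = sum(ys) / N  # y coordinate of the centroid
    harmonics = []
    for k in range(1, K + 1):
        cos_k = [math.cos(2 * math.pi * k * n / N) for n in range(N)]
        sin_k = [math.sin(2 * math.pi * k * n / N) for n in range(N)]
        a = 2 / N * sum(x * c for x, c in zip(xs, cos_k))
        b = 2 / N * sum(x * s for x, s in zip(xs, sin_k))
        c = 2 / N * sum(y * c_ for y, c_ in zip(ys, cos_k))
        d = 2 / N * sum(y * s for y, s in zip(ys, sin_k))
        harmonics.append((a, b, c, d))
    return a0, c0, harmonics


def efd_reconstruct(a0, c0, harmonics, N):
    """Rebuild N contour points from the centroid and the kept harmonics."""
    xs, ys = [], []
    for n in range(N):
        x, y = a0, c0
        for k, (a, b, c, d) in enumerate(harmonics, start=1):
            angle = 2 * math.pi * k * n / N
            x += a * math.cos(angle) + b * math.sin(angle)
            y += c * math.cos(angle) + d * math.sin(angle)
        xs.append(x)
        ys.append(y)
    return xs, ys
```

For an ellipse a single harmonic (K = 1) already reproduces the contour exactly; more detailed outlines need a larger K.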

Preliminary
The new Feature Extractor replaces the EFD in the previous method with another transform, the Discrete Hartley Transform (DHT). The DHT is a linear and invertible operation which transforms a sequence of N real numbers x_0, ..., x_{N−1} into a new sequence of N real numbers H_0, ..., H_{N−1} according to the formula in Eq. (5):

\[ H_k = \sum_{n=0}^{N-1} x_n \, \mathrm{cas}\left(\frac{2\pi n k}{N}\right), \qquad k = 0, \ldots, N-1 \tag{5} \]

where the function cas(·) is defined as:

\[ \mathrm{cas}(\theta) = \cos(\theta) + \sin(\theta) \]

The transform is inverted by the following operation of Eq. (6):

\[ x_n = \frac{1}{N} \sum_{k=0}^{N-1} H_k \, \mathrm{cas}\left(\frac{2\pi n k}{N}\right), \qquad n = 0, \ldots, N-1 \tag{6} \]

In our definition, the factor 1/N is associated with the inverse transform to maintain consistency with the DFT (Discrete Fourier Transform). The DHT was originally proposed by Bracewell in 1984. With respect to the DFT, the DHT has the advantage of being a purely real transform. Some of its properties and algorithms can be found in [2,4] and [3].
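The definition can be checked numerically with a minimal sketch that evaluates the transform directly from its sum, with the factor 1/N placed in the inverse as stated above. The function names dht and idht are illustrative; this direct form costs on the order of N^2 operations and is meant only to make the definition concrete.

```python
import math


def cas(theta):
    """cas(theta) = cos(theta) + sin(theta)."""
    return math.cos(theta) + math.sin(theta)


def dht(x):
    """Direct Discrete Hartley Transform of a real sequence."""
    N = len(x)
    return [
        sum(x[n] * cas(2 * math.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]


def idht(H):
    """Inverse DHT: the same kernel as the forward transform, scaled by 1/N."""
    N = len(H)
    return [value / N for value in dht(H)]
```

Because the forward and inverse transforms share the same real kernel, idht(dht(x)) returns x up to rounding error.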

New contour descriptors
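Taking all this information into account, we propose the signal expansion for the contour coordinates x_n and y_n in terms of the Hartley coefficients in the form of Eqs (7) and (8):

\[ x_n = \frac{1}{N} \sum_{k=0}^{N-1} o_k \, \mathrm{cas}\left(\frac{2\pi n k}{N}\right) \tag{7} \]

\[ y_n = \frac{1}{N} \sum_{k=0}^{N-1} p_k \, \mathrm{cas}\left(\frac{2\pi n k}{N}\right) \tag{8} \]

where the coefficients o_k and p_k are:

\[ o_k = \sum_{n=0}^{N-1} x_n \, \mathrm{cas}\left(\frac{2\pi n k}{N}\right) \tag{9} \]

\[ p_k = \sum_{n=0}^{N-1} y_n \, \mathrm{cas}\left(\frac{2\pi n k}{N}\right) \tag{10} \]

The new feature extractor, with a scalable compression capability, is obtained by selecting K coefficients of the Discrete Hartley Transform. The contour approximation is obtained by reconstructing the coordinates with a limited number of coefficients, choosing K < N. Then, the sequences representing the coordinates of the approximation with K elements, x̃_{n,K} and ỹ_{n,K}, can be written as Eqs (11) and (12), for n = 0, ..., N − 1.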
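The selection and reconstruction step can be sketched as follows. Which K coefficients are retained is an assumption of this sketch and not a statement of the authors' rule: here the lowest frequencies are kept in the order 0, 1, N − 1, 2, N − 2, and so on, because the indices k and N − k carry the same frequency. The factor 1/N is placed in the reconstruction, following the convention of the inverse DHT. All function names are hypothetical.

```python
import math


def cas(theta):
    return math.cos(theta) + math.sin(theta)


def hartley_coefficients(seq):
    """Hartley coefficients of one coordinate sequence (direct sum)."""
    N = len(seq)
    return [
        sum(seq[n] * cas(2 * math.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]


def low_frequency_indices(N, K):
    """ASSUMED selection rule: the K lowest-frequency indices 0, 1, N-1, 2, N-2, ..."""
    order = sorted(range(N), key=lambda k: (min(k, N - k), k))
    return order[:K]


def hcd_reconstruct(coeffs, keep):
    """Rebuild the sequence using only the coefficients whose index is in keep."""
    N = len(coeffs)
    return [
        sum(coeffs[k] * cas(2 * math.pi * n * k / N) for k in keep) / N
        for n in range(N)
    ]
```

With this ordering, K = 3 coefficients per coordinate are enough to rebuild an ellipse exactly, and K = N reproduces any contour up to rounding error.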

P. Martí-Puig et al. / Features extraction based on the Discrete Hartley
Transform for closed contour 107

Fast algorithms and 3D extension
Regarding the complexity of computing the features, a direct matrix-vector multiplication requires a number of multiplications proportional to N^2, N being the length of the vector. However, there are fast algorithms for the DHT that only require on the order of N log_2 N multiplications. The DHT can be computed via the FFT or directly via a fast algorithm [27]. These fast algorithms are analogous to the ones existing for the FFT [27]. Advances in the computation of the DHT sometimes appear in parallel with advances in the FFT case; see [6].
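The FFT route can be sketched as follows. For a real input with DFT F_k (negative-exponent convention), the Hartley coefficient is H_k = Re(F_k) − Im(F_k). The radix-2 FFT below is a textbook recursive version that assumes the length is a power of two; it is an illustration and not the dedicated fast Hartley algorithm of [27].

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * math.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out


def fast_dht(x):
    """DHT of a real sequence via the FFT: H_k = Re(F_k) - Im(F_k)."""
    return [F.real - F.imag for F in fft(x)]


def direct_dht(x):
    """Reference O(N^2) evaluation of the DHT definition, for comparison."""
    N = len(x)
    return [
        sum(
            x[n] * (math.cos(2 * math.pi * n * k / N) + math.sin(2 * math.pi * n * k / N))
            for n in range(N)
        )
        for k in range(N)
    ]
```

Both functions return the same coefficients up to rounding error; only the number of operations differs.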
The compression strategy can be extended to 3D contours by performing the same operation on each coordinate. Figure 3 shows a closed 3D curve with its coordinates, and Fig. 4 shows several reconstructions of the coordinates of Fig. 3 obtained with a limited number of Hartley coefficients.

Error metrics
There are many measures to compare two different shapes. One of the most popular is the Chamfer Distance [1,24], which can be formalized [8] to measure a distance between two contours, C_a and C_b, that may have different lengths N_a and N_b. First we define the Chamfer Distance D(C_a, C_b) as the mean Euclidean distance from every point in C_a to the closest point in C_b, as represented in Eq. (13), where (x^a_i, y^a_i) and (x^b_j, y^b_j) denote the points of C_a and C_b:

\[ D(C_a, C_b) = \frac{1}{N_a} \sum_{i=0}^{N_a-1} \sqrt{\left(x^a_i - x^b_{J(i)}\right)^2 + \left(y^a_i - y^b_{J(i)}\right)^2} \tag{13} \]

with the closest point subindex J(i) defined in Eq. (14):

\[ J(i) = \mathrm{ArgMin}\left[ \sqrt{\left(x^a_i - x^b_j\right)^2 + \left(y^a_i - y^b_j\right)^2},\ j \right] \tag{14} \]

where ArgMin[f, x] gives the position x_min where f is minimized.
We can define the Chamfer Error between two contours C_a and C_b as the mean of the Chamfer Distances from C_a to C_b and from C_b to C_a; it can be expressed with Eq. (15):

\[ E_{\mathrm{Chamfer}}(C_a, C_b) = \frac{1}{2} \left[ D(C_a, C_b) + D(C_b, C_a) \right] \tag{15} \]

The Chamfer Distance, and its corresponding Chamfer Error defined in Eq. (15), has complexity O(n^2), although more efficient algorithms can be found. We can, however, introduce a simplified error metric E_K in order to analytically measure the quality of the approximations for a given number K of coefficients; the number K can be seen as an inverse measure of the compression. This simplified alternative has complexity O(n), but it is only useful if the shapes are well aligned and have the same number of samples. It is measured between the original shape and a reconstructed contour (both of the same length N), it is based on the Euclidean distance, and it produces a measure of the quality of the proposed Feature Extractor for a given number of coefficients K, expressed in Eq. (16). In Eq. (16) the subindex K represents the number of coefficients used in the contour reconstruction. The same measure in dB is 10 log_10 E_K. As we will see, the proposed parameterization HCD exceeds the information compression power of EFDs, needing approximately half the number of coefficients to represent the same level of contour detail.
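The Chamfer Distance and the Chamfer Error described above can be sketched with a brute-force search for the closest point, which has the quadratic complexity mentioned in the text. Contours are assumed to be lists of (x, y) tuples, and the function names are illustrative.

```python
import math


def chamfer_distance(ca, cb):
    """Mean Euclidean distance from each point of ca to its closest point in cb."""
    total = 0.0
    for (xa, ya) in ca:
        total += min(math.hypot(xa - xb, ya - yb) for (xb, yb) in cb)
    return total / len(ca)


def chamfer_error(ca, cb):
    """Symmetric error: mean of the two directed Chamfer Distances."""
    return 0.5 * (chamfer_distance(ca, cb) + chamfer_distance(cb, ca))
```

The directed distance is not symmetric: a contour contained in another one has zero distance to it, while the reverse distance is positive, which is why the error averages both directions.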

Classification of fishes species from otoliths
To evaluate the effectiveness of this Feature Extraction method for closed contours, we test it in a complete test classifier. In [22] the authors presented an Automatic Taxon Identification (ATI) system, reproduced in Fig. 5. The ATI is able to classify fish species from the shape of a query otolith image. The otoliths of our Test Dataset are obtained from the records of AFORO [16], a web-based environment for shape analysis of fish otoliths and, at the same time, a database of otoliths. The procedure for each query image is as follows: first, a silhouette is extracted from the high-quality otolith image (in these cases a simple segmentation using the Otsu method [20] is enough, as the images are highly contrasted); then the outline of the closed contour is obtained with a morphological contour extractor [26]; and finally, before entering the classifier, a Feature Extractor is used to reduce the dimensionality of the data. The experimental classifier distinguishes the 10 classes of the Test Dataset, represented in Table 1, with a one-versus-all strategy using 10 linear SVM classifiers, one for each class. It is important to notice that we are interested in comparing the performance of the Feature Extraction phase, not in the absolute values obtained by the classifiers; that is the reason why we have not tested other classifiers that may be more suitable for the composition and nature of the data. The experiment shows the classification results when modifying two configuration values of the global ATI system: the number of points of the contours N, and the order of the Feature Extractors K. The value N is used to normalize the contours of different specimens, and can usually go from 4 to 1024 when dealing with otoliths. The value L is the total number of dimensions in the classification space; it is related to the order K of the transformed domain through a relation that depends on whether we use the EFD (Elliptic Fourier Descriptors) or the HCD (Hartley Contour Descriptors). When L is low, the contour is represented with basic ellipses and the selected features do not represent the detail of the contours, although the classifiers work in a low-dimensional space (and their search engine is easier to train and generalize). When L is higher, the contour is represented with more detail; however, the classifier has a higher dimension and thus becomes unstable and sometimes cannot generalize.
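The normalization of every contour to a common number of points N can be sketched as follows. The text does not state how this normalization is performed, so the sketch assumes resampling at equal arc-length steps along the closed polygon defined by the outline points; the function name resample_closed_contour is hypothetical.

```python
import math


def resample_closed_contour(points, N):
    """Resample a closed polygonal contour to N points equally spaced in arc length.

    ASSUMPTION: equal arc-length spacing is an illustrative choice, not the
    normalization used by the ATI system.
    """
    M = len(points)
    # Cumulative arc length at each vertex, closing the loop back to the first one.
    cumulative = [0.0]
    for i in range(M):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % M]
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cumulative[-1]
    resampled = []
    segment = 0
    for j in range(N):
        s = total * j / N  # target arc length of the j-th output point
        while cumulative[segment + 1] < s:
            segment += 1
        x0, y0 = points[segment]
        x1, y1 = points[(segment + 1) % M]
        length = cumulative[segment + 1] - cumulative[segment]
        t = (s - cumulative[segment]) / length if length > 0 else 0.0
        resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return resampled
```

After this step every specimen is described by the same number of (x, y) pairs, so the transforms of different contours have the same length and their first K coefficients can be compared.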

Error measures results
Figures 6 and 7 graphically compare the information compression power of EFD with that of the proposed parameterization method, HCD. The results are shown using a butterfly contour. In the top left section of these two figures the original contour is plotted, while in all the other subplots the contours reconstructed with both descriptors, using the same limited number of coefficients, are represented simultaneously. Figure 8 compares the reconstruction errors of EFD and the proposed method for the same butterfly contour, quantified using the error measure of Eq. (16) on a logarithmic scale. Note that the same error is achieved with half of the coefficients when the HCD are used instead of EFD. As an example, the quality of the contour approximation given by the EFDs with 100 coefficients is reached with only 50 coefficients by means of the proposed HCD method. To provide another example, the next error measure is performed on one of the fish otoliths of the AFORO Database [16]. The upper part of Fig. 9 shows an otolith image together with its contour, and the lower part compares the errors of the reconstructions obtained from the EFDs and from the HCD. Note again that, using the proposed HCD descriptors, the reconstruction error obtained with EFDs is achieved using half the number of coefficients.

Average errors results for the Test Dataset
In this section we compare the error results with the two Error metrics mentioned above: in Fig. 10 with the Chamfer Error, and in Fig. 11 with our own Error metric. To obtain a broader view of these error measures, we consider the average error over the whole Test Dataset represented in Table 1 for different numbers of reconstruction coefficients K, instead of a single shape like the ones used in Figs 8 and 9.

Classification results with EFD vs HCD
To finish this series of comparative studies, we compare the classification performance of the two Feature Extractors (EFD vs. HCD) with the ATI system described in Section 5. Figure 12 shows a three-dimensional graph with the mean value of the correct classification results. The graph shows how the number of correct answers fluctuates with the number of contour points N and the dimensionality of the data L. For a deeper explanation of the Test Dataset and of the results for the EFD feature extractor, readers are referred to [22]. The resulting graph is the same for either Feature Extractor; however, the recommended one is HCD because, for any given transform order K, it needs half of the coefficients of its EFD counterpart to obtain similar classification results.

Conclusions and future work
In this work we have shown a Feature Extraction method for two-dimensional contours that exhibits more information compression than the Elliptic Fourier Descriptors. From the results reported in this work we conclude that, given the same number of coefficients representing a contour, the new contour descriptors and Feature Extractors based on the Discrete Hartley Transform (HCD, Hartley Contour Descriptors) clearly outperform their EFD counterpart in the quality and accuracy of the contour reconstruction. It must be noted that, before presenting these results, we had explored other shape descriptors associated with other discrete transforms. The Fast Fourier Transform, which is a complex transform, has the same performance as the EFD, and it is possible to obtain the EFD coefficients from the FFT ones. Other discrete real transforms were explored and tested to evaluate their compression capability, such as the Walsh Hadamard Transform (WHT) and the different families of the Discrete Cosine Transform (DCT). However, contour shape descriptors based on the DCT did not show a significant improvement in compression capability with respect to EFDs, and required a similar number of coefficients to reach a similar level of contour detail. The WHT-based descriptors, although they provide some computational advantages because they can be computed without performing multiplications, displayed worse results than the EFDs. The Discrete Hartley Transform was the one that provided the best compression results in the preliminary studies, which is the reason why it was selected for this work. We have also shown that this Feature Extraction ability can be useful for classification problems, reaching similar results with half of the dimensions in the Compressed Dataset thanks to the Feature Extractor. We also expect that the presented method could be used in some applications of pattern recognition and computer vision, due to its fast computation possibilities. Last but not least, this Feature Extraction method can be made invariant to translation, rotation and scale if necessary, using strategies similar to the ones used by its EFD counterpart in [19]. Moreover, due to their reversible nature, the descriptors offer a straightforward method to measure the quality at each compression level.

Fig. 3. On the left, the x, y and z coordinates of the 3D closed contour represented on the right. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-140620.)

Fig. 5. Scheme of the ATI (Automatic Taxon Identification). It obtains the best-matching otolith from the AFORO database for a query consisting of a high-quality otolith image.

Fig. 9. (Top) The image of an otolith on the left and, on its right, an image of the silhouette outline. (Bottom) Error measurements (dB) for the reconstruction of the upper contour using a number of coefficients that goes from 4 to 240. The EFD contour reconstructions are represented by the upper curve. Equivalent results with the new HCD descriptors are represented by the lower curve. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-140620.)

Fig. 10. Average error measurements (dB), using the Chamfer Error, over the Test Dataset of otolith contours reconstructed with K coefficients. The error of the EFD reconstructions is represented by the upper curve, and the equivalent results of the new HCD descriptor by the lower curve. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-140620.)

Fig. 12. Mean value of correct answers of the ATI system with either Feature Extractor. Results for each N (total length of the contours) and each K (order of the transformed coefficients). (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/AIC-140620.)

Table 1
Otoliths species in the Test Database with the corresponding characteristics: Shape, Size range of the fish in millimetres, number of elements in the test database