International Journal of Scientific & Engineering Research, Volume 6, Issue 5, May-2015 60

ISSN 2229-5518

Wavelet Method for Detecting and Modeling Anomalous Observations in Gaussian and Non-Gaussian Distributions

ABSTRACT: Wavelet analysis has recently been applied to many kinds of data because of its ability to analyze data fully at different resolutions and locations. In this paper, we present an approach for detecting and modeling aberrant observations, based on wavelet analysis, in Gaussian and non-Gaussian distributions. To characterize these distributions, 1020 observations were simulated from a normal distribution and augmented first with four ordinary (non-aberrant) values and later with four aberrant observations, giving 1024 observations in each case, since wavelet analysis is dyadic. It was found that, among the three distributions considered, the Normal (Gaussian) distribution with aberrant observations is the most efficient in detecting aberrant observations, while the Laplace (non-Gaussian) distribution is the optimal distribution for modeling them.

Index Terms: Wavelets, Outliers, resolution, Residuals, Distributions, Gaussian, Discrete, Analysis

Aberrant observations (outliers) are data points that are distinctly separate from the rest of the data, that is, observations that lie an abnormal distance from the other values in a data set. They can occur by chance in any distribution, but they often indicate either measurement error or a heavy-tailed population. They can also indicate faulty data, erroneous procedures, etc. Section 2 gives an overview of wavelet analysis, which uses both resolution and location to analyze data fully. Section 3 describes how outliers are detected using these distributions, which is the main goal of this paper. Section 4 presents the analysis of the residuals, while Section 5 interprets the results, gives the conclusion and indicates areas for further work.

Wavelet analysis is a statistical tool that can be used to extract information from any kind of data; wavelets are generally needed to analyze data fully at different resolutions (scales) and locations.

The Discrete Wavelet Transform (DWT) re-expresses a time series in terms of coefficients that are associated with a particular time and a particular dyadic scale $2^j$. These coefficients are fully equivalent to the original series, in that the series can be reconstructed exactly from its DWT coefficients.

The Discrete Wavelet Transform allows us to partition (decompose) the information in a time series into pieces that are associated with different scales and times. This decomposition is very close to the statistical technique known as the analysis of variance (ANOVA), so the DWT leads to a scale-based ANOVA that is quite analogous to the frequency-based ANOVA provided by the power spectrum R. Todd Ogden (Dec., 1996).

It effectively decorrelates a wide variety of time series that occur quite commonly in physical applications. This property is key to the use of the DWT in statistical methodology L. Li and G. Lee (2003).

IJSER © 2015 http://www.ijser.org


We begin with a discrete sequence of data $y = (y_1, y_2, \ldots, y_n)$, where each $y_i$ is a real number and $i$ is an integer ranging from 1 to $n$. We assume that the length $n$ of the sequence is a power of two, $n = 2^J$ for some $J \ge 0$. This should not be seen as a restriction, as the method can be modified for other $n$ Abraham Maslow (Dec., 1998). We call a sequence with $n = 2^J$ a dyadic one. The key information we extract is the "detail" in the sequence at different scales and different locations. By detail we mean the degree of difference or variation between successive observations of the vector, for example $d_1 = y_2 - y_1$, at a given scale and location.

Let $d_k$ denote the detail at location $k$:

$$d_k = y_{2k} - y_{2k-1} \qquad (2.1)$$

for $k = 1, 2, \ldots, n/2$; e.g. $d_1 = y_2 - y_1$, $d_2 = y_4 - y_3$, $d_3 = y_6 - y_5$, etc.

In equation (2.1), if $y_{2k}$ and $y_{2k-1}$ are similar, the coefficient $d_k$ will be very small; if they are exactly the same, $d_k$ is zero; and if they differ greatly, the coefficient will be very large. Thus $d_k$ encodes the difference between successive pairs of observations in the original $y$ vector. $d_k$ is known as the finest-scale detail Abraham Maslow (Dec., 1998).
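As an illustration of equation (2.1), the following minimal Python sketch computes the finest-scale details of a short sequence. The function name `finest_detail` and the data values are hypothetical and chosen only for this example; the value 9.0 stands in for an aberrant observation.

```python
def finest_detail(y):
    """Return d_k = y_{2k} - y_{2k-1} for k = 1..n/2 (1-based indices, eq. 2.1)."""
    if len(y) % 2 != 0:
        raise ValueError("sequence length must be even")
    # With 0-based Python lists, y_{2k} is y[2k-1] and y_{2k-1} is y[2k-2].
    return [y[2 * k - 1] - y[2 * k - 2] for k in range(1, len(y) // 2 + 1)]


# Hypothetical data: equal neighbours give zero detail, a jump gives a large one.
y = [1.0, 1.0, 2.0, 2.5, 3.0, 9.0, 4.0, 4.0]
d = finest_detail(y)
print(d)
```

The pair (3.0, 9.0) produces by far the largest detail, which is the behaviour the text describes: similar neighbours give small coefficients and dissimilar neighbours give large ones.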

The sequence $\{d_k\}_{k=1}^{n/2}$ is not the conventional first-difference vector, since differences such as $y_3 - y_2$ are missing from $\{d_k\}$. Each $d_k$ only gives information about location $2k$ and its neighbor at the finest possible scale of detail G.P. Nason (2008).

At coarser scale, for coarser detail, we form

$$C_k = y_{2k} + y_{2k-1} \qquad (2.2)$$

for $k = 1, 2, \ldots, n/2$. The sequence $\{C_k\}_{k=1}^{n/2}$ is a set of scaled local averages (scaled because the sums are not divided by 2). The information in $\{C_k\}$ is a coarsening of that in the original $y$ vector. The operation that turns $\{y_i\}$ into $\{C_k\}$ is similar to a moving-average smoothing operation, except that the averaging does not overlap consecutive pairs A. Dainotti, A. Pescape and G. Viorgio (2006). Each $C_k$ contains information originating from $y_{2k}$ and $y_{2k-1}$ (adjacent observations).

The original sequence $y$ consists of $2^J$ observations, while $\{d_k\}$ consists of $n/2 = 2^{J-1}$ observations.

If $j = J - 1$, then $d_k$ can be written as $d_{j,k}$, and the first-level averages (smooths) $C_k$ are renamed $C_{J-1,k}$, written $C_{j,k}$. The scale is the quantity $2^j$, the level is the integer $j$, and $k$ is the location. In the context of this work, larger (positive) $j$ corresponds to finer scales and smaller $j$ to coarser scales.

To obtain the next coarser detail, we repeat the operation of equation (2.1) on the finest-level averages $C_{J-1,k}$:

$$d_{J-2,\ell} = C_{J-1,2\ell} - C_{J-1,2\ell-1}$$

and, from equation (2.2), the next coarser averages are

$$C_{J-2,\ell} = C_{J-1,2\ell} + C_{J-1,2\ell-1} \qquad (2.3)$$

for $\ell = 1, 2, \ldots, n/4$. In terms of the original vector $y$,

$$C_{J-2,\ell} = (y_{4\ell} + y_{4\ell-1}) + (y_{4\ell-2} + y_{4\ell-3}),$$

so that for $\ell = 1$,

$$C_{J-2,1} = (y_4 + y_3) + (y_2 + y_1) = y_1 + y_2 + y_3 + y_4.$$

This is a kind of moving average, except that it is not divided by 4.

The "detail" coefficients $d_{j,k}$ are wavelet coefficients, and the coefficients $C_{j,k}$ are known as father wavelet or scaling function coefficients. This general pyramid algorithm is called the Haar wavelet transform. The original sequence can be reconstructed exactly from the wavelet coefficients $d_{j,k}$ and the last coefficient $C_{00}$ W. Lu and I. Traore (2005).

Sparsity is a characteristic behavior of wavelets: piecewise smooth functions have sparse representations G.P. Nason (2008).

To conserve information, we change equations (2.1) and (2.2) by introducing a constant $\alpha$ as follows:

$$d_k = \alpha (y_{2k} - y_{2k-1}) \qquad (2.4)$$

$$C_k = \alpha (y_{2k} + y_{2k-1}) \qquad (2.5)$$

The inputs $(y_{2k}, y_{2k-1})$ are transformed into the outputs $(d_k, C_k)$.
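The pyramid algorithm described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the results in this paper: the function names `haar_step`, `haar_pyramid` and `haar_inverse`, the parameter `alpha` (the constant $\alpha$ of equations (2.4) and (2.5), with `alpha=1.0` giving the plain sums and differences of equations (2.1) to (2.3)) and the example data are all hypothetical.

```python
import math


def haar_step(c, alpha=1.0):
    """One pyramid level: details alpha*(c_{2k} - c_{2k-1}), smooths alpha*(c_{2k} + c_{2k-1})."""
    pairs = [(c[i], c[i + 1]) for i in range(0, len(c), 2)]  # (c_{2k-1}, c_{2k})
    details = [alpha * (b - a) for a, b in pairs]
    smooths = [alpha * (b + a) for a, b in pairs]
    return details, smooths


def haar_pyramid(y, alpha=1.0):
    """Repeat haar_step on the smooths; return (detail levels, finest first) and the last smooth."""
    n = len(y)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    levels, c = [], list(y)
    while len(c) > 1:
        d, c = haar_step(c, alpha)
        levels.append(d)
    return levels, c[0]


def haar_inverse(levels, c_last, alpha=1.0):
    """Rebuild the original sequence from all details and the last smooth."""
    c = [c_last]
    for d in reversed(levels):
        rebuilt = []
        for dk, ck in zip(d, c):
            # Solve dk = alpha*(b - a), ck = alpha*(b + a) for the pair (a, b).
            rebuilt.extend([(ck - dk) / (2 * alpha), (ck + dk) / (2 * alpha)])
        c = rebuilt
    return c


y = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical dyadic sequence, n = 2^3

# Unnormalized sums and differences: the second-level smooth starts with y1+y2+y3+y4.
levels, last = haar_pyramid(y)
print(levels, last)

# With alpha = 2^(-1/2) the sum of squares of the coefficients equals that of y.
o_levels, o_last = haar_pyramid(y, alpha=2 ** -0.5)
energy_out = sum(v * v for lvl in o_levels for v in lvl) + o_last ** 2
energy_in = sum(v * v for v in y)
print(math.isclose(energy_out, energy_in))
```

Calling `haar_inverse(levels, last)` returns the original values, which illustrates the exact reconstruction from the detail coefficients and the last smooth coefficient.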


The (squared) norm of the output is

$$d_k^2 + C_k^2 = \alpha^2 (y_{2k}^2 - 2 y_{2k} y_{2k-1} + y_{2k-1}^2) + \alpha^2 (y_{2k}^2 + 2 y_{2k} y_{2k-1} + y_{2k-1}^2) = 2\alpha^2 (y_{2k}^2 + y_{2k-1}^2) \qquad (2.6)$$

where $y_{2k}^2 + y_{2k-1}^2$ is the squared norm of the input coefficients. That is, the input sequence can be thought to possess a norm defined by

$$\|y\|^2 = \sum_{i=1}^{n} y_i^2 .$$

We wish the norm of the output to equal the norm of the input, so we let $2\alpha^2 = 1$, and therefore $\alpha = 2^{-1/2}$. The discrete wavelet coefficients are then

$$d_k = \frac{y_{2k} - y_{2k-1}}{\sqrt{2}} \qquad (2.7)$$

Equation (2.7) can be rewritten as

$$d_k = g_0 y_{2k} + g_1 y_{2k-1} \qquad (2.8)$$

where $g_0 = 2^{-1/2}$ and $g_1 = -2^{-1/2}$. In general,

$$d_k = \sum_{\ell=-\infty}^{\infty} g_\ell \, y_{2k-\ell} \qquad (2.9)$$

where

$$g_\ell = \begin{cases} 2^{-1/2} & \text{for } \ell = 0 \\ -2^{-1/2} & \text{for } \ell = 1 \\ 0 & \text{otherwise} \end{cases} \qquad (2.10)$$

Equation (2.9) is similar to a filtering operation with coefficients $\{g_\ell\}_{\ell=-\infty}^{\infty}$ [10,11].

Another interesting component of the filter object is the H component, which is equal to the vector $(2^{-1/2}, 2^{-1/2})$ and is involved in a filtering operation, analogous to that of equation (2.9), that produces $C_k$ as

$$C_k = \sum_{\ell=-\infty}^{\infty} h_\ell \, y_{2k-\ell} \qquad (2.11)$$

where

$$h_\ell = \begin{cases} 2^{-1/2} & \text{for } \ell = 0 \\ 2^{-1/2} & \text{for } \ell = 1 \\ 0 & \text{otherwise} \end{cases} \qquad (2.12)$$

Like the orthonormal discrete Fourier transform (ODFT), the discrete wavelet transform (DWT) of a series $X_t$ is an orthonormal transform [5]. Let $\{W_n : n = 0, \ldots, N-1\}$ be the DWT coefficients. Then we can write $W = \mathcal{W} X$, where $W$ is a column vector of length $N = 2^J$ whose $n$th element is the $n$th DWT coefficient $W_n$, and $\mathcal{W}$ is an $N \times N$ real-valued matrix defining the DWT and satisfying $\mathcal{W}^T \mathcal{W} = I_N$. Orthonormality implies that $X = \mathcal{W}^T W$ and $\|W\|^2 = \|X\|^2$. Hence $W_n^2$ represents the contribution to the energy attributable to the DWT coefficient with index $n$.

Whereas ODFT coefficients are associated with frequencies, the $n$th wavelet coefficient $W_n$ is associated with a particular scale and with a particular set of times H. Nayyar and Ali A. Ghorbani (2006).

Explicitly, for $N = 16$, the rows $w_n^T$ of this matrix for $n = 0, 8, 12, 14$ and $15$ are

$$w_0^T = \left[ -\tfrac{1}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}},\ \underbrace{0, \ldots, 0}_{14\ \text{zeros}} \right]$$

$$w_8^T = \left[ -\tfrac{1}{2},\ -\tfrac{1}{2},\ \tfrac{1}{2},\ \tfrac{1}{2},\ \underbrace{0, \ldots, 0}_{12\ \text{zeros}} \right]$$

$$w_{12}^T = \left[ \underbrace{-\tfrac{1}{\sqrt{8}}, \ldots, -\tfrac{1}{\sqrt{8}}}_{4\ \text{entries}},\ \underbrace{\tfrac{1}{\sqrt{8}}, \ldots, \tfrac{1}{\sqrt{8}}}_{4\ \text{entries}},\ \underbrace{0, \ldots, 0}_{8\ \text{zeros}} \right]$$

$$w_{14}^T = \left[ \underbrace{-\tfrac{1}{4}, \ldots, -\tfrac{1}{4}}_{8\ \text{entries}},\ \underbrace{\tfrac{1}{4}, \ldots, \tfrac{1}{4}}_{8\ \text{entries}} \right]$$

$$w_{15}^T = \left[ \underbrace{\tfrac{1}{4}, \ldots, \tfrac{1}{4}}_{16\ \text{entries}} \right]$$

The remaining eleven rows are shifted versions of the above:


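$$w_1 = T^2 w_0,\quad w_2 = T^4 w_0,\quad \ldots,\quad w_7 = T^{14} w_0$$

$$w_9 = T^4 w_8,\quad w_{10} = T^8 w_8,\quad w_{11} = T^{12} w_8$$

$$w_{13} = T^8 w_{12}$$

where $T$ denotes the circular shift operator, so that $T^m$ moves the entries of a vector $m$ places to the right.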
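The structure of the rows listed above can be checked numerically. The sketch below builds the $16 \times 16$ Haar DWT matrix in the row order used in the text and verifies orthonormality and the shift relations. The helper names `haar_dwt_matrix`, `dot` and `circular_shift` are hypothetical, introduced only for this illustration.

```python
def haar_dwt_matrix(N):
    """Rows of the Haar DWT matrix for N = 2^J, finest scale first, scaling row last."""
    if N < 2 or N & (N - 1):
        raise ValueError("N must be a power of two, at least 2")
    J = N.bit_length() - 1
    rows = []
    for j in range(1, J + 1):
        half = 2 ** (j - 1)        # number of negative (and positive) entries
        value = 2.0 ** (-j / 2)    # magnitude that gives each row unit norm
        for start in range(0, N, 2 * half):
            row = [0.0] * N
            for i in range(half):
                row[start + i] = -value
                row[start + half + i] = value
            rows.append(row)
    rows.append([N ** -0.5] * N)   # final row: scaled overall average
    return rows


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def circular_shift(v, m):
    """Move the entries of v by m places to the right, wrapping around."""
    m %= len(v)
    return v[-m:] + v[:-m] if m else list(v)


W = haar_dwt_matrix(16)

# Orthonormality: the Gram matrix of the rows is the identity.
is_orthonormal = all(
    abs(dot(W[i], W[j]) - (1.0 if i == j else 0.0)) < 1e-12
    for i in range(16) for j in range(16)
)
print(is_orthonormal)

# Shift relations quoted in the text, e.g. w_1 = T^2 w_0 and w_13 = T^8 w_12.
print(circular_shift(W[0], 2) == W[1], circular_shift(W[12], 8) == W[13])
```

Because the rows are orthonormal, the sum of squared coefficients of any length-16 input equals the sum of squares of the input itself, which is the energy decomposition described above.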

Let us now define exactly what the notion of scale means. For a positive integer $\lambda$, let

$$\bar{X}_t(\lambda) = \frac{1}{\lambda} \sum_{l=0}^{\lambda-1} X_{t-l} \qquad (2.13)$$

that is, the average of the $\lambda$ contiguous values $X_{t-\lambda+1}, \ldots, X_t$ ending at time $t$ Donald B. Percival, Andrew T. Walden (2000)
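To make equation (2.13) concrete, the short sketch below computes the scale-$\lambda$ average and checks, on hypothetical data, that the Haar coefficient built from the row $w_8^T = [-\tfrac12, -\tfrac12, \tfrac12, \tfrac12, 0, \ldots, 0]$ equals the difference of two adjacent scale-2 averages. The function name `scale_average` and the 0-based time index are assumptions of this example.

```python
def scale_average(x, t, lam):
    """Average of the lam contiguous values x[t-lam+1], ..., x[t] (eq. 2.13, 0-based t)."""
    if lam < 1 or t - lam + 1 < 0 or t >= len(x):
        raise ValueError("need lam >= 1 and 0 <= t - lam + 1 <= t < len(x)")
    return sum(x[t - l] for l in range(lam)) / lam


x = [2.0, 4.0, 6.0, 8.0]  # hypothetical values X_0, ..., X_3

# Coefficient from the first four entries of row w_8: (-X_0 - X_1 + X_2 + X_3) / 2.
w8_coefficient = (-x[0] - x[1] + x[2] + x[3]) / 2

# The same quantity as a difference of adjacent averages at scale 2.
difference_of_averages = scale_average(x, 3, 2) - scale_average(x, 1, 2)
print(w8_coefficient, difference_of_averages)
```

In this sense a wavelet coefficient at a given scale measures the change between neighbouring averages at that scale.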

In this section, we assume that the higher the value of the residuals, the more anomalous the data Wei Lu, Mahbod Tavallaee and Ali A. Ghorbani (2008). As a result, in order to identify these outliers, the residuals of these distributions at different resolutions are obtained and compared to determine their rates of detection J. McHugh (2000) and P. Barford, J. Kline, D. Plonka and A. Ron (2002).

The purpose of analyzing the residuals of these distributions is to support our assumption in Section 3. The data analyzed consist of 1020 observations simulated from a Normal distribution. Since wavelet analysis is dyadic, we introduced four values lying between the minimum and maximum of the data set, giving 1024 observations, and analyzed the result as a Normal distribution without aberrant observations (NO) at different resolutions (j). These four values were then removed and replaced with four aberrant observations, and the contaminated data set was analyzed at different resolutions using the Normal (NW), Laplace and Cauchy distributions. The mean and standard deviation (residual) were obtained at each resolution using maximum likelihood estimation, which is more efficient than the conventional method. Since wavelet analysis is dyadic, the data were analyzed at band sizes 1024, 512, 256, 128, 64 and 32, corresponding to resolutions j = 10, 9, 8, 7, 6 and 5 respectively.
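The simulation design described above can be sketched as follows. This is only an illustration under several assumptions that are not taken from the paper: the random seed, the size and position of the four aberrant values, fitting on the raw series rather than at each wavelet resolution, and the helper names `normal_mle` and `laplace_mle`. The Cauchy fit is omitted because its maximum likelihood estimate has no closed form and needs numerical optimization.

```python
import math
import random
import statistics


def normal_mle(x):
    """Maximum likelihood estimates (mean, standard deviation) for a Normal model."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return mu, sigma


def laplace_mle(x):
    """Maximum likelihood estimates (location, scale) for a Laplace model:
    the median and the mean absolute deviation about the median."""
    loc = statistics.median(x)
    scale = sum(abs(v - loc) for v in x) / len(x)
    return loc, scale


rng = random.Random(2015)  # arbitrary seed, assumed for reproducibility
base = [rng.gauss(0.0, 1.0) for _ in range(1020)]
lo, hi = min(base), max(base)

# NO: four extra values inside the observed range, giving 1024 = 2^10 points.
clean = base + [rng.uniform(lo, hi) for _ in range(4)]

# NW: four hypothetical aberrant values well outside the observed range.
contaminated = base + [hi + 10.0, hi + 12.0, lo - 10.0, lo - 12.0]

print("NO  normal :", normal_mle(clean))
print("NW  normal :", normal_mle(contaminated))
print("NW  laplace:", laplace_mle(contaminated))
```

In this sketch the Normal standard deviation is inflated by the aberrant values, while the Laplace scale, which depends on absolute rather than squared deviations, moves much less. Note that the Laplace scale parameter is not itself a standard deviation (the Laplace standard deviation is $\sqrt{2}$ times the scale).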

Resolution level j | Band Size | NO Mean | NO StdDev | NW Mean | NW StdDev | Laplace Mean | Laplace StdDev | Cauchy Mean | Cauchy StdDev |

10 | 1024 | 0.04545 | 0.9869 | 0.0727 | 1.4858 | 0.0711 | 0.8613 | 0.0671 | 0.6255 |

9 | 512 | -0.0581 | 0.9470 | 0.0160 | 1.5503 | 0.1752 | 0.8681 | 0.1701 | 0.6326 |

8 | 256 | 0.0573 | 0.9738 | 0.0547 | 1.5059 | 0.0012 | 0.8975 | 0.0114 | 0.6599 |

7 | 128 | -0.2162 | 0.9549 | -0.0163 | 1.5041 | 0.2338 | 0.9059 | 0.2465 | 0.6198 |

6 | 64 | -0.0989 | 0.6676 | -0.1550 | 1.5739 | -0.1842 | 0.8498 | -0.1572 | 0.5021 |

5 | 32 | -0.1220 | 0.6514 | -0.2535 | 1.4314 | -0.0332 | 0.8505 | -0.0036 | 0.4851 |

NO: Normal distribution without aberrant observations

NW: Normal distribution with aberrant observations

From the above, the means and standard deviations of the coefficients for the Normal distribution without outliers (NO) at the different resolutions (j) or band sizes, with mean approximately 0 and standard deviation close to 1 (most clearly at the larger band sizes), confirm the absence of outliers. For the other three distributions (Normal with outliers (NW), Laplace and Cauchy), it was observed that, on average across the six resolutions, the Laplace distribution has a standard deviation closest to that of the Normal without outliers, followed by the Cauchy distribution and finally the Normal distribution with aberrant observations (NW).
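The comparison of standard deviations can be reproduced directly from the tabulated values. The sketch below uses the StdDev columns of the table and ranks the three fitted distributions by their mean absolute distance from the NO column. The choice of mean absolute distance as the summary measure, and the names `std`, `gaps` and `ranking`, are assumptions of this example rather than the procedure stated in the paper.

```python
# StdDev columns of the table, for j = 10, 9, 8, 7, 6, 5.
std = {
    "NO": [0.9869, 0.9470, 0.9738, 0.9549, 0.6676, 0.6514],
    "NW": [1.4858, 1.5503, 1.5059, 1.5041, 1.5739, 1.4314],
    "Laplace": [0.8613, 0.8681, 0.8975, 0.9059, 0.8498, 0.8505],
    "Cauchy": [0.6255, 0.6326, 0.6599, 0.6198, 0.5021, 0.4851],
}


def mean_abs_gap(name):
    """Mean absolute difference between a StdDev column and the NO column."""
    pairs = zip(std[name], std["NO"])
    return sum(abs(a - b) for a, b in pairs) / len(std["NO"])


gaps = {name: mean_abs_gap(name) for name in ("NW", "Laplace", "Cauchy")}
ranking = sorted(gaps, key=gaps.get)  # closest to NO first

# NW has the largest standard deviation at every resolution.
nw_always_largest = all(
    nw > max(no, la, ca)
    for no, nw, la, ca in zip(std["NO"], std["NW"], std["Laplace"], std["Cauchy"])
)
print(ranking, nw_always_largest)
```

With this summary the ordering is Laplace, then Cauchy, then NW. Row by row the ordering can differ: at j = 6 and j = 5 the tabulated Cauchy values are slightly nearer to the NO values than the Laplace values are.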

Since the Normal (Gaussian) distribution with aberrant observations has the highest standard deviation at all resolutions, and so departs most from the Normal distribution without aberrant observations, we conclude that among the three distributions it is the most efficient in detecting aberrant observations. On the other hand, the Laplace (non-Gaussian) distribution, whose standard deviations at the different resolutions are closest to those of the Normal distribution without aberrant observations,


is regarded as the optimal distribution for modeling aberrant observations among these distributions.

References

[1] Abraham Maslow, "Histogram Smoothing Via The Wavelet Transform," Journal of Computational and Graphical Statistics, Vol. 7, No. 4 (Dec., 1998)

[2] A. Dainotti, A. Pescape and G. Viorgio, "Wavelet-based Detection of DoS Attack," Proceedings of IEEE Global Telecommunication Conference, San Francisco, 2006

[3] Donald B. Percival, Andrew T. Walden, "Wavelet Methods for Time Series Analysis," Cambridge University Press (2000)

[4] G.P. Nason, Wavelet Methods in Statistics with R, Springer (2008)

[5] H. Nayyar and Ali A. Ghorbani, "Approximate Autoregressive Modeling for Network Attack Detection," Proceedings of the 4th Annual Conference on Privacy, Security and Trust, pp. 175-184, Markham, Canada, 2006

[6] J. McHugh, "Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory," ACM Transactions on Information and System Security, 3(4):262-294, 2000

[7] L. Li and G. Lee, "DDoS Attack Detection and Wavelets," Proceedings of the 12th International Conference on Communication and Networks, pp. 421-427, Texas, 2003

[8] P. Barford, J. Kline, D. Plonka and A. Ron, "A Signal Analysis of Network Traffic Anomalies," Proceedings of Internet Workshop 2002, Marseille, France, 2002

[9] R. Todd Ogden, "Essential Wavelets for Statistical Applications and Data Analysis," Birkhauser Boston (Dec., 1996)

[10] W. Lu and I. Traore, "A Novel Unsupervised Anomaly Detection Framework for Detecting Network Attacks in Real Time," Lecture Notes in Computer Science, Vol. 3810, pp. 96-109, Springer, 2005, Y.G. Desmedt et al. (Eds.)

[11] Wei Lu, Mahbod Tavallaee and Ali A. Ghorbani, "Detecting Network Anomalies Using Wavelet Basis Functions," CNSR, pp. 149-156, IEEE Computer Society (2008)

First Author

Shittu Olarewanju Ismail*

P.D.S.,B.Sc, M.Sc, M.Phil,Ph.D

Second Author

Aideyan Donald Osaro**

P.D.S.,B.Sc, M.Sc.
