International Journal of Scientific & Engineering Research, Volume 6, Issue 5, May-2015 60

ISSN 2229-5518

Wavelet Method for Detecting and Modeling Anomalous Observations in Gaussian and Non-Gaussian Distributions

Shittu Olarewanju Ismail¹, University of Ibadan, Department of Statistics, Ibadan, Oyo State, Nigeria; Aideyan Donald Osaro², Kogi State University, Department of Mathematical Sciences, Anyigba, Kogi State, Nigeria.

ABSTRACT: Wavelet analysis has recently been applied widely because it can analyze data at several resolutions and locations. In this paper, we present an approach to detecting and modeling aberrant observations based on wavelet analysis in Gaussian and non-Gaussian distributions. To characterize these distributions, 1020 observations were simulated from a normal distribution. Because wavelet analysis is dyadic, the data set was first extended with four ordinary observations and later, in their place, with four aberrant observations, giving 1024 observations in each case. It was found that the Normal (Gaussian) distribution with aberrant observations is the most efficient in detecting aberrant observations, while the Laplace (non-Gaussian) distribution is the optimal distribution for modeling aberrant observations among the three distributions considered.

Index Terms: Wavelets, Outliers, resolution, Residuals, Distributions, Gaussian, Discrete, Analysis

----------------------------------------♦------------------------------------------

1 INTRODUCTION

Aberrant observations (outliers) are data points that are distinctly separate from the rest of the data: an aberrant observation lies at an abnormal distance from the other values in a data set. Outliers can occur by chance in any distribution, but they often indicate either measurement error or a heavy-tailed population; they can also indicate faulty data or erroneous procedures. Section 2 gives an overview of wavelet analysis, which uses both resolution and location to analyze data. Section 3 describes how outliers are detected using these distributions, which is the main goal of this paper. Section 4 presents the analysis of the residuals, while Section 5 interprets the results, draws conclusions and points to areas of further work.

2 OVERVIEW OF WAVELET ANALYSIS

Wavelet analysis is a statistical tool that can be used to extract information from many kinds of data; it is generally used to analyze data at different resolutions (scales) and locations.
The Discrete Wavelet Transform (DWT) re-expresses a time series in terms of coefficients that are associated with a particular time and a particular dyadic scale $2^j$. These coefficients are fully equivalent to the original series, in that the series can be recovered exactly from its DWT coefficients.
The DWT allows us to partition (decompose) the information in a time series into pieces that are associated with different scales and times. This decomposition is very close to the statistical technique known as the analysis of variance (ANOVA), so the DWT leads to a scale-based ANOVA that is quite analogous to the frequency-based ANOVA provided by the power spectrum, R. Todd Ogden (Dec., 1996).
The DWT also effectively decorrelates a wide variety of time series that occur quite commonly in physical applications. This property is the key to the use of the DWT in statistical methodology, L. Li and G. Lee (2003).

Illustration

IJSER © 2015 http://www.ijser.org


We begin with a discrete sequence of data $y = (y_1, y_2, \ldots, y_n)$, where each $y_i$ is a real number and $i$ is an integer ranging from 1 to $n$. We assume that the length $n$ of the sequence is a power of two, $n = 2^J$ for some $J \ge 0$. This should not be seen as a restriction, as the method can be modified for other $n$, Abraham Maslow (Dec., 1998). We call a sequence with $n = 2^J$ a dyadic one. The key information we extract is the "detail" in the sequence at different scales and different locations. By detail we mean the degree of difference or variation between successive observations of the vector at the given scale and location. The detail at location $k$ is

$$d_k = y_{2k} - y_{2k-1}, \qquad k = 1, 2, \ldots, n/2, \tag{2.1}$$

e.g. $d_1 = y_2 - y_1$, $d_2 = y_4 - y_3$, $d_3 = y_6 - y_5$, etc.
In equation (2.1), if $y_{2k}$ and $y_{2k-1}$ are similar, the coefficient $d_k$ will be very small; if they are exactly the same, $d_k$ is zero; and if they differ greatly, the coefficient will be very large. Thus $d_k$ encodes the difference between successive pairs of observations in the original vector $y$. $d_k$ is known as the finest-scale detail, Abraham Maslow (Dec., 1998).
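As a minimal sketch of equation (2.1), the following Python function computes the finest-scale details of a sequence of even length. The function name `finest_details` and the sample vector are illustrative assumptions and do not come from the paper.

```python
def finest_details(y):
    """Return d_k = y_{2k} - y_{2k-1} for k = 1..n/2 (equation 2.1).

    The paper indexes y from 1; Python lists are 0-based, so the pair
    (y_{2k-1}, y_{2k}) is stored at positions (2k-2, 2k-1).
    """
    if len(y) % 2 != 0:
        raise ValueError("sequence length must be even")
    return [y[2 * k - 1] - y[2 * k - 2] for k in range(1, len(y) // 2 + 1)]


# Hypothetical example data of dyadic length n = 8 = 2^3.
y = [1, 1, 7, 9, 2, 8, 8, 6]
details = finest_details(y)
print(details)  # [0, 2, 6, -2]
```

The first pair is identical and gives a zero detail, while the third pair (2, 8) differs most and gives the largest detail, as described above.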

𝑛

The set $\{d_k\}_{k=1}^{n/2}$ is not the conventional first-difference vector, since differences such as $y_3 - y_2$ are missing from $\{d_k\}$. Each $d_k$ only gives information about $y_{2k}$ and its neighbor at the finest possible scale of detail, G.P. Nason (2008).

At Coarser Scale; for coarser detail

$$C_k = y_{2k} + y_{2k-1}, \qquad k = 1, 2, \ldots, n/2. \tag{2.2}$$

The set $\{C_k\}_{k=1}^{n/2}$ consists of scaled local averages (scaled because the sum is not divided by 2). The information in $\{C_k\}$ is a coarsening of that in the original vector $y$. The operation that turns $\{y_i\}$ into $\{C_k\}$ is similar to a moving-average smoothing operation, except that the averaging does not overlap consecutive pairs, A. Dainotti, A. Pescape and G. Viorgio (2006). Each $C_k$ contains information originating from the adjacent observations $y_{2k}$ and $y_{2k-1}$. The original sequence $y$ consists of $2^J$ observations, while $\{d_k\}$ consists of $n/2 = 2^{J-1}$ observations.

If $j = J - 1$, then $d_k$ can be written as $d_{j,k}$, and the first-level averages or smooths $C_k$ are renamed $C_{J-1,k}$, written as $C_{j,k}$. To obtain the next coarser level, we repeat the operations of equations (2.1) and (2.2) on the finest-level averages $C_{J-1,k}$, as given in equation (2.3).

2.1 SCALE/LEVEL TERMINOLOGY

The scale is the quantity $2^j$, the level is the integer quantity $j$, and $k$ is the location. Larger (positive) $j$ corresponds to finer scales and smaller $j$ to coarser scales in the context of this work. Applying equation (2.2) to the averages $C_{j,k}$ with $j = J - 1$ gives

$$C_{j-1,l} = C_{j,2l} + C_{j,2l-1}, \qquad l = 1, 2, \ldots, n/4. \tag{2.3}$$

In terms of the original vector $y$,

$$C_{j-1,l} = (y_{4l} + y_{4l-1}) + (y_{4l-2} + y_{4l-3}),$$

so that for $l = 1$

$$C_{j-1,1} = (y_4 + y_3) + (y_2 + y_1) = y_1 + y_2 + y_3 + y_4.$$

This is a kind of moving average, except that it is not divided by 4. The corresponding detail at this level is the difference of the same two averages, $d_{j-1,l} = C_{j,2l} - C_{j,2l-1}$.

The "detail" coefficients $d_{j,k}$ are wavelet coefficients, and the coefficients $C_{j,k}$ are known as father wavelet or scaling function coefficients. This general pyramid algorithm is called the Haar wavelet transform. The original sequence can be reconstructed exactly from the wavelet coefficients $d_{j,k}$ and the last coefficient $C_{0,0}$, W. Lu and I. Traore (2005).

2.2 Sparsity

Sparsity is a characteristic of wavelets: piecewise smooth functions have sparse representations, G.P. Nason (2008). To conserve information, we change equations (2.1) and (2.2) by introducing a constant $\alpha$ as follows:

$$d_k = \alpha (y_{2k} - y_{2k-1}), \tag{2.4}$$

$$C_k = \alpha (y_{2k} + y_{2k-1}). \tag{2.5}$$

The inputs $(y_{2k}, y_{2k-1})$ are transformed into the outputs $(d_k, C_k)$, and we consider the (squared) norm of the output.

The squared norm of the output pair is

$$d_k^2 + C_k^2 = \alpha^2 (y_{2k}^2 - 2 y_{2k} y_{2k-1} + y_{2k-1}^2) + \alpha^2 (y_{2k}^2 + 2 y_{2k} y_{2k-1} + y_{2k-1}^2) = 2\alpha^2 (y_{2k}^2 + y_{2k-1}^2), \tag{2.6}$$

where $y_{2k}^2 + y_{2k-1}^2$ is the squared norm of the input coefficients. We wish the norm of the output to equal the norm of the input, so we let $2\alpha^2 = 1$ and therefore $\alpha = 2^{-1/2}$. The discrete wavelet coefficients are then

$$d_k = \frac{y_{2k} - y_{2k-1}}{\sqrt{2}}. \tag{2.7}$$

Equation (2.7) can be rewritten as

$$d_k = g_0 y_{2k} + g_1 y_{2k-1}, \tag{2.8}$$

where $g_0 = 2^{-1/2}$ and $g_1 = -2^{-1/2}$. In general,

$$d_k = \sum_{l=-\infty}^{\infty} g_l \, y_{2k-l}, \tag{2.9}$$

where

$$g_l = \begin{cases} 2^{-1/2} & \text{for } l = 0, \\ -2^{-1/2} & \text{for } l = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{2.10}$$

Equation (2.9) is similar to a filtering operation with coefficients $\{g_l\}_{l=-\infty}^{\infty}$ [10,11]. In the same sense, the input sequence can be thought of as possessing a norm defined by

$$\|y\|^2 = \sum_{i=1}^{n} y_i^2.$$

Another interesting component of the filter object is the H component, which is equal to the vector $(2^{-1/2}, 2^{-1/2})$ and is involved in the filtering operation, analogous to equation (2.9), that produces $C_k$ as

$$C_k = \sum_{l=-\infty}^{\infty} h_l \, y_{2k-l}, \tag{2.11}$$

where

$$h_l = \begin{cases} 2^{-1/2} & \text{for } l = 0, \\ 2^{-1/2} & \text{for } l = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{2.12}$$

2.3 MATRIX REPRESENTATION

Like the orthonormal discrete Fourier transform (ODFT), the discrete wavelet transform (DWT) of $X_t$ is an orthonormal transform [5]. Let $\{W_n : n = 0, \ldots, N-1\}$ be the DWT coefficients. Then we can write $W = MX$, where $W$ is a column vector of length $N = 2^J$ whose $n$th element is the $n$th DWT coefficient $W_n$, and $M$ is an $N \times N$ real-valued matrix satisfying $M^T M = I_N$. Orthonormality implies that $X = M^T W$ and $\|W\|^2 = \|X\|^2$. Hence $W_n^2$ represents the contribution to the energy attributable to the DWT coefficient with index $n$.

Whereas ODFT coefficients are associated with frequencies, the $n$th wavelet coefficient $W_n$ is associated with a particular scale and with a particular set of times, H. Nayyar and Ali A. Ghorbani (2006).
Explicitly, for $N = 16$ the rows of this matrix for $n$ = 0, 8, 12, 14 and 15 are

$$M_0^T = \Big[ -\tfrac{1}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}},\ \underbrace{0, \ldots, 0}_{14\ \text{zeros}} \Big]$$

$$M_8^T = \Big[ -\tfrac{1}{2},\ -\tfrac{1}{2},\ \tfrac{1}{2},\ \tfrac{1}{2},\ \underbrace{0, \ldots, 0}_{12\ \text{zeros}} \Big]$$

$$M_{12}^T = \Big[ \underbrace{-\tfrac{1}{\sqrt{8}}, \ldots, -\tfrac{1}{\sqrt{8}}}_{4\ \text{entries}},\ \underbrace{\tfrac{1}{\sqrt{8}}, \ldots, \tfrac{1}{\sqrt{8}}}_{4\ \text{entries}},\ \underbrace{0, \ldots, 0}_{8\ \text{zeros}} \Big]$$

$$M_{14}^T = \Big[ \underbrace{-\tfrac{1}{4}, \ldots, -\tfrac{1}{4}}_{8\ \text{entries}},\ \underbrace{\tfrac{1}{4}, \ldots, \tfrac{1}{4}}_{8\ \text{entries}} \Big]$$

$$M_{15}^T = \Big[ \underbrace{\tfrac{1}{4}, \ldots, \tfrac{1}{4}}_{16\ \text{entries}} \Big]$$

The remaining eleven rows are shifted versions of the above, where $T$ denotes a circular shift of the vector by one position:

$$M_1 = T^2 M_0,\quad M_2 = T^4 M_0,\quad \ldots,\quad M_7 = T^{14} M_0,$$

$$M_9 = T^4 M_8,\quad M_{10} = T^8 M_8,\quad M_{11} = T^{12} M_8,$$

$$M_{13} = T^8 M_{12}.$$

Let us now define exactly what the notion of scale means. For a positive integer $k$, let

$$\bar{X}_t(k) = \frac{1}{k} \sum_{l=0}^{k-1} X_{t-l}, \tag{2.13}$$

Donald B. Percival, Andrew T. Walden (2000).
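The pyramid algorithm of Section 2 and the matrix form $W = MX$ can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it uses the orthonormal Haar filters with $\alpha = 2^{-1/2}$, orders the coefficients from finest detail to final smooth (so that rows 0, 8, 12, 14 and 15 match the rows listed above for $N = 16$), and all function names and the test signal are hypothetical.

```python
import math


def haar_step(c):
    """One level of the orthonormal Haar transform (alpha = 2**-0.5).

    Returns (smooth, detail), each half the length of c.
    """
    a = 1 / math.sqrt(2)
    smooth = [a * (c[2 * k + 1] + c[2 * k]) for k in range(len(c) // 2)]
    detail = [a * (c[2 * k + 1] - c[2 * k]) for k in range(len(c) // 2)]
    return smooth, detail


def haar_dwt(x):
    """Pyramid algorithm: finest details first, final smooth last."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs, smooth = [], list(x)
    while len(smooth) > 1:
        smooth, detail = haar_step(smooth)
        coeffs.extend(detail)
    return coeffs + smooth


def haar_matrix(n):
    """Build M column by column by transforming the unit vectors."""
    cols = [haar_dwt([1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def rows_orthonormal(m, tol=1e-12):
    """Check M M^T = I, which for a square M is equivalent to M^T M = I."""
    n = len(m)
    return all(
        abs(sum(m[i][t] * m[j][t] for t in range(n)) - (1.0 if i == j else 0.0)) < tol
        for i in range(n)
        for j in range(n)
    )


M = haar_matrix(16)
x = [float((3 * i) % 7) for i in range(16)]  # arbitrary test signal (assumption)
w = haar_dwt(x)

# Inverse transform X = M^T W.
x_rec = [sum(M[n][t] * w[n] for n in range(16)) for t in range(16)]

energy_x = sum(v * v for v in x)
energy_w = sum(v * v for v in w)
print(rows_orthonormal(M))                        # True
print(abs(energy_x - energy_w) < 1e-9)            # True
print(max(abs(a - b) for a, b in zip(x, x_rec)) < 1e-9)  # True
```

Building $M$ from the transformed unit vectors keeps the sketch short; a direct construction from the shift relations above would give the same matrix.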

3 OUTLIER DETECTION

In this section, we assume that the higher the value of the residuals, the more anomalous the data, Wei Lu, Mahbod Tavallaee and Ali A. Ghorbani (2008). Accordingly, to identify these outliers, the residuals of these distributions are obtained at different resolutions and compared in order to assess their rate of detection, J. McHugh (2000) and P. Barford, J. Kline, D. Plonka and A. Ron (2002).

4 ANALYSIS OF RESIDUALS

The purpose of analyzing the residuals of these distributions is to support our assumption in Section 3. The data analyzed were simulated from a Normal distribution and comprised 1020 observations. Since wavelet analysis is dyadic, we introduced four values lying between the minimum and maximum of the data set, giving 1024 observations, and analyzed the result at different resolutions (j) as a Normal distribution without aberrant observations (NO). These four values were then removed and four aberrant observations were introduced in their place; the contaminated data set was analyzed at different resolutions using the Normal (NW), Laplace and Cauchy distributions. The mean and standard deviation (residual) were obtained at each resolution by maximum likelihood estimation, which is more efficient than the conventional method. Since wavelet analysis is dyadic, the data were analyzed at band sizes 1024, 512, 256, 128, 64 and 32, corresponding to resolutions j = 10, 9, 8, 7, 6 and 5 respectively.
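A rough sketch of the simulation design is given below. It is not the authors' code and does not reproduce Table 1: the seed, the magnitudes of the four aberrant observations (here ±10) and the function names are assumptions, the per-resolution analysis is omitted, and only the Normal and Laplace maximum likelihood estimates are shown because they have closed forms.

```python
import math
import random
import statistics


def normal_mle(x):
    """ML estimates for a Normal: sample mean and sd with divisor n."""
    mu = statistics.mean(x)
    return mu, math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))


def laplace_mle(x):
    """ML estimates for a Laplace: median and mean absolute deviation about it."""
    m = statistics.median(x)
    return m, sum(abs(v - m) for v in x) / len(x)


rng = random.Random(2015)  # arbitrary seed (assumption)
base = [rng.gauss(0.0, 1.0) for _ in range(1020)]

# NO: pad to 1024 = 2**10 with four values inside the observed range.
lo, hi = min(base), max(base)
clean = base + [rng.uniform(lo, hi) for _ in range(4)]

# NW: use four aberrant observations in place of the four padding values.
# The values +/-10 are illustrative; the paper does not state the values used.
contaminated = base + [10.0, -10.0, 10.0, -10.0]

sd_no = normal_mle(clean)[1]
sd_nw = normal_mle(contaminated)[1]
b_no = laplace_mle(clean)[1]
b_nw = laplace_mle(contaminated)[1]

print(len(clean), len(contaminated))  # 1024 1024
print(round(sd_no, 4), round(sd_nw, 4), round(b_no, 4), round(b_nw, 4))
```

Under these assumptions the Normal standard deviation reacts much more strongly to the four aberrant values than the Laplace scale does, which is the qualitative pattern discussed in Section 5.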

Table 1: Mean and Standard Deviations of the Distributions

| Resolution level j | Band size | NO Mean | NO StdDev | NW Mean | NW StdDev | LAPLACE Mean | LAPLACE StdDev | CAUCHY Mean | CAUCHY StdDev |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 1024 | 0.04545 | 0.9869 | 0.0727 | 1.4858 | 0.0711 | 0.8613 | 0.0671 | 0.6255 |
| 9 | 512 | -0.0581 | 0.9470 | 0.0160 | 1.5503 | 0.1752 | 0.8681 | 0.1701 | 0.6326 |
| 8 | 256 | 0.0573 | 0.9738 | 0.0547 | 1.5059 | 0.0012 | 0.8975 | 0.0114 | 0.6599 |
| 7 | 128 | -0.2162 | 0.9549 | -0.0163 | 1.5041 | 0.2338 | 0.9059 | 0.2465 | 0.6198 |
| 6 | 64 | -0.0989 | 0.6676 | -0.1550 | 1.5739 | -0.1842 | 0.8498 | -0.1572 | 0.5021 |
| 5 | 32 | -0.1220 | 0.6514 | -0.2535 | 1.4314 | -0.0332 | 0.8505 | -0.0036 | 0.4851 |

Key

NO: Normal distribution without aberrant
observations
NW: Normal distribution with aberrant observations

5 EXPERIMENTAL EVALUATIONS

From Table 1, the means and standard deviations of the coefficients for the Normal distribution without outliers (NO) at the different resolutions (j) or band sizes are approximately mean = 0 and standard deviation = 1, which confirms the absence of outliers. For the other three distributions (Normal with outliers (NW), Laplace and Cauchy), it was observed that the Laplace distribution has a standard deviation closest to that of the Normal without outliers, followed by the Cauchy distribution and finally the Normal distribution with aberrant observations (NW). This ordering holds on average over the six resolutions; at j = 6 and j = 5 the Cauchy value is marginally closer than the Laplace value.
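The comparison above can be checked directly from the standard deviations in Table 1. The short script below is only a convenience check; the use of the mean absolute gap to NO as the summary measure is an assumption made here, since the paper does not name a specific measure.

```python
# Standard deviations from Table 1, ordered by resolution level j = 10, 9, 8, 7, 6, 5.
levels = [10, 9, 8, 7, 6, 5]
stddev = {
    "NO":      [0.9869, 0.9470, 0.9738, 0.9549, 0.6676, 0.6514],
    "NW":      [1.4858, 1.5503, 1.5059, 1.5041, 1.5739, 1.4314],
    "LAPLACE": [0.8613, 0.8681, 0.8975, 0.9059, 0.8498, 0.8505],
    "CAUCHY":  [0.6255, 0.6326, 0.6599, 0.6198, 0.5021, 0.4851],
}


def mean_abs_gap(name):
    """Average |StdDev(name) - StdDev(NO)| over the six resolution levels."""
    gaps = [abs(s - ref) for s, ref in zip(stddev[name], stddev["NO"])]
    return sum(gaps) / len(gaps)


candidates = ["NW", "LAPLACE", "CAUCHY"]

# Candidates ordered from closest to NO to furthest from NO.
ranking = sorted(candidates, key=mean_abs_gap)

# Candidate with the largest standard deviation at each level.
largest = [max(candidates, key=lambda n: stddev[n][i]) for i in range(len(levels))]

print(ranking)  # ['LAPLACE', 'CAUCHY', 'NW']
print(largest)  # ['NW', 'NW', 'NW', 'NW', 'NW', 'NW']
```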
Since the Normal (Gaussian) distribution with aberrant observations has the highest standard deviation at all resolutions, and hence departs most from the Normal distribution without aberrant observations, we conclude that, among the three distributions, it is the most efficient in detecting aberrant observations. On the other hand, the Laplace (non-Gaussian) distribution, whose standard deviations at the different resolutions are overall closest to those of the Normal distribution without aberrant observations, is regarded as the optimal distribution for modeling aberrant observations among these distributions.

References

[1] Abraham Maslow, "Histogram Smoothing via the Wavelet Transform," Journal of Computational and Graphical Statistics, Vol. 7, No. 4 (Dec., 1998)

[2] A. Dainotti, A. Pescape and G. Viorgio, "Wavelet-based Detection of DoS Attack," Proceedings of IEEE Global Telecommunication Conference, San Francisco, 2006

[3] Donald B. Percival, Andrew T. Walden, "Wavelet Methods for Time Series Analysis," Cambridge University Press (2000)

[4] G.P. Nason, Wavelet Methods in Statistics with R, Springer (2008)

[5] H. Nayyar and Ali A. Ghorbani, "Approximate Autoregressive Modeling for Network Attack Detection," Proceedings of the 4th Annual Conference on Privacy, Security and Trust, pp. 175-184, Markham, Canada, 2006

[6] J. McHugh, "Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory," ACM Transactions on Information and System Security, 3 (4):262-294, 2000

[7] L. Li and G. Lee, "DDoS Attack Detection and Wavelets," Proceedings of 12th International Conference on Communication and Networks, pp. 421-427, Texas, 2003

[8] P. Barford, J. Kline, D. Plonka and A. Ron, "A Signal Analysis of Network Traffic Anomalies," Proceedings of Internet Workshop 2002, Marseille, France, 2002

[9] R. Todd Ogden, "Essential Wavelets for Statistical Application and Data Analysis," Birkhauser Boston (Dec., 1996)

[10] W. Lu and I. Traore, "A Novel Unsupervised Anomaly Detection Framework for Detecting Network Attacks in Real Time," Lecture Notes in Computer Science, Vol. 3810, pp. 96-109, Springer, 2005, Y.G. Desmedt et al. (Eds.)

[11] Wei Lu, Mahbod Tavallaee and Ali A. Ghorbani, "Detecting Network Anomalies Using Wavelet Basis Functions," CNSR, pp. 149-156, IEEE Computer Society (2008)

First Author
Shittu Olarewanju Ismail*
P.D.S., B.Sc, M.Sc, M.Phil, Ph.D
Second Author
Aideyan Donald Osaro**
P.D.S., B.Sc, M.Sc.
