
International Journal of Scientific & Engineering Research, Volume 6, Issue 5, May-2015 60

ISSN 2229-5518

Wavelet Method for Detecting and Modeling Anomalous Observations in Gaussian and Non-Gaussian Distributions

Shittu Olarewanju Ismail¹, University of Ibadan, Department of Statistics, Ibadan, Oyo State, Nigeria; Aideyan Donald Osaro², Kogi State University, Department of Mathematical Sciences, Anyigba, Kogi State, Nigeria.

ABSTRACT: Wavelet analysis has recently been widely applied to data analysis because of its potential. In this paper, we present an approach, based on wavelet analysis, to detecting and modeling aberrant observations in Gaussian and non-Gaussian distributions. To characterize these distributions, 1020 observations were simulated from a normal distribution; because wavelet analysis is dyadic, the data set was brought to 1024 observations by adding first four normal values and later, in their place, four aberrant observations. It was found that the Normal (Gaussian) distribution with aberrant observations is the most efficient in detecting aberrant observations, while the Laplace (non-Gaussian) distribution is the optimal distribution for modeling aberrant observations among the three distributions considered.

Index Terms: Wavelets, Outliers, resolution, Residuals, Distributions, Gaussian, Discrete, Analysis

----------------------------------------♦------------------------------------------

1 INTRODUCTION

Aberrant observations (outliers) are data points that are distinctly separate from the rest of the data: an outlier lies an abnormal distance from the other values in a data set. Outliers can occur by chance in any distribution, but they often indicate either measurement error or a heavy-tailed population. They can also indicate faulty data, erroneous procedures, etc. Section 2 gives an overview of wavelet analysis, which uses both resolution and location to analyze data. Section 3 describes how outliers are detected under these distributions, which is the main goal of this paper. Section 4 presents the analysis of the residuals, and Section 5 interprets the results, draws conclusions and indicates areas for further work.

2 OVERVIEW OF WAVELET ANALYSIS

Wavelet analysis is a statistical tool that can be used to extract information from many kinds of data; wavelets are generally used to analyze data at different resolutions (scales) and locations.
The Discrete Wavelet Transform (DWT) re-expresses a time series in terms of coefficients that are associated with a particular time and a particular dyadic scale $2^j$. These coefficients are fully equivalent to the original series, in that the series can be recovered exactly from its DWT coefficients.
The DWT allows us to partition (decompose) the information in a time series into pieces that are associated with different scales and times. This decomposition is closely related to the statistical technique known as the analysis of variance (ANOVA), so the DWT leads to a scale-based ANOVA that is analogous to the frequency-based ANOVA provided by the power spectrum (R. Todd Ogden, Dec. 1996).
The DWT also effectively decorrelates a wide variety of time series that occur commonly in physical applications. This property is the key to the use of the DWT in statistical methodology (L. Li and G. Lee, 2003).

Illustration


We begin with a discrete sequence of data $y = (y_1, y_2, \ldots, y_n)$, where each $y_i$ is a real number and $i$ is an integer ranging from 1 to $n$. We assume that the length $n$ of the sequence is a power of two, $n = 2^J$ for some integer $J \ge 0$. This should not be seen as a restriction, as the method can be modified for other $n$ (Abraham Maslow, Dec. 1998). We call a sequence with $n = 2^J$ a dyadic one. The key information we extract is the "detail" in the sequence at different scales and different locations. By detail we mean the degree of difference or variation between successive observations of the vector, for example $d_1 = y_2 - y_1$, at the given scale and location. The detail at location $k$ is

$$d_k = y_{2k} - y_{2k-1}, \qquad k = 1, 2, \ldots, n/2, \tag{2.1}$$

e.g. $d_1 = y_2 - y_1$, $d_2 = y_4 - y_3$, $d_3 = y_6 - y_5$, etc.
In equation (2.1), if $y_{2k}$ and $y_{2k-1}$ are similar, the coefficient $d_k$ will be very small; if they are exactly the same, $d_k$ is zero; and if they are very different, the coefficient will be very large. Thus $d_k$ encodes the difference between successive pairs of observations in the original $y$ vector. The $d_k$ are known as the finest-scale details (Abraham Maslow, Dec. 1998).
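As a small illustration of equation (2.1), the finest-scale details can be computed in a few lines of Python. This is only a sketch: the function name `finest_details` and the eight data values are assumptions chosen for the example, not part of the paper.

```python
def finest_details(y):
    """Finest-scale details d_k = y_{2k} - y_{2k-1}, k = 1, ..., n/2 (equation 2.1)."""
    if len(y) % 2 != 0:
        raise ValueError("the sequence length must be even")
    # The paper indexes y from 1; Python lists start at 0,
    # so y_{2k} is y[2k - 1] and y_{2k-1} is y[2k - 2].
    return [y[2 * k - 1] - y[2 * k - 2] for k in range(1, len(y) // 2 + 1)]


# Hypothetical dyadic sequence with n = 8 = 2^3 observations.
y = [1, 1, 7, 9, 2, 8, 8, 6]
print(finest_details(y))  # prints [0, 2, 6, -2]
```

The first pair is identical, so its detail is zero, while the third pair (2, 8) differs most and gives the largest coefficient.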

The sequence $\{d_k\}_{k=1}^{n/2}$ is not the conventional first-difference vector, since differences such as $y_3 - y_2$ are missing from it. Each $d_k$ only gives information about $y_{2k}$ and its neighbor at the finest possible scale of detail (G.P. Nason, 2008).

At coarser scale: for coarser detail, define

$$C_k = y_{2k} + y_{2k-1}, \qquad k = 1, 2, \ldots, n/2. \tag{2.2}$$

The sequence $\{C_k\}_{k=1}^{n/2}$ is a set of scaled averages (scaled because the sum is not divided by 2). The information in $\{C_k\}$ is a coarsening of the original $y$ vector. The operation that turns $\{y_i\}$ into $\{C_k\}$ is similar to a moving-average smoothing operation, except that the averaging does not overlap consecutive pairs (A. Dainotti, A. Pescape and G. Viorgio, 2006). Each $C_k$ contains information originating from $y_{2k}$ and $y_{2k-1}$ (adjacent observations).

If $j = J - 1$, then $d_k$ can be written as $d_{j,k}$, and the first-level averages (smooths) $C_k$ can be written as $C_{j,k}$. To obtain the next coarsest detail, we apply the operation of equation (2.1) to the finest-level averages $C_{J-1,k}$:

$$d_{J-2,l} = C_{J-1,2l} - C_{J-1,2l-1}, \qquad l = 1, 2, \ldots, n/4.$$

2.1 SCALE/LEVEL TERMINOLOGY

The scale is the quantity $2^j$, the level is the integer $j$ (with $j = J - 1$ at the finest level), and $k$ is the location. Larger (positive) $j$ corresponds to finer scales and smaller $j$ to coarser scales in the context of this work. Applying equation (2.2) to the averages $C_{J-1,k}$ gives

$$C_{J-2,l} = C_{J-1,2l} + C_{J-1,2l-1}, \qquad l = 1, 2, \ldots, n/4. \tag{2.3}$$

In terms of the original vector $y$, for $l = 1$,

$$d_{J-2,1} = (y_4 + y_3) - (y_2 + y_1),$$

$$C_{J-2,1} = (y_4 + y_3) + (y_2 + y_1) = y_1 + y_2 + y_3 + y_4.$$

The latter is a kind of moving average, except that it is not divided by 4.
The $d_{j,k}$ "detail" coefficients are the wavelet coefficients, and the $C_{j,k}$ coefficients are known as father wavelet or scaling function coefficients.
This general pyramid algorithm is called the Haar wavelet transform.
The original sequence can be reconstructed exactly (the inverse transform) from the wavelet coefficients $d_{j,k}$ and the last smooth coefficient $C_{0,0}$ (W. Lu and I. Traore, 2005).

2.2 SPARSITY

Sparsity is a characteristic of wavelets: piecewise smooth functions have sparse representations (G.P. Nason, 2008).
To conserve information, we change equations (2.1) and (2.2) by introducing $\alpha$ as follows:

$$d_k = \alpha\,(y_{2k} - y_{2k-1}), \tag{2.4}$$

$$C_k = \alpha\,(y_{2k} + y_{2k-1}). \tag{2.5}$$

The original sequence $y$ consists of $2^J$ observations, and $\{d_k\}$ consists of $n/2 = 2^{J-1}$ observations. The inputs $(y_{2k}, y_{2k-1})$ are transformed into the outputs $(d_k, C_k)$, and the squared norm of the output is

$$d_k^2 + C_k^2 = \alpha^2\,(y_{2k}^2 - 2y_{2k}y_{2k-1} + y_{2k-1}^2) + \alpha^2\,(y_{2k}^2 + 2y_{2k}y_{2k-1} + y_{2k-1}^2) = 2\alpha^2\,(y_{2k}^2 + y_{2k-1}^2), \tag{2.6}$$

where $y_{2k}^2 + y_{2k-1}^2$ is the squared norm of the input. We wish the norm of the output to equal the norm of the input, so we let $2\alpha^2 = 1$, and therefore $\alpha = 2^{-1/2}$.
Then the discrete wavelet coefficient is

$$d_k = \frac{y_{2k} - y_{2k-1}}{\sqrt{2}}. \tag{2.7}$$

Equation (2.7) can be rewritten as

$$d_k = g_0\,y_{2k} + g_1\,y_{2k-1}, \tag{2.8}$$

where $g_0 = 2^{-1/2}$ and $g_1 = -2^{-1/2}$. In general,

$$d_k = \sum_{l=-\infty}^{\infty} g_l\,y_{2k-l}, \tag{2.9}$$

where

$$g_l = \begin{cases} 2^{-1/2} & \text{for } l = 0, \\ -2^{-1/2} & \text{for } l = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{2.10}$$

Equation (2.9) is similar to a filtering operation with coefficients $\{g_l\}_{l=-\infty}^{\infty}$ [10, 11].
That is, the input sequence can be thought of as possessing a norm defined by

$$\|y\|^2 = \sum_{i=1}^{n} y_i^2.$$

Another interesting component of the filter object is the H component, which is equal to the vector $(2^{-1/2}, 2^{-1/2})$ and is involved in a filtering operation analogous to equation (2.9) that produces $C_k$ as

$$C_k = \sum_{l=-\infty}^{\infty} h_l\,y_{2k-l}, \tag{2.11}$$

where

$$h_l = \begin{cases} 2^{-1/2} & \text{for } l = 0, \\ 2^{-1/2} & \text{for } l = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{2.12}$$

2.3 MATRIX REPRESENTATION

Like the orthonormal discrete Fourier transform (ODFT), the discrete wavelet transform (DWT) of $X_t$ is an orthonormal transform [5]. Let $\{W_n : n = 0, \ldots, N-1\}$ be the DWT coefficients. Then we can write $W = \mathcal{W}X$, where $W$ is a column vector of length $N = 2^J$ whose $n$th element is the $n$th DWT coefficient $W_n$, and $\mathcal{W}$ is an $N \times N$ real-valued matrix satisfying $\mathcal{W}^T\mathcal{W} = I_N$. Orthonormality implies that $X = \mathcal{W}^T W$ and $\|W\|^2 = \|X\|^2$. Hence $W_n^2$ represents the contribution to the energy attributable to the DWT coefficient with index $n$.
Whereas ODFT coefficients are associated with frequencies, the $n$th wavelet coefficient $W_n$ is associated with a particular scale and with a particular set of times (H. Nayyar and Ali A. Ghorbani, 2006).
Explicitly, for $N = 16$ the rows of this matrix for $n = 0, 8, 12, 14$ and $15$ are

$$w_0^T = \Big[-\tfrac{1}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}},\ \underbrace{0, \ldots, 0}_{14\ \text{zeros}}\Big],$$

$$w_8^T = \Big[-\tfrac{1}{2},\ -\tfrac{1}{2},\ \tfrac{1}{2},\ \tfrac{1}{2},\ \underbrace{0, \ldots, 0}_{12\ \text{zeros}}\Big],$$

$$w_{12}^T = \Big[\underbrace{-\tfrac{1}{\sqrt{8}}, \ldots, -\tfrac{1}{\sqrt{8}}}_{4\ \text{entries}},\ \underbrace{\tfrac{1}{\sqrt{8}}, \ldots, \tfrac{1}{\sqrt{8}}}_{4\ \text{entries}},\ \underbrace{0, \ldots, 0}_{8\ \text{zeros}}\Big],$$

$$w_{14}^T = \Big[\underbrace{-\tfrac{1}{4}, \ldots, -\tfrac{1}{4}}_{8\ \text{entries}},\ \underbrace{\tfrac{1}{4}, \ldots, \tfrac{1}{4}}_{8\ \text{entries}}\Big],$$

$$w_{15}^T = \Big[\underbrace{\tfrac{1}{4}, \ldots, \tfrac{1}{4}}_{16\ \text{entries}}\Big].$$

The entries of $w_{14}$ and $w_{15}$ have magnitude $1/\sqrt{16} = 1/4$, as required for each row to have unit norm.
The remaining eleven rows are shifted versions of the above:

$$w_1 = T^2 w_0,\quad w_2 = T^4 w_0,\quad \ldots,\quad w_7 = T^{14} w_0,$$

$$w_9 = T^4 w_8,\quad w_{10} = T^8 w_8,\quad w_{11} = T^{12} w_8,$$

$$w_{13} = T^8 w_{12},$$

where $T^m$ denotes a circular shift by $m$ positions.
Let us now define exactly what the notion of scale means. For a positive integer $k$, let

$$\bar{X}_t(k) = \frac{1}{k} \sum_{l=0}^{k-1} X_{t-l}, \tag{2.13}$$

that is, the average of the $k$ consecutive values ending at time $t$ (Donald B. Percival and Andrew T. Walden, 2000).
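The pyramid algorithm and matrix representation of Section 2 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`haar_dwt`, `haar_idwt`, `haar_matrix`), the example data and the coefficient ordering (finest details first, scaling coefficient last, chosen to match the row indices n = 0, 8, 12, 14, 15 quoted for N = 16) are all choices made for this example.

```python
import math

ALPHA = 1.0 / math.sqrt(2.0)  # alpha = 2^(-1/2), chosen so that 2 * alpha^2 = 1


def haar_dwt(y):
    """Return (c00, details) for a sequence whose length is a power of two.

    details[0] holds the finest-scale coefficients, details[-1] the coarsest.
    """
    n = len(y)
    if n < 1 or n & (n - 1):
        raise ValueError("the sequence length must be a power of two")
    smooth, details = [float(v) for v in y], []
    while len(smooth) > 1:
        pairs = [(smooth[2 * k], smooth[2 * k + 1]) for k in range(len(smooth) // 2)]
        details.append([ALPHA * (b - a) for a, b in pairs])  # equation (2.4)
        smooth = [ALPHA * (b + a) for a, b in pairs]         # equation (2.5)
    return smooth[0], details


def haar_idwt(c00, details):
    """Invert haar_dwt: rebuild the sequence from c00 and the details."""
    smooth = [c00]
    for level in reversed(details):
        rebuilt = []
        for c, d in zip(smooth, level):
            rebuilt += [ALPHA * (c - d), ALPHA * (c + d)]  # y_{2k-1}, then y_{2k}
        smooth = rebuilt
    return smooth


def haar_matrix(n):
    """Build the n x n transform matrix by transforming each unit vector."""
    columns = []
    for i in range(n):
        unit = [0.0] * n
        unit[i] = 1.0
        c00, details = haar_dwt(unit)
        columns.append([d for level in details for d in level] + [c00])
    return [[columns[i][r] for i in range(n)] for r in range(n)]


y = [1, 1, 7, 9, 2, 8, 8, 6]  # hypothetical data, n = 2^3
c00, details = haar_dwt(y)
energy_in = sum(v * v for v in y)
energy_out = c00 ** 2 + sum(d * d for level in details for d in level)
print(math.isclose(energy_in, energy_out))  # the squared norm is preserved
print(all(math.isclose(a, b, abs_tol=1e-12)
          for a, b in zip(haar_idwt(c00, details), y)))  # exact reconstruction
print(haar_matrix(16)[15])  # the scaling-coefficient row for N = 16
```

Building the matrix from unit vectors keeps the sketch short; it reuses the pyramid steps rather than writing the rows out by hand, so the rows can be compared directly with those listed in Section 2.3.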

3 OUTLIER DETECTION

In this section, we assume that the higher the value of the residuals, the more anomalous the data (Wei Lu, Mahbod Tavallaee and Ali A. Ghorbani, 2008). Consequently, to identify the outliers, the residuals of these distributions are obtained at different resolutions and compared to determine their rate of detection (J. McHugh, 2000; P. Barford, J. Kline, D. Plonka and A. Ron, 2002).
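The assumption that larger residuals signal more anomalous data can be illustrated with a short sketch. Here the absolute finest-scale Haar detail coefficient stands in for the residual, which is an assumption made for this example; the paper does not spell out its residual computation at this level of detail. The function names, the seed, the sample size and the size of the injected aberrant value are likewise hypothetical.

```python
import math
import random


def haar_details(y):
    """Orthonormal finest-scale Haar details (y_{2k} - y_{2k-1}) / sqrt(2)."""
    return [(y[2 * k + 1] - y[2 * k]) / math.sqrt(2.0) for k in range(len(y) // 2)]


def most_anomalous_pairs(y, top):
    """0-based indices of the `top` pairs with the largest absolute detail."""
    d = haar_details(y)
    ranked = sorted(range(len(d)), key=lambda k: abs(d[k]), reverse=True)
    return ranked[:top]


random.seed(0)                                   # arbitrary seed, for repeatability
y = [random.gauss(0.0, 1.0) for _ in range(64)]  # 64 = 2^6 standard normal values
y[21] += 15.0                                    # hypothetical aberrant observation
# y[21] belongs to the pair (y[20], y[21]), i.e. 0-based pair index 10.
print(most_anomalous_pairs(y, 1))
```

A single finest-scale detail only localizes an outlier to a pair of adjacent observations, and two equal aberrant values in the same pair would cancel, which is one reason to compare several resolutions rather than one.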

4 ANALYSIS OF RESIDUALS

The purpose of analyzing the residuals of these distributions is to support the assumption made in Section 3. The data analyzed were 1020 observations simulated from a Normal distribution. Since wavelet analysis is dyadic, we first introduced four values lying between the minimum and maximum of the data set, giving $1024 = 2^{10}$ observations, and analyzed the result as a Normal distribution without aberrant observations (NO) at different resolutions (j). These four values were then removed and four aberrant observations were introduced in their place, and the contaminated data set was analyzed at different resolutions using the Normal (NW), Laplace and Cauchy distributions. The mean and standard deviation (residual) were obtained at each resolution by maximum likelihood estimation, which is taken to be more efficient than the conventional method. Since wavelet analysis is dyadic, the data were analyzed at band sizes 1024, 512, 256, 128, 64 and 32, corresponding to resolutions j = 10, 9, 8, 7, 6 and 5 respectively.
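The construction of the two data sets can be sketched as follows. This is a hedged illustration only: the seed, the way the four in-range values are drawn, and the magnitudes of the four aberrant values are assumptions, and only the full sample of 1024 values is fitted, not the individual band sizes. The Normal and Laplace maximum likelihood estimates have closed forms (mean and root mean squared deviation; median and mean absolute deviation about the median). The Cauchy fit is omitted because its estimates require numerical optimization.

```python
import math
import random
import statistics


def normal_mle(x):
    """ML estimates (mean, standard deviation) under a Normal model."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return mu, sigma


def laplace_mle(x):
    """ML estimates (location, scale) under a Laplace model."""
    loc = statistics.median(x)
    scale = sum(abs(v - loc) for v in x) / len(x)
    return loc, scale


random.seed(2015)  # arbitrary seed, for repeatability
base = [random.gauss(0.0, 1.0) for _ in range(1020)]

# NO: pad with four values inside the observed range to reach 1024 = 2^10.
low, high = min(base), max(base)
no_data = base + [random.uniform(low, high) for _ in range(4)]

# NW: pad instead with four aberrant values (the magnitudes are assumptions).
nw_data = base + [15.0, -15.0, 18.0, -18.0]

assert len(no_data) == len(nw_data) == 2 ** 10  # dyadic length

for label, data in (("NO", no_data), ("NW", nw_data)):
    mu, sigma = normal_mle(data)
    loc, scale = laplace_mle(data)
    print(f"{label}: normal mean={mu:.4f} sd={sigma:.4f} | "
          f"laplace loc={loc:.4f} scale={scale:.4f}")
```

The Laplace scale is never larger than the Normal standard deviation on the same sample, because the mean absolute deviation about the median is bounded by the root mean squared deviation about the mean; the Normal standard deviation is therefore the more sensitive of the two to a few extreme values.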

Table 1: Mean and Standard Deviations of the Distributions

Resolution level j	Band Size	NO Mean	NO StdDev	NW Mean	NW StdDev	Laplace Mean	Laplace StdDev	Cauchy Mean	Cauchy StdDev
10	1024	0.04545	0.9869	0.0727	1.4858	0.0711	0.8613	0.0671	0.6255
9	512	-0.0581	0.9470	0.0160	1.5503	0.1752	0.8681	0.1701	0.6326
8	256	0.0573	0.9738	0.0547	1.5059	0.0012	0.8975	0.0114	0.6599
7	128	-0.2162	0.9549	-0.0163	1.5041	0.2338	0.9059	0.2465	0.6198
6	64	-0.0989	0.6676	-0.1550	1.5739	-0.1842	0.8498	-0.1572	0.5021
5	32	-0.1220	0.6514	-0.2535	1.4314	-0.0332	0.8505	-0.0036	0.4851

Key

NO: Normal distribution without aberrant observations
NW: Normal distribution with aberrant observations

5 EXPERIMENTAL EVALUATIONS

From Table 1, the coefficients of the Normal distribution without outliers (NO) have a mean of approximately 0 and a standard deviation of approximately 1 at resolutions j = 10 to 7 (the standard deviation falls to about 0.65 to 0.67 at j = 6 and 5), which is consistent with the absence of outliers. For the three fits to the contaminated data (Normal with outliers (NW), Laplace and Cauchy), it was observed that the Laplace distribution has a standard deviation closest to that of the Normal without outliers, followed by the Cauchy distribution and finally the Normal distribution with aberrant observations (NW).
Since the Normal (Gaussian) distribution with aberrant observations has the highest standard deviation at all resolutions, and so departs most from the Normal distribution without aberrant observations, we conclude that among the three distributions it is the most efficient in detecting aberrant observations. On the other hand, the Laplace (non-Gaussian) distribution, whose standard deviations at the different resolutions are closest to those of the Normal distribution without aberrant observations, is regarded as the optimal distribution for modeling aberrant observations among these distributions.

References

[1] Abraham Maslow, “Histogram Smoothing via the Wavelet Transform,” Journal of Computational and Graphical Statistics, Vol. 7, No. 4 (Dec., 1998)

[2] A. Dainotti, A. Pescape and G. Viorgio, “Wavelet-based Detection of DoS Attack,” Proceedings of IEEE Global Telecommunication Conference, San Francisco, 2006

[3] Donald B. Percival and Andrew T. Walden, “Wavelet Methods for Time Series Analysis,” Cambridge University Press (2000)

[4] G.P. Nason, “Wavelet Methods in Statistics with R,” Springer (2008)

[5] H. Nayyar and Ali A. Ghorbani, “Approximate Autoregressive Modeling for Network Attack Detection,” Proceedings of the 4th Annual Conference on Privacy, Security and Trust, pp. 175-184, Markham, Canada, 2006

[6] J. McHugh, “Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory,” ACM Transactions on Information and System Security, 3(4):262-294, 2000

[7] L. Li and G. Lee, “DDoS Attack Detection and Wavelets,” Proceedings of the 12th International Conference on Communication and Networks, pp. 421-427, Texas, 2003

[8] P. Barford, J. Kline, D. Plonka and A. Ron, “A Signal Analysis of Network Traffic Anomalies,” Proceedings of Internet Workshop 2002, Marseille, France, 2002

[9] R. Todd Ogden, “Essential Wavelets for Statistical Applications and Data Analysis,” Birkhäuser Boston (Dec., 1996)

[10] W. Lu and I. Traore, “A Novel Unsupervised Anomaly Detection Framework for Detecting Network Attacks in Real Time,” Lecture Notes in Computer Science, Vol. 3810, pp. 96-109, Springer, 2005, Y.G. Desmedt et al. (Eds.)

[11] Wei Lu, Mahbod Tavallaee and Ali A. Ghorbani, “Detecting Network Anomalies Using Wavelet Basis Functions,” CNSR, pp. 149-156, IEEE Computer Society (2008)

First Author
Shittu Olarewanju Ismail*
P.D.S., B.Sc, M.Sc, M.Phil, Ph.D
Second Author
Aideyan Donald Osaro**
P.D.S., B.Sc, M.Sc.