International Journal of Scientific & Engineering Research, Volume 6, Issue 5, May-2015 60
ISSN 2229-5518
Wavelet Method for Detecting and Modeling Anomalous Observations in Gaussian and Non-Gaussian Distributions
ABSTRACT: Wavelet analysis has recently been applied to data analysis because it describes data at several resolutions and locations at once. In this paper, we present an approach, based on wavelet analysis, for detecting and modeling aberrant observations in Gaussian and non-Gaussian distributions. To characterize these distributions, a data set of 1020 observations was simulated from a normal distribution. Because wavelet analysis requires a dyadic sample size (1024 = 2^10), the data set was augmented first with four further normal observations and later with four aberrant observations. We found that, of the three distributions considered, the normal (Gaussian) distribution with aberrant observations is the most efficient at detecting aberrant observations, while the Laplace (non-Gaussian) distribution is the optimal distribution for modeling them.
Index Terms: Wavelets, Outliers, Resolution, Residuals, Distributions, Gaussian, Discrete, Analysis
Aberrant observations (outliers) are data points that are distinctly separate from the rest of the data: an outlier lies an abnormal distance from the other values in a data set or, in statistical terms, is numerically distinct from the rest of the data. Outliers can occur by chance in any distribution, but they often indicate either measurement error or a heavy-tailed population; they can also point to faulty data, erroneous procedures and similar problems. Section 2 gives an overview of wavelet analysis, which uses both resolution and location to analyze data completely. Section 3 describes how outliers are detected using these distributions, which is the main goal of this paper. Section 4 analyzes the residuals, and Section 5 interprets the results, draws conclusions and indicates areas for further work.
Wavelet analysis is a statistical tool that can be used to extract information from many kinds of data; it is needed to analyze data fully at different resolutions (scales) and locations.
The discrete wavelet transform (DWT) re-expresses a time series in terms of coefficients, each associated with a particular time and a particular dyadic scale $2^j$. These coefficients are fully equivalent to the original series, in the sense that the series can be reconstructed exactly from its DWT coefficients.
The DWT allows us to partition (decompose) the information in a time series into pieces associated with different scales and times. This decomposition is very close to the statistical technique known as analysis of variance (ANOVA): the DWT leads to a scale-based ANOVA that is quite analogous to the frequency-based ANOVA provided by the power spectrum R. Todd Ogden (Dec., 1996).
The DWT also effectively decorrelates a wide variety of time series that occur commonly in physical applications. This property is the key to the use of the DWT in statistical methodology L. Li and G. Lee (2003).
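To make the scale-based ANOVA concrete, here is a minimal Python sketch. It assumes the orthonormal Haar transform developed in this section (filter constant $\alpha = 2^{-1/2}$); the helper names `haar_dwt` and `energy_by_level` are hypothetical and not taken from any library. The squared detail coefficients at each level, plus the squared final smooth, add up to the total sum of squares of the series, which is the sense in which the DWT partitions variability by scale.

```python
import math


def haar_dwt(y):
    """Orthonormal Haar pyramid with alpha = 2**-0.5.

    Returns (details, c0): details[j] holds the 2**j detail coefficients at
    level j (j = 0 coarsest, ..., J-1 finest) and c0 is the final smooth.
    """
    n = len(y)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two (dyadic)")
    alpha = 1 / math.sqrt(2)
    smooth = list(y)
    details = []
    while len(smooth) > 1:
        half = len(smooth) // 2
        # 0-based pair (smooth[2i], smooth[2i+1]) is (y_{2k-1}, y_{2k}) in 1-based notation
        d = [alpha * (smooth[2 * i + 1] - smooth[2 * i]) for i in range(half)]
        smooth = [alpha * (smooth[2 * i + 1] + smooth[2 * i]) for i in range(half)]
        details.insert(0, d)  # each new (coarser) level goes in front
    return details, smooth[0]


def energy_by_level(y):
    """Sum of squared detail coefficients per level, and the squared final smooth."""
    details, c0 = haar_dwt(y)
    return [sum(v * v for v in d) for d in details], c0 * c0


y = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0]
levels, smooth_energy = energy_by_level(y)
print(levels, smooth_energy)
print(sum(levels) + smooth_energy, sum(v * v for v in y))  # equal up to rounding
```

The per-level energies play the role of the "between scales" sums of squares; because the transform is orthonormal, nothing is lost or double-counted.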
IJSER © 2015 http://www.ijser.org
We begin with a discrete sequence of data
$y = (y_1, y_2, \ldots, y_n)$,
where each $y_i$ is a real number and $i$ is an integer ranging from 1 to $n$. We assume that the length $n$ of the sequence is a power of two, $n = 2^J$ for some $J \ge 0$. This should not be seen as a restriction, since the method can be modified for other $n$ Abraham Maslow (Dec., 1998). We call a sequence with $n = 2^J$ dyadic. The key information we extract is the "detail" in the sequence at different scales and locations. By detail we mean the degree of difference, or variation, between successive observations of the vector at a given scale and location; for example, $d_1 = y_2 - y_1$.

With $d_k$ denoting the detail at location $k$,
$$d_k = y_{2k} - y_{2k-1}, \qquad k = 1, 2, \ldots, n/2 \qquad (2.1)$$
e.g. $d_1 = y_2 - y_1$, $d_2 = y_4 - y_3$, $d_3 = y_6 - y_5$, etc.

In equation (2.1), if $y_{2k}$ and $y_{2k-1}$ are similar, the coefficient $d_k$ will be very small; if they are exactly the same, $d_k$ is zero; and if they are very different, $d_k$ will be large in magnitude. Thus $d_k$ encodes the difference between successive pairs of observations in the original vector $y$. The sequence $\{d_k\}$ is known as the finest-scale detail Abraham Maslow (Dec., 1998).
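Equation (2.1) takes only a few lines of Python. The function name `finest_details` is hypothetical; note the shift from the paper's 1-based indices to Python's 0-based lists. In the example, equal neighbours give $d_k = 0$, and a difference such as $y_3 - y_2$ never appears, because only non-overlapping pairs are differenced.

```python
def finest_details(y):
    """Finest-scale details d_k = y_{2k} - y_{2k-1}, k = 1, ..., n/2 (equation (2.1))."""
    if len(y) % 2:
        raise ValueError("need an even number of observations")
    # 1-based y_{2k} is y[2k - 1] in Python; y_{2k-1} is y[2k - 2]
    return [y[2 * k - 1] - y[2 * k - 2] for k in range(1, len(y) // 2 + 1)]


y = [4, 4, 1, 5, 3, 9, 7, 7]
print(finest_details(y))  # [0, 4, 6, 0]
```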
The sequence $\{d_k\}_{k=1}^{n/2}$ is not the conventional first-difference vector, since differences such as $y_3 - y_2$ are missing from it: $d_k$ only gives information about $y_{2k}$ and its neighbour $y_{2k-1}$ at the finest possible scale of detail G.P. Nason (2008).

Similarly, we define
$$c_k = y_{2k} + y_{2k-1}, \qquad k = 1, 2, \ldots, n/2 \qquad (2.2)$$
The sequence $\{c_k\}_{k=1}^{n/2}$ is a set of scaled averages (scaled because each sum is not divided by 2). The information in $\{c_k\}$ is a smoothing of the original vector $y$. The operation that turns $\{y_i\}$ into $\{c_k\}$ is similar to a moving-average smoothing operation, except that the averages are taken over non-overlapping consecutive pairs A. Dainotti, A. Pescape and G. Viorgio (2006). Each $c_k$ contains information originating from the adjacent observations $y_{2k}$ and $y_{2k-1}$.

Since $n = 2^J$, the finest level is $j = J - 1$; we therefore write $d_k$ as $d_{J-1,k}$ and rename the first-level averages (smooths) $c_k$ as $c_{J-1,k}$. In general we write $d_{j,k}$ and $c_{j,k}$, where the integer $j$ is the level, $2^j$ is the associated scale quantity and $k$ is the location. Larger $j$ corresponds to a finer scale and smaller $j$ to a coarser scale.

To obtain the next coarsest detail, we repeat the operation of equation (2.1) on the finest-level averages $c_{J-1,k}$ of equation (2.2), and likewise the summing operation of equation (2.2):
$$d_{J-2,\ell} = c_{J-1,2\ell} - c_{J-1,2\ell-1}, \qquad c_{J-2,\ell} = c_{J-1,2\ell} + c_{J-1,2\ell-1}, \qquad \ell = 1, 2, \ldots, n/4 \qquad (2.3)$$
In terms of the original vector $y$, for $\ell = 1$,
$$d_{J-2,1} = (y_4 + y_3) - (y_2 + y_1), \qquad c_{J-2,1} = y_1 + y_2 + y_3 + y_4.$$
The latter is a kind of moving average, except that it is not divided by 4.

The $d_{j,k}$ "detail" coefficients are the wavelet coefficients, and the $c_{j,k}$ coefficients are known as father wavelet or scaling function coefficients. Repeating (2.3) level by level down to $j = 0$ gives a general pyramid algorithm called the Haar wavelet transform. The original sequence can be reconstructed exactly from the wavelet coefficients $d_{j,k}$ at all scales, from the finest to the coarsest, together with the last smooth $c_{0,1}$ W. Lu and I. Traore (2005).

Sparsity is a characteristic behaviour of wavelets: piecewise smooth functions have sparse wavelet representations G.P. Nason (2008).

To conserve information (energy), we modify equations (2.1) and (2.2) by introducing a constant $\alpha$:
$$d_k = \alpha (y_{2k} - y_{2k-1}) \qquad (2.4)$$
$$c_k = \alpha (y_{2k} + y_{2k-1}) \qquad (2.5)$$
The original sequence $y$ consists of $n = 2^J$ observations, while $\{d_k\}$ consists of $n/2 = 2^{J-1}$ observations. The inputs $(y_{2k}, y_{2k-1})$ are transformed into the output $(d_k, c_k)$, and we now compute the (squared) norm of the output.
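The pyramid of equations (2.1) to (2.3), and the exact reconstruction from all the details plus the last smooth, can be sketched as follows. This is an illustrative Python version of the unnormalised ($\alpha = 1$) Haar transform; `haar_pyramid` and `haar_inverse` are hypothetical names. For $y = (1, 2, \ldots, 8)$ it gives $c_{J-2,1} = 10 = y_1 + y_2 + y_3 + y_4$ and $d_{J-2,1} = (y_4 + y_3) - (y_2 + y_1) = 4$.

```python
def haar_pyramid(y):
    """Unnormalised Haar pyramid of equations (2.1)-(2.3).

    Returns {j: (c_j, d_j)} for j = J-1, ..., 0, where, writing c_J = y,
    d_{j,k} = c_{j+1,2k} - c_{j+1,2k-1} and c_{j,k} = c_{j+1,2k} + c_{j+1,2k-1}.
    """
    n = len(y)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two (dyadic)")
    J = n.bit_length() - 1
    levels, c = {}, list(y)
    for j in range(J - 1, -1, -1):
        half = len(c) // 2
        d = [c[2 * i + 1] - c[2 * i] for i in range(half)]  # differences, as in (2.1)
        c = [c[2 * i + 1] + c[2 * i] for i in range(half)]  # sums, as in (2.2)
        levels[j] = (c, d)
    return levels


def haar_inverse(levels):
    """Rebuild y from all details d_{j,k} and the last smooth c_{0,1}."""
    c = levels[0][0]
    for j in sorted(levels):  # coarsest (j = 0) to finest (j = J-1)
        finer = []
        for ck, dk in zip(c, levels[j][1]):
            # a pair (a, b) has c = a + b and d = b - a, so a = (c - d)/2, b = (c + d)/2
            finer += [(ck - dk) / 2, (ck + dk) / 2]
        c = finer
    return c


y = [1, 2, 3, 4, 5, 6, 7, 8]
levels = haar_pyramid(y)
print(levels[1])             # ([10, 26], [4, 4])
print(haar_inverse(levels))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

Because each step keeps both the sum and the difference of a pair, the pair is recovered as $(c - d)/2$ and $(c + d)/2$. With $\alpha = 1$, however, $d_k^2 + c_k^2$ is twice the input energy of the pair, which is what the constant $\alpha$ corrects.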
$$d_k^2 + c_k^2 = \alpha^2 \left(y_{2k}^2 - 2y_{2k}y_{2k-1} + y_{2k-1}^2\right) + \alpha^2 \left(y_{2k}^2 + 2y_{2k}y_{2k-1} + y_{2k-1}^2\right) = 2\alpha^2 \left(y_{2k}^2 + y_{2k-1}^2\right) \qquad (2.6)$$
where $y_{2k}^2 + y_{2k-1}^2$ is the squared norm of the input coefficients. We wish the norm of the output to equal the norm of the input, so we set $2\alpha^2 = 1$, and therefore $\alpha = 2^{-1/2}$.

The discrete wavelet coefficients are then
$$d_k = 2^{-1/2} (y_{2k} - y_{2k-1}) \qquad (2.7)$$
Equation (2.7) can be rewritten as
$$d_k = g_0 \, y_{2k} + g_1 \, y_{2k-1} \qquad (2.8)$$
where $g_0 = 2^{-1/2}$ and $g_1 = -2^{-1/2}$. In general,
$$d_k = \sum_{\ell=-\infty}^{\infty} g_\ell \, y_{2k-\ell} \qquad (2.9)$$
where
$$g_\ell = \begin{cases} 2^{-1/2} & \ell = 0 \\ -2^{-1/2} & \ell = 1 \\ 0 & \text{otherwise} \end{cases} \qquad (2.10)$$
Equation (2.9) is similar to a filtering operation with coefficients $\{g_\ell\}_{\ell=-\infty}^{\infty}$ [10,11]. The input sequence can likewise be thought of as possessing a norm, defined by
$$\|y\|^2 = \sum_{i=1}^{n} y_i^2.$$
Another interesting component of the Haar filter object is its $H$ component, the vector $(2^{-1/2}, 2^{-1/2})$, which is involved in a filtering operation, analogous to equation (2.9), that produces $c_k$:
$$c_k = \sum_{\ell=-\infty}^{\infty} h_\ell \, y_{2k-\ell} \qquad (2.11)$$
where
$$h_\ell = \begin{cases} 2^{-1/2} & \ell = 0 \\ 2^{-1/2} & \ell = 1 \\ 0 & \text{otherwise} \end{cases} \qquad (2.12)$$

Like the orthonormal discrete Fourier transform (ODFT), the discrete wavelet transform (DWT) of a time series $\{X_t\}$ is an orthonormal transform [5]. Let $\{W_n : n = 0, 1, \ldots, N-1\}$ be the DWT coefficients. Then we can write $\mathbf{W} = \mathcal{W}\mathbf{X}$, where $\mathbf{W}$ is a column vector of length $N = 2^J$ whose $n$th element is the $n$th DWT coefficient $W_n$, and $\mathcal{W}$ is an $N \times N$ real-valued matrix satisfying $\mathcal{W}^T\mathcal{W} = I_N$. Orthonormality implies that $\mathbf{X} = \mathcal{W}^T\mathbf{W}$ and $\|\mathbf{W}\|^2 = \|\mathbf{X}\|^2$. Hence $W_n^2$ represents the contribution to the energy attributable to the DWT coefficient with index $n$. Whereas ODFT coefficients are associated with frequencies, the $n$th wavelet coefficient $W_n$ is associated with a particular scale and with a particular set of times H. Nayyar and Ali A. Ghorbani (2006).

For the Haar DWT with $N = 16$, the rows $\mathcal{W}_n^T$ of this matrix for $n = 0, 8, 12, 14$ and $15$ are explicitly
$$\mathcal{W}_0^T = \Big[-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, \underbrace{0, \ldots, 0}_{14\ \text{zeros}}\Big]$$
$$\mathcal{W}_8^T = \Big[-\tfrac{1}{2}, -\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}, \underbrace{0, \ldots, 0}_{12\ \text{zeros}}\Big]$$
$$\mathcal{W}_{12}^T = \Big[\underbrace{-\tfrac{1}{\sqrt{8}}, \ldots, -\tfrac{1}{\sqrt{8}}}_{4}, \underbrace{\tfrac{1}{\sqrt{8}}, \ldots, \tfrac{1}{\sqrt{8}}}_{4}, \underbrace{0, \ldots, 0}_{8\ \text{zeros}}\Big]$$
$$\mathcal{W}_{14}^T = \Big[\underbrace{-\tfrac{1}{4}, \ldots, -\tfrac{1}{4}}_{8}, \underbrace{\tfrac{1}{4}, \ldots, \tfrac{1}{4}}_{8}\Big]$$
$$\mathcal{W}_{15}^T = \Big[\underbrace{\tfrac{1}{4}, \ldots, \tfrac{1}{4}}_{16}\Big]$$
The remaining eleven rows are circularly shifted versions of the above:
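$\mathcal{W}_1 = \mathcal{T}^2\mathcal{W}_0,\ \mathcal{W}_2 = \mathcal{T}^4\mathcal{W}_0,\ \ldots,\ \mathcal{W}_7 = \mathcal{T}^{14}\mathcal{W}_0$;
$\mathcal{W}_9 = \mathcal{T}^4\mathcal{W}_8,\ \mathcal{W}_{10} = \mathcal{T}^8\mathcal{W}_8,\ \mathcal{W}_{11} = \mathcal{T}^{12}\mathcal{W}_8$;
$\mathcal{W}_{13} = \mathcal{T}^8\mathcal{W}_{12}$,
where $\mathcal{T}$ is the $N \times N$ circular shift matrix.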
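The orthonormality of the Haar DWT matrix, the explicit rows $\mathcal{W}_0, \mathcal{W}_8, \mathcal{W}_{12}, \mathcal{W}_{14}, \mathcal{W}_{15}$ and the circular-shift relations for $N = 16$ can be checked numerically. The sketch below builds the matrix row by row in the ordering used above (finest-scale wavelet rows first, the scaling row last); the helper names are hypothetical. Row 0 applied to a series gives $(X_1 - X_0)/\sqrt{2}$, the 0-based counterpart of equation (2.7).

```python
import math


def haar_dwt_matrix(N):
    """Rows of the N x N orthonormal Haar DWT matrix (N = 2**J), ordered with the
    finest-scale wavelet rows first and the single scaling row last."""
    if N < 2 or N & (N - 1):
        raise ValueError("N must be a power of two, at least 2")
    rows = []
    width = 2
    while width <= N:
        v = 1 / math.sqrt(width)
        for start in range(0, N, width):
            row = [0.0] * N
            half = start + width // 2
            for t in range(start, half):
                row[t] = -v  # first half of the block
            for t in range(half, start + width):
                row[t] = v   # second half of the block
            rows.append(row)
        width *= 2
    rows.append([1 / math.sqrt(N)] * N)  # scaling (father wavelet) row
    return rows


def circular_shift(row, m):
    """Apply T^m: move every entry m places to the right, wrapping around."""
    m %= len(row)
    return row[-m:] + row[:-m] if m else list(row)


def is_orthonormal(rows, tol=1e-12):
    """Check W W^T = I (equivalent to W^T W = I for a square matrix)."""
    for i, r in enumerate(rows):
        for k, s in enumerate(rows):
            dot = sum(a * b for a, b in zip(r, s))
            if abs(dot - (1.0 if i == k else 0.0)) > tol:
                return False
    return True


W = haar_dwt_matrix(16)
print(is_orthonormal(W))                  # True
print(W[1] == circular_shift(W[0], 2))    # True
print(W[13] == circular_shift(W[12], 8))  # True

X = [float(t % 5) for t in range(16)]
coeffs = [sum(w * x for w, x in zip(row, X)) for row in W]
print(sum(c * c for c in coeffs), sum(x * x for x in X))  # equal up to rounding
```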
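Let us now define exactly what the notion of scale means. For a positive integer $\lambda$, let
$$\bar{X}_t(\lambda) = \frac{1}{\lambda} \sum_{l=0}^{\lambda-1} X_{t-l} \qquad (2.13)$$
that is, the average of the $\lambda$ contiguous data values $X_{t-\lambda+1}, \ldots, X_t$; $\lambda$ is the scale associated with this average Donald B. Percival, Andrew T. Walden (2000).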
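The notion of scale in (2.13) links directly to the Haar details: an unnormalised Haar detail over a block of $2\lambda$ values equals $\lambda$ times the difference between the averages of the block's second and first halves. A small sketch, using 0-based time indices and hypothetical function names:

```python
def scale_average(x, t, lam):
    """X-bar_t(lambda) of equation (2.13): mean of x[t - lam + 1], ..., x[t]."""
    if lam < 1 or t - lam + 1 < 0:
        raise ValueError("need lam >= 1 and at least lam values up to time t")
    return sum(x[t - l] for l in range(lam)) / lam


def haar_detail_via_averages(x, k, lam):
    """Unnormalised Haar detail of the k-th (1-based) block of 2*lam values:
    lam * (average of the block's second half - average of its first half)."""
    t_end = 2 * lam * k - 1  # 0-based index of the block's last value
    return lam * (scale_average(x, t_end, lam) - scale_average(x, t_end - lam, lam))


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(haar_detail_via_averages(x, 1, 1))  # 1.0 = x[1] - x[0], as in (2.1)
print(haar_detail_via_averages(x, 1, 2))  # 4.0 = (x[3] + x[2]) - (x[1] + x[0])
print(haar_detail_via_averages(x, 1, 4))  # 16.0 = (5 + 6 + 7 + 8) - (1 + 2 + 3 + 4)
```

Doubling $\lambda$ moves one level coarser in the pyramid, which is why the wavelet level $j$ and the averaging scale $\lambda$ are two views of the same quantity.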
In this section, we assume that the larger the residuals, the more anomalous the data Wei Lu, Mahbod Tavallaee and Ali A. Ghorbani (2008). Accordingly, to identify outliers, the residuals of these distributions at different resolutions are obtained and compared to determine their rate of detection J. McHugh (2000) and P. Barford, J. Kline, D. Plonka and A. Ron (2002).
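The assumption above does not by itself fix a rule for deciding when a residual is "high". As one illustrative possibility, not the procedure used in this paper, the sketch below treats the orthonormal finest-scale Haar details as residuals and flags those exceeding a multiple of a robust noise scale, $\hat\sigma = \operatorname{median}|d_k| / 0.6745$ (the usual MAD-based estimate from wavelet shrinkage). The threshold of 3, the example data and the function names are all assumptions.

```python
import math
import statistics


def finest_detail_coeffs(y):
    """Orthonormal finest-scale Haar details d_k = (y_{2k} - y_{2k-1}) / sqrt(2)."""
    return [(y[2 * i + 1] - y[2 * i]) / math.sqrt(2) for i in range(len(y) // 2)]


def flag_large_residuals(y, threshold=3.0):
    """1-based indices k with |d_k| > threshold * sigma_hat (illustrative rule).

    sigma_hat = median(|d_k|) / 0.6745 is a robust, MAD-type noise scale, so a
    few large details do not inflate the yardstick they are compared against.
    """
    d = finest_detail_coeffs(y)
    sigma_hat = statistics.median([abs(v) for v in d]) / 0.6745
    return [k for k, v in enumerate(d, start=1) if abs(v) > threshold * sigma_hat]


y = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, 0.0, 8.0,
     0.2, -0.1, 0.1, 0.3, -0.3, 0.0, 0.1, -0.1]
print(flag_large_residuals(y))  # [4]: the pair (y_7, y_8) contains the value 8.0
```

Because each detail covers a non-overlapping pair, a flag points to a pair of observations rather than to a single one.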
The purpose of analyzing the residuals of these distributions is to support the assumption made in Section 3. The data analyzed were 1020 observations simulated from a normal distribution. Since wavelet analysis is dyadic, we added four values lying between the minimum and maximum of the data set, bringing the sample size to 1024 = 2^10, and analyzed the result as a normal distribution without aberrant observations (NO) at different resolutions (j). These four values were then removed and four aberrant observations were introduced in their place; the contaminated data set was analyzed at different resolutions using the normal (NW), Laplace and Cauchy distributions. The mean and standard deviation (residual) were obtained at each resolution by maximum likelihood estimation, which is more efficient than the conventional method. Since wavelet analysis is dyadic, the data were analyzed at band sizes 1024, 512, 256, 128, 64 and 32, corresponding to resolutions j = 10, 9, 8, 7, 6 and 5 respectively.
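The simulation design just described can be sketched in Python with the standard library only. Several details are not specified in the text and are labelled as assumptions in the code: the random seed, the particular aberrant values, and that the $2^j$ values analysed at resolution $j$ are the orthonormal Haar smooth coefficients at that level. The sketch fits the normal distribution by maximum likelihood (mean and divide-by-$n$ standard deviation), the Laplace distribution (median and mean absolute deviation) and the Cauchy distribution (location and scale via an EM fixed-point iteration). Because the data are freshly simulated, its output will not reproduce the numbers in the table below.

```python
import math
import random
import statistics


def normal_mle(xs):
    """Normal MLE: sample mean and the divide-by-n standard deviation."""
    mu = sum(xs) / len(xs)
    return mu, math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))


def laplace_mle(xs):
    """Laplace MLE: location = median, scale b = mean absolute deviation from it."""
    m = statistics.median(xs)
    return m, sum(abs(x - m) for x in xs) / len(xs)


def cauchy_mle(xs, iters=200):
    """Cauchy location/scale MLE by the EM fixed point for a Student-t with
    one degree of freedom: weights w = 2 / (1 + z**2), z = (x - mu) / s."""
    mu = statistics.median(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    s = max((q3 - q1) / 2, 1e-12)  # half the interquartile range as a start
    for _ in range(iters):
        w = [2.0 / (1.0 + ((x - mu) / s) ** 2) for x in xs]
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        s = math.sqrt(sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / len(xs))
    return mu, s


def make_datasets(seed=2015):
    """NO and contaminated 1024-point data sets; seed and outliers are assumptions."""
    rng = random.Random(seed)
    base = [rng.gauss(0.0, 1.0) for _ in range(1020)]
    lo, hi = min(base), max(base)
    no = base + [rng.uniform(lo, hi) for _ in range(4)]  # four values inside the range
    aberrant = [hi + 5.0, hi + 8.0, lo - 5.0, lo - 8.0]  # hypothetical outliers
    return no, base + aberrant


def haar_smooths(y):
    """Yield (j, c_j): c_J = y, then c_{j-1,k} = (c_{j,2k-1} + c_{j,2k}) / sqrt(2)."""
    c, j = list(y), len(y).bit_length() - 1
    while True:
        yield j, c
        if len(c) == 1:
            return
        c = [(c[2 * i] + c[2 * i + 1]) / math.sqrt(2) for i in range(len(c) // 2)]
        j -= 1


no, nw = make_datasets()
for (j, c_no), (_, c_nw) in zip(haar_smooths(no), haar_smooths(nw)):
    if j < 5:
        break
    print(f"j={j:2d} band={len(c_no):4d} "
          f"NO sd={normal_mle(c_no)[1]:.4f} NW sd={normal_mle(c_nw)[1]:.4f} "
          f"Laplace b={laplace_mle(c_nw)[1]:.4f} Cauchy s={cauchy_mle(c_nw)[1]:.4f}")
```

For the Laplace and Cauchy fits the sketch prints scale parameters rather than standard deviations; the Cauchy distribution in particular has no finite variance.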
Resolution level j | Band size | NO Mean | NO StdDev | NW Mean | NW StdDev | Laplace Mean | Laplace StdDev | Cauchy Mean | Cauchy StdDev |
10 | 1024 | 0.04545 | 0.9869 | 0.0727 | 1.4858 | 0.0711 | 0.8613 | 0.0671 | 0.6255 |
9 | 512 | -0.0581 | 0.9470 | 0.0160 | 1.5503 | 0.1752 | 0.8681 | 0.1701 | 0.6326 |
8 | 256 | 0.0573 | 0.9738 | 0.0547 | 1.5059 | 0.0012 | 0.8975 | 0.0114 | 0.6599 |
7 | 128 | -0.2162 | 0.9549 | -0.0163 | 1.5041 | 0.2338 | 0.9059 | 0.2465 | 0.6198 |
6 | 64 | -0.0989 | 0.6676 | -0.1550 | 1.5739 | -0.1842 | 0.8498 | -0.1572 | 0.5021 |
5 | 32 | -0.1220 | 0.6514 | -0.2535 | 1.4314 | -0.0332 | 0.8505 | -0.0036 | 0.4851 |
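NO: Normal distribution without aberrant observations
NW: Normal distribution with aberrant observations
Note: the Cauchy distribution has no finite mean or variance, so its Mean and StdDev entries are maximum likelihood location and scale estimates rather than moments.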
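The comparison of standard deviations discussed next can be checked mechanically. The sketch below only transcribes the StdDev columns of the table and, for each resolution, reports which fit to the contaminated data lies nearest to NO and which is largest; it adds no new data. The names `SD`, `closest_to_no` and `largest_sd` are illustrative.

```python
# StdDev columns transcribed from the table, keyed by resolution level j
SD = {
    "NO": {10: 0.9869, 9: 0.9470, 8: 0.9738, 7: 0.9549, 6: 0.6676, 5: 0.6514},
    "NW": {10: 1.4858, 9: 1.5503, 8: 1.5059, 7: 1.5041, 6: 1.5739, 5: 1.4314},
    "Laplace": {10: 0.8613, 9: 0.8681, 8: 0.8975, 7: 0.9059, 6: 0.8498, 5: 0.8505},
    "Cauchy": {10: 0.6255, 9: 0.6326, 8: 0.6599, 7: 0.6198, 6: 0.5021, 5: 0.4851},
}
FITS = ("NW", "Laplace", "Cauchy")


def closest_to_no(j):
    """Fit to the contaminated data whose StdDev is nearest the NO value at level j."""
    return min(FITS, key=lambda name: abs(SD[name][j] - SD["NO"][j]))


def largest_sd(j):
    """Fit to the contaminated data with the largest StdDev at level j."""
    return max(FITS, key=lambda name: SD[name][j])


for j in (10, 9, 8, 7, 6, 5):
    print(j, closest_to_no(j), largest_sd(j))
```

Run as written, it reports Laplace as closest to NO for j = 10 to 7, Cauchy (by a small margin) for j = 6 and 5, and NW as largest at every resolution.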
From the table, the coefficients of the normal distribution without outliers (NO) have mean approximately 0 at all resolutions (j) and standard deviation close to 1 at the finer resolutions j = 10 to 7 (about 0.65 at j = 6 and 5), which is consistent with the absence of outliers. Among the other three distributions (normal with outliers (NW), Laplace and Cauchy), the Laplace distribution has the standard deviation closest to that of NO at resolutions j = 10 to 7, followed by the Cauchy distribution and finally NW; at j = 6 and 5 the Cauchy value is marginally closer.
Since the normal (Gaussian) distribution with aberrant observations has the highest standard deviation at all resolutions, departing furthest from NO, we conclude that among the three distributions it is the most efficient at detecting aberrant observations. On the other hand, the Laplace (non-Gaussian) distribution, whose standard deviations at the finer resolutions are closest to those of the normal distribution without aberrant observations,
is regarded as the optimal distribution for modeling aberrant observations among these distributions.
References
[1] Abraham Maslow, "Histogram Smoothing via the Wavelet Transform," Journal of Computational and Graphical Statistics, Vol. 7, No. 4 (Dec., 1998).
[2] A. Dainotti, A. Pescape and G. Viorgio, "Wavelet-based Detection of DoS Attack," Proceedings of IEEE Global Telecommunications Conference, San Francisco, 2006.
[3] Donald B. Percival and Andrew T. Walden, "Wavelet Methods for Time Series Analysis," Cambridge University Press (2000).
[4] G.P. Nason, "Wavelet Methods in Statistics with R," Springer (2008).
[5] H. Nayyar and Ali A. Ghorbani, "Approximate Autoregressive Modeling for Network Attack Detection," Proceedings of the 4th Annual Conference on Privacy, Security and Trust, pp. 175-184, Markham, Canada, 2006.
[6] J. McHugh, "Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory," ACM Transactions on Information and System Security, 3(4):262-294, 2000.
[7] L. Li and G. Lee, "DDoS Attack Detection and Wavelets," Proceedings of the 12th International Conference on Communication and Networks, pp. 421-427, Texas, 2003.
[8] P. Barford, J. Kline, D. Plonka and A. Ron, "A Signal Analysis of Network Traffic Anomalies," Proceedings of the Internet Measurement Workshop 2002, Marseille, France, 2002.
[9] R. Todd Ogden, "Essential Wavelets for Statistical Applications and Data Analysis," Birkhauser, Boston (Dec., 1996).
[10] W. Lu and I. Traore, "A Novel Unsupervised Anomaly Detection Framework for Detecting Network Attacks in Real Time," Lecture Notes in Computer Science, Vol. 3810, pp. 96-109, Springer, 2005, Y.G. Desmedt et al. (Eds.).
[11] Wei Lu, Mahbod Tavallaee and Ali A. Ghorbani, "Detecting Network Anomalies Using Wavelet Basis Functions," CNSR, pp. 149-156, IEEE Computer Society (2008).
First Author
Shittu Olarewanju Ismail*
P.D.S., B.Sc., M.Sc., M.Phil., Ph.D.
Second Author
Aideyan Donald Osaro**
P.D.S., B.Sc., M.Sc.