International Journal of Scientific & Engineering Research, Volume 6, Issue 1, January 2015

ISSN 2229-5518

Combination of Multi Classification Algorithms for

Intrusion Detection System

Dr. Maiwan Bahjat Abdulrazaq, Azar Abid Salih

Computer Science Department, Faculty of Science, University of Zakho

Abstract: Classification is one of the common tasks in data mining, used to build models for the prediction of future data. It performs this task through different classifier algorithms. This paper provides an approach based on information gain to determine the most distinguishing subset of features for each attack class, and combines multiple classification algorithms (Decision Tree J48, k-Nearest Neighbor and Naïve Bayes). These classifiers are used for the task of detecting intrusions, and their relative performances are compared. The goal of this work is to analyze the performance and accuracy of the classification algorithms in order to identify the most efficient algorithm for each attack class, and then build an accurate intrusion detection system. The proposed model has been applied to the KDD Cup 99 data set, using 60% of it for training and 40% for testing. The experimental results show that multiple classifiers work better than a single classifier; they are more accurate and can distinguish among the different attacks and normal connections effectively.

Index Terms: Classification Algorithms, Information Gain, Feature Selection, Data Preprocessing, Intrusion Detection System, KDD Cup 99, Decision Tree.


1 INTRODUCTION

Recently, with the large number of internet users and the different types of threats, computer network security has become more important for defending information and detecting attacks. Thus, the Intrusion Detection System (IDS) is one of the best ways to protect systems from anonymous attacks [1].
An intrusion is any attempt that threatens the integrity, confidentiality or availability of a resource, or that overrides the security mechanisms of a computer system or network [2]. The IDS can be defined as a process in network security that monitors and analyzes events occurring in a computer and in network traffic to detect signs of security problems, identify any attack, and help protect the system [3].
Generally, there are two techniques for analyzing events in an IDS: misuse-based (signature-based) detection and anomaly-based detection. A misuse-based IDS maintains a database of known attack signatures. Attacks are detected by comparing the data gathered by the collection unit with the signatures stored in the database; if a match occurs, an attack signal is generated. Such systems can detect only the attacks they have been trained to detect.
An anomaly-based IDS, in contrast, detects attacks by observing deviations from the normal activities of the system, based on some history of monitoring, for example earlier behavior or a previously defined profile of that system. The system matches the current profile against the previous profile, and if there is a significant deviation, that activity is reported as an attack. The primary advantage of anomaly detection is the ability to detect novel and unknown attacks [4].
The IDS is a technique that provides security for both computers and networks. There are basically two types of intrusion detection systems [5]:
-Host-based IDS: it analyzes and monitors user activities and operating system events on an individual computer that serves as a host, i.e. the internals of a computing system, in order to detect unauthorized intruders. Its sources include the audit records of the operating system, system logs, and application program information.
-Network-based IDS: it analyzes and monitors network traffic from all computers. These systems collect information from the network itself and examine the traffic packets in real time. It is capable of detecting remote attacks such as DoS attacks and port scans.

2 RELATED WORKS

This section discusses some of the techniques that have been used for designing and developing IDSs; many such works exist in the literature.
In 2012, Chandolikar and Nandavadekar [6] applied two data mining classification algorithms, Bayes Net and J48, analyzed and compared the results, and found that the J48 learning algorithm performed better than Bayes Net in terms of higher accuracy and lower error rate. In 2013, K. Kumar et al. [7] used different feature selection algorithms such as Information Gain, Gain Ratio, OneR and Relief, combining the features of the best-performing algorithms after comparing their results using the Decision Tree J48 classifier. They showed that reducing the dimension of the data and selecting the most relevant features for intrusion detection can reduce training time and increase classification accuracy. In 2013, Jayshri R. Patel [8] applied decision tree classification algorithms such as C4.5, CART, Random Forest and REPTree for intrusion detection, using the information gain technique for feature selection and decision trees for classification, with 15 of the 41 features selected. The experiments showed that Random Forest performs better, as it correctly identifies more instances than the others, and that the decision tree classifiers are able to improve the classification results for ranked intrusion detection features. In 2014, V. Kosamkar and S. Chaudhari [9] proposed two hybrid algorithms for developing an intrusion detection system: a C4.5 Decision Tree and a Support Vector Machine (SVM) were tested with the benchmark NSL-KDD data set, and the Correlation-Based Feature Selection (CFS) algorithm was used for feature selection.


3 BACKGROUND THEORY

3.1 Intrusion Detection System (IDS)

The IDS plays an important role in network security and is one of the key technologies for guaranteeing system security. The goal of intrusion detection is to identify or detect a wide variety of attack types. The functions of an intrusion detection system are [10]:
1- Analyzing and monitoring network traffic to classify whether a record is an attack or normal.
2- Analyzing system configurations and vulnerabilities.
3- Detecting existing gaps in protection systems.
4- Recognizing and recording all kinds of threats that occur in the network.
5- Identifying the mistakes of the protection system administrator and abnormal activities.

3.2 KDD Cup 99 Data Set Description

Since 1999, the KDD Cup 99 data set has been the most widely used data set for the evaluation of anomaly detection methods. It is built from the data captured in the DARPA'98 IDS evaluation program, which was prepared and managed by Lincoln Laboratory at MIT, and it was used in the Third International Knowledge Discovery and Data Mining Tools Competition, held to evaluate the results of intrusion detection research [11]. The KDD training data set consists of approximately 4,900,000 single connection vectors. The 42nd feature (attribute) is the target class, labeled as either normal or attack, as shown in Table 1.

Table 1: Whole KDD 99 Intrusion Detection Dataset

The dataset used in the present study is a smaller subset (10% of the original training set) that contains 494,021 records. Every record in the dataset has 41 features (attributes) plus the target class. The KDD 99 data are divided into two parts: the training dataset and the testing dataset.
The classes in the KDD 99 dataset can be categorized into five main classes (one normal class and four main intrusion classes: Probe, DoS, U2R and R2L), which are further divided into 22 different attacks as shown in Table 2.
1) Normal: normal connections are generated by simulated daily user behavior such as downloading files and visiting web pages [11].
2) Denial of Service (DoS) attack: an attack in which the attacker makes some computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a computer system; examples are ping of death, land, mail bomb and SYN flood [11].
3) Probing attack (Probe): an attack in which the attacker scans a machine or a networking device in order to determine weaknesses or vulnerabilities that may later be exploited to compromise the system; an example is port scanning [12].
4) User to Root (U2R) attack: in this type of attack, a local user on a machine is able to exploit some vulnerability to obtain privileges normally reserved for the root user; an example is the buffer overflow attack [13].
5) Remote to Local (R2L) attack: in this type of attack, an attacker who is able to send packets to a machine over a network, but does not have an account on that victim machine, exploits some vulnerability to obtain local access and modify data; an example is password guessing [13].

Table 2: Attacks in KDD 99 Training dataset
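Only the caption of Table 2 survives in this copy. For orientation, the sketch below uses the commonly cited grouping of the 22 attack labels in the 10% KDD Cup 99 training file into the four intrusion categories; this grouping is an assumption about what the table contains, and the attack names are the standard KDD 99 labels rather than values copied from this paper.

```python
# Commonly cited grouping of the 22 attack labels in the 10% KDD Cup 99
# training set into the four intrusion categories (assumed, not copied
# from Table 2 of this paper).
ATTACK_CATEGORIES = {
    "dos":   ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "probe": ["ipsweep", "nmap", "portsweep", "satan"],
    "r2l":   ["ftp_write", "guess_passwd", "imap", "multihop",
              "phf", "spy", "warezclient", "warezmaster"],
    "u2r":   ["buffer_overflow", "loadmodule", "perl", "rootkit"],
}

def to_category(label: str) -> str:
    """Map a raw KDD 99 connection label to one of the five main classes."""
    for category, attacks in ATTACK_CATEGORIES.items():
        if label in attacks:
            return category
    return "normal"

print(to_category("neptune"))  # dos
```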

4 Framework

Several stages are required to achieve the proposed system. The stages start with preprocessing of the data, which consists of cleaning the data of noise such as inconsistent and missing values and removing duplicates so that the model can produce the best classification results, and then transforming the features that have text form into numeric form. This is an important operation for machine learning classification. In addition, the data are scaled into a small specified range (0 to 1) using Min-Max normalization. After that, the model selects the most distinguishing features for intrusion detection using a reduced feature set obtained with Information Gain, and several classification algorithms (Decision Tree J48, K-Nearest Neighbor and Naïve Bayes) are applied in order to identify which algorithm is better and obtain the best classification accuracy for each type of attack class. The input data used in the system is the intrusion detection dataset called the KDD Cup 99 dataset. The classifiers are trained on 60% of the data set, and the remainder is used for testing.

Fig.1. Flowchart of general structure of the proposed system


4.1 Data Pre-Processing

Real-world data are generally dirty: they contain errors, missing values, duplicates, outliers, and incomplete, irrelevant and inconsistent data. The purpose of data preprocessing is to clean the noisy data, extract features, and transform the preliminary data into a format that can be processed more easily and effectively for the purposes of the user. This helps to improve the ability of the algorithm to classify the data correctly.
The data preprocessing involves three stages:
1- Data cleaning: this stage is responsible for removing any record that contains a missing or inconsistent value. It also removes duplicate records from the data set.
2- Data transformation: after data cleaning, the next stage of preprocessing is to transform or convert the features that have text form into numeric form, so that they are suitable input for the classification algorithms.
3- Data normalization: as a step of data preprocessing, when datasets are large, attribute normalization is important for detection performance. The range of each feature is normalized by scaling its values so that they fall within a small specified range (0 to 1). The min-max normalization method is applied.

Min-Max normalization performs a linear transformation on the original data values. Suppose that $\min_x$ and $\max_x$ are the minimum and maximum values of a feature $x$, and that the feature is to be mapped into a new interval $[\mathrm{new\_min}_x, \mathrm{new\_max}_x]$. Every value $v$ from the original interval is then mapped to a value $v'$ using

$v' = \dfrac{v - \min_x}{\max_x - \min_x}\,(\mathrm{new\_max}_x - \mathrm{new\_min}_x) + \mathrm{new\_min}_x$    (1)

With the target range (0 to 1) used here, this reduces to $v' = (v - \min_x) / (\max_x - \min_x)$.
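The three preprocessing stages above can be sketched as follows. This is a minimal illustration, assuming pandas is used (the paper names no tool) and assuming the text-valued columns are protocol_type, service and flag and the target column is label; all of these names are assumptions.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean, transform and normalize a KDD-style dataframe (illustrative only)."""
    # 1- Data cleaning: drop records with missing values and duplicate rows.
    df = df.dropna().drop_duplicates()

    # 2- Data transformation: convert text-valued features to numeric codes.
    #    The column names below are assumptions, not taken from the paper.
    for col in ["protocol_type", "service", "flag"]:
        df[col] = df[col].astype("category").cat.codes

    # 3- Data normalization: min-max scale every feature into [0, 1],
    #    following equation (1) with new_min = 0 and new_max = 1.
    features = df.columns.drop("label")
    mins, maxs = df[features].min(), df[features].max()
    df[features] = (df[features] - mins) / (maxs - mins).replace(0, 1)
    return df
```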

Table 3: The result of preprocessing

5 Feature Selection

Feature selection is a process that selects the most relevant feature set by removing irrelevant or redundant features, so that only a subset of the original features is retained.

The objectives of feature selection techniques are [14]:
- Reducing the dimensionality of the feature space to avoid the curse of dimensionality.
- Speeding up the learning algorithms.
- Reducing the amount of data.
- Requiring less memory storage.
- Increasing the training (learning model) performance.
- Enhancing predictive accuracy by decreasing overfitting.
- Sampling the data more efficiently and improving the data quality.

6 Information Gain (IG)

In this method, the features are filtered to create the most influential feature subset before the start of the learning process. The algorithm uses a ranker method on the features and evaluates them by ordering them from most important to least important.
Entropy is commonly used in information theory as a measure of the impurity of a collection of training data; a feature with higher entropy carries more information content. Given a collection of instances $S$, with $p_i$ the proportion of $S$ belonging to class $i$, the entropy is

$\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$    (2)

The information gain of an attribute $A$ is defined as

$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \dfrac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$    (3)

where $\mathrm{Values}(A)$ is the set of all possible values of attribute $A$ and $S_v$ is the subset of $S$ for which $A$ has value $v$.
Only the highest-ranking attributes are used by the classification algorithms to classify the input dataset as either normal or attack, as shown in Table 4.

Table 4: Attribute ranking by Information Gain
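As an illustration of equations (2) and (3), the following sketch ranks discrete-valued features by information gain; it is a minimal implementation written from the formulas above, not the authors' code, and continuous attributes would first need to be discretized.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i), equation (2)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(column, labels):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v), equation (3)."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [lab for val, lab in zip(column, labels) if val == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

# Toy example with two discrete features and a binary class label.
features = {
    "protocol": ["tcp", "tcp", "udp", "udp", "tcp"],
    "flag":     ["SF",  "S0",  "SF",  "SF",  "S0"],
}
labels = ["normal", "attack", "normal", "normal", "attack"]

ranking = sorted(features, key=lambda f: information_gain(features[f], labels),
                 reverse=True)
print(ranking)  # features ordered from most to least informative
```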

7 CLASSIFICATION

Classification is a supervised learning technique and one of the important predictive modeling tasks of data mining; it classifies data items into predefined class labels [15].
For classification, the following techniques are mainly used: Decision Trees (J48), Bayesian (Naïve Bayes) and Lazy (K-Nearest Neighbor). Before classification, data preprocessing and feature selection are applied; these techniques improve the ability of the algorithms to classify the data correctly.

7.1 Classification Algorithm


Different classification algorithms have been used for the performance evaluation; they are listed below.

7.1.1 Decision Tree (J48)

Among the many tree algorithms used for classification, the most popular tree classifier is the J48 decision tree, an implementation of the C4.5 algorithm that builds decision trees using the concept of information entropy. It exploits the fact that each attribute of the data can be used to make a decision by splitting the data into smaller subsets, which form the nodes of a rooted tree. The tree has three types of nodes. The first is the root node, which has no incoming edges; the second type consists of the internal nodes (branches); and all other nodes are called leaves, also known as terminal or decision nodes. Each leaf node is assigned a class label, while the internal nodes contain attribute test conditions to separate records that have different characteristics [16].
The pseudo code for building a decision tree is:
1. Check for the base cases.
2. For each attribute a, find the information gain from splitting on a.
3. Let A be the best attribute, i.e. the one with the highest information gain.
4. Create a decision node that splits on A.
5. Recurse on the sublists obtained by splitting on A and add those nodes as children of the node.
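A minimal sketch of training such a tree, assuming scikit-learn's DecisionTreeClassifier with the entropy criterion as a stand-in for Weka's J48; this only approximates C4.5 (for example, it prunes differently), and the toy feature matrix below stands in for the preprocessed KDD data.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the preprocessed, min-max scaled KDD features.
X_train = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1], [0.8, 0.9]]
y_train = ["normal", "dos", "normal", "dos"]

# Approximation of J48/C4.5: an entropy (information gain) based decision tree.
j48_like = DecisionTreeClassifier(criterion="entropy", random_state=42)
j48_like.fit(X_train, y_train)
print(j48_like.predict([[0.85, 0.95]]))  # -> ['dos']
```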

7.1.2 Naïve Bayes

Bayesian classifiers are a simple approach based on inference in probabilistic graphical models. They are very easy to construct, need no complicated iterative parameter estimation schemes, and predict the class label very quickly [17].
The Naïve Bayes classifier is a supervised learning algorithm based on applying Bayes' theorem. The method assumes that all the feature values are independent of each other given the class; this assumption is called class conditional independence.
Using Bayesian theory:

$P(C_i \mid X) = \dfrac{P(X \mid C_i)\,P(C_i)}{P(X)}$    (4)

Under the class conditional independence assumption, for a record $X = (x_1, x_2, \ldots, x_n)$:

$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)$    (5)

The predicted class is the one with the highest posterior probability:

$C = \arg\max_{i} P(C_i \mid X)$    (6)

where
X: the data record, a possible value (connection) in the session class;
C_i: the category of classes;
P(C_i | X): the posterior probability of the class;
P(C_i): the prior probability of the class;
P(X | C_i): the probability of the predictor attributes given the class;
P(X): the prior probability of the predictor;
C1 = Normal, C2 = DoS, C3 = Probe, C4 = R2L, C5 = U2R.
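A minimal sketch of this classifier, assuming scikit-learn's GaussianNB on the normalized numeric features; the paper does not state which Naïve Bayes variant or library was used, and the toy data below stands in for the preprocessed KDD records.

```python
from sklearn.naive_bayes import GaussianNB

# Toy stand-in for the preprocessed, min-max scaled KDD features.
X_train = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1], [0.8, 0.9]]
y_train = ["normal", "dos", "normal", "dos"]

# Gaussian Naïve Bayes: estimates P(C_i) and per-feature P(x_k | C_i),
# then applies equations (4)-(6) to choose the most probable class.
nb = GaussianNB()
nb.fit(X_train, y_train)
print(nb.predict_proba([[0.85, 0.95]]))  # posterior P(C_i | X) per class
print(nb.predict([[0.85, 0.95]]))        # -> ['dos']
```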

7.1.3 k Nearest Neighbor (KNN)

KNN is a supervised, lazy-learning classification method, so it takes more time when predicting. It uses distance-weighted measures over the attributes as its search method and is only valid for continuous variables. K is the number of nearest neighbors.
The Euclidean distance between two input vectors with m quantitative features, $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_m)$, is

$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$    (7)

Let x be the point to be labeled; the closest point to x, say y, determines its label.
KNN procedure steps:
1- Store the KDD 99 training data set with the corresponding labels.
2- Choose K, the number of nearest neighbors.
3- Calculate the distance between each test example (x', y') and every training example (x, y), and select the K nearest neighbors (minimum distances).
4- Repeat the procedure for every connection in the testing dataset.
5- End for.
Disadvantages of K Nearest Neighbor [18]:
1- A large amount of memory is required to store the complete training set.
2- Lazy learning methods take more time and are usually slower to classify the data set.
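A minimal sketch of the KNN procedure above, assuming scikit-learn's KNeighborsClassifier with Euclidean distance; the value K = 3 and the toy data are illustrative choices, not taken from the paper.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-in for the preprocessed, min-max scaled KDD features.
X_train = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1], [0.8, 0.9], [0.15, 0.05]]
y_train = ["normal", "dos", "normal", "dos", "normal"]

# Lazy learning: fit() only stores the data; the Euclidean distances of
# equation (7) are computed at prediction time, which makes testing slow.
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)
print(knn.predict([[0.85, 0.95]]))  # -> ['dos'] (2 of the 3 nearest are DoS)
```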

8 Training Data Stage

The training set is a dataset used to build a classifier, that is, to learn from instances in order to predict the class attributes of new, unknown instances. Each instance in the training set contains one target value, the class label. In the learning phase, the training data are analyzed and classification rules are generated.

9 Testing Data Stage

The testing set is a dataset used to evaluate and validate the learning algorithm. When the learning algorithm is applied to the training dataset, a model is learned from that data; whether the algorithm is appropriate for the data is then validated by evaluating it on the testing dataset. The testing phase is the classification phase, where the test data are classified into classes according to the generated rules.
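A minimal sketch of the 60%/40% split used in this work, assuming scikit-learn's train_test_split on a preprocessed dataframe with a label column; the tiny dataframe and the stratify option are illustrative additions, not details stated in the paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny illustrative frame; in the paper this is the preprocessed 10% KDD set.
df = pd.DataFrame({
    "duration":  [0.0, 0.2, 0.9, 0.1, 0.8, 0.0, 0.7, 0.3, 0.9, 0.1],
    "src_bytes": [0.1, 0.0, 1.0, 0.2, 0.9, 0.1, 0.8, 0.2, 1.0, 0.0],
    "label": ["normal", "normal", "dos", "normal", "dos",
              "normal", "dos", "normal", "dos", "normal"],
})

X = df.drop(columns=["label"])
y = df["label"]
# 60% of the records for training, the remaining 40% for testing,
# keeping class proportions similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.60, stratify=y, random_state=42)
print(len(X_train), len(X_test))  # 6 4
```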


Confusion matrix: a confusion matrix gives information about the actual and predicted classifications produced by a classification system. For the two-class case it can be represented as shown in Table 5.

Table 5: Confusion Matrix

Fig.2. Flowchart of Training stage

Table 6: Number of Records for Training 60% and Testing 40%

10.1 Training results

Table 7: 60% Training Decision Tree

Fig.3. Flowchart of Testing stage

10 Experiments and Results

The performance metrics are calculated as follows:
True positive (TP): attacks correctly classified as attacks.
False positive (FP): normal data incorrectly classified as attacks.
True negative (TN): normal data correctly classified as normal.
False negative (FN): attacks incorrectly classified as normal.
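From these four counts the usual summary measures can be derived. The sketch below computes accuracy, detection rate and false alarm rate from the confusion-matrix counts; the formulas are standard, but the example numbers are purely illustrative and are not results from Tables 7 to 12.

```python
def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard IDS metrics from the confusion-matrix counts defined above."""
    return {
        "accuracy":         (tp + tn) / (tp + tn + fp + fn),
        "detection_rate":   tp / (tp + fn),   # share of attacks that were caught
        "false_alarm_rate": fp / (fp + tn),   # share of normal traffic flagged as attack
    }

print(metrics(tp=90, fp=5, tn=100, fn=10))  # toy numbers, for illustration only
```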

Table 8: 60% Training K Nearest Neighbor



Table 9: 60% Training Naïve Bayes

10.2 Testing results

For the testing stage, the number of records for each class is: Normal = 53,132, DoS = 21,829, Probe = 852, R2L = 400, U2R = 21.

Table 10: 40% Testing Decision Tree


Table 11: 40% Testing K Nearest Neighbor

Fig.4. Comparison chart of classifier accuracy for each attack class (testing results)

11 CONCLUSIONS

In this paper, the classification algorithms Decision Tree (J48), Lazy (KNN) and Bayesian (Naïve Bayes) were discussed, and the classification accuracy for each type of class was analyzed. The results show that applying data preprocessing and relevant feature selection using information gain to reduce the features of the data set is very important for decreasing the learning time of the algorithms and increasing the classification accuracy.
The best classification accuracy for the Probe class is obtained with the Decision Tree classifier, for the R2L and U2R classes with the K Nearest Neighbor classifier, and for the DoS class with Naïve Bayes. A combination of DT, KNN and NB may therefore give better accuracy across these classes. The Naïve Bayes classifier under-performs and gives lower accuracy than the other two classifiers, but it is the fastest classifier for building a model, while KNN is the slowest, as it takes more time in the testing phase.

Table 12: 40% Testing Naïve Bayes

References

[1] Reyadh Sh. Naoum, Wafa' S. Al-Sharafat, "Adaptive Framework for Network Intrusion Detection by Using Genetic-Based Machine Learning Algorithm", IJCSNS International Journal of Computer Science and Network Security, Vol. 9, No. 4, April 2009.

[2] Megha Aggarwal, Amrita, "Performance Analysis of Different Feature Selection Methods in Intrusion Detection", International Journal of Science & Technology Research, Volume 2, Issue 6, June 2013.

[3] Lata, Indu Kashyap, "Study and Analysis of Network based Intrusion Detection System", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 2, Issue 5, May 2013.

[4] Devikrishna K, Ramakrishna B, "An Artificial Neural Network based Intrusion Detection System and Classification of Attacks", Vol. 3, Issue 4, Jul-Aug 2013.

[5] G.V. Nadiammai, S. Krishnaveni, M. Hemalatha, "A Comprehensive Analysis and Study in Intrusion Detection System using Data Mining Techniques", International Journal of Computer Applications, Volume 35, No. 8, December 2011.

[6] N.S. Chandolikar, V.D. Nandavadekar, "Comparative Analysis of Two Algorithms for Intrusion Attack Classification using KDD CUP Data Set", International Journal of Computer Science and Engineering (IJCSE), Vol. 1, Issue 1, Aug 2012.

[7] Krishan Kumar, Gulshan Kumar, Yogesh Kumar, "Feature Selection Approach for Intrusion Detection System", International Journal of Advanced Trends in Computer Science and Engineering (IJATCSE), Vol. 2, No. 5, 2013.

[8] Jayshri R. Patel, "Performance Evaluation of Decision Tree Classifiers for Ranked Features of Intrusion Detection", Volume 02, Issue 02, Nov 2012 to Oct 2013.

[9] Vaishali Kosamkar, Sangita S. Chaudhari, "Improved Intrusion Detection System using C4.5 Decision Tree and Support Vector Machine", International Journal of Computer Science and Information Technologies, Vol. 5 (2), 2014.

[10] Prabhdeep Kaur, Sheveta Vashisht, "Evaluation of Intrusion Detection Technique and Algorithms in Terms of Performance and Efficiency through Data Mining", International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR), Vol. 3, Issue 2, Jun 2013.

[11] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, Ali A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set", Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Security and Defense Applications (CISDA 2009).

[12] Swati Paliwal, Ravindra Gupta, "Denial-of-Service, Probing & Remote to User (R2L) Attack Detection using Genetic Algorithm", International Journal of Computer Applications, Volume 60, No. 19, December 2012.

[13] Rupali Datti, Shilpa Lakhina, "Performance Comparison of Features Reduction Techniques for Intrusion Detection System", IJCST, Vol. 3, Issue 1, Jan-March 2012.

[14] Tanya Garg, Amandeep Kaur, "A Review Paper on Data Mining Approach for Feature Selection for Network Intrusion Detection System", Volume 3, Issue 11, November 2013.

[15] Syeda Farha Shazmeen, Mirza Mustafa Ali Baig, M. Reena Pawar, "Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis", IOSR Journal of Computer Engineering (IOSR-JCE), Volume 10, Issue 6, May-Jun 2013.

[16] Upendra, "An Efficient Feature Reduction Comparison of Machine Learning Algorithms for Intrusion Detection System", International Journal of Emerging Trends & Technology in Computer Science (IJETTCS), Volume 2, Issue 1, January-February 2013.

[17] Mrutyunjaya Panda, Manas Ranjan Patra, "Evaluating Machine Learning Algorithms for Detecting Network Intrusions", International Journal of Recent Trends in Engineering, Vol. 1, No. 1, May 2009.

[18] S. Vijayarani, M. Muthulakshmi, "Comparative Analysis of Bayes and Lazy Classification Algorithms", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 2, Issue 8, August 2013.

Dr. Maiwan Bahjat Abdulrazzaq received his BSc in 1991 from the Computer Science Department at Mosul University, his Master's degree in 2003 from Al Neealin University, Faculty of Computing Science & Information Technology, Computer Science Department, and his PhD degree in 2013 from the University of Zakho, Faculty of Science, Computer Science Department. He is a lecturer at the University of Zakho.

Azar Abid Salih received his BSc in 2009 from the Computer Science Department at the University of Duhok. He is currently working toward an MSc degree in Computer Science. This paper is part of his research, carried out together with his supervisor Dr. Maiwan Bahjat Abdulrazaq.
