Topic: Anomaly Detection through NN Hybrid Learning with Data Transformation Analysis

IJSER Content Writer

Authors: Saima Munawar, Mariam Nosheen and Dr. Haroon Atique Babri
International Journal of Scientific & Engineering Research Volume 3, Issue 1, January-2012
ISSN 2229-5518

Abstract— An intrusion detection system is a vital part of a computer security system, commonly used for precaution and detection. It is built as a classifier, or as a descriptive or predictive model, to separate the normal behavior of IP packets from abnormal behavior. This paper addresses the proper handling of data transformation methods and the importance of analyzing the complete data set, which is applied to hybrid neural network approaches used to cluster and classify normal and abnormal behavior, in order to improve the accuracy of a network-based anomaly detection classifier. Neural networks require data in numerical form, but network IP connections or packets have some symbolic features that are difficult to handle without proper data transformation analysis. For this reason, the new non-redundant NSL KDD CUP data set was used. The experimental results show that the indicator variable method is more effective than both the conditional probabilities and arbitrary assignment methods, as measured by accuracy and balanced error rate.
Index Terms — ANN, Anomaly Detection, Self Organizing Map, Backpropagation network, Indicator variables, Conditional probability

1   INTRODUCTION
In computer security, network administrators always recommend preventive action as better than cure for any system. Intrusion Detection Systems (IDS) are classified into three categories: host-based, network-based and vulnerability-assessment [1]. Signature-based detection and anomaly detection are the two basic models of intrusion detection. Signature-based detection only detects attacks that match known intrusions and cannot detect novel behavior. It is mainly used in commercial tools, and its database has to be updated with new attacks. Anomaly intrusion detection resolves these limitations of the signature-based model and detects new attacks by searching for abnormality [2], [3]. Anomaly detection has numerous possibilities that are as yet unexplored [4]. Network and computer security is a significant issue for every organization that demands security. Prevention, detection and response are the three basic foundations of network security. For this purpose, many researchers emphasize preventive action over detection and response [5]. With the increasing demand for network security, many devices such as firewalls and intrusion detection systems are used to control the accessibility of abnormal packets. Basically, abnormal packets violate the Internet Protocol standards and are used to crash systems [6]. For this reason, better intrusion detection devices are being built for prevention, for accurate detection of normal and abnormal packets, and to reduce the false alarm rate. IDS are basically devoted to this purpose: monitoring the system intelligently.
As far as access control points are concerned, a firewall is good, but it is not designed to act against intrusions; that is why most security experts emphasize the IDS, which is located before and after the firewall [7], [8]. Many researchers have been improving intrusion detection systems through different research areas such as statistics, machine learning, data mining, information theory and spectral theory [2], [3], [4]. The purpose of this research is to provide a design approach based on hybrid learning of artificial neural networks for an anomaly intrusion detection classifier system. Neural networks are unable to handle the symbolic features of the IP data set directly, so two data transformation methods, indicator variables and conditional probabilities, are considered effective for improving classifier performance; the transformed data is processed through a hybrid technique of self organizing map and backpropagation neural network. The data transformation is applied to nine selected features of the NSL IP data set. The data is prepared for an anomaly detection classifier used for LAN security.
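As a rough illustration of the clustering half of the hybrid technique, the sketch below trains a tiny one-dimensional self organizing map in plain Python. It is not the authors' implementation: the function names, the deterministic weight initialisation, the linear decay of learning rate and neighbourhood width, and the toy data are all assumptions made for this example. In a hybrid design, the index of the best matching unit (or the distances to all units) would then be passed on to a backpropagation classifier, which is not shown here.

```python
import math

def best_matching_unit(weights, x):
    """Return the index of the map unit whose weight vector is closest to x."""
    dists = [sum((w - xi) ** 2 for w, xi in zip(unit, x)) for unit in weights]
    return dists.index(min(dists))

def train_som(data, n_units, epochs=50, lr0=0.5, sigma0=1.0):
    """Train a 1-D SOM on inputs scaled to [0, 1] (hypothetical helper)."""
    dim = len(data[0])
    # Assumed deterministic start: units spread evenly along the diagonal.
    weights = [[(u + 0.5) / n_units] * dim for u in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 1e-3)   # neighbourhood width shrinks
        for x in data:
            bmu = best_matching_unit(weights, x)
            for u in range(n_units):
                # Gaussian neighbourhood: units near the winner move more.
                h = math.exp(-((u - bmu) ** 2) / (2 * sigma ** 2))
                weights[u] = [w + lr * h * (xi - w)
                              for w, xi in zip(weights[u], x)]
    return weights

# Toy two-feature records: two low-valued and two high-valued connections.
data = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.95], [0.85, 0.9]]
weights = train_som(data, n_units=2)
clusters = [best_matching_unit(weights, x) for x in data]
print(clusters)
```

The two groups of toy records end up on different map units, which is the property a downstream classifier would rely on.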

This research is presented in five sections. Section 2 gives the background literature of related research. Section 3 provides a detailed analysis of the proposed research methodology; the SOM and BPN algorithms and their training and testing results are discussed. Section 4 provides a detailed analysis of the experimental results and a comparison of how the two methods affect the performance of the classifier. Finally, section 5 presents the conclusion and discusses the future direction of this domain.

2.1 Hybrid learning use in misuse and anomaly detection

Hybrid approaches have been used to resolve anomaly intrusion detection problems. Hamdan [9] compared four techniques: support vector machine and neural network from supervised learning, and self organizing map and fuzzy logic from unsupervised learning. The work only offered descriptions of these techniques and did not include the methodology or numerical analysis of the applied techniques. In another approach, an artificial immune system is used for detection and a self organizing map is used for classification. It emphasizes higher-level information output rather than low-level output, which is more beneficial to security analysts when analyzing reports. The KDD CUP 1999 data set is used as input, with a special focus on two types of attacks: denial-of-service and user-to-root [10]. M. Bahrololum et al. [11] presented a design approach, with further explanation left to future enhancement. The work introduced the SOM and backpropagation algorithms, the KDD CUP data set features, the training and testing data, and an experiment table view. However, it did not mention how the data set was arranged or in which software it was used, how the experiment was implemented, how the techniques were applied to the data set, or what methods were used to evaluate the results. It only provided a proposal and discussed some design issues with flow diagrams. Hayoung et al. [12] proposed a new labeling method for this domain. However, in real-time detection it is unclear how normal or anomalous data would be detected if no correlation has been built; labeling is supervised learning, so a large analysis would again be required to establish the correlation between the features. The work also did not report the labeling time, only the detection time, whereas a real-time system depends on the total time needed to complete all processes.

2.2 Analysis and Data Transformation Processes

Data analysis and preprocessing are a core part of the artificial neural network architecture for obtaining accurate results. Anomaly detection has attracted the attention of many researchers during the last decade. For this reason, many researchers have not only considered new algorithms but have also analyzed the data sets used for training and testing classifiers. The KDD CUP 99 data set is the one most used for intrusion detection problems. It has 41 features in three basic groups, individual TCP connection features, content features and traffic features, comprising 7 symbolic and 34 continuous attributes [13]. Tavallaee presented a detailed and critical review of the KDD CUP 99 data set. The review discussed the problems in the data set and resolved two issues that affect performance and lead to poor evaluation of anomaly detection approaches. It proposed a new data set, NSL-KDD, which consists of selected records of KDD CUP 99 with redundant records removed. The data set is in ARFF (Attribute-Relation File Format). The authors claimed that this data set will help researchers solve anomaly detection problems [14]. Preprocessing is applied before neural network algorithms are run, because these algorithms require quantitative data rather than qualitative information. The most commonly used conversion method is arbitrary assignment, but in response to criticism of this method three other approaches are used for machine learning algorithms. E. Hernandez presented three methods for symbolic feature conversion applied to the KDD CUP data set. The work described these techniques in detail and compared them as applied to different feed-forward neural networks and support vector machines. The authors claimed that these three conversion methods improve the prediction ability of the classifier. The methods used for preprocessing (converting symbolic attributes into numeric form) are indicator variables, conditional probabilities and an SSV (Separability Split Value) criterion based method [15].
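To make two of these conversion methods concrete, here is a minimal Python sketch. The function names and the toy data are hypothetical. The excerpt does not spell out the conditional-probability formula, so the sketch assumes one common formulation in which each symbolic value v is replaced by the estimated probabilities P(class | v), one per class; the cited work may define it differently. The SSV-based method is not shown.

```python
from collections import Counter, defaultdict

def indicator_encode(values):
    """Replace each symbolic value with a 0/1 vector, one slot per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        vectors.append(row)
    return categories, vectors

def conditional_probability_encode(values, labels):
    """Replace each symbolic value v with [P(class | v) for each class].

    Assumed formulation: probabilities are relative frequencies
    estimated from the (value, label) pairs passed in.
    """
    classes = sorted(set(labels))
    counts = defaultdict(Counter)
    for v, y in zip(values, labels):
        counts[v][y] += 1
    table = {v: [c[k] / sum(c.values()) for k in classes]
             for v, c in counts.items()}
    return classes, [table[v] for v in values]

# Toy protocol column with made-up class labels.
protocols = ["tcp", "udp", "tcp", "icmp"]
labels = ["normal", "normal", "anomaly", "anomaly"]

print(indicator_encode(protocols))
print(conditional_probability_encode(protocols, labels))
```

Indicator variables add one input per category, so a feature with many distinct values widens the network input considerably, whereas the conditional-probability form adds only one input per class.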

This section is divided into these main processes: data analysis, preprocessing, modeling of clustering and classification, and performance evaluation.
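For the performance-evaluation step, the abstract names accuracy and balanced error rate as the measures. The sketch below computes both for a two-class problem using the usual definition of balanced error rate, the mean of the per-class error rates. The function name, the label strings and the toy predictions are assumptions for illustration, and the code assumes both classes occur in the true labels.

```python
def accuracy_and_ber(y_true, y_pred, positive="anomaly"):
    """Return (accuracy, balanced error rate) for binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1          # anomaly missed by the classifier
        elif p == positive:
            fp += 1          # normal traffic raised as a false alarm
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    # Average of the error rate on each class, so a rare class counts equally.
    ber = 0.5 * (fn / (tp + fn) + fp / (tn + fp))
    return accuracy, ber

y_true = ["anomaly", "anomaly", "normal", "normal", "normal", "normal"]
y_pred = ["anomaly", "normal", "normal", "normal", "normal", "anomaly"]
print(accuracy_and_ber(y_true, y_pred))
```

Because the balanced error rate weights both classes equally, it is less flattering than accuracy when one class dominates the data.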

3.1 Data Analysis
The NSL KDD CUP data set is reasonable and improves evaluation. This data set is offline and is provided for anomaly detection classification research to allow better evaluation of classifiers. It also gives consistent and more comparable results [14], [17], [18].

3.2 Feature selection
It is difficult to select the important features for detecting and classifying normal packets and attacks. Further research is being done on feature selection for anomaly detection problems. The basic question is how many types of features should be selected to improve the classification rate, and which types of attacks they relate to. In this research, the first 9 basic attributes of individual TCP connections are used. They consist of duration, protocol type, service, source bytes, destination bytes, flag, land, wrong fragment and urgent. These features have 3 symbolic and 5 continuous attributes. Protocol and service are the most important features for detecting attacks [13], [14]. These features were selected mainly because they contain the largest number of symbolic features compared with the others, for the purpose of handling symbolic feature preprocessing.
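A small Python sketch of this selection step follows. The column names are the conventional NSL-KDD attribute names, and the grouping into symbolic, continuous and binary attributes is an assumed reading of the counts given above, with land treated as a separate 0/1 flag; the sample record is made up.

```python
# Assumed grouping of the nine basic TCP connection features.
SYMBOLIC = ("protocol_type", "service", "flag")
CONTINUOUS = ("duration", "src_bytes", "dst_bytes", "wrong_fragment", "urgent")
BINARY = ("land",)
SELECTED = SYMBOLIC + CONTINUOUS + BINARY

def select_features(record):
    """Keep only the nine selected attributes of one connection record."""
    return {name: record[name] for name in SELECTED}

# Hypothetical record with two extra fields that are dropped.
record = {
    "duration": 0, "protocol_type": "tcp", "service": "http", "flag": "SF",
    "src_bytes": 181, "dst_bytes": 5450, "land": 0, "wrong_fragment": 0,
    "urgent": 0, "count": 8, "label": "normal",
}
selected = select_features(record)
print(selected)
```

Only the three symbolic attributes need a conversion method; the remaining values are already numeric.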

3.3 Preprocessing
The given input data set has symbolic and continuous attributes. The data needs to be converted into numerical form for processing by neural network algorithms. Researchers are looking for the best data transformation techniques to apply to selected features to improve classifier performance. The main purpose is to show how different preprocessing methods affect the accuracy of different machine learning simulation tasks. Besides modifying the algorithm, it is also important to consider data transformation and feature selection methods according to the demands of the machine learning and training task. The details of the data transformation methods used in this research are given below.
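The paper's own description of the methods is not reproduced in this excerpt. As an illustration only, the Python sketch below shows one reason the choice of transformation can matter: arbitrary assignment of integers imposes an ordering on symbolic values, so some pairs of categories look further apart than others, while 0/1 indicator vectors keep every pair equally distant. All names and data here are hypothetical.

```python
def arbitrary_assign(values):
    """Give each distinct symbolic value the next integer, in order seen."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes) + 1)
    return codes

def one_hot(value, categories):
    """0/1 indicator vector for value over the given category list."""
    return [1 if value == c else 0 for c in categories]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

protocols = ["tcp", "udp", "icmp"]
codes = arbitrary_assign(protocols)

# With integer codes, tcp appears closer to udp than to icmp,
# although the protocols have no natural order.
int_tcp_udp = (codes["tcp"] - codes["udp"]) ** 2
int_tcp_icmp = (codes["tcp"] - codes["icmp"]) ** 2

# With indicator vectors, both pairs are the same distance apart.
hot_tcp_udp = squared_distance(one_hot("tcp", protocols), one_hot("udp", protocols))
hot_tcp_icmp = squared_distance(one_hot("tcp", protocols), one_hot("icmp", protocols))

print(int_tcp_udp, int_tcp_icmp, hot_tcp_udp, hot_tcp_icmp)
```

A distance-based learner such as a self organizing map is sensitive to exactly this kind of artificial ordering.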
