Author Topic: Survey on Data Mining Techniques in Intrusion Detection  (Read 2137 times)

Survey on Data Mining Techniques in Intrusion Detection
« on: August 21, 2011, 06:55:44 am »
Author : Amanpreet Chauhan, Gaurav Mishra, Gulshan Kumar
International Journal of Scientific & Engineering Research Volume 2, Issue 6, June-2011
ISSN 2229-5518
Download Full Paper : PDF

Abstract - Intrusion Detection (ID) is a major research area in the field of network security. It involves monitoring the events occurring in a computer system and its network. Data mining is one of the technologies applied to ID to discover new patterns in massive network data and to reduce the strain of manually compiling intrusion and normal-behavior patterns. Accordingly, data mining techniques are used extensively in intrusion detection and prevention. This article briefly reviews the current state-of-the-art data mining techniques for ID and highlights their advantages and disadvantages.

Keywords: - Network Intrusion, Decision Trees, Naïve Bayes, Fuzzy Logic, Support Vector Machines, Data Clustering, Data Mining.

The Internet has become a part of daily life and an essential tool today. It aids people in many areas, such as business, entertainment, and education. Most traditional communications media, including telephone, music, film, and television, are being reshaped or redefined by the Internet [1]. Newspaper, book, and other print publishing have had to adapt to websites and blogging. The Internet has enabled or accelerated new forms of human interaction through instant messaging, Internet forums, and social networking. Online shopping has boomed both for major retail outlets and for small artisans and traders. Business-to-business and financial services on the Internet affect supply chains across entire industries. In particular, the Internet has become an important component of business models: both businesses and customers rely on Internet applications such as websites and e-mail for business activities. The information security of the Internet as a communication medium therefore needs to be carefully considered [2].

Intrusion, in simple words, is the illegal act of entering, seizing, or taking possession of another's property (in this case, the computer system). In a network context, it denotes code or activity that disrupts the proper flow of traffic on the network or steals information from that traffic [3].

Common carriers of intrusion include viruses, Trojans, and the like. Intrusions can be divided into the following classes:
DoS Attack - A denial-of-service attack (DoS attack) or distributed denial-of-service attack (DDoS attack) is an attempt to make a computer resource unavailable to its intended users. Although the means to carry out, motives for, and targets of a DoS attack may vary, it generally consists of the concerted efforts of a person or people to prevent an Internet site or service from functioning efficiently or at all, temporarily or indefinitely. Perpetrators of DoS attacks typically target sites or services hosted on high-profile web servers such as banks, credit card payment gateways, and even root name servers.                               

Remote to Local (R2L) – This kind of attack describes unauthorized access from a remote machine to a local account on the target system. It is a class of attack in which an attacker sends packets to a machine over a network and then exploits a vulnerability of that machine to illegally gain local access as a user. There are different types of R2L attacks; the most common in this category rely on social engineering.
User to Root (U2R) – A user-to-root attack is unauthorized escalation to the local super user (root). These are classes of attacks in which an attacker starts out with access to a normal user account on the host system and exploits a vulnerability to gain root access. The most common exploits in this class are buffer overflows, which are caused by ordinary programming mistakes and environment assumptions.
Probing – Probing is a class of attack in which an attacker scans a network to gather information or find vulnerabilities. An attacker with a map of the machines and services available on a network can use that information to look for exploits. There are different types of probes: some abuse a computer's legitimate features, while others use social engineering techniques. This class of attack is the most common and requires very little technical expertise.
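Probing activity such as a port scan can often be spotted with simple counting. The sketch below is an illustration (not taken from the paper); the distinct-port threshold and the one-window event format are assumptions chosen for clarity:

```python
from collections import defaultdict

def find_scanners(events, port_threshold=10):
    """Flag source IPs that contact many distinct destination ports.

    events: iterable of (src_ip, dst_port) pairs observed in one time window.
    Returns the set of source IPs touching more than `port_threshold` ports.
    """
    ports_seen = defaultdict(set)
    for src_ip, dst_port in events:
        ports_seen[src_ip].add(dst_port)
    return {ip for ip, ports in ports_seen.items() if len(ports) > port_threshold}

# A host sweeping ports 1-100 stands out against normal single-port traffic.
traffic = [("10.0.0.5", p) for p in range(1, 101)] + [("10.0.0.7", 443)] * 50
```

Real scan detectors add time windows and whitelists, but the core signal is the same: legitimate clients touch few ports, scanners touch many.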
In information security, intrusion detection is the act of detecting actions that attempt to compromise the confidentiality, integrity, or availability of a resource. Recent advances in computer communication infrastructure have ushered in the era of computer-based data processing. The need for intrusion detection arises from the fact that computer systems are used in every aspect of this age of mainstream science and technology [4]. Broadly, the following intrusion detection methodologies exist:
1.   Anomaly Detection - Anomaly detection refers to detecting patterns in a given data set that do not conform to established normal behavior. The patterns so detected are called anomalies and often translate into critical, actionable information in several application domains. Anomalies are also referred to as outliers, surprises, aberrations, deviations, or peculiarities. In practice, the features of a user's usual behavior are stored in a database, and the user's current behavior is compared against them; if the divergence is large enough, the observed data is declared abnormal. The advantages of anomaly detection are its independence from any particular system, its strong versatility, and its ability to detect attacks that have never been seen before. Anomaly-based intrusion detection discriminates between malicious and legitimate patterns of activity (system- or user-driven) using variables that characterize system normality [5]. However, because the normal profile cannot completely describe all user behavior in a system, and because each user's behavior changes constantly, its main drawback is a high rate of false alarms (normal activity incorrectly reported as an attack).
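The profile-and-compare idea behind anomaly detection can be sketched concretely: build a normal profile as per-feature mean and standard deviation, then flag any observation whose z-score on some feature exceeds a threshold. The features (bytes sent, connection duration) and the 3-sigma threshold below are illustrative assumptions, not from the paper:

```python
import statistics

def build_profile(normal_records):
    """Per-feature (mean, stdev) computed from records of normal behavior."""
    columns = list(zip(*normal_records))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]

def is_anomalous(record, profile, threshold=3.0):
    """True if any feature deviates more than `threshold` stdevs from normal."""
    for value, (mean, stdev) in zip(record, profile):
        if stdev > 0 and abs(value - mean) / stdev > threshold:
            return True
    return False

# Normal profile over (bytes sent, duration in seconds) for past sessions.
normal = [(500, 2.0), (520, 2.2), (480, 1.9), (510, 2.1), (495, 2.0)]
profile = build_profile(normal)
```

A session transferring 50,000 bytes would be flagged against this profile, while one near the historical averages would pass; the false-alarm problem described above shows up exactly when legitimate behavior drifts outside the profile.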
2.   Misuse Detection - In the misuse detection approach, abnormal system behavior is defined first, and any other behavior is then treated as normal. It assumes that abnormal behavior and activity have a simple, well-defined model. Its advantage is the simplicity of adding known attacks to the model; its disadvantage is its inability to recognize unknown attacks. Misuse detection confirms attack incidents by matching observed features against a library of attack signatures. It offers high detection speed and a low false-alarm rate, but it cannot discover attacks that are not pre-designated in the signature library, so it fails to detect the many new attacks that appear.
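The signature-library matching that misuse detection relies on can be sketched as pattern matching over payloads. The two signatures below are toy regexes invented for illustration; real IDS signature languages (e.g. in Snort-style systems) are far richer:

```python
import re

# Toy signature library: attack name -> regex over the payload (illustrative only).
SIGNATURES = {
    "sql_injection": re.compile(r"('|\")\s*or\s+1=1", re.IGNORECASE),
    "path_traversal": re.compile(r"\.\./\.\./"),
}

def match_signatures(payload):
    """Return the names of all known-attack signatures found in `payload`."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]
```

This makes the trade-off stated above tangible: matching is fast and precise for known patterns, but a payload using any technique absent from `SIGNATURES` passes silently.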

Data mining is the process of extracting patterns from data. Data mining is seen as an increasingly important tool by modern business to transform data into business intelligence giving an informational advantage. It is currently used in a wide range of profiling practices, such as marketing, surveillance, fraud detection, and scientific discovery. A primary reason for using data mining is to assist in the analysis of collections of observations of behavior. An unavoidable fact of data mining is that the (sub-) set(s) of data being analyzed may not be representative of the whole domain, and therefore may not contain examples of certain critical relationships and behaviors that exist across other parts of the domain [6].
Data mining technology is well suited to this task because:
•   It can process large amounts of data
•   It can discover hidden and otherwise ignored information
Data mining commonly involves four classes of tasks:
1.   Clustering – it is the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.
2.   Classification – it is the task of generalizing known structure to apply to new data. For example, an email program might attempt to classify an email as legitimate or spam. Common algorithms include decision tree learning, nearest neighbor, Naive Bayesian classification, neural networks and support vector machines.
3.   Regression - it is the task of finding a function that models the data with the least error.
4.   Association rule learning - it searches for relationships between variables. For example, a supermarket might gather data on customers' purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
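Clustering, the first task above, can be illustrated with a minimal k-means over two-dimensional connection features. This is a toy, pure-Python sketch with deterministic initialization (first k points), not a production implementation; the (duration, bytes) features are an assumption for illustration:

```python
def kmeans(points, k, iterations=20):
    """Minimal k-means clustering; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Recompute each centroid as the mean of its assigned points.
        for c in range(k):
            members = [p for i, p in enumerate(points) if assignments[i] == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments

# Two obvious groups: short/small connections vs. long/large ones.
conns = [(1.0, 100), (1.2, 120), (0.9, 90), (10.0, 5000), (11.0, 5200), (9.5, 4800)]
labels = kmeans(conns, k=2)
```

In an intrusion-detection setting, connections that fall into a small or distant cluster become candidates for closer inspection, without any labeled attack data, which is what makes clustering attractive for discovering previously unseen behavior.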

Read More: Click here...