A Study Of Web Navigation Pattern Using Clustering Algorithm In Web Log Files

Authors: Mrs. V. Sujatha, Dr. Punithavalli
International Journal of Scientific & Engineering Research Volume 2, Issue 9, September-2011
ISSN 2229-5518

ABSTRACT - Web user navigation patterns are a heavily researched area in the field of web usage mining, with a wide range of applications. Web usage mining is the process of applying data mining techniques to discover usage patterns from data extracted from web log files. The aim of discovering frequent patterns in web log data is to obtain information about the navigational behavior of users. This information can be used for advertising, for creating dynamic user profiles, and so on. In this paper, four types of clustering approaches are investigated on web log files to improve the quality of clustering of user navigation patterns in web usage mining systems and to predict the user's intuition on large web sites.
Index Terms - Classification, Clustering, Web mining, Web log data, Web usage mining.

1. INTRODUCTION
The expansion of the World Wide Web (Web for short) has resulted in a large amount of data that is now, in general, freely available for user access. The different types of data have to be managed and organized so that different users can access them efficiently. Several data mining methods are used to discover the hidden information in the Web. However, Web mining does not only mean applying data mining techniques to the data stored in the Web. The algorithms have to be modified to suit the demands of the Web, and new approaches are needed that better fit the properties of Web data. Furthermore, not only data mining algorithms but also artificial intelligence, information retrieval, and natural language processing techniques can be used efficiently. Thus, Web mining has developed into an autonomous research area.

The term web mining was coined by Etzioni in 1996 to signify the use of data mining techniques to automatically discover web documents and services, uncover general patterns on the web, and observe user behavior (viewing, bookmarking, and browsing history). Web mining is the process of finding out what users are looking for on the internet. Some users might be looking only at textual data, whereas others might be interested in multimedia data. Web mining is classified into three categories: web content mining, web structure mining, and web usage mining.

Web usage mining focuses on techniques that can predict user behavior while the user interacts with the web. The data mined in this category are secondary data generated as a result of that interaction. These data vary widely, but they are generally classified into usage data residing on web clients, proxy servers, and web servers.

The aim of understanding the navigation preferences of visitors is to enhance the quality of electronic commerce (e-commerce) services, to personalize Web portals, or to improve Web structure and Web server performance. Web usage mining proceeds in three stages: preprocessing, pattern discovery, and pattern analysis.

2. WEB USAGE MINING ARCHITECTURE

2.1 Preprocessing
Preprocessing "consists of converting the usage, content, and structure information contained in the various available data sources into the data abstractions necessary for pattern discovery". This step can be broken into at least four substeps: data cleaning, user identification, session identification, and formatting. In the data cleaning step, unneeded data are deleted from the raw data in the web log files. At least two log file formats exist: Common Log File format (CLF) and Extended Log File format (see [16] for more details). Our university log file consists of these fields: Date, Time, Client IP address, Method, URI stem, Protocol status, Bytes sent, Protocol version, Host, User Agent, and Referrer.
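The four substeps can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a space-delimited record whose fields appear in the order listed above, with spaces inside the user agent already encoded (for example as "+"). The cleaning rules, the choice of IP address plus user agent as the user key, and the 30-minute session timeout are common heuristics chosen here for illustration, and all function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed field order, following the list given in the text; a real server
# log may order or delimit its fields differently.
FIELDS = ["date", "time", "c_ip", "method", "uri_stem", "status",
          "bytes", "version", "host", "user_agent", "referrer"]

# Assumed cleaning rules: embedded resources are not page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)  # common heuristic, not from the paper


def parse_line(line):
    """Formatting: turn one record into a dict, or return None if unusable."""
    parts = line.split()
    if line.startswith("#") or len(parts) != len(FIELDS):
        return None  # comment/header line or malformed record
    rec = dict(zip(FIELDS, parts))
    rec["timestamp"] = datetime.strptime(
        rec["date"] + " " + rec["time"], "%Y-%m-%d %H:%M:%S")
    return rec


def is_relevant(rec):
    """Data cleaning: keep successful GET requests for pages only."""
    return (rec["method"] == "GET"
            and rec["status"].startswith("2")
            and not rec["uri_stem"].lower().endswith(IGNORED_SUFFIXES))


def sessionize(lines):
    """User and session identification over an iterable of raw log lines."""
    by_user = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec is not None and is_relevant(rec):
            # User identification: IP address plus user agent (an assumption).
            by_user[(rec["c_ip"], rec["user_agent"])].append(rec)

    sessions = []
    for user, recs in by_user.items():
        recs.sort(key=lambda r: r["timestamp"])
        current = [recs[0]["uri_stem"]]
        for prev, rec in zip(recs, recs[1:]):
            # Session identification: a long gap starts a new session.
            if rec["timestamp"] - prev["timestamp"] > SESSION_TIMEOUT:
                sessions.append((user, current))
                current = []
            current.append(rec["uri_stem"])
        sessions.append((user, current))
    return sessions
```

Calling sessionize on the lines of a log file returns a list of (user, page list) pairs, one per session, which is the kind of data abstraction the pattern discovery stage works on.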
 
2.2 Pattern Discovery
* Statistical Analysis, such as frequency analysis, mean, median, etc.
* Clustering of users helps to discover groups of users with similar navigation patterns (for example, to provide personalized Web content).
* Classification is the technique of mapping a data item into one of several predefined classes.
* Association Rules discover correlations among pages accessed together by a client.
* Sequential Patterns extract frequently occurring inter-session patterns, such that the presence of a set of items is followed by another item in time order.
* Dependency Modeling determines whether there are any significant dependencies among the variables in the Web.
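Two of the techniques listed above, association rules and clustering, can be sketched in a few lines of Python. The sketch assumes that each session is a list of page URIs. The support and confidence thresholds, the Jaccard similarity measure, and the single-pass "leader" clustering are illustrative choices made here; they are not the four clustering approaches investigated in the paper, and the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def association_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return rules (a, b, support, confidence): sessions with a also contain b."""
    n = len(sessions)
    page_sets = [set(s) for s in sessions]
    page_count = Counter(p for s in page_sets for p in s)
    # Ordered pairs, so that a -> b and b -> a are scored separately.
    pair_count = Counter(
        pair for s in page_sets for pair in permutations(sorted(s), 2))
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n                  # share of sessions with both pages
        confidence = count / page_count[a]   # share of a-sessions that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


def jaccard(a, b):
    """Jaccard similarity of two page sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_clustering(sessions, threshold=0.5):
    """Single-pass clustering: join the first cluster whose leader is similar enough."""
    clusters = []  # list of (leader page set, [session indices])
    for i, s in enumerate(sessions):
        pages = set(s)
        for leader, members in clusters:
            if jaccard(pages, leader) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((pages, [i]))  # no similar leader: start a new cluster
    return [members for _, members in clusters]
```

Leader clustering depends on the order in which sessions arrive, which keeps the sketch short but makes it a weaker choice than an iterative method when cluster quality matters.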
2.3 Pattern Analysis
Pattern analysis is the final stage of Web usage mining (WUM). It involves the validation and interpretation of the mined patterns.
Validation: eliminates irrelevant rules or patterns and extracts the interesting ones from the output of the pattern discovery process.
Interpretation: the output of mining algorithms is mainly in mathematical form and is not suitable for direct human interpretation, so it must be presented in a form that users can understand.

3. RELATED WORK
Identifying Web browsing strategies is a crucial step in Website design and evaluation, and requires approaches that provide information on both the extent of any particular type of user behavior and the motivations for such behavior [9]. Pattern discovery from web data is the key component of web mining, and it draws together algorithms and techniques from several research areas. Baraglia and Palmerini (2002) proposed a WUM system called SUGGEST that provides useful information to ease web user navigation and to optimize web server performance. Liu and Keselj (2007) proposed a novel approach to automatically classifying user navigation patterns and predicting users' future requests, and Mobasher (2003) presented a Web Personalizer system that provides dynamic recommendations to users as a list of hypertext links. Jespersen et al. (2002) [10] proposed a hybrid approach for analyzing visitor click sequences. Jalali et al. (2008a [7] and 2008b [8]) proposed a system for discovering user navigation patterns using a graph partitioning model: an undirected graph based on the connectivity between each pair of Web pages was considered, and weights were assigned to the edges of the graph. Dixit and Gadge (2010) [5] presented another user navigation pattern mining system based on graph partitioning. They presented an undirected graph based on the connectivity between Referrer and URI pages, along with a preprocessing method for raw web log files and a formula for assigning weights to the edges of the undirected graph. Owing to its flexibility and self-organization, ant-based clustering has been applied in a variety of areas, from e-commerce problems to circuit design and from text mining to web mining (Jianbin et al., 2000). This section has reviewed the work proposed in this area, with particular emphasis on web usage mining, clustering, and classification.
The present work is another attempt to propose a hybrid system that uses clustering and classification methods to discover users' navigation patterns and analyze them from the server's web log file.
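The graph-based view of navigation patterns mentioned in the related work can be illustrated with a short Python sketch. It makes strong simplifying assumptions: the edge weight is just the number of sessions in which two pages co-occur, not the weighting formulas of the cited papers, and the partitioning step is replaced by dropping light edges and taking connected components. The function names and the min_weight parameter are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_graph(sessions):
    """Undirected graph: edge weight = number of sessions containing both pages."""
    weights = Counter()
    for s in sessions:
        # Sorting gives each undirected edge a single canonical key (a, b).
        for a, b in combinations(sorted(set(s)), 2):
            weights[(a, b)] += 1
    return weights


def partition(weights, min_weight=2):
    """Drop light edges, then return connected components as page clusters."""
    adj = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            adj[a].add(b)
            adj[b].add(a)
    seen, clusters = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        # Depth-first search from an unvisited page collects one component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adj[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters
```

Pages with no edge at or above min_weight do not appear in any cluster in this sketch; a fuller treatment would have to decide whether to report them as singletons or discard them as noise.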
