International Journal of Scientific & Engineering Research Volume 3, Issue 2, February-2012

ISSN 2229-5518

A Reconciling Website System to Enhance Efficiency with Web Mining Techniques

Joy Shalom Sona, Prof. Asha Ambhaikar

Abstract: Existing website systems do not make it easy for users to extract information, and they have several shortcomings. To address these shortcomings we propose a new reconciling website system, a new way to increase the efficiency of a website system using web mining techniques. It helps to reorganize the website structure so as to increase browsing efficiency and make browsing easier for users. This paper concentrates on the browsing efficiency of a website. To optimize efficiency, the paper introduces an algorithm that calculates efficiency accurately and suggests how to enhance user browsing efficiency. This is achieved with web mining techniques.

Index Terms: Web Structure Mining; Web Content Mining; Reconciling Website System; Browsing Efficiency.


1 INTRODUCTION


The modern Web is huge, diverse and dynamic. It contains a massive amount of information and provides access to it at any place and at any time. Most people browse the Internet to retrieve information, but much of the time they get many insignificant and irrelevant documents, even after navigating several links. Web mining techniques are used to retrieve information from the Web more effectively.

1.1 Web Mining Overview

Web mining is the application of data mining techniques to automatically discover and extract knowledge from the Web. According to Kosala et al. [2], Web mining consists of the following tasks:

Resource finding: the task of retrieving intended Web documents.

Information selection and pre-processing: automatically selecting and pre-processing specific information from retrieved Web resources.

Generalization: automatically discovering general patterns at individual Web sites as well as across multiple sites.

Analysis: validation and/or interpretation of the mined patterns.

There are three areas of Web mining, distinguished by the type of Web data used as input to the data mining process, namely Web Content Mining (WCM), Web Usage Mining (WUM) and Web Structure Mining (WSM).


Joy Shalom Sona is currently pursuing a master's degree program in Computer Technology at RCET Bhilai, India, PH-09926152273. E-mail: sjoyshalom@gmail.com

Prof. Asha Ambhaikar is currently working as Associate Professor in Computer Science Engineering at RCET Bhilai, India, PH-09229655211. E-mail: asha31.a@rediffmail.com

Fig. 1 Web Mining Classification

Web mining is thus classified into Web usage mining, Web structure mining and Web content mining, as shown in Fig. 1. Web usage mining refers to the discovery of user access patterns from Web usage logs; it identifies browsing patterns by analyzing users' navigational behavior. Web structure mining tries to discover useful knowledge from the structure of hyperlinks, that is, the model underlying the link structures of Web pages, which helps to investigate the node and connection structure of websites. According to the type of web structural data, web structure mining can be divided into two kinds: 1) extracting documents from the hyperlinks in the web and 2) analyzing the tree-like structure of individual pages. Taking advantage of the hyperlink topology, web structure mining categorizes and catalogs web pages and generates information such as the similarity and relationship between them. Web content mining is concerned with retrieving information from the WWW into a more structured form and indexing that information so that it can be retrieved quickly.

IJSER © 2012

http://www.ijser.org


1.2 Web Content Mining (WCM)

Web Content Mining is the process of extracting useful information from the contents of web documents. Web documents may consist of text, images, audio, video or structured records such as tables and lists. Mining can be applied to web documents as well as to the result pages produced by a search engine. There are two approaches to content mining: the agent-based approach and the database approach. The agent-based approach concentrates on searching for relevant information, using the characteristics of a particular domain to interpret and organize the collected information. The database approach is used for retrieving semi-structured data from the web.
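As a minimal sketch of the database-approach idea of turning semi-structured pages into records, the Python snippet below pulls the cells of HTML tables into lists of rows using only the standard-library `html.parser` module. The names `TableExtractor` and `extract_rows` are hypothetical and chosen for illustration; this is not the method of any particular content-mining system, and it deliberately ignores nested tables, `colspan`/`rowspan` and badly malformed markup.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableExtractor(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None    # cells of the row being read
        self._cell: Optional[List[str]] = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a table cell; everything else is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html: str) -> List[List[str]]:
    """Return every table row in `html` as a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, `extract_rows("<table><tr><th>Title</th><th>Price</th></tr><tr><td>Book A</td><td>12.50</td></tr></table>")` returns `[['Title', 'Price'], ['Book A', '12.50']]`, a record-like structure that can then be stored and indexed.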

1.3 Web Usage Mining (WUM)

Web Usage Mining is the process of extracting useful information from the secondary data derived from users' interactions while they surf the Web. It extracts data stored in server access logs, referrer logs, agent logs, client-side cookies, user profiles and metadata.
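Server access, referrer and agent information often arrive together in one line per request. The sketch below assumes, as an example, that the server writes the widely used NCSA/Apache "combined" log format and turns each line into a dictionary with Python's standard `re` module. `COMBINED` and `parse_line` are hypothetical names; other log formats would need a different pattern.

```python
import re
from typing import Dict, Optional

# host ident authuser [time] "method path protocol" status size "referrer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one combined-format log line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None
```

The parsed `host`, `agent` and `time` fields are the usual raw material for grouping requests into users and sessions during pre-processing, while `referrer` records which page led to each request.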

1.4 Web Structure Mining (WSM)

The goal of Web Structure Mining is to generate a structural summary of a Web site and its Web pages. It tries to discover the link structure of the hyperlinks at the inter-document level. Based on the topology of the hyperlinks, Web structure mining categorizes Web pages and generates information such as the similarity and relationship between different Web sites. This type of mining can be performed at the document level (intra-page) or at the hyperlink level (inter-page). Understanding the structure of Web data is important for information retrieval.
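To make "similarity from hyperlink topology" concrete, the sketch below computes co-citation counts: two pages are considered related in proportion to the number of pages that link to both. Co-citation is one standard link-based similarity measure chosen here for illustration; the paper does not prescribe it. The site is assumed to be given as a hypothetical adjacency dictionary mapping each page to the pages it links to.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def cocitation(links: Dict[str, List[str]]) -> Dict[Tuple[str, str], int]:
    """Count, for each unordered pair of pages, how many pages link to both."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for source, targets in links.items():
        # Deduplicate and sort so each pair is counted once per source, in a fixed order.
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return dict(counts)
```

With `links = {"home": ["books", "music", "about"], "sale": ["books", "music"], "about": ["home"]}`, the pair `("books", "music")` gets a count of 2 because both `home` and `sale` point to it, suggesting those two pages are closely related in the site structure.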

2 RELATED WORK

2.1 Web Mining

Web mining has emerged as a specialized field during the last few years and refers to the application of knowledge discovery techniques specifically to web data. Web content mining and web structure mining refer, respectively, to the analysis of the content of web pages and of the structure of the links between them. Web usage mining, on the other hand, is the process of applying data mining techniques to the discovery of patterns in web data [5]. Web usage mining involves four steps: user identification, data pre-processing, pattern discovery and pattern analysis. User access patterns are models of user browsing activity. In most cases they are deduced from web server access logs; an alternative method is client-side logging, using techniques such as cookies. Mining such logs is referred to as web-log mining [4]. Mining activities help us to understand patterns in the data, and user patterns extracted from Web data have been applied to a wide range of applications.

Projects by Spiliopoulou and Faulstich (1998), Wu et al. (1998), Zaiane et al. (1998) and Shahabi et al. (1997) have focused on Web usage mining in general, without extensive tailoring of the process towards one of the various subcategories. The WebSIFT project is designed to perform Web usage mining from server logs in the extended NCSA format. Chen et al. (1996) introduce the concept of the maximal forward reference to characterize user episodes for the mining of traversal patterns. A maximal forward reference is the sequence of pages requested by a user up to the last page before backtracking occurs during a particular server session. The SpeedTracer project [Wu et al., 1998] from IBM Watson is built upon work originally reported in Chen et al. (1996). In addition to episode identification, SpeedTracer uses referrer and agent information in its pre-processing routines to identify users and server sessions in the absence of additional client-side information. The Web Utilization Miner (WUM) system [Spiliopoulou and Faulstich, 1998] provides a robust mining language for specifying the characteristics of discovered frequent paths that are interesting to the analyst. Zaiane et al. (1998) load Web server logs into a data cube structure in order to perform data mining as well as On-Line Analytical Processing (OLAP) operations such as roll-up and drill-down of the data. Their WebLogMiner system has been used to discover association rules and to perform classification and time-series analysis. Shahabi et al. (1997) and Zarkesh et al. (1997) have developed one of the few Web usage mining systems that rely on client-side data collection: the client-side agent sends page request and time information back to the server every time a page containing the Java applet is loaded or destroyed [5].
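The maximal forward reference of Chen et al. (1996) can be illustrated with a short Python sketch. It assumes a session is already available as an ordered list of page identifiers and treats any request for a page that is already on the current forward path as a backward move; this is a simplified reading of the concept for illustration, not the authors' published code, and `maximal_forward_references` is a hypothetical name.

```python
from typing import List


def maximal_forward_references(session: List[str]) -> List[List[str]]:
    """Split one session into its maximal forward references."""
    path: List[str] = []          # current forward path from the session start
    refs: List[List[str]] = []
    moving_forward = True
    for page in session:
        if page in path:
            # Backward move: the path just completed is maximal only if
            # we were moving forward when the backtrack happened.
            if moving_forward:
                refs.append(list(path))
            moving_forward = False
            del path[path.index(page) + 1:]   # return to the revisited page
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        refs.append(list(path))   # the session ended on a forward path
    return refs
```

For the traversal A B C D C B E G H G W A O U O V, the function yields the maximal forward references ABCD, ABEGH, ABEGW, AOU and AOV; frequent traversal patterns are then mined from these sequences rather than from the raw click stream.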

2.2 Adaptive Website

Users interact with a website in multiple ways, and their mental model of a particular subject can differ from those of other users and of the web developer. Consequently, improving the interaction between users and websites is important. In his book, Raskin [6] introduces various ways of quantifying interface design. In particular, he discusses information-theoretic efficiency, which is defined similarly to efficiency in thermodynamics, where efficiency is calculated by dividing the power coming out of a process by the power going into it. If, during a certain time interval, an electrical generator produces 820 W while it is driven by an engine with an output of 1000 W, its efficiency is 820/1000, or 0.82. Efficiency is also often expressed as a percentage; in this case, the generator has an efficiency of 82%. The same calculation can be applied to information efficiency. Srikant and Yang [7] propose an algorithm that automatically finds pages in a website whose location differs from where visitors expect to find them. The key insight is that visitors backtrack if they do not find the information where they expect it: the point from which they backtrack is the expected location for the page. They also use a time threshold to decide whether or not a page is a target page. Nakayama et al. (2000) propose a technique that discovers the gap between website designers' expectations and users' behavior. The former are assessed by measuring inter-page conceptual relevance and the latter by measuring inter-page access co-occurrence. They also suggest how to apply quantitative data obtained through a multiple regression analysis that predicts hyperlink traversal frequency from page layout features. Most adaptive systems include a procedure for mining web logs to understand user behaviors and patterns and to improve their website automatically and efficiently. However, none of them tries to calculate efficiency in order to improve the web structure. We want to apply the efficiency concept from [6] and develop an efficiency calculation function.
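The backtracking heuristic of Srikant and Yang [7] described above can be sketched in simplified form. The sketch assumes each session is an ordered list of (page, dwell time in seconds) pairs, treats a return to an already-seen page as a backtrack, and treats any page viewed for at least a chosen time threshold as a target page; `expected_locations` and the threshold value are hypothetical. The published algorithm in [7] is more involved than this, so the code only illustrates the core idea.

```python
from typing import Dict, List, Tuple


def expected_locations(session: List[Tuple[str, float]],
                       time_threshold: float) -> Dict[str, List[str]]:
    """Map each target page to the pages the visitor backtracked from before finding it.

    session: ordered (page, dwell_seconds) pairs for one visitor session.
    A page whose dwell time reaches time_threshold is treated as a target page.
    """
    expected: Dict[str, List[str]] = {}
    backtracked_from: List[str] = []
    seen = set()
    prev = None
    for page, dwell in session:
        # Returning to an already-seen page is treated as backtracking;
        # the page just left is recorded as a candidate expected location.
        if prev is not None and page != prev and page in seen:
            backtracked_from.append(prev)
        seen.add(page)
        if dwell >= time_threshold:
            # Target found: attribute the backtrack points collected so far to it.
            expected[page] = backtracked_from
            backtracked_from = []
            seen = {page}
        prev = page
    return expected
```

In a session home (5 s), books (4 s), home (3 s), music (6 s), cds (120 s) with a 60-second threshold, the sketch reports `books` as an expected location of `cds`: the visitor looked under books first, backtracked, and only then found the target, which hints that a link to cds from books may help.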

3 METHODOLOGY

To implement the reconciling website system, we proceed by collecting user browsing records and calculating browsing efficiency. User browsing records can be collected through browser cookies. The browsing efficiency is calculated as the ratio of the information accumulated while browsing useful pages to all the information accumulated while browsing all pages in one browsing route, from the initial to the final web page.
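The ratio above can be written as E = (information on useful pages) / (information on all pages in the route), the same output-over-input form as Raskin's efficiency. The Python sketch below computes it for one route. How "information" is measured (for instance a word count per page) and how a page is judged "useful" are not fixed by the text, so both are supplied as assumed inputs; `PageVisit` and `browsing_efficiency` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageVisit:
    url: str
    info: float    # assumed measure of information on the page, e.g. word count
    useful: bool   # assumed flag: did this page contribute to the user's goal?


def browsing_efficiency(route: List[PageVisit]) -> float:
    """Information gathered on useful pages divided by information on all pages of one route."""
    total = sum(visit.info for visit in route)
    if total == 0:
        return 0.0   # empty route: define efficiency as zero rather than divide by zero
    useful = sum(visit.info for visit in route if visit.useful)
    return useful / total
```

A route of /home (100), /books (200) and the useful page /books/a (300) gives 300/600 = 0.5. Comparing this value for the same task before and after a structural change (for example adding a direct link to /books/a) is one way the efficiency could guide reorganization.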

4 CONCLUSION

This paper proposed a Reconciling Website System that improves browsing efficiency and suggests reorganizations of the website. Reconciling websites can make popular pages more accessible, highlight interesting links and connect related pages. Adaptive websites can also advise a site's webmaster by summarizing access information and making suggestions. These suggestions are based on user browsing behavior and increase efficiency by reorganizing the Web structure.

REFERENCES

[1] J.-H. Lee and W.-K. Shiu, "An adaptive website system to improve efficiency with web mining techniques," Advanced Engineering Informatics, vol. 18, no. 3, pp. 129-142, 2004.

[2] R. Kosala and H. Blockeel, "Web Mining Research: A Survey," SIGKDD Explorations, Newsletter of the ACM Special Interest Group on Knowledge Discovery and Data Mining, vol. 2, no. 1, pp. 1-15, 2000.

[3] M. Perkowitz and O. Etzioni, "Adaptive web sites: an AI challenge," IJCAI-97, 1997.

[4] M. Koutri, S. Daskalaki and N. Avouris, "Adaptive interaction with web sites: an overview of methods and techniques," Computer Science and Information Technologies, CSIT 2002.

[5] J. Srivastava, R. Cooley, M. Deshpande and P.-N. Tan, "Web usage mining: discovery and applications of usage patterns from web data," ACM SIGKDD, 2000.

[6] J. Raskin, The Humane Interface, first ed. Menlo, CA: Stratford Publishing, Inc., 2000.

[7] R. Srikant and Y. Yang, "Mining web logs to improve website organization," ACM, 2001.

[8] M. Spiliopoulou and L. Faulstich, "WUM: A web utilization miner," EDBT Workshop WebDB 98, Valencia, Spain, 1998.

[9] K.-L. Wu, P.-S. Yu and A. Ballman, "SpeedTracer: A web usage mining and analysis tool," IBM Systems Journal, vol. 37, no. 1, 1998.

[10] O. Zaiane, M. Xin and J. Han, "Discovering web access patterns and trends by applying OLAP and data mining technology on web logs," in Advances in Digital Libraries, Santa Barbara, CA, 1998, pp. 19-29.

[11] C. Shahabi, A. Zarkesh, J. Adibi and V. Shah, "Knowledge discovery from users web-page navigation," Workshop on Research Issues in Data Engineering, Birmingham, England, 1997.

[12] M.-S. Chen, J.-S. Park and P.-S. Yu, "Data mining for path traversal patterns in a web environment," 16th International Conference on Distributed Computing Systems, pp. 385-392, 1996.

[13] A. Zarkesh, J. Adibi, C. Shahabi, R. Sadri and V. Shah, "Analysis and design of server informative WWW-sites," 6th International Conference on Information and Knowledge Management, Las Vegas, Nevada, 1997.

[14] T. Nakayama, H. Kato and Y. Yamane, "Discovering the gap between web site designers' expectations and users' behavior," Computer Networks, vol. 33, no. 1-6, pp. 811-822, 2000.

[15] Nielsen//NetRatings, Global Internet Index, March 2001.

[16] M. Wang and B. Yen, "Web Structure Reorganization to Improve Web Navigation Efficiency," PACIS 2007 Proceedings, Paper 46, 2007.

[17] M. Kilfoil, A. Ghorbani, W. Xing, Z. Lei, J. Lu, J. Zhang and X. Xu, "Toward An Adaptive Web: The State of the Art and Science," CNSR 2003.
