Author Topic: New Approach to Weighted Pattern Sequential Mining-Dataset


IJSER Content Writer
New Approach to Weighted Pattern Sequential Mining-Dataset
« on: August 20, 2011, 04:57:09 am »
Author: D. Saravana Kumar, N. Ananthi, D. Yadavaram
International Journal of Scientific & Engineering Research, Volume 2, Issue 5, May-2011
ISSN 2229-5518
Download Full Paper : PDF

Abstract - In real-world data, the knowledge used for mining rules is almost always time varying. Items have dynamic characteristics in terms of transactions: they have seasonal selling rates and hold time-based associations with other items. The traditional model of association rule mining has been adapted to handle weighted association rule mining problems, where each item is allowed to have a weight. The goal of association rule mining is to find association rules that satisfy predefined minimum support and confidence thresholds in a given database. End users of association rule mining tools encounter several difficulties in practice when databases come with binary attributes. In this paper, we introduce a new measure that does not require initially allotted weights. The quality of transactions is considered by link-based models, and a fast mining algorithm is adopted.

Keywords - Association rule, fast algorithm, weighted support.

THE classical model of association rule mining employs the support measure, which treats every transaction equally. In contrast, different transactions have different weights in real-life data sets. In recent years, association rule discovery has been one of the most active research topics. It is used to identify relationships between items in very large databases and to extract interesting correlations and associations among sets of items in transaction databases or other data repositories.
For example, given a market-basket database, it would be interesting for decision support to know that 30% of customers who bought cocoa powder and sugar also bought butter. This analysis may provide a basis for increasing sales by introducing free schemes such as: if 3 kg of sugar is bought, then 100 g of butter is free. In a census database, we might discover that 20% of persons who worked last year earned more than the average income, or, in a medical database, that 35% of patients who have a cold also have sinusitis.
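The 30% figure above is the rule's support; the related conditional figure is its confidence. A minimal sketch of both measures over a toy basket database (the transactions and item names are illustrative, not from the paper):

```python
# Toy market-basket database; contents are illustrative only.
transactions = [
    {"cocoa", "sugar", "butter"},
    {"cocoa", "sugar"},
    {"cocoa", "sugar", "butter", "milk"},
    {"bread", "butter"},
    {"cocoa", "milk"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)

def confidence(antecedent, consequent, db):
    """Estimated P(consequent | antecedent): support of the union
    divided by support of the antecedent alone."""
    return support(set(antecedent) | set(consequent), db) / support(antecedent, db)

# Rule {cocoa, sugar} -> {butter}: holds in 2 of 5 transactions,
# and in 2 of the 3 transactions that contain the antecedent.
print(support({"cocoa", "sugar", "butter"}, transactions))  # 0.4
print(round(confidence({"cocoa", "sugar"}, {"butter"}, transactions), 3))  # 0.667
```

A rule is reported only when both values clear the user-specified minimum thresholds.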
Association rule mining aims to explore large transaction databases for association rules, which may reveal implicit relationships among the data attributes. It has a number of practical applications, including classification, text mining, Web log analysis, share-market analysis, and recommendation systems. As noted above, the classical support measure treats every transaction equally, whereas different transactions carry different importance in real-life data sets; in market-basket data, for example, each transaction is recorded with some profit. Much effort has been dedicated to association rule mining with preassigned weights. However, most data types, such as Web site click-stream data, do not come with such preassigned weights, yet there should still be some notion of importance in those data.
Data mining technology has emerged as a means for identifying patterns and trends from large quantities of data. Data mining, also known as Knowledge Discovery in Databases, has been defined as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data". Data mining is used to extract structured knowledge automatically from large data sets. The information that is 'mined' is expressed as a model of the semantic structure of the dataset, wherein the prediction or classification of the obtained data is facilitated with the aid of the model.
The concept of the association rule was first introduced along with the support-confidence measurement framework, which reduced association rule mining to the discovery of frequent itemsets. The following year, a fast mining algorithm, Apriori, was proposed. Much effort has been dedicated to the classical (binary) association rule mining problem since then, and numerous algorithms have been proposed to extract the rules more efficiently. These algorithms strictly follow the classical measurement framework and produce the same results once the minimum support and minimum confidence are given.
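The level-wise idea behind Apriori can be sketched as follows; this is a simplified illustration of the general technique, not the implementation used in any of the cited works:

```python
from itertools import combinations

def apriori(db, min_support):
    """Level-wise frequent-itemset discovery (simplified Apriori sketch)."""
    n = len(db)
    level = {frozenset([i]) for t in db for i in t}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # One database pass: count the support of every candidate.
        supports = {c: sum(c <= t for t in db) / n for c in level}
        survivors = {c: s for c, s in supports.items() if s >= min_support}
        frequent.update(survivors)
        # Join + prune: keep a (k+1)-candidate only if every one of its
        # k-subsets is frequent -- the Apriori (anti-monotonicity) property.
        k += 1
        level = {
            a | b for a in survivors for b in survivors
            if len(a | b) == k
            and all(frozenset(s) in survivors for s in combinations(a | b, k - 1))
        }
    return frequent

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(db, min_support=0.5)
# All singletons and pairs are frequent here; {a, b, c} (support 0.4) is pruned.
print(sorted(tuple(sorted(s)) for s in result))
```

The prune step is what makes the classical framework tractable: any superset of an infrequent itemset can be discarded without counting it.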
WARM (weighted association rule mining) generalizes the traditional model to the case where items have weights. Ram Kumar et al. introduced a weighted support of association rules based on costs assigned to both items and transactions. An algorithm called WIS was proposed to derive the rules that have a weighted support larger than a given threshold. Cai et al. defined weighted support in a similar way, except that they only took item weights into account.
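As a hedged illustration of the item-weight-only style of definition (exact formulations vary across the WARM literature, so this is one common variant rather than Cai et al.'s precise formula), weighted support can scale classical support by the mean weight of the itemset's items; the weights below are invented for the example:

```python
transactions = [
    {"cocoa", "sugar", "butter"},
    {"cocoa", "sugar"},
    {"cocoa", "sugar", "butter", "milk"},
    {"bread", "butter"},
    {"cocoa", "milk"},
]
# Illustrative item weights (e.g. profit margins), not from the paper.
weights = {"cocoa": 0.9, "sugar": 0.3, "butter": 0.6, "bread": 0.2, "milk": 0.4}

def weighted_support(itemset, db, w):
    """Classical support scaled by the mean weight of the itemset's items --
    one of several formulations found in the WARM literature."""
    itemset = set(itemset)
    sup = sum(itemset <= t for t in db) / len(db)
    mean_w = sum(w[i] for i in itemset) / len(itemset)
    return mean_w * sup

# Support 0.6, mean weight 0.6 -> weighted support about 0.36.
print(weighted_support({"cocoa", "sugar"}, transactions, weights))
```

An itemset of cheap items must therefore achieve a much higher raw frequency to pass the same threshold as an itemset of high-weight items.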
The goal is to steer the mining focus toward significant relationships involving items with significant weights, rather than being flooded by the combinatorial explosion of insignificant relationships. The discovery of association rules has been found useful in many applications.

The quantitative association rule mining problem has also been introduced, and some algorithms for quantitative values have been proposed, in which the algorithm finds association rules by partitioning the attribute domain, combining adjacent partitions, and then transforming the problem into a binary one.
Mining QARs (quantitative association rules) with a generic BAR (binary association rule) mining algorithm, however, is infeasible in most cases for the following reasons. First, QAR mining suffers from the same combinatorial explosion of attribute sets as BAR mining does; that is, given a set of N distinct attributes, the number of its non-empty subsets is (2^N - 1). Moreover, as has been shown, it is necessary to combine the consecutive intervals of a quantitative attribute to gain sufficient support and more meaningful intervals. This leads to another combinatorial explosion: if the domain of a quantitative attribute is partitioned into n intervals, the total number of intervals of the attribute grows to O(n^2) after combining consecutive intervals. When we join the attributes in the mining process, the number of itemsets (i.e., sets of <attribute, interval> pairs) can become prohibitively large if the number of intervals associated with an attribute is large.
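The O(n^2) growth from combining consecutive intervals is easy to verify: n base intervals yield exactly n(n+1)/2 consecutive ranges. A small sketch (the (first, last) index representation is illustrative):

```python
def combined_intervals(n):
    """All ranges obtainable by merging consecutive base intervals,
    as (first, last) index pairs: n*(n+1)//2 of them, i.e. O(n^2)."""
    return [(i, j) for i in range(n) for j in range(i, n)]

for n in (4, 10, 100):
    print(n, len(combined_intervals(n)))  # 10, 55, 5050
```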
The second problem is caused by the sharp boundary between intervals. To overcome this problem, mining fuzzy association rules for quantitative values has been considered by a number of researchers [8]-[13], most of whom have based their methods on the Apriori algorithm. Chan and Au introduced F-APACS for mining fuzzy association rules. Instead of using intervals, F-APACS employs linguistic terms to represent the revealed regularities and exceptions. Kuok's algorithm expects the user or an expert to provide the required fuzzy sets of the quantitative attributes and their corresponding membership functions. Fu argues that experts may not give the right fuzzy sets and membership functions; hence, he proposed a method to find the fuzzy sets based on clustering techniques. Each of these researchers treated all attributes (or all linguistic terms) as uniform. However, in real-world applications, users may have more interest in rules that contain fashionable items. Gyenesei introduced the problem of mining weighted quantitative association rules based on a fuzzy approach. He assigns weights to the fuzzy sets to reflect their importance to the user and proposes two definitions of weighted support, with and without normalization, similar to his previous method.
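The linguistic-term idea can be illustrated with triangular membership functions; the attribute, term names, and boundary values below are assumptions for the example, not taken from F-APACS or the other cited methods:

```python
def triangular(a, b, c):
    """Triangular membership function peaking at b, zero outside (a, c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

# Hypothetical linguistic terms for an 'age' attribute.
young  = triangular(0, 20, 40)
middle = triangular(30, 45, 60)

# A 35-year-old belongs partly to 'young' and partly to 'middle-aged',
# avoiding the sharp interval boundary described above.
print(young(35), middle(35))  # 0.25 0.333...
```

Because each value contributes a graded degree to several terms, a small shift in an attribute value no longer flips a transaction in or out of a rule's support count.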
Ishibuchi et al. extended the genetic-algorithm-based rule selection method in Ref. [16] to the case where various fuzzy partitions with different granularities are used for each input. This extension increases the number of candidate rules; hence, they proposed a prescreening procedure, based on two rule evaluation criteria for association rules, to decrease the number of candidate rules.
Kaya et al. proposed an automated clustering method based on multi-objective genetic algorithms. This method automatically clusters the values of a given quantitative attribute in order to obtain a large number of large itemsets in a short time.
The support counting procedure of the Apriori algorithm has attracted voluminous research, owing to the fact that the performance of the algorithm mostly relies on this step. Shortly after the Apriori algorithms mentioned above, Park et al. proposed an optimization called DHP (Direct Hashing and Pruning), intended to restrict the number of candidate itemsets. Brin et al. put forth the DIC algorithm, which partitions the database into intervals of a fixed size so as to reduce the number of traversals through the database. Another algorithm, CARMA (Continuous Association Rule Mining Algorithm), employs a similar technique, restricting the interval size to 1.
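The DHP idea of hashing candidate pairs while counting 1-itemsets can be sketched as follows; the hash function and bucket count are illustrative simplifications of the hash-table design in Park et al.'s paper:

```python
from itertools import combinations

def dhp_pass(db, num_buckets=8):
    """One DHP-style pass: count single items and, at the same time, hash
    every 2-itemset of each transaction into a bucket. A pair can be
    frequent only if its bucket's total reaches the support threshold,
    so low-count buckets prune 2-itemset candidates before they are
    ever counted individually."""
    item_counts = {}
    buckets = [0] * num_buckets
    for t in db:
        for i in t:
            item_counts[i] = item_counts.get(i, 0) + 1
        for pair in combinations(sorted(t), 2):
            buckets[hash(pair) % num_buckets] += 1
    return item_counts, buckets

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
items, buckets = dhp_pass(db)
# Every pair occurrence lands in exactly one bucket: 3+1+1+1+3 = 9 in total.
print(items, sum(buckets))
```

The bucket counts over-estimate pair supports (several pairs may share a bucket), so the filter is conservative: it never discards a truly frequent pair.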

Read More: Click here...