Audit guide 2
Association Rules Outline

Goal: Provide an overview of basic Association Rule mining techniques
• Association Rules Problem Overview
    - Large itemsets
• Association Rules Algorithms
    - Apriori
    - Eclat

Example: Market Basket Data

• Items frequently purchased together: Bread ⇒ PeanutButter
• Uses:
    - Placement
    - Advertising
    - Sales
    - Coupons
• Objective: increase sales and reduce costs

Association Rule Definitions

• Set of items: I = {I1, I2, …, Im}
• Transactions: D = {t1, t2, …, tn}, tj ⊆ I
• Itemset: {Ii1, Ii2, …, Iik} ⊆ I
• Support of an itemset: percentage of transactions which contain that itemset
• Large (Frequent) itemset: itemset whose number of occurrences is above a threshold

Association Rules Example

I = {Beer, Bread, Jelly, Milk, PeanutButter}
Support of {Bread, PeanutButter} is [value not preserved in the source]

Association Rule Definitions

• Association Rule (AR): implication X ⇒ Y where X, Y ⊆ I and X ∩ Y = ∅
• Support of AR (s) X ⇒ Y: percentage of transactions that contain X ∪ Y
• Confidence of AR (α) X ⇒ Y: ratio of the number of transactions that contain X ∪ Y to the number that contain X

Association Rules Ex (cont'd)

[table not preserved in the source]

Association Rule Problem

• Given a set of items I = {I1, I2, …, Im} and a database of transactions D = {t1, t2, …, tn}, where ti = {Ii1, Ii2, …, Iik} and Iij ∈ I, the Association Rule Problem is to identify all association rules X ⇒ Y with a minimum support and confidence.
• Link Analysis
• NOTE: Support of X ⇒ Y is the same as support of X ∪ Y.

Association Rule Techniques

1. Find Large Itemsets.
2. Generate rules from frequent itemsets.

Algorithm to Generate ARs

[algorithm listing not preserved in the source]

Apriori

• Large Itemset Property: Any subset of a large itemset is large.
• Contrapositive: If an itemset is not large, none of its supersets are large.

Large Itemset Property

[figure not preserved in the source]

Apriori Ex (cont'd)

[support and confidence thresholds and the example table not preserved in the source]

Apriori Algorithm

1. C1 = Itemsets of size one in I;
2. Determine all large itemsets of size 1, L1;
3. i = 1;
4. Repeat
5.     i = i + 1;
6.     Ci = Apriori-Gen(Li-1);
7.     Count Ci to determine Li;
8. until no more large itemsets found;

Apriori-Gen

• Generate candidates of size i+1 from large itemsets of size i.
• Approach used: join large itemsets of size i if they agree on i-1 items.
• May also prune candidates that have subsets which are not large.

Apriori-Gen Example

[example not preserved in the source]

Apriori-Gen Example (cont'd)

[example not preserved in the source]

Apriori Adv/Disadv

• Advantages:
    - Uses large itemset property.
    - Easily parallelized.
    - Easy to implement.
• Disadvantages:
    - Assumes transaction database is memory resident.
    - Requires up to m database scans.

Classification based on Association Rules (CBA)

• Why?
    - Can effectively uncover the correlation structure in data
    - ARs are typically quite scalable in practice
    - Rules are often very intuitive
• Hence a classifier built on intuitive rules is easier to interpret
• When to use?
    - On large dynamic datasets where class labels are available and the correlation structure is unknown
    - Multi-class
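
Sketch: Apriori in Python

The support and confidence definitions, the Apriori loop, and the Apriori-Gen join-and-prune step from the slides above can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the slides' own implementation: the transaction list is made up for the example (the original example table is not available here), the function names support, apriori_gen, apriori, and rules are hypothetical, and "above a threshold" is interpreted as support >= min_support.

```python
from itertools import combinations

# Hypothetical transactions (NOT the slides' table); item names reuse the example's.
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori_gen(large, size):
    """Join large itemsets of size `size - 1` into candidates of size `size`.

    Two itemsets are joined when their union has exactly `size` items
    (they agree on all but one item). A candidate is pruned if any of its
    (size - 1)-subsets is not large.
    """
    candidates = set()
    for a in large:
        for b in large:
            union = a | b
            if len(union) != size:
                continue
            subsets = combinations(union, size - 1)
            if all(frozenset(s) in large for s in subsets):
                candidates.add(union)
    return candidates


def apriori(transactions, min_support):
    """Return {large itemset: support}, built level by level."""
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}  # C1
    large = {}
    size = 1
    while candidates:
        # Count the candidates (one pass over the data per level).
        level = {c: support(c, transactions) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        if not level:
            break
        large.update(level)
        size += 1
        candidates = apriori_gen(set(level), size)
    return large


def rules(large, min_confidence):
    """Generate (X, Y, support, confidence) for rules X => Y from large itemsets."""
    found = []
    for itemset, s in large.items():
        for r in range(1, len(itemset)):
            for x in combinations(sorted(itemset), r):
                x = frozenset(x)
                # Every subset of a large itemset is large, so large[x] exists.
                confidence = s / large[x]
                if confidence >= min_confidence:
                    found.append((x, itemset - x, s, confidence))
    return found


if __name__ == "__main__":
    large_itemsets = apriori(TRANSACTIONS, min_support=0.3)
    for x, y, s, c in rules(large_itemsets, min_confidence=0.5):
        print(f"{sorted(x)} => {sorted(y)}  support={s:.2f}  confidence={c:.2f}")
```

The sketch recounts support with a full pass over the transactions for each candidate, which keeps the code short but is slower than counting all candidates of a level in one scan. It also assumes the transactions fit in memory, the same limitation listed under the disadvantages above.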
Published on Sep 04, 2021