Mining Transactional Data
        Ted Dunning - 2004
Outline
โ—   What are LLR tests?
    โ€“   What value have they shown?
โ—   What are transactional values?
    โ€“   How can we define LLR tests for them?
โ—   How can these methods be applied?
    โ€“   Modeling architecture examples
โ—   How new is this?
Log-likelihood Ratio Tests
โ—   Theorem due to Chernoff showed that
    generalized log-likelihood ratio is asymptotically
    ๎ƒŒ2 distributed in many useful cases
โ—   Most well known statistical tests are either
    approximately or exactly LLR tests
    โ€“   Includes z-test, F-test, t-test, Pearson's ๎ƒŒ2
โ—   Pearson's ๎ƒŒ2 is an approximation valid for large
    expected counts ... G2 is the exact form for
    multinomial contingency tables
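To make the last point concrete, here is a minimal Python sketch (the function names and example tables are illustrative, not from the slides) that computes Pearson's X² and G² on the same contingency table. With large expected counts the two nearly agree; when one cell has a tiny expected count, Pearson's statistic balloons while G² stays moderate.

```python
import math

def pearson_x2(table):
    """Pearson's X^2 for independence in an m x n table (list of rows of counts)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    x2 = 0.0
    for i, r in enumerate(table):
        for j, k in enumerate(r):
            e = rows[i] * cols[j] / total  # expected count under independence
            x2 += (k - e) ** 2 / e
    return x2

def g2(table):
    """G^2 = -2 log(lambda) for independence in the same table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    g = 0.0
    for i, r in enumerate(table):
        for j, k in enumerate(r):
            if k > 0:  # empty cells contribute 0 * log(0) = 0
                g += k * math.log(k * total / (rows[i] * cols[j]))
    return 2.0 * g

# Hypothetical tables: one with a rare event, one with large, balanced counts.
small = [[1, 0], [0, 1000]]
big = [[1000, 1100], [1050, 1000]]
```

For `small`, X² comes out at exactly 1001 while G² is about 15.8; for `big` the two statistics differ by well under one percent.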
Mathematical Definition
โ—   Ratio of maximum likelihood under the null
    hypothesis to the unrestricted maximum
    likelihood
                      max l ๎‚ž X โˆฃ๎‚พ๎‚Ÿ
                 ๎ƒ= max l ๎‚ž X โˆฃ๎‚พ๎‚Ÿ
                      ๎‚พโˆˆ๎‚ถ0


                      ๎‚พโˆˆ๎‚ถ


                  d.o.f.=dim ๎‚ถโˆ’dim ๎‚ถ0
โ—   -2 log ๎ƒ is asymptotically ๎ƒŒ2 distributed
Comparison of Two Observations
โ—   Two independent observations, X1 and X2 can be
    compared to determine whether they are from the
    same distribution
                ๎‚ž๎‚พ1 , ๎‚พ2 ๎‚Ÿ โˆˆ ๎‚ถร—๎‚ถ
                       max           l ๎‚ž X 1โˆฃ๎‚พ๎‚Ÿl ๎‚ž X 2โˆฃ๎‚พ๎‚Ÿ
                ๎ƒ=      ๎‚พโˆˆ๎‚ถ

                        max l ๎‚ž X 1โˆฃ๎‚พ1 ๎‚Ÿl ๎‚ž X 2โˆฃ๎‚พ2 ๎‚Ÿ
                     ๎‚พ1 โˆˆ๎‚ถ , ๎‚พ2 โˆˆ๎‚ถ


                d.o.f.=dim ๎‚ถ
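A minimal sketch of the two-sample comparison, assuming each observation is a binomial count (k successes out of n trials); then Ω = [0, 1], so d.o.f. = dim Ω = 1. The restricted fit pools both samples, and the unrestricted fit estimates one rate per sample. Names are hypothetical.

```python
import math

def _ll(k, n, p):
    """Binomial log-likelihood without the constant binomial coefficient."""
    out = 0.0
    if k > 0:
        out += k * math.log(p)
    if n - k > 0:
        out += (n - k) * math.log(1.0 - p)
    return out

def llr_two_binomials(k1, n1, k2, n2):
    """-2 log(lambda) for H0: theta1 = theta2 (both samples share one rate)."""
    p = (k1 + k2) / (n1 + n2)   # restricted MLE: a single pooled rate
    p1, p2 = k1 / n1, k2 / n2   # unrestricted MLEs: one rate per sample
    null = _ll(k1, n1, p) + _ll(k2, n2, p)
    alt = _ll(k1, n1, p1) + _ll(k2, n2, p2)
    return 2.0 * (alt - null)
```

This is the same number as G² for the 2×2 table [[k1, n1 − k1], [k2, n2 − k2]].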
History of LLR Tests for “Text”
●   Statistics of Surprise and Coincidence
●   Genomic QA tools
●   Luduan
●   HNC text-mining, preference mining
●   MusicMatch recommendation engine
How Useful is LLR?
โ—   A test in 1997 showed that a query construction
    system using LLR (Luduan) decreased the error
    rate of the best document routing system
    (Inquery) by approximately 5x at 10% recall and
    nearly 2x at 20% recall
โ—   Language and species ID programs showed
    similar improvements versus state of the art
โ—   Previously unsuspected structure around intron
    splice sites was discovered using LLR tests
TREC Document Routing Results
[Figure: “Luduan vs Inquery”, precision (0 to 1) versus recall (0 to 1), with curves for Inquery, Luduan and Convectis]
What are Transactional Variables?
โ—   A transactional sequence is a sequence of
    transactions.
โ—   Transactions are instances of a symbol and
    (optionally) a time and an amount:
                 Z =๎‚ž z 1 ... z N ๎‚Ÿ
                 z i =๎‚ž๎ƒˆ i , t i , x i ๎‚Ÿ
                 ๎ƒˆ i โˆˆ๎‚ฒ , an alphabet of symbols
                 t i , x i โˆˆโ„
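One possible in-memory representation of a transactional sequence, sketched in Python; the class, field and function names are assumptions for illustration. Time and amount are optional, so the same type covers text (symbols only), violation histories (symbol and time) and card histories (all three).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

@dataclass(frozen=True)
class Transaction:
    """One transaction z_i = (sigma_i, t_i, x_i); time and amount are optional."""
    symbol: str                     # sigma_i, a member of the alphabet Sigma
    time: Optional[float] = None    # t_i (None for plain text)
    amount: Optional[float] = None  # x_i (None when there is no amount)

def symbol_counts(z: Sequence[Transaction]) -> Dict[str, int]:
    """Return k_sigma, the number of transactions carrying each symbol."""
    counts: Dict[str, int] = {}
    for tx in z:
        counts[tx.symbol] = counts.get(tx.symbol, 0) + 1
    return counts

# Usage: the first rows of the credit card example, with time measured in
# days since 9/03/03 (an assumed encoding).
card = [
    Transaction("Cash Advance", time=0.0, amount=300.0),
    Transaction("Groceries", time=1.0, amount=79.0),
    Transaction("Fuel", time=4.0, amount=21.0),
    Transaction("Groceries", time=7.0, amount=42.0),
]
```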
Example - Text
โ—   A textual document is a transactional sequence
    without times or amounts

                 Z =๎‚ž ๎ƒˆ 1 ... ๎ƒˆ N ๎‚Ÿ
                 ๎ƒˆ i โˆˆ๎‚ฒ
Example – Traffic Violation History
●   A history of traffic violations is a (hopefully
    empty) sequence of violation types and
    associated dates (times)

    $$ Z = (z_1, \ldots, z_N) $$
    $$ z_i = (\sigma_i, t_i) $$
    $$ \sigma_i \in \{\text{stop-sign}, \text{speeding}, \text{DUI}, \ldots\} $$
    $$ t_i \in \mathbb{R} $$
Example – Speech Transcript
●   A conversation between a and b can be rendered
    as a sequence of transactions containing words
    spoken by either a or b at particular times:

    $$ Z = (z_1, \ldots, z_N) $$
    $$ z_i = (\sigma_i, t_i) $$
    $$ \sigma_i \in \{a, b\} \times \Sigma $$
    $$ t_i \in \mathbb{R} $$
Example – Financial History
●   A credit card history can be viewed as a
    transactional sequence with merchant code, date
    (=time) and amount:

    $$ Z = (z_1, \ldots, z_N) $$
    $$ z_i = \langle \sigma_i, t_i, x_i \rangle $$
    $$ \sigma_i \in \Sigma $$
    $$ t_i \in \mathbb{R} $$

    Date       Merchant code       Amount
    9/03/03    Cash Advance          $300
    9/04/03    Groceries               79
    9/07/03    Fuel                    21
    9/10/03    Groceries               42
    9/23/03    Department Store       173
    10/03/03   Payment               -600
    10/09/03   Hotel & Motel          104
    10/17/03   Rental Cars            201
    10/24/03   Lufthansa              838
Proposed Evolution
[Diagram with labels: Text, LLR tests, Luduan, etc.; Transactional Data, Data Augmentation, Augmented Data, LLR tests, Transaction Mining]
LLR for Transaction Sequence
โ—   Assuming reasonable interactions between
    timing, symbol selection and amount distribution,
    LLR test can be decomposed
โ—   Two major terms remain, one for symbols and
    timing together, one for amounts

       LLR= LLR๎‚žsymbols & timing๎‚Ÿ๎‚ƒ LLR๎‚žamounts๎‚Ÿ
Anecdotal Observations
โ—   Symbol selection often looks multinomial, or
    (rarely) Markov
โ—   Timing is often nearly Poisson (but rate depends
    on which symbol)
โ—   Distribution of amount appears to depend on
    symbol, but generally not on inter-transaction
    timing. Mixed discrete/continuous distributions
    are common in financial settings
Transaction Sequence Distributions
โ—   Mixed Poisson distributions give desired
    symbol/timing behavior
โ—   Amount distribution depends on symbol
                       k ๎ƒˆ โˆ’๎ƒ‚๎ƒˆ T
                ๎‚ž๎ƒ‚๎ƒˆ T ๎‚Ÿ e
     p๎‚žZ ๎‚Ÿ= โˆ                         โˆ          p๎‚ž x iโˆฃ ๎‚พ๎ƒˆ ๎‚Ÿ
           ๎ƒˆ โˆˆ๎‚ฒ       k ๎ƒˆ!           i=1. .. N
                                                                i




            [               ][                    ]โˆ
                       k๎ƒˆ                  โˆ’๎ƒ‚ T
                    ๎ƒ†                N
                                 ๎‚ž๎ƒ‚T ๎‚Ÿ e
     p๎‚žZ ๎‚Ÿ= N ! โˆ
                       ๎ƒˆ
                                                                    p๎‚ž x iโˆฃ ๎‚พ๎ƒˆ ๎‚Ÿ
                ๎ƒˆโˆˆ๎‚ฒ k ๎ƒˆ !            N!                                       i
                                                    i=1. .. N

     ๎ƒ‚๎ƒˆ =๎ƒ‚๎ƒ†๎ƒˆ , โˆ‘ ๎ƒ†๎ƒˆ =1
                ๎ƒˆ โˆˆ๎‚ฒ
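Under the model above, log p(Z) can be evaluated directly. This sketch assumes known positive rates μ_σ for every symbol in the alphabet, a window length T > 0, and a caller-supplied log-density for amounts; all names are hypothetical.

```python
import math
from collections import Counter

def log_p_sequence(symbols, amounts, T, rates, amount_logpdf):
    """log p(Z) under the mixed-Poisson model with symbol-dependent amounts.

    symbols       -- list of sigma_i, one per transaction
    amounts       -- list of x_i, aligned with symbols
    T             -- length of the observation window (T > 0)
    rates         -- dict sigma -> mu_sigma > 0, covering the whole alphabet
    amount_logpdf -- function (x, sigma) -> log p(x | theta_sigma)
    """
    k = Counter(symbols)
    unknown = set(k) - set(rates)
    if unknown:
        raise ValueError(f"symbols without a rate: {sorted(unknown)}")
    total = 0.0
    # Poisson factor for every symbol in the alphabet, including k_sigma = 0.
    for sigma, mu in rates.items():
        lam = mu * T
        total += k[sigma] * math.log(lam) - lam - math.lgamma(k[sigma] + 1)
    # Amount factor: each x_i is scored under the density of its own symbol.
    for sigma, x in zip(symbols, amounts):
        total += amount_logpdf(x, sigma)
    return total
```

Setting μ_σ = μπ_σ and evaluating the factored form (a multinomial over symbols times a Poisson on N) should give the same value, which is a quick consistency check on an implementation.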
LLR for Multinomial
โ—   Easily expressed as entropy of contingency table



                       [                                  ]
                           k 11   k 12   ...       k1 n       k 1*
                           k 21   k 22   ...       k2n        k 2*
                           โ‹ฎ      โ‹ฎ      โ‹ฑ         โ‹ฎ          โ‹ฎ
                           k m1   k m2   ...       k mn       k m*
                           k * 1 k * 2 ... k * n              k **

    โˆ’2 log ๎ƒ=2 N
                    ๎‚ž โˆ‘ ๎ƒ†ij log ๎ƒ†ij โˆ’โˆ‘ ๎ƒ†i * log ๎ƒ†i *โˆ’โˆ‘ ๎ƒ†* j log ๎ƒ†* j ๎‚Ÿ
                      ij                       i                        j

                     k ij k **              ๎ƒ†ij
    log ๎ƒ=โˆ‘ k ij log            =โˆ‘ k ij log                          d.o.f.=๎‚žmโˆ’1๎‚Ÿ๎‚žnโˆ’1๎‚Ÿ
          ij         k i * k * j ij         ๎ƒ†* j
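A sketch of the entropy form in Python. It relies on N·Σ π log π = Σ k log k − N log N, so the statistic can be computed from raw counts without forming probabilities; the function names are illustrative.

```python
import math

def x_log_x(k):
    """k * log(k), with the convention 0 * log(0) = 0."""
    return k * math.log(k) if k > 0 else 0.0

def llr_multinomial(table):
    """-2 log(lambda) for row/column independence in an m x n table of counts.

    Uses N * sum(pi * log(pi)) = sum(k * log(k)) - N * log(N) for the cells and
    for each margin; the three N log N terms net out to a single +N log N.
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    cells = sum(x_log_x(k) for r in table for k in r)
    return 2.0 * (cells - sum(map(x_log_x, rows)) - sum(map(x_log_x, cols)) + x_log_x(n))

def dof_multinomial(m, n):
    """Degrees of freedom for an m x n independence test."""
    return (m - 1) * (n - 1)
```

A table whose rows are exactly proportional, such as [[10, 20], [30, 60]], gives 0.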
LLR for Poisson Mixture
โ—   Easily expressed using timed contingency table



                  [                             โˆฃ]
                  k 11      k 12   ...   k1n    t1
                  k 21      k 22   ...   k 2n   t2
                  โ‹ฎ         โ‹ฎ      โ‹ฑ     โ‹ฎ      โ‹ฎ
                  k m1      k m2   ...   k mn   tm
                      k * 1 k * 2 ... k * n โˆฃ t *

                              k ij t *              ๎ƒ‚ij
             log ๎ƒ=โˆ‘ k ij log           =โˆ‘ k ij log
                     ij        t i k * j ij         ๎ƒ‚* j
             d.o.f.=๎‚žmโˆ’1๎‚Ÿ n
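A sketch of the timed-contingency-table statistic, assuming every row has an observation time t_i > 0; names are hypothetical. The null hypothesis has one rate per symbol (n parameters) and the alternative has one per cell (mn), hence d.o.f. = (m − 1)n.

```python
import math

def llr_poisson_mixture(counts, times):
    """-2 log(lambda) for a timed contingency table.

    counts[i][j] = k_ij, occurrences of symbol j in row i
    times[i]     = t_i, observation time for row i (t_i > 0)
    H0: each symbol has the same rate in every row, mu_*j = k_*j / t_*.
    """
    t_total = sum(times)
    cols = [sum(c) for c in zip(*counts)]
    s = 0.0
    for i, row in enumerate(counts):
        for j, k in enumerate(row):
            if k > 0:  # empty cells contribute nothing
                s += k * math.log(k * t_total / (times[i] * cols[j]))
    return 2.0 * s

def dof_poisson_mixture(m, n):
    """Degrees of freedom: m*n rates under H1 minus n rates under H0."""
    return (m - 1) * n
```

As a check, 10 events in one unit of time versus 0 in another gives 20 log 2, matching a direct comparison of Poisson likelihoods with rates (10, 0) against a shared rate of 5.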
LLR for Normal Distribution
โ—   Assume X1 and X2 are normally distributed
โ—   Null hypothesis of identical mean and variance


                                                     ๎‚
                            โˆ’๎‚ž xโˆ’๎ƒ‚๎‚Ÿ2

p ๎‚ž xโˆฃ๎ƒ‚ , ๎ƒˆ ๎‚Ÿ=
                     1
                        e      2 ๎ƒˆ2
                                       ๎‚‘
                                       ๎ƒ‚=
                                          โˆ‘ xi   ๎‚‘
                                                 ๎ƒˆ=
                                                    โˆ‘ ๎‚ž x i โˆ’๎ƒ‚๎‚Ÿ2
                 ๎‚ 2 ๎ƒ†๎ƒˆ                   N              N


                            ๎‚ž๎‚‘
                             ๎ƒˆ
           โˆ’2 log ๎ƒ=2 N 1 log ๎‚ƒN 2 log
                             ๎‚‘
                             ๎ƒˆ1
                                       ๎‚‘
                                       ๎ƒˆ
                                       ๎‚‘
                                       ๎ƒˆ2        ๎‚Ÿ
           d.o.f.=2
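A direct translation of the normal-distribution statistic into Python, assuming each sample contains at least two distinct values so that σ̂1 and σ̂2 are positive. The closed form holds only for maximum-likelihood estimates, so σ̂ divides by N rather than N − 1. Names are hypothetical.

```python
import math

def mle_mean_sd(xs):
    """Maximum-likelihood mean and standard deviation (divides by N, not N - 1)."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sd

def llr_normal(x1, x2):
    """-2 log(lambda) for H0: x1 and x2 share one mean and one variance."""
    _, s1 = mle_mean_sd(x1)
    _, s2 = mle_mean_sd(x2)
    _, s = mle_mean_sd(list(x1) + list(x2))  # pooled fit under H0
    return 2.0 * (len(x1) * math.log(s / s1) + len(x2) * math.log(s / s2))
```

Two identical samples give 0, since the pooled fit then matches each separate fit.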
Calculations
โ—   Assume X1 and X2 are normally distributed
โ—   Null hypothesis of identical mean and variance

           p ๎‚ž xโˆฃ๎ƒ‚ ,๎ƒˆ๎‚Ÿ=
                             1
                         ๎‚ 2๎ƒ† ๎ƒˆ
                                e
                                       โˆ’๎‚ž xโˆ’๎ƒ‚๎‚Ÿ2
                                          2๎ƒˆ 2
                                             ๎ƒ‚= i
                                             ๎‚‘
                                                N
                                                      โˆ‘ xi
                                                           ๎ƒˆ= i
                                                           ๎‚‘
                                                                      ๎‚N
                                                                          โˆ‘ ๎‚ž xโˆ’๎ƒ‚๎‚Ÿ2

           log p๎‚ž X 1โˆฃ๎ƒ‚ , ๎ƒˆ ๎‚Ÿ๎‚ƒlog p๎‚ž X 1โˆฃ๎ƒ‚ ,๎ƒˆ ๎‚Ÿโˆ’log p๎‚ž X 1โˆฃ๎ƒ‚1, ๎ƒˆ 1 ๎‚Ÿโˆ’log p๎‚ž X 2โˆฃ๎ƒ‚2, ๎ƒˆ 2 ๎‚Ÿ=

           โˆ’     โˆ‘ [
               i=1. . N 1
                            log ๎‚ 2 ๎ƒ†๎‚ƒlog ๎ƒˆ๎‚ƒ
                                            ๎‚ž x 1i โˆ’๎ƒ‚๎‚Ÿ2
                                                2 ๎ƒˆ2      ] [
                                                        โˆ’ โˆ‘ log ๎‚ 2 ๎ƒ†๎‚ƒlog ๎ƒˆ๎‚ƒ
                                                         i=1. . N
                                                              2
                                                                            ๎‚ž x 2 i โˆ’๎ƒ‚๎‚Ÿ2
                                                                                 2 ๎ƒˆ2        ]
                 โˆ‘ [                                       ] โˆ‘[                                      ]
                                                          2                                      2
                                               ๎‚ž x โˆ’๎ƒ‚ ๎‚Ÿ                             ๎‚ž x โˆ’๎ƒ‚ ๎‚Ÿ
           ๎‚ƒ                log ๎‚ 2 ๎ƒ†๎‚ƒlog ๎ƒˆ 1 ๎‚ƒ 1i 2 1 ๎‚ƒ          log ๎‚ 2 ๎ƒ†๎‚ƒlog ๎ƒˆ 2๎‚ƒ 2i 2 2
               i=1. . N 1                          2 ๎ƒˆ1  i=1. . N 2
                                                                                        2 ๎ƒˆ2

          โˆ’2 log ๎ƒ=2 N 1 log
                              ๎‚ž      ๎ƒˆ
                                     ๎ƒˆ1
                                        ๎‚ƒN 2 log
                                                 ๎ƒˆ
                                                 ๎ƒˆ2   ๎‚Ÿ
           d.o.f.=2
Transactional Data in Context
Real-world input often consists of one or more bags of transactional values combined with an assortment of conventional numerical or categorical values (e.g., 1.2, 34 years, male).

Extracting information from the transactional data can be difficult and is, therefore, often not done.
Real World Target Variables
[Figure: instances “Labeled as Red”, annotated a and b, illustrating mislabeled instances and secondary labels]
Luduan Modeling Methodology
โ—   Use LLR tests to find exemplars (query terms)
    from secondary label sets
โ—   Create positive and negative secondary label
    models for each class of transactional data
โ—   Cluster using output of all secondary label
    models and all conventional data
โ—   Test clusters for stability
โ—   Use distance cluster centroids and/or secondary
    label models as derived input variables
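The first step (finding exemplars with LLR tests) can be sketched as ranking symbols by a 2×2 LLR that compares each symbol's frequency inside a class against its frequency outside it. This is an illustrative reconstruction under that assumption, not the Luduan implementation; the function names and the `top` cutoff are hypothetical.

```python
import math
from collections import Counter

def _xlx(k):
    """k * log(k), with 0 * log(0) = 0."""
    return k * math.log(k) if k > 0 else 0.0

def llr_2x2(a, b, c, d):
    """G^2 for the 2 x 2 table [[a, b], [c, d]], in entropy form."""
    n = a + b + c + d
    return 2.0 * (_xlx(a) + _xlx(b) + _xlx(c) + _xlx(d)
                  - _xlx(a + b) - _xlx(c + d) - _xlx(a + c) - _xlx(b + d) + _xlx(n))

def exemplars(in_class, out_class, top=10):
    """Rank symbols by how strongly their frequency differs between two bags.

    in_class / out_class: symbols pooled over all records in / out of a class.
    Returns (positive, negative): the highest-scoring symbols that are over- and
    under-represented in the class, respectively.
    """
    cin, cout = Counter(in_class), Counter(out_class)
    nin, nout = len(in_class), len(out_class)
    scored = []
    for s in set(cin) | set(cout):
        a, c = cin[s], cout[s]
        score = llr_2x2(a, nin - a, c, nout - c)
        scored.append((score, s, a / nin > c / nout))
    scored.sort(reverse=True)
    positive = [s for _, s, over in scored if over][:top]
    negative = [s for _, s, over in scored if not over][:top]
    return positive, negative
```

The positive and negative lists correspond loosely to the "positive and negative secondary label models" in the slide; how those models are actually built is not specified there.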
Example #1 – Auto Insurance
●   Predict probability of attrition and loss for auto
    insurance customers
●   Transactional variables include
    –   Claim history
    –   Traffic violation history
    –   Geographical code of residence(s)
    –   Vehicles owned
●   Observed attrition and loss define past behavior
Derived Variables
โ—   Split training data according to observable classes
    โ€“   These include attrition and loss > 0
โ—   Define LLR variables for each class/variable
    combination
โ—   These 2 m v derived variables can be used for
    clustering (spectral, k-means, neural gas ...)
โ—   Proximity in LLR space to clusters are the new
    modeling variables
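Proximity variables can be sketched with plain k-means in the space of derived LLR scores. The slide lists several clustering options (spectral, k-means, neural gas); this sketch uses only k-means, initialized from the first k points for simplicity, and the function names are hypothetical.

```python
import math

def kmeans(points, k, iters=50):
    """Lloyd's k-means; centroids start at the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[c]
               for c, g in enumerate(groups)]
        if new == centroids:  # converged
            break
        centroids = new
    return centroids

def proximity_features(point, centroids):
    """Distances from one record's LLR-score vector to every cluster centroid."""
    return [math.dist(point, c) for c in centroids]
```

Each record's list of distances would then be passed to the downstream model as new input variables.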
Results
โ—   Conventional NN modeling by competent analyst
    was able to explain 2% of variance
    โ€“   No significant difference on training/test data
โ—   Models built using Luduan based cluster
    proximity variables were able to explain 70% of
    variance (KS approximately 0.4)
    โ€“   No significant difference on training/test data
Example #2 – Fraud Detection
●   Predict probability that an account is likely to
    result in charge-off due to payment fraud
●   Transactional variables include
    –   Zip code
    –   Recent payments and charges
    –   Recent non-monetary transactions
●   Bad payments, charge-off, delinquency are
    observable behavioral outcomes
Derived Variables
โ—   Split training data according to observable classes
    (charge-off, NSF payment, delinquency)
โ—   Define LLR variables for each class/variable
    combination
โ—   These 2 m v derived variables can be used
    directly as model variables
โ—   No results available for publication
Example #3 – E-commerce monitor
●   Detect malfunctions or changes in behavior of an
    e-commerce system due to fraud or system failure
●   Transaction variables include (time, SKU,
    amount)
●   Desired output is an alarm for operational staff
Derived Variables
โ—   Time warp derived as product of smoothed daily
    and weekly sales rates
โ—   Time warp updated monthly to account for
    seasonal variations
โ—   Warped time used in transactions
โ—   Warped time since last transaction โ‰ˆ LLR in
    single product/single price case
โ—   Full LLR allows testing for significant difference
    in Champion/Challenger e-commerce optimizer
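A minimal sketch of a time warp. It assumes hourly resolution, a 24-entry daily profile, a 7-entry weekly profile, and time measured in hours from the start of a week; none of these choices come from the slides, and the names are hypothetical. Warped time is the cumulative expected sales count. Under a Poisson model with this rate, the warped gap since the last sale equals −log P(no sale in that gap), which is why it behaves like a surprise score in the single-product case.

```python
import math

def rate(hour, base, daily, weekly):
    """Expected sales per hour: base * daily[hour of day] * weekly[day of week].

    Hours are counted from an assumed origin at the start of a week.
    """
    h = int(hour)
    return base * daily[h % 24] * weekly[(h // 24) % 7]

def warped_time(t, base, daily, weekly):
    """W(t): integral of the piecewise-constant hourly rate from 0 to t (t >= 0)."""
    whole = int(math.floor(t))
    w = sum(rate(h, base, daily, weekly) for h in range(whole))
    return w + (t - whole) * rate(whole, base, daily, weekly)

def warped_gap(t_last, t_now, base, daily, weekly):
    """Expected number of sales between two times; for a Poisson process this
    equals -log P(no sale in the gap)."""
    return warped_time(t_now, base, daily, weekly) - warped_time(t_last, base, daily, weekly)
```

With flat profiles the warp is just a rescaling of clock time; with a profile that is zero overnight, hours with no expected sales add nothing to the warped gap.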
Transductive Derived Variables
โ—   All objective segmentations of data provide new
    LLR variables
โ—   Cross product of model outputs versus objective
    segmentation provide additional LLR variables
    for second level model derivation
โ—   Comparable to Luduan query construction
    technique โ€“ TREC pooled evaluation technique
    provided cross product of relevance versus
    perceived relevance
Relationship To Risk Tables
โ—   Risk tables are estimate of relative risk for each
    value of a single symbolic variable
    โ€“   Useful with variables such as post-code of primary
        residence
    โ€“   Ad hoc smoothing used to deal with small counts
โ—   Not usually applied to symbol sequences
โ—   Risk tables ignore time entirely
โ—   Risk tables require considerable analyst finesse
Relationship to Known Techniques
โ—   Clock-tick symbols
    โ€“   Time-embedded symbols viewed as sequences of
        symbols along with โ€œticksโ€ that occur at fixed time
        intervals
    โ€“   Allows multinomial LLR as poor man's mixed
        Poisson LLR
โ—   Not a well known technique, not used in
    production models
โ—   Difficulties in choosing time resolution and
    counting period
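The clock-tick encoding can be sketched as interleaving a reserved tick symbol into the timed symbol stream; the resulting plain symbol sequence can then be fed to an ordinary multinomial LLR. The `TICK` token, the function name and the placement rule (ticks at t_start + n·tick, emitted before an event at the same time) are assumptions. The tick interval itself remains a free choice, which is the difficulty the slide mentions.

```python
TICK = "TICK"  # hypothetical reserved symbol, assumed not to occur in the data

def with_clock_ticks(events, tick, t_start, t_end):
    """Interleave a TICK symbol at every t_start + n * tick (n = 1, 2, ...).

    events: list of (time, symbol) pairs, assumed sorted by time.
    Returns a plain symbol sequence suitable for a multinomial LLR.
    """
    out = []
    n = 1  # tick index; multiplying avoids drift from repeated float addition
    for t, sym in events:
        while t_start + n * tick <= t:  # ticks that fall at or before this event
            out.append(TICK)
            n += 1
        out.append(sym)
    while t_start + n * tick <= t_end:  # trailing ticks up to the end of the window
        out.append(TICK)
        n += 1
    return out
```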
Conclusions
โ—   Theoretical properties of transaction variables are
    well defined
โ—   Similarities to known techniques indicates low
    probability of gross failure
โ—   Similarity to Luduan techniques suggests high
    probability of superlative performance
โ—   Transactional LLR statistics define similarity
    metrics useful for clustering

More Related Content

What's hot

Association rule mining.pptx
Association rule mining.pptxAssociation rule mining.pptx
Association rule mining.pptx
maha797959
ย 
Markov Models
Markov ModelsMarkov Models
Markov Models
Vu Pham
ย 
Exploratory data analysis with Python
Exploratory data analysis with PythonExploratory data analysis with Python
Exploratory data analysis with Python
Davis David
ย 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
Ghulam Imaduddin
ย 
Link Analysis
Link AnalysisLink Analysis
Link Analysis
Yusuke Yamamoto
ย 
pipeline and vector processing
pipeline and vector processingpipeline and vector processing
pipeline and vector processing
Acad
ย 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
Pabna University of Science & Technology
ย 
Association Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset GenerationAssociation Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset Generation
Knoldus Inc.
ย 
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
TEJVEER SINGH
ย 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
Sung Yub Kim
ย 
Knowledge representation and Predicate logic
Knowledge representation and Predicate logicKnowledge representation and Predicate logic
Knowledge representation and Predicate logic
Amey Kerkar
ย 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
hktripathy
ย 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
Anandh Arumugakan
ย 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
Umair Shafique
ย 
Association rules
Association rulesAssociation rules
Association rules
Dr. C.V. Suresh Babu
ย 
Recognition-of-tokens
Recognition-of-tokensRecognition-of-tokens
Recognition-of-tokens
Dattatray Gandhmal
ย 
Markov presentation
Markov presentationMarkov presentation
Markov presentation
SUBHABRATA MAITY
ย 
Introdution and designing a learning system
Introdution and designing a learning systemIntrodution and designing a learning system
Introdution and designing a learning system
swapnac12
ย 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updated
Yugal Kumar
ย 
Heuristic search
Heuristic searchHeuristic search
Heuristic search
NivethaS35
ย 

What's hot (20)

Association rule mining.pptx
Association rule mining.pptxAssociation rule mining.pptx
Association rule mining.pptx
ย 
Markov Models
Markov ModelsMarkov Models
Markov Models
ย 
Exploratory data analysis with Python
Exploratory data analysis with PythonExploratory data analysis with Python
Exploratory data analysis with Python
ย 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
ย 
Link Analysis
Link AnalysisLink Analysis
Link Analysis
ย 
pipeline and vector processing
pipeline and vector processingpipeline and vector processing
pipeline and vector processing
ย 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
ย 
Association Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset GenerationAssociation Rule Learning Part 1: Frequent Itemset Generation
Association Rule Learning Part 1: Frequent Itemset Generation
ย 
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
Design principle of pattern recognition system and STATISTICAL PATTERN RECOGN...
ย 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
ย 
Knowledge representation and Predicate logic
Knowledge representation and Predicate logicKnowledge representation and Predicate logic
Knowledge representation and Predicate logic
ย 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
ย 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
ย 
Exploratory Data Analysis
Exploratory Data AnalysisExploratory Data Analysis
Exploratory Data Analysis
ย 
Association rules
Association rulesAssociation rules
Association rules
ย 
Recognition-of-tokens
Recognition-of-tokensRecognition-of-tokens
Recognition-of-tokens
ย 
Markov presentation
Markov presentationMarkov presentation
Markov presentation
ย 
Introdution and designing a learning system
Introdution and designing a learning systemIntrodution and designing a learning system
Introdution and designing a learning system
ย 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updated
ย 
Heuristic search
Heuristic searchHeuristic search
Heuristic search
ย 

Viewers also liked

Introduction to data mining technique
Introduction to data mining techniqueIntroduction to data mining technique
Introduction to data mining technique
Pawneshwar Datt Rai
ย 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
Saif Ullah
ย 
Intelligent Search
Intelligent SearchIntelligent Search
Intelligent Search
Ted Dunning
ย 
data mining and data warehousing
data mining and data warehousingdata mining and data warehousing
data mining and data warehousing
Sunny Gandhi
ย 
Data mining and its applications!
Data mining and its applications!Data mining and its applications!
Data mining and its applications!
COSTARCH Analytical Consulting (P) Ltd.
ย 
Distributed Databases
Distributed DatabasesDistributed Databases
Distributed Databases
elliando dias
ย 
Centralised and distributed databases
Centralised and distributed databasesCentralised and distributed databases
Centralised and distributed databases
Forrester High School
ย 
Lecture 11 - distributed database
Lecture 11 - distributed databaseLecture 11 - distributed database
Lecture 11 - distributed database
HoneySah
ย 
Datacube
DatacubeDatacube
Datacube
man2sandsce17
ย 
DATA MINING TOOL- ORANGE
DATA MINING TOOL- ORANGEDATA MINING TOOL- ORANGE
DATA MINING TOOL- ORANGE
Neeraj Goswami
ย 
Lecture13 - Association Rules
Lecture13 - Association RulesLecture13 - Association Rules
Lecture13 - Association Rules
Albert Orriols-Puig
ย 
Data Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalizationData Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalization
DataminingTools Inc
ย 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
DataminingTools Inc
ย 
Data cubes
Data cubesData cubes
Data cubes
Mohammed
ย 
Data Processing-Presentation
Data Processing-PresentationData Processing-Presentation
Data Processing-Presentation
nibraspk
ย 
Distributed database
Distributed databaseDistributed database
Distributed database
ReachLocal Services India
ย 
Data cube computation
Data cube computationData cube computation
Data cube computation
Rashmi Sheikh
ย 
Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
Benazir Income Support Program (BISP)
ย 
Distributed Database System
Distributed Database SystemDistributed Database System
Distributed Database System
Sulemang
ย 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
Jason Rodrigues
ย 

Viewers also liked (20)

Introduction to data mining technique
Introduction to data mining techniqueIntroduction to data mining technique
Introduction to data mining technique
ย 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
ย 
Intelligent Search
Intelligent SearchIntelligent Search
Intelligent Search
ย 
data mining and data warehousing
data mining and data warehousingdata mining and data warehousing
data mining and data warehousing
ย 
Data mining and its applications!
Data mining and its applications!Data mining and its applications!
Data mining and its applications!
ย 
Distributed Databases
Distributed DatabasesDistributed Databases
Distributed Databases
ย 
Centralised and distributed databases
Centralised and distributed databasesCentralised and distributed databases
Centralised and distributed databases
ย 
Lecture 11 - distributed database
Lecture 11 - distributed databaseLecture 11 - distributed database
Lecture 11 - distributed database
ย 
Datacube
DatacubeDatacube
Datacube
ย 
DATA MINING TOOL- ORANGE
DATA MINING TOOL- ORANGEDATA MINING TOOL- ORANGE
DATA MINING TOOL- ORANGE
ย 
Lecture13 - Association Rules
Lecture13 - Association RulesLecture13 - Association Rules
Lecture13 - Association Rules
ย 
Data Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalizationData Mining: Data cube computation and data generalization
Data Mining: Data cube computation and data generalization
ย 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
ย 
Data cubes
Data cubesData cubes
Data cubes
ย 
Data Processing-Presentation
Data Processing-PresentationData Processing-Presentation
Data Processing-Presentation
ย 
Distributed database
Distributed databaseDistributed database
Distributed database
ย 
Data cube computation
Data cube computationData cube computation
Data cube computation
ย 
Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
ย 
Distributed Database System
Distributed Database SystemDistributed Database System
Distributed Database System
ย 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
ย 

Similar to Transactional Data Mining

Transactional Data Mining Ted Dunning 2004
Transactional Data Mining Ted Dunning 2004Transactional Data Mining Ted Dunning 2004
Transactional Data Mining Ted Dunning 2004
MapR Technologies
ย 
Algo complexity
Algo complexityAlgo complexity
Algo complexity
Zร…hid Islร…m
ย 
Interactive Visualization in Human Time -StampedeCon 2015
Interactive Visualization in Human Time -StampedeCon 2015Interactive Visualization in Human Time -StampedeCon 2015
Interactive Visualization in Human Time -StampedeCon 2015
StampedeCon
ย 
Asymptotic notation
Asymptotic notationAsymptotic notation
Asymptotic notation
mustafa sarac
ย 
Algorithms - A Sneak Peek
Algorithms - A Sneak PeekAlgorithms - A Sneak Peek
Algorithms - A Sneak Peek
BADR
ย 
Description and retrieval of medical visual information based on language mod...
Description and retrieval of medical visual information based on language mod...Description and retrieval of medical visual information based on language mod...
Description and retrieval of medical visual information based on language mod...
Antonio Foncubierta Rodriguez
ย 
1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx
pallavidhade2
ย 
On estimating the integrated co volatility using
On estimating the integrated co volatility usingOn estimating the integrated co volatility using
On estimating the integrated co volatility using
kkislas
ย 
pradeepbishtLecture13 div conq
pradeepbishtLecture13 div conqpradeepbishtLecture13 div conq
pradeepbishtLecture13 div conq
Pradeep Bisht
ย 
Unit 3
Unit 3Unit 3
Unit 3
Unit 3Unit 3
Unit 3
guna287176
ย 
Block Cipher vs. Stream Cipher
Block Cipher vs. Stream CipherBlock Cipher vs. Stream Cipher
Block Cipher vs. Stream Cipher
Amirul Wiramuda
ย 
Introduction to pairtrading
Introduction to pairtradingIntroduction to pairtrading
Introduction to pairtrading
Kohta Ishikawa
ย 
Ai32647651
Ai32647651Ai32647651
Ai32647651
IJMER
ย 
19. algorithms and-complexity
19. algorithms and-complexity19. algorithms and-complexity
19. algorithms and-complexity
ashishtinku
ย 
Lec10
Lec10Lec10
The convenience yield implied by quadratic volatility smiles presentation [...
The convenience yield implied by quadratic volatility smiles   presentation [...The convenience yield implied by quadratic volatility smiles   presentation [...
The convenience yield implied by quadratic volatility smiles presentation [...
yigalbt
ย 
Unit-1 DAA_Notes.pdf
Unit-1 DAA_Notes.pdfUnit-1 DAA_Notes.pdf
Unit-1 DAA_Notes.pdf
AmayJaiswal4
ย 
11.generalized and subset integrated autoregressive moving average bilinear t...
11.generalized and subset integrated autoregressive moving average bilinear t...11.generalized and subset integrated autoregressive moving average bilinear t...
11.generalized and subset integrated autoregressive moving average bilinear t...
Alexander Decker
ย 
Randomized algorithms ver 1.0
Randomized algorithms ver 1.0Randomized algorithms ver 1.0
Randomized algorithms ver 1.0
Dr. C.V. Suresh Babu
ย 

More from Ted Dunning

Dunning - SIGMOD - Data Economy.pptx
How to Get Going with Kubernetes
Progress for big data in Kubernetes
Anomaly Detection: How to find what you didn’t know to look for
Streaming Architecture including Rendezvous for Machine Learning
Machine Learning Logistics
Tensor Abuse - how to reuse machine learning frameworks
Machine Learning logistics
T digest-update
Finding Changes in Real Data
Where is Data Going? - RMDC Keynote
Real time-hadoop
Cheap learning-dunning-9-18-2015
Sharing Sensitive Data Securely
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
How the Internet of Things is Turning the Internet Upside Down
Apache Kylin - OLAP Cubes for SQL on Hadoop
Dunning time-series-2015
Doing-the-impossible
Anomaly Detection - New York Machine Learning

Transactional Data Mining

  • 1. Mining Transactional Data Ted Dunning - 2004
  • 2. Outline
    ● What are LLR tests?
      – What value have they shown?
    ● What are transactional values?
      – How can we define LLR tests for them?
    ● How can these methods be applied?
      – Modeling architecture examples
    ● How new is this?
  • 3. Log-likelihood Ratio Tests
    ● A theorem due to Chernoff showed that the generalized log-likelihood ratio is asymptotically χ² distributed in many useful cases
    ● Most well-known statistical tests are either approximately or exactly LLR tests
      – Includes the z-test, F-test, t-test and Pearson's χ²
    ● Pearson's χ² is an approximation valid for large expected counts ... G² is the exact form for multinomial contingency tables
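The claim that Pearson's χ² is only a large-count approximation can be checked numerically. The sketch below uses only the Python standard library; the function name `g2_and_pearson` and both example tables are illustrative assumptions, not from the slides. It computes G² and Pearson's X² on the same table, with expected counts E_ij = k_i* k_*j / k_**.

```python
import math

def g2_and_pearson(table):
    """Return (G2, X2) for a two-way contingency table given as a list of rows.

    Expected counts are row_total * column_total / grand_total; every row and
    column total is assumed to be positive.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    g2 = 0.0
    x2 = 0.0
    for i, row in enumerate(table):
        for j, k in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            if k > 0:  # a zero cell contributes 0 * log(0) := 0 to G2
                g2 += 2.0 * k * math.log(k / expected)
            x2 += (k - expected) ** 2 / expected
    return g2, x2

# Large expected counts: the two statistics nearly agree.
big_g2, big_x2 = g2_and_pearson([[1000, 1100], [1050, 1000]])
# A rare event (expected count about 0.001 in one cell): X2 is inflated far beyond G2.
small_g2, small_x2 = g2_and_pearson([[1, 0], [10, 10000]])
print(big_g2, big_x2)
print(small_g2, small_x2)
```

In the second table, the single observation in a cell with expected count near 0.001 dominates X² (roughly 900), while G² stays near 14. This is the rare-event regime where the exact form matters.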
  • 4. Mathematical Definition
    ● Ratio of the maximum likelihood under the null hypothesis to the unrestricted maximum likelihood:

      $$\lambda = \frac{\max_{\theta \in \Theta_0} l(X \mid \theta)}{\max_{\theta \in \Theta} l(X \mid \theta)}, \qquad \text{d.o.f.} = \dim \Theta - \dim \Theta_0$$

    ● -2 log λ is asymptotically χ² distributed
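Here is a minimal sketch of the definition, assuming the simplest case: k successes in n binomial trials, tested against a fixed null value p0. In that case dim Θ0 = 0, dim Θ = 1, and there is one degree of freedom. The names `binomial_llr` and `chi2_sf_1dof` and the example counts are illustrative. The χ² tail with one degree of freedom uses the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math

def binomial_llr(k, n, p0):
    """-2 log(lambda) for H0: p = p0 versus an unrestricted p, given k successes in n trials."""
    def log_lik(p):
        # Binomial log-likelihood without the constant log C(n, k); 0 * log 0 := 0.
        total = 0.0
        if k > 0:
            total += k * math.log(p)
        if n - k > 0:
            total += (n - k) * math.log(1.0 - p)
        return total

    p_hat = k / n  # unrestricted maximum-likelihood estimate
    return 2.0 * (log_lik(p_hat) - log_lik(p0))

def chi2_sf_1dof(x):
    """Upper tail of chi-squared with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# d.o.f. = dim(Theta) - dim(Theta0) = 1 - 0 = 1
stat = binomial_llr(k=60, n=100, p0=0.5)
p_value = chi2_sf_1dof(stat)
print(stat, p_value)
```

With 60 heads in 100 tosses, -2 log λ is about 4.03. This is just above the 5% critical value of 3.84.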
  • 5. Comparison of Two Observations
    ● Two independent observations, X1 and X2, can be compared to determine whether they come from the same distribution:

      $$(\theta_1, \theta_2) \in \Theta \times \Theta, \qquad \lambda = \frac{\max_{\theta \in \Theta} l(X_1 \mid \theta)\, l(X_2 \mid \theta)}{\max_{\theta_1 \in \Theta,\, \theta_2 \in \Theta} l(X_1 \mid \theta_1)\, l(X_2 \mid \theta_2)}, \qquad \text{d.o.f.} = \dim \Theta$$
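The two-observation case can be sketched under the assumption that each observation is a binomial count (k successes in n trials). The null hypothesis fits one shared p; the alternative fits p1 and p2 separately, so d.o.f. = dim Θ = 1. The function names and the example counts are illustrative assumptions. The example is a term seen 30 times in one 10,000-word sample and 10 times in another.

```python
import math

def _log_lik(k, n, p):
    # Binomial log-likelihood up to a constant; terms with zero count contribute 0.
    out = 0.0
    if k > 0:
        out += k * math.log(p)
    if n - k > 0:
        out += (n - k) * math.log(1.0 - p)
    return out

def two_sample_binomial_llr(k1, n1, k2, n2):
    """-2 log(lambda) for H0: both samples share one p (d.o.f. = dim(Theta) = 1)."""
    p = (k1 + k2) / (n1 + n2)   # shared MLE under the null
    p1, p2 = k1 / n1, k2 / n2    # separate MLEs under the alternative
    null = _log_lik(k1, n1, p) + _log_lik(k2, n2, p)
    alt = _log_lik(k1, n1, p1) + _log_lik(k2, n2, p2)
    return 2.0 * (alt - null)

stat = two_sample_binomial_llr(30, 10000, 10, 10000)
print(stat)
```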
  • 6. History of LLR Tests for “Text”
    ● Statistics of Surprise and Coincidence
    ● Genomic QA tools
    ● Luduan
    ● HNC text-mining, preference mining
    ● MusicMatch recommendation engine
  • 7. How Useful is LLR?
    ● A test in 1997 showed that a query construction system using LLR (Luduan) decreased the error rate of the best document routing system (Inquery) by approximately 5x at 10% recall and nearly 2x at 20% recall
    ● Language and species ID programs showed similar improvements versus the state of the art
    ● Previously unsuspected structure around intron splice sites was discovered using LLR tests
  • 8. TREC Document Routing Results
    [Figure: Luduan vs Inquery, precision versus recall (both axes 0 to 1) for Inquery, Luduan and Convectis]
  • 9. What are Transactional Variables?
    ● A transactional sequence is a sequence of transactions.
    ● Transactions are instances of a symbol and (optionally) a time and an amount:

      $$Z = (z_1, \ldots, z_N), \qquad z_i = (\sigma_i, t_i, x_i), \qquad \sigma_i \in \Sigma \text{ (an alphabet of symbols)}, \qquad t_i, x_i \in \mathbb{R}$$
  • 10. Example – Text
    ● A textual document is a transactional sequence without times or amounts:

      $$Z = (\sigma_1, \ldots, \sigma_N), \qquad \sigma_i \in \Sigma$$
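This is a minimal sketch of the transactional-sequence data type, assuming Python dataclasses. The class and function names are illustrative, not from the slides. Tokenizing text on whitespace is an assumption made only for the example. Time and amount default to `None`, matching the "optionally a time and an amount" wording.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass(frozen=True)
class Transaction:
    """One element z_i = (sigma_i, t_i, x_i); the time and amount are optional."""
    symbol: str                      # sigma_i, drawn from the alphabet Sigma
    time: Optional[float] = None     # t_i; None for untimed data such as text
    amount: Optional[float] = None   # x_i; None when there is no amount

def text_as_sequence(document: str) -> List[Transaction]:
    """A document as a transactional sequence without times or amounts (whitespace tokens)."""
    return [Transaction(symbol=word) for word in document.lower().split()]

def observed_alphabet(sequence: List[Transaction]) -> Set[str]:
    """The subset of Sigma that actually occurs in a sequence."""
    return {z.symbol for z in sequence}

doc = text_as_sequence("The cat sat on the mat")
print(len(doc), sorted(observed_alphabet(doc)))
```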
  • 11. Example – Traffic Violation History
    ● A history of traffic violations is a (hopefully empty) sequence of violation types and associated dates (times):

      $$Z = (z_1, \ldots, z_N), \qquad z_i = (\sigma_i, t_i), \qquad \sigma_i \in \{\text{stop-sign}, \text{speeding}, \text{DUI}, \ldots\}, \qquad t_i \in \mathbb{R}$$
  • 12. Example – Speech Transcript
    ● A conversation between a and b can be rendered as a transactional sequence of words spoken by either a or b at particular times:

      $$Z = (z_1, \ldots, z_N), \qquad z_i = (\sigma_i, t_i), \qquad \sigma_i \in \{a, b\} \times \Sigma, \qquad t_i \in \mathbb{R}$$
  • 13. Example – Financial History
    ● A credit card history can be viewed as a transactional sequence with merchant code, date (= time) and amount:

      $$Z = (z_1, \ldots, z_N), \qquad z_i = \langle \sigma_i, t_i, x_i \rangle, \qquad \sigma_i \in \Sigma, \qquad t_i \in \mathbb{R}$$

      Date       Merchant code       Amount ($)
      9/03/03    Cash Advance               300
      9/04/03    Groceries                   79
      9/07/03    Fuel                        21
      9/10/03    Groceries                   42
      9/23/03    Department Store           173
      10/03/03   Payment                   -600
      10/09/03   Hotel & Motel              104
      10/17/03   Rental Cars                201
      10/24/03   Lufthansa                  838
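The credit-card history can be reduced to the per-symbol quantities that the later LLR terms work from. These are the count k_σ for each merchant code, the amounts observed for each code, and the observation span T. This is a sketch under stated assumptions. The two-digit year "03" is read as 2003, times are days since the first transaction, and the name `summarize` is hypothetical.

```python
from collections import defaultdict
from datetime import date

# The credit-card history above as (date, merchant code, amount) triples.
# Assumption: the two-digit year "03" means 2003.
HISTORY = [
    (date(2003, 9, 3), "Cash Advance", 300.0),
    (date(2003, 9, 4), "Groceries", 79.0),
    (date(2003, 9, 7), "Fuel", 21.0),
    (date(2003, 9, 10), "Groceries", 42.0),
    (date(2003, 9, 23), "Department Store", 173.0),
    (date(2003, 10, 3), "Payment", -600.0),
    (date(2003, 10, 9), "Hotel & Motel", 104.0),
    (date(2003, 10, 17), "Rental Cars", 201.0),
    (date(2003, 10, 24), "Lufthansa", 838.0),
]

def summarize(history):
    """Per-symbol counts k_sigma, per-symbol amounts, and the span T in days.

    Times t_i are measured in days since the first transaction.
    """
    start = min(day for day, _, _ in history)
    counts = defaultdict(int)
    amounts = defaultdict(list)
    for day, symbol, amount in history:
        counts[symbol] += 1
        amounts[symbol].append(amount)
    span = max((day - start).days for day, _, _ in history)
    return dict(counts), dict(amounts), span

counts, amounts, T = summarize(HISTORY)
print(counts["Groceries"], amounts["Groceries"], T)
```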
  • 14. Proposed Evolution
    [Figure: diagram relating LLR tests on text (Luduan, etc.), data augmentation, and augmented LLR tests on transactional data (transaction mining)]
  • 15. LLR for Transaction Sequence
    ● Assuming reasonable interactions between timing, symbol selection and amount distribution, the LLR test can be decomposed
    ● Two major terms remain, one for symbols and timing together and one for amounts:

      $$\text{LLR} = \text{LLR}(\text{symbols \& timing}) + \text{LLR}(\text{amounts})$$
  • 16. Anecdotal Observations
    ● Symbol selection often looks multinomial, or (rarely) Markov
    ● Timing is often nearly Poisson (but the rate depends on which symbol)
    ● The distribution of amounts appears to depend on the symbol, but generally not on inter-transaction timing. Mixed discrete/continuous distributions are common in financial settings
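  • 17. Transaction Sequence Distributions
    ● Mixed Poisson distributions give the desired symbol/timing behavior
    ● The amount distribution depends on the symbol

      $$p(Z) = \left[\prod_{\sigma \in \Sigma} \frac{(\mu_\sigma T)^{k_\sigma} e^{-\mu_\sigma T}}{k_\sigma!}\right] \left[\prod_{i=1}^{N} p(x_i \mid \theta_{\sigma_i})\right]$$

      $$p(Z) = \left[\frac{(\mu T)^N e^{-\mu T}}{N!} \cdot N! \prod_{\sigma \in \Sigma} \frac{\pi_\sigma^{k_\sigma}}{k_\sigma!}\right] \left[\prod_{i=1}^{N} p(x_i \mid \theta_{\sigma_i})\right]$$

      $$\mu_\sigma = \mu \pi_\sigma, \qquad \sum_{\sigma \in \Sigma} \pi_\sigma = 1$$

      Here k_σ is the number of transactions with symbol σ in an observation period of length T, N = Σ k_σ, and θ_σ parameterizes the amount distribution for symbol σ. The second form follows from the first because Σ k_σ = N and Σ π_σ = 1. It splits p(Z) into a Poisson count of all transactions and a multinomial choice of symbols.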
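The two forms of p(Z) can be checked numerically on the symbol/timing factor. The amount factor ∏ p(x_i | θ_σi) is identical in both forms, so it is omitted. This is a sketch assuming a three-symbol alphabet with illustrative values of μ, π_σ, T and k_σ. The function names are hypothetical. Note that the counts must cover every symbol in Σ, because unseen symbols still contribute e^{-μ_σ T}.

```python
import math

def log_poisson_product(counts, mu, pi, T):
    """log of prod_sigma (mu_sigma T)^k e^{-mu_sigma T} / k!, with mu_sigma = mu * pi_sigma."""
    total = 0.0
    for s, k in counts.items():
        rate_t = mu * pi[s] * T
        total += k * math.log(rate_t) - rate_t - math.lgamma(k + 1)
    return total

def log_total_times_multinomial(counts, mu, pi, T):
    """log of [(mu T)^N e^{-mu T} / N!] * [N! prod_sigma pi_sigma^k / k!]."""
    n = sum(counts.values())
    log_total = n * math.log(mu * T) - mu * T - math.lgamma(n + 1)
    log_multinomial = math.lgamma(n + 1) + sum(
        k * math.log(pi[s]) - math.lgamma(k + 1) for s, k in counts.items())
    return log_total + log_multinomial

# counts lists every symbol of the alphabet, including the zero count for "c".
pi = {"a": 0.5, "b": 0.3, "c": 0.2}
counts = {"a": 9, "b": 7, "c": 0}
lhs = log_poisson_product(counts, 2.0, pi, 10.0)
rhs = log_total_times_multinomial(counts, 2.0, pi, 10.0)
print(lhs, rhs)
```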
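  • 18. LLR for Multinomial
    ● Easily expressed as the entropy of a contingency table:

      $$\left[\begin{array}{cccc|c} k_{11} & k_{12} & \cdots & k_{1n} & k_{1*} \\ k_{21} & k_{22} & \cdots & k_{2n} & k_{2*} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ k_{m1} & k_{m2} & \cdots & k_{mn} & k_{m*} \\ \hline k_{*1} & k_{*2} & \cdots & k_{*n} & k_{**} \end{array}\right]$$

      $$-2 \log \lambda = 2N \left( \sum_{ij} \pi_{ij} \log \pi_{ij} - \sum_i \pi_{i*} \log \pi_{i*} - \sum_j \pi_{*j} \log \pi_{*j} \right)$$

      $$-\log \lambda = \sum_{ij} k_{ij} \log \frac{k_{ij}\, k_{**}}{k_{i*}\, k_{*j}} = \sum_{ij} k_{ij} \log \frac{\pi_{ij}}{\pi_{i*}\, \pi_{*j}}, \qquad \text{d.o.f.} = (m-1)(n-1)$$

      where N = k_** and π_ij = k_ij / N, with the row and column margins π_i* and π_*j defined the same way.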
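Both expressions for the multinomial LLR can be computed side by side to confirm they agree. This is a minimal standard-library sketch. The function names and the example tables are illustrative. Zero cells are skipped, using 0 log 0 := 0. A table whose rows are exactly proportional gives an LLR of zero.

```python
import math

def _x_log_x(values):
    # sum of x log x over positive entries (0 log 0 := 0)
    return sum(v * math.log(v) for v in values if v > 0)

def llr_entropy(table):
    """-2 log(lambda) = 2N (sum pi_ij log pi_ij - sum pi_i* log pi_i* - sum pi_*j log pi_*j)."""
    n = float(sum(sum(row) for row in table))
    cells = [k / n for row in table for k in row]
    rows = [sum(row) / n for row in table]
    cols = [sum(col) / n for col in zip(*table)]
    return 2.0 * n * (_x_log_x(cells) - _x_log_x(rows) - _x_log_x(cols))

def llr_direct(table):
    """The same statistic as 2 * sum k_ij log(k_ij k_** / (k_i* k_*j))."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return 2.0 * sum(k * math.log(k * total / (row_sums[i] * col_sums[j]))
                     for i, row in enumerate(table)
                     for j, k in enumerate(row) if k > 0)

table = [[10, 20, 30], [40, 5, 15]]
dof = (len(table) - 1) * (len(table[0]) - 1)  # (m - 1)(n - 1)
print(llr_entropy(table), llr_direct(table), dof)
```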
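  • 19. LLR for Poisson Mixture
    ● Easily expressed using a timed contingency table, where row i was observed for a time t_i:

      $$\left[\begin{array}{cccc|c} k_{11} & k_{12} & \cdots & k_{1n} & t_1 \\ k_{21} & k_{22} & \cdots & k_{2n} & t_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ k_{m1} & k_{m2} & \cdots & k_{mn} & t_m \\ \hline k_{*1} & k_{*2} & \cdots & k_{*n} & t_* \end{array}\right]$$

      $$-\log \lambda = \sum_{ij} k_{ij} \log \frac{k_{ij}\, t_*}{t_i\, k_{*j}} = \sum_{ij} k_{ij} \log \frac{\mu_{ij}}{\mu_{*j}}, \qquad \text{d.o.f.} = (m-1)\,n$$

      where μ_ij = k_ij / t_i and μ_*j = k_*j / t_*.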
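A sketch of the timed contingency-table LLR follows, together with a check against the full Poisson log-likelihoods. The check shows that the exponential terms cancel: both fitted models predict k_** events in total. The names `timed_llr` and `poisson_log_lik` and the example counts and times are illustrative assumptions. The check assumes every cell is positive so that the alternative rates are nonzero.

```python
import math

def timed_llr(counts, times):
    """-2 log(lambda) for rows of symbol counts k_ij observed over durations t_i.

    Null: every row shares the per-symbol rates mu_*j = k_*j / t_*.
    Alternative: each row has its own rates mu_ij = k_ij / t_i.
    """
    t_total = sum(times)
    col_sums = [sum(col) for col in zip(*counts)]
    stat = 0.0
    for row, t_i in zip(counts, times):
        for j, k in enumerate(row):
            if k > 0:  # zero cells contribute nothing
                stat += k * math.log((k / t_i) / (col_sums[j] / t_total))
    return 2.0 * stat

def poisson_log_lik(counts, times, rate):
    """Full Poisson log-likelihood of the table for a rate function rate(i, j)."""
    return sum(k * math.log(rate(i, j) * t) - rate(i, j) * t - math.lgamma(k + 1)
               for i, (row, t) in enumerate(zip(counts, times))
               for j, k in enumerate(row))

counts = [[5, 1, 7], [2, 9, 3]]
times = [3.0, 5.0]
t_star = sum(times)
column_totals = [sum(col) for col in zip(*counts)]
alt = poisson_log_lik(counts, times, lambda i, j: counts[i][j] / times[i])
null = poisson_log_lik(counts, times, lambda i, j: column_totals[j] / t_star)
print(timed_llr(counts, times), 2.0 * (alt - null))
```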
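  • 20. LLR for Normal Distribution
    ● Assume X1 and X2 are normally distributed
    ● Null hypothesis of identical mean and variance

      $$p(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad \hat\mu = \frac{1}{N} \sum_i x_i, \qquad \hat\sigma = \sqrt{\frac{1}{N} \sum_i (x_i - \hat\mu)^2}$$

      $$-2 \log \lambda = 2 \left( N_1 \log \frac{\hat\sigma}{\hat\sigma_1} + N_2 \log \frac{\hat\sigma}{\hat\sigma_2} \right), \qquad \text{d.o.f.} = 2$$

      where σ̂ is fitted to both samples pooled (N = N1 + N2), and σ̂1 and σ̂2 are fitted to each sample separately.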
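The closed form needs only three maximum-likelihood standard deviations, which divide by N rather than N - 1. This is a minimal sketch. The function names and sample values are illustrative, and each sample is assumed to contain at least two distinct values so that σ̂1 and σ̂2 are positive.

```python
import math

def mle_sigma(xs):
    """Maximum-likelihood standard deviation (divides by N, not N - 1)."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

def normal_llr(x1, x2):
    """-2 log(lambda) = 2 (N1 log(sigma/sigma1) + N2 log(sigma/sigma2)), d.o.f. = 2.

    sigma is fitted to the pooled data (null); sigma1 and sigma2 are fitted to
    each sample separately (alternative).
    """
    x1, x2 = list(x1), list(x2)
    sigma = mle_sigma(x1 + x2)
    return 2.0 * (len(x1) * math.log(sigma / mle_sigma(x1))
                  + len(x2) * math.log(sigma / mle_sigma(x2)))

a = [1.0, 2.0, 3.0, 4.0]
b = [11.0, 12.0, 13.0, 14.0]
print(normal_llr(a, a), normal_llr(a, b))
```

For the shifted sample, the pooled variance (26.25) is 21 times each sample's variance (1.25). The statistic is therefore 8 log 21 ≈ 24.36, while identical samples give 0.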
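  • 21. Calculations
    ● Assume X1 and X2 are normally distributed
    ● Null hypothesis of identical mean and variance

      $$p(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad \hat\mu = \frac{1}{N} \sum_i x_i, \qquad \hat\sigma = \sqrt{\frac{1}{N} \sum_i (x_i - \hat\mu)^2}$$

      $$\begin{aligned}
      \log \lambda &= \log p(X_1 \mid \hat\mu, \hat\sigma) + \log p(X_2 \mid \hat\mu, \hat\sigma) - \log p(X_1 \mid \hat\mu_1, \hat\sigma_1) - \log p(X_2 \mid \hat\mu_2, \hat\sigma_2) \\
      &= -\sum_{i=1}^{N_1} \left[ \log\sqrt{2\pi} + \log\hat\sigma + \frac{(x_{1i} - \hat\mu)^2}{2\hat\sigma^2} \right] - \sum_{i=1}^{N_2} \left[ \log\sqrt{2\pi} + \log\hat\sigma + \frac{(x_{2i} - \hat\mu)^2}{2\hat\sigma^2} \right] \\
      &\quad + \sum_{i=1}^{N_1} \left[ \log\sqrt{2\pi} + \log\hat\sigma_1 + \frac{(x_{1i} - \hat\mu_1)^2}{2\hat\sigma_1^2} \right] + \sum_{i=1}^{N_2} \left[ \log\sqrt{2\pi} + \log\hat\sigma_2 + \frac{(x_{2i} - \hat\mu_2)^2}{2\hat\sigma_2^2} \right]
      \end{aligned}$$

      At the maximum-likelihood estimates, each quadratic sum equals half its sample size: N/2 under the null, and N1/2 + N2/2 under the alternative. Those terms therefore cancel, along with the log √(2π) terms, leaving

      $$-2 \log \lambda = 2 \left( N_1 \log \frac{\hat\sigma}{\hat\sigma_1} + N_2 \log \frac{\hat\sigma}{\hat\sigma_2} \right), \qquad \text{d.o.f.} = 2$$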
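The cancellation step in the derivation can be verified by summing the log-densities directly and comparing the result with the closed form. This is a sketch with illustrative sample values and hypothetical function names. `log_normal_pdf` spells out the same three terms that appear in each bracket of the derivation.

```python
import math

def log_normal_pdf(x, mu, sigma):
    # -log sqrt(2 pi) - log sigma - (x - mu)^2 / (2 sigma^2), as in each bracket above
    return -math.log(math.sqrt(2.0 * math.pi)) - math.log(sigma) - (x - mu) ** 2 / (2.0 * sigma ** 2)

def mle(xs):
    """Maximum-likelihood (mean, standard deviation) of a sample."""
    mu = sum(xs) / len(xs)
    return mu, math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

def log_lambda_direct(x1, x2):
    """log(lambda) as the four log-likelihood sums: the pooled fit minus the separate fits."""
    pooled = list(x1) + list(x2)
    mu, sigma = mle(pooled)
    mu1, sigma1 = mle(x1)
    mu2, sigma2 = mle(x2)
    null = sum(log_normal_pdf(x, mu, sigma) for x in pooled)
    alt = (sum(log_normal_pdf(x, mu1, sigma1) for x in x1)
           + sum(log_normal_pdf(x, mu2, sigma2) for x in x2))
    return null - alt

x1 = [2.1, 1.7, 2.9, 2.4, 1.8]
x2 = [3.3, 4.1, 2.8, 3.9]
direct = -2.0 * log_lambda_direct(x1, x2)
sigma_pooled = mle(x1 + x2)[1]
closed = 2.0 * (len(x1) * math.log(sigma_pooled / mle(x1)[1])
                + len(x2) * math.log(sigma_pooled / mle(x2)[1]))
print(direct, closed)
```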
  • 22. Transactional Data in Context
    Real-world input often consists of one or more bags of transactional values combined with an assortment of conventional numerical or categorical values (the slide's examples: 1.2, 34 years, male). Extracting information from the transactional data can be difficult, so it is often not done.
  • 23. Real World Target Variables
    [Figure: panels a and b illustrating secondary labels, mislabeled instances, and instances labeled as red]
  • 24. Luduan Modeling Methodology
    ● Use LLR tests to find exemplars (query terms) from secondary label sets
    ● Create positive and negative secondary label models for each class of transactional data
    ● Cluster using the output of all secondary label models and all conventional data
    ● Test clusters for stability
    ● Use distances to cluster centroids and/or secondary label models as derived input variables
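The first step, using LLR tests to find exemplars for a secondary label set, can be sketched as follows. Each symbol is scored with a 2x2 table: occurrences inside versus outside the labeled set, against all other symbols. The highest-scoring symbols that are over-represented inside the set are kept. This is not Luduan itself, whose implementation the slides do not give. The function names, the unigram counting, and the toy documents are all assumptions.

```python
import math
from collections import Counter

def _entropy_term(counts):
    # sum of k * log(k / total) over positive counts (an unnormalized negative entropy)
    total = sum(counts)
    return sum(k * math.log(k / total) for k in counts if k > 0)

def llr_2x2(k11, k12, k21, k22):
    """G^2 for the 2x2 table [[k11, k12], [k21, k22]]."""
    return 2.0 * (_entropy_term([k11, k12, k21, k22])
                  - _entropy_term([k11 + k12, k21 + k22])
                  - _entropy_term([k11 + k21, k12 + k22]))

def exemplars(in_docs, out_docs, top=5):
    """Hypothetical helper: symbols most over-represented in the labeled set, ranked by G^2."""
    inside = Counter(s for doc in in_docs for s in doc)
    outside = Counter(s for doc in out_docs for s in doc)
    n_in, n_out = sum(inside.values()), sum(outside.values())
    scored = []
    for s in set(inside) | set(outside):
        a, b = inside[s], outside[s]
        if a / n_in > b / n_out:  # keep only symbols that are evidence for the label
            scored.append((llr_2x2(a, n_in - a, b, n_out - b), s))
    return [s for _, s in sorted(scored, reverse=True)[:top]]

in_docs = [["refund", "late", "fee"], ["refund", "fee", "angry"]]
out_docs = [["thanks", "great"], ["great", "fast", "thanks"], ["fee", "thanks"]]
print(exemplars(in_docs, out_docs, top=3))
```

The over-representation filter keeps the two-sided G² score from selecting symbols that argue against the label. Symbols such as "thanks", which appear only outside the set, are therefore never chosen as exemplars.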
  • 25. Example #1 – Auto Insurance
    ● Predict probability of attrition and loss for auto insurance customers
    ● Transactional variables include
      – Claim history
      – Traffic violation history
      – Geographical code of residence(s)
      – Vehicles owned
    ● Observed attrition and loss define past behavior
  • 26. Derived Variables
    ● Split training data according to observable classes
      – These include attrition and loss > 0
    ● Define LLR variables for each class/variable combination
    ● These 2mv derived variables can be used for clustering (spectral, k-means, neural gas ...)
    ● Proximities to the clusters in LLR space are the new modeling variables
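One way to read "LLR variables for each class/variable combination" is sketched here. For each class and each transactional variable, a positive profile (pooled symbol counts of training records in the class) and a negative profile (records not in the class) are kept. A customer's symbol counts are then scored against each profile with a two-row G², which gives 2·m·v numbers per customer. The reading itself, the function names, and the toy profiles are all assumptions, not the author's method. Smaller values mean the customer looks more like that profile.

```python
import math
from collections import Counter

def _entropy_term(counts):
    # sum of k * log(k / total) over positive counts
    total = sum(counts)
    return sum(k * math.log(k / total) for k in counts if k > 0)

def llr_counts(a, b):
    """G^2 comparing two symbol-count vectors, i.e. a 2 x n contingency table."""
    symbols = sorted(set(a) | set(b))
    row_a = [a.get(s, 0) for s in symbols]
    row_b = [b.get(s, 0) for s in symbols]
    cols = [x + y for x, y in zip(row_a, row_b)]
    return 2.0 * (_entropy_term(row_a + row_b)
                  - _entropy_term([sum(row_a), sum(row_b)])
                  - _entropy_term(cols))

def derived_features(record, references):
    """Hypothetical: one LLR per (class, variable, polarity) reference profile.

    record:     {variable: Counter of this customer's symbols}
    references: {(class, variable, "pos" or "neg"): Counter pooled over training data}
    With m classes and v variables this yields 2 * m * v features.
    """
    return {key: llr_counts(record.get(key[1], Counter()), profile)
            for key, profile in sorted(references.items())}

references = {
    ("loss", "claims", "pos"): Counter({"collision": 50, "glass": 10}),
    ("loss", "claims", "neg"): Counter({"collision": 5, "glass": 50}),
    ("loss", "violations", "pos"): Counter({"speeding": 30, "DUI": 5}),
    ("loss", "violations", "neg"): Counter({"speeding": 10, "stop-sign": 20}),
    ("attrition", "claims", "pos"): Counter({"glass": 20, "collision": 20}),
    ("attrition", "claims", "neg"): Counter({"glass": 30, "collision": 10}),
    ("attrition", "violations", "pos"): Counter({"speeding": 15}),
    ("attrition", "violations", "neg"): Counter({"stop-sign": 15, "speeding": 5}),
}
customer = {"claims": Counter({"collision": 3}), "violations": Counter({"speeding": 1})}
features = derived_features(customer, references)
print(features)
```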
  • 27. Results
    ● Conventional NN modeling by a competent analyst was able to explain 2% of variance
      – No significant difference on training/test data
    ● Models built using Luduan-based cluster proximity variables were able to explain 70% of variance (KS approximately 0.4)
      – No significant difference on training/test data
  • 28. Example #2 – Fraud Detection
    ● Predict the probability that an account will result in charge-off due to payment fraud
    ● Transactional variables include
      – Zip code
      – Recent payments and charges
      – Recent non-monetary transactions
    ● Bad payments, charge-off and delinquency are observable behavioral outcomes
  • 29. Derived Variables
    ● Split training data according to observable classes (charge-off, NSF payment, delinquency)
    ● Define LLR variables for each class/variable combination
    ● These 2mv derived variables can be used directly as model variables
    ● No results available for publication
  • 30. Example #3 – E-commerce Monitor
    ● Detect malfunctions or changes in the behavior of an e-commerce system due to fraud or system failure
    ● Transaction variables include (time, SKU, amount)
    ● Desired output is an alarm for operational staff
  • 31. Derived Variables
    ● Time warp derived as the product of smoothed daily and weekly sales rates
    ● Time warp updated monthly to account for seasonal variations
    ● Warped time used in transactions
    ● Warped time since last transaction ≈ LLR in the single product/single price case
    ● Full LLR allows testing for a significant difference between champion and challenger versions of an e-commerce optimizer
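The time warp can be sketched as a relative sales rate for each hour of the week, formed as a smoothed hour-of-day profile times a day-of-week profile and normalized to mean 1. Warped time between two moments is the sum of these weights. Several details are assumptions, not from the slides: hourly resolution, the normalization, the toy profiles, and every function name. So is `silence_surprise`. Under a Poisson model in warped time, the chance of no sale in a warped gap w at base rate r is e^{-r·w}, so r·w is a surprise score. That is one way to read the remark that warped time since the last transaction approximates the LLR.

```python
def hourly_weights(daily, weekly):
    """Relative rate for each of the 168 hours of a week, normalized to mean 1.

    daily: 24 smoothed hour-of-day rates; weekly: 7 smoothed day-of-week rates.
    """
    raw = [weekly[d] * daily[h] for d in range(7) for h in range(24)]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]

def warped_elapsed(start_hour, end_hour, weights):
    """Warped time between two whole-hour indices (hours since the start of a week)."""
    return sum(weights[h % 168] for h in range(start_hour, end_hour))

def silence_surprise(warped_gap, base_rate):
    """-log P(no sale in the gap) under a Poisson model in warped time."""
    return base_rate * warped_gap

daily = [0.2] * 8 + [1.5] * 12 + [0.5] * 4   # quiet nights, busy daytime
weekly = [1.0, 1.0, 1.0, 1.0, 1.0, 1.5, 0.8]
w = hourly_weights(daily, weekly)
night = warped_elapsed(0, 6, w)   # six night hours count for little warped time
day = warped_elapsed(10, 16, w)   # six busy hours count for much more
print(round(night, 2), round(day, 2))
```

Because the weights average to 1, a full week of warped time equals 168 real hours. Only the distribution of that time within the week changes, so a quiet night does not raise a false alarm.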
  • 32. Transductive Derived Variables
    ● All objective segmentations of the data provide new LLR variables
    ● The cross product of model outputs versus objective segmentation provides additional LLR variables for second-level model derivation
    ● Comparable to the Luduan query construction technique
      – The TREC pooled evaluation technique provided a cross product of relevance versus perceived relevance
  • 33. Relationship To Risk Tables
    ● Risk tables are estimates of relative risk for each value of a single symbolic variable
      – Useful with variables such as post-code of primary residence
      – Ad hoc smoothing used to deal with small counts
    ● Not usually applied to symbol sequences
    ● Risk tables ignore time entirely
    ● Risk tables require considerable analyst finesse
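For comparison with the LLR variables, here is a risk table with one common kind of ad hoc smoothing. The smoothing shrinks each value's bad rate toward the overall rate, as if `prior_weight` extra records at the overall rate had been seen. Both the smoothing rule and the default weight are assumptions, and choosing them is exactly the analyst finesse the slide mentions. The table scores each value of one variable in isolation and uses no timing.

```python
from collections import defaultdict

def risk_table(rows, prior_weight=20.0):
    """Relative risk per value of one symbolic variable, with additive shrinkage.

    rows: iterable of (value, is_bad). Each value's bad rate is shrunk toward the
    overall bad rate, then divided by the overall rate (1.0 means average risk).
    Assumes at least one bad record overall.
    """
    n = defaultdict(int)
    bad = defaultdict(int)
    for value, is_bad in rows:
        n[value] += 1
        bad[value] += int(is_bad)
    overall = sum(bad.values()) / sum(n.values())
    return {v: (bad[v] + prior_weight * overall) / (n[v] + prior_weight) / overall
            for v in n}

rows = ([("A", i < 10) for i in range(100)]
        + [("B", i < 30) for i in range(100)]
        + [("C", True), ("C", True)])
table = risk_table(rows)
print({v: round(r, 2) for v, r in table.items()})
```

Value "C" has only two records, both bad. The unsmoothed relative risk would be about 4.8, but shrinkage pulls it down to about 1.35.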
  • 34. Relationship to Known Techniques
    ● Clock-tick symbols
      – Time-embedded symbols viewed as sequences of symbols along with “ticks” that occur at fixed time intervals
      – Allows multinomial LLR as a poor man's mixed Poisson LLR
    ● Not a well-known technique, not used in production models
    ● Difficulties in choosing time resolution and counting period
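The clock-tick construction can be sketched as merging time-stamped symbols with a tick symbol emitted at fixed intervals. The number of ticks then stands in for elapsed time, so an ordinary multinomial LLR over symbols plus ticks approximates the mixed Poisson LLR. The function name, the tick token, and the tie rule (a tick at exactly an event's time comes first) are assumptions. Integer tick sizes are used in the example to avoid floating-point drift. The `tick` parameter is the time-resolution choice the slide calls difficult.

```python
def with_clock_ticks(events, tick, start, end, tick_symbol="<tick>"):
    """Merge time-stamped symbols with a tick symbol emitted every `tick` time units.

    events: iterable of (time, symbol). Ticks fall at start + tick, start + 2*tick, ...
    up to and including end; a tick at exactly an event's time is emitted first.
    """
    out = []
    next_tick = start + tick
    for t, symbol in sorted(events):
        while next_tick <= t:
            out.append(tick_symbol)
            next_tick += tick
        out.append(symbol)
    while next_tick <= end:
        out.append(tick_symbol)
        next_tick += tick
    return out

seq = with_clock_ticks([(0.5, "a"), (2.2, "b"), (2.4, "a")], tick=1, start=0, end=4)
print(seq)
```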
  • 35. Conclusions
    ● Theoretical properties of transaction variables are well defined
    ● Similarities to known techniques indicate a low probability of gross failure
    ● Similarity to Luduan techniques suggests a high probability of superlative performance
    ● Transactional LLR statistics define similarity metrics useful for clustering