Tilani Gunawardena
Machine Learning and Data Mining
Evaluation and Credibility
Outline
• Introduction
• Train, Test and Validation sets
• Evaluation on Large / Unbalanced data
• Evaluation on Small data
– Cross validation
– Bootstrap
• Comparing data mining schemes
– Significance test
– Lift Chart / ROC curve
• Numeric Prediction Evaluation
Model’s Evaluation in the KDD Process
How to Estimate the Metrics?
• We can use:
– Training data;
– Independent test data;
– Hold-out method;
– k-fold cross-validation method;
– Leave-one-out method;
– Bootstrap method;
– And many more…
Estimation with Training Data
• The accuracy/error estimates on the training data are not
good indicators of performance on future data.
– Q: Why?
– A: Because new data will probably not be exactly the same as
the training data!
• The accuracy/error estimates on the training data
measure the degree of the classifier’s overfitting.
(Diagram: the classifier is both built and evaluated on the same training set.)
Estimation with Independent Test Data
• Estimation with independent test data is used when we
have plenty of data and there is a natural way of forming
training and test data.
• For example: Quinlan in 1987 reported experiments in a
medical domain for which the classifiers were trained on
data from 1985 and tested on data from 1986.
(Diagram: the classifier is built from the training set and evaluated on a separate test set.)
Hold-out Method
• The hold-out method splits the data into training data and
test data (usually 2/3 for train, 1/3 for test). Then we build a
classifier using the train data and test it using the test data.
• The hold-out method is usually used when we have
thousands of instances, including several hundred instances
from each class.
(Diagram: the data is split into a training set and a test set; the classifier is built from the training set and evaluated on the test set.)
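A minimal hold-out sketch of the split described above, assuming scikit-learn is available; the dataset, the 2/3–1/3 split and the decision-tree learner are illustrative choices, not part of the original slides:

```python
# Hold-out evaluation: build on 2/3 of the data, test on the remaining 1/3.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=3000, n_features=10, random_state=0)  # hypothetical data

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0, stratify=y)   # keep class proportions in both parts

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("hold-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```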
Classification: Train, Validation, Test Split
(Diagram: data with known results is split into a training set and a validation set. The classifier builder learns from the training set and is evaluated on the validation set; the chosen model is then evaluated once on a separate final test set for the final evaluation.)
The test data can’t be used for parameter tuning!
k-Fold Cross-Validation
• k-fold cross-validation avoids overlapping test sets:
– First step: data is split into k subsets of equal size;
– Second step: each subset in turn is used for testing and the
remainder for training.
• The estimates are averaged to yield an overall estimate.
(Diagram for k = 3: each fold in turn is the test set — [train | train | test], [train | test | train], [test | train | train].)
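A short sketch of 3-fold cross-validation under the same assumptions (scikit-learn, a hypothetical dataset and learner); each subset is used once for testing, and the fold accuracies are averaged into one overall estimate:

```python
# k-fold cross-validation (k = 3): every instance is tested exactly once.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=300, random_state=0)   # hypothetical data
scores = []
for train_idx, test_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

print("fold accuracies:", np.round(scores, 3))
print("cross-validation estimate:", np.mean(scores))
```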
Example: collect data from the real world (photographs and labels).
(Figure slides) Method 1: Training Process — giving students the answers before giving them the exam.
Method 2. Cross-Validation Error. Method 3 — if the world happens to be well represented by our dataset.
• Model Selection
• Evaluating our selection method: CV
The Bootstrap
• CV uses sampling without replacement
– The same instance, once selected, can not be selected again for a
particular training/test set
• The bootstrap uses sampling with replacement to form
the training set
– Sample a dataset of n instances n times with replacement to form
a new dataset of n instances
– Use this data as the training set
– Use the instances from the original
dataset that don’t occur in the new
training set for testing
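A sketch of one bootstrap resample, assuming NumPy; the dataset size n = 150 is illustrative. The training set is drawn with replacement, and the instances never drawn form the out-of-bag test set:

```python
# One bootstrap resample: n draws with replacement for training,
# the never-drawn (out-of-bag) instances for testing.
import numpy as np

rng = np.random.default_rng(0)
n = 150                                   # dataset size (hypothetical)
indices = np.arange(n)

train_idx = rng.choice(indices, size=n, replace=True)   # sampling WITH replacement
test_idx = np.setdiff1d(indices, train_idx)             # instances that were never picked

print("distinct training instances:", len(np.unique(train_idx)))   # ~63.2% of n
print("out-of-bag test instances:", len(test_idx))                 # ~36.8% of n
```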
Example
• Draw samples of the same size N (with replacement), M times
• e.g., N = 4, M = 3
• e.g., N = 150, M = 5000
• This gives M = 5000 means of random samples of X
The 0.632 bootstrap
• Also called the 0.632 bootstrap
– A particular instance has a probability of 1–1/n of not being picked
– Thus its probability of ending up in the test data is:
  (1 − 1/n)^n ≈ e^(−1) ≈ 0.368
– This means the training data will contain approximately 63.2% of
the instances
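A quick numeric check of this limit, using only the Python standard library:

```python
# (1 - 1/n)^n approaches e^(-1) ≈ 0.368 as n grows.
import math

for n in (10, 100, 1000, 10000):
    print(f"n = {n:5d}: (1 - 1/n)^n = {(1 - 1/n) ** n:.4f}")
print(f"e^-1        = {math.exp(-1):.4f}")
```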
Estimating error
with the bootstrap
• The error estimate on the test data will be very pessimistic
– Trained on just ~63% of the instances
• Therefore, combine it with the resubstitution error:
  err = 0.632 × e(test instances) + 0.368 × e(training instances)
• The resubstitution error gets less weight than the error on
the test data
• Repeat process several times with different replacement
samples; average the results
More on the bootstrap
• Probably the best way of estimating performance for very
small datasets
• However, it has some problems
– Completely random dataset with two classes of equal size. The true
error rate is 50% for any prediction rule.
– Consider the random dataset from above: a classifier that simply
memorizes the training data achieves 0% resubstitution error and
~50% error on the test data
– Bootstrap estimate for this classifier:
  err = 0.632 × 50% + 0.368 × 0% = 31.6%
– True expected error: 50%
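The same estimate as a tiny plain-Python sketch, with the error values of the example hard-coded:

```python
# 0.632 bootstrap estimate for the random two-class example above.
e_test, e_resub = 0.50, 0.00                 # ~50% out-of-bag error, 0% resubstitution error
err = 0.632 * e_test + 0.368 * e_resub
print(f"bootstrap estimate: {err:.1%}")      # 31.6%, although the true error is 50%
```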
• It is a straightforward way to derive estimates
of standard errors and confidence intervals for
complex estimators of complex parameters of
the distribution
Evaluation Summary:
• Use Train, Test, Validation sets for “LARGE”
data
• Balance “un-balanced” data
• Use Cross-validation for Middle size/small
data
• Use the leave-one-out and bootstrap methods
for small data
• Don’t use test data for parameter tuning - use
separate validation data
Agenda
• Quantifying learner performance
– Cross validation
– Error vs. loss
– Precision & recall
• Model selection
Accuracy Vs Precision
accuracy refers to the
closeness of a
measurement or estimate
to the TRUE value.
precision (or variance) refers to
the degree of agreement for a
series of measurements.
Precision Vs Recall
precision: Percentage of
retrieved documents that
are relevant.
recall: Percentage of relevant
documents that are returned.
Scenario
• We use a dataset with known classes to build a
model
• We use another dataset with known classes to
evaluate the model (this dataset could be part
of the original dataset)
• We compare/count the predicted classes
against the actual classes
Confusion Matrix
• A confusion matrix shows the number of
correct and incorrect predictions made by the
classification model compared to the actual
outcomes (target values) in the data
• The matrix is N×N, where N is the number of
target values (classes)
• The performance of such models is commonly
evaluated using the data in the matrix
Two Types of Error
False negative (“miss”), FN
alarm doesn’t sound but person is carrying metal
False positive (“false alarm”), FP
alarm sounds but person is not carrying metal
How to evaluate the Classifier’s
Generalization Performance?
• Assume that we test a classifier on some
test set and we derive at the end the
following confusion matrix (two-class)
• Also called a contingency table

                Predicted class
Actual class    Pos    Neg
Pos (P)         TP     FN
Neg (N)         FP     TN
Measures in Two-Class Classification
Example:
1) How many images of Gerhard Schroeder are in the data set?
2) How many predictions of Gerhard Schroeder are there?
3) What is the probability that Hugo Chavez is classified correctly by our learning algorithm?
4) Your learning algorithm predicted/classified an image as Hugo Chavez.
What is the probability it is actually Hugo Chavez?
5) Recall(“Hugo Chavez”) =
6) Precision(“Hugo Chavez”) =
7) Recall(“Colin Powell”) =
8) Precision(“Colin Powell”) =
9) Recall(“George W Bush”) =
10) Precision(“George W Bush”) =
1) True Positive (“Tony Blair”) =
2) False Positive (“Tony Blair”) =
3) False Negative (“Tony Blair”) =
4) True Positive (“Donald Rumsfeld”) =
5) False Positive (“Donald Rumsfeld”) =
6) False Negative (“Donald Rumsfeld”) =
Metrics for Classifier’s Evaluation
                Predicted class
Actual class    Pos    Neg
Pos (P)         TP     FN
Neg (N)         FP     TN

• Accuracy = (TP+TN)/(P+N)
• Error = (FP+FN)/(P+N)
• Precision = TP/(TP+FP)
• Recall/TP rate = TP/P
• FP Rate = FP/N
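A sketch computing these measures from two-class counts in plain Python; the counts are hypothetical:

```python
# Two-class evaluation metrics from confusion-matrix counts (hypothetical values).
TP, FN, FP, TN = 60, 40, 20, 80
P, N = TP + FN, FP + TN                  # actual positives and negatives

accuracy  = (TP + TN) / (P + N)
error     = (FP + FN) / (P + N)
precision = TP / (TP + FP)
recall    = TP / P                       # TP rate / sensitivity
fp_rate   = FP / N

print(f"accuracy={accuracy}, error={error}, precision={precision}, "
      f"recall={recall}, FP rate={fp_rate}")
```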
Example: 3 classifiers
True
Predicted
pos neg
pos 60 40
neg 20 80
True
Predicted
pos neg
pos 70 30
neg 50 50
True
Predicted
pos neg
pos 40 60
neg 30 70
Classifier 1
TPR =
FPR =
Classifier 2
TPR =
FPR =
Classifier 3
TPR =
FPR =
Example: 3 classifiers
True
Predicted
pos neg
pos 60 40
neg 20 80
True
Predicted
pos neg
pos 70 30
neg 50 50
True
Predicted
pos neg
pos 40 60
neg 30 70
Classifier 1
TPR = 0.4
FPR = 0.3
Classifier 2
TPR = 0.7
FPR = 0.5
Classifier 3
TPR = 0.6
FPR = 0.2
Multiclass-Things to Notice
• The total number of test examples of any class is the
sum of the corresponding row (i.e., the TP + FN for that class)
• The total number of FN’s for a class is the sum of values in the
corresponding row (excluding the TP)
• The total number of FP’s for a class is the sum of values in
the corresponding column (excluding the TP)
• The total number of TN’s for a certain class is the sum
of all columns and rows excluding that class's column and row
Predicted
Actual A B C D E
A TPA EAB EAC EAD EAE
B EBA TPB EBC EBD EBE
C ECA ECB TPC ECD ECE
D EDA EDB EDC TPD EDE
E EEA EEB EEC EED TPE
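A sketch that applies these row/column rules to an N×N confusion matrix, assuming NumPy; the matrix values are taken from the three-class worked example later in these slides:

```python
# Per-class TP, FN, FP, TN from an N x N confusion matrix (rows = actual, columns = predicted).
import numpy as np

cm = np.array([[25,  5,  2],    # actual A
               [ 3, 32,  4],    # actual B
               [ 1,  0, 15]])   # actual C

for i, label in enumerate("ABC"):
    tp = cm[i, i]
    fn = cm[i, :].sum() - tp            # rest of the row
    fp = cm[:, i].sum() - tp            # rest of the column
    tn = cm.sum() - tp - fn - fp        # everything outside that row and column
    print(label, "TP", tp, "FN", fn, "FP", fp, "TN", tn)
```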
Multi-class
            Predicted
Actual      A      B      C
A           TPA    EAB    EAC
B           EBA    TPB    EBC
C           ECA    ECB    TPC

                Predicted class
Actual class    P      N
P               TP     FN
N               FP     TN

One-vs-rest tables to fill in:

            Predicted
Actual      A      Not A
A
Not A

            Predicted
Actual      B      Not B
B
Not B

            Predicted
Actual      C      Not C
C
Not C
Multi-class
            Predicted
Actual      A      B      C
A           TPA    EAB    EAC
B           EBA    TPB    EBC
C           ECA    ECB    TPC

                Predicted class
Actual class    P      N
P               TP     FN
N               FP     TN

            Predicted
Actual      A                Not A
A           TPA              EAB + EAC
Not A       EBA + ECA        TPB + EBC + ECB + TPC

            Predicted
Actual      B                Not B
B           TPB              EBA + EBC
Not B       EAB + ECB        TPA + EAC + ECA + TPC

            Predicted
Actual      C                Not C
C           TPC              ECA + ECB
Not C       EAC + EBC        TPA + EAB + EBA + TPB
Example:

            Predicted
Actual      A     B     C
A           25    5     2
B           3     32    4
C           1     0     15

Overall Accuracy =
Precision A =
Recall B =
Example:

            Predicted
Actual      A     B     C
A           25    5     2
B           3     32    4
C           1     0     15

Overall Accuracy = (25+32+15)/(25+5+2+3+32+4+1+0+15)
Precision A = 25/(25+3+1)
Recall B = 32/(32+3+4)
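The same arithmetic as a short plain-Python sketch, verifying the answers above:

```python
# Worked 3-class example: overall accuracy, precision for A, recall for B.
cm = [[25, 5, 2],     # actual A
      [3, 32, 4],     # actual B
      [1,  0, 15]]    # actual C
total = sum(sum(row) for row in cm)

overall_accuracy = (25 + 32 + 15) / total    # diagonal over everything
precision_A = 25 / (25 + 3 + 1)              # TP_A over the predicted-as-A column
recall_B = 32 / (3 + 32 + 4)                 # TP_B over the actual-B row

print(round(overall_accuracy, 3), round(precision_A, 3), round(recall_B, 3))
```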
Counting the Costs
• In practice, different types of classification
errors often incur different costs
• Examples:
– Terrorist profiling
• “Not a terrorist” is correct 99.99% of the time
– Loan decisions
– Fault diagnosis
– Promotional mailing
Cost Matrices
                Hypothesized class
True class      Pos        Neg
Pos             TP Cost    FN Cost
Neg             FP Cost    TN Cost

Usually, TP Cost and TN Cost are set equal to 0.
Lift Charts
• In practice, decisions are usually made by comparing
possible scenarios taking into account different costs.
• Example:
• Promotional mail-out to 1,000,000 households: if we
mail to all households, we get a 0.1% response rate (1,000 responses).
• A data mining tool identifies
– a subset of 100,000 households with a 0.4% response rate
(400); or
– a subset of 400,000 households with a 0.2% response rate
(800).
• Depending on the costs we can make final decision
using lift charts!
• A lift chart allows a visual comparison for measuring
model performance
Generating a Lift Chart
• Given a scheme that outputs probability, sort the
instances in descending order according to the predicted
probability
• In a lift chart, the x-axis is the sample size and the y-axis is the number of
true positives.
Rank    Predicted Probability    Actual Class
1       0.95                     Yes
2       0.93                     Yes
3       0.93                     No
4       0.88                     Yes
…       …                        …
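A small sketch of the sorting step in plain Python; the first four (probability, class) pairs are from the table above, and the remaining rows are hypothetical padding:

```python
# Lift chart data: sort by predicted probability, accumulate true positives.
predictions = [(0.95, "Yes"), (0.93, "Yes"), (0.93, "No"), (0.88, "Yes"),
               (0.80, "No"), (0.75, "Yes")]              # last two rows are hypothetical

predictions.sort(key=lambda p: p[0], reverse=True)        # descending predicted probability
cumulative_tp = 0
for rank, (prob, actual) in enumerate(predictions, start=1):
    cumulative_tp += (actual == "Yes")
    # x-axis: sample size (rank), y-axis: number of true positives so far
    print(rank, prob, actual, cumulative_tp)
```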
Gains Chart
Example 01: Direct Marketing
• A company wants to do a mail marketing campaign
• It costs the company $1 for each item mailed
• They have information on 100,000 customers
• Create cumulative gains and lift charts from the
following data
• Overall Response Rate: If we assume we have no
model other than the prediction of the overall
response rate, then we can predict the number of
positive responses as a fraction of the total customers
contacted
• Suppose the response rate is 20%
• If all 100,000 customers are contacted we will receive
around 20,000 positive responses
Cost ($)    Total Customers Contacted    Positive Responses
100,000     100,000                      20,000
• Prediction of Response Model: A
response model predicts who will
respond to a marketing campaign
• If we have a response model, we can
make more detailed predictions
• For example, we use the response
model to assign a score to all
100,000 customers and predict the
results of contacting only the top
10,000 customers, the top 20,000
customers, etc.
Cost ($)    Total Customers Contacted    Positive Responses
10,000      10,000                       6,000
20,000      20,000                       10,000
30,000      30,000                       13,000
40,000      40,000                       15,800
50,000      50,000                       17,000
60,000      60,000                       18,000
70,000      70,000                       18,800
80,000      80,000                       19,400
90,000      90,000                       19,800
100,000     100,000                      20,000
Cumulative Gains Chart
• The y-axis shows the percentage of positive responses.
This is a percentage of the total possible positive
responses (20,000 as the overall response rate shows)
• The x-axis shows the percentage of customers
contacted, which is a fraction of the 100,000 total
customers
• Baseline (overall response rate): if we contact X% of
customers then we will receive X% of the total positive
responses
• Lift Curve: Using the predictions of the response
model, calculate the percentage of positive responses
for each percentage of customers contacted and map these
points to create the lift curve
Lift Chart
• Shows the actual lift.
• To plot the chart: Calculate the points on the lift
curve by determining the ratio between the
result predicted by our model and the result
using no model.
• Example: For contacting 10% of customers, using
no model we should get 10% of responders and
using the given model we should get 30% of
responders. The y-value of the lift curve at 10% is
30 / 10 = 3
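A plain-Python sketch that reproduces the cumulative gains and lift values for the direct-marketing table above (for example, lift 3.0 at 10% of customers contacted):

```python
# Cumulative gains and lift from the direct-marketing response table.
contacted = [10_000 * i for i in range(1, 11)]
responses = [6_000, 10_000, 13_000, 15_800, 17_000,
             18_000, 18_800, 19_400, 19_800, 20_000]
total_customers, total_responses = 100_000, 20_000

for c, r in zip(contacted, responses):
    pct_contacted = 100 * c / total_customers     # x-axis of both charts
    gain = 100 * r / total_responses              # cumulative gains: % of all responders reached
    lift = gain / pct_contacted                   # lift: model result vs. no-model baseline
    print(f"{pct_contacted:5.1f}% contacted  gain {gain:5.1f}%  lift {lift:.2f}")
```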
Lift Chart
Cumulative gains and lift charts are a graphical
representation of the advantage of using a predictive
model to choose which customers to contact
Example 2:
• Using the response model
P(x)=100-AGE(x) for
customer x and the data
table shown below,
construct the cumulative
gains and lift charts.
Calculate P(x) for each person x
1. Calculate P(x) for each person x
2. Order the people according to rank
P(x)
3. Calculate the percentage of total
responses for each cutoff point
Response Rate = Number of Responses /
Total Number of Responses
Total Customers Contacted    # of Responses    Response Rate
2
4
6
8
10
12
14
16
18
20
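A sketch of the three steps above in plain Python. The real age/response table is only shown as an image in the slides, so the ten (age, responded) pairs below are hypothetical stand-ins:

```python
# Example 2 sketch: score with P(x) = 100 - AGE(x), rank, then accumulate responses.
people = [(22, True), (25, True), (31, True), (38, False), (44, True),
          (48, False), (55, True), (61, False), (67, False), (72, False)]  # hypothetical

scored = [(100 - age, responded) for age, responded in people]   # 1. score each person
scored.sort(key=lambda s: s[0], reverse=True)                    # 2. order by decreasing P(x)

total_responses = sum(responded for _, responded in scored)
cumulative = 0
for cutoff, (score, responded) in enumerate(scored, start=1):    # 3. % of responses per cutoff
    cumulative += responded
    print(f"top {cutoff:2d} contacted: {100 * cumulative / total_responses:5.1f}% of responses")
```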
Cumulative Gains vs Lift Chart
The lift curve and the baseline have the same
values for 10%-20% and 90%-100%.
ROC Curves
• ROC curves are similar to lift charts
– Stands for “receiver operating characteristic”
– Used in signal detection to show tradeoff between
hit rate and false alarm rate over noisy channel
• Differences from gains chart:
– x axis shows percentage of false positives in
sample, rather than sample size
ROC Curve
(Figure: score distributions for non-diseased and diseased cases, with a decision threshold separating them.)
ROC Curves and Analysis
True
Predicted
pos neg
pos 60 40
neg 20 80
True
Predicted
pos neg
pos 70 30
neg 50 50
True
Predicted
pos neg
pos 40 60
neg 30 70
Classifier 1
TPr = 0.4
FPr = 0.3
Classifier 2
TPr = 0.7
FPr = 0.5
Classifier 3
TPr = 0.6
FPr = 0.2
ROC analysis
• True Positive Rate
– TPR = TP / (TP+FN)
– also called sensitivity
– true abnormals called abnormal by the observer
• False Positive Rate
– FPR = FP / (FP+TN)
• Specificity (TNR)= TN / (TN+FP)
– True normals called normal by the observer
• FPR = 1 - specificity
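A tiny sketch of these definitions in plain Python; the counts passed in are from the classifier with TPR 0.7 and FPR 0.5 shown above:

```python
# TPR, FPR and specificity from two-class counts.
def roc_point(tp, fn, fp, tn):
    tpr = tp / (tp + fn)                 # sensitivity
    fpr = fp / (fp + tn)
    specificity = tn / (tn + fp)         # so FPR = 1 - specificity
    return tpr, fpr, specificity

print(roc_point(tp=70, fn=30, fp=50, tn=50))   # (0.7, 0.5, 0.5)
```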
Evaluating classifiers (via their ROC curves):
Classifier A can’t distinguish between normal and abnormal. B is better but makes some mistakes. C makes very few mistakes. “Perfect” means no false positives and no false negatives.
Quiz 4:
1) How many images of Gerhard Schroeder are in the data set?
2) How many predictions of Gerhard Schroeder are there?
3) What is the probability that Hugo Chavez is classified correctly by our learning algorithm?
4) Your learning algorithm predicted/classified an image as Hugo Chavez.
What is the probability it is actually Hugo Chavez?
5) Recall(“Hugo Chavez”) =
6) Precision(“Hugo Chavez”) =
7) Recall(“Colin Powell”) =
8) Precision(“Colin Powell”) =
9) Recall(“George W Bush”) =
10) Precision(“George W Bush”) =