Similar presentations:
Assessing and comparing classification algorithms
1. INTRODUCTION TO Machine Learning
Lecture Slides forINTRODUCTION TO
Machine Learning
ETHEM ALPAYDIN
© The MIT Press, 2004
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml
2.
CHAPTER 14:Assessing and Comparing
Classification Algorithms
3. Introduction
Questions:Assessment of the expected error of a learning
algorithm: Is the error rate of 1-NN less than 2%?
Comparing the expected errors of two
algorithms: Is k-NN more accurate than MLP ?
Training/validation/test sets
Resampling methods: K-fold cross-validation
3
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
4. Algorithm Preference
Criteria (Application-dependent):Misclassification error, or risk (loss functions)
Training time/space complexity
Testing time/space complexity
Interpretability
Easy programmability
Cost-sensitive learning
4
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
5. Resampling and K-Fold Cross-Validation
The need for multiple training/validation sets{Xi,Vi}i: Training/validation sets of fold i
K-fold cross-validation: Divide X into k, Xi,i=1,...,K
V1 X1
T 1 X2 X3 X K
V2 X2
T 2 X1 X 3 X K
VK X K T K X1 X2 X K 1
Ti share K-2 parts
5
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
6. 5×2 Cross-Validation
5 times 2 fold cross-validation (Dietterich, 1998)T 1 X1 1
V1 X1 2
2
1
T 2 X1
V2 X1
T 3 X2 1
V3 X2 2
T 4 X2 2
V4 X2 1
T 9 X5 1
2
T 10 X5
V9 X5 2
1
V10 X5
6
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
7. Bootstrapping
Draw instances from a dataset with replacementProb that we do not pick an instance after N draws
N
1
1
1 e 0.368
N
that is, only 36.8% is new!
7
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
8. Measuring Error
Error rateRecall
= # of errors / # of instances = (FN+FP) / N
= # of found positives / # of positives
= TP / (TP+FN) = sensitivity = hit rate
Precision = # of found positives / # of found
= TP / (TP+FP)
Specificity = TN / (TN+FP)
False alarm rate = FP / (FP+TN) = 1 - Specificity
8
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
9. ROC Curve
9Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
10. Interval Estimation
X = { xt }t where xt ~ N ( μ, σ2)m ~ N ( μ, σ2/N)
N
m ~ Z
m
P 1.96 N
1.96 0.95
P m 1.96
m 1.96
0.95
N
N
P m z / 2
m z / 2
1
N
N
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
100(1- α) percent
confidence
interval
10
11.
mP N
1.64 0.95
P m 1.64
0.95
N
P m z
1
N
When
σ2 is not known:
S x m / N 1
2
t
t
2
N m
~ t N 1
S
S
S
P m t / 2,N 1
m t / 2,N 1
1
N
N
11
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
12. Hypothesis Testing
Reject a null hypothesis if not supported by thesample with enough confidence
X = { xt }t where xt ~ N ( μ, σ2)
H0: μ = μ0 vs. H1: μ ≠ μ0
Accept H0 with level of significance α if μ0 is in the
100(1- α) confidence interval
N m 0
z / 2 , z / 2
Two-sided test
12
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
13.
One-sided test: H0: μ ≤ μ0 vs. H1: μ > μ0Accept if
N m 0
, z
Variance unknown: Use t, instead of z
Accept H0: μ = μ0 if
N m 0
t / 2,N 1 ,t / 2,N 1
S
13
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
14. Assessing Error: H0: p ≤ p0 vs. H1: p > p0
Assessing Error:H0: p ≤ p0 vs. H1: p > p0
Single training/validation set: Binomial Test
If error prob is p0, prob that there are e errors or
less in N validation trials is
N j
j
P X e p 0 1 p 0
j 1 j
e
N j
Accept if this prob is less than 1- α
1- α
N=100, e=20
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
14
15. Normal Approximation to the Binomial
Number of errors X is approx N with mean Np0 andvar Np0(1-p0)
X Np 0
~Z
Np 0 1 p 0
Accept if this prob for X = e is
less than z1-α
1- α
15
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
16. Paired t Test
Multiple training/validation setsxti = 1 if instance t misclassified on fold i
N
t
Error rate of fold i:
x
t 1 i
pi
N
2
With m and s average and var of pi
we accept p0 or less error if
K m p 0
~ t K 1
S
is less than tα,K-1
16
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
17. Comparing Classifiers: H0: μ0 = μ1 vs. H1: μ0 ≠ μ1
Single training/validation set: McNemar’s TestUnder H0, we expect e01= e10=(e01+ e10)/2
e
e10 1
2
01
e01 e10
~ X12
Accept if < X2α,1
17
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
18. K-Fold CV Paired t Test
Use K-fold cv to get K training/validation foldspi1, pi2: Errors of classifiers 1 and 2 on fold i
pi = pi1 – pi2 : Paired difference on fold i
The null hypothesis is whether pi has mean 0
H 0 : 0 vs. H 0 : 0
i 1pi
K
m
K
K m 0
s
2
p
m
i 1 i
K
s2
K 1
K m
~ t K 1 Accept if in t / 2,K 1 ,t / 2,K 1
s
18
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
19. 5×2 cv Paired t Test
Use 5×2 cv to get 2 folds of 5 tra/val replications(Dietterich, 1998)
pi(j) : difference btw errors of 1 and 2 on fold j=1, 2
of replication i=1,...,5
1
2
pi pi pi
/ 2
2
i
1
s pi pi
p1 1
2
s
i 1 i / 5
5
p
2
2
i
pi
2
~ t5
Two-sided test: Accept H0: μ0 = μ1 if in (-tα/2,5,tα/2,5)
One-sided test: Accept H0: μ0 ≤ μ1 if < tα,5
19
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
20. 5×2 cv Paired F Test
p2 s
j
5
2
i 1
i
j 1
5
2
i 1 i
2
~ F10 ,5
Two-sided test: Accept H0: μ0 = μ1 if < Fα,10,5
20
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
21. Comparing L>2 Algorithms: Analysis of Variance (Anova)
Comparing L>2 Algorithms:Analysis of Variance (Anova)
H 0 : 1 2 L
Errors of L algorithms on K folds
X ij ~ N j , 2 , j 1,..., L, i 1,..., K
We construct two estimators to σ2 .
One is valid if H0 is true, the other is always valid.
We reject H0 if the two estimators disagree.
21
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
22.
If H 0 is true :K
mj
i 1
X ij
~ N , 2 / K
K
j 1m j
L
m
S
2
m
j
m
2
j
L
L 1
Thus an estimator of 2 is K S 2 , namely,
L
ˆ 2 K
j
m
m
j 1
m
/K
2
j
L 1
2
j
2
m
~ X L2 1 SSb K m j m
2
j
So when H 0 is true, we have
SSb
2
~
X
L 1
2
22
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
23.
Regardless of H 0 our second estimator to 2 is theaverage of group variances S 2j :
X
K
S 2j
i 1
mj
2
ij
L
2
ˆ
K 1
SSw X ij m j
j 1
S 2j
L
j
i
X
mj
2
ij
L K 1
2
j
K 1
S 2j
i
~X
2
K 1
SSw
2
2
SSb / 2 SSw / 2
~ X L2 K 1
SSb / L 1
/
~ FL 1,L K 1
L 1 L K 1 SSw / L K 1
H 0 : 1 2 L if F ,L 1,L K 1
23
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
24. Other Tests
Range test (Newman-Keuls): 145 23Nonparametric tests (Sign test, Kruskal-Wallis)
Contrasts: Check if 1 and 2 differ from 3,4, and 5
Multiple comparisons require Bonferroni correction
If there are m tests, to have an overall significance
of α, each test should have a significance of α/m.
Regression: CLT states that the sum of iid variables
from any distribution is approximately normal and
the preceding methods can be used.
Other loss functions ?
24
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)