Jinliang Xu
jlxu@bupt.edu.cn
Department of Computer Science
Institute of Network Technology, BUPT
May 20, 2016
K-means, E.M. and Mixture Models
Remind: Two Main Problems in ML
• Two main problems in ML:
– Regression: Linear Regression, Neural net...
– Classification: Decision Tree, kNN, Bayesian Classifier...
• Today, we will learn:
– K-means: a simple unsupervised clustering algorithm.
– Expectation Maximization: a general algorithm for density estimation.
∗ We will see how to use EM in general cases and in specific case of GMM.
– GMM: a tool for modelling Data-in-the-Wild (density estimator)
∗ We will also learn how to use GMM in a Bayesian Classifier
Contents
• Unsupervised Learning
• K-means clustering
• Expectation Maximization (E.M.)
• Gaussian mixtures as a Density Estimator
– Gaussian mixtures
– EM for mixtures
• Gaussian mixtures for classification
Unsupervised Learning
• Supervised learning:
– Label of each sample is included in the training set

  Sample   Label
  x1       y1
  ...      ...
  xn       yn

• Unsupervised learning:
– Training set contains the samples only

  Sample   Label
  x1
  ...
  xn
Unsupervised Learning
Figure 1: Unsupervised vs. Supervised Learning. (a) Supervised learning; (b) unsupervised learning. [Scatter plots omitted.]
What is unsupervised learning useful for?
• Collecting and labeling a large training set can be very expensive.
• It can help find features that are useful for categorization.
• Gain insight into the natural structure of the data.
Contents
• Unsupervised Learning
• K-means clustering
• Expectation Maximization (E.M.)
• Gaussian mixtures as a Density Estimator
– Gaussian mixtures
– EM for mixtures
• Gaussian mixtures for classification
K-means clustering
• Clustering algorithms aim to find groups of “similar” data points among the input data.
• K-means is an effective algorithm to extract a given number of clusters from a training set.
• Once done, the cluster locations can be used to classify data into distinct classes.
K-means clustering
• Given:
– The dataset: \{x_n\}_{n=1}^{N} = \{x_1, x_2, \ldots, x_N\}
– Number of clusters: K (K < N)
• Goal: find a partition S = \{S_k\}_{k=1}^{K} that minimizes the objective function

J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \| x_n - \mu_k \|^2   (1)

where r_{nk} = 1 if x_n is assigned to cluster S_k, and r_{nj} = 0 for j \neq k.
i.e. find values for the \{r_{nk}\} and the \{\mu_k\} that minimize (1).
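For concreteness, the objective in (1) is straightforward to evaluate from a hard assignment. A minimal NumPy sketch (the function and array names are ours, not from the slides):

```python
import numpy as np

def kmeans_objective(X, assignments, centers):
    """Sum of squared distances from each point to its assigned center.

    X           : (N, d) data matrix
    assignments : (N,) integer array, assignments[n] = cluster index of x_n
    centers     : (K, d) matrix of cluster means mu_k
    """
    diffs = X - centers[assignments]     # x_n - mu_{k(n)} for every n
    return float(np.sum(diffs ** 2))     # J = sum_n ||x_n - mu_{k(n)}||^2
```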
K-means clustering
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \| x_n - \mu_k \|^2

• Select some initial values for the \mu_k.
• Expectation: keep the \mu_k fixed, minimize J with respect to the r_{nk}.
• Maximization: keep the r_{nk} fixed, minimize J with respect to the \mu_k.
• Loop until there is no change in the partitions (or the maximum number of iterations is exceeded).
K-means clustering
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \| x_n - \mu_k \|^2

• Expectation: J is a linear function of the r_{nk}:

r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \| x_n - \mu_j \|^2 \\ 0 & \text{otherwise} \end{cases}

• Maximization: setting the derivative of J with respect to \mu_k to zero gives:

\mu_k = \frac{\sum_n r_{nk}\, x_n}{\sum_n r_{nk}}

Convergence of K-means: assured [why?], but it may converge to a local minimum of J [8].
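The two update rules above are all that a basic implementation needs. A minimal sketch in NumPy (random initialization, not tuned for speed; names are ours):

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Plain K-means: alternate the assignment and mean-update steps until stable."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()  # random initial mu_k
    assignments = np.full(len(X), -1)
    for _ in range(max_iters):
        # Expectation-like step: assign each x_n to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (N, K)
        new_assignments = dists.argmin(axis=1)
        if np.array_equal(new_assignments, assignments):
            break                                   # partitions unchanged -> converged
        assignments = new_assignments
        # Maximization-like step: each mu_k becomes the mean of its assigned points
        for k in range(K):
            members = X[assignments == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers, assignments
```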
K-means clustering: How to understand?
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \| x_n - \mu_k \|^2

• Expectation: minimize J with respect to the r_{nk}
– For each x_n, find the “closest” cluster mean \mu_k and put x_n into cluster S_k.
• Maximization: minimize J with respect to the \mu_k
– For each cluster S_k, re-estimate the cluster mean \mu_k as the average of all samples in S_k.
• Loop until there is no change in the partitions (or the maximum number of iterations is exceeded).
Initialize with random clusters
Assign each point to nearest center
Recompute optimum centers (means)
Repeat: Assign points to nearest center
Repeat: Recompute centers
Repeat...
Repeat... until the clustering does not change.
The total error is reduced at every step, so K-means is guaranteed to converge.
K-means clustering: some variations
• Initial cluster centroids:
– Randomly selected
– Iterative procedure: k-means++ [2]
• Number of clusters K:
– Empirically/experimentally: 2 ∼ √n
– Learning [6]
• Objective function:
– General dissimilarity measure: k-medoids algorithm.
• Speeding up:
– kd-trees for pre-processing [7]
– Triangle inequality for distance calculation [4]
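In practice, library implementations already bundle several of these variations; for example, scikit-learn's KMeans uses k-means++ seeding [2] by default. A brief usage sketch, assuming scikit-learn is installed and X is an (N, d) sample array:

```python
from sklearn.cluster import KMeans

# k-means++ seeding is the default initializer in scikit-learn
km = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(X)          # X: (N, d) array of samples (assumed available)
centers = km.cluster_centers_
```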
Contents
• Unsupervised Learning
• K-means clustering
• Expectation Maximization (E.M.)
• Gaussian mixtures as a Density Estimator
– Gaussian mixtures
– EM for mixtures
• Gaussian mixtures for classification
Expectation Maximization
E.M.
Expectation Maximization
• A general-purpose algorithm for MLE in a wide range of situations.
• First formally stated by Dempster, Laird and Rubin in 1977 [1]
• An excellent way of solving our unsupervised learning problem, as we will see
– EM is also used widely in other domains.
EM: a solution for MLE
• Given a statistical model with:
– a set X of observed data,
– a set Z of unobserved latent data,
– a vector of unknown parameters θ,
– a likelihood function L (θ; X, Z) = p (X, Z | θ)
• Roughly speaking, the aim of MLE is to determine θ = arg maxθ L (θ; X, Z)
– We know the old trick: take partial derivatives of the log-likelihood and set them to zero...
– But it is not always tractable [e.g.]
– Other solutions are available.
EM: General Case
L (θ; X, Z) = p (X, Z | θ)
• EM is just an iterative procedure for finding the MLE.
• Expectation step: keep the current estimate θ^{(t)} fixed and compute the expected value of the log-likelihood function (the expectation is taken over Z given X and θ^{(t)}):

Q\left(\theta \mid \theta^{(t)}\right) = \mathbb{E}\left[\log L(\theta; X, Z)\right] = \mathbb{E}\left[\log p(X, Z \mid \theta)\right]

• Maximization step: find the parameter that maximizes this quantity:

\theta^{(t+1)} = \arg\max_{\theta} \; Q\left(\theta \mid \theta^{(t)}\right)
EM: Motivation
• If we know the value of the parameters θ, we can find the value of latent variables
Z by maximizing the log likelihood over all possible values of Z
– Searching on the value space of Z.
• If we know Z, we can find an estimate of θ
– Typically by grouping the observed data points according to the value of the associated latent variable,
– then averaging the values (or some functions of the values) of the points in each group.
To understand this motivation, let’s take K-means as a trivial example...
EM: informal description
When both θ and Z are unknown, EM is an iterative algorithm:
1. Initialize the parameters θ to some random values.
2. Compute the best values of Z given these parameter values.
3. Use the just-computed values of Z to find better estimates for θ.
4. Iterate until convergence.
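This informal description maps directly onto a generic loop. The sketch below is schematic: the e_step and m_step callables stand for whatever a concrete model (for instance a GMM) provides, and the stopping rule uses the log-likelihood returned by the E-step.

```python
def expectation_maximization(theta, e_step, m_step, max_iters=100, tol=1e-6):
    """Generic EM skeleton: alternate E and M steps until the log-likelihood stalls.

    e_step(theta) -> (expected_Z, log_likelihood)   # expectations of latent variables
    m_step(expected_Z) -> theta                     # parameters maximizing Q(theta | theta_t)
    """
    prev_ll = -float("inf")
    for _ in range(max_iters):
        expected_Z, ll = e_step(theta)   # E-step: fill in the latent variables
        theta = m_step(expected_Z)       # M-step: re-estimate the parameters
        if ll - prev_ll < tol:           # the log-likelihood is non-decreasing
            break
        prev_ll = ll
    return theta
```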
EM Convergence
• E.M. Convergence: Yes
– After each iteration, the likelihood p (X, Z | θ) must increase or stay the same [NOT OBVIOUS]
– But it cannot exceed 1 [OBVIOUS]
– Hence it must converge [OBVIOUS]
• Bad news: E.M. converges to local optimum.
– Whether the algorithm converges to the global optimum depends on the initialization.
• Let’s take K-means as an example, again...
• Details can be found in [9].
Contents
• Unsupervised Learning
• K-means clustering
• Expectation Maximization (E.M.)
• Gaussian mixtures as a Density Estimator
– Gaussian mixtures
– EM for mixtures
• Gaussian mixtures for classification
Remind: Bayes Classifier
p(y = i \mid x) = \frac{p(x \mid y = i)\, p(y = i)}{p(x)}
Remind: Bayes Classifier
In case of Gaussian Bayes Classifier:
p(y = i \mid x) = \frac{ \dfrac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left[ -\dfrac{1}{2} (x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i) \right] p_i }{ p(x) }
How can we deal with the denominator p (x)?
Remind: The Single Gaussian Distribution
• Multivariate Gaussian:

N(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right]

• For maximum likelihood:

0 = \frac{\partial \ln N(x_1, x_2, \ldots, x_N; \mu, \Sigma)}{\partial \mu}

• and the solution is:

\mu_{ML} = \frac{1}{N} \sum_{i=1}^{N} x_i
\qquad
\Sigma_{ML} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_{ML})(x_i - \mu_{ML})^{T}
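These closed-form estimates are simply the sample mean and sample covariance, so a direct NumPy sketch is enough (names are ours):

```python
import numpy as np

def gaussian_mle(X):
    """ML estimates for a single multivariate Gaussian: sample mean and covariance."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                  # mu_ML = (1/N) * sum_i x_i
    diffs = X - mu
    sigma = diffs.T @ diffs / len(X)     # Sigma_ML = (1/N) * sum_i (x_i - mu)(x_i - mu)^T
    return mu, sigma
```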
The GMM assumption
• There are k components: \{c_i\}_{i=1}^{k}
• Component c_i has an associated mean vector \mu_i
• Each component generates data from a Gaussian with mean \mu_i and covariance matrix \Sigma_i
• Each sample is generated according to the following guidelines:
– Randomly select component c_i with probability P(c_i) = w_i, s.t. \sum_{i=1}^{k} w_i = 1
– Sample x \sim N(\mu_i, \Sigma_i)
Probability density function of GMM
“Linear combination” of Gaussians:

f(x) = \sum_{i=1}^{k} w_i \, N(x; \mu_i, \Sigma_i), \quad \text{where} \quad \sum_{i=1}^{k} w_i = 1

(a) The pdf of a 1D GMM with 3 components. (b) The pdf of a 2D GMM with 3 components.
Figure 2: Probability density function of some GMMs.
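Evaluating the density is just a weighted sum of Gaussian pdfs. A small sketch using SciPy, with illustrative 1-D component parameters (not taken from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(x, weights, means, covs):
    """f(x) = sum_i w_i * N(x; mu_i, Sigma_i) for a single point x."""
    return sum(w * multivariate_normal(mean=m, cov=c).pdf(x)
               for w, m, c in zip(weights, means, covs))

# Example: a 1-D mixture with three components (illustrative values)
weights = [0.3, 0.5, 0.2]
means = [0.0, 5.0, 10.0]
covs = [1.0, 2.0, 0.5]
print(gmm_pdf(3.0, weights, means, covs))
```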
GMM: Problem definition
f(x) = \sum_{i=1}^{k} w_i \, N(x; \mu_i, \Sigma_i), \quad \text{where} \quad \sum_{i=1}^{k} w_i = 1

Given a training set, how do we model these data points using a GMM?
• Given:
– The training set: \{x_i\}_{i=1}^{N}
– Number of clusters: k
• Goal: model this data using a mixture of Gaussians
– Weights: w_1, w_2, \ldots, w_k
– Means and covariances: \mu_1, \mu_2, \ldots, \mu_k; \; \Sigma_1, \Sigma_2, \ldots, \Sigma_k
Computing likelihoods in unsupervised case
f(x) = \sum_{i=1}^{k} w_i \, N(x; \mu_i, \Sigma_i), \quad \text{where} \quad \sum_{i=1}^{k} w_i = 1

• Given a mixture of Gaussians, denoted by G, we can define the likelihood of any x:

P(x \mid G) = P(x \mid w_1, \mu_1, \Sigma_1, \ldots, w_k, \mu_k, \Sigma_k)
            = \sum_{i=1}^{k} P(x \mid c_i)\, P(c_i)
            = \sum_{i=1}^{k} w_i \, N(x; \mu_i, \Sigma_i)

• So we can define the likelihood for the whole training set [Why? Because the samples are assumed i.i.d.]:

P(x_1, x_2, \ldots, x_N \mid G) = \prod_{i=1}^{N} P(x_i \mid G) = \prod_{i=1}^{N} \sum_{j=1}^{k} w_j \, N(x_i; \mu_j, \Sigma_j)
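Because the per-sample likelihoods multiply, it is numerically safer to work with the log-likelihood and a log-sum-exp over components. A sketch, assuming SciPy is available (helper name is ours):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, weights, means, covs):
    """log P(x_1, ..., x_N | G) = sum_i log sum_j w_j N(x_i; mu_j, Sigma_j)."""
    # (N, k) matrix of log(w_j) + log N(x_i; mu_j, Sigma_j)
    log_terms = np.column_stack([
        np.log(w) + multivariate_normal(mean=m, cov=c).logpdf(X)
        for w, m, c in zip(weights, means, covs)
    ])
    return float(logsumexp(log_terms, axis=1).sum())
```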
Estimating GMM parameters
• We know this one: Maximum Likelihood Estimation

\ln P(X \mid G) = \sum_{i=1}^{N} \ln \left[ \sum_{j=1}^{k} w_j \, N(x_i; \mu_j, \Sigma_j) \right]

– For the maximum likelihood:

0 = \frac{\partial \ln P(X \mid G)}{\partial \mu_j}

– This leads to non-linear equations that cannot be solved analytically!
• Use gradient descent
– Slow but doable
• A much cuter and recently popular method...
E.M. for GMM
• Remember:
– We have the training set \{x_i\}_{i=1}^{N} and the number of components k.
– Assume we know p(c_1) = w_1, p(c_2) = w_2, \ldots, p(c_k) = w_k
– We don’t know \mu_1, \mu_2, \ldots, \mu_k

The likelihood (assuming spherical components with a known, common variance \sigma^2, where K below is the Gaussian normalization constant):

p(\text{data} \mid \mu_1, \ldots, \mu_k) = p(x_1, x_2, \ldots, x_N \mid \mu_1, \ldots, \mu_k)
 = \prod_{i=1}^{N} p(x_i \mid \mu_1, \ldots, \mu_k)
 = \prod_{i=1}^{N} \sum_{j=1}^{k} p(x_i \mid c_j, \mu_1, \ldots, \mu_k)\, p(c_j)
 = \prod_{i=1}^{N} \sum_{j=1}^{k} K \exp\left[ -\frac{1}{2\sigma^2} (x_i - \mu_j)^2 \right] w_j
E.M. for GMM
• For maximum likelihood, we know that \frac{\partial}{\partial \mu_j} \log p(\text{data} \mid \mu_1, \ldots, \mu_k) = 0 for each j.
• Some wild algebra turns this into: for maximum likelihood, for each j,

\mu_j = \frac{ \sum_{i=1}^{N} p(c_j \mid x_i, \mu_1, \ldots, \mu_k)\, x_i }{ \sum_{i=1}^{N} p(c_j \mid x_i, \mu_1, \ldots, \mu_k) }

This is a set of coupled non-linear equations in the \mu_j’s.
• So:
– If, for each x_i, we knew p(c_j \mid x_i, \mu_1, \ldots, \mu_k), then we could easily compute each \mu_j.
– If we knew each \mu_j, we could compute p(c_j \mid x_i, \mu_1, \ldots, \mu_k) for each x_i and c_j.
E.M. for GMM
• E.M. is coming: on the t’th iteration, let our estimates be

\lambda_t = \{\mu_1(t), \mu_2(t), \ldots, \mu_k(t)\}

• E-step: compute the expected class memberships of all data points for each class

p(c_j \mid x_i, \lambda_t) = \frac{ p(x_i \mid c_j, \lambda_t)\, p(c_j \mid \lambda_t) }{ p(x_i \mid \lambda_t) }
 = \frac{ p(x_i \mid c_j, \mu_j(t), \sigma_j I)\, p(c_j) }{ \sum_{m=1}^{k} p(x_i \mid c_m, \mu_m(t), \sigma_m I)\, p(c_m) }

• M-step: re-estimate the means given our data’s class membership distributions

\mu_j(t+1) = \frac{ \sum_{i=1}^{N} p(c_j \mid x_i, \lambda_t)\, x_i }{ \sum_{i=1}^{N} p(c_j \mid x_i, \lambda_t) }
E.M. for General GMM: E-step
• On the t’th iteration, let our estimates be

\lambda_t = \{\mu_1(t), \ldots, \mu_k(t), \Sigma_1(t), \ldots, \Sigma_k(t), w_1(t), \ldots, w_k(t)\}

• E-step: compute the expected class memberships of all data points for each class

\tau_{ij}(t) \equiv p(c_j \mid x_i, \lambda_t) = \frac{ p(x_i \mid c_j, \lambda_t)\, p(c_j \mid \lambda_t) }{ p(x_i \mid \lambda_t) }
 = \frac{ p(x_i \mid c_j, \mu_j(t), \Sigma_j(t))\, w_j(t) }{ \sum_{m=1}^{k} p(x_i \mid c_m, \mu_m(t), \Sigma_m(t))\, w_m(t) }
E.M. for General GMM: M-step
• M-step: re-estimate the weights, means and covariances given our data’s class membership distributions

w_j(t+1) = \frac{ \sum_{i=1}^{N} p(c_j \mid x_i, \lambda_t) }{ N } = \frac{1}{N} \sum_{i=1}^{N} \tau_{ij}(t)

\mu_j(t+1) = \frac{ \sum_{i=1}^{N} p(c_j \mid x_i, \lambda_t)\, x_i }{ \sum_{i=1}^{N} p(c_j \mid x_i, \lambda_t) } = \frac{1}{N\, w_j(t+1)} \sum_{i=1}^{N} \tau_{ij}(t)\, x_i

\Sigma_j(t+1) = \frac{ \sum_{i=1}^{N} p(c_j \mid x_i, \lambda_t) \left[ x_i - \mu_j(t+1) \right] \left[ x_i - \mu_j(t+1) \right]^{T} }{ \sum_{i=1}^{N} p(c_j \mid x_i, \lambda_t) } = \frac{1}{N\, w_j(t+1)} \sum_{i=1}^{N} \tau_{ij}(t) \left[ x_i - \mu_j(t+1) \right] \left[ x_i - \mu_j(t+1) \right]^{T}
E.M. for General GMM: Initialization
• wj = 1/k, j = 1, 2, ..., k
• Each µj is set to a randomly selected point
– Or use K-means for this initialization.
• Each Σj is computed using the equation in previous slide...
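Putting the E-step, M-step and initialization together, a compact (non-production) sketch of EM for a general GMM could look like the following; the small ridge added to each covariance is our own numerical safeguard, not part of the slides:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, max_iters=100, seed=0):
    """EM for a general GMM, following the E/M updates above (a sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    N, d = X.shape
    w = np.full(k, 1.0 / k)                                  # w_j = 1/k
    mu = X[rng.choice(N, size=k, replace=False)].copy()      # random points (or K-means)
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    for _ in range(max_iters):
        # E-step: responsibilities tau_ij = p(c_j | x_i, lambda_t)
        tau = np.column_stack([
            w[j] * multivariate_normal(mean=mu[j], cov=sigma[j]).pdf(X)
            for j in range(k)
        ])
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, covariances
        Nj = tau.sum(axis=0)                                 # effective counts per component
        w = Nj / N
        mu = (tau.T @ X) / Nj[:, None]
        for j in range(k):
            diff = X - mu[j]
            sigma[j] = (tau[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
    return w, mu, sigma
```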
Gaussian Mixture Example: Start
After first iteration
After 2nd iteration
After 3rd iteration
After 4th iteration
After 5th iteration
After 6th iteration
After 20th iteration
Local optimum solution
• E.M. is guaranteed to find a locally optimal solution, since it monotonically increases the log-likelihood.
• Whether it converges to the globally optimal solution depends on the initialization.
GMM: Selecting the number of components
• We can run the E.M. algorithm with different numbers of components.
– We need a criterion for selecting the “best” number of components.
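The slides leave the criterion open; one common choice is the Bayesian information criterion (BIC). A hedged sketch using scikit-learn's GaussianMixture, which exposes a bic() score (helper name is ours):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_num_components(X, max_k=10, seed=0):
    """Fit GMMs with 1..max_k components and keep the one with the lowest BIC."""
    best_k, best_bic, best_model = None, np.inf, None
    for k in range(1, max_k + 1):
        gm = GaussianMixture(n_components=k, random_state=seed).fit(X)
        bic = gm.bic(X)                      # lower BIC = better fit/complexity trade-off
        if bic < best_bic:
            best_k, best_bic, best_model = k, bic, gm
    return best_k, best_model
```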
Contents
• Unsupervised Learning
• K-means clustering
• Expectation Maximization (E.M.)
– Regularized EM
– Model Selection
• Gaussian mixtures as a Density Estimator
– Gaussian mixtures
– EM for mixtures
• Gaussian mixtures for classification
Gaussian mixtures for classification
p(y = i \mid x) = \frac{ p(x \mid y = i)\, p(y = i) }{ p(x) }
• To build a Bayesian classifier based on GMM, we can use GMM to model data in
each class
– So each class is modeled by one k-component GMM.
• For example:
Class 0: p (y = 0) , p (x | θ0), (a 3-component mixture)
Class 1: p (y = 1) , p (x | θ1), (a 3-component mixture)
Class 2: p (y = 2) , p (x | θ2), (a 3-component mixture)
...
GMM for Classification
• As before, each class is modeled by a k-component GMM.
• A new test sample x is classified according to

c = \arg\max_{i} \; p(y = i)\, p(x \mid \theta_i)

where

p(x \mid \theta_i) = \sum_{j=1}^{k} w_{ij} \, N(x; \mu_{ij}, \Sigma_{ij})

• Simple, quick (and actually used in practice!)
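A sketch of this decision rule using scikit-learn's GaussianMixture for the per-class density models (helper names are ours); score_samples returns log p(x | θ_i) per sample:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_classifier(X, y, k=3, seed=0):
    """Fit one k-component GMM per class plus the class priors p(y = i)."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    models = {c: GaussianMixture(n_components=k, random_state=seed).fit(X[y == c])
              for c in classes}
    return classes, priors, models

def predict(X, classes, priors, models):
    """c = argmax_i  log p(y = i) + log p(x | theta_i)."""
    scores = np.column_stack([
        np.log(priors[c]) + models[c].score_samples(X) for c in classes
    ])
    return classes[scores.argmax(axis=1)]
```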
Q & A
References
[1] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1–38, 1977.
[2] David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1027–1035, 2007.
[3] C. Yang, R. Duraiswami, N. Gumerov, and L. Davis. Improved fast Gauss transform and efficient kernel density estimation. In IEEE International Conference on Computer Vision, pages 464–471, 2003.
[4] Charles Elkan. Using the triangle inequality to accelerate k-means. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), 2003.
[5] Haifeng Li, Keshu Zhang, and Tao Jiang. The regularized EM algorithm. In Proceedings of the 20th National Conference on Artificial Intelligence, pages 807–812, Pittsburgh, PA, 2005.
[6] Greg Hamerly and Charles Elkan. Learning the k in k-means. In Neural Information Processing Systems. MIT Press, 2003.
[7] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):881–892, July 2002.
[8] J. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pages 281–297. University of California Press, 1967.
[9] C. F. J. Wu. On the convergence properties of the EM algorithm. The Annals of Statistics, 11(1):95–103, 1983.