MACHINE LEARNING IN HIGH
ENERGY PHYSICS
LECTURE #1
Alex Rogozhnikov, 2015
INTRO NOTES
4 days
two lectures, two practice seminars every day
this is an introductory track to machine learning
Kaggle competition!
WHAT IS ML ABOUT?
Inference of statistical dependencies that gives us the ability to
predict
Data is cheap, knowledge is precious
WHERE IS ML CURRENTLY USED?
Search engines, spam detection
Security: virus detection, DDOS defense
Computer vision and speech recognition
Market basket analysis, Customer relationship management
(CRM)
Credit scoring, fraud detection
Health monitoring
Churn prediction
... and hundreds more
ML IN HIGH ENERGY PHYSICS
High-level triggers (LHCb trigger system: 40 MHz → 5 kHz)
Particle identification
Tagging
Stripping line
Analysis
Different data is used at different stages
GENERAL NOTION
In supervised learning the training data is represented as a set
of pairs $(x_i, y_i)$, where
$i$ is the index of an event,
$x_i$ is the vector of features available for the event,
$y_i$ is the target: the value we need to predict.
CLASSIFICATION EXAMPLE
$y_i \in Y$, where $Y$ is a finite set
on the plot: $x_i \in \mathbb{R}^2$, $y_i \in \{0, 1, 2\}$
Examples:
defining the type of particle (or decay channel)
$Y = \{0, 1\}$: binary classification, 1 is signal, 0 is background
REGRESSION
$y \in \mathbb{R}$
Examples:
predicting the price of a house from its position
predicting number of customers / money income
reconstructing real momentum of particle
Why do we need automatic classification/regression?
in applications there are up to thousands of features
higher quality
much faster adaptation to new problems
CLASSIFICATION BASED ON
NEAREST NEIGHBOURS
Given a training set of objects and their labels $\{x_i, y_i\}$, we
predict the label for a new observation $x$:
$\hat{y} = y_j, \qquad j = \arg\min_i \rho(x, x_i)$
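A minimal NumPy sketch of this decision rule (the toy data and the name predict_1nn are illustrative, not from the lecture):

import numpy as np

# toy training set: 2D features and binary labels (illustrative values)
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.5], [3.0, 3.0]])
y_train = np.array([0, 0, 1, 1])

def predict_1nn(x, X, y):
    # Euclidean distances rho(x, x_i) to every training event
    distances = np.linalg.norm(X - x, axis=1)
    # label of the nearest neighbour: y_j with j = argmin_i rho(x, x_i)
    return y[np.argmin(distances)]

print(predict_1nn(np.array([2.2, 2.2]), X_train, y_train))  # -> 1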
VISUALIZATION OF DECISION RULE
k NEAREST NEIGHBOURS
A better way is to use k neighbours:
$p_i(x) = \frac{\#\,\text{of kNN events in class } i}{k}$
k = 1, 2, 5, 30
OVERFITTING
what is the quality of classification on training dataset when
?
answer: it is ideal (closest neighbor is event itself)
quality is lower when
this doesn't mean is the best,
it means we cannot use training events to estimate quality
when classifier's decision rule is too complex and captures
details from training data that are not relevant to
distribution, we call this overfitting (more details tomorrow)
k = 1
k > 1
k = 1
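A small scikit-learn sketch of this effect on synthetic data (the dataset and the split are illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# synthetic binary classification data
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for k in (1, 2, 5, 30):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # k = 1 gives perfect training accuracy but not the best test accuracy
    print(k, clf.score(X_train, y_train), clf.score(X_test, y_test))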
KNN REGRESSOR
Regression with nearest neighbours is done by averaging the
outputs:
$\hat{y} = \frac{1}{k} \sum_{j \in \mathrm{knn}(x)} y_j$
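A minimal sketch of the same averaging rule, done by hand and with scikit-learn's KNeighborsRegressor (toy one-dimensional data, illustrative):

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))          # one feature, 200 events
y = np.sin(X[:, 0]) + 0.1 * rng.randn(200)     # noisy target

k = 5
reg = KNeighborsRegressor(n_neighbors=k).fit(X, y)

x_new = np.array([[3.0]])
# manual prediction: average y over the k nearest neighbours of x_new
idx = np.argsort(np.abs(X[:, 0] - x_new[0, 0]))[:k]
print(y[idx].mean(), reg.predict(x_new)[0])    # should agree (up to tie-breaking)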
KNN WITH WEIGHTS
COMPUTATIONAL COMPLEXITY
Given that the dimensionality of the space is $d$ and there are $n$
training samples:
training time ~ O(save a link to the data)
prediction time: $n \times d$ for each sample
SPATIAL INDEX: BALL TREE
BALL TREE
training time ~ $O(d \times n \log(n))$
prediction time ~ $\log(n) \times d$ for each sample
Other options exist: KD-tree.
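A sketch using scikit-learn's BallTree directly (random data; the sizes and k are arbitrary):

import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.randn(10000, 5)                        # n = 10000 samples, d = 5 features

tree = BallTree(X)                             # building the tree ~ O(d * n * log(n))
dist, ind = tree.query(rng.randn(3, 5), k=5)   # each query ~ O(d * log(n))
print(ind.shape)                               # (3, 5): indices of the 5 nearest neighbours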
OVERVIEW OF KNN
1. Awesomely simple classifier and regressor
2. Gives overly optimistic quality on the training data
3. Quite slow, though optimizations exist
4. Struggles with high-dimensional data
5. Very sensitive to the scale of features
SENSITIVITY TO SCALE OF FEATURES
Euclidean distance:
$\rho(x, y)^2 = (x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_d - y_d)^2$
Change the scale of the first feature:
$\rho(x, y)^2 = (10x_1 - 10y_1)^2 + (x_2 - y_2)^2 + \dots + (x_d - y_d)^2$
$\rho(x, y)^2 \sim 100\,(x_1 - y_1)^2$
Scaling of features frequently increases quality.
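One common recipe is to standardize features before kNN; a scikit-learn sketch on synthetic data (the improvement is typical, not guaranteed):

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X[:, 0] *= 100.0   # blow up the scale of the first feature

raw = KNeighborsClassifier(n_neighbors=5)
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

print(cross_val_score(raw, X, y).mean())      # distance dominated by feature 0
print(cross_val_score(scaled, X, y).mean())   # usually noticeably better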
DISTANCE FUNCTION MATTERS
Minkowski distance: $\rho_p(x, y) = \sum_i (x_i - y_i)^p$
Canberra: $\rho(x, y) = \sum_i \frac{|x_i - y_i|}{|x_i| + |y_i|}$
Cosine metric: $\rho(x, y) = \frac{\langle x, y \rangle}{|x|\,|y|}$
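These distances are available in scipy.spatial.distance; a sketch on two arbitrary vectors (note that SciPy's minkowski includes the 1/p root, and its cosine returns one minus the cosine similarity):

import numpy as np
from scipy.spatial import distance

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.5, 3.5])

print(distance.minkowski(x, y, p=3))   # standard Minkowski distance with p = 3
print(distance.canberra(x, y))         # sum |x_i - y_i| / (|x_i| + |y_i|)
print(distance.cosine(x, y))           # 1 - <x, y> / (|x| |y|)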
x MINUTES BREAK
RECAPITULATION
1. Statistical ML: problems
2. ML in HEP
3. k nearest neighbours classifier and regressor
MEASURING QUALITY OF BINARY
CLASSIFICATION
The classifier's output in binary classification is a real variable.
Which classifier is better?
All of them are identical
ROC CURVE
These distributions have the same ROC curve:
(the ROC curve is the passed-signal vs. passed-background dependency)
ROC CURVE DEMONSTRATION
ROC CURVE
Contains important information:
all possible combinations of signal and background
efficiencies you may achieve by setting a threshold
Particular values of thresholds (and the initial pdfs) don't
matter; the ROC curve doesn't contain this information
ROC curve = information about the order of events:
s s b s b ... b b s b b
Comparison of algorithms should be based on information
from the ROC curve
TERMINOLOGY AND CONVENTIONS
fpr = background efficiency = b
tpr = signal efficiency = s

ROC AUC
(AREA UNDER THE ROC CURVE)
ROC AUC = $P(x < y)$, where $x, y$ are the predictions of
random background and signal events.
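A sketch of how these quantities are computed with scikit-learn (random synthetic labels and scores, purely illustrative):

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.RandomState(0)
y_true = rng.randint(0, 2, size=1000)             # 0 = background, 1 = signal
scores = y_true + rng.randn(1000)                 # real-valued classifier output

fpr, tpr, thresholds = roc_curve(y_true, scores)  # fpr = bck eff., tpr = signal eff.
print(roc_auc_score(y_true, scores))              # ~ P(background score < signal score)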
Which classifier is better for triggers?
(they have the same ROC AUC)
STATISTICAL MACHINE LEARNING
The machine learning we use in practice is based on statistics.
1. Main assumption: the data is generated from a probabilistic
distribution $p(x, y)$
2. Does the distribution of people / pages really exist?
3. In HEP these distributions do exist
OPTIMAL CLASSIFICATION. OPTIMAL
BAYESIAN CLASSIFIER
Assuming that we know the real distribution $p(x, y)$, we
reconstruct $p(y|x)$ using Bayes' rule:

$p(y|x) = \frac{p(x, y)}{p(x)} = \frac{p(y)\,p(x|y)}{p(x)}$

$\frac{p(y = 1 \mid x)}{p(y = 0 \mid x)} = \frac{p(y = 1)\,p(x \mid y = 1)}{p(y = 0)\,p(x \mid y = 0)}$

LEMMA (NEYMAN–PEARSON):
The best classification quality is provided by
$\frac{p(y = 1 \mid x)}{p(y = 0 \mid x)}$ (the optimal Bayesian classifier).
OPTIMAL BINARY CLASSIFICATION
The optimal Bayesian classifier has the highest possible ROC curve.
Since the classification quality depends only on the order,
$p(y = 1 \mid x)$ gives optimal classification quality too!

$\frac{p(y = 1 \mid x)}{p(y = 0 \mid x)} = \frac{p(y = 1)\,p(x \mid y = 1)}{p(y = 0)\,p(x \mid y = 0)}$
FISHER'S QDA (QUADRATIC DISCRIMINANT
ANALYSIS)
Reconstructing the probabilities $p(x \mid y = 1)$, $p(x \mid y = 0)$ from
data, assuming those are multidimensional normal
distributions:

$p(x \mid y = 0) \sim \mathcal{N}(\mu_0, \Sigma_0)$
$p(x \mid y = 1) \sim \mathcal{N}(\mu_1, \Sigma_1)$
QDA COMPLEXITY
$n$ samples, $d$ dimensions
training takes $O(n d^2 + d^3)$:
computing the covariance matrix: $O(n d^2)$
inverting the covariance matrix: $O(d^3)$
prediction takes $O(d^2)$ for each sample

$f(x) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$
QDA
simple decision rule
fast prediction
many parameters to reconstruct in high dimensions
data almost never has a Gaussian distribution
WHAT ARE THE PROBLEMS WITH
GENERATIVE APPROACH?
Generative approach: trying to reconstruct $p(x, y)$, then use
it to predict.
Real-life distributions can hardly be reconstructed,
especially in high-dimensional spaces.
So, we switch to the discriminative approach: guessing $p(y|x)$.
LINEAR DECISION RULE
Decision function is linear:
$d(x) = \langle w, x \rangle + w_0$
$\begin{cases} d(x) > 0, & \text{class } +1 \\ d(x) < 0, & \text{class } -1 \end{cases}$
This is a parametric model (finding parameters $w, w_0$).
FINDING OPTIMAL PARAMETERS
A good initial guess: get such $w, w_0$ that the error of
classification is minimal ([true] = 1, [false] = 0):

$\mathcal{L} = \sum_{i \in \text{events}} [y_i \neq \mathrm{sgn}(d(x_i))]$

Discontinuous optimization (arrrrgh!)
Let's make the decision rule smooth:

$p_{+1}(x) = f(d(x))$
$p_{-1}(x) = 1 - p_{+1}(x)$

$\begin{cases} f(0) = 0.5 \\ f(x) > 0.5 & \text{if } x > 0 \\ f(x) < 0.5 & \text{if } x < 0 \end{cases}$
LOGISTIC FUNCTION
A smooth step rule:
$\sigma(x) = \frac{e^x}{1 + e^x} = \frac{1}{1 + e^{-x}}$

PROPERTIES
1. monotonic, $\sigma(x) \in (0, 1)$
2. $\sigma(x) + \sigma(-x) = 1$
3. $\sigma'(x) = \sigma(x)(1 - \sigma(x))$
4. $2\,\sigma(x) = 1 + \tanh(x/2)$
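A quick numerical check of these properties (a sketch; the grid and tolerances are arbitrary):

import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 101)
assert np.all((sigma(x) > 0) & (sigma(x) < 1))                     # property 1
assert np.allclose(sigma(x) + sigma(-x), 1)                        # property 2
h = 1e-6
assert np.allclose((sigma(x + h) - sigma(x - h)) / (2 * h),
                   sigma(x) * (1 - sigma(x)), atol=1e-5)           # property 3
assert np.allclose(2 * sigma(x), 1 + np.tanh(x / 2))               # property 4
print("all properties hold numerically")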
LOGISTIC REGRESSION
Optimizing the log-likelihood (with probabilities obtained from the
logistic function):

$d(x) = \langle w, x \rangle + w_0$
$p_{+1}(x) = \sigma(d(x))$
$p_{-1}(x) = \sigma(-d(x))$

$\mathcal{L} = -\frac{1}{N} \sum_{i \in \text{events}} \ln(p_{y_i}(x_i)) = \frac{1}{N} \sum_i L(x_i, y_i) \to \min$

Exercise: find the expression and build the plot for $L(x_i, y_i)$
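In practice this minimization is usually delegated to a library; a scikit-learn sketch on synthetic data (note that LogisticRegression adds L2 regularization by default):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# minimizes the negative log-likelihood shown above (plus an L2 penalty by default)
clf = LogisticRegression().fit(X_train, y_train)
print(clf.coef_, clf.intercept_)          # the learned w and w0
print(clf.score(X_test, y_test))          # test accuracy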
DATA SCIENTIST PIPELINE
1. Experiments in an appropriate high-level language or
environment
2. After the experiments are over, implement the final algorithm in a
low-level language (C++, CUDA, FPGA)
The second point is not always needed.
SCIENTIFIC PYTHON
NumPy
vectorized computations in Python
Matplotlib
for drawing
Pandas
for data manipulation and analysis (based on
NumPy)
SCIENTIFIC PYTHON
Scikit-learn
the most popular library for machine learning
Scipy
libraries for science and engineering
Root_numpy
convenient way to work with ROOT files
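A toy end-to-end sketch with this stack (the columns pt and eta and the "signal" definition are hypothetical):

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
df = pd.DataFrame({"pt": rng.exponential(5.0, 1000),
                   "eta": rng.uniform(-2.5, 2.5, 1000)})
df["label"] = (df["pt"] > 5).astype(int)          # toy "signal" definition

clf = KNeighborsClassifier().fit(df[["pt", "eta"]], df["label"])
df["pt"].hist(bins=50)                            # pandas plotting uses Matplotlib
plt.show()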
THE END