NGBoost: Natural Gradient Boosting
Mohamed Ali Habib
Outline
โ€ข Introduction.
โ€ข What is probabilistic regression?
โ€ข Why is it useful?
• How do other methods compare to NGBoost?
โ€ข Gradient Boosting Algorithm.
โ€ข NGBoost:
โ€ข Main components.
โ€ข Steps.
โ€ข Usage.
โ€ข Experiments & Results.
โ€ข Computational Complexity.
โ€ข Future Work.
โ€ข References.
Introduction
What is probabilistic regression?
(Standard Regression)
Note: this use of conditional probability distributions is already the norm in classification.
Why is probabilistic regression (prediction) useful?
The measure of uncertainty makes probabilistic prediction crucial in applications like healthcare and
weather forecasting.
Why is probabilistic regression (prediction) useful?
All in all, probabilistic regression (prediction) provides better insight than standard (scalar)
regression.
Standard regression predicts the scalar E[Y|X=x]; probabilistic regression predicts the full conditional distribution P(Y|X=x).
Problems with existing methods
Methods:
โ€ข Post-hoc variance.
• Generalized Additive Models for Location, Scale and Shape (GAMLSS).
• Bayesian methods like MCMC.
โ€ข Bayesian deep learning.
Problems:
โ€ข Inflexible.
โ€ข Slow.
โ€ข Require expert knowledge.
• Make strong assumptions about the nature of the data
(homoscedasticity*)
Limitations of deep learning methods: difficult to use
out-of-the-box
โ€ข Require expert knowledge.
โ€ข Usually perform only on par with traditional
methods on limited size or tabular data.
โ€ข Require extensive hyperparameter tuning.
* Homoscedasticity means that all random variables in a sequence have the same finite variance.
Gradient Boosting Machines (GBMs)
โ€ข A set of highly modular methods
that:
โ€ข work out-of-the-box.
โ€ข Perform well on structured
data, even with small datasets.
โ€ข Demonstrated empirical success on
Kaggle and other data science
competitions.
Source: what algorithms are most successful on Kaggle?
Problems related to GBMs
โ€ข Assume Homoscedasticity: constant variance.
โ€ข Predicted distributions should have at least two
degrees of freedom (two parameters) to
effectively convey both the magnitude and the
uncertainty of the predictions.
What is the solution then?
(Spoiler Alert) it is NGBoost ๏Š
NGBoost solves the problem of simultaneous boosting of multiple parameters from
the base learners using:
โ€ข A multiparameter boosting approach.
โ€ข Use of natural gradients.
Gradient Boosting Algorithm
• An ensemble of simple models is involved in making a prediction.
• The result is a prediction model in the form of an ensemble of weak models.
โ€ข Intuition: the best possible next model, when combined with previous models,
minimizes the overall prediction error.
โ€ข Components:
โ€ข A loss function to be optimized.
โ€ข E.g., MSE or Logarithmic Loss.
โ€ข A weak learner to make predictions.
โ€ข Most common choice is Decision Trees or Regression Trees.
• It is common to constrain the learner, e.g., by specifying the maximum number of
layers, nodes, splits, or leaf nodes.
โ€ข An additive model to add weak learners to minimize the loss function.
โ€ข A gradient descent procedure is used to minimize the loss when adding
trees.
Gradient Boosting Algorithm
Explanation:
Step 1: Initialize the prediction to a constant whose value minimizes
the loss. You can solve for it using gradient descent, or analytically if
the problem is trivial.
Step 2: Build the trees (weak learners).
(A) Compute residuals between the prediction and the observed data.
Use the prediction of the previous step, F(x) = F_{m−1}(x), which is
F_0(x) for m = 1.
(B) Fit a tree to the residuals (make the residuals the target
output). Here j indexes the leaf nodes.
(C) Determine the output for each leaf of the tree. E.g., if a leaf contains 14.7
and 2.7, the output is the value of γ that minimizes the
summation. Unlike Step 1, here we take the
previous prediction F_{m−1}(x_i) into account.
(D) Make a new prediction for each sample. The summation
accounts for the case where a single sample ends up in
multiple leaves, so you take a scaled sum of the outputs γ
over those leaves. Choosing a small learning rate ν improves
prediction.
Step 3: The final prediction is the prediction of the last boosting stage.
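To make these steps concrete, here is a minimal from-scratch sketch of gradient boosting with squared-error loss (so the negative gradients are plain residuals). Function names and hyperparameters are illustrative, not the reference implementation from the paper:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Minimal gradient boosting sketch for squared-error loss."""
    # Step 1: initialize with the constant minimizing MSE, i.e. the mean of y.
    f0 = y.mean()
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        # Step 2(A): negative gradient of the squared-error loss = residuals.
        residuals = y - pred
        # Step 2(B)/(C): fit a small regression tree to the residuals;
        # its leaf means play the role of the per-leaf outputs gamma.
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        # Step 2(D): update the prediction with a shrunken step.
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def predict_gbm(f0, trees, X, learning_rate=0.1):
    # Step 3: initial constant plus the scaled contribution of every tree.
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)
```

For squared error, the leaf output that minimizes the per-leaf summation is simply the mean of the residuals in that leaf, which is exactly what the regression tree stores.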
To learn more:
โ€ข Paper: Greedy Function Approximation: A Gradient Boosting Machine, Jerome H. Friedman.
โ€ข Video explanations: Gradient Boost part 1, part 2, part 3, part 4.
โ€ข Decision Trees video explanation: Decision Trees.
โ€ข AdaBoost video explanation: AdaBoost.
NGBoost: Natural Gradient Boosting
โ€ข A method for probabilistic prediction with competitive state-of-the-art performance on a variety
of datasets.
• Combines a multiparameter boosting algorithm with the natural gradient to efficiently estimate how the
parameters of the presumed outcome distribution vary with the observed features.
โ€ข In a standard prediction setting:
• the object of interest is an estimate of the scalar function E[y|x], where x is the vector of covariates
(observed features) and y is the prediction target.
โ€ข For NGBoost:
• The object of interest is a conditional probability distribution P_θ(y|x).
• P_θ(y|x) is assumed to have a parametric form with p parameters, where θ ∈ ℝ^p
(a vector of p parameters).
NGBoost: Natural Gradient Boosting
Components:
โ€ข Base learner (e.g. Regression Tree).
โ€ข Parametric probability distribution (Normal, Laplace, Poisson, etc.).
โ€ข Scoring Rule (MLE, CRPS, etc.).
NGBoost: Natural Gradient Boosting
Steps:
1. Pick a scoring rule to grade our estimate of P(Y|X=x)
2. Assume that P(Y|X=x) has some parametric form
3. Fit the parameters ฮธ(x) as a function of x using
gradient boosting
4. Use the natural gradient to correct the training
dynamics of this approach
Proper Scoring Rule
A proper scoring rule ๐‘†(๐‘ƒ, ๐‘ฆ) must satisfy:
ฮ•๐‘ฆ~๐‘„ ๐‘†(๐‘„, ๐‘ฆ) โ‰ค ฮ•๐‘ฆ~๐‘„ ๐‘† ๐‘ƒ, ๐‘ฆ โˆ€ ๐‘ƒ, ๐‘„
๐‘„: ๐‘ก๐‘Ÿ๐‘ข๐‘’ ๐‘‘๐‘–๐‘ ๐‘ก๐‘Ÿ๐‘–๐‘๐‘ข๐‘ก๐‘–๐‘œ๐‘› ๐‘œ๐‘“ ๐‘œ๐‘ข๐‘ก๐‘๐‘œ๐‘š๐‘’๐‘  ๐‘ฆ
๐‘ƒ: ๐‘Ž๐‘›๐‘ฆ ๐‘œ๐‘กโ„Ž๐‘’๐‘Ÿ ๐‘‘๐‘–๐‘ ๐‘ก๐‘Ÿ๐‘–๐‘๐‘ข๐‘ก๐‘–๐‘œ๐‘› ๐‘’. ๐‘”. ๐‘๐‘Ÿ๐‘’๐‘‘๐‘–๐‘๐‘ก๐‘’๐‘‘ ๐‘œ๐‘“ ๐‘œ๐‘ข๐‘ก๐‘๐‘œ๐‘š๐‘’๐‘  ๐‘ฆ
In other words, the scoring rule assigns a score to the forecast such that the true distribution ๐‘„
of the outcomes gets the best score in expectation compared to other distributions, like ๐‘ƒ.
(Gneiting and Raftery, 2007. Strictly Proper Scoring Rules, Prediction, and Estimation.)
1. Pick a scoring rule to grade our estimate of P(Y|X=x)
Point prediction → loss function
Probabilistic prediction → scoring rule
Example scoring rule: negative log-likelihood
Notes:
โ€ข A scoring rule in probabilistic regression is analogous to loss function in standard regression.
โ€ข NLL when minimized gives the Maximum Likelihood Estimation (MLE).
โ€ข Taking the log simplifies the calculus.
• NLL (MLE) is the most common proper scoring rule.
โ€ข CRPS is another good alternative to MLE.
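A small sketch (using scipy; the numbers are illustrative) of the NLL as a scoring rule for a Normal forecast, together with a Monte Carlo check of the proper-scoring-rule inequality from the previous slide: the true distribution Q scores no worse in expectation than any other forecast P.

```python
import numpy as np
from scipy.stats import norm

def nll(dist, y):
    """Negative log-likelihood scoring rule: S(P, y) = -log p(y)."""
    return -dist.logpdf(y)

Q = norm(loc=0.0, scale=1.0)   # "true" outcome distribution (illustrative)
P = norm(loc=0.5, scale=2.0)   # some other forecast

y = Q.rvs(size=100_000, random_state=0)   # outcomes drawn from Q

# E_{y~Q}[S(Q, y)] <= E_{y~Q}[S(P, y)] must hold for a proper scoring rule.
print("expected score of Q:", nll(Q, y).mean())   # about 1.42
print("expected score of P:", nll(P, y).mean())   # larger, as required
```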
2. Assume P(Y|X=x) has some parametric form
(Example Normal distributions: (μ, σ) = (1, 1), (2, 0.5), (2.5, 0.75), (3.5, 1.5).)
Note: here they are assuming a normal distribution, but you can swap in any other distribution (Poisson, Bernoulli, etc.) that fits your application.
3. Fit the parameters ฮธ(x) as a function of x using gradient boosting
This approach performs poorly in practice.
What we get:
What we want:
The algorithm fails to adjust the mean, which degrades the prediction.
What could be the solution?
Use natural gradients instead of ordinary gradients.
What we typically do: gradient descent in the parameter space
• Pick a small region (a ball) around your current value of θ
• Find the direction within that ball that most decreases the score (a.k.a. the gradient)
What we want to do: Gradient descent in the space of distributions
Every point in this space represents
some distribution.
Parametrizing the space of distributions: the parameter vector θ is just a “name” for P.
Each distribution has such a name (i.e., it is “identified” by its parameters).
The problem is:
Gradient descent in the parameter space is not gradient descent in the distribution space, because
distances in the two spaces don't correspond: the spaces have different shape and density.
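A hedged numeric illustration of the mismatch (values chosen for illustration): both pairs of Normals below are the same distance apart in parameter space (their means differ by 1), yet as distributions, measured here by KL divergence, one pair is almost disjoint and the other nearly identical.

```python
import numpy as np

def kl_normal(mu1, s1, mu2, s2):
    """KL( N(mu1, s1^2) || N(mu2, s2^2) )."""
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

# Same parameter-space distance (means differ by 1), very different
# distances between the actual distributions:
print(kl_normal(0.0, 0.1, 1.0, 0.1))    # = 50.0   -> almost non-overlapping
print(kl_normal(0.0, 10.0, 1.0, 10.0))  # = 0.005  -> nearly identical
```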
4. Use the natural gradient to correct the training dynamics of this
approach.
this is the natural gradient
Idea: do gradient descent in the distribution space by
searching over parameters in the transformed region
• I_S(θ) is the Riemannian metric of the space of distributions
• It depends on the parametric form chosen and the score function
• If the score is NLL, this is the Fisher Information
Hereโ€™s the trick:
• Multiplying the ordinary gradient by the inverse of the Riemannian metric implicitly transforms the optimal direction
in parameter space into the optimal direction in distribution space.
• So we can conveniently compute the natural gradient by applying a linear transformation to the ordinary gradient.
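For concreteness, a sketch of this transformation for a Normal distribution parameterized as θ = (μ, log σ) and scored with NLL. This specific parameterization and the resulting formulas are an assumption made here for illustration; under it the Fisher information is diag(1/σ², 2).

```python
import numpy as np

def natural_gradient_normal_nll(y, mu, log_sigma):
    """Natural gradient of the NLL of N(mu, sigma^2) w.r.t. theta = (mu, log sigma)."""
    sigma2 = np.exp(2 * log_sigma)
    # Ordinary gradient of the NLL with respect to (mu, log sigma).
    grad = np.array([(mu - y) / sigma2,
                     1.0 - (y - mu) ** 2 / sigma2])
    # Riemannian metric = Fisher information (since the score is NLL).
    fisher = np.array([[1.0 / sigma2, 0.0],
                       [0.0,          2.0]])
    # Natural gradient: ordinary gradient pre-multiplied by the inverse metric.
    return np.linalg.solve(fisher, grad)

print(natural_gradient_normal_nll(y=3.0, mu=0.0, log_sigma=0.0))
# -> [-3., -4.]  (the ordinary gradient at the same point is [-3., -8.])
```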
Proper scoring rules and corresponding gradients for fitting a Normal distribution ~N(0, 1)
NGBoost
Explanation:
1. Estimate a common θ^(0) that minimizes S.
2. For each iteration m:
• Compute the natural gradient g_i^(m) of S with
respect to the predicted parameters of that
example up to that stage, θ_i^(m−1).
• Fit base learners, one per parameter, to the natural
gradients, e.g. f^(m) = (f_μ^(m), f_{log σ}^(m)).
• Compute a scaling factor ρ^(m) (a scalar) that
minimizes the true scoring rule along the
projected gradient, via a line search. In
practice, the authors found that setting ρ = 1 and then
successively halving it works well.
• Update the predicted parameters.
Notes:
• The learning rate η is typically 0.1 or 0.01, following
Friedman's recommendation.
• Sub-sampling mini-batches can improve computational
performance on large datasets.
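Putting the pieces together, a condensed, illustrative re-implementation of the loop described above for a Normal distribution with the NLL score: one tree per parameter fit to the natural gradients, a halving line search for ρ, then the scaled update. This is a sketch under those assumptions, not the authors' code.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def nll(y, theta):
    mu, log_sigma = theta[:, 0], theta[:, 1]
    return np.sum(log_sigma + 0.5 * (y - mu) ** 2 / np.exp(2 * log_sigma))

def fit_ngboost_normal(X, y, n_iter=200, eta=0.1, max_depth=3):
    # 1. Common initial parameters theta(0) = (mu, log sigma) minimizing the NLL.
    theta = np.column_stack([np.full(len(y), y.mean()),
                             np.full(len(y), np.log(y.std()))])
    learners = []
    for _ in range(n_iter):
        mu, log_sigma = theta[:, 0], theta[:, 1]
        sigma2 = np.exp(2 * log_sigma)
        # 2. Per-example natural gradients (Fisher info of the Normal,
        #    in this parameterization, is diag(1/sigma^2, 2)).
        g = np.column_stack([mu - y,
                             (1.0 - (y - mu) ** 2 / sigma2) / 2.0])
        # 3. Fit one base learner per distribution parameter to the natural gradients.
        f_mu = DecisionTreeRegressor(max_depth=max_depth).fit(X, g[:, 0])
        f_ls = DecisionTreeRegressor(max_depth=max_depth).fit(X, g[:, 1])
        step = np.column_stack([f_mu.predict(X), f_ls.predict(X)])
        # 4. Scaling factor rho: start at 1 and halve until the score improves.
        rho = 1.0
        while rho > 1e-3 and nll(y, theta - eta * rho * step) > nll(y, theta):
            rho /= 2.0
        # 5. Update the predicted parameters.
        theta -= eta * rho * step
        learners.append((f_mu, f_ls, rho))
    return learners
```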
Experiments
โ€ข UCI ML Repository benchmarks.
โ€ข Probabilistic Regression:
โ€ข Configuration:
โ€ข Data split: 70% training, 20% validation, and 10% testing.
โ€ข Repeated 20 times.
โ€ข Ablation:
โ€ข 2nd-Order boosting: use 2nd order gradients instead of natural gradients.
โ€ข Multiparameter boosting: using ordinary gradients instead of natural
gradients.
• Homoscedastic boosting: assuming constant variance, to see the benefit of
allowing parameters other than the conditional mean to vary across x.
โ€ข Why? To demonstrate that multiparameter boosting and the natural
gradient work together to improve performance.
โ€ข Point estimation.
Results
The result is equal or better performance than state-of-the-art probabilistic prediction methods
Results
Ablation
Results
NGBoost is competitive for point prediction too
Usage
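The Usage slide shows the authors' Python package; a typical call looks roughly like the following (assuming the ngboost package from the Stanford ML Group; the class and argument names are reproduced from its documentation as best recalled, so treat the details as approximate):

```python
from ngboost import NGBRegressor
from ngboost.distns import Normal
from ngboost.scores import LogScore
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The three NGBoost components: base learner (default: shallow regression
# trees), parametric distribution, and scoring rule.
ngb = NGBRegressor(Dist=Normal, Score=LogScore,
                   n_estimators=500, learning_rate=0.01)
ngb.fit(X_train, y_train)

y_point = ngb.predict(X_test)       # point predictions (distribution means)
y_dists = ngb.pred_dist(X_test)     # full predictive distributions
print("test NLL:", -y_dists.logpdf(y_test).mean())
```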
Computational Complexity
Difference between NGBoost and other boosting algorithms:
• NGBoost fits a series of learners for each
parameter, whereas standard boosting fits only one series of
learners.
• The natural gradient requires the inverse of a p×p matrix, I_S^{-1},
computed at each step.
Note that p is the number of parameters.
In practice:
• The matrix is small for the most commonly used distributions: only 2×2 for a
Normal distribution.
• If the dataset is huge, it may still be expensive to compute a large
number of these matrices at each iteration.
Future work
• Apply NGBoost to other tasks such as classification and
survival prediction.
โ€ข Joint prediction: ๐‘ƒ๐œƒ(๐‘ง, ๐‘ฆ|๐‘ฅ)
โ€ข Technical innovations:
• Better tree-based base learners and regularization are likely to improve
performance, especially on large datasets.
References
โ€ข NGBoost: Natural Gradient Boosting for
Probabilistic Prediction
โ€ข NGBoost: Stanford ML Group
Thank you