Deep Learning
DSE Module 2
Sivagami R
BITS Pilani
The author of this deck, Prof. Seetha Parameswaran, gratefully
acknowledges the authors who made their course materials freely
available online.
Single Perceptron for Regression
Real Valued Output
Linear Regression Example
● Suppose that we wish to estimate the prices of houses (in dollars)
based on their area (in square feet) and age (in years).
● The linearity assumption just says that the target (price) can be
expressed as a weighted sum of the features (area and age):
price = w_area * area + w_age * age + b
● w_area and w_age are called weights, and b is called a bias.
● The weights determine the influence of each feature on our prediction
and the bias just says what value the predicted price should take
when all of the features take value 0.
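As a minimal sketch (the weights and bias below are made up for illustration, not values from the deck), this model is a one-line function:

```python
# Hypothetical learned parameters for price = w_area*area + w_age*age + b.
w_area, w_age, b = 120.0, -1500.0, 50000.0

def predict_price(area_sqft, age_years):
    # weighted sum of the features plus the bias
    return w_area * area_sqft + w_age * age_years + b

print(predict_price(1000.0, 10.0))   # 120*1000 - 1500*10 + 50000 = 155000.0
```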
Data
● Data
○ The dataset is called a training dataset or training set.
○ Each row is called an example (or data point, data instance, sample).
○ The thing we are trying to predict is called a label (or target). 
○ The independent variables upon which the predictions are based are called
features (or covariates).
Affine transformations and Linear Models
● An equation of the form
ŷ = w1*x1 + . . . + wd*xd + b
is an affine transformation of the input features, characterized by a
linear transformation of the features via a weighted sum, combined with a
translation via the added bias.
● Models whose output prediction is determined by the affine
transformation of input features are linear models.
● The affine transformation is specified by the chosen weights (w) and bias
(b).
Loss Function
● A loss function is a quality measure for a given model, i.e., a
measure of fitness.
● The loss function quantifies the distance between the real and
predicted value of the target.
● The loss will usually be a non-negative number where smaller values
are better and perfect predictions incur a loss of 0.
● The most popular loss function in regression problems is the squared
error.
Squared Error Loss Function
● The most popular loss function in regression problems is the squared
error.
● For each example i,
l(i)(w, b) = 1/2 * (ŷ(i) − y(i))^2
● For the entire dataset of n examples, average (or, equivalently, sum) the
per-example losses:
L(w, b) = (1/n) Σi l(i)(w, b)
● When training the model, find parameters (w∗, b∗) that minimize the
total loss across all training examples
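A minimal sketch of this loss in code (the arrays are illustrative):

```python
import numpy as np

# Squared-error loss: per-example l = (ŷ - y)^2 / 2, averaged over n examples.
def squared_loss(y_hat, y):
    return np.mean((y_hat - y) ** 2) / 2

print(squared_loss(np.array([155000.0, 98000.0]),
                   np.array([150000.0, 100000.0])))   # 7250000.0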
Minibatch Stochastic Gradient Descent (SGD)
● Apply the gradient descent algorithm to a random minibatch of examples
each time we need to compute the update.
● In each iteration,
○ Step 1: randomly sample a minibatch B consisting of a fixed number of training
examples.
○ Step 2: compute the derivative (gradient) of the average loss on the minibatch with
regard to the model parameters.
○ Step 3: multiply the gradient by a predetermined positive value η and subtract the
resulting term from the current parameter values.
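Putting the three steps together (with learning rate η and minibatch B, as described in the notes), one standard way to write the update performed in each iteration is:

$$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w},b)}\, l^{(i)}(\mathbf{w}, b)$$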
Training using SGD Algorithm
PS: The number of epochs and the learning rate are both hyperparameters. Setting
hyperparameters requires some adjustment by trial and error.
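Since the slide's algorithm itself is not reproduced here, the following is a minimal minibatch-SGD sketch for linear regression on synthetic data (all names, shapes, and hyperparameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))            # features, e.g. standardized area and age
true_w, true_b = np.array([2.0, -3.4]), 4.2
y = X @ true_w + true_b + rng.normal(scale=0.01, size=1000)

w, b = np.zeros(2), 0.0
lr, batch_size, num_epochs = 0.03, 10, 3  # hyperparameters, set by trial and error

for epoch in range(num_epochs):
    indices = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        err = Xb @ w + b - yb             # prediction error on the minibatch
        w -= lr * Xb.T @ err / batch_size # gradient of the average loss, scaled by lr
        b -= lr * err.mean()
    print(f"epoch {epoch + 1}, loss {np.mean((X @ w + b - y) ** 2) / 2:.6f}")
```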
Prediction
● Estimating targets given features is commonly called prediction or
inference.
● Given the learned model, the target can be predicted for any set of
features.
Single-Layer Neural Network
Linear regression is a single-layer neural network
○ Number of inputs (or feature dimensionality) in the input layer is d.
○ The inputs are x1 , . . . , xd.
○ The output is o1.
○ Number of outputs in the output layer is 1.
○ Number of layers for the neural network is 1. (conventionally we do not consider the
input layer when counting layers.)
○ Every input is connected to every output; this transformation is a fully-connected
layer or dense layer.
Multiple Perceptrons for Classification
Binary Outputs
Classification Example
● Each input consists of a 2 × 2 grayscale image.
● Represent each pixel value with a single scalar, giving four features
x1 , x2, x3 , x4.
● Assume that each image belongs to one among the categories
“square”, “triangle”, and “circle”.
● How to represent the labels?
○ Use label encoding. y ∈ {1, 2, 3}, where the integers represent {circle, square,
triangle} respectively.
○ Use one-hot encoding. y ∈ {(1, 0, 0), (0, 1, 0), (0, 0, 1)}.
■ y would be a three-dimensional vector, with (1, 0, 0) corresponding to “circle”,
(0, 1, 0) to “square”, and (0, 0, 1) to “triangle”.
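A small sketch of the one-hot encoding for these three classes (the label list is made up):

```python
import numpy as np

classes = ["circle", "square", "triangle"]
labels = ["square", "circle", "triangle", "square"]   # hypothetical dataset labels

one_hot = np.eye(len(classes))[[classes.index(l) for l in labels]]
print(one_hot)
# [[0. 1. 0.]   "square"   -> (0, 1, 0)
#  [1. 0. 0.]   "circle"   -> (1, 0, 0)
#  [0. 0. 1.]   "triangle" -> (0, 0, 1)
#  [0. 1. 0.]]
```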
Network Architecture
● A model with multiple outputs, one per class. Each output will
correspond to its own affine function.
○ 4 features and 3 possible output categories
Network Architecture
○ 12 scalars to represent the weights and 3 scalars to represent the biases
○ compute three logits, o1, o2, and o3, for each input.
○ the weight matrix is 3×4 and the bias is a 1×3 vector (one bias per output)
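A quick shape check for this affine layer (a sketch with arbitrary values): 4 features in, 3 logits out, so W is 3×4 and b has 3 components.

```python
import numpy as np

W = np.zeros((3, 4))                  # 12 weight scalars
b = np.zeros(3)                       # 3 bias scalars
x = np.array([0.1, 0.9, 0.2, 0.8])   # one flattened 2x2 image

o = W @ x + b                         # logits o1, o2, o3
print(o.shape)                        # (3,)
```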
Softmax Operation
● Interpret the outputs of our model as probabilities.
○ Any output ŷj is interpreted as the probability that a given item belongs to class j. Then
choose the class with the largest output value as our prediction, argmax_j ŷj.
○ If ŷ1, ŷ2, and ŷ3 are 0.1, 0.8, and 0.1, respectively, then predict category 2.
○ To interpret the outputs as probabilities, we must guarantee that they will be
nonnegative and sum up to 1.
● The softmax function transforms the outputs such that they become
nonnegative and sum to 1, while requiring that the model remains
differentiable.
○ first exponentiate each logit (ensuring non-negativity) and then divide by their
sum (ensuring that they sum to 1): ŷj = exp(oj) / Σk exp(ok)
● Softmax is a nonlinear function.
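A minimal softmax sketch. Subtracting the maximum logit before exponentiating is a standard numerical-stability trick and is an addition here, not something stated on the slide:

```python
import numpy as np

def softmax(o):
    z = np.exp(o - o.max())   # shift by max for numerical stability
    return z / z.sum()

probs = softmax(np.array([1.0, 3.0, 1.0]))
print(probs, probs.sum())     # nonnegative, sums to 1; class 2 is largest
```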
Log-Likelihood Loss Function / Cross-Entropy loss
● The softmax function gives us a vector ŷ, which we can interpret as
estimated conditional probabilities of each class given any input x,
○ ŷ1 = P (y = cat | x).
● Compare the estimates with reality by checking how probable the
actual classes are according to our model, given the features:
P(Y | X) = Π(i=1..n) P(y(i) | x(i))
● Maximizing P(Y | X) is equivalent to minimizing the negative log-likelihood:
−log P(Y | X) = Σ(i=1..n) −log P(y(i) | x(i)), where −log P(y(i) | x(i)) = −Σj yj(i) log ŷj(i)
is the cross-entropy loss for one example.
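A sketch of the per-example cross-entropy: with a one-hot label y, the sum −Σj yj log ŷj reduces to −log of the probability assigned to the true class.

```python
import numpy as np

def cross_entropy(y_hat, y_one_hot):
    return -np.sum(y_one_hot * np.log(y_hat))

y_hat = np.array([0.1, 0.8, 0.1])   # softmax output
y = np.array([0.0, 1.0, 0.0])       # true class is 2
print(cross_entropy(y_hat, y))      # -log(0.8) ≈ 0.223
```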
Multi Layered Perceptrons (MLP)
Multilayer Perceptron
● With deep neural networks, use the data to jointly learn both a
representation via hidden layers and a linear predictor that acts upon
that representation.
● Add many hidden layers by stacking many fully-connected layers on
top of each other. Each layer feeds into the layer above it, until we
generate outputs.
● The first (L−1) layers learn the representation and the final layer is
the linear predictor. This architecture is commonly called a multilayer
perceptron (MLP).
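For a single hidden layer with activation function σ and a minibatch of inputs X, this computation is commonly written as:

$$\mathbf{H} = \sigma(\mathbf{X}\mathbf{W}^{(1)} + \mathbf{b}^{(1)}), \qquad \mathbf{O} = \mathbf{H}\mathbf{W}^{(2)} + \mathbf{b}^{(2)}$$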
MLP Architecture
○ MLP has 4 inputs, 3 outputs, and its hidden layer contains 5 hidden units.
○ Number of layers in this MLP is 2.
○ The layers are both fully connected. Every input influences every neuron in the
hidden layer, and each of these in turn influences every neuron in the output layer.
○ The outputs of the hidden layer are called hidden representations,
hidden-layer variables, or hidden variables.
Nonlinearity in MLP
Activation Functions
Activation function
● Activation function of a neuron defines the output of that neuron given
an input or set of inputs.
● Activation functions decide whether a neuron should be activated or
not, based on the weighted sum of its inputs plus the bias.
● They are differentiable operators that transform input signals to outputs,
and most of them add non-linearity.
● Since artificial neural networks are designed as universal function
approximators, they must have the ability to compute and learn
nonlinear functions.
1. Step Function
2. Sigmoid (Logistic) Activation Function
The sigmoid is called a squashing function: it squashes any input in the range (-inf, inf)
to some value in the range (0, 1).
3. Tanh (hyperbolic tangent) Activation Function
● More efficient than sigmoid because its wider output range (−1, 1) gives stronger gradients and faster learning.
● The tanh activation usually works better than sigmoid activation function for
hidden units because the mean of its output is closer to zero, and so it centers the
data better for the next layer.
● Issues with tanh
○ computationally expensive
○ can lead to vanishing gradients
4. ReLU Activation Function
● If we combine two ReLU units, we can recover a piecewise linear approximation of the Sigmoid function.
● Some ReLU variants: Softplus (Smooth ReLU), Noisy ReLU, Leaky ReLU, Parametric ReLU, and the Exponential
Linear Unit (ELU).
● Advantages
○ Fast Learning and Efficient computation
○ Fewer vanishing gradient problems
○ Sparse activation
○ Scale invariant (max operation)
● Disadvantages
○ Can lead to exploding gradients.
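Minimal sketches of the activations discussed above (a plain NumPy illustration, not the deck's code):

```python
import numpy as np

def step(x):    return np.where(x >= 0, 1.0, 0.0)   # not differentiable at 0
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))     # squashes to (0, 1)
def tanh(x):    return np.tanh(x)                   # squashes to (-1, 1), zero-centered
def relu(x):    return np.maximum(0.0, x)           # cheap; zero gradient for x < 0
def leaky_relu(x, alpha=0.01):                      # one ReLU variant
    return np.where(x > 0, x, alpha * x)

x = np.linspace(-3, 3, 7)
print(relu(x))
```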
Comparing Activation Functions
Training MLP
Two Layer Neural Network
Compute the Activations
Vectorizing Forward Propagation
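Since the slide's equations are not reproduced here, the following is a sketch of a vectorized forward pass for the 4–5–3 network described earlier; sigmoid hidden units and random initial weights are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    Z1 = X @ W1 + b1          # pre-activation of the hidden layer
    A1 = sigmoid(Z1)          # hidden activations (hidden representation)
    Z2 = A1 @ W2 + b2         # output-layer pre-activation (logits)
    return Z1, A1, Z2

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                      # minibatch of 8 examples, 4 features
W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)    # 5 hidden units
W2, b2 = rng.normal(size=(5, 3)), np.zeros(3)    # 3 outputs
Z1, A1, Z2 = forward(X, W1, b1, W2, b2)
print(Z2.shape)                                  # (8, 3)
```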
Neural Network Training – Forward Pass
Forward Propagation Algorithm
Computation Graph for Forward Pass
Cost Function
Neural Network Training – Backward Pass
Backpropagation Algorithm to compute Gradients
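A sketch of backpropagation for the two-layer network above, assuming a softmax output with cross-entropy loss (so the output-layer error is Y_hat − Y) and sigmoid hidden units; variable names follow the forward-pass sketch.

```python
import numpy as np

def backward(X, Y, A1, Y_hat, W2):
    n = X.shape[0]
    dZ2 = (Y_hat - Y) / n        # gradient at the output pre-activation
    dW2 = A1.T @ dZ2             # gradient w.r.t. output-layer weights
    db2 = dZ2.sum(axis=0)
    dA1 = dZ2 @ W2.T             # propagate the error back through W2
    dZ1 = dA1 * A1 * (1 - A1)    # sigmoid'(z) = a * (1 - a)
    dW1 = X.T @ dZ1              # gradient w.r.t. hidden-layer weights
    db1 = dZ1.sum(axis=0)
    return dW1, db1, dW2, db2
```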
Computation Graph for BackProp
Neural Network Training – Update Parameters
Parameter Update
Training of MultiLayer Perceptron (MLP)
Requires
1. Forward Pass through each layer to compute the output.
2. Compute the deviation or error between the desired output and the
output computed in the forward pass (first step). This forms the
objective function, since we want to minimize this deviation or error.
3. The deviation has to be sent back through each layer to compute the
delta, or change, in the parameter values. This is achieved using the
backpropagation algorithm.
4. Update the parameters. (A complete training step combining these four
stages is sketched after this list.)
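Tying the four stages together, a sketch of one training step (it reuses forward(), backward(), and softmax() from the earlier sketches; lr is the learning rate):

```python
import numpy as np

def train_step(X, Y, params, lr=0.1):
    W1, b1, W2, b2 = params
    # 1. Forward pass
    Z1, A1, Z2 = forward(X, W1, b1, W2, b2)
    Y_hat = np.apply_along_axis(softmax, 1, Z2)
    # 2. Deviation between desired and computed output (the objective)
    loss = -np.mean(np.sum(Y * np.log(Y_hat), axis=1))
    # 3. Send the deviation back through each layer (backpropagation)
    dW1, db1, dW2, db2 = backward(X, Y, A1, Y_hat, W2)
    # 4. Update the parameters
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return loss, (W1, b1, W2, b2)
```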
Scaling up for L layers in MLP
Forward Propagation Algorithm
Backward Propagation Algorithm
Update the Parameters
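A generic sketch of forward and backward propagation over L layers. The structure (a list of (W, b) pairs, sigmoid hidden layers, linear output layer) is an assumption for illustration, not the deck's exact algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_L(X, params):
    caches, A = [], X
    for l, (W, b) in enumerate(params):
        caches.append(A)                                 # input to layer l
        Z = A @ W + b
        A = Z if l == len(params) - 1 else sigmoid(Z)    # last layer stays linear
    return A, caches

def backward_L(dZ, params, caches):
    # dZ: gradient of the loss w.r.t. the output-layer pre-activation
    grads = [None] * len(params)
    for l in reversed(range(len(params))):
        A_prev = caches[l]
        grads[l] = (A_prev.T @ dZ, dZ.sum(axis=0))       # (dW, db) for layer l
        if l > 0:
            dA = dZ @ params[l][0].T                     # back through W of layer l
            dZ = dA * A_prev * (1 - A_prev)              # through the sigmoid of layer l-1
    return grads
```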
Ref:
Chapters 3 and 4 of T1
Next Session:
Power of MLP
Editor's Notes

  1. Ch 3 of T1
  2. The cardinality |B| represents the number of examples in each minibatch (the batch size) and η denotes the learning rate. We emphasize that the values of the batch size and learning rate are manually pre-specified and not typically learned through model training. These parameters, tunable but not updated in the training loop, are called hyperparameters. Hyperparameter tuning is the process by which hyperparameters are chosen, and typically requires that we adjust them based on the results of the training loop as assessed on a separate validation dataset (or validation set).
  3. Ch 3 of T1
  4. A one-hot encoding is a vector with as many components as we have categories. The component corresponding to a particular instance's category is set to 1 and all other components are set to 0.
  5. Although softmax is a nonlinear function, the outputs of softmax regression are still determined by an affine transformation of the input features; thus, softmax regression is a linear model.
  6. Chapter 4 of T1
  7. We will learn about computation graphs in Module 3.
  8. We will learn about computation graphs in Module 3.