• computing techniques
• applications of soft computing
• Neuron
• Nerve structure and synapse
• Artificial Neuron and its model
• Activation functions
• Neural network architecture
• single layer and multilayer feed forward networks
• McCulloch-Pitts neuron model
• perceptron model, Adaline and Madaline
• multilayer perceptron model
• back propagation learning methods
• effect of learning rule coefficient
• back propagation algorithm
• factors affecting back propagation training applications.
Neural Networks
● An artificial neural network (ANN) is a machine learning
approach that models the human brain and consists of a number
of artificial neurons.
● Neurons in ANNs tend to have fewer connections than
biological neurons.
● Each neuron in ANN receives a number of inputs.
● An activation function is applied to these inputs which results
in activation level of neuron (output value of the neuron).
● Knowledge about the learning task is given in the form of
examples called training examples.
● An Artificial Neural Network is specified by:
− neuron model: the information processing unit of the NN,
− an architecture: a set of neurons and links connecting neurons.
Each link has a weight,
− a learning algorithm: used for training the NN by modifying the
weights in order to model a particular learning task correctly on
the training examples.
● The aim is to obtain a NN that is trained and generalizes well.
● It should behave correctly on new instances of the
learning task.
● The neuron is the basic information processing unit of a
NN. It consists of:
1 A set of links, describing the neuron inputs, with weights w1, w2,
…, wm
2 An adder function (linear combiner) for computing the weighted
sum of the inputs (real numbers):
$u = \sum_{j=1}^{m} w_j x_j$
3 Activation function $\varphi$ for limiting the amplitude of the neuron
output. Here ‘b’ denotes bias.
$y = \varphi(u + b)$
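The three parts of the neuron model can be sketched in a few lines of Python. This is only an illustration: the function names `neuron_output` and `sigmoid`, and the sample weights and bias, are assumptions chosen for the example and do not come from the slides.

```python
import math

def sigmoid(v):
    """Logistic sigmoid, used here as one possible activation function."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(inputs, weights, bias, activation):
    """Return y = activation(u + b), where u is the weighted sum of the inputs."""
    u = sum(w * x for w, x in zip(weights, inputs))  # adder (linear combiner)
    return activation(u + bias)                      # activation applied to u + b

# Hypothetical values: u = 0.5*1.0 + (-0.3)*0.0 = 0.5, so u + b = 0.0.
y = neuron_output([1.0, 0.0], [0.5, -0.3], bias=-0.5, activation=sigmoid)
print(y)  # 0.5, since sigmoid(0) = 0.5
```

Passing the activation as a parameter keeps the adder separate from the choice of neuron model.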
The Neuron Diagram
Bias of a Neuron
● The bias b has the effect of applying a transformation to
the weighted sum u
v = u + b
● The bias is an external parameter of the neuron. It can be
modeled by adding an extra input.
● v is called induced field of the neuron
$v = \sum_{j=0}^{m} w_j x_j$, with the extra input $x_0 = 1$ and weight $w_0 = b$
Neuron Models
● The choice of activation function determines the
neuron model.
● step function:
● ramp function:
● sigmoid function with z,x,y parameters
● Gaussian function:
 
[Figure panels: Step Function, Ramp Function, Sigmoid Function]
• The Gaussian function is the probability
function of the normal distribution. Sometimes
also called the frequency curve.
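The four activation functions named above can be sketched as follows. The parameterized definitions from the slides did not survive, so the forms below are common textbook versions and every parameter (`threshold`, `c`, `d`, `slope`, `mu`, `sigma`) is an assumption rather than the deck's own notation.

```python
import math

def step(v, threshold=0.0):
    """Step function: 1 at or above the threshold, 0 below it."""
    return 1.0 if v >= threshold else 0.0

def ramp(v, c=0.0, d=1.0):
    """Ramp function: 0 up to c, 1 from d onward, linear in between (needs c < d)."""
    if v <= c:
        return 0.0
    if v >= d:
        return 1.0
    return (v - c) / (d - c)

def sigmoid(v, slope=1.0):
    """Logistic sigmoid with an adjustable slope."""
    return 1.0 / (1.0 + math.exp(-slope * v))

def gaussian(v, mu=0.0, sigma=1.0):
    """Probability density of the normal distribution with mean mu and std sigma."""
    z = (v - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

for name, f in [("step", step), ("ramp", ramp), ("sigmoid", sigmoid), ("gaussian", gaussian)]:
    print(name, [round(f(v), 3) for v in (-1.0, 0.0, 0.5, 2.0)])
```

The step and ramp functions saturate, while the sigmoid and Gaussian are smooth, which matters later for gradient-based training.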
Network Architectures
● Three different classes of network architectures
− single-layer feed-forward
− multi-layer feed-forward
− recurrent
● The architecture of a neural network is linked
with the learning algorithm used to train
Single Layer Feed-forward
Input layer
source nodes
Output layer
Perceptron: Neuron Model
(Special form of single layer feed forward)
− The perceptron, first proposed by Rosenblatt (1958), is a simple
neuron that is used to classify its input into one of two categories.
− A perceptron uses a step function that returns +1 if the weighted sum
of its inputs is ≥ 0 and -1 otherwise.
Perceptron for Classification
● The perceptron is used for binary classification.
● First train a perceptron for a classification task.
− Find suitable weights in such a way that the training examples are
correctly classified.
− Geometrically try to find a hyper-plane that separates the examples of the
two classes.
● The perceptron can only model linearly separable classes.
● When the two classes are not linearly separable, it may be
desirable to obtain a linear separator that minimizes the mean
squared error.
● Given training examples of classes C1, C2 train the perceptron in
such a way that :
− If the output of the perceptron is +1 then the input is assigned to class C1
− If the output is -1 then the input is assigned to C2
[Figure: Boolean function OR on the X1-X2 plane. The point (0,0) is false; (0,1), (1,0) and (1,1) are true.]
Boolean function OR – Linearly separable
Learning Process for Perceptron
● Initially assign random weights to inputs between -0.5 and
● Training data is presented to perceptron and its output is
● If output is incorrect, the weights are adjusted accordingly
using following formula.
wi ← wi + (a * xi * e), where ‘e’ is the error produced
and ‘a’ (-1 < a < 1) is the learning rate
− ‘e’ is defined as 0 if the output is correct; it is +ve if the output is too low and
–ve if the output is too high.
− Once the modification to weights has taken place, the next piece of
training data is used in the same way.
− Once all the training data have been applied, the process starts again
until all the weights are correct and all errors are zero.
− Each iteration of this process is known as an epoch.
Example: Perceptron to learn OR
● Initially consider w1 = -0.2 and w2 = 0.4
● Training data, say x1 = 0 and x2 = 0; the desired output is 0.
● Compute y = Step(w1*x1 + w2*x2) = Step(0) = 0. Output is correct so
weights are not changed.
● For training data x1 = 0 and x2 = 1, the desired output is 1.
● Compute y = Step(w1*x1 + w2*x2) = Step(0.4) = 1. Output is correct
so weights are not changed.
● Next training data x1 = 1 and x2 = 0; the desired output is 1.
● Compute y = Step(w1*x1 + w2*x2) = Step(-0.2) = 0. Output is
incorrect, hence weights are to be changed.
● Assume a = 0.2 and error e = 1
wi = wi + (a * xi * e) gives w1 = -0.2 + 0.2*1*1 = 0 and w2 = 0.4 + 0.2*0*1 = 0.4
● With these weights, test the remaining test data.
● Repeat the process till we get stable result.
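The OR example can be traced end to end with a short Python sketch. It assumes a step function with Step(0) = 0 (as in the first row of the example), no bias term, and the error e = target − output; the names `train_perceptron` and `OR_DATA` are made up for illustration.

```python
def step(v):
    """Step activation matching the worked example: Step(0) = 0."""
    return 1 if v > 0 else 0

def train_perceptron(samples, weights, rate, max_epochs=100):
    """Repeat epochs until every sample is classified correctly."""
    weights = list(weights)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, target in samples:
            y = step(sum(w * x for w, x in zip(weights, inputs)))
            e = target - y  # 0 if correct, +1 if too low, -1 if too high
            if e != 0:
                errors += 1
                # wi <- wi + a * xi * e
                weights = [w + rate * x * e for w, x in zip(weights, inputs)]
        if errors == 0:
            return weights, epoch
    return weights, max_epochs

OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
final_weights, epochs = train_perceptron(OR_DATA, weights=(-0.2, 0.4), rate=0.2)
print(final_weights, epochs)
```

Starting from w1 = -0.2 and w2 = 0.4 with a = 0.2, only the input (1, 0) triggers updates: w1 moves to 0 in the first epoch and to 0.2 in the second, and the third epoch passes with no errors.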
Perceptron: Limitations
● The perceptron can only model linearly separable functions,
− those functions which can be drawn in a 2-dim graph so that a single
straight line separates the values into two parts.
● Boolean functions given below are linearly separable:
− OR
● It cannot model the XOR function as it is not linearly separable.
− When the two classes are not linearly separable, it may be
desirable to obtain a linear separator that minimizes the mean
squared error.
XOR – Non linearly separable function
● A typical example of a non-linearly separable function is
XOR, which computes the logical exclusive or.
● This function takes two input arguments with values in {0,1}
and returns one output in {0,1}.
● Here 0 and 1 are encodings of the truth values false and true.
● The output is true if and only if the two inputs have
different truth values.
● XOR is a non-linearly separable function which cannot be
modeled by a perceptron.
● For such functions we have to use a multi-layer feed-forward network.
These two classes (true and false) cannot be separated using a
line. Hence XOR is not linearly separable.
Input     Output
X1  X2    X1 XOR X2
0   0     0
0   1     1
1   0     1
1   1     0
[Figure: XOR on the X1-X2 plane. The points (0,0) and (1,1) are false; (0,1) and (1,0) are true.]
Using ADALINE Network
ADALINE widrow-hoff Learning
Learning algorithm
Least square minimization
Comparison with perceptron
Multi layer feed-forward NN (FFNN)
● FFNN is a more general network architecture, where there are
hidden layers between input and output layers.
● Hidden nodes do not directly receive inputs nor send outputs to
the external environment.
● FFNNs overcome the limitation of single-layer NN.
● They can handle non-linearly separable learning tasks.
Hidden Layer
3-4-2 Network
Inputs    Output of hidden nodes    Output node    X1 XOR X2
X1  X2    H1          H2
0   0     0           0             –0.5 → 0       0
0   1     –1 → 0      1             0.5 → 1        1
1   0     1           –1 → 0        0.5 → 1        1
1   1     0           0             –0.5 → 0       0
Since we are representing two states by 0 (false) and 1 (true), we
will map negative outputs (–1, –0.5) of hidden and output layers
to 0 and positive output (0.5) to 1.
● The ANN for XOR has two hidden nodes that realize this non-linear
separation and use the sign (step) activation function.
● Arrows from input nodes to two hidden nodes indicate the directions of
the weight vectors (1,-1) and (-1,1).
● The output node is used to combine the outputs of the two hidden nodes.
[Figure: XOR network with input nodes, hidden layer and output layer. X1 and X2 feed H1 with weights (1, –1) and H2 with weights (–1, 1); H1 and H2 each feed the output node with weight 1, and the output node has bias –0.5.]
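The XOR network can be checked with a small Python sketch. It assumes the hidden weight vectors (1, -1) and (-1, 1), output weights of 1, an output bias of -0.5, and a step function that maps non-positive values to 0 and positive values to 1; the function names are illustrative.

```python
def step(v):
    """Map negative or zero activations to 0 and positive ones to 1."""
    return 1 if v > 0 else 0

def xor_network(x1, x2):
    """Two hidden step neurons combined by one output step neuron."""
    h1 = step(1 * x1 - 1 * x2)     # fires only for (1, 0)
    h2 = step(-1 * x1 + 1 * x2)    # fires only for (0, 1)
    return step(h1 + h2 - 0.5)     # fires if either hidden node fires

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_network(x1, x2))
```

Each hidden node detects one of the two "true" corners, so the output node only has to compute an OR of the hidden outputs, which is linearly separable.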
● The classical learning algorithm of FFNN is based on the
gradient descent method.
● For this reason the activation functions used in FFNN are
continuous functions of the weights, differentiable everywhere.
● The activation function for node i may be defined as a
simple form of the sigmoid function in the following manner:
$\varphi(V_i) = \frac{1}{1 + e^{-A V_i}}$
where A > 0, $V_i = \sum_j W_{ij} Y_j$, such that Wij is the weight of the link
from node j to node i and Yj is the output of node j.
Training Algorithm: Back-propagation
● The Back propagation algorithm learns in the same way as
single perceptron.
● It searches for weight values that minimize the total error of
the network over the set of training examples (training set).
● Back propagation consists of the repeated application of the
following two passes:
− Forward pass: In this step, the network is activated on one example
and the error of (each neuron of) the output layer is computed.
− Backward pass: in this step the network error is used for updating
the weights. The error is propagated backwards from the output
layer through the network layer by layer. This is done by recursively
computing the local gradient of each neuron.
Feed-forward Network
Feed-forward networks often have one or more hidden layers of sigmoid neurons followed
by an output layer of linear neurons.
Multiple layers of neurons with nonlinear transfer functions allow the network to learn
nonlinear and linear relationships between input and output vectors.
The linear output layer lets the network produce values outside the range -1 to +1. On the
other hand, if you want to constrain the outputs of a network (such as between 0 and 1),
then the output layer should use a sigmoid transfer function (such as logsig).
Backpropagation Learning
The following slides describe the teaching process of a multi-layer neural network
employing the back-propagation algorithm. To illustrate this process, a three-layer neural
network with two inputs and one output, shown in the picture below, is used:
Learning Algorithm
Each neuron is composed of two units. The first unit adds the products of the weight coefficients and
input signals. The second unit realises a nonlinear function, called the neuron transfer (activation)
function. Signal e is the adder output signal, and y = f(e) is the output signal of the nonlinear element.
Signal y is also the output signal of the neuron.
To teach the neural network we need a training data set. The training data set consists of
input signals (x1 and x2) paired with the corresponding target (desired output) z.
Network training is an iterative process. In each iteration the weight coefficients of the nodes
are modified using new data from the training data set. The modification is calculated using
the algorithm described below:
Each teaching step starts by applying both input signals from the training set. After this stage
we can determine the output signal values for each neuron in each network layer.
The pictures below illustrate how the signal propagates through the network.
Symbols w(xm)n represent the weights of connections between network input xm and
neuron n in the input layer. Symbols yn represent the output signal of neuron n.
Propagation of signals through the hidden layer. Symbols wmn represent the weights
of connections between the output of neuron m and the input of neuron n in the next
layer.
Propagation of signals through the output layer.
In the next algorithm step the output signal of the network y is
compared with the desired output value (the target), which is found in
the training data set. The difference is called the error signal d of the output layer
neuron.
The idea is to propagate the error signal d (computed in a single teaching step)
back to all neurons whose output signals were inputs to the neuron under discussion.
The weight coefficients wmn used to propagate errors back are the same as
those used when computing the output value. Only the direction of data flow
is changed (signals are propagated from output to inputs one after the
other). This technique is used for all network layers. If the propagated errors
come from several neurons, they are added. The illustration is below:
When the error signal for each neuron is computed, the weight
coefficients of each neuron input node may be modified. In the formulas
below, df(e)/de represents the derivative of the activation function of the
neuron whose weights are modified.
Weight Update Rule
● The Backprop weight update rule is based on the gradient
descent method:
− It takes a step in the direction yielding the maximum decrease of
the network error E.
− This direction is the opposite of the gradient of E.
● Iteration of the Backprop algorithm is usually terminated
when the sum of squares of errors of the output values for
all training data in an epoch is less than some threshold
such as 0.01
$w_{ij} = w_{ij} + \Delta w_{ij}$
Back-prop learning algorithm
initialize weights randomly; n = 0;
while (stopping criterion not satisfied and n < max_iterations)
for each example (x,d)
- run the network with input x and compute the output y
- update the weights in backward order starting from those of
the output layer:
$w_{ji} = w_{ji} + \Delta w_{ji}$
with $\Delta w_{ji}$ computed using the (generalized) Delta rule
n = n+1;
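The loop above can be sketched in plain Python for a small network. This is a minimal illustration, not the slides' own code: it assumes a 2-2-1 network with sigmoid neurons, a squared error E = 0.5 (d − y)² per example, per-example updates, and a learning rate of 0.5; the names `make_network`, `forward` and `backprop_step` are hypothetical.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def make_network(seed=0):
    """2-2-1 network with random weights in [-0.5, 0.5] (layout is an assumption)."""
    rng = random.Random(seed)
    def r():
        return rng.uniform(-0.5, 0.5)
    return {"w_hidden": [[r(), r()], [r(), r()]], "b_hidden": [r(), r()],
            "w_out": [r(), r()], "b_out": r()}

def forward(net, x):
    """Forward pass: return the hidden outputs and the network output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    y = sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])
    return hidden, y

def backprop_step(net, x, d, rate):
    """One forward and backward pass for example (x, d); returns the error before the update."""
    hidden, y = forward(net, x)
    # Local gradient of the output neuron: error times sigmoid derivative.
    delta_out = (d - y) * y * (1.0 - y)
    # Propagate the error back through the current output weights.
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(net["w_out"], hidden)]
    # Update in backward order: output layer first, then hidden layer.
    net["w_out"] = [w + rate * delta_out * h for w, h in zip(net["w_out"], hidden)]
    net["b_out"] += rate * delta_out
    for j, dh in enumerate(delta_hidden):
        net["w_hidden"][j] = [w + rate * dh * xi for w, xi in zip(net["w_hidden"][j], x)]
        net["b_hidden"][j] += rate * dh
    return 0.5 * (d - y) ** 2

XOR_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
net = make_network()
n, max_iterations, total_error = 0, 2000, float("inf")
while total_error >= 0.01 and n < max_iterations:
    total_error = sum(backprop_step(net, x, d, rate=0.5) for x, d in XOR_DATA)
    n += 1
print(n, total_error)
```

Whether this run reaches the 0.01 threshold depends on the random initial weights and the learning rate; gradient descent can stall in a poor region, which is why the loop is also bounded by `max_iterations`.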
Stopping criteria
● Total mean squared error change:
− Back-prop is considered to have converged when the absolute
rate of change in the average squared error per epoch is
sufficiently small (in the range [0.01, 0.1]).
● Generalization based criterion:
− After each epoch, the NN is tested for generalization.
− If the generalization performance is adequate then stop.
− If this stopping criterion is used then the part of the training set
used for testing the network generalization is not used for
updating the weights.
● Data representation
● Network Topology
● Network Parameters
● Training
● Validation
● Data representation depends on the problem.
● In general ANNs work on continuous (real valued) attributes.
Therefore symbolic attributes are encoded into continuous ones.
● Attributes of different types may have different ranges of values
which affect the training process.
● Normalization may be used, like the following one which scales
each attribute to assume values between 0 and 1:
$x_i' = \frac{x_i - \min_i}{\max_i - \min_i}$
for each value xi of the ith attribute, where mini and maxi are the minimum and
maximum value of that attribute over the training set.
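A minimal Python sketch of this scaling, reading it as the usual min-max normalization (x − min) / (max − min) applied to one attribute at a time. The function name and the handling of a constant attribute are assumptions made for the example.

```python
def min_max_normalize(values):
    """Scale one attribute's values to [0, 1] using its min and max over the training set."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([10.0, 15.0, 20.0]))  # [0.0, 0.5, 1.0]
```

The minimum and maximum should be taken from the training set only and then reused for new instances, so that test data is scaled the same way.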
Data Representation
● The number of layers and neurons depends on the specific task.
● In practice this issue is solved by trial and error.
● Two types of adaptive algorithms can be used:
− start from a large network and successively remove some neurons
and links until network performance degrades.
− begin with a small network and introduce new neurons until
performance is satisfactory.
Network Topology
● How are the weights initialized?
● How is the learning rate chosen?
● How many hidden layers and how many neurons?
● How many examples in the training set?
Network parameters
Initialization of weights
● In general, initial weights are randomly chosen, with
typical values between -1.0 and 1.0 or -0.5 and 0.5.
● If some inputs are much larger than others, random
initialization may bias the network to give much more
importance to larger inputs.
● In such a case, weights can be initialized as follows:
w_ij: for weights from the input to the first layer
w_jk: for weights from the first to the second layer
● The right value of the learning rate η depends on the application.
● Values between 0.1 and 0.9 have been used in
many applications.
● Other heuristics adapt η during
training, as described in previous slides.
Choice of learning rate
● Rule of thumb:
− the number of training examples should be at least five to ten
times the number of weights of the network.
● Other rule:
$N > \frac{|W|}{1 - a}$
N = number of training examples
|W| = number of weights
a = expected accuracy on test set
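Both sizing rules can be compared with a small Python sketch. It reads the second rule as N > |W| / (1 − a), which is an interpretation of the garbled formula; the function names and the example figures (9 weights, 90% expected accuracy) are hypothetical.

```python
def rule_of_thumb_range(num_weights):
    """Five to ten times the number of weights."""
    return 5 * num_weights, 10 * num_weights

def accuracy_rule_bound(num_weights, expected_accuracy):
    """Lower bound on N from N > |W| / (1 - a), with 0 <= a < 1."""
    if not 0.0 <= expected_accuracy < 1.0:
        raise ValueError("expected_accuracy must be in [0, 1)")
    return num_weights / (1.0 - expected_accuracy)

# Hypothetical network with 9 weights and 90% expected test accuracy.
print(rule_of_thumb_range(9))
print(accuracy_rule_bound(9, 0.9))
```

For an expected accuracy of 0.9 the bound coincides with the upper end of the rule of thumb (about ten examples per weight), and it grows quickly as the expected accuracy approaches 1.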
● The networks generated using these
weights and input vectors are stable, except
● X2 stabilizes to X1 (which is at hamming
distance 1).
● Finally, with the obtained weights and
stable states (X1 and X3), we can stabilize
any new (partial) pattern to one of those

Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdfAsymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Cuttack Call Girls 💯Call Us 🔝 7374876321 🔝 💃 Independent Female Escort Service
Cuttack Call Girls 💯Call Us 🔝 7374876321 🔝 💃 Independent Female Escort ServiceCuttack Call Girls 💯Call Us 🔝 7374876321 🔝 💃 Independent Female Escort Service
Cuttack Call Girls 💯Call Us 🔝 7374876321 🔝 💃 Independent Female Escort Service
Call Girls Madurai 8824825030 Escort In Madurai service 24X7
Call Girls Madurai 8824825030 Escort In Madurai service 24X7Call Girls Madurai 8824825030 Escort In Madurai service 24X7
Call Girls Madurai 8824825030 Escort In Madurai service 24X7
Poonam Singh

Recently uploaded (20)

❣Independent Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai E...
❣Independent Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai E...❣Independent Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai E...
❣Independent Call Girls Chennai 💯Call Us 🔝 7737669865 🔝💃Independent Chennai E...
An In-Depth Exploration of Natural Language Processing: Evolution, Applicatio...
An In-Depth Exploration of Natural Language Processing: Evolution, Applicatio...An In-Depth Exploration of Natural Language Processing: Evolution, Applicatio...
An In-Depth Exploration of Natural Language Processing: Evolution, Applicatio...
Intuit CRAFT demonstration presentation for sde
Intuit CRAFT demonstration presentation for sdeIntuit CRAFT demonstration presentation for sde
Intuit CRAFT demonstration presentation for sde
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call GirlCall Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Goa (india) ☎️ +91-7426014248 Goa Call Girl
Call Girls Chennai +91-8824825030 Vip Call Girls Chennai
Call Girls Chennai +91-8824825030 Vip Call Girls ChennaiCall Girls Chennai +91-8824825030 Vip Call Girls Chennai
Call Girls Chennai +91-8824825030 Vip Call Girls Chennai
Particle Swarm Optimization–Long Short-Term Memory based Channel Estimation w...
Particle Swarm Optimization–Long Short-Term Memory based Channel Estimation w...Particle Swarm Optimization–Long Short-Term Memory based Channel Estimation w...
Particle Swarm Optimization–Long Short-Term Memory based Channel Estimation w...
🔥Independent Call Girls In Pune 💯Call Us 🔝 7014168258 🔝💃Independent Pune Esco...
🔥Independent Call Girls In Pune 💯Call Us 🔝 7014168258 🔝💃Independent Pune Esco...🔥Independent Call Girls In Pune 💯Call Us 🔝 7014168258 🔝💃Independent Pune Esco...
🔥Independent Call Girls In Pune 💯Call Us 🔝 7014168258 🔝💃Independent Pune Esco...
SPICE PARK JUL2024 ( 6,866 SPICE Models )
SPICE PARK JUL2024 ( 6,866 SPICE Models )SPICE PARK JUL2024 ( 6,866 SPICE Models )
SPICE PARK JUL2024 ( 6,866 SPICE Models )
High Profile Call Girls Ahmedabad 🔥 7737669865 🔥 Real Fun With Sexual Girl Av...
High Profile Call Girls Ahmedabad 🔥 7737669865 🔥 Real Fun With Sexual Girl Av...High Profile Call Girls Ahmedabad 🔥 7737669865 🔥 Real Fun With Sexual Girl Av...
High Profile Call Girls Ahmedabad 🔥 7737669865 🔥 Real Fun With Sexual Girl Av...
🚺ANJALI MEHTA High Profile Call Girls Ahmedabad 💯Call Us 🔝 9352988975 🔝💃Top C...
🚺ANJALI MEHTA High Profile Call Girls Ahmedabad 💯Call Us 🔝 9352988975 🔝💃Top C...🚺ANJALI MEHTA High Profile Call Girls Ahmedabad 💯Call Us 🔝 9352988975 🔝💃Top C...
🚺ANJALI MEHTA High Profile Call Girls Ahmedabad 💯Call Us 🔝 9352988975 🔝💃Top C...
College Call Girls Kolkata 🔥 7014168258 🔥 Real Fun With Sexual Girl Available...
College Call Girls Kolkata 🔥 7014168258 🔥 Real Fun With Sexual Girl Available...College Call Girls Kolkata 🔥 7014168258 🔥 Real Fun With Sexual Girl Available...
College Call Girls Kolkata 🔥 7014168258 🔥 Real Fun With Sexual Girl Available...
The Differences between Schedule 40 PVC Conduit Pipe and Schedule 80 PVC Conduit
The Differences between Schedule 40 PVC Conduit Pipe and Schedule 80 PVC ConduitThe Differences between Schedule 40 PVC Conduit Pipe and Schedule 80 PVC Conduit
The Differences between Schedule 40 PVC Conduit Pipe and Schedule 80 PVC Conduit
Butterfly Valves Manufacturer (LBF Series).pdf
Butterfly Valves Manufacturer (LBF Series).pdfButterfly Valves Manufacturer (LBF Series).pdf
Butterfly Valves Manufacturer (LBF Series).pdf
My Airframe Metallic Design Capability Studies..pdf
My Airframe Metallic Design Capability Studies..pdfMy Airframe Metallic Design Capability Studies..pdf
My Airframe Metallic Design Capability Studies..pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdfAsymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Cuttack Call Girls 💯Call Us 🔝 7374876321 🔝 💃 Independent Female Escort Service
Cuttack Call Girls 💯Call Us 🔝 7374876321 🔝 💃 Independent Female Escort ServiceCuttack Call Girls 💯Call Us 🔝 7374876321 🔝 💃 Independent Female Escort Service
Cuttack Call Girls 💯Call Us 🔝 7374876321 🔝 💃 Independent Female Escort Service
Call Girls Madurai 8824825030 Escort In Madurai service 24X7
Call Girls Madurai 8824825030 Escort In Madurai service 24X7Call Girls Madurai 8824825030 Escort In Madurai service 24X7
Call Girls Madurai 8824825030 Escort In Madurai service 24X7

Unit 1

  • 2. • computing techniques • applications of soft computing • Neuron • Nerve structure and synapse • Artificial Neuron and its model • Activation functions • Neural network architecture • single layer and multilayer feed forward networks • McCulloch-Pitts neuron model • perceptron model - Adaline and Madaline • multilayer perceptron model • back propagation learning methods • effect of learning rule coefficient • back propagation algorithm • factors affecting back propagation training applications.
  • 3. Neural Networks ● An artificial neural network (ANN) is a machine learning approach that is modeled on the human brain and consists of a number of artificial neurons. ● Neurons in ANNs tend to have fewer connections than biological neurons. ● Each neuron in an ANN receives a number of inputs. ● An activation function is applied to these inputs, which results in the activation level of the neuron (the output value of the neuron). ● Knowledge about the learning task is given in the form of examples called training examples.
  • 4. Contd.. ● An Artificial Neural Network is specified by: − a neuron model: the information processing unit of the NN, − an architecture: a set of neurons and links connecting neurons. Each link has a weight, − a learning algorithm: used for training the NN by modifying the weights in order to model a particular learning task correctly on the training examples. ● The aim is to obtain a NN that is trained and generalizes well. ● It should behave correctly on new instances of the learning task.
  • 5. Neuron ● The neuron is the basic information processing unit of a NN. It consists of: 1 A set of links, describing the neuron inputs, with weights $w_1, w_2, \dots, w_m$ 2 An adder function (linear combiner) for computing the weighted sum of the inputs (real numbers): $u = \sum_{j=1}^{m} w_j x_j$ 3 An activation function $\varphi$ for limiting the amplitude of the neuron output: $y = \varphi(u + b)$. Here ‘b’ denotes the bias.
  • 7. Bias of a Neuron ● The bias b has the effect of applying a transformation to the weighted sum u: $v = u + b$ ● The bias is an external parameter of the neuron. It can be modeled by adding an extra input $x_0 = 1$ with weight $w_0 = b$, so that $v = \sum_{j=0}^{m} w_j x_j$ ● v is called the induced field of the neuron.
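The neuron model and the bias trick on the two slides above can be sketched in a few lines of Python. This is only an illustration: the function names, the example weights and inputs, and the choice of a logistic activation are assumptions, not part of the slides. The sketch computes the output once with an explicit bias and once with the bias folded in as an extra input fixed at 1.

```python
import math


def weighted_sum(weights, inputs):
    """Linear combiner: sum of w_j * x_j over all inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def neuron_output(weights, inputs, bias, activation):
    """y = activation(u + b), with the bias kept as a separate parameter."""
    v = weighted_sum(weights, inputs) + bias
    return activation(v)


def neuron_output_bias_as_input(weights, inputs, bias, activation):
    """Same neuron, but the bias is modeled as weight w0 on an extra input x0 = 1."""
    v = weighted_sum([bias] + list(weights), [1.0] + list(inputs))
    return activation(v)


def logistic(v):
    """Assumed activation for this example: the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical example values, chosen only for illustration.
weights = [0.5, -0.25, 0.1]
inputs = [1.0, 2.0, 3.0]
bias = 0.2

y_explicit = neuron_output(weights, inputs, bias, logistic)
y_folded = neuron_output_bias_as_input(weights, inputs, bias, logistic)
```

Both calls compute the same induced field v, so the two outputs agree up to floating-point rounding.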
  • 8. Neuron Models ● The choice of activation function determines the neuron model. Examples: ● step function: $\varphi(v) = a$ if $v < c$, and $b$ if $v \ge c$ ● ramp function: $\varphi(v) = a$ if $v < c$, $b$ if $v > d$, and $a + \frac{(v - c)(b - a)}{d - c}$ otherwise ● sigmoid function with z, x, y parameters: $\varphi(v) = z + \frac{1}{1 + \exp(-xv + y)}$ ● Gaussian function: $\varphi(v) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{v - \mu}{\sigma}\right)^2\right)$
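The four activation functions named on this slide can be written down as a short sketch. The parameter names (a, b, c, d for step and ramp; x, y, z for the sigmoid; mu and sigma for the Gaussian) follow the slide. The default values, and the choice to return b when v equals c in the step function, are assumptions made for this example.

```python
import math


def step(v, a=0.0, b=1.0, c=0.0):
    """Step function: a below the threshold c, b at or above it (assumed convention)."""
    return a if v < c else b


def ramp(v, a=0.0, b=1.0, c=0.0, d=1.0):
    """Ramp function: a below c, b above d, linear interpolation in between."""
    if v < c:
        return a
    if v > d:
        return b
    return a + (v - c) * (b - a) / (d - c)


def sigmoid(v, x=1.0, y=0.0, z=0.0):
    """Sigmoid with slope x, shift y and vertical offset z."""
    return z + 1.0 / (1.0 + math.exp(-x * v + y))


def gaussian(v, mu=0.0, sigma=1.0):
    """Gaussian function: density of the normal distribution with mean mu, std sigma."""
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)
```

With the default parameters, `sigmoid(0.0)` is 0.5 and `ramp(0.5)` is 0.5, which gives a quick sanity check on the definitions.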
  • 12. • The Gaussian function is the probability density function of the normal distribution. It is sometimes also called the frequency curve.
  • 13. Network Architectures ● Three different classes of network architectures − single-layer feed-forward − multi-layer feed-forward − recurrent ● The architecture of a neural network is linked with the learning algorithm used to train it.
  • 14. Single Layer Feed-forward Input layer of source nodes Output layer of neurons
  • 15. Perceptron: Neuron Model (Special form of single layer feed forward) − The perceptron, first proposed by Rosenblatt (1958), is a simple neuron that is used to classify its input into one of two categories. − A perceptron uses a step function that returns +1 if the weighted sum of its inputs is ≥ 0 and −1 otherwise: $\varphi(v) = +1$ if $v \ge 0$, and $-1$ if $v < 0$. Figure: inputs x1, x2, …, xn with weights w1, w2, …, wn and bias b are summed to give v, and the output is y = φ(v).
  • 16. Perceptron for Classification ● The perceptron is used for binary classification. ● First train a perceptron for a classification task. − Find suitable weights in such a way that the training examples are correctly classified. − Geometrically try to find a hyper-plane that separates the examples of the two classes. ● The perceptron can only model linearly separable classes. ● When the two classes are not linearly separable, it may be desirable to obtain a linear separator that minimizes the mean squared error. ● Given training examples of classes C1, C2 train the perceptron in such a way that : − If the output of the perceptron is +1 then the input is assigned to class C1 − If the output is -1 then the input is assigned to C2
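  • 17. Boolean function OR – Linearly separable. Figure: the four input points plotted on axes X1 and X2; (0, 0) is false and the other three points are true, so a single straight line separates the two classes.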
  • 18. Learning Process for Perceptron ● Initially assign random weights between −0.5 and +0.5 to the inputs. ● Training data is presented to the perceptron and its output is observed. ● If the output is incorrect, the weights are adjusted using the following formula: $w_i \leftarrow w_i + (a \cdot x_i \cdot e)$, where ‘e’ is the error produced and ‘a’ ($-1 \le a \le 1$) is the learning rate. − ‘e’ is defined as 0 if the output is correct; it is positive if the output is too low and negative if the output is too high. − Once the modification to the weights has taken place, the next piece of training data is used in the same way. − Once all the training data have been applied, the process starts again until all the weights are correct and all errors are zero. − Each iteration of this process is known as an epoch.
  • 19. Example: Perceptron to learn OR function ● Initially consider w1 = −0.2 and w2 = 0.4 ● For training data x1 = 0 and x2 = 0, the expected output is 0. ● Compute y = Step(w1*x1 + w2*x2) = Step(0) = 0. Output is correct so weights are not changed. ● For training data x1 = 0 and x2 = 1, the expected output is 1. ● Compute y = Step(w1*x1 + w2*x2) = Step(0.4) = 1. Output is correct so weights are not changed. ● Next training data is x1 = 1 and x2 = 0, and the expected output is 1. ● Compute y = Step(w1*x1 + w2*x2) = Step(−0.2) = 0. Output is incorrect, hence weights are to be changed. ● Assume a = 0.2 and error e = 1; wi = wi + (a * xi * e) gives w1 = 0 and w2 = 0.4 ● With these weights, test the remaining training data. ● Repeat the process till we get a stable result.
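The learning process and the OR example above can be reproduced with a short sketch. It starts from the same weights (w1 = −0.2, w2 = 0.4) and learning rate (a = 0.2) as the example, and takes the error e as target minus output. Two things are assumptions: the function names, and the convention that Step(0) = 0, which is what the example's first training case implies. No bias input is used, matching the example.

```python
def step(v):
    # Assumption: the worked example treats Step(0) as 0, so fire only when v > 0.
    return 1 if v > 0 else 0


def train_perceptron(samples, weights, rate, max_epochs=100):
    """Repeat epochs of w_i <- w_i + a * x_i * e until an epoch has no errors."""
    weights = list(weights)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, target in samples:
            output = step(sum(w * x for w, x in zip(weights, inputs)))
            e = target - output  # 0 if correct, +1 if too low, -1 if too high
            if e != 0:
                errors += 1
                weights = [w + rate * x * e for w, x in zip(weights, inputs)]
        if errors == 0:
            return weights, epoch
    return weights, max_epochs


OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

final_weights, epochs_used = train_perceptron(OR_SAMPLES, [-0.2, 0.4], rate=0.2)
predictions = [
    step(sum(w * x for w, x in zip(final_weights, inputs)))
    for inputs, _ in OR_SAMPLES
]
```

In the first epoch only the input (1, 0) is misclassified, which moves w1 from −0.2 to 0, as in the example. A second correction raises w1 to 0.2, and the third epoch passes without errors.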
  • 20. Perceptron: Limitations ● The perceptron can only model linearly separable functions, − that is, functions that can be drawn in a 2-dimensional graph so that a single straight line separates the values into two parts. ● Boolean functions given below are linearly separable: − AND − OR − COMPLEMENT ● It cannot model the XOR function, as it is not linearly separable. − When the two classes are not linearly separable, it may be desirable to obtain a linear separator that minimizes the mean squared error.
  • 21. XOR – Non linearly separable function ● A typical example of a non-linearly separable function is XOR, which computes the logical exclusive or. ● This function takes two input arguments with values in {0,1} and returns one output in {0,1}. ● Here 0 and 1 encode the truth values false and true. ● The output is true if and only if the two inputs have different truth values. ● XOR is a non-linearly separable function which cannot be modeled by a perceptron. ● For such functions we have to use a multi-layer feed-forward network.
  • 22. These two classes (true and false) cannot be separated using a line. Hence XOR is not linearly separable. Truth table (X1, X2 → X1 XOR X2): (0, 0) → 0; (0, 1) → 1; (1, 0) → 1; (1, 1) → 0. Figure: the four input points plotted on axes X1 and X2; (0, 1) and (1, 0) are true, while (0, 0) and (1, 1) are false.
  • 30. LSM
  • 37. Multi layer feed-forward NN (FFNN) ● FFNN is a more general network architecture, where there are hidden layers between input and output layers. ● Hidden nodes do not directly receive inputs nor send outputs to the external environment. ● FFNNs overcome the limitation of single-layer NN. ● They can handle non-linearly separable learning tasks. Figure: a 3-4-2 network with an input layer, a hidden layer, and an output layer.
  • 38. Outputs of the hidden nodes and the output node for each input (an arrow shows the value after mapping to 0 or 1):
      X1  X2 | H1       H2       | Output node | X1 XOR X2
      0   0  | 0        0        | –0.5 → 0    | 0
      0   1  | –1 → 0   1        | 0.5 → 1     | 1
      1   0  | 1        –1 → 0   | 0.5 → 1     | 1
      1   1  | 0        0        | –0.5 → 0    | 0
    Since we are representing two states by 0 (false) and 1 (true), we will map negative outputs (–1, –0.5) of hidden and output layers to 0 and positive output (0.5) to 1.
  • 39. FFNN for XOR ● The ANN for XOR has two hidden nodes that realize this non-linear separation and uses the sign (step) activation function. ● Arrows from input nodes to two hidden nodes indicate the directions of the weight vectors (1,-1) and (-1,1). ● The output node is used to combine the outputs of the two hidden nodes. Figure: input nodes X1 and X2 feed hidden node H1 with weights (1, –1) and hidden node H2 with weights (–1, 1); both hidden nodes feed the output node Y with weight 1, and Y has bias –0.5.
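The XOR network described above is small enough to check by hand or with a few lines of code. The sketch below uses the hidden weight vectors (1, −1) and (−1, 1) from the slide. Two things are assumptions: the output weights of 1 with bias −0.5, which are read from the slide's figure, and the function names. Negative and zero activations map to 0 and positive ones to 1, following the mapping rule stated on the previous slide.

```python
def threshold(v):
    """Map positive activations to 1 and zero or negative activations to 0."""
    return 1 if v > 0 else 0


def xor_network(x1, x2):
    """Two hidden step nodes combined by one output step node."""
    h1 = threshold(1 * x1 + (-1) * x2)   # fires only for input (1, 0)
    h2 = threshold((-1) * x1 + 1 * x2)   # fires only for input (0, 1)
    return threshold(1 * h1 + 1 * h2 - 0.5)


xor_outputs = {(x1, x2): xor_network(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
```

Each hidden node detects one of the two "true" corners, and the output node acts as an OR over the hidden nodes, which is why one hidden layer is enough here.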
  • 40. FFNN NEURON MODEL ● The classical learning algorithm of FFNN is based on the gradient descent method. ● For this reason the activation functions used in FFNN are continuous functions of the weights, differentiable everywhere. ● The activation function for node i may be defined as a simple form of the sigmoid function in the following manner: $\varphi(V_i) = \frac{1}{1 + e^{-A \cdot V_i}}$, where A > 0 and $V_i = \sum_j W_{ij} \cdot Y_j$, such that $W_{ij}$ is the weight of the link from node j to node i and $Y_j$ is the output of node j.
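Gradient descent needs the derivative of the activation function, and for this sigmoid it has the closed form A·φ(v)·(1 − φ(v)). The sketch below implements both and compares the closed form with a central finite difference. The function names and the test point are assumptions chosen for illustration.

```python
import math


def sigmoid(v, A=1.0):
    """Sigmoid with slope parameter A > 0."""
    return 1.0 / (1.0 + math.exp(-A * v))


def sigmoid_derivative(v, A=1.0):
    """Closed-form derivative: A * phi(v) * (1 - phi(v))."""
    s = sigmoid(v, A)
    return A * s * (1.0 - s)


def numerical_derivative(f, v, h=1e-6):
    """Central finite difference, used only to check the closed form."""
    return (f(v + h) - f(v - h)) / (2.0 * h)


analytic = sigmoid_derivative(0.3, A=2.0)
numeric = numerical_derivative(lambda v: sigmoid(v, A=2.0), 0.3)
```

Because the derivative can be computed from the neuron's own output, the backward pass does not need to re-evaluate the exponential.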
  • 41. Training Algorithm: Back-propagation ● The back-propagation algorithm learns in the same way as a single perceptron. ● It searches for weight values that minimize the total error of the network over the set of training examples (training set). ● Back-propagation consists of the repeated application of the following two passes: − Forward pass: in this step, the network is activated on one example and the error of (each neuron of) the output layer is computed. − Backward pass: in this step, the network error is used for updating the weights. The error is propagated backwards from the output layer through the network, layer by layer. This is done by recursively computing the local gradient of each neuron.
  • 42. Feed-forward Network Feed-forward networks often have one or more hidden layers of sigmoid neurons followed by an output layer of linear neurons. Multiple layers of neurons with nonlinear transfer functions allow the network to learn nonlinear and linear relationships between input and output vectors. The linear output layer lets the network produce values outside the range -1 to +1. On the other hand, if you want to constrain the outputs of a network (such as between 0 and 1), then the output layer should use a sigmoid transfer function (such as logsig).
  • 43. Backpropagation Learning Algorithm The following slides describe the teaching process of a multi-layer neural network employing the back-propagation algorithm. To illustrate this process, a three-layer neural network with two inputs and one output is used.
  • 44. Learning Algorithm: Backpropagation Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realises a nonlinear function, called the neuron transfer (activation) function. Signal e is the adder output signal, and y = f(e) is the output signal of the nonlinear element. Signal y is also the output signal of the neuron.
  • 45. Learning Algorithm: Backpropagation To teach the neural network we need a training data set. The training data set consists of input signals (x1 and x2) paired with the corresponding target (desired output) z. The network training is an iterative process. In each iteration the weight coefficients of the nodes are modified using new data from the training data set. The modification is calculated using the algorithm described below: each teaching step starts with applying both input signals from the training set. After this stage we can determine the output signal values for each neuron in each network layer.
  • 46. Learning Algorithm: Backpropagation The pictures illustrate how the signal propagates through the network. Symbols w(xm)n represent the weights of the connections between network input xm and neuron n in the input layer. Symbols yn represent the output signal of neuron n.
  • 49. Learning Algorithm: Backpropagation Propagation of signals through the hidden layer. Symbols wmn represent weights of connections between output of neuron m and input of neuron n in the next layer.
  • 52. Learning Algorithm: Backpropagation Propagation of signals through the output layer.
  • 53. Learning Algorithm: Backpropagation In the next algorithm step the output signal of the network y is compared with the desired output value (the target), which is found in the training data set. The difference is called the error signal δ of the output layer neuron.
  • 54. Learning Algorithm: Backpropagation The idea is to propagate the error signal δ (computed in a single teaching step) back to all neurons whose output signals were inputs to the neuron under discussion.
  • 56. Learning Algorithm: Backpropagation The weight coefficients wmn used to propagate errors back are the same as those used when computing the output value. Only the direction of data flow is changed (signals are propagated from the output to the inputs, one layer after the other). This technique is used for all network layers. If the propagated errors come from several neurons, they are added.
  • 57. Learning Algorithm: Backpropagation When the error signal for each neuron has been computed, the weight coefficients of each neuron's input connections may be modified. In the formulas, df(e)/de represents the derivative of the activation function of the neuron whose weights are being modified.
  • 60. Weight Update Rule ● The Backprop weight update rule is based on the gradient descent method: − It takes a step in the direction yielding the maximum decrease of the network error E. − This direction is the opposite of the gradient of E: $w_{ij} \leftarrow w_{ij} + \Delta w_{ij}$, with $\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}}$, where η is the learning rate. ● Iteration of the Backprop algorithm is usually terminated when the sum of squares of errors of the output values for all training data in an epoch is less than some threshold such as 0.01.
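The update rule, a step of size η against the gradient of E, can be seen in isolation on a one-weight toy problem. The error function E(w) = (w − 3)², the starting weight, the learning rate and the step count below are all assumptions chosen for illustration; a real network applies the same step to every weight at once.

```python
def gradient_descent(grad, w, eta, steps):
    """Repeat w <- w + delta_w with delta_w = -eta * dE/dw."""
    for _ in range(steps):
        delta_w = -eta * grad(w)
        w = w + delta_w
    return w


def error(w):
    """Toy error surface with its minimum at w = 3 (assumed for this example)."""
    return (w - 3.0) ** 2


def error_gradient(w):
    """Derivative of the toy error with respect to w."""
    return 2.0 * (w - 3.0)


w_final = gradient_descent(error_gradient, w=0.0, eta=0.1, steps=50)
```

With these values each step shrinks the distance to the minimum by a factor of 0.8, so the weight moves steadily towards 3.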
  • 61. Back-prop learning algorithm (incremental-mode)
      n = 1;
      initialize weights randomly;
      while (stopping criterion not satisfied and n < max_iterations)
        for each example (x, d)
          - run the network with input x and compute the output y
          - update the weights in backward order starting from those of the output layer: $w_{ji} \leftarrow w_{ji} + \Delta w_{ji}$, with $\Delta w_{ji}$ computed using the (generalized) Delta rule
        end-for
        n = n + 1;
      end-while;
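A minimal runnable sketch of the incremental-mode loop follows, assuming a network with one hidden layer of sigmoid neurons and a single sigmoid output trained on XOR. The layer sizes, learning rate, random seed, iteration cap and function names are all assumptions. The stopping test reuses the sum-of-squared-errors threshold of 0.01 mentioned on the weight update slide. Whether the threshold is reached depends on the random initial weights, so the sketch only reports the error before and after training.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, hidden_w, output_w):
    """Forward pass. Each weight row stores the bias first, then one weight per input."""
    hidden = [sigmoid(row[0] + sum(w * xi for w, xi in zip(row[1:], x)))
              for row in hidden_w]
    y = sigmoid(output_w[0] + sum(w * h for w, h in zip(output_w[1:], hidden)))
    return hidden, y


def total_error(samples, hidden_w, output_w):
    """Sum of squared output errors over all training examples."""
    return sum((d - forward(x, hidden_w, output_w)[1]) ** 2 for x, d in samples)


def train(samples, n_hidden=3, eta=0.5, max_iterations=5000, threshold=0.01, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    hidden_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    output_w = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    initial_error = total_error(samples, hidden_w, output_w)
    n = 1
    while total_error(samples, hidden_w, output_w) >= threshold and n < max_iterations:
        for x, d in samples:
            hidden, y = forward(x, hidden_w, output_w)
            # Local gradients (generalized delta rule), output layer first.
            delta_out = (d - y) * y * (1.0 - y)
            delta_hidden = [delta_out * output_w[j + 1] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            # Weight updates in backward order: output layer, then hidden layer.
            output_w[0] += eta * delta_out
            for j, h in enumerate(hidden):
                output_w[j + 1] += eta * delta_out * h
            for j, row in enumerate(hidden_w):
                row[0] += eta * delta_hidden[j]
                for i, xi in enumerate(x):
                    row[i + 1] += eta * delta_hidden[j] * xi
        n += 1
    return hidden_w, output_w, initial_error


XOR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
hidden_w, output_w, initial_error = train(XOR_SAMPLES)
final_error = total_error(XOR_SAMPLES, hidden_w, output_w)
```

The hidden-layer gradients are computed before the output weights change, so each example's update uses the same weights that produced its forward pass.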
  • 62. Stopping criteria ● Total mean squared error change: − Back-prop is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small (in the range [0.01, 0.1]). ● Generalization based criterion: − After each epoch, the NN is tested for generalization. − If the generalization performance is adequate then stop. − If this stopping criterion is used, then the part of the training set used for testing the network generalization is not used for updating the weights.
  • 63. NN Design Issues ● Data representation ● Network Topology ● Network Parameters ● Training ● Validation
  • 64. Data Representation ● Data representation depends on the problem. ● In general ANNs work on continuous (real valued) attributes. Therefore symbolic attributes are encoded into continuous ones. ● Attributes of different types may have different ranges of values, which affect the training process. ● Normalization may be used, like the following one, which scales each attribute to assume values between 0 and 1: $x_i \leftarrow \frac{x_i - \min_i}{\max_i - \min_i}$ for each value $x_i$ of the ith attribute, where $\min_i$ and $\max_i$ are the minimum and maximum value of that attribute over the training set.
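The min-max normalization described above can be sketched as follows. The function name, the sample attribute values, and the handling of a constant attribute (mapped to 0) are assumptions for illustration.

```python
def min_max_normalize(values):
    """Scale the values of one attribute to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical attribute values from a training set.
ages = [18, 30, 42, 66]
scaled = min_max_normalize(ages)
```

For new data, the minimum and maximum found on the training set would normally be reused, so values outside the training range can fall slightly outside [0, 1].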
  • 65. Network Topology ● The number of layers and neurons depends on the specific task. ● In practice this issue is solved by trial and error. ● Two types of adaptive algorithms can be used: − start from a large network and successively remove some neurons and links until network performance degrades. − begin with a small network and introduce new neurons until performance is satisfactory.
  • 66. Network parameters ● How are the weights initialized? ● How is the learning rate chosen? ● How many hidden layers and how many neurons? ● How many examples in the training set?
  • 67. Initialization of weights ● In general, initial weights are randomly chosen, with typical values between -1.0 and 1.0 or -0.5 and 0.5. ● If some inputs are much larger than others, random initialization may bias the network to give much more importance to larger inputs. ● In such a case, weights can be initialized as follows: − For weights from the input to the first layer: $w_{ij} = \pm \frac{1}{2N} \sum_{i=1,\dots,N} \frac{1}{|x_i|}$ − For weights from the first to the second layer: $w_{jk} = \pm \frac{1}{2N} \sum_{i=1,\dots,N} \frac{1}{\varphi\left(\sum w_{ij} x_i\right)}$
  • 68. Choice of learning rate ● The right value of η depends on the application. ● Values between 0.1 and 0.9 have been used in many applications. ● Another heuristic is to adapt η during training, as described in previous slides.
  • 69. Training ● Rule of thumb: − the number of training examples should be at least five to ten times the number of weights of the network. ● Other rule: $N > \frac{|W|}{1 - a}$, where |W| = number of weights and a = expected accuracy on the test set.
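Both sizing rules are simple arithmetic, sketched below for a 3-4-2 network like the one shown earlier. Counting one bias weight per neuron, the target accuracy of 0.9, and the function names are assumptions made for this example.

```python
def count_weights(layer_sizes):
    """Number of weights in a fully connected feed-forward network.

    Assumption: each neuron also has one bias weight.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def examples_by_accuracy(num_weights, accuracy):
    """Lower bound from the second rule: N > |W| / (1 - a)."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return num_weights / (1.0 - accuracy)


num_weights = count_weights([3, 4, 2])          # (3+1)*4 + (4+1)*2
rule_of_thumb = (5 * num_weights, 10 * num_weights)
bound = examples_by_accuracy(num_weights, 0.9)
```

For this network the rule of thumb suggests between 130 and 260 examples, and the accuracy-based rule gives a bound of about 260 for a = 0.9.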
  • 70. Contd.. ● The networks generated using these weights and input vectors are stable, except X2. ● X2 stabilizes to X1 (which is at hamming distance 1). ● Finally, with the obtained weights and stable states (X1 and X3), we can stabilize any new (partial) pattern to one of those