Artificial Neural Networks
Artificial Neural Networks (ANN)
• An artificial neural network (ANN) is a computing system designed to simulate the way the human brain analyzes and processes information.
• ANNs have self-learning capabilities that enable them to produce better results as more data becomes available.
• ANNs provide a general, practical method for learning real-valued, discrete-valued and vector-valued functions from examples.
• ANN learning is robust to errors in the training data.
• ANNs have been applied to problems such as interpreting visual scenes, speech recognition, robot control strategies, handwritten character recognition and face recognition.
Biological Motivation
• ANNs are inspired by biological learning systems.
• A biological learning system is made up of a complex web of interconnected neurons.
• ANNs are built out of a densely interconnected set of simple units, where each unit takes a number of real-valued inputs and produces a single real-valued output.
Facts from Neurobiology - Connectionist Models
Consider the human brain:
• Number of neurons ~ 10^11
• Connections per neuron ~ 10^4 to 10^5
• Neuron switching time ~ 10^-3 seconds (0.001 s)
• Computer switching time ~ 10^-10 seconds
• Scene recognition time ~ 10^-1 seconds (0.1 s)
→ Although individual neurons switch slowly, the information processing ability of biological neural systems points to highly parallel computation
→ The motivation for ANN systems is to capture this kind of highly parallel computation
Neural Network Representation - Example
• ALVINN - a learned ANN to steer an autonomous vehicle driving at normal speeds on public highways.
• The input to the NN is a 30x32 grid of pixel intensities obtained from a forward-pointing camera mounted on the vehicle.
• The output is the direction in which the vehicle is steered.
• ALVINN is trained from the steering commands of a human driving the vehicle for about 5 minutes.
• It has used its learned networks to drive successfully at speeds up to 70 miles/hour and for distances of 90 miles on public highways.
NN representation of ALVINN system
The left picture shows how the image from a forward-mounted camera is mapped to 960 NN inputs, which are fed forward to 4 hidden units, connected to 30 output units. These outputs encode the commanded steering direction.
The right picture shows the weight values for one of the hidden units in this network. The 30x32 weights into the hidden unit are displayed in the large matrix, with white blocks indicating positive weights and black blocks indicating negative weights. The weights from this hidden unit to the 30 output units are depicted by the smaller rectangular blocks.
Appropriate problems for NN Learning
• Instances are represented by many attribute-value pairs
• The target function output may be discrete-valued, real-valued or vector-valued
• The training examples may contain errors
• Long training times are acceptable
• Fast evaluation of the learned target function may be required
• The ability of humans to understand the learned target function is not important
• Alternative designs for the primitive units that make up an ANN:
  – Perceptrons
  – Linear units
  – Sigmoid units
• The backpropagation algorithm is the most commonly used ANN learning technique.
Perceptrons
• One type of ANN system is based on a unit called the perceptron.
• A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs a 1 if the result is greater than some threshold and -1 otherwise.
Perceptron
The perceptron computes the output
  o(x1, …, xn) = 1 if w0 + w1x1 + … + wnxn > 0, and -1 otherwise.
Sometimes we'll use the simpler vector notation:
  o(x) = sgn(w · x), where sgn(y) = 1 if y > 0 and -1 otherwise.
Learning a perceptron involves choosing values for the weights w0, …, wn.
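As a minimal illustrative sketch (not part of the original slides; the helper name perceptron_output is mine), this computation can be written directly in Python:

```python
# Minimal perceptron sketch: threshold a linear combination of the inputs.
def perceptron_output(weights, inputs):
    """weights = [w0, w1, ..., wn]; inputs = [x1, ..., xn] (x0 = 1 is implicit)."""
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if net > 0 else -1
```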
Representational power of Perceptrons
• We can view the perceptron as representing a hyperplane decision surface in the n-dimensional space of instances.
• The perceptron outputs a 1 for instances lying on one side of the hyperplane and outputs -1 for instances lying on the other side.
• Perceptrons can represent all of the primitive boolean functions AND, OR, NAND and NOR.
• Some boolean functions cannot be represented by a single perceptron, such as the XOR function, whose value is 1 if and only if x1 ≠ x2.
Decision Surface of two input Perceptron
x1 and x2 are the perceptron inputs.
(a) A set of training examples and the decision surface of a perceptron that classifies them correctly
(b) A set of training examples that is not linearly separable
• A single perceptron can be used to represent many boolean functions.
• E.g., +1 (true), -1 (false)
Represents some useful functions:
• Two-input AND gate if w0 = -0.8, w1 = +0.5, w2 = +0.5
• For (-1,-1) with x0 = 1:
  w0x0 + w1x1 + w2x2 = (-0.8 x 1) + (0.5 x (-1)) + (0.5 x (-1))
  = -0.8 - 0.5 - 0.5
  = -1.8 < 0 -> -1

AND truth table:
x1   x2   Output
-1   -1   -1
-1   +1   -1
+1   -1   -1
+1   +1   +1
• For (-1,+1) with x0 = 1:
  w0x0 + w1x1 + w2x2 = (-0.8 x 1) + (0.5 x (-1)) + (0.5 x (+1))
  = -0.8 - 0.5 + 0.5
  = -0.8 < 0 -> -1
• For (+1,-1) with x0 = 1:
  w0x0 + w1x1 + w2x2 = (-0.8 x 1) + (0.5 x (+1)) + (0.5 x (-1))
  = -0.8 + 0.5 - 0.5
  = -0.8 < 0 -> -1
• For (+1,+1) with x0 = 1:
  w0x0 + w1x1 + w2x2 = (-0.8 x 1) + (0.5 x (+1)) + (0.5 x (+1))
  = -0.8 + 0.5 + 0.5
  = 0.2 > 0 -> +1
• Two-input OR gate if w0 = 0.1, w1 = +0.1, w2 = +0.1
• For (-1,-1) with x0 = 1:
  w0x0 + w1x1 + w2x2 = (0.1 x 1) + (0.1 x (-1)) + (0.1 x (-1))
  = 0.1 - 0.1 - 0.1
  = -0.1 < 0 -> -1
• Similarly calculate for (-1,+1), (+1,-1), (+1,+1)
• NOT gate: w0 = 0.5, w1 = -1

OR truth table:
x1   x2   Output
-1   -1   -1
-1   +1   +1
+1   -1   +1
+1   +1   +1
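As a quick check (an illustrative sketch, not part of the slides), a small script can reproduce these truth tables with the weight vectors given above:

```python
# Verify the AND and OR perceptrons from the slides.
def perceptron_output(weights, inputs):
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if net > 0 else -1

and_weights = [-0.8, 0.5, 0.5]   # w0, w1, w2 for the two-input AND
or_weights = [0.1, 0.1, 0.1]     # w0, w1, w2 for the two-input OR

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print((x1, x2),
          "AND:", perceptron_output(and_weights, [x1, x2]),
          "OR:", perceptron_output(or_weights, [x1, x2]))
```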
• Two-input XOR gate
• With a single perceptron, implementation of the XOR boolean function is not possible because the samples are not linearly separable.

XOR truth table:
x1   x2   Output
-1   -1   -1
-1   +1   +1
+1   -1   +1
+1   +1   -1
Perceptron training rule
• The learning problem is to determine a weight
vector that causes the perceptron to produce
the correct +1/-1 output for each of the given
training examples
• Algorithms to solve this learning problem are
– Perceptron rule
– Delta rule
Perceptron Training Rule
• One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example.
• This process is repeated, iterating through the training examples as many times as needed, until the perceptron classifies all training examples correctly.
Perceptron Rule
• Weights are modified at each step according to
perceptron training rule which revises the weight wi
associated with input xi
according to the rule
wi
← wi
+ Δwi
whereΔwi
= η (t – o) xi
Where:
– t is target output for the current training example
– o is perceptron output or output generated by the
hypothesis
– η is positive constant (e.g., 0.1) called learning rate
• The role of the learning rate is to moderate the
degree to which weights are changed at each step. It
is usually set to some small value
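A minimal sketch of one pass of this rule over the training examples (illustrative only; the function name and data layout are mine, with each example given as an (inputs, target) pair):

```python
# One epoch of the perceptron training rule: w_i <- w_i + eta * (t - o) * x_i.
def perceptron_rule_epoch(weights, examples, eta=0.1):
    """weights = [w0, w1, ..., wn]; examples = [([x1, ..., xn], target), ...]."""
    for inputs, target in examples:
        net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
        output = 1 if net > 0 else -1
        weights[0] += eta * (target - output) * 1   # x0 = 1 for the bias weight
        for i, x in enumerate(inputs, start=1):
            weights[i] += eta * (target - output) * x
    return weights
```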
Perceptron Rule
• It can be proved that the rule converges
  – if the training data is linearly separable
  – and η is sufficiently small
• Limitations
  – It can fail to converge if the examples are not linearly separable
Gradient Descent and Delta Rule
• If the training examples are not linearly separable, the delta rule converges toward a best-fit approximation to the target concept.
• The delta rule is a variant of the LMS (least mean squares) rule.
• The key idea behind the delta rule is to use gradient descent to search the hypothesis space of possible weight vectors to find the weights that best fit the training examples.
• Gradient descent provides the basis for the backpropagation algorithm, which can learn networks with many interconnected units.
• The training error of a hypothesis (weight vector) relative to the training examples can be measured as
  E(w) = 1/2 Σd∈D (td - od)^2
Where:
  – D is the set of training examples
  – td is the target output for training example d
  – od is the output of the linear unit for training example d
This error is half the squared difference between the target output td and the linear unit output od, summed over all training examples.
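As a small illustration (a sketch, not from the slides; the example data is made up), the training error for a linear unit can be computed directly from this definition:

```python
# E(w) = 1/2 * sum over training examples of (t_d - o_d)^2 for a linear unit.
def linear_output(weights, inputs):
    return weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))

def training_error(weights, examples):
    return 0.5 * sum((t - linear_output(weights, x)) ** 2 for x, t in examples)

# Hypothetical data: two one-input examples of the form ([x1], t).
print(training_error([0.0, 1.0], [([2.0], 1.0), ([0.5], 0.0)]))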
Visualizing the Hypothesis Space
• To understand the gradient descent algorithm, it
is helpful to visualize the entire hypothesis space
of possible weight vectors and their associated E
values.
• Here the axes w0 and w1 represent possible values for the two weights of a simple linear unit.
• The w0, w1 plane represents the entire hypothesis space.
• The vertical axis indicates the error E relative to some fixed set of training examples.
Gradient Descent (1/4)
• Gradient descent search determines a weight
vector that minimizes E by starting with an
arbitrary initial weight vector, then
repeatedly modifying it in small steps.
• At each step, the weight vector is altered in
the direction that produces the steepest
descent along the error surface.
• This process continues until the global
minimum error is reached.
Gradient Descent (2/4)
The gradient of E with respect to the weight vector w is the vector of partial derivatives:
  ∇E(w) = [∂E/∂w0, ∂E/∂w1, …, ∂E/∂wn]
Training rule:
  Δw = -η ∇E(w)
i.e.,
  Δwi = -η ∂E/∂wi
Gradient Descent (3/4)
Differentiating E for the linear unit gives
  ∂E/∂wi = Σd∈D (td - od)(-xid)
so the weight update becomes
  Δwi = η Σd∈D (td - od) xid
Gradient Descent Algorithm for training a linear unit (4/4)
Gradient-Descent(training_examples, η)
Each training example is a pair of the form <x, t>, where x is the vector of input values and t is the target output value. η is the learning rate.
• Initialize each wi to some small random value
• Until the termination condition is met, Do
  – Initialize each Δwi to zero
  – For each <x, t> in training_examples, Do
    * Input the instance x to the unit and compute the output o
    * For each linear unit weight wi, Do
        Δwi ← Δwi + η (t - o) xi
  – For each linear unit weight wi, Do
        wi ← wi + Δwi
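The following is a minimal Python sketch of this batch procedure (the name train_linear_unit and the data layout are mine; each example is assumed to be a pair of an input list and a target value):

```python
import random

# Batch gradient descent for a linear unit: accumulate the deltas over all
# examples, then apply them once per pass.
def train_linear_unit(examples, eta=0.05, epochs=100):
    n = len(examples[0][0])
    weights = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(epochs):
        deltas = [0.0] * (n + 1)
        for inputs, target in examples:
            output = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
            error = target - output
            deltas[0] += eta * error          # x0 = 1 for the bias weight
            for i, x in enumerate(inputs, start=1):
                deltas[i] += eta * error * x
        weights = [w + d for w, d in zip(weights, deltas)]
    return weights
```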
Stochastic Approximation to Gradient
Descent
• Difficulties with gradient descent:
  – Converging to a minimum can be quite slow, requiring many gradient descent steps
  – If there are multiple local minima in the error surface, there is no guarantee that the procedure will find the global minimum
• One variation on gradient descent is incremental gradient descent, or stochastic gradient descent, which updates the weights incrementally, following the calculation of the error for each individual example.
Incremental (Stochastic) Gradient Descent (1/2)
Batch mode Gradient Descent:
Do until satisfied
  1. Compute the gradient ∇ED[w]
  2. w ← w - η ∇ED[w]

Incremental mode Gradient Descent:
Do until satisfied
  • For each training example d in D
    1. Compute the gradient ∇Ed[w]
    2. w ← w - η ∇Ed[w]
Incremental (Stochastic) Gradient Descent (2/2)
Incremental gradient descent can approximate batch gradient descent arbitrarily closely if η is made small enough.
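A matching sketch of the incremental (stochastic) variant, under the same assumptions as the batch sketch above (names and data layout are mine):

```python
import random

# Incremental (stochastic) gradient descent for a linear unit:
# the weights are updated after every example, not after a full pass.
def train_linear_unit_sgd(examples, eta=0.05, epochs=100):
    n = len(examples[0][0])
    weights = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(epochs):
        for inputs, target in examples:
            output = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
            error = target - output
            weights[0] += eta * error         # x0 = 1 for the bias weight
            for i, x in enumerate(inputs, start=1):
                weights[i] += eta * error * x
    return weights
```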
Differences between standard gradient descent and stochastic gradient descent
• Standard gradient descent: the error is summed over all examples before the weights are updated. Stochastic gradient descent: the weights are updated upon examining each training example.
• Standard gradient descent: summing over multiple examples per update requires more computation. Stochastic gradient descent: less computation per update.
• Standard gradient descent: can fall into local minima. Stochastic gradient descent: can sometimes avoid falling into local minima.
Multilayer Neural Networks (Multilayer Perceptrons)
• Single perceptrons can only express linear decision surfaces.
• Multilayer networks learned by the backpropagation algorithm are capable of expressing nonlinear decision surfaces.
Decision regions of a multilayer
feedforward network
Speech Recognition Task
• It involves distinguishing among 10 possible
vowels, all spoken in the context of h-d (hid,
had, head, hood etc)
• The input speech signal is represented by two numerical parameters (F1, F2) obtained from a spectral analysis of the sound.
• The 10 network outputs correspond to 10
possible vowel sounds
• The network prediction is the output whose
value is highest.
Differentiable Threshold Unit
• The unit used as the basis for constructing multilayer networks is the sigmoid unit, which is very similar to the perceptron but is based on a smoothed, differentiable threshold function.
• Like the perceptron, the sigmoid unit first computes a linear combination of its inputs, then applies a threshold to the result; for the sigmoid unit, however, the thresholded output is a continuous function of its input.
Sigmoid Unit
The sigmoid unit computes o = σ(w · x), where σ(y) = 1 / (1 + e^-y) is the sigmoid function.
• The sigmoid function has the nice property that its derivative is easily expressed in terms of its output.
• Nice property: dσ(y)/dy = σ(y) (1 - σ(y))
We can derive gradient descent rules to train
• One sigmoid unit
• Multilayer networks of sigmoid units → Backpropagation
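A short sketch of the sigmoid unit and the "nice property" in Python (illustrative; the helper names are mine):

```python
import math

# Sigmoid function, its derivative expressed via its own output, and a sigmoid unit.
def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_derivative(y):
    o = sigmoid(y)
    return o * (1.0 - o)   # d(sigma)/dy = sigma(y) * (1 - sigma(y))

def sigmoid_unit_output(weights, inputs):
    """weights = [w0, w1, ..., wn]; inputs = [x1, ..., xn]."""
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return sigmoid(net)
```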
Sigmoid Function
Backpropagation Algorithm
• The backpropagation algorithm learns the weights
for a multilayer network, given a network with a
fixed set of units and interconnections.
• It employs gradient descent to attempt to
minimize the squared error between the network
output values and target values for these outputs.
• Because we are considering networks with multiple output units rather than a single unit, we begin by redefining E to sum the errors over all of the network output units:
  E(w) = 1/2 Σd∈D Σk∈outputs (tkd - okd)^2
Notations/Extensions
• An index is assigned to each node in the
network, where a node is either an input to
the network or the output of some unit in the
network
• xij denotes the input from node i to unit j, and wij denotes the corresponding weight
• δn denotes the error term associated with unit n
Backpropagation Algorithm for feedforward
networks containing two layers of sigmoid units
Backpropagation(training_examples, η, nin, nout, nhidden)
Each training example is a pair of the form <x, t>, where x is the vector of input values and t is the vector of target output values. η is the learning rate (e.g., 0.05). nin is the number of network inputs, nhidden is the number of units in the hidden layer, and nout is the number of output units. The input from unit i to unit j is denoted by xij, and the weight from unit i to unit j is denoted by wij.
• Create a feed-forward network with nin inputs, nhidden hidden units and nout output units.
• Initialize all network weights to small random numbers (between -0.05 and 0.05).
• Until the termination condition is met, Do
  • For each training example <x, t>, Do
    Propagate the input forward through the network:
    1. Input the training example x to the network and compute the output ou of every unit u in the network.
    Propagate the errors backward through the network:
    2. For each output unit k, calculate its error term δk
       δk ← ok (1 - ok) (tk - ok)
    3. For each hidden unit h, calculate its error term δh
       δh ← oh (1 - oh) Σ k∈outputs wh,k δk
    4. Update each network weight wi,j
       wi,j ← wi,j + Δwi,j, where Δwi,j = η δj xi,j
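A minimal Python sketch of this two-layer algorithm follows (the names backpropagation, init_layer, etc. are mine, not from the slides; a single hidden layer of sigmoid units and per-example updates are assumed):

```python
import math
import random

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def init_layer(n_units, n_inputs):
    # One weight list per unit; index 0 is the bias weight (x0 = 1).
    return [[random.uniform(-0.05, 0.05) for _ in range(n_inputs + 1)]
            for _ in range(n_units)]

def layer_outputs(layer, inputs):
    return [sigmoid(w[0] + sum(wi * x for wi, x in zip(w[1:], inputs)))
            for w in layer]

def backpropagation(examples, n_in, n_hidden, n_out, eta=0.05, epochs=1000):
    hidden = init_layer(n_hidden, n_in)
    output = init_layer(n_out, n_hidden)
    for _ in range(epochs):
        for x, t in examples:
            # Propagate the input forward.
            h = layer_outputs(hidden, x)
            o = layer_outputs(output, h)
            # Propagate the errors backward.
            delta_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
            delta_h = [hj * (1 - hj) * sum(output[k][j + 1] * delta_o[k]
                                           for k in range(n_out))
                       for j, hj in enumerate(h)]
            # Update each weight: w_ij <- w_ij + eta * delta_j * x_ij.
            for k in range(n_out):
                output[k][0] += eta * delta_o[k]
                for j in range(n_hidden):
                    output[k][j + 1] += eta * delta_o[k] * h[j]
            for j in range(n_hidden):
                hidden[j][0] += eta * delta_h[j]
                for i in range(n_in):
                    hidden[j][i + 1] += eta * delta_h[j] * x[i]
    return hidden, output
```

For instance, calling backpropagation([([0,0],[0.1]), ([0,1],[0.9]), ([1,0],[0.9]), ([1,1],[0.1])], 2, 2, 1, eta=0.3, epochs=5000) would attempt to learn an XOR-like function that a single perceptron cannot represent.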
More on Backpropagation
• Gradient descent over the entire network weight vector
• Easily generalized to arbitrary acyclic directed graphs
• Will find a local, not necessarily global, error minimum
  – In practice, it often works well (can run multiple times with different initial weights)
• A weight momentum term α is often included to speed up convergence:
  Δwi,j(n) = η δj xi,j + α Δwi,j(n - 1)
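As a small sketch of the momentum formula above (illustrative only; the names momentum_update and delta_prev are mine), the update keeps part of the previous step's weight change:

```python
# Weight update with momentum: delta(n) = eta * grad_term + alpha * delta(n-1).
def momentum_update(weights, grad_terms, delta_prev, eta=0.05, alpha=0.9):
    """grad_terms[i] plays the role of delta_j * x_ij for weight i."""
    deltas = [eta * g + alpha * dp for g, dp in zip(grad_terms, delta_prev)]
    new_weights = [w + d for w, d in zip(weights, deltas)]
    return new_weights, deltas
```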
Learning Hidden Layer Representations (1/2)
A target function:
Can this be learned?
Learning Hidden Layer Representations (2/2)
A network: Learned hidden layer representation:
An illustrative Example: Face recognition
• To illustrate some of the practical design choices involved in applying backpropagation, consider a face recognition task.
• Learning task
  – Classifying camera images of faces of various people in various poses.
  – Images of 20 different people were collected, with approximately 32 images per person, varying the person's expression (happy, sad, angry, neutral), the direction in which they were looking (left, right, straight ahead, up), and whether or not they were wearing sunglasses.
– Other variations
• Background behind the person
• The clothing worn by the person
• Position of the person’s face within the image
Target Functions
• A variety of target functions can be learned
from this image data.
• Given an image as input, we could train an ANN to output the identity of the person, the direction in which the person is facing, the gender of the person, whether they are wearing sunglasses, etc.
• Consider the learning task as
– Learning the direction in which the person is
facing (to their left, right, straight, upward)
Neural Nets for Face Recognition
• 90% accuracy at learning head pose, and at recognizing 1 of 20 faces
Learned Hidden Unit Weights
• Each output unit has four weights, displayed as blocks: dark indicates a negative weight, light a positive weight.
• Leftmost block: the weight w0, which determines the unit threshold.
• Right 3 blocks: the weights on the inputs from the three hidden units.
Design Choices
• Input Encoding
• Output Encoding
• Network graph structure
• Other learning algorithm parameters
Input Encoding
• One option is to preprocess the image to extract edges, regions of uniform intensity or other local image features, and then input these features to the network.
• This leads to a variable number of features (e.g., edges) per image, whereas the ANN has a fixed number of input units.
• Instead, the image is encoded as a fixed set of pixel intensity values; the intensity values, ranging from 0 to 255, are linearly scaled to the interval 0 to 1.
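A one-line sketch of this scaling (illustrative; the helper name encode_image is mine):

```python
# Linearly scale 0-255 pixel intensities to the interval [0, 1].
def encode_image(pixels):
    """pixels: a flat list of intensity values in the range 0..255."""
    return [p / 255.0 for p in pixels]
```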
Output Encoding
• The ANN must output one of four values indicating the direction in which the person is looking.
• We could encode this four-way classification using a single output unit, assigning outputs of 0.2, 0.4, 0.6 and 0.8 to encode the four possible values.
• Instead, four distinct output units are used, each representing one of the four possible face directions, with the highest-valued output taken as the network prediction.
• This is called 1-of-n output encoding.
• Obvious choices
  – To encode a face looking to the left: 1, 0, 0, 0
  – To encode a face looking straight ahead: 0, 1, 0, 0
• Target output vector used instead
  – 0.9, 0.1, 0.1, 0.1 (values of 0.9 and 0.1 are used rather than 1 and 0 because a sigmoid unit cannot produce outputs of exactly 0 or 1 for finite weights)
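A small sketch of this 1-of-n encoding and decoding (illustrative; the ordering of the direction names is my assumption, not from the slides):

```python
# 1-of-n output encoding for the four face directions, using 0.9/0.1 targets.
DIRECTIONS = ["left", "straight", "right", "up"]

def encode_direction(direction):
    return [0.9 if d == direction else 0.1 for d in DIRECTIONS]

def decode_outputs(outputs):
    # The network prediction is the direction whose output unit is highest.
    return DIRECTIONS[outputs.index(max(outputs))]
```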
Network Graph Structure
• Backpropagation can be applied to any acyclic
directed graph of sigmoid units.
• The design choice here is how many units to include in the network and how to interconnect them.
• Standard structure is two layers of sigmoid
units (one hidden layer and one output layer)
Other learning algorithm parameters
• Learning rate η = 0.3
• A weight momentum term α is included to speed up convergence:
  Δwi,j(n) = η δj xi,j + α Δwi,j(n - 1)
• Momentum α = 0.3