1. Kazan National Research Technical University named after A.N. Tupolev, German-Russian Institute of Advanced Technologies (GRIAT)
NEURAL NETWORKS by Dr. Igor Anikin
2. Table of contents
The basic concepts of neural networks
  Artificial neural networks.
  The structure of an artificial neuron.
  Activation functions.
  Basic paradigms of neural networks.
  Fundamentals of learning and training samples.
  Using neural networks in practice.
Single layer neural networks
  Rosenblatt's single layer perceptron.
  Learning single layer neural networks.
  Associative memory and its realization on single layer neural networks.
  Using single layer neural networks for pattern recognition and time series forecasting.
Multilayer perceptrons
  The structure of multilayer perceptrons.
  Backpropagation of error.
  Using multilayer perceptrons for pattern recognition and time series forecasting.
Self-organizing maps
  The principle of unsupervised learning.
  Kohonen self-organizing maps.
  Learning Kohonen networks.
  Practical use of Kohonen networks.
Recurrent neural networks
  Neural networks with feedback.
  Hopfield neural network.
  Hamming neural network.
  Training Hopfield and Hamming neural networks.
  Practical use of Hopfield and Hamming neural networks.
Training and testing
  Training error and testing error.
5. The basic concepts of neural networks
6. Questions for motivational discussion
What tasks are machines good at doing that humans are not?
What tasks are humans good at doing that machines are not?
What tasks are both good at?
What does it mean to learn?
How is learning related to intelligence?
What does it mean to be intelligent?
Do you believe a machine will ever be intelligent?
If a computer were intelligent, how would you know?
7. Types of learning
Knowledge acquisition from an expert.
Knowledge acquisition from data:
Supervised learning: the system is supplied with a set of training examples consisting of inputs and corresponding outputs, and is required to discover the relation or mapping between them.
Unsupervised learning: the system is supplied with a set of training examples consisting only of inputs, and is required to discover what the appropriate outputs should be.
8. Artificial Neural Network
An extremely simplified model of the human brain.
Transforms inputs into the best outputs (some neural networks are universal function approximators).
9. Artificial Neural Networks
The development of neural networks dates back to the early 1940s.
Neural networks experienced an upsurge in popularity in the late 1980s due to the discovery of new training techniques.
Some NNs are models of biological neural networks and some are not, but historically, much of the inspiration for the field of NNs came from the desire to produce artificial systems capable of sophisticated, perhaps intelligent, computations similar to those that the human brain routinely performs, and thereby possibly to enhance our understanding of the human brain.
Most NNs have some sort of training rule. In other words, NNs learn from examples (as children learn to recognize dogs from examples of dogs) and exhibit some capability for generalization beyond the training data.
10. ANN vs Computers
Computers have to be explicitly programmed:
Analyze the problem to be solved.
Write the code in a programming language.
Neural networks learn from examples:
No explicit description of the problem is required.
No programmer is needed.
The neural computer adapts itself during a training period, based on examples of similar problems, even without a desired solution to each problem. After sufficient training the neural computer is able to relate the problem data to the solutions, inputs to outputs, and it is then able to offer a viable solution to a brand new problem.
11. ANN vs Computers
Digital computers:
  Deductive reasoning. We apply known rules to input data to produce output.
  Computation is centralized, synchronous, and serial.
  Memory is literally stored and location addressable.
  Not fault tolerant. If one transistor fails, the computer no longer works.
  Exact.
  Static connectivity.
  Applicable if well-defined rules are available and the input data are precise.
Neural networks:
  Inductive reasoning. We use given input and output data (training examples) to infer the rules.
  Computation is collective, asynchronous, and parallel.
  Memory is distributed, internalized, short term and content addressable.
  Fault tolerant through redundancy and sharing of responsibilities.
  Inexact.
  Dynamic connectivity.
  Applicable if the rules are unknown or complicated, or if the data are noisy or partial.
12. Biological neuron
13. Biological neuron
Many “neurons” co-operate to perform the desired function.
Basic elements:
Axon
Dendrite
Synapse
14. Artificial Neuron Structure
The output of a neuron is a function of the weighted sum of the inputs plus a bias:
$$S = \sum_{j=1}^{n} x_j w_j + b, \qquad y = f(S)$$
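A minimal Python sketch of this neuron model follows. It assumes a logistic sigmoid as the activation f (the slide leaves f open), and the helper names `neuron_output` and `sigmoid` are illustrative only.

```python
import math
from typing import Callable, Sequence


def sigmoid(s: float) -> float:
    """Logistic sigmoid, one possible choice for the activation f."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron_output(x: Sequence[float], w: Sequence[float], b: float,
                  f: Callable[[float], float]) -> float:
    """Return y = f(S) with S = sum_j x_j * w_j + b."""
    S = sum(xj * wj for xj, wj in zip(x, w)) + b
    return f(S)


# S = 1.0*0.5 + 2.0*(-0.25) + 0.0 = 0, so the sigmoid gives 0.5
y = neuron_output([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)
```

Passing the activation as a parameter lets the same function model a linear neuron (`lambda s: s`) or a threshold neuron.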
15. Common activation functions
17. Examples of ANN topologies
Single layer ANN
Multilayer ANN
ANN with one recurrent layer
18. Fundamentals of learning and training samples
The weights in a neural network are the most important factor in determining its function.
A training set is a set of training patterns which we use to train our neural net.
Training is the act of presenting the network with some sample data and modifying the weights to better approximate the desired function.
19. Fundamentals of learning and training samples
There are two main types of training.
Supervised training
Supplies the neural network with inputs and the correct outputs (results).
We can estimate an error vector for a given input.
The response of the network to the inputs is measured.
The weights are modified to reduce the difference between the actual and desired outputs.
Unsupervised training
The training set only consists of input patterns.
The neural network adjusts its own weights so that similar inputs cause similar outputs. The network identifies the patterns and differences in the inputs without any external assistance.
20. Fundamentals of learning and training samples
A training pattern is an input vector p with the components $x_1, x_2, \ldots, x_n$ whose desired output is known.
By entering the training pattern into the network we receive an output that can be compared with the desired output.
The set of training patterns is called P. It contains a finite number of ordered pairs (p, t) of training patterns with the corresponding desired output t.
21. Fundamentals of learning and training samples
Teaching input. Let j be an output neuron. The teaching input $t_j$ is the desired and correct value that j should output after the input of a certain training pattern.
Analogously to the vector p, the teaching inputs $t_1, t_2, \ldots, t_n$ of the neurons can also be combined into a vector t. This vector always refers to a specific training pattern p and is contained in the set P of training patterns.
22. Fundamentals of learning and training samples
Error vector. For several output neurons $\Omega_1, \Omega_2, \ldots, \Omega_O$, the difference between the output vector and the teaching input under a training input p is referred to as the error vector:
$$E_p = \begin{pmatrix} t_1 - y_1 \\ \vdots \\ t_O - y_O \end{pmatrix}$$
23. Fundamentals of learning
Let P be the set of training patterns. In the learning procedure we perform a finite number of iterations or epochs.
Epoch: a single presentation of the entire data set to the neural network. Typically many epochs are required to train the neural network.
Iteration: the process of providing the network with a single input and updating the network's weights.
24. General learning procedure
Let P be the set of n training patterns $p_1, \ldots, p_n$.
For i = 1 to n
begin
  1. Calculate the NN output vector $y_i$ for the training pattern $p_i$.
  2. Compare $y_i$ with the desired output $t_i$, calculate the output error and modify the weights.
end
3. If the total error for the training set P is greater than some threshold, repeat the loop (start a new epoch).
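The procedure can be sketched as a generic epoch loop in Python. This is an illustration only: `train`, `forward` and `update` are hypothetical names, the total error is assumed to be a sum of squared errors, and the toy model (a single weight w with output y = w·p and a simple error-correction update) stands in for a real network.

```python
def train(patterns, forward, update, threshold, max_epochs=1000):
    """Generic epoch loop; returns the number of epochs that were run."""
    for epoch in range(1, max_epochs + 1):
        for p, t in patterns:                # one iteration per training pattern
            y = forward(p)                   # step 1: output for pattern p
            update(p, t, y)                  # step 2: compare with t, modify weights
        # step 3: total error over P (sum of squared errors is an assumption here)
        total = sum(0.5 * (t - forward(p)) ** 2 for p, t in patterns)
        if total <= threshold:
            return epoch
    return max_epochs


# Hypothetical toy model: a single weight w with output y = w * p
w = [0.0]


def forward(p):
    return w[0] * p


def update(p, t, y):
    w[0] += 0.1 * (t - y) * p                # simple error-correction step


epochs = train([(1.0, 2.0)], forward, update, threshold=1e-3)
```

Keeping the model behind `forward`/`update` callables separates the epoch/iteration bookkeeping from any particular learning rule.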
25. Using training samples
We have to divide the set of training samples into two subsets:
one training set, really used to train;
one verification set, used to test our progress in learning.
A usual split is 70% for training data and 30% for verification data (randomly chosen).
We can finish the training process when the network provides good results on the training data as well as on the verification data.
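A random 70/30 split can be sketched with Python's standard `random` module. The function name `split_samples` and its parameters are hypothetical, chosen for this example.

```python
import random


def split_samples(samples, train_fraction=0.7, seed=None):
    """Randomly split samples into (training set, verification set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)      # random choice of samples
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


training_set, verification_set = split_samples(range(10), seed=42)
```

Using a private `random.Random(seed)` makes the split reproducible without touching the global random state.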
26. Learning curve
The learning curve shows the progress of the error, which can be determined in various ways. This curve indicates whether the network is progressing or not.
27. Error measurement
Let Ω be an output neuron and O the set of output neurons.
The specific error $Err_p$ is based on a single training sample:
$$Err_p = \frac{1}{2} \sum_{\Omega \in O} (t_\Omega - y_\Omega)^2$$
The total error Err is based on all training samples:
$$Err = \sum_{p \in P} Err_p$$
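A short sketch of these error measures, assuming the common squared-error definition $Err_p = \frac{1}{2}\sum_{\Omega \in O}(t_\Omega - y_\Omega)^2$ together with the error vector $E_p = t - y$. The function names and sample numbers are hypothetical.

```python
def error_vector(t, y):
    """E_p = (t_1 - y_1, ..., t_O - y_O) for one training pattern p."""
    return [tj - yj for tj, yj in zip(t, y)]


def specific_error(t, y):
    """Err_p = 1/2 * sum over output neurons of (t_Omega - y_Omega)^2 (assumed definition)."""
    return 0.5 * sum(e * e for e in error_vector(t, y))


def total_error(targets, outputs):
    """Err = sum of Err_p over all training patterns p in P."""
    return sum(specific_error(t, y) for t, y in zip(targets, outputs))


targets = [[1.0, 0.0], [0.0, 1.0]]   # teaching inputs t for two patterns
outputs = [[0.5, 0.5], [0.0, 1.0]]   # network outputs y for the same patterns
Err = total_error(targets, outputs)  # 0.25 + 0.0
```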
28. When do we stop learning?
Generally, the training process is stopped when the user in front of the learning computer "thinks" the error is small enough.
29. Using neural networks in practice (discussion)
Classification
In marketing: consumer spending pattern classification
In defence: radar and sonar image classification
In medicine: ultrasound and electrocardiogram image
classification, EEGs, medical diagnosis
Recognition and identification
In general computing and telecommunications: speech, vision and handwriting recognition
In finance: signature verification and bank note verification
Assessment
In engineering: product inspection monitoring and control
In defence: target tracking
In security: motion detection, surveillance image analysis and
fingerprint matching
Forecasting and prediction
In finance: foreign exchange rate and stock market forecasting
In agriculture: crop yield forecasting
In marketing: sales forecasting
In meteorology: weather prediction
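30. Single layer neural networks
31. Single layer network with binary threshold activation function
$$y_j = F(S_j) = F\Big(\sum_{i=1}^{n} w_{ij} x_i - T_j\Big)$$
Matrix form
$$W = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{pmatrix}, \qquad S = W^T X - T$$
where $X = (x_1, \ldots, x_n)^T$ and $T = (T_1, \ldots, T_m)^T$.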
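The direct and matrix forms can be compared in a small pure-Python sketch. It assumes the threshold convention F(S) = 1 for S ≥ 0; the weights W, thresholds T and function names are made up for illustration.

```python
def threshold(s):
    """Binary threshold activation: 1 for S >= 0, 0 otherwise."""
    return 1 if s >= 0 else 0


def layer_direct(W, T, x):
    """y_j = F(sum_i w_ij x_i - T_j); W is n x m (W[i][j] links input i to neuron j)."""
    n, m = len(W), len(W[0])
    return [threshold(sum(W[i][j] * x[i] for i in range(n)) - T[j]) for j in range(m)]


def layer_matrix(W, T, x):
    """Same layer in matrix form: S = W^T X - T, Y = F(S)."""
    WT = [list(col) for col in zip(*W)]                  # W^T, an m x n matrix
    S = [sum(a * b for a, b in zip(row, x)) - Tj for row, Tj in zip(WT, T)]
    return [threshold(s) for s in S]


W = [[1.0, -1.0],      # weights from input x1 to neurons 1 and 2
     [1.0, 2.0]]       # weights from input x2 to neurons 1 and 2
T = [1.5, 0.5]
y = layer_direct(W, T, [0, 1])   # S = (-0.5, 1.5) -> y = [0, 1]
```

Both functions compute the same thing; the matrix version just makes the transpose in $S = W^T X - T$ explicit.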
32. Single layer network with binary threshold activation function
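$$y = \begin{cases} 1, & S \ge 0 \\ 0, & S < 0 \end{cases}$$
$$S = w_{11} x_1 + w_{21} x_2 - T_1$$
The decision boundary is the line
$$w_{11} x_1 + w_{21} x_2 - T_1 = 0 \quad\Longleftrightarrow\quad x_2 = \frac{T_1}{w_{21}} - \frac{w_{11}}{w_{21}} x_1$$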
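As a sketch, the boundary line and the resulting classification can be computed directly. The weights below are hand-picked (an illustrative assumption) so that the neuron realizes logical AND on {0,1}².

```python
def boundary(w11, w21, T1):
    """Slope and intercept of w11*x1 + w21*x2 - T1 = 0 solved for x2 (needs w21 != 0)."""
    return -w11 / w21, T1 / w21


def classify(w11, w21, T1, x1, x2):
    """Binary threshold neuron: 1 where S >= 0, 0 where S < 0."""
    return 1 if w11 * x1 + w21 * x2 - T1 >= 0 else 0


# Hand-picked weights realising logical AND on {0,1}^2 (an illustrative choice)
w11, w21, T1 = 1.0, 1.0, 1.5
slope, intercept = boundary(w11, w21, T1)          # x2 = 1.5 - x1
labels = {(a, b): classify(w11, w21, T1, a, b) for a in (0, 1) for b in (0, 1)}
```

Only the point (1, 1) lies on the positive side of the line $x_2 = 1.5 - x_1$.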
33. Practice with single layer neural network
1. Performing calculations in single layer neural networks using the direct and matrix forms. Using various activation functions.
2. Using single layer neural networks with a binary threshold activation function as a linear classifier. Adjusting the linear classifier based on training samples.
34. Hebbian learning rule
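- Introduced by Donald Hebb in his 1949 book “The Organization of Behavior”.
- Describes a basic mechanism for synaptic plasticity.
$$w_{ij}(0) = 0 \;\; \forall i, j, \qquad w_{ij}(t+1) = w_{ij}(t) + x_i y_j, \text{ where } t \text{ is time}$$
For a neuron with two inputs:
$$S = w_{11} x_1 + w_{21} x_2 - T$$
$$w_{11}(t+1) = w_{11}(t) + x_1 y_1, \quad w_{21}(t+1) = w_{21}(t) + x_2 y_1, \quad T(t+1) = T(t) - y_1$$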
35. Hebbian learning rule (matrix form)
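$$X = \begin{pmatrix} x_1^1 & x_2^1 & \cdots & x_n^1 \\ x_1^2 & x_2^2 & \cdots & x_n^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_1^L & x_2^L & \cdots & x_n^L \end{pmatrix}, \quad \text{where } X^i = (x_1^i, \ldots, x_n^i) \text{ is an input pattern}$$
$$S = XW, \qquad Y = F(S)$$
$$W = X^T Y \quad \text{(Hebbian learning rule)}$$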
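The matrix form W = XᵀY can be sketched in pure Python. The bipolar (±1) coding, the two example patterns and the helper names are assumptions for illustration, and, as in the matrix form above, no threshold is used.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def sign(s):
    return 1 if s >= 0 else -1


X = [[1, -1],      # pattern X^1 (one pattern per row)
     [1, 1]]       # pattern X^2
Y = [[1],
     [-1]]         # desired outputs, one row per pattern
W = matmul(transpose(X), Y)                    # W = X^T Y  -> [[0], [-2]]
S = matmul(X, W)                               # S = XW     -> [[2], [-2]]
out = [[sign(s) for s in row] for row in S]    # Y = F(S)
```

For this small example the network reproduces both desired outputs.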
36. Practice with hebbian learning rule
Constructing a neural network based on the Hebbian learning rule to model the logical OR operator.
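One way to carry out this exercise is sketched below. It assumes bipolar coding (-1 = false, 1 = true) and uses the desired output as $y_1$ in the two-input update rule with the threshold update $T(t+1) = T(t) - y_1$. With 0/1 coding the plain Hebbian rule does not learn OR, which is why the bipolar coding is chosen. The function names are hypothetical.

```python
def sign(s):
    """Bipolar activation: 1 for S >= 0, -1 otherwise."""
    return 1 if s >= 0 else -1


def hebbian_train(patterns):
    """One pass of w_i1(t+1) = w_i1(t) + x_i*y_1 and T(t+1) = T(t) - y_1 from zero."""
    w1 = w2 = T = 0.0                          # w_ij(0) = 0
    for (x1, x2), y in patterns:
        w1 += x1 * y
        w2 += x2 * y
        T -= y
    return w1, w2, T


# Logical OR in bipolar coding (-1 = false, 1 = true); y is the desired output
OR = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), 1)]
w1, w2, T = hebbian_train(OR)                  # -> (2.0, 2.0, -2.0)
predictions = [sign(w1 * x1 + w2 * x2 - T) for (x1, x2), _ in OR]
```

A single pass over the four patterns already yields weights that reproduce the OR truth table.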
37. Delta rule (Widrow-Hoff rule)
1. The delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single layer neural network.
2. The goal is to minimize the error between the actual outputs and the target outputs in the training data.
3. For each (input/output) training pair, the delta rule determines the direction in which $w_{ij}$ must be adjusted to reduce the error for that training pair.
4. The derivatives of the error with respect to the weights are used for training.
38. Delta rule (Widrow-Hoff rule)
ADALINE (ADAptive LINear Element) network
$$y_1 = \sum_{j=1}^{n} w_{j1} x_j - T$$
$$E = \sum_{k=1}^{L} E(k) = \frac{1}{2} \sum_{k=1}^{L} (y_1^k - t^k)^2, \qquad E(k) = \frac{1}{2} (y_1^k - t^k)^2$$
39. Delta rule (Widrow-Hoff rule)
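Gradient descent method: find the steepest way down the slope from where you are, and take a step in that direction. With learning rate α:
$$w_{j1}(t+1) = w_{j1}(t) - \alpha \frac{\partial E(k)}{\partial w_{j1}(t)}, \qquad \frac{\partial E(k)}{\partial w_{j1}(t)} = \frac{\partial E(k)}{\partial y_1^k} \frac{\partial y_1^k}{\partial w_{j1}} = (y_1^k - t^k)\, x_j^k$$
$$T(t+1) = T(t) - \alpha \frac{\partial E(k)}{\partial T(t)}, \qquad \frac{\partial E(k)}{\partial T(t)} = \frac{\partial E(k)}{\partial y_1^k} \frac{\partial y_1^k}{\partial T} = -(y_1^k - t^k)$$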
40. Delta rule algorithm
1. Define the learning rate 0 < α < 1 and the minimal error $E_{min}$.
2. Initialize the weights with small random values.
3. Take an input pattern and calculate the output vector.
4. Modify the weights and bias according to the delta rule.
5. Repeat steps 3-4 until $E < E_{min}$.
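A pure-Python sketch of this algorithm for one ADALINE neuron $y = \sum_j w_j x_j - T$ follows. It uses the updates $w_j \leftarrow w_j - \alpha (y - t) x_j$ and $T \leftarrow T + \alpha (y - t)$. The default α, $E_{min}$, the initialization range and the example data are assumptions. The targets come from the hypothetical linear function $t = 2x_1 - x_2 + 0.5$, so the expected result is w ≈ (2, -1), T ≈ -0.5.

```python
import random


def train_adaline(patterns, alpha=0.1, e_min=1e-6, max_epochs=10_000, seed=0):
    """Delta-rule training of one linear neuron y = sum_j w_j x_j - T.

    patterns: list of (input vector, target) pairs.
    Returns (weights, threshold, final error E).
    """
    rng = random.Random(seed)
    n = len(patterns[0][0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(n)]   # step 2: small random weights
    T = rng.uniform(-0.1, 0.1)

    def output(x):
        return sum(wj * xj for wj, xj in zip(w, x)) - T

    E = float("inf")
    for _ in range(max_epochs):
        for x, t in patterns:                         # step 3: one pattern at a time
            err = output(x) - t                       # y^k - t^k
            for j in range(n):                        # step 4: w_j -= alpha*(y-t)*x_j
                w[j] -= alpha * err * x[j]
            T += alpha * err                          # T += alpha*(y-t)
        E = 0.5 * sum((output(x) - t) ** 2 for x, t in patterns)
        if E < e_min:                                 # step 5: stop when E < E_min
            break
    return w, T, E


# Hypothetical data generated by t = 2*x1 - x2 + 0.5 (w = (2, -1), T = -0.5)
data = [((0, 0), 0.5), ((0, 1), -0.5), ((1, 0), 2.5), ((1, 1), 1.5)]
w, T, E = train_adaline(data)
```

Used as a classifier, the ADALINE output would then be passed through a threshold. The training itself only sees the linear output.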
41. Linear classifiers
42. Practice with delta rule
Constructing the ADALINE neural network (a linear classifier with minimum error value) based on given training patterns.
43. Rosenblatt's single layer perceptron
The perceptron is an algorithm for supervised classification of an input into one of several possible classes.
It is a type of linear classifier.
It was invented in 1957 by Frank Rosenblatt as a machine for image recognition.
44. Rosenblatt's single layer perceptron
$$f(s) = \begin{cases} 1, & s \ge 0 \\ -1, & s < 0 \end{cases}$$
Learning rule
45. Rosenblatt's learning algorithm
1. Initialise the weights and the threshold. Weights may be initialised to 0 or to small random values.
2. Take an input pattern x from X and calculate the output vector y.
3. If $y_j = t_j$, then $w_{ij}$ does not change.
4. If $y_j \ne t_j$, then $w_{ij}(t+1) = w_{ij}(t) + x_i t_j$.
5. Repeat steps 2-4 until $y_j = t_j$ for the whole training set.
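A minimal sketch of Rosenblatt's algorithm for a single output neuron with bipolar targets follows. The threshold update $T \leftarrow T - t$ is an assumption not stated on the slide: it treats T as the weight of a constant input of -1. The AND and XOR data are illustrative. The XOR run also shows the limitation discussed on the next slide: the loop never reaches an error-free epoch.

```python
def f(s):
    """Bipolar threshold activation: 1 for s >= 0, -1 otherwise."""
    return 1 if s >= 0 else -1


def train_perceptron(patterns, max_epochs=100):
    """Rosenblatt's rule for one output neuron with bipolar targets.

    Returns (weights, threshold, epochs) or None if some pattern is still
    misclassified after max_epochs.
    """
    n = len(patterns[0][0])
    w = [0.0] * n                      # step 1: weights and threshold start at 0
    T = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, t in patterns:          # step 2: take a pattern, compute y
            y = f(sum(wi * xi for wi, xi in zip(w, x)) - T)
            if y != t:                 # step 4: w_i(t+1) = w_i(t) + x_i * t
                for i in range(n):
                    w[i] += x[i] * t
                T -= t                 # assumption: threshold = weight on a constant -1 input
                mistakes += 1
        if mistakes == 0:              # step 5: every pattern gives y = t
            return w, T, epoch
    return None


AND = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
XOR = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1)]
and_result = train_perceptron(AND)   # converges: AND is linearly separable
xor_result = train_perceptron(XOR)   # None: XOR is not linearly separable
```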
46. Rosenblatt's single layer perceptron
It was quickly proved that perceptrons could not be trained to recognize many classes of patterns.
It is a linear classifier. For example, it is impossible for this class of network to learn the XOR function.
47. Practice with Rosenblatt's perceptron
Constructing a linear classifier (Rosenblatt’s perceptron) based on given training patterns.
48. Associative memory
Associative memory (computer science): a data-storage device in which a location is identified by its informational content rather than by names, addresses, or relative positions, and from which the data may be retrieved. This memory enables one to retrieve a piece of data from only a tiny sample of itself.
Associative memory (psychology): recalling a previously experienced item by thinking of something that is linked with it, thus invoking the association.
49. Associative memory
Autoassociative memories are capable of retrieving a piece of data upon presentation of only partial information from that piece of data.
Heteroassociative memories can recall an associated piece of datum from one category upon presentation of data from another category.
50. Autoassociative memory based on sign activation function
Neural network structure: the number of neurons in the input layer equals the number of neurons in the output layer.
Activation function:
$$f(s) = \begin{cases} 1, & s \ge 0 \\ -1, & s < 0 \end{cases}$$
Learning rule (adapted Hebbian rule):
$$W = X^T X$$
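A small sketch of this autoassociative memory follows: W = XᵀX (the sum of outer products of the stored patterns) and one recall pass y = f(xW). The two stored 8-component patterns are hypothetical and chosen to be orthogonal, which keeps crosstalk between them at zero.

```python
def sign(s):
    return 1 if s >= 0 else -1


def train_autoassociative(patterns):
    """W = X^T X, i.e. the sum of outer products of the stored bipolar patterns."""
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) for j in range(n)] for i in range(n)]


def recall(W, x):
    """One pass y = f(xW) with the sign activation."""
    n = len(x)
    return [sign(sum(x[i] * W[i][j] for i in range(n))) for j in range(n)]


p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, 1, -1, -1, 1, 1, -1, -1]      # orthogonal to p1
W = train_autoassociative([p1, p2])
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # p1 with its first component flipped
restored = recall(W, noisy)            # -> p1
```

Storing many non-orthogonal patterns quickly degrades recall. This is one of the constraints the practice below investigates.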
51. Practice with autoassociative memory
Realization of the associative memory based on the sign activation function.
Working with multiple patterns.
Recognition of the original and noisy patterns.
Investigation of the properties and constraints of the associative memory based on the sign activation function.
52. Using single layer neural networks for time series forecasting
A time series is a sequence of data points, measured typically at successive points in time spaced at uniform time intervals.
53. Using single layer neural networks for time series forecasting
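Training samples
$$X = \begin{pmatrix} x(1) & x(2) & \cdots & x(p) \\ x(2) & x(3) & \cdots & x(p+1) \\ \vdots & \vdots & \ddots & \vdots \\ x(m-p) & x(m-p+1) & \cdots & x(m-1) \end{pmatrix}, \qquad Y = \begin{pmatrix} x(p+1) \\ x(p+2) \\ \vdots \\ x(m) \end{pmatrix}$$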
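Building X and Y from a raw series is a sliding-window operation. A minimal sketch follows; the function name `make_windows` is hypothetical, and the window length p is the number of past values used to predict the next one.

```python
def make_windows(series, p):
    """Rows of X are p consecutive values x(i), ..., x(i+p-1); Y holds the next value x(i+p)."""
    m = len(series)
    X = [list(series[i:i + p]) for i in range(m - p)]
    Y = [series[i + p] for i in range(m - p)]
    return X, Y


X, Y = make_windows([1, 2, 3, 4, 5], p=2)
# X == [[1, 2], [2, 3], [3, 4]], Y == [3, 4, 5]
```

A series of length m yields m - p training samples, matching the m - p rows of X above.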
54. Practice with time series forecasting
Using ADALINE neural networks for currency forecasting:
Creating the training set from the raw data (www.val.ru).
Learning the ADALINE.
Training the ADALINE network using the delta rule and estimating the error.
55. Multilayer perceptron
56. Multilayer perceptron
A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs.
It consists of multiple layers of nodes (input, output, and one or several hidden layers) in a directed graph, with each layer fully connected to the next one.
Its neurons have a nonlinear activation function.
It utilizes a supervised learning technique called backpropagation of error.
Typical structure
57. Multilayer perceptron
Structure (2 hidden layers)
Calculation of the output Y for input vector X
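The forward calculation can be sketched layer by layer with $y_j = F(\sum_i w_{ij} y_i - T_j)$, the form used in the backpropagation algorithm later in this section. A sigmoid activation is assumed. The zero-weight 2-3-2-1 network is a hypothetical example whose output is easy to check by hand.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def layer_forward(y_prev, W, T):
    """y_j = F(sum_i w_ij * y_i - T_j); W[i][j] links input i to neuron j."""
    return [sigmoid(sum(y_prev[i] * W[i][j] for i in range(len(y_prev))) - T[j])
            for j in range(len(T))]


def mlp_forward(x, layers):
    """Propagate x through a list of (W, T) layers, e.g. two hidden + one output."""
    y = list(x)
    for W, T in layers:
        y = layer_forward(y, W, T)     # outputs of one layer feed the next
    return y


# Hypothetical 2-3-2-1 network with all weights and thresholds set to zero:
# every neuron then outputs sigmoid(0) = 0.5.
zero_net = [
    ([[0.0] * 3 for _ in range(2)], [0.0] * 3),
    ([[0.0] * 2 for _ in range(3)], [0.0] * 2),
    ([[0.0] * 1 for _ in range(2)], [0.0] * 1),
]
Y = mlp_forward([1.0, -1.0], zero_net)
```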
58. Multilayer perceptron
The activation function is not a threshold; usually it is a sigmoid function.
Function approximator, not limited to linear problems.
Information flows in one direction.
The outputs of one layer act as inputs to the next layer.
59. Classification ability
A single layer network can only find a linear discriminant function.
It can divide the input space by means of a hyperplane (a straight line in two-dimensional space).
60. Classification ability
Universal Function Approximation Theorem
An MLP with one hidden layer can approximate arbitrarily closely every continuous function that maps intervals of real numbers to some output interval of real numbers, $f: [0,1]^n \to [0,1]$ (2n+1 neurons in the hidden layer).
It can form single convex decision regions.
One hidden layer is sufficient for the large majority of problems.
61. Classification ability
Any function can be approximated to arbitrary accuracy by a network with two hidden layers.
An MLP with two hidden layers can classify sets of any form. It can form arbitrary disjoint decision regions.
62. Backpropagation algorithm
D. Rumelhart, G. Hinton, R. Williams (1986)
The most common method of obtaining the weights in the multilayer perceptron.
A form of supervised training.
The basic backpropagation algorithm is based on minimizing the error of the network using the derivatives of the error function.
Backpropagation of error generalizes the delta rule.
63. Basic steps
Forward propagation of a training pattern's input through the neural network in order to generate the output activations.
Backward propagation of the output's error through the neural network, using the training pattern's target, in order to generate the deltas of all output and hidden neurons.
64. Backpropagation
65. Backpropagation
We use the gradient descent method to minimize the error.
66. Backpropagation
Theorem. For any hidden layer i of the neural network, the error $\delta_i$ of neuron i is calculated recursively through the errors of the neurons of the next layer j:
$$\delta_i = \sum_{j=1}^{m} \delta_j F'(S_j)\, w_{ij}$$
where m is the number of neurons in the next layer j, $w_{ij}$ are the weights between neuron i and the neurons of the next layer j, and $S_j$ is the weighted sum of neuron j in the next layer.
Proof
67. Backpropagation
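Theorem. The derivatives of the error E with respect to the weights w and biases T can be calculated as follows:
$$\frac{\partial E}{\partial w_{ij}} = \delta_j F'(S_j)\, y_i, \qquad \frac{\partial E}{\partial T_j} = -\delta_j F'(S_j)$$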
Proof
68. Backpropagation
Backpropagation rule
69. Backpropagation algorithm
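1. Define the training speed α (0 < α < 1) and the desired minimal error $E_m$.
2. Initialize the weights and biases randomly.
3. Take all input patterns x from X one after another. For each pattern:
Calculate the output vector y as follows:
$$y_j = F\Big(\sum_i w_{ij} y_i - T_j\Big)$$
Realize the backpropagation scheme as follows:
$$\delta_j = y_j - t_j \;\text{(output layer)}, \qquad \delta_i = \sum_j \delta_j F'(S_j)\, w_{ij} \;\text{(hidden layers)}$$
Modify the weights and biases as follows:
$$w_{ij}(t+1) = w_{ij}(t) - \alpha\, \delta_j F'(S_j)\, y_i$$
$$T_j(t+1) = T_j(t) + \alpha\, \delta_j F'(S_j)$$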
70. Backpropagation algorithm
4. Calculate the overall error for all patterns:
$$E = \frac{1}{2} \sum_{k=1}^{L} \sum_j (y_j^k - t_j^k)^2$$
5. If $E > E_m$, go to step 3.
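A compact pure-Python sketch of the algorithm follows, with a sigmoid activation (so F'(S) = y(1 - y)). The initialization range (-1, 1), the 2-3-1 XOR network, α = 0.5 and the fixed number of epochs are all illustrative assumptions. The stopping test E > E_m is replaced by a fixed epoch count to keep the example short.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def init_layers(sizes, rng):
    """Step 2: random weights W[i][j] (input i -> neuron j) and thresholds T[j]."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        W = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
        T = [rng.uniform(-1, 1) for _ in range(n_out)]
        layers.append((W, T))
    return layers


def forward(x, layers):
    """Outputs of every layer (input first), y_j = F(sum_i w_ij y_i - T_j)."""
    ys = [list(x)]
    for W, T in layers:
        prev = ys[-1]
        ys.append([sigmoid(sum(prev[i] * W[i][j] for i in range(len(prev))) - T[j])
                   for j in range(len(T))])
    return ys


def gradients(x, t, layers):
    """dE/dw_ij = delta_j F'(S_j) y_i and dE/dT_j = -delta_j F'(S_j) for one pattern."""
    ys = forward(x, layers)
    delta = [y - tj for y, tj in zip(ys[-1], t)]          # output layer: y_j - t_j
    grads = [None] * len(layers)
    for idx in range(len(layers) - 1, -1, -1):
        W, T = layers[idx]
        y_in, y_out = ys[idx], ys[idx + 1]
        g = [d * y * (1.0 - y) for d, y in zip(delta, y_out)]   # delta_j F'(S_j), sigmoid
        grads[idx] = ([[y_in[i] * g[j] for j in range(len(T))] for i in range(len(y_in))],
                      [-gj for gj in g])
        # delta_i = sum_j delta_j F'(S_j) w_ij for the layer below
        delta = [sum(g[j] * W[i][j] for j in range(len(T))) for i in range(len(y_in))]
    return grads


def total_error(patterns, layers):
    """Step 4: E = 1/2 sum_k sum_j (y_j^k - t_j^k)^2."""
    return 0.5 * sum((y - tj) ** 2
                     for x, t in patterns
                     for y, tj in zip(forward(x, layers)[-1], t))


def train_epoch(patterns, layers, alpha):
    """Step 3 for every pattern: w_ij -= alpha*dE/dw_ij, T_j -= alpha*dE/dT_j."""
    for x, t in patterns:
        for (W, T), (dW, dT) in zip(layers, gradients(x, t, layers)):
            for i in range(len(W)):
                for j in range(len(T)):
                    W[i][j] -= alpha * dW[i][j]
            for j in range(len(T)):
                T[j] -= alpha * dT[j]
    return total_error(patterns, layers)


XOR = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]), ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
net = init_layers([2, 3, 1], random.Random(1))
E0 = total_error(XOR, net)
for _ in range(2000):                  # step 5 simplified to a fixed number of epochs
    E = train_epoch(XOR, net, alpha=0.5)
```

A finite-difference comparison against `gradients` is a cheap way to check such an implementation before training. Whether a given run escapes every local minimum depends on the random initialization.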
71. Practice. Calculating delta-rule expressions for various activation functions
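For this exercise, it helps that the usual activation derivatives can be written through the neuron output y = F(S). For example, F'(S) = y(1 - y) for the sigmoid and F'(S) = 1 - y² for tanh. The sketch below collects three such pairs and checks them numerically. The selection of functions and the helper names are assumptions.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


# (activation F, derivative F'(S) written in terms of the output y = F(S))
ACTIVATIONS = {
    "linear": (lambda s: s, lambda y: 1.0),
    "sigmoid": (sigmoid, lambda y: y * (1.0 - y)),
    "tanh": (math.tanh, lambda y: 1.0 - y * y),
}


def output_delta_term(name, S, t):
    """delta_j * F'(S_j) for an output neuron, with delta_j = y_j - t_j."""
    F, dF = ACTIVATIONS[name]
    y = F(S)
    return (y - t) * dF(y)


def numeric_derivative(F, S, h=1e-6):
    """Central-difference estimate of F'(S), used to check the formulas."""
    return (F(S + h) - F(S - h)) / (2 * h)
```

With the linear activation the term reduces to the ADALINE delta rule, y - t.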
72. Some problems
The learning rate is important:
Too small: convergence is extremely slow.
Too large: the training may not converge.
The result may converge to a local minimum.
Possible solution: using an adaptive learning rate.
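The effect of the learning rate can be seen on a toy error function. The sketch below assumes $E(w) = w^2/2$, whose gradient is simply w, so each step multiplies w by (1 - α). A small α barely moves, a moderate α converges quickly, and α > 2 makes the iterates diverge.

```python
def gradient_descent(alpha, w0=1.0, steps=50):
    """Minimise the toy error E(w) = w^2 / 2 (gradient dE/dw = w) with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w -= alpha * w
    return w


too_small = gradient_descent(0.01)   # 0.99**50, still about 0.6: very slow
reasonable = gradient_descent(0.5)   # 0.5**50, practically 0
too_large = gradient_descent(2.5)    # (-1.5)**50: the iterates blow up
```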
73. Some problems
Overfitting
The number of hidden neurons is very important; it defines the complexity of the decision boundary:
Too few: the network underfits the data; it does not have enough free parameters to fit the training data well.
Too many: the network overfits the data; it learns insignificant details.
Try different numbers and use a validation set to choose the best one.
Start small and increase the number until satisfactory results are obtained.
74. What constitutes a “good” training set?
Samples must represent the general
population
Samples must contain members of each
class
Samples in each class must contain a
wide range of variations or noise effect
75. Practice with multilayer perceptron
1. Using MLP for noisy digit recognition.
2. Using MLP for time series forecasting:
- Training set preparation.
- MLP learning in Deductor software.
- Estimation of the error.
76. Recurrent neural networks
Capable of influencing themselves by means of recurrences, e.g. by including the network output in the following computation steps.
Hopfield neural network
Hamming neural network
77. Hopfield network
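1. Invented by John Hopfield in 1982.
2. Content-addressable memory with binary threshold nodes (-1, 1 or 0, 1).
3. $w_{ij} = w_{ji}$, $w_{ii} = 0$.
$$y_i(t+1) = F\Big(\sum_{j=1,\, j \ne i}^{n} w_{ji}\, y_j(t) - T_i\Big)$$
$$Y(t+1) = F(S(t)), \qquad S(t) = W^T Y(t) - T$$
$$S = (S_1, \ldots, S_n)^T, \quad Y = (y_1, \ldots, y_n)^T, \quad T = (T_1, \ldots, T_n)^T$$
$$W = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nn} \end{pmatrix}, \quad w_{ii} = 0$$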
78. Hopfield network
79. Hopfield network as associative memory
80. Using hopfield network as associative memory
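$$Y = \begin{pmatrix} y_1^1 & y_2^1 & \cdots & y_n^1 \\ y_1^2 & y_2^2 & \cdots & y_n^2 \\ \vdots & \vdots & \ddots & \vdots \\ y_1^L & y_2^L & \cdots & y_n^L \end{pmatrix}, \qquad y_i \in \{0; 1\}$$
Hebbian rule
$$W = (2Y - I)^T (2Y - I) - L\,E_n$$
where I is the $L \times n$ matrix of ones and $L\,E_n$ is the $n \times n$ diagonal matrix with L on the diagonal (subtracting it makes $w_{ii} = 0$):
$$I = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix}, \qquad L\,E_n = \begin{pmatrix} L & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & L \end{pmatrix}$$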
81. Hopfield network as associative memory
1. Take a noisy pattern y.
2. Perform the iterations
$$y_i(t+1) = F\Big(\sum_j w_{ji}\, y_j(t)\Big), \qquad F(S) = \begin{cases} 1, & S > 0 \\ 0, & S < 0 \end{cases}$$
3. Repeat until a stable state (attractor) is reached.
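Storage and recall can be sketched together in pure Python. The weights follow $W = (2Y - I)^T(2Y - I) - L\,E_n$ for 0/1 patterns. Recall uses synchronous iterations without thresholds. Keeping a neuron's previous state when S = 0 is an assumption, since the rule above does not define that case. The two stored patterns are hypothetical.

```python
def hopfield_weights(patterns):
    """W = (2Y - I)^T (2Y - I) - L*E_n for stored 0/1 patterns (the rows of Y)."""
    L, n = len(patterns), len(patterns[0])
    bipolar = [[2 * v - 1 for v in p] for p in patterns]          # 2Y - I
    W = [[sum(b[i] * b[j] for b in bipolar) for j in range(n)] for i in range(n)]
    for i in range(n):
        W[i][i] -= L                                              # makes w_ii = 0
    return W


def recall(W, y, max_iter=100):
    """Synchronous iterations y_i(t+1) = F(sum_j w_ji y_j(t)) until a stable state."""
    y = list(y)
    n = len(y)
    for _ in range(max_iter):
        new_y = []
        for i in range(n):
            s = sum(W[j][i] * y[j] for j in range(n))
            # F(S) = 1 for S > 0, 0 for S < 0; keeping the old state at S = 0 is an assumption
            new_y.append(1 if s > 0 else 0 if s < 0 else y[i])
        if new_y == y:
            return y                                              # attractor reached
        y = new_y
    return y


a = [1, 1, 1, 1, 0, 0, 0, 0]
b = [1, 1, 0, 0, 1, 1, 0, 0]
W = hopfield_weights([a, b])
noisy_a = [0, 1, 1, 1, 0, 0, 0, 0]     # first bit of a switched off
restored = recall(W, noisy_a)
```

Synchronous updates can end in a two-state cycle for some inputs, hence the `max_iter` guard. Asynchronous (one neuron at a time) updates avoid such cycles.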
82. Example
83. Practice with Hopfield network
Realization of the associative memory based on the Hopfield neural network.
Working with multiple patterns.
Recognition of the original and noisy patterns.
Investigation of the properties and constraints of the associative memory based on the Hopfield network.
84. Hamming network
R. Lippmann (1987)
The Hamming network is a two-network bipolar classifier. The first network is a single layer perceptron; it calculates the Hamming distance between the input vector and the stored vectors. The second network is a Hopfield network.
85. Hamming network
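Stored patterns: $X^1 = (x_1^1, \ldots, x_n^1)$, $X^2 = (x_1^2, \ldots, x_n^2)$, ..., $X^m = (x_1^m, \ldots, x_n^m)$.
First layer:
$$w_{ij} = \frac{x_i^j}{2}, \qquad T_j = -\frac{n}{2}, \qquad y_j = n - d_j$$
where $d_j$ is the Hamming distance between the input pattern and the j-th stored pattern.
Second (Hopfield) layer:
$$v_{kj} = \begin{cases} 1, & k = j \\ -\varepsilon, & k \ne j \end{cases}, \qquad \varepsilon = \text{const}, \; 0 < \varepsilon < 1/m$$
$$z_j = F(S_j) = \begin{cases} S_j, & S_j > 0 \\ 0, & S_j \le 0 \end{cases}$$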
86. Hamming network working algorithm
Define the weights $w_{ij}$ and thresholds $T_j$.
Get the input pattern and initialize the Hopfield network with the first-layer outputs.
Make iterations in the Hopfield network until the output is stable.
Take the output neuron with a nonzero value as the winner.
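A minimal sketch of a Hamming network classifier for bipolar patterns follows. The first layer computes $y_j = n - d_j$. The second layer iterates $z_j \leftarrow F(z_j - \varepsilon \sum_{k \ne j} z_k)$, which is what the weights $v_{kj}$ above produce. The default ε = 1/(2m) is one admissible choice (an assumption), and returning `None` on a tie is a design choice of this sketch.

```python
def hamming_classify(stored, x, eps=None, max_iter=1000):
    """Return the index of the stored bipolar pattern closest to x, or None on a tie."""
    m, n = len(stored), len(x)
    if eps is None:
        eps = 1.0 / (2 * m)            # assumption: any value with 0 < eps < 1/m
    # First layer: w_ij = x_i^j / 2, T_j = -n/2, so y_j = n - d_j
    z = [sum(p[i] * x[i] for i in range(n)) / 2 + n / 2 for p in stored]
    # Second (Hopfield) layer: v_jj = 1, v_kj = -eps, F(S) = max(S, 0)
    for _ in range(max_iter):
        total = sum(z)
        new_z = [max(0.0, zj - eps * (total - zj)) for zj in z]
        if new_z == z:                 # stable output reached
            break
        z = new_z
    winners = [j for j, zj in enumerate(z) if zj > 0]
    return winners[0] if len(winners) == 1 else None


stored = [[1, 1, 1, 1, -1, -1],
          [-1, -1, 1, 1, 1, 1],
          [1, -1, 1, -1, 1, -1]]
x = [1, 1, 1, -1, -1, -1]              # Hamming distances 1, 5, 2 -> pattern 0
winner = hamming_classify(stored, x)
```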
87. Self-organizing maps
88. Self-organizing maps
Unsupervised training
The training set only consists of input patterns.
The neural network adjusts its own weights so that similar inputs cause similar outputs.
The network identifies the patterns and differences in the inputs without any external assistance.
89. Self-organizing maps (SOM)
A self-organizing map (SOM) is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map.
Self-organizing maps differ from other artificial neural networks in that they use a neighborhood function to preserve the topological properties of the input space.
The model was first described as an artificial neural network by the Finnish professor Teuvo Kohonen.
90. Self-organizing maps
We only ask which neuron is active at the moment. We are not interested in the exact output of the neuron, but in knowing which neuron provides output.
These networks are widely used for clustering.
SOMs (like our brain) solve the task of mapping a high-dimensional input (N dimensions) onto areas in a low-dimensional grid of cells (G dimensions).
92. Scheme of training of self-organizing map
93. Competitive learning
Competitive learning is a form of unsupervised learning in artificial neural networks in which nodes compete for the right to respond to a subset of the input data.
$$S_j = \sum_i w_{ij} x_i = W_j X^T, \quad \text{where } X = (x_1, \ldots, x_n) \text{ is the input pattern}, \; W_j = (w_{1j}, \ldots, w_{nj})$$
Winner-take-all rule: if $S_k = \max_j S_j$, then
$$y_j = F(S_j) = \begin{cases} 1, & j = k \\ 0, & j \ne k \end{cases}$$
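The winner-take-all rule fits in a few lines of Python. In this sketch the weight vectors and the input are made-up numbers, and ties go to the lowest index (the behaviour of Python's `max`), which is an assumption.

```python
def winner_take_all(W, x):
    """W[j] is the weight vector W_j of neuron j; returns y with y_k = 1 for the winner."""
    S = [sum(w * xi for w, xi in zip(Wj, x)) for Wj in W]   # S_j = W_j X^T
    k = max(range(len(S)), key=S.__getitem__)               # S_k = max_j S_j
    return [1 if j == k else 0 for j in range(len(S))]


W = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
y = winner_take_all(W, [1.0, 0.9])       # S = (1.0, 0.9, 1.33) -> neuron 3 wins
```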
94. Competitive learning
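$$D_j = \lVert X - W_j \rVert = \sqrt{(x_1 - w_{1j})^2 + \cdots + (x_n - w_{nj})^2}$$
Neuron winner: $D_k = \min_j D_j$
$$W_k(t+1) = W_k(t) + \alpha \big(X(t) - W_k(t)\big)$$
where α is the learning rate.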
95. Vector quantization
It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms.
96. Vector quantization
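1. Choose random weights from [0;1].
2. t = 1.
3. Take all input patterns $X^l$, l = 1, ..., L:
$$D_j^l = \lVert X^l - W_j \rVert = \sqrt{(x_1 - w_{1j})^2 + \cdots + (x_n - w_{nj})^2}$$
Neuron winner: $D_k^l = \min_j D_j^l$
$$w_{ij}(t+1) = \begin{cases} w_{ij}(t) + \alpha(t)\big(x_i - w_{ij}(t)\big), & j = k \\ w_{ij}(t), & j \ne k \end{cases}, \qquad \alpha(t) = 1/t$$
4. t = t + 1, then repeat from step 3.
Applications:
pattern recognition;
data compression;
video codecs: QuickTime, Cinepak, Indeo, etc.;
audio codecs: Ogg Vorbis, TwinVQ, DTS, etc.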
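A pure-Python sketch of this vector quantization procedure follows. It uses Euclidean distances, updates only the winner, and uses α(t) = 1/t with t counted in passes over the data. The number of epochs, the optional `init` parameter and the two-cluster data are assumptions added to make the example reproducible.

```python
import math
import random


def vq_train(patterns, k, epochs=20, init=None, seed=0):
    """Vector quantization with the winner-only update and alpha(t) = 1/t.

    Returns the k codebook vectors W_j.
    """
    rng = random.Random(seed)
    n = len(patterns[0])
    if init is not None:
        W = [list(w) for w in init]
    else:
        W = [[rng.random() for _ in range(n)] for _ in range(k)]   # random weights in [0;1]
    for t in range(1, epochs + 1):
        alpha = 1.0 / t
        for x in patterns:
            winner = min(range(len(W)), key=lambda j: math.dist(x, W[j]))   # D_k = min_j D_j
            W[winner] = [w + alpha * (xi - w) for w, xi in zip(W[winner], x)]
    return W


data = [(0.1, 0.1), (0.15, 0.1), (0.1, 0.15), (0.9, 0.9), (0.85, 0.9), (0.9, 0.85)]
codebook = vq_train(data, k=2, init=[[0.4, 0.4], [0.6, 0.6]])
```

With purely random initial weights, a codebook vector that never wins stays where it started ("dead unit"). Practical implementations therefore often initialize from data samples.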
97. Kohonen Maps
98. Kohonen maps
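Neighborhood function for the winner neuron k:
$$h(p, k, t) = e^{-\frac{\lVert u_k - u_p \rVert^2}{2\sigma^2(t)}}$$
where $u_k$ and $u_p$ are the positions of neurons k and p in the map grid.
$$W_p(t+1) = W_p(t) + \alpha(t)\, h(p, k, t)\, \big(X(t) - W_p(t)\big)$$
for every neuron p (k is the winner neuron).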
99. Kohonen maps learning procedure
1. Choose random weights from [0;1].
2. t = 1.
3. Take an input pattern $X^l$ and calculate $D_{ij} = \lVert X^l - W_{ij} \rVert$, where i, j = 1, ..., m.
4. Detect the winner neuron: $D_{k_1 k_2} = \min_{i,j} D_{ij}$.
5. Calculate for every output neuron p:
$$h(p, k, t) = e^{-\frac{\lVert u_k - u_p \rVert^2}{2\sigma^2(t)}}$$
6. Modify the weights as follows:
$$W_p(t+1) = W_p(t) + \alpha(t)\, h(p, k, t)\, \big(X(t) - W_p(t)\big)$$
Repeat steps 3-6 for all input patterns.
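A minimal sketch of this procedure for a rectangular grid follows. The slides do not fix the schedules α(t) and σ(t); the exponential decays used here (α from 0.5 to 0.01, σ from half the grid size to 0.5) are assumptions, as are the function names and defaults.

```python
import math
import random


def som_step(W, grid, x, alpha, sigma):
    """One Kohonen update. W[p] is the weight vector of neuron p, grid[p] its position u_p."""
    k = min(range(len(W)), key=lambda p: math.dist(x, W[p]))     # step 4: winner
    for p in range(len(W)):
        # step 5: Gaussian neighbourhood h(p, k, t) on the grid
        h = math.exp(-math.dist(grid[k], grid[p]) ** 2 / (2 * sigma ** 2))
        # step 6: W_p += alpha * h * (X - W_p) for every neuron p
        W[p] = [w + alpha * h * (xi - w) for w, xi in zip(W[p], x)]
    return k


def train_som(data, rows, cols, epochs=50, alpha0=0.5, sigma0=None, seed=0):
    rng = random.Random(seed)
    grid = [(i, j) for i in range(rows) for j in range(cols)]
    W = [[rng.random() for _ in data[0]] for _ in grid]       # step 1: random weights in [0;1]
    sigma0 = sigma0 or max(rows, cols) / 2
    for t in range(epochs):
        # assumption: exponentially decaying alpha(t) and sigma(t)
        frac = t / epochs
        alpha = alpha0 * (0.01 / alpha0) ** frac
        sigma = sigma0 * (0.5 / sigma0) ** frac
        for x in data:                                         # steps 3-6 for all patterns
            som_step(W, grid, x, alpha, sigma)
    return W, grid
```

Because the neighbourhood shrinks over time, early epochs order the map globally and later epochs fine-tune individual neurons.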
100. Training and Testing
101. Training
The goal is to achieve a balance between correct responses for the training patterns and correct responses for new patterns.
102. Training and Verification
The set of all known samples is broken into two independent sets:
Training set: a group of samples used to train the neural network.
Testing set: a group of samples used to test the performance of the neural network and to estimate the error rate.
103. Verification
Provides an unbiased test of the quality of the network.
A common error is to “test” the neural network using the same samples that were used to train it.
The network was optimized on these samples and will obviously perform well on them.
This gives no indication of how well the network will be able to classify inputs that were not in the training set.
104. Summary (Discussion)
Artificial neural networks are inspired by the learning processes that take place in biological systems.
Artificial neurons and neural networks try to imitate the working mechanisms of their biological counterparts.
Learning can be perceived as an optimisation process.
Biological neural learning happens by the modification of the synaptic strength. Artificial neural networks learn in the same way, by modifying their weights.
The synapse strength modification rules for artificial neural networks can be derived by applying mathematical optimisation methods.
105. Summary
Learning tasks of artificial neural networks can be reformulated as function approximation tasks.
Neural networks can be considered as nonlinear function approximating tools (i.e., linear combinations of nonlinear basis functions), where the parameters of the networks should be found by applying optimisation methods.
The optimisation is done with respect to the approximation error measure.
In general, a neural network with a single hidden layer (MLP or other) is enough to learn the approximation of a nonlinear function.