Abstract
In
this work we considered artificial neural network as a method of solution for solving constrained and
unconstrained optimization problems. Constrained optimization problems are defined as the mathematical representation of real world problems
concerned with the determination of a minimum or a maximum of a function of
several variables, which are required to satisfy a number of constraints. Such function
optimization are sought in diverse fields, including mechanical, electrical and
industrial engineering, operational research, management sciences, computer
sciences, system analysis, economics, medical sciences, manufacturing, social
and public planning and image processing.
Our
approach for solving optimization
problem is using artificial neural networks. We considered in particular
gradient based neural networks which works well for unconstrained problems. For
constrained optimization problems we introduced the Lagrange neural network for
equality constrained problems and also the recurrent network model for more
general constrained problems. We were able to show that Neural Networks are
capable of solving nonlinear programming problems, and in particular they
provide better insight into the nature of solutions when conventional methods
find it cumbersome .The eminent merit of neural network computation is that it
is able to compute the optimal solution during the dynamical transient motion
and it provides real time on line solution.
CHAPTER ONE
GENERAL OVERVIEW OF OPTIMIZATION THEORY
1.0
Introduction
Optimization
refers to the art and science of allocating scarce resources to the best
possible effect. Optimization techniques
are called into play every day in question of industrial planning, allocation,
scheduling, decision-making, etc. For example, how does a global petroleum
refiner decide where to buy crude oil, where to ship it for processing, what products to convert it
to, where to sell those products, and at
what prices? A maximum-profit
optimization model is used to solve this problem. How does an airline know how
to route its planes and schedule its crews at
minimum cost while meeting constraints on airplane flight hours between
maintenance and maximum flight time for
crews? New optimization techniques are arriving daily, often stimulated by fascinating insights from other
fields. Genetic algorithms, for example, use an analogy to chromosome encoding
and natural selection to evolve a good optimized solution.
Artificial
neural networks, on the other hand process information the way biological
nervous systems do, such as the brain, for optimal decision making. Certain
characteristics of their architecture
and the way they process information
makes them superior to conventional techniques on certain class of optimization problems.
Optimality
criteria form the foundations of mathematical programming both theoretically
and computationally. In general, these criteria can be classified as either
necessary or sufficient. Naturally, one would like to have the same criterion
be both necessary and sufficient; however, this occurs only under somewhat
ideal conditions, which are rarely satisfied in practice.
1.1 Optimization problems
Optimization problems are defined as the mathematical representation of
real world problems concerned with the determination of a minimum or a maximum
of a function of several variables, which are required to satisfy a number of
constraints. Such function optimization are sought in diverse fields, including
mechanical, electrical and industrial engineering, operational research,
management sciences, computer sciences, system analysis, economics, medical
sciences, manufacturing, social and public planning and image processing.
Although many classical optimization algorithms such as simplex,
Karmarkar interior point, direct and indirect techniques are given to solve
linear, quadratic and nonlinear optimization problems, in many applications, it
is desire to have real-time on-line solutions of corresponding optimization
problems. However, traditional optimization algorithms are not suitable for
real-time on-line implementation on the computer. The dynamical system approach
is one of the promising approaches that can handle these difficulties.
In the recent years many artificial neural networks models developed to
solve optimization problems. Several basic and advance questions associated
with these models have motivated the studies presented in this work.
theoretical areas of interest include fundamental methods, models and
algorithms for solving general optimization problems using artificial recurrent
neural networks. On the other hand, we will also present and discuss the numerical
analysis for the corresponding models, simulations and applications of
recurrent neural networks that solve various practical optimization problems.
Recurrent dynamical neural network is an area of neural networks which
is one of the fundamental topics of the subject, and combines many mathematical
concepts like ordinary and partial differential equations, dynamical systems,
unconstrained and constrained optimization, local and global optima for a
function of several variables, sigmoid functions, error estimation, integration
and gradient descent methods. Here we must compute the optimal solution for the
constrained optimization problem with objective function of several variables
that corresponds with the solution of a system
of ordinary differential equations. From mathematical point of view the
convergence of the solution and stability of the method has quiet
importance,
The troublesome problem of just what numerical optimization analysis is
arises in recurrent dynamical neural network, as it does in other branches of
the field. Should the optimization analysis part be the main aim, or is it the
generation of an efficient, tested and validated program which is important?
The answer is surely that both areas are important, but at the end of the day
numerical analysis and mathematical techniques are some service industry and
what the customers want is reliable codes to solve their problems. The
theoretical analysis forms part of the reliability assessment, as it determines
bounds on errors and levels of stability. These error bounds form the basis of
a theoretical justification for the solution convergence of the corresponding
numerical algorithm to the actual solution of the
original neural network model.
1.2 Constrained optimization
In this section, we shall first consider an important class of
constrained linear programming problems and their general dual form. Second, we
shall introduce primal and dual form of a constrained convex quadratic
programming problem. Then we will consider the nonlinear convex programming
problems. This, as we shall see, leads to discovering some primal-dual
relationships that exists for corresponding class of constrained optimization
problems.
1.3 Problems
involving the Lagrange Multipliers
The
problem considered in this section is the following.
Problem
A Min
such
that here , .
The functions and , are assumed to be twice
continuously differentiable. Many methods for solving the above problems have
been published (Arrow & Solow, 1958; Bazaraam & Goode, 1972; Fletcher
& Lill, 1971; Miele, el al, 1972). Generally, they are based on one of the
two main procedures that follow.
1) The first is the
Lagrange multiplier technique, where the constraints are adjoined to the
function by means of multipliers to
form a new function:
Generally
called the Lagrangian of the problem. The problem is then reduced to finding the saddle point of in the space, and
thus the dimension of the problem is increased from n to.
2) The other basic
approach is the penalty function method (Mc Kormick, 1967; Powell, 1972). In
this case a function including the constraints in a proper manner is adjoined
to the original function, for example
where
c is a positive real-valued
parameter. Under very mild conditions, the solution of
tends
to the solution of
such that as
It
turns out that the penalty function method is not very attractive from a
numerical point of view, since the functions created become very badly
conditioned for numerical optimization.
1.4
Notation
For the purpose of notation, as usual
the symbol is the differentiation operator with respect to the
vector i.e.,
The symbol represents the
operator applied twice, thus is the matrix whose element is . Subsequently, will denote , will denote ; for a parametrized function, will represent the derivative with respect to the
parametrizing variable, i.e. . By the term “differentiable” we mean continuously
differentiable. The rank of a matrix A
will be denoted by.
1.5
Optimality Condition for Equality constraint problems
In this section, with reference to
problem (A), we are interested in the conditions in which and the
constraint sets will be satisfied at an optimum.
The following theorem generalizes the
concept of Lagrange multipliers.
1.6
The Theorem of Lagrange
The Theorem of Lagrange provides a
powerful characterization of optima of equality-constrained optimization
problems in terms of the behavior of the objective function and the constraint
functions at these points.
1.7
Theorem (Lagrange)
Let and be function, . Suppose is a local maximum
or minimum of on the set , where is open. Let, then there exists a vectorsuch that
where ’s are called the Lagrangian multipliers associated
with the local optimum, is the rank of
the gradient vector that is, the
rank of
We defined the Lagrangian function as follows:
The function L is called the Lagrangian associated with the problem (A). With
this function, we get the first order necessary equations as follows:
1.8
Definition.
Let . We say that is a regular
point of if the
vectors of are linear
independent.
In matrix form, is a regular
point of if the Jacobian
matrix of g at , denoted by has rank where .
1.9 Remark:
1) If a pair satisfies the twin conditions that and we will say that meets the first
order conditions of the Theorem of
Lagrange, or meets the first
order necessary conditions in equality constrained optimization problems.
2) The Theorem of
Lagrange only provides necessary conditions for local optima at ,
and that, only for those local optima which
also meet the condition that.
These conditions are not asserted to be sufficient, meaning that, the theorem
does not imply that if there exist such that,
and ,
then must either be a local maximum or a local
minimum, even if also meets the rank condition.
3)
1.10
Sufficiency conditions for the Lagrangian
The sufficiency conditions for the
Lagrangian method is stated as follows;
Define:
where:
and
The matrix is called the Bordered Hessian matrix (BH-matrix)
(Taha, 1972), Having computed the stationary point for the Lagrangian function and the BH- matrix evaluated at, then is;
(i)
A
maximum point if, starting with the principal major determinant of order ,
the last principal minor determinants of form an alternating sign pattern with .
(ii) A minimum point
if, starting with the principal minor determinant of order ,
the last principal minor determinants of have the sign of .
1.11 Remark:
The above conditions are sufficient for
identifying an extreme point, thus a stationary point may be an extreme point
without satisfying these conditions.
Other conditions exist that are both
necessary and sufficient for identifying extreme points. The major disadvantage
is that the procedure may prove computationally complex. However with a
spreadsheet such as MathCAD (Mathcad User´s guide), this difficulty is easily
overcome. For this purpose we define a variant of the BH-matrix as;
Evaluated at the stationary point, where is an unknown parameter. We now consider the
determinant ; then each of the real roots of the polynomial
must
be
(i)
Negative
if is a maximum point
(ii) Positive if is a minimum point
1.12 Fundamentals of
Biological Neural Networks
The biological neural network consists of nerve cells (neurons) as in
Fig. 1.1,
which are interconnected as in Fig. 1.2. The cell body of the neuron,
which includes the neuron's nucleus is where most of the neural computation
takes place.
Fig.1.1. A
biological neural cell (neuron).
Neural activity passes from one neuron to another in terms of electrical
triggers which travel from one cell to the other down the neuron's axon, by
means of an electrochemical process of voltage-gated ion exchange along the
axon and of diffusion of neurotransmitter molecules through the membrane over
the synaptic gap. The axon can be viewed as a connection wire. However, the
mechanism of signal flow is not via electrical conduction but via charge
exchange that is transported by diffusion of ions. This transportation process
moves along the neuron's cell, down the axon and then through synaptic
junctions at the end of the axon via a very narrow synaptic space to the
dendrites and/or some of the next neuron at an average rate of 3 m/sec.
Fig.1.2
Interconnection of biological neural nets.
Figures1.1 and 1.2 indicate that since a given neuron may have several
(hundreds
of) synapses, a neuron can connect (pass its message/signal) to many
(hundreds of)
other neurons. Similarly, since there are many dendrites per each
neuron, a single
neuron can receive messages (neural signals) from many other neurons. In
this
manner, the biological neural network interconnects [Ganong, 1973].
It is important to note that not all interconnections, are equally
weighted. Some
have a higher priority (a higher weight) than others. Also some are
excitory and
some are inhibitory (serving to block transmission of a message). These
differences are effected by differences in chemistry and by the existence of
chemical transmitter and modulating substances inside and near the neurons, the
axons and in the synaptic junction. This nature of interconnection between
neurons and weighting of messages is also fundamental to artificial neural
networks (ANNs).
A simple analog of the neural element of Fig. 1.1 is as in Fig. 1.3. In
that analog,
which is the common building block (neuron) of every artificial neural
network, we observe the differences in weighting of messages at the various
interconnections (synapses) as mentioned above. Analogs of cell body, dendrite,
axon and synaptic junction of the biological neuron of Fig. 1.1 are indicated
in the appropriate parts of Fig. 1.3. The biological network of Fig. 1.2 thus
becomes the network of Fig.1.4.
Fig.1.3
Schematic analog of a biological neural cell.
fig. 1.4.
Schematic analog of a biological neural network.
1.13 Artificial neural networks
Artificial neural networks consist of a calculation unit called neuron.
Every neuron has some real valued inputs. Inside every neuron, each input is
multiplied with corresponding neural coefficient defining its value. The sum of
all these products adds to a value called bias. Finally, activation function
affects this sum and determines the real valued output of the neuron feed
forwardly [19] or by some feedback [20].
1.14 Feed forward back propagation neural networks
Primary discussions regarding artificial neural networks introduced in
the 40's with
presentation of the feed forward neural networks. Artificial neural
networks in some extents are modeled from the brain and neural system of the
human, which are able to give acceptable solutions based on correct information
records from the problem.
The basic structure for the feed forward back propagating neural network
(nets without feedback) consists of some number of nodes in the input layer,
the hidden layer, and the output layer that has one node. The sigmoid functions
approximate linear functions, yet allow the update scheme to propagate
backwards through differentiable functions. The manner in which input data
generates output data for a given neural network depends on the interconnection
weights. These weights are adjusted to reduce the error between the neural network
outputs and the actual output values. i.e.
(1.1)
where is the actual
output for the ith training point. is the
estimated value from the neural network for the ith training
point from the neural network. n is the total number of training points
obtained by taking known data points for a given task. Here the objective is to
train the network so that the output from the network minimizes equation (1.1).
1.15 Perceptron
The
Perceptron, is possibly the earliest neural computation model, first conceived dby
F. Rosenblatt, and dates back to 1958. The Perceptron serves as a building
block to more complicated neural network
models, such as the multilayer perceptron (MLP) or the radial basis function model. The Perceptrron
possesses the fundamental structure shown below.
We observe that
the output of the perceptron can be written;
Or in vector
notation
1.16 The Perceptron
training rule
The perceptron training rule weight update according to
with , and is the target
value,
is perceptron output, is a small constant called the learning rate.
1.17
Manual Computations
In this section we consider manual
computations of a neural network consisting of two inputs and a bias as shown;
ACTIVATION FUNCTIONS
The weights and are specified
as;
The output is then computed as;
Where can be any of the activation function shown above.
For the Sigmoid activation function,, .
For the
Hyperbolic Tangent activation function, ;
.
1.18 Recurrent dynamical artificial neural network
Khanna in year 1990 [21], describes associative memory as "the
ability to get from one internal representation to another or to infer a
complex representation from a portion of it".
Effectively our goal in applying neural networks is to create a
functional mapping from steady optimization space to either dynamical time
dependent space or some parameter space. Two approaches to achieving this
mapping have been extensively studied by Xia [14], [15] and [22] to [25], Malek
[4], [16], [26] and [27] and their coauthors. The first approach relies on a
structure with adjustable parameters. On the basis of known input/output pairs,
these parameters are selected or changed. If this approach is successful, the
appropriate selection of these parameters will yield a mapping device which
will always provide the associated output values for a given input. The second
approach uses information from the primal and dual optimization problem and applied
primarily by Malek in year 2005 [16]. The basis for such systems is a precisely
defined set of ordinary differential equations that automatically satisfy the
related primal and dual optimization problems simultaneously. These information
are defined by the cumulative designing the system and are laid out in a
hierarchical fashion. The system then performs a sequential set of values,
using the output from the previous as the input to the
next. If successful, a system can be created which will associate input
with its correlated output. The challenge is to make the system complete enough
(consistent, convergent and stable) to always associate the correct output with
a given input.
The primary difference in these two approaches is that adjustable
parameters in the first are a prior, i.e., the parameters are settled upon and
maintained before data is introduced into the system. The second approach has
no adjustable parameters thus its model is simple to use. The advantage of this
approach is that in this way, we can obtain a solution for the given real life
problem, however we wish to assume a prior knowledge of relationships between
constrained optimization problem and dynamical system. Moreover the solution for
optimization problem consists of a solution for real life problem, since
optimization problem is simulated from the corresponding real life problem. The
work presented in this section applies recurrent dynamical artificial neural
network. We shall emphasize on networks that do not use network parameters or
penalty parameters in advance. This approach is a metric driven method. i.e.,
we establish distance between the input and the neural network output. For a
given input, the neural network outputs the value whose distance from the given
input is smallest using linear constraint least square technique or any other
related method. One manner of doing this mapping is to associate the
equilibrium points of a dynamical system with the optimal points of constraint
optimization problem. When the input is the initial condition of the dynamical
system, the system will converge to an equilibrium point. Thus this optimal
solution contains a solution that minimizes equation (1.1), where we use the
feed back process to produce corresponding optimal weights. This means that the
artificial neural network structure is recurrent.
The structure of the recurrent dynamical artificial neural network is
different from the feed forward artificial neural network. However it is
possible to make some corresponding relations between these two neural networks
(see Rumelhart 1986, [28]). i.e., there is a sense in which the error back
propagation scheme may be applied to networks that contain feedback, (see Fig. 1.5).
The feed forward network in Fig.1.5 may be represented to simulate feedback
network with a given set of weight and bias parameters. Having developed the
equivalent structure as shown in Fig. 1.6, it becomes proper to say "the
goal for recurrent dynamical artificial neural network, as with the back
propagation artificial neural network, is to minimize the error function given
by equation (1.1).
Training of dynamical neural networks has received considerable
attention in the last 30 years [20], [29] and [30]. The equations governing the
behavior of the simplest supervised recurrent dynamical neural network are
(1.2)
(1.3)
(1.4)
where
is the
sigmoid function. The adjustable parameters in this supervised recurrent
dynamical neural network are found in the A, B matrices and
vector C. The input x of the neural network corresponds to the
input data associated with a training point. This input is then applied to the
system governed by equation (1.2). When equation (1.3) reaches an equilibrium value
u* for this input, we obtain the output of the neural network by taking
the dot product of C and u* by equation (1.4). This neural
network output will then compare with the actual output. To update the elements
of A, B, and Cone may use gradient descent method using
and
This minimization task requires that the neural network possess enough
parameter freedom to enable each input set to generate an output close to the
actual value. This is not a case in many problems. Thus in the next section we
emphasize on the unsupervised recurrent dynamical artificial neural networks.
Fig. 1.5 Equivalent structures of a two unit network;
Feed forward network, and feed-back network for a given biases b1 and b2
and weights 1 w and 2 w .
1.17 Networks dynamic analysis
For many times dependent cost functions an online optimizer on the basis
of an analog circuit [31], [32] and [33]) is desirable. Dynamic solvers or
analog computer, was first proposed by Dennis [34], Rybashov [35] and [36],
Karpinskaya [37], and later studied by Kenedy and Chua [38], Rodriguez-Vazquez
et al. [39], Tank and Hopfield [31]. These dynamic solvers usually employ
neural networks since they have many advantages over the traditional
algorithms. Massively parallel processing and fast convergence are two of the
most important advantages of the neural networks.
CHAPTER TWO
REVIEW OF
RELATED LITERATURE
2.0 Introduction
Optimization
theory can be described as the procedure for obtaining the optimum (i.e.
maximum or minimum) of a given real life problem, represented in mathematical
form. This may be a problem of
maximizing profit for a production outfit or minimizing cost for a given
transportation agenda. In his work, “Optimality production in a local
industry,” Ukwuma P.C. (1999) used a linear programming optimization method to
show that the problem of high cost of production with attendant low or no
profit at all can be solved by generating an optimal production batches for the
different brands of drinks produced by the company.
Optimization
models play a central role, among Company Managers, Engineering Designers and
Financial Portfolio Managers as they use it for decision-making. It’s use cuts
across many disciplines like economics, physics, biology and many more…
According
to Bertsekas, 2009, the Lagrange multipliers method and its other versions are
widely used by Engineers, Scientists and Economics.
Developments
after the second world war and other related world events stimulated much interest
among Engineers, Economists, Business Executives, Mathematicians and others… It
motivated men to think of the cheapest way to achieve their aims or satisfy
their wants. This thinking and the interesting mathematical analysis brought
prominent scholars such as Stephen Boyd, Lieven Vandenberghe, Philip. Loewen,
Yongwei Huang, Rangranjan K Sundram, Krause Kuhn and Tucker among many others
to a thinking pattern and the associated mathematical analysis gave birth to
what is now known and referred to as Optimization theory.
In
the late eighteenth century Joseph Louis Lagrange(1736-1831) propounded his
elegant and powerful theorem on equality constrained optimization problems.
From here optimization theory and its application began to dominate fields such
as economics and engineering, as well as other related fields.
2.1 Basic Principles of ANNs and Their Early
Structures
2.1.1 Basic Principles of ANN Design
The basic principles of the artificial neural networks (ANNs) were first
formulated
by McCulloch and Pitts in 1943, in terms of five assumptions, as
follows:
1)
The
activity of a neuron (ANN) is all-or-nothing.
2)
A
certain fixed number of synapses larger than 1 must be excited within a given interval
of neural addition for a neuron to be excited.
3)
The
only significant delay within the neural system is the synaptic delay.
4)
The
activity of any inhibitory synapse absolutely prevents the excitation of the neuron
at that time.
5)
The
structure of the interconnection network does not change over time.
By assumption (1) above, the neuron is a binary element.
Whereas these are probably historically the earliest systematic
principles, they
do not all apply to today's state-of-the-art of ANN design.
The Hebbian Learning Law (Hebbian Rule) due to Donald Hebb (1949) is
also
a widely applied principle. The Hebbian Learning Law states that:
When an axon of cell A is near-enough to excite cell B and when it
repeatedly
and persistently takes part in firing it, then some growth process or
metabolic change takes place in one or both these cells such that the efficiency
of cell A [Hebb, 1949] is increased" (i.e. | the weight of the
contribution of the output of cell A to the above firing of cell B is
increased).
The Hebbian rule can be explained in terms of the following example:
Suppose
that cell S causes salivation and is excited by cell F which, in turn,
is excited by
the sight of food. Also, suppose that cell L, which is excited by
hearing a bell ring,
connects to cell S but cannot alone cause S to fire. Now, after repeated
firing of S by cell F while also cell L is firing, then L will eventually be
able to cause S to fire without having cell F fire. This will be due to the
eventual increase in the weight of the input from cell L into cell S. Here
cells L and S play the role of cells A, B respectively, as in the formulation
of the Hebbian rule above.
In a recent survey, Osman and Laporte [130] reported that while neural
networks are a very powerful technique for solving problems of prediction,
classification and pattern recognition, they “have not been as successful when
applied to optimization problems and are not competitive with the best
meta-heuristics from the operations research literature, when applied to
combinatorial optimization problems.” Researchers have been trying for over a
decade now to make neural networks competitive with meta-heuristics[145] such
as simulated annealing,[1, 2] tabulated search,[67, 68] constraint logic
programming,[180] and genetic algorithms,[46, 69] and they have experienced
varying degrees of success. Almost every type of combinatorial optimization problem
(COP) has been tackled by neural networks, and many of the approaches result in
solutions that are very competitive with alternative techniques in terms of solution
quality. Unfortunately, the reputation of neural networks for solving COPs does
not reflect this evidence.
The reasons are twofold. First, from within the first few years of
research, controversy and debate has surrounded the field, discouraging many
researchers: a state from which the field has never truly recovered. Second,
researchers are
forced to simulate the behavior of neural networks on digital computers
while awaiting the development of suitable hardware advances. These simulations
are designed to evaluate the potential of neural networks for generating
near-optimal
solutions to COPs, and naturally result in large CPU times that are uncompetitive
with alternative techniques. Further research into the design of suitable
hardware is contingent upon demonstrated success of neural network simulations.
Unfortunately, the many successful applications of neural networks will
not receive full merit until the reputation of neural networks has been
salvaged .The idea of using neural
networks to provide solutions to difficult NP-hard optimization problems[61,
125] originated in 1985 when Hopfield and Tank demonstrated that the Travelling
Salesman Problem (TSP) could be solved using a Hopfield neural
network.[87] This work is well-known, and their impressive results for the
solution of a TSP generated much excitement in the neural network and
operations research communities alike. Their work is also quite controversial because
many researchers have been unable to reproduce their results (due perhaps to
certain omissions in
Hopfield and Tank’s article[87] relating to simulation procedure, termination
criteria, etc.). Wilson and Pawley[192] first published these findings nearly
three years after Hopfield and Tank’s original article was published. In doing
so, they
raised serious doubts as to the validity of the Hopfield-Tank (H-T)
approach to solving optimization problems, which seemingly served to dampen the
enthusiasm surrounding the field. Since Wilson and Pawley’s results were
published,
it has been widely recognized that the H-T formulation is not ideal,
even for problems other than the TSP. The nature of the energy function that
the method utilizes causes infeasible solutions to occur most of the time. A
number of penalty parameters need to be fixed before each simulation of the
network, yet the values of these parameters that will enable the network to
generate valid solutions are unknown.
The problem of optimally selecting these parameters is not trivial, and
much work has been done to try to facilitate this process.[79, 96, 109] Many
other researchers believed that the H-T energy function needed to be modified
before any progress would be made, and considerable effort has also been spent
in this area.[21, 177] Perhaps the most important breakthrough in the field,
however, came from the valid subspace approach of Aiyer et
al.,[6] and the subsequent work of Gee.[63] By representing all of the
constraints by a single term, the feasibility of the Hopfield network can now
be essentially guaranteed.
The question of solution quality has also been addressed over the last
decade by various methods that attempt to avoid the many local minima of the
energy function. Many variations of the Hopfield network have been proposed and
can be broadly categorized as either deterministic or stochastic. The
stochastic approaches are perhaps more successfully able to improve solution
quality because they attempt to embed the principles of simulated annealing
into the Hopfield network to escape from local minima. Such attempts have
resulted in several alternative approaches including Boltzmann,[1, 4, 82]
Cauchy,[90, 167] and Gaussian Machines,[8] which are able to compete
effectively with other techniques in terms of solution quality. The other main
neural network approach to combinatorial
optimization is based on Kohonen’s Self-Organizing Feature Map.[101]
While much of the Hopfield network literature is focused on the solution of the
TSP, a similar focus on the TSP is found in almost all of the literature
relating to the use of self-organizing approaches to optimization.[11, 52, 56]
In this case, the reason is not simply because of the benchmark status of the
TSP, but more because the vast majority of these approaches are based upon the elastic
net method.[49] Kohonen’s
principles of self-organization[101] are combined with the concept of an
“elastic band” containing a circular ring of neurons that move in the Euclidean
plane of the TSP cities, so that the “elastic band” eventually passes through all
of the cities and represents the final TSP tour. Such approaches rely upon the
fact that the “elastic band” can move in Euclidean space, and that physical
distances between the neurons and the cities can be measured in the same space.
Self-organizing neural networks have been successfully applied to solve other
two-dimensional problems
such as vehicle routing[173, 174] and shortest path problems[164].
Recently, a generalization of the self-organizing map has been proposed to
solve generalized quadratic assignment problems, relieving the restriction to
two-dimensional problems, and has been applied to solve several optimization problems
arising from industrial situations.[74, 155, 158].
CHAPTER THREE
HOPFIELD
NEURAL NETWORKS
3.0 Introduction
Neural networks for unconstrained optimization were
developed based on the “Lyapunov stability theory” of dynamical systems: if a
network is “stable,” its “energy” will decrease to a minimum as the system
approaches its “equilibrium state.” If
one can properly set up a network that maps the objective function of an
optimization problem onto an “energy function,” then the solution is a natural
result of network convergence, and can be obtained at a very fast speed.
In the past two decades, artificial neural
networks or neural networks (NNs) have been widely developed in solving
mathematical programming (MP) or optimization problems. Among various NNs, a
couple of networks have been utilized for optimization (Looi, 1992; Kumar,
2005), such as Hopfield neural netwoks HNNs) (Hopfield, 1982), self-organizing
feature maps (Kohonen, 1982), and Boltzmann machines (Ackley et al., 1985). As
HNNs and general NN structure have been developed for almost all classes of
optimization problems, it is important for us to give a brief review of the techniques involved, the benefits and drawbacks.
Research studies on neurons and their
networks for information processing have drawn much attention by scholars in
the fields of physics, electronics, and electricity. They have tried to
formulate the problems mimicking the behavior of biological NNs to abstract principles
of computation employed by the brain. Hopfield and Tank made a momentous
contribution for neural computation by solving a traveling salesman problem
(TSP) and a linear programming (LP) problem (Hopfield, 1982, 1984; Hopfield and
Tank, 1985; and Tank and Hopfield, 1986).
Constrained optimization problems in
general assume that the minimization (or maximization) of some objective cost
(or benefit) function is subject to various constraints with independent
variables, and these problems are popular in various areas such as in science,
engineering, and business (Bazaraa et al., 2005, 2006).
Technically, the HNN approach to
optimization is to handle a dynamic system in which its energy function, or a
Lyapunov function , characterizes the behavior of the networks and represents
the problems to be solved. Its architecture can be realized by the use of an
electronic circuit, possibly on a VLSI circuit, as an on-line solution with a
parallel-distributed process (Cichocki and Unbehauen, 1993). These special
characteristics are beneficial for real-time optimization and have been applied
in many areas such as pattern recognition, scheduling, manufacturing, and
numerous business applications (Looi, 1992; Sharda, 1994; Smith and Gupta,
2000; Wong et al., 2000; Zhang and Huang, 1995). Due to renewed interests in
NNs by extending HNNs, various NNs have been proposed. For instance, Kennedy
and Chua (1988) extended the Tank and Hopfield’s work to solve a non-linear
programming (NLP) problem. Rodrıguez-Vazquez et al. (1988, 1990) presented
switched-capacitor NNs for solving LP and NLP problems. Later, Zhang and
Constantinides (1992) introduced a Lagrange NNs for solving general NLP
problems, and Cichocki and Unbehauen (1993) proposed the Lagrange multiplier
method based NN in solving the NLP problem. Xia et al. (2005) offered a primal
and dual method for solving linear and quadric programming problems.
3.1 General function and basic aspects for HNNs
Neural networks have been characterized in
various ways according to many relevant features (Basheer and Hajmeer, 2000; Sima
and Orponen, 2001). In general, NNs have two kinds of structures (Zurada,
1992). Both have to be configured such that the application of a set of inputs
produces (either direct or via a relaxation process) the desired set of
outputs, and various methods exist to set out the strengths of the connections.
One way is to place the weights explicitly, using a priori knowledge. The other
way is to train the neural network by feeding it teaching patterns and letting it
change its weights according to some learning rule. HNNs belong to non-training
model. In particular, HNNs are considered as feedback (or recurrent) without
learning from a pattern set (Zurada, 1992). The HNNs consist of a set of neurons
and a corresponding set of unit delays, forming a multiple loop feedback
system. Each feedback loops is equal to a neuron. Basically, the output of each
neuron is feedback, via a unit delay element, to each of the other neurons in
the system. HNNs are not affected by additional inputs, and each neuron is fully
connected with the remaining neurons by weights and always
updates the connection weights. The
associative memory or content- addressable memory is a memory organization that
accesses memory by its contents and it always searches for the closest matching
from all stored prototypes (Du and Swamy, 2006). This is analogous to the
storage of information in an associative memory as biological neurons (Kohonen,
1982). In other words, the process
of association and information retrieval
is simulated by the dynamical behavior of a highly interconnected system of
non-linear circuit elements with collective computational abilities (Hopfield, 1982).
Due to the above characteristics, HNNs are
easily realized for their capability of parallel computation as a result of
being a fully connected property. The networks use electric circuits to
simulate the actions of biological neurons, and the basic model can be implemented
by interconnecting an array or resistors, non-linear amplifiers with
symmetrical output, and external bias current as shown in Fig. 1. There are n
neurons, and each neuron is represented by a resistor gi, a capacitance ci,
and a non-linear amplifier with an activation function i = 1,2,. . . ,n,
respectively. The resistance–capacitance charging equation determines the rate change
of ui, i = 1,2, . . . ,n, where ui is
the input voltage of the amplifier. The input is mapped to the output voltage vi or (normal or invert) through the function i = 1,2,. . . ,n, respectively. The choice of connecting the normal or invert
output is dependent on the positive or negative value of the conductance or
weight wij, i = 1,2,. . . ,n, j = 1,2,. . . ,n. The input of each neuron comes from two
sources, external inputs Ii and inputs from other neurons with the
interconnection strength or weights wij
from neuron j to neuron i (Hopfield, 1982). In addition,
Hopfield has shown that the sufficient condition for a stable network is that
its synaptic weights are symmetric, i.e., wji
= wij with wjj = 0, and, i.e. the system
has a Lyapunov function (Hopfield, 1982). Here, the weights are derived from
the function, and the neurons stay transit according to the local information
(i.e., feedback) until a steady state is reached (Burke and Ignizio, 1992).
Fig. 3.1 Basic structure of
HN
Fig. 3.2 Continuous-time
Hopfield structure
Hopfield (1982) presented an energy
function to demonstrate its stability as local minimum of the energy function
corresponds to the energy of the stored patterns. According to the work of
Bruck and San (1988), the network can converge to a minimum from any initial
state. Thus, the characteristic is helpful for applications. In addition,
one popular NN for optimization, the
Boltzmann machine, is an extension of discrete HNN. It only replaces the
deterministic local search dynamics of the HNN by randomized local search
dynamics (Kumar, 2005; Kurita and Funahashi, 1996), but it has only been
applied to a small number of problems.
3.2 Continuous Hopfield networks
The continuous or
continuous-time HNN utilizes as common output functions in the network, the sigmoid
and hyperbolic tangent functions. Fig. 3.2 shows the common
circuit of the continuous-time HNN, where j = 1,. . . ,n, is the integration time constant of the jth neuron, and is an externally applied threshold. The
integrator can be realized by an operational amplifier, capacitor, and
resistor .
Here, is called the forgetting
factor of the integrator, which forces the internal signal to zero for a zero input. To formulate the
network, we establish a differential equation for the activation potential as
The output neuron is given by which is a continuous output function such as
a sigmoid function. Hopfield (1984) proposed an energy function for the
continuous HNN as;
where the function is a monotone
increasing function. Hopfield and Tank (1985), Tank and Hopfield (1986)
introduced the continuous HNN to solve the TSP and LP problems. Afterwards,
many researchers implemented HNN to solve the optimization problem, especially
in MP problems. Hence, the continuous model is our major concern. In general,
the continuous model is superior to the discrete one in terms of the local
minimum problem, because of its smoother energy surface. Hence, the continuous
HNN has dominated the solving techniques for optimization problems, especially for
combinatorial problems.
3.3 Methods of
Hopfield networks
HNN manipulates MP problems through
approximation (Hopfield and Tank, 1985). It emulates the behavior of neurons
through an electronic circuit. After the circuit is established with many
similar sets of components, a parallel computation can be realized for an
on-line solution. These special characteristics are beneficial for real-time
optimization and have been applied to solve optimization problems such as LP,
NLP, quadratic programming, MILP,
and multi-objective linear programming
(Bouzerdoum and Pattison, 1993; Cichocki and Unbehauen, 1993; Ham and Kostanic,
2001; Hopfield and Tank, 1985; Kennedy and Chua, 1988; Looi, 1992;
Rodrı´guez-Vazquez et al., 1988, 1990; Shih et al., 2004; Smith et al., 1998;
Smith, 1999; Wang, 1992, 1996; Xia, 1996). From the computational aspect, the
operation of HNN for an optimization problem manages a dynamic system
characterized by an energy function, which is the combination of the objective
function and constraints of the original problem. Because there are some
similarities between the function and the formulation of the NLP problem, many
common techniques of NLP can thus be exploited. In such a way, three common
techniques to handle NLP problems, i.e., penalty functions, Lagrange functions
(or augmented Lagrange functions), and primal and dual functions, are adapted.
Firstly, penalty functions use penalty parameters to combine the constraints
and the objective function, and then they construct an energy function to be
minimized. Secondly, Lagrange functions (or augmented Lagrange functions) take
advantage of Lagrange multiples to construct an energy function to be operated.
Thirdly, primal and dual functions are made up of a primal function and a dual
function and try to reach the minimum gap between the functions. These three
techniques are suitable for solving various MP problems such as LP, NLP, and
MILP. The following sections review the literature of HNNs from a technical
standpoint.
3.4 Penalty function methods
The penalty function method is a popular
technique for optimization in which it is used to construct a single
unconstrained problem or a sequence of unconstrained problems. A searching
method can be adopted to search for the solution in the decision space, and the
steepest descent approach is the popular technique for obtaining the searching
direction (Cichocki and Unbehauen, 1993).
In our survey, many NN approaches utilize
a rather basic penalty function to build the energy function and usually
converge to a stable point.
3.5 Summary
The Hopfield neural network (HNN) is one
major neural network (NN) for solving optimization or mathematical programming
(MP) problems. The major advantage of HNN is that its structure can be realized
on an electronic circuit, possibly on a VLSI (very large-scale integration)
circuit, for an on-line solver with a parallel-distributed process. The
structure of HNN utilizes three common methods, penalty functions, Lagrange
multipliers, and primal and dual methods to construct an energy function. When
the function reaches a steady state, an approximate solution of the problem is
obtained.
CHAPTER FOUR
MAIN RESULTS
ON GRADIENT
BASED NEURAL NETWORKS
FOR OPTIMIZATION
PROBLEMS
4.0 Introduction
Optimization problems arise in almost every field of engineering,
business and management sciences. In many engineering and scientific
applications, the real-time solutions of optimization problems are demanded.
However, traditional algorithms for digital computers may not be able to
provide the solutions on-line. Therefore, the search for real-time on-line
solutions in such cases becomes not only important but also essential. In
1980's, an attractive and very promising approach was introduced to provide
real-time solutions for optimization problems. The new approach is termed artificial
neural network (ANN). Hopfield and Tank [12, 13, 28] initiated the recent study
of neural networks in optimization. Generally speaking, ANN provides an
alternative and attractive way for the solution of optimization problems. The
significant and unique feature of ANN to optimization is the realization of
simple and real-time hardware (or circuit) implementation. In other words, an
electrical circuit can be constructed which generates the on-line solution of
certain optimization problems. The seminal work of Hopfield and Tank has
inspired many researches to investigate various neural networks for solving
linear and nonlinear programming problems. Numerous optimization neural
networks have been develop ed, see [3, 15, 24, 21, 22, 27, 1, 5, 2, 20, 34, 14,
16, 17, 18, 19, 30, 31, 33]. From the optimization point of view, most of the
existing ANN models for optimization problems could be divided into two
classes. One class is the gradient-based neural network models, which are used
for the unconstrained optimization problems. These problems normally come from
a) some kind of transformations from the constrained minimization problems with
penalty function methods [15, 24, 22, 2, 20]; and b) the complementarity
problems with the NCP functions [17, 18]. The other class is the pro jective
gradient based neural network models, which are derived from constrained
minimization problems and complementarity problems with KKT-conditions [34, 14,
19, 30, 31]. [30, 31] present some neural networks for solving linear,
quadratic and nonlinear convex programming problems, which are based on KKT
system for optimization problems. Their models correspond to some variants of
the pro jective gradient method for the complementarity problem and the variational
inequality problem [7, 23, 8, 9, 10, 11]. It should be noted that for
Lagrangian conditions with the nonnegative Lagrange multiplier, if the Lagrange
multiplier is penalized or transformed into unrestricted case, the resulting
neural network model belongs to the first class. On the other hand, if the
nonnegative Lagrange multiplier is enforced by projection, it belongs to the
second class.
4.1 Unconstrained
optimization
In this section we focus on the following class of unconstrained
optimization problem;
(4.1)
With the gradient-based dynamics,
(4.2)
where is the
objective (or energy) function, is the gradient of , is an arbitrary initial point and H is a symmetric positive definite matrix. For our purposes we
assume .
In
general a neural network consists of
a)
an
energy function
b)
a
dynamical system which is stable, and
c)
the
fact that the energy function should be monotonically decreasing along the
solution of the dynamical system.
4.2 Remark:
1.
Subsequently
we will consider equation (4.2) as the general gradient based neural network.
2.
The
stability analysis of the general gradient based neural network will be
considered. We note that stability analysis for various neural network models
has been addressed in the literature [15,34,21,1,20,30,2]. However, all these
discussions are based on Lyapunov’s
direct method which requires the existence of a Lyapunov function. This however
limits the stability analysis for a general neural network model. In this work
we introduce a new approach to address the stability issues of a general neural
network model.
4.3 Lyapunov's Direct Method
Consider
the continuous-time system
with
an equilibrium point at . This is a time-invariant system, since f does not depend explicitly on t. The stability analysis of the
equilibrium point in such a system is a difficult task in general. This is due
to the fact that we cannot write a simple formula relating the trajectory to
the initial state. The idea behind Lyapunov's “direct” method is to establish
properties of the equilibrium point (or, more generally, of the nonlinear
system) by studying how certain carefully selected scalar functions of the
state evolve as the system state evolves. The term direct is to contrast this approach with Lyapunov's indirect method which attempts to
establish properties of the equilibrium point by studying the behaviour of the
linearized system at that point.
Consider,
for instance, a continuous scalar function that is 0 at
the origin and positive elsewhere in some ball enclosing the origin, i.e. and for n this ball. Such a may be thought of as an “energy” function. Let denote the time derivative of along any trajectory of
the system, i.e. its rate of change as varies
according to (4.3). If this derivative is negative throughout the region
(except at the origin), then this implies that the energy is strictly decreasing
over time. In this case, because the energy is lower bounded by 0, the energy
must go to 0, which implies that all trajectories converge to the zero state.
4.4
Lyapunov Functions
4.4.1
Defnition
Let V
be a continuous map from to . We call a locally positive defnite
function around if
(i) ,
(ii) ; for some r.
Similarly,
the function is called locally positive semi-definite if the strict inequality
on the function in the second condition is replaced by. The function is locally negative definite if is locally positive definite, and locally negative
semi-definite if is locally positive semi-definite.
For
the continuous time case, we shall restrict ourselves to that have continuous first partial derivatives. We
shall denote the derivative of such a V with
respect to time along a trajectory of the system (4.3) by. This derivative is given by;
where is a row
vector, i.e., the gradient vector or Jacobian of V with respect to x,
containing the component-wise partial derivatives .
4.5 Definition (Lyapunov function)
Let V be a locally positive definite
function and let be its
derivative along trajectories of system (4.3). If is locally negative
semi-definite, then V is called a
Lyapunov function of the system (4.3).
4.5 Theorem (Lyapunov Theorem for Local Stability)
If
there exists a Lyapunov function of system (4.3), then is a stable equilibrium point in the sense of
Lyapunov. If in addition for some , i.e. if is locally
negative definite, then is an asymptotically
stable equilibrium point.
We have the following important definitions and Theorems
4.6 Theorem
Suppose s a continuous function. Then for arbitrary and there exists
a local solution satisfying to (4.3) for
some . If f is
locally Lipschitz continuous at then the solution is unique, and if f is Lipschitz continuous in then can be extended to
4.7 Definition (Equilibrium)
A point is called an equilibrium point of (2.3) if.
4.8 Definition (Stability in
the sense of Lyapunov)
Let be the solution of (4.3). An isolated equilibrium
point s Lyapunov stable if for any and any scalar there exists a such that if then for .
4.9 Definition (Asymptotic
Stability)
An isolated equilibrium point is said to be asymptotically stable if in addition
to being Lyapunov stable, it has the property that as , if.
4.10 Remark
We note that the above classical stabilities are only for isolated
equilibrium points. Non-isolated equilibrium points could occur inevitably in
the dynamical system for optimization problems. For example, consider , the points on the circle are all the
equilibrium points of (4.2). So it is necessary to extend the definitions of
various stabilities from isolated equilibrium points to non-isolated ones.
4.11 Definition (Connected
equilibrium set)
The
subset of the equilibrium set of the dynamical system (4.3) is called a connected
equilibrium set of S, if for any there exists a
continuous curve connecting and , and if there exists a positive number such that there
is no equilibrium point of (4.3) except itself in the
open neighbourhood of , where denotes the
distance from x to
4.12 Definition
(Stability for connected equilibrium set)
Let be the
solution of (4.3). The connected equilibrium set is stable if for any and any scalar there exists a such that if, then .
4.12 Definition
(Asymptotic stability for connected equilibrium set)
The connected equilibrium set is said to be
asymptotically stable if in addition to being stable, it has the property that as if .
4.13 Remark
We
observe that there is a simple fact that the objective function in (4.1) remains constant on the connected equilibrium
set of the system
(4.2)since on and is connected. In the above example, equals zero on
the equilibrium set .
The following
theorem provides some interesting properties for the autonomous system (4.3).
4.14 Theorem
a)
Given a solution of
(4.3), then for any real constant c,
the function is also a solution of (4.3).
b)
Through
any point passes at most one trajectory.
4.15 Remark
Unless otherwise stated, subsequently in
this work we will assume that the objective function in (4.1) is
bounded below and continuously differentiable. In addition, its gradient
function is Lipchitz continuous in , i.e. there exists a constant L (> 0) such that
4.16 Analysis for Gradient-Based Neural Network
The
analysis in this section is for, the identical matrix in (4.2), for simplicity. For the
case where H is symmetric positive definite
matrix other than the identity matrix, the analysis is similar with the norm;
and induced distance;
4.17 Remark
In the literature [15,34,21,1,20,30,2 ], the stability analysis for a
dynamical system is provided by Lyapunov’s direct method which requires the
existence of a Lyapunov function. Unfortunately however, a Lyapunov function
may not exist in general. To overcome this difficulty, we introduce a new
approach to address the stability issues of a dynamical system without
employing any Lyapunov function. Furthermore, the approach does not require
prior information on the equilibrium point or set at all. The new approach only
considers the energy function and the
associated dynamical system. Subsequently we will show how this works for the
general gradient-based neural network (4.1)-(4.2).
First we need the following preliminaries.
4.18 Theorem
Let in
(4.2) be Lipschitz continuous in and , then there is no periodic solution for the neural
network system (4.2).
Proof: From Theorem (4.14) there exists a unique solution satisfying Suppose there is a periodic solution and the minimal period is ,
i.e.
Since (/),
there exist a such that , then . Since is non-increasing on from , , so must be a
constant on which is a contradiction since .
4.19 Remark
From Theorem (2) and Theorem (3), we can conclude that the phase space
of (4.2) can only consist of;
·
Equilibrium
points
·
Non-intersecting
trajectories
4.20 Lemma
Let be the trajectory of the system (4.2), is any contour of , then is orthogonal
to at their
intersecting points.
Proof: Consider the phase plane of (4.2) as , where s
represents the parameter along the contour of in the phase
plane of (4.2). Since , then we know that has continuous
partial derivative with respect to s.
Then remains
constant on its contour, i.e.
we have
We also require the following lemma by Barbalat.
4.21 Lemma (Barbalat)
If a differentiable function t
has a finite limit as , and is uniformly continuous,
then as .
4.22 Theorem
If is bounded
below and its gradient is Lipschitz continuous in, then for any initial point , the
trajectory of the system
(4.2), satisfying , will converge to an equilibrium point of the neural
network (4.2) as .
4.23 Remark
The result of theorem (4) confirms the the energy
function E in (4.1) and the dynamical
system (4.2) constitute a neural network model. The result in theorem (4) also
shows that any trajectory of the neural network (4.2) for the optimization
problem (4.1) will converge to an equilibrium point of the neural network
(4.2), irrespective of the boundedness of level sets of the objective function
and the isolation of the equilibrium point.
Because , the energy function in (4.1) is monotonically decreasing in t if is not an equilibrium point. This fact reveals that
the energy function is not preserved following the trajectory of the
neural network system (4.2). Theorem (3) and (4) lead to the following
interesting results.
4.24 Theorem
Assume that in (4.1) is bounded below and its gradient is Lipschitz continuous in . Let be a connected equilibrium set of the neural network
(4.2), if s stable, then it is asymptotically stable.
4.25 Remark
The result of theorem (5) would facilitate the proof of both stability
and asymptotic stability in many neural networks for optimization problems. The
asymptotic stability is particularly important for the neural network (4.2) of
optimization problem (4.1), because only points in the asymptotically stable
connected equilibrium set are the optimal solutions of the problem (4.1). The
following theorem demonstrates the equivalence of the asymptotically stable
connected equilibrium set of neural network (4.2) and the optimal set of the problem
(4.1).
4.26 Theorem
Assume
that in (4.1) is bounded below and its gradient is Lipschitz continuous in . Let be a connected equilibrium set of neural network
(4.2). Then is
asymptotically stable if and only if each point of the set is an optimal
solution of the optimization problem (4.1).
We consider here the following
examples to illustrate the above results.
4.27 Example
(4.4)
The level sets of are all
unbounded unless it is empty. The graphical profile of is depicted below.
|
The level sets of are all
unbounded unless it is empty, the neural network for (4.4) is
(4.5)
The equilibrium set of (16) is the line , which is connected and unbounded. The solutions to
equations (4.5) are
The trajectories are lines
We observe that every trajectory approaches
the equilibrium point
as , which is on the line , every trajectory is orthogonal
to the line , and on the line .
4.28 Example
(4.6)
As a reference point we plot the graph of .
Fig. 4.1 Profile of
We note that the
maximum of occurs precisely at
The neural network for (4.6) is
Separating variables;
Unfortunately the integral on the left does not have a closed form
solution, so we resort to a numerical solution with MathCAD as follows.
MATHCAD SOLUTION
To solve the
first order ODE
Fig. 4.2
Solution
profile with initial value x0 = 1
Fig.4.3 Solution profile with initial value x0 = 2
Fig.4.4 Solution profile with initial value x0 = 3
The equilibrium set of (4.6) is , and we have
the following cases
·
If
(Fig 4.2)
·
If
(Fig 4.3)
·
If
(Fig 4.4)
Theorem 4.24 and the above examples show
clearly that the infinite point can be viewed as an extended ordinary point. If
the point is an isolated minimize of the objective function as in Example (4.28),
some trajectories will approach it. If the point
is not an isolated minimize as in Example
(4.27), there is no trajectory. With a finite initial point can approach it. We
also note that if the set is relaxed by the level sets for all , Theorem (4.24) and Theorem (4.26) still hold. For
instance in Example (4.28), but Lipschitz
continuous in any level set of , so the conclusions still hold as shown above.
4.29
Lagrange programming neural network for nonlinear optimization
The
Lagrange programming neural network (LPNN) was proposed by Shengwei Zhang and
A. G. Constantinides [1] from their research concerning analog computational
circuits. An analog computational circuit is usually constructed by a dense
interconnection of simple analog computational elements (neurons) and governed
by a set of differential equations. By using an analog computational circuit,
the optimization problem can be solved either by iterations using a digital
computer or by setting up the associated neural circuit and measuring the node
voltage after it settles down to a steady state.
The
Lagrange programming neural network is designed for general nonlinear
programming. It is based on the well-known Lagrange multiplier method for constrained programming. Instead of
following a direct descent approach of the penalty function, the network looks
for, if possible, a point satisfying the first-order necessary conditions of
optimality in the state space. Consider the following nonlinear programming
problem with equality constraints:
Min
such
that
where
and
are given
functions and. . The
components of g are denoted f and g are assumed to be twice
continuously differentiable.
The Lagrange
function is defined
by
where is referred
to as the Lagrange multiplier. Furthermore, we have
The first-order
necessary condition of optimality can be expressed as a stationary point of over x and , i.e.
The transient
behaviour of the neural network is defined by the following equations.
If the network
is physically stable, the equilibrium point described by;
which
obviously meets the first-order necessary condition of optimality and thus
provides a Lagrange solution. There are two classes of neurons in the network,
variable neurons x and Lagrangian
neurons , with regard to their contribution in searching for
an optimal solution. Variable neurons seek for a minimum point of the cost
function and provide the solution at an equilibrium point, while Lagrangian
neurons lead the dynamic trajectory into the feasible region determined by the
constraints.
4.30 Remark
The
disadvantage of the Lagrange neural network lies in that it handles equality
constraints only. Though in theory inequality constraints can be converted to
equality constraints by introducing slack variables, the dimension of the
neural network will inevitably increase, which is usually not deemed optimal in
terms of model complexity. In this sense, we need a neural network which can be
regarded as an extension of the Lagrange network.
4.31
The Augmented Lagrange Programming Neural Network
Another variation of the LPNN is the augmented Lagrangian programming neural network(ALPNN) which we
now introduce.
Consider the more general problem with inequality constraint.
Min
such that (4.7)
.
Thus . The functions and are assumed to be twice continuously differentiable.
4.32
Definition
Let be a vector satisfying the constraint conditions, then will denote the index set
(4.8)
If the gradients are linearly
independent, then is called a regular point.
Given that the
Lagrangian of the problem is defined by
(4.9)
There exists the
following Karush-Khun-Tucker Theorem .
4.33
Theorem
Let be a local minimum of problem (4.7) and assume that is a regular point. Then there exists a unique vector such that;
(4.10)
(4.11)
(4.12)
.
From the above theorem, we observe that,
if the multipliers are redefined as some positive function of original ones, , say, the nonnegative constraints imposed on the
multipliers are removed completely. This implies that it is no longer necessary
to convert inequality constraints into equality ones by slack variables in
order to reuse those results concerned merely with equality constraints in the
construction of Lagrange neural networks, and they are used directly[13]. The
Lagrange-type neural network that is conceived with this method has two
advantages over the original one: First, all the results on equality constraint
can be transplanted in their original forms, and be proved similarly with minor
modification; second, we may obtain the augmented Lagrangian functions with the
same smoothness as that of objective function and constraints, therefore, they
are more convenient to implement in circuits.
Define the augmented Lagrangian function
as:
(4.13)
where c is a positive penalty parameter. The aim is to construct a
continuous-time dynamical system (Neural Network) that will settle down to the
KKT pair of nonlinear programming problem. Here is such a system:
(4.14)
From the system defined above, we may get ’s analytical
expression
(4.15)
where assume that is a nonzero
initial value of . This expression
shows that, if x is outside the feasible region of problem
(i.e. there is at least an i such that ) , then the
corresponding multipliers will increase exponentially as time t increases. Thus the penalty terms containing those
multipliers will increase continuously with time until the constraints are
satisfied. This procedure will be repeated incessantly unless all the
constraints are satisfied. Hence x will eventually be conducted into the feasible region whereas the
multipliers associated with the inactive constraints approach to zero, and the
remaining multipliers to constant.
4.34 Analysis of stability and convergence
Second-Order Sufficient Conditions: Let be regular point
for problem. If there exists vector satisfying
(4.16)
(4.17)
.
and for every such that for every , it follows that;
(4.18)
In addition, satisfies the strict complementarity assumption
(4.19)
then is a strict local minimum of (4.7).
By straightforward calculation, the
gradient and Hessian matrix of the augmented Lagrangian function with respect to
x are respectively given as
and
Let and e as defined in the second order sufficient
conditions. There is, for any c,
(4.20)
and there exists
a by Lemma 0.4 such that;
Equations (14) and (15) show that is a strict
local minimum of the augumented Lagrangian function
We now
state without proof the following important proposition [ ].
4.35 Proposition
Let is the KKT pair
for problem. If the second-order sufficient conditions are satisfied, then
system (7) is locally asymptotically exponentially stable.
4.36 A Model for nonlinear programming
4.36.1 The Basic Problem Formulation:
Consider the following nonlinear convex programming problem subject to
linear equality and inequality constraints. We call it the primal problem:
(4.21)
Where x is an n-dimensional vector, is a continuously differentiable function on . , are coefficient matrices and are constant column vectors. The minimization problem
for the primal (1) is now written;
(4.22)
Where is the Lagrange
function and , are Lagrange
multipliers. According to the Karush-Tucker (KKT)[ ] condition, is a solution to (1) if and only if there exist such that satisfies the
following conditions;
(3)(4.23)
4.36.2
Neural Network Model
Here, we
introduce the novel recurrent network model [ ] for solving (4.21) and its dual
as follows;
(4)(4.24)
(5) (4.25)
(6)
(4.26)
Where k is some positive constant. The main
property of the above system is stated in the following theorem.
4.36.3 Theorem
If the solution of the neural network described by the differential
equations (4.24)-(4.26) converges to a stable
state then converges to the optimal solution of the primal
problem (4.21) and the Lagrange
multipliers converge to the
optimal solution of the dual of the convex programming problem(1).
Proof
Let be the ith
component of x, then equation (4) can
be written as;
(4.27)
(8)(4.28)
Condition (4.28)
is to ensure that x will be bounded
below by 0. Let be the limit of , i.e.;
(4.29)
By stability of
convergence, we have
From equations
(7) and (8) we have;
(4.30)
(4.31)
In other words;
(4.32)
(4.33)
Or
(4.34)
(4.35)
Similarly from
equations (5) and (6)
(4.36)
(4.37)
(4.38)
By KKT
conditions in (3) and conditions in (4.34)-(4.38) we have shown
that and are the optimal solutions for the problem (1) and its
dual problem respectively.
4.36.4
Remark
In chapter four
we will provide numerical examples to illustrate the foregoing.
In this section we now present a number of numerical
examples to illustrate the neural network ability to solve nonlinear
programming problems. We begin with examples on recurrent neural network
discussed in chapter 3.
4.37 Simulation Example for testing the Neural Networks
To demonstrate the behaviour and properties of the recurrent neural network
model, two examples are simulated by using the Euler method for time step
dt = 0.01 and n =1000. We consider in particular the following convex
nonlinear programming problem;
The solution of the above problem is the point and this can be
easily verified analytically. We have test the neural network model by
constructing and solving the proposed
differential equations (4.24)-(4.26)as follows. In
compact form, the required equations are;
(4.39)
(4.40)
(4.41)
Since, then
, and
It
then follows that;
Applying
the Euler scheme we get
Here
we take the time step h = k = dt
= 0.01, and j = 1,2,..., N – 1.
Or
The
Euler scheme can be written more compactly as,
(4.42)
h
= 0.01, and j = 1,2,..., N – 1.
In Table 4.1 we give various set of initial input values for the neural
network model and the corresponding optimal solutions.
Initial
primal variables
|
Initial
dual variables
|
Optimal solution
|
||||||
|
|
|
|
primal
|
dual
|
|
||
|
|
|
|
|
||||
1
|
– 1
|
– 1
|
1
|
0.0016
|
1.1225
|
–0.0021
|
0.05
|
1.26001137
|
– 1
|
1
|
1
|
– 1
|
0.0005
|
1.8843
|
0.0054
|
–0.665
|
3.55058649
|
1
|
2
|
– 3
|
1
|
0.00004
|
1.9621
|
0.01321
|
-0.732
|
3.84983641
|
Table 4.1
4.38 Discussion
For the first initial point the convergence of the Euler scheme after 60
iterations does not give a good value for the optimal solution. However the
subsequent initial values perform better at reaching the optimum value of the
objective function.
The above analysis has demonstrated the ability of neural networks to
solve both unconstrained and constrained mathematical programming problems,
which has been the major objective of this work.
CHAPTER FIVE
SUMMARY AND
CONCLUSION
5.1 Summary
Optimality
criteria form the foundations of mathematical programming both theoretically
and computationally.
On
the other hand constrained optimization problems are defined as the
mathematical representation of real world problems concerned with the determination
of a minimum or a maximum of a function of several variables, which are
required to satisfy a number of constraints. Such function optimization are
sought in diverse fields, including mechanical, electrical and industrial
engineering, operational research, management sciences, computer sciences, system
analysis, economics, medical sciences, manufacturing, social and public
planning and image processing.
In
this work we have been concerned with the problem of solving optimization
problems using artificial neural networks. We considered in particular gradient
based neural networks which works well for unconstrained problems. For
constrained optimization problems we introduced the Lagrange neural network for
equality constrained problems and also the recurrent network model for more
general constrained problems. We were able to show that Neural Networks are
capable of solving nonlinear programming problems, and in particular they
provide better insight into the nature of solutions when conventional methods
fail.
5.2 Conclusion
The gradient based neural network as well as the recurrent neural
network introduced in this work possess
many advantages compared to existing traditional algorithm for solving nonlinear programming problems. It converges
to the exact solutions of primal and dual problem without requiring any
parameter tuning. The model is simple to use. Another advantage of the model is
that it is very intuitive and can be explained in common sense without formal
mathematics.
References
[1] A. Cichocki, R. Unbehauen, Neural Networks for Optimization and
Signal Processing, John Wiley & Sons, Chichester-New
York-Brisbane-
Toronto-Singapore, 1993.
[2] L. O. Chua and G. N. Lin. “Non-linear programming without
computation”, IEEE Trans. Circuits and Systems, CAS-31, pp.182- 186,
1984.
[3] D. W. Tank and J. J. Hopfield. “Simple “neural” optimization
networks: an A/D converter, signal decision circuit, and a linear programming circuit”,
IEEE Trans. Circuits and Systems, CAS-33, pp.533-541, 1986.
[4] M. J. Smith and C. L. Portmann. “Practical design and
analysis of a simple “neural” optimization circuit”, IEEE Trans. Circuit and
Systems, Vol.36, pp.42-50, 1989.
[5] M. P. Kennedy and L. O. Chua. “Neural networks for non-linear programming”,
IEEE Trans. Circuit and Systems, Vol. 35, pp.554-562, 1988.
[6] W. E. Lillo, M. H. Loh, S. Hui and S. H. Z(ak. “On solving
constrained optimization problems with neural networks : a penalty method approach”,
Technical Report TR EE 91-43, School of EE, Purdue University, West Lafayette,
IN, 1991
[7] W. E. Lillo, S. Hui, S. Hui and S. H. Z ( ak. “Neural network for
constrained optimization problems”, International Journal of Circuit Theory
and Applications, vol. 21, pp.385-399, 1993.
[8] V.M. Mladenov, P.N. Proshkov, "Modelling and Simulation of
Continuous Neural Networks for Constrained Optimization Problems", 2nd
IMACS International Conference on: Circuits, Systems and Computers,
(CSC'98), Greece,
published in "Recent Advances in Information Science and
Technology", World Scientific, ISBN 981-02-3657-3, 386-393, 1998.
[9] Mladenov V.M., N. Mastorakis, "Design of 2-Dimensional
Recursive Filters by using Neural Networks", CSCC’99, Athens, Greece, "Computational
Intelligence and Applications", (N.Mastorakis Editor), WSES Press, Athens 1999,
pp. 12-20.
[10] V.M. Mladenov, N. Maratos, “Neural Networks for Solving Constrained
Optimization Problems”, CSCC’00, Athens, Greece, (N.Mastorakis Editor), WSES
Press, Athens 2000.
[11] D.G. Luenberger, Linear and Nonlinear Programming, Addison-Wesley,
1984.
[12] E. Polak, Optimization: Algorithms and Consistent
Approximations, Springer, 1997.
[13] A.F. Filippov, “Differential Equations with Discontinuous Righthand
Sides”, Kluwer Academic Publishers, Dordrecht, the Netherlands, 1988.
[14] Wang, J., 1993. Analysis and design
of a recurrent neural network for linear
programming. IEEE Transactions
on Circuits and Systems – I: Fundamental
Theory and Applications 40,
613–618.
[15] Wang, J., 1996. A
recurrent neural network for solving the shortest path problem. IEEE
Transactions on Circuits and Systems – I: Fundamental Theory and
Applications 43, 482–486.
[16] Wang, J., 1997. Primal
and dual assignment networks. IEEE Transactions on Neural Networks 8, 784–790.
[17] Wang, J., 1998. Primal
and dual neural networks for shortest-path routing. IEEETransactions on
Systems, Man, and Cybernetics – Part A: Systems and Humans 28, 864–869.
[18] Watta, P.B., Hassoun,
M.H., 1996. A coupled gradient network approach for static and temporal
mixed-integer optimization. IEEE Transactions on Neural
Networks 7, 578–593.
[19] Widrow, B., Rumelhart,
D.E., Lehr, M.A., 1994. Neural networks: Applications in industry, business and
science. Communication of the ACM 37, 93–105.
[20] Wilson, G.V., Pawley,
G.S., 1988. On the stability of the traveling salesman problem algorithm of
Hopfield and Tank. Biological Cybernetics 58, 63–70.
[21] Wilson, R.L., Sharda, R.,
1992. Neural networks. OR/MS Today 19, 36–42.
[22] Wong, B.K., Lai, V.S.,
Lam, J., 2000. A bibliography of neural networks in business: Techniques and
applications research: 1994–1998. Computers and Operations Research 27,
1045–1076.
[23] Wu, A.I., Tam, P.K.S.,
1999. A neural network methodology and strategy of quadratic optimization.
Neural Computing and Applications 8, 283–289.
[24] Wu, X.Y., Xia, Y.S., Li,
J., Chen, W.K., 1996. A high-performance neural network for solving linear and
quadratic programming problems. IEEE Transactions on Neural Network 7, 643–651.
[25] Xia, Y., 1996. A new neural network for
solving linear programming and its
application. IEEE Transactions
on Neural Networks 7, 525–529.
[26] Xia, Y., Wang, J., 1995.
Neural network for solving linear programming problem with bound variables.
IEEE Transactions on Neural Networks 6, 515–519.
[27] Xia, Y., Wang, J., 1998.
A general methodology for designing globally convergent optimization neural
networks. IEEE Transactions on Neural Networks 9, 1331–1334.
[28] Xia, Y., Feng, G., Wang,
J., 2005. A primal–dual neural network for online resolving constrained
kinematic redundancy in robot motion control. IEEE Transactions on Systems Man
and Cybernetics Part B – Cybernetics 35, 54–64.
[29] Yalcinoz, T., Short,
M.J., 1997. Large-scale economic dispatch using an improved Hopfield neural
network. IEE Proceedings Generation, Transmission and
Distribution 144, 181–185.
[30], S.H., Upatising, V.,
Hui, S., 1995. Solving linear programming problems with neural networks: A
comparative study. IEEE Transactions on Neural Networks 6, 94–104. The
MatheWorks: <http://www.mathworks.com/>.
[31] Zhang, X.S., 2000. Neural
Networks in Optimization. Kluwer Academic, London.
[32] Zhang, S.,
Constantinides, A.G., 1992. Lagrange programming neural networks. IEEE Transactions
on Circuits Systems 39, 441–452.
[33] Zhang, H.C., Huang, S.H.,
1995. Applications of neural networks in manufacturing: A state-of-the-art
survey. International Journal of Production Research 33, 705– 728.
[34] Zurada, J.M., 1992.
Introduction to Artificial Neural Systems. West, New York. U.-P. Wen et al. /
European Journal of Operational Research xxx (2008) xxx–xxx 13
ARTICLE IN PRESS