APPLICATION OF ARTIFICIAL NEURAL NETWORK IN OPTIMIZATION THEORY



 Abstract
In this work we considered artificial neural network as a method of solution for  solving constrained and unconstrained optimization problems.   Constrained optimization problems are defined as the mathematical representation of real world problems concerned with the determination of a minimum or a maximum of a function of several variables, which are required to satisfy a number of constraints. Such function optimization are sought in diverse fields, including mechanical, electrical and industrial engineering, operational research, management sciences, computer sciences, system analysis, economics, medical sciences, manufacturing, social and public planning and image processing.

Our approach  for solving optimization problem is using artificial neural networks. We considered in particular gradient based neural networks which works well for unconstrained problems. For constrained optimization problems we introduced the Lagrange neural network for equality constrained problems and also the recurrent network model for more general constrained problems. We were able to show that Neural Networks are capable of solving nonlinear programming problems, and in particular they provide better insight into the nature of solutions when conventional methods find it cumbersome .The eminent merit of neural network computation is that it is able to compute the optimal solution during the dynamical transient motion and it provides real time on line solution.

THIS IS A SAMPLE | WE ARE PROFESSIONALS IN WRITING
CHAPTER ONE
 GENERAL OVERVIEW OF OPTIMIZATION THEORY

1.0  Introduction
Optimization refers to the art and science of allocating scarce resources to the best possible  effect. Optimization techniques are called into play every day in question of industrial planning, allocation, scheduling, decision-making, etc. For example, how does a global petroleum refiner decide where to buy crude oil, where to ship it for  processing, what products to convert it to,  where to sell those products, and at what prices?  A maximum-profit optimization model is used to solve this problem. How does an airline know how to route its planes and schedule its crews at  minimum cost while meeting constraints on airplane flight hours between maintenance and  maximum flight time for crews? New optimization techniques are arriving daily, often  stimulated by fascinating insights from other fields. Genetic algorithms, for example, use an analogy to chromosome encoding and natural selection to evolve a good optimized solution.
Artificial neural networks, on the other hand process information the way biological nervous systems do, such as the brain, for optimal decision making. Certain characteristics of their  architecture and the way they process  information makes them superior to conventional techniques on certain class of  optimization problems.
Optimality criteria form the foundations of mathematical programming both theoretically and computationally. In general, these criteria can be classified as either necessary or sufficient. Naturally, one would like to have the same criterion be both necessary and sufficient; however, this occurs only under somewhat ideal conditions, which are rarely satisfied in practice.

1.1  Optimization problems
Optimization problems are defined as the mathematical representation of real world problems concerned with the determination of a minimum or a maximum of a function of several variables, which are required to satisfy a number of constraints. Such function optimization are sought in diverse fields, including mechanical, electrical and industrial engineering, operational research, management sciences, computer sciences, system analysis, economics, medical sciences, manufacturing, social and public planning and image processing.
Although many classical optimization algorithms such as simplex, Karmarkar interior point, direct and indirect techniques are given to solve linear, quadratic and nonlinear optimization problems, in many applications, it is desire to have real-time on-line solutions of corresponding optimization problems. However, traditional optimization algorithms are not suitable for real-time on-line implementation on the computer. The dynamical system approach is one of the promising approaches that can handle these difficulties.
In the recent years many artificial neural networks models developed to solve optimization problems. Several basic and advance questions associated with these models have motivated the studies presented in this work.
theoretical areas of interest include fundamental methods, models and algorithms for solving general optimization problems using artificial recurrent neural networks. On the other hand, we will also present and discuss the numerical analysis for the corresponding models, simulations and applications of recurrent neural networks that solve various practical optimization problems.
Recurrent dynamical neural network is an area of neural networks which is one of the fundamental topics of the subject, and combines many mathematical concepts like ordinary and partial differential equations, dynamical systems, unconstrained and constrained optimization, local and global optima for a function of several variables, sigmoid functions, error estimation, integration and gradient descent methods. Here we must compute the optimal solution for the constrained optimization problem with objective function of several variables that corresponds with the solution of a  system of ordinary differential equations. From mathematical point of view the
convergence of the solution and stability of the method has quiet importance,
The troublesome problem of just what numerical optimization analysis is arises in recurrent dynamical neural network, as it does in other branches of the field. Should the optimization analysis part be the main aim, or is it the generation of an efficient, tested and validated program which is important? The answer is surely that both areas are important, but at the end of the day numerical analysis and mathematical techniques are some service industry and what the customers want is reliable codes to solve their problems. The theoretical analysis forms part of the reliability assessment, as it determines bounds on errors and levels of stability. These error bounds form the basis of a theoretical justification for the solution convergence of the corresponding numerical algorithm to the actual solution of the
original neural network model.

1.2 Constrained optimization
In this section, we shall first consider an important class of constrained linear programming problems and their general dual form. Second, we shall introduce primal and dual form of a constrained convex quadratic programming problem. Then we will consider the nonlinear convex programming problems. This, as we shall see, leads to discovering some primal-dual relationships that exists for corresponding class of constrained optimization problems.


1.3 Problems involving the Lagrange Multipliers
The problem considered in this section is the following.
Problem A                             Min
such that   here ,  .
The functions and  are assumed to be twice continuously differentiable. Many methods for solving the above problems have been published (Arrow & Solow, 1958; Bazaraam & Goode, 1972; Fletcher & Lill, 1971; Miele, el al, 1972). Generally, they are based on one of the two main procedures that follow.
1)    The first is the Lagrange multiplier technique, where the constraints are adjoined to the function by means of  multipliers to form a new function:

Generally called the Lagrangian of the problem. The problem is then reduced to   finding the saddle point of  in the  space, and thus the dimension of the problem is increased from n to.

2)    The other basic approach is the penalty function method (Mc Kormick, 1967; Powell, 1972). In this case a function including the constraints in a proper manner is adjoined to the original function, for example

where c is a positive real-valued parameter. Under very mild conditions, the solution of

tends to the solution of

such that    as  
It turns out that the penalty function method is not very attractive from a numerical point of view, since the functions created become very badly conditioned for numerical optimization.

1.4 Notation
For the purpose of notation, as usual the symbol is the differentiation operator with respect to the vector i.e.,

The symbol  represents the operator applied twice, thus is the  matrix whose element is . Subsequently,  will denote will denote ; for a parametrized function, will represent the derivative with respect to the parametrizing variable, i.e. . By the term “differentiable” we mean continuously differentiable. The rank of a matrix A will be denoted by.

1.5 Optimality Condition for Equality constraint problems
In this section, with reference to problem (A), we are interested in the conditions in which  and the constraint sets will be satisfied at an optimum.
The following theorem generalizes the concept of Lagrange multipliers.

1.6 The Theorem of Lagrange
The Theorem of Lagrange provides a powerful characterization of optima of equality-constrained optimization problems in terms of the behavior of the objective function and the constraint functions at these points.

1.7 Theorem (Lagrange)
Let  and   be     function, . Suppose  is a local maximum or minimum of on the set , where  is open. Let, then there exists a vectorsuch that

where ’s are called the Lagrangian multipliers associated with the local optimum, is the rank of the gradient vector  that is, the rank of

We defined the Lagrangian function    as follows:
 
The function L is called the Lagrangian associated with the problem (A). With this function, we get the first order necessary equations as follows:

1.8 Definition.
Let . We say that  is a regular point of  if the vectors   of  are linear independent.
In matrix form,  is a regular point of  if the Jacobian matrix of g at , denoted by     has rank  where .
1.9 Remark:
1)    If a pair  satisfies the twin conditions that  and we  will say that   meets the first order conditions  of the Theorem of Lagrange, or   meets the first order necessary conditions in equality constrained optimization problems.
2)    The Theorem of Lagrange only provides necessary conditions for local optima at , and that, only for those local optima which also meet the condition that. These conditions are not asserted to be sufficient, meaning that, the theorem does not imply that if there exist such that, and , then  must either be a local maximum or a local minimum, even if also meets the rank condition.
3)     
1.10 Sufficiency conditions for the Lagrangian
The sufficiency conditions for the Lagrangian method is stated as follows;
Define:

where:
     and    
The matrix is called the Bordered Hessian matrix (BH-matrix) (Taha, 1972), Having computed the stationary point for the Lagrangian function and the BH- matrix evaluated at, then is;
(i)    A maximum point if, starting with the principal major determinant of order , the last  principal minor determinants of form an alternating sign pattern with .
(ii) A minimum point if, starting with the principal minor determinant of order , the last  principal minor determinants of have the sign of .

1.11 Remark:
The above conditions are sufficient for identifying an extreme point, thus a stationary point may be an extreme point without satisfying these conditions.
Other conditions exist that are both necessary and sufficient for identifying extreme points. The major disadvantage is that the procedure may prove computationally complex. However with a spreadsheet such as MathCAD (Mathcad User´s guide), this difficulty is easily overcome. For this purpose we define a variant of the BH-matrix as;

Evaluated at the stationary point, where is an unknown parameter. We now consider the determinant ; then each of the real roots of the polynomial

must be
(i)    Negative if is a maximum point
(ii) Positive if is a minimum point

1.12 Fundamentals of Biological Neural Networks
The biological neural network consists of nerve cells (neurons) as in Fig. 1.1,
which are interconnected as in Fig. 1.2. The cell body of the neuron, which includes the neuron's nucleus is where most of the neural computation takes place.




Fig.1.1. A biological neural cell (neuron).

Neural activity passes from one neuron to another in terms of electrical triggers which travel from one cell to the other down the neuron's axon, by means of an electrochemical process of voltage-gated ion exchange along the axon and of diffusion of neurotransmitter molecules through the membrane over the synaptic gap. The axon can be viewed as a connection wire. However, the mechanism of signal flow is not via electrical conduction but via charge exchange that is transported by diffusion of ions. This transportation process moves along the neuron's cell, down the axon and then through synaptic junctions at the end of the axon via a very narrow synaptic space to the dendrites and/or some of the next neuron at an average rate of 3 m/sec.

Fig.1.2 Interconnection of biological neural nets.

Figures1.1 and 1.2 indicate that since a given neuron may have several (hundreds
of) synapses, a neuron can connect (pass its message/signal) to many (hundreds of)
other neurons. Similarly, since there are many dendrites per each neuron, a single
neuron can receive messages (neural signals) from many other neurons. In this
manner, the biological neural network interconnects [Ganong, 1973].
It is important to note that not all interconnections, are equally weighted. Some
have a higher priority (a higher weight) than others. Also some are excitory and
some are inhibitory (serving to block transmission of a message). These differences are effected by differences in chemistry and by the existence of chemical transmitter and modulating substances inside and near the neurons, the axons and in the synaptic junction. This nature of interconnection between neurons and weighting of messages is also fundamental to artificial neural networks (ANNs).
A simple analog of the neural element of Fig. 1.1 is as in Fig. 1.3. In that analog,
which is the common building block (neuron) of every artificial neural network, we observe the differences in weighting of messages at the various interconnections (synapses) as mentioned above. Analogs of cell body, dendrite, axon and synaptic junction of the biological neuron of Fig. 1.1 are indicated in the appropriate parts of Fig. 1.3. The biological network of Fig. 1.2 thus becomes the network of Fig.1.4.

Fig.1.3 Schematic analog of a biological neural cell.


fig. 1.4. Schematic analog of a biological neural network.


1.13 Artificial neural networks
Artificial neural networks consist of a calculation unit called neuron. Every neuron has some real valued inputs. Inside every neuron, each input is multiplied with corresponding neural coefficient defining its value. The sum of all these products adds to a value called bias. Finally, activation function affects this sum and determines the real valued output of the neuron feed forwardly [19] or by some feedback [20].

1.14 Feed forward back propagation neural networks
Primary discussions regarding artificial neural networks introduced in the 40's with
presentation of the feed forward neural networks. Artificial neural networks in some extents are modeled from the brain and neural system of the human, which are able to give acceptable solutions based on correct information records from the problem.
The basic structure for the feed forward back propagating neural network (nets without feedback) consists of some number of nodes in the input layer, the hidden layer, and the output layer that has one node. The sigmoid functions approximate linear functions, yet allow the update scheme to propagate backwards through differentiable functions. The manner in which input data generates output data for a given neural network depends on the interconnection weights. These weights are adjusted to reduce the error between the neural network outputs and the actual output values. i.e.
                                                      (1.1)
where  is the actual output for the ith training point. is the estimated value from the neural network for the ith training point from the neural network. n is the total number of training points obtained by taking known data points for a given task. Here the objective is to train the network so that the output from the network minimizes equation (1.1).



1.15 Perceptron
The Perceptron, is possibly the earliest neural computation model, first conceived dby F. Rosenblatt, and dates back to 1958. The Perceptron serves as a building block to more complicated neural network  models, such as the multilayer perceptron (MLP) or the  radial basis function model. The Perceptrron possesses the fundamental structure shown below.

We observe that the output of the perceptron can be written;

Or in vector notation

1.16 The Perceptron training rule
The perceptron training rule weight update according to

with , and   is the target value,
is perceptron output, is a small constant called the learning rate.
1.17 Manual Computations
In this section we consider manual computations of a neural network consisting of two inputs  and a bias  as shown;
                                                                                      ACTIVATION FUNCTIONS
The weights  and are specified as;



The output is then computed as;


Where can be any of the activation function shown above.
For the Sigmoid activation function,.

For the Hyperbolic Tangent activation function, ;
.

1.18 Recurrent dynamical artificial neural network
Khanna in year 1990 [21], describes associative memory as "the ability to get from one internal representation to another or to infer a complex representation from a portion of it".
Effectively our goal in applying neural networks is to create a functional mapping from steady optimization space to either dynamical time dependent space or some parameter space. Two approaches to achieving this mapping have been extensively studied by Xia [14], [15] and [22] to [25], Malek [4], [16], [26] and [27] and their coauthors. The first approach relies on a structure with adjustable parameters. On the basis of known input/output pairs, these parameters are selected or changed. If this approach is successful, the appropriate selection of these parameters will yield a mapping device which will always provide the associated output values for a given input. The second approach uses information from the primal and dual optimization problem and applied primarily by Malek in year 2005 [16]. The basis for such systems is a precisely defined set of ordinary differential equations that automatically satisfy the related primal and dual optimization problems simultaneously. These information are defined by the cumulative designing the system and are laid out in a hierarchical fashion. The system then performs a sequential set of values, using the output from the previous as the input to the
next. If successful, a system can be created which will associate input with its correlated output. The challenge is to make the system complete enough (consistent, convergent and stable) to always associate the correct output with a given input.
The primary difference in these two approaches is that adjustable parameters in the first are a prior, i.e., the parameters are settled upon and maintained before data is introduced into the system. The second approach has no adjustable parameters thus its model is simple to use. The advantage of this approach is that in this way, we can obtain a solution for the given real life problem, however we wish to assume a prior knowledge of relationships between constrained optimization problem and dynamical system. Moreover the solution for optimization problem consists of a solution for real life problem, since optimization problem is simulated from the corresponding real life problem. The work presented in this section applies recurrent dynamical artificial neural network. We shall emphasize on networks that do not use network parameters or penalty parameters in advance. This approach is a metric driven method. i.e., we establish distance between the input and the neural network output. For a given input, the neural network outputs the value whose distance from the given input is smallest using linear constraint least square technique or any other related method. One manner of doing this mapping is to associate the equilibrium points of a dynamical system with the optimal points of constraint
optimization problem. When the input is the initial condition of the dynamical system, the system will converge to an equilibrium point. Thus this optimal solution contains a solution that minimizes equation (1.1), where we use the feed back process to produce corresponding optimal weights. This means that the artificial neural network structure is recurrent.
The structure of the recurrent dynamical artificial neural network is different from the feed forward artificial neural network. However it is possible to make some corresponding relations between these two neural networks (see Rumelhart 1986, [28]). i.e., there is a sense in which the error back propagation scheme may be applied to networks that contain feedback, (see Fig. 1.5). The feed forward network in Fig.1.5 may be represented to simulate feedback network with a given set of weight and bias parameters. Having developed the equivalent structure as shown in Fig. 1.6, it becomes proper to say "the goal for recurrent dynamical artificial neural network, as with the back propagation artificial neural network, is to minimize the error function given by equation (1.1).
Training of dynamical neural networks has received considerable attention in the last 30 years [20], [29] and [30]. The equations governing the behavior of the simplest supervised recurrent dynamical neural network are
                                             (1.2)
                                                                                     (1.3)
                                                                                                    (1.4)
  where                                      
is the sigmoid function. The adjustable parameters in this supervised recurrent dynamical neural network are found in the A, B matrices and vector C. The input x of the neural network corresponds to the input data associated with a training point. This input is then applied to the system governed by equation (1.2). When equation (1.3) reaches an equilibrium value u* for this input, we obtain the output of the neural network by taking the dot product of C and u* by equation (1.4). This neural network output will then compare with the actual output. To update the elements of A, B, and Cone may use gradient descent method using
and
This minimization task requires that the neural network possess enough parameter freedom to enable each input set to generate an output close to the actual value. This is not a case in many problems. Thus in the next section we emphasize on the unsupervised recurrent dynamical artificial neural networks.


Fig. 1.5  Equivalent structures of a two unit network; Feed forward network, and feed-back network for a given biases b1 and b2 and weights 1 w and 2 w .


1.17 Networks dynamic analysis
For many times dependent cost functions an online optimizer on the basis of an analog circuit [31], [32] and [33]) is desirable. Dynamic solvers or analog computer, was first proposed by Dennis [34], Rybashov [35] and [36], Karpinskaya [37], and later studied by Kenedy and Chua [38], Rodriguez-Vazquez et al. [39], Tank and Hopfield [31]. These dynamic solvers usually employ neural networks since they have many advantages over the traditional algorithms. Massively parallel processing and fast convergence are two of the most important advantages of the neural networks.

CHAPTER TWO
REVIEW OF RELATED LITERATURE

2.0 Introduction
Optimization theory can be described as the procedure for obtaining the optimum (i.e. maximum or minimum) of a given real life problem, represented in mathematical form.  This may be a problem of maximizing profit for a production outfit or minimizing cost for a given transportation agenda. In his work, “Optimality production in a local industry,” Ukwuma P.C. (1999) used a linear programming optimization method to show that the problem of high cost of production with attendant low or no profit at all can be solved by generating an optimal production batches for the different brands of drinks produced by the company.
Optimization models play a central role, among Company Managers, Engineering Designers and Financial Portfolio Managers as they use it for decision-making. It’s use cuts across many disciplines like economics, physics, biology and many more…
According to Bertsekas, 2009, the Lagrange multipliers method and its other versions are widely used by Engineers, Scientists and Economics.
Developments after the second world war and other related world events stimulated much interest among Engineers, Economists, Business Executives, Mathematicians and others… It motivated men to think of the cheapest way to achieve their aims or satisfy their wants. This thinking and the interesting mathematical analysis brought prominent scholars such as Stephen Boyd, Lieven Vandenberghe, Philip. Loewen, Yongwei Huang, Rangranjan K Sundram, Krause Kuhn and Tucker among many others to a thinking pattern and the associated mathematical analysis gave birth to what is now known and referred to as Optimization theory.
In the late eighteenth century Joseph Louis Lagrange(1736-1831) propounded his elegant and powerful theorem on equality constrained optimization problems. From here optimization theory and its application began to dominate fields such as economics and engineering, as well as other related fields.
2.1 Basic Principles of ANNs and Their Early Structures
2.1.1 Basic Principles of ANN Design
The basic principles of the artificial neural networks (ANNs) were first formulated
by McCulloch and Pitts in 1943, in terms of five assumptions, as follows:
1)    The activity of a neuron (ANN) is all-or-nothing.
2)    A certain fixed number of synapses larger than 1 must be excited within a given interval of neural addition for a neuron to be excited.
3)    The only significant delay within the neural system is the synaptic delay.
4)    The activity of any inhibitory synapse absolutely prevents the excitation of the neuron at that time.
5)    The structure of the interconnection network does not change over time.
By assumption (1) above, the neuron is a binary element.
Whereas these are probably historically the earliest systematic principles, they
do not all apply to today's state-of-the-art of ANN design.
The Hebbian Learning Law (Hebbian Rule) due to Donald Hebb (1949) is also
a widely applied principle. The Hebbian Learning Law states that:
When an axon of cell A is near-enough to excite cell B and when it repeatedly
and persistently takes part in firing it, then some growth process or metabolic change takes place in one or both these cells such that the efficiency of cell A [Hebb, 1949] is increased" (i.e. | the weight of the contribution of the output of cell A to the above firing of cell B is increased).
The Hebbian rule can be explained in terms of the following example: Suppose
that cell S causes salivation and is excited by cell F which, in turn, is excited by
the sight of food. Also, suppose that cell L, which is excited by hearing a bell ring,
connects to cell S but cannot alone cause S to fire. Now, after repeated firing of S by cell F while also cell L is firing, then L will eventually be able to cause S to fire without having cell F fire. This will be due to the eventual increase in the weight of the input from cell L into cell S. Here cells L and S play the role of cells A, B respectively, as in the formulation of the Hebbian rule above.

In a recent survey, Osman and Laporte [130] reported that while neural networks are a very powerful technique for solving problems of prediction, classification and pattern recognition, they “have not been as successful when applied to optimization problems and are not competitive with the best meta-heuristics from the operations research literature, when applied to combinatorial optimization problems.” Researchers have been trying for over a decade now to make neural networks competitive with meta-heuristics[145] such as simulated annealing,[1, 2] tabulated search,[67, 68] constraint logic programming,[180] and genetic algorithms,[46, 69] and they have experienced varying degrees of success. Almost every type of combinatorial optimization problem (COP) has been tackled by neural networks, and many of the approaches result in solutions that are very competitive with alternative techniques in terms of solution quality. Unfortunately, the reputation of neural networks for solving COPs does not reflect this evidence.
The reasons are twofold. First, from within the first few years of research, controversy and debate has surrounded the field, discouraging many researchers: a state from which the field has never truly recovered. Second, researchers are
forced to simulate the behavior of neural networks on digital computers while awaiting the development of suitable hardware advances. These simulations are designed to evaluate the potential of neural networks for generating near-optimal
solutions to COPs, and naturally result in large CPU times that are uncompetitive with alternative techniques. Further research into the design of suitable hardware is contingent upon demonstrated success of neural network simulations.
Unfortunately, the many successful applications of neural networks will not receive full merit until the reputation of neural networks has been salvaged .The  idea of using neural networks to provide solutions to difficult NP-hard optimization problems[61, 125] originated in 1985 when Hopfield and Tank demonstrated that the Travelling
Salesman Problem (TSP) could be solved using a Hopfield neural network.[87] This work is well-known, and their impressive results for the solution of a TSP generated much excitement in the neural network and operations research communities alike. Their work is also quite controversial because many researchers have been unable to reproduce their results (due perhaps to certain omissions in
Hopfield and Tank’s article[87] relating to simulation procedure, termination criteria, etc.). Wilson and Pawley[192] first published these findings nearly three years after Hopfield and Tank’s original article was published. In doing so, they
raised serious doubts as to the validity of the Hopfield-Tank (H-T) approach to solving optimization problems, which seemingly served to dampen the enthusiasm surrounding the field. Since Wilson and Pawley’s results were published,
it has been widely recognized that the H-T formulation is not ideal, even for problems other than the TSP. The nature of the energy function that the method utilizes causes infeasible solutions to occur most of the time. A number of penalty parameters need to be fixed before each simulation of the network, yet the values of these parameters that will enable the network to generate valid solutions are unknown.
The problem of optimally selecting these parameters is not trivial, and much work has been done to try to facilitate this process.[79, 96, 109] Many other researchers believed that the H-T energy function needed to be modified before any progress would be made, and considerable effort has also been spent in this area.[21, 177] Perhaps the most important breakthrough in the field, however, came from the valid subspace approach of Aiyer et al.,[6] and the subsequent work of Gee.[63] By representing all of the constraints by a single term, the feasibility of the Hopfield network can now be essentially guaranteed.
The question of solution quality has also been addressed over the last decade by various methods that attempt to avoid the many local minima of the energy function. Many variations of the Hopfield network have been proposed and can be broadly categorized as either deterministic or stochastic. The stochastic approaches are perhaps more successfully able to improve solution quality because they attempt to embed the principles of simulated annealing into the Hopfield network to escape from local minima. Such attempts have resulted in several alternative approaches including Boltzmann,[1, 4, 82] Cauchy,[90, 167] and Gaussian Machines,[8] which are able to compete effectively with other techniques in terms of solution quality. The other main neural network approach to combinatorial
optimization is based on Kohonen’s Self-Organizing Feature Map.[101] While much of the Hopfield network literature is focused on the solution of the TSP, a similar focus on the TSP is found in almost all of the literature relating to the use of self-organizing approaches to optimization.[11, 52, 56] In this case, the reason is not simply because of the benchmark status of the TSP, but more because the vast majority of these approaches are based upon the elastic net method.[49] Kohonen’s
principles of self-organization[101] are combined with the concept of an “elastic band” containing a circular ring of neurons that move in the Euclidean plane of the TSP cities, so that the “elastic band” eventually passes through all of the cities and represents the final TSP tour. Such approaches rely upon the fact that the “elastic band” can move in Euclidean space, and that physical distances between the neurons and the cities can be measured in the same space. Self-organizing neural networks have been successfully applied to solve other two-dimensional problems
such as vehicle routing[173, 174] and shortest path problems[164]. Recently, a generalization of the self-organizing map has been proposed to solve generalized quadratic assignment problems, relieving the restriction to two-dimensional problems, and has been applied to solve several optimization problems arising from industrial situations.[74, 155, 158].

CHAPTER THREE
   HOPFIELD NEURAL NETWORKS
 3.0 Introduction
Neural networks for unconstrained optimization were developed based on the “Lyapunov stability theory” of dynamical systems: if a network is “stable,” its “energy” will decrease to a minimum as the system approaches its “equilibrium state.”  If one can properly set up a network that maps the objective function of an optimization problem onto an “energy function,” then the solution is a natural result of network convergence, and can be obtained at a very fast speed.
In the past two decades, artificial neural networks or neural networks (NNs) have been widely developed in solving mathematical programming (MP) or optimization problems. Among various NNs, a couple of networks have been utilized for optimization (Looi, 1992; Kumar, 2005), such as Hopfield neural netwoks HNNs) (Hopfield, 1982), self-organizing feature maps (Kohonen, 1982), and Boltzmann machines (Ackley et al., 1985). As HNNs and general NN structure have been developed for almost all classes of optimization problems, it is important for us to give a brief  review of the  techniques involved, the  benefits and drawbacks.
Research studies on neurons and their networks for information processing have drawn much attention by scholars in the fields of physics, electronics, and electricity. They have tried to formulate the problems mimicking the behavior of biological NNs to abstract principles of computation employed by the brain. Hopfield and Tank made a momentous contribution for neural computation by solving a traveling salesman problem (TSP) and a linear programming (LP) problem (Hopfield, 1982, 1984; Hopfield and Tank, 1985; and Tank and Hopfield, 1986).
Constrained optimization problems in general assume that the minimization (or maximization) of some objective cost (or benefit) function is subject to various constraints with independent variables, and these problems are popular in various areas such as in science, engineering, and business (Bazaraa et al., 2005, 2006).
Technically, the HNN approach to optimization is to handle a dynamic system in which its energy function, or a Lyapunov function , characterizes the behavior of the networks and represents the problems to be solved. Its architecture can be realized by the use of an electronic circuit, possibly on a VLSI circuit, as an on-line solution with a parallel-distributed process (Cichocki and Unbehauen, 1993). These special characteristics are beneficial for real-time optimization and have been applied in many areas such as pattern recognition, scheduling, manufacturing, and numerous business applications (Looi, 1992; Sharda, 1994; Smith and Gupta, 2000; Wong et al., 2000; Zhang and Huang, 1995). Due to renewed interests in NNs by extending HNNs, various NNs have been proposed. For instance, Kennedy and Chua (1988) extended the Tank and Hopfield’s work to solve a non-linear programming (NLP) problem. Rodrıguez-Vazquez et al. (1988, 1990) presented switched-capacitor NNs for solving LP and NLP problems. Later, Zhang and Constantinides (1992) introduced a Lagrange NNs for solving general NLP problems, and Cichocki and Unbehauen (1993) proposed the Lagrange multiplier method based NN in solving the NLP problem. Xia et al. (2005) offered a primal and dual method for solving linear and quadric programming problems.

3.1 General function and basic aspects for HNNs

Neural networks have been characterized in various ways according to many relevant features (Basheer and Hajmeer, 2000; Sima and Orponen, 2001). In general, NNs have two kinds of structures (Zurada, 1992). Both have to be configured such that the application of a set of inputs produces (either direct or via a relaxation process) the desired set of outputs, and various methods exist to set out the strengths of the connections. One way is to place the weights explicitly, using a priori knowledge. The other way is to train the neural network by feeding it teaching patterns and letting it change its weights according to some learning rule. HNNs belong to non-training model. In particular, HNNs are considered as feedback (or recurrent) without learning from a pattern set (Zurada, 1992). The HNNs consist of a set of neurons and a corresponding set of unit delays, forming a multiple loop feedback system. Each feedback loops is equal to a neuron. Basically, the output of each neuron is feedback, via a unit delay element, to each of the other neurons in the system. HNNs are not affected by additional inputs, and each neuron is fully connected with the remaining neurons by weights and always
updates the connection weights. The associative memory or content- addressable memory is a memory organization that accesses memory by its contents and it always searches for the closest matching from all stored prototypes (Du and Swamy, 2006). This is analogous to the storage of information in an associative memory as biological neurons (Kohonen, 1982). In other words, the process
of association and information retrieval is simulated by the dynamical behavior of a highly interconnected system of non-linear circuit elements with collective computational abilities (Hopfield, 1982).
Due to the above characteristics, HNNs are easily realized for their capability of parallel computation as a result of being a fully connected property. The networks use electric circuits to simulate the actions of biological neurons, and the basic model can be implemented by interconnecting an array or resistors, non-linear amplifiers with symmetrical output, and external bias current as shown in Fig. 1. There are n neurons, and each neuron is represented by a resistor gi, a capacitance ci, and a non-linear amplifier with an activation function i = 1,2,. . . ,n, respectively. The resistance–capacitance charging equation determines the rate change of ui, i = 1,2, . . . ,n, where ui is the input voltage of the amplifier. The input is mapped to the output voltage vi or  (normal or invert) through the function  i = 1,2,. . . ,n, respectively. The choice of connecting the normal or invert output is dependent on the positive or negative value of the conductance or weight wij, i = 1,2,. . . ,n, j = 1,2,. . . ,n. The input of each neuron comes from two sources, external inputs Ii and inputs from other neurons with the interconnection strength or weights wij from neuron j to neuron i (Hopfield, 1982). In addition, Hopfield has shown that the sufficient condition for a stable network is that its synaptic weights are symmetric, i.e., wji = wij with wjj = 0, and, i.e. the system has a Lyapunov function (Hopfield, 1982). Here, the weights are derived from the function, and the neurons stay transit according to the local information (i.e., feedback) until a steady state is reached (Burke and Ignizio, 1992).
                
                                         Fig. 3.1 Basic structure of HN
                                    
                               Fig. 3.2 Continuous-time Hopfield structure

Hopfield (1982) presented an energy function to demonstrate its stability as local minimum of the energy function corresponds to the energy of the stored patterns. According to the work of Bruck and San (1988), the network can converge to a minimum from any initial state. Thus, the characteristic is helpful for applications. In addition, one  popular NN for optimization, the Boltzmann machine, is an extension of discrete HNN. It only replaces the deterministic local search dynamics of the HNN by randomized local search dynamics (Kumar, 2005; Kurita and Funahashi, 1996), but it has only been applied to a small number of problems.

3.2 Continuous Hopfield networks
The continuous or continuous-time HNN utilizes as common output functions in the network, the sigmoid and hyperbolic tangent functions. Fig. 3.2 shows the common circuit of the continuous-time HNN, where    j = 1,. . . ,n, is the integration time constant of the jth neuron, and  is an externally applied threshold. The integrator can be realized by an operational amplifier, capacitor, and resistor . Here,  is called the forgetting factor of the integrator, which forces the internal signal  to zero for a zero input. To formulate the network, we establish a differential equation for the activation potential  as

The output neuron is given by  which is a continuous output function such as a sigmoid function. Hopfield (1984) proposed an energy function for the continuous  HNN as;

where the function is a monotone increasing function. Hopfield and Tank (1985), Tank and Hopfield (1986) introduced the continuous HNN to solve the TSP and LP problems. Afterwards, many researchers implemented HNN to solve the optimization problem, especially in MP problems. Hence, the continuous model is our major concern. In general, the continuous model is superior to the discrete one in terms of the local minimum problem, because of its smoother energy surface. Hence, the continuous HNN has dominated the solving techniques for optimization problems, especially for combinatorial problems.

3.3 Methods of Hopfield networks
HNN manipulates MP problems through approximation (Hopfield and Tank, 1985). It emulates the behavior of neurons through an electronic circuit. After the circuit is established with many similar sets of components, a parallel computation can be realized for an on-line solution. These special characteristics are beneficial for real-time optimization and have been applied to solve optimization problems such as LP, NLP, quadratic programming, MILP,
and multi-objective linear programming (Bouzerdoum and Pattison, 1993; Cichocki and Unbehauen, 1993; Ham and Kostanic, 2001; Hopfield and Tank, 1985; Kennedy and Chua, 1988; Looi, 1992; Rodrı´guez-Vazquez et al., 1988, 1990; Shih et al., 2004; Smith et al., 1998; Smith, 1999; Wang, 1992, 1996; Xia, 1996). From the computational aspect, the operation of HNN for an optimization problem manages a dynamic system characterized by an energy function, which is the combination of the objective function and constraints of the original problem. Because there are some similarities between the function and the formulation of the NLP problem, many common techniques of NLP can thus be exploited. In such a way, three common techniques to handle NLP problems, i.e., penalty functions, Lagrange functions (or augmented Lagrange functions), and primal and dual functions, are adapted. Firstly, penalty functions use penalty parameters to combine the constraints and the objective function, and then they construct an energy function to be minimized. Secondly, Lagrange functions (or augmented Lagrange functions) take advantage of Lagrange multiples to construct an energy function to be operated. Thirdly, primal and dual functions are made up of a primal function and a dual function and try to reach the minimum gap between the functions. These three techniques are suitable for solving various MP problems such as LP, NLP, and MILP. The following sections review the literature of HNNs from a technical standpoint.


3.4 Penalty function methods
The penalty function method is a popular technique for optimization in which it is used to construct a single unconstrained problem or a sequence of unconstrained problems. A searching method can be adopted to search for the solution in the decision space, and the steepest descent approach is the popular technique for obtaining the searching direction (Cichocki and Unbehauen, 1993).
In our survey, many NN approaches utilize a rather basic penalty function to build the energy function and usually converge to a stable point.

3.5 Summary
The Hopfield neural network (HNN) is one major neural network (NN) for solving optimization or mathematical programming (MP) problems. The major advantage of HNN is that its structure can be realized on an electronic circuit, possibly on a VLSI (very large-scale integration) circuit, for an on-line solver with a parallel-distributed process. The structure of HNN utilizes three common methods, penalty functions, Lagrange multipliers, and primal and dual methods to construct an energy function. When the function reaches a steady state, an approximate solution of the problem is obtained.

 CHAPTER FOUR
MAIN RESULTS
ON GRADIENT BASED NEURAL NETWORKS
FOR OPTIMIZATION PROBLEMS
 4.0 Introduction

Optimization problems arise in almost every field of engineering, business and management sciences. In many engineering and scientific applications, the real-time solutions of optimization problems are demanded. However, traditional algorithms for digital computers may not be able to provide the solutions on-line. Therefore, the search for real-time on-line solutions in such cases becomes not only important but also essential. In 1980's, an attractive and very promising approach was introduced to provide real-time solutions for optimization problems. The new approach is termed artificial neural network (ANN). Hopfield and Tank [12, 13, 28] initiated the recent study of neural networks in optimization. Generally speaking, ANN provides an alternative and attractive way for the solution of optimization problems. The significant and unique feature of ANN to optimization is the realization of simple and real-time hardware (or circuit) implementation. In other words, an electrical circuit can be constructed which generates the on-line solution of certain optimization problems. The seminal work of Hopfield and Tank has inspired many researches to investigate various neural networks for solving linear and nonlinear programming problems. Numerous optimization neural networks have been develop ed, see [3, 15, 24, 21, 22, 27, 1, 5, 2, 20, 34, 14, 16, 17, 18, 19, 30, 31, 33]. From the optimization point of view, most of the existing ANN models for optimization problems could be divided into two classes. One class is the gradient-based neural network models, which are used for the unconstrained optimization problems. These problems normally come from a) some kind of transformations from the constrained minimization problems with penalty function methods [15, 24, 22, 2, 20]; and b) the complementarity problems with the NCP functions [17, 18]. The other class is the pro jective gradient based neural network models, which are derived from constrained minimization problems and complementarity problems with KKT-conditions [34, 14, 19, 30, 31]. [30, 31] present some neural networks for solving linear, quadratic and nonlinear convex programming problems, which are based on KKT system for optimization problems. Their models correspond to some variants of the pro jective gradient method for the complementarity problem and the variational inequality problem [7, 23, 8, 9, 10, 11]. It should be noted that for Lagrangian conditions with the nonnegative Lagrange multiplier, if the Lagrange multiplier is penalized or transformed into unrestricted case, the resulting neural network model belongs to the first class. On the other hand, if the nonnegative Lagrange multiplier is enforced by projection, it belongs to the second class.
4.1 Unconstrained optimization
In this section we focus on the following class of unconstrained optimization problem;
                                                                                        (4.1)
With the gradient-based dynamics,
                                       (4.2)
where is the objective (or energy) function, is the gradient of is an arbitrary initial point and H is a symmetric positive definite matrix. For our purposes we assume .
In general a neural network consists of
a)     an energy function
b)    a dynamical system which is stable, and
c)     the fact that the energy function should be monotonically decreasing along the solution of the dynamical system.


4.2 Remark:
1.     Subsequently we will consider equation (4.2) as the general gradient based neural network.
2.     The stability analysis of the general gradient based neural network will be considered. We note that stability analysis for various neural network models has been addressed in the literature [15,34,21,1,20,30,2]. However, all these discussions  are based on Lyapunov’s direct method which requires the existence of a Lyapunov function. This however limits the stability analysis for a general neural network model. In this work we introduce a new approach to address the stability issues of a general neural network model.

4.3 Lyapunov's Direct Method
Consider the continuous-time system
                                
with an equilibrium point at . This is a time-invariant system, since f does not depend explicitly on t. The stability analysis of the equilibrium point in such a system is a difficult task in general. This is due to the fact that we cannot write a simple formula relating the trajectory to the initial state. The idea behind Lyapunov's “direct” method is to establish properties of the equilibrium point (or, more generally, of the nonlinear system) by studying how certain carefully selected scalar functions of the state evolve as the system state evolves. The term direct is to contrast this approach with Lyapunov's indirect method which attempts to establish properties of the equilibrium point by studying the behaviour of the linearized system at that point.
Consider, for instance, a continuous scalar function  that is 0 at the origin and positive elsewhere in some ball enclosing the origin, i.e. and for n this ball. Such a may be thought of as an “energy” function. Let denote the time derivative of  along any trajectory of the system, i.e. its rate of change as  varies according to (4.3). If this derivative is negative throughout the region (except at the origin), then this implies that the energy is strictly decreasing over time. In this case, because the energy is lower bounded by 0, the energy must go to 0, which implies that all trajectories converge to the zero state.

4.4 Lyapunov Functions
4.4.1 Defnition  Let V be a continuous map from  to . We call  a locally positive defnite function around  if
(i)  ,
(ii) for some r.
Similarly, the function is called locally positive semi-definite if the strict inequality on the function in the second condition is replaced by. The function is locally negative definite if is locally positive definite, and locally negative semi-definite if is locally positive semi-definite.
For the continuous time case, we shall restrict ourselves to that have continuous first partial derivatives. We shall denote the derivative of such a V with respect to time along a trajectory of the system (4.3) by. This derivative is given by;
                                         where is a row vector, i.e.,  the gradient vector or Jacobian of V with respect to x, containing the component-wise partial derivatives .


4.5 Definition (Lyapunov function)  
Let V be a locally positive definite function and let  be its derivative along trajectories of system (4.3). If  is locally negative semi-definite, then V is called a Lyapunov function of the system (4.3).

4.5 Theorem (Lyapunov Theorem for Local Stability)
If there exists a Lyapunov function of system (4.3), then is a stable equilibrium point in the sense of Lyapunov. If in addition  for some , i.e. if  is locally negative definite, then  is an asymptotically stable equilibrium point.
We have the following important definitions and Theorems

4.6 Theorem 
Suppose s a continuous function. Then for arbitrary  and  there exists a local solution satisfying  to (4.3) for some . If f is locally Lipschitz continuous at then the solution is unique, and if f is Lipschitz continuous in  then can be extended to


4.7 Definition (Equilibrium)
A point is called an equilibrium point of (2.3) if.
4.8 Definition (Stability in the sense of Lyapunov)
Let be the solution of (4.3). An isolated equilibrium point s Lyapunov stable if for any and any scalar  there exists a such that if then  for .

4.9 Definition (Asymptotic Stability)
An isolated equilibrium point is said to be asymptotically stable if in addition to being Lyapunov stable, it has the property that  as , if.

4.10 Remark
We note that the above classical stabilities are only for isolated equilibrium points. Non-isolated equilibrium points could occur inevitably in the dynamical system for optimization problems. For example, consider , the points on the circle  are all the equilibrium points of (4.2). So it is necessary to extend the definitions of various stabilities from isolated equilibrium points to non-isolated ones.

4.11 Definition (Connected equilibrium set)
The subset of the equilibrium set of the dynamical system (4.3) is called a connected equilibrium set of S, if for any  there exists a continuous curve  connecting  and , and if there exists a positive number  such that there is no equilibrium point of (4.3) except  itself in the open neighbourhood  of , where  denotes the distance from x to 

4.12 Definition (Stability for connected equilibrium set)
Let  be the solution of (4.3). The connected equilibrium set is stable if for any and any scalar  there exists a such that if, then .

4.12 Definition (Asymptotic stability for connected equilibrium set)
The connected equilibrium set  is said to be asymptotically stable if in addition to being stable, it has the property that  as  if .

4.13 Remark
We observe that there is a simple fact that the objective function in (4.1) remains constant on the connected equilibrium set  of the system (4.2)since on and is connected. In the above example,  equals zero on the equilibrium set .
The following theorem provides some interesting properties for the autonomous system (4.3).

4.14 Theorem
a)     Given  a solution of (4.3), then for any real constant c, the function is also a solution of (4.3).
b)    Through any point passes at most one trajectory.



4.15 Remark
Unless otherwise stated, subsequently in this work we will assume that the objective function  in (4.1) is bounded below and continuously differentiable. In addition, its gradient function  is Lipchitz continuous in , i.e. there exists a constant L (> 0) such that

4.16 Analysis for Gradient-Based Neural Network
The analysis in this section is for, the identical matrix in (4.2), for simplicity. For the case where H is symmetric positive definite matrix other than the identity matrix, the analysis is similar with the norm;

and induced distance;

4.17 Remark
In the literature [15,34,21,1,20,30,2 ], the stability analysis for a dynamical system is provided by Lyapunov’s direct method which requires the existence of a Lyapunov function. Unfortunately however, a Lyapunov function may not exist in general. To overcome this difficulty, we introduce a new approach to address the stability issues of a dynamical system without employing any Lyapunov function. Furthermore, the approach does not require prior information on the equilibrium point or set at all. The new approach only considers the energy function  and the associated dynamical system. Subsequently we will show how this works for the general gradient-based neural network (4.1)-(4.2).
First we need the following preliminaries.

4.18 Theorem
Let in (4.2) be Lipschitz continuous in and , then there is no periodic solution for the neural network system (4.2).
Proof: From Theorem (4.14) there exists a unique solution  satisfying  Suppose there is a periodic solution  and the minimal period is , i.e.
Since (/), there exist a such that , then . Since is non-increasing on from , so  must be a constant on which is a contradiction since .
4.19 Remark
From Theorem (2) and Theorem (3), we can conclude that the phase space of (4.2) can only consist of;
·        Equilibrium points
·        Non-intersecting trajectories

4.20 Lemma
Let be the trajectory of the system (4.2), is any contour of , then  is orthogonal to  at their intersecting points.
Proof: Consider the phase plane of (4.2) as , where s represents the parameter along the contour of  in the phase plane of (4.2). Since , then we know that has continuous partial derivative with respect to s. Then  remains constant on its contour, i.e.

we have

We also require the following lemma by Barbalat.

4.21 Lemma (Barbalat)
If a differentiable function t has a finite limit as , and  is uniformly continuous, then    as .

4.22 Theorem
If  is bounded below and its gradient is Lipschitz continuous in, then for any initial point ,  the trajectory  of the system (4.2), satisfying , will converge to an equilibrium point of the neural network (4.2) as .

4.23 Remark
The result of theorem (4) confirms the the energy function E in (4.1) and the dynamical system (4.2) constitute a neural network model. The result in theorem (4) also shows that any trajectory of the neural network (4.2) for the optimization problem (4.1) will converge to an equilibrium point of the neural network (4.2), irrespective of the boundedness of level sets of the objective function and the isolation of the equilibrium point.
Because , the energy function in (4.1) is monotonically decreasing in t if is not an equilibrium point. This fact reveals that the energy function is not preserved following the trajectory of the neural network system (4.2). Theorem (3) and (4) lead to the following interesting results.

4.24 Theorem
Assume that in (4.1) is bounded below and its gradient is Lipschitz continuous in . Let be a connected equilibrium set of the neural network (4.2), if s stable, then it is asymptotically stable.

4.25 Remark
The result of theorem (5) would facilitate the proof of both stability and asymptotic stability in many neural networks for optimization problems. The asymptotic stability is particularly important for the neural network (4.2) of optimization problem (4.1), because only points in the asymptotically stable connected equilibrium set are the optimal solutions of the problem (4.1). The following theorem demonstrates the equivalence of the asymptotically stable connected equilibrium set of neural network (4.2) and the optimal set of the problem (4.1).

4.26 Theorem
Assume that in (4.1) is bounded below and its gradient is Lipschitz continuous in . Let be a connected equilibrium set of neural network (4.2). Then is asymptotically stable if and only if each point of the set is an optimal solution of the optimization problem (4.1).


We consider here the following examples to illustrate the above results.

4.27 Example
                                                                          (4.4)
                               
The level sets of  are all unbounded unless it is empty. The graphical profile of  is depicted below.
Fig.4.0   Profile of
 
                               
The level sets of  are all unbounded unless it is empty, the neural network for (4.4) is
                                                                                         (4.5)
The equilibrium set of (16) is the line , which is connected and unbounded. The solutions to equations (4.5) are


The trajectories are lines


We observe that every trajectory approaches the equilibrium point
as , which is on the line  , every trajectory  is orthogonal to the line , and  on the line .

4.28 Example
                                                                           (4.6)
                                    
As a reference point we plot the graph of .

                                                         Fig. 4.1   Profile of 
We note that the maximum of occurs precisely at 
The neural network for (4.6) is
          
Separating variables;

Unfortunately the integral on the left does not have a closed form solution, so we resort to a numerical solution with MathCAD  as follows.
 
MATHCAD SOLUTION
To solve the first order ODE







                                   Fig. 4.2  Solution profile with initial value x0 = 1

       
                                     Fig.4.3  Solution profile with initial value x0 = 2

            
                                                 Fig.4.4  Solution profile with initial value x0 = 3

The equilibrium set of (4.6) is ,  and we have the following cases
·        If    (Fig 4.2)  
·        If            (Fig 4.3)
·        If     (Fig 4.4)

Theorem 4.24 and the above examples show clearly that the infinite point can be viewed as an extended ordinary point. If the point is an isolated minimize of the objective function as in Example (4.28), some trajectories will approach it. If the point
is not an isolated minimize as in Example (4.27), there is no trajectory. With a finite initial point can approach it. We also note that if the set is relaxed by the level sets  for all , Theorem (4.24) and Theorem (4.26) still hold. For instance in Example (4.28),  but Lipschitz continuous in any level set of , so the conclusions still hold as shown above.

4.29 Lagrange programming neural network for nonlinear optimization
The Lagrange programming neural network (LPNN) was proposed by Shengwei Zhang and A. G. Constantinides [1] from their research concerning analog computational circuits. An analog computational circuit is usually constructed by a dense interconnection of simple analog computational elements (neurons) and governed by a set of differential equations. By using an analog computational circuit, the optimization problem can be solved either by iterations using a digital computer or by setting up the associated neural circuit and measuring the node voltage after it settles down to a steady state.

The Lagrange programming neural network is designed for general nonlinear programming. It is based on the well-known Lagrange multiplier method  for constrained programming. Instead of following a direct descent approach of the penalty function, the network looks for, if possible, a point satisfying the first-order necessary conditions of optimality in the state space. Consider the following nonlinear programming problem with equality constraints:
                                         Min
such that   
where
and                       
are given functions and. The components of g are denoted  f and g are assumed to be twice continuously differentiable.
The Lagrange function is defined by



where is referred to as the Lagrange multiplier. Furthermore, we have



                        
The first-order necessary condition of optimality can be expressed as a stationary point of over x and , i.e.


The transient behaviour of the neural network is defined by the following equations.




If the network is physically stable, the equilibrium point  described by;


 
which obviously meets the first-order necessary condition of optimality and thus provides a Lagrange solution. There are two classes of neurons in the network, variable neurons x and Lagrangian neurons , with regard to their contribution in searching for an optimal solution. Variable neurons seek for a minimum point of the cost function and provide the solution at an equilibrium point, while Lagrangian neurons lead the dynamic trajectory into the feasible region determined by the constraints.

4.30 Remark
The disadvantage of the Lagrange neural network lies in that it handles equality constraints only. Though in theory inequality constraints can be converted to equality constraints by introducing slack variables, the dimension of the neural network will inevitably increase, which is usually not deemed optimal in terms of model complexity. In this sense, we need a neural network which can be regarded as an extension of the Lagrange network.

4.31 The Augmented Lagrange Programming Neural Network
Another variation of the LPNN is the augmented Lagrangian programming neural network(ALPNN) which we now introduce.
Consider the more general problem with inequality constraint.
Min
                                 such that                               (4.7)                 
 .                     
Thus . The functions  and are assumed to be twice continuously differentiable.

4.32 Definition
Let be a vector satisfying the constraint conditions, then will denote the index set
                             (4.8)
If the gradients  are linearly independent, then is called a regular point.
Given that the Lagrangian of the problem is defined by
                                  (4.9)
There exists the following Karush-Khun-Tucker Theorem .

4.33 Theorem
Let be a local minimum of problem (4.7) and assume that is a regular point. Then there exists a unique vector such that;
                                         (4.10)
                                  (4.11)
                                                                                                    (4.12)
                                                                                 .

From the above theorem, we observe that, if the multipliers are redefined as some positive function of original ones, ,  say, the nonnegative constraints imposed on the multipliers are removed completely. This implies that it is no longer necessary to convert inequality constraints into equality ones by slack variables in order to reuse those results concerned merely with equality constraints in the construction of Lagrange neural networks, and they are used directly[13]. The Lagrange-type neural network that is conceived with this method has two advantages over the original one: First, all the results on equality constraint can be transplanted in their original forms, and be proved similarly with minor modification; second, we may obtain the augmented Lagrangian functions with the same smoothness as that of objective function and constraints, therefore, they are more convenient to implement in circuits.
Define the augmented Lagrangian function as:
          (4.13)
where c is a positive penalty parameter. The aim is to construct a continuous-time dynamical system (Neural Network) that will settle down to the KKT pair of nonlinear programming problem. Here is such a system:
                                         (4.14)
                                     
From the system defined above, we may get ’s analytical expression
                       (4.15)
where assume that is a nonzero initial value of . This expression shows that, if x is outside the feasible region of problem (i.e. there is at least an i such that ) , then the corresponding multipliers will increase exponentially as time t increases. Thus the penalty terms containing those multipliers will increase continuously with time until the constraints are satisfied. This procedure will be repeated incessantly unless all the constraints are satisfied. Hence x will eventually be conducted into the feasible region whereas the multipliers associated with the inactive constraints approach to zero, and the remaining multipliers to constant.

4.34 Analysis of stability and convergence
Second-Order Sufficient Conditions: Let be regular point for problem. If there exists vector satisfying
                                            (4.16)
                                     (4.17)
                                                                                                                                                          .
and for every such that for every , it follows that;
                          (4.18)
In addition, satisfies the strict complementarity assumption
                                                         (4.19)
then is a strict local minimum of (4.7).
By straightforward calculation, the gradient and Hessian matrix of the augmented Lagrangian function with respect to x are respectively given as

and

Let and e as defined in the second order sufficient conditions. There is, for any c,
                                            (4.20)
and there exists a by Lemma 0.4 such that;

Equations (14) and (15) show that  is a strict local minimum of the augumented Lagrangian function 
We now state without proof the following important proposition [ ].

4.35 Proposition
Let is the KKT pair for problem. If the second-order sufficient conditions are satisfied, then system (7) is locally asymptotically exponentially stable.

4.36 A Model for nonlinear programming
4.36.1 The Basic Problem Formulation:
Consider the following nonlinear convex programming problem subject to linear equality and inequality constraints. We call it the primal problem:
                              (4.21)
Where x is an n-dimensional vector,  is a continuously differentiable function on .  are coefficient matrices and are constant column vectors. The minimization problem for the primal (1) is now written;
          (4.22)
Where is the Lagrange function and  are Lagrange multipliers. According to the Karush-Tucker (KKT)[  ] condition, is a solution to (1) if and only if there exist such that  satisfies the following conditions;
         (3)(4.23)
4.36.2 Neural Network Model
Here, we introduce the novel recurrent network model [ ] for solving (4.21) and its dual as follows;
     (4)(4.24)
                                                            (5) (4.25)
                                       (6) (4.26)
Where k is some positive constant. The main property of the above system is stated in the following theorem.

4.36.3 Theorem
If the solution of the neural network described by the differential equations (4.24)-(4.26) converges to a stable state  then converges to the optimal solution of the primal problem (4.21) and the Lagrange multipliers  converge to the optimal solution of the dual of the convex programming problem(1).
Proof
Let be the ith component of x, then equation (4) can be written as;
         (4.27)
(8)(4.28)
Condition (4.28) is to ensure that x will be bounded below by 0. Let  be the limit of , i.e.;
                                         (4.29)
By stability of convergence, we have

From equations (7) and (8) we have;
            (4.30)
            (4.31)
In other words;
                                            (4.32)
                                        (4.33)
Or                                                (4.34)
                                        (4.35)
Similarly from equations (5) and (6)
                                                                             (4.36)
                                                                             (4.37)
                                                                      (4.38)
By KKT conditions in (3) and conditions in (4.34)-(4.38) we have shown that  and are the optimal solutions for the problem (1) and its dual problem respectively.
4.36.4 Remark
In chapter four we will provide numerical examples to illustrate the foregoing.
In this section we now present a number of numerical examples to illustrate the neural network ability to solve nonlinear programming problems. We begin with examples on recurrent neural network discussed in chapter 3.

4.37 Simulation Example for testing the Neural Networks

To demonstrate the behaviour and properties of the recurrent neural network model, two examples are simulated by using the Euler method for time step
dt = 0.01 and n =1000. We consider in particular the following convex nonlinear programming problem;
                                                
The solution of the above problem is the point  and this can be easily verified analytically. We have test the neural network model by constructing  and solving the proposed differential equations (4.24)-(4.26)as follows. In compact form, the required equations are;
     (4.39)
                                                              (4.40)
                                                (4.41)
Since, then
, and 
It then follows that;

Applying the Euler scheme we get
 
Here we take the time step h = k = dt = 0.01, and  j = 1,2,..., N – 1.


Or
 
The Euler scheme can be written more compactly as,

                                 (4.42)

   h = 0.01, and  j = 1,2,..., N – 1.    
In Table 4.1 we give various set of initial input values for the neural network model and the corresponding optimal solutions.

Initial primal variables
Initial dual variables
Optimal solution




primal
dual






1
– 1
– 1
1
0.0016
1.1225
–0.0021
0.05
1.26001137
– 1

1
1
– 1
0.0005
1.8843
0.0054
–0.665
3.55058649
1


2
– 3
1
0.00004
1.9621
0.01321
-0.732
3.84983641

Table 4.1

4.38 Discussion
For the first initial point the convergence of the Euler scheme after 60 iterations does not give a good value for the optimal solution. However the subsequent initial values perform better at reaching the optimum value of the objective function.
The above analysis has demonstrated the ability of neural networks to solve both unconstrained and constrained mathematical programming problems, which has been the major objective of this work.

CHAPTER FIVE
SUMMARY AND CONCLUSION
 5.1 Summary
Optimality criteria form the foundations of mathematical programming both theoretically and computationally.
On the other hand constrained optimization problems are defined as the mathematical representation of real world problems concerned with the determination of a minimum or a maximum of a function of several variables, which are required to satisfy a number of constraints. Such function optimization are sought in diverse fields, including mechanical, electrical and industrial engineering, operational research, management sciences, computer sciences, system analysis, economics, medical sciences, manufacturing, social and public planning and image processing.
In this work we have been concerned with the problem of solving optimization problems using artificial neural networks. We considered in particular gradient based neural networks which works well for unconstrained problems. For constrained optimization problems we introduced the Lagrange neural network for equality constrained problems and also the recurrent network model for more general constrained problems. We were able to show that Neural Networks are capable of solving nonlinear programming problems, and in particular they provide better insight into the nature of solutions when conventional methods fail.

5.2 Conclusion
The gradient based neural network as well as the recurrent neural network   introduced in this work possess many advantages compared to existing traditional algorithm  for solving  nonlinear programming problems. It converges to the exact solutions of primal and dual problem without requiring any parameter tuning. The model is simple to use. Another advantage of the model is that it is very intuitive and can be explained in common sense without formal mathematics.

References
[1] A. Cichocki, R. Unbehauen, Neural Networks for Optimization and Signal Processing, John Wiley & Sons, Chichester-New York-Brisbane-
Toronto-Singapore, 1993.
[2] L. O. Chua and G. N. Lin. “Non-linear programming without computation”, IEEE Trans. Circuits and Systems, CAS-31, pp.182- 186, 1984.
[3] D. W. Tank and J. J. Hopfield. “Simple “neural” optimization networks: an A/D converter, signal decision circuit, and a linear programming circuit”, IEEE Trans. Circuits and Systems, CAS-33, pp.533-541, 1986.
[4] M. J. Smith and C. L. Portmann. “Practical design and analysis of a simple “neural” optimization circuit”, IEEE Trans. Circuit and Systems, Vol.36, pp.42-50, 1989.
[5] M. P. Kennedy and L. O. Chua. “Neural networks for non-linear programming”, IEEE Trans. Circuit and Systems, Vol. 35, pp.554-562, 1988.
[6] W. E. Lillo, M. H. Loh, S. Hui and S. H. Z(ak. “On solving constrained optimization problems with neural networks : a penalty method approach”, Technical Report TR EE 91-43, School of EE, Purdue University, West Lafayette, IN, 1991
[7] W. E. Lillo, S. Hui, S. Hui and S. H. Z ( ak. “Neural network for constrained optimization problems”, International Journal of Circuit Theory and Applications, vol. 21, pp.385-399, 1993.
[8] V.M. Mladenov, P.N. Proshkov, "Modelling and Simulation of Continuous Neural Networks for Constrained Optimization Problems", 2nd IMACS International Conference on: Circuits, Systems and Computers, (CSC'98), Greece,
published in "Recent Advances in Information Science and Technology", World Scientific, ISBN 981-02-3657-3, 386-393, 1998.
[9] Mladenov V.M., N. Mastorakis, "Design of 2-Dimensional Recursive Filters by using Neural Networks", CSCC’99, Athens, Greece, "Computational Intelligence and Applications", (N.Mastorakis Editor), WSES Press, Athens 1999, pp. 12-20.
[10] V.M. Mladenov, N. Maratos, “Neural Networks for Solving Constrained Optimization Problems”, CSCC’00, Athens, Greece, (N.Mastorakis Editor), WSES Press, Athens 2000.
[11] D.G. Luenberger, Linear and Nonlinear Programming, Addison-Wesley, 1984.
[12] E. Polak, Optimization: Algorithms and Consistent Approximations, Springer, 1997.
[13] A.F. Filippov, “Differential Equations with Discontinuous Righthand Sides”, Kluwer Academic Publishers, Dordrecht, the Netherlands, 1988.
[14] Wang, J., 1993. Analysis and design of a recurrent neural network for linear
programming. IEEE Transactions on Circuits and Systems – I: Fundamental
Theory and Applications 40, 613–618.
[15] Wang, J., 1996. A recurrent neural network for solving the shortest path problem. IEEE Transactions on Circuits and Systems – I: Fundamental Theory and
Applications 43, 482–486.
[16] Wang, J., 1997. Primal and dual assignment networks. IEEE Transactions on Neural Networks 8, 784–790.
[17] Wang, J., 1998. Primal and dual neural networks for shortest-path routing. IEEETransactions on Systems, Man, and Cybernetics – Part A: Systems and Humans 28, 864–869.
[18] Watta, P.B., Hassoun, M.H., 1996. A coupled gradient network approach for static and temporal mixed-integer optimization. IEEE Transactions on Neural
Networks 7, 578–593.
[19] Widrow, B., Rumelhart, D.E., Lehr, M.A., 1994. Neural networks: Applications in industry, business and science. Communication of the ACM 37, 93–105.
[20] Wilson, G.V., Pawley, G.S., 1988. On the stability of the traveling salesman problem algorithm of Hopfield and Tank. Biological Cybernetics 58, 63–70.
[21] Wilson, R.L., Sharda, R., 1992. Neural networks. OR/MS Today 19, 36–42.
[22] Wong, B.K., Lai, V.S., Lam, J., 2000. A bibliography of neural networks in business: Techniques and applications research: 1994–1998. Computers and Operations Research 27, 1045–1076.
[23] Wu, A.I., Tam, P.K.S., 1999. A neural network methodology and strategy of quadratic optimization. Neural Computing and Applications 8, 283–289.
[24] Wu, X.Y., Xia, Y.S., Li, J., Chen, W.K., 1996. A high-performance neural network for solving linear and quadratic programming problems. IEEE Transactions on Neural Network 7, 643–651.
[25]  Xia, Y., 1996. A new neural network for solving linear programming and its
application. IEEE Transactions on Neural Networks 7, 525–529.
[26] Xia, Y., Wang, J., 1995. Neural network for solving linear programming problem with bound variables. IEEE Transactions on Neural Networks 6, 515–519.
[27] Xia, Y., Wang, J., 1998. A general methodology for designing globally convergent optimization neural networks. IEEE Transactions on Neural Networks 9, 1331–1334.
[28] Xia, Y., Feng, G., Wang, J., 2005. A primal–dual neural network for online resolving constrained kinematic redundancy in robot motion control. IEEE Transactions on Systems Man and Cybernetics Part B – Cybernetics 35, 54–64.
[29] Yalcinoz, T., Short, M.J., 1997. Large-scale economic dispatch using an improved Hopfield neural network. IEE Proceedings Generation, Transmission and
Distribution 144, 181–185.
[30], S.H., Upatising, V., Hui, S., 1995. Solving linear programming problems with neural networks: A comparative study. IEEE Transactions on Neural Networks 6, 94–104. The MatheWorks: <http://www.mathworks.com/>.
[31] Zhang, X.S., 2000. Neural Networks in Optimization. Kluwer Academic, London.
[32] Zhang, S., Constantinides, A.G., 1992. Lagrange programming neural networks. IEEE Transactions on Circuits Systems 39, 441–452.
[33] Zhang, H.C., Huang, S.H., 1995. Applications of neural networks in manufacturing: A state-of-the-art survey. International Journal of Production Research 33, 705– 728.
[34] Zurada, J.M., 1992. Introduction to Artificial Neural Systems. West, New York. U.-P. Wen et al. / European Journal of Operational Research xxx (2008) xxx–xxx 13
ARTICLE IN PRESS


THIS IS A SAMPLE | WE ARE PROFESSIONALS IN WRITING

READ RECENT UPDATES HERE