Stability Analysis of Gradient-Based Neural Networks for Optimization Problems
Abstract | The paper introduces a new approach to analyze the stability of neural network models without using any Lyapunov function. With the new method, we investigate the stability properties of the general gradient-based neural network model for optimization problems. Our discussion includes both isolated equilibrium points and connected equilibrium sets which could be unbounded. For a general optimization problem, if the objective function is bounded below and its gradient is Lipschitz continuous, we prove that a) any trajectory of the gradient-based neural network converges to an equilibrium point, and b) the Lyapunov stability is equivalent to the asymptotical stability in the gradient-based neural networks. For a convex optimization problem, under the same assumptions, we show that any trajectory of gradient-based neural networks will converge to an asymptotically stable equilibrium point of the neural networks. For a general nonlinear objective function, we propose a re ned gradientbased neural network, whose trajectory with any arbitrary initial point will converge to an equilibrium point, which satis es the second order necessary optimality conditions for optimization problems. Promising simulation results of the re ned gradient-based neural network on some problems are also reported. Index terms | gradient-based neural network, equilibrium point, equilibrium set, asymptotic stability, exponential stability.
Abstract | The paper introduces a new approach to analyze the stability of neural network models without using any Lyapunov function. With the new method, we investigate the stability properties of the general gradient-based neural network model for optimization problems. Our discussion includes both isolated equilibrium points and connected equilibrium sets which could be unbounded. For a general optimization problem, if the objective function is bounded below and its gradient is Lipschitz continuous, we prove that a) any trajectory of the gradient-based neural network converges to an equilibrium point, and b) the Lyapunov stability is equivalent to the asymptotical stability in the gradient-based neural networks. For a convex optimization problem, under the same assumptions, we show that any trajectory of gradient-based neural networks will converge to an asymptotically stable equilibrium point of the neural networks. For a general nonlinear objective function, we propose a re ned gradientbased neural network, whose trajectory with any arbitrary initial point will converge to an equilibrium point, which satis es the second order necessary optimality conditions for optimization problems. Promising simulation results of the re ned gradient-based neural network on some problems are also reported. Index terms | gradient-based neural network, equilibrium point, equilibrium set, asymptotic stability, exponential stability.
This research was supported in part by grants FRG/98-99/II-29 and FRG/99-00/II-23 from Hong Kong Baptist University, and by the State Key Lab of Scienti c and Engineering Computing in P. R. China. y School of Mathematics and Computer Science, Nanjing Normal University, Nanjing 210097, P.R. China. z Corresponding author. Department of Mathematics, Hong Kong Baptist University.
1 Introduction
Optimization problems arise in almost every eld of engineering, business and management sciences. In many engineering and scienti c applications, the real-time solutions of optimization problems are demanded. However, traditional algorithms for digital computers may not be able to provide the solutions on-line. Therefore, the search for real-time on-line solutions in such cases becomes not only important but also essential. In 1980's, an attractive and very promising approach was introduced to provide real-time solutions for optimization problems. The new approach is termed arti cial neural network (ANN). Hop eld and Tank 12, 13, 28] initiated the recent study of neural networks in optimization. Generally speaking, ANN provides an alternative and attractive way for the solution of optimization problems. The signi cant and unique feature of ANN to optimization is the realization of simple and real-time hardware (or circuit) implementation. In other words, an electrical circuit can be constructed which generates the on-line solution of certain optimization problems. The seminal work of Hop eld and Tank has inspired many researches to investigate various neural networks for solving linear and nonlinear programming problems. Numerous optimization neural networks have been developed, see 3, 15, 24, 21, 22, 27, 1, 5, 2, 20, 34, 14, 16, 17, 18, 19, 30, 31, 33]. From the optimization point of view, most of the existing ANN models for optimization problems could be divided into two classes. One class is the gradient-based neural network models, which are used for the unconstrained optimization problems. These problems normally come from a) some kind of transformations from the constrained minimization problems with penalty function methods 15, 24, 22, 2, 20]; and b) the complementarity problems with the NCP functions 17, 18]. The other class is the projective gradient based neural network models, which are derived from constrained minimization problems and complementarity problems with KKT-conditions 34, 14, 19, 30, 31]. 30, 31] present some neural networks for solving linear, quadratic and nonlinear convex programming problems, which are based on KKT system for optimization problems. Their models correspond to some variants of the projective gradient method for the complementarity problem and the variational inequality problem 7, 23, 8, 9, 10, 11]. It should be noted that for Lagrangian conditions with the nonnegative Lagrange multiplier, if the Lagrange multiplier is penalized or transformed into unrestricted case, the resulting neural network model belongs to the rst class. On the other hand, if the nonnegative Lagrange multiplier is enforced by projection, it belongs to the second class. In this paper, we pay our attention to the following unconstrained optimization problem min E (x); x 2 Rn (1) with the general gradient-based motion equation, that is dx(t) = ?Hg(x(t)); x(t ) = x 2 Rn; (2) 0 0 dt where E (x) : Rn ! R is the objective function or called the energy function, g(x) : Rn ! Rn is the gradient of E (x), x0 2 Rn is an arbitrary initial point and H is a 2 symmetric positive de nite matrix. In this paper, we will assume that E (x) is smooth, i.e., E (x) 2 C 2 (Rn). Generally speaking, a neural network model consists of a) an energy function, b) a motion equation which is stable, and c) the fact that the energy function should be monotonically decreasing along the solution of the motion equation. For simplicity, in the rest of the paper, we will call (2) as the general gradient-based neural network. The veri cation of b) and c) for the general gradient-based neural network (2) will be conducted later. The analysis of if a motion equation is stable or not is called stability analysis. The stability analysis for various neural network models has been addressed in the literature, see 15, 34, 21, 1, 20, 30, 2]. However, all these discussions are based on the Lyapunov's direct method which requires the existence of a Lyapunov function. Unfortunately, a Lyapunov function may not always exist. This certainly limits the stability analysis for a general neural network model. In this paper, we introduce a new approach to address the stability issues of a general neural network model. The detailed discussion is given in Section 2. But rst, let us review some of the classical results. For the ordinary di erential equation dx(t) = f (x); (3) dt we rst state some classical results on the existence and uniqueness of the solution, and some stability de nitions for the dynamic system (3) in 29, 25, 32]. Theorem 1 32] Assume that f is a continuous function from Rn to Rn. Then for arbitrary t0 0 and x0 2 Rn there exists a local solution x(t) satisfying x(t0 ) = x0 ; t 2 t0 ; ) to (3) for some > t0 . If furthermore f is locally Lipschitz continuous at x0 then the solution is unique, and if f is Lipschitz continuous in Rn then can be extended to 1. De nition 1 (Equilibrium point) A point x 2 Rn is called an equilibrium point of (3) if f (x ) = 0. De nition 2 (Stability in the sense of Lyapunov) Let x(t) be the solution of (3). An isolated equilibrium point x is Lyapunov stable if for any x0 = x(t0 ) and any scalar " > 0 there exists a > 0 such that if kx(t0) ? x k < then kx(t) ? x k < " for t t0 . De nition 3 (Asymptotic stability) An isolated equilibrium point x is said to be asymptotically stable if in addition to being a Lyapunov stable it has the property that x(t) ! x as t ! 1, if kx(t0 ) ? x k < . De nition 4 (Exponential stability) An isolated equilibrium point x is exponentially stable for (3) if there exist ! < 0; > 0; > 0 such that any arbitrary solution x(t) of (3), with the initial condition x(t0 ) = x0 , kx(t0 ) ? x k < , is de ned on 0; 1) and satis es kx(t) ? x k e!t kx(t0 ) ? x k; t t0 : 3
Note that above classical stabilities are only for isolated equilibrium points. Nonisolated equilibrium points could occur inevitably in the dynamic system for optimization problems. For example, if E (x) = (x2 + x2 ? 1)2 , the points on the circle x2 + x2 = 1 1 2 1 2 are all the equilibrium points of (2). So it is necessary to extend the de nitions of various stabilities from the isolated equilibrium points to non-isolated ones. De nition 5 (Connected equilibrium set) The subset ? of the equilibrium set S = fx 2 Rn : f (x) = 0g of the dynamic system (3) is called a connected equilibrium set of S , if for any x1 ; x2 2 ? there exists a continuous curve in ? connecting x1 and x2 , and if there exists a positive number such that there is no other equilibrium point of (3) except ? itself in the open neighborhood B ( ; ?) = fx 2 Rn : d(x; ?) < g of ?, where d(x; ?) = inf y2? kx ? yk denotes the distance from x to ?. De nition 6 (Stability for connected equilibrium set) Let x(t) be the solution of (3). The connected equilibrium set ? is stable if for any x0 = x(t0 ) and any scalar " > 0 there exists a > 0 such that if d(x(t0 ); ?) < then d(x(t); ?) < " for t t0 . De nition 7 (Asymptotic stability for connected equilibrium set) The connected equilibrium set ? is said to be asymptotically stable if in addition to being stable it has the property that d(x(t); ?) ! 0 as t ! 1, if d(x(t0 ); ?) < . De nition 8 (Exponential stability for connected equilibrium set) The connected equilibrium set ? is exponentially stable for (3) if there exist ! < 0; > 0; > 0 such that any arbitrary solution x(t) of (3), with the initial condition x(t0 ) = x0 , d(x(t0 ), ?) < , is de ned on 0; 1) and satis es d(x(t); ?) e!t d(x(t0 ); ?); t t0 : Note that there is a simple fact that the objective function E (x) in (1) remains constant on the connected equilibrium set ? of the system (2) since g(x) = 0 on ? and ? is connected. In the above example, E (x) equals zero on the equilibrium set f(x1 ; x2)jx2 + x2 = 1g. 1 2 The following Theorem 2 provides some interesting properties for autonomous system (3). Theorem 2 25] (a) If x(t); r1 t r2 is a solution of (3), then for any real constant c the function x1(t) = x(t + c) is also a solution of (3). (b) Through any point passes at most one trajectory. Throughout this paper, we assume that the objective function E (x) in (1) is bounded below and continuously di erentiable. In addition, its gradient function g(x) is Lipschitz continuous in Rn, i.e. there exists a constant L (> 0) such that kg(x1) ? g(x2)k Lkx1 ? x2 k for any x1; x2 2 Rn: (4) Our study in this paper is mainly on the various stability properties of the gradientbased neural network (2) for the optimization problem (1). Our contributions consist of the following ve parts.
First, we introduce a new approach to study the stability issues for various motion equations. Second, under some mild assumptions, we prove that any trajectory of the gradient-based neural network (2) converges to an equilibrium point of (2) as t ! 1. Third, we show that the Lyapunov stability is equivalent to the asymptotic stability in the gradient-based neural networks (2) for (1). As a direct corollary, for the convex objective function, we obtain that any trajectory of the gradient-based neural network (2) converges to an asymptotically stable equilibrium point under some mild conditions. These conditions are much weaker than those in Lyapunov's direct method or in the invariant set method 26, Chapter 3]. Finally, for the general nonlinear objective function, we propose a re ned gradientbased neural network, whose trajectory with any arbitrary initial point will converge to an equilibrium point, which satis es the second order necessary optimality conditions for optimization problems. The rest of the paper is organized as follows. In Section 2, detailed discussions on stability issues of the gradient-based neural network (2) for the optimization problem (1) are provided. A re ned neural network model for the general optimization problem is provided in Section 3. Some stability issues are also addressed in this section. Section 4 is devoted to the simulation of the re ned neural network on some problems. Promising simulation results are reported. Finally, some concluding remarks are drawn in Section
2 Analysis for Gradient-Based Neural Network
The analysis in this section is for H = I , the identical matrix in (2), for simplicity. For the case where H is symmetric positive de nite matrix other than the identical matrix, the analysis is similar with the norm kxkH ?1 = xT H ?1x and the induced distance d(x; y) = kx ? ykH ?1 for any x; y 2 Rn. In the literature 15, 34, 21, 1, 20, 30, 2], the stability analysis for a motion equation is proved by Lyapunov's direct method which requires the existence of a Lyapunov function. But unfortunately, a Lyapunov function may not exist in general. To overcome this di culty, we introduce a new approach to address the stability issues of a motion equation without using any Lyapunov function. In addition, the approach does not require the prior information of the equilibrium point or set at all. The new approach only pays the attention to the energy function E (x) and the motion equation. In the rest of this section, we will illustrate how this approach works for the general gradient-based neural network model (1)-(2). 5
To obtain our main results Theorem 4 { Theorem 7, we need the following preliminaries.
Theorem 3 Let g(x) in (2) be Lipschitz continuous in R and g(x) 6 0, then there is n no periodic solution for the neural network system (2).
Proof: From Theorem 1 there exists a unique solution x(t) satisfying x(t0 ) = x0 , t 2 t0 ; 1). Suppose that there is a periodic solution x(t) and the minimal period is
x(t) = x(t + T ); 8 t 2 t0 ; 1): Since g(x) 6 0, there exists a t1 2 0; 1) such that x(t1 ) = x1 ; g(x1) 6= 0, then x(t1 + x T ) = x1 . Since E (x(t)) is not increasing on t1; t1 + T ] from dE(dt(t)) = ?g(x)T g(x) 0, t 2 t1 ; t1 + T ], so E (x(t)) must be constant on t1 ; t1 + T ], which is a contradiction x since dE(dt(t)) jt=t1 = ?g(x1 )T g(x1) < 0. 2 From Theorem 2 and Theorem 3, we can conclude that the phase space of (2) can only consist of equilibrium points, nonintersecting trajectories. Moreover, every trajectory approaches to an equilibrium point, which is shown later (Theorem 4) without any assumption on the boundedness of level sets of the objective function. We extend Rn to include in nite point as an ordinary point, whose meaning is that for every d 6= 0 2 Rn; x0 + d approaches to it as ! 1. It is easy to see there is a simple and important fact that every trajectory of the system (2) is orthogonal to contour curves of the objective function E (x) in (1) if they intersect, because E (x) is continuously di erentiable and E (x) remains constant on any contour. T > 0, that is,
Lemma 1 Let x(t) be a trajectory of the system (2), is any contour of E (x) in (1),
then x(t) is orthogonal to at their intersecting points.
from Theorem 7.2 of 6, p.25], we know that x(t; s) has continuous partial derivative with respect to s. Then since E (x(t; s)) remains constant on its contour, i.e.
Proof: First, let us denote the phase plane of (2) as x(t; s), where s represents the parameter along any contour of E (x) in the phase plane of (2). Since E (x) 2 C 2(R ), n
E (x(t; s)) = constant 8s;
(rxE )T dx = 0 () ( dx )T dx = 0: ds dt ds This proves the lemma. 2 In the proof of Theorem 4, we also need the following Barbalat lemma. 6 we have
Lemma 2 (Barbalat) 26] If a di erentiable function f (t) has a nite limit as t ! 1, and ( ) is uniformly continuous, then ( ) ! 0 as t ! 1. Theorem 4 If E (x) is bounded below and its gradient g(x) is Lipschitz continuous
d t f dt d t f dt
in Rn, then for any initial point x0 , the trajectory x(t) of the system (2), satisfying x(t0 ) = x0 , will converge to an equilibrium point of the neural network (2) as t ! 1.
Proof: Since E (x) is bounded below and monotonic decreasing along the trajectory x(t), it is easy to see that there exists an E 2 R such that lim !1 E (x(t)) = E . Then
we have that
E (x0) ? E = kg(x(t))k2dt; (5) t0 so there exists at least a sequence fti g, satisfying ti < ti+1 and limi!1 ti = 1, such that lim kg(x(ti))k2 = 0: (6) i!1 In the following, we show that kg(x(t))k is bounded in t0 ; 1) by contradiction. Assume that there is a sequence fsig, satisfying si < si+1 and limi!1 si = 1, such that kg(x(si))k > i2 i = 1; 2; : (7) By extracting two subsequences if necessary from ftig and fsig, respectively, we can assume that si 2 ti ; ti+1], i = 1; 2; . Without loss of generality, we assume that kg(x(si))k = t2max ] kg(x(t))k: (8) t ;t
From (6), for
= 0:1, there exists a K1 such that 8 i > K1 , kg(x(ti))k < 0: From (8) and the continuity of g(x), for each i > K1 there exists a i 2 (ti ; si) such that kg(x( i))k = 0 and kg(x(t))k 0 8t 2 ( i; si): (9) This together with (5) imply si ? i ! 0 as t ! 1. Then there exists a K2 such that 8 i > K2 , L(si ? i) 0:1; (10) where L is the Lipschitz constant de ned in (4). From the continuity of g(x), (8) and (10), we have that for i > maxfK1; K2g kg(x(si))k ? 0:1 kg(x(si)) ? g(x( i))k Lkx(si) ? x( i)k (11) = Lkg(x( i + (si ? i))k(si ? i); 2 0; 1] 0:1kg(x(si))k: This is a contradiction to (7). Therefore kg(x(t))k is bounded in t0 ; 1g, that is, there exists an L1 such that kg(x(t))k L1 t 2 t0 ; 1): (12)
Next, we will prove that limt!1 x(t) exists for any initial condition x(t0 ) = x0 , where x(t) is the solution of the system (2). Furthermore, the limit point, say x , is an equilibrium point of the neural network (2). First, let us de ne as the limit set of the trajectory x(t) with x(t0 ) = x0, where ftig is a sequence satisfying ti < ti+1 ; i = 1; 2; and ti ! 1 as i ! 1. Now we prove that ? is a connected set by contradiction. Assume that ? is not connected, then there exist two sets ?1 ; ?2, and two open sets 1 ; 2 such that ?1 ; ?2 ?, ?1 ?2 = ?, ?1 \ ?2 = ;; 1 \ 2 = ;, and ?1 1 ; ?2 2 . We can also assume that at least one of the sets ?1 ; ?2 is bounded otherwise ?1 and ?2 are connected at in nite point. Suppose ?1 and 1 are bounded. Therefore, there exists a bounded closed set D such that 1 D. Considering two points xt 2 ?1 and xs 2 ?2, there are two sequences ftig and fsig with ti < si < ti+1, i = 1; 2; , limi!1 ti = 1 and limi!1 si = 1 such that
i
? = fx : 9 fti g; such that x(t0 ) = x0 and ilim x(ti) = x g !1
lim x(ti ) = xt ; ilim x(si ) = xs: !1 !1
2,
s As a result, there exist an > 0, B (xt ) 1 , B (x ) centered at x with radius , and a K such that 8 i > K ,
where B (x) is a ball
x(ti) 2 B (xt ); x(si) 2 B (xs): From the continuity of x(t), for each i > K , there exists a i 2 (ti ; si) such that x( i) 2 1 ; x( i) 2 2 and x( i) 2 D. Then the closeness and boundedness of D imply = = that there exists a convergent subsequence fx(ui )g such that limk!1 x(ui ) = x . Since 1 and 2 are both open sets, then x 2 1 and x 2 2 , so x 2 ?, which is a = = = contradiction to the de nition of ?. Therefore, ? must be a connected set. From the discussion at the beginning of the proof and the de nition of ?, it is easy to see that the value of E (x(t)) is the constant E on ?, that is, ? is a contour of E (x(t)). The trajectory x(t) of the neural network (2) is orthogonal to ? at the limit point (that is, every point on ?) from Lemma 1. On the other hand, since ? is the limit set of x(t), the trajectory x(t) must converge to ?. This could be true only if ?
k k
8
is a single point set. That is, x(t) converges to a limit point, say x , as t ! 1. And moreover (14) implies g(x ) = 0. 2 The result in Theorem 4 con rms that the energy function E (x) in (1) and the motion equation equation (2) constitute a neural network model. The result in Theorem 4 also shows that any trajectory of the neural network (2) for the optimization problem (1) will converge to an equilibrium point of the neural network (2), regardless of the boundedness of level sets of the objective function and the isolation of the equilibrium point. x Because of dE(dt(t)) = ?kg(x(t))k2, the energy function E (x(t)) in (1) is monotonical decreasing in t if x0 is not an equilibrium point. This fact reveals that the energy function E (x(t)) does not preserve following the trajectory of the neural network system (2). Theorem 3 and Theorem 4 lead to the following surprising results.
Theorem 5 Assume that E (x) in (1) is bounded below and its gradient g(x) is LipsProof: From De nition 5, if ? is a connected equilibrium set of the neural network
chitz continuous in Rn. Let ? be a connected equilibrium set of the neural network (2), if ? is stable, then it is asymptotically stable.
(2), there exists a positive number such that there is no other equilibrium point of (3) except ? itself in any open neighborhood B ( ; ?) = fx 2 Rn : d(x; ?) < g of ?. Since ? is stable, from De nition 6, if x0 is in some small neighborhood of ?, then the trajectory x(t) satisfying x(0) = x0 will stay in some neighborhood of ? for any t 0. From Theorem 4, x(t) will approach to an equilibrium point of neural network (2), so x(t) must approach to a point x 2 ? as t ! 1, that is, ? is asymptotically stable from De nition 7. 2 The result in Theorem 5 would ease many redundant discussions in proving both the stability and asymptotic stability in many neural networks for optimization problems. The asymptotic stability is particularly important for the neural network (2) of optimization problem (1), because only the point(s) in the asymptotically stable connected equilibrium set are the optimal solution(s) of the problem (1). The following Theorem 6 shows the equivalence of the asymptotically stable connected equilibrium set of neural network (2) and the optimal solution set of the problem (1).
Theorem 6 Assume that E (x) in (1) is bounded below and its gradient g(x) is Lip-
schitz continuous in Rn. Let ? be a connected equilibrium set of the neural network (2). Then ? is asymptotically stable if and only if each point of the set is an optimal solution of the optimization problem (1).
Proof: From De nition 7, the connected equilibrium set ? is asymptotically stable
if and only if any trajectory in all very small neighborhood of ? will converge to ?. The objective function E (x(t)) in (1) is decreasing along any trajectory of the neural network (2) as indicated in the proof of Theorem 3. In addition, E (x) remains constant on ?. So the conclusion is true. 2 9
Example 1:
The following two examples illustrate the above results.
min E1 (x; y) = (x + y ? 1)2: (15) The level sets of E1 (x; y) are all unbounded unless it is empty, the neural network for (15) is ( dx = ?(x + y ? 1); dt (16) dy = ?(x + y ? 1): dt The equilibrium set of (16) is the line l1 : x + y ? 1 = 0, which is connected and unbounded. The solutions to (16) are y 1 x = 2 (1 + x0 ? y0) + x0 +20?1 e?2t ; y y = 1 (1 ? x0 + y0) + x0 +20?1 e?2t : 2 The trajectories are lines l2 : y ? x = y 0 ? x 0 : 1 It is obvious to see that every trajectory approaches to the equilibrium point ( 2 (1+ x0 ? 1 y0); 2 (1 ? x0 + y0)) as t ! 1, which is on the line l1, every trajectory l2 is orthogonal to the line l1 , and E1 (x; y) = 0 on the line l1 .
Example 2:
The neural network for (17) is
min E2 (x) = x2e?x:
(17)
dx(t) = ?xe?x (2 ? x): (18) dt The equilibrium set of (18) is f0; 2; 1g, and if x0 > 2; x(t) ! 1; t ! 1; if x0 < 2; x(t) ! 0; t ! 1; if x0 = 2; x(t) = 2; 8 t: Theorem 4 and the above examples indicate that the in nite point can be viewed as an extended ordinary point. If the point is an isolated minimizer of the objective function as in Example 2, some trajectory will approach to it. If the point is not an isolated minimizer as in Example 1, there is no trajectory with a nite initial point can approach to it. In fact, if the set Rn is relaxed by level sets fx : E (x) E (x0 )g for all x0 's, Theorem 1 and Theorem 4 still hold. For instance, in Example 2, g(x) is not Lipschitz continuous in Rn, but Lipschitz continuous in any level set of E (x), so the conclusions still hold as shown above. At the end of this section, we provide some results for the convex optimization problem as the direct corollary of the above discussion. Some similar results have been discussed in previous works of the neural networks for convex optimization problems.
10
Theorem 7 (a) If E (x) is convex, then the equilibrium set of the neural network (2)
is connected and asymptotically stable. (b) If the convex objective function E (x) is bounded below and continuously di erentiable with Lipschitz continuous gradient, then every trajectory x(t) will converge to an asymptotically stable equilibrium point of the neural network (2). (c) If E (x) is uniformly convex, then every trajectory of (2) will converge to an equilibrium point globally and exponentially. Note that there is no any assumption on the boundedness of level sets of E (x) or the isolation of the equilibrium point in Theorem 7. These assumptions are always required in previous works of neural networks for convex optimization problems.
3 A Re ned Neural Network
For the general nonlinear objective function, some results for convex optimization problems as mentioned in Theorem 7 may not hold. However, motivated by the work of 4] where eigenvalues and eigenvectors of a matrix can be computed by neural networks, we propose a re ned gradient-based neural network model for the optimization problem (1) in this section. The essence of the re ned neural network model is the following. First, by using the gradient-based neural network (2), the trajectory of the neural network will converge to an equilibrium point of the neural network. Second, by using the neural network of 4], the minimum eigenvalue, say , of the Hessian matrix at the equilibrium point can be computed. If 0, the model will output the current equilibrium point and stop the simulation. The obtained point satis es the second order necessary optimality conditions for the optimization problem (1). Otherwise < 0, the trajectory is perturbed along the eigenvector corresponding to . With the new initial point, the gradient-based neural network is implemented once again. The process in the second part may be repeated a few times until an equilibrium point with positive semi-de nite Hessian is reached. In the whole process, the reduction of the energy function E (x) is guaranteed. Summarizing the above discussions, we provide the re ned neural network model for the optimization problem (1) as follows.
Step 0. Given an initial point x0 , a very small positive number , and a small positive number . Step 1. Implement the neural network (2) with the initial point x0 to reach an equilibrium point, say x . Step 2. Use the neural network in 4] to compute the minimum eigenvalue and its corresponding eigenvector v of the Hessian matrix H of E (x) at x . If ? , stop. Otherwise, set x0 = x + v and go to Step 1. A block diagram of the re ned neural network is shown in Figure 2, where H = I and (> 0) is a scaling parameter. The MOS switch (see Appendix B, 5]) used in model
Figure 2: Computation of S in the MOS switch of Figure 2 Figure 2 is shown in Figure 3, where " is used to control the stopping of Step 1, for details, see Section 4. The output of Figure 3, S , equals 1 if krxf (x(t))k ", equals zero otherwise. From Theorem 4, the re ned neural network is convergent, which is summarized in the following theorem.
Theorem 8 If E (x) is bounded below and its gradient g(x) is Lipschitz continuous
in Rn, then for any initial point x0 , the trajectory x(t) of the re ned neural network, satisfying x(t0 ) = x0 , will converge to an equilibrium point of the neural network as t ! 1. In addition, the equilibrium point satis es the second order necessary optimality conditions for the optimization problem.
In this section, the re ned neural network in Section 3 is simulated on two problems. Our simulations are conducted with Matlab version 5.2. The numerical ordinary differential equation solver used in all simulations is ode23s. The matrix H in (2) is set to I , and the scaling parameter is xed at = 1000 in all simulations. The stopping criterion is krf (x(t))k " = 10?6: (19) We use tf to denote the nal time when (19) is satis ed. In Step 2 of the re ned neural network, we choose = 10?8 and = 0:1.
Problem 1:
min E (x) = (x3 ? 3x1)2 + x2 : 1 2 The corresponding dynamic system is ( dx1 = ?6x1 (x2 ? 1)(x2 ? 3); 1 1 dt dx2 = ?2x2 : dt
From the de nition of E (x), we can see that it has the following 5 stationary points, p (0;p T , ( 3; 0)T , ( 1; 0). Among these stationary points, three of them, (0; 0)T , and 0) ( 3; 0)T are optimal solutions. For initial points of (1; 1)T and (?1; ?1)T , Step 1 of the re ned neural network drives the trajectories of the system to points (1; 0)T and (?1; 0)T , respectively, which are stationary points but not optimal solutions of the optimization problem. While Step 2 continues the simulation until the system reaches optimal solutions (see Figures 4 and 5). For the other two initial points of (2; 2)T and (?2; ?2)T , the trajectories of the system converge to optimal solutions directly. These simulation results are summarized in Table 1. While the trajectories of x(t) for the four di erent starting points are illustrated in Figures 4-7. Table 1. Simulation results of Problem 1 initial point tf rf (x) f (x) limit point (1; 1)T 0.0107 5.87e-07 2.43e-15 (1.73205,0) (?1; ?1)T 0.0079 5.51e-07 9.48e-15 (-3.04e-08,3.43e-08) (2; 2)T 0.0076 6.11e-07 9.35e-14 (1.73205,0) (?2; ?2)T 0.0076 6.11e-07 9.35e-14 (-1.73205,0)
Problem 2:
min E (x) = (x2 ? x2 + 1)2: 1 2 The corresponding dynamic system is 8 dx1 < dt = ?4x1 (x2 ? x2 + 1); 1 2 : dx2 = 4x2 (x2 ? x2 + 1): 1 2 dt 13
1.8 1.6 1.4 1.2
Trajectories of x(t)
x1(t) 1 0.8 0.6 0.4 0.2 x2(t) 0 -0.2 0 0.002 0.004 0.006 0.008 time in seconds 0.01 0.012
Figure 3: The evolution of x(t) in Problem 1 with x0 = (1; 1)
Figure 7: The evolution of x(t) in Problem 2 with x0 = (0:1; 0) Obviously in Problem 2, the stationary points are (0; 0)T and (x1 ; x2)T satisfying x2 ? 1 x2 = ?1. Optimal solutions for Problem 2 are (x1 ; x2)T satisfying (x1 )2 ? (x2 )2 = ?1. 2 For initial points of (0:1; 0)T and (?0:1; 0)T , Step 1 of the re ned neural network drives the trajectories of the system to point (0; 0)T which is a stationary point but not an optimal solution. While Step 2 continues the simulation until the trajectories reach optimal solutions (see Figures 8 and 9). For the other initial point of (1; ?1)T , the trajectory of the system converges to an optimal solution directly. These simulation results are summarized in Table 2. The trajectories of x(t) for the three di erent starting points are illustrated in Figures 8-10. In Problem 1, the equilibrium points are all isolated. While in Problem 2 the equilibrium set is connected and unbounded. The simulation results of both problems con rm the results of Theorem 8. Table 2. Simulation results of Problem 2 initial point tf rf (x) f (x) limit point T (0:1; 0) 0.0306 6.73e-08 2.83e-16 (0,1) T (?0:1; 0) 0.0306 6.73e-08 2.83e-16 (0,1) (1; ?1)T 0.0183 6.45e-08 1.16e-16 (0.786212,-1.27206)
5 Concluding Remarks
Our research in this paper focuses on the gradient-based neural network (2) for the optimization problem (1), which is the most commonly used neural network model for 16
Figure 9: The evolution of x(t) in Problem 2 with x0 = (1; ?1) 17
optimization problems. First, we extend some de nitions of isolated equilibrium points and the associated stabilities to the case of any connected equilibrium set which could be unbounded. With these more general de nitions, we have obtained the following results: First, a new approach is introduced to analyze stability properties of various neural network models. The new approach does not require the existence of any Lyapunov function. The use of our new approach in stability analysis is illustrated in Section 2. For gradient-based neural networks (2), if E (x) is bounded below and its gradient g(x) is Lipschitz continuous, then any trajectory x(t) of the system (2) will converge to an equilibrium point of the neural network (2). The Lyapunov stability is equivalent to the asymptotically stability in gradientbased neural networks for optimization problems. For convex optimization problems, the equilibrium set is connected and asymptotically stable, and that any trajectory of gradient-based neural networks will converge to an asymptotically stable equilibrium point if the objective function is bounded below and has Lipschitz continuous gradient. Our results require weaker assumptions than the Lyapunov's direct method where the equilibrium point must be isolated or the invariant set method where level sets of the objective function must be bounded. In fact, our assumptions are necessary, because the bounded below assumption on E (x) ensures that the optimization problem (1) is meanful and Lipschitz continuity of g(x) guarantees the existence and uniqueness of the solution of the neural network (2). For the general nonlinear objective function, a re ned gradient-based neural network model is established, whose trajectory with any arbitrary initial point will converge to an equilibrium point, which satis es the second order necessary optimality conditions for optimization problems. We believe that similar conclusions would hold for projective gradient based neural networks for optimization problems.
References
1] A. Bouzerdoum and T. R. Pattison, \Neural network for quadratic optimization with bound constraints," IEEE Trans. Neural Networks, vol. 4, pp. 293-304, 1993. 2] Y. H. Chen and S. C. Fang, \Solving convex programming problem with equality constraints by neural networks," Computers Math. Appli., vol. 36, pp. 41-68, 1998. 3] L. O. Chua and G. N. Lin, \Nonlinear programming without computation," IEEE Trans. Circuits Syst. , vol. 31, pp. 182-188, 1984. 18
4] A. Cichocki and R. Unbehauen, \Neural networks for computing eigenvalues and eigenvectors", Biological Cybernetics, vol. 68, pp. 155-164, 1992. 5] A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing. Chichester, U. K.: Wiley, 1993. 6] E. A. Coddington and N. Levinson, Theory of Ordinary Di erential Equations, New York: McGraw-Hill, 1955. 7] A. A. Goldstein, \Convex programming in Hilbert space", Bulletin of American Mathematical Society, vol. 70, pp. 709-710, 1964. 8] B. S. He, \A projection and contraction method for a class of linear complementary problem and its application in convex quadratic programming", Applied Mathematics and Optimization, vol. 25, pp. 247-262, 1992. 9] B. S. He, \A new method for a class of linear variational inequalities", Mathematical Programming, vol. 66, pp. 137-144, 1994. 10] B. S. He, \Solving a class of linear projection equations", Numerische Mathematik, vol. 68, pp. 71-80, 1994. 11] B. S. He, \A class of projection and contraction methods for monotone variational inequalities", Applied Mathematics and Optimization, vol. 35, pp. 69-76, 1997. 12] J. J. Hop eld, \Neurons with graded response have collective computational ability", in Proc. Natl. Acad. Sci., USA 81, pp. 3088-3092, 1984. 13] J. J. Hop eld and D. W. Tank, \Neural computation of decisions in optimization problems", Biol. Cybern. , vol. 52, pp. 141-152, 1985. 14] Z.-G. Hou, C.-P. Wu and P. Bao, \A neural network for hierarchical optimization of nonlinear large-scale systems", International Journal of Systems Science, 29 (2), pp. 159-166, 1998. 15] M. P. Kennedy and L. O. Chua, \Neural networks for nonlinear programming", IEEE Trans. Circuits Syst. , vol. 35, pp. 554-562, 1988. 16] L.-Z. Liao and Y. Dai, \A time delay neural network model for unconstrained nonconvex optimization", manuscript, 1999. 17] L.-Z. Liao and H. Qi, \A neural network for the linear complementarity problem," Math. Comput. Modelling, vol. 29 (3), pp. 9-18, 1999. 18] L.-Z. Liao, H. Qi and L. Qi, \Solving nonlinear complementarity problems with neural networks: A reformulation method approach", (to appear in JCAM). 19] X. B. Liang and J. Wang, \A recurrent neural network for optimizing a continuously di erentiable function with bound constraints," manuscript, 1999. 20] W. E. Lillo, M. H. Loh, S. Hui and S. H. Zak, \On solving constrained optimization problems with neural networks: A penalty method approach," IEEE Trans. Neural Networks, vol. 4, pp. 931-940, 1993. 21] C. Y. Maa and M. A. Shanblatt, \Linear and quadratic programming neural network analysis," IEEE Trans. Neural Networks, vol. 3, pp. 580-594, 1992. 19
22] C. Y. Maa and M. A. Shanblatt, \A two-phase optimization neural network", IEEE Trans. Neural Networks, vol. 3, pp. 1003-1009, 1992. 23] B. T. Polyak, \Constrained minimization problems", USSR Computational Mathematics and Mathematical Physics, vol. 6, pp. 1-50, 1966. 24] A. Rodriguez-Vazquez, R. Dominguez-Castro, A. Rueda, J. L. Huertas and E. Sanchez-Sinencio, \Nonlinear switch-capacitor 'neural' networks for optimization problems," IEEE Trans. Circuits Syst. , vol. 37, pp. 384-398, 1990. 25] D. A. Sanchez, Ordinary Di erential Equations and Stability Theory: An Introduction. W. H. Freeman and Company, San Francisco, 1968. 26] J. J. E. Slotine and W. Li, Applied Nonlinear Control. Englewood Cli s, NJ: Prentice Hall, 1991. 27] S. Sudharsanan and M. Sundareshan, \Exponential stability and a systematic synthesis of a neural network for quadratic minimization," Neural Networks, vol. 4, pp. 599-613, 1991. 28] D. W. Tank and J. J. Hop eld, \Simple neural optimization networks: An A/D convert, signal decision circuit, and a linear programming circuit," IEEE Trans. Circuits Syst. , vol. 33, pp. 533-541, 1986. 29] J. L. Williems, Stability Theory of Dynamical Systems, Nelson, 1970. 30] X. Wu, Y. Xia, J. Li and W. K. Chen, \A high performance neural network for solving linear and quadratic programming problems," IEEE Trans. Neural Networks, vol. 7, pp. 643-651, 1996. 31] Y. Xia and J. Wang, \A general methodology for designing globally convergent optimization neural networks", IEEE Trans. Neural Networks, vol. 9, pp. 13311343, 1998. 32] J. Zabczyk, Mathematical Control Theory: An Introduction, Birkhauser, Boston, 1992. _ 33] S. H. Zak, V. Upatising and S. Hui, \Solving linear programming problems with neural networks: a comparative study", IEEE Trans. Neural Networks, vol. 6, pp. 94-104, 1995. 34] S. Zhang and A. G. Constantinides, \Lagrange programming neural network," IEEE Trans. Circuits Syst., vol. 39, pp. 441-452, 1992.