Theory, Algorithms and Applications
DC Programming and DCA constitute the backbone of smooth/nonsmooth nonconvex programming and global optimization. They were introduced by Pham Dinh Tao in 1985 in their preliminary form and have been extensively developed by Le Thi Hoai An and Pham Dinh Tao since 1993; they have now become classic and increasingly popular. This page collects links to papers, software, announcements, etc. that may be of relevance for people working on DC Programming and DCA.
DC PROGRAMMING
DC = Difference of Convex functions
DC programming and its DC Algorithm (DCA) address the problem of minimizing a function f = g − h (with g, h lower semicontinuous proper convex functions on Rn) on the whole space:
α = inf {f(x) := g(x) − h(x) : x ∈ Rn}
A constrained DC program whose feasible set C is convex can always be transformed into an unconstrained DC program by adding the indicator function of C (equal to 0 on C, +∞ elsewhere) to the first DC component g.
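For instance, the constrained program inf {g(x) − h(x) : x ∈ C} becomes inf {(g(x) + χC(x)) − h(x) : x ∈ Rn}, where χC denotes the indicator function of C: since g + χC is again lower semicontinuous proper convex, the unconstrained formulation above applies unchanged.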
WHY DC PROGRAMMING ?
In recent years there has been very active research on the following classes of nonconvex and nondifferentiable optimization problems:
(1) sup {f(x) : x ∈ C}, where f and C are convex
(2) inf {g(x) − h(x) : x ∈ Rn}, where g, h are convex
(3) inf {g(x) − h(x) : x ∈ C, f1(x) − f2(x) ≤ 0}, where g, h, f1, f2 and C are convex
The reason is quite simple: most real-life optimization problems are of a nonconvex nature. Moreover, industrialists have begun to replace convex models by nonconvex ones, which are more complex but more reliable and especially more economical.
Problem (1) is a special case of Problem (2) with g being the indicator function of C and h = f, while the latter (resp. Problem (3)) can be equivalently transformed into the form of (1) (resp. (2)) by introducing an additional scalar variable (resp. via exact penalty relative to the DC constraint f1(x) − f2(x) ≤ 0).
Clearly the complexity increases from (1) to (3), and solving any one of them implies solving the other two. Problem (2) is called a DC program; its particular structure has permitted a good deal of development in both qualitative and quantitative studies.
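For instance, introducing an additional scalar variable t, problem (2) can be rewritten as inf {t − h(x) : g(x) ≤ t}, i.e., up to a change of sign, sup {h(x) − t : (x, t) ∈ C'} with the convex set C' := {(x, t) : g(x) ≤ t}, which is exactly the convex maximization form (1).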
We can say that almost all real-world optimization problems can be formulated as DC programs.
The richness of the set of DC functions on X = Rn, denoted by DC(X) :
(i) DC(X) is a subspace containing the class of lower-C2 functions. In particular, DC(X) contains the space C1,1(X) of functions whose gradient is locally Lipschitzian on X.
(ii) DC(X) is closed under all the operations usually considered in optimization.
In particular, a linear combination of functions in DC(X) belongs to DC(X), and a finite supremum of DC functions is DC. Moreover, the product of two DC functions remains DC.
(iii) With some caution, we can say that DC(X) is the subspace generated by the convex cone Γo(X) :
DC(X) = Γo(X) − Γo(X).
This relation marks the passage from convex optimization to nonconvex optimization and also indicates that DC(X) constitutes a minimal realistic extension of Γo(X) - the convex cone of all lower semicontinuous proper convex functions on Rn.
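As a simple illustration of this richness (our example, added here for concreteness): the nonconvex function f(x) = x³ on R belongs to DC(R), since f = g − h with the convex functions g(x) = max(x, 0)³ and h(x) = max(−x, 0)³.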
DCA
DCA = DC Algorithm
WHAT IS DCA ?
DCA is an optimization approach, based on local optimality and duality in DC programming, for solving DC programs. DCA was introduced by Pham Dinh Tao in 1985 as an extension of his subgradient algorithms (for convex maximization programming) to DC programming. Crucial developments and improvements of DCA, from both theoretical and computational aspects, have been achieved since 1993 through the joint works of Le Thi Hoai An and Pham Dinh Tao.
WHY DCA ?
It is worth noting that, for suitable DC decompositions, DCA generates most standard algorithms in convex and nonconvex programming.
AN INTRODUCTION TO DCA
In recent years there has been very active research in nonconvex programming. There are two different but complementary approaches (one can say two schools) in DC programming. A great deal of work involves global optimization (which is concerned with finding global solutions to nonconvex programs), whose main tools and solution methods are developed in the spirit of combinatorial optimization, but with the difference that one works in the continuous framework (see R. Horst and H. Tuy 1993; R. Horst, P.M. Pardalos and N.V. Thoai 1995).
Besides this combinatorial approach to global continuous optimization, the convex analysis approach to nonconvex programming has been much less developed. It seems to have originated in the works of Pham Dinh Tao on the computation of bound-norms of matrices (i.e. maximizing a seminorm over the unit ball of a norm) in 1974. These works were extended in a natural and logical way to the DC (difference of convex functions) program:
α = inf {f(x) := g(x) − h(x) : x ∈ X} (Pdc)
where X = Rn is the usual Euclidean space and g, h are lower semicontinuous proper convex functions on X.
Indeed we would like to make an extension of convex programming, not too large, so as to still allow the use of the arsenal of powerful tools from convex analysis and convex optimization, but sufficiently wide to cover most real-world nonconvex optimization problems. Here the convexity of the two DC components g and h of the objective function is used to develop appropriate tools from both theoretical and algorithmic viewpoints. The other pillar of this approach is DC duality, first studied by J.F. Toland (1978, 1979), who generalized, in a very elegant and natural way, the just-mentioned works of Pham Dinh Tao on convex maximization programming (g then being the indicator function of a nonempty closed convex set in X). In contrast with the combinatorial approach, where many global algorithms have been studied, there have been very few algorithms for solving DC programs in the convex analysis approach. Here we are interested in local and global optimality conditions, relationships between local and global solutions to primal DC programs and their dual
α = inf {h*(y) − g*(y) : y ∈ Y} (Ddc)
(where Y is the dual space of X, which can be identified with X itself, and g*, h* denote the conjugate functions of g and h, respectively) and solution algorithms.
The transportation of global solutions between (Pdc) and (Ddc) is expressed as follows: if x* is an optimal solution of (Pdc), then every y* ∈ ∂h(x*) is an optimal solution of (Ddc); conversely, if y* is an optimal solution of (Ddc), then every x* ∈ ∂g*(y*) is an optimal solution of (Pdc).
Under technical conditions, this transportation also holds for local solutions of (Pdc) and (Ddc).
DC programming investigates the structure of the vector space DC(X), DC duality, and optimality conditions for DC programs. The complexity of DC programs resides, of course, in the lack of practical global optimality conditions. We developed instead the following necessary local optimality conditions for DC programs, in their primal part (by symmetry, their dual part is trivial):
∂g(x*) ∩ ∂h(x*) ≠ Ø (1)
(such a point x* is called a critical point of g − h, or a critical point of (Pdc)), and
∂g(x*) ⊃ ∂h(x*) (2)
Condition (2) is also sufficient for many important classes of DC programs. In particular, it is sufficient in the following cases, quite often encountered in practice: when h is a polyhedral convex function, or when f is locally convex at x*.
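As a small worked example (ours, purely for illustration): take f(x) = x² − |x| on R, with g(x) = x² and h(x) = |x|. Condition (1) reads 2x* ∈ ∂|·|(x*), giving the critical points x* = 0 and x* = ±1/2. At x* = ±1/2 we have ∂h(x*) = {±1} = ∂g(x*), so condition (2) holds, and these points are indeed the two global minimizers (f(±1/2) = −1/4). At x* = 0, however, ∂h(0) = [−1, 1] is not contained in ∂g(0) = {0}: condition (2) fails, and 0 is merely a critical point (in fact a local maximizer).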
HOW DCA WORKS ?
Based on local optimality conditions and duality in DC programming, DCA consists in the construction of two sequences {xk} and {yk}, candidates to be optimal solutions of the primal and dual programs respectively, such that the sequences {g(xk) − h(xk)} and {h*(yk) − g*(yk)} are decreasing, and {xk} (resp. {yk}) converges to a primal feasible solution x* (resp. a dual feasible solution y*) verifying local optimality conditions and
x* ∈ ∂g*(y*)
y* ∈ ∂h(x*)
These two sequences {xk} and {yk} are determined in such a way that xk+1 (resp. yk) is a solution to the convex program (Pk) (resp. (Dk)) defined by
inf {g(x) − h(xk) − <x − xk, yk> : x ∈ Rn}, (Pk)
inf {h*(y) − g*(yk−1) − <y − yk−1, xk> : y ∈ Rn} (Dk).
The first interpretation of DCA is simple: at each iteration one replaces, in the primal DC program (Pdc), the second component h by its affine minorization hk(x) := h(xk) + <x − xk, yk> in a neighbourhood of xk, giving birth to the convex program (Pk), whose solution set is nothing but ∂g*(yk). Likewise, the second DC component g* of the dual DC program (Ddc) is replaced by its affine minorization (g*)k(y) := g*(yk) + <y − yk, xk+1> in a neighbourhood of yk, to obtain the convex program (Dk), whose solution set is ∂h(xk+1). DCA thus performs a double linearization, with the help of the subgradients of h and g*, and yields the following scheme:
yk ∈ ∂h(xk)
xk+1 ∈ ∂g*(yk).
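To make the scheme concrete, here is a minimal numerical sketch (our illustrative example, not taken from the text above): DCA applied to f(x) = 0.5‖x‖² − ‖x‖₁ with g(x) = 0.5‖x‖² and h(x) = ‖x‖₁, for which both steps are explicit, since sign(x) ∈ ∂h(x) and ∂g*(y) = {y}:

```python
import numpy as np

# Minimal DCA sketch for f(x) = 0.5*||x||^2 - ||x||_1 (illustrative DC decomposition).
def dca(x0, max_iter=100, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        y = np.sign(x)  # y^k in dh(x^k): sign(x) is a subgradient of the l1-norm
        x_new = y       # x^{k+1} in dg*(y^k): here g* = 0.5*||y||^2, so dg* is the identity
        if np.linalg.norm(x_new - x) < tol:  # stop once the iterates stabilize
            return x_new
        x = x_new
    return x

print(dca([0.3, -2.0, 1.5]))  # converges immediately to [1., -1., 1.], a global minimizer here
```

In general the computation of xk+1 requires solving the convex program (Pk); it reduces to a closed-form update here only because ∂g* is explicit.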
The second interpretation of DCA :
Let x* be an optimal solution of the primal DC program (Pdc) and y* ∈ ∂h(x*). By the transportation of global solutions between (Pdc) and (Ddc), y* is an optimal solution of the dual DC program (Ddc).
Let h_* be the affine minorization of h defined by h_*(x) := h(x*) + <x − x*, y*>,
and consider the next convex program
α* := inf {g(x) − h_*(x) : x ∈ Rn} = inf {f(x) + h(x) − h_*(x) : x ∈ Rn} (3)
Since the function f_*(x) := f(x) + h(x) − h_*(x) is a convex majorization of f, we have α* ≥ α. But f_*(x*) = f(x*) = α, hence finally α* = α.
On the other hand, the optimal solution set of (3) is ∂g*(y*), which is contained in the optimal solution set of (Pdc) by the transportation of global solutions between (Pdc) and (Ddc). Taking into account this transportation and the decrease of the sequence {g(xk) − h(xk)}, one can better understand the role played by the linearized programs (Pk), (Dk) and (3), and the fact that DCA converges to an optimal solution of (Pdc) from a good initial point.
Finally, it is important to point out a deeper insight into DCA. Let hl and (g*)l be the polyhedral convex functions which underapproximate, respectively, the convex functions h and g* (using the affine minorizations hi and (g*)i introduced above), defined by
hl(x) := sup {hi(x) : i = 0, ..., l} for all x ∈ Rn,
(g*)l(y) := sup {(g*)i(y) : i = 1, ..., l} for all y ∈ Rn.
Let k := inf {l : g(xl) − h(xl) = g(xl+1) − h(xl+1)}. If k is finite, then the solutions computed by DCA, xk+1 and yk, are global minimizers of the polyhedral DC programs
βk = inf {g(x) − hk(x) : x ∈ Rn} (P̃k)
and
βk = inf {h*(y) − (g*)k(y) : y ∈ Rn} (D̃k)
respectively. This property holds especially for polyhedral DC programs, where DCA has finite convergence.
The hidden features reside in the following (k being finite or equal to +∞):
xk+1 (resp. yk) is not only an optimal solution of (Pk) (resp. (Dk)) but also an optimal solution of the more tightly approximating problem (P̃k) (resp. (D̃k));
βk + εk ≤ α ≤ βk, where εk := inf {hk(x) − h(x) : x ∈ P} ≤ 0; the nearer εk is to zero (i.e., the closer the polyhedral convex minorization hk is to h over P), the nearer xk+1 is to P (here P denotes the solution set of (Pdc));
if h and hk coincide at some optimal solution of (Pdc), or g* and (g*)k coincide at some optimal solution of (Ddc), then xk+1 (resp. yk) is also an optimal solution of (Pdc) (resp. (Ddc)).
These nice features partially explain the efficiency of DCA with respect to related standard methods.
KEY PROPERTIES OF DCA:
(i) DCA is a descent method (without linesearch): the sequences {g(xk) − h(xk)} and {h*(yk) − g*(yk)} are decreasing.
(ii) If the optimal value α of (Pdc) is finite and the sequences {xk} and {yk} are bounded, then every limit point x* (resp. y*) of {xk} (resp. {yk}) is a critical point of g − h (resp. h* − g*).
(iii) DCA has a linear convergence for general DC programs.
(iv) DCA has a finite convergence for polyhedral DC programs.
HOW TO USE DCA ?
(i) Find a DC decomposition g − h of the objective function f;
(ii) Compute subgradients of h and g*
(iii) Implement the algorithm, each iteration of which consists of three steps: compute yk ∈ ∂h(xk); compute xk+1 by solving the convex program
inf {g(x) − h(xk) − <x − xk, yk> : x ∈ Rn};
repeat until convergence.
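A minimal generic template in this spirit (a sketch under our own naming assumptions: the oracles subgrad_h and solve_linearized are hypothetical placeholders for the user-supplied computations of step (ii)) might look as follows:

```python
from typing import Callable
import numpy as np

def dca(subgrad_h: Callable[[np.ndarray], np.ndarray],
        solve_linearized: Callable[[np.ndarray], np.ndarray],
        x0: np.ndarray, tol: float = 1e-8, max_iter: int = 1000) -> np.ndarray:
    """Generic DCA skeleton; the caller supplies the two oracles of step (ii)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        y = subgrad_h(x)              # step 1: pick some y^k in dh(x^k)
        x_next = solve_linearized(y)  # step 2: x^{k+1} in dg*(y^k), i.e. argmin g(x) - <x, y>
        if np.linalg.norm(x_next - x) < tol:  # step 3: stop once the iterates stabilize
            return x_next
        x = x_next
    return x

# Toy usage on the same example as in the earlier sketch (g = 0.5*||x||^2, h = ||x||_1):
print(dca(np.sign, lambda y: y, np.array([0.3, -2.0, 1.5])))
```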
IMPORTANT QUESTIONS
From the theoretical point of view, the question of optimal DC decompositions is still open. Of course, this depends strongly on the very specific structure of the problem being considered. In order to tackle the large scale setting, one tries in practice to choose g and h such that sequences {xk} and {yk} can be easily calculated, i.e. either they are in explicit form or their computations are inexpensive.
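As a classical illustration of this trade-off (a well-known construction, not specific to this page): an indefinite quadratic function f(x) = 0.5<x, Qx> + <c, x> admits the DC decomposition g(x) = 0.5<x, (Q + ρI)x> + <c, x>, h(x) = 0.5ρ||x||² for any ρ ≥ max(0, −λmin(Q)). Then yk = ρxk is explicit (h is differentiable here) and the subproblems (Pk) are strongly convex; the choice of ρ trades the conditioning of these subproblems against the tightness of the decomposition.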
GLOBALIZING
To guarantee the globality of computed solutions, or to improve their quality, it is advisable to combine DCA with global optimization techniques, the most popular of which are Branch-and-Bound, SDP and cutting plane techniques, in a deeper and more efficient way. Note that for a DC function f = g − h, a good convex minorization of f can be taken as g + co(−h), where co stands for the convex envelope, knowing that the convex envelope of a concave function on a bounded polyhedral convex set can be easily computed.
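For instance (a standard one-dimensional fact, stated here for illustration): if −h is concave on an interval [a, b], its convex envelope there is simply the chord joining (a, −h(a)) and (b, −h(b)), so the minorization g + co(−h) is immediate to evaluate; over a bounded polyhedral convex set, co(−h) is likewise determined by the values of −h at the vertices.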
APPLICATIONS
The major difficulty in nonconvex programming resides in the fact that there are, in general, no practicable global optimality conditions. Thus, checking the globality of solutions computed by local algorithms is only possible in cases where optimal values are known a priori, or by comparison with global algorithms, which unfortunately cannot be applied to large-scale problems. A pertinent comparison of local algorithms should be based on the following aspects: robustness and stability, rate of convergence, globality of computed solutions, and the ability to handle large-scale problems.
DCA seems to meet these requirements, since it has been successfully applied to a large variety of nonconvex optimization problems:
1. The Trust Region subproblems
2. Nonconvex Quadratic Programs
3. Quadratic Zero-One Programming problems / Mixed Zero-One Programming problems
4. Minimizing a quadratic function under convex quadratic constraints
5. Multiple Criteria Optimization: Optimization over the Efficient and Weakly Efficient Sets
6. Linear Complementarity Problem
7. Nonlinear Bilevel Programming Problems.
8. Optimization in Mathematical Programming problems with Equilibrium Constraints
9. Optimization in Mathematical Finance
10. The strategic supply chain design problem from qualified partner set
11. The concave cost supply chain problem
12. Nonconvex Multicommodity Network Optimization problems
13. Molecular Optimization via the Distance Geometry Problems
14. Molecular Optimization by minimizing the Lennard-Jones Potential Energy
15. Minimizing Morse potential energy
16. Multiple alignment of sequences
17. Phylogenetic analysis
18. Optimization for Restoration of Signals and Images
19. Discrete tomography
20. Tikhonov Regularization for Nonlinear Ill-Posed problems
21. Engineering structures
22. Multidimensional Scaling Problems (MDS)
23. Clustering
24. Fuzzy Clustering
25. Multilevel hierarchical clustering and its application to multicast structures
26. Support Vector Machine
27. Large margin classification with ψ-learning
28. Generating highly nonlinear balanced Boolean Functions in Cryptography
To be continued ...
MISCELLANIES
CAVEAT: The given lists of papers are neither complete nor do they always provide the most recent reference.
BUT: The intention is to provide a useful page for the entire DC Programming community. To make and keep this page sufficiently complete and up to date, I need your help. Please do inform me about necessary updates or missing contributions, and mail suggestions or comments to me.
NOTICE: the works by H. Tuy, R. Horst and N.V. Thoai on DC optimization only concern global approaches, which are generally applicable to small DC programs. This homepage is devoted to the convex analysis approach to nonconvex programming - DC programming and DCA - and to combinations of DCA with global optimization techniques.