The optimization methods described in this chapter attempt to minimize the objective function which is the sum of the squares of the distances between the diagram simulation results and the experimental dots. This function is defined as follows
J = ∑i, jwj (ypred(i, j) - yexp(i, j))2,
where the indexes i and j denote rows and columns in the dataset, yexp represents the known experimental data and ypred is the corresponding simulated values. The weight wj calculated for each data column by one of the formula wj = 1/|<xj>|, wj = (<xj2>)-1/2 or wj = 1/|<xj2>-<xj><xj>|, where <xj> is the mean value of the points in a trajectory of experimental dots. Users also are able to individually override these weights.
Minimization of the objective function can be performing by one of the following optimization methods.
| • | Adaptive simulated annealing (ASA) |
| • | Stochastic ranking evolution strategy (SRES) |
| • | Global optimization using the DIRECT algorithm (GLBSOLVE) |
| • | Quadratic Hill-climbing |
| • | Multi-objective cellular genetic algorithm (MOCell) |
| • | Multi-objective particle swarm optimization (MOPSO) |
Comparative characteristics of the methods are given by the following table.
Method |
References |
Year |
Parallel |
Parameter constraints |
ASA |
[1] |
1996 |
- |
+ |
SRES |
[2] |
2000 |
+ |
+ |
GLBSOLVE |
[3], [4] |
1999 |
- |
- |
Hill-climbing |
[5] |
1965 |
- |
+ |
MOCell |
[6] |
2005 |
+ |
+ |
MOPSO |
[7], [8] |
1995 |
+ |
+ |
Below is a brief description of the methods.
Adaptive simulated annealing (ASA)
Adaptive simulated annealing (ASA) is a global optimization algorithm that relies on randomly importance-sampling the parameter space [1]. ASA algorithm is developed to statistically find the best global fit of a nonlinear non-convex cost-function over a D-dimensional space. It is argued that this algorithm permits an annealing schedule for "temperature" T decreasing exponentially in annealing-time k,
T = T0exp(-ck(1/D)).
The introduction of re-annealing also permits adaptation to changing sensitivities in the multi-dimensional parameter-space. This annealing schedule is faster than fast Cauchy annealing, where T = T0/k, and much faster than Boltzmann annealing, where T = T0/ln k.
The following options of the algorithm must be set.
| • | Calculation accuracy is a positive double value to determine the difference between deviation of simulation result from experimental dataset under parameter values of current calculation step and values of previous calculation step. The default is '1.0E-11'. |
Stochastic ranking evolution strategy (SRES)
The Evolution Strategy using Stochastic Ranking [2] is a (μ λ)-ES evolutionary optimization algorithm that uses stochastic ranking as the constraint handling technique. In the (μ λ)-ES algorithm, the individual i is a set of real-valued vectors (xi, σi) for all i = 1,...,λ. The initial population of x is generated according to a uniform n-dimensional probability distribution over the search space S. Let δx be an approximate measure of the expected distance to the global optimum, then the initial setting for the "mean step sizes" should be
σi, j(0) = δxj / n½ ≈ (lj − rj) / n½, lj ≤ xj ≤ rj, i = 1,...,λ; j = 1,...,n,
where σi, j denotes the jth component of the vector σi. We use these initial values as upper bounds on σ. Following the bubble-sort-like procedure is used to rank the individuals in a population, and the best (highest ranked) μ individuals out of λ are selected for the next generation. The truncation level is set at μ / λ ≈ 1/7. Variation of strategy parameters is performed before the modification of objective variables. We generate λ new strategy parameters from μ old ones so that we can use the λ new strategy parameters in generating λ offspring later. The "mean step sizes" are updated according to the log-normal update rule: i = 1,...,μ; h = 1,...,λ; j = 1,...,n,
σh, j(g+1) = 1/2 (σh, j(g) + σk, j(g)) exp(τ′ N(0,1) + τ Nj(0,1)) (1)
where N(0,1) is a normally distributed one-dimensional random variable with an expectation of 0 and variance 1 and k = 1,...,μ is an index generated at random and anew for each j. The "learning rates" τ and τ′ are set equal to (4n)−¼ and (2n)−½, respectively. Recombination is performed on the self-adaptive parameters before applying the update rule given by (1). Having varied the strategy parameters, each individual (xi, σi), for all i = 1,...,μ creates λ/μ offspring on average, so that a total of λ offspring are generated
xh, j(g+1) = xh, j(g) + σh, j(g+1) Nj(0,1).
Recombination is not used in the variation of objective variables. When an offspring is generated outside the parametric bounds defined by the problem, the mutation (variation) of the objective variable will be retried until the variable is within its bounds. In order to save computation time, the mutation is retried only ten times and then ignored, learning the object variable in its original state within the parameter bounds.
The following options of the algorithm must be set.
| • | Number of iterations is a positive integer value to determine the number of iterations the algorithm shall evolve the population. The default is '1750'. |
| • | Population size is a positive integer value to determine the number of individuals that survive after each iteration. The default is '20'. |
Global optimization using the DIRECT algorithm (GLBSOLVE)
A deterministic GO method [3] is a version of the DIRECT algorithm [4]. The first step in the DIRECT algorithm is to transform the search space to be the unit hypercube. The function is then sampled at the center-point of this cube. Computing the function value at he center-point instead of doing it at the vertices is an advantage when dealing with problems in higher dimensions. The hypercube is then divided into smaller hyperrectangles whose center-points are also sampled. Instead of using a Lipschitz constant when determining the rectangles o sample next, DIRECT identifies a set of potentially optimal rectangles in each iteration. All potentially optimal rectangles are further divided into smaller rectangles whose center-points are sampled. When no Lipschitz constant is used, there is no natural way of defining convergence (except when the optimal function value is known as in the test problems). Instead, the procedure described above is performed for a predefined number of iterations. Note that this solver does not support parameters constraints.
The following options of the algorithm must be set.
| • | Number of iterations is a positive integer value to determine the number of iterations the algorithm shall divide the hypercube (search space) into smaller hyperrectangles. The default is '1500'. |
Quadratic Hill-climbing
The gradient method [5] for minimazing general function -H(x), where variable x denotes the column vector of variables (x1,...,xn). This method rests on maximizing a quadratic approximation to the function H(x) on a suitably chosen spherical region. Computational technique for maximization take the form of an iterative procedure. The method is specifically designed to work for functions which are not concave and for starting points which are not necessarily near a maximum. Let us denote by Fx the vector of first partial derivatives evaluated at x and by Sx the symmetric
(n x n) matrix of second partial derivatives evaluated at x. Assume that H(x) admits of a second-order Taylor series expansion around a point
a = (a1,...,an):
H(x) ≈ H(a) + (x − a)'Fa + ½(x − a)'Sa(x − a) (1)
where the subscripts indicate the point of evaluation. The iterative procedure for finding the maximum of the function is, given point xp at which Sxp and Fxp are evaluated, to define the next point, xp+1, as the maximum of the quadratic approximation (1) on a spherical region centered at xp. Ideally, the region should be taken as large as possible provided that it is small enough that in the region the quadratic approximation is a satisfactory guide to the actual behavior of the function. The following procedure attempts to approximate this ideal. Two distinct cases arise:
(a) Fxp significantly different from 0. In this event we choose a number
α = λ1 + R||Fxp|| (2)
where λ1 is the largest eigenvalue of Sxp, and R is a positive parameter determined by the following rule. Let ΔH be the actual change in the function due to the proposed Δx and let ΔQ be the corresponding change in the quadratic approximation. Let z = ΔH/ΔQ. If z ≤ 0, the proposed Δx implies overshooting; it is therefore not accepted, R is increased by a factor of 4 and a new (S - αI)-1 is calculated. If z > 0 and close to unity (in practice, if z is between 0.7 and 1.3) R is decreased by multiplying R by a factor of 0.4. If z > 2, R is again increased by a factor of 4. For other values of z (0 ≤ z≤ 0.7 and 1.3 ≤ z ≤ 2) the magnitude of the factor multiplying R is determined by linear interpolation between 0.4 and 4. We now take
xp+1 = xp − (Sxp − α I)-1Fxp or xp+1 = xp − S -1xp Fxp
according to whether α is positive or not. If α = 0 at this point, we generally directly computed the step size necessary to produce a positive α. This typically saved a number of iterations. Now xp+1 is the maximum of the quadratic approximation to the function on a region Bα of radius
||(Sxp − α I)-1Fxp||
with the center at xp.
(b) Fxp is so near 0 that the length of the step taken is within a preset tolerance of 0. Then, if Sxp is negative definite, the process is terminated and xp is accepted as the location of the maximum. If Sxp is not negative definite, we are at a saddle point or at the bottom of a cylindrical valley. A step is taken along the eigenvector corresponding λ1 and the algorithm recycles in the usual manner. One final feature, incorporated for reason of computational efficiency rather than theoretical elegance, was the introduction of a scalar hp into (2), writing it as
xp+1 = xp − hp(Sxp − α I)-1Fxp.
At each step the computation is first performed with hp = 1. If this gives an improvement in H(x), hp is multiplied by a constant, which magnitude is a decreasing function of the absolute value of the angle between the current step and the immediately preceding step. Then the function is examined at the new point so obtained. This process is repeated until the function declines in which event the last step is accepted. It should be noted that these attempts at stretching the step are relatively cheap since they require only an evaluation of the function. This is in contrast to changes in α within each iteration which require reinversion of (Sxp − α I).
The following options of the algorithm must be set.
| • | Calculation accuracy is a positive double value to determine the difference between deviation of simulation result from experimental dataset under parameter values of current calculation step and values of previous calculation step. The default is '1.0E-5'. |
Multi-objective cellular genetic algorithm (MOCell)
This algorithm [6] based on the cellular model of genetic algorithms. In such model the concept of (small) neighborhood is intensively used; this means that an individual may only interact with its nearby neighbors in the breeding loop. The overlapped small neighborhoods of the model help in exploring the search space because the induced slow diffusion of solutions through the population provides a kind of exploration (diversification), while exploitation (intensification) takes place inside each neighborhood by genetic operations. MOCell is an adaptation of a canonical cellular genetic algorithm to the multi-objective field. It uses an external archive to store the non-dominated solutions found during the execution of the algorithm. The main feature characterizing MOCell is that a number of solutions are moved back into the population from the archive after each iteration, replacing randomly selected existing individuals.
The following options of the algorithm must be set.
| • | Iteration limit is a maximum value of iterations. The default is '1500'. |
| • | Population size is a number of the individuals in the population. The default is '20'. |
Multi-objective particle swarm optimization (MOPSO)
The particle swarm optimization (PSO) algorithm, initially proposed by Kennedy and Eberhart [7], is a direct search algorithm based on the simulation of the social behavior of birds within a flock. The swarm is typically modeled by particles in the multidimensional space that have a position and a velocity. These particles fly through hyperspace and have two essential reasoning capabilities: their memory of their own best position and knowledge of the global or their neighborhood's best. Let xi(t) denote the position of particle pi at time step t. The position of pi is then changed by adding a velocity vi(t) to its current position, i.e.
xi(t) = xi(t − 1) + vi(t).
Let gbest is the position of the best particle from the entire swarm and pbest is the position of the neighborhood best that the particle obtained by communicating with a subset of the swarm. In this case, velocity equation is given by
vi(t) = W · vi(t − 1) + C1r1(xpbest − xi(t)) + C2r2(xgbest − xi(t)),
where W is the inertia weight, C1 and C2 are the learning factors (usually defined as constants), and r1, r2 are random values from the interval [0,1]. Here we used a multiple-objective particle swarm optimizers based on the paper of Sierra and Coello [8]. This optimizer allows to take into account parametric constraints and uses a crowding factor for the leaders selection as well as the following mutation operators: an uniform mutation operator, where the variability range allowed for each decision variable is constant over generations, and a non-uniform mutation operator, where such range decreases over time. These operators modify the values of the particle decision variables with a certain probability.
The following options of the algorithm must be set.
| • | Number of iterations is a maximum value of iterations. The default is '500'. |
| • | Number of particles is a number of the particles in a swarm. The default is '20'. |
References
| 1. | L Ingber (1996), Adaptive simulated annealing (ASA): Lessons learned, Control and Cybernetics, 25(1):33-54. |
| 2. | TP Runarsson, X Yao (2000), Stochastic Ranking for Constrained Evolutionary Optimization, IEEE Transactions on Evolutionary Computation, 4(3):284-294. |
| 3. | M Björkman, K Holmström (1999), Global Optimization Using the DIRECT Algorithm in Matlab, AMO, 1(2):17-37. |
| 4. | DR Jones (2001), DIRECT global optimization algorithm. in: Encyclopedia of optimization, 431-440. |
| 5. | SM Goldfeld, RE Quandt, HF Trotter (1965), Maximization by Quadratic Hill-climbing, Econometric Research Program, Research Memorandum 72. |
| 6. | AJ Nebro, JJ Durillo, F Luna, B Dorronsoro and E Alba (2009), A Cellular Genetic Algorithm for Multiobjective Optimization, International Journal of Intelligent Systems, 24(7): 726-746. |
| 7. | J Kennedy, RC Eberhart (1995). Particle Swarm Optimization, Proceedings of the 1995 IEEE International Conference on Neural Networks, Piscataway, New Jersey, IEEE Service Center, 1942-1948. |
| 8. | MR Sierra, CA Coello Coello (2005), Improving PSO-Based Multi-objective Optimization Using Crowding, Mutation and e-Dominance, Evolutionary Multi-Criterion Optimization, 3410:505-519, Springer Berlin/Heidelberg. |