Evolutionary algorithms are widely used search and optimization procedures that, when properly designed, can solve intractable problems in tractable (polynomial) time. Efficiency enhancements are used to take them from tractable to practical.

In this paper we show preliminary results of two efficiency enhancements proposed for the extended compact genetic algorithm (ECGA). First, a model-building enhancement was used to reduce the complexity of the process from O(n^3) to O(n^2), speeding up the algorithm by a factor of 1000 on a 4096-bit problem. Then, local-search hybridization was used to reduce the population size by a factor of at least 32, reducing the memory and running time required by the algorithm. These results mark the first steps toward a competent and efficient genetic algorithm.
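
The model builder at the heart of ECGA is the natural target for such enhancements. Below is a minimal, illustrative sketch of the standard greedy MDL-driven merge, i.e., the O(n^3)-style baseline the enhancement improves on; the function names and binary encoding are assumptions for illustration, not the paper's code. The enhancement described above amounts to caching the pairwise merge gains so that, after each merge, only the gains involving the newly merged group are recomputed.

```python
import numpy as np
from itertools import combinations

def group_entropy(pop, group):
    """Empirical joint entropy (bits) of the variables in `group`."""
    _, counts = np.unique(pop[:, group], axis=0, return_counts=True)
    p = counts / len(pop)
    return float(-np.sum(p * np.log2(p)))

def combined_complexity(pop, groups):
    """ECGA's MDL metric: model complexity plus compressed population complexity."""
    n = len(pop)
    model = np.log2(n + 1) * sum(2 ** len(g) - 1 for g in groups)
    data = n * sum(group_entropy(pop, g) for g in groups)
    return model + data

def build_mpm(pop):
    """Greedy merge: start with one group per variable and repeatedly apply
    the merge that lowers the MDL score most. Rescoring every pair after
    every merge is the expensive baseline; caching pairwise gains is the
    key idea behind the reported O(n^2) enhancement."""
    groups = [[i] for i in range(pop.shape[1])]
    while True:
        base = combined_complexity(pop, groups)
        best_gain, best_pair = 0.0, None
        for a, b in combinations(range(len(groups)), 2):
            trial = [g for k, g in enumerate(groups) if k not in (a, b)]
            trial.append(groups[a] + groups[b])
            gain = base - combined_complexity(pop, trial)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return groups
        a, b = best_pair
        merged = groups[a] + groups[b]
        groups = [g for k, g in enumerate(groups) if k not in (a, b)] + [merged]
```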

This paper reviews a competent Pittsburgh LCS that automatically mines important substructures of the underlying problem, taking problems that were intractable with first-generation Pittsburgh LCSs and rendering them tractable. Specifically, we propose a χ-ary extended compact classifier system (χeCCS) which uses (1) a competent genetic algorithm (GA) in the form of the χ-ary extended compact genetic algorithm, and (2) a niching method in the form of restricted tournament replacement, to evolve a set of maximally accurate and maximally general rules. Besides showing that linkage exists in the multiplexer problem, and that χeCCS scales exponentially with the number of address bits (building-block size) and quadratically with the problem size, this paper also explores non-traditional rule encodings. Gene expression encodings, such as the Karva language, can also be used to build χeCCS probabilistic models. However, results show that the traditional ternary encoding {0, 1, #} scales better than the gene-expression-inspired ones.
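
As a concrete illustration of the ternary encoding's semantics, the sketch below shows how a {0, 1, #} rule matches inputs of the 6-multiplexer; the helper names are illustrative, not from the paper.

```python
def matches(rule, example):
    """A ternary {0,1,#} rule matches an input when every non-# position agrees."""
    return all(r == '#' or r == e for r, e in zip(rule, example))

def multiplexer(bits, k=2):
    """k-address-bit multiplexer: the k address bits select one of 2^k data bits."""
    return bits[k + int(bits[:k], 2)]

# A maximally general, maximally accurate rule for the 6-multiplexer:
# "if the address is 00 and the selected data bit d0 is 1, predict class 1".
rule = '001###'                     # condition over (a1 a0 d0 d1 d2 d3)
print(matches(rule, '001011'))      # True  -> and indeed multiplexer('001011') == '1'
print(matches(rule, '000111'))      # False -> d0 is 0, so the rule does not apply
```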

This paper presents the real-coded extended compact genetic algorithm (rECGA) for decomposable real-valued optimization problems. Mutual information among real-valued variables is employed to measure variable interactions or dependencies, and variable clustering and aggregation algorithms are proposed to identify the substructures of a problem by partitioning its variables. Then, a Gaussian mixture probability density function is estimated to model the promising individuals of each substructure, and the multivariate Gaussian probability density function is sampled by means of Cholesky decomposition. Finally, experiments on decomposable test functions are conducted. The results show that the rECGA is able to correctly identify the substructures of decomposable problems with linear or nonlinear correlations, and that it achieves good scalability.
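
The Cholesky-based sampling step is standard and easy to make concrete. A minimal sketch, assuming NumPy and illustrative names:

```python
import numpy as np

def sample_gaussian(mean, cov, n, rng=None):
    """Draw n samples from N(mean, cov) via Cholesky decomposition:
    if cov = L L^T and z ~ N(0, I), then mean + L z ~ N(mean, cov)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    L = np.linalg.cholesky(cov)
    z = rng.standard_normal((n, len(mean)))
    return mean + z @ L.T

# One identified substructure with two correlated variables:
mean = np.array([0.0, 1.0])
cov = np.array([[1.0, 0.8],
                [0.8, 2.0]])
samples = sample_gaussian(mean, cov, 100_000)
print(np.cov(samples.T))   # approximately equal to cov, confirming the sampler
```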

The Bayesian optimization algorithm (BOA) uses Bayesian networks to learn linkages between the decision variables of an optimization problem. This paper studies the influence of different selection and replacement methods on the accuracy of linkage learning in BOA. Results on concatenated m-k deceptive trap functions show that model accuracy depends to a large extent on the choice of selection method and to a lesser extent on the replacement strategy used. Specifically, it is shown that linkage learning in BOA is more accurate with truncation selection than with tournament selection. The choice of replacement strategy is important when tournament selection is used, but it is not relevant when using truncation selection. On the other hand, if performance is our main concern, tournament selection and restricted tournament replacement should be preferred. These results aim to provide practitioners with useful information about the best way to tune BOA with respect to structural model accuracy and overall performance.
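
For reference, here are minimal sketches of the two selection schemes compared in the paper (illustrative names, NumPy assumed):

```python
import numpy as np

def truncation_selection(fitness, tau=0.5):
    """Deterministically keep the best tau fraction (returns survivor indices)."""
    n_keep = max(1, int(len(fitness) * tau))
    return np.argsort(fitness)[::-1][:n_keep]

def tournament_selection(fitness, n_parents, s=2, rng=None):
    """s-way tournament: each parent slot is won by the fittest of s
    uniformly drawn individuals (with replacement)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    fitness = np.asarray(fitness)
    contestants = rng.integers(0, len(fitness), size=(n_parents, s))
    winners = np.argmax(fitness[contestants], axis=1)
    return contestants[np.arange(n_parents), winners]
```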

This paper presents a simple real-coded estimation of distribution algorithm (EDA) design using the χ-ary extended compact genetic algorithm (χECGA) and discretization methods. Specifically, the real-valued decision variables are mapped to discrete symbols of user-specified cardinality using discretization methods. The χECGA is then used to build the probabilistic model and to sample a new population from it. The effect of alphabet cardinality and selection pressure on the scalability of the real-coded ECGA (rECGA) method is investigated. The results show that the population size required by rECGA—to successfully solve a class of additively separable problems—scales sub-quadratically with problem size, and that the number of function evaluations scales sub-cubically with problem size. The proposed rECGA is simple, making it amenable to further empirical and theoretical analysis. Moreover, the probabilistic models built in the proposed real-coded ECGA are readily interpretable and can be easily visualized. The proposed algorithm and the results presented in this paper are a first step towards conducting a systematic analysis of real-coded EDAs and towards developing a design theory for scalable and robust real-coded EDAs.
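
The discretization step is simple to illustrate. Below are sketches of two common mappings from reals to χ-ary symbols, fixed-width and fixed-height (equal-frequency) binning; the paper's exact discretizers may differ, and the names here are illustrative.

```python
import numpy as np

def fixed_width_discretize(x, lo, hi, chi):
    """Map reals in [lo, hi] onto chi equal-width bins (symbols 0..chi-1)."""
    return np.clip(((x - lo) / (hi - lo) * chi).astype(int), 0, chi - 1)

def fixed_height_discretize(x, chi):
    """Quantile (equal-frequency) binning: each symbol gets ~len(x)/chi values."""
    ranks = np.argsort(np.argsort(x))   # rank of each value, 0..len(x)-1
    return (ranks * chi) // len(x)

x = np.random.default_rng(0).uniform(-5, 5, size=1000)
print(np.bincount(fixed_height_discretize(x, chi=8)))  # ~125 values per symbol
```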

**Abstract:**

Estimation of distribution algorithms (EDAs) are stochastic optimization techniques that explore the space of potential solutions by building and sampling explicit probabilistic models of promising candidate solutions. While the primary goal of applying EDAs is to discover the global optimum, or at least an accurate approximation of it, any EDA also provides a sequence of probabilistic models, which in most cases hold a great deal of information about the problem. Although using problem-specific knowledge has been shown to significantly improve the performance of EDAs and other evolutionary algorithms, this readily available source of problem-specific information has been practically ignored by the EDA community. This paper takes a first step towards using the probabilistic models obtained by EDAs to speed up the solution of similar problems in the future. More specifically, we propose two approaches to biasing model building in the hierarchical Bayesian optimization algorithm (hBOA) based on knowledge automatically learned from previous hBOA runs on similar problems. We show that the proposed methods lead to substantial speedups and argue that they should work well in other applications that require solving a large number of problems with similar structure.
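
hBOA's actual bias operators act on its decision-tree models and differ from what follows; the sketch below conveys only the generic idea of turning structures learned in previous runs into a soft prior on model building, with hypothetical names throughout.

```python
from collections import Counter

def edge_bias(previous_networks, kappa=1.0):
    """Turn edge frequencies observed in the Bayesian networks of previous
    runs into a score bonus. `previous_networks` is a list of edge sets,
    e.g. [{(0, 1), (2, 3)}, {(0, 1)}]."""
    counts = Counter(edge for net in previous_networks for edge in net)
    return {edge: kappa * c / len(previous_networks) for edge, c in counts.items()}

def biased_gain(base_gain, edge, bias):
    """During greedy network construction, candidate edges seen often in
    earlier runs get a head start; unseen edges are neither rewarded nor
    forbidden."""
    return base_gain + bias.get(edge, 0.0)
```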

**Abstract:**

This paper proposes the incremental Bayesian optimization algorithm (iBOA), which modifies standard BOA by removing the population of solutions and using incremental updates of the Bayesian network. iBOA is shown to be able to learn and exploit unrestricted Bayesian networks using incremental techniques for updating both the structure and the parameters of the probabilistic model. This represents an important step toward the design of competent incremental estimation of distribution algorithms that can solve difficult nearly decomposable problems scalably and reliably.
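
iBOA's full update couples structure and parameter learning; the sketch below illustrates only the parameter side, i.e., nudging a node's conditional probability table toward each new tournament winner without storing a population. The class name and step-size rule are assumptions, in the spirit of PBIL-style incremental updates rather than iBOA's exact equations.

```python
import numpy as np

class IncrementalCPT:
    """One node's conditional probability table, updated incrementally:
    rather than re-estimating probabilities from a stored population, nudge
    the current distribution toward each new winner with step size eta."""
    def __init__(self, n_parent_configs, cardinality, eta=0.02):
        self.p = np.full((n_parent_configs, cardinality), 1.0 / cardinality)
        self.eta = eta

    def update(self, parent_config, winner_value):
        row = self.p[parent_config]            # view: updates self.p in place
        target = np.zeros_like(row)
        target[winner_value] = 1.0
        row += self.eta * (target - row)       # rows remain normalized

# e.g. a binary node with one binary parent:
cpt = IncrementalCPT(n_parent_configs=2, cardinality=2)
cpt.update(parent_config=1, winner_value=0)
print(cpt.p)
```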

**Abstract:**

Efficiency enhancement techniques—such as parallelization and hybridization—are among the most important ingredients of practical applications of genetic and evolutionary algorithms and that is why this research area represents an important niche of evolutionary computation. This paper describes and analyzes sporadic model building, which can be used to enhance the efficiency of the hierarchical Bayesian optimization algorithm (hBOA) and other estimation of distribution algorithms (EDAs) that use complex multivariate probabilistic models. With sporadic model building, the structure of the probabilistic model is updated once in every few iterations (generations), whereas in the remaining iterations, only model parameters (conditional and marginal probabilities) are updated. Since the time complexity of updating model parameters is much lower than the time complexity of learning the model structure, sporadic model building decreases the overall time complexity of model building. The paper shows that for boundedly difficult nearly decomposable and hierarchical optimization problems, sporadic model building leads to a significant model-building speedup, which decreases the asymptotic time complexity of model building in hBOA by a factor of *O(n^{0.26})* to
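
The control flow of sporadic model building is easy to make concrete. Below is a model-agnostic sketch of the schedule; the operator names are placeholders passed in by the caller, not hBOA's API.

```python
def run_eda(initialize, select, learn_structure, fit_parameters, sample,
            n_generations=100, structure_delay=10):
    """Sporadic model building: the expensive structural search runs only
    every `structure_delay` generations, while the cheap re-estimation of
    conditional/marginal probabilities runs every generation. The five
    operators are supplied by the caller, so the loop is model-agnostic."""
    pop = initialize()
    structure = None
    for t in range(n_generations):
        parents = select(pop)
        if t % structure_delay == 0:
            structure = learn_structure(parents)      # costly, done sporadically
        params = fit_parameters(structure, parents)   # cheap, done every step
        pop = sample(structure, params, len(pop))
    return pop
```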

**Abstract:**

Effective and efficient multiscale modeling is essential to advance both the science and synthesis in a wide array of fields such as physics, chemistry, materials science, biology, biotechnology and pharmacology. This study investigates the efficacy and potential of using genetic algorithms for multiscale materials modeling and addresses some of the challenges involved in designing competent algorithms that solve hard problems quickly, reliably and accurately. In particular, this thesis demonstrates the use of genetic algorithms (GAs) and genetic programming (GP) in multiscale modeling with the help of two non-trivial case studies in materials science and chemistry.

The first case study explores the utility of genetic programming (GP) in multi-timescaling alloy kinetics simulations. In essence, GP is used to bridge molecular dynamics and kinetic Monte Carlo methods to span orders of magnitude in simulation time. Specifically, GP symbolically regresses an inline barrier function from a limited set of molecular dynamics simulations, enabling kinetic Monte Carlo simulations that reach seconds of real time. Results on a non-trivial example of vacancy-assisted migration on a surface of a face-centered cubic (fcc) copper-cobalt (CuₓCo₁₋ₓ) alloy show that GP predicts all barriers with 0.1% error from calculations for less than 3% of active configurations, independent of the type of potentials used to obtain the learning set of barriers via molecular dynamics. The resulting method enables a 2–9 order-of-magnitude increase in real-time dynamics simulations while taking 4–7 orders of magnitude less CPU time.

The second case study presents the application of multiobjective genetic algorithms (MOGAs) in multiscaling quantum chemistry simulations. Specifically, MOGAs are used to bridge high-level quantum chemistry and semiempirical methods to provide an accurate representation of complex molecular excited-state and ground-state behavior. Results on ethylene and benzene—two common building blocks in organic chemistry—indicate that MOGAs produce high-quality semiempirical methods that (1) are stable to small perturbations, (2) yield accurate configuration energies on untested and critical excited states, and (3) yield ab initio quality excited-state dynamics. The proposed method enables simulations of more complex systems to realistic multi-picosecond timescales, well beyond previous attempts or the expectations of human experts, with a 2–3 order-of-magnitude reduction in computational cost.

While the two applications use simple evolutionary operators, their scalability and limitations have to be investigated in order to tackle more complex systems. The second part of the thesis addresses some of the challenges involved in successfully designing genetic algorithms and genetic programming for multiscale modeling. The first issue addressed is the scalability of genetic programming, where facetwise models are built to assess the population size required by GP to ensure an adequate supply of raw building blocks and accurate decision-making between competing building blocks.

This study also presents a design of competent genetic programming, where traditional fixed recombination operators are replaced by building and sampling probabilistic models of promising candidate programs. The proposed scalable GP, called extended compact GP (eCGP), combines ideas from the extended compact genetic algorithm (eCGA) and probabilistic incremental program evolution (PIPE), and adaptively identifies, propagates and exchanges important subsolutions of a search problem. Results show that eCGP scales cubically with problem size on both GP-easy and GP-hard problems.

Finally, facetwise models are developed to explore the limits of MOGA scalability, addressing how reliably multiobjective algorithms maintain Pareto-optimal solutions. The results show that even when the building blocks are accurately identified, the massive multimodality of the search problems can easily overwhelm the nicher (the diversity-preserving operator) and lead to exponential scale-up. Facetwise models are developed that incorporate the combined effects of model accuracy, decision making, and substructure supply, together with the effect of niching on population sizing, to predict a limit on the growth rate of the maximum number of substructures that can compete across the two objectives without the niching method failing. The results show that if the number of building blocks competing between the multiple objectives stays below the proposed limit, multiobjective GAs scale up polynomially with problem size on boundedly difficult problems.
