Evolutionary algorithms are widely used search and optimization procedures that, when properly designed, can solve otherwise intractable problems in tractable polynomial time. Efficiency enhancements then take them from tractable to practical.

In this paper we show preliminary results for two efficiency enhancements proposed for the extended compact genetic algorithm (eCGA). First, a model-building enhancement reduced the complexity of that process from O(n^{3}) to O(n^{2}), speeding up the algorithm by a factor of 1000 on a 4096-bit problem. Then, local-search hybridization reduced the required population size by at least 32 times, cutting the memory and running time of the algorithm. These results are first steps toward a competent and efficient genetic algorithm.
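As an illustration of the model-building step the enhancement targets, here is a toy Python sketch of eCGA-style marginal product model (MPM) building: variable groups are greedily merged whenever the merge lowers a simple MDL score. All names are ours, and the naive pairwise search shown is the kind of bottleneck the enhancement addresses; this is not the paper's implementation.

```python
import itertools
import math
import random

def mdl_score(partition, pop):
    # Model complexity + compressed population complexity of a marginal
    # product model over a binary population (lower is better).
    n = len(pop)
    score = 0.0
    for group in partition:
        # model complexity: (2^k - 1) frequencies, ~log2(n + 1) bits each
        score += (2 ** len(group) - 1) * math.log2(n + 1)
        # compressed population complexity: n * empirical joint entropy
        counts = {}
        for ind in pop:
            key = tuple(ind[i] for i in group)
            counts[key] = counts.get(key, 0) + 1
        for c in counts.values():
            score -= n * (c / n) * math.log2(c / n)
    return score

def greedy_mpm(pop, length):
    # Start from singleton groups; repeatedly merge a pair of groups
    # whenever the merge lowers the MDL score (first improvement).
    partition = [(i,) for i in range(length)]
    best = mdl_score(partition, pop)
    improved = True
    while improved:
        improved = False
        for a, b in itertools.combinations(range(len(partition)), 2):
            trial = [g for i, g in enumerate(partition) if i not in (a, b)]
            trial.append(partition[a] + partition[b])
            s = mdl_score(trial, pop)
            if s < best:
                best, partition, improved = s, trial, True
                break
    return partition
```

On a population where two bits are perfectly correlated, the procedure merges them into one group while leaving independent bits as singletons.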

This paper reviews a competent Pittsburgh LCS that automatically mines important substructures of the underlying problems, taking problems that were intractable with first-generation Pittsburgh LCSs and rendering them tractable. Specifically, we propose the χ-ary extended compact classifier system (χeCCS), which uses (1) a competent genetic algorithm (GA) in the form of the χ-ary extended compact genetic algorithm, and (2) a niching method in the form of restricted tournament replacement, to evolve a set of maximally accurate and maximally general rules. Besides showing that linkage exists in the multiplexer problem, and that χeCCS scales exponentially with the number of address bits (building block size) and quadratically with the problem size, this paper also explores non-traditional rule encodings. Gene expression encodings, such as the Karva language, can also be used to build χeCCS probabilistic models. However, results show that the traditional ternary encoding {0, 1, #} presents better scalability than the gene-expression-inspired ones.

Brainstorming is widely used to generate large numbers of ideas by drawing on the varied knowledge of its participants. However, brainstorming does not always work well because of spatial and communication limitations, and brainstorming techniques offer limited scalability. Meanwhile, genetic algorithms have mostly been regarded as an engineering or technological tool. The innovation intuition, however, suggests that genetic algorithms may also be regarded as models of human innovation and creativity. This paper focuses on online creativity sessions. Modeling these creative efforts using selecto-recombinative mechanisms can provide three times more novel ideas, increase the posting frequency by a factor of 2.6, and help overcome superficiality in online communications by favoring synthetic thinking.

**Abstract:**

In this paper we report on the effect of viral infection with tropism on the formation of building blocks in genetic operations. In previous research, we applied genetic algorithms to the analysis of time-series signals with noise. We demonstrated the possibility of reducing the number of required entities and improving the rate of convergence when searching for a solution by having some of the host chromosomes harbor viruses with a tropism function. Here, we simulate problems having both multimodality and deceptiveness features and problems that include noise as test functions, and show that viral infection with tropism can increase the proportion of building blocks in the population when it cannot be assumed that a necessary and sufficient number of entities are available to find a solution. We show that this capability is especially noticeable in problems that include noise.

**Abstract:**

Estimation of distribution algorithms (EDAs) are stochastic optimization techniques that explore the space of potential solutions by building and sampling explicit probabilistic models of promising candidate solutions. While the primary goal of applying EDAs is to discover the global optimum, or at least an accurate approximation of it, any EDA additionally provides a sequence of probabilistic models that in most cases hold a great deal of information about the problem. Although using problem-specific knowledge has been shown to significantly improve the performance of EDAs and other evolutionary algorithms, this readily available source of problem-specific information has been practically ignored by the EDA community. This paper takes a first step toward using the probabilistic models obtained by EDAs to speed up the solution of similar problems in the future. More specifically, we propose two approaches to biasing model building in the hierarchical Bayesian optimization algorithm (hBOA) based on knowledge automatically learned from previous hBOA runs on similar problems. We show that the proposed methods lead to substantial speedups and argue that they should work well in other applications that require solving a large number of problems with similar structure.
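To make the idea of biasing model building concrete, the following sketch (our own construction, not the paper's code) counts how often each dependency edge appeared in networks learned in previous runs and converts that into a soft log-prior bonus added to the usual scoring-metric gain during greedy network construction:

```python
import math
from collections import Counter

def edge_prior_bonus(previous_edge_sets, kappa=2.0):
    # previous_edge_sets: one set of directed edges (parent, child) per
    # earlier run.  Frequently seen edges get a larger (less negative)
    # log-prior bonus; counts are Laplace-smoothed.
    runs = len(previous_edge_sets)
    counts = Counter(e for s in previous_edge_sets for e in s)
    return {e: kappa * math.log((c + 1) / (runs + 2))
            for e, c in counts.items()}

def biased_gain(data_gain, edge, bonus, unseen_bonus):
    # Score used during greedy network construction: the usual metric
    # gain from the data plus the bias learned from previous runs.
    return data_gain + bonus.get(edge, unseen_bonus)
```

Edges that recur across runs thus need less evidence from the current data to be added to the model, which is one plausible way to realize the speedups the abstract reports.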

**Abstract:**

This paper proposes the incremental Bayesian optimization algorithm (iBOA), which modifies standard BOA by removing the population of solutions and using incremental updates of the Bayesian network. iBOA is shown to be able to learn and exploit unrestricted Bayesian networks using incremental techniques for updating both the structure as well as the parameters of the probabilistic model. This represents an important step toward the design of competent incremental estimation of distribution algorithms that can solve difficult nearly decomposable problems scalably and reliably.
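A minimal sketch of the incremental-update idea, assuming a PBIL-style learning rate (the class, names, and update rule are illustrative, not iBOA's actual equations): each network node stores conditional probabilities that are shifted toward every selected "winner" solution instead of being re-estimated from a stored population.

```python
class IncrementalCPT:
    """Incrementally updated conditional probability table for one node
    of a Bayesian network: no population is kept; each selected solution
    nudges the stored distribution toward itself."""

    def __init__(self, parents, node, eta=0.05):
        self.parents, self.node, self.eta = parents, node, eta
        self.table = {}  # parent assignment -> P(node = 1)

    def prob(self, assignment):
        key = tuple(assignment[p] for p in self.parents)
        return self.table.get(key, 0.5)   # uninformative default

    def update(self, winner):
        key = tuple(winner[p] for p in self.parents)
        p = self.table.get(key, 0.5)
        # incremental shift toward the winner's value for this node
        self.table[key] = (1 - self.eta) * p + self.eta * winner[self.node]
```

The same incremental principle would also have to cover structure updates, which is the harder part the paper addresses; this sketch shows only the parameter side.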

**Abstract:**

Effective and efficient multiscale modeling is essential to advance both the science and synthesis in a wide array of fields such as physics, chemistry, materials science, biology, biotechnology and pharmacology. This study investigates the efficacy and potential of using genetic algorithms for multiscale materials modeling and addresses some of the challenges involved in designing competent algorithms that solve hard problems quickly, reliably and accurately. In particular, this thesis demonstrates the use of genetic algorithms (GAs) and genetic programming (GP) in multiscale modeling with the help of two non-trivial case studies in materials science and chemistry.

The first case study explores the utility of genetic programming (GP) in multi-timescaling alloy kinetics simulations. In essence, GP is used to bridge molecular dynamics and kinetic Monte Carlo methods to span orders of magnitude in simulation time. Specifically, GP symbolically regresses an inline barrier function from a limited set of molecular dynamics simulations, enabling kinetic Monte Carlo runs that simulate seconds of real time. Results on a non-trivial example of vacancy-assisted migration on a surface of a face-centered cubic (fcc) copper-cobalt (CuxCo1-x) alloy show that GP predicts all barriers with 0.1% error while requiring explicit calculations for less than 3% of active configurations, independent of the type of potentials used to obtain the learning set of barriers via molecular dynamics. The resulting method enables a 2–9 orders-of-magnitude increase in real-time dynamics simulations while taking 4–7 orders of magnitude less CPU time.
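For readers unfamiliar with the kinetic Monte Carlo side of the bridge, a self-contained sketch of one rejection-free KMC step is shown below; the `barriers` dictionary stands in for the values a GP-regressed barrier function would supply, and the constants and names are illustrative:

```python
import math
import random

KB = 8.617e-5   # Boltzmann constant in eV/K
NU = 1e13       # attempt frequency in 1/s (typical order of magnitude)

def kmc_step(barriers, temperature, rng=random):
    # barriers: event -> activation energy in eV.
    # Returns the chosen event and the elapsed-time increment.
    rates = {e: NU * math.exp(-b / (KB * temperature))
             for e, b in barriers.items()}
    total = sum(rates.values())
    # choose an event with probability proportional to its rate
    r = rng.random() * total
    acc, chosen = 0.0, None
    for event, rate in rates.items():
        acc += rate
        if r <= acc:
            chosen = event
            break
    if chosen is None:          # guard against floating-point rounding
        chosen = event
    # time advances by an exponentially distributed increment
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt
```

Because rates depend exponentially on the barrier, even a modest barrier difference makes one event dominate, which is why accurate barrier regression is the crux of the multi-timescaling scheme.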

The second case study presents the application of multiobjective genetic algorithms (MOGAs) in multiscaling quantum chemistry simulations. Specifically, MOGAs are used to bridge high-level quantum chemistry and semiempirical methods to provide an accurate representation of complex molecular excited-state and ground-state behavior. Results on ethylene and benzene, two common building blocks in organic chemistry, indicate that MOGAs produce high-quality semiempirical methods that (1) are stable to small perturbations, (2) yield accurate configuration energies on untested and critical excited states, and (3) yield ab initio quality excited-state dynamics. The proposed method enables simulations of more complex systems to realistic multi-picosecond timescales, well beyond previous attempts or the expectations of human experts, with a 2–3 orders-of-magnitude reduction in computational cost.

While the two applications use simple evolutionary operators, their scalability and limitations have to be investigated in order to tackle more complex systems. The second part of the thesis addresses some of the challenges involved in successfully designing genetic algorithms and genetic programming for multiscale modeling. The first issue addressed is the scalability of genetic programming: facetwise models are built to assess the population size required by GP to ensure an adequate supply of raw building blocks and accurate decision-making between competing building blocks.

This study also presents a design of competent genetic programming, where traditional fixed recombination operators are replaced by building and sampling probabilistic models of promising candidate programs. The proposed scalable GP, called extended compact GP (eCGP), combines the ideas from extended compact genetic algorithm (eCGA) and probabilistic incremental program evolution (PIPE) and adaptively identifies, propagates and exchanges important subsolutions of a search problem. Results show that eCGP scales cubically with problem size on both GP-easy and GP-hard problems.
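To illustrate the PIPE half of the combination, here is a toy sketch (our own, not eCGP's implementation) of sampling expression trees from a depth-indexed probability table, the kind of probabilistic program model that replaces fixed recombination operators:

```python
import random

FUNCTIONS = ('+', '*')      # binary operators
TERMINALS = ('x', '1')

def sample_expr(ppt, rng, depth=0, max_depth=4):
    # ppt: depth -> {symbol: weight}, a PIPE-style probability table.
    table = ppt.get(depth, {s: 1.0 for s in FUNCTIONS + TERMINALS})
    if depth >= max_depth:  # force a terminal once the depth limit is hit
        table = {s: w for s, w in table.items() if s in TERMINALS} or {'x': 1.0}
    symbols = list(table)
    s = rng.choices(symbols, weights=[table[k] for k in symbols])[0]
    if s in FUNCTIONS:
        return (s,
                sample_expr(ppt, rng, depth + 1, max_depth),
                sample_expr(ppt, rng, depth + 1, max_depth))
    return s

def evaluate(expr, x):
    # Evaluate a sampled expression tree at a point x.
    if expr == 'x':
        return x
    if expr == '1':
        return 1
    op, a, b = expr
    return evaluate(a, x) + evaluate(b, x) if op == '+' \
        else evaluate(a, x) * evaluate(b, x)
```

In eCGP the table's structure itself would be learned eCGA-style from promising programs; here the table is fixed by hand purely to show the sampling step.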

Finally, facetwise models are developed to explore the scalability limits of MOGAs, specifically their ability to reliably maintain Pareto-optimal solutions. The results show that even when the building blocks are accurately identified, massive multimodality of the search problem can easily overwhelm the nicher (the diversity-preserving operator) and lead to exponential scale-up. Facetwise models incorporating the combined effects of model accuracy, decision making, and sub-structure supply, as well as the effect of niching on population sizing, predict a limit on the number of sub-structures that can compete across the two objectives before the niching method fails. The results show that when the number of competing building blocks between objectives is below this limit, multiobjective GAs scale up polynomially with problem size on boundedly difficult problems.


**Abstract:**

This report provides documentation for the general-purpose genetic algorithm toolbox for Matlab in C++. The fitness function used by the toolbox is written in Matlab. The toolbox provides different selection, recombination, mutation, niching, and constraint-handling operators. Problems with single and multiple objectives can be solved with the toolbox. Moreover, the toolbox is easily extensible and customizable for incorporating other operators and for solving user-defined search problems.

**Abstract:**

This report provides documentation for the general purpose genetic algorithm toolbox. The toolbox provides different selection, recombination, mutation, niching, and constraint-handling operators. Problems with single and multiple objectives can be solved with the toolbox. Moreover, the toolbox is easily extensible and customizable for incorporating other operators and for solving user-defined search problems.
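Although the report documents the actual toolbox, the pluggable-operator design it describes can be sketched generically. Everything below, including all names, is an illustrative minimal GA with swappable selection, recombination, and mutation operators, not the toolbox's API:

```python
import random

def tournament_selection(pop, fitness, k=2, rng=random):
    # return the fittest of k randomly sampled individuals
    return max(rng.sample(pop, k), key=fitness)

def uniform_crossover(a, b, rng=random):
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def bitflip_mutation(ind, rate, rng=random):
    return [1 - g if rng.random() < rate else g for g in ind]

def run_ga(fitness, length, pop_size=40, generations=60,
           select=tournament_selection, cross=uniform_crossover,
           mutate=bitflip_mutation, rng=random):
    # Any operator can be swapped out via the keyword arguments,
    # mirroring the extensibility the report describes.
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop = [mutate(cross(select(pop, fitness, rng=rng),
                            select(pop, fitness, rng=rng), rng=rng),
                      1.0 / length, rng=rng)
               for _ in range(pop_size)]
    return max(pop, key=fitness)
```

For example, `run_ga(sum, 20)` maximizes OneMax; supplying a different `select` or `cross` callable changes the operator without touching the loop.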

**Abstract:**

This paper focuses on automated procedures to reduce the dimensionality of protein structure prediction datasets by simplifying the way in which the primary sequence of a protein is represented. The potential benefits of this procedure are a faster and easier learning process as well as the generation of more compact and human-readable classifiers. The dimensionality reduction procedure we propose consists of reducing the 20-letter amino acid (AA) alphabet, which is normally used to specify a protein sequence, to a lower-cardinality alphabet. This reduction comes about by clustering AA types according to their physical and chemical similarity. Our automated reduction procedure is guided by a fitness function based on the mutual information between the AA-based input attributes of the dataset and the protein structure feature being predicted.

To search for the optimal reduction, the Extended Compact Genetic Algorithm (ECGA) was used, and the results of this process were then fed into (and validated by) BioHEL, a genetics-based machine learning technique. BioHEL used the reduced alphabet to induce rules for protein structure prediction features, and its results are compared to those of two standard machine learning systems. Our results show that it is possible to reduce the alphabet used for prediction from twenty letters to just three, resulting in more compact, i.e. interpretable, rules. Also, a protein-wise accuracy performance measure suggests that the loss of accuracy accrued by this substantial alphabet reduction is not statistically significant when compared to the full alphabet.
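The mutual-information fitness that guides the reduction can be sketched as follows; the function names and the toy alphabet mapping are illustrative, not the paper's code:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # I(X;Y) in bits, estimated from two parallel symbol sequences.
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        pj = c / n
        mi += pj * math.log2(pj / ((px[x] / n) * (py[y] / n)))
    return mi

def reduction_fitness(sequence, labels, mapping):
    # Score an AA -> cluster mapping by the mutual information between
    # the reduced sequence and the predicted structural feature.
    reduced = [mapping[a] for a in sequence]
    return mutual_information(reduced, labels)
```

A mapping that merges amino acids carrying distinct structural signals destroys mutual information, so a search such as ECGA over candidate mappings is drawn toward reductions that keep the predictive signal.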
