Bayesian Heuristic Approach
Abstract
Results of complexity theory show the limitations of exact analysis. That explains popularity of heuristic algorithms. It is well known that efficiency of heuristics depends on the parameters. Thus we need some automatic procedures for tuning the heuristics. That helps comparing results of different heuristics. This enhance their efficiency, too.
In this paper the theory and applications of Bayesian Heuristic Approach are discussed. Examples of the Bayesian approach to automated tuning of heuristics are investigated.
All the algorithms are implemented as platform independent Java applets or servlets. Readers can easily verify and apply the results for studies and for real life optimization models.
The information is on the main website:
and four mirrors.
Optimization, Bayesian, Tuning Heuristics, Distance Studies
The efficiency of metaheuristics depends on parameters. Often this relation is defined by statistical simulation and have many local minima. Therefore methods of stochastic global optimization are needed to optimize the parameters.
The traditional numerical analysis considers optimization algorithms that guarantee some accuracy for all functions to be optimized. This includes the exact algorithms. Limiting the maximal error requires a computational effort that often increases exponentially with the size of the problem [1].
An alternative is the average analysis where the expected error is made as small as possible [2]. The average is taken over a set of functions to be optimized. The average analysis is called the Bayesian Approach (BA) [3,4]. Application of BA to optimization of heuristics is called the Bayesian Heuristic Approach (BHA) [5].
In the examples of this paper distances from the global minimum are not known. Thus the efficiency of methods is tested by comparing with average results of the "pure" Monte Carlo. Monte Carlo converges and does not depend on parameters that can be adjusted to a given problem by using some expert knowledge or additional test runs.
Possibilities of application are illustrated by several examples. designed for distance graduate studies in the Internet environment. All the algorithms are implemented as platform independent Java applets or servlets therefore readers can easily verify and apply the results for studies and for real life heuristic optimization problems.
Tuning of heuristics is regarded as an optimization problem. We are looking for such parameters of heuristics that provide best results. Denote the original function to be optimized as . We know just that this function belongs to some family of functions. Thus we cannot define the optimization quality by just a single sample . All the family have to be regarded. Assume that search time is limited and defined as the number of iterations.
Denote by ( ) the results obtained applying times a heuristic to a function . Here and , where is the final decision of heuristics after iterations. Formally heuristic after iteration transforms the original function into another function . Here belongs to a family of functions transformed by heuristics . Transformed function shows how the value of the original function obtained applying times the heuristic depends on the heuristic parameters .
Denote results of th iteration as .
Here and
.
Using the same heuristics parameter we obtain different results
depending on which sample function optimize. Thus here is optimization under uncertainty. In the optimization under uncertainty two different approaches are widely used:
the MinMax and the Bayesian.
The MinMax approach minimizes the maximal deviation from the solution. Here the worst case is regarded. The Bayesian approach searches for minimal expected deviation. Thus the average case is investigated. If the family is a small set of well defined functions then the MinMax approach is efficient. Otherwise we regard the average case because the worst case is too bad.
Applying the Bayesian approach
we fix a prior distribution on a set of functions .
We need that defining average results.
A prior distribution is transformed into a posterior
distribution on a subset of the family of .
We can do that by conditional probabilities.
Now we can define a risk function . The risk function shows
how the expected deviation from the solution depends on parameters .
Here
, where is the optimal and is a current
values of heuristic parameters. Owing to linearity of deviation the risk function can be expressed as the difference
.
The second component is a constant associated with the set and
a prior distribution . We see that
does not depend on the decisions therefore we disregard this component. Then we
can define the risk function as the expected value of
In optimization problems theory and software are interconnected. The final results depend on the mathematical theory of optimization and the software implementation. Thus we have to regard them both.
That is just an initial part of the GMJ. Important is to make GMJ open for development by users. Users contribute their own optimization methods in addition to the Bayesian ones. User optimization models are included as GMJ tasks. The results of optimization are represented by GMJ analysis objects. A minimal set of methods, tasks, and analysis objects is implemented by default. The rest depends on users.
Monte Carlo is apparently the simplest method providing
convergence with probability one for any continuous objective function.
Thus we can test efficiency of global optimization methods by comparing
them with Monte Carlo search (3).
The analysis tool of GMJ "Convergence" defines how the best values of objective
functions depend on iteration number.
A method of search can be classified as
efficient only if it converges faster than Monte Carlo.
Other optimization methods change many variables at each iteration.
Here the 'Projection' shows how the density of observations depends on variables .
In efficient methods the density is higher around the global minimum.
Comparing "Projection"
of different methods we see higher densities around global minima
in both Bayesian methods "Exkor" and "Bayes".
The illustration are projections of the Exkor and Bayes methods in Fig. 1. and Fig. 2.


The main objective of BHA is improving any given heuristic by defining the best parameters and the best ``mixtures'' of different heuristics. The examples indicate that heuristic decision rules mixed and adapted by BHA often outperform the best individual heuristics. In addition, BHA provides almost sure convergence. However, the final results of BHA depend on the quality of the specific heuristics including the expert knowledge. Therefore the BHA should be regarded as a tool for enhancing the heuristics but not for replacing them.
Many well known optimization algorithms, such as Genetic Algorithms [14], GRASP [15], and Tabu Search [16], may be regarded as metaheuristics that can be improved using BHA.
Genetic Algorithms [14] is an important "source" of useful stochastic search heuristics. It is well known that the results of the genetic algorithms depend on the mutation and crossover parameters. The Bayesian Heuristic Approach could be used in optimizing those parameters.
In tabu search the issues of identifying best combinations of short and long term memory and best balances of intensification and diversification strategies can be obtained using BHA.
Hence, the Bayesian Heuristics
Approach can be applied to improve various stochastic or heuristic algorithms
of discrete optimization. The proved convergence of a discrete search method is an asset.
If not, then the convergence conditions are provided by tuning the BHA parameters
[17].
We remove the heuristic algorithm out of such nonoptimal decisions by taking decision with probability where is an increasing function of and is a parameter vector. The BA is to optimize the parameters by minimizing the best result obtained applying times the randomized heuristic algorithm .
Optimization of adapts the heuristic algorithm to a given problem. Let us illustrate the parametrization of by three randomization functions: Here the upper index denotes the Monte Carlo component (randomization by the uniform distribution). The index defines the linear component of randomization. The index denotes the pure heuristics with no randomization: if , and , otherwise. Here parameters define the probabilities of using randomization . The optimal may be applied solving different but related problems, too [17]. That is important in the "online" optimization.
The first heuristic is denoted as "Random". Here we select cut elements with equal probabilities. The second heuristic is 'Max Size First'. The third heuristic is 'Min Size First'. The names speaks for themselves. We search for the best mixture of these three heuristics at fixed optimization time. The time parameter is important because the optimal mixture depends on the number of iterations.
A sample of optimization of a twodimensional "guillotine" cut problem are in Fig. 3. Numbers
show how many times the corresopnding heuristic is used.
A sample of optimization of a threedimensional packing problems is
in Fig. 4.


Permutations are made by trying to close the gaps between lectures. The best obtained schedule is recorded after each iteration. Changes to worse schedules are made with some probabilities. That is for improving convergence conditions.
Three different heuristics are regarded. Two heuristics with fixed parameters and the third heuristic with parameters obtained by global stochastic optimization.
Using the first heuristic we keep the current schedule until the better schedule is found. The parameter is the probability to pass by the next teacher. This way we can reach just a sort of local minimum.
A natural first step to provide convergence
to the global minimum
is the Simulated Annealing (SA):
we move from the current schedule
to the permuted schedule
with probability
The difference from the traditional SA is that here we want to improve the average results for some fixed number of iterations . Thus the cooling rate should be regarded, too. A way to do it is by introducing the cooling rate parameter .
This
transforms expression (4)
The third heuristic optimizes all three parameters for a fixed optimization time defined by the number of "internal"^{} iterations . Fig. 5. shows the results of SA optimization in school scheduling. Fig. 6. shows initial and optimized student schedules.


The videoconferencing is regular:
each Friday from 8:00 until 9:30 EET (EEST).
Now the language is Lithuanian because no foreign students are connected.
However, the essential part of the website is in English so the English broadcasts
are possible, too.
The main difficulty for English speaking students apparently are early hours (7:008:30 CET) .
The connection is free. Static IP and NetMeeting is used. Advance notification is assumed. Interactive mode is easy using some standard
Audio and Video tools.
Fig. 7. is a snapshot of videoconferencing
about the Wiener model with
some "online" graphical explanations
in response to students questions.

This document was generated using the LaTeX2HTML translator Version 200221 (1.70)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html local_icons split 3 image_type gif show_section_numbers html_version 4.0 tune4.tex
The translation was initiated by mockus on 20060501