In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
and
machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
, population-based incremental learning (PBIL) is an
optimization algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
, and an
estimation of distribution algorithm
''Estimation of distribution algorithms'' (EDAs), sometimes called ''probabilistic model-building genetic algorithms'' (PMBGAs), are stochastic optimization methods that guide the search for the optimum by building and sampling explicit probabilis ...
. This is a type of
genetic algorithm
In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to gene ...
where the
genotype
The genotype of an organism is its complete set of genetic material. Genotype can also be used to refer to the alleles or variants an individual carries in a particular gene or genetic location. The number of alleles an individual can have in a ...
of an entire population (
probability
Probability is the branch of mathematics concerning numerical descriptions of how likely an Event (probability theory), event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and ...
vector) is evolved rather than individual members.
[
] The algorithm is proposed by Shumeet Baluja in 1994. The algorithm is simpler than a standard genetic algorithm, and in many cases leads to better results than a standard genetic algorithm.
Algorithm
In PBIL, genes are represented as real values in the range
,1 indicating the probability that any particular
allele
An allele (, ; ; modern formation from Greek ἄλλος ''állos'', "other") is a variation of the same sequence of nucleotides at the same place on a long DNA molecule, as described in leading textbooks on genetics and evolution.
::"The chro ...
appears in that
gene
In biology, the word gene (from , ; "...Wilhelm Johannsen coined the word gene to describe the Mendelian units of heredity..." meaning ''generation'' or ''birth'' or ''gender'') can have several different meanings. The Mendelian gene is a ba ...
.
The PBIL algorithm is as follows:
# A population is generated from the probability vector.
# The fitness of each member is evaluated and ranked.
# Update population genotype (probability vector) based on fittest individual.
# Mutate.
# Repeat steps 1–4
Source code
This is a part of source code implemented in
Java
Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
. In the paper, learnRate = 0.1, negLearnRate = 0.075, mutProb = 0.02, and mutShift = 0.05 is used. N = 100 and ITER_COUNT = 1000 is enough for a small problem.
public void optimize() {
final int totalBits = getTotalBits();
final double[] probVec = new double[totalBits];
Arrays.fill(probVec, 0.5);
bestCost = POSITIVE_INFINITY;
for (int i = 0; i < ITER_COUNT; i++) {
// Creates N genes
final boolean[][] genes = new totalBits];
for (boolean[] gene : genes) {
for (int k = 0; k < gene.length; k++) {
if (rand_nextDouble() < probVec[k])
gene[k] = true;
}
}
// Calculate costs
final double[] costs = new double[N];
for (int j = 0; j < N; j++) {
costs = costFunc.cost(toRealVec(genes[j], domains));
}
// Find min and max cost genes
boolean[] minGene = null, maxGene = null;
double minCost = POSITIVE_INFINITY, maxCost = NEGATIVE_INFINITY;
for (int j = 0; j < N; j++) {
double cost = costs
if (minCost > cost) {
minCost = cost;
minGene = genes
}
if (maxCost < cost) {
maxCost = cost;
maxGene = genes
}
}
// Compare with the best cost gene
if (bestCost > minCost) {
bestCost = minCost;
bestGene = minGene;
}
// Update the probability vector with max and min cost genes
for (int j = 0; j < totalBits; j++) {
if (minGene maxGene {
probVec = probVec * (1d - learnRate) +
(minGene ? 1d : 0d) * learnRate;
} else {
final double learnRate2 = learnRate + negLearnRate;
probVec = probVec * (1d - learnRate2) +
(minGene ? 1d : 0d) * learnRate2;
}
}
// Mutation
for (int j = 0; j < totalBits; j++) {
if (rand.nextDouble() < mutProb) {
probVec = probVec * (1d - mutShift) +
(rand.nextBoolean() ? 1d : 0d) * mutShift;
}
}
}
}
See also
*
Estimation of distribution algorithm
''Estimation of distribution algorithms'' (EDAs), sometimes called ''probabilistic model-building genetic algorithms'' (PMBGAs), are stochastic optimization methods that guide the search for the optimum by building and sampling explicit probabilis ...
(EDA)
*
Learning Classifier System
Learning classifier systems, or LCS, are a paradigm of rule-based machine learning methods that combine a discovery component (e.g. typically a genetic algorithm) with a learning component (performing either supervised learning, reinforcement lear ...
(LCS)
References
Genetic algorithms
Articles with example Java code