Fundamentals of Genetic Algorithms

"Nature's algorithm is simple — vary, select, repeat. Genetic algorithms prove that this is enough to solve problems of extraordinary complexity. — Claude"- Claude 2026

Fundamentals of Genetic Algorithms

How borrowing Darwin's logic — vary, select, repeat — produces a surprisingly powerful family of optimization algorithms.

Genetic algorithms are one example of an AI-driven optimization method. Others include optimizing artificial neural networks — using an algorithm like Adam to tune the weights — and optimizing reinforcement learning models, where the agent's decision-making policy is what is being optimized.

Learning objectives

By the end of this page you should be able to:

  1. Describe the principles of genetic algorithms and their evolutionary basis.
  2. Apply operators such as selection, crossover, and mutation in genetic algorithms.
  3. Evaluate the role of fitness functions in guiding optimal solutions.

Principles and Evolutionary Basis

In 1859 Darwin observed that individuals with traits better suited to their environment tend to survive and reproduce more — driving gradual improvement across generations. In 1975, John Holland translated this idea into a computational algorithm. A genetic algorithm (GA) is an optimization algorithm that works by maintaining a pool of candidate solutions (called the population) and improving them over successive rounds. At each round, poor solutions are discarded and new ones are generated by combining and slightly varying the better ones — just as evolution keeps traits that help and discards those that don't. Each round is called a generation.

The terminology is borrowed directly from biology and used throughout the field of evolutionary computation. The table below maps each biological concept to its equivalent component in the optimization algorithm — from this point forward, all of these terms refer exclusively to parts of the algorithm, not to biology:

BiologyGenetic algorithm
Individual organismOne candidate solution
ChromosomeThe encoding of a solution (e.g., a bit string)
GeneOne element of that encoding
AlleleThe value a gene holds (e.g., 0 or 1)
PopulationThe current set of candidate solutions
FitnessHow good a candidate solution is — its quality score (fitness score)
Natural selectionSelection operator
ReproductionCrossover and mutation
GenerationOne complete cycle of evaluation → selection → reproduction

A chromosome is therefore simply a data structure — a string of values — that encodes one candidate solution to the problem being solved. A population is the working set of solutions the algorithm is currently evaluating. Selection is a computational procedure that filters which solutions are carried forward. Crossover and mutation are operators that produce new candidate solutions from existing ones. The biology is the inspiration; the implementation is entirely algorithmic.

Encoding

Encoding is the process of representing a candidate solution as a string of values that the algorithm can store and manipulate. Before a GA can run, the problem must be translated: every possible solution in the real world must map to a string, and every string must map back to a valid solution. The encoding type must be chosen to match the structure of the problem — the right choice makes the algorithm's operations natural and meaningful.

Four encoding types cover most problems:

  • Binary: each position holds a 0 or 1. Works naturally for yes/no and combinatorial problems. Example: the 0/1 knapsack problem — each bit indicates whether a particular item is packed (1) or left out (0).
    ItemABCDE
    Packed?10110
  • Permutation: each position holds a distinct element from a fixed set, representing an ordering. Used for sequencing and routing problems. Example: the Traveling Salesman Problem — each string is a sequence of city labels representing a route.
    Stop1st2nd3rd4th5th
    City31425
  • Real-valued: each position holds a floating-point number. Used when the solution is a set of continuous parameters. Example: tuning the weights of a neural network — each position holds one weight value.
    Weightw₁w₂w₃w₄
    Value0.472−1.8300.0910.654
  • Integer / value: each position holds a discrete value from a defined set. Used when solutions involve categories or discrete choices. Example: scheduling — each position represents a machine assigned to a task.
    TaskABCD
    Machine2131

For a full walkthrough of all encoding types with interactive examples, see Obitko — GA Encoding. GeeksforGeeks also has a concise reference: Encoding Methods in Genetic Algorithms.

Scoring

A score (also called a fitness score) is a single number that tells the algorithm how good a candidate solution is. The function that computes it — called the scoring function or fitness function — takes an encoded solution, evaluates it against the goal of the problem, and returns that number. The algorithm uses scores to decide which solutions to keep, which to discard, and which to use as the basis for generating new ones. A score does not have to come from a formula — it can be the result of a simulation, a test run, or any measurable outcome, as long as it reliably reflects solution quality. Whether a higher or lower score is better depends on the problem.

Binary — 0/1 knapsack (maximise total value without exceeding weight limit of 8):

ItemABCDETotal valueTotal weightScore
Value85693
Weight32422
Solution 1: [1,0,1,1,0]239 ❌0 (over limit)
Solution 2: [1,1,0,1,0]227 ✓22 (higher = better)

A solution that violates a constraint — here, exceeding the weight limit — receives a score of 0 (or a heavy penalty), making it unlikely to be selected.

Permutation — TSP (minimise total travel distance across 4 cities):

RouteLeg 1Leg 2Leg 3ReturnScore (total distance)
[1, 3, 2, 4]1→3: 253→2: 152→4: 184→1: 2078 km
[1, 2, 4, 3]1→2: 102→4: 184→3: 83→1: 2561 km (lower = better)

The score is simply the total route distance. Lower is better, so the algorithm minimises this value.

Real-valued — neural network weights (maximise prediction accuracy on a test set):

Solutionw₁w₂w₃w₄Score (accuracy)
Solution A0.472−1.8300.0910.65472%
Solution B0.810−0.4400.3301.20087% (higher = better)

The network is run against a set of test inputs and the percentage of correct predictions is the score.

Integer/value — job scheduling (minimise the time until all tasks are finished, called the makespan):

SolutionA→B→C→D→Machine loadsScore (makespan)
[2, 1, 3, 1]M2M1M3M1M1: 2+3=5, M2: 4, M3: 55
[1, 1, 2, 3]M1M1M2M3M1: 4+2=6, M2: 5, M3: 36 (worse)

Task durations: A=4, B=2, C=5, D=3. The makespan is the longest any single machine has to work. Lower = better.

The Generational Cycle

Flowchart of a genetic algorithm showing initialization, evaluation, selection, crossover, mutation and new generation loop.
The genetic algorithm cycle: candidate solutions are scored, the best-scoring ones are selected, and new candidate solutions are generated from them. Source: GeeksforGeeks

A GA repeats the same loop until a stopping condition is met — for example, a candidate solution reaches a target quality score (fitness threshold), a set maximum number of rounds (generations) has elapsed, or the best score in the pool has stopped improving (stagnation):

Two properties make GAs useful for a wide range of problems. First, they work with many solutions at once (population-based): rather than following a single path and refining one answer, the algorithm maintains a whole pool of candidates and compares them at every step — this helps avoid getting stuck on a single poor answer. Second, they need only a score for each candidate (making them derivative-free): no formula describing how the problem behaves, no gradient, no mathematical analysis of its structure. As long as you can evaluate how good a solution is, the algorithm can run — which makes it applicable to problems that are too complex or irregular for traditional mathematical methods (non-differentiable, discontinuous, or black-box problems).

Selection, Crossover, and Mutation

Three operators drive the algorithm. Together they balance exploitation (refining the best solutions found so far) against exploration (trying solutions in parts of the problem space not yet examined).

Selection

Selection decides which candidate solutions are kept and used to generate the next round. Higher-scoring solutions should be chosen more often — but if only the very best solutions are ever selected, the pool quickly becomes a set of near-identical copies of the same answer (loss of diversity), and the algorithm stops making progress (premature convergence). The goal is to favor good solutions while keeping enough variety to keep improving. Three common strategies:

  • Fitness-proportionate selection (roulette wheel): each candidate is given a selection chance proportional to its score as a share of the total. For a pool with scores 4, 2, 6 (total = 12), the chances of being selected are 4-in-12, 2-in-12, and 6-in-12. Candidates are randomly drawn using these chances to fill the next round. The drawback: if one solution scores far higher than all others, it is drawn almost every time and all variety is lost.
  • Tournament selection: pick k candidate solutions at random and carry forward only the highest-scoring one. Repeat until the next round is filled. Larger values of k favor high-scoring solutions more strongly (increasing selection pressure); smaller values give lower-scoring ones more of a chance.
  • Rank selection: sort all candidates by score and assign selection chances by rank position rather than by the raw score itself. This prevents any single outlier score from dominating when the scores across the pool vary widely.

Elitism is a common addition: the highest-scoring solution found so far is always copied unchanged into the next round, ensuring the best answer the algorithm has found is never discarded.

Crossover (Recombination)

Crossover is a recombination operator: it takes two selected candidate solutions — called parents in GA terminology, meaning the two encoded solutions chosen as inputs — and combines segments of their encodings to produce new candidate solutions (offspring), which enter the next generation. Applied with probability pc (typically 0.6–0.9), it is the primary mechanism by which useful partial solutions found in different candidates are combined into better overall solutions.

Single-point crossover: a random crossover point divides each parent's encoding; the tails are swapped to produce two children.

Parent 11 1 0 10 0 1
Parent 20 1 1 01 0 0
Child 11 1 0 11 0 0
Child 20 1 1 00 0 1

Two-point crossover extends this to two cut points, swapping the middle segment. Uniform crossover decides position-by-position (coin flip) which parent's value each position in the new solution takes — higher diversity, but fewer contiguous solution fragments are preserved intact.

Mutation

Mutation applies small, random changes to a candidate solution's encoding with a low probability pm (typically 0.001–0.01). Its purpose is to introduce variation that recombination alone cannot produce. Without mutation, the algorithm can only combine values already present in the pool; if those values are all similar, it has no way to discover anything genuinely different. Mutation is what prevents the pool from becoming a set of near-identical solutions too early (premature convergence).

  • Bit-flip: iterate through each position in the encoded solution and flip its value (0→1 or 1→0) with probability pm.
  • Swap: choose two positions in the encoding and exchange their values.
  • Inversion: select a sub-sequence of the encoding and reverse it.

Setting pm too high turns the GA into a random search; too low and all solutions may become nearly identical before a good optimum is found — a problem called premature convergence, where the population locks onto a local optimum and stops improving. Mutation rate is a key hyperparameter.

Bit-flip and swap mutation diagrams showing how individual genes are changed.
Bit-flip and swap mutation. Source: Towards Data Science

The Algorithm

Putting the three operators together, the complete GA loop is concise:

population = initialize(pop_size, chromosome_length)

for generation in range(max_generations):
    fitness_scores = evaluate(population, fitness_function)

    if converged(fitness_scores):
        break

    parents  = selection(population, fitness_scores)
    offspring = crossover(parents, p_c)
    offspring = mutation(offspring, p_m)
    population = form_next_generation(parents, offspring, elitism=True)

return best(population, fitness_scores)

Fitness Functions

The fitness function (formally written f: chromosome → ℝ, and also called the objective function) is the scoring rule that tells the algorithm how good each candidate solution is — a single number that represents solution quality. It is the only connection between the algorithm and the problem being solved. The GA has no other information about the problem: no rules, no shortcuts, no understanding of why one solution is better than another. It only sees scores. Everything it learns about which solutions are good comes entirely from those scores.

What Makes a Good Fitness Function

Designing the fitness function is often the hardest part of applying a GA. Three properties matter:

  • Evaluable: the scoring function must be computable for every possible candidate solution in a reasonable amount of time. Since it runs once for every candidate at every generation, a slow scoring function dramatically slows the whole algorithm.
  • Discriminating: the function must assign meaningfully different scores to different solutions. If most candidates receive the same score, selection has nothing to act on and the algorithm makes no real progress.
  • Aligned: maximising the fitness value must truly mean solving the problem better. Misaligned fitness functions — ones that reward a proxy rather than the true goal — are a common and subtle source of failure.

Fitness Landscape

Plotting the fitness score over the full search space — with each point representing one encoded candidate solution — creates a fitness landscape: a surface of peaks (high-fitness solutions) and valleys (low-fitness solutions). The algorithm's population roams this landscape, and selection steers it toward peaks. The landscape's shape determines how easy or hard the search will be:

Smooth unimodal

One clear peak, no local optima — almost any method works. GAs are rarely needed here.

Rugged multimodal

Many peaks of varying height. Crossover and mutation keep the pool varied, reducing the risk of settling on a good-but-not-best peak.

Deceptive

Local cues point away from the global optimum. Partial patterns that look promising lead, when combined, to a poor solution. The hardest type for GAs.

Selective Pressure

Selective pressure describes how strongly the quality scores influence which solutions are kept. High pressure means only the top-scoring solutions make it through — progress is fast but the algorithm risks locking in on a good-but-not-best answer (a local optimum) and never escaping it (premature convergence). Low pressure means even weaker solutions are regularly kept — the pool stays varied (maintaining diversity) and keeps exploring, but progress is slow. Tuning the selection method, pool size, and mutation rate to find the right balance is one of the core challenges in applying GAs.

The Schema Theorem

Holland's Schema Theorem (1975) provides the theoretical foundation for why GAs work. A schema is a template that describes a subset of chromosomes — for example, the pattern 1##0#1 matches any binary string with 1 in position 1, 0 in position 4, and 1 in position 6. Schemas with above-average fitness, few defined positions (low order), and defined positions close together (short defining length) grow exponentially across generations. Holland argued that GAs implicitly evaluate an exponential number of schemas in parallel — a phenomenon he called implicit parallelism — which explains their efficiency despite the enormous size of most search spaces.

Tools & Tutorials

Further reading

→ This page was created with help from Claude AI.