"Nature's algorithm is simple — vary, select, repeat. Genetic algorithms prove that this is enough to solve problems of extraordinary complexity. — Claude"- Claude 2026
How borrowing Darwin's logic — vary, select, repeat — produces a surprisingly powerful family of optimization algorithms.
Genetic algorithms are one example of an AI-driven optimization method. Others include optimizing artificial neural networks — using an algorithm like Adam to tune the weights — and optimizing reinforcement learning models, where the agent's decision-making policy is what is being optimized.
By the end of this page you should be able to:
In 1859 Darwin observed that individuals with traits better suited to their environment tend to survive and reproduce more — driving gradual improvement across generations. In 1975, John Holland translated this idea into a computational algorithm. A genetic algorithm (GA) is an optimization algorithm that works by maintaining a pool of candidate solutions (called the population) and improving them over successive rounds. At each round, poor solutions are discarded and new ones are generated by combining and slightly varying the better ones — just as evolution keeps traits that help and discards those that don't. Each round is called a generation.
The terminology is borrowed directly from biology and used throughout the field of evolutionary computation. The table below maps each biological concept to its equivalent component in the optimization algorithm — from this point forward, all of these terms refer exclusively to parts of the algorithm, not to biology:
A chromosome is therefore simply a data structure — a string of values — that encodes one candidate solution to the problem being solved. A population is the working set of solutions the algorithm is currently evaluating. Selection is a computational procedure that filters which solutions are carried forward. Crossover and mutation are operators that produce new candidate solutions from existing ones. The biology is the inspiration; the implementation is entirely algorithmic.
Encoding is the process of representing a candidate solution as a string of values that the algorithm can store and manipulate. Before a GA can run, the problem must be translated: every possible solution in the real world must map to a string, and every string must map back to a valid solution. The encoding type must be chosen to match the structure of the problem — the right choice makes the algorithm's operations natural and meaningful.
Four encoding types cover most problems:
For a full walkthrough of all encoding types with interactive examples, see Obitko — GA Encoding. GeeksforGeeks also has a concise reference: Encoding Methods in Genetic Algorithms.
A score (also called a fitness score) is a single number that tells the algorithm how good a candidate solution is. The function that computes it — called the scoring function or fitness function — takes an encoded solution, evaluates it against the goal of the problem, and returns that number. The algorithm uses scores to decide which solutions to keep, which to discard, and which to use as the basis for generating new ones. A score does not have to come from a formula — it can be the result of a simulation, a test run, or any measurable outcome, as long as it reliably reflects solution quality. Whether a higher or lower score is better depends on the problem.
Binary — 0/1 knapsack (maximise total value without exceeding weight limit of 8):
A solution that violates a constraint — here, exceeding the weight limit — receives a score of 0 (or a heavy penalty), making it unlikely to be selected.
Permutation — TSP (minimise total travel distance across 4 cities):
The score is simply the total route distance. Lower is better, so the algorithm minimises this value.
Real-valued — neural network weights (maximise prediction accuracy on a test set):
The network is run against a set of test inputs and the percentage of correct predictions is the score.
Integer/value — job scheduling (minimise the time until all tasks are finished, called the makespan):
Task durations: A=4, B=2, C=5, D=3. The makespan is the longest any single machine has to work. Lower = better.
A GA repeats the same loop until a stopping condition is met — for example, a candidate solution reaches a target quality score (fitness threshold), a set maximum number of rounds (generations) has elapsed, or the best score in the pool has stopped improving (stagnation):
Two properties make GAs useful for a wide range of problems. First, they work with many solutions at once (population-based): rather than following a single path and refining one answer, the algorithm maintains a whole pool of candidates and compares them at every step — this helps avoid getting stuck on a single poor answer. Second, they need only a score for each candidate (making them derivative-free): no formula describing how the problem behaves, no gradient, no mathematical analysis of its structure. As long as you can evaluate how good a solution is, the algorithm can run — which makes it applicable to problems that are too complex or irregular for traditional mathematical methods (non-differentiable, discontinuous, or black-box problems).
Three operators drive the algorithm. Together they balance exploitation (refining the best solutions found so far) against exploration (trying solutions in parts of the problem space not yet examined).
Selection decides which candidate solutions are kept and used to generate the next round. Higher-scoring solutions should be chosen more often — but if only the very best solutions are ever selected, the pool quickly becomes a set of near-identical copies of the same answer (loss of diversity), and the algorithm stops making progress (premature convergence). The goal is to favor good solutions while keeping enough variety to keep improving. Three common strategies:
Elitism is a common addition: the highest-scoring solution found so far is always copied unchanged into the next round, ensuring the best answer the algorithm has found is never discarded.
Crossover is a recombination operator: it takes two selected candidate solutions — called parents in GA terminology, meaning the two encoded solutions chosen as inputs — and combines segments of their encodings to produce new candidate solutions (offspring), which enter the next generation. Applied with probability pc (typically 0.6–0.9), it is the primary mechanism by which useful partial solutions found in different candidates are combined into better overall solutions.
Single-point crossover: a random crossover point divides each parent's encoding; the tails are swapped to produce two children.
Two-point crossover extends this to two cut points, swapping the middle segment. Uniform crossover decides position-by-position (coin flip) which parent's value each position in the new solution takes — higher diversity, but fewer contiguous solution fragments are preserved intact.
Mutation applies small, random changes to a candidate solution's encoding with a low probability pm (typically 0.001–0.01). Its purpose is to introduce variation that recombination alone cannot produce. Without mutation, the algorithm can only combine values already present in the pool; if those values are all similar, it has no way to discover anything genuinely different. Mutation is what prevents the pool from becoming a set of near-identical solutions too early (premature convergence).
Setting pm too high turns the GA into a random search; too low and all solutions may become nearly identical before a good optimum is found — a problem called premature convergence, where the population locks onto a local optimum and stops improving. Mutation rate is a key hyperparameter.
Putting the three operators together, the complete GA loop is concise:
population = initialize(pop_size, chromosome_length) for generation in range(max_generations): fitness_scores = evaluate(population, fitness_function) if converged(fitness_scores): break parents = selection(population, fitness_scores) offspring = crossover(parents, p_c) offspring = mutation(offspring, p_m) population = form_next_generation(parents, offspring, elitism=True) return best(population, fitness_scores)
The fitness function (formally written f: chromosome → ℝ, and also called the objective function) is the scoring rule that tells the algorithm how good each candidate solution is — a single number that represents solution quality. It is the only connection between the algorithm and the problem being solved. The GA has no other information about the problem: no rules, no shortcuts, no understanding of why one solution is better than another. It only sees scores. Everything it learns about which solutions are good comes entirely from those scores.
Designing the fitness function is often the hardest part of applying a GA. Three properties matter:
Plotting the fitness score over the full search space — with each point representing one encoded candidate solution — creates a fitness landscape: a surface of peaks (high-fitness solutions) and valleys (low-fitness solutions). The algorithm's population roams this landscape, and selection steers it toward peaks. The landscape's shape determines how easy or hard the search will be:
One clear peak, no local optima — almost any method works. GAs are rarely needed here.
Many peaks of varying height. Crossover and mutation keep the pool varied, reducing the risk of settling on a good-but-not-best peak.
Local cues point away from the global optimum. Partial patterns that look promising lead, when combined, to a poor solution. The hardest type for GAs.
Selective pressure describes how strongly the quality scores influence which solutions are kept. High pressure means only the top-scoring solutions make it through — progress is fast but the algorithm risks locking in on a good-but-not-best answer (a local optimum) and never escaping it (premature convergence). Low pressure means even weaker solutions are regularly kept — the pool stays varied (maintaining diversity) and keeps exploring, but progress is slow. Tuning the selection method, pool size, and mutation rate to find the right balance is one of the core challenges in applying GAs.
Holland's Schema Theorem (1975) provides the theoretical foundation for why GAs work. A schema is a template that describes a subset of chromosomes — for example, the pattern 1##0#1 matches any binary string with 1 in position 1, 0 in position 4, and 1 in position 6. Schemas with above-average fitness, few defined positions (low order), and defined positions close together (short defining length) grow exponentially across generations. Holland argued that GAs implicitly evaluate an exponential number of schemas in parallel — a phenomenon he called implicit parallelism — which explains their efficiency despite the enormous size of most search spaces.
1##0#1