Algorithm Design

Approximation Algorithms

Many of the most important real-world optimization problems — routing, scheduling, packing, covering — are NP-hard. Waiting for an exact solution is not an option. Approximation algorithms offer a principled alternative: polynomial-time procedures with a provable guarantee that their solutions are never too far from optimal. Understanding those guarantees is the core of the field.


The NP-Hard Problem

An optimization problem asks us to find the best solution from a (usually enormous) set of candidates — the shortest tour, the smallest cover, the most valuable packing. For problems in P, exact optimal solutions can be found in polynomial time. For NP-hard optimization problems, no polynomial-time exact algorithm is known, and widely believed complexity theory suggests none exists.

Faced with an NP-hard optimization problem, three responses are available: restrict inputs to tractable special cases, accept exponential runtime for small instances, or give up on optimality and settle for a solution that is provably “close enough.” The third path is the domain of approximation algorithms: efficient procedures that sacrifice exactness in exchange for a guaranteed bound on how far from optimal their output can be.

Key property: Unlike heuristics, approximation algorithms come with a proof. The guarantee holds for every instance, not just on average or in practice. This makes them analytically rigorous tools, not guesses.

[Diagram: the approximation ratio ρ = C / C* for minimisation problems (left), and the approximability spectrum from PTAS through constant-factor and logarithmic ratios to inapproximable problems (right)]
Left: for a minimisation problem, the approximation ratio ρ = C / C* measures how much the algorithm's cost C exceeds the unknown optimal C*. A ρ-approximation guarantees this ratio is at most ρ on every input. Right: NP-hard problems vary enormously in their approximability. Some admit arbitrarily accurate polynomial-time solutions (PTAS); others cannot be approximated within any polynomial factor unless P = NP.

Formal Definition

An algorithm A is a ρ-approximation algorithm for an optimisation problem if, for every input instance I, the cost C of A's output satisfies:

C ≤ ρ ⋅ OPT   (minimisation)     or     C ≥ OPT / ρ   (maximisation)

The ratio ρ ≥ 1 is called the approximation ratio or performance guarantee. It is a worst-case bound that holds for every instance, regardless of size or structure. A smaller ρ is better. When ρ = 1 + ε for any ε > 0 chosen by the user (at the cost of a slower runtime), the algorithm family is called a Polynomial-Time Approximation Scheme (PTAS).

A critical observation: All NP-complete decision problems are equivalent under polynomial reductions, so NP-completeness theory gives us no way to distinguish them in difficulty. Approximation theory reveals fine-grained structure: vertex cover and maximum clique are both NP-hard, yet vertex cover admits a constant-factor approximation while maximum clique is provably inapproximable within any polynomial factor.

Vertex cover: a graph with edges and highlighted vertex cover nodes
A graph instance of Minimum Vertex Cover. The highlighted vertices form a cover — every edge has at least one endpoint in the set. Finding the minimum such set is NP-hard; the matching-based algorithm finds one at most twice the minimum size. Source: GeeksforGeeks — Vertex Cover Problem

The Matching-Based 2-Approximation

Problem: Given an undirected graph G = (V, E), find the smallest set C ⊆ V such that every edge has at least one endpoint in C (a vertex cover). This is NP-hard.

Algorithm: Repeatedly pick any uncovered edge (u, v), add both u and v to the cover, and remove all edges incident to either. Return C when no edges remain. Runtime: O(V + E).

Why it gives a 2-approximation: The edges the algorithm picks form a maximal matching M — no two selected edges share a vertex. Any vertex cover must include at least one endpoint of every edge in M, so OPT ≥ |M|. The algorithm includes both endpoints of each edge in M, giving |C| = 2|M| ≤ 2 ⋅ OPT. The ratio of 2 is the best known for vertex cover; improving it would be a major result.
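The whole procedure is a single pass over the edge list. A minimal Python sketch (the function name and the edge-list representation are our own):

```python
def vertex_cover_2approx(edges):
    """Matching-based 2-approximation for Minimum Vertex Cover.

    Picks an uncovered edge, adds both endpoints, and thereby discards
    every edge they cover.  The picked edges form a maximal matching M,
    so the returned cover has size 2|M| <= 2 * OPT.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:  # edge still uncovered
            cover.add(u)
            cover.add(v)
    return cover
```

On the path 0–1–2–3 the sketch returns all four vertices, while the optimal cover {1, 2} has size 2: the factor of 2 is tight even on tiny instances.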

Optimal TSP route connecting cities on a map
An optimised route through cities solving the Travelling Salesman Problem. For general (non-metric) TSP, no polynomial approximation is possible unless P = NP. Adding the triangle inequality constraint (Metric TSP) makes 1.5-approximation achievable via Christofides' algorithm. Source: Towards Data Science — Solving Geographic TSP with Python

Why TSP Approximability Depends on the Metric

General TSP: For arbitrary edge weights, no polynomial-time ρ-approximation exists for any constant ρ unless P = NP. The proof is a reduction from Hamiltonian cycle: give every edge of the graph weight 1 and every non-edge weight ρ ⋅ n + 1. If a Hamiltonian cycle exists, OPT = n and a ρ-approximation must return a tour of cost at most ρ ⋅ n; if not, every tour uses a non-edge and costs more. The approximation would therefore decide an NP-complete problem in polynomial time.

Metric TSP (where edge weights satisfy the triangle inequality): Christofides' algorithm (1976) achieves a 1.5-approximation in polynomial time. It builds a minimum spanning tree (whose weight is a lower bound on OPT), adds a minimum-weight perfect matching on the tree's odd-degree vertices, traverses the resulting Eulerian circuit, and shortcuts repeated vertices. The 1.5 ratio follows because the matching costs at most OPT/2 and the MST costs at most OPT.
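Christofides' matching step needs a blossom-style solver, but the simpler MST "double-tree" method that it refines already shows the tree-plus-shortcutting idea and guarantees a factor of 2 under the triangle inequality. A sketch (the function name and distance-matrix interface are our own assumptions):

```python
def mst_tsp_2approx(n, dist):
    """Double-tree 2-approximation for Metric TSP.

    Builds a minimum spanning tree (Prim's algorithm), then shortcuts a
    preorder walk of the tree into a tour.  Under the triangle
    inequality the tour costs at most 2 * OPT: the MST costs at most
    OPT, each tree edge is walked at most twice, and shortcutting
    never increases the cost.
    """
    # Prim's MST, recording each vertex's parent.
    in_tree = {0}
    parent = {}
    best = {v: (dist[0][v], 0) for v in range(1, n)}
    while len(in_tree) < n:
        v = min((u for u in range(n) if u not in in_tree),
                key=lambda u: best[u][0])
        parent[v] = best[v][1]
        in_tree.add(v)
        for u in range(n):
            if u not in in_tree and dist[v][u] < best[u][0]:
                best[u] = (dist[v][u], v)
    # Preorder walk of the tree, skipping (shortcutting) repeats.
    children = {v: [] for v in range(n)}
    for v, p in parent.items():
        children[p].append(v)
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour  # visiting order; close the cycle back to tour[0]
```

On four points at the corners of a unit square the sketch happens to return the optimal tour of length 4; the guarantee only promises at most twice that.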

Recent progress: In 2021, Karlin, Klein, and Oveis Gharan improved the ratio to 1.5 − 10⁻³⁶ — the first improvement in nearly 50 years. The current best inapproximability lower bound is 123/122 ≈ 1.008, leaving a gap that remains open.

The Knapsack PTAS

The 0/1 Knapsack problem — given items with weights and values, fill a capacity-W knapsack to maximise value — is NP-hard. However, it admits a Polynomial-Time Approximation Scheme: for any ε > 0 chosen by the user, there is an algorithm that runs in time O(n³/ε) and returns a solution with value at least (1 − ε) ⋅ OPT.

The technique — value rounding: Round all item values down to the nearest multiple of K = ε ⋅ v_max / n, where v_max is the maximum item value and n is the number of items. This reduces the range of distinct values to O(n/ε), making dynamic programming tractable in polynomial time. The rounding introduces at most ε relative error in the optimum.

The trade-off: Smaller ε gives a better approximation but a longer runtime. The dependence on 1/ε is inherent: if the runtime did not grow as ε shrinks, we could choose ε small enough to force an exact answer and solve an NP-hard problem in polynomial time. The existence of a PTAS does not contradict NP-hardness — it means the problem becomes tractable when we accept a small, user-chosen error.
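The rounding scheme can be sketched in a few lines, assuming every item fits on its own (w ≤ W) and using the classic "minimum weight per value" DP (the function name and the choice to return a certified lower bound are our own):

```python
def knapsack_fptas(values, weights, W, eps):
    """Value-rounding approximation scheme for 0/1 Knapsack.

    Scales every value down by K = eps * max(values) / n, then runs the
    DP dp[v] = minimum weight achieving scaled value v.  The best
    reachable scaled value, rescaled by K, is a lower bound on the true
    value of the chosen set and is at least (1 - eps) * OPT
    (assuming every single item fits, i.e. w_i <= W).
    """
    n = len(values)
    K = eps * max(values) / n
    scaled = [int(v / K) for v in values]
    vmax = sum(scaled)
    INF = float("inf")
    dp = [0] + [INF] * vmax            # dp[v] = min weight for scaled value v
    for s, w in zip(scaled, weights):
        for v in range(vmax, s - 1, -1):
            if dp[v - s] + w < dp[v]:
                dp[v] = dp[v - s] + w
    best = max(v for v in range(vmax + 1) if dp[v] <= W)
    return best * K                    # certified lower bound on achieved value
```

With ε = 0.1 on the classic instance (values 60, 100, 120; weights 10, 20, 30; W = 50) the scheme recovers the optimum 220 exactly, since the rounded values happen to lose nothing.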

Problem | Best Known Ratio | Inapproximability Lower Bound | Technique
Minimum Vertex Cover | 2 | 1.36 (under UGC: 2) | Maximal matching
Metric TSP | 1.5 − 10⁻³⁶ | 123/122 ≈ 1.008 | MST + matching (Christofides)
Set Cover | ln n + 1 | (1 − ε) ln n unless P = NP | Greedy (tight)
Knapsack (0/1) | 1 + ε (PTAS) | None (PTAS exists) | Value rounding + DP
Max-3SAT | 7/8 ≈ 0.875 | 7/8 − ε (optimal under PCP) | Random assignment
General TSP | None | Inapproximable for any ρ | —
Maximum Clique | n / 2^O(log n) | n^(1−ε) for any ε unless P = NP | Semidefinite programming
🌳 Greedy Algorithms

Make locally optimal choices at each step. Often yield logarithmic approximations for covering problems (Set Cover: greedily pick the set covering the most uncovered elements). Simple and fast, but the ratio is determined by the problem's structure.
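For Set Cover the greedy rule fits in a few lines (a sketch; the function name and input representation are our own):

```python
def greedy_set_cover(universe, subsets):
    """Greedy Set Cover: repeatedly pick the subset covering the most
    still-uncovered elements.  Achieves a ln(n) + 1 approximation,
    which is essentially the best possible unless P = NP.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(uncovered & set(s)))
        if not uncovered & set(best):
            raise ValueError("universe not coverable by the given subsets")
        chosen.append(best)
        uncovered -= set(best)
    return chosen
```

On the universe {1, …, 5} with subsets {1,2,3}, {2,4}, {3,4}, {4,5}, the greedy rule picks {1,2,3} and then {4,5}, which here is also optimal.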

🔀 LP Relaxation + Rounding

Formulate the problem as an integer linear program, solve the LP relaxation in polynomial time, then round the fractional solution to an integer one. Vertex Cover's LP gives a 2-approximation by rounding every variable ≥ 0.5 up to 1. The integrality gap of the LP bounds the achievable ratio.
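Assuming SciPy's LP solver is available (an assumption about tooling, not part of the original text), the Vertex Cover rounding looks like this:

```python
import numpy as np
from scipy.optimize import linprog

def vertex_cover_lp_rounding(n, edges):
    """LP-relaxation + rounding 2-approximation for Vertex Cover.

    Solves  min sum_v x_v  s.t.  x_u + x_v >= 1 per edge, 0 <= x <= 1,
    then rounds every x_v >= 1/2 up to 1.  Each edge constraint forces
    at least one endpoint to reach 1/2, so the rounded set is a cover,
    and its size is at most 2 * (LP value) <= 2 * OPT.
    """
    c = np.ones(n)
    A = np.zeros((len(edges), n))      # linprog wants <=, so negate
    for i, (u, v) in enumerate(edges):
        A[i, u] = A[i, v] = -1.0
    res = linprog(c, A_ub=A, b_ub=-np.ones(len(edges)),
                  bounds=[(0, 1)] * n, method="highs")
    return {v for v in range(n) if res.x[v] >= 0.5 - 1e-9}
```

On a triangle the LP optimum is x = (0.5, 0.5, 0.5) with value 1.5, so rounding returns all three vertices against an optimum of 2 — a ratio of 1.5, within the guaranteed factor of 2 and illustrating the integrality gap.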

🧵 Primal–Dual Schema

Simultaneously construct a feasible primal solution and a dual solution that witnesses a lower bound on OPT. The approximation ratio follows from comparing the two. Used in Shortest Path, Steiner Tree, and generalised matching problems.
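A compact illustration is the classic primal–dual algorithm for weighted Vertex Cover (our choice of example; the dual variables are per-edge "payments"):

```python
def weighted_vertex_cover_primal_dual(weights, edges):
    """Primal-dual 2-approximation for Weighted Vertex Cover.

    Dual: a payment y_e per edge, with the payments incident to each
    vertex summing to at most its weight.  Raise y_e on each uncovered
    edge until one endpoint's budget is exhausted (its constraint goes
    tight); tight vertices enter the cover.  The cover's total weight
    is at most 2 * sum(y_e), and sum(y_e) <= OPT, giving the factor 2.
    """
    slack = dict(weights)              # remaining dual budget per vertex
    cover = set()
    for u, v in edges:
        if u in cover or v in cover:
            continue                   # edge already covered
        y = min(slack[u], slack[v])    # raise this edge's payment fully
        slack[u] -= y
        slack[v] -= y
        if slack[u] == 0:
            cover.add(u)
        if slack[v] == 0:
            cover.add(v)
    return cover
```

On the path 0–1–2 with weights (3, 1, 3), the first edge's payment exhausts vertex 1's budget, covering both edges with the optimal cover {1}.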

📏 Dynamic Programming + Rounding

Round or scale problem parameters to reduce the state space of DP to polynomial size, introducing controlled error. The basis of PTAS designs for Knapsack, Subset Sum, and Scheduling. The approximation ratio trades off with runtime via the error parameter ε.

The PCP Theorem and inapproximability: The Probabilistically Checkable Proofs theorem (1992–1998) established that for many NP-hard problems, approximating beyond a certain threshold is itself NP-hard. It unified the study of hardness of approximation and showed that some problems are provably hard to approximate to any constant factor — not merely hard to solve exactly.