Learning and Memory in Cognitive Computing

"The brain does not store memories like files on a disk — it stores them as patterns of connection strengths, built up gradually by experience and retrieved by partial cues." - Claude 2026

How biological brains and artificial networks both change through experience — and what the similarities and gaps between them reveal about how memory works.

Synaptic plasticity: strengthening (LTP) or weakening (LTD) of a synapse depends on how actively it is used. Source: Alan Woodruff / Queensland Brain Institute, UQ

Learning objectives

By the end of this page you should be able to:

Explain mechanisms of learning and memory in biological and artificial systems.
Describe neural network training techniques supporting learning processes.
Analyze computational models of memory based on cognitive neuroscience.

Learning and Memory in Biological and Artificial Systems

The story of learning in the brain starts at the synapse, the tiny gap between two neurons across which a signal passes. When a synapse is used repeatedly and effectively, it grows stronger: the transmitting neuron releases more signal, or the receiving neuron grows more receptors to catch it. When it is used rarely, it weakens. This activity-dependent change in connection strength, called synaptic plasticity, is widely regarded as the physical basis of memory. Donald Hebb proposed the idea in 1949, and it is now known as Hebb's rule, usually summarized by the later slogan "neurons that fire together wire together": if neuron A repeatedly helps trigger neuron B, the connection between them strengthens.

Hebb's rule (1949), often summarized as "neurons that fire together wire together": if neuron A repeatedly helps trigger neuron B, the synapse between them strengthens. The underlying principle, learning by changing connection strengths, recurs in the learning models on this page.
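As a rough illustration, Hebb's rule can be written as a weight change proportional to the product of presynaptic and postsynaptic activity. The sketch below is a minimal rate-based version, not a biophysical model; the function name `hebbian_update`, the learning rate `lr`, and the two-input setup are assumptions made for the example.

```python
import numpy as np

def hebbian_update(w, pre, post, lr=0.1):
    """Return weights after one Hebbian step: dw[i, j] = lr * post[i] * pre[j]."""
    return w + lr * np.outer(post, pre)

# Hypothetical setup: two presynaptic neurons (A1, A2), one postsynaptic neuron (B).
w = np.zeros((1, 2))
pre = np.array([1.0, 0.0])   # only A1 fires
post = np.array([1.0])       # B fires at the same time

for _ in range(5):
    w = hebbian_update(w, pre, post)

print(w)  # the A1 -> B weight has grown; the A2 -> B weight is unchanged
```

Note that this plain rule only ever increases weights when activities are non-negative, which is one reason models of plasticity also need a weakening mechanism.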

The most studied form of synaptic strengthening is long-term potentiation (LTP), first documented in 1973; its counterpart is long-term depression (LTD). Whether a synapse potentiates or depresses depends on its recent activity: heavily used synapses tend toward LTP, rarely used ones toward LTD. Together they give the brain a mechanism for writing experience into connection strengths, which is loosely what a network's training algorithm does numerically when it adjusts weights.

Long-Term Potentiation (LTP)

Persistent strengthening of a synapse following repeated, rapid activation. The transmitting neuron releases more signal or the receiving neuron grows more receptors. The "volume" stays turned up for hours to a lifetime — the physical substrate of long-term memory formation.

Long-Term Depression (LTD)

Persistent weakening of a synapse following low or ineffective activity. The reverse of LTP — pruning connections that carry little useful signal, keeping the memory system from becoming saturated and preserving selectivity.
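The contrast between LTP and LTD can be sketched with a toy threshold rule: a synapse is strengthened when the postsynaptic response is above a threshold and weakened when it is below. This is an illustrative simplification, not a model taken from the literature cited on this page; `plasticity_step`, the threshold `theta`, and the clipping range [0, 1] are all assumptions.

```python
import numpy as np

def plasticity_step(w, pre, post, theta=0.5, lr=0.1):
    """Toy rule: potentiate if postsynaptic activity exceeds theta, depress otherwise."""
    dw = lr * pre * (post - theta)
    # Keep the weight in a bounded range so it can neither vanish below 0 nor grow forever.
    return float(np.clip(w + dw, 0.0, 1.0))

w_strong = 0.5
w_weak = 0.5
for _ in range(10):
    w_strong = plasticity_step(w_strong, pre=1.0, post=1.0)  # effective use: LTP-like
    w_weak = plasticity_step(w_weak, pre=1.0, post=0.1)      # ineffective use: LTD-like

print(w_strong, w_weak)  # the first weight ends above 0.5, the second below it
```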

Types of memory: a brief map

Biological memory is not one thing. Cognitive neuroscience distinguishes several systems that operate differently and depend on different brain structures.

Memory type	What it holds	Duration	Key brain region
Working memory	Information actively held "in mind" right now — a phone number, the current step in a task	Seconds to minutes	Prefrontal cortex
Episodic memory	Specific personal events with their context ("what I had for breakfast yesterday")	Days to a lifetime	Hippocampus
Semantic memory	General world knowledge, facts, concepts ("Paris is the capital of France")	Long-term	Neocortex
Procedural memory	Skills and habits — how to ride a bicycle, how to type	Long-term; often automatic	Basal ganglia, cerebellum

Artificial neural networks conflate most of these into a single substrate — the weight matrix — but computational models have begun to represent each system separately, as the final section of this page explores.

Neural Network Training Techniques

A neural network "learns" by adjusting its connection weights until its outputs match a target. The mechanism that makes this adjustment is called backpropagation — short for backward propagation of errors — combined with an optimization strategy called gradient descent. Understanding these two ideas is essential because they are the closest artificial analog to the synaptic plasticity described above.

Gradient descent and backpropagation

Think of the network's total error as a surface in a high-dimensional space, with a valley at the minimum. Gradient descent navigates toward that valley by stepping opposite to the steepest upward slope. The size of each step is the learning rate — too large and it overshoots; too small and it stalls. Backpropagation supplies the gradients needed: using the chain rule of calculus, it propagates the output error backward through each layer, apportioning blame to each weight in proportion to its contribution to the mistake. Each training iteration runs in three stages:

① Forward pass: Input flows through the network layer by layer, producing a prediction.

② Compute error: A loss function measures how far the prediction is from the target.

③ Backward pass + update: Gradients propagate backward; each weight is nudged to reduce the error.
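The effect of the learning rate described above can be seen on a one-dimensional error surface. The sketch below minimizes the assumed toy function f(w) = (w - 3)^2, whose minimum is at w = 3; the function, the starting point, and the three learning rates are illustrative choices, not values from any particular network.

```python
def gradient_descent(lr, steps=50, w0=0.0):
    """Minimize f(w) = (w - 3)^2 by repeatedly stepping against the gradient."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # derivative of (w - 3)^2
        w -= lr * grad           # step opposite to the upward slope
    return w

for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: w after 50 steps = {gradient_descent(lr):.4f}")
```

With the smallest rate the weight has barely moved toward 3 after 50 steps, with the middle rate it is very close to 3, and with the largest rate each step overshoots by more than the last, so the weight diverges.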

Batch vs online learning

In batch gradient descent, the gradient is averaged over the full training set before any weights are updated: stable, but slow and memory-heavy. In stochastic (online) gradient descent, weights are updated after every single example: fast and memory-light, but noisy. Mini-batch gradient descent splits the difference by averaging over small random subsets, and it is by far the most common practice in deep learning.
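The three regimes differ only in how many examples contribute to each update, which the following sketch makes explicit with a single `batch_size` parameter. It assumes a noise-free linear regression problem invented for the example; `train`, `true_w`, and the hyperparameters are hypothetical.

```python
import numpy as np

def train(X, y, batch_size, lr=0.05, epochs=100, seed=0):
    """Linear regression by gradient descent; batch_size selects the regime."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)                    # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ w - y[idx]
            grad = 2.0 * X[idx].T @ error / len(idx)  # mean-squared-error gradient
            w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])   # assumed ground-truth weights
y = X @ true_w

w_batch = train(X, y, batch_size=len(X))   # one update per epoch
w_online = train(X, y, batch_size=1)       # one update per example
w_mini = train(X, y, batch_size=32)        # one update per small subset
print(w_batch, w_online, w_mini)
```

On a problem this small all three recover similar weights; the practical differences are in the number of updates per epoch and the noise in each update.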

Overfitting and regularization

Overfitting occurs when a network memorizes the training data rather than learning its underlying pattern — it performs well on training examples but poorly on new ones. The biological parallel is rote memorization without generalization. Regularization techniques (adding a penalty for large weights, or randomly disabling neurons during training via dropout) push the network toward simpler, more generalizable solutions.
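The two regularizers mentioned above are small modifications to a training loop. The sketch below shows one common way to write each, assuming an L2 penalty of the form weight_decay * sum(w**2) and the "inverted" dropout convention that rescales surviving units; the helper names and default values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_penalized_grad(grad, w, weight_decay=0.01):
    """Gradient of loss + weight_decay * sum(w**2): the penalty adds 2 * weight_decay * w."""
    return grad + 2.0 * weight_decay * w

def dropout(a, p_drop=0.5, training=True):
    """Inverted dropout: zero each unit with probability p_drop, rescale the rest."""
    if not training:
        return a                      # at test time every unit stays active
    mask = rng.random(a.shape) >= p_drop
    return a * mask / (1.0 - p_drop)  # rescaling keeps the expected activation unchanged

a = np.ones((2, 8))                   # stand-in for a layer's activations
dropped = dropout(a)
print(dropped)                        # entries are either 0.0 or 2.0
```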

The code below trains a small two-layer network from scratch on a classic non-linear problem: the XOR function, which a single-layer network cannot solve. Only NumPy is used — no deep learning library — so every step of the gradient computation is visible.

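import numpy as np

# XOR truth table: four input pairs and their target outputs.
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])

# Fixed seed so the random initial weights are reproducible.
np.random.seed(0)
W1 = np.random.randn(2, 4)   # input -> hidden weights (2 inputs, 4 hidden units)
b1 = np.zeros((1, 4))        # hidden biases
W2 = np.random.randn(4, 1)   # hidden -> output weights
b2 = np.zeros((1, 1))        # output bias

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_deriv(z):
    # Derivative of the sigmoid with respect to its input z.
    s = sigmoid(z)
    return s * (1 - s)

lr = 0.5  # learning rate (step size)

for epoch in range(10000):
    # 1. Forward pass: compute hidden and output activations.
    z1 = X @ W1 + b1
    a1 = sigmoid(z1)
    z2 = a1 @ W2 + b2
    a2 = sigmoid(z2)

    # 2. Compute error: mean squared error over the four examples.
    loss = np.mean((y - a2) ** 2)

    # 3. Backward pass: chain rule through the output layer...
    d_a2 = -2 * (y - a2) / len(y)
    d_z2 = d_a2 * sigmoid_deriv(z2)
    d_W2 = a1.T @ d_z2
    d_b2 = d_z2.sum(axis=0, keepdims=True)

    # ...and then through the hidden layer.
    d_a1 = d_z2 @ W2.T
    d_z1 = d_a1 * sigmoid_deriv(z1)
    d_W1 = X.T @ d_z1
    d_b1 = d_z1.sum(axis=0, keepdims=True)

    # Gradient descent update: step each parameter against its gradient.
    W2 -= lr * d_W2
    b2 -= lr * d_b2
    W1 -= lr * d_W1
    b1 -= lr * d_b1

    if epoch % 2000 == 0:
        print(f'epoch {epoch:5d}  loss {loss:.4f}')

# Activations from the final forward pass, one per input row.
print('predictions:', np.round(a2.T, 2))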

After 10,000 epochs the predictions should approach [0, 1, 1, 0], although the exact values depend on the random initialization. The loop structure (forward pass, compute error, backward pass, update) is the same in principle as the iterative weight updates in any deep learning framework, just without the engineering scaffolding that makes it fast at scale.

Computational Models of Memory Based on Cognitive Neuroscience

Standard feedforward networks trained by backpropagation are powerful classifiers, but they do not model memory as cognitive science understands it. Three families of model have been developed specifically to bridge that gap, each grounded in a different aspect of what neuroscience has found out about how memory works.

Hopfield networks: associative memory

A Hopfield network is a fully connected recurrent network proposed by John Hopfield in 1982, designed to model associative (content-addressable) memory — the ability to retrieve a complete memory from a partial or noisy cue, the way hearing the first few bars of a song brings back the whole melody. Patterns are stored as stable states of the network. The network is given a partial or corrupted input and iteratively updates its neurons — each checking whether flipping its state would lower a global energy function — until it settles into the nearest stored pattern. The biological parallel is the brain's ability to "fill in" a degraded percept from stored experience, a phenomenon psychologists call pattern completion.

Hopfield networks have a limited storage capacity: roughly 0.14 × N random patterns for a network of N neurons under the standard Hebbian storage rule (Hopfield's original estimate was about 0.15 × N). They can also produce spurious states, stable configurations that were never stored, analogous to a false memory. Both limits have biological parallels and have been extensively studied as models of memory failure.
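Storage and retrieval in a Hopfield network can be sketched in a few lines. The example below assumes the classic binary formulation with states of +1 and -1, Hebbian outer-product storage, and asynchronous updates in a fixed order; the two 8-unit patterns are made up for illustration and are deliberately orthogonal, which keeps the network well under its capacity.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian outer-product storage for patterns with entries +1 / -1."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)          # no self-connections
    return W

def energy(W, s):
    """Global energy; asynchronous updates never increase it."""
    return -0.5 * s @ W @ s

def recall(W, cue, sweeps=5):
    """Update one neuron at a time until the state settles."""
    s = cue.copy()
    for _ in range(sweeps):
        for i in range(len(s)):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

patterns = np.array([
    [1,  1, 1,  1, -1, -1, -1, -1],
    [1, -1, 1, -1,  1, -1,  1, -1],
])
W = train_hopfield(patterns)

cue = patterns[0].copy()
cue[0] = -1                           # corrupt one bit of the first pattern
restored = recall(W, cue)

print(restored)                       # matches patterns[0]
print(energy(W, cue), energy(W, restored))
```

The corrupted cue sits at a higher energy than the stored pattern, and the update rule moves the state downhill until it reaches the stored pattern.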

Complementary Learning Systems: hippocampus and neocortex

One of the most influential models bridging neuroscience and machine learning is the Complementary Learning Systems (CLS) theory, introduced by McClelland, McNaughton, and O'Reilly in 1995. It addresses a fundamental question: how does the brain learn new things quickly without overwriting what it already knows? A standard neural network suffers from catastrophic forgetting — training it intensively on a new task degrades its performance on old ones, because the same weights serve both. The brain avoids this.

CLS proposes that two systems with different learning dynamics work together to solve this. Memories are initially encoded in the hippocampus, then gradually transferred to neocortex through replay during sleep and rest. CLS is frequently cited as motivation for continual learning methods and for experience replay in modern deep reinforcement learning.

Hippocampus — fast encoding

One-shot learning. Sparse, non-overlapping representations keep new memories distinct (pattern separation). Acts as a short-term buffer; damage disrupts recent but not remote memory.

Sleep replay — consolidation

The hippocampus replays recent memories during sleep, each replay nudging neocortical weights a little. Gradual transfer from fast short-term store to slow long-term store.

Neocortex — slow generalization

Integrates regularities across many exposures into distributed, overlapping representations. Supports semantic knowledge and generalization (pattern completion).
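The benefit of replay can be illustrated with a deliberately simple stand-in for the neocortex: a single linear model trained by gradient descent. This is a toy sketch, not an implementation of CLS; the two "tasks" are random regression problems invented for the example, and "replay" here just means interleaving stored task A examples while learning task B.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
w_true = rng.normal(size=d)                      # assumed shared ground truth
X_a, X_b = rng.normal(size=(5, d)), rng.normal(size=(5, d))
y_a, y_b = X_a @ w_true, X_b @ w_true

def fit(w, X, y, lr=0.02, steps=5000):
    """Plain gradient descent on mean squared error, starting from w."""
    w = w.copy()
    for _ in range(steps):
        w -= lr * 2.0 * X.T @ (X @ w - y) / len(y)
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

w_after_a = fit(np.zeros(d), X_a, y_a)           # learn task A first

# Sequential learning: train on task B alone, with no access to task A.
w_sequential = fit(w_after_a, X_b, y_b)

# Replay: train on task B interleaved with stored task A examples.
X_mix = np.vstack([X_a, X_b])
y_mix = np.concatenate([y_a, y_b])
w_replay = fit(w_after_a, X_mix, y_mix)

print("task A error, sequential:", mse(w_sequential, X_a, y_a))
print("task A error, with replay:", mse(w_replay, X_a, y_a))
```

Training on task B alone shifts the shared weights and degrades task A, a small-scale version of catastrophic forgetting, while interleaving the stored examples lets the same weights fit both tasks.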

Working memory in recurrent networks: LSTMs

The Long Short-Term Memory (LSTM) architecture, introduced by Hochreiter and Schmidhuber in 1997, mitigates the vanishing-gradient problem of standard RNNs with an explicit cell state: a memory register that, in its now-standard form (the forget gate was added in later work), is controlled by three learned gates. These gates can be read as modeling the selective maintenance and updating of working memory:

Input gate — Write

Decides which new information from the current input is worth storing in the cell state. Analogous to encoding a new experience into working memory.

Forget gate — Hold

Decides which existing content to keep and which to discard. Mirrors the selective maintenance of relevant context in working memory while clearing outdated information.

Output gate — Read

Controls what portion of the cell state is exposed as the output at this step — what the network "pays attention to" from its stored context right now.
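The three gates above can be written out as a single time step. The sketch below follows the commonly used LSTM formulation with randomly initialized, untrained weights; stacking all four weight blocks in one matrix `W` and the sizes `n_in` and `n_hidden` are assumptions made to keep the example short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W has shape (4 * hidden, input + hidden)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0 * hidden:1 * hidden])   # input gate: how much new content to write
    f = sigmoid(z[1 * hidden:2 * hidden])   # forget gate: how much old content to keep
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate: how much of the cell to expose
    g = np.tanh(z[3 * hidden:4 * hidden])   # candidate content for the cell state
    c = f * c_prev + i * g                  # updated cell state (the memory register)
    h = o * np.tanh(c)                      # output for this step
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W = rng.normal(scale=0.5, size=(4 * n_hidden, n_in + n_hidden))
b = np.zeros(4 * n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(5, n_in)):        # a short random input sequence
    h, c = lstm_step(x, h, c, W, b)

print("hidden state:", h)
print("cell state:  ", c)
```

Because the cell state is updated additively (old content scaled by the forget gate plus new content scaled by the input gate), information can persist across many steps when the forget gate stays close to 1.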

The three models each address a different memory function: Hopfield networks model pattern-completion retrieval from partial cues; CLS theory models the long-term consolidation of experience from fast hippocampal encoding to slow cortical generalization; and LSTMs model the active maintenance and updating of short-term context. A complete computational account of memory would need all three — and the brain appears to use analogs of all three simultaneously.

Tools & Tutorials

TensorFlow Playground — an in-browser, real-time neural network visualization; adjust learning rate, layers, activation functions, and watch gradient descent shape the decision boundary live. No installation required.
GeeksforGeeks — Backpropagation in Neural Network — a step-by-step walkthrough of the forward and backward passes, with a worked XOR example and code.
Towards Data Science — Hopfield Networks: Neural Memory Machines — a conceptual and implementation walkthrough of Hopfield networks from scratch, including animation of pattern retrieval.
GeeksforGeeks — Hopfield Neural Network — covers discrete and continuous variants, energy minimization, and storage capacity limits with code.
