Makemore - Character-level Language Model
This project follows Andrej Karpathy’s Makemore lecture series, building character-level language models that generate new, plausible-sounding names. Each part introduces a new architectural concept, building intuition from raw probability counting to a WaveNet-inspired hierarchical model.
Progression
- Bigram Model — Counts character co-occurrence frequencies, normalizes the counts into probabilities (later recast as a one-layer network trained with softmax), and samples new names from the learned distribution. Evaluates quality using average negative log-likelihood loss.
- MLP (Bengio et al., 2003) — Learns character embeddings and a hidden layer to predict the next character using a context window. Introduces train/val/test splits, learning rate tuning, and mini-batch gradient descent.
- Batch Normalization — Explores activation statistics, dead neurons, and gradient diagnostics. Implements Kaiming (He) initialization and BatchNorm from scratch to stabilize training.
- Backpropagation from Scratch — Manually derives gradients for every operation in the network, replacing PyTorch's `loss.backward()`. A deep dive into the calculus behind neural network training.
- WaveNet-style Architecture — Replaces flat context fusion with a hierarchical, tree-like structure that progressively merges pairs of inputs across layers, mirroring the dilated causal convolutions of DeepMind's WaveNet.
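The counting-based bigram model can be sketched in a few lines. This is an illustrative sketch, not the repo's code; the tiny `words` list stands in for the real names dataset, and `.` marks start/end of a name as in the lecture:

```python
# Minimal counting-based bigram model with +1 smoothing and avg NLL.
import torch

words = ["emma", "olivia", "ava"]  # toy stand-in for the real dataset
chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0  # special start/end token
V = len(stoi)

# Count character co-occurrences.
N = torch.zeros((V, V), dtype=torch.int32)
for w in words:
    chs = ["."] + list(w) + ["."]
    for c1, c2 in zip(chs, chs[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Normalize each row into a probability distribution (+1 smoothing).
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)

# Average negative log-likelihood over the training bigrams.
log_likelihood, n = 0.0, 0
for w in words:
    chs = ["."] + list(w) + ["."]
    for c1, c2 in zip(chs, chs[1:]):
        log_likelihood += torch.log(P[stoi[c1], stoi[c2]])
        n += 1
nll = -log_likelihood / n
print(f"avg NLL: {nll.item():.4f}")
```

Sampling a name then amounts to repeatedly drawing the next character from the row of `P` indexed by the current character until `.` is drawn again.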
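The MLP's forward pass follows the Bengio et al. recipe: embed the previous `block_size` characters, concatenate, and push through one hidden layer. The layer sizes below are illustrative choices, not the repo's exact hyperparameters:

```python
# One mini-batch forward pass of the Bengio-style MLP on random indices.
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(42)
V, block_size, emb_dim, hidden = 27, 3, 10, 200

C  = torch.randn((V, emb_dim), generator=g)                 # char embeddings
W1 = torch.randn((block_size * emb_dim, hidden), generator=g)
b1 = torch.randn(hidden, generator=g)
W2 = torch.randn((hidden, V), generator=g)
b2 = torch.randn(V, generator=g)

X = torch.randint(0, V, (32, block_size), generator=g)      # context windows
Y = torch.randint(0, V, (32,), generator=g)                 # next-char targets
emb = C[X]                                  # (32, block_size, emb_dim)
h = torch.tanh(emb.view(32, -1) @ W1 + b1)  # (32, hidden)
logits = h @ W2 + b2                        # (32, V)
loss = F.cross_entropy(logits, Y)
print(loss.item())
```

Training would repeat this on sampled mini-batches, backpropagate, and step the parameters with a tuned learning rate, evaluating on the val split.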
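Kaiming initialization and a from-scratch BatchNorm can be sketched together. This follows the lecture's general recipe (gain 5/3 for tanh, normalize pre-activations over the batch, track running statistics); the dimensions and momentum value here are illustrative:

```python
# Kaiming-scaled init for a tanh layer plus from-scratch batch normalization.
import torch

fan_in, fan_out = 30, 200
g = torch.Generator().manual_seed(0)
# Kaiming init for tanh: gain 5/3 divided by sqrt(fan_in).
W = torch.randn((fan_in, fan_out), generator=g) * (5 / 3) / fan_in**0.5

# Learnable scale/shift and running statistics for inference time.
bngain = torch.ones(fan_out)
bnbias = torch.zeros(fan_out)
running_mean = torch.zeros(fan_out)
running_var = torch.ones(fan_out)

x = torch.randn((32, fan_in), generator=g)
hpre = x @ W
# Normalize each feature over the batch, then scale and shift.
mean = hpre.mean(0, keepdim=True)
var = hpre.var(0, keepdim=True)
hnorm = bngain * (hpre - mean) / torch.sqrt(var + 1e-5) + bnbias
# Update running stats with a small momentum.
with torch.no_grad():
    running_mean = 0.999 * running_mean + 0.001 * mean.squeeze()
    running_var = 0.999 * running_var + 0.001 * var.squeeze()
h = torch.tanh(hnorm)
```

The tradeoff noted above: normalization couples examples within a batch, which stabilizes training but makes behavior depend on batch composition, hence the running statistics for inference.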
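The manual-backprop exercise can be illustrated with the gradient through softmax plus cross-entropy, checked against autograd. This is a sketch of the derivation's endpoint, not the full per-operation pass; shapes are illustrative:

```python
# Manual gradient of mean cross-entropy w.r.t. logits, verified vs autograd.
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(1)
logits = torch.randn((32, 27), generator=g, requires_grad=True)
Y = torch.randint(0, 27, (32,), generator=g)

loss = F.cross_entropy(logits, Y)
loss.backward()

# Derivation result: dloss/dlogits = (softmax(logits) - onehot(Y)) / batch_size
dlogits = F.softmax(logits.detach(), dim=1)
dlogits[range(32), Y] -= 1.0
dlogits /= 32

print(torch.allclose(dlogits, logits.grad, atol=1e-6))
```

The same check-against-autograd pattern applies to every other operation in the network (tanh, matmul, BatchNorm), which is how the lecture validates each hand-derived gradient.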
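The WaveNet-style hierarchy's core move is fusing consecutive pairs of positions at each layer instead of flattening the whole context at once. A minimal sketch of that reshaping, with illustrative sizes and untrained random weights standing in for real layers:

```python
# Hierarchical pairwise fusion of an 8-character context, WaveNet-style.
import torch

torch.manual_seed(0)
B, T, E = 4, 8, 10          # batch, context length, embedding dim
x = torch.randn(B, T, E)

# Layer 1: fuse adjacent pairs -> (B, 4, 2*E), then a linear + tanh.
x = x.view(B, T // 2, 2 * E)
x = torch.tanh(x @ torch.randn(2 * E, 16))   # (B, 4, 16)
# Layer 2: fuse pairs again -> (B, 2, 32).
x = x.view(B, 2, 32)
x = torch.tanh(x @ torch.randn(32, 16))      # (B, 2, 16)
# Layer 3: final fusion -> (B, 32) -> logits over 27 characters.
x = x.view(B, 32)
logits = x @ torch.randn(32, 27)
print(logits.shape)  # torch.Size([4, 27])
```

Each layer doubles the effective receptive field, so information from distant context characters is merged gradually rather than all at once in the first layer.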
Key Concepts
- Negative log-likelihood and maximum likelihood estimation
- Kaiming initialization and activation function calibration
- Batch normalization and its tradeoffs
- Manual gradient computation for cross-entropy, tanh, and softmax
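The first concept above can be stated compactly: for $N$ training pairs $(x_i, y_i)$ and model probabilities $P$, the average negative log-likelihood is

```latex
\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \log P(y_i \mid x_i)
```

and maximum likelihood estimation minimizes $\mathcal{L}$, which is exactly the cross-entropy loss used in every model in the progression.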
GitHub: github.com/srushtii-m/makemore
