Makemore - Character-level Language Model
This project follows Andrej Karpathy’s Makemore lecture series, building character-level language models that generate new, plausible-sounding names. Each part introduces a new architectural concept, building intuition from raw probability counting to a WaveNet-inspired hierarchical model.
Progression
- Bigram Model — Counts character co-occurrence frequencies, normalizes the counts into probabilities (later recast as a one-layer network trained with softmax), and samples new names from the learned distribution. Evaluates quality using average negative log-likelihood loss.
- MLP (Bengio et al., 2003) — Learns character embeddings and a hidden layer to predict the next character using a context window. Introduces train/val/test splits, learning rate tuning, and mini-batch gradient descent.
- Batch Normalization — Explores activation statistics, dead neurons, and gradient diagnostics. Implements Kaiming (He) initialization and BatchNorm from scratch to stabilize training.
- Backpropagation from Scratch — Manually derives gradients for every operation in the network, replacing PyTorch's `loss.backward()`. A deep dive into the calculus behind neural network training.
- WaveNet-style Architecture — Replaces flat context fusion with a hierarchical, tree-like structure that progressively merges pairs of inputs across layers, mirroring the dilated causal convolutions of DeepMind's WaveNet.
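The counting-based bigram model can be sketched in a few lines. This is an illustrative sketch, not the repo's code; the tiny `words` list stands in for the real names dataset, and `.` marks start/end of a name as in the lecture:

```python
# Minimal counting-based bigram model with +1 smoothing and avg NLL.
import torch

words = ["emma", "olivia", "ava"]  # toy stand-in for the real dataset
chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0  # special start/end token
V = len(stoi)

# Count character co-occurrences.
N = torch.zeros((V, V), dtype=torch.int32)
for w in words:
    chs = ["."] + list(w) + ["."]
    for c1, c2 in zip(chs, chs[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Normalize each row into a probability distribution (+1 smoothing).
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)

# Average negative log-likelihood over the training bigrams.
log_likelihood, n = 0.0, 0
for w in words:
    chs = ["."] + list(w) + ["."]
    for c1, c2 in zip(chs, chs[1:]):
        log_likelihood += torch.log(P[stoi[c1], stoi[c2]])
        n += 1
nll = -log_likelihood / n
print(f"avg NLL: {nll.item():.4f}")
```

Sampling a name then amounts to repeatedly drawing the next character from the row of `P` indexed by the current character until `.` is drawn again.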
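The MLP's forward pass follows the Bengio et al. recipe: embed the previous `block_size` characters, concatenate, and push through one hidden layer. The layer sizes below are illustrative choices, not the repo's exact hyperparameters:

```python
# One mini-batch forward pass of the Bengio-style MLP on random indices.
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(42)
V, block_size, emb_dim, hidden = 27, 3, 10, 200

C  = torch.randn((V, emb_dim), generator=g)                 # char embeddings
W1 = torch.randn((block_size * emb_dim, hidden), generator=g)
b1 = torch.randn(hidden, generator=g)
W2 = torch.randn((hidden, V), generator=g)
b2 = torch.randn(V, generator=g)

X = torch.randint(0, V, (32, block_size), generator=g)      # context windows
Y = torch.randint(0, V, (32,), generator=g)                 # next-char targets
emb = C[X]                                  # (32, block_size, emb_dim)
h = torch.tanh(emb.view(32, -1) @ W1 + b1)  # (32, hidden)
logits = h @ W2 + b2                        # (32, V)
loss = F.cross_entropy(logits, Y)
print(loss.item())
```

Training would repeat this on sampled mini-batches, backpropagate, and step the parameters with a tuned learning rate, evaluating on the val split.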
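Kaiming initialization and a from-scratch BatchNorm can be sketched together. This follows the lecture's general recipe (gain 5/3 for tanh, normalize pre-activations over the batch, track running statistics); the dimensions and momentum value here are illustrative:

```python
# Kaiming-scaled init for a tanh layer plus from-scratch batch normalization.
import torch

fan_in, fan_out = 30, 200
g = torch.Generator().manual_seed(0)
# Kaiming init for tanh: gain 5/3 divided by sqrt(fan_in).
W = torch.randn((fan_in, fan_out), generator=g) * (5 / 3) / fan_in**0.5

# Learnable scale/shift and running statistics for inference time.
bngain = torch.ones(fan_out)
bnbias = torch.zeros(fan_out)
running_mean = torch.zeros(fan_out)
running_var = torch.ones(fan_out)

x = torch.randn((32, fan_in), generator=g)
hpre = x @ W
# Normalize each feature over the batch, then scale and shift.
mean = hpre.mean(0, keepdim=True)
var = hpre.var(0, keepdim=True)
hnorm = bngain * (hpre - mean) / torch.sqrt(var + 1e-5) + bnbias
# Update running stats with a small momentum.
with torch.no_grad():
    running_mean = 0.999 * running_mean + 0.001 * mean.squeeze()
    running_var = 0.999 * running_var + 0.001 * var.squeeze()
h = torch.tanh(hnorm)
```

The tradeoff noted above: normalization couples examples within a batch, which stabilizes training but makes behavior depend on batch composition, hence the running statistics for inference.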
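The manual-backprop exercise can be illustrated with the gradient through softmax plus cross-entropy, checked against autograd. This is a sketch of the derivation's endpoint, not the full per-operation pass; shapes are illustrative:

```python
# Manual gradient of mean cross-entropy w.r.t. logits, verified vs autograd.
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(1)
logits = torch.randn((32, 27), generator=g, requires_grad=True)
Y = torch.randint(0, 27, (32,), generator=g)

loss = F.cross_entropy(logits, Y)
loss.backward()

# Derivation result: dloss/dlogits = (softmax(logits) - onehot(Y)) / batch_size
dlogits = F.softmax(logits.detach(), dim=1)
dlogits[range(32), Y] -= 1.0
dlogits /= 32

print(torch.allclose(dlogits, logits.grad, atol=1e-6))
```

The same check-against-autograd pattern applies to every other operation in the network (tanh, matmul, BatchNorm), which is how the lecture validates each hand-derived gradient.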
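The WaveNet-style hierarchy's core move is fusing consecutive pairs of positions at each layer instead of flattening the whole context at once. A minimal sketch of that reshaping, with illustrative sizes and untrained random weights standing in for real layers:

```python
# Hierarchical pairwise fusion of an 8-character context, WaveNet-style.
import torch

torch.manual_seed(0)
B, T, E = 4, 8, 10          # batch, context length, embedding dim
x = torch.randn(B, T, E)

# Layer 1: fuse adjacent pairs -> (B, 4, 2*E), then a linear + tanh.
x = x.view(B, T // 2, 2 * E)
x = torch.tanh(x @ torch.randn(2 * E, 16))   # (B, 4, 16)
# Layer 2: fuse pairs again -> (B, 2, 32).
x = x.view(B, 2, 32)
x = torch.tanh(x @ torch.randn(32, 16))      # (B, 2, 16)
# Layer 3: final fusion -> (B, 32) -> logits over 27 characters.
x = x.view(B, 32)
logits = x @ torch.randn(32, 27)
print(logits.shape)  # torch.Size([4, 27])
```

Each layer doubles the effective receptive field, so information from distant context characters is merged gradually rather than all at once in the first layer.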
Key Concepts
- Negative log-likelihood and maximum likelihood estimation
- Kaiming initialization and activation function calibration
- Batch normalization and its tradeoffs
- Manual gradient computation for cross-entropy, tanh, and softmax
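The first concept above can be stated compactly: for $N$ training pairs $(x_i, y_i)$ and model probabilities $P$, the average negative log-likelihood is

```latex
\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \log P(y_i \mid x_i)
```

and maximum likelihood estimation minimizes $\mathcal{L}$, which is exactly the cross-entropy loss used in every model in the progression.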
GitHub: github.com/srushtii-m/makemore
