Makemore - Character-level Language Model

This project follows Andrej Karpathy’s Makemore lecture series, building character-level language models that generate new, plausible-sounding names. Each part introduces a new architectural idea, moving from raw probability counting up to a WaveNet-inspired hierarchical model.

Progression

  1. Bigram Model — Counts character co-occurrence frequencies, normalizes the counts into next-character probability distributions, and samples from them; an equivalent single-layer network trained with softmax and gradient descent recovers the same model. Evaluates quality using average negative log-likelihood loss.
  2. MLP (Bengio et al., 2003) — Learns character embeddings and a hidden layer to predict the next character using a context window. Introduces train/val/test splits, learning rate tuning, and mini-batch gradient descent.
  3. Batch Normalization — Explores activation statistics, dead neurons, and gradient diagnostics. Implements Kaiming (He) initialization and BatchNorm from scratch to stabilize training.
  4. Backpropagation from Scratch — Manually derives gradients for every operation in the network, replacing PyTorch's loss.backward(). A deep dive into the calculus behind neural network training.
  5. WaveNet-style Architecture — Replaces flat context fusion with a hierarchical, tree-structured network that progressively merges pairs of inputs across layers, mirroring the dilated causal convolutions of DeepMind’s WaveNet.
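The counting approach in part 1 fits in a few lines. This is a minimal sketch, assuming a tiny hard-coded word list in place of the real training data (the actual project trains on a much larger names file):

```python
import random
from collections import defaultdict

# Toy corpus standing in for the project's real training set (assumption).
words = ["emma", "olivia", "ava", "isabella", "mia"]

# Count bigram frequencies, with '.' as a start/end-of-name token.
counts = defaultdict(lambda: defaultdict(int))
for w in words:
    chs = ["."] + list(w) + ["."]
    for c1, c2 in zip(chs, chs[1:]):
        counts[c1][c2] += 1

# Normalize counts into next-character probability distributions.
probs = {
    c1: {c2: n / sum(nxt.values()) for c2, n in nxt.items()}
    for c1, nxt in counts.items()
}

def sample_name(rng=random.Random(42)):
    """Sample characters from the learned distributions until '.' ends the name."""
    out, ch = [], "."
    while True:
        nxt = probs[ch]
        ch = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        if ch == ".":
            return "".join(out)
        out.append(ch)
```

The average negative log-likelihood over the training bigrams is then the quality metric: lower means the model assigns higher probability to the observed character transitions.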
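The normalization layer from part 3 can also be sketched compactly. A minimal training-mode forward pass in NumPy, plus a Kaiming-style weight initialization for a tanh layer (the real implementation additionally tracks running statistics for use at inference time; the tanh gain of 5/3 follows the convention used by torch.nn.init):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode BatchNorm: normalize each feature over the batch (axis 0),
    then apply a learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    xhat = (x - mean) / np.sqrt(var + eps)  # unit mean/variance per feature
    return gamma * xhat + beta

# Kaiming initialization: scale standard normals by gain / sqrt(fan_in)
# so activations neither saturate nor die early in training.
fan_in, fan_out = 30, 200
W = np.random.randn(fan_in, fan_out) * (5 / 3) / fan_in**0.5
```

The tradeoff explored in the lecture is that BatchNorm couples examples within a batch: each example's activations depend on the batch statistics, which stabilizes training but complicates inference and small-batch use.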

Key Concepts

  • Negative log-likelihood and maximum likelihood estimation
  • Kaiming initialization and activation function calibration
  • Batch normalization and its tradeoffs
  • Manual gradient computation for cross-entropy, tanh, and softmax
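As a small illustration of the manual-gradient work, the fused softmax + cross-entropy backward pass reduces to probs - one_hot(target). A sketch with hypothetical toy logits (not the project's code), checked against a finite-difference estimate:

```python
import numpy as np

def cross_entropy(logits, target):
    # Numerically stable softmax + negative log-likelihood for one example.
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    return -np.log(probs[target]), probs

logits = np.array([2.0, -1.0, 0.5])
target = 2

# Analytic gradient: dL/dlogits = probs - one_hot(target).
loss, probs = cross_entropy(logits, target)
grad = probs.copy()
grad[target] -= 1.0

# Finite-difference check of the first coordinate.
h = 1e-5
bumped = logits.copy()
bumped[0] += h
numeric = (cross_entropy(bumped, target)[0] - loss) / h
```

The same pattern (derive analytically, verify numerically against PyTorch or finite differences) applies to the tanh and BatchNorm gradients as well.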

GitHub: github.com/srushtii-m/makemore