LLM Serving API
A production-style REST API for serving large language models, built with FastAPI and backed by Ollama, featuring token streaming via Server-Sent Events, asyncio request batching, and sliding-window rate limiting.
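A minimal sketch of the sliding-window limiter described above, assuming a per-client in-memory window (class, parameter names, and limits are illustrative, not the project's actual code):

```python
import time
from collections import deque

class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per `window_seconds` per client."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits: dict[str, deque[float]] = {}

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        q = self.hits.setdefault(client_id, deque())
        # Evict timestamps that have fallen out of the sliding window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```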
Fine-tuned Gemma-3-270M for structured financial sentiment reasoning using a two-phase pipeline: SFT to teach the output format, followed by GRPO with a multi-component reward function that uses a FinBERT teacher model for sentiment alignment.
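A hypothetical sketch of such a multi-component reward; the FinBERT teacher's label is assumed precomputed here (the real pipeline would query the teacher model), and the tags and weights are illustrative:

```python
import re

def reward(completion: str, teacher_label: str) -> float:
    """Combine two components: output-format compliance and agreement
    with a FinBERT teacher's sentiment label. Weights are illustrative."""
    m = re.search(r"<sentiment>(positive|negative|neutral)</sentiment>", completion)
    format_ok = 1.0 if m else 0.0
    agrees = 1.0 if m and m.group(1) == teacher_label else 0.0
    return 0.3 * format_ok + 0.7 * agrees

# reward("...<sentiment>positive</sentiment>", "positive") -> 1.0
```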
Fine-tuned SmolVLM-256M on the ChartQA chart question-answering dataset using streaming (lazy) dataset loading and LoRA/DoRA adapters, achieving full training in under 25 minutes on a 16GB GPU with less than 2GB of peak VRAM.
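A minimal sketch of that setup with `datasets` and `peft`; the Hub ids below are my assumptions for the model and dataset, and the adapter hyperparameters are illustrative:

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForVision2Seq

# Stream ChartQA so examples are fetched lazily instead of materialized on disk.
train = load_dataset("HuggingFaceM4/ChartQA", split="train", streaming=True)

model = AutoModelForVision2Seq.from_pretrained("HuggingFaceTB/SmolVLM-256M-Instruct")
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    use_dora=True,  # DoRA: decompose adapter weights into magnitude and direction
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```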
Fine-tunes large language models using GRPO (Group Relative Policy Optimization) with LoRA adapters and 4-bit quantization, supporting any HuggingFace model and dataset with automatic field detection and a multi-component reward function.
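A sketch of how GRPO with LoRA and 4-bit quantization typically wires together using TRL and PEFT; the model id, dataset, reward function, and hyperparameters are placeholders, not the project's defaults:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import GRPOConfig, GRPOTrainer

# Load the base model in 4-bit NF4 so GRPO's multiple rollouts fit in memory.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model id
    quantization_config=bnb,
)

def length_reward(completions, **kwargs):
    # Placeholder single reward component; the project combines several.
    return [min(len(c) / 200.0, 1.0) for c in completions]

trainer = GRPOTrainer(
    model=model,
    reward_funcs=length_reward,
    args=GRPOConfig(output_dir="grpo-out", num_generations=4),
    train_dataset=load_dataset("trl-lib/tldr", split="train"),  # placeholder dataset
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]),
)
trainer.train()
```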
Implements the LLaMA 2 inference pipeline from scratch in PyTorch, covering rotary positional embeddings, RMSNorm, SwiGLU activations, grouped-query attention, and KV caching: the production techniques that distinguish modern LLMs from research transformers.
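Of those components, RMSNorm is compact enough to show inline; a from-scratch version matching the LLaMA formulation (the project's own class may differ in detail):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMSNorm as used in LLaMA: scale by the reciprocal RMS, no mean-centering."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```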
Reproduces the GPT-2 architecture from scratch in PyTorch with BPE tokenization, GELU activations, and flash attention, including a weight-loading pipeline to verify the implementation against OpenAI’s pretrained GPT-2 checkpoints.
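Two of the named pieces are small enough to sketch inline: GPT-2's tanh-approximated GELU, and flash attention via PyTorch's fused kernel (the original GPT-2 used an explicit masked softmax, so the fused call is the modern substitute this kind of reproduction typically swaps in):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# GPT-2's activation is the tanh approximation of GELU.
gelu = nn.GELU(approximate="tanh")

def causal_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (batch, n_head, seq_len, head_dim). PyTorch dispatches to a
    fused flash-attention kernel when available; is_causal applies the
    autoregressive mask without materializing it."""
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)
```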
A character-level GPT transformer built entirely from scratch in PyTorch and trained on the Tiny Shakespeare dataset, implementing multi-head self-attention, transformer blocks, and autoregressive generation with every component written by hand.
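The core of such a model is a single causal self-attention head; a from-scratch sketch in the same spirit (sizes and names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Head(nn.Module):
    """One causal self-attention head for a character-level GPT."""

    def __init__(self, n_embd: int, head_size: int, block_size: int):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular mask so each position attends only to the past.
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # scaled dot-product
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        wei = F.softmax(wei, dim=-1)
        return wei @ v
```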
Character-level language modeling with a multilayer perceptron, exploring activations, gradients, and BatchNorm, manual backpropagation through every tensor, and WaveNet-style dilated-convolution experiments to predict the next character in a sequence.
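For the MLP stage, a forward pass in this style (dimensions are illustrative); the manual-backpropagation experiments then differentiate through these same tensors by hand rather than calling autograd:

```python
import torch

# Embed a fixed context of characters, concatenate, pass through one hidden layer.
vocab_size, block_size, n_embd, n_hidden = 27, 3, 10, 200
g = torch.Generator().manual_seed(42)
C = torch.randn((vocab_size, n_embd), generator=g)            # character embeddings
W1 = torch.randn((block_size * n_embd, n_hidden), generator=g)
b1 = torch.randn(n_hidden, generator=g)
W2 = torch.randn((n_hidden, vocab_size), generator=g)
b2 = torch.randn(vocab_size, generator=g)

def forward(Xb: torch.Tensor) -> torch.Tensor:
    """Xb: (batch, block_size) integer character indices -> next-char logits."""
    emb = C[Xb]                                     # (batch, block_size, n_embd)
    h = torch.tanh(emb.view(Xb.shape[0], -1) @ W1 + b1)
    return h @ W2 + b2                              # (batch, vocab_size)
```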