Compliance Report Generator
A full-stack compliance agent that processes regulatory documents (GDPR, internal policies) and makes them queryable through multiple retrieval strategies. Users upload PDFs, ask questions, and receive answers grounded in the document — with every response gated through PII detection, prompt safety checks, and human approval before being saved to memory.
Pipeline
- PDF uploaded and parsed into chunks using pdfplumber (text + tables)
- Chunks embedded and stored in Qdrant for semantic vector search
- Entities and relationships extracted into Neo4j knowledge graph
- User query routed to the selected agent mode
- Safety checks run (PII detection + prompt guardrails) before any LLM call
- Answer shown for human review — approved answers saved to Qdrant memory
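The chunking step in the pipeline above could look roughly like this — a minimal sketch of splitting extracted page text into overlapping windows before embedding. `chunk_text` is a hypothetical helper, not the project's actual code; in practice the text would come from pdfplumber's page extraction.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows for embedding.

    Overlap keeps sentences that straddle a boundary retrievable
    from at least one chunk.
    """
    chunks = []
    step = size - overlap  # assumes size > overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk.strip():
            chunks.append(chunk)
        if start + size >= len(text):
            break
    return chunks
```

Character-based windows are the simplest choice; a production pipeline would more likely split on sentence or token boundaries.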
Key Components
Multi-Modal Retrieval — vector similarity (Qdrant) finds semantically relevant chunks; Neo4j graph queries surface entity relationships that keyword search misses.
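Combining the two retrievers can be sketched as a simple vector-first merge with deduplication. `vector_hits` and `graph_hits` stand in for Qdrant and Neo4j results here; the real project wires these to its own clients.

```python
def merge_results(vector_hits: list[dict], graph_hits: list[dict],
                  k: int = 5) -> list[dict]:
    """Merge hits from both retrievers, vector-first, dropping duplicates."""
    seen, merged = set(), []
    for hit in vector_hits + graph_hits:
        if hit["id"] not in seen:
            seen.add(hit["id"])
            merged.append(hit)
    return merged[:k]
```

A weighted or reciprocal-rank fusion would rank more carefully; the point is only that graph hits can surface chunks vector similarity alone would miss.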
Three Agent Modes
- Memory-Aware QA: retrieves document context with Neo4j graph memory injected into the prompt, plus role-specific instructions for legal analyst, policy researcher, or compliance officer personas
- Tool Agent: LangChain ZERO_SHOT_REACT_DESCRIPTION agent with five tools — risk scoring, compliance lookup, live news, summarization, and compliance score
- Context-Aware Chain: role-based QA chain with custom PromptTemplates per user type
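The role-based prompting behind the Context-Aware Chain might be sketched like this — one template per user type, selected at query time. The template strings and `build_prompt` helper are illustrative, not the project's actual prompts.

```python
# Hypothetical role templates; the real project uses LangChain PromptTemplates.
ROLE_TEMPLATES = {
    "legal_analyst": (
        "You are a legal analyst. Cite the relevant articles.\n"
        "Context:\n{context}\nQuestion: {question}"
    ),
    "policy_researcher": (
        "You are a policy researcher. Compare against internal policy.\n"
        "Context:\n{context}\nQuestion: {question}"
    ),
    "compliance_officer": (
        "You are a compliance officer. Flag compliance risks explicitly.\n"
        "Context:\n{context}\nQuestion: {question}"
    ),
}

def build_prompt(role: str, context: str, question: str) -> str:
    """Fill the template for the given role, falling back to compliance officer."""
    template = ROLE_TEMPLATES.get(role, ROLE_TEMPLATES["compliance_officer"])
    return template.format(context=context, question=question)
```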
Security Layer
- PII detection (emails, phone numbers, SSNs) before any LLM call
- Keyword-based prompt guardrails blocking harmful or unethical queries
- Ollama-based self-hosted safety classifier for air-gapped environments where prompts cannot leave the machine
- Consistent enforcement across CLI, agents, and Streamlit UI
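The PII pre-check above can be approximated with a few regex patterns run before any prompt reaches the LLM. This is a minimal sketch assuming US-style phone and SSN formats; the patterns are illustrative, not the project's exact rules.

```python
import re

# Hypothetical PII patterns: email, US phone number, US SSN.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def detect_pii(text: str) -> list[str]:
    """Return the PII categories found in text; empty list means safe to send."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
```

Regexes catch only well-formed identifiers; that is why a classifier (here, the Ollama-hosted one) backs them up for free-text leakage.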
Human-in-the-Loop (HITL)
- Streamlit UI: session state pauses execution, shows answer and source chunks, waits for approve or regenerate
- CLI: programmatic HITL wrapper for batch workflows — runs chain, prints answer with sources, prompts for approval before saving to memory
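The CLI wrapper can be sketched as a function that runs the chain, shows the answer and sources, and only persists on approval. `chain`, `save_to_memory`, and `approver` are stand-ins for the project's actual components; `approver` defaults to `input` so the same function works interactively or in tests.

```python
def hitl_run(chain, query: str, save_to_memory, approver=input) -> str:
    """Run the QA chain, print answer + sources, save to memory only if approved."""
    result = chain(query)  # assumed to return {"answer": ..., "sources": [...]}
    print(result["answer"])
    for src in result["sources"]:
        print(f"  source: {src}")
    if approver("Approve and save to memory? [y/N] ").strip().lower() == "y":
        save_to_memory(query, result["answer"])
    return result["answer"]
```

Injecting the approval function is what makes the same wrapper usable for batch workflows: a batch job can pass an auto-reject or policy-based approver instead of a human.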
LangGraph Ingestion Workflow — checkpointable pipeline (ingest, embed, graph, memory) that resumes from any failed step without reprocessing earlier stages.
What I Learned
- How to combine vector search and knowledge graphs for richer document retrieval — semantic similarity finds relevant chunks, graph queries find entity relationships that keyword search misses
- LangGraph for stateful, fault-tolerant workflows — nodes can fail and resume without reprocessing earlier steps
- Building practical guardrails: PII regex detection and prompt classification before any LLM call, with a self-hosted Ollama fallback for privacy-sensitive deployments
- Human-in-the-loop patterns in Streamlit using session state to pause execution and wait for user approval before writing to memory
- Multi-agent orchestration: separating memory-aware QA from tool-calling agents and designing a tool registry that makes agent capabilities inspectable and extensible
