Building an ETL Pipeline: Sales Insights Case Study
In this project, I developed a complete data pipeline that guides sales data from collection to insights through a series of structured stages:
- Extracted the data, cleaned it, and loaded it into Google Cloud Storage for further processing.
- Transformed and modeled the data in Python using fact and dimension data modeling concepts.
- Orchestrated the ETL pipeline on Mage AI and loaded the transformed data into Google BigQuery.
- Developed a dashboard on Looker Studio.
Dataset Used
The dataset comes from DataCo Global. It consists of 52 features covering provisioning, production, sales, and commercial distribution.
Technologies:
- Languages: Python, SQL
- Extraction and transformation: Jupyter Notebook, Google BigQuery
- Storage: Google Cloud Storage
- Orchestration: Mage AI
- Dashboard: Looker Studio
Data Pipeline Architecture

- Step 1: Cleaning and transformation - sc_data.ipynb
- Step 2: Storage - Google Cloud Storage
- Step 3: ETL and orchestration - Mage (Extract, Transform, Load)
- Step 4: Analytics - SQL script
- Step 5: Dashboard - Looker Studio
Data Modeling
The tables are designed using fact and dimension data modeling concepts: the measures live in a central fact table, while descriptive attributes are split into dimension tables that the fact table references by key.
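As a minimal illustration of the idea (the column names below are invented for the example, not the actual DataCo schema), a flat sales extract can be split into dimension tables with surrogate keys and a fact table that references them:

```python
import pandas as pd

# Illustrative flat extract; the real DataCo dataset has 52 columns.
sales = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_name": ["Ann", "Ben", "Ann"],
    "product_name": ["Desk", "Chair", "Desk"],
    "sales_amount": [120.0, 45.5, 120.0],
})

# Dimension tables: one row per unique entity, with a surrogate key.
dim_customer = sales[["customer_name"]].drop_duplicates().reset_index(drop=True)
dim_customer["customer_key"] = dim_customer.index
dim_product = sales[["product_name"]].drop_duplicates().reset_index(drop=True)
dim_product["product_key"] = dim_product.index

# Fact table: the measures plus foreign keys into each dimension.
fact_sales = (
    sales.merge(dim_customer, on="customer_name")
         .merge(dim_product, on="product_name")
         [["order_id", "customer_key", "product_key", "sales_amount"]]
)
print(fact_sales)
```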

Step 1: Cleaning and Transformation
In this step, I loaded the CSV file in VS Code and carried out data cleaning and transformation before organizing the data into fact and dimension tables. Script here.
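A minimal sketch of this kind of cleaning with pandas; the file name, encoding, and column names are assumptions for illustration, not the notebook's exact code:

```python
import pandas as pd

# File name and encoding are assumptions for this sketch.
df = pd.read_csv("DataCoSupplyChainDataset.csv", encoding="latin-1")

# Drop exact duplicates and normalize column names to snake_case.
df = df.drop_duplicates()
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Coerce date-like columns and drop rows missing key fields (names assumed).
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df = df.dropna(subset=["order_id", "order_date"])

df.to_csv("sc_cleaned.csv", index=False)
```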
Step 2: Stored the data on Google Cloud Storage
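A sketch of the upload using the google-cloud-storage client; the bucket and object names are placeholders:

```python
from google.cloud import storage

# Upload the cleaned CSV to a GCS bucket (names are placeholders).
client = storage.Client()  # uses Application Default Credentials
bucket = client.bucket("sales-insights-bucket")
blob = bucket.blob("raw/sc_cleaned.csv")
blob.upload_from_filename("sc_cleaned.csv")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")
```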
Step 3: ETL
- Launched the VM instance over SSH and installed the required libraries, including Mage AI.
- Accessed the Mage AI UI through the instance's external IP address and the Mage port (6789 by default).
- Created a new pipeline with extract, transform, and load stages; a sketch of the first two blocks follows this list.
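The sketch below mirrors the template code Mage generates for its blocks (each block lives in its own file; they are shown together here). The bucket path and column names are placeholders, not the project's exact code:

```python
import pandas as pd

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader
if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer

@data_loader
def load_data(*args, **kwargs):
    # Extract: read the cleaned CSV from GCS. Bucket and object names are
    # placeholders; pandas reads gs:// paths when gcsfs is installed.
    return pd.read_csv("gs://sales-insights-bucket/raw/sc_cleaned.csv")

@transformer
def transform(df: pd.DataFrame, *args, **kwargs):
    # Transform: split the flat data into dimension and fact tables and
    # return both so the exporter can write each to its own BigQuery table.
    dim_customer = df[["customer_name"]].drop_duplicates().reset_index(drop=True)
    dim_customer["customer_key"] = dim_customer.index
    fact_sales = df.merge(dim_customer, on="customer_name")
    return {
        "dim_customer": dim_customer.to_dict(orient="list"),
        "fact_sales": fact_sales.to_dict(orient="list"),
    }
```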
Step 4: Analytics
After running the pipeline in Mage, the fact and dimension tables were generated in Google BigQuery. I then ran a few queries and created a sales analytics table. SQL queries here.
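To keep the examples in one language, here is a sketch of running such an analytics query from Python with the BigQuery client; the project, dataset, table, and column names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses Application Default Credentials

# Illustrative analytics query; all identifiers are placeholders.
query = """
    SELECT d.customer_name, SUM(f.sales_amount) AS total_sales
    FROM `my-project.sales_insights.fact_sales` AS f
    JOIN `my-project.sales_insights.dim_customer` AS d
      ON f.customer_key = d.customer_key
    GROUP BY d.customer_name
    ORDER BY total_sales DESC
"""
for row in client.query(query).result():
    print(row.customer_name, row.total_sales)
```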
Step 5: Dashboard
After completing the analysis, I loaded the relevant tables into Looker Studio and created a dashboard. Dashboard here. 
