╔══════════════════════════════════════════════════════════════════════════════╗
║                         SYSTEM PROFILE :: NITIN.EXE                          ║
╠══════════════════════════════════════════════════════════════════════════════╣
║  ROLE      │ Data Analyst → Data Engineer (transition in progress)           ║
║  EXP       │ 1.5 years analytics + actively building DE stack                ║
║  STACK     │ SQL · Python (PySpark/Pandas) · Airflow · dbt · Power BI        ║
║  PROJECTS  │ E-Commerce ETL · Retail DW · Bike Store DB · Airflow+dbt        ║
║  LOCATION  │ Delhi, India | Remote-ready | Hybrid-open                       ║
║  SEEKING   │ DA & DE Roles | Real problems, not toy datasets                 ║
║  MOTTO     │ "Not just dashboards, but pipelines that actually run."         ║
╠══════════════════════════════════════════════════════════════════════════════╣
║  DE PIPELINE  [████████████████████░░░░░░░░] 70% → Full Stack Engineer       ║
╚══════════════════════════════════════════════════════════════════════════════╝
#!/bin/bash
# What I'm building toward, no fluff
CURRENT_FOCUS="Production-grade ETL pipelines from scratch"
TARGET_STACK=("Airflow" "dbt" "PySpark" "Kafka" "AWS")
APPROACH="Learn by building real things, not just following tutorials"
OPEN_TO="Data Analyst & Data Engineer roles"
echo "[✓] Analyst foundations: SQL, Python, Power BI, EDA"
echo "[✓] Relational DB design: Hospital Mgmt + Bike Store DB shipped"
echo "[✓] Data warehousing: Retail DW, star schema, dimensional model"
echo "[✓] First pipeline shipped: E-Commerce Analytics (PostgreSQL → DuckDB)"
echo "[⚡] Currently: Airflow orchestration + dbt medallion transforms"
echo "[⏳] Next: Kafka streaming + cloud deployment on AWS"
echo "[🎯] End goal: Full-stack data engineer who understands the business"
// CORE LANGUAGES
// DATA ENGINEERING: ACTIVE BUILD ZONE ⚡
// ANALYTICS & BI
// DATABASES & INFRA
[STATUS: SHIPPED ✓]  |  View Repo →
PIPELINE ARCHITECTURE:
────────────────────────────────────────────────────────────────
  Raw CSVs ──► PostgreSQL ──► Python/Pandas ──► DuckDB
   (src)       (ingestion)    (clean+dedupe)  (OLAP queries)
                                                  │
                                                  ▼
                                        Seaborn + Matplotlib
                                        (visual storytelling)
────────────────────────────────────────────────────────────────
What makes this real:
- Handles inconsistent, dirty real-world CSVs, not clean Kaggle data
- SQLAlchemy ORM for reliable, repeatable ingestion into PostgreSQL
- DuckDB for fast in-process analytical queries (no spinning up a warehouse)
- Full EDA with visual storytelling, not just charts for the sake of charts
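A minimal sketch of the clean+dedupe step, written with the Python standard library only as a stand-in for the Pandas code in the repo. The column names (`order_id`, `customer_email`, `amount`), the sample rows, and the rule of dropping rows with a blank amount are all assumptions for illustration.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, mixed case, a blank amount, a repeated order.
RAW_CSV = """order_id,customer_email,amount
1001, Alice@Example.com ,49.90
1002,bob@example.com,
1001,alice@example.com,49.90
1003,CAROL@example.com,15.00
"""


def clean_rows(text):
    """Normalize fields, drop rows with no amount, keep the first row per order_id."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        order_id = row["order_id"].strip()
        amount = row["amount"].strip()
        if not amount or order_id in seen:
            continue  # unusable or duplicate row
        seen.add(order_id)
        cleaned.append({
            "order_id": int(order_id),
            "customer_email": row["customer_email"].strip().lower(),
            "amount": float(amount),
        })
    return cleaned


rows = clean_rows(RAW_CSV)
for r in rows:
    print(r)
```

In a Pandas version the same idea would likely be a strip/lower pass followed by `dropna` and `drop_duplicates` on the key column.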
[STATUS: SHIPPED ✓]  |  View Repo →
WAREHOUSE ARCHITECTURE:
────────────────────────────────────────────────────────────────
  Source Data ──► Staging Layer ──► Dimensional Model
  (raw retail)   (clean/conform)      (Star Schema)
                                            │
                            ┌───────────────┼───────────────┐
                            ▼               ▼               ▼
                       Fact Table     Dim: Product    Dim: Customer
                      (fact_sales)    Dim: Store      Dim: Date
────────────────────────────────────────────────────────────────
What makes this real:
- Star schema dimensional model built for analytical query performance
- Proper fact and dimension table separation following Kimball methodology
- Slowly Changing Dimensions (SCD) handling for historical accuracy
- Designed to plug directly into BI tools like Power BI or Tableau
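A minimal sketch of what SCD handling can look like, assuming a Type 2 approach (close the old row, append a new version). The README does not say which SCD type the warehouse uses, so the type, the column names (`valid_from`, `valid_to`, `is_current`), and the sample data are assumptions. Surrogate keys are left out to keep it short.

```python
from datetime import date


def apply_scd2(dim_rows, incoming, today):
    """Close the current row for a changed customer and append a new version."""
    for key, city in incoming.items():
        current = next(
            (r for r in dim_rows if r["customer_id"] == key and r["is_current"]),
            None,
        )
        if current and current["city"] == city:
            continue  # attribute unchanged, keep the existing version
        if current:
            current["valid_to"] = today
            current["is_current"] = False
        dim_rows.append({
            "customer_id": key,
            "city": city,
            "valid_from": today,
            "valid_to": None,
            "is_current": True,
        })
    return dim_rows


# Hypothetical dimension with one existing customer.
dim_customer = [
    {"customer_id": 1, "city": "Delhi", "valid_from": date(2024, 1, 1),
     "valid_to": None, "is_current": True},
]

# Customer 1 moves, customer 2 is new.
apply_scd2(dim_customer, {1: "Mumbai", 2: "Pune"}, date(2025, 6, 1))
current_rows = [r for r in dim_customer if r["is_current"]]
```

The old Delhi row stays in the table with a closed date range, which is what keeps historical facts pointing at the attributes that were true at the time.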
[STATUS: SHIPPED ✓]  |  View Repo →
SCHEMA DESIGN:
────────────────────────────────────────────────────────────────
  Business Requirements ──► ERD Design ──► Normalized Schema
       (analysis)           (entities)       (3NF tables)
                                                   │
                                ┌──────────────────┤
                                ▼                  ▼
                          Stored Procs          Indexes
                           + Triggers           + Views
────────────────────────────────────────────────────────────────
What makes this real:
- Full relational design: customers, orders, products, staff, stores, inventory
- Normalized to 3NF to avoid update anomalies, with enforced referential integrity
- Stored procedures and triggers for business logic at the DB layer
- Window functions and JOIN-heavy queries for real analytical reporting
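A minimal sketch of the referential-integrity and window-function points, using in-memory SQLite as a stand-in for the project's actual database engine. The two-table schema and the sample rows are hypothetical and much smaller than the real design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE stores (store_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    store_id INTEGER NOT NULL REFERENCES stores(store_id),
    total    REAL NOT NULL
);
INSERT INTO stores VALUES (1, 'North'), (2, 'South');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 900.0), (12, 2, 400.0);
""")

# Rank each order within its store by order total (JOIN + window function).
ranked = conn.execute("""
    SELECT s.name, o.order_id, o.total,
           RANK() OVER (PARTITION BY o.store_id ORDER BY o.total DESC) AS rank_in_store
    FROM orders o
    JOIN stores s ON s.store_id = o.store_id
    ORDER BY s.name, rank_in_store
""").fetchall()

# An order that points at a store that does not exist should be rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 10.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```

Window functions need SQLite 3.25 or newer, which recent Python builds bundle.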
[STATUS: BUILDING 🚧]
PLANNED ARCHITECTURE:
────────────────────────────────────────────────────────────────
  CSV / PostgreSQL ──► Airflow DAGs ──► dbt Transforms
      (sources)        (orchestrate)    (Bronze → Silver → Gold)
                                               │
                                               ▼
                                      Power BI Dashboard
                                      (business insights)
────────────────────────────────────────────────────────────────
Full medallion architecture: Airflow handles scheduling and orchestration, dbt handles all transformation logic with tests and documentation, and Power BI sits at the output layer. Watch this repo.
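A rough sketch of the orchestration idea, using the standard library's `graphlib` as a stand-in for Airflow: declare which task depends on which, then derive a valid run order. The task names are hypothetical and this is not the project's DAG, only the dependency shape implied by the diagram.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract_sources": set(),
    "bronze_load": {"extract_sources"},
    "silver_clean": {"bronze_load"},
    "gold_aggregate": {"silver_clean"},
    "refresh_dashboard": {"gold_aggregate"},
}

# static_order() yields every task after all of its dependencies.
run_order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(run_order)
```

A scheduler like Airflow adds retries, schedules, and monitoring on top, but the core contract is the same: nothing in Silver runs until Bronze has landed.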
[STATUS: QUEUED 📡]
Kafka Producer → Kafka Topics → PySpark Streaming → PostgreSQL / Delta Lake
Event-driven pipeline with real-time processing. The step that separates analysts from engineers.
╔══════════════════════════════════════════════════════════════════════════════╗
║                        SKILL ACQUISITION LOG :: 2026                         ║
╠══════════════════════╦════════════════════════════╦══════════╦═══════════════╣
║ TRACK                ║ FOCUS AREAS                ║ PROGRESS ║ STATUS        ║
╠══════════════════════╬════════════════════════════╬══════════╬═══════════════╣
║ Data Engineering     ║ PySpark                    ║          ║               ║
║                      ║ SQLAlchemy                 ║ ██████░░ ║ ⚡ ACTIVE      ║
║                      ║ Airflow                    ║ 75%      ║               ║
║                      ║ dbt                        ║          ║               ║
║                      ║ Medallion Architecture     ║          ║               ║
╠══════════════════════╬════════════════════════════╬══════════╬═══════════════╣
║ Advanced Analytics   ║ Advanced SQL               ║          ║               ║
║                      ║ Pandas                     ║ ████████ ║ SOLID         ║
║                      ║ Power BI                   ║ 95%      ║               ║
║                      ║ Tableau                    ║          ║               ║
║                      ║ MySQL / PostgreSQL         ║          ║               ║
╠══════════════════════╬════════════════════════════╬══════════╬═══════════════╣
║ DB Design            ║ Dimensional Modeling       ║          ║               ║
║                      ║ Star / Snowflake Schema    ║ ██████░░ ║ SHIPPED       ║
║                      ║ 3NF Normalization          ║ 80%      ║               ║
║                      ║ SCD / Slowly Changing Dims ║          ║               ║
╠══════════════════════╬════════════════════════════╬══════════╬═══════════════╣
║ Cloud Infrastructure ║ AWS (S3, Glue, Athena,     ║ ██░░░░░░ ║ SCOUTING      ║
║                      ║ Redshift, IAM, Lambda)     ║ 25%      ║               ║
╚══════════════════════╩════════════════════════════╩══════════╩═══════════════╝
╔══════════════════════════════════════════════════════════════════════╗
║                                                                      ║
║   $ echo "Open to DA & DE roles. Let's build something real."        ║
║                                                                      ║
║   > Open to DA & DE roles. Let's build something real.               ║
║                                                                      ║
║   $ _                                                                ║
║                                                                      ║
╚══════════════════════════════════════════════════════════════════════╝