About

Profile

I am Srujan, an engineer focused on building reliable systems for AI and machine learning workflows. My work centers on turning raw, messy, high-volume data into validated, versioned, and model-ready datasets for training, evaluation, and production use.

What I Work On

This site is where I document the systems I build: ingestion pipelines, data quality checks, dataset versioning, metadata catalogs, ML data loaders, and the engineering tradeoffs involved in moving from raw data to usable AI infrastructure.

Data Infrastructure, AI Workflows, Dataset Versioning, Data Quality, ML Data Loading

Current Focus

Building ingestion pipelines for messy, high-volume data.
Designing validation checks that make datasets trustworthy before training or evaluation.
Versioning datasets and metadata so experiments are reproducible.
Improving the path from raw data to model-ready training, evaluation, and production inputs.

Tools I Use

Python and SQL for data engineering, validation, and analysis.
Pandas, NumPy, PyTorch, scikit-learn, Plotly, and Dash for applied ML and data workflows.
Airflow, Spark, S3, Redshift, Flask, and cloud tooling for pipelines, storage, and production systems.
Metadata catalogs, quality checks, and versioned datasets for reproducible ML data preparation.

Selected Work

Data Pipeline Orchestration - an ETL pipeline using Airflow, Spark, S3, and Redshift.
Anomaly Detection in Electricity Consumption - a practical look at unusual-pattern detection in energy usage.
M5 Forecasting Series - exploratory analysis and forecasting work on retail demand data.
End-to-End Machine Learning Web App - an object detection application built around real product workflow constraints.