About
Profile
I am Srujan Jabbireddy, a data infrastructure engineer focused on building reliable systems for AI and machine learning workflows. My work centers on turning raw, messy, high-volume data into validated, versioned, and model-ready datasets for training, evaluation, and production use.
What I Work On
This site is where I document the systems I build: ingestion pipelines, data quality checks, dataset versioning, metadata catalogs, ML data loaders, and the engineering tradeoffs involved in moving from raw data to usable AI infrastructure.
- Data Infrastructure
- AI Workflows
- Dataset Versioning
- Data Quality
- ML Data Loading
Current Focus
- Building ingestion pipelines for messy, high-volume data.
- Designing validation checks that make datasets trustworthy before training or evaluation.
- Versioning datasets and metadata so experiments are reproducible.
- Improving the path from raw data to model-ready inputs for training, evaluation, and production.
Tools I Use
- Python and SQL for data engineering, validation, and analysis.
- Pandas, NumPy, PyTorch, scikit-learn, Plotly, and Dash for applied ML and data workflows.
- Airflow, Spark, S3, Redshift, Flask, and cloud tooling for pipelines, storage, and production systems.
- Metadata catalogs, quality checks, and dataset versioning for reproducible ML data preparation.
Selected Work
- Data Pipeline Orchestration - an ETL pipeline using Airflow, Spark, S3, and Redshift.
- Anomaly Detection in Electricity Consumption - a practical look at unusual-pattern detection in energy usage.
- M5 Forecasting Series - exploratory analysis and forecasting work on retail demand data.
- End-to-End Machine Learning Web App - an object detection application built around real product workflow constraints.