Profile

I am Srujan Jabbireddy, a data infrastructure engineer focused on building reliable systems for AI and machine learning workflows. My work centers on turning raw, messy, high-volume data into validated, versioned, and model-ready datasets for training, evaluation, and production use.

What I Work On

This site is where I document the systems I build: ingestion pipelines, data quality checks, dataset versioning, metadata catalogs, ML data loaders, and the engineering tradeoffs involved in moving from raw data to usable AI infrastructure.

Data Infrastructure · AI Workflows · Dataset Versioning · Data Quality · ML Data Loading

Current Focus

  • Building ingestion pipelines for messy, high-volume data.
  • Designing validation checks that make datasets trustworthy before training or evaluation.
  • Versioning datasets and metadata so experiments are reproducible.
  • Improving the path from raw data to model-ready inputs for training, evaluation, and production.

Tools I Use

  • Python and SQL for data engineering, validation, and analysis.
  • Pandas, NumPy, PyTorch, scikit-learn, Plotly, and Dash for applied ML and data workflows.
  • Airflow, Spark, S3, Redshift, Flask, and cloud tooling for pipelines, storage, and production systems.
  • Metadata catalogs, quality checks, and dataset versioning for reproducible ML data preparation.

Selected Work