All notes

A chronological index of technical writeups, project notes, and learning logs.

learnings

Multimodal Lakehouse Implementation Notes

In the first version of this project, I wrote about the gap between a notebook embedding workflow and a production-shaped multimodal data pipeline. These notes go one layer deeper: what I built, why I used four modalities, where the architecture came from, ...

Read
learnings

Serverless Multimodal Data Lakehouse

Every machine learning tutorial shows the same pattern: load a dataset, preprocess it in a notebook, embed it, train a model. It looks simple. It works on 1,000 samples. It fails on 10 million.

Read
learnings

Anomaly Detection in Electricity Consumption

Important facts: Anomaly detection is an important and active area of work across many fields. It is a technique used to identify unusual patterns that do not match expectations. It has many applications in business, like health...

Read
learnings

An End-to-End Machine Learning Web App

Important facts: Works best on 30 house amenity classes (the model is trained on predefined amenities). Achieved mAP@0.5:0.95 of 0.495, close enough to be a viable product. Built on PyTorch, trained using GCP (1 GPU, P1000), developed API using...

Read
learnings

M5 - Forecasting - Part-III

After a thorough exploration of the data and time-series visualization, I will now try my hand at forecasting methods and predict the demand for the three states and three products separately. We will discuss the various approaches present, used and de...

Read
learnings

M5 - Forecasting - Part-II

After exploring this huge dataset, I wanted to dig further into its time-series components to understand the distribution much better. So, let's dive in. For the initial data exploration, please read Part-I.

Read
learnings

M5 - Forecasting - Part-I

This is from a Kaggle competition that I wanted to participate in to apply my learnings in forecasting methods. In this first part, I explore the data with various visualizations and try to understand the dataset.

Read