Other Guides
Reference shelf
Longer study guides and supporting notes that sit behind the main writing feed.
Data System Design Interview Glossary
A reference for every technical concept a data engineer should be ready to name, define, or trade off in a data architecture interview. Organized to map onto your delivery framework: Functional Requirements → Non-Funct...
Study Guide: Data Operations Architecture at Scale
Production ML data operations is a collection of connected systems: ingestion, annotation, synthetic data, multimodal storage, enrichment, quality monitoring, agentic remediation, self-service tooling, scale patterns,...
Study Guide: The Nemotron 3 Super Data Engineering Pipeline
A structured reference for how NVIDIA built the 25-trillion-token pretraining corpus behind Nemotron 3 Super — a case study in large-scale data engineering for LLM pretraining, distinct from (but related to) operation...