Data Management
Preface

This book (or website) contains a collection of lecture notes for the Fall 2025 edition of 12-741: Data Management at Carnegie Mellon University.
Notably, the lecture notes are, to a great extent, generated through GenAI tools (mostly Claude Code, with Sonnet 4.5) after careful prompting and many rounds of revision. This effort represents an experiment and, like any experiment, should be treated with humility and an open mind.
My hypothesis for the experiment is that it should be possible for me to carefully extract the relevant knowledge that is stored in modern LLMs (through a lossy compression process) in order to generate useful lecture notes that go well beyond what I would be able to write myself in the same time, all while providing additional utility to the students. This additional utility would come in two flavors: (1) the lecture notes can now contain interactive elements based on code snippets that can facilitate building intuition and understanding difficult concepts; (2) the students are encouraged to be consciously and actively suspicious of the material that they are reading, given that it could contain errors.
This hypothesis might be proven correct or incorrect. I don’t know what the answer is, but I am curious about it. Obviously, with a sample as small as the students in this class and only self-reported measures of success, it will be hard to assess the hypothesis in general. But it’s worth trying.
To incentivize students to be more critical readers of the material contained in this book, I am awarding each student 1 bonus point (worth 1% of the final grade) for each error found in the content (submitted to me via e-mail), up to a total of 5 points per student.
If you’re wondering about the website design and layout of this book: I’m using Quarto. To learn more about Quarto books visit https://quarto.org/docs/books.