1 Syllabus
12-741 Section A2 - Fall 2025
2 Course Information
Meeting Times: TR 5:00 PM - 6:20 PM
Location: Baker Hall (BH) A53
Instructor: Mario Bergés
Office: PH 123G
Phone: x8-4572
Office Hours: Wednesdays 2:00 PM - 3:00 PM and by appointment
Office Hours Location: PH 123G
Teaching Assistants: TBD
Office Hours: TBD
Office Hours Location: TBD
2.1 Textbooks (Optional)
- Garcia-Molina, H., Ullman, J.D., & Widom, J. (2009). Database Systems: The Complete Book. Prentice Hall.
- Barber, D. (2012). Bayesian Reasoning and Machine Learning. Cambridge UP. [URL]
3 Course Description
Because of the increasing capabilities to generate and collect data, civil and environmental engineers are exposed to a flood of information. This course will introduce graduate students to concepts and methods for organizing and analyzing data.
Topics covered will include: representation and processing of data collected by sensors and sensor networks; data (pre)-processing; feature extraction; relational and no-SQL databases.
The student will learn how to access a database, how to understand and design basic databases, and how to find and process data to infer trends and draw conclusions.
4 Grading
| Component | Weight |
|---|---|
| Assignments | 40% |
| Final Project | 30% |
| - Written Report | 15% |
| - Individual Oral Presentation | 15% |
| Final Exam | 30% |
5 Policies
5.1 Assignments
A total of four assignments will be published. The topics covered in each assignment will closely follow the ones listed in the schedule of classes.
All assignments are to be solved individually. Discussions and conversations with other students regarding the problem sets are encouraged. However, the final solutions along with the reasoning behind them need to come from you and be clearly explained in the submitted documents.
Each assignment will be worth 10%.
5.1.1 Late Submissions
All assignments have due dates indicated on the syllabus. In general, submitting assignments on time lets the instructional team provide feedback in a more timely and efficient manner. Assignments build on each other, so timely submissions are crucial to your progress in the class. However, sometimes life happens. If you cannot submit an assignment on time, the default is that you will be eligible for 90% of the grade if you submit within the first 48 hours after the due date. If you have to submit beyond 48 hours past the due date, please contact me as soon as possible so we can make arrangements. If you neither contact me within those 48 hours nor submit the assignment, it will receive a grade of 0%.
5.2 Group Projects
A portion of the grade for the course will be based on a group project, which consists of a written report and an individual oral presentation. This course project is an opportunity to investigate the methods covered in the course in greater depth, develop new techniques, and/or apply them to real problems. The project will be completed by groups of 1-3 people. A 4-5 student group will be allowed only in special circumstances, e.g. for ambitious projects. Members are expected to contribute to the project with equal effort.
5.3 Final Exam
There will be a final examination. The date for the exam is still being decided. The exam will be closed book, with no electronic devices allowed, but students can bring up to five pages of notes (following the approach of Chris Hendrickson and Matteo Pozzi, who taught this class before me, I also think that preparing short notes is an opportunity to reflect upon major points of the course).
- Final Exam Date: Friday, December 9, at 8:30am ET
- Final Project Report Due: Friday, December 5, at 11:59pm ET
5.4 Software To Be Used
Students are free to use Mathworks Matlab, Octave or Python to solve the assignments. I personally prefer Python.
Matlab is available on all of the CMU computing clusters (see http://www.cmu.edu/computing/software/all/matlab/index.html), and students are permitted to download Matlab to private machines so long as it is used on CMU’s campus (www.cmu.edu/myandrew). University-licensed toolboxes, e.g. for statistics, are also available and are downloaded as part of the Carnegie Mellon version of Matlab. Microsoft Access is part of the Office package. MySQL is free software, available at: http://www.mysql.com/.
5.5 Collaboration
Collaboration is expected within the limits of discussing concepts and problems. However, each student must produce his/her own solution to the problems.
Copying from another student’s assignment is clearly plagiarism. Using information directly from websites, books, papers and other literary sources without appropriate attribution is also plagiarism. Assignments submitted for this class will be reviewed by the instructor and TA and may be scanned through web-based academic integrity software. Occurrences of cheating or plagiarism will be handled according to the university policy on Academic Integrity, https://www.cmu.edu/policies/documents/Academic%20Integrity.htm. Students are expected to have read this policy and conform to the highest standards of academic integrity. For incidents of academic misconduct, the University Academic Disciplinary Actions Policy, found at https://www.cmu.edu/student-affairs/theword/acad_standards/creative/disciplinary.html, will be followed.
5.6 Class Participation
Students are expected to be in class on time and participate in class discussions. Participation will be loosely monitored and used to calculate the participation grade. If you cannot make class, please inform your instructors and group members ahead of time. In class, students are expected to be courteous and respectful of the views and needs of other students and instructors.
5.7 Students with Disabilities
Students requesting classroom accommodation must first register with the Dean of Students Office. The Dean of Students Office will provide documentation to the student who must then provide this documentation to the Instructor when requesting accommodation.
5.8 Recording of Class Sessions
No recording or taping of any classroom activity is permitted without the express written consent of Prof. Bergés. Any student who needs to record or tape classroom activities because of a disability should contact the Carnegie Mellon Office of Equal Opportunity Services to request an appropriate accommodation.
5.9 Posting of Course Materials
All the material used in the course (syllabus, readings, problem sets, reports) is intended for use in the class only. Unauthorized posting, publication or redistribution is not permitted. Uploading course materials to Course Hero or other web sites is not an authorized use of the course material.
5.10 Take Care of Yourself
In general, do your best to maintain a healthy lifestyle this semester by eating well, exercising, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.
All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.
If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.
6 Topics Covered (Lecture by Lecture)
6.1 Lecture 1: Introduction
- Motivation for data management: Understanding why data management practices are critical in civil and environmental engineering problems.
- Real-world applications: Identify specific examples of existing engineering systems that leverage data management solutions (e.g., smart infrastructure, environmental monitoring systems, transportation networks).
- Course overview: Familiarize yourself with the course syllabus, grading breakdown, policies, and expectations.
- Core concepts: Define what a database is in your own words and understand its role in organizing and accessing data.
- Course logistics: Understanding assignment structure, final project requirements, and collaboration policies.
Project Milestones: Understand the grading breakdown and expectations of the final project, including the written report and individual oral presentation components.
6.2 Lecture 2: Sensors and Time-series Data
- Time series decomposition: Be able to decompose a time series into seasonal, trend, and residual components to better understand patterns in data collected from sensors.
- Noise characterization: Understand different types of noise in sensor measurements and how noise propagates through both direct and virtual measurements.
- Virtual sensors: Be able to work with virtual sensors and understand how they derive measurements from other sensors or models.
- Data quality: Recognize the importance of understanding measurement uncertainty and error in sensor-based systems.
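As an illustration of the decomposition objective above, the following minimal Python sketch splits a series into trend, seasonal, and residual parts using a centered moving average. It assumes an additive model, an odd seasonal period, and at least two full cycles of data. The function name `decompose` and the synthetic series are made up for this example and are not part of the course materials.

```python
def decompose(series, period):
    """Additive decomposition sketch: series = trend + seasonal + residual.

    Assumes an odd period so that the moving-average window is centered.
    """
    half = period // 2
    n = len(series)

    # Trend: centered moving average over one full period.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        trend[t] = sum(window) / period

    # Seasonal: mean detrended value at each phase of the cycle.
    phase_values = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            phase_values[t % period].append(series[t] - trend[t])
    seasonal_means = [sum(v) / len(v) for v in phase_values]
    offset = sum(seasonal_means) / period  # force the seasonal part to average zero
    seasonal = [seasonal_means[t % period] - offset for t in range(n)]

    # Residual: whatever trend and seasonal do not explain.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic, noise-free "sensor" series: linear trend plus a period-3 pattern.
pattern = [1.0, 0.0, -1.0]
series = [0.5 * t + pattern[t % 3] for t in range(12)]
trend, seasonal, residual = decompose(series, period=3)
```

The first and last `period // 2` entries of `trend` and `residual` are `None` because the centered window does not fit at the edges of the series.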
Project Milestones: Identify your teammates for the group project (groups of 1-3 people).
Assignment #1 Out
6.3 Lecture 3: Processing Time Series Data
- Error propagation for linear Gaussian models: Be able to calculate error propagation for linear Gaussian models.
- Non-linear models: Understand why error propagation is different when models are not linear.
- Outlier removal: Be able to conduct outlier removal in time series using Chauvenet’s criterion.
- Linear regression derivation: Be able to follow the derivation of linear regression with linear and non-linear basis functions.
- Time series interpolation and extrapolation: Use linear regression to interpolate or extrapolate from the time-series data that is available (estimating unobserved values in the time domain).
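To make the outlier-removal objective concrete, here is a minimal Python sketch of a single pass of Chauvenet's criterion. It assumes the readings are approximately normally distributed and uses the sample standard deviation. A point is rejected when the expected number of samples at least as far from the mean falls below 0.5. The function name `chauvenet_filter` and the readings are hypothetical.

```python
import math
import statistics


def chauvenet_filter(values):
    """Return the values kept after one pass of Chauvenet's criterion."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation
    if std == 0:
        return list(values)  # all values identical, nothing to reject

    kept = []
    for x in values:
        z = abs(x - mean) / std
        # Two-sided probability of a deviation at least this large
        # under a normal distribution.
        p = math.erfc(z / math.sqrt(2))
        if n * p >= 0.5:
            kept.append(x)
    return kept


readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 10.0, 25.0]
cleaned = chauvenet_filter(readings)  # the 25.0 reading is rejected
```

Because the mean and standard deviation are themselves affected by the outlier, some practitioners repeat the pass on the cleaned data; whether to iterate is a judgment call that this sketch leaves open.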
Project Milestones: Post on Piazza your specific group assignments and/or your need to join a group for the final project.
6.4 Lecture 4: Entity Relationship Diagrams and Set Theory
- Entity Relationship Diagrams: Understand the structure and visual components of these diagrams, and the motivation for using them.
- Components of an ER Diagram: Be able to distinguish between attributes, entity sets and relationships.
- Weak Entity Sets: Be able to identify weak entity sets and know how to display them.
- Roles and Subclasses in ER diagrams: Understand the need and use of roles and sub-classes in ER diagrams.
- Set Theory: Recognize the value of set theory for describing and working with data records.
- Definition of sets: Be able to define sets in your own words, along with derivative terms such as sub-sets and super-sets.
- Basic operations on sets: Be able to apply (and recognize the notation for) the basic set operations of intersection, union, set difference, complement, cartesian product, and cardinality.
- Venn Diagrams: Recognize the value of visual diagramming approaches to sets such as Venn diagrams, and identify equivalences between set theory expressions and their corresponding Venn diagrams.
- Properties of Set Operations: Understand and be able to apply the commutative, associative, and distributive properties of set operations. Similarly, understand what complementary sets and disjoint sets are.
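The set operations listed above map directly onto Python's built-in `set` type. The short sketch below uses made-up sets (`universe`, `a`, `b`, `c`) to show each operation and to check two of the properties mentioned.

```python
from itertools import product

universe = set(range(1, 9))  # the complement is defined relative to this set
a = {1, 2, 3, 4}
b = {3, 4, 5}
c = {4, 5, 6}

union = a | b                 # elements in a or b
intersection = a & b          # elements in both a and b
difference = a - b            # elements in a but not in b
complement_a = universe - a   # elements of the universe not in a
cartesian = set(product({1, 2}, {"x", "y"}))  # all ordered pairs
cardinality = len(a)          # number of elements in a

# Distributive property: a ∩ (b ∪ c) = (a ∩ b) ∪ (a ∩ c)
distributive_holds = a & (b | c) == (a & b) | (a & c)

# Disjoint sets share no elements; a set and its complement are always disjoint.
disjoint = a.isdisjoint(complement_a)

# Subset and superset tests.
is_subset = {1, 2} <= a
is_superset = a >= {1, 2}
```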
Project Milestones: Come up with 3 ideas for your final project, regardless of whether or not you are already assigned to a team.
6.5 Lecture 5: The Relational Model
Here are the topics covered, along with the learning objectives for each one:
- Introduction to the Relational Model: Understand the motivation behind the development of the relational model, and be able to provide a definition of it in your own words. Be able to describe the high-level connection between set theory and the relational model.
- Relational Model Components: Be able to identify and relate the following concepts: relations, attributes, tuples and types; as well as connect them to their analogous terms: tables, columns, rows and domains. Be able to describe the meaning of key attributes and understand the difference between a schema and an instance of a database.
- Relational Database Creation Process: Understand the steps of design (using a DDL), initial data load, and query execution.
- Queries and Query Languages: Be able to identify the function of a query and how natural language queries for relational databases can be formalized with, e.g., relational algebra. Understand the link between relational algebra and SQL, as well as the fact that queries return relations.
- Relational Algebra Operators: Be able to apply and reason through the application of the following operators and their compositions: select, project, cross product, natural join, theta join, union, difference, intersection, rename.
- From ER Diagrams to Relational Models: Be able to transition from E/R diagrams to relational models of databases.
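To illustrate how relational algebra operators compose, and that every query returns a relation, the following Python sketch implements select, project, and natural join over relations stored as lists of dictionaries. This representation, the function names, and the `sensors` and `readings` relations are assumptions made for this example; a real DBMS implements these operators very differently.

```python
def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the given attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                result.append({**l_row, **r_row})
    return result


sensors = [
    {"sensor_id": 1, "type": "temperature"},
    {"sensor_id": 2, "type": "humidity"},
]
readings = [
    {"sensor_id": 1, "value": 35.0},
    {"sensor_id": 1, "value": 20.0},
    {"sensor_id": 2, "value": 50.0},
]

# "Which types of sensors reported a value above 30?"
# project_type(select_{value > 30}(sensors ⋈ readings))
result = project(
    select(natural_join(sensors, readings), lambda row: row["value"] > 30),
    ["type"],
)
```

Each function takes relations and returns a relation, which is what allows the operators to be nested freely.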
Project Milestones: Come up with 3 ideas for your final project, regardless of whether or not you are already assigned to a team.
6.6 Lecture 6: The Structured Query Language (SQL)
Here are the topics covered, along with the learning objectives for each one:
- Introduction to SQL: Understand what SQL is and where it comes from, as well as its prevalence in industry and computing infrastructure today. Be able to distinguish between the Data Description Language and Data Manipulation Language groups of statements.
- SQL Statements: Be able to formulate queries using the following statements, functions and clauses, as well as their relational algebra counterparts (where applicable): SELECT, SELECT ... WHERE, SELECT ... GROUP BY, SELECT ... JOIN, SELECT ... SELECT (i.e., nested selects), SUM(), MAX(), AVG(), CREATE TABLE, INSERT, DELETE, ALTER, UPDATE.
- Local DBMS servers: Be able to install and issue queries to a local DBMS such as MySQL (with MySQL Workbench as a client), or SQLite and its command-line interface.
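As a small illustration of several of the statements listed above, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The `sensor` and `reading` tables and their contents are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# CREATE TABLE and INSERT
conn.executescript(
    """
    CREATE TABLE sensor (sensor_id INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE reading (sensor_id INTEGER, value REAL);
    INSERT INTO sensor VALUES (1, 'bridge'), (2, 'tunnel');
    INSERT INTO reading VALUES (1, 10.0), (1, 20.0), (2, 5.0);
    """
)

# SELECT ... JOIN ... GROUP BY with an aggregate function
avg_by_location = conn.execute(
    """
    SELECT s.location, AVG(r.value)
    FROM sensor AS s
    JOIN reading AS r ON s.sensor_id = r.sensor_id
    GROUP BY s.location
    ORDER BY s.location
    """
).fetchall()

# Nested SELECT: readings above the overall average
above_average = conn.execute(
    """
    SELECT sensor_id, value
    FROM reading
    WHERE value > (SELECT AVG(value) FROM reading)
    """
).fetchall()
```

The same statements can be typed into the SQLite command-line interface or MySQL Workbench; the Python wrapper is used here only to keep the example self-contained.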
Project Milestones: Submit a set of 5 slides for your group’s final project proposal presentation:
- Title of the project, and its members
- Motivation for your project
- Specific database problem you are aiming to solve and expected goals
- Proposed solution (and how it draws from concepts learned in this course)
- Remaining questions for your team about how to complete the project
6.7 Lecture 7: Database Design
Here are the topics covered, along with the learning objectives for each one:
- Principles of database design: Be able to explain how relational databases are designed and connect the theory behind it to earlier concepts such as Entity Relationship Diagrams and Relational Algebra.
- Properties and normal forms: Be able to explain how the properties of the data lead to certain dependencies that allow for it to be structured. Understand how different structures lead to different design choices named “normal forms”. Be able to decompose a relation up to its Boyce-Codd Normal Form (BCNF), when possible.
- Functional dependencies: Be able to derive functional dependencies given properties of the data, and draw these dependencies in a diagram.
- Keys and closures: Understand the concept of closures and keys of relations.
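The closure of a set of attributes under a set of functional dependencies can be computed by repeatedly applying every dependency whose left-hand side is already covered. The Python sketch below shows this standard procedure on a made-up relation R(A, B, C, D) with dependencies A → B and B → C. The function names and the representation of dependencies as pairs of sets are choices made for this example.

```python
def closure(attributes, fds):
    """Return the closure of `attributes` under the functional dependencies.

    `fds` is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already determine the whole left-hand side,
            # we also determine the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attributes, all_attributes, fds):
    """A set of attributes is a superkey if its closure is the whole relation."""
    return closure(attributes, fds) == set(all_attributes)


relation = {"A", "B", "C", "D"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

closure_a = closure({"A"}, fds)                  # A determines B, and B determines C
ad_is_key = is_superkey({"A", "D"}, relation, fds)
a_is_key = is_superkey({"A"}, relation, fds)     # D is never reached from A alone
```

In this example B → C violates BCNF because the closure of {B} does not cover the whole relation, which is the kind of check that drives a BCNF decomposition.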
Project Milestones: Meet with your team to agree on a set of milestones and deliverables that are needed to complete the final project. This will be required as part of Assignment 4.
6.8 Lecture 8: Data Representation and Compression
Here are the topics covered, along with the learning objectives for each one:
- Data Representation: Be able to explain why different choices of how to represent (or compress) data/information offer tradeoffs that we should be aware of.
- Filesystems and ASCII encoding: Be able to predict the size of a simple ASCII text file based on its content.
- Types of data compression: Be able to describe the differences between lossless and lossy compression schemes.
- Frequency Domain Representation of Data: Understand the benefits of frequency domain representations for certain types of data.
- Huffman coding: Be able to apply Huffman coding to an arbitrary text string.
- JPEG and other applications: Be aware of the compression algorithms used for JPEG and in other domains. Are LLMs a compression algorithm?
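As a worked illustration of the ASCII-size and Huffman coding objectives above, the following Python sketch builds a Huffman code for a short string and compares the encoded length with the 8 bits per character of a plain ASCII file. The function name `huffman_codes` and the example string are hypothetical, and the exact bit patterns depend on how ties between equal frequencies are broken.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each symbol in `text` to its Huffman bit string."""
    if not text:
        return {}
    freq = Counter(text)
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {sym: "0" for sym in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees.
        w1, _, codes1 = heapq.heappop(heap)
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "abracadabra"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)

ascii_bits = 8 * len(text.encode("ascii"))  # 11 characters -> 88 bits
huffman_bits = len(encoded)                 # 23 bits for this string
```

The comparison ignores the space needed to store the code table itself, which matters for short strings like this one.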
Project Milestones: Schedule a meeting with the instructor for any clarifications needed before you go on Thanksgiving break.
6.9 Lecture 9: Database Design Practicum
Here are the topics covered, along with the learning objectives for each one:
- ER Diagrams in Practice: Be able to take a description of a dataset and convert it into an E/R model of it.
- The Relational Model: Be able to translate the E/R diagram to a relational database schema.
- Implementing a Relational Database: Be able to write an SQL script to implement the schema in SQLite.
- Data Ingestion: Be able to populate an existing database with data contained in CSV files using SQL statements.
- CRUD Using SQL: Be able to create, read, update and delete (CRUD) data in the SQLite database using SQL.
- CRUD using a Web Framework: Understand that SQL statements (and the database schema under it) can be abstracted into higher-level structures such as a RESTful API, or a web application.
- CRUD using Natural Language: Understand that higher levels of abstraction are possible, particularly through the use of LLMs and their ability to parse natural language.
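The ingestion and CRUD steps above can be sketched in a few lines of Python with the standard `csv` and `sqlite3` modules. In this example an in-memory database and an in-memory CSV string stand in for real files, and the `sensor` table and its rows are invented. It is a minimal sketch, not the practicum's actual dataset or schema.

```python
import csv
import io
import sqlite3

# Stand-in for the contents of a CSV file on disk.
csv_text = "sensor_id,location\n1,bridge\n2,tunnel\n3,dam\n"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor (sensor_id INTEGER PRIMARY KEY, location TEXT NOT NULL)"
)

# Create: load the CSV rows with parameterized INSERT statements.
rows = [
    (int(r["sensor_id"]), r["location"])
    for r in csv.DictReader(io.StringIO(csv_text))
]
conn.executemany("INSERT INTO sensor (sensor_id, location) VALUES (?, ?)", rows)

# Read
locations = [
    loc for (loc,) in conn.execute("SELECT location FROM sensor ORDER BY sensor_id")
]

# Update
conn.execute("UPDATE sensor SET location = ? WHERE sensor_id = ?", ("viaduct", 1))

# Delete
conn.execute("DELETE FROM sensor WHERE sensor_id = ?", (3,))
conn.commit()

remaining = conn.execute(
    "SELECT sensor_id, location FROM sensor ORDER BY sensor_id"
).fetchall()
```

Using `?` placeholders rather than string formatting keeps values separate from the SQL text, which is also how a web framework or API layer would typically pass user input to the database.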
Project Milestones: Define the structure of your final report, and set milestones for completing each section.