6 Module 5: The Relational Model

6.1 Module Overview

6.1.1 Learning Objectives

By the end of this module, students will be able to:

Understand the motivation behind the development of the relational model, and be able to provide a definition of it in your own words
Describe the high-level connection between set theory (from Module 4) and the relational model
Identify and relate the following concepts: relations, attributes, tuples and types; as well as connect them to their corresponding analogous terms: tables, columns, rows and domain
Describe the meaning of key attributes and understand the difference between a schema and an instance of a database
Understand the steps of relational database creation: design (using a DDL), initial data load, and query execution
Identify the function of a query and how natural language queries for relational databases can be formalized with relational algebra
Understand the link between relational algebra and SQL, as well as the fact that queries return relations
Apply and reason through the application of the following relational algebra operators and their compositions: select, project, cross product, natural join, theta join, union, difference, intersection, rename
Translate E/R diagrams to relational models of databases

6.1.2 Topics Covered

Introduction to the Relational Model
- Historical context and motivation
- Definition of the relational model
- Connection to set theory from Module 4
Relational Model Components
- Relations, attributes, tuples, types
- Analogous terms: tables, columns, rows, domains
- Key attributes and constraints
- Schema vs. instance
Relational Database Creation Process
- Database design using DDL (Data Definition Language)
- Initial data loading
- Query execution
Queries and Query Languages
- Purpose and function of queries
- Formalizing natural language queries with relational algebra
- Link between relational algebra and SQL
- Query results as relations
Relational Algebra Operators
- Select (\(\sigma\))
- Project (\(\pi\))
- Cross Product (\(\times\))
- Natural Join (\(\bowtie\))
- Theta Join (\(\bowtie_{\theta}\))
- Union (\(\cup\))
- Difference (\(-\))
- Intersection (\(\cap\))
- Rename (\(\rho\))
- Composing operators for complex queries
From ER Diagrams to Relational Models
- Converting entity sets to relations
- Translating relationships (1:1, 1:n, n:m)
- Handling weak entity sets
- Representing ISA hierarchies
- Complete example: Bridge monitoring system from Module 4

6.1.3 Project Milestones

Meet with your teammates to discuss final project ideas and select a winning one.

6.2 Lecture Notes

6.2.1 From ER Diagrams to Actual Databases

In Module 4, we learned two fundamental concepts: Entity Relationship (ER) diagrams for visualizing database structure, and set theory for understanding the mathematical foundations of data. ER diagrams helped us answer questions like “What entities exist in our system?” and “How are they related?” Set theory gave us the mathematical language to describe collections of data and operations on them.

But ER diagrams are design tools—they help us think through and communicate what our database should look like. They don’t actually store data or answer queries. To build a working database system, we need to translate our ER diagram design into a concrete implementation. This is where the relational model comes in.

The relational model provides the foundation for nearly all modern database systems. It gives us:

A precise way to represent data (as relations/tables)
A formal language to query data (relational algebra)
Rules for translating ER diagrams into actual database structures

An important insight: Any database design—and the requirements for it—ultimately comes from envisioned use cases. In other words, understanding how users will interact with your database (what questions they’ll ask, what queries they’ll run) is the most important starting point for design. You might have a perfect ER diagram, but if it doesn’t support the queries your users need to run efficiently, it’s not serving its purpose.

As we progress through this module, keep in mind this design principle: the structure of your database should be driven by the queries you anticipate running. The relational model gives us both the structure (how to organize data in tables) and the query language (relational algebra) to make this happen.

6.2.2 Introduction to the Relational Model

The relational model is used by almost all major commercial database systems today—from PostgreSQL and MySQL to Oracle and Microsoft SQL Server. Despite being over 40 years old, it remains the dominant approach for organizing and querying data.

At its core, a Database Management System (DBMS) provides efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data. The relational model is the theoretical foundation that makes this possible, combined with high-level query languages that are both simple to use and expressive enough to handle complex data needs.

6.2.2.1 What is the Relational Model?

The relational model organizes data using a few simple but powerful concepts:

Database = a set of named relations (also called tables)
Each relation has a set of named attributes (also called columns)
Each tuple (also called row) contains a value for each attribute
Each attribute has a type (also called domain) that constrains what values it can hold

Additionally:

A key is an attribute (or set of attributes) whose value uniquely identifies each tuple in the relation
NULL is a special value representing “unknown” or “undefined”

Two critical concepts distinguish the structure from the data itself:

Schema = the structural description of relations in the database (the set of relations, their attributes, types, and constraints)
Instance = the actual contents of the database at a given point in time (the actual data)

The schema is like a blueprint—it describes what the database looks like. The instance is the current state—the actual rows of data stored at this moment.

6.2.2.2 Historical Context and Motivation

The relational model was introduced by Edgar F. Codd in 1970 in his seminal paper “A Relational Model of Data for Large Shared Data Banks.” Before Codd’s work, databases were navigational—you had to write code that explicitly navigated through data structures using pointers and links, similar to traversing linked lists or trees in programming.

Note

I actually had not gone to the paper itself before, but in confirming what I got in these notes from Claude, I went to the source (so that I could include the link to the URL) today. The foresight of this 55 year old paper is quite remarkable. With the relational algebra we went over today, you might be able to understand the whole paper.

Codd’s revolutionary insight was that data could be organized as simple tables (relations) and queried using high-level, declarative languages based on mathematical logic. Instead of telling the computer how to find data (navigate this pointer, follow that link), you simply describe what you want, and the DBMS figures out how to retrieve it efficiently.

This abstraction dramatically simplified database programming and made databases accessible to non-programmers. It’s why SQL (Structured Query Language), which implements Codd’s relational algebra, became ubiquitous.

6.2.2.3 Connection to Set Theory

The relational model is fundamentally grounded in set theory, which we studied in Module 4. This connection isn’t coincidental—Codd built the relational model on the mathematical foundation of sets.

Here’s how the concepts map:

A relation is a set of tuples (no duplicate rows, unordered)
Relational algebra operations (select, project, join, union, etc.) are set operations
The Cartesian product from set theory becomes the cross product in relational algebra
Union, intersection, and difference work exactly as they do in set theory

This mathematical foundation gives the relational model:

Precision: Operations are well-defined mathematically
Compositionality: Query results are relations, so you can nest queries
Optimization: Mathematical properties allow DBMSs to rewrite queries for efficiency

Throughout this module, you’ll see how the set operations you learned in Module 4 translate directly into relational algebra operators for querying databases.

6.2.3 Relational Model Components

Now let’s examine the core components of the relational model in detail. We’ll use examples from the bridge monitoring system we designed in Module 4 to make these concepts concrete.

6.2.3.1 Relations, Tuples, and Attributes

At the mathematical core, a relation is a set of tuples that share the same structure. Each tuple is an ordered collection of values, and each value corresponds to an attribute of the relation.

Formal definition: If we have attributes \(A_1, A_2, \ldots, A_n\) with corresponding domains (sets of allowed values) \(D_1, D_2, \ldots, D_n\), then a relation \(R\) is a subset of the Cartesian product:

\[R \subseteq D_1 \times D_2 \times \cdots \times D_n\]

In simpler terms: a relation is a set of n-tuples, where each tuple has the same structure—values for the same set of attributes.

Example: Consider a Bridge relation with attributes (Bridge_ID, Name, Year_Built, Length). A tuple in this relation might be:

\[(B001, \text{"Roberto Clemente Bridge"}, 1928, 244)\]

This tuple represents one bridge in our monitoring system.

Key properties of relations:

Tuples are unordered: There’s no “first” or “last” tuple (they’re elements of a set)
No duplicate tuples: Each tuple in a relation must be unique (set property)
Attributes are atomic: Each attribute value is indivisible (no nested structures)

6.2.3.2 Tables, Rows, and Columns: The Common Terminology

While “relation,” “tuple,” and “attribute” are the formal mathematical terms, database practitioners use more intuitive terminology:

Mathematical Term	Database Term	What It Means
Relation	Table	A collection of related data
Tuple	Row	A single record in the table
Attribute	Column	A property or field of the records

Example: Here’s a Bridge table visualized:

Bridge_ID	Name	Year_Built	Length
B001	Roberto Clemente Bridge	1928	244
B002	Fort Pitt Bridge	1959	1201
B003	Fort Duquesne Bridge	1969	1201

In this table:

Each row is a tuple representing one bridge
Each column is an attribute (property) of bridges
The entire table is the Bridge relation

Throughout this course, we’ll use both sets of terms interchangeably, but remember they refer to the same concepts.

6.2.3.3 Domains and Types

Every attribute has a domain (or type)—the set of all possible values that attribute can take.

Examples of domains:

Bridge_ID: String of format “B” followed by 3 digits (e.g., “B001”, “B002”)
Year_Built: Integer between 1800 and current year
Length: Positive real number (meters)
Sensor_Type: Enumeration from {“accelerometer”, “strain_gauge”, “temperature”, “displacement”}

Domains serve several purposes:

Data integrity: Prevent nonsensical values (e.g., negative lengths, future construction years)
Storage optimization: DBMS can use appropriate storage formats (integers vs. strings vs. floats)
Query validation: Catch errors at query time (e.g., comparing incompatible types)

In practice, database systems provide built-in types (INTEGER, VARCHAR, DATE, FLOAT, etc.) and allow you to define constraints that further restrict domains.

6.2.3.4 Keys and Constraints

A key is an attribute (or set of attributes) that uniquely identifies each tuple in a relation. No two tuples can have the same values for all key attributes.

Example: In our Bridge relation, Bridge_ID is a key—each bridge has a unique ID:

Bridge_ID (KEY)	Name	Year_Built	Length
B001	Roberto Clemente Bridge	1928	244
B002	Fort Pitt Bridge	1959	1201
B003	Fort Duquesne Bridge	1969	1201

Notice that two bridges can have the same Length or even the same Name, but they must have different Bridge_ID values.

6.2.3.5 Schema vs. Instance

These two concepts are fundamental to understanding databases:

Schema = The structure or blueprint of the database

The schema defines:

What relations exist
What attributes each relation has
The types/domains of each attribute
What keys and constraints apply

The schema is relatively stable—it changes only when you redesign the database structure.

Instance = The actual data in the database at a specific moment

The instance contains:

The actual tuples (rows) in each relation
The specific values stored

The instance changes frequently—every time you insert, update, or delete data.

Example:

Bridge Schema:

Bridge(Bridge_ID: STRING, Name: STRING, Year_Built: INTEGER, Length: REAL)
  PRIMARY KEY: Bridge_ID
  CONSTRAINT: Year_Built > 1800

Bridge Instance (October 2024):

Bridge_ID	Name	Year_Built	Length
B001	Roberto Clemente Bridge	1928	244
B002	Fort Pitt Bridge	1959	1201
B003	Fort Duquesne Bridge	1969	1201

Bridge Instance (November 2024, after adding a new bridge):

Bridge_ID	Name	Year_Built	Length
B001	Roberto Clemente Bridge	1928	244
B002	Fort Pitt Bridge	1959	1201
B003	Fort Duquesne Bridge	1969	1201
B004	Smithfield Street Bridge	1883	361

The schema remained the same—only the instance changed.

Key insight: When we design databases (including when we translate ER diagrams to relational schemas), we’re defining the schema. When users interact with databases (querying, inserting, updating data), they’re working with the instance.

6.2.4 Relational Database Creation Process

Once you’ve designed your database schema (often starting from an ER diagram), you need to actually create and populate the database. This process typically involves three main steps: defining the structure using DDL, loading initial data, and executing queries. Let’s examine each step.

6.2.4.1 Database Design with DDL

DDL (Data Definition Language) is the subset of database commands used to define and modify the structure of your database—creating tables, specifying attributes, defining keys and constraints.

Here’s an example of creating our Bridge table using SQL DDL syntax (note: while we’re focusing on relational algebra in this module, seeing a bit of DDL helps understand the connection between abstract schemas and concrete databases):

CREATE TABLE Bridge (
    Bridge_ID VARCHAR(10) PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Year_Built INTEGER CHECK (Year_Built > 1800)
);

This DDL statement:

Creates a table named Bridge
Defines four attributes with their types
Specifies Bridge_ID as the primary key
Adds NOT NULL constraint (Name must have a value)

Some Key DDL operations:

CREATE TABLE: Define a new relation
ALTER TABLE: Modify an existing table structure
DROP TABLE: Delete a table

6.2.4.2 Data Loading

After defining the schema, you need to populate the database with actual data. This is called data loading or data insertion.

Example: Loading bridges into our system:

INSERT INTO Bridge (Bridge_ID, Name, Year_Built, Length)
VALUES ('B001', 'Roberto Clemente Bridge', 1928, 244);

INSERT INTO Bridge (Bridge_ID, Name, Year_Built, Length)
VALUES ('B002', 'Fort Pitt Bridge', 1959, 1201);

INSERT INTO Bridge (Bridge_ID, Name, Year_Built, Length)
VALUES ('B003', 'Fort Duquesne Bridge', 1969, 1201);

For large datasets, DBMSs typically provide bulk loading utilities that can read data from CSV files or other formats:

COPY Bridge FROM '/path/to/bridges.csv'
WITH (FORMAT CSV, HEADER TRUE);

Data loading considerations:

Constraint validation: The DBMS checks all constraints during insertion (keys, but also other constraints we haven’t seen yet such as foreign keys, CHECK constraints)
Transaction handling: Large data loads are often wrapped in transactions for consistency
Performance: Bulk loading is much faster than individual inserts

6.2.4.3 Query Execution

Once your database has structure (schema) and data (instance), users can query it—asking questions and retrieving information.

Example natural language query: “Find all bridges built before 1950”

This would be expressed in SQL as:

SELECT * FROM Bridge
WHERE Year_Built < 1950;

And would return:

Bridge_ID	Name	Year_Built	Length
B001	Roberto Clemente Bridge	1928	244
B004	Smithfield Street Bridge	1883	361

What happens during query execution?

Parsing: The DBMS parses the query to check syntax
Validation: Checks that tables and columns exist, types match, etc.
Optimization: The DBMS creates an execution plan (how to actually retrieve the data efficiently)
Execution: The plan is executed, accessing the stored data
Result: The result is returned as a new relation (table)

Key insight: The relational model’s power comes from declarative queries—you specify what you want, not how to get it. The DBMS handles the “how” through query optimization. This is fundamentally different from navigational databases where you had to specify the exact path through data structures.

In the next sections, we’ll learn relational algebra—the formal mathematical language that underlies these queries and allows us to reason about what queries compute.

6.2.5 Queries and Query Languages

Now that we understand how databases are structured and created, let’s focus on how we actually use them—by asking questions and retrieving information through queries.

A query is a request for information from a database. Some queries are easy to pose (“Find all bridges”), while others are more complex (“Find all sensors on bridges built before 1950 that have recorded readings above threshold in the past month”). Similarly, some queries are easy for a DBMS to execute efficiently, while others require significant computation.

An important note: the term “query language” is somewhat misleading—these languages are also used to modify data (inserting, updating, deleting), not just retrieve it. But historically, they’re called query languages because retrieval was their primary purpose.

A fundamental property: In the relational model, queries return relations. This property is called closure or compositionality—the result of a query is itself a relation, which means you can use it as input to another query. This enables building complex queries from simpler ones.

6.2.5.1 What is a Query?

At the most basic level, a query expresses an information need:

Natural language: “Show me all bridges built before 1950”
Formal language (relational algebra): \(\sigma_{\text{Year\_Built} < 1950}(\text{Bridge})\)
SQL: SELECT * FROM Bridge WHERE Year_Built < 1950

All three express the same information need, but with different levels of precision and formality.

Queries typically involve:

Selection: Choosing which rows satisfy certain conditions
Projection: Choosing which columns to include in results
Combination: Joining data from multiple tables
Aggregation: Computing summary statistics (count, average, etc.)

The power of declarative query languages is that you specify what you want, not how to get it. The DBMS figures out an efficient execution strategy.

6.2.5.2 Relational Algebra as a Query Language

Relational algebra is the formal mathematical language for expressing queries on relational databases. It consists of a set of operators that take one or more relations as input and produce a new relation as output.

Relational algebra serves several important purposes:

Formal foundation: Provides a precise mathematical definition of what queries compute
Query optimization: DBMSs use relational algebra internally to reason about and optimize queries
Conceptual tool: Helps us understand what operations are possible and how they compose

The core relational algebra operators are:

Select (\(\sigma\)): Picks certain rows based on a condition
Project (\(\pi\)): Picks certain columns
Cross Product (\(\times\)): Combines all rows from two tables
Natural Join (\(\bowtie\)): Combines rows from two tables based on common attributes
Union (\(\cup\)): Combines rows from two tables (with same schema)
Difference (\(-\)): Rows in first table but not in second
Intersection (\(\cap\)): Rows in both tables
Rename (\(\rho\)): Changes relation or attribute names

Notice how these connect to the set operations from Module 4! Union, difference, and intersection work exactly as they did with sets. The relational-specific operators (select, project, join) extend set theory for working with structured data.

We’ll explore each operator in detail in the next section.

6.2.5.3 The Link to SQL

SQL (Structured Query Language) is the practical implementation of relational algebra used by virtually all commercial database systems. While relational algebra is the mathematical foundation, SQL is what you actually write when working with databases.

Correspondence between relational algebra and SQL:

Relational Algebra	SQL Equivalent
\(\sigma_{\text{condition}}(R)\)	`SELECT * FROM R WHERE condition`
\(\pi_{\text{A,B}}(R)\)	`SELECT A, B FROM R`
\(R \times S\)	`SELECT * FROM R, S` (or `R CROSS JOIN S`)
\(R \bowtie S\)	`SELECT * FROM R NATURAL JOIN S`
\(R \cup S\)	`SELECT * FROM R UNION SELECT * FROM S`

Key differences:

Duplicates: Relational algebra is based on sets (no duplicates). SQL is based on multisets/bags (allows duplicates unless you use DISTINCT)
NULL values: SQL includes NULL for unknown/undefined values; pure relational algebra doesn’t
Syntax: SQL is more verbose but arguably more readable for complex queries
Additional features: SQL includes many features beyond basic relational algebra (aggregation, grouping, subqueries, etc.)

In this module, we’re focusing on relational algebra because it provides the clean mathematical foundation. Understanding relational algebra makes learning SQL straightforward—you’ll already understand what operations mean, and SQL syntax is just a different notation.

Example: Let’s see the same query in both notations.

Query: “Find the names of bridges built before 1950”

Relational Algebra: \[\pi_{\text{Name}}(\sigma_{\text{Year\_Built} < 1950}(\text{Bridge}))\]

SQL:

SELECT Name
FROM Bridge
WHERE Year_Built < 1950;

Both express: first filter bridges to those built before 1950 (select/WHERE), then extract just the Name column (project/SELECT Name).

6.2.6 Relational Algebra Operators

Now we’ll examine each relational algebra operator in detail. For each operator, we’ll provide a formal definition, explain what it does intuitively, and show examples using our bridge monitoring system.

Throughout this section, we’ll use these sample relations:

Bridge:

Bridge_ID	Name	Year_Built	Length
B001	Roberto Clemente Bridge	1928	244
B002	Fort Pitt Bridge	1959	1201
B003	Fort Duquesne Bridge	1969	1201
B004	Smithfield Street Bridge	1883	361

Sensor:

Sensor_ID	Type	Bridge_ID	Install_Date
S001	accelerometer	B001	2020
S002	strain_gauge	B001	2020
S003	temperature	B002	2019
S004	accelerometer	B003	2021

6.2.6.1 Select (\(\sigma\))

The select operator picks certain rows from a relation based on a condition. It filters the relation to include only tuples that satisfy the specified predicate.

Notation: \(\sigma_{\text{condition}}(R)\)

Definition: \(\sigma_{\text{condition}}(R) = \{t \mid t \in R \text{ and condition}(t) \text{ is true}\}\)

In words: Select returns all tuples \(t\) from relation \(R\) where the condition evaluates to true.

Example 1: Find all bridges built before 1950

\[\sigma_{\text{Year\_Built} < 1950}(\text{Bridge})\]

Result:

Bridge_ID	Name	Year_Built	Length
B001	Roberto Clemente Bridge	1928	244
B004	Smithfield Street Bridge	1883	361

Example 2: Find all accelerometer sensors

\[\sigma_{\text{Type} = \text{'accelerometer'}}(\text{Sensor})\]

Result:

Sensor_ID	Type	Bridge_ID	Install_Date
S001	accelerometer	B001	2020
S004	accelerometer	B003	2021

Key points:

Select operates on rows (horizontal filtering)
The condition can involve comparisons (\(=\), \(<\), \(>\), \(\leq\), \(\geq\), \(\neq\)), logical operators (\(\land\), \(\lor\), \(\neg\)), and arithmetic
The result schema is identical to the input schema—only rows are filtered

6.2.6.2 Project (\(\pi\))

The project operator picks certain columns from a relation. It extracts a subset of attributes while discarding the others.

Notation: \(\pi_{\text{A}_1, \text{A}_2, \ldots, \text{A}_k}(R)\)

Definition: \(\pi_{\text{A}_1, \ldots, \text{A}_k}(R) = \{(t.\text{A}_1, \ldots, t.\text{A}_k) \mid t \in R\}\)

In words: Project returns tuples containing only the specified attributes from \(R\).

Example 1: Get just the names and years of bridges

\[\pi_{\text{Name, Year\_Built}}(\text{Bridge})\]

Result:

Name	Year_Built
Roberto Clemente Bridge	1928
Fort Pitt Bridge	1959
Fort Duquesne Bridge	1969
Smithfield Street Bridge	1883

Example 2: Get sensor types

\[\pi_{\text{Type}}(\text{Sensor})\]

Result:

Type
accelerometer
strain_gauge
temperature

Key points:

Project operates on columns (vertical filtering)
Duplicates are always eliminated in relational algebra (since results are sets). Notice in Example 2 we only get one “accelerometer” even though there are two accelerometer sensors
SQL behaves differently: it keeps duplicates unless you use DISTINCT. In SQL, SELECT Type FROM Sensor would return “accelerometer” twice, while SELECT DISTINCT Type FROM Sensor matches the relational algebra behavior
The result schema contains only the projected attributes

6.2.6.3 Cross Product (\(\times\))

The cross product (also called Cartesian product) combines two relations by pairing each tuple from the first relation with every tuple from the second relation.

Notation: \(R \times S\)

Definition: \(R \times S = \{(r, s) \mid r \in R \text{ and } s \in S\}\)

In words: The cross product returns all possible pairs of tuples, one from \(R\) and one from \(S\).

Example: Consider smaller relations for clarity:

SmallBridge:

Bridge_ID	Name
B001	Roberto Clemente
B002	Fort Pitt

SmallSensor:

Sensor_ID	Type
S001	accelerometer
S002	strain_gauge

\[\text{SmallBridge} \times \text{SmallSensor}\]

Result:

Bridge_ID	Name	Sensor_ID	Type
B001	Roberto Clemente	S001	accelerometer
B001	Roberto Clemente	S002	strain_gauge
B002	Fort Pitt	S001	accelerometer
B002	Fort Pitt	S002	strain_gauge

Key points:

If \(R\) has \(m\) tuples and \(S\) has \(n\) tuples, then \(R \times S\) has \(m \times n\) tuples
Cross product by itself is rarely useful—it generates all possible combinations, including nonsensical ones (e.g., Bridge B001 paired with a sensor actually on Bridge B002)
It becomes useful when combined with select to filter for meaningful pairings (this is essentially what a join does)
The result schema includes all attributes from both relations

6.2.6.4 Natural Join (\(\bowtie\))

The natural join combines two relations by matching tuples that have equal values for all attributes with the same name. It’s like a cross product followed by filtering for equality on common attributes, then eliminating duplicate columns.

Notation: \(R \bowtie S\)

Definition: Natural join automatically:

Finds all attributes that appear in both \(R\) and \(S\)
Keeps only tuple pairs where these common attributes have equal values
Eliminates one copy of the duplicate attributes

Example: Join Bridge and Sensor on their common attribute Bridge_ID

\[\text{Bridge} \bowtie \text{Sensor}\]

This implicitly enforces Bridge.Bridge_ID = Sensor.Bridge_ID and removes the duplicate Bridge_ID column.

Result:

Bridge_ID	Name	Year_Built	Length	Sensor_ID	Type	Install_Date
B001	Roberto Clemente Bridge	1928	244	S001	accelerometer	2020
B001	Roberto Clemente Bridge	1928	244	S002	strain_gauge	2020
B002	Fort Pitt Bridge	1959	1201	S003	temperature	2019
B003	Fort Duquesne Bridge	1969	1201	S004	accelerometer	2021

Notice Bridge B004 doesn’t appear (no sensors are installed on it).

Formal definition using other operators:

Natural join can be defined in terms of cross product, select, and project. If \(R\) and \(S\) share attributes \(A_1, \ldots, A_k\):

\[R \bowtie S = \pi_{\text{all attributes, removing duplicates}}(\sigma_{R.A_1 = S.A_1 \land \cdots \land R.A_k = S.A_k}(R \times S))\]

Key points:

Automatically finds common attributes—no need to specify join condition
Very convenient when relations are designed with consistent naming
Eliminates one copy of join attributes (otherwise you’d have Bridge_ID appearing twice)
Result includes only tuples with matching values—unmatched tuples are excluded

6.2.6.5 Theta Join (\(\bowtie_\theta\))

The theta join (or just “join”) is the most general form of join. It combines tuples from two relations based on an arbitrary condition \(\theta\).

Notation: \(R \bowtie_\theta S\)

Definition: \(R \bowtie_\theta S = \sigma_\theta(R \times S)\)

In words: Theta join is a cross product followed by a select with condition \(\theta\).

Example 1: Find pairs of bridges and sensors where the sensor was installed after the bridge was built

\[\text{Bridge} \bowtie_{\text{Year\_Built} < \text{Install\_Date}} \text{Sensor}\]

This keeps pairs where the bridge’s construction year is before the sensor’s installation year.

Result:

Bridge_ID	Name	Year_Built	Length	Sensor_ID	Type	Bridge_ID	Install_Date
B001	Roberto Clemente Bridge	1928	244	S001	accelerometer	B001	2020
B001	Roberto Clemente Bridge	1928	244	S002	strain_gauge	B001	2020
B001	Roberto Clemente Bridge	1928	244	S003	temperature	B002	2019
…	…	…	…	…	…	…	…

(Note: This would include all combinations where Year_Built < Install_Date, which includes many nonsensical pairings)

Example 2: A more practical theta join—finding which sensors are on which bridges:

\[\text{Bridge} \bowtie_{\text{Bridge.Bridge\_ID} = \text{Sensor.Bridge\_ID}} \text{Sensor}\]

This is actually equivalent to natural join when the condition is equality on same-named attributes.

Key points:

Theta join is the basic operation implemented in DBMSs—the term “join” in database systems often means theta join
The condition \(\theta\) can be any boolean expression involving attributes from both relations
Natural join is a special case of theta join (where \(\theta\) is equality on all common attributes)
Unlike natural join, theta join does not eliminate duplicate columns

6.2.6.6 Union (\(\cup\))

The union operator combines tuples from two relations vertically, producing a single relation containing all tuples that appear in either input relation (or both).

Notation: \(R \cup S\)

Definition: \(R \cup S = \{t \mid t \in R \text{ or } t \in S\}\)

Requirement: \(R\) and \(S\) must be union-compatible—they must have the same number of attributes, and corresponding attributes must have compatible types. Formally, \(R\) and \(S\) must have the same schema.

Example: Suppose we have two separate monitoring systems with their own Bridge relations:

PittsburghBridges:

Bridge_ID	Name
B001	Roberto Clemente
B002	Fort Pitt

AlleghenyBridges:

Bridge_ID	Name
B002	Fort Pitt
B003	Fort Duquesne

\[\text{PittsburghBridges} \cup \text{AlleghenyBridges}\]

Result:

Bridge_ID	Name
B001	Roberto Clemente
B002	Fort Pitt
B003	Fort Duquesne

Notice Fort Pitt (B002) appears only once—duplicates are eliminated (set property).

Key points:

Union combines tuples vertically (adding rows)
Both relations must have identical schemas
Duplicates are eliminated (set-based operation)
This is the same union operation from set theory in Module 4

6.2.6.7 Difference (\(-\))

The difference operator returns tuples that are in the first relation but not in the second relation.

Notation: \(R - S\)

Definition: \(R - S = \{t \mid t \in R \text{ and } t \notin S\}\)

Requirement: Like union, \(R\) and \(S\) must be union-compatible (same schema).

Example: Using the same bridge relations:

\[\text{PittsburghBridges} - \text{AlleghenyBridges}\]

Result:

Bridge_ID	Name
B001	Roberto Clemente

Only Roberto Clemente appears because it’s in PittsburghBridges but not in AlleghenyBridges.

Example 2: Find bridges with no sensors

\[\pi_{\text{Bridge\_ID}}(\text{Bridge}) - \pi_{\text{Bridge\_ID}}(\text{Sensor})\]

This gets all bridge IDs from Bridge, then removes those that appear in Sensor, leaving only bridges without sensors.

Result:

Bridge_ID
B004

Key points:

Difference is not commutative: \(R - S \neq S - R\) (in general)
Both relations must have identical schemas
Useful for finding what’s in one relation but not another
Same as set difference from Module 4

6.2.6.8 Intersection (\(\cap\))

The intersection operator returns tuples that appear in both input relations.

Notation: \(R \cap S\)

Definition: \(R \cap S = \{t \mid t \in R \text{ and } t \in S\}\)

Requirement: \(R\) and \(S\) must be union-compatible (same schema).

Example:

\[\text{PittsburghBridges} \cap \text{AlleghenyBridges}\]

Result:

Bridge_ID	Name
B002	Fort Pitt

Only Fort Pitt appears because it’s in both relations.

Key points:

Intersection finds common elements
Both relations must have identical schemas
Can be expressed using difference: \(R \cap S = R - (R - S)\)
Same as set intersection from Module 4

6.2.6.9 Rename (\(\rho\))

The rename operator changes the name of a relation and/or its attributes. This is useful for disambiguation and for making schemas compatible for set operations.

Notation:

\(\rho_{S}(R)\) — rename relation \(R\) to \(S\)
\(\rho_{S(A_1, \ldots, A_n)}(R)\) — rename relation to \(S\) and attributes to \(A_1, \ldots, A_n\)

Use cases:

1. Unifying schemas for set operations

Suppose we have:

OldBridges(ID, BridgeName) and NewBridges(Identifier, Name)

They have different schemas, but we want to take their union. We can rename to make them compatible:

\[\rho_{\text{Bridges(Bridge\_ID, Name)}}(\text{OldBridges}) \cup \rho_{\text{Bridges(Bridge\_ID, Name)}}(\text{NewBridges})\]

2. Disambiguation in self-joins

A self-join is when a relation is joined with itself. We need to rename one copy to distinguish them.

Example: Find pairs of bridges built in the same year

\[\sigma_{\text{B1.Year\_Built} = \text{B2.Year\_Built} \land \text{B1.Bridge\_ID} < \text{B2.Bridge\_ID}}(\rho_{\text{B1}}(\text{Bridge}) \times \rho_{\text{B2}}(\text{Bridge}))\]

Here we rename Bridge to B1 and B2 so we can refer to them separately.

Alternate notation—Assignment statements:

Instead of deeply nested expressions, we can use assignment statements (a sequence of instructions):

B1 := Bridge
B2 := Bridge
SameYear := σ(B1.Year_Built = B2.Year_Built AND B1.Bridge_ID < B2.Bridge_ID)(B1 × B2)
Result := π(B1.Name, B2.Name, B1.Year_Built)(SameYear)

This is more readable for complex queries—each step is named and can be referenced later.

Key points:

Rename is essential for self-joins (comparing a relation to itself)
Useful for making schemas compatible for set operations
Assignment notation provides an algorithmic way to express complex queries

6.2.6.10 Composing Operators for Complex Queries

The real power of relational algebra comes from composing operators—building complex queries by combining simpler operations. Because each operator returns a relation, the output of one operation can be the input to another.

Example: “Find the names of bridges that have accelerometer sensors”

We need to:

Filter sensors to just accelerometers
Join with bridges to get bridge information
Project to get just bridge names

Step-by-step:

\(\text{AccelSensors} := \sigma_{\text{Type} = \text{'accelerometer'}}(\text{Sensor})\)
\(\text{BridgesWithAccel} := \text{Bridge} \bowtie \text{AccelSensors}\)
\(\text{Result} := \pi_{\text{Name}}(\text{BridgesWithAccel})\)

As a single expression:

\[\pi_{\text{Name}}(\text{Bridge} \bowtie \sigma_{\text{Type} = \text{'accelerometer'}}(\text{Sensor}))\]

Result:

Name
Roberto Clemente Bridge
Fort Duquesne Bridge

Example 2: “Find bridges built before 1950 that are longer than 300 meters”

\[\sigma_{\text{Length} > 300}(\sigma_{\text{Year\_Built} < 1950}(\text{Bridge}))\]

Or equivalently, using a compound condition:

\[\sigma_{\text{Year\_Built} < 1950 \land \text{Length} > 300}(\text{Bridge})\]

Result:

Bridge_ID	Name	Year_Built	Length
B004	Smithfield Street Bridge	1883	361

Key insight: Relational algebra is compositional—you can build arbitrarily complex queries by chaining operators together. This is the foundation for how DBMSs process queries: they break down complex SQL queries into sequences of relational algebra operations, then optimize the order and method of execution.

6.2.7 From ER Diagrams to Relational Models

Now we bring everything together: translating the ER diagrams from Module 4 into actual relational schemas that can be implemented in a database. This translation process is mechanical—there are specific rules for converting each ER diagram component into relational tables.

6.2.7.1 Translation Rules: Entity Sets to Relations

The most basic translation rule is straightforward: each entity set becomes a relation (table).

Rule: For each entity set \(E\) with attributes \(A_1, A_2, \ldots, A_n\):

Create a relation \(E\) with attributes \(A_1, A_2, \ldots, A_n\)
The key attribute(s) of the entity set become the primary key of the relation

Example: The Bridge entity set from Module 4:

ER Diagram: Bridge entity with attributes (Bridge_ID, Name, Year_Built, Material, Length), where Bridge_ID is the key.

Relational Schema:

Bridge(Bridge_ID, Name, Year_Built, Material, Length)
  PRIMARY KEY: Bridge_ID

As a table:

Bridge_ID	Name	Year_Built	Material	Length
B001	Roberto Clemente Bridge	1928	steel	244
B002	Fort Pitt Bridge	1959	steel	1201

Example 2: The Sensor entity set:

ER Diagram: Sensor entity with attributes (Sensor_ID, Type, Install_Date), where Sensor_ID is the key.

Relational Schema:

Sensor(Sensor_ID, Type, Install_Date)
  PRIMARY KEY: Sensor_ID

Key points:

Each entity becomes a row in the relation
Entity attributes become relation attributes
Key attributes become primary keys
Multi-valued attributes require special handling (we’ll see this shortly)

6.2.7.2 Handling Relationships

Relationships in ER diagrams connect entity sets. How we translate them depends on the multiplicity (1:1, 1:n, or n:m).

Note

Notice that in class (and in the slides) I only idscussed the n:m case, recommending that we create a new table for the relationship always. These notes also consider the special cases of 1:1 and 1:n relationships. While you can still create a new table for those relationships, it may be more efficient to avoid that and instead use foreign keys (or merge the tables) in some situations as discussed by Claude here.

6.2.7.2.1 One-to-One Relationships

For a one-to-one (1:1) relationship between entity sets \(E\) and \(F\):

Option 1: Add a foreign key to one of the relations

Choose either \(E\) or \(F\) (typically the one participating fully in the relationship)
Add \(F\)’s primary key as a foreign key attribute in \(E\) (or vice versa)

Example: Suppose each Bridge has exactly one TaxRecord and each TaxRecord belongs to exactly one Bridge.

ER Diagram: Bridge (1) — Has_Tax_Record — (1) TaxRecord

Relational Schema:

Bridge(Bridge_ID, Name, Year_Built, Length)
  PRIMARY KEY: Bridge_ID

TaxRecord(Tax_ID, Assessment_Value, Last_Updated, Bridge_ID)
  PRIMARY KEY: Tax_ID
  FOREIGN KEY: Bridge_ID REFERENCES Bridge(Bridge_ID)

We added Bridge_ID as a foreign key in TaxRecord.

Option 2: Merge into a single relation

If the relationship is total (every entity participates), you could combine both entity sets into one table
This is less common but can simplify queries

6.2.7.2.2 One-to-Many Relationships

For a one-to-many (1:n) relationship from \(E\) to \(F\) (one \(E\) relates to many \(F\)s):

Rule: Add the primary key of the “one” side as a foreign key in the “many” side.

Example: One Bridge is monitored by many Sensors (from Module 4).

ER Diagram: Bridge (1) — Monitored_By — (n) Sensor

Relational Schema:

Bridge(Bridge_ID, Name, Year_Built, Length)
  PRIMARY KEY: Bridge_ID

Sensor(Sensor_ID, Type, Install_Date, Bridge_ID)
  PRIMARY KEY: Sensor_ID
  FOREIGN KEY: Bridge_ID REFERENCES Bridge(Bridge_ID)

Each sensor has a Bridge_ID foreign key indicating which bridge it’s on.

As tables:

Bridge:

Bridge_ID	Name	Year_Built	Length
B001	Roberto Clemente Bridge	1928	244
B002	Fort Pitt Bridge	1959	1201

Sensor:

Sensor_ID	Type	Install_Date	Bridge_ID
S001	accelerometer	2020	B001
S002	strain_gauge	2020	B001
S003	temperature	2019	B002

Notice multiple sensors can have the same Bridge_ID (many-to-one from Sensor’s perspective).

6.2.7.2.3 Many-to-Many Relationships

For a many-to-many (n:m) relationship between entity sets \(E\) and \(F\):

Rule: Create a new relation (called a junction table or linking table) with:

Primary keys from both \(E\) and \(F\) as foreign keys
The combination of these foreign keys forms the primary key
Any attributes of the relationship itself

Example: Engineers inspect Bridges, and each engineer can inspect multiple bridges, while each bridge can be inspected by multiple engineers.

ER Diagram: Engineer (n) — Inspects — (m) Bridge

The relationship Inspects has attributes: Inspection_Date, Condition_Rating

Relational Schema:

Engineer(Employee_ID, Name, Specialty)
  PRIMARY KEY: Employee_ID

Bridge(Bridge_ID, Name, Year_Built, Length)
  PRIMARY KEY: Bridge_ID

Inspects(Employee_ID, Bridge_ID, Inspection_Date, Condition_Rating)
  PRIMARY KEY: (Employee_ID, Bridge_ID, Inspection_Date)
  FOREIGN KEY: Employee_ID REFERENCES Engineer(Employee_ID)
  FOREIGN KEY: Bridge_ID REFERENCES Bridge(Bridge_ID)

As tables:

Inspects:

Employee_ID	Bridge_ID	Inspection_Date	Condition_Rating
E001	B001	2024-01-15	Good
E001	B002	2024-01-20	Fair
E002	B001	2024-02-10	Good

This captures that Engineer E001 inspected both B001 and B002, and Bridge B001 was inspected by both E001 and E002.

Key insight: Many-to-many relationships cannot be captured with just foreign keys in the entity tables—you must create a junction table.

6.2.7.3 Weak Entity Sets in Relational Models

Weak entity sets depend on another entity set for their identity. When translating to relations:

Rule:

Create a relation for the weak entity set
Include all its own attributes
Include the primary key of the identifying (owner) entity set as a foreign key
The primary key of the weak entity is a composite key: its own partial key plus the owner’s primary key

Example: Sensor Readings depend on Sensors (from Module 4).

ER Diagram: Sensor (strong) — Generates — (weak) Reading

Reading has partial key Sequence_Number (unique within each sensor, but not globally).

Relational Schema:

Sensor(Sensor_ID, Type, Bridge_ID)
  PRIMARY KEY: Sensor_ID

Reading(Sensor_ID, Sequence_Number, Timestamp, Value)
  PRIMARY KEY: (Sensor_ID, Sequence_Number)
  FOREIGN KEY: Sensor_ID REFERENCES Sensor(Sensor_ID)

As a table:

Reading:

Sensor_ID	Sequence_Number	Timestamp	Value
S001	1	2024-01-01 08:00	0.05
S001	2	2024-01-01 08:10	0.06
S002	1	2024-01-01 08:00	120.5
S002	2	2024-01-01 08:10	121.0

Notice Sequence_Number alone doesn’t uniquely identify a reading—you need both Sensor_ID and Sequence_Number.

6.2.7.4 ISA Hierarchies and Inheritance

ISA hierarchies (subclasses) have multiple translation approaches. The choice depends on your query patterns and storage efficiency needs.

Approach 1: Single Table (ER-Style) - Create one table for the superclass with all possible attributes - Add a Type column to distinguish subclasses - Use NULL for attributes that don’t apply to a given subclass

Approach 2: Table Per Subclass - Create separate tables for each subclass - Each subclass table contains all attributes (inherited + specific) - No table for the superclass

Approach 3: Table Per Hierarchy (Hybrid) - Create one table for the superclass with shared attributes - Create separate tables for each subclass with subclass-specific attributes only - Subclass tables reference the superclass via foreign key

Example: Bridge has subclasses Suspension_Bridge, Arch_Bridge, Truss_Bridge (from Module 4).

ER Diagram:

Bridge (superclass) with attributes (Bridge_ID, Name, Year_Built, Length)
Suspension_Bridge with specific attributes (Cable_Diameter, Tower_Height)
Arch_Bridge with specific attributes (Arch_Span, Rise)

Approach 1 - Single Table:

Bridge(Bridge_ID, Name, Year_Built, Length, Bridge_Type,
       Cable_Diameter, Tower_Height, Arch_Span, Rise)
  PRIMARY KEY: Bridge_ID

Bridge_ID	Name	Year_Built	Length	Bridge_Type	Cable_Diameter	Tower_Height	Arch_Span	Rise
B001	Roberto Clemente	1928	244	suspension	0.5	45	NULL	NULL
B002	Fort Pitt	1959	1201	arch	NULL	NULL	300	60

Approach 3 - Table Per Hierarchy:

Bridge(Bridge_ID, Name, Year_Built, Length)
  PRIMARY KEY: Bridge_ID

Suspension_Bridge(Bridge_ID, Cable_Diameter, Tower_Height)
  PRIMARY KEY: Bridge_ID
  FOREIGN KEY: Bridge_ID REFERENCES Bridge(Bridge_ID)

Arch_Bridge(Bridge_ID, Arch_Span, Rise)
  PRIMARY KEY: Bridge_ID
  FOREIGN KEY: Bridge_ID REFERENCES Bridge(Bridge_ID)

Tradeoffs:

Single Table: Simple queries for common attributes, but many NULLs and wasted space
Table Per Subclass: No NULLs, but queries for common attributes require UNIONs
Table Per Hierarchy: Balanced approach, but requires JOINs to get complete entity

6.2.7.5 Complete Example: Bridge Monitoring System

Let’s translate the complete ER diagram from Module 4:

Entity Sets:

Bridge (Bridge_ID, Name, Location, Year_Built, Materials)
Engineer (Employee_ID, Name, Specialty, Hire_Date)
Sensor (Sensor_ID, Type, Install_Date)
Manufacturer (Manufacturer_ID, Company_Name)

Relationships:

Bridge (1) — Monitored_By — (n) Sensor
Sensor (n) — Produced_By — (1) Manufacturer
Engineer (n) — Inspects — (m) Bridge (with attributes Inspection_Date, Condition_Rating)
Engineer (n) — Supervises — (n) Engineer (self-relationship)

Weak Entity:

Reading (depends on Sensor) with partial key Seq_Number

Complete Relational Schema:

-- Entity sets become tables
Bridge(Bridge_ID, Name, Location, Year_Built, Materials, Length)
  PRIMARY KEY: Bridge_ID

Engineer(Employee_ID, Name, Specialty, Hire_Date)
  PRIMARY KEY: Employee_ID

Sensor(Sensor_ID, Type, Install_Date, Bridge_ID, Manufacturer_ID)
  PRIMARY KEY: Sensor_ID
  FOREIGN KEY: Bridge_ID REFERENCES Bridge(Bridge_ID)
  FOREIGN KEY: Manufacturer_ID REFERENCES Manufacturer(Manufacturer_ID)

Manufacturer(Manufacturer_ID, Company_Name)
  PRIMARY KEY: Manufacturer_ID

-- Weak entity
Reading(Sensor_ID, Seq_Number, Timestamp, Value)
  PRIMARY KEY: (Sensor_ID, Seq_Number)
  FOREIGN KEY: Sensor_ID REFERENCES Sensor(Sensor_ID)

-- Many-to-many relationships become junction tables
Inspects(Employee_ID, Bridge_ID, Inspection_Date, Condition_Rating)
  PRIMARY KEY: (Employee_ID, Bridge_ID, Inspection_Date)
  FOREIGN KEY: Employee_ID REFERENCES Engineer(Employee_ID)
  FOREIGN KEY: Bridge_ID REFERENCES Bridge(Bridge_ID)

Supervises(Supervisor_ID, Supervisee_ID)
  PRIMARY KEY: (Supervisor_ID, Supervisee_ID)
  FOREIGN KEY: Supervisor_ID REFERENCES Engineer(Employee_ID)
  FOREIGN KEY: Supervisee_ID REFERENCES Engineer(Employee_ID)

Key observations:

Each entity set became a table
One-to-many relationships (Bridge-Sensor, Manufacturer-Sensor) are captured by foreign keys in the “many” side
Many-to-many relationships (Engineer-Bridge Inspects, Engineer-Engineer Supervises) required junction tables
The weak entity Reading has a composite primary key
Self-relationships like Supervises are handled like any other many-to-many relationship

This relational schema can now be implemented in any relational DBMS using DDL (as we saw in the Database Creation Process section), populated with data, and queried using relational algebra or SQL.