6  Module 5: The Relational Model

6.1 Module Overview

6.1.1 Learning Objectives

By the end of this module, students will be able to:

  • Understand the motivation behind the development of the relational model, and be able to provide a definition of it in your own words
  • Describe the high-level connection between set theory (from Module 4) and the relational model
  • Identify and relate the following concepts: relations, attributes, tuples and types; as well as connect them to their corresponding analogous terms: tables, columns, rows and domain
  • Describe the meaning of key attributes and understand the difference between a schema and an instance of a database
  • Understand the steps of relational database creation: design (using a DDL), initial data load, and query execution
  • Identify the function of a query and how natural language queries for relational databases can be formalized with relational algebra
  • Understand the link between relational algebra and SQL, as well as the fact that queries return relations
  • Apply and reason through the application of the following relational algebra operators and their compositions: select, project, cross product, natural join, theta join, union, difference, intersection, rename
  • Translate E/R diagrams to relational models of databases

6.1.2 Topics Covered

  • Introduction to the Relational Model
    • Historical context and motivation
    • Definition of the relational model
    • Connection to set theory from Module 4
  • Relational Model Components
    • Relations, attributes, tuples, types
    • Analogous terms: tables, columns, rows, domains
    • Key attributes and constraints
    • Schema vs. instance
  • Relational Database Creation Process
    • Database design using DDL (Data Definition Language)
    • Initial data loading
    • Query execution
  • Queries and Query Languages
    • Purpose and function of queries
    • Formalizing natural language queries with relational algebra
    • Link between relational algebra and SQL
    • Query results as relations
  • Relational Algebra Operators
    • Select (\(\sigma\))
    • Project (\(\pi\))
    • Cross Product (\(\times\))
    • Natural Join (\(\bowtie\))
    • Theta Join (\(\bowtie_{\theta}\))
    • Union (\(\cup\))
    • Difference (\(-\))
    • Intersection (\(\cap\))
    • Rename (\(\rho\))
    • Composing operators for complex queries
  • From ER Diagrams to Relational Models
    • Converting entity sets to relations
    • Translating relationships (1:1, 1:n, n:m)
    • Handling weak entity sets
    • Representing ISA hierarchies
    • Complete example: Bridge monitoring system from Module 4

6.1.3 Project Milestones

Meet with your teammates to discuss final project ideas and select a winning one.

6.2 Lecture Notes

6.2.1 From ER Diagrams to Actual Databases

In Module 4, we learned two fundamental concepts: Entity Relationship (ER) diagrams for visualizing database structure, and set theory for understanding the mathematical foundations of data. ER diagrams helped us answer questions like “What entities exist in our system?” and “How are they related?” Set theory gave us the mathematical language to describe collections of data and operations on them.

But ER diagrams are design tools—they help us think through and communicate what our database should look like. They don’t actually store data or answer queries. To build a working database system, we need to translate our ER diagram design into a concrete implementation. This is where the relational model comes in.

The relational model provides the foundation for nearly all modern database systems. It gives us:

  • A precise way to represent data (as relations/tables)
  • A formal language to query data (relational algebra)
  • Rules for translating ER diagrams into actual database structures

An important insight: Any database design—and the requirements for it—ultimately comes from envisioned use cases. In other words, understanding how users will interact with your database (what questions they’ll ask, what queries they’ll run) is the most important starting point for design. You might have a perfect ER diagram, but if it doesn’t support the queries your users need to run efficiently, it’s not serving its purpose.

As we progress through this module, keep in mind this design principle: the structure of your database should be driven by the queries you anticipate running. The relational model gives us both the structure (how to organize data in tables) and the query language (relational algebra) to make this happen.

6.2.2 Introduction to the Relational Model

The relational model is used by almost all major commercial database systems today—from PostgreSQL and MySQL to Oracle and Microsoft SQL Server. Despite being over 40 years old, it remains the dominant approach for organizing and querying data.

At its core, a Database Management System (DBMS) provides efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data. The relational model is the theoretical foundation that makes this possible, combined with high-level query languages that are both simple to use and expressive enough to handle complex data needs.

6.2.2.1 What is the Relational Model?

The relational model organizes data using a few simple but powerful concepts:

  • Database = a set of named relations (also called tables)
  • Each relation has a set of named attributes (also called columns)
  • Each tuple (also called row) contains a value for each attribute
  • Each attribute has a type (also called domain) that constrains what values it can hold

Additionally:

  • A key is an attribute (or set of attributes) whose value uniquely identifies each tuple in the relation
  • NULL is a special value representing “unknown” or “undefined”

Two critical concepts distinguish the structure from the data itself:

  • Schema = the structural description of relations in the database (the set of relations, their attributes, types, and constraints)
  • Instance = the actual contents of the database at a given point in time (the actual data)

The schema is like a blueprint—it describes what the database looks like. The instance is the current state—the actual rows of data stored at this moment.

6.2.2.2 Historical Context and Motivation

The relational model was introduced by Edgar F. Codd in 1970 in his seminal paper “A Relational Model of Data for Large Shared Data Banks.” Before Codd’s work, databases were navigational—you had to write code that explicitly navigated through data structures using pointers and links, similar to traversing linked lists or trees in programming.

I actually had not gone to the paper itself before, but in confirming what I got in these notes from Claude, I went to the source (so that I could include the link to the URL) today. The foresight of this 55 year old paper is quite remarkable. With the relational algebra we went over today, you might be able to understand the whole paper.

Codd’s revolutionary insight was that data could be organized as simple tables (relations) and queried using high-level, declarative languages based on mathematical logic. Instead of telling the computer how to find data (navigate this pointer, follow that link), you simply describe what you want, and the DBMS figures out how to retrieve it efficiently.

This abstraction dramatically simplified database programming and made databases accessible to non-programmers. It’s why SQL (Structured Query Language), which implements Codd’s relational algebra, became ubiquitous.

6.2.2.3 Connection to Set Theory

The relational model is fundamentally grounded in set theory, which we studied in Module 4. This connection isn’t coincidental—Codd built the relational model on the mathematical foundation of sets.

Here’s how the concepts map:

  • A relation is a set of tuples (no duplicate rows, unordered)
  • Relational algebra operations (select, project, join, union, etc.) are set operations
  • The Cartesian product from set theory becomes the cross product in relational algebra
  • Union, intersection, and difference work exactly as they do in set theory

This mathematical foundation gives the relational model:

  1. Precision: Operations are well-defined mathematically
  2. Compositionality: Query results are relations, so you can nest queries
  3. Optimization: Mathematical properties allow DBMSs to rewrite queries for efficiency

Throughout this module, you’ll see how the set operations you learned in Module 4 translate directly into relational algebra operators for querying databases.

6.2.3 Relational Model Components

Now let’s examine the core components of the relational model in detail. We’ll use examples from the bridge monitoring system we designed in Module 4 to make these concepts concrete.

6.2.3.1 Relations, Tuples, and Attributes

At the mathematical core, a relation is a set of tuples that share the same structure. Each tuple is an ordered collection of values, and each value corresponds to an attribute of the relation.

Formal definition: If we have attributes \(A_1, A_2, \ldots, A_n\) with corresponding domains (sets of allowed values) \(D_1, D_2, \ldots, D_n\), then a relation \(R\) is a subset of the Cartesian product:

\[R \subseteq D_1 \times D_2 \times \cdots \times D_n\]

In simpler terms: a relation is a set of n-tuples, where each tuple has the same structure—values for the same set of attributes.

Example: Consider a Bridge relation with attributes (Bridge_ID, Name, Year_Built, Length). A tuple in this relation might be:

\[(B001, \text{"Roberto Clemente Bridge"}, 1928, 244)\]

This tuple represents one bridge in our monitoring system.

Key properties of relations:

  1. Tuples are unordered: There’s no “first” or “last” tuple (they’re elements of a set)
  2. No duplicate tuples: Each tuple in a relation must be unique (set property)
  3. Attributes are atomic: Each attribute value is indivisible (no nested structures)

6.2.3.2 Tables, Rows, and Columns: The Common Terminology

While “relation,” “tuple,” and “attribute” are the formal mathematical terms, database practitioners use more intuitive terminology:

Mathematical Term Database Term What It Means
Relation Table A collection of related data
Tuple Row A single record in the table
Attribute Column A property or field of the records

Example: Here’s a Bridge table visualized:

Bridge_ID Name Year_Built Length
B001 Roberto Clemente Bridge 1928 244
B002 Fort Pitt Bridge 1959 1201
B003 Fort Duquesne Bridge 1969 1201

In this table:

  • Each row is a tuple representing one bridge
  • Each column is an attribute (property) of bridges
  • The entire table is the Bridge relation

Throughout this course, we’ll use both sets of terms interchangeably, but remember they refer to the same concepts.

6.2.3.3 Domains and Types

Every attribute has a domain (or type)—the set of all possible values that attribute can take.

Examples of domains:

  • Bridge_ID: String of format “B” followed by 3 digits (e.g., “B001”, “B002”)
  • Year_Built: Integer between 1800 and current year
  • Length: Positive real number (meters)
  • Sensor_Type: Enumeration from {“accelerometer”, “strain_gauge”, “temperature”, “displacement”}

Domains serve several purposes:

  1. Data integrity: Prevent nonsensical values (e.g., negative lengths, future construction years)
  2. Storage optimization: DBMS can use appropriate storage formats (integers vs. strings vs. floats)
  3. Query validation: Catch errors at query time (e.g., comparing incompatible types)

In practice, database systems provide built-in types (INTEGER, VARCHAR, DATE, FLOAT, etc.) and allow you to define constraints that further restrict domains.

6.2.3.4 Keys and Constraints

A key is an attribute (or set of attributes) that uniquely identifies each tuple in a relation. No two tuples can have the same values for all key attributes.

Example: In our Bridge relation, Bridge_ID is a key—each bridge has a unique ID:

Bridge_ID (KEY) Name Year_Built Length
B001 Roberto Clemente Bridge 1928 244
B002 Fort Pitt Bridge 1959 1201
B003 Fort Duquesne Bridge 1969 1201

Notice that two bridges can have the same Length or even the same Name, but they must have different Bridge_ID values.

6.2.3.5 Schema vs. Instance

These two concepts are fundamental to understanding databases:

Schema = The structure or blueprint of the database

The schema defines:

  • What relations exist
  • What attributes each relation has
  • The types/domains of each attribute
  • What keys and constraints apply

The schema is relatively stable—it changes only when you redesign the database structure.

Instance = The actual data in the database at a specific moment

The instance contains:

  • The actual tuples (rows) in each relation
  • The specific values stored

The instance changes frequently—every time you insert, update, or delete data.

Example:

Bridge Schema:

Bridge(Bridge_ID: STRING, Name: STRING, Year_Built: INTEGER, Length: REAL)
  PRIMARY KEY: Bridge_ID
  CONSTRAINT: Year_Built > 1800

Bridge Instance (October 2024):

Bridge_ID Name Year_Built Length
B001 Roberto Clemente Bridge 1928 244
B002 Fort Pitt Bridge 1959 1201
B003 Fort Duquesne Bridge 1969 1201

Bridge Instance (November 2024, after adding a new bridge):

Bridge_ID Name Year_Built Length
B001 Roberto Clemente Bridge 1928 244
B002 Fort Pitt Bridge 1959 1201
B003 Fort Duquesne Bridge 1969 1201
B004 Smithfield Street Bridge 1883 361

The schema remained the same—only the instance changed.

Key insight: When we design databases (including when we translate ER diagrams to relational schemas), we’re defining the schema. When users interact with databases (querying, inserting, updating data), they’re working with the instance.

6.2.4 Relational Database Creation Process

Once you’ve designed your database schema (often starting from an ER diagram), you need to actually create and populate the database. This process typically involves three main steps: defining the structure using DDL, loading initial data, and executing queries. Let’s examine each step.

6.2.4.1 Database Design with DDL

DDL (Data Definition Language) is the subset of database commands used to define and modify the structure of your database—creating tables, specifying attributes, defining keys and constraints.

Here’s an example of creating our Bridge table using SQL DDL syntax (note: while we’re focusing on relational algebra in this module, seeing a bit of DDL helps understand the connection between abstract schemas and concrete databases):

CREATE TABLE Bridge (
    Bridge_ID VARCHAR(10) PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Year_Built INTEGER CHECK (Year_Built > 1800)
);

This DDL statement:

  • Creates a table named Bridge
  • Defines four attributes with their types
  • Specifies Bridge_ID as the primary key
  • Adds NOT NULL constraint (Name must have a value)

Some Key DDL operations:

  • CREATE TABLE: Define a new relation
  • ALTER TABLE: Modify an existing table structure
  • DROP TABLE: Delete a table

6.2.4.2 Data Loading

After defining the schema, you need to populate the database with actual data. This is called data loading or data insertion.

Example: Loading bridges into our system:

INSERT INTO Bridge (Bridge_ID, Name, Year_Built, Length)
VALUES ('B001', 'Roberto Clemente Bridge', 1928, 244);

INSERT INTO Bridge (Bridge_ID, Name, Year_Built, Length)
VALUES ('B002', 'Fort Pitt Bridge', 1959, 1201);

INSERT INTO Bridge (Bridge_ID, Name, Year_Built, Length)
VALUES ('B003', 'Fort Duquesne Bridge', 1969, 1201);

For large datasets, DBMSs typically provide bulk loading utilities that can read data from CSV files or other formats:

COPY Bridge FROM '/path/to/bridges.csv'
WITH (FORMAT CSV, HEADER TRUE);

Data loading considerations:

  • Constraint validation: The DBMS checks all constraints during insertion (keys, but also other constraints we haven’t seen yet such as foreign keys, CHECK constraints)
  • Transaction handling: Large data loads are often wrapped in transactions for consistency
  • Performance: Bulk loading is much faster than individual inserts

6.2.4.3 Query Execution

Once your database has structure (schema) and data (instance), users can query it—asking questions and retrieving information.

Example natural language query: “Find all bridges built before 1950”

This would be expressed in SQL as:

SELECT * FROM Bridge
WHERE Year_Built < 1950;

And would return:

Bridge_ID Name Year_Built Length
B001 Roberto Clemente Bridge 1928 244
B004 Smithfield Street Bridge 1883 361

What happens during query execution?

  1. Parsing: The DBMS parses the query to check syntax
  2. Validation: Checks that tables and columns exist, types match, etc.
  3. Optimization: The DBMS creates an execution plan (how to actually retrieve the data efficiently)
  4. Execution: The plan is executed, accessing the stored data
  5. Result: The result is returned as a new relation (table)

Key insight: The relational model’s power comes from declarative queries—you specify what you want, not how to get it. The DBMS handles the “how” through query optimization. This is fundamentally different from navigational databases where you had to specify the exact path through data structures.

In the next sections, we’ll learn relational algebra—the formal mathematical language that underlies these queries and allows us to reason about what queries compute.

6.2.5 Queries and Query Languages

Now that we understand how databases are structured and created, let’s focus on how we actually use them—by asking questions and retrieving information through queries.

A query is a request for information from a database. Some queries are easy to pose (“Find all bridges”), while others are more complex (“Find all sensors on bridges built before 1950 that have recorded readings above threshold in the past month”). Similarly, some queries are easy for a DBMS to execute efficiently, while others require significant computation.

An important note: the term “query language” is somewhat misleading—these languages are also used to modify data (inserting, updating, deleting), not just retrieve it. But historically, they’re called query languages because retrieval was their primary purpose.

A fundamental property: In the relational model, queries return relations. This property is called closure or compositionality—the result of a query is itself a relation, which means you can use it as input to another query. This enables building complex queries from simpler ones.

6.2.5.1 What is a Query?

At the most basic level, a query expresses an information need:

  • Natural language: “Show me all bridges built before 1950”
  • Formal language (relational algebra): \(\sigma_{\text{Year\_Built} < 1950}(\text{Bridge})\)
  • SQL: SELECT * FROM Bridge WHERE Year_Built < 1950

All three express the same information need, but with different levels of precision and formality.

Queries typically involve:

  • Selection: Choosing which rows satisfy certain conditions
  • Projection: Choosing which columns to include in results
  • Combination: Joining data from multiple tables
  • Aggregation: Computing summary statistics (count, average, etc.)

The power of declarative query languages is that you specify what you want, not how to get it. The DBMS figures out an efficient execution strategy.

6.2.5.2 Relational Algebra as a Query Language

Relational algebra is the formal mathematical language for expressing queries on relational databases. It consists of a set of operators that take one or more relations as input and produce a new relation as output.

Relational algebra serves several important purposes:

  1. Formal foundation: Provides a precise mathematical definition of what queries compute
  2. Query optimization: DBMSs use relational algebra internally to reason about and optimize queries
  3. Conceptual tool: Helps us understand what operations are possible and how they compose

The core relational algebra operators are:

  • Select (\(\sigma\)): Picks certain rows based on a condition
  • Project (\(\pi\)): Picks certain columns
  • Cross Product (\(\times\)): Combines all rows from two tables
  • Natural Join (\(\bowtie\)): Combines rows from two tables based on common attributes
  • Union (\(\cup\)): Combines rows from two tables (with same schema)
  • Difference (\(-\)): Rows in first table but not in second
  • Intersection (\(\cap\)): Rows in both tables
  • Rename (\(\rho\)): Changes relation or attribute names

Notice how these connect to the set operations from Module 4! Union, difference, and intersection work exactly as they did with sets. The relational-specific operators (select, project, join) extend set theory for working with structured data.

We’ll explore each operator in detail in the next section.

6.2.6 Relational Algebra Operators

Now we’ll examine each relational algebra operator in detail. For each operator, we’ll provide a formal definition, explain what it does intuitively, and show examples using our bridge monitoring system.

Throughout this section, we’ll use these sample relations:

Bridge:

Bridge_ID Name Year_Built Length
B001 Roberto Clemente Bridge 1928 244
B002 Fort Pitt Bridge 1959 1201
B003 Fort Duquesne Bridge 1969 1201
B004 Smithfield Street Bridge 1883 361

Sensor:

Sensor_ID Type Bridge_ID Install_Date
S001 accelerometer B001 2020
S002 strain_gauge B001 2020
S003 temperature B002 2019
S004 accelerometer B003 2021

6.2.6.1 Select (\(\sigma\))

The select operator picks certain rows from a relation based on a condition. It filters the relation to include only tuples that satisfy the specified predicate.

Notation: \(\sigma_{\text{condition}}(R)\)

Definition: \(\sigma_{\text{condition}}(R) = \{t \mid t \in R \text{ and condition}(t) \text{ is true}\}\)

In words: Select returns all tuples \(t\) from relation \(R\) where the condition evaluates to true.

Example 1: Find all bridges built before 1950

\[\sigma_{\text{Year\_Built} < 1950}(\text{Bridge})\]

Result:

Bridge_ID Name Year_Built Length
B001 Roberto Clemente Bridge 1928 244
B004 Smithfield Street Bridge 1883 361

Example 2: Find all accelerometer sensors

\[\sigma_{\text{Type} = \text{'accelerometer'}}(\text{Sensor})\]

Result:

Sensor_ID Type Bridge_ID Install_Date
S001 accelerometer B001 2020
S004 accelerometer B003 2021

Key points:

  • Select operates on rows (horizontal filtering)
  • The condition can involve comparisons (\(=\), \(<\), \(>\), \(\leq\), \(\geq\), \(\neq\)), logical operators (\(\land\), \(\lor\), \(\neg\)), and arithmetic
  • The result schema is identical to the input schema—only rows are filtered

6.2.6.2 Project (\(\pi\))

The project operator picks certain columns from a relation. It extracts a subset of attributes while discarding the others.

Notation: \(\pi_{\text{A}_1, \text{A}_2, \ldots, \text{A}_k}(R)\)

Definition: \(\pi_{\text{A}_1, \ldots, \text{A}_k}(R) = \{(t.\text{A}_1, \ldots, t.\text{A}_k) \mid t \in R\}\)

In words: Project returns tuples containing only the specified attributes from \(R\).

Example 1: Get just the names and years of bridges

\[\pi_{\text{Name, Year\_Built}}(\text{Bridge})\]

Result:

Name Year_Built
Roberto Clemente Bridge 1928
Fort Pitt Bridge 1959
Fort Duquesne Bridge 1969
Smithfield Street Bridge 1883

Example 2: Get sensor types

\[\pi_{\text{Type}}(\text{Sensor})\]

Result:

Type
accelerometer
strain_gauge
temperature

Key points:

  • Project operates on columns (vertical filtering)
  • Duplicates are always eliminated in relational algebra (since results are sets). Notice in Example 2 we only get one “accelerometer” even though there are two accelerometer sensors
  • SQL behaves differently: it keeps duplicates unless you use DISTINCT. In SQL, SELECT Type FROM Sensor would return “accelerometer” twice, while SELECT DISTINCT Type FROM Sensor matches the relational algebra behavior
  • The result schema contains only the projected attributes

6.2.6.3 Cross Product (\(\times\))

The cross product (also called Cartesian product) combines two relations by pairing each tuple from the first relation with every tuple from the second relation.

Notation: \(R \times S\)

Definition: \(R \times S = \{(r, s) \mid r \in R \text{ and } s \in S\}\)

In words: The cross product returns all possible pairs of tuples, one from \(R\) and one from \(S\).

Example: Consider smaller relations for clarity:

SmallBridge:

Bridge_ID Name
B001 Roberto Clemente
B002 Fort Pitt

SmallSensor:

Sensor_ID Type
S001 accelerometer
S002 strain_gauge

\[\text{SmallBridge} \times \text{SmallSensor}\]

Result:

Bridge_ID Name Sensor_ID Type
B001 Roberto Clemente S001 accelerometer
B001 Roberto Clemente S002 strain_gauge
B002 Fort Pitt S001 accelerometer
B002 Fort Pitt S002 strain_gauge

Key points:

  • If \(R\) has \(m\) tuples and \(S\) has \(n\) tuples, then \(R \times S\) has \(m \times n\) tuples
  • Cross product by itself is rarely useful—it generates all possible combinations, including nonsensical ones (e.g., Bridge B001 paired with a sensor actually on Bridge B002)
  • It becomes useful when combined with select to filter for meaningful pairings (this is essentially what a join does)
  • The result schema includes all attributes from both relations

6.2.6.4 Natural Join (\(\bowtie\))

The natural join combines two relations by matching tuples that have equal values for all attributes with the same name. It’s like a cross product followed by filtering for equality on common attributes, then eliminating duplicate columns.

Notation: \(R \bowtie S\)

Definition: Natural join automatically:

  1. Finds all attributes that appear in both \(R\) and \(S\)
  2. Keeps only tuple pairs where these common attributes have equal values
  3. Eliminates one copy of the duplicate attributes

Example: Join Bridge and Sensor on their common attribute Bridge_ID

\[\text{Bridge} \bowtie \text{Sensor}\]

This implicitly enforces Bridge.Bridge_ID = Sensor.Bridge_ID and removes the duplicate Bridge_ID column.

Result:

Bridge_ID Name Year_Built Length Sensor_ID Type Install_Date
B001 Roberto Clemente Bridge 1928 244 S001 accelerometer 2020
B001 Roberto Clemente Bridge 1928 244 S002 strain_gauge 2020
B002 Fort Pitt Bridge 1959 1201 S003 temperature 2019
B003 Fort Duquesne Bridge 1969 1201 S004 accelerometer 2021

Notice Bridge B004 doesn’t appear (no sensors are installed on it).

Formal definition using other operators:

Natural join can be defined in terms of cross product, select, and project. If \(R\) and \(S\) share attributes \(A_1, \ldots, A_k\):

\[R \bowtie S = \pi_{\text{all attributes, removing duplicates}}(\sigma_{R.A_1 = S.A_1 \land \cdots \land R.A_k = S.A_k}(R \times S))\]

Key points:

  • Automatically finds common attributes—no need to specify join condition
  • Very convenient when relations are designed with consistent naming
  • Eliminates one copy of join attributes (otherwise you’d have Bridge_ID appearing twice)
  • Result includes only tuples with matching values—unmatched tuples are excluded

6.2.6.5 Theta Join (\(\bowtie_\theta\))

The theta join (or just “join”) is the most general form of join. It combines tuples from two relations based on an arbitrary condition \(\theta\).

Notation: \(R \bowtie_\theta S\)

Definition: \(R \bowtie_\theta S = \sigma_\theta(R \times S)\)

In words: Theta join is a cross product followed by a select with condition \(\theta\).

Example 1: Find pairs of bridges and sensors where the sensor was installed after the bridge was built

\[\text{Bridge} \bowtie_{\text{Year\_Built} < \text{Install\_Date}} \text{Sensor}\]

This keeps pairs where the bridge’s construction year is before the sensor’s installation year.

Result:

Bridge_ID Name Year_Built Length Sensor_ID Type Bridge_ID Install_Date
B001 Roberto Clemente Bridge 1928 244 S001 accelerometer B001 2020
B001 Roberto Clemente Bridge 1928 244 S002 strain_gauge B001 2020
B001 Roberto Clemente Bridge 1928 244 S003 temperature B002 2019

(Note: This would include all combinations where Year_Built < Install_Date, which includes many nonsensical pairings)

Example 2: A more practical theta join—finding which sensors are on which bridges:

\[\text{Bridge} \bowtie_{\text{Bridge.Bridge\_ID} = \text{Sensor.Bridge\_ID}} \text{Sensor}\]

This is actually equivalent to natural join when the condition is equality on same-named attributes.

Key points:

  • Theta join is the basic operation implemented in DBMSs—the term “join” in database systems often means theta join
  • The condition \(\theta\) can be any boolean expression involving attributes from both relations
  • Natural join is a special case of theta join (where \(\theta\) is equality on all common attributes)
  • Unlike natural join, theta join does not eliminate duplicate columns

6.2.6.6 Union (\(\cup\))

The union operator combines tuples from two relations vertically, producing a single relation containing all tuples that appear in either input relation (or both).

Notation: \(R \cup S\)

Definition: \(R \cup S = \{t \mid t \in R \text{ or } t \in S\}\)

Requirement: \(R\) and \(S\) must be union-compatible—they must have the same number of attributes, and corresponding attributes must have compatible types. Formally, \(R\) and \(S\) must have the same schema.

Example: Suppose we have two separate monitoring systems with their own Bridge relations:

PittsburghBridges:

Bridge_ID Name
B001 Roberto Clemente
B002 Fort Pitt

AlleghenyBridges:

Bridge_ID Name
B002 Fort Pitt
B003 Fort Duquesne

\[\text{PittsburghBridges} \cup \text{AlleghenyBridges}\]

Result:

Bridge_ID Name
B001 Roberto Clemente
B002 Fort Pitt
B003 Fort Duquesne

Notice Fort Pitt (B002) appears only once—duplicates are eliminated (set property).

Key points:

  • Union combines tuples vertically (adding rows)
  • Both relations must have identical schemas
  • Duplicates are eliminated (set-based operation)
  • This is the same union operation from set theory in Module 4

6.2.6.7 Difference (\(-\))

The difference operator returns tuples that are in the first relation but not in the second relation.

Notation: \(R - S\)

Definition: \(R - S = \{t \mid t \in R \text{ and } t \notin S\}\)

Requirement: Like union, \(R\) and \(S\) must be union-compatible (same schema).

Example: Using the same bridge relations:

\[\text{PittsburghBridges} - \text{AlleghenyBridges}\]

Result:

Bridge_ID Name
B001 Roberto Clemente

Only Roberto Clemente appears because it’s in PittsburghBridges but not in AlleghenyBridges.

Example 2: Find bridges with no sensors

\[\pi_{\text{Bridge\_ID}}(\text{Bridge}) - \pi_{\text{Bridge\_ID}}(\text{Sensor})\]

This gets all bridge IDs from Bridge, then removes those that appear in Sensor, leaving only bridges without sensors.

Result:

Bridge_ID
B004

Key points:

  • Difference is not commutative: \(R - S \neq S - R\) (in general)
  • Both relations must have identical schemas
  • Useful for finding what’s in one relation but not another
  • Same as set difference from Module 4

6.2.6.8 Intersection (\(\cap\))

The intersection operator returns tuples that appear in both input relations.

Notation: \(R \cap S\)

Definition: \(R \cap S = \{t \mid t \in R \text{ and } t \in S\}\)

Requirement: \(R\) and \(S\) must be union-compatible (same schema).

Example:

\[\text{PittsburghBridges} \cap \text{AlleghenyBridges}\]

Result:

Bridge_ID Name
B002 Fort Pitt

Only Fort Pitt appears because it’s in both relations.

Key points:

  • Intersection finds common elements
  • Both relations must have identical schemas
  • Can be expressed using difference: \(R \cap S = R - (R - S)\)
  • Same as set intersection from Module 4

6.2.6.9 Rename (\(\rho\))

The rename operator changes the name of a relation and/or its attributes. This is useful for disambiguation and for making schemas compatible for set operations.

Notation:

  • \(\rho_{S}(R)\) — rename relation \(R\) to \(S\)
  • \(\rho_{S(A_1, \ldots, A_n)}(R)\) — rename relation to \(S\) and attributes to \(A_1, \ldots, A_n\)

Use cases:

1. Unifying schemas for set operations

Suppose we have:

OldBridges(ID, BridgeName) and NewBridges(Identifier, Name)

They have different schemas, but we want to take their union. We can rename to make them compatible:

\[\rho_{\text{Bridges(Bridge\_ID, Name)}}(\text{OldBridges}) \cup \rho_{\text{Bridges(Bridge\_ID, Name)}}(\text{NewBridges})\]

2. Disambiguation in self-joins

A self-join is when a relation is joined with itself. We need to rename one copy to distinguish them.

Example: Find pairs of bridges built in the same year

\[\sigma_{\text{B1.Year\_Built} = \text{B2.Year\_Built} \land \text{B1.Bridge\_ID} < \text{B2.Bridge\_ID}}(\rho_{\text{B1}}(\text{Bridge}) \times \rho_{\text{B2}}(\text{Bridge}))\]

Here we rename Bridge to B1 and B2 so we can refer to them separately.

Alternate notation—Assignment statements:

Instead of deeply nested expressions, we can use assignment statements (a sequence of instructions):

B1 := Bridge
B2 := Bridge
SameYear := σ(B1.Year_Built = B2.Year_Built AND B1.Bridge_ID < B2.Bridge_ID)(B1 × B2)
Result := π(B1.Name, B2.Name, B1.Year_Built)(SameYear)

This is more readable for complex queries—each step is named and can be referenced later.

Key points:

  • Rename is essential for self-joins (comparing a relation to itself)
  • Useful for making schemas compatible for set operations
  • Assignment notation provides an algorithmic way to express complex queries

6.2.6.10 Composing Operators for Complex Queries

The real power of relational algebra comes from composing operators—building complex queries by combining simpler operations. Because each operator returns a relation, the output of one operation can be the input to another.

Example: “Find the names of bridges that have accelerometer sensors”

We need to:

  1. Filter sensors to just accelerometers
  2. Join with bridges to get bridge information
  3. Project to get just bridge names

Step-by-step:

  1. \(\text{AccelSensors} := \sigma_{\text{Type} = \text{'accelerometer'}}(\text{Sensor})\)
  2. \(\text{BridgesWithAccel} := \text{Bridge} \bowtie \text{AccelSensors}\)
  3. \(\text{Result} := \pi_{\text{Name}}(\text{BridgesWithAccel})\)

As a single expression:

\[\pi_{\text{Name}}(\text{Bridge} \bowtie \sigma_{\text{Type} = \text{'accelerometer'}}(\text{Sensor}))\]

Result:

Name
Roberto Clemente Bridge
Fort Duquesne Bridge

Example 2: “Find bridges built before 1950 that are longer than 300 meters”

\[\sigma_{\text{Length} > 300}(\sigma_{\text{Year\_Built} < 1950}(\text{Bridge}))\]

Or equivalently, using a compound condition:

\[\sigma_{\text{Year\_Built} < 1950 \land \text{Length} > 300}(\text{Bridge})\]

Result:

Bridge_ID Name Year_Built Length
B004 Smithfield Street Bridge 1883 361

Key insight: Relational algebra is compositional—you can build arbitrarily complex queries by chaining operators together. This is the foundation for how DBMSs process queries: they break down complex SQL queries into sequences of relational algebra operations, then optimize the order and method of execution.

6.2.7 From ER Diagrams to Relational Models

Now we bring everything together: translating the ER diagrams from Module 4 into actual relational schemas that can be implemented in a database. This translation process is mechanical—there are specific rules for converting each ER diagram component into relational tables.

6.2.7.1 Translation Rules: Entity Sets to Relations

The most basic translation rule is straightforward: each entity set becomes a relation (table).

Rule: For each entity set \(E\) with attributes \(A_1, A_2, \ldots, A_n\):

  • Create a relation \(E\) with attributes \(A_1, A_2, \ldots, A_n\)
  • The key attribute(s) of the entity set become the primary key of the relation

Example: The Bridge entity set from Module 4:

ER Diagram: Bridge entity with attributes (Bridge_ID, Name, Year_Built, Material, Length), where Bridge_ID is the key.

Relational Schema:

Bridge(Bridge_ID, Name, Year_Built, Material, Length)
  PRIMARY KEY: Bridge_ID

As a table:

Bridge_ID Name Year_Built Material Length
B001 Roberto Clemente Bridge 1928 steel 244
B002 Fort Pitt Bridge 1959 steel 1201

Example 2: The Sensor entity set:

ER Diagram: Sensor entity with attributes (Sensor_ID, Type, Install_Date), where Sensor_ID is the key.

Relational Schema:

Sensor(Sensor_ID, Type, Install_Date)
  PRIMARY KEY: Sensor_ID

Key points:

  • Each entity becomes a row in the relation
  • Entity attributes become relation attributes
  • Key attributes become primary keys
  • Multi-valued attributes require special handling (we’ll see this shortly)

6.2.7.2 Handling Relationships

Relationships in ER diagrams connect entity sets. How we translate them depends on the multiplicity (1:1, 1:n, or n:m).

Notice that in class (and in the slides) I only idscussed the n:m case, recommending that we create a new table for the relationship always. These notes also consider the special cases of 1:1 and 1:n relationships. While you can still create a new table for those relationships, it may be more efficient to avoid that and instead use foreign keys (or merge the tables) in some situations as discussed by Claude here.

6.2.7.2.1 One-to-One Relationships

For a one-to-one (1:1) relationship between entity sets \(E\) and \(F\):

Option 1: Add a foreign key to one of the relations

  • Choose either \(E\) or \(F\) (typically the one participating fully in the relationship)
  • Add \(F\)’s primary key as a foreign key attribute in \(E\) (or vice versa)

Example: Suppose each Bridge has exactly one TaxRecord and each TaxRecord belongs to exactly one Bridge.

ER Diagram: Bridge (1) — Has_Tax_Record — (1) TaxRecord

Relational Schema:

Bridge(Bridge_ID, Name, Year_Built, Length)
  PRIMARY KEY: Bridge_ID

TaxRecord(Tax_ID, Assessment_Value, Last_Updated, Bridge_ID)
  PRIMARY KEY: Tax_ID
  FOREIGN KEY: Bridge_ID REFERENCES Bridge(Bridge_ID)

We added Bridge_ID as a foreign key in TaxRecord.

Option 2: Merge into a single relation

  • If the relationship is total (every entity participates), you could combine both entity sets into one table
  • This is less common but can simplify queries
6.2.7.2.2 One-to-Many Relationships

For a one-to-many (1:n) relationship from \(E\) to \(F\) (one \(E\) relates to many \(F\)s):

Rule: Add the primary key of the “one” side as a foreign key in the “many” side.

Example: One Bridge is monitored by many Sensors (from Module 4).

ER Diagram: Bridge (1) — Monitored_By — (n) Sensor

Relational Schema:

Bridge(Bridge_ID, Name, Year_Built, Length)
  PRIMARY KEY: Bridge_ID

Sensor(Sensor_ID, Type, Install_Date, Bridge_ID)
  PRIMARY KEY: Sensor_ID
  FOREIGN KEY: Bridge_ID REFERENCES Bridge(Bridge_ID)

Each sensor has a Bridge_ID foreign key indicating which bridge it’s on.

As tables:

Bridge:

Bridge_ID Name Year_Built Length
B001 Roberto Clemente Bridge 1928 244
B002 Fort Pitt Bridge 1959 1201

Sensor:

Sensor_ID Type Install_Date Bridge_ID
S001 accelerometer 2020 B001
S002 strain_gauge 2020 B001
S003 temperature 2019 B002

Notice multiple sensors can have the same Bridge_ID (many-to-one from Sensor’s perspective).

6.2.7.2.3 Many-to-Many Relationships

For a many-to-many (n:m) relationship between entity sets \(E\) and \(F\):

Rule: Create a new relation (called a junction table or linking table) with:

  • Primary keys from both \(E\) and \(F\) as foreign keys
  • The combination of these foreign keys forms the primary key
  • Any attributes of the relationship itself

Example: Engineers inspect Bridges, and each engineer can inspect multiple bridges, while each bridge can be inspected by multiple engineers.

ER Diagram: Engineer (n) — Inspects — (m) Bridge

The relationship Inspects has attributes: Inspection_Date, Condition_Rating

Relational Schema:

Engineer(Employee_ID, Name, Specialty)
  PRIMARY KEY: Employee_ID

Bridge(Bridge_ID, Name, Year_Built, Length)
  PRIMARY KEY: Bridge_ID

Inspects(Employee_ID, Bridge_ID, Inspection_Date, Condition_Rating)
  PRIMARY KEY: (Employee_ID, Bridge_ID, Inspection_Date)
  FOREIGN KEY: Employee_ID REFERENCES Engineer(Employee_ID)
  FOREIGN KEY: Bridge_ID REFERENCES Bridge(Bridge_ID)

As tables:

Inspects:

Employee_ID Bridge_ID Inspection_Date Condition_Rating
E001 B001 2024-01-15 Good
E001 B002 2024-01-20 Fair
E002 B001 2024-02-10 Good

This captures that Engineer E001 inspected both B001 and B002, and Bridge B001 was inspected by both E001 and E002.

Key insight: Many-to-many relationships cannot be captured with just foreign keys in the entity tables—you must create a junction table.

6.2.7.3 Weak Entity Sets in Relational Models

Weak entity sets depend on another entity set for their identity. When translating to relations:

Rule:

  • Create a relation for the weak entity set
  • Include all its own attributes
  • Include the primary key of the identifying (owner) entity set as a foreign key
  • The primary key of the weak entity is a composite key: its own partial key plus the owner’s primary key

Example: Sensor Readings depend on Sensors (from Module 4).

ER Diagram: Sensor (strong) — Generates — (weak) Reading

Reading has partial key Sequence_Number (unique within each sensor, but not globally).

Relational Schema:

Sensor(Sensor_ID, Type, Bridge_ID)
  PRIMARY KEY: Sensor_ID

Reading(Sensor_ID, Sequence_Number, Timestamp, Value)
  PRIMARY KEY: (Sensor_ID, Sequence_Number)
  FOREIGN KEY: Sensor_ID REFERENCES Sensor(Sensor_ID)

As a table:

Reading:

Sensor_ID Sequence_Number Timestamp Value
S001 1 2024-01-01 08:00 0.05
S001 2 2024-01-01 08:10 0.06
S002 1 2024-01-01 08:00 120.5
S002 2 2024-01-01 08:10 121.0

Notice Sequence_Number alone doesn’t uniquely identify a reading—you need both Sensor_ID and Sequence_Number.

6.2.7.4 ISA Hierarchies and Inheritance

ISA hierarchies (subclasses) have multiple translation approaches. The choice depends on your query patterns and storage efficiency needs.

Approach 1: Single Table (ER-Style) - Create one table for the superclass with all possible attributes - Add a Type column to distinguish subclasses - Use NULL for attributes that don’t apply to a given subclass

Approach 2: Table Per Subclass - Create separate tables for each subclass - Each subclass table contains all attributes (inherited + specific) - No table for the superclass

Approach 3: Table Per Hierarchy (Hybrid) - Create one table for the superclass with shared attributes - Create separate tables for each subclass with subclass-specific attributes only - Subclass tables reference the superclass via foreign key

Example: Bridge has subclasses Suspension_Bridge, Arch_Bridge, Truss_Bridge (from Module 4).

ER Diagram:

  • Bridge (superclass) with attributes (Bridge_ID, Name, Year_Built, Length)
  • Suspension_Bridge with specific attributes (Cable_Diameter, Tower_Height)
  • Arch_Bridge with specific attributes (Arch_Span, Rise)

Approach 1 - Single Table:

Bridge(Bridge_ID, Name, Year_Built, Length, Bridge_Type,
       Cable_Diameter, Tower_Height, Arch_Span, Rise)
  PRIMARY KEY: Bridge_ID
Bridge_ID Name Year_Built Length Bridge_Type Cable_Diameter Tower_Height Arch_Span Rise
B001 Roberto Clemente 1928 244 suspension 0.5 45 NULL NULL
B002 Fort Pitt 1959 1201 arch NULL NULL 300 60

Approach 3 - Table Per Hierarchy:

Bridge(Bridge_ID, Name, Year_Built, Length)
  PRIMARY KEY: Bridge_ID

Suspension_Bridge(Bridge_ID, Cable_Diameter, Tower_Height)
  PRIMARY KEY: Bridge_ID
  FOREIGN KEY: Bridge_ID REFERENCES Bridge(Bridge_ID)

Arch_Bridge(Bridge_ID, Arch_Span, Rise)
  PRIMARY KEY: Bridge_ID
  FOREIGN KEY: Bridge_ID REFERENCES Bridge(Bridge_ID)

Tradeoffs:

  • Single Table: Simple queries for common attributes, but many NULLs and wasted space
  • Table Per Subclass: No NULLs, but queries for common attributes require UNIONs
  • Table Per Hierarchy: Balanced approach, but requires JOINs to get complete entity

6.2.7.5 Complete Example: Bridge Monitoring System

Let’s translate the complete ER diagram from Module 4:

Entity Sets:

  • Bridge (Bridge_ID, Name, Location, Year_Built, Materials)
  • Engineer (Employee_ID, Name, Specialty, Hire_Date)
  • Sensor (Sensor_ID, Type, Install_Date)
  • Manufacturer (Manufacturer_ID, Company_Name)

Relationships:

  • Bridge (1) — Monitored_By — (n) Sensor
  • Sensor (n) — Produced_By — (1) Manufacturer
  • Engineer (n) — Inspects — (m) Bridge (with attributes Inspection_Date, Condition_Rating)
  • Engineer (n) — Supervises — (n) Engineer (self-relationship)

Weak Entity:

  • Reading (depends on Sensor) with partial key Seq_Number

Complete Relational Schema:

-- Entity sets become tables
Bridge(Bridge_ID, Name, Location, Year_Built, Materials, Length)
  PRIMARY KEY: Bridge_ID

Engineer(Employee_ID, Name, Specialty, Hire_Date)
  PRIMARY KEY: Employee_ID

Sensor(Sensor_ID, Type, Install_Date, Bridge_ID, Manufacturer_ID)
  PRIMARY KEY: Sensor_ID
  FOREIGN KEY: Bridge_ID REFERENCES Bridge(Bridge_ID)
  FOREIGN KEY: Manufacturer_ID REFERENCES Manufacturer(Manufacturer_ID)

Manufacturer(Manufacturer_ID, Company_Name)
  PRIMARY KEY: Manufacturer_ID

-- Weak entity
Reading(Sensor_ID, Seq_Number, Timestamp, Value)
  PRIMARY KEY: (Sensor_ID, Seq_Number)
  FOREIGN KEY: Sensor_ID REFERENCES Sensor(Sensor_ID)

-- Many-to-many relationships become junction tables
Inspects(Employee_ID, Bridge_ID, Inspection_Date, Condition_Rating)
  PRIMARY KEY: (Employee_ID, Bridge_ID, Inspection_Date)
  FOREIGN KEY: Employee_ID REFERENCES Engineer(Employee_ID)
  FOREIGN KEY: Bridge_ID REFERENCES Bridge(Bridge_ID)

Supervises(Supervisor_ID, Supervisee_ID)
  PRIMARY KEY: (Supervisor_ID, Supervisee_ID)
  FOREIGN KEY: Supervisor_ID REFERENCES Engineer(Employee_ID)
  FOREIGN KEY: Supervisee_ID REFERENCES Engineer(Employee_ID)

Key observations:

  1. Each entity set became a table
  2. One-to-many relationships (Bridge-Sensor, Manufacturer-Sensor) are captured by foreign keys in the “many” side
  3. Many-to-many relationships (Engineer-Bridge Inspects, Engineer-Engineer Supervises) required junction tables
  4. The weak entity Reading has a composite primary key
  5. Self-relationships like Supervises are handled like any other many-to-many relationship

This relational schema can now be implemented in any relational DBMS using DDL (as we saw in the Database Creation Process section), populated with data, and queried using relational algebra or SQL.