5 Module 4: Entity Relationship Diagrams and Set Theory

5.1 Module Overview

5.1.1 Learning Objectives

By the end of this module, students will be able to:

Understand the structure and visual components of Entity Relationship Diagrams, and the motivation for using them
Be able to distinguish between attributes, entity sets and relationships
Be able to identify weak entity sets and how to display them
Understand the need and use of roles and sub-classes in ER diagrams
Recognize the value of set theory for describing and working with data records
Be able to define sets in your own words, along with derivative terms such as sub-sets and super-sets
Be able to apply (and recognize the notation for) the basic set operations of intersection, union, set difference, complement, cartesian product, cardinality
Recognize the value of visual diagramming approaches to sets such as Venn diagrams, and identify equivalences between set theory expressions and their corresponding Venn diagrams
Understand and be able to apply commutative, associative, distributive properties of set operations. Similarly, understand what are complimentary sets, and disjoint sets

5.1.2 Topics Covered

Entity Relationship Diagrams (ER Diagrams)
- Components: attributes, entity sets, relationships
- Weak Entity Sets
- Roles and Subclasses in ER diagrams
Set Theory
- Definition of sets, subsets, and supersets
- Basic operations: intersection, union, set difference, complement, cartesian product, cardinality
- Venn Diagrams
- Properties: commutative, associative, distributive, complementary sets, disjoint sets

5.1.3 Project Milestones

Come up with 3 ideas for your final project, regardless of whether or not you are already assigned to a team.

5.2 Lecture Notes

5.2.1 From Time Series to Data Structures

In Module 3, we focused on individual time series—sequences of measurements from sensors over time. We learned how to clean data, propagate uncertainty, detect outliers, and fit trends using linear regression. These techniques are essential for understanding what each sensor tells us.

But real engineering systems don’t consist of isolated sensors. Consider an environmental monitoring system for a watershed: you might have stream flow sensors at multiple locations, weather stations measuring precipitation and temperature, water quality sensors measuring pH and turbidity, and infrastructure sensors monitoring dams and bridges. Each sensor produces its own time series, but the real power comes from understanding the relationships among these measurements.

This raises new questions:

How do we describe the structure of our data collection system itself?
What entities (sensors, locations, infrastructure assets) exist in our system?
How are these entities related to each other?
What properties (attributes) does each entity have?

When systems become complex, with dozens or hundreds of sensors, multiple types of infrastructure, and intricate relationships, we need structured methods for organizing and visualizing this information. We can’t just keep track of it all in our heads or in ad-hoc spreadsheets.

Entity Relationship (ER) Diagrams provide a visual, intuitive way to model the structure of complex data systems before we implement them in databases. Think of ER diagrams as the architectural blueprint for your data—they help you think through what information you need to store and how different pieces relate to each other, long before you write a single line of database code.

5.2.2 Entity Relationship Diagrams

5.2.2.1 What Are Entity Relationship Diagrams?

Entity Relationship Diagrams (ER Diagrams) are graphical models that help us answer two fundamental questions about the data our system needs:

What information must the database hold? (What entities and their properties exist?)
What are the relationships among components of that information? (How do entities connect to each other?)

At their core, ER diagrams are visual representations where:

Boxes (rectangles) represent collections of similar data elements (called entity sets)
Arrows and diamonds represent connections and relationships between these elements
Ovals represent properties (attributes) of the elements

ER diagrams serve as a crucial design tool—they’re typically the first step in database design. You sketch out an ER diagram to understand your data structure, then later translate it into a more rigorous model (like a relational database schema). The diagram helps you communicate with stakeholders, identify missing information, and catch design flaws early when they’re easy to fix.

Why are ER diagrams important?

Communication: They provide a common visual language for engineers, database designers, and domain experts to discuss data structure
Design validation: They help catch structural problems (missing entities, incorrect relationships) before implementation
Documentation: They serve as living documentation of your system’s data architecture
Foundation for implementation: They directly inform the design of relational databases and other data structures

5.2.2.2 A Simple Example: Infrastructure Monitoring System

Let’s start with a concrete example from civil engineering: a monitoring system for a set of bridges in a city.

The scenario: The city maintains several bridges and has installed sensors on each to monitor structural health. Each bridge has multiple sensors measuring different things (vibration, strain, deflection). We need to organize data about:

The bridges themselves (location, construction year, type)
The sensors (type, installation date, measurement units)
The measurements collected over time
Which sensors are installed on which bridges

Here’s a simple ER diagram for this system:

Figure 5.1: Simple ER diagram for a bridge monitoring system

Reading this diagram:

Bridge and Sensor (blue boxes) are entity sets—collections of bridges and sensors
The ovals (yellow) attached to each box are attributes—properties that describe each entity
The diamond (green) labeled “Monitored_By” represents a relationship between bridges and sensors
The labels “1” and “n” indicate multiplicity: one bridge can be monitored by many (n) sensors

This simple diagram already tells us a lot: we can see at a glance what information we’re tracking and how bridges and sensors are related. As we develop our understanding, we’ll add more detail and handle more complex scenarios.

5.2.2.3 Basic Components of ER Diagrams

Now let’s examine each component of ER diagrams in detail, building up from the simplest concepts to more complex ones.

5.2.2.3.1 Entity Sets

An entity is an abstract object or “thing” that exists in the domain we’re modeling. Entities have distinct identity and properties that distinguish them from other objects.

Examples of entities:

A specific bridge (e.g., the Roberto Clemente Bridge in Pittsburgh)
A particular sensor (e.g., strain gauge #A-2301 installed on the bridge deck)
A monitoring station (e.g., USGS stream gauge 03049500 on the Allegheny River)
A maintenance event (e.g., the inspection performed on June 15, 2024)

An entity set is a collection (or group) of similar entities that share the same type and properties. Entity sets form the basis for database tables—each entity in the set will eventually become a row in a table.

Examples of entity sets:

Bridge: The collection of all bridges in the city’s infrastructure system
Sensor: The collection of all sensors deployed across the monitoring network
MonitoringStation: All environmental monitoring stations in a watershed
MaintenanceEvent: All recorded maintenance activities

Notation: Entity sets are represented as rectangles in ER diagrams.

Key insight: Think of entity sets as categories or types. Just as “Student” is a category containing individual students, “Bridge” is a category containing individual bridges. Each individual bridge is an entity; the collection of all bridges is the entity set.

5.2.2.3.2 Attributes

Attributes are properties or characteristics that describe the entities in an entity set. Each entity in a set has values for these attributes, and these values provide detailed information about that specific entity.

Notation: Attributes are represented as ovals connected to their entity set.

Example: The Bridge entity set might have these attributes:

Bridge_ID: A unique identifier for the bridge
Name: The bridge’s name (e.g., “Fort Pitt Bridge”)
Location: Geographic location or address
Year_Built: Construction year
Bridge_Type: Type of bridge (suspension, truss, arch, etc.)
Material: Construction materials used
Length: Span length in meters
Max_Load: Maximum load capacity in tons

Types of Attributes:

ER diagrams distinguish several special types of attributes:

1. Key Attributes (Primary Keys)

Some attributes uniquely identify each entity in the set—no two entities can have the same value for these attributes. These are called key attributes or primary keys.

In our Bridge example, Bridge_ID would be a key attribute
Each bridge has a unique ID; you can’t have two different bridges with the same ID
Notation: Underline the attribute name in the oval (e.g., Bridge_ID)

2. Multi-valued Attributes

Some attributes can have multiple values for a single entity. For example:

A bridge might be constructed from multiple materials (steel, concrete, and asphalt)
A sensor might measure multiple quantities simultaneously
A building might have multiple phone numbers

Notation: Multi-valued attributes are shown as double-lined ellipses (an ellipse within an ellipse).

Example: If the Material attribute of Bridge can hold multiple values, we’d draw it as:

Figure 5.2: Multi-valued attribute example

3. Derived Attributes

Some attributes can be computed or derived from other existing attributes rather than stored directly. For example:

Age of a bridge can be derived from Year_Built and the current year
Average_Daily_Flow might be derived from individual flow measurements
Total_Sensors for a bridge could be counted from the relationship to the Sensor entity set

Notation: Derived attributes are shown as dashed-line ellipses.

Why distinguish derived attributes? It helps in database design—you typically don’t store derived values (they’re redundant and can become inconsistent), but you do want to document them in your ER diagram to show what information users can obtain.

Example diagram showing different attribute types:

Figure 5.3: Different types of attributes in an ER diagram

In this diagram:

Bridge_ID is underlined (key attribute)
Material has double lines (multi-valued)
Age has dashed lines (derived from Year_Built)

5.2.2.3.3 Relationships

Entity sets by themselves only tell us what things exist in our system. The real power of ER diagrams comes from modeling relationships—the connections among entity sets.

Definition: A relationship is a connection or association among two or more entity sets. Relationships capture how entities interact with, depend on, or relate to each other.

Notation: Relationships are represented as diamonds in ER diagrams, with lines connecting the diamond to the participating entity sets.

Examples from infrastructure monitoring:

A Bridge is Monitored_By Sensors
A Sensor Records Measurements
A MaintenanceEvent Performed_On a Bridge
An Engineer Inspects a Bridge

Multiplicity (Cardinality) of Relationships

A critical aspect of relationships is their multiplicity or cardinality—how many entities from each set can participate in the relationship. Understanding multiplicity is essential for correct database design.

For binary relationships (relationships between two entity sets), there are four common patterns:

1. One-to-One (1:1)

Each entity in set A is associated with at most one entity in set B, and vice versa.

Example: In a city management system, each Building has exactly one TaxRecord, and each TaxRecord belongs to exactly one Building.

Figure 5.4: One-to-one relationship example

Notation: Arrow pointing to both entity sets, or labels “1” on both sides.

2. One-to-Many (1:n)

Each entity in set A can be associated with multiple entities in set B, but each entity in B is associated with at most one entity in A.

Example: One Bridge is monitored by many Sensors, but each Sensor is installed on only one Bridge.

Figure 5.5: One-to-many relationship example

Notation: Arrow points toward the “one” side, or labels “1” and “n” on the respective sides.

3. Many-to-One (n:1)

This is the reverse of one-to-many: multiple entities in set A can be associated with a single entity in set B.

Example: Many Sensors are manufactured by one Manufacturer, but each Manufacturer produces many sensors.

Figure 5.6: Many-to-one relationship example

Note: Many-to-one and one-to-many are essentially the same relationship viewed from different directions. The arrow points toward the “one” side.

4. Many-to-Many (n:m or m:n)

Each entity in set A can be associated with multiple entities in set B, and each entity in B can be associated with multiple entities in A.

Example: Multiple Engineers can inspect a given Bridge, and each Engineer can inspect multiple Bridges.

Figure 5.7: Many-to-many relationship example

Notation: No arrows, or labels “n” and “m” (or “n” on both sides) indicating multiple on each side.

Relationship Attributes

Relationships themselves can have attributes! These are properties that belong to the connection between entities, not to the entities themselves.

Example: Consider the Inspects relationship between Engineer and Bridge. We might want to record:

Inspection_Date: When the inspection occurred
Condition_Rating: The condition score assigned during that inspection
Notes: Comments from that specific inspection

These attributes don’t belong to the Engineer alone (one engineer performs many inspections on different dates), nor to the Bridge alone (a bridge is inspected multiple times). They belong to the relationship—the specific event of this engineer inspecting this bridge.

Figure 5.8: Relationship with attributes

Key insight: Relationship attributes are especially common in many-to-many relationships, where they capture information about each specific connection between entities.

5.2.2.4 Weak Entity Sets

Sometimes, an entity set cannot uniquely identify its entities using only its own attributes. Instead, the identity of each entity depends partly on attributes from another related entity set. This leads to the concept of weak entity sets.

Definition: A weak entity set is an entity set whose key (unique identifier) is composed of attributes, some or all of which belong to another entity set. The weak entity set depends on another entity set (called its identifying or owner entity set) for its existence and identification.

Motivation and Common Scenarios:

The principal source of weak entity sets occurs when entity sets fall into a hierarchy of containment or ownership. If entity set E contains subunits that belong to entities in set F, the names or identifiers of E entities may not be unique across the entire system—they’re only unique within the context of their parent F entity.

Example 1: Rooms in Buildings

Consider a university campus database with Building and Room entity sets.

Each building has a unique Building_ID
Rooms are identified by Room_Number within a building
But Room_Number alone isn’t globally unique—many buildings have a “Room 101”
A room’s complete identity requires both its room number and the building it’s in

In this case, Room is a weak entity set:

It depends on Building (the identifying entity set)
Its key is a combination of its own Room_Number and the Building_ID from the related Building entity

Figure 5.9: Weak entity set example: Rooms depend on Buildings

Notation:

Weak entity sets are shown as rectangles with double borders (double-lined box)
The identifying relationship (the many-one relationship connecting the weak entity to its owner) is shown as a diamond with double borders
The partial key attribute (e.g., Room_Number) is underlined in the weak entity

Example 2: Sensor Readings in a Monitoring System

Consider our bridge monitoring system. Suppose each sensor generates readings with sequential reading numbers:

Sensor is a regular (strong) entity with unique Sensor_ID
Reading entities are identified by a Sequence_Number (1, 2, 3, …) within each sensor
But Sequence_Number alone doesn’t uniquely identify a reading globally—Sensor A’s reading #5 is different from Sensor B’s reading #5
Each reading’s complete identity requires the Sensor_ID plus its own Sequence_Number

Therefore, Reading is a weak entity set depending on Sensor:

Figure 5.10: Weak entity set example: Readings depend on Sensors

Why distinguish weak entity sets?

Conceptual clarity: Makes explicit the dependency between entities
Database design: Affects how you create tables and foreign keys
Integrity constraints: Weak entities cannot exist without their owner (if you delete a Building, all its Rooms must be deleted too—this is called a cascading delete)
Key structure: Documents that the complete key is composite (spans multiple entity sets)

Key characteristics of weak entity sets:

They have a partial key (discriminator)—an attribute that is unique only within the context of the owner entity
They participate in an identifying relationship with their owner entity set
The relationship is always many-to-one from the weak entity to the owner
Weak entities cannot exist independently—they’re existentially dependent on their owner

5.2.2.5 Roles in Relationships

Sometimes, the same entity set participates in a relationship multiple times, playing different roles in each participation. When this happens, we need to explicitly label the roles to distinguish them.

When are roles needed?

Roles are necessary when:

The same entity set appears two or more times in a single relationship
We need to clarify which “side” of the relationship each participation represents
The meaning of the relationship depends on distinguishing the different ways the entity participates

Example 1: Road Network

Consider a transportation database modeling road segments connecting intersections.

Both endpoints of a road segment are intersections
So the Intersection entity set appears twice in the Connects relationship
We need to distinguish the “start” intersection from the “end” intersection

Figure 5.11: Roles in relationships: Road segments connect intersections

Here, the labels “from” and “to” are roles that clarify which intersection is the start point and which is the end point of the road segment.

Notation: We label the edges (lines) between the entity set and the relationship diamond with the role names.

Key insight: Roles are essential for clarity when an entity set relates to itself. Without role labels, it would be ambiguous which entity plays which part in the relationship.

5.2.2.6 Subclasses and ISA Hierarchies

Often, an entity set contains certain entities that have special properties not shared by all members of the set. Some entities are more specialized versions of the general entity type, with additional attributes or relationships.

Motivation: In these cases, it’s useful to define subclasses (also called subtypes or specializations)—special-case entity sets that inherit properties from a parent entity set but add their own specific attributes and relationships.

Definition: An ISA hierarchy (read as “is-a”) models inheritance where a subclass entity “is a” specialized version of the parent entity.

Example 1: Types of Bridges

Consider our bridge monitoring system. All bridges share common properties (location, year built, material), but different bridge types have type-specific attributes:

Suspension bridges have cable properties (cable diameter, tower height, anchorage type)
Truss bridges have truss configuration details (truss type, number of panels)
Arch bridges have arch span and rise characteristics

Rather than adding all these specialized attributes to every bridge (leaving most empty), we create subclasses:

Figure 5.12: ISA hierarchy: Specialized bridge types

Reading this diagram:

Suspension_Bridge, Truss_Bridge, and Arch_Bridge are subclasses of Bridge
The triangle labeled “isa” represents the inheritance relationship
Each suspension bridge is a bridge, so it has all Bridge attributes (ID, Name, Location) plus its specific attributes (Cable_Diameter, Tower_Height)
Similarly for truss and arch bridges

Notation:

Subclasses are connected to their parent (superclass) through a triangle labeled “isa”
The triangle points toward the subclasses
All ISA relationships are one-to-one (each subclass entity corresponds to exactly one parent entity)
We don’t draw arrows on ISA relationships because the one-to-one nature is implicit

Properties of ISA Hierarchies:

Inheritance: Subclasses inherit all attributes and relationships from their parent
Specialization: Subclasses add their own specific attributes and relationships
Substitutability: Anywhere a parent entity can appear, a subclass entity can appear (but not vice versa)
Mutual exclusivity (often): An entity typically belongs to only one subclass (a bridge is either suspension or truss, not both)—though this isn’t always required

Why use subclasses?

Clarity: Makes explicit which attributes apply to which entity types
Efficiency: Avoids storing null values for attributes that don’t apply
Extensibility: Easy to add new specialized types without modifying the parent
Type-specific relationships: Subclasses can participate in relationships unique to their type

Real-world analogy: Think of biological taxonomy. A “Mammal” is a subclass of “Animal”—it inherits all animal properties (eats, moves, reproduces) but adds mammal-specific features (warm-blooded, has fur, nurses young). Similarly, “Dog” is a subclass of “Mammal,” further specializing the type.

5.2.2.7 Putting It All Together: A Complete ER Diagram

Now let’s integrate all the concepts we’ve learned into a comprehensive ER diagram for a realistic infrastructure monitoring system. This example will demonstrate entity sets, attributes (including key, multi-valued, and derived), relationships with different multiplicities, weak entity sets, roles, and subclasses.

Scenario: A city operates a structural health monitoring system for bridges. The system tracks:

Bridges of different types (suspension, arch, truss)
Sensors installed on bridges
Engineers who maintain and inspect bridges
Inspection events with condition ratings
Sensor readings over time

Here’s the complete ER diagram:

Figure 5.13: Comprehensive ER diagram for bridge monitoring system

Reading this comprehensive diagram:

Entity Sets and Attributes:

Bridge: Has Bridge_ID (key), Name, Location, Year_Built, Materials (multi-valued - bridges can use multiple materials), and Age (derived from Year_Built)
Engineer: Has Employee_ID (key), Name, Specialty, Hire_Date
Sensor: Has Sensor_ID (key), Type, Install_Date, Units
Manufacturer: Has Manufacturer_ID (key), Company_Name
Reading (weak entity): Has Seq_Number (partial key), Timestamp, Value

Relationships:

Inspects (many-to-many): Engineers inspect bridges, with attributes Inspection_Date and Condition_Rating describing each inspection event
Monitored_By (one-to-many): Each bridge is monitored by many sensors
Produced_By (many-to-one): Many sensors are produced by one manufacturer
Generates (one-to-many, identifying): Each sensor generates many readings; Reading is a weak entity depending on Sensor
Supervises (self-relationship with roles): Engineers supervise other engineers, with roles “supervisor” and “supervisee”

Subclasses (ISA Hierarchy):

Suspension, Arch, and Truss are specialized types of bridges
Each has type-specific attributes (cable diameter, arch span, truss configuration)
They inherit all Bridge attributes

This diagram captures the complete structure of the monitoring system in a single visual representation. From this ER diagram, a database designer could:

Identify all the tables needed
Determine primary and foreign keys
Understand entity dependencies and referential integrity constraints
Plan for specialized storage of subclass-specific data

Design insights from this diagram:

Readings are weak entities because they only make sense in the context of a specific sensor
Materials is multi-valued because bridges often use multiple construction materials
Age is derived to avoid redundant storage (can compute from Year_Built)
Inspection attributes belong to the relationship, not to engineers or bridges alone
The supervision hierarchy uses roles to clarify the asymmetric relationship among engineers
Bridge subclasses allow type-specific attributes without cluttering the base Bridge entity

5.2.3 Set Theory for Data Management

5.2.3.1 Why Set Theory Matters for Databases

We’ve just learned how ER diagrams help us design the structure of a database—they tell us what entities and relationships we need. But once we’ve implemented a database, how do we actually work with the data?

This is where set theory becomes essential. Set theory provides the mathematical foundation for:

Understanding what data is: Every table in a database is fundamentally a set of records (rows). The entity set “Bridge” in our ER diagram becomes a set of bridge records in the database.
Querying data: When you ask a database “find all bridges built after 2000,” you’re asking for a subset of the bridge records. When you ask “find sensors that are on bridges AND manufactured by Company X,” you’re computing an intersection of two sets.
Combining data: Database operations like JOIN combine data from multiple tables using set operations. Understanding set theory helps you understand what these operations actually do.
Reasoning about data: Set theory gives us precise language to describe collections of data and relationships between collections.

The connection: Think of it this way:

ER diagrams describe the structure (the blueprint)
Set theory describes the data itself and operations on it (the mathematical foundation)
Databases are the implementation (the actual system)

As we’ll see, operations you perform on database records—filtering, combining, selecting—are fundamentally set operations. Understanding set theory helps you think clearly about what your queries do and how to construct them correctly.

5.2.3.2 Sets: The Basics

5.2.3.2.1 Defining Sets

A set is a collection of distinct objects, considered as a single entity. The objects in a set are called elements or members of the set.

Key properties of sets:

Unordered: The order of elements doesn’t matter. {A, B, C} is the same set as {C, A, B}
Distinct: Each element appears only once. There’s no such thing as “two copies” of the same element in a set
Well-defined: It must be clear whether any given object is in the set or not

Notation:

Curly braces denote sets: \(S = \{a, b, c\}\)
Capital letters typically denote sets: \(A, B, S, T\)
Lowercase letters typically denote elements: \(a, b, x, y\)
Membership: \(x \in S\) means “x is an element of S” (x is in S)
Non-membership: \(x \notin S\) means “x is not an element of S”

Examples from engineering data:

Example 1: The set of bridges in our monitoring system: \[B = \{\text{Roberto Clemente Bridge, Fort Pitt Bridge, Fort Duquesne Bridge}\}\]

Example 2: The set of sensor types: \[T = \{\text{accelerometer, strain\_gauge, temperature, displacement}\}\]

Example 3: The set of inspection years for a particular bridge: \[Y = \{2018, 2019, 2020, 2021, 2022, 2023, 2024\}\]

Set-builder notation:

Often, we define sets by specifying a property that elements must satisfy, rather than listing all elements. This is called set-builder notation:

\[S = \{x \mid \text{condition on } x\}\]

Read as: “S is the set of all x such that x satisfies this condition”

Engineering examples:

All bridges built after 2000: \(B_{new} = \{b \mid b \in Bridges \text{ and } b.year\_built > 2000\}\)
All sensors with readings exceeding a threshold: \(S_{high} = \{s \mid s \in Sensors \text{ and } s.max\_reading > 100\}\)
All engineers with civil specialization: \(E_{civil} = \{e \mid e \in Engineers \text{ and } e.specialty = \text{"Civil"}\}\)

Special sets:

Empty set (denoted \(\emptyset\) or \(\{\}\)): The set with no elements. For example, if no bridges were built in 1950, then \(\{b \mid b.year\_built = 1950\} = \emptyset\)
Universal set (denoted \(U\) or \(\Omega\)): The set of all elements under consideration in a given context. For our bridge database, the universal set might be all infrastructure assets in the city.

5.2.3.2.2 Subsets and Supersets

Sets can be related to each other through containment.

Definition: Set \(A\) is a subset of set \(B\) (written \(A \subseteq B\)) if every element of \(A\) is also an element of \(B\).

Mathematically: \(A \subseteq B\) if and only if \(\forall x: (x \in A \Rightarrow x \in B)\)

(Read as: “for all x, if x is in A, then x is in B”)

Definition: Set \(B\) is a superset of set \(A\) (written \(B \supseteq A\)) if \(A \subseteq B\). In other words, B contains all elements of A (and possibly more).

Proper subsets: \(A\) is a proper subset of \(B\) (written \(A \subset B\) or \(A \subsetneq B\)) if \(A \subseteq B\) and \(A \neq B\) (meaning B has at least one element not in A).

Engineering examples:

Example 1: Let \(S\) be all sensors, and \(S_{temp}\) be all temperature sensors. Then: \[S_{temp} \subset S\]

Temperature sensors are a proper subset of all sensors.

Example 2: Let \(B\) be all bridges, and \(B_{2020}\) be bridges inspected in 2020: \[B_{2020} \subseteq B\]

The bridges inspected in 2020 form a subset of all bridges.

Example 3: Consider these sets: - \(E\) = all engineers in the database - \(E_{civil}\) = engineers with civil engineering specialty - \(E_{senior}\) = engineers with more than 10 years experience - \(E_{lead}\) = senior civil engineers

Then: \(E_{lead} \subseteq E_{senior}\), \(E_{lead} \subseteq E_{civil}\), and both \(E_{senior} \subseteq E\) and \(E_{civil} \subseteq E\)

Important properties:

Every set is a subset of itself: \(A \subseteq A\)
The empty set is a subset of every set: \(\emptyset \subseteq A\) for any set \(A\)
Subset relation is transitive: if \(A \subseteq B\) and \(B \subseteq C\), then \(A \subseteq C\)

5.2.3.2.3 Set Membership and Cardinality

Set Membership:

We’ve already seen the membership notation:

\(x \in A\): “x is an element of A” or “x belongs to A”
\(x \notin A\): “x is not an element of A”

Example: If \(B = \{\text{Bridge A, Bridge B, Bridge C}\}\), then:

\(\text{Bridge A} \in B\) (Bridge A is in the set)
\(\text{Bridge D} \notin B\) (Bridge D is not in the set)

Important distinction: Be careful not to confuse membership (\(\in\)) with the subset relation (\(\subseteq\)):

\(\text{Bridge A} \in B\) (an individual bridge is an element of the bridge set)
\(\{\text{Bridge A}\} \subseteq B\) (a set containing one bridge is a subset of the bridge set)

Cardinality:

The cardinality of a set is the number of elements it contains. We denote it as \(|A|\) or \(\#A\) or sometimes \(card(A)\).

Examples:

If \(B = \{\text{Bridge A, Bridge B, Bridge C}\}\), then \(|B| = 3\)
If \(T = \{\text{accelerometer, strain\_gauge, temperature, displacement}\}\), then \(|T| = 4\)
The empty set has cardinality zero: \(|\emptyset| = 0\)

Engineering application: Cardinality is useful for counting:

How many sensors are installed? \(|Sensors|\)
How many bridges were inspected in 2024? \(|\{b \mid b.last\_inspection = 2024\}|\)
How many readings has a sensor generated? \(|\{r \mid r.sensor\_id = \text{"S001"}\}|\)

Properties:

Cardinality is always non-negative: \(|A| \geq 0\)
If \(A \subseteq B\), then \(|A| \leq |B|\) (subsets can’t be larger than their supersets)
If \(A \subset B\) (proper subset), then \(|A| < |B|\)

5.2.3.3 Operations on Sets

Just as we can perform arithmetic operations on numbers (addition, multiplication), we can perform operations on sets to create new sets. These operations are fundamental to database queries and data manipulation.

5.2.3.3.1 Union

The union of two sets contains all elements that are in either set (or both).

Definition: The union of sets \(A\) and \(B\), denoted \(A \cup B\), is: \[A \cup B = \{x \mid x \in A \text{ or } x \in B\}\]

Read as: “A union B is the set of all x such that x is in A or x is in B (or both)”

Key point: The “or” here is inclusive—an element is in the union if it’s in A, or in B, or in both.

Example 1: Simple sets

Let \(A = \{1, 2, 3\}\) and \(B = \{3, 4, 5\}\). Then: \[A \cup B = \{1, 2, 3, 4, 5\}\]

Note that 3 appears in both sets, but it only appears once in the union (sets contain distinct elements).

Example 2: Engineering application

Let \(B_{steel}\) be the set of steel bridges and \(B_{concrete}\) be the set of concrete bridges:

\[B_{steel} \cup B_{concrete} = \{\text{all bridges made of steel or concrete}\}\]

This represents all bridges built from either material.

Example 3: Sensor data

Let \(S_{2023}\) be sensors installed in 2023 and \(S_{2024}\) be sensors installed in 2024:

\[S_{2023} \cup S_{2024} = \{\text{all sensors installed in 2023 or 2024}\}\]

Properties of union:

Commutative: \(A \cup B = B \cup A\) (order doesn’t matter)
Associative: \((A \cup B) \cup C = A \cup (B \cup C)\)
Identity element: \(A \cup \emptyset = A\) (union with empty set does nothing)
Idempotent: \(A \cup A = A\) (unioning a set with itself gives the same set)

5.2.3.3.2 Intersection

The intersection of two sets contains only elements that are in both sets.

Definition: The intersection of sets \(A\) and \(B\), denoted \(A \cap B\), is: \[A \cap B = \{x \mid x \in A \text{ and } x \in B\}\]

Read as: “A intersection B is the set of all x such that x is in both A and B”

Example 1: Simple sets

Let \(A = \{1, 2, 3, 4\}\) and \(B = \{3, 4, 5, 6\}\). Then: \[A \cap B = \{3, 4\}\]

Only elements in both sets appear in the intersection.

Example 2: Engineering application

Let \(B_{old}\) be bridges built before 1950 and \(B_{steel}\) be steel bridges:

\[B_{old} \cap B_{steel} = \{\text{bridges that are both old AND made of steel}\}\]

This represents old steel bridges.

Example 3: Database query

Let \(E_{civil}\) be engineers with civil specialty and \(E_{senior}\) be engineers with >10 years experience:

\[E_{civil} \cap E_{senior} = \{\text{senior civil engineers}\}\]

This is exactly how database queries with multiple conditions work—you’re finding the intersection of multiple constraint sets.

Properties of intersection:

Commutative: \(A \cap B = B \cap A\)
Associative: \((A \cap B) \cap C = A \cap (B \cap C)\)
Identity element: \(A \cap U = A\) where \(U\) is the universal set
Zero element: \(A \cap \emptyset = \emptyset\) (intersection with empty set is always empty)
Idempotent: \(A \cap A = A\)

5.2.3.3.3 Set Difference

The set difference (or relative complement) contains elements that are in the first set but not in the second set.

Definition: The set difference of \(A\) and \(B\), denoted \(A \setminus B\) or \(A - B\), is: \[A \setminus B = \{x \mid x \in A \text{ and } x \notin B\}\]

Read as: “A minus B is the set of all x such that x is in A but not in B”

Important: Set difference is not commutative—\(A \setminus B\) is generally different from \(B \setminus A\).

Example 1: Simple sets

Let \(A = \{1, 2, 3, 4\}\) and \(B = \{3, 4, 5, 6\}\). Then: \[A \setminus B = \{1, 2\}\] \[B \setminus A = \{5, 6\}\]

Example 2: Engineering application

Let \(B\) be all bridges and \(B_{inspected}\) be bridges inspected this year:

\[B \setminus B_{inspected} = \{\text{bridges NOT yet inspected this year}\}\]

This gives you the bridges that still need inspection—a practical query!

Example 3: Sensor maintenance

Let \(S\) be all sensors and \(S_{faulty}\) be sensors reporting errors:

\[S \setminus S_{faulty} = \{\text{sensors that are working correctly}\}\]

Properties of set difference:

Not commutative: \(A \setminus B \neq B \setminus A\) (in general)
\(A \setminus \emptyset = A\) (removing nothing changes nothing)
\(A \setminus A = \emptyset\) (removing all elements leaves the empty set)
\(A \setminus B = \emptyset\) if and only if \(A \subseteq B\)

5.2.3.3.4 Complement

The complement of a set contains all elements in the universal set that are not in the given set.

Definition: The complement of set \(A\), denoted \(A^c\) or \(\overline{A}\) or \(A'\), is: \[A^c = \{x \mid x \in U \text{ and } x \notin A\}\]

where \(U\) is the universal set (the set of all elements under consideration).

Equivalently: \(A^c = U \setminus A\)

Important: The complement depends on what you define as your universal set.

Example 1: Simple example

Let \(U = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}\) and \(A = \{2, 4, 6, 8, 10\}\) (even numbers). Then: \[A^c = \{1, 3, 5, 7, 9\}\] (odd numbers)

Example 2: Engineering application

Let \(U\) be all bridges in the city and \(B_{old}\) be bridges built before 1950:

\[B_{old}^c = \{\text{bridges built in 1950 or later}\}\]

Example 3: Database application

Let \(U\) be all sensors and \(S_{active}\) be sensors currently transmitting data:

\[S_{active}^c = \{\text{sensors that are inactive or offline}\}\]

Properties of complement:

Double complement: \((A^c)^c = A\) (complement of complement gives back the original)
Complement of universal set: \(U^c = \emptyset\)
Complement of empty set: \(\emptyset^c = U\)
De Morgan’s Laws (very important):
- \((A \cup B)^c = A^c \cap B^c\)
- \((A \cap B)^c = A^c \cup B^c\)

De Morgan’s Laws in plain language:

“NOT (A or B)” is the same as “(NOT A) and (NOT B)”
“NOT (A and B)” is the same as “(NOT A) or (NOT B)”

These are fundamental to logic and database queries!

5.2.3.3.5 Cartesian Product

The Cartesian product creates a set of ordered pairs by combining elements from two sets. This operation is fundamental to understanding how database tables relate to each other.

Definition: The Cartesian product of sets \(A\) and \(B\), denoted \(A \times B\), is: \[A \times B = \{(a, b) \mid a \in A \text{ and } b \in B\}\]

Read as: “A cross B is the set of all ordered pairs (a,b) where a is in A and b is in B”

Important: The result is a set of ordered pairs—(a,b) is different from (b,a).

Example 1: Simple sets

Let \(A = \{1, 2\}\) and \(B = \{x, y\}\). Then: \[A \times B = \{(1,x), (1,y), (2,x), (2,y)\}\]

Notice: \(|A \times B| = |A| \times |B| = 2 \times 2 = 4\)

Example 2: Engineers and Bridges

Let \(E = \{\text{Alice, Bob}\}\) (engineers) and \(B = \{\text{Bridge A, Bridge B}\}\) (bridges). Then: \[E \times B = \{(\text{Alice, Bridge A}), (\text{Alice, Bridge B}), (\text{Bob, Bridge A}), (\text{Bob, Bridge B})\}\]

This represents all possible pairings of engineers with bridges—exactly what you get when you JOIN two database tables without any filtering condition!

Example 3: Sensor locations and types

Let \(L = \{\text{North, South, East, West}\}\) (locations) and \(T = \{\text{temperature, strain}\}\) (sensor types). Then: \[L \times T = \{(\text{North, temperature}), (\text{North, strain}), (\text{South, temperature}), \ldots\}\]

This gives all possible combinations of locations and sensor types—8 possibilities total.

Connection to Database Tables:

The Cartesian product is exactly what happens when you JOIN two tables without any conditions:

SELECT * FROM Engineers, Bridges;

This query produces \(|Engineers| \times |Bridges|\) rows—every engineer paired with every bridge. That’s why we usually add WHERE conditions to filter the Cartesian product down to meaningful relationships!

Properties of Cartesian product:

Not commutative: \(A \times B \neq B \times A\) (order matters in ordered pairs)
Cardinality: \(|A \times B| = |A| \times |B|\)
Associative (in a sense): \((A \times B) \times C \cong A \times (B \times C) \cong A \times B \times C\) (can extend to triples, etc.)
\(A \times \emptyset = \emptyset\) (product with empty set is empty)

Extended Cartesian product:

We can extend to multiple sets: \(A_1 \times A_2 \times \cdots \times A_n\) produces n-tuples (ordered sequences of n elements).

Database connection: A database table with n columns can be viewed as a subset of a Cartesian product:

\[\text{Bridge table} \subseteq \text{IDs} \times \text{Names} \times \text{Locations} \times \text{Years} \times \ldots\]

Not all possible combinations appear—only valid, meaningful bridge records. But the structure is fundamentally a Cartesian product filtered by constraints.

5.2.3.4 Visualizing Sets with Venn Diagrams

Venn diagrams are visual representations of sets and their relationships using overlapping circles (or other closed curves). They provide an intuitive way to understand set operations and relationships.

Basic structure:

Each set is represented by a circle or closed curve
The universal set \(U\) is typically represented by a rectangle containing all circles
Elements are points inside the circles
Overlapping regions represent intersections

How to read Venn diagrams:

The position of a point tells you which set(s) it belongs to:

Inside circle A only → element is in \(A \setminus B\)
Inside circle B only → element is in \(B \setminus A\)
In the overlap of A and B → element is in \(A \cap B\)
Outside both circles (but in rectangle) → element is in \((A \cup B)^c\)

Example: Bridge classification

Consider bridges classified by two properties: material (steel vs. not steel) and age (old vs. new).

Let: - \(S\) = set of steel bridges - \(O\) = set of old bridges (built before 1950)

The Venn diagram divides all bridges into four regions:

\(S \cap O\): Old steel bridges
\(S \cap O^c\): New steel bridges
\(S^c \cap O\): Old non-steel bridges
\(S^c \cap O^c\): New non-steel bridges

Translating between notation and diagrams:

Practice moving back and forth between symbolic notation and visual representation:

Set Expression	Description	Visual Region
\(A \cap B\)	Elements in both A and B	Overlap of circles
\(A \cup B\)	Elements in either A or B (or both)	Both circles combined
\(A \setminus B\)	Elements in A but not B	Part of A circle not overlapping B
\((A \cup B)^c\)	Elements in neither A nor B	Outside both circles
\(A^c \cap B^c\)	Elements in neither A nor B (same as above)	Outside both circles

Complex expressions:

Venn diagrams help verify identities like De Morgan’s Laws:

\((A \cup B)^c\) (complement of union) and \(A^c \cap B^c\) (intersection of complements) shade the same region
\((A \cap B)^c\) (complement of intersection) and \(A^c \cup B^c\) (union of complements) shade the same region

Three-set Venn diagrams:

With three sets, you get more complex overlapping patterns—up to 8 distinct regions representing all possible combinations of membership:

In none of the sets
In only A, only B, or only C
In A and B but not C, A and C but not B, or B and C but not A
In all three sets: \(A \cap B \cap C\)

Engineering application:

Consider monitoring system alerts with three conditions:

\(H\) = high strain readings
\(V\) = high vibration readings
\(T\) = temperature anomalies

Engineers might want to query different regions:

\(H \cap V \cap T\): Bridges with all three alert types (critical priority)
\(H \cup V \cup T\): Bridges with any alert type (all monitored bridges)
\(H \cap V \cap T^c\): High strain and vibration but normal temperature
\((H \cup V \cup T)^c\): Bridges with no alerts (all clear)

Venn diagrams make these queries visually intuitive.

5.2.3.5 Properties of Set Operations

Understanding the algebraic properties of set operations helps you manipulate set expressions, simplify queries, and reason about data.

5.2.3.5.1 Commutative Property

An operation is commutative if the order of operands doesn’t matter.

Commutative operations:

Union: \(A \cup B = B \cup A\)
- Example: (Steel bridges) ∪ (Old bridges) = (Old bridges) ∪ (Steel bridges)
Intersection: \(A \cap B = B \cap A\)
- Example: (Civil engineers) ∩ (Senior staff) = (Senior staff) ∩ (Civil engineers)

Non-commutative operations:

Set difference: \(A \setminus B \neq B \setminus A\) (generally)
- Example: (All sensors) (Faulty sensors) ≠ (Faulty sensors) (All sensors)
- The first gives working sensors; the second is empty!
Cartesian product: \(A \times B \neq B \times A\)
- Example: Engineers × Bridges ≠ Bridges × Engineers
- The ordered pairs \((e, b)\) and \((b, e)\) are different

5.2.3.5.2 Associative Property

An operation is associative if grouping doesn’t matter when combining three or more sets.

Associative operations:

Union: \((A \cup B) \cup C = A \cup (B \cup C)\)
- You can write \(A \cup B \cup C\) without ambiguity
- Example: (Steel bridges) ∪ (Concrete bridges) ∪ (Wood bridges) groups any way
Intersection: \((A \cap B) \cap C = A \cap (B \cap C)\)
- You can write \(A \cap B \cap C\) without ambiguity
- Example: Finding engineers who are civil AND senior AND experienced in bridges

Why this matters: Associativity means you can process multiple conditions in any order—useful for query optimization in databases.

5.2.3.5.3 Distributive Property

Distributive laws describe how one operation distributes over another.

Key distributive properties:

Intersection distributes over union: \[A \cap (B \cup C) = (A \cap B) \cup (A \cap C)\]

Example: Steel bridges that are (old or long) = (Steel bridges that are old) or (Steel bridges that are long)
Union distributes over intersection: \[A \cup (B \cap C) = (A \cup B) \cap (A \cup C)\]

Example: Sensors that are (temperature OR (strain AND high-precision)) = (temperature OR strain) AND (temperature OR high-precision)

Why this matters: These laws allow you to rewrite queries in equivalent forms, which can be more efficient to compute or clearer to understand.

Absorption laws (related):

\(A \cup (A \cap B) = A\)
\(A \cap (A \cup B) = A\)

These simplify redundant conditions.

5.2.3.5.4 Special Sets and Relationships

Empty Set (\(\emptyset\)):

Properties:

\(A \cup \emptyset = A\) (union identity)
\(A \cap \emptyset = \emptyset\) (intersection with empty is empty)
\(A \setminus \emptyset = A\)
\(\emptyset \subseteq A\) for all A (empty set is subset of everything)

Universal Set (\(U\)):

Properties:

\(A \cup U = U\)
\(A \cap U = A\) (intersection identity)
\(A \subseteq U\) for all A in the context
\(U^c = \emptyset\)

Complementary Sets:

Two sets \(A\) and \(B\) are complementary if:

\(A \cup B = U\) (together they cover everything)
\(A \cap B = \emptyset\) (they don’t overlap)
\(B = A^c\) (B is the complement of A)

Example: Active sensors and inactive sensors are complementary—every sensor is either active or inactive, but not both.

Disjoint Sets:

Two sets \(A\) and \(B\) are disjoint (or mutually exclusive) if: \[A \cap B = \emptyset\]

They have no elements in common.

Examples:

Suspension bridges and truss bridges are disjoint (a bridge can’t be both types)
Sensors installed in 2020 and sensors installed in 2021 are disjoint
Temperature sensors and strain gauges might be disjoint (if types don’t overlap)

Properties of disjoint sets:

If \(A\) and \(B\) are disjoint: \(|A \cup B| = |A| + |B|\)
This is the addition principle of counting

Partition:

A collection of sets \(A_1, A_2, \ldots, A_n\) is a partition of set \(S\) if:

They’re mutually disjoint: \(A_i \cap A_j = \emptyset\) for \(i \neq j\)
Their union is \(S\): \(A_1 \cup A_2 \cup \cdots \cup A_n = S\)
None are empty: \(A_i \neq \emptyset\)

Example: Bridge types (suspension, truss, arch, beam) partition the set of all bridges—every bridge is exactly one type.

Why partitions matter: They’re fundamental to classification systems and database table designs. Entity subclasses in ER diagrams often represent partitions.

5.2.3.6 Connecting Sets to ER Diagrams

Now we can bring everything together: ER diagrams describe structure, set theory provides the mathematical foundation, and database queries perform set operations.

5.2.3.6.1 Entity Sets as Sets

Every entity set in an ER diagram corresponds to a set in set theory:

The Bridge entity set → the set of all bridge records
The Sensor entity set → the set of all sensor records
Each entity (a specific bridge) → an element of the set

Attributes define subsets:

Bridges built after 2000 → \(\{b \in Bridges \mid b.year\_built > 2000\}\) (a subset)
Steel bridges → \(\{b \in Bridges \mid \text{"steel"} \in b.materials\}\) (a subset)

5.2.3.6.2 Relationships as Set Operations

One-to-many relationships can be expressed using sets:

For the relationship “Bridge is monitored by Sensors”:

For each bridge \(b\), there’s a set of sensors: \(Sensors(b) = \{s \in Sensors \mid s.bridge\_id = b.id\}\)
Finding all sensors on any bridge in set \(B_{subset}\): \[\bigcup_{b \in B_{subset}} Sensors(b)\]

Many-to-many relationships are subsets of Cartesian products:

The “Engineer Inspects Bridge” relationship is a subset of \(Engineers \times Bridges\): \[Inspects \subseteq Engineers \times Bridges\]

Not every \((engineer, bridge)\) pair is in the relationship—only those where that engineer actually inspected that bridge.

5.2.3.6.3 Queries as Set Operations

WHERE clause = intersection:

SELECT * FROM Bridges
WHERE material = 'steel' AND year_built > 2000;

This computes: \(B_{steel} \cap B_{new}\)

OR conditions = union:

SELECT * FROM Bridges
WHERE material = 'steel' OR material = 'concrete';

This computes: \(B_{steel} \cup B_{concrete}\)

NOT conditions = complement or difference:

SELECT * FROM Bridges
WHERE bridge_id NOT IN (SELECT bridge_id FROM Inspections WHERE year = 2024);

This computes: \(Bridges \setminus Bridges_{inspected\_2024}\)

JOIN = Cartesian product + filter:

SELECT * FROM Engineers JOIN Bridges ON Engineers.specialty = Bridges.type;

This computes: \(\{(e, b) \in Engineers \times Bridges \mid e.specialty = b.type\}\)

(A filtered subset of the Cartesian product)

Aggregation = cardinality:

SELECT COUNT(*) FROM Sensors WHERE type = 'temperature';

This computes: \(|S_{temperature}|\)

5.2.3.6.4 Design Implications

Understanding the set-theoretic foundation helps with:

Query optimization: Recognizing equivalent set expressions
- \(A \cap (B \cup C) = (A \cap B) \cup (A \cap C)\) suggests different query plans
Understanding results: Knowing what data you’ll get
- Cartesian product without JOIN condition explodes data size
- UNION removes duplicates (set union), UNION ALL doesn’t
Constraint specification: Defining data integrity rules
- Disjoint subclasses → partition constraint
- Foreign keys → subset constraint (child IDs must be in parent IDs)
Reasoning about coverage: Do your queries cover all cases?
- Have you partitioned the search space?
- Are conditions mutually exclusive or overlapping?

Example: Suppose you want to ensure all bridges are categorized by age. You might verify: \[(Bridges_{old} \cup Bridges_{new} = Bridges) \land (Bridges_{old} \cap Bridges_{new} = \emptyset)\]

This confirms old and new bridges form a partition—complete coverage, no overlap.