7  Module 6: The Structured Query Language (SQL)

7.1 Module Overview

7.1.1 Learning Objectives

By the end of this module, students will be able to:

  • Understand what SQL is and where it comes from, as well as its prevalence in industry and computing infrastructure today
  • Distinguish between the Data Description Language (DDL) and Data Manipulation Language (DML) groups of statements
  • Formulate queries using the following SQL statements, functions and clauses, as well as understand their relational algebra counterparts (where applicable):
    • SELECT, SELECT ... WHERE, SELECT ... GROUP BY, SELECT ... JOIN
    • Nested selects (SELECT ... SELECT)
    • Aggregate functions: SUM(), MAX(), AVG(), COUNT()
    • DDL statements: CREATE TABLE, ALTER, DROP
    • DML statements: INSERT, DELETE, UPDATE
  • Install and issue queries to a local DBMS such as MySQL (with MySQL Workbench as a client), or SQLite with its CLI interface
  • Translate relational algebra expressions to equivalent SQL queries and vice versa
  • Write relatively more complex SQL queries involving joins, aggregations, and subqueries

7.1.2 Topics Covered

  • Introduction to SQL
    • Historical context and development
    • SQL’s role in modern computing infrastructure
    • SQL standards and variations across database systems
    • The relationship between SQL and relational algebra
  • SQL Statement Categories
    • Data Definition Language (DDL): Defining database structure
    • Data Manipulation Language (DML): Querying and modifying data
  • Basic SQL Queries
    • SELECT statement structure
    • Filtering with WHERE clauses
    • Sorting results with ORDER BY
    • Eliminating duplicates with DISTINCT
    • Limiting results
  • SQL Operators and Functions
    • Comparison operators (=, <, >, <=, >=, <>)
    • Logical operators (AND, OR, NOT)
    • Pattern matching with LIKE
  • Aggregate Functions and Grouping
    • COUNT(), SUM(), AVG(), MIN(), MAX()
    • GROUP BY clause
    • Understanding the order of SQL clause execution
  • Joins in SQL
    • Cross joins (Cartesian product)
    • Inner joins (theta joins and natural joins)
    • Outer joins (LEFT, RIGHT, FULL)
    • Self-joins
  • Nested Queries (Subqueries)
    • Subqueries in WHERE clauses
    • Subqueries in FROM clauses (derived tables)
  • Data Definition with DDL
    • CREATE TABLE with constraints
    • Data types and domains
    • Primary keys and foreign keys
    • ALTER TABLE to modify structure
    • DROP TABLE to delete tables
  • Data Modification with DML
    • INSERT statements (single row and multiple rows)
    • UPDATE with conditions
    • DELETE with conditions
    • Transaction concepts (brief introduction)
  • Working with Local Database Systems
    • Installing and configuring MySQL or SQLite
    • Using database clients (MySQL Workbench, SQLite CLI)

7.1.3 Project Milestones

Submit a set of 5 slides for your group’s final project proposal presentation:

  1. Title of the project, and its members
  2. Motivation for your project
  3. Specific database problem you are aiming to solve and expected goals
  4. Proposed solution (and how it draws from concepts learned in this course)
  5. Remaining questions for your team about how to complete the project

7.2 Lecture Notes

7.2.1 From Relational Algebra to SQL

In Module 5, we studied the relational model and relational algebra—the mathematical foundation for working with relational databases. We learned operators like select (\(\sigma\)), project (\(\pi\)), join (\(\bowtie\)), union (\(\cup\)), and others that allow us to express queries precisely.

While relational algebra provides the theoretical framework, it’s not a practical language for working with real database systems. This is where SQL (Structured Query Language) comes in.

SQL is the practical implementation of relational algebra. Nearly every concept from Module 5’s relational algebra has a direct counterpart in SQL:

Relational Algebra SQL Equivalent
\(\sigma_{\text{condition}}(R)\) SELECT * FROM R WHERE condition
\(\pi_{\text{A,B}}(R)\) SELECT A, B FROM R
\(R \times S\) SELECT * FROM R CROSS JOIN S
\(R \bowtie S\) SELECT * FROM R NATURAL JOIN S
\(R \cup S\) SELECT * FROM R UNION SELECT * FROM S

However, SQL extends beyond pure relational algebra with additional features:

  • Aggregate functions (COUNT, SUM, AVG, MAX, MIN) for computing statistics
  • Grouping operations (GROUP BY) for organizing data
  • NULL values for representing missing or unknown data
  • Sorting (ORDER BY) for controlling output order
  • Data modification statements (INSERT, UPDATE, DELETE)
  • Schema definition statements (CREATE TABLE, ALTER TABLE, DROP TABLE)

SQL also differs from relational algebra in one important way: while relational algebra treats relations as sets (no duplicates), SQL treats tables as multisets or bags (duplicates allowed by default). You must explicitly use DISTINCT to eliminate duplicates.

Throughout this module, we’ll see how SQL translates the abstract concepts from Module 5 into executable commands that work with real database systems.

7.2.2 Introduction to SQL

7.2.2.1 What is SQL?

SQL (Structured Query Language) is a standardized programming language for managing and querying relational databases. It provides commands to:

  • Define database structures (tables, constraints, relationships)
  • Query data (retrieve information from tables)
  • Modify data (insert, update, delete records)
  • Control access (grant and revoke permissions)

SQL is both powerful and relatively simple to learn. Unlike procedural programming languages where you specify how to accomplish a task, SQL is declarative—you specify what you want, and the database management system (DBMS) figures out how to execute it efficiently.

7.2.2.2 Historical Context

SQL was developed in the early 1970s at IBM by Donald Chamberlin and Raymond Boyce. It was originally called SEQUEL (Structured English Query Language) and was designed for IBM’s System R database, one of the first implementations of Edgar Codd’s relational model.

The language was later renamed SQL, and in 1986, it became an ANSI (American National Standards Institute) standard. Major revisions followed:

  • SQL-86 (1986): First standard
  • SQL-92 (1992): Major expansion, widely implemented
  • SQL:1999 (1999): Added triggers, recursive queries
  • SQL:2003 (2003): Added window functions, XML support
  • SQL:2016 (2016): JSON support, pattern matching

Despite standardization, different database vendors have implemented SQL with variations. The core features remain consistent, but advanced features and syntax details can differ between systems like MySQL, PostgreSQL, Oracle, SQL Server, and SQLite.

7.2.2.3 SQL’s Role in Modern Computing

SQL is ubiquitous in modern computing infrastructure. It’s used in:

  • Web applications: Nearly all web apps use SQL databases for data persistence
  • Business intelligence: Data warehouses and analytics platforms rely on SQL
  • Financial systems: Banks and financial institutions use SQL extensively
  • Scientific computing: Research data management and analysis
  • Mobile apps: SQLite is embedded in iOS and Android applications
  • Government systems: Census data, tax systems, public records

According to various industry surveys, SQL consistently ranks as one of the most in-demand skills for data-related jobs. Learning SQL provides a foundation for working with structured data across virtually any domain.

7.2.2.4 SQL and Database Management Systems

SQL is a language standard, but you execute SQL commands through a Database Management System (DBMS)—software that manages the storage, retrieval, and manipulation of data. Common SQL-based DBMS include:

  • MySQL: Open-source, widely used for web applications
  • PostgreSQL: Open-source, advanced features, ACID compliant
  • SQLite: Lightweight, serverless, embedded in applications
  • Oracle Database: Enterprise-grade, commercial
  • Microsoft SQL Server: Enterprise-grade, Windows-focused
  • MariaDB: Open-source MySQL fork

For this course, we’ll focus on MySQL and SQLite as they’re freely available and well-suited for learning.

7.2.3 SQL Statement Categories: DDL vs DML

SQL statements fall into several categories based on their purpose. The two most important for this course are:

7.2.3.1 Data Definition Language (DDL)

DDL statements define and modify the structure of databases—they create, alter, and delete tables and other database objects.

Key DDL statements:

  • CREATE TABLE: Define a new table with columns and constraints
  • ALTER TABLE: Modify an existing table structure (add/remove columns, change constraints)
  • DROP TABLE: Delete a table and all its data

Example:

CREATE TABLE Bridge (
    Bridge_ID VARCHAR(10) PRIMARY KEY,
    Name VARCHAR(100),
    Year_Built INTEGER,
    Length REAL
);

DDL statements affect the schema (structure) of the database, not the data itself.

7.2.3.2 Data Manipulation Language (DML)

DML statements work with the data stored in tables—they query, insert, update, and delete records.

Key DML statements:

  • SELECT: Query data from tables (the most frequently used SQL statement)
  • INSERT: Add new records to a table
  • UPDATE: Modify existing records
  • DELETE: Remove records from a table

Example:

SELECT Name, Year_Built
FROM Bridge
WHERE Year_Built < 1950;

DML statements affect the instance (data) of the database, not the structure.

7.2.3.3 Other SQL Categories (Brief Overview)

For completeness, two other categories exist but are beyond this course’s scope:

  • Data Control Language (DCL): Manages permissions and access control (GRANT, REVOKE)
  • Transaction Control Language (TCL): Manages transactions (BEGIN, COMMIT, ROLLBACK)

We’ll briefly mention transactions in the DML section, but the focus of this module is on DDL and DML.

7.2.4 Basic SQL Queries

The SELECT statement is the foundation of SQL queries. It retrieves data from one or more tables based on specified criteria.

7.2.4.1 Basic SELECT Structure

The simplest form of a SELECT query has this structure:

SELECT column1, column2, ...
FROM table_name;

Example: Retrieve all bridge names and years built:

SELECT Name, Year_Built
FROM Bridge;

To retrieve all columns, use the asterisk (*):

SELECT *
FROM Bridge;

This corresponds to the relational algebra expression: Bridge (selecting all attributes).

7.2.4.2 Filtering Rows with WHERE

The WHERE clause filters rows based on a condition. This corresponds to the relational algebra select operator (\(\sigma\)).

SELECT column1, column2, ...
FROM table_name
WHERE condition;

Example: Find bridges built before 1950:

SELECT Name, Year_Built
FROM Bridge
WHERE Year_Built < 1950;

This translates to: \(\pi_{\text{Name, Year\_Built}}(\sigma_{\text{Year\_Built} < 1950}(\text{Bridge}))\)

7.2.4.3 Sorting Results with ORDER BY

The ORDER BY clause sorts the result set by one or more columns. Sorting doesn’t exist in pure relational algebra (since relations are unordered sets), but it’s essential for presenting results.

SELECT column1, column2, ...
FROM table_name
ORDER BY column1 [ASC|DESC], column2 [ASC|DESC], ...;
  • ASC: Ascending order (default)
  • DESC: Descending order

Example: List bridges by year built, oldest first:

SELECT Name, Year_Built
FROM Bridge
ORDER BY Year_Built ASC;

Example: List bridges by length, longest first:

SELECT Name, Length
FROM Bridge
ORDER BY Length DESC;

7.2.4.4 Eliminating Duplicates with DISTINCT

The DISTINCT keyword eliminates duplicate rows from results. This makes SQL behave like relational algebra (which treats relations as sets).

SELECT DISTINCT column1, column2, ...
FROM table_name;

Example: Find all unique sensor types:

SELECT DISTINCT Type
FROM Sensor;

Without DISTINCT, if multiple sensors have the same type, that type would appear multiple times in the result.

7.2.4.5 Limiting Results

Many SQL systems provide a way to limit the number of rows returned. The syntax varies by DBMS:

SQLite and MySQL:

SELECT column1, column2, ...
FROM table_name
LIMIT n;

Example: Get the 5 oldest bridges:

SELECT Name, Year_Built
FROM Bridge
ORDER BY Year_Built ASC
LIMIT 5;

Note: Different database systems use different syntax for this feature (TOP in SQL Server, FETCH FIRST in standard SQL).

7.2.4.6 Combining Clauses

You can combine these clauses in a single query. SQL processes them in a specific logical order:

  1. FROM: Identify the table
  2. WHERE: Filter rows
  3. SELECT: Choose columns
  4. DISTINCT: Eliminate duplicates
  5. ORDER BY: Sort results
  6. LIMIT: Restrict number of rows

Example: Find accelerometer sensors, show their IDs and types, sorted by ID:

SELECT DISTINCT Sensor_ID, Type
FROM Sensor
WHERE Type = 'accelerometer'
ORDER BY Sensor_ID
LIMIT 10;

7.2.5 SQL Operators and Conditions

SQL provides various operators for constructing conditions in WHERE clauses. These allow you to precisely filter data based on complex criteria.

7.2.5.1 Comparison Operators

Comparison operators test relationships between values:

Operator Meaning Example
= Equal to Year_Built = 1950
<> or != Not equal to Type <> 'accelerometer'
< Less than Length < 300
> Greater than Year_Built > 2000
<= Less than or equal Length <= 500
>= Greater than or equal Year_Built >= 1900

Example: Find bridges longer than 1000 meters:

SELECT Name, Length
FROM Bridge
WHERE Length > 1000;

7.2.5.2 Logical Operators

Logical operators combine multiple conditions:

  • AND: Both conditions must be true
  • OR: At least one condition must be true
  • NOT: Negates a condition

Example: Find bridges built between 1900 and 1950:

SELECT Name, Year_Built
FROM Bridge
WHERE Year_Built >= 1900 AND Year_Built <= 1950;

Example: Find sensors that are either accelerometers or strain gauges:

SELECT Sensor_ID, Type
FROM Sensor
WHERE Type = 'accelerometer' OR Type = 'strain_gauge';

Example: Find bridges not built in the 20th century:

SELECT Name, Year_Built
FROM Bridge
WHERE NOT (Year_Built >= 1900 AND Year_Built < 2000);

7.2.5.3 Pattern Matching with LIKE

The LIKE operator performs pattern matching on text. Two wildcards are available:

  • %: Matches any sequence of characters (including empty)
  • _: Matches exactly one character

Example: Find bridges with “Bridge” in their name:

SELECT Name
FROM Bridge
WHERE Name LIKE '%Bridge%';

Example: Find sensors with IDs starting with ‘S00’:

SELECT Sensor_ID, Type
FROM Sensor
WHERE Sensor_ID LIKE 'S00_';

Note: Pattern matching is case-insensitive in some database systems (like MySQL by default) but case-sensitive in others (like PostgreSQL).

7.2.5.4 NULL Values

SQL uses NULL to represent missing or unknown data. Testing for NULL requires special operators:

  • IS NULL: Tests if a value is NULL
  • IS NOT NULL: Tests if a value is not NULL

Example: Find sensors without an installation date:

SELECT Sensor_ID, Type
FROM Sensor
WHERE Install_Date IS NULL;

Important: You cannot use = NULL or <> NULL. The expressions value = NULL and value <> NULL always evaluate to unknown (not true or false), so they won’t work as intended.

7.2.5.5 Operator Precedence

When combining operators, SQL follows precedence rules (highest to lowest):

  1. Arithmetic operators (*, /, +, -)
  2. Comparison operators (=, <, >, etc.)
  3. NOT
  4. AND
  5. OR

Use parentheses to override precedence and make expressions clearer:

SELECT Name
FROM Bridge
WHERE (Year_Built < 1900 OR Year_Built > 2000) AND Length > 500;

7.2.6 Aggregate Functions and Grouping

Aggregate functions compute summary statistics across multiple rows. They extend beyond pure relational algebra, providing powerful analytical capabilities.

7.2.6.1 Common Aggregate Functions

SQL provides several built-in aggregate functions:

Function Purpose Example
COUNT(*) Count all rows COUNT(*)
COUNT(column) Count non-NULL values in column COUNT(Sensor_ID)
SUM(column) Sum of values SUM(Length)
AVG(column) Average of values AVG(Year_Built)
MIN(column) Minimum value MIN(Year_Built)
MAX(column) Maximum value MAX(Length)

Example: Count the total number of bridges:

SELECT COUNT(*)
FROM Bridge;

Example: Find the average bridge length:

SELECT AVG(Length)
FROM Bridge;

Example: Find the oldest bridge construction year:

SELECT MIN(Year_Built)
FROM Bridge;

7.2.6.2 Grouping with GROUP BY

The GROUP BY clause partitions rows into groups based on column values, then applies aggregate functions to each group separately.

SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;

Example: Count how many sensors are on each bridge:

SELECT Bridge_ID, COUNT(*) AS Sensor_Count
FROM Sensor
GROUP BY Bridge_ID;

This groups sensors by Bridge_ID, then counts how many sensors are in each group.

Example: Find the average length of bridges by decade:

SELECT (Year_Built / 10) * 10 AS Decade, AVG(Length) AS Avg_Length
FROM Bridge
GROUP BY (Year_Built / 10) * 10;

7.2.6.3 Rules for GROUP BY

When using GROUP BY:

  1. SELECT clause can only include:
    • Columns listed in GROUP BY
    • Aggregate functions
    • Constants
  2. You cannot select non-grouped, non-aggregated columns

Invalid example:

-- This will error: Name is not in GROUP BY
SELECT Bridge_ID, Name, COUNT(*)
FROM Sensor
GROUP BY Bridge_ID;

Valid version:

SELECT Bridge_ID, COUNT(*)
FROM Sensor
GROUP BY Bridge_ID;

7.2.6.4 Understanding SQL Clause Execution Order

SQL processes clauses in a specific logical order (not the order you write them):

  1. FROM: Identify tables
  2. WHERE: Filter rows
  3. GROUP BY: Group rows
  4. HAVING: Filter groups
  5. SELECT: Choose columns and apply functions
  6. ORDER BY: Sort results
  7. LIMIT: Restrict number of rows

This order explains why:

  • WHERE filters individual rows before grouping
  • You can’t use aggregate functions in WHERE (aggregates happen after WHERE)
  • HAVING filters groups after GROUP BY (so you can use aggregates there)

Example: Find bridges with multiple sensors (at least 2):

SELECT Bridge_ID, COUNT(*) AS Sensor_Count
FROM Sensor
GROUP BY Bridge_ID
HAVING COUNT(*) >= 2;

The HAVING clause filters groups after grouping, keeping only groups where the count is at least 2.

Example: Find the average length of bridges built after 1900, grouped by decade, showing only decades with average length > 500:

SELECT (Year_Built / 10) * 10 AS Decade, AVG(Length) AS Avg_Length
FROM Bridge
WHERE Year_Built > 1900
GROUP BY (Year_Built / 10) * 10
HAVING AVG(Length) > 500
ORDER BY Decade;

Execution order: 1. FROM Bridge 2. WHERE Year_Built > 1900 (filter individual bridges) 3. GROUP BY decade (group remaining bridges) 4. HAVING AVG(Length) > 500 (filter groups) 5. SELECT Decade, AVG(Length) (compute results) 6. ORDER BY Decade (sort output)

7.2.7 Joins in SQL

Joins combine data from multiple tables. They translate directly from Module 5’s relational algebra join operators.

7.2.7.1 Cross Join (Cartesian Product)

A cross join produces all possible combinations of rows from two tables. This corresponds to the relational algebra cross product (\(R \times S\)).

SELECT *
FROM Table1 CROSS JOIN Table2;

Alternative syntax (comma notation):

SELECT *
FROM Table1, Table2;

Example: Combine all bridges with all sensor types (not useful, but demonstrates the concept):

SELECT Bridge.Name, Sensor.Type
FROM Bridge CROSS JOIN Sensor;

Cross joins by themselves are rarely useful because they produce many meaningless combinations. They become useful when combined with WHERE conditions (which creates a theta join).

7.2.7.2 Inner Join (Theta Join)

An inner join combines rows from two tables based on a condition. This corresponds to the relational algebra theta join (\(R \bowtie_\theta S\)).

Syntax:

SELECT columns
FROM Table1
INNER JOIN Table2 ON condition;

Example: Find which sensors are on which bridges:

SELECT Bridge.Name, Sensor.Sensor_ID, Sensor.Type
FROM Bridge
INNER JOIN Sensor ON Bridge.Bridge_ID = Sensor.Bridge_ID;

This joins Bridge and Sensor tables where the Bridge_ID values match, giving us sensor information along with the bridge name.

Note: The INNER keyword is optional—JOIN alone means inner join.

7.2.7.3 Natural Join

A natural join automatically joins tables on columns with the same name. This corresponds to relational algebra’s natural join (\(R \bowtie S\)).

SELECT *
FROM Bridge NATURAL JOIN Sensor;

This automatically joins on Bridge_ID (the common column name).

Caution: Natural joins can be dangerous if tables have multiple columns with the same name—the join will use all of them, which may not be what you want. Explicit JOIN...ON is generally preferred for clarity.

7.2.7.4 Outer Joins

Outer joins include rows that don’t have matching values in the other table, filling in NULL for missing data.

LEFT OUTER JOIN: Include all rows from the left table, even if there’s no match in the right table.

SELECT Bridge.Name, Sensor.Sensor_ID
FROM Bridge
LEFT JOIN Sensor ON Bridge.Bridge_ID = Sensor.Bridge_ID;

This returns all bridges, even those without sensors (Sensor_ID will be NULL for bridges with no sensors).

RIGHT OUTER JOIN: Include all rows from the right table, even if there’s no match in the left table.

SELECT Bridge.Name, Sensor.Sensor_ID
FROM Bridge
RIGHT JOIN Sensor ON Bridge.Bridge_ID = Sensor.Bridge_ID;

This returns all sensors, even if their Bridge_ID doesn’t exist in the Bridge table.

FULL OUTER JOIN: Include all rows from both tables, filling with NULL where there’s no match.

SELECT Bridge.Name, Sensor.Sensor_ID
FROM Bridge
FULL OUTER JOIN Sensor ON Bridge.Bridge_ID = Sensor.Bridge_ID;

Note: SQLite doesn’t support FULL OUTER JOIN directly. You can simulate it using UNION of LEFT and RIGHT joins.

7.2.7.5 Self-Joins

A self-join joins a table to itself. This requires aliasing the table with different names.

Example: If engineers can supervise other engineers, find pairs of supervisor-supervisee:

SELECT E1.Name AS Supervisor, E2.Name AS Supervisee
FROM Engineer E1
INNER JOIN Engineer E2 ON E1.Employee_ID = E2.Supervisor_ID;

Here we alias Engineer as E1 and E2 to distinguish the two copies.

7.2.7.6 Joining Multiple Tables

You can chain multiple joins to combine data from three or more tables.

Example: Find sensor readings along with bridge names:

SELECT Bridge.Name, Sensor.Type, Reading.Value, Reading.Timestamp
FROM Bridge
INNER JOIN Sensor ON Bridge.Bridge_ID = Sensor.Bridge_ID
INNER JOIN Reading ON Sensor.Sensor_ID = Reading.Sensor_ID;

This joins three tables: Bridge → Sensor → Reading, connecting bridges to their sensors to the readings from those sensors.

7.2.7.7 Connection to Relational Algebra

SQL Join Relational Algebra Meaning
CROSS JOIN \(R \times S\) All combinations
JOIN...ON \(R \bowtie_\theta S\) Theta join with condition
NATURAL JOIN \(R \bowtie S\) Natural join on common attributes
LEFT JOIN No direct equivalent Include unmatched left rows
RIGHT JOIN No direct equivalent Include unmatched right rows

Outer joins extend beyond pure relational algebra, providing additional flexibility for handling incomplete data.

7.2.8 Nested Queries and Subqueries

A subquery (or nested query) is a SELECT statement embedded within another SQL statement. Subqueries allow you to break complex queries into logical steps.

7.2.8.1 Subqueries in WHERE Clauses

The most common use of subqueries is in WHERE clauses to filter based on results from another query.

Example: Find bridges that have at least one sensor installed:

SELECT Name
FROM Bridge
WHERE Bridge_ID IN (SELECT Bridge_ID FROM Sensor);

The inner query SELECT Bridge_ID FROM Sensor returns all bridge IDs that appear in the Sensor table. The outer query then selects bridges whose ID is in that list.

Example: Find the oldest bridge:

SELECT Name, Year_Built
FROM Bridge
WHERE Year_Built = (SELECT MIN(Year_Built) FROM Bridge);

The subquery finds the minimum year, then the outer query selects bridges built in that year.

7.2.8.2 Subqueries in FROM Clauses

You can use a subquery in the FROM clause, treating its result as a temporary table (called a derived table).

Example: Find bridges with above-average length:

SELECT B.Name, B.Length
FROM Bridge B, (SELECT AVG(Length) AS AvgLen FROM Bridge) AS Avg
WHERE B.Length > Avg.AvgLen;

The subquery computes the average length, then the outer query compares each bridge’s length to that average.

Alternative using WHERE subquery:

SELECT Name, Length
FROM Bridge
WHERE Length > (SELECT AVG(Length) FROM Bridge);

Both approaches work; choose based on readability and performance.

7.2.8.3 Subquery Operators

SQL provides special operators for working with subqueries:

IN: Check if a value appears in the subquery results

SELECT Name
FROM Bridge
WHERE Bridge_ID IN (SELECT Bridge_ID FROM Sensor WHERE Type = 'accelerometer');

Find bridges that have accelerometer sensors.

EXISTS: Check if the subquery returns any rows

SELECT Name
FROM Bridge B
WHERE EXISTS (SELECT * FROM Sensor S WHERE S.Bridge_ID = B.Bridge_ID);

Find bridges that have at least one sensor. EXISTS is often more efficient than IN for large datasets.

NOT IN / NOT EXISTS: Negate the above

SELECT Name
FROM Bridge
WHERE Bridge_ID NOT IN (SELECT Bridge_ID FROM Sensor);

Find bridges without any sensors.

7.2.8.4 When to Use Joins vs. Subqueries

Many queries can be written with either joins or subqueries. Guidelines:

  • Use joins when you need columns from multiple tables in the result
  • Use subqueries when you only need data from one table but need to filter based on another
  • Use EXISTS for checking existence (often faster than IN for large datasets)

Example with JOIN:

-- Using join: need columns from both tables
SELECT Bridge.Name, Sensor.Type
FROM Bridge
INNER JOIN Sensor ON Bridge.Bridge_ID = Sensor.Bridge_ID;

Example with subquery:

-- Using subquery: only need Bridge data
SELECT Name
FROM Bridge
WHERE Bridge_ID IN (SELECT Bridge_ID FROM Sensor);

7.2.9 Data Definition Language (DDL)

DDL statements define the structure of databases. They create, modify, and delete tables and other database objects.

7.2.9.1 CREATE TABLE

The CREATE TABLE statement defines a new table with its columns, data types, and constraints.

Basic syntax:

CREATE TABLE table_name (
    column1 datatype constraints,
    column2 datatype constraints,
    ...
    table_constraints
);

Example: Create the Bridge table:

CREATE TABLE Bridge (
    Bridge_ID VARCHAR(10) PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Year_Built INTEGER,
    Length REAL CHECK (Length > 0)
);

7.2.9.2 Common Data Types

SQL provides various data types for different kinds of data:

Numeric types:

  • INTEGER or INT: Whole numbers
  • REAL or FLOAT: Floating-point numbers
  • DECIMAL(p,s) or NUMERIC(p,s): Fixed-precision decimals (p = precision, s = scale)

String types:

  • VARCHAR(n): Variable-length string, max n characters
  • CHAR(n): Fixed-length string, exactly n characters
  • TEXT: Variable-length string, no specified maximum

Date/Time types:

  • DATE: Date (year, month, day)
  • TIME: Time (hour, minute, second)
  • TIMESTAMP or DATETIME: Date and time combined

Boolean:

  • BOOLEAN: True/false values (not supported in all systems)

Note: Data type names and availability vary by database system. The types listed here are common across most systems.

7.2.9.3 Constraints

Constraints enforce rules on data to maintain integrity.

PRIMARY KEY: Uniquely identifies each row

Bridge_ID VARCHAR(10) PRIMARY KEY

FOREIGN KEY: References a primary key in another table

CREATE TABLE Sensor (
    Sensor_ID VARCHAR(10) PRIMARY KEY,
    Bridge_ID VARCHAR(10),
    FOREIGN KEY (Bridge_ID) REFERENCES Bridge(Bridge_ID)
);

NOT NULL: Column cannot contain NULL values

Name VARCHAR(100) NOT NULL

UNIQUE: All values in column must be distinct

Email VARCHAR(100) UNIQUE

CHECK: Enforces a condition on column values

Year_Built INTEGER CHECK (Year_Built > 1800 AND Year_Built <= 2100)

DEFAULT: Provides default value when none specified

Install_Date DATE DEFAULT CURRENT_DATE

Example: Create Sensor table with multiple constraints:

CREATE TABLE Sensor (
    Sensor_ID VARCHAR(10) PRIMARY KEY,
    Type VARCHAR(50) NOT NULL,
    Bridge_ID VARCHAR(10) NOT NULL,
    Install_Date DATE,
    FOREIGN KEY (Bridge_ID) REFERENCES Bridge(Bridge_ID)
);

7.2.9.4 ALTER TABLE

The ALTER TABLE statement modifies an existing table structure.

Add a column:

ALTER TABLE Bridge
ADD COLUMN Material VARCHAR(50);

Drop a column:

ALTER TABLE Bridge
DROP COLUMN Material;

Modify a column (syntax varies by database system):

-- MySQL syntax
ALTER TABLE Bridge
MODIFY COLUMN Name VARCHAR(200);

Add a constraint:

ALTER TABLE Sensor
ADD CONSTRAINT check_type CHECK (Type IN ('accelerometer', 'strain_gauge', 'temperature'));

7.2.9.5 DROP TABLE

The DROP TABLE statement deletes a table and all its data permanently.

DROP TABLE table_name;

Example:

DROP TABLE Sensor;

Warning: This operation is irreversible. The table and all its data are permanently deleted.

Note: You cannot drop a table if other tables have foreign keys referencing it (unless you use CASCADE in some database systems).

7.2.9.6 Complete Example: Bridge Monitoring Database

Here’s how to create the bridge monitoring database schema:

CREATE TABLE Bridge (
    Bridge_ID VARCHAR(10) PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Year_Built INTEGER CHECK (Year_Built > 1800),
    Length REAL CHECK (Length > 0)
);

CREATE TABLE Sensor (
    Sensor_ID VARCHAR(10) PRIMARY KEY,
    Type VARCHAR(50) NOT NULL,
    Bridge_ID VARCHAR(10) NOT NULL,
    Install_Date DATE,
    FOREIGN KEY (Bridge_ID) REFERENCES Bridge(Bridge_ID)
);

CREATE TABLE Reading (
    Sensor_ID VARCHAR(10),
    Sequence_Number INTEGER,
    Timestamp TIMESTAMP NOT NULL,
    Value REAL,
    PRIMARY KEY (Sensor_ID, Sequence_Number),
    FOREIGN KEY (Sensor_ID) REFERENCES Sensor(Sensor_ID)
);

This creates three tables with appropriate primary keys, foreign keys, and constraints.

7.2.10 Data Manipulation Language (DML)

DML statements modify the data stored in tables. They insert new records, update existing records, and delete records.

7.2.10.1 INSERT Statement

The INSERT statement adds new rows to a table.

Insert a single row (specify columns explicitly):

INSERT INTO table_name (column1, column2, column3)
VALUES (value1, value2, value3);

Example: Add a bridge:

INSERT INTO Bridge (Bridge_ID, Name, Year_Built, Length)
VALUES ('B001', 'Roberto Clemente Bridge', 1928, 244);

Insert without specifying columns (values must match all columns in order):

INSERT INTO Bridge
VALUES ('B002', 'Fort Pitt Bridge', 1959, 1201);

Insert multiple rows at once:

INSERT INTO Sensor (Sensor_ID, Type, Bridge_ID, Install_Date)
VALUES
    ('S001', 'accelerometer', 'B001', '2020-01-15'),
    ('S002', 'strain_gauge', 'B001', '2020-01-15'),
    ('S003', 'temperature', 'B002', '2019-12-10');

Insert from a SELECT query:

INSERT INTO Archive_Bridge
SELECT * FROM Bridge
WHERE Year_Built < 1900;

This copies all bridges built before 1900 into an archive table.

7.2.10.2 UPDATE Statement

The UPDATE statement modifies existing rows.

Basic syntax:

UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

Example: Update a bridge’s length:

UPDATE Bridge
SET Length = 245
WHERE Bridge_ID = 'B001';

Update multiple columns:

UPDATE Bridge
SET Name = 'Clemente Bridge', Length = 245
WHERE Bridge_ID = 'B001';

Update based on a condition:

UPDATE Sensor
SET Install_Date = '2020-01-01'
WHERE Install_Date IS NULL;

Warning: If you omit the WHERE clause, all rows will be updated:

-- DANGER: This updates every bridge
UPDATE Bridge
SET Length = 0;

Always include a WHERE clause unless you intend to update all rows.

7.2.10.3 DELETE Statement

The DELETE statement removes rows from a table.

Basic syntax:

DELETE FROM table_name
WHERE condition;

Example: Delete a specific sensor:

DELETE FROM Sensor
WHERE Sensor_ID = 'S003';

Delete based on a condition:

DELETE FROM Reading
WHERE Timestamp < '2020-01-01';

Warning: If you omit the WHERE clause, all rows will be deleted:

-- DANGER: This deletes all sensors
DELETE FROM Sensor;

The table structure remains, but all data is removed.

Note: You cannot delete rows if other tables have foreign key constraints referencing them (unless CASCADE DELETE is enabled).

7.2.10.4 Transactions (Brief Introduction)

A transaction is a sequence of SQL operations that execute as a single unit. Either all operations succeed, or all are rolled back.

Basic transaction commands:

  • BEGIN or START TRANSACTION: Start a transaction
  • COMMIT: Save all changes permanently
  • ROLLBACK: Undo all changes since BEGIN

Example:

BEGIN;

INSERT INTO Bridge (Bridge_ID, Name, Year_Built, Length)
VALUES ('B005', 'New Bridge', 2024, 500);

UPDATE Sensor
SET Bridge_ID = 'B005'
WHERE Sensor_ID = 'S010';

COMMIT;

If any statement fails, you can ROLLBACK to undo all changes:

BEGIN;

DELETE FROM Bridge WHERE Bridge_ID = 'B001';
-- Oops, wrong bridge!

ROLLBACK;  -- Undoes the deletion

Transactions ensure data consistency, especially when multiple related operations must succeed together.

7.2.11 Setting Up and Using a Local DBMS

To practice SQL, you need a database management system installed on your computer. This section covers installation for MySQL and SQLite on macOS.

7.2.11.2 Option 2: MySQL (Industry Standard)

MySQL is a full-featured, client-server database system widely used in industry.

Installation on macOS:

  1. Install via Homebrew (recommended):

    brew install mysql
  2. Start MySQL server:

    brew services start mysql
  3. Secure the installation (optional but recommended):

    mysql_secure_installation
  4. Connect to MySQL:

    mysql -u root -p

    (Enter password when prompted, or just press Enter if no password set)

Installing MySQL Workbench (GUI client):

  1. Download MySQL Workbench from mysql.com/downloads
  2. Install the .dmg file
  3. Launch MySQL Workbench
  4. Create a connection to localhost (usually connects automatically)

Using MySQL Command Line:

Once connected to MySQL:

mysql> CREATE DATABASE mydb;
mysql> USE mydb;
mysql> CREATE TABLE Test (id INT, name VARCHAR(50));
mysql> INSERT INTO Test VALUES (1, 'Hello');
mysql> SELECT * FROM Test;

Useful MySQL commands:

  • SHOW DATABASES; — List all databases
  • USE database_name; — Switch to a database
  • SHOW TABLES; — List tables in current database
  • DESCRIBE tablename; — Show table structure
  • EXIT; — Close connection

7.2.11.3 Choosing Between SQLite and MySQL

Use SQLite if:

  • You want the simplest setup
  • You’re working on a single computer
  • File-based databases suit your needs

Use MySQL if:

  • You want industry-standard experience
  • You prefer a GUI tool (MySQL Workbench)
  • You plan to learn database administration

Both are excellent choices for this course. The SQL syntax is nearly identical for the topics we cover.

7.2.11.4 Next Steps

Once you have a DBMS installed, you’re ready to create databases and practice SQL. Your instructor will provide scripts to set up example databases for exercises and assignments.

7.2.12 SQL vs Relational Algebra

This section provides a side-by-side comparison of relational algebra expressions (from Module 5) and their SQL equivalents. Understanding these correspondences helps you translate between the mathematical foundation and practical implementation.

7.2.12.1 Basic Operations

Operation Relational Algebra SQL
Select all \(\text{Bridge}\) SELECT * FROM Bridge;
Project columns \(\pi_{\text{Name, Year\_Built}}(\text{Bridge})\) SELECT Name, Year_Built FROM Bridge;
Filter rows \(\sigma_{\text{Year\_Built} < 1950}(\text{Bridge})\) SELECT * FROM Bridge WHERE Year_Built < 1950;
Combined \(\pi_{\text{Name}}(\sigma_{\text{Length} > 500}(\text{Bridge}))\) SELECT Name FROM Bridge WHERE Length > 500;

7.2.12.2 Set Operations

Operation Relational Algebra SQL
Union \(R \cup S\) SELECT * FROM R UNION SELECT * FROM S;
Intersection \(R \cap S\) SELECT * FROM R INTERSECT SELECT * FROM S;*
Difference \(R - S\) SELECT * FROM R EXCEPT SELECT * FROM S;*

*Note: SQLite doesn’t support INTERSECT or EXCEPT. Use IN/NOT IN with subqueries as workarounds.

7.2.12.3 Joins

Operation Relational Algebra SQL
Cross product \(\text{Bridge} \times \text{Sensor}\) SELECT * FROM Bridge CROSS JOIN Sensor;
Theta join \(\text{Bridge} \bowtie_{\text{Bridge.Bridge\_ID} = \text{Sensor.Bridge\_ID}} \text{Sensor}\) SELECT * FROM Bridge JOIN Sensor ON Bridge.Bridge_ID = Sensor.Bridge_ID;
Natural join \(\text{Bridge} \bowtie \text{Sensor}\) SELECT * FROM Bridge NATURAL JOIN Sensor;

7.2.12.4 Complex Queries

Example 1: Find names of bridges with accelerometer sensors

Relational Algebra: \[\pi_{\text{Name}}(\text{Bridge} \bowtie (\sigma_{\text{Type} = \text{'accelerometer'}}(\text{Sensor})))\]

SQL:

SELECT Bridge.Name
FROM Bridge
JOIN Sensor ON Bridge.Bridge_ID = Sensor.Bridge_ID
WHERE Sensor.Type = 'accelerometer';

Example 2: Find bridges without any sensors

Relational Algebra: \[\pi_{\text{Bridge\_ID}}(\text{Bridge}) - \pi_{\text{Bridge\_ID}}(\text{Sensor})\]

SQL:

SELECT Bridge_ID
FROM Bridge
WHERE Bridge_ID NOT IN (SELECT Bridge_ID FROM Sensor);

7.2.12.5 Key Differences

Beyond syntax, SQL and relational algebra differ in important ways:

Aspect Relational Algebra SQL
Duplicates Relations are sets (no duplicates) Tables allow duplicates (use DISTINCT to eliminate)
NULL values Not part of theory NULL represents missing/unknown data
Ordering Relations are unordered sets ORDER BY controls output order
Aggregation Not in pure relational algebra COUNT, SUM, AVG, etc.
Grouping Not in pure relational algebra GROUP BY clause

7.2.12.6 SQL Extensions Beyond Relational Algebra

SQL provides features that extend beyond pure relational algebra:

  • Aggregate functions: COUNT, SUM, AVG, MIN, MAX
  • Grouping: GROUP BY and HAVING
  • Sorting: ORDER BY
  • NULL handling: IS NULL, IS NOT NULL
  • String operations: LIKE for pattern matching
  • Date/time operations: Date arithmetic and formatting
  • Subqueries: Nested SELECT statements
  • Data modification: INSERT, UPDATE, DELETE
  • Schema definition: CREATE, ALTER, DROP

These extensions make SQL more practical and powerful than pure relational algebra, while maintaining the same fundamental concepts.

7.2.12.7 Translation Practice

When translating between relational algebra and SQL:

  1. Identify the operations: What relational algebra operators are used?
  2. Map to SQL clauses:
    • \(\sigma\) → WHERE
    • \(\pi\) → SELECT columns
    • \(\bowtie\) → JOIN
    • \(\cup, -, \cap\) → UNION, EXCEPT, INTERSECT
  3. Apply SQL clause order: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY
  4. Handle SQL-specific features: Consider duplicates, NULLs, ordering

With practice, translating between these notations becomes second nature, allowing you to think mathematically about queries while writing practical SQL code.