Lab 5: SQL

Overview

In this lab, you will be working with PostgreSQL to create a database for some sample data.

To receive credit for this lab, you MUST show your work to the TA during the lab, and push to the github by the deadline. Please note that submissions will be due right before your respective lab sessions in the following week. For Example, If your lab is on this Friday 10 AM, the submission deadline will be next Friday 10 AM. There is no late policy for labs.

Learning Objectives

LO1. Learn to read and construct ER diagrams
L02. Understand the basics of Data Definition Language (DDL)
L03. Use SQL to query and modify tables
L04. Learn to use more advanced SQL features like CTEs, Group By, and Joins

Part A

Pre-Lab Quiz

Complete Pre-Lab quiz on Canvas before your section's lab time.

Part B

ER Diagram

What is an ER diagram?

An entity-relationship (ER) diagram is a blueprint that shows how "entities" such as people, things, and concepts are related to each other in a system. ER diagrams are most commonly used to design or debug relational databases in software development, business informatics, education, and research. Also known as ERD or ER model, it uses a predefined set of symbols such as rectangles, diamonds, ellipses, and connecting lines to represent the interconnectivity of entities, relationships, and their attributes.

Below is the ER Diagram for the database that we will be using for this lab. Please refer to it while writing queries in the later portion of the lab.

🎥Recording output

Nothing to be recorded here.

Directory Structure

Clone your GitHub repository

info

You need to accept the invite to the GitHub classroom assignment to get a repository.

Github Classroom Assignment

For the next two steps, make sure to edit the name of the repository after the copying the command into your terminal.

Using an SSH key
Using a Personal Access Token (PAT)

git clone git@github.com:CU-CSCI3308-Fall2024/lab-5-sql-<YOUR_USER_NAME>.git

git clone https://github.com/CU-CSCI3308-Fall2024/lab-5-sql-<YOUR_USER_NAME>.git

Navigate to the repository on your system.

cd lab-5-sql-<YOUR_USER_NAME>

The website directory structure contains the SQL files for this lab. The list below contains all the files that are provided and the ones you will submit as well. Please make sure to follow the guidelines specified here.

├── data
│   ├── actors.sql
│   ├── genres.sql
│   ├── movies.sql
│   ├── movies_to_actors.sql
│   ├── movies_to_genres.sql
│   └── platforms.sql
├── entrypoint.d
│   ├── 00_create.sql
│   ├── 01_alter.sql
│   ├── 02_insert.sql
│   ├── 03_verify.sql
│   └── 04_insert.sql
├── queries
│   ├── query_01.sql
│   ├── query_02.sql
│   ├── query_03.sql
│   ├── query_04.sql
│   ├── query_05.sql
│   ├── query_06.sql
│   ├── query_07.sql
│   ├── query_08.sql
│   ├── query_09.sql
│   ├── query_10.sql
│   └── testdata
│       ├── ...
├── docker-compose.yaml
├── README.md
└── test.sh

Initializing Docker Container

In your repository, you will find a docker-compose.yaml file with the following contents:

docker-compose.yaml
version: '3.9'
services:
  db:
    image: 'postgres:latest'
    env_file: .env
    expose:
      - '5432'
    volumes:
      - postgres-data:/var/lib/postgresql/data
      - ./data:/data:ro
      - ./entrypoint.d/:/docker-entrypoint-initdb.d:ro

volumes:
  postgres-data: {}

Create a .env file in the root (lab5-sql-\<your_username>) folder to store environment variables to setup postgres.

.env
# database credentials
POSTGRES_USER="postgres"
POSTGRES_PASSWORD="pwd"
POSTGRES_DB="movies_db"

You should not commit this file to GitHub (it is in the .gitignore file, so this is automatic.

Start the PostgreSQL docker container with the docker-compose.yaml file:


docker-compose up -d

Now that the container is running in the background, connect to PostgreSQL shell:

docker-compose exec db psql -U postgres

PostgreSQL

Create a movies database

create database movies_db;

Connect to the movies database


\c movies_db;

Here is a list of some useful terminal commands for PostgreSQL that you will need for the later tasks in the lab.

To list all of the databases in the current PostgreSQL database server, use the \l command

\l

To list all of the tables in the current database, use the \dt command

\dt

To describe a table such as the column, type, modifiers of columns, etc., use \d followed by the table name like so


\d table_name

To view all of the data associated with a table, select all of the columns from a table like this:

SELECT * FROM table_name;

To display the command history, use the \s command

\s

Comments

Single line comments start with --, while multi-line comments start with /* and end with */. Any text between /* and */ will be ignored while executing the .sql files. Use multi-line comments to include formatted query outputs in sql files.

Example:

-- Select all the columns of all the records in the Customers table:

SELECT * FROM customers;
/*
| customer_id | customer_name |
|    123      |    David      |
*/

A. Data Definition Language (DDL)

A.1. CREATE table

Creating the tables

To actually create a table we must start with the "CREATE TABLE" command. Follow this with the name of the table you want to create. To make sure the script is re-runnabe you should use "IF NOT EXISTS" [CREATE TABLE IF NOT EXISTS table_name] to your script to ensure postgreSQL doesn't try to recreate a table (handy when working with long/reused scripts).

PLEASE NOTE: While the commands are demonstrated here in uppercase, postgreSQL is NOT case sensitive. It is convention to list the postgreSQL commands in upper case.

Syntax:

CREATE TABLE table_name (
  column_name1 DATATYPE,
  column_name2 DATATYPE
);

Example:

Following is an example of the CREATE TABLE command.

caution

DO NOT MAKE THIS TABLE. Use this as a guide to create the necessary tables.

/* Create a table names products with the mentioned columns
having the datatypes mentioned */
CREATE TABLE products (
  product_id SERIAL PRIMARY KEY /* the primary key for each entry */,
  product_name VARCHAR(200) NOT NULL,
  quantity_per_unit VARCHAR(20),
  unit_price FLOAT,
  units_in_stock SMALLINT,
  units_on_order SMALLINT,
  reorder_level SMALLINT,
  discount DECIMAL,
  category_id SMALLINT
);

caution

Just about every postgreSQL command ends with a semicolon. If you don't add one, then the terminal will simply wait until you enter it. MAKE SURE TO ADD YOUR SEMICOLONS!

The table will look something like this with no data inserted.

product_id	product_name	quantity_per_unit	unit_price	units_in_stock	units_on_order	reorder_level	discount	category_id

You may have noticed certain keywords in the query such as SERIAL, PRIMARY KEY and NOT NULL. Here are some definitions for your reference.

SERIAL

The SERIAL command creates an auto-increment that is managed by the database. Each time a new entry is added to the table, the SERIAL column is set and the value is incremented by 1

PRIMARY KEY

The PRIMARY KEY defines a unique identifier for each row of the table. THIS MUST BE A UNIQUE VALUE! This is how the database differentiates the various rows/entries of the table.

NOT NULL

The NOT NULL attribute requires that a field be set when a new entry is added to the table. This is a design choice, so only set a field to NOT NULL, when you really want to ensure that data is entered. For example, the slogan field does not have "NOT NULL" listed. Meaning that you can add a new item to the store table and skip the slogan field, but must provide a name, qty, and price.

Data Types

You can find the official documentation that lists all of PostgreSQL's supported data types here

VARCHAR(#)

A variable-length character string, where # defines the maximum number of characters. This is a go-to option for defining strings that need to be set a maximum length. As a note, make sure to use single quotes ' ' when working with VARCHAR data.

TEXT

The TEXT type is fairly similiar to the VARCHAR(#) data type, but removes the character limit.

Whole Numbers

Whole numbers, or integers, can come in various sizes (smallint, int, and bigint). You should choose which one to use based on the specific data you hope to store.

Name	Number of Bytes	Range	Recommended Use
smallint	2	-32,768 to 32,767	Only use when you KNOW your values won't exceed its range (ex. the height of a person in inches or the age of a person in years)
int or integer	4	-2,147,483,648 to 2,147,483,647	Considered the "work-horse" data type and most often used for giving you a good balance between range and data storage.
bigint	8	-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807	Only use when you KNOW you are dealing with very large values (ex. number of transactions Amazon handles in a year).

Decimal Numbers

Decimal Numbers come in three different option - real, double precision, and numeric(#, #). Like the whole numbers above, the type you use depends on what kind of information you hope to store.

Name	Precision	Recommended Use
real	6 Decimal Places	Having up to 6 Decimal Places, means that real will work for "most cases" when dealing with decimal numbers.
double precision	15 Decimal Places	Use when you're dealing with scientific data or small decimal information.
numeric(precision, scale)	precision defines the total number of digits in a value scale defines the number of decimal places in a value Example: 12345.67 - this value has a precision of 7 (7 total digits) and a score of 2 (2 decimal places)	Use this when you are dealing with currencies or other data that requires a maximum number of digits. According to the documentation: Numeric values are physically stored without any extra leading or trailing zeroes. Thus, the declared precision and scale of a column are maximums, not fixed allocations.

DATE

Date values, in postgreSQL, are in the yyyy-mm-dd format - '2019-11-25'. To work with dates see the list of handy date functions below:

Function Name & Example	Function Description
to_date(date_text, date_format)	The to_date() function will convert a string literal into a date value, using the provided date formate. Further details here
to_date('20191125', 'yyyymmdd')
NOW():: DATE	To retrieve the current date on the server you can use the NOW() function, with the :: DATE option added to it we will only get the current date (removes the time information). The other option is to retrieve the CURRENT_DATE which will is in the default format of 'yyyy-mm-dd'.
SELECT NOW()::DATE
CURRENT_DATE
SELECT CURRENT_DATE;
Further Examples	If you want to learn more about working with dates, check out this resource here

To Do - Create tables in movies DB

For this activity, you will be using the file entrypoint.d/00_create.sql.

tip

It would be easier for you to write your CREATE queries in this file before executing them on the terminal.

You need to choose suitable datatypes for the columns and identify and assign primary keys in the tables. You may refer to the ER diagram provided at the top of this page for this activity.

caution

Make sure you are connected to the movies database.

tip

Please refer the ER Diagram above to understand the datatypes for the necessary variables.

Create movies table with the following columns:
id
name
duration
year_of_release
gross_revenue
country
language
imdb_rating
platform_id
Create actors table with the following columns
id
name
agency
active_since
location
Create platforms table with the following columns
id
name
viewership_cost
Create genres table with the following columns
id
name
Create movies_to_actors table with the following columns
movie_id
actor_id
Create movies_to_genres table with the following columns
movie_id
genre_id

info

Multi-line Comments: Multi-line comments start with /* and end with */. Any text between /* and */ will be ignored while executing the .sql files.

Example:

/* Select all the columns
of all the records
in the Customers table: */

SELECT * FROM Customers;
/*
| customer_id | customer_name |
|    123      |    David      |
*/

🎥Recording output

File name: entrypoint.d/00_create.sql This file should include the CREATE queries and the outputs of the commands in this file.
The outputs copied in the entrypoint.d/00_create.sql file should be commented out using Multi-line comments. (/* */) as shown above.

A.2. ALTER table

There could arise scenarios that need you to alter the structure of table. This could mean adding/updating/deleting a column/constraint. In such situations,we make use of the ALTER TABLE command.

Syntax:

ALTER TABLE table_name
ADD COLUMN new_column_name data_type constraint;

Example:

/* Alter the table products to change the constraint
of foreign key to category_id column */
ALTER TABLE products
ADD CONSTRAINT category_id FOREIGN KEY (category_id) REFERENCES categories (category_id);

To Do - Alter Table

For this activity, you will be using the file entrypoint.d/03_alter.sql.

tip

It would be easier for you to write your ALTER queries in this file before executing them on the terminal.

Make the column platform_id a FOREIGN KEY in the movies table. You need to determine which column from which table this column would reference.

🎥Recording output

File name: entrypoint.d/01_alter.sql. This file should include the ALTER query and their outputs.
The outputs copied in the entrypoint.d/01_alter.sql file should be commented out using Multi-line comments. (/* */) as shown above.

A.3. DROP table

In case you want to get remove a table or in other words delete a table - DROP TABLE is the command to use.

caution

Please be very careful when using the DROP command because it will erase the table, the data within the table and all the metadata associated with that table.

Syntax:

DROP TABLE IF EXISTS table_name;

Example:

DROP TABLE IF EXISTS products;

Cascade is used to remove the specified records and foreign keys referencing them in other tables.

🎥Recording output

Nothing to record and output here.

B. Data Manipulation Language (DML)

B.1. INSERT INTO table

Insert Provided Data

You will now insert rows from the provided sql files under data in your repository We mounted the data folder in your repository to /data in the container. Replace <filename> with the actual file name for each table.

\i /data/<filename>.sql;

🎥Recording output

Add each of the 6 commands to entrypoint.d/02_insert.sql.

Verify that the data was correctly inserted. Replace <table_name> with each table.

select count(*) from <table_name>;

Tables and expected number of rows in each table

Table Name	No. of Rows
Movies	26
Actors	20
Platforms	7
Genres	8
movies_to_actors	61
movies_to_genres	26

info

Multi-line Comments: Multi-line comments start with /* and end with */. Any text between /* and */ will be ignored while executing the .sql files.

Example:

/* Select all the columns
of all the records
in the Customers table: */

SELECT * FROM Customers;
/*
| customer_id | customer_name |
|    123      |    David      |
*/

🎥Recording output

1. Copy the verification queries and their outputs to show the number of rows in each table to a file. You must use /* */ to comment out the outputs of the verification queries.

File name: entrypoint.d/03_verify.sql

Now let's practice inserting some new rows of our own

The INSERT query will add records to the table.

Syntax:

INSERT INTO table_name (column_name1, column_name2, column_name3,...column_nameN)
  VALUES (value1, value2, value3,...valueN), /* First row  */
  (value1, value2, value3,...valueN);        /* Second row */

Example:

/* Insert the values into corresponding columns in the table products */
INSERT INTO products (product_name, quantity_per_unit,  unit_price, units_in_stock, units_on_order, reorder_level, discount, category_id) VALUES
  ('Unison Composition Book', '3',  10.4, 100, 50, 2, no, 1);

Insert with return inserted row

The "returning *" clause at the end of the query will return all columns for the newly inserted record.

INSERT INTO products (product_name, quantity_per_unit,  unit_price, units_in_stock, units_on_order, reorder_level, discount, category_id) VALUES
  ('Unison Composition Book', '3',  10.4, 100, 50, 2, no, 1) returning * ;

To Do - You should insert atleast 2 rows in each table. (The data should satisfy the constraints on the tables)

For this activity, you will be using the file entrypoint.d/04_insert.sql.

tip

It would be easier for you to write your INSERT queries in this file before executing them on the terminal.

info

Multi-line Comments: Multi-line comments start with /* and end with */. Any text between /* and */ will be ignored while executing the .sql files.

Example:

/* Select all the columns
of all the records
in the Customers table: */

SELECT * FROM Customers;
/*
| customer_id | customer_name |
|    123      |    David      |
*/

🎥Recording output

File name: entrypoint.d/04_insert.sql This file should contain all INSERT queries and their outputs. The outputs copied in the entrypoint.d/04_insert.sql file should be commented out using Multi-line comments. (/* */) as shown above.

B.2. UPDATE table data

More often than not, there can be a need to update the data in a table. For that purpose, we can use UPDATE table command. You need to make sure to specify a condition, using the WHERE clause when you update rows in a table otherwise all the rows in the table will be updated. We'll learn more about the WHERE clause in the querying section of this lab.

Syntax:

UPDATE table_name
SET column1 = value1,
    column2 = value2,
    ...
WHERE condition;

🎥Recording output

Nothing to record here

B.3. DELETE table data

In some cases you may need to remove a row from the table. DELETE is the command to use. Similar to the UPDATE command, you need to make sure to specify a condition, using the WHERE clause when you update rows in a table otherwise you may end of deleting all the rows in the table. We'll learn more about the WHERE clause in the querying section of this lab.

Syntax:

DELETE FROM table_name
WHERE condition;

🎥Recording output

Nothing to record here

C. Data Query Language (DQL)

C.1. Basic Queries

In this section we will learn some important commonly used queries in the premise of products database seen previously.

SELECT Statement

Used to select and display rows of data from table.

Syntax:

SELECT column_name FROM table_name WHERE condition_is_true;

Select Everything:

The asterisk (*) or "star" acts as a wild card in our select statement. When we don't want to limit the number of fields/columned returned we can use * to select all of them.

Example:

/* SELECT all the rows from table products */
SELECT * FROM products;

The FROM statement defines where we want to retrieve our data (this is usually a table name, but could also be a view or even a subquery).

Limit Our Columns:

If we want to choose and display only selected columns instead of all columns, we can do so by the following way.

Example:

/* Only show the product_name and quantity_per_unit columns from table products. */
SELECT product_name, quantity_per_unit FROM products;

In the above example, product_name and quantity_per_unit are the names of columns that we are choosing to display out of the many columns that might be present in the table products. This is great when we have large tables with a large number of rows or when we want to display columns side-by-side for improved readability.

Limit Our Rows:

If we want to select only some rows based on one or more conditions, we can use where clause. This clause helps us limit the number or rows returned. This is a true/false boolean statement that the database evaluates for each row. If the row evaluates to true, its information is returned. Otherwise the row is skipped over/ignored.

Example:

/* Show the rows in columns product_name, quantity_per_unit in table
products where unit_price is less than 1 dollar*/
SELECT product_name, quantity_per_unit FROM products WHERE unit_price < 1;

Aliases

Basic Syntax

SELECT column_name AS alias_name FROM table_name as table_alias_name

Rename A Column

 SELECT item_name, qty AS quantity FROM store;

AS alias_name: Sometime our columns have short or abbreviated names (like qty). To help improve readability when we run our queries, we can use an alias to temporarily rename our column. In this example we are altering qty to the complete word quantity. But what if we wanted to work with multiple words? Then we can also place our alias in double quotes: qty as "Quantity of Items in Stock".

Rename A Table

 SELECT item_name, qty FROM store as retail_establishment;

AS table_alias_name: Aliases can also be applied to tables, but we don't usually do it for making our readouts more readable. Instead these are uses for helping us handle subqueries and joins. We'll cover these use cases in the next section.

Combining Fields

 SELECT item_name || ' : ' || slogan AS catch_phrase FROM store WHERE slogan IS NOT NULL;

item_name || ' : ' || slogan: The last option we'll discuss for using aliases is to help us rename concatenated or calculated fields. We use the double pipe || for string concatenation in SQL. So this statement states that we will be concatenating the item_name column and slogan columns, with a colon in between. The where clause was added to remove the rows that lack a slogan.

COUNT

The COUNT() function will return the number of rows that match a query.

COUNT(*)

Our first example is a quick way to check how many rows are in a table (a great way to see if an insert statement or query worked correctly!). The second example will incorporate a where clause, letting us now how many rows meet a specific condition.

Example:

/*count the number of rows in the products table */
SELECT COUNT(*) FROM products;
/* count the number of products that cost more than 1 dollar */
SELECT COUNT(*) FROM products WHERE unit_price > 1;

COUNT(product_id)

If a column name is listed inside of the COUNT() function, then rows with null fields will be ignored. In our now updated examples, we list the slogan column (which can be null). This means all of rows without a product_id are completely ignored by the COUNT() function.

Example:

/*count the number of products in the products table which have a product_id*/
SELECT COUNT(product_id) FROM products;
/* Count the number of products that cost more than 1 dollars */
SELECT COUNT(product_id) FROM products WHERE unit_price > 1;

COUNT(DISTINCT prices)

By adding the DISTINCT keyword, we can limit our search to ignore duplicates. In this example, Gold Fish & Pringles share the same price, so collectively they only count as 1 entry! Use DISTINCT to help you determine how varied your table data is. Ignore Null & Duplicates

Example:

/*Count the number of rows with product_id which have unique values */
SELECT COUNT(DISTINCT product_id) FROM products;
/* Count the number of rows in the products table which have unique non-null prices that are above 1 dollar. */
SELECT COUNT(DISTINCT product_id) FROM products WHERE unit_price > 1;

SUM

The SUM() function will add up all of the numeric values in a column and return their combined sum. Just like COUNT(), we can apply a WHERE clause to limit which rows we add together. As a note, this will ignore null entries.

All Rows

Example:

/*Sum up the quantity_per_unit column*/
SELECT SUM(quantity_per_unit) FROM products;

Subset of Rows

Example:

/* Sum up the total quantity_per_unit for items that cost more than 1 dollar*/
SELECT SUM(quantity_per_unit) FROM products WHERE price > 1;

Math Operations

A complete list of operations can be found in the official documentation here Use the table listed below for our Math Operations Examples

Addition

SELECT product_name, (units_in_stock + units_on_order) as "Total future stock" from products;

(units_in_stock + units_on_order): Working with the plus sign, we can add the values of two columns together for each table entry. For example, adding the units currently in stock and units ordered from the seller will give us the future total stock.

Subtraction

SELECT product_name, (units_in_order - reorder_level) as "Units to be sold before reorder" from products;

(units_in_order - reorder_level): Working with the minus sign, we can subtract the values of two columns from one another for each table entry.

Division

SELECT product_name, (unit_price / quantity_per_unit) as "Unit Price by quantity per unit" FROM products;

CAST (...): Our math operations will try to perserve the data type of the columns being used. This means if our operation works with whole numbers, then our answer will be a whole number. To work around this, we can use CAST() to treat one of our values as a float, which will result in a decimal number for our answer!

Multiplication

SELECT product_name, (units_in_stock* quantity_per_unit) as "Total amount in stock" FROM products;

((...)* 100): Like many other programming langauges, we can use the asterisk to handle multiplication. In this case we are multiplying with a constant 100.

GROUP BY

Basic Syntax

/*Use SUM to aggregate over column 2 */
SELECT column_1, SUM(column_2)
FROM products
/* Make groups keyed by column 1 values, that sum over column_2 */
GROUP BY column_1;

GROUP BY column

When we want to summarize sets of rows that share values, the GROUP BY statement is often useful. When coupled with an aggregate function like SUM or COUNT, we can aggregate values by groups. For example, if our products had a category column, we could sum the qty within each category, like fruit. For example:

Example:

/* select category_id and sum of units in stock grouped by category_id from table products */
/* This means that all proucts having same category_id are put in the same group and each group's units_in_stock is summed up */
SELECT category_id, SUM(units_in_stock)
FROM products
GROUP BY category_id;

GROUP BY HAVING column

The HAVING clause is used to filter groups returned by the GROUP BY clause. It is particularly useful when you want to include conditions on aggregated data, such as sums, averages, counts, etc., that cannot be filtered through the WHERE clause. For example:

Example:

/* select category_id and sum of units in stock grouped by category_id from table products */
/* This means that all proucts having same category_id are put in the same group and each group's units_in_stock is summed up */
/* Only include groups where the sum of units_in_stock is greater than 100 */
SELECT category_id, SUM(units_in_stock)
FROM products
GROUP BY category_id
HAVING SUM(units_in_stock) > 100;

C.2. Views, Subqueries and CTEs

Working With Views :

Syntax

CREATE VIEW view_name as your_query_goes_here;

Create View

A VIEW works with a pre-defined query and creates a resulting table. The view's fields can be accessed in the exact same way as a normal table. This is a handy way to aggregate data into more easy to access table format. Also, if you find yourself reusing query over and over, make it into a view for easier access.

Example:

CREATE VIEW inventory_cost AS
  SELECT product_name, (units_in_stock * unit_price) AS "Total Cost"
  FROM products;

Access a View

Example:

SELECT *
FROM inventory_cost /* Retrieve all of the rows from inventory_cost's query */;
SELECT * FROM inventory_cost WHERE "Total Cost" > 12;
/* Restrict the rows returned to only those with an inventory cost greater than 12 dollars  */

SUBQUERIES

A subquery can allow you to run an inner query that feeds its results into an outer query. Often this is utilized in the WHERE clause of a select statement to help define which rows a query should work with.

In our example above, our subquery will retrieve the current price of a Kit Kat. This means we don't have to worry about "hard coding" the price for a Kit Kat. Our outer query will always work, even if the price of a Kit Kat is updated.

Example:

/* Display all of the product items that have a price greater than a Kit Kat */
SELECT *
FROM products
WHERE
  unit_price > (SELECT unit_price FROM products WHERE product_name = 'Kit Kat');

CTE - Common Table Expressions

CTE is a temporary names result-set created using with and select clause. You can use this temporary result to further select, insert, update or delete. This is a convenient way to manage complex queries. We give a temporary name to the CTE which acts like a View. In the following syntax, temp_cte is the name of the intermediate view which has columns a,b,c selected from table1. This view is further used to select a and c columns where b's value is greater than 3.

Syntax

WITH
  temp_cte AS (
    SELECT a, b, c
    FROM table1
  )
SELECT a, c
FROM temp_cte
WHERE b > 3;

Building on the previous exmaple of displaying all products that have a price greater than KitKat, let's use CTE to do the same.

Example:

/* Display all of the product items that have a price greater than a Kit Kat */
WITH
  kit_kat_price AS (
    SELECT price
    FROM products
    WHERE name = 'Kit Kat'
  )
SELECT name, price
FROM products
WHERE price > (SELECT price FROM kit_kat_price);

This query creates a temporary result-set named kit_kat_price that returns the price of the product named 'Kit Kat'. Then, in the main query, we select the name and price of all products where the price is greater than the price of 'Kit Kat'.

C.3. Joins

You can find a more detailed explanation of Joins for PostgreSQL here.

Checkout the handy diagram below:

index

Example:

Consider the following tables for performing joins. We will be using the employee table on the left and salary table on the right of the join condition for the purpose of this explanation.

CREATE TABLE employee (
  id SERIAL PRIMARY KEY,
  name VARCHAR(60) NOT NULL,
  age INT NOT NULL,
  phone_number VARCHAR(10) NOT NULL
);

INSERT INTO employee
  (name, age, phone_number)
VALUES
  ('Randolph Esquire', 33, '1112223333'),
  ('Clarence Roosevelt', 12, '4442220000'),
  ('Rachel Null', 44, '1002004000'),
  ('Sarah The Third', 102, '1032034003');

-- --

CREATE TABLE salary (
  employee_id INT PRIMARY KEY,
  starting_salary FLOAT NOT NULL,
  current_salary FLOAT NOT NULL
);

INSERT INTO salary
  (employee_id, starting_salary, current_salary)
VALUES
  (1, 35000, 37222.32),
  (2, 55.33, 78.92),
  (3, 123450.12, 277111.32),
  (5, 100, 323.5);

index

info

Any blank cells that you see in the answer sets, are NULL values.

INNER JOIN

The INNER JOIN will return only the entries that have matching values in a shared column, in this case a shared column between the "employee" & "salary" tables is id <-> employee_id. By default, if you do not specify the type of join, the JOIN keyword in SQL will refer to an inner join.

Example:

Command:

SELECT
  employee.name,
  salary.current_salary
FROM
  employee -- left table
  INNER JOIN salary -- right table
    ON employee.id = salary.employee_id;

Output:

You'll see that "Sarah the Third" is an entry with in the employee table with id = 5. Since there isn't an entry in the salary table with employee_id = 5, that entry is left out.

index

LEFT JOIN

The LEFT JOIN works similarly to the inner join, except ALL rows from the left table are returned. The fields from the right table will either be set to NULL, or to the value from the right table if there is a match. This means you will see null values for some fields from the right column in the cases that there was no match.

Example:

Command:

SELECT
  employee.name,
  salary.current_salary
FROM
  employee -- left table
  LEFT JOIN salary -- right table
    ON employee.id = salary.employee_id;

Output:

index

RIGHT JOIN

The RIGHT JOIN works the same as the left join, but opposite. That is, ALL rows from the right table are returned. The fields from the left table will either be set to NULL, or to the value from the right table if there is a match.

Example:

Command:

SELECT
  employee.name,
  salary.current_salary
FROM
  employee -- left table
  RIGHT JOIN salary -- right table
    ON employee.id = salary.employee_id;

Output:

index

FULL JOIN

Example:

The FULL JOIN will return all the entries from each table that have matching pairs. For any entry that doesn't, it will display null for the field.

Command:

SELECT
  employee.name,
  salary.current_salary
FROM
  employee -- left table
  FULL JOIN salary -- right table
    ON employee.id = salary.employee_id;

Output:

In this example, the entry "5" and "Sarah The Third'' will be paired up with null values.

index

FULL JOIN - (rows that DON'T have matching pairs)

This is a variant of FULL JOIN in which it will only return the entries that DON'T have matching pairs (i.e. unique data entries). This means you will only see the rows that contain a null value on other side. (Entry "5" & "Sarah The Third")

Example:

Command:

SELECT
  employee.name,
  salary.current_salary
FROM
  employee -- left table
  FULL JOIN salary -- right table
    ON employee.id = salary.employee_id
WHERE employee.id IS NULL OR salary.employee_id IS NULL;

Output:

index

C.4. Querying data

For each query in this activity, you will be editing the corresponding file in queries.

danger

We have provided test data, a script to test each of your queries, and the correct results for the current tables. We will use different data to test your submission, so make sure you do not reference ids directly!

Before moving onto the queries, take a moment to comment reset your database such that it matches the version we used to generate the expected outputs. This will allow you to test your queries as you go.

You can do this by commenting out your insert statements in entrypoint.d/04_insert.sql.

Then run the test script to reset your database.

./test.sh --reset
ERROR: query queries/query_01.sql is incorrect
...
FAIL: 0/10 queries are correct
FAIL: encountered error while testing queries

As you progress through this section, rerun this script to check your progress. Feel free to look at the expected output for each query in queries/testdata, but again keep in mind that hard-coding values will not work as the data will not be the same during grading.

Write a query to find the highest rated movie. If there is a tie, return any ONE row with the movie with the highest rating.

Hint: Select all the rows from the required table and sort it in descending order.

Write a query to find ALL the movies starring "Leonardo di Caprio".

Hint: Join the movies & actors tables by using any necessary join operation required to combine both. Filter rows where the actor’s name is "Leonardo di Caprio".

Write a query to find the top 3 oldest movies in the database. Return only THREE rows.

Hint: Select all the columns from the required table and sort it in ascending order based on the appropriate column. Limit the number of rows printed to 3.

Part C

For Part C, we will be practicing writing more queries. You will continue to work with queries.sql as you need to submit that at the end of this activity.

Write a query to get movies available on Hulu ordered by id.

Hint: Use a subquery to get the platform_id for Hulu and use this to select the required movies.

Write a query to find the actor who has acted in the maximum number of movies.

Hint: Use a subquery that groups based on the actors and sorts in descending order of the aggregated movies. Use this subquery to select actors names.

Top 3 highest grossing thriller movies.

Hint: To write this you will need records from 3 tables that are joined - movies_to_genres and movies where the genre_name would be ‘Thriller’. The records would be in the descending order of the gross_revenue, keeping the limit 3.

Update the platform for all 2004 movies to be Netflix.

Hint: Use a subquery to fetch the platform id of Netflix and use that information to update the platform for movies, released in 2004, in the movies table.

Which platform has the most number of movies?

Hint: Write a subquery that groups based on the platform ids and sorted by the count of platforms to filter the row that has most movies.

Which actor’s movies have generated the overall highest revenue?

Hint: Join movies, actors and any tables needed to join the two and calculate the sum of revenue by creating groups over actors.

Write a query to find actors from Missouri that starred in a Romcom and order them by name.

Hint: Create 2 views: the first one to join movies and actors by filtering on 'Missouri', the second one to join movies and genres by filtering on ‘Romcom’. Join these 2 views to get the final query.

Once you are done, make sure to rerun the test script. This is what it should look like.

./test.sh --reset
INFO: query queries/query_01.sql is correct
INFO: query queries/query_02.sql is correct
INFO: query queries/query_03.sql is correct
INFO: query queries/query_04.sql is correct
INFO: query queries/query_05.sql is correct
INFO: query queries/query_06.sql is correct
INFO: query queries/query_07.sql is correct
INFO: query queries/query_08.sql is correct
INFO: query queries/query_09.sql is correct
INFO: query queries/query_10.sql is correct
PASS ./test.sh

🎥Recording output

Files: queries/*.sql These files should each contain 1 query. Make sure to check your queries with ./test.sh.

Any outputs should be commented out. If the script does not pass your query will not be graded.

Submission Guidelines

Commit and upload your changes

Make sure you have all the files added in the directory structure specified. Run the following commands inside your root git directory (in your lab-5-sql-<username> directory).

git add .
git commit -m "Added all files for lab6"
git push

tip

Once you have run the commands given above, please navigate to your GitHub remote repository (on the browser) and check if the changes have been reflected.

You will be graded on the files that were present before the deadline. If the files are missing/not updated, you could receive a grade as low as 0. This will not be replaced as any new pushes to the repository after the deadline will be considered as a late submission and we do not accept late submissions.

Regrade Requests

Please use this link to raise a regrade request if you think you didn't receive a correct grade. If you received a lower than expected grade because of missing/not updated files, please do not submit a regrade request as they will not be considered for reevaluation.

Rubric

	Description	Points
Part A - Lab Quiz	Complete Lab Quiz on Canvas	20
Part B: Create Table	`entrypoint.d/00_create.sql`	15
Part B: Alter Table	`entrypoint.d/01_alter.sql`	5
Part B: Insert into Table	`entrypoint.d/03_verify.sql`, `entrypoint.d/04_insert.sql`	10
Part B and C: Queries	`queries/*.sql`	30
In-class check-in	You showed your work to the TA or CM.	20
		100

Please refer to the lab readme or individual sections for information on file names and content to be included in them.

Overview​

Learning Objectives​

Part A​

Pre-Lab Quiz​

Part B​

ER Diagram​

Directory Structure​

Clone your GitHub repository

Navigate to the repository on your system.

Initializing Docker Container​

PostgreSQL​

Here is a list of some useful terminal commands for PostgreSQL that you will need for the later tasks in the lab.​

Comments​

A. Data Definition Language (DDL)​

A.1. CREATE table​

Example:​

VARCHAR(#)

TEXT

Whole Numbers

Decimal Numbers

DATE

To Do - Create tables in movies DB​

A.2. ALTER table​

To Do - Alter Table​

A.3. DROP table​

B. Data Manipulation Language (DML)​

B.1. INSERT INTO table​

To Do - You should insert atleast 2 rows in each table. (The data should satisfy the constraints on the tables)​

B.2. UPDATE table data​

B.3. DELETE table data​

C. Data Query Language (DQL)​

C.1. Basic Queries​

SELECT Statement​

Select Everything: ​

Example:​

Example:​

Example:​

COUNT​

Example:​

Example:​

Example:​

SUM​

Example:​

Example:​

Example:​

Example:​

GROUP BY​

Example:​

Example:​

C.2. Views, Subqueries and CTEs​

Working With Views :​

Syntax​

Create View​

Access a View​

SUBQUERIES​

CTE - Common Table Expressions​

Syntax​

Example:​

C.3. Joins​

Example:​

INNER JOIN​

Example:​

LEFT JOIN​

Example:​

RIGHT JOIN​

Example:​

FULL JOIN​

Example:​

FULL JOIN - (rows that DON'T have matching pairs)​

Example:​

C.4. Querying data​

Part C​

Submission Guidelines​

Commit and upload your changes​

Regrade Requests​

Rubric​

Overview

Learning Objectives

Part A

Pre-Lab Quiz

Part B

ER Diagram

Directory Structure

Initializing Docker Container

PostgreSQL

Here is a list of some useful terminal commands for PostgreSQL that you will need for the later tasks in the lab.

Comments

A. Data Definition Language (DDL)

A.1. CREATE table

Example:

To Do - Create tables in movies DB

A.2. ALTER table

To Do - Alter Table

A.3. DROP table

B. Data Manipulation Language (DML)

B.1. INSERT INTO table

To Do - You should insert atleast 2 rows in each table. (The data should satisfy the constraints on the tables)

B.2. UPDATE table data

B.3. DELETE table data

C. Data Query Language (DQL)

C.1. Basic Queries

SELECT Statement

Select Everything:

Example:

Example:

Example:

COUNT

Example:

Example:

Example:

SUM

Example:

Example:

Example:

Example:

GROUP BY

Example:

Example:

C.2. Views, Subqueries and CTEs

Working With Views :

Syntax

Create View

Access a View

SUBQUERIES

CTE - Common Table Expressions

Syntax

Example:

C.3. Joins

Example:

INNER JOIN

Example:

LEFT JOIN

Example:

RIGHT JOIN

Example:

FULL JOIN

Example:

FULL JOIN - (rows that DON'T have matching pairs)

Example:

C.4. Querying data

Part C

Submission Guidelines

Commit and upload your changes

Regrade Requests

Rubric