Lab 5: SQL
Overview
In this lab, you will be working with PostgreSQL to create a database for some sample data.
To receive credit for this lab, you MUST show your work to the TA during the lab, and push to the github by the deadline. Please note that submissions will be due right before your respective lab sessions in the following week. For Example, If your lab is on this Friday 10 AM, the submission deadline will be next Friday 10 AM. There is no late policy for labs.
Learning Objectives
LO1. Learn to read and construct ER diagramsL02. Understand the basics of Data Definition Language (DDL)
L03. Use SQL to query and modify tables
L04. Learn to use more advanced SQL features like CTEs, Group By, and Joins
Part A
Pre-Lab Quiz
Complete Pre-Lab quiz on Canvas before your section's lab time.
Part B
ER Diagram
What is an ER diagram?
An entity-relationship (ER) diagram is a blueprint that shows how "entities" such as people, things, and concepts are related to each other in a system. ER diagrams are most commonly used to design or debug relational databases in software development, business informatics, education, and research. Also known as ERD or ER model, it uses a predefined set of symbols such as rectangles, diamonds, ellipses, and connecting lines to represent the interconnectivity of entities, relationships, and their attributes.
Below is the ER Diagram for the database that we will be using for this lab. Please refer to it while writing queries in the later portion of the lab.
Nothing to be recorded here.
Directory Structure
Clone your GitHub repository
Github Classroom Assignment
- Using an SSH key
- Using a Personal Access Token (PAT)
git clone git@github.com:CU-CSCI3308-Fall2024/lab-5-sql-<YOUR_USER_NAME>.git
git clone https://github.com/CU-CSCI3308-Fall2024/lab-5-sql-<YOUR_USER_NAME>.git
Navigate to the repository on your system.
cd lab-5-sql-<YOUR_USER_NAME>
The website directory structure contains the SQL files for this lab. The list below contains all the files that are provided and the ones you will submit as well. Please make sure to follow the guidelines specified here.
├── data
│ ├── actors.sql
│ ├── genres.sql
│ ├── movies.sql
│ ├── movies_to_actors.sql
│ ├── movies_to_genres.sql
│ └── platforms.sql
├── entrypoint.d
│ ├── 00_create.sql
│ ├── 01_alter.sql
│ ├── 02_insert.sql
│ ├── 03_verify.sql
│ └── 04_insert.sql
├── queries
│ ├── query_01.sql
│ ├── query_02.sql
│ ├── query_03.sql
│ ├── query_04.sql
│ ├── query_05.sql
│ ├── query_06.sql
│ ├── query_07.sql
│ ├── query_08.sql
│ ├── query_09.sql
│ ├── query_10.sql
│ └── testdata
│ ├── ...
├── docker-compose.yaml
├── README.md
└── test.sh
Initializing Docker Container
- In your repository, you will find a
docker-compose.yaml
file with the following contents:
version: '3.9'
services:
db:
image: 'postgres:latest'
env_file: .env
expose:
- '5432'
volumes:
- postgres-data:/var/lib/postgresql/data
- ./data:/data:ro
- ./entrypoint.d/:/docker-entrypoint-initdb.d:ro
volumes:
postgres-data: {}
- Create a
.env
file in the root (lab5-sql-\<your_username>) folder to store environment variables to setup postgres.
# database credentials
POSTGRES_USER="postgres"
POSTGRES_PASSWORD="pwd"
POSTGRES_DB="movies_db"
You should not commit this file to GitHub (it is in the .gitignore
file, so
this is automatic.
- Start the PostgreSQL docker container with the docker-compose.yaml file:
docker-compose up -d
- Now that the container is running in the background, connect to PostgreSQL shell:
docker-compose exec db psql -U postgres
PostgreSQL
- Create a movies database
create database movies_db;
- Connect to the movies database
\c movies_db;
Here is a list of some useful terminal commands for PostgreSQL that you will need for the later tasks in the lab.
To list all of the databases in the current PostgreSQL database server, use the \l command
\l
To list all of the tables in the current database, use the \dt command
\dt
To describe a table such as the column, type, modifiers of columns, etc., use \d followed by the table name like so
\d table_name
To view all of the data associated with a table, select all of the columns from a table like this:
SELECT * FROM table_name;
To display the command history, use the \s command
\s
Comments
Single line comments start with --
, while multi-line comments start with
/*
and end with */
. Any text between /*
and */
will be ignored while
executing the .sql files. Use multi-line comments to include formatted query
outputs in sql files.
Example:
-- Select all the columns of all the records in the Customers table:
SELECT * FROM customers;
/*
| customer_id | customer_name |
| 123 | David |
*/
A. Data Definition Language (DDL)
A.1. CREATE table
Creating the tables
To actually create a table we must start with the "CREATE TABLE" command. Follow this with the name of the table you want to create. To make sure the script is re-runnabe you should use "IF NOT EXISTS" [CREATE TABLE IF NOT EXISTS table_name] to your script to ensure postgreSQL doesn't try to recreate a table (handy when working with long/reused scripts).
PLEASE NOTE: While the commands are demonstrated here in uppercase, postgreSQL is NOT case sensitive. It is convention to list the postgreSQL commands in upper case.
Syntax:
CREATE TABLE table_name (
column_name1 DATATYPE,
column_name2 DATATYPE
);
Example:
Following is an example of the CREATE TABLE command.
DO NOT MAKE THIS TABLE. Use this as a guide to create the necessary tables.
/* Create a table names products with the mentioned columns
having the datatypes mentioned */
CREATE TABLE products (
product_id SERIAL PRIMARY KEY /* the primary key for each entry */,
product_name VARCHAR(200) NOT NULL,
quantity_per_unit VARCHAR(20),
unit_price FLOAT,
units_in_stock SMALLINT,
units_on_order SMALLINT,
reorder_level SMALLINT,
discount DECIMAL,
category_id SMALLINT
);
Just about every postgreSQL command ends with a semicolon. If you don't add one, then the terminal will simply wait until you enter it. MAKE SURE TO ADD YOUR SEMICOLONS!
The table will look something like this with no data inserted.
product_id | product_name | quantity_per_unit | unit_price | units_in_stock | units_on_order | reorder_level | discount | category_id |
---|---|---|---|---|---|---|---|---|
You may have noticed certain keywords in the query such as SERIAL, PRIMARY KEY and NOT NULL. Here are some definitions for your reference.
SERIAL
The SERIAL command creates an auto-increment that is managed by the database. Each time a new entry is added to the table, the SERIAL column is set and the value is incremented by 1
PRIMARY KEY
The PRIMARY KEY defines a unique identifier for each row of the table. THIS MUST BE A UNIQUE VALUE! This is how the database differentiates the various rows/entries of the table.
NOT NULL
The NOT NULL attribute requires that a field be set when a new entry is added to the table. This is a design choice, so only set a field to NOT NULL, when you really want to ensure that data is entered. For example, the slogan field does not have "NOT NULL" listed. Meaning that you can add a new item to the store table and skip the slogan field, but must provide a name, qty, and price.
Data Types
You can find the official documentation that lists all of PostgreSQL's supported data types here
VARCHAR(#)
A variable-length character string, where # defines the maximum number of characters. This is a go-to option for defining strings that need to be set a maximum length. As a note, make sure to use single quotes ' ' when working with VARCHAR data.
TEXT
The TEXT type is fairly similiar to the VARCHAR(#) data type, but removes the character limit.
Whole Numbers
Whole numbers, or integers, can come in various sizes (smallint, int, and bigint). You should choose which one to use based on the specific data you hope to store.
Name | Number of Bytes | Range | Recommended Use |
---|---|---|---|
smallint | 2 | -32,768 to 32,767 | Only use when you KNOW your values won't exceed its range (ex. the height of a person in inches or the age of a person in years) |
int or integer | 4 | -2,147,483,648 to 2,147,483,647 | Considered the "work-horse" data type and most often used for giving you a good balance between range and data storage. |
bigint | 8 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | Only use when you KNOW you are dealing with very large values (ex. number of transactions Amazon handles in a year). |
Decimal Numbers
Decimal Numbers come in three different option - real, double precision, and numeric(#, #). Like the whole numbers above, the type you use depends on what kind of information you hope to store.
Name | Precision | Recommended Use |
---|---|---|
real | 6 Decimal Places | Having up to 6 Decimal Places, means that real will work for "most cases" when dealing with decimal numbers. |
double precision | 15 Decimal Places | Use when you're dealing with scientific data or small decimal information. |
numeric(precision, scale) | precision defines the total number of digits in a value scale defines the number of decimal places in a value Example: 12345.67 - this value has a precision of 7 (7 total digits) and a score of 2 (2 decimal places) | Use this when you are dealing with currencies or other data that requires a maximum number of digits. According to the documentation:
|
DATE
Date values, in postgreSQL, are in the yyyy-mm-dd format - '2019-11-25'. To work with dates see the list of handy date functions below:
Function Name & Example | Function Description |
---|---|
to_date(date_text, date_format) | The to_date() function will convert a string literal into a date value, using the provided date formate. Further details here |
to_date('20191125', 'yyyymmdd') | |
NOW():: DATE | To retrieve the current date on the server you can use the NOW() function, with the :: DATE option added to it we will only get the current date (removes the time information). The other option is to retrieve the CURRENT_DATE which will is in the default format of 'yyyy-mm-dd'. |
SELECT NOW()::DATE | |
CURRENT_DATE | |
SELECT CURRENT_DATE; | |
Further Examples | If you want to learn more about working with dates, check out this resource here |
To Do - Create tables in movies DB
For this activity, you will be using the file entrypoint.d/00_create.sql
.
It would be easier for you to write your CREATE
queries in this file before executing them on the terminal.
You need to choose suitable datatypes for the columns and identify and assign primary keys in the tables. You may refer to the ER diagram provided at the top of this page for this activity.
Make sure you are connected to the movies database.
Please refer the ER Diagram above to understand the datatypes for the necessary variables.
Create movies table with the following columns:
id
name
duration
year_of_release
gross_revenue
country
language
imdb_rating
platform_idCreate actors table with the following columns
id
name
agency
active_since
locationCreate platforms table with the following columns
id
name
viewership_costCreate genres table with the following columns
id
nameCreate movies_to_actors table with the following columns
movie_id
actor_idCreate movies_to_genres table with the following columns
movie_id
genre_id
Multi-line Comments:
Multi-line comments start with /*
and end with */
. Any text between /*
and */
will be ignored while executing the .sql files.
Example:
/* Select all the columns
of all the records
in the Customers table: */
SELECT * FROM Customers;
/*
| customer_id | customer_name |
| 123 | David |
*/
- File name:
entrypoint.d/00_create.sql
This file should include the CREATE queries and the outputs of the commands in this file. - The outputs copied in the
entrypoint.d/00_create.sql
file should be commented out using Multi-line comments. (/* */) as shown above.
A.2. ALTER table
There could arise scenarios that need you to alter the structure of table. This could mean adding/updating/deleting a column/constraint. In such situations,we make use of the ALTER TABLE command.
Syntax:
ALTER TABLE table_name
ADD COLUMN new_column_name data_type constraint;
Example:
/* Alter the table products to change the constraint
of foreign key to category_id column */
ALTER TABLE products
ADD CONSTRAINT category_id FOREIGN KEY (category_id) REFERENCES categories (category_id);
To Do - Alter Table
For this activity, you will be using the file entrypoint.d/03_alter.sql
.
It would be easier for you to write your ALTER
queries in this file before executing them on the terminal.
Make the column platform_id
a FOREIGN KEY in the movies table. You need to determine which column from which table this column would reference.
- File name:
entrypoint.d/01_alter.sql
. This file should include the ALTER query and their outputs. - The outputs copied in the
entrypoint.d/01_alter.sql
file should be commented out using Multi-line comments. (/* */) as shown above.
A.3. DROP table
In case you want to get remove a table or in other words delete a table - DROP TABLE is the command to use.
Please be very careful when using the DROP command because it will erase the table, the data within the table and all the metadata associated with that table.
Syntax:
DROP TABLE IF EXISTS table_name;
Example:
DROP TABLE IF EXISTS products;
Cascade is used to remove the specified records and foreign keys referencing them in other tables.
B. Data Manipulation Language (DML)
B.1. INSERT INTO table
Insert Provided Data
You will now insert rows from the provided sql files under data
in your
repository We mounted the data
folder in your repository to /data
in the
container. Replace <filename>
with the actual file name for each table.
\i /data/<filename>.sql;
Add each of the 6 commands to entrypoint.d/02_insert.sql
.
Verify that the data was correctly inserted. Replace <table_name> with each table.
select count(*) from <table_name>;
Tables and expected number of rows in each table
Table Name | No. of Rows |
---|---|
Movies | 26 |
Actors | 20 |
Platforms | 7 |
Genres | 8 |
movies_to_actors | 61 |
movies_to_genres | 26 |
Multi-line Comments:
Multi-line comments start with /*
and end with */
. Any text between /*
and */
will be ignored while executing the .sql files.
Example:
/* Select all the columns
of all the records
in the Customers table: */
SELECT * FROM Customers;
/*
| customer_id | customer_name |
| 123 | David |
*/
File name: entrypoint.d/03_verify.sql
Now let's practice inserting some new rows of our own
The INSERT query will add records to the table.
Syntax:
INSERT INTO table_name (column_name1, column_name2, column_name3,...column_nameN)
VALUES (value1, value2, value3,...valueN), /* First row */
(value1, value2, value3,...valueN); /* Second row */
Example:
/* Insert the values into corresponding columns in the table products */
INSERT INTO products (product_name, quantity_per_unit, unit_price, units_in_stock, units_on_order, reorder_level, discount, category_id) VALUES
('Unison Composition Book', '3', 10.4, 100, 50, 2, no, 1);
Insert with return inserted row
The "returning *" clause at the end of the query will return all columns for the newly inserted record.
INSERT INTO products (product_name, quantity_per_unit, unit_price, units_in_stock, units_on_order, reorder_level, discount, category_id) VALUES
('Unison Composition Book', '3', 10.4, 100, 50, 2, no, 1) returning * ;
To Do - You should insert atleast 2 rows in each table. (The data should satisfy the constraints on the tables)
For this activity, you will be using the file entrypoint.d/04_insert.sql
.
It would be easier for you to write your INSERT
queries in this file before executing them on the terminal.
Multi-line Comments:
Multi-line comments start with /*
and end with */
. Any text between /*
and */
will be ignored while executing the .sql files.
Example:
/* Select all the columns
of all the records
in the Customers table: */
SELECT * FROM Customers;
/*
| customer_id | customer_name |
| 123 | David |
*/
File name: entrypoint.d/04_insert.sql
This file should contain all INSERT queries and their outputs.
The outputs copied in the entrypoint.d/04_insert.sql
file should be commented out using Multi-line comments. (/* */) as shown above.
B.2. UPDATE table data
More often than not, there can be a need to update the data in a table. For that purpose, we can use UPDATE table command. You need to make sure to specify a condition, using the WHERE clause when you update rows in a table otherwise all the rows in the table will be updated. We'll learn more about the WHERE clause in the querying section of this lab.
Syntax:
UPDATE table_name
SET column1 = value1,
column2 = value2,
...
WHERE condition;
B.3. DELETE table data
In some cases you may need to remove a row from the table. DELETE is the command to use. Similar to the UPDATE command, you need to make sure to specify a condition, using the WHERE clause when you update rows in a table otherwise you may end of deleting all the rows in the table. We'll learn more about the WHERE clause in the querying section of this lab.
Syntax:
DELETE FROM table_name
WHERE condition;
C. Data Query Language (DQL)
C.1. Basic Queries
In this section we will learn some important commonly used queries in the premise of products database seen previously.
SELECT Statement
Used to select and display rows of data from table.
Syntax:
SELECT column_name FROM table_name WHERE condition_is_true;
Select Everything:
The asterisk (*) or "star" acts as a wild card in our select statement. When we don't want to limit the number of fields/columned returned we can use * to select all of them.
Example:
/* SELECT all the rows from table products */
SELECT * FROM products;
The FROM statement defines where we want to retrieve our data (this is usually a table name, but could also be a view or even a subquery).
Limit Our Columns:
If we want to choose and display only selected columns instead of all columns, we can do so by the following way.
Example:
/* Only show the product_name and quantity_per_unit columns from table products. */
SELECT product_name, quantity_per_unit FROM products;
In the above example, product_name and quantity_per_unit are the names of columns that we are choosing to display out of the many columns that might be present in the table products. This is great when we have large tables with a large number of rows or when we want to display columns side-by-side for improved readability.
Limit Our Rows:
If we want to select only some rows based on one or more conditions, we can use where
clause.
This clause helps us limit the number or rows returned. This is a true/false boolean statement that the database evaluates for each row. If the row evaluates to true, its information is returned. Otherwise the row is skipped over/ignored.
Example:
/* Show the rows in columns product_name, quantity_per_unit in table
products where unit_price is less than 1 dollar*/
SELECT product_name, quantity_per_unit FROM products WHERE unit_price < 1;
Aliases
Basic Syntax
SELECT column_name AS alias_name FROM table_name as table_alias_name
Rename A Column
SELECT item_name, qty AS quantity FROM store;
- AS alias_name
- Sometime our columns have short or abbreviated names (like qty). To help improve readability when we run our queries, we can use an alias to temporarily rename our column. In this example we are altering qty to the complete word quantity. But what if we wanted to work with multiple words? Then we can also place our alias in double quotes: qty as "Quantity of Items in Stock".
Rename A Table
SELECT item_name, qty FROM store as retail_establishment;
- AS table_alias_name
- Aliases can also be applied to tables, but we don't usually do it for making our readouts more readable. Instead these are uses for helping us handle subqueries and joins. We'll cover these use cases in the next section.
Combining Fields
SELECT item_name || ' : ' || slogan AS catch_phrase FROM store WHERE slogan IS NOT NULL;
- item_name || ' : ' || slogan
- The last option we'll discuss for using aliases is to help us rename concatenated or calculated fields. We use the double pipe || for string concatenation in SQL. So this statement states that we will be concatenating the item_name column and slogan columns, with a colon in between. The where clause was added to remove the rows that lack a slogan.
COUNT
The COUNT() function will return the number of rows that match a query.
COUNT(*)
Our first example is a quick way to check how many rows are in a table (a great way to see if an insert statement or query worked correctly!). The second example will incorporate a where clause, letting us now how many rows meet a specific condition.
Example:
/*count the number of rows in the products table */
SELECT COUNT(*) FROM products;
/* count the number of products that cost more than 1 dollar */
SELECT COUNT(*) FROM products WHERE unit_price > 1;
COUNT(product_id)
If a column name is listed inside of the COUNT() function, then rows with null fields will be ignored. In our now updated examples, we list the slogan column (which can be null). This means all of rows without a product_id are completely ignored by the COUNT() function.
Example:
/*count the number of products in the products table which have a product_id*/
SELECT COUNT(product_id) FROM products;
/* Count the number of products that cost more than 1 dollars */
SELECT COUNT(product_id) FROM products WHERE unit_price > 1;
COUNT(DISTINCT prices)
By adding the DISTINCT keyword, we can limit our search to ignore duplicates. In this example, Gold Fish & Pringles share the same price, so collectively they only count as 1 entry! Use DISTINCT to help you determine how varied your table data is. Ignore Null & Duplicates
Example:
/*Count the number of rows with product_id which have unique values */
SELECT COUNT(DISTINCT product_id) FROM products;
/* Count the number of rows in the products table which have unique non-null prices that are above 1 dollar. */
SELECT COUNT(DISTINCT product_id) FROM products WHERE unit_price > 1;
SUM
The SUM() function will add up all of the numeric values in a column and return their combined sum. Just like COUNT(), we can apply a WHERE clause to limit which rows we add together. As a note, this will ignore null entries.
All Rows
Example:
Example:
/*Sum up the quantity_per_unit column*/
SELECT SUM(quantity_per_unit) FROM products;
Subset of Rows
Example:
Example:
/* Sum up the total quantity_per_unit for items that cost more than 1 dollar*/
SELECT SUM(quantity_per_unit) FROM products WHERE price > 1;
Math Operations
A complete list of operations can be found in the official documentation here Use the table listed below for our Math Operations Examples
Addition
SELECT product_name, (units_in_stock + units_on_order) as "Total future stock" from products;
- (units_in_stock + units_on_order)
- Working with the plus sign, we can add the values of two columns together for each table entry. For example, adding the units currently in stock and units ordered from the seller will give us the future total stock.
Subtraction
SELECT product_name, (units_in_order - reorder_level) as "Units to be sold before reorder" from products;
- (units_in_order - reorder_level)
- Working with the minus sign, we can subtract the values of two columns from one another for each table entry.
Division
SELECT product_name, (unit_price / quantity_per_unit) as "Unit Price by quantity per unit" FROM products;
- CAST (...)
- Our math operations will try to perserve the data type of the columns being used. This means if our operation works with whole numbers, then our answer will be a whole number. To work around this, we can use CAST() to treat one of our values as a float, which will result in a decimal number for our answer!
Multiplication
SELECT product_name, (units_in_stock* quantity_per_unit) as "Total amount in stock" FROM products;
- ((...)* 100)
- Like many other programming langauges, we can use the asterisk to handle multiplication. In this case we are multiplying with a constant 100.
GROUP BY
Basic Syntax
/*Use SUM to aggregate over column 2 */
SELECT column_1, SUM(column_2)
FROM products
/* Make groups keyed by column 1 values, that sum over column_2 */
GROUP BY column_1;
GROUP BY column
When we want to summarize sets of rows that share values, the GROUP BY statement is often useful. When coupled with an aggregate function like SUM or COUNT, we can aggregate values by groups. For example, if our products had a category column, we could sum the qty within each category, like fruit. For example:
Example:
/* select category_id and sum of units in stock grouped by category_id from table products */
/* This means that all proucts having same category_id are put in the same group and each group's units_in_stock is summed up */
SELECT category_id, SUM(units_in_stock)
FROM products
GROUP BY category_id;
GROUP BY HAVING column
The HAVING clause is used to filter groups returned by the GROUP BY clause. It is particularly useful when you want to include conditions on aggregated data, such as sums, averages, counts, etc., that cannot be filtered through the WHERE clause. For example:
Example:
/* select category_id and sum of units in stock grouped by category_id from table products */
/* This means that all proucts having same category_id are put in the same group and each group's units_in_stock is summed up */
/* Only include groups where the sum of units_in_stock is greater than 100 */
SELECT category_id, SUM(units_in_stock)
FROM products
GROUP BY category_id
HAVING SUM(units_in_stock) > 100;
C.2. Views, Subqueries and CTEs
Working With Views :
Syntax
CREATE VIEW view_name as your_query_goes_here;
Create View
A VIEW works with a pre-defined query and creates a resulting table. The view's fields can be accessed in the exact same way as a normal table. This is a handy way to aggregate data into more easy to access table format. Also, if you find yourself reusing query over and over, make it into a view for easier access.
Example:
CREATE VIEW inventory_cost AS
SELECT product_name, (units_in_stock * unit_price) AS "Total Cost"
FROM products;
Access a View
Example:
SELECT *
FROM inventory_cost /* Retrieve all of the rows from inventory_cost's query */;
SELECT * FROM inventory_cost WHERE "Total Cost" > 12;
/* Restrict the rows returned to only those with an inventory cost greater than 12 dollars */
SUBQUERIES
A subquery can allow you to run an inner query that feeds its results into an outer query. Often this is utilized in the WHERE clause of a select statement to help define which rows a query should work with.
In our example above, our subquery will retrieve the current price of a Kit Kat. This means we don't have to worry about "hard coding" the price for a Kit Kat. Our outer query will always work, even if the price of a Kit Kat is updated.
Example:
/* Display all of the product items that have a price greater than a Kit Kat */
SELECT *
FROM products
WHERE
unit_price > (SELECT unit_price FROM products WHERE product_name = 'Kit Kat');
CTE - Common Table Expressions
CTE is a temporary names result-set created using with
and select
clause. You can use this temporary result to further select, insert, update or delete.
This is a convenient way to manage complex queries. We give a temporary name to the CTE which acts like a View. In the following syntax, temp_cte is the name of the intermediate view
which has columns a,b,c selected from table1. This view is further used to select a and c columns where b's value is greater than 3.
Syntax
WITH
temp_cte AS (
SELECT a, b, c
FROM table1
)
SELECT a, c
FROM temp_cte
WHERE b > 3;
Building on the previous exmaple of displaying all products that have a price greater than KitKat, let's use CTE to do the same.
Example:
/* Display all of the product items that have a price greater than a Kit Kat */
WITH
kit_kat_price AS (
SELECT price
FROM products
WHERE name = 'Kit Kat'
)
SELECT name, price
FROM products
WHERE price > (SELECT price FROM kit_kat_price);
This query creates a temporary result-set named kit_kat_price that returns the price of the product named 'Kit Kat'. Then, in the main query, we select the name and price of all products where the price is greater than the price of 'Kit Kat'.
C.3. Joins
You can find a more detailed explanation of Joins for PostgreSQL here.
Checkout the handy diagram below:
Example:
Consider the following tables for performing joins. We will be using the employee table on the left and salary table on the right of the join condition for the purpose of this explanation.
CREATE TABLE employee (
id SERIAL PRIMARY KEY,
name VARCHAR(60) NOT NULL,
age INT NOT NULL,
phone_number VARCHAR(10) NOT NULL
);
INSERT INTO employee
(name, age, phone_number)
VALUES
('Randolph Esquire', 33, '1112223333'),
('Clarence Roosevelt', 12, '4442220000'),
('Rachel Null', 44, '1002004000'),
('Sarah The Third', 102, '1032034003');
-- --
CREATE TABLE salary (
employee_id INT PRIMARY KEY,
starting_salary FLOAT NOT NULL,
current_salary FLOAT NOT NULL
);
INSERT INTO salary
(employee_id, starting_salary, current_salary)
VALUES
(1, 35000, 37222.32),
(2, 55.33, 78.92),
(3, 123450.12, 277111.32),
(5, 100, 323.5);
Any blank cells that you see in the answer sets, are NULL values.
INNER JOIN
The INNER JOIN will return only the entries that have matching values in a shared column, in this case a shared column between the "employee" & "salary" tables is id <-> employee_id
.
By default, if you do not specify the type of join, the JOIN keyword in SQL will refer to an inner join.
Example:
Command:
SELECT
employee.name,
salary.current_salary
FROM
employee -- left table
INNER JOIN salary -- right table
ON employee.id = salary.employee_id;
Output:
You'll see that "Sarah the Third" is an entry with in the employee table with id = 5. Since there isn't an entry in the salary table with employee_id = 5, that entry is left out.
LEFT JOIN
The LEFT JOIN works similarly to the inner join, except ALL rows from the left table are returned. The fields from the right table will either be set to NULL, or to the value from the right table if there is a match. This means you will see null values for some fields from the right column in the cases that there was no match.
Example:
Command:
SELECT
employee.name,
salary.current_salary
FROM
employee -- left table
LEFT JOIN salary -- right table
ON employee.id = salary.employee_id;
Output:
RIGHT JOIN
The RIGHT JOIN works the same as the left join, but opposite. That is, ALL rows from the right table are returned. The fields from the left table will either be set to NULL, or to the value from the right table if there is a match.
Example:
Command:
SELECT
employee.name,
salary.current_salary
FROM
employee -- left table
RIGHT JOIN salary -- right table
ON employee.id = salary.employee_id;
Output:
FULL JOIN
Example:
The FULL JOIN will return all the entries from each table that have matching pairs. For any entry that doesn't, it will display null for the field.
Command:
SELECT
employee.name,
salary.current_salary
FROM
employee -- left table
FULL JOIN salary -- right table
ON employee.id = salary.employee_id;
Output:
In this example, the entry "5" and "Sarah The Third'' will be paired up with null values.
FULL JOIN - (rows that DON'T have matching pairs)
This is a variant of FULL JOIN in which it will only return the entries that DON'T have matching pairs (i.e. unique data entries). This means you will only see the rows that contain a null value on other side. (Entry "5" & "Sarah The Third")
Example:
Command:
SELECT
employee.name,
salary.current_salary
FROM
employee -- left table
FULL JOIN salary -- right table
ON employee.id = salary.employee_id
WHERE employee.id IS NULL OR salary.employee_id IS NULL;
Output:
C.4. Querying data
For each query in this activity, you will be editing the corresponding file in queries
.
We have provided test data, a script to test each of your queries, and the correct results for the current tables. We will use different data to test your submission, so make sure you do not reference ids directly!
Before moving onto the queries, take a moment to comment reset your database such that it matches the version we used to generate the expected outputs. This will allow you to test your queries as you go.
You can do this by commenting out your insert statements
in entrypoint.d/04_insert.sql
.
Then run the test script to reset your database.
./test.sh --reset
ERROR: query queries/query_01.sql is incorrect
...
FAIL: 0/10 queries are correct
FAIL: encountered error while testing queries
As you progress through this section, rerun this script to check your progress.
Feel free to look at the expected output for each query in queries/testdata
,
but again keep in mind that hard-coding values will not work as the data will
not be the same during grading.
- Write a query to find the highest rated movie. If there is a tie, return any ONE row with the movie with the highest rating.
Hint: Select all the rows from the required table and sort it in descending order.
- Write a query to find ALL the movies starring "Leonardo di Caprio".
Hint: Join the movies & actors tables by using any necessary join operation required to combine both. Filter rows where the actor’s name is "Leonardo di Caprio".
- Write a query to find the top 3 oldest movies in the database. Return only THREE rows.
Hint: Select all the columns from the required table and sort it in ascending order based on the appropriate column. Limit the number of rows printed to 3.
Part C
For Part C, we will be practicing writing more queries. You will continue to work with queries.sql
as you need to submit that at the end of this activity.
- Write a query to get movies available on Hulu ordered by id.
Hint: Use a subquery to get the platform_id for Hulu and use this to select the required movies.
- Write a query to find the actor who has acted in the maximum number of movies.
Hint: Use a subquery that groups based on the actors and sorts in descending order of the aggregated movies. Use this subquery to select actors names.
- Top 3 highest grossing thriller movies.
Hint: To write this you will need records from 3 tables that are joined - movies_to_genres and movies where the genre_name would be ‘Thriller’. The records would be in the descending order of the gross_revenue, keeping the limit 3.
- Update the platform for all 2004 movies to be Netflix.
Hint: Use a subquery to fetch the platform id of Netflix and use that information to update the platform for movies, released in 2004, in the movies table.
- Which platform has the most number of movies?
Hint: Write a subquery that groups based on the platform ids and sorted by the count of platforms to filter the row that has most movies.
- Which actor’s movies have generated the overall highest revenue?
Hint: Join movies, actors and any tables needed to join the two and calculate the sum of revenue by creating groups over actors.
- Write a query to find actors from Missouri that starred in a Romcom and order them by name.
Hint: Create 2 views: the first one to join movies and actors by filtering on 'Missouri', the second one to join movies and genres by filtering on ‘Romcom’. Join these 2 views to get the final query.
Once you are done, make sure to rerun the test script. This is what it should look like.
./test.sh --reset
INFO: query queries/query_01.sql is correct
INFO: query queries/query_02.sql is correct
INFO: query queries/query_03.sql is correct
INFO: query queries/query_04.sql is correct
INFO: query queries/query_05.sql is correct
INFO: query queries/query_06.sql is correct
INFO: query queries/query_07.sql is correct
INFO: query queries/query_08.sql is correct
INFO: query queries/query_09.sql is correct
INFO: query queries/query_10.sql is correct
PASS ./test.sh
Files: queries/*.sql
These files should each contain 1 query. Make
sure to check your queries with ./test.sh
.
Any outputs should be commented out. If the script does not pass your query will not be graded.
Submission Guidelines
Commit and upload your changes
Make sure you have all the files added in the directory structure specified. Run the following commands inside your root git directory (in your lab-5-sql-<username> directory).
git add .
git commit -m "Added all files for lab6"
git push
Once you have run the commands given above, please navigate to your GitHub remote repository (on the browser) and check if the changes have been reflected.
You will be graded on the files that were present before the deadline. If the files are missing/not updated, you could receive a grade as low as 0. This will not be replaced as any new pushes to the repository after the deadline will be considered as a late submission and we do not accept late submissions.
Regrade Requests
Please use this link to raise a regrade request if you think you didn't receive a correct grade. If you received a lower than expected grade because of missing/not updated files, please do not submit a regrade request as they will not be considered for reevaluation.
Rubric
Description | Points | |
---|---|---|
Part A - Lab Quiz | Complete Lab Quiz on Canvas | 20 |
Part B: Create Table | entrypoint.d/00_create.sql | 15 |
Part B: Alter Table | entrypoint.d/01_alter.sql | 5 |
Part B: Insert into Table | entrypoint.d/03_verify.sql , entrypoint.d/04_insert.sql | 10 |
Part B and C: Queries | queries/*.sql | 30 |
In-class check-in | You showed your work to the TA or CM. | 20 |
100 |
Please refer to the lab readme or individual sections for information on file names and content to be included in them.