Subqueries and the JOIN command in MySQL

Hello fellow devs, this article is a continuation of the one I published a few days ago (which you can check out here). In the last article, I explained some basic SQL commands by going over the solutions for a project given by Harvard, where I have to query a SQL file containing data from IMDb. The topics I didn't talk about are subqueries and joining two or more tables with the JOIN command, so I will explain them by solving two problems.

The first problem

If you read my previous article, you can probably see the answer to this question. But for those who don't know, this is what the answer looks like:

SELECT AVG(rating)
FROM ratings 
WHERE year = 2012;

What the above code simply means is: give me the average rating of all the movies released in 2012. The AVG command calculates the average value of whatever is passed into it, and in this case, it's the ratings of all the movies released in 2012 but that code won't work here and I will explain why but first, let's look at the structure of the database.

Go down a bit and you will see CREATE TABLE ratings, that's the ratings table and in it, we have a movie_id column, a rating column (which contains the ratings of all the movies), a votes column (containing the total votes each movie has) and the last thing is something called FOREIGN KEY which isn't a column and I will explain it later on.

You can see that the ratings table doesn't have a year column that contains the year in which each movie was released but that year column can be seen in the movies table (the first table in the database). So, how do we access the year column if it exists in another table? That's where subqueries come in.

SELECT AVG(rating)
FROM ratings
WHERE movie_id
IN (SELECT id
FROM movies
WHERE year = 2012);

The code looks similar to the first one but you can see a couple of changes in it. Subqueries can be a bit confusing but one technique I use is to break the problem into small parts and start building the query from the inner one. To get the average ratings of all movies released in 2012, we must first get those movies before worrying about their ratings.

SELECT id
FROM movies
WHERE year = 2012;

This simply means give me the id number of all the movies released in 2012. The result might be something like: 1, 2, 5, 6, 7, ... n. We can now use this id number to access the rating of each movie in the ratings table, and we do that by using the movie_id column found in the ratings table. The id number column in the movies table corresponds to the movie_id column in the ratings table, that's the work of that FOREIGN KEY you saw in the ratings table. Simply put: id = movie_id

SELECT AVG(rating)
FROM ratings
WHERE movie_id
IN (SELECT id
FROM movies
WHERE year = 2012);

The code in the bracket is the subquery and it executes first and now we have the id of all movies released in 2012. The rest of the code goes thus: give me the average rating from the ratings table where the movie_id is in the result of that subquery. Let's assume the subquery gave a result of 1, 2, 5, 6, 7 which is the id of movies released in 2012, the outer query will look for the ratings of the movies whose movie_id is in that result. Recall: movie_id = Id

This is the output:

The second problem

This is similar to the last problem but the same technique won't be effective and we have to use a command known as JOIN. This command does a very simple thing: it joins two or more tables into one big table, and now we can access the peculiar columns of data in each of the tables.

SELECT movies.title, ratings.rating
FROM movies
JOIN ratings
ON movies.id = ratings.movie_id
WHERE movies.year = 2010
ORDER BY rating DESC, title;

The question said we should list movies and their ratings, that's exactly what the first line does. The two things we are listing out are in two different tables, so I specify them by saying movies.title and ratings.rating. The dot is similar to the dot notation in JavaScript used for accessing the properties of an object and in python, it is used to access the items in a dictionary.

In SQL, the dot notation is used to access the columns of a table. So, in that first line of code, I am saying: go into the movies table and select the contents of the title column, the same thing applies to the ratings table. FROM movies JOIN ratings is simply how we join both tables, the ratings and movies table are now joined together but before that can happen, they must be joined together by a column that is common to both tables.

Do you recall where I said there's a movie_id column in the ratings table that corresponds to the id column in the movies table? That's exactly the column that is common to both of these tables, they are bound together via the FOREIGN KEY I mentioned earlier. The ON command is used to specify the columns that are peculiar to both tables and in this case, it is the id and movie_id columns, which makes movies.id = ratings.movie_id true

After joining both tables, we can now access the data that exists in both tables. Since we are told to only list out movies released in 2010, we do that with the WHERE command and then we sort or order the result by their ratings, starting from the highest to the lowest (descending order). For movies that have the same ratings, we order them alphabetically by their title, this explains the last line of code.

This is the output. I have limited the result to only 10:

Note: This same JOIN method can be used to solve the first problem instead of using a subquery, it will produce the same result.

And that is it!

SQL is a very fascinating and interesting language, I really enjoyed working on this project by Harvard and I learned a lot about SQL. I also worked on another project where I had to use SQL queries to figure out who stole Harvard's rubber duck by using codes to look for clues in a database file containing witness reports, ATM records, security logs, and a lot more.

It was a very fun project but I won't write about it coz it will probably take me over 5 articles to completely cover it 😆 it was way more complicated than this one. For now, I have set SQL aside and started learning JavaScript, I can't wait to start building fun projects with it and writing about them.

Thanks for reading

Connect with me on:
Twitter: @kushyzeena
Readcash: @kushyzee

Lead image: Image by pressfoto on Freepik
Edited with Canva
H2
H3
H4
3 columns
2 columns
1 column
Join the conversation now
Logo
Center