Hi there. In this programming post I use R and RStudio to look at a sample of a dataset of Ontario school courses. The province of Ontario in Canada provides public datasets on its data.ontario.ca website.
The link is here.
Load Data
To start I load the tidyverse library and the stringr library. The tidyverse is a collection of packages used for data manipulation, data wrangling and filtering dataframes. stringr is already part of the core tidyverse, so the second library() call is optional. The tidyverse is roughly comparable to Python's pandas package.
# Ontario School Courses - Education Dataset Analysis
library(tidyverse)
library(stringr)
The link to the .txt dataset file is kind of long. I use the paste() function in R with sep = "" to concatenate the two halves of the link into one string. Then the data is loaded into the variable course_data.
# Ontario.ca dataset:
link <- paste("https://data.ontario.ca/en/dataset/7902ebf1-dc53-4cfa-950e-fd6a351982c5",
"/resource/1543646b-8930-4d39-bf2f-aa799e587d44/download/ministry_defined_courses_en.txt"
, sep ="")
# Load data, column titles separated by |, use sep = "|"
course_data <- read.csv(link, header = TRUE, sep = "|")
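As a side note, paste() with sep = "" behaves the same as paste0(), which is a little shorter for gluing URL pieces together. Here is a minimal sketch. The base_url and file_path values are made-up placeholders, not the real dataset link.

```r
# Hypothetical URL pieces, used only to illustrate concatenation
base_url <- "https://example.com/dataset/abc"
file_path <- "/resource/xyz/download/file.txt"

# paste() with an empty separator
link_a <- paste(base_url, file_path, sep = "")

# paste0() does the same thing without needing sep
link_b <- paste0(base_url, file_path)

# Both approaches build the same string
identical(link_a, link_b)
```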
You can preview the dataframe with head() and tail(). head() returns the first n rows and tail() returns the last n rows.
# Preview data:
head(course_data, n = 10)
tail(course_data, n = 10)
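A small sketch of how n behaves, using a toy dataframe with invented course codes rather than the real dataset. When n is left out it defaults to 6, and a negative n drops rows from the other end.

```r
# Toy dataframe with 10 rows (the codes are invented for illustration)
toy <- data.frame(id = 1:10, code = paste0("C", 1:10))

# n defaults to 6 when it is not supplied
first_default <- head(toy)

# The last three rows (ids 8, 9 and 10)
last_three <- tail(toy, n = 3)

# A negative n keeps everything except the last 2 rows
all_but_last_two <- head(toy, n = -2)
```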
Using str() shows the dimensions of the dataframe, the column names, the type of each column and the first few values in each column.
# 1931 Courses
str(course_data)
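If you only want the size or the column names, base R has separate helpers for each. A quick sketch on a toy dataframe (the rows are invented, only the column names mirror the post):

```r
# Toy dataframe standing in for course_data
toy <- data.frame(
  Course.Code = c("AAA1O", "BBB2D", "CCC3U"),
  Grade = c("Grade 9", "Grade 10", "Grade 11")
)

toy_dims <- dim(toy)     # number of rows, then number of columns
toy_rows <- nrow(toy)    # rows only
toy_cols <- ncol(toy)    # columns only
toy_names <- names(toy)  # column names as a character vector
```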
The dimensions of the dataset are somewhat surprising to me. There are 1931 courses available in Ontario. I am sure there is no high school out there that offers all 1931 courses. I think different schools offer different elective courses along with the core subjects such as English, French (mandatory until Grade 10), Science and Mathematics. Some high schools are arts focused and some are more technology based.
I am not a fan of the first column name for the course code. It loads as ï..Course.Code, most likely because of a byte order mark left at the start of the file, so I rename it to just Course.Code. In R use colnames() along with which().
# Change column name Course Code:
colnames(course_data)[which(names(course_data) == "ï..Course.Code")] <- "Course.Code"
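An alternative to renaming afterwards is to strip the stray characters while reading. This sketch assumes they come from a UTF-8 byte order mark (BOM), which I have not verified for the Ontario file. It writes a tiny temp file with a BOM and reads it back with fileEncoding = "UTF-8-BOM". The file contents are invented.

```r
# Write a small pipe-separated file that starts with a UTF-8 BOM
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)  # the three BOM bytes
writeBin(charToRaw("Course Code|Grade\nAAA1O|Grade 9\n"), con)
close(con)

# fileEncoding = "UTF-8-BOM" tells R to drop the BOM before parsing
clean_data <- read.csv(tmp, header = TRUE, sep = "|",
                       fileEncoding = "UTF-8-BOM")

# The first column name no longer has stray leading characters
names(clean_data)

unlink(tmp)  # remove the temp file
```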
Filter Data Set By Grade
In the province of Ontario high school consists of grades 9, 10, 11 and 12. The filter() function from dplyr is heavily used here. The %>% pipe operator is a shortcut operator used throughout R's tidyverse. Instead of writing filter(dataframe, <filter_condition>) you would write dataframe %>% filter(<filter_condition>).
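To show that the two forms are equivalent, here is a short sketch on a toy dataframe. The course codes are invented; only the Grade column mirrors the real data.

```r
library(dplyr)

# Toy dataframe with invented course codes
toy <- data.frame(
  Course.Code = c("AAA1O", "BBB2D", "CCC2P", "DDD3U"),
  Grade = c("Grade 9", "Grade 10", "Grade 10", "Grade 11")
)

# Function-call form: the dataframe is the first argument
grade_ten_a <- filter(toy, Grade == "Grade 10")

# Pipe form: toy is passed in as the first argument of filter()
grade_ten_b <- toy %>% filter(Grade == "Grade 10")

# Both forms return the same rows
identical(grade_ten_a, grade_ten_b)
```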
In the Pathways.Destination column, Open means that the course is open to any student in the grade, Academic is the highest difficulty for students aiming for university, and Applied is in between Open and Academic in terms of difficulty.
For grades 11 and 12 the pathways range from Workplace Preparation up to University/College Preparation, which is the higher difficulty.
Grade 9 Courses
#### Filtering
# Grade Nine Courses
grade_nine_courses <- course_data %>% filter(Grade == "Grade 9")
# Preview grade nine courses
head(grade_nine_courses, 20)
Grade 10 Courses
# Grade Ten Courses
grade_ten_courses <- course_data %>% filter(Grade == "Grade 10")
# Preview grade ten courses
head(grade_ten_courses, 20)
Grade 11 Courses
# Grade Eleven Courses
grade_eleven_courses <- course_data %>% filter(Grade == "Grade 11")
# Preview grade eleven courses
head(grade_eleven_courses, 20)
Grade 12 Courses
# Grade Twelve Courses
grade_twelve_courses <- course_data %>% filter(Grade == "Grade 12")
# Preview grade twelve courses
head(grade_twelve_courses, 20)
The previews above show only a sample of the courses for each grade. More inspection is needed.
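One quick way to inspect further is to count the courses per grade in one go instead of filtering four times. This is a sketch in base R on a toy dataframe with invented rows; with the real data you would swap in course_data.

```r
# Toy dataframe with invented course codes
toy <- data.frame(
  Course.Code = c("AAA1O", "BBB1D", "CCC2D", "DDD3U", "EEE4U", "FFF4C"),
  Grade = c("Grade 9", "Grade 9", "Grade 10",
            "Grade 11", "Grade 12", "Grade 12")
)

# Number of courses for each value in the Grade column
grade_counts <- table(toy$Grade)
grade_counts

# split() returns a named list with one dataframe per grade
by_grade <- split(toy, toy$Grade)
grade_twelve <- by_grade[["Grade 12"]]
```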
Filter Course Data By Topic
When it comes to filtering the course data by topic I want to extract the rows of the dataset that contain a certain word in a column. In this section I go through the Course.Description column and use the grepl() function. grepl() returns TRUE or FALSE for each row, so it can be used as the condition inside filter().
I extract a sample of courses.
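One thing to keep in mind is that grepl() is case-sensitive by default, so 'Music' will not match 'music'. A small sketch with invented descriptions, showing the ignore.case argument:

```r
# Invented course descriptions for illustration
descriptions <- c("Music", "Instrumental music", "Dance", "Musical Theatre")

# Case-sensitive by default: the lowercase "music" entry is missed
strict_match <- grepl("Music", descriptions)

# ignore.case = TRUE matches regardless of capitalization
loose_match <- grepl("music", descriptions, ignore.case = TRUE)
```

Note that the pattern matches anywhere in the text, which is why "Musical Theatre" is picked up as well.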
Music Courses
### Filtering By Type Of Course:
# Use grepl in the filter keyword. grepl('Pattern', column of dataframe)
# Music Courses (Music In Name):
music_courses <- course_data %>% filter(grepl('Music', Course.Description))
head(music_courses, 20)
Dance Courses
# Dance Courses:
dance_courses <- course_data %>% filter(grepl('Dance', Course.Description))
head(dance_courses, 20)
Courses With Levels
While going through the dataset .txt file I noticed a bunch of language courses with Level numbers under the Grade column. Here I filter the original dataset for rows where the Grade column contains the word Level.
# Courses with Levels In Their Name:
level_courses <- course_data %>% filter(grepl('Level', Grade))
head(level_courses, 20)
Science Courses
# Science Courses:
science_courses <- course_data %>% filter(grepl('Science', Course.Description))
science_courses
I did not use head(science_courses, n = 20) this time. There are 17 course offerings for Science. This search excludes Chemistry, Biology and Physics. More code is needed to get the rows with those 3 topics.
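One possible way to pick up those topics is a regular expression with | (meaning "or") between the words. This is only a sketch on invented descriptions. I have not run it against the real dataset, so the match count there is unknown.

```r
# Toy dataframe with invented descriptions
toy <- data.frame(
  Course.Description = c("Science", "Chemistry", "Biology",
                         "Physics", "Dance", "Environmental Science")
)

# "|" separates alternatives, so a row matching any one word is kept
science_pattern <- "Science|Chemistry|Biology|Physics"

# Base R subsetting; the same grepl() call also works inside filter()
science_all <- toy[grepl(science_pattern, toy$Course.Description), ,
                   drop = FALSE]
science_all
```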
Math Courses
# Math Courses:
math_courses <- course_data %>% filter(grepl('Math', Course.Description))
math_courses
Calculus, Functions and Advanced Functions (Pre-Calculus) are missing. More searches are needed to include those.
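If the list of search words grows, the pattern can be built from a vector with paste() and collapse = "|". A sketch with invented descriptions. The math_terms vector is my own guess at useful keywords, not something from the dataset.

```r
# Keywords to search for (an assumption, adjust as needed)
math_terms <- c("Math", "Calculus", "Functions")

# Join the keywords into one "or" pattern: "Math|Calculus|Functions"
math_pattern <- paste(math_terms, collapse = "|")

# Invented descriptions for illustration
descriptions <- c("Mathematics", "Calculus and Vectors",
                  "Advanced Functions", "Music")

# TRUE where any keyword appears in the description
math_match <- grepl(math_pattern, descriptions)
```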
Notes
I was surprised to see 1931 course offerings. My high school probably offered something like 100 courses at the time.
Math is a core course (up to grade 10 or 11). There are not many course offerings in math, while dance and music have so many lol.
The provincial government has recently revised the grade 9 mathematics curriculum. There is no longer grade 9 academic math, grade 9 applied mathematics or grade 9 workplace mathematics (lowest difficulty). All the grade nines take a destreamed grade 9 mathematics course. I've heard the class sizes are larger. Larger class sizes make it hard for teachers and for some students.