Using R Programming To Take A Look At Ontario Education Courses

Hi there. In this programming post I use R and RStudio to look at a sample of a dataset of Ontario school courses. The province of Ontario in Canada provides public datasets on its data.ontario.ca website.

The download link for the dataset appears in the Load Data section below.


Pixabay Image Source

 

Load Data


To start, I load the tidyverse library and the stringr library. The tidyverse is a collection of packages used for data manipulation, data wrangling and filtering dataframes. stringr is one of its core packages, so the second library() call is optional. For dataframe work, the tidyverse plays a role comparable to Python's pandas package.

# Ontario School Courses - Education Dataset Analysis

library(tidyverse)
library(stringr)

The link to the .txt dataset file is kind of long. I use the paste() function in R with sep = "" to concatenate the two pieces of the URL into one string. The data is then loaded into the variable course_data.

# Ontario.ca dataset:

link <- paste("https://data.ontario.ca/en/dataset/7902ebf1-dc53-4cfa-950e-fd6a351982c5",
        "/resource/1543646b-8930-4d39-bf2f-aa799e587d44/download/ministry_defined_courses_en.txt"
        , sep ="")

# Load data, column titles separated by |, use sep = "|"

course_data <- read.csv(link, header = TRUE, sep = "|")
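As a side note on the loading step, paste0() is shorthand for paste(..., sep = ""), and read.csv() accepts the same sep = "|" argument for inline text as it does for a URL. The sketch below uses a made-up two-row sample (the course codes and the two columns are placeholders, not rows from the real file), so it runs without a network connection.

```r
# paste0() joins strings with no separator, same as paste(..., sep = "")
base_url <- "https://data.ontario.ca/en/dataset/7902ebf1-dc53-4cfa-950e-fd6a351982c5"
resource <- "/resource/1543646b-8930-4d39-bf2f-aa799e587d44/download/ministry_defined_courses_en.txt"

link_paste  <- paste(base_url, resource, sep = "")
link_paste0 <- paste0(base_url, resource)
identical(link_paste, link_paste0)

# Hypothetical pipe-delimited sample standing in for the real file
sample_text <- "Course Code|Grade\nAAA1O|Grade 9\nBBB2O|Grade 10"

# text = reads from a string; sep = "|" splits the columns on the pipe character
sample_data <- read.csv(text = sample_text, header = TRUE, sep = "|")
sample_data
```

Note that read.csv() turns the space in "Course Code" into a dot, which is why the column names in this post look like Course.Code.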

 

You can preview the dataframe with head() and tail(). head() shows the first n rows and tail() shows the last n rows.

# Preview data:
head(course_data, n = 10)

tail(course_data, n = 10)

head_tail_coursedata.PNG

 

Using str() shows the dimensions of the dataframe, the column names, the column types and the first few values in each column.

# 1931 Courses
str(course_data)

str_course_data.PNG

 

The dimensions of the dataset are somewhat surprising to me. There are 1931 courses available in Ontario. I am sure there is no high school out there that offers all 1931 courses. I think different schools offer different elective courses along with the core subjects such as English, French (mandatory until grade 10), Science and Mathematics. Some high schools are arts focused and some are more technology based.

I am not a fan of the first column name for the course code. The column is read in as ï..Course.Code, so I rename it to just Course.Code. In R this can be done with colnames() along with which().

# Change column name Course Code:

colnames(course_data)[which(names(course_data) == "ï..Course.Code")] <- "Course.Code"
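The ï.. prefix typically shows up when a UTF-8 file begins with a byte order mark (BOM) and read.csv() does not strip it. Here is a minimal sketch, assuming that is the cause here. Passing fileEncoding = "UTF-8-BOM" drops the marker while reading, and renaming by position avoids typing the garbled name at all. The temporary file and its two columns are made up for illustration.

```r
# Write a tiny pipe-delimited file that starts with a UTF-8 byte order mark
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)                          # the BOM bytes
writeBin(charToRaw("Course Code|Grade\nAAA1O|Grade 9\n"), con)      # hypothetical row
close(con)

# Option 1: tell read.csv() about the BOM so the first column name comes in clean
fixed <- read.csv(tmp, header = TRUE, sep = "|", fileEncoding = "UTF-8-BOM")
names(fixed)

# Option 2: read as before, then rename the first column by position
raw_read <- read.csv(tmp, header = TRUE, sep = "|")
names(raw_read)[1] <- "Course.Code"
names(raw_read)

unlink(tmp)
```

Renaming by position is the more portable of the two, since the exact garbled name can differ between operating systems.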

Filter Data Set By Grade


In the province of Ontario, high school consists of grades 9, 10, 11 and 12. The filter() function from dplyr is heavily used here. The %>% pipe operator, which the tidyverse loads from the magrittr package, passes the object on its left as the first argument of the function on its right. Instead of filter(dataframe, <filter_condition>) you would write dataframe %>% filter(<filter_condition>).
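To make the pipe equivalence concrete, here is a small sketch on a made-up dataframe (toy_courses and its rows are placeholders, not real course codes). Both calls should return the same rows.

```r
library(dplyr)

# Hypothetical toy data standing in for course_data
toy_courses <- data.frame(
  Course.Code = c("AAA1O", "BBB2O", "CCC1O"),
  Grade = c("Grade 9", "Grade 10", "Grade 9")
)

# Nested form: the dataframe is the first argument
nested <- filter(toy_courses, Grade == "Grade 9")

# Piped form: the dataframe on the left is passed in as the first argument
piped <- toy_courses %>% filter(Grade == "Grade 9")

identical(nested, piped)
```

The piped form reads left to right, which helps once several steps are chained together.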

From the Pathways.Destination column, Open means that the course is open to any student in the grade, Academic is the highest difficulty and is meant for students aiming for university, and Applied is in between Open and Academic in terms of difficulty.

For grades 11 and 12 there is Workplace Preparation with University/College Preparation being the higher difficulty.

 

Grade 9 Courses

#### Filtering

# Grade Nine Courses

grade_nine_courses <- course_data %>% filter(Grade == "Grade 9")

# Preview grade nine courses
head(grade_nine_courses, 20)

grade_nine_courses.PNG

 

Grade 10 Courses

# Grade Ten Courses

grade_ten_courses <- course_data %>% filter(Grade == "Grade 10")

# Preview grade ten courses
head(grade_ten_courses, 20)

grade_ten_courses.PNG

 

Grade 11 Courses

# Grade Eleven Courses

grade_eleven_courses <- course_data %>% filter(Grade == "Grade 11")

# Preview grade eleven courses
head(grade_eleven_courses, 20)

grade_eleven_courses.PNG

 

Grade 12 Courses

# Grade Twelve Courses

grade_twelve_courses <- course_data %>% filter(Grade == "Grade 12")

# Preview grade twelve courses
head(grade_twelve_courses, 20)

grade_twelve_courses.PNG

 

The screenshots above show only the first 20 rows of each grade's subset. More inspection is needed to see the full list for each grade.
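One way to start that inspection is to count the courses per grade instead of previewing rows. The sketch below uses dplyr's count() and the %in% operator on a made-up dataframe (the rows are placeholders), assuming the real Grade column holds labels like "Grade 9" as in the filters above.

```r
library(dplyr)

# Hypothetical toy data standing in for course_data
toy_courses <- data.frame(
  Course.Code = c("AAA1O", "BBB2O", "CCC1O", "DDD3M", "EEE4U"),
  Grade = c("Grade 9", "Grade 10", "Grade 9", "Grade 11", "Grade 12")
)

# One row per grade, with the number of courses in column n
grade_counts <- toy_courses %>% count(Grade)
grade_counts

# Keep several grades in a single filter with %in%
junior_courses <- toy_courses %>%
  filter(Grade %in% c("Grade 9", "Grade 10"))
junior_courses
```

On the real data, the same count(Grade) call would also show the non-numeric entries in the Grade column, such as the Level entries covered later in this post.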



Filter Course Data By Topic


When it comes to filtering the course data by topic, I want to extract the rows of the dataset that contain a certain word in a column. In this section I go through the Course.Description column and use the grepl() function, which returns TRUE for each entry that matches a pattern. The grepl() call is used as the condition inside filter().

I extract a sample of courses.
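Before using grepl() inside filter(), it can help to see what it returns on its own. This is a small sketch on a made-up character vector (the descriptions are placeholders). The matching is case-sensitive unless ignore.case = TRUE is set.

```r
# Hypothetical descriptions, not taken from the dataset
descriptions <- c("Music", "Instrumental Music", "music theatre", "Dance")

# Case-sensitive by default: the lowercase "music" entry is not matched
grepl("Music", descriptions)
# → TRUE TRUE FALSE FALSE

# ignore.case = TRUE also picks up the lowercase entry
grepl("Music", descriptions, ignore.case = TRUE)
# → TRUE TRUE TRUE FALSE
```

filter() keeps the rows where this logical vector is TRUE.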

Music Courses

### Filtering By Type Of Course:
# Use grepl() inside filter(): grepl('pattern', column of dataframe)

# Music Courses (Music In Name):
music_courses <- course_data %>% filter(grepl('Music', Course.Description))

head(music_courses, 20)

music_courses.PNG

 

Dance Courses

# Dance Courses:
dance_courses <- course_data %>% filter(grepl('Dance', Course.Description))

head(dance_courses, 20)

dance_courses.PNG

 

Courses With Levels

While going through the dataset .txt file, I noticed a bunch of language courses with Level numbers under the Grade column. Here I filter the original dataset for rows where the Grade entry contains the word Level.

# Courses with Levels In Their Name:

level_courses <- course_data %>% filter(grepl('Level', Grade))

head(level_courses, 20)

level_courses.PNG

 

Science Courses

# Science Courses:

science_courses <- course_data %>% filter(grepl('Science', Course.Description))

science_courses

 

science_courses.PNG

 

I did not use head(science_courses, n = 20) this time. There are 17 course offerings with Science in the description. This search leaves out Chemistry, Biology and Physics, so more code is needed to get the rows for those 3 topics.
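One possible way to pick up those three topics is to put several words in one pattern, separated by the regular expression "or" symbol |. This is a sketch on a made-up dataframe (the codes and descriptions are placeholders). Whether the real descriptions contain exactly these words is an assumption I have not checked against the dataset.

```r
library(dplyr)

# Hypothetical toy data standing in for course_data
toy_courses <- data.frame(
  Course.Code = c("AAA", "BBB", "CCC", "DDD", "EEE"),
  Course.Description = c("Science", "Chemistry", "Biology", "Physics", "Dance")
)

# "|" inside the pattern means "match any one of these words"
science_pattern <- "Science|Chemistry|Biology|Physics"

science_all <- toy_courses %>%
  filter(grepl(science_pattern, Course.Description))

science_all
```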

Math Courses

# Math Courses:

math_courses <- course_data %>% filter(grepl('Math', Course.Description))

math_courses

 

math_courses.PNG

Calculus, Functions and Advanced Functions (Pre-Calculus) are missing. More searches are needed to include those.
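A sketch of one such search: build the pattern from a vector of keywords and match it with str_detect() from stringr, which was loaded at the start of the post. The keyword list and the toy rows are assumptions for illustration. A broad word like Functions could also match courses outside of math in the real data, so the result would need a manual check.

```r
library(dplyr)
library(stringr)

# Assumed keyword list; adjust after inspecting the real descriptions
math_keywords <- c("Math", "Calculus", "Functions")
math_pattern <- paste(math_keywords, collapse = "|")   # "Math|Calculus|Functions"

# Hypothetical toy data standing in for course_data
toy_courses <- data.frame(
  Course.Description = c("Mathematics", "Calculus and Vectors",
                         "Advanced Functions", "Music")
)

# regex(..., ignore_case = TRUE) makes the match case-insensitive
math_all <- toy_courses %>%
  filter(str_detect(Course.Description, regex(math_pattern, ignore_case = TRUE)))

math_all
```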

 



Notes


I was surprised to see 1931 course offerings. My high school probably offered something like 100 courses at the time.

Math is a core course (up to grade 10 or 11). There are not many course offerings in math, while dance and music have so many lol.

The provincial government has recently revised the grade 9 mathematics curriculum. There is no longer a grade 9 academic, grade 9 applied or grade 9 workplace (lowest difficulty) mathematics course. All grade nines take a destreamed grade 9 mathematics course. I have heard that the class sizes are larger. Larger class sizes make it hard for teachers and for some students.



 

Thank you for reading.
