[Image 1]
Hey it's a me again @drifter1! Today we continue with the Parallel Programming series about the OpenMP API. The previous article was an introduction that I suggest everyone read if they haven't already! Today we will get into the details of how we define Parallel Regions. So, without further ado, let's dive straight into it!
OpenMP is a parallel programming API that's meant for multi-threaded, shared-memory parallelism. It's built up of Compiler directives, Runtime library routines and Environment variables. OpenMP offers a level of abstraction without taking away the full control of parallelization from the programmer. OpenMP programs follow the fork-join model, where a master thread forks into a team of threads in parallel regions and joins back into one when the parallel region finishes.
The definition of a compiler directive in the C/C++ implementation of OpenMP is as follows:
#pragma omp directive-name [clause ...]
OpenMP programs can be compiled using the GNU compiler by adding the "-fopenmp" flag:
gcc -o output-name file-name.c -fopenmp
g++ -o output-name file-name.cpp -fopenmp
To FORK into a team of threads and start parallel execution of a block of code/statements on multiple threads, you have to use the parallel region construct or directive. After the parallel statements are finished the threads of course JOIN back into one.
Using the parallel region construct, defining a parallel block of code (or parallel region) in OpenMP is as simple as:
#pragma omp parallel
{
    /* This code runs in parallel */
}
By default, this construct creates a team of N threads, where N is determined at runtime and typically equals the number of CPU threads (logical processors) that are available. For example, on a system with a 6-core CPU with 2 threads on each core, N would typically be 12. The threads that are created are numbered from 0 (master thread) to N-1.
At the end of the parallel region an implied barrier synchronizes the team, and only the master thread continues execution past that point. To get more in-depth, because the master thread of the team always has a thread number of 0, checking if a thread is the one that continues executing is as simple as checking if its number is 0 or not.
The construct definition that we showcased so far is of course only the tip of the iceberg(!). Using clauses after "parallel", it's easy to define the number of threads, conditional parallelism and the data sharing behavior.
The number of threads to be executed in a parallel region can be defined using the num_threads clause:
#pragma omp parallel num_threads(N)
The same can also be achieved using a specific Runtime Library Routine called omp_set_num_threads. This routine takes in an integer and sets the number of threads to be used by the following parallel regions that have no num_threads clause:
int n = 4; /* example value */
omp_set_num_threads(n);
The number of threads that are executing in the current region can be retrieved using the runtime library routine omp_get_num_threads. When called in a sequential part of the program, the routine returns 1.
The routine takes no parameter and returns an integer:
int n;
n = omp_get_num_threads();
The thread number of a calling thread can be returned using the routine omp_get_thread_num. When inside a sequential region, the function returns 0, else it returns the thread number within the current team:
int thread_num;
thread_num = omp_get_thread_num();
To retrieve the upper bound of threads that can be used to form a team within a parallel region/construct, the routine omp_get_max_threads has to be used. This function tells us the maximum number of threads that would be used for a parallel region that has no num_threads clause, which means that its result reflects any earlier call to omp_set_num_threads:
int max_n;
max_n = omp_get_max_threads();
Using an if clause, parallelism can be made conditional. This means that parallelism can be enabled/disabled based on the evaluation of an expression inside the if clause. If the expression inside the parentheses of the if clause evaluates to true (nonzero), then the parallel region will execute in parallel. On the other hand, when the expression evaluates to false (0), the region will execute sequentially, using only one thread. For example, this can be quite useful for enabling/disabling parallelism using a global flag.
In C/C++ code:
#pragma omp parallel if(expression)
{
    /*
        Executes in parallel when expression evaluates to true (1)
        Executes sequentially when expression evaluates to false (0)
    */
}
As mentioned earlier, the num_threads clause is used for directly specifying the number of threads to be executed.
In C/C++ code:
#pragma omp parallel num_threads(N)
{
    /* A team of N threads executes the code block in parallel */
}
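Using the default clause we can specify the default data sharing behavior.
The data sharing behavior can be one of the following (note that in C/C++ the values private and firstprivate are only accepted since OpenMP 5.1, so older compilers accept only shared and none):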
#pragma omp parallel default(private | firstprivate | shared | none)
Using the private clause we can specify the variables that should be private.
In C/C++ code:
#pragma omp parallel private( list of variables )
Using the shared clause we can specify the variables that should be shared.
In C/C++ code:
#pragma omp parallel shared( list of variables )
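Note that by default (without a default clause that says otherwise) all variables that are declared outside of the parallel region are shared among the threads anyway. Variables that are declared inside the region are private to each thread.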
Let's print out a simple message in both sequential and parallel regions of code.
#include <omp.h>
#include <stdio.h>
int main(){
    // Sequential Region
    printf("Hello from Sequential Region\n");
    // Parallel Region
    #pragma omp parallel
    {
        printf("Hello from Parallel Region\n");
    }
    // Sequential Region
    printf("Hello from Sequential Region, again\n");
    return 0;
}
Without any clauses, how many times will the message in the parallel region be printed out?
Let's extend the previous program to:
#include <omp.h>
#include <stdio.h>
int main(){
    // Sequential Region
    printf("Hello from Sequential Region\n");
    printf("Number of executing threads is: %d\n", omp_get_num_threads());
    // Parallel Region
    #pragma omp parallel num_threads(8)
    {
        // Only master thread
        if(omp_get_thread_num() == 0){
            printf("Number of executing threads is: %d\n", omp_get_num_threads());
        }
        // All threads
        printf("Hello from thread %d\n", omp_get_thread_num());
    }
    // Sequential Region
    printf("Hello from Sequential Region, again\n");
    printf("Number of executing threads is: %d\n", omp_get_num_threads());
    return 0;
}
Compiling and Running the Code:
Let's again modify the previous code to run in parallel only when a specific flag is set to true. For that we have to use the if clause.
The modified code is:
#include <omp.h>
#include <stdio.h>
#define PARALLELISM_ENABLED 0
int main(){
    ...
    // Parallel Region
    #pragma omp parallel if(PARALLELISM_ENABLED) num_threads(8)
    ...
}
With parallelism disabled the output is:
Let's lastly also get into data sharing:
#include <omp.h>
#include <stdio.h>
#define PARALLELISM_ENABLED 1
int main(){
    int thread_num;
    ...
    // Parallel Region
    #pragma omp parallel if(PARALLELISM_ENABLED) private(thread_num) num_threads(8)
    {
        // Retrieve thread number
        thread_num = omp_get_thread_num();
        // Only master thread
        if(thread_num == 0){
            printf("Number of executing threads is: %d\n", omp_get_num_threads());
        }
        // All threads
        printf("Hello from thread %d\n", thread_num);
    }
    ...
}
Running this code we get the same output as before:
Keep on drifting!