[Image 1]
Today we continue with the Parallel Programming series about the OpenMP API. I highly suggest you to go read the previous articles of the series, that you can find by the end of this one. Today we will get into how we use Task Constructs.
So, without further ado, let's get straight into it!A parallel region is defined using a parallel region construct:
#pragma omp parallel [clause ...]
{
/* This code runs in parallel */
}
Parallel For Loops are defined by using the following syntax:
int i;
#pragma omp parallel for private(i) [clause ...]
for(i = 0; i < N; i++){
...
}
Non-iterative work-sharing is defined using a sections construct with nested section directives:
#pragma omp sections [clause ...]
{
/* run in parallel by all threads of the team */
#pragma omp section
{
/* run once by one thread */
}
#pragma omp section
{
/* run once by one thread */
}
...
}
To create sections and a team of parallel threads at the same time we can use the shortcut:
#pragma omp parallel sections
When only one section inside a parallel region has to be executed by one thread then we can use the single construct:
#pragma omp single
Critical Sections are specified using the following construct:
#pragma omp critical [[(name)] [ hint (hint-expr)]] [clause ...]
{
/* critical section code */
}
Atomic operations are defined as:
#pragma omp atomic [read | write | update | capture] [hint(hint-expr)] [clause ...]
/* statement expression*/
When parallelizing a specific algorithm is too difficult using for loops and sections, then task constructs come into play!
The general format of such a construct is:
#pragma omp task [clause ...]
{
/* code block */
}
The explicit task can further be configured using the following clauses:
The difference with the parallelization that has been covered up to now is that the thread that creates the task, doesn't necessarily execute the task as well. Parallel constructs create implicit tasks, whist a task construct creates a somewhat explicit tasks. Explicit tasks are created by a running thread and later executed, probably by a different thread. The exact timing is handled by a task scheduler.
This makes tasks useful for parallelizing processes that don't fit the previously covered constructs, such as while loops or operations on data structures like linked lists, trees etc.
When some kind of synchronization is needed, then the taskwait directive is OpenMP's weapon of choice.
The taskwait directive is defined as:
#pragma omp taskwait
and specifies a wait on the completion of child tasks of the current task.
The code after a taskwait only gets executed after all the tasks before the taskwait have finished their execution:
#pragma omp task
/* task code */
#pragma omp task
/* task code */
...
#pragma omp taskwait
/* code after taskwait */
If the execution of a task can be suspended in favor of the execution of a different task then the taskyield directive is used.
The taskyield directive is defined as:
#pragma omp taskyield
A Taskloop construct is used to specify that iterations of one or more associated loops should be executed in parallel using tasks.
The General format of a Taskloop Construct is:
#pragma omp taskloop [clause ...]
/* for loops */
A Taskloop can further be configure using the following clauses:
Such a work-sharing loop should not be confused with parallel for loops, as the sharing works in a different way...
Tasks can also be grouped together using taskgroup constructs.
The general format of a Taskgroup Construct is:
#pragma omp taskgroup
{
#pragma omp task
/* task code*/
#pragma omp task
/* task code*/
...
}
When a running task finds a taskgroup, it pauses and doesn't execute again, until all tasks in the taskgroup have finished.
A parallel for loop can be defined using a parallel for construct, but also using tasks.
In order to get a for loop inside of a parallel region to execute in parallel using tasks, the for loop is declared as single (pragma omp single) and the code-block inside of the for loop then contains a task.
This way, each iteration only gets one task, which then is scheduled to be executed by a thread of the team.
The following two for loops have identical behavior in OpenMP:
#include <omp.h>
#include <stdio.h>
#define N 10
int main(){
int i;
/* parallel for */
printf("Parallel For:\n");
#pragma omp parallel for
for (i = 0; i < N; i++){
printf("Thread %d runs iteration %d\n", omp_get_thread_num(), i);
}
/* parallel for using tasks */
printf("\nParallel For using Tasks:\n");
#pragma omp parallel
{
# pragma omp single
for (i = 0; i < N; i++){
#pragma omp task
printf("Thread %d runs iteration %d\n", omp_get_thread_num(), i);
}
}
return 0;
}
Compiling and executing this code on my system, with 8 threads, we have the following output:
In both cases the loops are executed unordered and by different threads...
Let's now pass through a list of integers, first in serial fashion and then using a parallel while loop using tasks...
The structure of a List can be defined as:
typedef struct List{
int data;
struct List *next;
}List;
In order to add items an addItem() function is necessary.
For the sake of simplicity this implementation will add the item at the beginning.
Meaning that the newly added item becomes the list head.
In C-code such a function looks as follows:
List* addItem(List *list, int data){
List *newList = NULL;
newList = (List*) malloc(1 * sizeof(List));
newList->next = list;
newList->data = data;
return newList;
}
Filling such a list with random integers is quite easy, and can be done as follows:
int i, r;
List *list = NULL;
/* fill list with N random items */
srand(time(NULL));
for(i = 0; i < N; i++){
r = 1 + rand() % 100;
list = addItem(list, r);
printf("Added %d to list\n", r);
}
To process a list a dummy/temporary list p is created, which can be passed through using p = p -> next.
The list ends when p equals NULL and so the processing can be easily implemented using a while loop.
In C-code this looks as follows:
List *p = NULL;
p = list;
while (p != NULL){
printf("Processing %d...\n", p->data);
p = p->next;
}
In order to process the same List using Tasks, the while loop is defined as single (pragma omp single) and the print/process statement is defined as a task.
In C-code:
p = list;
#pragma omp parallel
{
#pragma omp single
{
while(p != NULL){
#pragma omp task
{
printf("Thread %d processes %d\n", omp_get_thread_num(), p->data);
}
p = p->next;
}
}
}
The output of this program is:
The items are printed in the opposite order when Serial Processing, because of the way that items are inserted to the list. On the other hand, using Tasks the elements of the list are processed in a random fashion, and by different threads each time.

Keep on drifting!