Programming - Tasks in OpenMP

[Image 1]

Introduction

Hey it's a me again @drifter1!

Today we continue with the Parallel Programming series about the OpenMP API. I highly suggest that you read the previous articles of the series, which you can find at the end of this one. Today we will get into how we use Task Constructs.

So, without further ado, let's get straight into it!

GitHub Repository


Requirements - Prerequisites


Quick Recap

A parallel region is defined using a parallel region construct:

#pragma omp parallel [clause ...]
{
    /* This code runs in parallel */
}
Parallel For Loops are defined by using the following syntax:
int i;
#pragma omp parallel for private(i) [clause ...]
for(i = 0; i < N; i++){
    ...
}
Non-iterative work-sharing is defined using a sections construct with nested section directives:
#pragma omp sections [clause ...]
{
    /* run in parallel by all threads of the team */
    #pragma omp section
    { /* run once by one thread */ }
    #pragma omp section
    { /* run once by one thread */ }
    ...
}
To create sections and a team of parallel threads at the same time we can use the shortcut:
#pragma omp parallel sections
When only one section inside a parallel region has to be executed by one thread then we can use the single construct:
#pragma omp single
Critical Sections are specified using the following construct:
#pragma omp critical [[(name)] [ hint (hint-expr)]] [clause ...]
{
    /* critical section code */
}
Atomic operations are defined as:
#pragma omp atomic [read | write | update | capture] [hint(hint-expr)] [clause ...]
    /* statement expression*/


Task Construct

When parallelizing a specific algorithm with for loops and sections is too difficult, task constructs come into play!

The general format of such a construct is:

#pragma omp task [clause ...]
{
    /* code block */
}

The explicit task can further be configured using the following clauses:

  • default, private, firstprivate and shared clauses - Define Lists of Variables with specific Data-Sharing
  • if(condition) clause - Specify Conditional Parallelism
  • untied clause - If present, any thread in the team can resume the task region after being suspended
  • mergeable clause - Specify that the task is mergeable
  • depend(dependence-type : list) clause - Enforce additional constraints on the scheduling of tasks or loop iterations (see the small sketch after this list)
  • priority(numerical-value) clause - Non-negative numerical scalar expression that specifies a hint for the priority of the generated task
  • final(scalar-logical-expr) clause - Specify that the generated task is the final task if the expression evaluates to true
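
Since the depend clause is the least obvious of these, here is a minimal sketch of how it can be used (my own illustrative example with made-up variables x and y, not code from the article):

#include <stdio.h>

int main(){
    int x = 0, y = 0;
    #pragma omp parallel
    #pragma omp single
    {
        /* producer task: writes x */
        #pragma omp task depend(out: x)
        x = 10;

        /* consumer task: only scheduled after the producer task has finished */
        #pragma omp task depend(in: x) depend(out: y)
        y = x + 1;
    }
    printf("x = %d, y = %d\n", x, y);
    return 0;
}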

The difference with the parallelization covered up to now is that the thread that creates a task doesn't necessarily execute it as well. Parallel constructs create implicit tasks, whilst a task construct creates an explicit task. Explicit tasks are created by a running thread and executed later, possibly by a different thread. The exact timing is handled by the task scheduler.

This makes tasks useful for parallelizing processes that don't fit the previously covered constructs, such as while loops or operations on data structures like linked lists, trees etc.
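
To see this "created by one thread, executed by possibly another" behavior in action, here is a small sketch of my own (not taken from the article's repository):

#include <omp.h>
#include <stdio.h>

int main(){
    int x = 42;
    #pragma omp parallel
    {
        #pragma omp single
        {
            printf("Thread %d creates the task\n", omp_get_thread_num());

            /* firstprivate(x) gives the task its own copy of x at creation time */
            #pragma omp task firstprivate(x)
            printf("Thread %d executes the task, x = %d\n", omp_get_thread_num(), x);
        }
    } /* implicit barrier: the task has certainly finished at this point */
    return 0;
}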


Task Synchronization

Taskwait Directive

When some kind of task synchronization is needed, the taskwait directive is OpenMP's weapon of choice.

The taskwait directive is defined as:

#pragma omp taskwait
and specifies a wait on the completion of child tasks of the current task.

The code after a taskwait only gets executed after all the child tasks created before the taskwait have finished their execution:

#pragma omp task
/* task code */
#pragma omp task
/* task code */
...
#pragma omp taskwait
/* code after taskwait */
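
A classic use case for taskwait is recursion. The sketch below (a common textbook-style Fibonacci example, not code from this article's repository) spawns two child tasks per call and uses taskwait to make sure both results are ready before they are combined:

#include <omp.h>
#include <stdio.h>

int fib(int n){
    int a, b;

    if (n < 2) return n;

    #pragma omp task shared(a)
    a = fib(n - 1);
    #pragma omp task shared(b)
    b = fib(n - 2);

    /* wait for both child tasks before using a and b */
    #pragma omp taskwait

    return a + b;
}

int main(){
    int result;
    #pragma omp parallel
    #pragma omp single
    result = fib(10);
    printf("fib(10) = %d\n", result);
    return 0;
}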

Taskyield Directive

When the execution of the current task may be suspended in favor of the execution of a different task, the taskyield directive is used to mark that scheduling point.

The taskyield directive is defined as:

#pragma omp taskyield
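
A typical use case, similar to the lock-polling pattern found in the official OpenMP examples, looks roughly like the sketch below (locks are only covered in the next article, so treat this as a preview rather than the article's own code):

#include <omp.h>
#include <stdio.h>

/* each task polls the lock; while the lock is busy the task yields,
   so the thread that runs it can pick up other tasks instead of spinning */
void process(omp_lock_t *lock, int n){
    int i;
    for (i = 0; i < n; i++){
        #pragma omp task firstprivate(i)
        {
            /* ... independent work ... */
            while (!omp_test_lock(lock)){
                #pragma omp taskyield
            }
            printf("Thread %d got the lock for task %d\n", omp_get_thread_num(), i);
            omp_unset_lock(lock);
        }
    }
}

int main(){
    omp_lock_t lock;
    omp_init_lock(&lock);

    #pragma omp parallel
    #pragma omp single
    process(&lock, 4);

    omp_destroy_lock(&lock);
    return 0;
}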


Taskloop Construct

A Taskloop construct is used to specify that iterations of one or more associated loops should be executed in parallel using tasks.

The general format of a Taskloop Construct is:

#pragma omp taskloop [clause ...]
    /* for loops */

A Taskloop can further be configured using the following clauses:

  • default, private, firstprivate and shared clauses - Define Lists of Variables with specific Data-Sharing
  • if(condition) clause - Specify Conditional Parallelism
  • untied clause - If present, any thread in the team can resume the task region after being suspended
  • mergeable clause - Specify that the task is mergeable
  • priority(numerical-value) clause - Non-negative numerical scalar expression that specifies a hint for the priority of the generated task
  • final(scalar-logical-expr) clause - Specify that the generated task is the final task if the expression evaluates to true
  • grainsize(grain-size) clause - Cause the number of logical loop iterations assigned to each created task to be greater than or equal to the minimum of the value of the grain-size expression and the number of logical loop iterations, but less than two times the value of grain-size
  • num_tasks(num-tasks) clause - Create as many tasks as the minimum of the num-tasks expression and the number of logical loop iterations
  • collapse(n) clause - Specify how many loops are associated with the loop construct

Such a loop should not be confused with a parallel for loop, as the work is shared in a different way: a parallel for divides the iterations directly among the threads of the team according to a loop schedule, while a taskloop packs them into explicit tasks that the task scheduler then hands out to the threads...
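
To make this a bit more concrete, here is a minimal sketch of my own (with an assumed array a and an arbitrary grain size), showing a taskloop that chops the iterations into tasks of at least 100 iterations each:

#include <omp.h>
#include <stdio.h>
#define N 1000

int main(){
    int i;
    double a[N];

    #pragma omp parallel
    #pragma omp single
    {
        /* the iterations are packed into tasks of at least 100 iterations each */
        #pragma omp taskloop grainsize(100)
        for (i = 0; i < N; i++){
            a[i] = 2.0 * i;
        }
    }

    printf("a[N - 1] = %f\n", a[N - 1]);
    return 0;
}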


Taskgroup Construct

Tasks can also be grouped together using taskgroup constructs.

The general format of a Taskgroup Construct is:

#pragma omp taskgroup
{
    #pragma omp task
    /* task code */
    #pragma omp task
    /* task code */
    ...
}

When a task encounters a taskgroup construct, it waits at the end of the taskgroup region until all the child tasks generated inside it (and all of their descendant tasks) have finished.
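
The following minimal sketch (again my own example, not from the article) shows that the end of the taskgroup waits only for the tasks created inside it, while a task created before the taskgroup may still be running:

#include <omp.h>
#include <stdio.h>

int main(){
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        printf("background task, may finish at any time\n");

        #pragma omp taskgroup
        {
            int i;
            for (i = 0; i < 4; i++){
                #pragma omp task firstprivate(i)
                printf("grouped task %d run by thread %d\n", i, omp_get_thread_num());
            }
        } /* waits here for the 4 grouped tasks, but not for the background task */

        printf("all grouped tasks are done\n");
    }
    return 0;
}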


Example Programs

Parallel For using Tasks

A parallel for loop can be defined using a parallel for construct, but also using tasks.

In order to get a for loop inside a parallel region to execute in parallel using tasks, the loop itself is placed inside a single construct (#pragma omp single), and the code-block inside of the for loop is turned into a task. This way, each iteration generates exactly one task, which is then scheduled to be executed by some thread of the team.

The following two for loops behave equivalently in OpenMP:

#include <omp.h>
#include <stdio.h>
#define N 10

int main(){
    int i;

    /* parallel for */
    printf("Parallel For:\n");
    #pragma omp parallel for
    for (i = 0; i < N; i++){
        printf("Thread %d runs iteration %d\n", omp_get_thread_num(), i);
    }

    /* parallel for using tasks */
    printf("\nParallel For using Tasks:\n");
    #pragma omp parallel
    {
        #pragma omp single
        for (i = 0; i < N; i++){
            /* firstprivate(i) gives each task its own copy of the loop counter */
            #pragma omp task firstprivate(i)
            printf("Thread %d runs iteration %d\n", omp_get_thread_num(), i);
        }
    }

    return 0;
}

Compiling and executing this code on my system, with 8 threads, gives the following output:

In both cases the loops are executed unordered and by different threads...

Parallel While Loop using Tasks

Let's now pass through a list of integers, first in a serial fashion and then in parallel, using a while loop with tasks...

List Structure

The structure of a List can be defined as:

typedef struct List{
    int data;
    struct List *next;
}List;

In order to add items, an addItem() function is necessary. For the sake of simplicity, this implementation adds each item at the beginning, so that the newly added item becomes the list head.

In C-code such a function looks as follows:

List* addItem(List *list, int data){
    List *newList = NULL;

    newList = (List*) malloc(1 * sizeof(List));
    newList->next = list;
    newList->data = data;

    return newList;
}

Fill List with random Values

Filling such a list with random integers is quite easy, and can be done as follows:

int i, r;
List *list = NULL;

/* fill list with N random items */
srand(time(NULL));
for(i = 0; i < N; i++){
    r = 1 + rand() % 100;
    list = addItem(list, r);
    printf("Added %d to list\n", r);
}

Serial processing

To process the list, a temporary pointer p is created, which can walk through it using p = p->next. The list ends when p equals NULL, so the processing can easily be implemented with a while loop.

In C-code this looks as follows:

List *p = NULL;

p = list;
while (p != NULL){
    printf("Processing %d...\n", p->data);
    p = p->next;
}

Parallel Processing using Tasks

In order to process the same list using tasks, the while loop is placed inside a single construct (#pragma omp single) and the print/process statement is turned into a task.

In C-code:

p = list;
#pragma omp parallel
{
    #pragma omp single
    {
        while(p != NULL){
            /* firstprivate(p) gives each task its own copy of the current node */
            #pragma omp task firstprivate(p)
            {
                printf("Thread %d processes %d\n", omp_get_thread_num(), p->data);
            }
            p = p->next;             
        }
    }
}

Console Output

The output of this program is:

During Serial Processing the items are printed in the reverse of their insertion order, because new items are added at the head of the list. Using Tasks, on the other hand, the elements of the list are processed in a nondeterministic order, and by different threads each time.


RESOURCES:

References

  1. https://www.openmp.org/resources/refguides/
  2. https://computing.llnl.gov/tutorials/openMP/
  3. https://bisqwit.iki.fi/story/howto/openmp/
  4. https://nanxiao.gitbooks.io/openmp-little-book/content/

Images


Previous articles about the OpenMP API


Final words | Next up

And this is actually it for today's post!

Next time we will get into Barriers and Locks...

See ya!

Keep on drifting!
