[Image 1]
Hey it's a me again drifter1! Today marks the start of a new Parallel Programming series. After covering the basics of multi-threading and multi-process parallelization in Networking, a little bit of MPI (Message Passing Interface) for Distributed Programming, and the OpenMP (Open Multi-Processing) API for easier multi-threaded, shared-memory parallelism, it's now time to get into more advanced topics. By advanced I of course mean highly multi-threaded processing, which can be achieved by using GPUs, for example.
For GPU Computing there are two main APIs out there: Nvidia's CUDA and OpenCL (Open Computing Language).
Because the CUDA API is implemented specifically for Nvidia Graphics Cards, it's also much easier to begin with, and thus this series will be about Nvidia's CUDA API!
Last, but not least, this series will be guided by Nvidia's Documentation on CUDA, but also by my own knowledge and the skills that I gained from various projects.
So, without further ado, let's dive straight into it!
The Documentation of the API is fantastic, meaning that all possible installations should be covered.
I personally have a GeForce GTX 1080 Ti, which is based on the Pascal Architecture, and I am using Ubuntu 20.04 LTS as my operating system.
To install the CUDA Toolkit on a GNU/Linux System like Ubuntu, there are basically two choices: the distribution-specific packages (Deb or RPM) or the distribution-independent runfile installer.
So, after verifying that the GPU and Operating System are CUDA-Capable by following the Pre-Installation Actions, it's as simple as:
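1. Install the repository meta-data (sudo dpkg -i ...)
2. Install the GPG key (sudo apt-key add ..., sudo apt-key adv ..., etc.)
3. Update the apt repository cache (sudo apt-get update)
4. Install CUDA (sudo apt-get install cuda)
So, why should you care? Why is general-purpose parallel computing using the GPU so popular?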
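Well, it's simple: GPUs and CPUs are designed for different purposes. A CPU is built to execute a small number of threads as fast as possible, whereas a GPU is built to execute thousands of threads in parallel, trading slower single-thread performance for much higher overall throughput.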
Most applications have to mix parallel and sequential parts, and so CPUs and GPUs are combined in order to maximize the overall performance. If the application benefits from a high degree of parallelism, then the massively parallel nature of the GPU will of course achieve higher performance than CPUs. If the application is mostly sequential, then parallelism can even make things less efficient, which is of course also a problem in CPU multi-processing or multi-threading!
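To put some rough numbers on this, here is a small sketch in plain C++ (no GPU needed) that uses Amdahl's law, which is my own choice of model and not something taken from the CUDA documentation. The function name and its parameters are made up for illustration, and the formula ignores launch and data-transfer overhead, which is exactly what can push a mostly sequential program below a speedup of 1 in practice.

```cpp
// Amdahl's law: upper bound on the speedup when a fraction of the runtime
// (parallel_fraction, between 0 and 1) is spread over `workers` parallel
// units and the rest stays sequential.
// Illustrative helper, not part of any CUDA API. Overheads are ignored.
double amdahl_speedup(double parallel_fraction, double workers) {
    const double sequential_part = 1.0 - parallel_fraction;
    const double parallel_part = parallel_fraction / workers;
    return 1.0 / (sequential_part + parallel_part);
}
```

For example, amdahl_speedup(0.5, 1000.0) stays just below 2, no matter how many more workers are added: the sequential half of the program dominates. Only when almost everything can run in parallel does a massively parallel device really pay off.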
So, after this brief Introduction to the world of GPU Computing, let's now head back to CUDA!
The Nvidia CUDA API is a general-purpose parallel computing platform and programming model that uses Nvidia GPUs in order to solve complex computational problems. CUDA comes with a software environment that can be used from the C/C++ programming language as a high-level API. CUDA is also supported by other programming languages, APIs and directive-based approaches, which include, but are not limited to, FORTRAN, DirectCompute and OpenACC.
CUDA has a gentle learning curve for programmers familiar with C/C++, as it's based on three key abstractions: a hierarchy of thread groups, shared memories, and barrier synchronization.
Using these abstractions, CUDA provides data and thread parallelism at its core. Solving a problem using the GPU is as simple as partitioning the problem into sub-problems that can be solved independently in parallel by blocks of threads. Each sub-problem is then split further into smaller pieces that can be solved cooperatively in parallel by all threads within the block.
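The partitioning idea can be tried out on the CPU first. The following is only a sketch in plain C++ that imitates the blocks-and-threads decomposition with two nested loops; names such as scale_on_grid, block_idx and thread_idx are hypothetical, and real CUDA code would use a kernel with the built-in index variables instead. It assumes threads_per_block is greater than zero.

```cpp
#include <cstddef>
#include <vector>

// CPU-only imitation of how a grid of thread blocks covers an array.
// Every "thread" multiplies exactly one element by `factor`.
std::vector<float> scale_on_grid(const std::vector<float>& input, float factor,
                                 std::size_t threads_per_block) {
    const std::size_t n = input.size();
    // Round up, so a partially filled last block is still created.
    const std::size_t num_blocks = (n + threads_per_block - 1) / threads_per_block;
    std::vector<float> output(n);

    // Outer loop: the independent sub-problems (blocks).
    // On a GPU these would run in parallel, in any order.
    for (std::size_t block_idx = 0; block_idx < num_blocks; ++block_idx) {
        // Inner loop: the threads of one block, one element each.
        for (std::size_t thread_idx = 0; thread_idx < threads_per_block; ++thread_idx) {
            const std::size_t i = block_idx * threads_per_block + thread_idx;
            if (i < n) {  // threads past the end of the data do nothing
                output[i] = input[i] * factor;
            }
        }
    }
    return output;
}
```

With 5 elements and 2 threads per block this creates 3 blocks, and the last block has one thread that fails the bounds check and simply idles. That guard is what lets the data size be independent of the block size.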
GPUs are built around an array of Streaming Multiprocessors (SMs).
A multi-threaded program is partitioned into blocks of threads that execute independently of each other, and these blocks are distributed across the available SMs. That way, GPUs with more multiprocessors automatically execute programs faster than GPUs with fewer multiprocessors. Similarly, GPUs that can keep more blocks and more threads per block active at once also execute highly parallel programs much faster.
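Here is a toy model of that scaling behaviour in plain C++. It is heavily simplified and rests on my own assumptions: every SM runs a single block at a time, all blocks take equally long, and blocks are handed out round-robin. Real GPUs keep several blocks resident per SM and schedule dynamically, so treat the function names and the model as illustrative only. num_sms must be greater than zero.

```cpp
#include <algorithm>
#include <vector>

// Hand out blocks to SMs round-robin and count how many each SM receives.
std::vector<unsigned> blocks_per_sm(unsigned num_blocks, unsigned num_sms) {
    std::vector<unsigned> load(num_sms, 0);
    for (unsigned b = 0; b < num_blocks; ++b) {
        load[b % num_sms] += 1;
    }
    return load;
}

// The busiest SM determines how many rounds the whole program needs.
unsigned rounds_needed(unsigned num_blocks, unsigned num_sms) {
    const std::vector<unsigned> load = blocks_per_sm(num_blocks, num_sms);
    return *std::max_element(load.begin(), load.end());
}
```

For a program with 8 blocks, a GPU with 2 SMs needs 4 rounds in this model, while a GPU with 4 SMs needs only 2, without any change to the program itself.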
Nvidia GPUs have a number of CUDA cores, which roughly indicates how many instructions can be executed in parallel per clock cycle. How many threads per block, and how many blocks in general, the program should use depends on the application. CUDA has some limits per block, dimension etc. that also depend on the architecture and compute capability. In the end it's mostly trial-and-error with such parameters in order to get the best results. There are of course some guidelines that should always be followed!
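To make the trial-and-error a bit more systematic, a tiny helper can list what each candidate block size would mean for a given problem size. This is a plain C++ sketch with hypothetical names (LaunchConfig, make_config, follows_guidelines). The two constants are assumptions on my part that hold for current Nvidia GPUs (threads are scheduled in groups of 32, and a block can have at most 1024 threads), but the real limits should always be checked for the specific compute capability.

```cpp
#include <cstddef>

// Assumed values, check them against your GPU's compute capability.
constexpr std::size_t kWarpSize = 32;
constexpr std::size_t kMaxThreadsPerBlock = 1024;

struct LaunchConfig {
    std::size_t blocks;
    std::size_t threads_per_block;
    std::size_t idle_threads;  // launched threads with no element to process
};

// Common guideline: a block size that is a multiple of the warp size
// and does not exceed the per-block limit.
bool follows_guidelines(std::size_t threads_per_block) {
    return threads_per_block > 0 &&
           threads_per_block <= kMaxThreadsPerBlock &&
           threads_per_block % kWarpSize == 0;
}

// Number of blocks needed to cover n elements with one thread per element.
// Assumes threads_per_block > 0.
LaunchConfig make_config(std::size_t n, std::size_t threads_per_block) {
    const std::size_t blocks = (n + threads_per_block - 1) / threads_per_block;
    return {blocks, threads_per_block, blocks * threads_per_block - n};
}
```

For 1000 elements, make_config(1000, 256) gives 4 blocks with 24 idle threads. Such a helper only narrows down the candidates; which configuration is actually fastest still has to be measured on the GPU.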
The thread and block hierarchy will be discussed in depth next time, where we will also write our first CUDA program!
And this is actually it for today's post!
Next time we will get into more details around the Thread Hierarchy in CUDA!
See ya! Keep on drifting!