Debugging OpenCL programs with Oclgrind

Share on linkedin
Share on twitter
Share on facebook
Share on google

James Price is currently completing a PhD degree at the Department of Computer Science, University of Bristol.

When developing programs that utilise GPU compute via OpenCL, we can’t use our traditional CPU development tools. This can make debugging complex OpenCL kernels challenging. As part of my PhD, funded by Imagination Technologies, I’ve developed an OpenCL device simulator called Oclgrind, which enables a raft of tools that can make debugging OpenCL kernels a much simpler task.


At its core, Oclgrind simulates how an OpenCL device executes a kernel. It does this in a manner which is independent from any specific architecture, which enables it to uncover many portability issues that arise during OpenCL development. By exposing a simple plugin interface, Oclgrind enables the creation of a wide variety of tools that can be used to analyse or debug OpenCL kernels.

Using Oclgrind is straightforward, as it implements the full OpenCL 1.2 runtime API. This means that existing OpenCL programs can be simulated without requiring modifications. An alternative interface for simulating specific kernels in isolation is also provided.

Detecting invalid memory accesses

Accessing memory locations that have not been properly allocated is a common problem that can cause headaches for developers using GPU compute. This varies between different platforms, but many GPUs don’t have the capability to detect these errors at the hardware level and can’t produce meaningful feedback about what went wrong.

Oclgrind provides a simple plugin that will check each memory access that your OpenCL kernels perform to ensure they do not access memory that they shouldn’t, and provides clear diagnostics to aid the developer in finding the problem when something does go wrong. Take the following kernel as an example – a trivial parallel vector addition.

kernel void vecadd(global float *a, global float *b, global float *c)
  int i = get_global_id(0);
  c[i] = a[i] + b[i];

Running this kernel through Oclgrind with a global work size of 1024 but using buffers that are only 4000 bytes in size produces the following error message:

Invalid write of size 4 at global memory address 0x3000000000fa0
  Kernel: vecadd
  Entity: Global(1000,0,0) Local(8,0,0) Group(31,0,0)
    store float %7, float addrspace(1)* %8, align 4, !dbg !21
At line 8 of
  c[i] = a[i] + b[i];

This shows us that we were trying to write 4 bytes (one 32-bit float) to global memory. It gives us the specific kernel and line of code, and also tells us which work-item was responsible. In this case, it’s clear that we need to guard these memory accesses by checking that the index is within the bounds of our problem.

Detecting race conditions

Another common bug that arises when writing parallel programs is when insufficient synchronisation is provided when memory is accessed concurrently from multiple units of execution, resulting in a race condition. These defects can be difficult to spot when dealing with complex programs, and might only cause problems on certain devices.

Consider the simple kernel depicted below. Here we have a shared variable named ‘temp’ which is allocated in the local address space, and is therefore shared between work-items in the same work-group. In this kernel, the work-item with a local ID of 1 assigns a value to this variable, and then the work-item with local ID 0 tries to read and use the variable.

kernel void racy(global float *input, global float *output)
  local float temp;
  int lid = get_local_id(0);

  if (lid == 1)
    temp = input[lid];
  if (lid == 0)
  output[0] = temp;

We can enable the data-race detection plugin in Oclgrind by passing the –data-races flags. If we simulate the above kernel, Oclgrind will inform us of the potential race with a message like this:

Read-write data race at local memory address 0x1000000000000
  Kernel: racy
  First entity: Global(1,0,0) Local(1,0,0) Group(0,0,0)
    store float %5, float addrspace(3)* @racy.temp, align 4, !dbg !23
  At line 7 of
    temp = input[lid];
  Second entity: Global(0,0,0) Local(0,0,0) Group(0,0,0)
    %7 = load float, float addrspace(3)* @racy.temp, align 4, !dbg !26
  At line 10 of
    output[0] = temp;

This tells us that we have a data-race on a local memory address, between two work-items with IDs 0 and 1. It also shows the lines of code that were responsible for the problem. Adding a barrier(CLK_LOCAL_MEM_FENCE) between these two statements and then running the code again gives a clean bill of health.

Interactive source-line debugging

Sometimes, getting error messages after-the-fact is not enough to get to the root of a problem, and you need to really delve into a kernel to find out why it is misbehaving. Oclgrind provides a plugin that allows a developer to interactively step through kernel execution in a GDB-like command-line environment. The plugin can be enabled by passing the –interactive flag when we launch Oclgrind. When kernel execution begins, we will be provided with a prompt at which we can enter a variety of commands. Much like GDB, we can step through our kernel line-by-line, print out the values of variables or memory addresses, or instruct the plugin to break at a particular line of code (type help for a full list of commands). This plugin also interacts with the other plugins, and will automatically break into a prompt if an error is encountered.

Checking for runtime API errors

Writing OpenCL programs isn’t just about developing kernels, and there’s plenty of issues that can arise when using the various runtime API functions from the host. The error messages that these API calls return can sometimes be confusing, or they could indicate multiple possible causes. By passing the –check-api flag to Oclgrind we can get it to give us human-readable error messages that describe the exact problem that an API call is trying to report:

Oclgrind - OpenCL runtime error detected
  Function: clEnqueueNDRangeKernel
  local_work_size[0]=32 does not divide global_work_size[0]=1000

Additional information

Oclgrind is an open source tool, and is provided under a BSD license. Binaries for Windows, Linux and OS X are available.

More information about using Oclgrind can be found on its Wiki.

Please leave a comment below

Comment policy: We love comments and appreciate the time that readers spend to share ideas and give feedback. However, all comments are manually moderated and those deemed to be spam or solely promotional will be deleted. We respect your privacy and will not publish your personal details.

Blog Contact

If you have any enquiries regarding any of our blog posts, please contact:

United Kingdom
Tel: +44 (0)1923 260 511

Search by Tag

Search for posts by tag.

Search by Author

Search for posts by one of our authors.

Featured posts
Popular posts

Related blog articles

Three great reasons to visit the Imagination booth at GDC 2019

The Game Developer Conference is a highlight of the year for anyone involved in the graphics world and naturally, Imagination is there, located at booth S763, at the Moscone Center, in San Francisco. If you’re looking to find out a

Making Unreal Engine and Unity profiling on PowerVR easier

The PowerVR Developer Technology team has always been fully committed to making development for PowerVR easier, by continually improving our groundbreaking tools and SDK. Our ecosystem is very important to us, so we’re focusing on the things developers really need

Product and event round-up from the experts in GPU and AI

It’s certainly been a busy few months for Imagination. Towards the latter end of last year, we released a raft of new products and initiatives, and a new CEO took the helm giving us real momentum for 2019. At the

Stay up-to-date with Imagination

Sign up to receive the latest news and product updates from Imagination straight to your inbox.

  • This field is for validation purposes and should be left unchanged.