GPU Shader Tutorial
This tutorial is currently a work in progress. Content may be added, updated, or removed at any time.

Shader Advanced - Branching

A question that might have come up often in past chapters is why simple conditional logic is not used to execute different bits of code, instead of transforming the conditional logic into mathematical operations.

Examples:


highp float diagonal1Factor = step(0.5, randomFactor) * diagonal1Color;
highp float diagonal2Factor = invert_step(0.5, randomFactor) * diagonal2Color;
// highp float diagonal1Factor = step(0.4, randomFactor) * diagonal1Color;
// highp float diagonal2Factor = invert_step(0.6, randomFactor) * diagonal2Color;
highp float diagonalFactor = max(diagonal1Factor, diagonal2Factor);

highp float getColorShiftFactor(highp vec3 color) {
  return clamp(ceil(3.0 - (color.r + color.g + color.b)), 0.0, 1.0);
}

void main() {
  highp float colorShift = cos(time / 500.0);
  highp vec4 textureColor = texture2D(diffuseTextureSampler, uv);
  highp float finalColorShift = getColorShiftFactor(textureColor.rgb) * colorShift;
  gl_FragColor = vec4(clamp(textureColor.rgb - finalColorShift, 0.0, 1.0), textureColor.a);
}
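
For comparison, here is a rough sketch of what the same two pieces of logic might look like if written with plain conditionals. It assumes that invert_step(edge, x) is equivalent to 1.0 - step(edge, x), that both diagonal colors are non-negative, and that randomFactor, diagonal1Color and diagonal2Color arrive as the varying and uniforms declared below. Those declarations, the "Branching" function names, and the way main combines the two results are hypothetical and exist only to make the sketch a complete shader.

```glsl
precision highp float;

// Hypothetical inputs, named after the variables used in the snippets above.
varying highp float randomFactor;
varying highp vec2 uv;
uniform highp float diagonal1Color;
uniform highp float diagonal2Color;
uniform sampler2D diffuseTextureSampler;

// Conditional form of the diagonal selection.
// Assumes invert_step(edge, x) == 1.0 - step(edge, x) and non-negative colors.
highp float getDiagonalFactorBranching() {
  if (randomFactor >= 0.5) {
    return diagonal1Color;
  }
  return diagonal2Color;
}

// Conditional form of getColorShiftFactor:
// 0.0 when the color is pure white (r + g + b reaches 3.0), 1.0 otherwise.
highp float getColorShiftFactorBranching(highp vec3 color) {
  if (color.r + color.g + color.b >= 3.0) {
    return 0.0;
  }
  return 1.0;
}

void main() {
  highp vec4 textureColor = texture2D(diffuseTextureSampler, uv);
  // Combined here only so that both functions are exercised in one shader.
  highp float shift = getColorShiftFactorBranching(textureColor.rgb)
    * getDiagonalFactorBranching();
  gl_FragColor = vec4(clamp(textureColor.rgb - shift, 0.0, 1.0), textureColor.a);
}
```

Both conditions depend on per-fragment values, so neighbouring fragments can take different paths through these functions.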

The reason is that when a GPU hits a branch, it commonly runs the code for every possible branch outcome and then keeps only the results of the outcome that actually applies.

Since the GPU relies on running many calculations in parallel, branches force it to spend time executing several paths of the same shader code for a single vertex/fragment, time that could otherwise be spent on other vertices/fragments.

This means that if a significant amount of code only runs when a condition holds, the GPU can waste a lot of time executing code whose results are then discarded.

However, branches are not always discouraged. Some cases where they are acceptable:

  • Branches that are based on the value of a uniform shouldn't lead to a performance bottleneck, since such a branch will always have the same result, irrespective of which vertex or fragment is being operated on.
  • Similarly, if a branch is consistent (it always has the same outcome), there shouldn't be a performance bottleneck.
  • Branches that are used to set the value for a variable (ex: variable = condition ? value1 : value2) can be performed efficiently by GPUs.
  • Branches that are consistent over a group of pixels (ex: 8x8 group) should not produce a major performance penalty*.
  • Certain other cases of branching may be optimized so that they do not cause major performance degradation*.

* - This is GPU and driver dependent, so such code would require extensive testing to verify.
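
As a rough illustration of the first and third cases, the sketch below branches on a uniform and uses a conditional expression to pick a value. The uniform invertColors, the 0.5 brightness threshold and the 0.25 shade value are hypothetical and chosen only for the example; diffuseTextureSampler and uv follow the earlier snippet.

```glsl
precision highp float;

uniform bool invertColors; // hypothetical flag, identical for every fragment in a draw call
uniform sampler2D diffuseTextureSampler;
varying highp vec2 uv;

void main() {
  highp vec4 textureColor = texture2D(diffuseTextureSampler, uv);
  highp vec3 color = textureColor.rgb;

  // Branch on a uniform: every fragment in the draw call takes the same path.
  if (invertColors) {
    color = 1.0 - color;
  }

  // Branch used only to choose a value for a variable.
  highp float brightness = (color.r + color.g + color.b) / 3.0;
  highp float shade = brightness > 0.5 ? 1.0 : 0.25;

  gl_FragColor = vec4(color * shade, textureColor.a);
}
```

Whether either form is actually free depends on the GPU and driver, so this should be treated as a starting point for measurement rather than a guarantee.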

Note: Certain GLSL functions used so far (ex: clamp) don't carry a branching penalty, even though they might be expected to cause branching.

This is because either the OpenGL shader compiler optimizes such functions, or the GPU has special hardware or driver-level optimizations, so that such code does not incur the cost of a branch.

Some of these functions and operations have been supported by GPUs before they supported branching operations (ex: clamp).
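
To make this concrete for clamp: the GLSL specification defines clamp(x, minVal, maxVal) as min(max(x, minVal), maxVal), so it can be expressed without any conditional at all. The sketch below places a conditional version next to that min/max form; the function names clampWithBranches and clampWithMinMax and the 0.25 offset are hypothetical, and how a given GPU actually implements the built-in is not something this sketch claims to show.

```glsl
precision highp float;

uniform sampler2D diffuseTextureSampler;
varying highp vec2 uv;

// What clamp might look like if written with conditionals.
highp float clampWithBranches(highp float x, highp float minVal, highp float maxVal) {
  if (x < minVal) {
    return minVal;
  }
  if (x > maxVal) {
    return maxVal;
  }
  return x;
}

// The definition given for clamp in the GLSL specification: no conditionals.
highp float clampWithMinMax(highp float x, highp float minVal, highp float maxVal) {
  return min(max(x, minVal), maxVal);
}

void main() {
  highp vec4 textureColor = texture2D(diffuseTextureSampler, uv);
  highp float shifted = textureColor.r - 0.25; // may fall below 0.0
  // Produces the same value as clamp(shifted, 0.0, 1.0) when minVal <= maxVal.
  gl_FragColor = vec4(vec3(clampWithMinMax(shifted, 0.0, 1.0)), textureColor.a);
}
```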

Summary

  • Branching in shaders is generally discouraged, since it can negatively impact performance except in certain scenarios.
  • Test to see if a branch affects performance, but remember that the result can be GPU and driver dependent. Preferably, use branches only when you have to.