A new article about using Intel TBB is here. It contains examples that use C++ lambdas and combine multi-threaded loops with SIMD code.
In this article we will transform a plain C loop into a multi-threaded version using Intel's Threading Building Blocks (TBB) library.
Here is the loop to transform:
{CODE brush: cpp; ruler: true;}
// Both buffers are accessed byte by byte.
unsigned char *SrcImagePtr = (unsigned char *)SrcImage;
unsigned char *DstImagePtr = (unsigned char *)DstBuffer;
for (int i = (OriginalImageWidth * OriginalImageHeight); i > 0; i--)
{
    // Weighted sum of the three channels, in fixed point.
    int YValue = (SrcImagePtr[0] * FirstFactor ) +
                 (SrcImagePtr[1] * SecondFactor) +
                 (SrcImagePtr[2] * ThirdFactor );
    SrcImagePtr += PixelOffset;
    // Round to nearest, then drop the SCALING_LOG fractional bits.
    YValue += 1 << (SCALING_LOG - 1);
    YValue >>= SCALING_LOG;
    // Clamp to the 8-bit range.
    if (YValue > 255)
        YValue = 255;
    *DstImagePtr = (unsigned char)YValue;
    DstImagePtr++;
}{/CODE}
This loop iterates over a three-channel image named SrcImage (usually an RGB one) and computes the luma value of each pixel, storing it into DstBuffer. Since the computation for one pixel does not depend on any other pixel, it is very simple to split the work across multiple threads, each processing a different slice of the image.
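To make the independence between pixels concrete, here is a minimal sketch of the same loop body wrapped in a function that converts only the pixels in a half-open range [first, last). The names `LumaParams` and `LumaSlice` are hypothetical, and the parameter values are left to the caller because the loop above does not define them.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical parameter bundle; it mirrors the free variables of the loop above.
struct LumaParams {
    int firstFactor;   // weight of channel 0
    int secondFactor;  // weight of channel 1
    int thirdFactor;   // weight of channel 2
    int scalingLog;    // number of fractional bits in the weights
    int pixelOffset;   // bytes from one source pixel to the next
};

// Converts pixels [first, last) only. Pixel i is read from
// src + i * pixelOffset and written to dst[i], so two calls with
// disjoint ranges never touch the same output byte.
void LumaSlice(const unsigned char* src, unsigned char* dst,
               std::size_t first, std::size_t last, const LumaParams& p)
{
    assert(p.scalingLog > 0 && p.pixelOffset >= 3);
    const unsigned char* s = src + first * p.pixelOffset;
    for (std::size_t i = first; i < last; ++i, s += p.pixelOffset) {
        int y = s[0] * p.firstFactor +
                s[1] * p.secondFactor +
                s[2] * p.thirdFactor;
        y += 1 << (p.scalingLog - 1);  // round to nearest
        y >>= p.scalingLog;            // drop the fractional bits
        if (y > 255)
            y = 255;                   // clamp to 8 bits
        dst[i] = static_cast<unsigned char>(y);
    }
}
```

Calling `LumaSlice` once over the whole image, or several times over disjoint ranges in any order, should fill the destination buffer with the same bytes. That property is what allows each slice to be handed to a different thread.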
Although we could use threads directly for such a task, it is much simpler and faster to rely on a dedicated library such as Intel's Threading Building Blocks.
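For comparison, the direct-threads route might look roughly like the sketch below, which uses C++11 `std::thread` rather than TBB. The helper `RunInSlices` is a hypothetical name introduced here for illustration. It shows the bookkeeping (computing slice bounds, starting the threads, joining them) that a library is meant to take over.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: splits [0, total) into at most threadCount
// contiguous slices and runs work(first, last) for each slice on its
// own thread. Returns once every slice has finished.
void RunInSlices(std::size_t total, unsigned threadCount,
                 const std::function<void(std::size_t, std::size_t)>& work)
{
    assert(work != nullptr);
    if (threadCount == 0)
        threadCount = 1;
    // Slice length, rounded up so that the slices cover every index.
    const std::size_t chunk = (total + threadCount - 1) / threadCount;

    std::vector<std::thread> workers;
    for (std::size_t first = 0; first < total; first += chunk) {
        const std::size_t last = std::min(first + chunk, total);
        workers.emplace_back([&work, first, last] { work(first, last); });
    }
    for (std::thread& t : workers)
        t.join();
}
```

With the loop above, `work` would start from `SrcImagePtr + first * PixelOffset` and `DstImagePtr + first` and process `last - first` pixels. Even this small version has to choose a thread count and slice size by hand, and it creates new threads on every call.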