Saturday 30 January 2016

Empty loop not responding to exit condition (also: how to multi-thread on Windows)

I had the following code:
#include <windows.h>
#include <process.h>
#include <cstdint>

HANDLE * threadHandles;
bool exitThreads;
void threadEntryPoint(void * data)
{
    while (exitThreads == false){}
    _endthreadex(1);
}

void launchThreads(int nThreads)
{
    exitThreads = false;
    threadHandles = new HANDLE[nThreads];
    for (uint32_t i = 0; i < nThreads; ++i) {
        threadHandles[i] =
            (HANDLE)_beginthreadex(0,
            0,
            (uint32_t(__stdcall *)(void*))threadEntryPoint,
            NULL,
            0,
            0);
    }
}

void killThreads(int nThreads)
{
    exitThreads = true;
    for (uint32_t i = 0; i < nThreads; ++i) {
        WaitForSingleObject(threadHandles[i], INFINITE);
        CloseHandle(threadHandles[i]);
    }
    delete[] threadHandles;
}
This code launches nThreads threads, each of which loops pointlessly until you call killThreads(). Ignore why I chose _beginthreadex() over _beginthread(). This is a good way to max out the cores on your CPU.

Conceptually this should work. And yet it didn't. The program seemed to hang, as there was none of the expected output. Thankfully I had only launched it with 3 threads on a quad-core machine - I suspect launching it with 4 would've caused the machine to hang. The first thing I tried was debug mode. Multi-threading can get tricky, so I might as well start by debugging it. Oddly enough, it worked as expected in debug mode. That was weird; maybe the threads weren't printing properly? I then polluted my code with a bunch of cout statements with flushes in release mode, and found that it would get to WaitForSingleObject() and then stop printing. So for whatever reason the threads were not exiting.

A quick Google search turned up this beauty. It seems that the compiler optimized the release build by not loading the value of exitThreads on every loop iteration: nothing inside the loop modifies it, so the compiler assumed it is always false. By adding the keyword volatile before the bool exitThreads declaration, the compiler is forced to have the loop load the value every iteration - because the value may change in ways the compiler cannot see. I was aware of this concept before, but it's not something I deal with often, so it's easy for it to slip the mind.
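
For what it's worth, standard C++ does not define volatile as a thread synchronization tool, so a more portable way to express the same intent is std::atomic<bool>. The sketch below is not the code I ran; it is a minimal rewrite under some assumptions: it swaps _beginthreadex() and the HANDLE array for std::thread and a std::vector (the workers container is a name I made up for illustration), and it changes killThreads() to return the number of joined threads so the result is easy to check.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Shared exit flag. Every load of a std::atomic is a real, synchronized read,
// so the compiler may not hoist it out of the loop.
static std::atomic<bool> exitThreads{false};

// Hypothetical container standing in for the HANDLE array of the original.
static std::vector<std::thread> workers;

// Spin until the flag is set, as in the original loop.
void threadEntryPoint()
{
    while (!exitThreads.load(std::memory_order_acquire)) {
        // Busy-wait on purpose: the goal is to keep a core occupied.
    }
}

void launchThreads(std::size_t nThreads)
{
    exitThreads.store(false, std::memory_order_release);
    workers.reserve(nThreads);
    for (std::size_t i = 0; i < nThreads; ++i) {
        workers.emplace_back(threadEntryPoint);
    }
}

// Sets the flag, joins every worker, and returns how many were joined.
std::size_t killThreads()
{
    exitThreads.store(true, std::memory_order_release);
    std::size_t joined = 0;
    for (std::thread& t : workers) {
        assert(t.joinable());
        t.join();
        ++joined;
    }
    workers.clear();
    return joined;
}
```

Calling launchThreads(3) followed by killThreads() should return 3 in both debug and release builds, since the loop has to re-read the flag on every iteration.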