Multithreading in C, POSIX style

Multithreading — An Overview

In most modern operating systems it is possible for an application to split into many "threads" that all execute concurrently. It might not be immediately obvious why this is useful, but there are several good reasons to do it.

When a program is split into many threads, each thread acts like its own individual program, except that all the threads work in the same memory space, so all their memory is shared. This makes communication between threads fairly simple, but there are a few caveats that will be noted later.

So, what does multithreading do for us?

Well, for starters, multiple threads can run on multiple CPUs at the same time, which can improve performance. A multithreaded application works just as well on a single-CPU system, but without the added speed. As processors that can run several threads at once become commonplace, such as dual-core processors and Intel Pentium 4s with Hyper-Threading, multithreading will be one of the simplest ways to boost performance.

Secondly, and often more importantly, it allows the programmer to divide a program into separate jobs, each of which runs independently of the others. This becomes particularly important when threads are doing blocking I/O operations: while one thread waits for I/O, the others keep running.

A media player, for example, can have a thread for pre-buffering the incoming media (possibly from a hard drive, CD, DVD, or network socket), a thread to process user input, and a thread to play the actual media. A stall in any single thread won't keep the others from doing their jobs.

For the operating system, switching between threads of the same process is normally cheaper than switching between processes. This is because the threads share one address space, so the memory management information (such as page tables) doesn't change; only the register set and stack pointer do, which means less state to save and restore on each context switch.

Multithreading — Basic Concepts

Multithreaded applications often require synchronization objects. These objects are used to keep multiple threads from modifying the same memory at the same time, which could leave the data in an inconsistent state.

The first, and simplest, is an object called a mutex. A mutex is like a lock. A thread can lock it, and then any subsequent attempt to lock it, by the same thread or any other, will cause the attempting thread to block until the mutex is unlocked. These are very handy for keeping data structures correct from all the threads' points of view. For example, imagine a very large linked list. If one thread deletes a node at the same time that another thread is trying to walk the list, it is possible for the walking thread to fall off the list, so to speak, if the node is deleted or changed. Using a mutex to "lock" the list keeps this from happening.

Computer scientists will tell you that mutex stands for mutual exclusion.
In Java, Mutex-like behaviour is accomplished using the synchronized keyword.

Technically speaking, only the thread that locked a mutex should unlock it. For ordinary POSIX mutexes, an unlock by any other thread is undefined, and some systems will simply allow it. Doing this is, of course, a Bad Idea. If you need this kind of functionality, read on about the semaphore in the next paragraph.

Similar to the mutex is the semaphore. A semaphore is like a mutex that counts instead of locks. If the count reaches zero, the next attempt to down the semaphore will block until another thread increases it. This is useful for resource management when there is more than one resource, or when two separate threads are using the same resource in coordination. Common terminology for using semaphores is "upping" and "downing", where upping increases the count and downing decreases it, blocking while it is zero.
Java provides a class called java.util.concurrent.Semaphore which does the same thing, using acquire() for downing and release() for upping.

With a name as cool-sounding as semaphore, even Computer Scientists couldn't think up what this is short for. (Yes, I know that a semaphore is a signal or flag ;)

Unlike mutexes, semaphores have no owner: any thread may up or down a semaphore, and many threads can use the same one at once. If you create a semaphore with a count of 1 (a "binary" semaphore), it will act just like a mutex, except that any thread can unlock it.

The third and final structure is the thread itself. More specifically, thread identifiers. These are useful for getting certain threads to wait for other threads, or for getting threads to tell other threads interesting things.

Computer Scientists like to refer to the pieces of code protected by mutexes and semaphores as Critical Sections. In general, it's a good idea to keep Critical Sections as short as possible to allow the application to be as parallel as possible. The larger the critical section, the more likely it is that multiple threads will hit it at the same time, causing stalls.

In POSIX, the types we'll be dealing with are pthread_t for thread identifiers, pthread_mutex_t for mutexes, and sem_t for semaphores. We use the word "pthread" a lot because it stands for POSIX Threads.

Compiling Multithreaded Programs

Compiling multithreaded applications will require a few minor tweaks to our build setup. First, we'll need to include the appropriate header file. For POSIX systems, this header is called pthread.h. This header declares all the functions we'll be using to make threads. If we're using semaphores, we'll also need to include semaphore.h.

#include <pthread.h>
#include <semaphore.h>

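The next change is that we'll need to link our program with the pthread library to use its functions. With gcc, the -pthread option is the usual choice, since it sets any needed preprocessor flags as well as linking the library. Adding -lpthread at link time also works on most systems:

gcc -pthread myProgram.c -o myProgram
gcc myProgram.o -o myProgram -lpthread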

Now that we've got the header in place, and we know how to link our program, let's get started.

Creating a thread

Creating a pthread is fairly easy. The function pthread_create is used, and it takes 4 arguments.

int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg);

The first argument is a pointer to a pthread_t, where the function stores the identifier of the newly created thread. The next argument is the attribute argument. This is typically NULL, but it can also point to a structure that changes the thread's attributes. The third argument is the function the new thread starts in; it takes a single void * and returns a void *. When the thread returns from this function, the thread terminates. You can think of the function as main, since it behaves similarly. The final argument is passed to the function when the thread starts. This is similar to the argc/argv command-line arguments to main, but since it is a void *, it can point to any type of data. pthread_create returns zero on success; on failure it returns an error number (such as EAGAIN) rather than setting errno.

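Inside the thread function, a thread can terminate itself by returning from the thread function or by calling pthread_exit. Returning a value from the thread function is equivalent to calling pthread_exit with that value, but pthread_exit also works from deep inside nested function calls. (The exception is the initial thread: returning from main() ends the whole process, while calling pthread_exit in main() ends only that thread.)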

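A thread can also be "detached" with pthread_detach. When a detached thread terminates, the system reclaims its resources (such as its stack) automatically, without anyone calling pthread_join. Memory the thread allocated with malloc, and locks it still holds, are not released. A detached thread can't be waited on.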

Stopping a thread

Sometimes an application may wish to stop a thread that is currently executing. The function pthread_cancel can help us accomplish this.

int pthread_cancel(pthread_t thread);

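The only argument to pthread_cancel is the thread identifier for the thread to be cancelled. It returns zero if the cancellation request was sent, or an error code otherwise. Note that it only sends a request: with the default (deferred) cancellation type, the target thread acts on it when it next reaches a cancellation point, such as pthread_testcancel, sleep, or a blocking read.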

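A thread can control whether it can be cancelled at all by calling pthread_setcancelstate with PTHREAD_CANCEL_ENABLE or PTHREAD_CANCEL_DISABLE. While cancellation is disabled, a cancellation request stays pending until cancellation is enabled again.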

Mutexes and Semaphores

Mutexes are fairly easy to create. The function we use is pthread_mutex_init, which takes 2 parameters. The first is a pointer to the pthread_mutex_t being initialized. The second parameter is usually NULL, but can also point to a pthread_mutexattr_t structure that specifies different attributes for the mutex. A statically allocated mutex with default attributes can instead be initialized with PTHREAD_MUTEX_INITIALIZER.

To lock and unlock a mutex, use pthread_mutex_lock and pthread_mutex_unlock. These both take 1 parameter: a pointer to the mutex being operated on. pthread_mutex_trylock is similar to pthread_mutex_lock, except that if the mutex is already locked, it returns EBUSY immediately instead of blocking.

When the mutex is no longer needed, it can be destroyed with pthread_mutex_destroy.


Semaphores follow a similar paradigm. They are initialized with sem_init, which takes 3 parameters. The first is a pointer to the semaphore being initialized. The second, pshared, is zero for a semaphore shared between the threads of a single process, which is all we need here. A nonzero value makes the semaphore shareable between processes (it must then live in shared memory), but not every system supports this. The third argument specifies the initial value of the newly created semaphore.

To "up" a semaphore, use sem_post. To "down" a semaphore, use sem_wait. sem_wait roughly parallels pthread_mutex_lock, and sem_post parallels pthread_mutex_unlock.

sem_destroy is used to destroy a semaphore once it is no longer needed.

Multithreading — Waiting for other threads

It is also possible to make one thread stop and wait for another thread to finish. This is accomplished with pthread_join. This function takes a pthread_t identifier to pick which thread to wait for, and a void ** parameter to capture the return value. Joining a thread that has already exited is allowed and returns immediately; it also releases the resources the system kept for that thread, such as its stack and exit status. Until a terminated, non-detached thread is joined, those resources stay allocated. In GNU/Linux, as well as other UNIX-like operating systems, such unjoined threads are sometimes informally called zombies, by analogy with zombie processes.

Note that only one thread may join any given thread; having several threads join the same target gives undefined results. A detached thread (see pthread_detach) can't be waited on at all.

Here's some example code to illustrate pthread_join:


#include <stdio.h>
#include <pthread.h>
#include <unistd.h>	/* for usleep() */

/* This is our thread function.  It is like main(), but for a thread */
void *threadFunc(void *arg)
{
	char *str;
	int i = 0;

	str=(char*)arg;

	while(i < 10 )
	{
		usleep(1);
		printf("threadFunc says: %s\n",str);
		++i;
	}

	return NULL;
}

int main(void)
{
	pthread_t pth;	// this is our thread identifier
	int i = 0;

	/* Create worker thread */
	pthread_create(&pth,NULL,threadFunc,"processing...");

	/* wait for our thread to finish before continuing */
	pthread_join(pth, NULL /* void ** return value could go here */);

	while(i < 10 )
	{
		usleep(1);
		printf("main() is running...\n");
		++i;
	}

	return 0;
}


Running this code will produce all ten lines from threadFunc() first, and then ten lines from main(), because main() waits in pthread_join() before starting its own loop.

Multithreading — Example Source

Here's some example code to illustrate thread creation:


#include <pthread.h>
#include <stdio.h>
#include <unistd.h>	/* for usleep() */

/* This is our thread function.  It is like main(), but for a thread*/
void *threadFunc(void *arg)
{
	char *str;
	int i = 0;

	str=(char*)arg;

	while(i < 110 )
	{
		usleep(1);
		printf("threadFunc says: %s\n",str);
		++i;
	}

	return NULL;
}

int main(void)
{
	pthread_t pth;	// this is our thread identifier
	int i = 0;

	pthread_create(&pth,NULL,threadFunc,"foo");
	
	while(i < 100)
	{
		usleep(1);
		printf("main is running...\n");
		++i;
	}

	printf("main waiting for thread to terminate...\n");
	pthread_join(pth,NULL);

	return 0;
}



The output will be lines from main() and threadFunc() interleaved, though the exact order is up to the scheduler and changes from run to run. The usleep() calls give each thread frequent chances to be switched out; without them, each thread can print many lines in a row, because printing a line takes far less time than a full time slice.

We could capture the thread's return value in the pthread_join() call by passing the address of a void * variable instead of NULL as the second argument.

Performance Considerations

When designing an application for threads, or converting an existing program, there are a few performance costs to keep in mind.

First, thread creation is relatively expensive: spawning thousands of short-lived threads usually isn't time-effective. If you need to run many small tasks, a common pattern used to reduce this cost is a "Thread Pool". At startup, the application spawns a number of threads and hands tasks to them on demand. When a task completes, its thread returns to the pool for reuse later. Fancier implementations will dynamically close threads when there's too much of a surplus, or spawn additional threads when there's a shortage.

Each additional thread also gets its own stack. The default stack size can be large (often several megabytes), which can consume a lot of address space, especially in 32-bit applications. The pthreads API lets you request a smaller stack with pthread_attr_setstacksize. For small numbers of threads this usually isn't a concern, but it's something to keep in mind.

Lock contention (when two or more threads are trying to acquire the same lock) requires skillful design to keep as many threads operating in parallel as possible. There are several volumes of literature on ways to design locks, lock hierarchies, and other variations to mitigate this cost.

Multithreading Terms

There are many terms used when writing multithreaded applications. I'll try to describe a few of them here.

Deadlock — A state where two or more threads each hold a lock that the others need to finish. For example, if one thread has locked mutex A and needs to lock mutex B to finish, while another thread is holding mutex B and is waiting for mutex A to be released, they are in a state of deadlock. The threads are stuck, and cannot finish. One way to avoid deadlock is to acquire necessary mutexes in the same order (always get mutex A then B). Another is to see if a mutex is available via pthread_mutex_trylock, and release any held locks if one isn't available.

Race Condition — A bug in which the program's result depends on the order or timing in which threads happen to run, so it only works when the threads execute in a certain sequence. Race conditions happen when mutexes are used improperly, or not at all.

Thread-Safe — A library that is designed to be used in multithreaded applications is said to be thread-safe. If a library is not thread-safe, then only one thread should call that library's functions, or every call into it must be serialized, for example by guarding them all with a single mutex.