Evolution of a Parallel Program
Shared Memory - pthreads
The C program is extended to utilize the POSIX thread functions found
on our SMP linux machines with the
Pthreads Library. This allows
different threads of execution to run in concert together in a single
memory and process space, but yet take advantage of the multiple
processors. One of the critical things to keep in mind here is that
both threads (one for each processor) will be running in the same
memory space concurrently. This means that access to variables
in the threaded part of the program must be controlled to avoid data
mangling by the separate threads. There are many methods to control
access to shared memory [ie semaphores, mutex's and critical
sections], but this problem does not require any sophisticated data
sharing and the approach used here exemplifies simplicity and
reliability.
After having analyzed our program we decide that for larger ranges
the bulk of the computation resides in the "findPrimes" function.
This indicates that the program as a whole will speed up if we can
increase the resources given to this function, this is commonly known
as
Amdahl's Law. Since a speedup in this function would induce a
speedup for the entire calculation we focus our attention here. This
function is thus our candidate for multithreading.
An instantiation of a new pthread allows for the specification of a
function for that thread of execution and for a pointer to a data
block cast as (void *). This highlights the difference
between the prototypes to the serial and threaded versions
of the computational part of the program.
int isPrime(int testNum) {
int i;
if(testNum < 2) return 0;
if(testNum == 2) return 1;
for( i=2; i < sqrt(testNum)+1; i++)
if (!(testNum % i))
return 0;
return 1;
}
void findPrimes (void* Data) {
int *Params = (int *)Data;
int i, j, x, y;
const int ID = Params[0];
const int lbound = Params[1];
const int ubound = Params[2];
prime_count[ID] = 0;
printf("ID = %d :: range = [%d, %d]\n",
ID, lbound, ubound);
for(i = lbound; i < ubound; i+=4) {
if (isPrime(i)) prime_count[ID]++;
if (isPrime(i+1)) prime_count[ID]++;
}
pthread_exit((void*)&prime_count[ID]);
}
The most crucial difference here is that the counter,
prime_count[] is an array and is indexed by a unique thread ID,
specific to each thread. This is essential since prime_count
is in shared memory space and if we were to use a single variable
then both threads will attempt to access it, unreliably, and we will
get indeterminate results without careful management of access.
Another difference here is the in the loop of findPrimes().
Notice that isPrime() is called 2 times a cycle and the loop
iterator is incremented by 4. This is a load-balancing
technique that i) takes advantage of the fact that even numbers are
not prime, so that with 2 threads, one thread does not check
exclusively even numbers, and ii) the threads check alternate odd
numbers (based on their starting point, handled below) so that there
is an inherent balance in the amount of work done by each thread.
Also note that since the findPrimes() function will run
inside a thread, a graceful exit is assured with a call to
pthread_exit to deallocate any thread management resources
and to ensure proper return of the critical data from the thread.
This could equally well be done in the standard C manner with a
function return value.
The control of the program is slightly modified as well. We see that
some thought must be given to how the workload will be pursed out and
this requires, in this case, of how the data will be used inside the
thread itself. Since we have chosen a "leap-frog" approach to avoid
difficulties of poor load balancing we must stagger the start and end
points of the ranges of data given to the threads. This is seen in
the calculation of the even[] and odd[] arrays
before the threads are invoked.
int main(int argc, char* argv[]) {
pthread_t thread[Num_Proc];
int i, retval0, retval1;
int Range[3] = {0, 100, 0};
int odd[3], even[3];
if (argc == 3) {
for (i = 1; i < argc; i++) Range[i-1] = atoi(argv[i]);
}else if(argc != 1){
return 1;
}
even[0] = 0;
even[1] = Range[0];
even[2] = Range[1] - 2;
odd[0] = 1;
odd[1] = Range[0] + 2;
odd[2] = Range[1];
pthread_create(&thread[0], NULL, (void *)&findPrimes,(void *)&even);
pthread_create(&thread[1], NULL, (void *)&findPrimes,(void *)&odd);
pthread_join(thread[0], (void*)&retval0);
printf("Sub : %d primes found.\n", *(int*)retval0);
pthread_join(thread[1], (void*)&retval1);
printf("Sub : %d primes found.\n", *(int*)retval1);
printf("Tot : %d primes found.\n", *(int*)retval0 + *(int*)retval1);
return 0;
}
When the threads return they are resolved by joining
according to the saved threadID's and the return values from the
threads are reported. The complete source for the original program
can be found as
prime_thread.c.
The program can be compiled and run with something
like:
klamath:[prime]> make prime_t
gcc -D_REENTRANT -O3 -Wall prime_thread.c -o prime_t -lm -lpthread
klamath:[prime]> prime_t 10 10000
ID = 0 :: range = [10, 10000]
ID = 1 :: range = [12, 10000]
Sub : 617 primes found.
Sub : 608 primes found.
Tot : 1225 primes found.