Evolution of a Parallel Program

Shared Memory - pthreads

The C program is extended to utilize the POSIX thread functions found on our SMP linux machines with the Pthreads Library. This allows different threads of execution to run in concert together in a single memory and process space, but yet take advantage of the multiple processors. One of the critical things to keep in mind here is that both threads (one for each processor) will be running in the same memory space concurrently. This means that access to variables in the threaded part of the program must be controlled to avoid data mangling by the separate threads. There are many methods to control access to shared memory [ie semaphores, mutex's and critical sections], but this problem does not require any sophisticated data sharing and the approach used here exemplifies simplicity and reliability.
After having analyzed our program we decide that for larger ranges the bulk of the computation resides in the "findPrimes" function. This indicates that the program as a whole will speed up if we can increase the resources given to this function, this is commonly known as Amdahl's Law. Since a speedup in this function would induce a speedup for the entire calculation we focus our attention here. This function is thus our candidate for multithreading.
An instantiation of a new pthread allows for the specification of a function for that thread of execution and for a pointer to a data block cast as (void *). This highlights the difference between the prototypes to the serial and threaded versions of the computational part of the program.

int isPrime(int testNum) {
  int i;
  if(testNum < 2) return 0;
  if(testNum == 2) return 1;
  for( i=2; i < sqrt(testNum)+1; i++)
    if (!(testNum % i))
      return 0;
  return 1;
}

void findPrimes (void* Data) {
  int *Params = (int *)Data;
  int i, j, x, y;
  const int ID = Params[0];
  const int lbound = Params[1];
  const int ubound = Params[2];
  prime_count[ID] = 0;

  printf("ID = %d :: range = [%d, %d]\n",
         ID, lbound, ubound);

  for(i = lbound; i < ubound; i+=4) {
    if (isPrime(i)) prime_count[ID]++;
    if (isPrime(i+1)) prime_count[ID]++;
  }
  pthread_exit((void*)&prime_count[ID]);
}

The most crucial difference here is that the counter, prime_count[] is an array and is indexed by a unique thread ID, specific to each thread. This is essential since prime_count is in shared memory space and if we were to use a single variable then both threads will attempt to access it, unreliably, and we will get indeterminate results without careful management of access.
Another difference here is the in the loop of findPrimes(). Notice that isPrime() is called 2 times a cycle and the loop iterator is incremented by 4. This is a load-balancing technique that i) takes advantage of the fact that even numbers are not prime, so that with 2 threads, one thread does not check exclusively even numbers, and ii) the threads check alternate odd numbers (based on their starting point, handled below) so that there is an inherent balance in the amount of work done by each thread.
Also note that since the findPrimes() function will run inside a thread, a graceful exit is assured with a call to pthread_exit to deallocate any thread management resources and to ensure proper return of the critical data from the thread. This could equally well be done in the standard C manner with a function return value.
The control of the program is slightly modified as well. We see that some thought must be given to how the workload will be pursed out and this requires, in this case, of how the data will be used inside the thread itself. Since we have chosen a "leap-frog" approach to avoid difficulties of poor load balancing we must stagger the start and end points of the ranges of data given to the threads. This is seen in the calculation of the even[] and odd[] arrays before the threads are invoked.

int main(int argc, char* argv[]) {
  pthread_t thread[Num_Proc];
  int i, retval0, retval1;
  int Range[3] = {0, 100, 0};
  int odd[3], even[3];

  if (argc == 3) {
    for (i = 1; i < argc; i++) Range[i-1] = atoi(argv[i]);    
  }else if(argc != 1){
    return 1;
  }
  even[0] = 0;
  even[1] = Range[0];
  even[2] = Range[1] - 2;
  
  odd[0] = 1;
  odd[1] = Range[0] + 2;
  odd[2] = Range[1];

  pthread_create(&thread[0], NULL, (void *)&findPrimes,(void *)&even);
  pthread_create(&thread[1], NULL, (void *)&findPrimes,(void *)&odd);

  pthread_join(thread[0], (void*)&retval0);
  printf("Sub : %d primes found.\n", *(int*)retval0);
  pthread_join(thread[1], (void*)&retval1);
  printf("Sub : %d primes found.\n", *(int*)retval1);
  printf("Tot : %d primes found.\n", *(int*)retval0 + *(int*)retval1);

  return 0;
}

When the threads return they are resolved by joining according to the saved threadID's and the return values from the threads are reported. The complete source for the original program can be found as prime_thread.c. The program can be compiled and run with something like:

klamath:[prime]> make prime_t
gcc -D_REENTRANT -O3 -Wall prime_thread.c -o prime_t -lm -lpthread
klamath:[prime]> prime_t 10 10000
ID = 0 :: range = [10, 10000]
ID = 1 :: range = [12, 10000]
Sub : 617 primes found.
Sub : 608 primes found.
Tot : 1225 primes found.

Start