Description
A ''shared-memory multiprocessor'' is a "computer system composed of multiple independent processors that execute different instruction streams". The shared memory programming model is the most widely used for parallel processor designs. This programming model starts by identifying possibilities for parallelism within a piece of code and then mapping these parallel tasks into threads. The next step is to determine the scope of variables used in a parallel program, which is one of the key steps and main concerns within this model.Variable scope
The next step in the model groups tasks together into bigger tasks, as there are typically more tasks than available processors. Typically, the number of execution threads that the tasks are assigned to, is chosen to be less than or equal to the number of processors, with each thread assigned to a unique processor. Right after this step, the use of variables within tasks needs to be analyzed. This step determines whether each variable should be shared-by-all or private-to-each thread. This step is unique to shared-memory programming. (An alternative isPrivatization
Dependencies – potential conflicts between different threads during execution – prevent parallelization, and these conflicts appear when we have Read/Write Conflicting variables. One technique to remove these conflicts is privatization. The basic principle involves making a private copy of a variable for each thread, rather than share one instance. This changes the category of the variable from Read/Write Conflicting to Read/Write Non-conflicting. The actual local (private) instances of the Read/Write Conflicting variables are created at compile time, by allocating several areas of memory for the variables stored at different memory locations. The architecture of shared-memory multiprocessors helps, as threads share an address space. There are two situations in which a variable can be described as privatizable: # When the variable is written before it is read by the same task during the original program's sequential order. In this case, if the task wrote to its private copy rather than the shared one, the conflict/dependency would be removed. The reason for this is that the program's sequential order will ensure that the value will be the one written by the same task, removing any conflicts that might occur by other threads accessing the same variable. See . # When the variable is read before it is written by the same task. The difference here is that the value the task is trying to read is one from a prior computing step in another task. But if each task wrote to its own private copy, any conflicts or dependencies would be solved during execution, as they would all read a value known ahead of time and then write their correct values on their own copies. See . Because Read/Write Conflicting variables are the only category that prevents parallelization, there is no need explicitly to declare Read-only and Read/Write Non-conflicting variables as private. Doing so will not affect the correctness of the program, but may use more memory for unnecessary copies.Limitations
Sometimes a variable can be neither privatized nor reduced to remove the read/write conflict. In these cases, the Read/Write Conflicting variable needs to be updated by different tasks at different points of time. See . This problem can sometimes be solved by changing the scope of parallelism to explore a different parallel region. This might produce good results, as it is often that after reanalyzing the code, some Read/Write Conflicting variables may change to Read/Write Non-conflicting. If the variable still causes conflicts, the last resort is to declaring it as shared and protecting its access by some form ofArrays
Read/Write Conflicting variables can be scalar, or compound types such as arrays, matrices, structured types an so on. Privatization can be applied to both types of variables. When applied to scalar variables, the additional space and overhead introduced by making the extra private copies per thread is relatively small, because scalars are small. However, applying privatization on arrays, matrices or other compound types is much more complex. When dealing with arrays, the compiler tries to analyze the behavior of each array element separately and check for the order it is read and written. If each element is written before it is read in the same iteration, this array can be privatized. To do this, the compiler needs to further analyze the array to combine its accesses into sections. Moreover, the compiler should have extra functions, to be able to manipulate and deal with the array elements. For example, some array expressions may have symbolic terms, hence, to be able to privatize such array, the compiler needs to have some advanced symbolic manipulation functions.Examples
Example 1
A variable can be privatized if each task will write to it before reading from it. In this case, it does not matter whether other threads are doing so. In the code below, the variablex
is used to help swap three different pairs of variables. Because it is always written to before being read, it can be privatized.
//Sequential Code: x = a; a = b; b = x; x = c; c = d; d = x; x = e; e = f; b = x;This code cannot be made parallel without privatizing
x
. With x
privatized, it can run on three different threads, each with its own private x
:
//Parallel Code: //Thread 1: x = a; a = b; b = x // Thread 2: x = c; c = d; d = x // Thread 3: x e; e = f; b = x
Example 2
Privatization is possible when a variable's value is known before it is used – even if it is written to by a different task. The code below demonstrates this. The variablex
is written to in the middle of each task, but that value could be computed when the program was compiled. By making x
private and defining it at the beginning of each task, the code can be run in parallel:
//Sequential Code: x = 1; y = x * 3; x = 4; z = y/x; a = x * 9; x = 3; b = a/x; c = x * 1; x = 11; d = c/x;To make the sequential code above parallel, a few lines of code must be added so that
x
can be privatized:
//Parallel Code //Thread 0: x = 1; y = x * 3; x = 4; z = y/x //Thread 1: x = 4; a = x * 9; x = 3; b = a/x //Thread 2: x = 3; c = x * 1; x = 11; d = c/xBecause of the extra code, this short example may not actually see much of a speed up. But in real-life, longer code this technique can greatly improve performance.
Example 3
Privatization fails is when a variable is written in one task and read in another, and the value is not known ahead of time. An example is summing the elements of an array. The sum is a shared variable and is read/written in each iteration of the loop. In sequential code, this works fine. But if the iterations were each done in a different thread, the wrong sum would be calculated. In this case, privatization does not work. Thesum
cannot be made private because it relies on its value from the previous iteration.
//Sequential Code: sum = 0; for (i = 0; i < 100; i++) sum += aThis problem can be solved in part by
// Thread 0: sum = 0; for (i = 0; i < 100; i += 3) sum += a [0; // Thread 1: sum = 0; for (i = 1; i < 100; i += 3) sum += a[i[1; // Thread 2: sum = 0; for (i = 2; i < 100; i += 3) sum += a [2; // "Master" thread: wait_for_all(thread[0">[2.html" ;"title="[2">[2; // "Master" thread: wait_for_all(thread[0 thread[1], thread[2]); sum = sum + sum + sum
OpenMP
do i = 10, N - 1 x = (b(i) + c(i))/2 b(i) = a(i + 1) + x enddoFor each iteration of the loop,
x
is written to and then read from. Because x
is only a scalar variable, the loop cannot be executed in parallel because it would be overwritten in different threads, and b(i)
would not always be assigned the correct value.
Equivalent parallelized code using privatization is:
!$omp parallel do shared(a, b) private(x) do i = 10, N - 1 x = (b(i) + c(i))/2 b(i) = a(i + 1) + x enddoBecause
x
is declared as private, each thread gets its own copy and the dependence is removed.
Comparison with other techniques
Normally, when a variable is Read/Write Conflicting, the solution will be declaring it as shared and protecting access to it bySee also
* Parallel computing * Loop-level parallelismReferences
{{reflist Computer programming