The attached files below contain the code and other necessary descriptions.
1. Requirements:
In this project you have to use the reduction technique in OpenCL. Reduction is a very common pattern in parallel programming. Read the attached sample program, analyze it, and write a brief report answering each of the questions below.
a. The attached sample program is from AMD APP SDK 3.0. The program includes the following files:
i. [login to view URL]
ii. [login to view URL]
iii. [login to view URL]
iv. [login to view URL]
v. [login to view URL]
vi. [login to view URL]
vii. [login to view URL]
b. You can download and install the AMD APP SDK. Most samples should work on non-AMD processors.
2. Questions:
a. How many data items are processed?
b. How many work items are created?
c. What is the work group size?
d. How many work groups are created?
e. Briefly describe the key ideas in the reduction process as implemented in Reduction_Kernels.cl. How is the sum calculated? For the work-item with global ID 0, how many additions does it perform?
f. In [login to view URL], what is the purpose of barrier(CLK_LOCAL_MEM_FENCE)?
g. Briefly describe how the data array is transferred from host memory to compute device memory. Is the buffer object in compute device memory or in host memory? Point out which line of the code actually triggers the data transfer.
h. In [login to view URL], why do we need to add the values in the array outMapPtr? Who provides the values in the array outMapPtr?
output = 0;
for(int i = 0; i < numBlocks * VECTOR_SIZE; ++i)
{
    output += outMapPtr[i];
}
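For orientation on questions e, f and h, here is a minimal host-side sketch of the general two-stage reduction pattern: each work-group reduces its slice in local memory with a halving stride, and the host then adds the per-group partial sums. It is a plain C++ simulation, not the code from the attached sample. The names reduceGroup and reduceAll are hypothetical, and the sketch assumes one scalar element per work-item and a power-of-two group size. The attached kernel may differ, for example by using vector types or by loading more than one element per work-item.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: simulates one work-group reducing its slice of the
// input in "local memory". Returns the partial sum that work-item 0 of the
// group would write to the output buffer.
unsigned int reduceGroup(const std::vector<unsigned int>& input,
                         std::size_t groupId, std::size_t groupSize)
{
    // Assumption: the group size is a power of two, so the stride halves cleanly.
    assert(groupSize > 0 && (groupSize & (groupSize - 1)) == 0);
    assert((groupId + 1) * groupSize <= input.size());

    // Each simulated work-item copies one element into local memory.
    std::vector<unsigned int> local(groupSize);
    for (std::size_t tid = 0; tid < groupSize; ++tid)
        local[tid] = input[groupId * groupSize + tid];

    // Tree reduction: the number of active work-items halves at every step.
    for (std::size_t stride = groupSize / 2; stride > 0; stride /= 2) {
        for (std::size_t tid = 0; tid < stride; ++tid)
            local[tid] += local[tid + stride];
        // On a device, barrier(CLK_LOCAL_MEM_FENCE) would sit here so that all
        // local-memory writes of this step are visible before the next step.
    }
    return local[0];
}

// Hypothetical helper: the host-side final pass that adds the partial sums,
// mirroring the loop over outMapPtr quoted in question h.
unsigned int reduceAll(const std::vector<unsigned int>& input,
                       std::size_t groupSize)
{
    assert(groupSize > 0 && input.size() % groupSize == 0);
    const std::size_t numGroups = input.size() / groupSize;

    unsigned int output = 0;
    for (std::size_t g = 0; g < numGroups; ++g)
        output += reduceGroup(input, g, groupSize);
    return output;
}
```

In this sketch, the simulated work-item 0 performs one addition per halving step, that is log2(groupSize) additions (8 for a group of 256). The count for the attached kernel depends on how its load step is written, so it should be checked against Reduction_Kernels.cl.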
I'm good with OpenCL and CUDA, and I have solid experience with parallel programming. Furthermore, I have implemented the reduction algorithm before, so I know the theory.