CUDA Programming

Cancelled. Posted May 3, 2014. Paid on delivery.

HW4A: knapAlgo [30 points]

Start from the sequential code that you wrote for HW3, and implement the fill_table function in CUDA. You will essentially be reusing your HW0 code, which makes a sequence of kernel calls, one for each row of the table. Make sure that your program now produces correct answers for large values of the capacity. Then modify the program so that, beyond a certain depth, the fill_table function runs on the host rather than on the GPU. Find the depth at which the cost of transferring data to and from the GPU is no longer worth the gains from parallelism.
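Below is a minimal sketch of HW4A. It assumes fill_table only has to produce the final row of the 0/1 knapsack table, that is, the best value for every capacity 0..C using items with weights w and values v. The `depth` argument and the `DEPTH_CUTOFF` constant are hypothetical. The assignment does not say how the depth is passed, and the right cutoff is exactly what you are asked to measure. The helper names (`knap_row`, `fill_table_host`, `fill_table_gpu`) are also illustrative.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// One DP row: next[c] = max(prev[c], prev[c - w] + v), i.e. skip or take the item.
__global__ void knap_row(const int* prev, int* next, int capacity, int w, int v) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c > capacity) return;
    int best = prev[c];
    if (c >= w) best = max(best, prev[c - w] + v);
    next[c] = best;
}

// Sequential version: one row buffer, capacities scanned from high to low
// so that each item is used at most once.
static void fill_table_host(const int* w, const int* v, int n, int capacity, int* row) {
    std::fill(row, row + capacity + 1, 0);
    for (int i = 0; i < n; ++i)
        for (int c = capacity; c >= w[i]; --c)
            row[c] = std::max(row[c], row[c - w[i]] + v[i]);
}

// GPU version: one kernel launch per item (table row), ping-ponging two device rows.
static void fill_table_gpu(const int* w, const int* v, int n, int capacity, int* row) {
    size_t bytes = (size_t)(capacity + 1) * sizeof(int);
    int *d_prev, *d_next;
    cudaMalloc(&d_prev, bytes);
    cudaMalloc(&d_next, bytes);
    cudaMemset(d_prev, 0, bytes);                  // row 0: no items, value 0
    int threads = 256, blocks = (capacity + threads) / threads;  // ceil((C+1)/threads)
    for (int i = 0; i < n; ++i) {
        knap_row<<<blocks, threads>>>(d_prev, d_next, capacity, w[i], v[i]);
        std::swap(d_prev, d_next);                 // the new row feeds the next launch
    }
    cudaMemcpy(row, d_prev, bytes, cudaMemcpyDeviceToHost);  // waits for the kernels
    cudaFree(d_prev);
    cudaFree(d_next);
}

// Hypothetical cutoff: from this depth on, the subproblem is assumed too small to
// amortize allocation, launches and transfers, so it stays on the host.
static const int DEPTH_CUTOFF = 4;

void fill_table(const int* w, const int* v, int n, int capacity, int* row, int depth) {
    assert(n >= 0 && capacity >= 0);
    if (depth >= DEPTH_CUTOFF)
        fill_table_host(w, v, n, capacity, row);
    else
        fill_table_gpu(w, v, n, capacity, row);
}
```

Each launch depends only on the previous row, so every kernel is fully parallel over capacities. The cutoff experiment weighs that parallelism against the fixed per-call cost of allocation, launches and copies.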

HW4B: knapAlgoOptSmallN (Making the CUDA code compute-bound) [25 points]

Using the techniques described in the lecture, modify the CUDA code so that the fill_table function is evaluated in a single kernel call. We suggest that you proceed in the following steps.

First, write a simple CUDA function that correctly synchronizes the thread blocks. Think of writing a toy program similar to Waruna's __syncthreads example. That example synchronized threads within a single threadblock, whereas this one must synchronize multiple threadblocks.
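One way to write the toy program is sketched below. It assumes that launching one block per SM keeps every block resident at the same time. This holds in practice on an otherwise idle GPU, but the CUDA programming model does not guarantee it; if some block cannot be scheduled, the others spin forever. The barrier uses an arrival counter plus a generation number in global memory. All names here are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Hand-rolled grid-wide barrier (a sketch). Safe only if all blocks are co-resident.
__device__ void grid_barrier(unsigned int* count, volatile unsigned int* gen,
                             unsigned int nblocks) {
    __threadfence();                        // publish this thread's global writes
    __syncthreads();                        // the whole block has arrived
    if (threadIdx.x == 0) {
        unsigned int my_gen = *gen;         // generation this block is waiting to leave
        __threadfence();
        if (atomicAdd(count, 1u) == nblocks - 1) {
            atomicExch(count, 0u);          // last arrival resets the counter
            __threadfence();
            atomicAdd((unsigned int*)gen, 1u);  // ...then releases the waiting blocks
        } else {
            while (*gen == my_gen) { }      // spin until the generation changes
        }
        __threadfence();
    }
    __syncthreads();                        // let the rest of the block continue
}

// Toy check: each round, every block writes its slot, then reads its neighbour's slot
// after the barrier. A broken barrier shows up as a stale value.
__global__ void barrier_toy(volatile int* slot, unsigned int* count,
                            volatile unsigned int* gen, int* errors, int rounds) {
    int b = blockIdx.x, nb = gridDim.x;
    for (int r = 0; r < rounds; ++r) {
        if (threadIdx.x == 0) slot[b] = r * nb + b;
        grid_barrier(count, gen, nb);
        if (threadIdx.x == 0) {
            int nbr = (b + 1) % nb;
            if (slot[nbr] != r * nb + nbr) atomicAdd(errors, 1);
        }
        grid_barrier(count, gen, nb);       // no slot is overwritten before it is read
    }
}

// Launches one block per SM and returns the number of stale reads (-1 on CUDA error).
int run_barrier_toy(int rounds) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int nb = prop.multiProcessorCount;
    int *d_slot, *d_errors;
    unsigned int* d_sync;                   // d_sync[0] = arrivals, d_sync[1] = generation
    cudaMalloc(&d_slot, nb * sizeof(int));
    cudaMalloc(&d_errors, sizeof(int));
    cudaMalloc(&d_sync, 2 * sizeof(unsigned int));
    cudaMemset(d_errors, 0, sizeof(int));
    cudaMemset(d_sync, 0, 2 * sizeof(unsigned int));
    barrier_toy<<<nb, 64>>>(d_slot, d_sync, d_sync + 1, d_errors, rounds);
    int errors = -1;
    if (cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess)
        cudaMemcpy(&errors, d_errors, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_slot);
    cudaFree(d_errors);
    cudaFree(d_sync);
    return errors;
}
```

On CUDA 9 and later, cooperative groups provide a supported grid-wide barrier: launch with `cudaLaunchCooperativeKernel` and call `cooperative_groups::this_grid().sync()`. The launch also rejects grids that cannot be fully co-resident.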

Next, embellish this with code that correctly updates the section of the array allocated to each threadblock, with only one thread per threadblock active. Make sure that the correct values are written to and read from global memory between the synchronizations. For this to work, there must be enough global memory for each threadblock to store its entire "output", i.e., N*WMAX or N*Σwi memory per threadblock. Hence, to keep the number of active threadblocks as large as possible, this scheme will only work for relatively small values of N.
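The sketch below follows one reading of this step. The capacity columns 0..C are split into contiguous slices, one per block. The whole (N+1) x (C+1) table stays in global memory, so each block stores its slice of every row. Only thread 0 of each block computes. A grid barrier separates consecutive rows because row i reads entries of row i-1 that other blocks wrote. Volatile accesses keep those reads from hitting stale L1 copies. The barrier is repeated from the toy program so that the block compiles on its own. All names are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Same hand-rolled grid barrier as the toy program (all blocks must be co-resident).
__device__ void grid_barrier(unsigned int* count, volatile unsigned int* gen,
                             unsigned int nblocks) {
    __threadfence();
    __syncthreads();
    if (threadIdx.x == 0) {
        unsigned int my_gen = *gen;
        __threadfence();
        if (atomicAdd(count, 1u) == nblocks - 1) {
            atomicExch(count, 0u);
            __threadfence();
            atomicAdd((unsigned int*)gen, 1u);
        } else {
            while (*gen == my_gen) { }
        }
        __threadfence();
    }
    __syncthreads();
}

// Single launch: block b owns columns [lo, hi); only thread 0 of each block works.
__global__ void fill_table_one_thread(volatile int* T, const int* w, const int* v, int n,
                                      int capacity, unsigned int* count,
                                      volatile unsigned int* gen) {
    int cols = capacity + 1;
    int per_block = (cols + gridDim.x - 1) / gridDim.x;
    int lo = blockIdx.x * per_block;
    int hi = min(lo + per_block, cols);
    for (int i = 1; i <= n; ++i) {
        if (threadIdx.x == 0) {
            volatile int* prev = T + (size_t)(i - 1) * cols;
            volatile int* cur = T + (size_t)i * cols;
            for (int c = lo; c < hi; ++c) {
                int best = prev[c];
                if (c >= w[i - 1]) best = max(best, prev[c - w[i - 1]] + v[i - 1]);
                cur[c] = best;
            }
        }
        grid_barrier(count, gen, gridDim.x);  // row i done everywhere before row i+1
    }
}

// Host driver: returns T[n][capacity], the best value for the whole instance.
int knapsack_one_thread(const std::vector<int>& w, const std::vector<int>& v, int capacity) {
    assert(w.size() == v.size());
    int n = (int)w.size(), cols = capacity + 1;
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int *d_T, *d_w, *d_v;
    unsigned int* d_sync;                                // [0] arrivals, [1] generation
    cudaMalloc(&d_T, (size_t)(n + 1) * cols * sizeof(int));
    cudaMalloc(&d_w, n * sizeof(int));
    cudaMalloc(&d_v, n * sizeof(int));
    cudaMalloc(&d_sync, 2 * sizeof(unsigned int));
    cudaMemset(d_T, 0, cols * sizeof(int));              // row 0: no items
    cudaMemset(d_sync, 0, 2 * sizeof(unsigned int));
    cudaMemcpy(d_w, w.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_v, v.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    fill_table_one_thread<<<prop.multiProcessorCount, 32>>>(d_T, d_w, d_v, n, capacity,
                                                            d_sync, d_sync + 1);
    int best = -1;
    cudaMemcpy(&best, d_T + (size_t)n * cols + capacity, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_T);
    cudaFree(d_w);
    cudaFree(d_v);
    cudaFree(d_sync);
    return best;
}
```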

After ensuring that this code produces correct answers, parallelize the computation within each threadblock. This part was not fully detailed in class, since there are a few different options that you could pursue.
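The sketch below takes one of the possible options. Each block keeps the same slice of columns, but all of its threads stride through that slice. Consecutive threads then touch consecutive columns, so the global accesses coalesce. Everything else is carried over from the one-thread version and is equally hypothetical: the table layout, the barrier, and one block per SM.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hand-rolled grid barrier, as before (all blocks must be co-resident).
__device__ void grid_barrier(unsigned int* count, volatile unsigned int* gen,
                             unsigned int nblocks) {
    __threadfence();                       // every thread publishes its own writes
    __syncthreads();
    if (threadIdx.x == 0) {
        unsigned int my_gen = *gen;
        __threadfence();
        if (atomicAdd(count, 1u) == nblocks - 1) {
            atomicExch(count, 0u);
            __threadfence();
            atomicAdd((unsigned int*)gen, 1u);
        } else {
            while (*gen == my_gen) { }
        }
        __threadfence();
    }
    __syncthreads();
}

// Block b still owns columns [lo, hi), but all of its threads share the work.
__global__ void fill_table_block_parallel(volatile int* T, const int* w, const int* v,
                                          int n, int capacity, unsigned int* count,
                                          volatile unsigned int* gen) {
    int cols = capacity + 1;
    int per_block = (cols + gridDim.x - 1) / gridDim.x;
    int lo = blockIdx.x * per_block;
    int hi = min(lo + per_block, cols);
    for (int i = 1; i <= n; ++i) {
        volatile int* prev = T + (size_t)(i - 1) * cols;
        volatile int* cur = T + (size_t)i * cols;
        for (int c = lo + threadIdx.x; c < hi; c += blockDim.x) {  // coalesced stride
            int best = prev[c];
            if (c >= w[i - 1]) best = max(best, prev[c - w[i - 1]] + v[i - 1]);
            cur[c] = best;
        }
        grid_barrier(count, gen, gridDim.x);
    }
}

int knapsack_block_parallel(const std::vector<int>& w, const std::vector<int>& v,
                            int capacity) {
    assert(w.size() == v.size());
    int n = (int)w.size(), cols = capacity + 1;
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int *d_T, *d_w, *d_v;
    unsigned int* d_sync;                                // [0] arrivals, [1] generation
    cudaMalloc(&d_T, (size_t)(n + 1) * cols * sizeof(int));
    cudaMalloc(&d_w, n * sizeof(int));
    cudaMalloc(&d_v, n * sizeof(int));
    cudaMalloc(&d_sync, 2 * sizeof(unsigned int));
    cudaMemset(d_T, 0, cols * sizeof(int));              // row 0: no items
    cudaMemset(d_sync, 0, 2 * sizeof(unsigned int));
    cudaMemcpy(d_w, w.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_v, v.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    fill_table_block_parallel<<<prop.multiProcessorCount, 256>>>(d_T, d_w, d_v, n, capacity,
                                                                 d_sync, d_sync + 1);
    int best = -1;
    cudaMemcpy(&best, d_T + (size_t)n * cols + capacity, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_T);
    cudaFree(d_w);
    cudaFree(d_v);
    cudaFree(d_sync);
    return best;
}
```

Another option is to assign columns to the whole grid with a grid-stride loop instead of fixed slices. The barrier requirement stays the same either way.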

HW4C: knapAlgoOpt2 [25 points]

Now modify your program so that it can handle arbitrary values of N. To do this, repeatedly invoke the Part B code in a sequence of kernel calls.
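The sketch below shows HW4C. It assumes the device table can hold only `chunk_rows` + 1 rows, where `chunk_rows` is a hypothetical parameter. Items are processed in chunks. Each launch runs the single-kernel fill on one chunk, seeded with the last row of the previous chunk. Device memory therefore no longer grows with N. Kernel and barrier are as in the Part B sketches, and all names remain illustrative.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <vector>

// Hand-rolled grid barrier, as in Part B (all blocks must be co-resident).
__device__ void grid_barrier(unsigned int* count, volatile unsigned int* gen,
                             unsigned int nblocks) {
    __threadfence();
    __syncthreads();
    if (threadIdx.x == 0) {
        unsigned int my_gen = *gen;
        __threadfence();
        if (atomicAdd(count, 1u) == nblocks - 1) {
            atomicExch(count, 0u);           // counter is back to 0 when a barrier ends,
            __threadfence();                 // so the next launch can reuse it
            atomicAdd((unsigned int*)gen, 1u);
        } else {
            while (*gen == my_gen) { }
        }
        __threadfence();
    }
    __syncthreads();
}

// Part B kernel for `rows` items: row 0 of T is the input, rows 1..rows are filled.
__global__ void fill_rows(volatile int* T, const int* w, const int* v, int rows,
                          int capacity, unsigned int* count, volatile unsigned int* gen) {
    int cols = capacity + 1;
    int per_block = (cols + gridDim.x - 1) / gridDim.x;
    int lo = blockIdx.x * per_block;
    int hi = min(lo + per_block, cols);
    for (int i = 1; i <= rows; ++i) {
        volatile int* prev = T + (size_t)(i - 1) * cols;
        volatile int* cur = T + (size_t)i * cols;
        for (int c = lo + threadIdx.x; c < hi; c += blockDim.x) {
            int best = prev[c];
            if (c >= w[i - 1]) best = max(best, prev[c - w[i - 1]] + v[i - 1]);
            cur[c] = best;
        }
        grid_barrier(count, gen, gridDim.x);
    }
}

int knapsack_chunked(const std::vector<int>& w, const std::vector<int>& v, int capacity,
                     int chunk_rows) {
    assert(w.size() == v.size() && chunk_rows > 0);
    int n = (int)w.size(), cols = capacity + 1;
    size_t row_bytes = (size_t)cols * sizeof(int);
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    int *d_T, *d_w, *d_v;
    unsigned int* d_sync;                                // [0] arrivals, [1] generation
    cudaMalloc(&d_T, (chunk_rows + 1) * row_bytes);      // independent of n
    cudaMalloc(&d_w, n * sizeof(int));
    cudaMalloc(&d_v, n * sizeof(int));
    cudaMalloc(&d_sync, 2 * sizeof(unsigned int));
    cudaMemset(d_T, 0, row_bytes);                       // row 0: no items yet
    cudaMemset(d_sync, 0, 2 * sizeof(unsigned int));
    cudaMemcpy(d_w, w.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_v, v.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    for (int base = 0; base < n; base += chunk_rows) {
        int rows = std::min(chunk_rows, n - base);
        fill_rows<<<prop.multiProcessorCount, 256>>>(d_T, d_w + base, d_v + base, rows,
                                                     capacity, d_sync, d_sync + 1);
        // The chunk's last row becomes row 0 of the next chunk (same stream, so ordered).
        cudaMemcpy(d_T, d_T + (size_t)rows * cols, row_bytes, cudaMemcpyDeviceToDevice);
    }
    int best = -1;
    cudaMemcpy(&best, d_T + capacity, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_T);
    cudaFree(d_w);
    cudaFree(d_v);
    cudaFree(d_sync);
    return best;
}
```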

CUDA

Project ID: #5891041

About the project

3 proposals. Remote project. Active May 9, 2014.

3 freelancers are bidding an average of $229 for this job

cudabigdata

Hi, I'd like to finish this project for you.

$136 USD in 3 days
(8 Reviews)
4.0
prad08

Hi, I am a master's student in Embedded Systems and am doing my graduation thesis in OpenCL, the platform-independent counterpart of CUDA. While I have not actually worked on CUDA, OpenCL is conceptually the same and it m…

$225 USD in 5 days
(0 Reviews)
0.0
tulebaev

Proposal not yet submitted

$150 USD in 3 days
(1 Review)
0.0
patriczhao

I am very familiar with CUDA programming, and I think you can check out my resume on LinkedIn. Since I am not full-time on Freelancer, I estimate delivering your code in 15 days. Contact me if you have any…

$311 USD in 15 days
(0 Reviews)
0.0