I’m an HPC developer with 3 years of experience in C++/CUDA development.
I have worked on parallelising and optimising many large-scale CPU-based applications for CPU + GPU environments.
Workloads I have handled: image processing, spatial linear algebra, Monte Carlo methods, autonomous cars, and deep learning.
I have experience with CUDA-accelerated libraries.
I will use a combination of custom CUDA kernels, accelerated libraries, OpenACC, and optimised CUDA memory models, with performance as the end goal.
Hands-on with tools like nvprof, nvvp, Nsight Compute, Nsight Systems, Intel VTune, and Intel Advisor.
I will provide documentation explaining the implementation.
Available hours can be discussed.