targeting CUDA


Re: targeting CUDA

Post by matti on Wed Feb 04, 2009 5:40 pm

rl wrote: Just curious: how did you implement the prototype, e.g. the nested loops of a typical FFT? In assembly?

I didn't build my own FFT; I used SM's. In CUDA I'm using the included CUFFT library, whose interface is modeled on FFTW's (maybe SM's FFT is FFTW too?).
matti
essemilian
 
Posts: 472
Joined: Thu Nov 02, 2006 5:23 pm
Location: Finland

Re: targeting CUDA

Post by matti on Sun Mar 08, 2009 7:09 pm

If anyone's interested...
I finally had some time to test the CUDA convolution kernel I'm working on.
The figures from one test run look promising, except for retrieving
the data back from the card:

block size:          256 samples
move data to card:   0.016909 ms
FFT:                 0.069471 ms
complex multiply:    0.023685 ms (0.021075 ms even with a larger block size of 177152)
iFFT:                0.061804 ms
move data back:      1.78119 ms
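The three GPU stages above (FFT, complex multiply, iFFT) make up the usual FFT-based "fast convolution" pipeline. As a rough illustration only, here is a minimal CPU sketch of that pipeline in pure Python. It is not matti's CUDA kernel and does not use the CUFFT API: the function names fft and fft_convolve are hypothetical, a textbook recursive radix-2 Cooley-Tukey transform stands in for whatever CUFFT does internally, and zero-padding both inputs to a power of two (so the circular convolution equals the linear one) is an assumption of this sketch.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (hypothetical helper).

    len(x) must be a power of two. The inverse transform is returned
    unscaled; the caller is expected to divide by len(x).
    """
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal, impulse):
    """Linear convolution via forward FFT -> complex multiply -> inverse FFT."""
    needed = len(signal) + len(impulse) - 1
    size = 1
    while size < needed:
        size *= 2
    # ASSUMPTION: zero-pad both inputs so circular convolution == linear convolution.
    a = fft([complex(v) for v in signal] + [0j] * (size - len(signal)))
    b = fft([complex(v) for v in impulse] + [0j] * (size - len(impulse)))
    # Pointwise complex multiply (the "complex multiply" stage).
    product = [p * q for p, q in zip(a, b)]
    # Inverse FFT, scaled by 1/size and trimmed to the linear-convolution length.
    result = fft(product, inverse=True)
    return [v.real / size for v in result[:needed]]


if __name__ == "__main__":
    # Approximately [0, 1, 2.5, 4, 1.5], up to floating-point rounding.
    print(fft_convolve([1, 2, 3], [0, 1, 0.5]))
```

On the GPU the same three steps would run as separate kernels or library calls, which is why the post can time them individually.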


I tried to simulate a situation with a 256-sample block size (the latency I currently use in the studio myself), where only that amount of data has to be moved back and forth for every block.
I have no idea how VSTs work internally with respect to latency, or whether the times above will be good enough, but I'm going to try it anyway :)
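To put the figures in perspective: a real-time plugin has to finish each block within block_size / sample_rate seconds. The sketch below sums the stage times reported above and compares them with that budget. The 44.1 kHz sample rate is an assumption (the post does not state one), the names are hypothetical, and the sum ignores any host, driver or plugin overhead not captured in the measurements.

```python
# Stage timings (ms) reported in the post for a 256-sample block.
STAGES_MS = {
    "move data to card": 0.016909,
    "FFT": 0.069471,
    "complex multiply": 0.023685,
    "iFFT": 0.061804,
    "move data back": 1.78119,
}

BLOCK_SIZE = 256          # samples per block, from the post
SAMPLE_RATE_HZ = 44_100   # ASSUMPTION: the post does not state a sample rate


def block_budget_ms(block_size, sample_rate_hz):
    """Wall-clock time available to process one block in real time, in ms."""
    return 1000.0 * block_size / sample_rate_hz


def main():
    total = sum(STAGES_MS.values())
    budget = block_budget_ms(BLOCK_SIZE, SAMPLE_RATE_HZ)
    print(f"measured per-block total: {total:.3f} ms")
    print(f"real-time budget:         {budget:.3f} ms")
    print(f"share of budget used:     {100.0 * total / budget:.1f} %")


if __name__ == "__main__":
    main()
```

With these numbers it reports about 1.953 ms against a budget of about 5.805 ms (roughly 33.6 %), and the transfer back from the card accounts for most of that total.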

With bigger block sizes the memory-move times increase linearly, and in debug builds all kernels seem to freeze up (their run time increases almost 100x), so keep both in mind if you are evaluating CUDA for a project.

The tests were done on a 2.8 GHz Pentium D with a 96-core 8800 GTS card (G80) and CUDA 2.0.
