targeting CUDA

Suggest new features, components or other changes to the software

Moderator: electrogear

Re: targeting CUDA

Postby infuzion on Fri Jun 27, 2008 9:45 pm

exonerate wrote: I'm developing in both C++ and SM at the moment. In my opinion they're perfect partners :) Learning C++ has helped my SM development, and SM helps with the C++ side.
Yeah, that was my argument for SM having an SDK: to program segments in C++ if I want, & to add limited-audience things like CUDA...
Need help? First search the forum & WiKi, then post in the help forum with a clear topic, request, & OSM. Then please WiKi the correct solution. If you want my personal assistance, I charge by the hour or for an exchange of services.
infuzion
smstar
 
Posts: 6163
Joined: Wed May 04, 2005 8:02 pm
Location: Earth, USA, CO, Denver

Re: targeting CUDA

Postby matti on Sun Nov 09, 2008 10:28 pm

:love:
I just tested convolution-based effects for music production with CUDA and an 8800 GPU. Man, that thing is a HUGE number cruncher! A reverb that usually takes around 20% of my dual-core CPU eats less than 1% of the GPU. I could load up dozens (maybe hundreds?) of them at the same time. That is definitely the future path for audio processing. I'm not sure how well other algorithms will sit on such GPUs, but even if they can't reach the theoretical maximum speed they will still be a good addition to the CPU. Running only the convolution/FFT stuff on them is also a huge benefit, as it of course takes a big load off the main CPU.

Now that those old 8800 (or 8600 etc.) GPUs are dirt cheap, I can see a huge market for VSTs utilizing their power ;)
matti
essemilian
 
Posts: 472
Joined: Thu Nov 02, 2006 5:23 pm
Location: Finland

Re: targeting CUDA

Postby infuzion on Mon Nov 10, 2008 2:00 am

I think you are still limited by the # of streams the GPU can handle. Also, while GPU for audio seems very tempting, there is one problem: there is a sizable delay between the CPU issuing a command and the GPU actually starting to process it. A 200-cycle delay rings a bell, but you can look for it in some of the CUDA PR info. So "real-time" GPU audio is not here right now.

But GPU is perfect for mastering, reverb, & anything you don't need right now-now.

Re: targeting CUDA

Postby matti on Mon Nov 10, 2008 1:22 pm

200 cycles? (Now that would be very fast.) You mean 200 samples perhaps? Even so, it's not big. I currently work with a 256-sample latency in my studio. Computer systems are not "real time" anyway. I don't see any problems with the implementation even now. Small latencies can be dealt with.

I'm going to investigate CUDA some more when I get that 8800 card back. There are limitations of course, but I don't think they are that bad. PCI-E x16 is a bit faster than the old PCI, right? (Soundcards, UADs and PowerCores are PCI.)

Re: targeting CUDA

Postby aliasant on Mon Nov 10, 2008 2:17 pm

I have an 8800GT in my computer. Is there any way I (and others) can test this?
It's never too late to be late.....
http://martinrodensjo.smugmug.com/
aliasant
smunatic
 
Posts: 2386
Joined: Sat Dec 30, 2006 5:49 pm
Location: Sweden

Re: targeting CUDA

Postby infuzion on Mon Nov 10, 2008 4:00 pm

aliasant wrote: I have an 8800GT in my computer. Is there any way I (and others) can test this?
I'm not so clear on the delay, but I've read it is noticeable for real-time audio apps... I'd think the best bet for questions like this is the CUDA forum:
http://forums.nvidia.com/index.php?showforum=62

Re: targeting CUDA

Postby matti on Tue Nov 11, 2008 1:56 am

Ah, btw: until I get my 8800 back, I made a little plugin to load IRs into ;)

Re: targeting CUDA

Postby matti on Wed Nov 26, 2008 10:51 am

I thought this might be a good place to mention this:
Next year there's rumoured to be a new fast platform rivalling CUDA: the Intel Larrabee. Targeting Larrabee is said to be a lot easier, as it's based on a bunch of x86 processors instead of the more "traditional" GPU stream processors. SM should really look into that when it arrives. I sure will :)

More about Larrabee: each core has a SIMD vector processing unit similar to the one used in SSE, except its width is not 4 floats, it's 16. So as SM now has a 4-voices-for-the-price-of-one scheme, you could basically run a 16-voice synth on one Larrabee core (it's said to be an old 1+ GHz Pentium architecture, so it should run one SM synth easily; my 600 MHz P3 does anyway), and there's said to be up to 32 cores in it. If the instruction set is close enough to SSE, it should be a pretty smooth transition...

More stuff at Wikipedia.

Re: targeting CUDA

Postby matti on Wed Jan 28, 2009 3:16 pm

OK, I've now installed CUDA alongside my compiler. The first tests were successful, and a simple convolution app was fairly easy to do. With a bit of research it seems that the GPU can run pretty much any type of algorithm, but at stellar speeds only the ones that can be properly multi-threaded. So there's no reason you couldn't do even synths on it. What I'm interested in doing is efficient (and fast) convolution reverbs. The bandwidth of the PCI-E shouldn't be a problem, as only 256 samples need to be sent and received at a time. So that's 256*4*2 = 2 KiB per 1/172th of a second -> 344 KiB per second to and from the GPU. The bandwidth of my GPU is around 2 GiB/s = 2097152 KiB/s. Should be enough..
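
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the 44.1 kHz sample rate and 4-byte (32-bit float) samples are assumptions, chosen because they are what make 256*4*2 and the 1/172 s figure line up, and all constant and function names are made up for illustration.

```python
SAMPLE_RATE_HZ = 44_100          # assumed: 44100 / 256 is about 172 blocks per second
BLOCK_SAMPLES = 256              # block size from the post
BYTES_PER_SAMPLE = 4             # assumed: 32-bit float samples
DIRECTIONS = 2                   # one copy to the GPU, one copy back
BUS_KIB_PER_S = 2 * 1024 * 1024  # the post's 2 GiB/s figure, in KiB/s


def block_traffic_kib():
    """KiB moved over the bus for one audio block (both directions)."""
    return BLOCK_SAMPLES * BYTES_PER_SAMPLE * DIRECTIONS / 1024


def stream_traffic_kib_per_s():
    """KiB per second needed to keep one audio stream fed."""
    blocks_per_second = SAMPLE_RATE_HZ / BLOCK_SAMPLES
    return block_traffic_kib() * blocks_per_second


def bus_utilisation():
    """Fraction of the quoted bus bandwidth that one stream uses."""
    return stream_traffic_kib_per_s() / BUS_KIB_PER_S


print(f"per block : {block_traffic_kib():.1f} KiB")
print(f"per second: {stream_traffic_kib_per_s():.1f} KiB/s")
print(f"bus usage : {bus_utilisation() * 100:.4f} %")
```

Under these assumptions one stream needs roughly 344.5 KiB/s, which is under 0.02% of the quoted 2 GiB/s, so raw bandwidth is not the bottleneck here.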
What might be a problem though is the threading of the VST host program. It might complicate how data can be moved to and back from the GPU. I need to test things further to see how it goes :)

Re: targeting CUDA

Postby infuzion on Wed Jan 28, 2009 4:17 pm

matti wrote:...So there's no reason for not being able to do even synths on it. What I'm interested in doing is efficient(and fast) convolution reverbs.
I wonder if synths might be out of the question, since there is a lag between commands/data being sent to the GPU & the GPU executing those commands, much greater than the lag between RAM & CPU. EQs & reverbs seem to be good uses of GPUs though!

Re: targeting CUDA

Postby matti on Sat Jan 31, 2009 8:05 pm

The lag doesn't seem to be that big. There is a small lag if you read a single value straight from the CPU's RAM on the motherboard, but other than that the system seems pretty fluid and fast. The manual mentions 400 to 600 cycles of lag when reading from CPU RAM, compared to a single cycle for reading from the GPU's low-level memory that is shared between threads. So in that sense it is huge, but the GPU runs at over 1,000,000,000 cycles per second, so 400 to 600 cycles ain't that bad. Also, in reality moves to the card are done in batches. So even if memory latency is higher than on CPUs, the speed through the PCI-E is not that bad: 2 GiB/s on my 8800. Plenty of time for moving simple controller values etc. Creating synths is definitely possible! You don't need to send anything to a synth other than the notes it needs to play and some controller values. That's so much less data than what is sent to a convolution reverb.
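
To put the cycle counts next to an audio buffer, here is a rough Python sketch. It takes the post's figures at face value (400 to 600 cycles, a clock of 1 GHz as a lower bound) and assumes a 44.1 kHz sample rate with 256-sample blocks; the function names are hypothetical.

```python
GPU_CLOCK_HZ = 1_000_000_000  # the post's "over 1,000,000,000 cycles per second"
SAMPLE_RATE_HZ = 44_100       # assumed; the thread never states the sample rate
BLOCK_SAMPLES = 256           # buffer size mentioned earlier in the thread


def cycles_to_us(cycles, clock_hz=GPU_CLOCK_HZ):
    """Convert a cycle count to microseconds at the given clock."""
    return cycles / clock_hz * 1e6


def block_period_us(block_samples=BLOCK_SAMPLES, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one audio block in microseconds."""
    return block_samples / sample_rate_hz * 1e6


def lag_share_of_block(cycles):
    """Fraction of one block period taken up by a single access of that lag."""
    return cycles_to_us(cycles) / block_period_us()


for cycles in (400, 600):
    print(f"{cycles} cycles = {cycles_to_us(cycles):.2f} us, "
          f"{lag_share_of_block(cycles) * 100:.4f} % of a block")
```

With these numbers 600 cycles is about 0.6 microseconds, against a block period of about 5.8 ms. A single such access is negligible; it only starts to matter if thousands of them happen one after another inside one block.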

One cool thing is that you can process on the GPU at the same time as data is moved to and from the card (or do computing on the CPU, of course). With this trick the whole memory bandwidth issue can be masked by computing on the values retrieved in a previous pass (if the bandwidth is going to be a problem.. though it seriously isn't).
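
A toy timing model shows why that overlap hides the transfer cost. This is a sketch in plain Python, not CUDA code: it assumes an ideal three-stage pipeline (upload, compute, download) with constant stage times, and the millisecond values are invented for illustration.

```python
def serial_time_ms(blocks, upload_ms, compute_ms, download_ms):
    """Total time when every block is uploaded, computed and downloaded in turn."""
    return blocks * (upload_ms + compute_ms + download_ms)


def pipelined_time_ms(blocks, upload_ms, compute_ms, download_ms):
    """Total time when the three stages work on different blocks at once.

    The first block pays for all three stages; after that a finished block
    comes out once per period of the slowest stage.
    """
    if blocks == 0:
        return 0.0
    slowest = max(upload_ms, compute_ms, download_ms)
    return upload_ms + compute_ms + download_ms + (blocks - 1) * slowest


# Hypothetical stage times, not measurements from the thread.
blocks, up, comp, down = 100, 0.25, 2.0, 0.25
print("serial   :", serial_time_ms(blocks, up, comp, down), "ms")
print("pipelined:", pipelined_time_ms(blocks, up, comp, down), "ms")
```

As long as compute is the slowest stage, the transfers disappear from the steady-state cost. The price is that each output block comes from input uploaded a pass earlier, which adds to the overall latency.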

I see CUDA as a very promising platform for audio as well. I'll let you know when I get my first VST test code ready :)

Re: targeting CUDA

Postby infuzion on Sun Feb 01, 2009 5:46 am

matti wrote: I see CUDA as a very promising platform for audio as well. I'll let you know when I get my first VST test code ready :)
Wow, lots of great info & insight; thanks Matti!
CUDA (& lack of SDK) is making it very tempting for me to dump SM in the next few months...

Re: targeting CUDA

Postby matti on Sun Feb 01, 2009 7:51 pm

Well, there's no reason to dump SM for that.. For example, I'm prototyping the FFT convolution engine inside SM first. It's pretty much fully functional now, just a bit slowish.. The next step is to write it in C++/CUDA and run benchmarks.
It's a lot more fun building something with SM, so I'm definitely not leaving it :)

Re: targeting CUDA

Postby matti on Wed Feb 04, 2009 1:47 am

matti wrote: Well, there's no reason to dump SM for that.. For example, I'm prototyping the FFT convolution engine inside SM first. It's pretty much fully functional now, just a bit slowish..

Ha.. now that I mentioned it, I ran into some trouble. I'm getting a few calculation errors, probably from the FFTs. Wherever they're coming from, the error builds up in the quiet parts and nulls everything in its path. Other than that my partitioned overlap-save convolution algorithm seems to be working.. though it takes 2 seconds to calculate a 1.5-second verb for 256 samples. SM might not be that efficient for this type of thing.. I would need to slim it down to just a few ms in CUDA. I just tested that the data transfers needed for a VST effect take under 0.5 ms, so that doesn't seem like a problem yet.
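
For readers who haven't met the technique, here is a minimal sketch of uniformly partitioned overlap-save convolution in plain Python. It is not the SM prototype described above (its internals aren't shown in the thread): the function names, the tiny block size, and the simple recursive radix-2 FFT are all assumptions picked for readability, not speed. Comparing against a direct convolution is also a handy way to hunt for the kind of calculation errors mentioned in the post.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT with the 1/n scaling applied once at the end."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]


def direct_convolution(signal, ir):
    """Reference O(N*M) convolution used to check the fast version."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def partitioned_overlap_save(signal, ir, block=4):
    """Convolve signal with a non-empty ir, block by block; block is a power of two."""
    fft_size = 2 * block
    # Split the impulse response into block-sized partitions, one spectrum each.
    parts = [ir[i:i + block] for i in range(0, len(ir), block)]
    spectra = [fft([complex(v) for v in p] + [0j] * (fft_size - len(p)))
               for p in parts]
    total = len(signal) + len(ir) - 1
    n_blocks = -(-total // block)  # ceiling division
    padded = list(signal) + [0.0] * (n_blocks * block - len(signal))
    # Frequency-domain delay line: history[p] is the input spectrum p blocks ago.
    history = [[0j] * fft_size for _ in spectra]
    previous = [0.0] * block
    out = []
    for b in range(n_blocks):
        current = padded[b * block:(b + 1) * block]
        window = previous + current  # overlap: previous block + new block
        history = [fft([complex(v) for v in window])] + history[:-1]
        acc = [0j] * fft_size
        for x_spec, h_spec in zip(history, spectra):
            for k in range(fft_size):
                acc[k] += x_spec[k] * h_spec[k]
        y = ifft(acc)
        out.extend(v.real for v in y[block:])  # save only the un-aliased half
        previous = current
    return out[:total]


sig = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ir = [0.5, -0.25, 0.125, 1.0, 0.0, 2.0]
fast = partitioned_overlap_save(sig, ir, block=4)
slow = direct_convolution(sig, ir)
print("max abs error:", max(abs(a - b) for a, b in zip(fast, slow)))
```

The point of partitioning is that each incoming block needs only one forward FFT and one inverse FFT no matter how long the impulse response is; the per-partition work is just the multiply-accumulate loop, which is also the part that maps most naturally onto many GPU threads.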

Re: targeting CUDA

Postby rl on Wed Feb 04, 2009 12:46 pm

Just curious: how did you implement the prototype, e.g. the nested loops of a typical FFT? Assembly?
rl
dsp wiz
 
Posts: 1494
Joined: Mon Feb 07, 2005 10:24 pm
Location: de.earth.universe.known
