Latency will be determined by whatever the program sets...I looked into this idea a year ago, and I think what would be worthwhile is to create a "CUDA-bridge-wrapper" that works much like jBridge does in Sonar to run your plug-ins...pretty much, if you meet the specs to run Sonar you really don't need to off-load anything to parallel processing, but some of the heavy lifting, like rendering audio or running plug-ins, could be shifted to the video card...yes, my current card has 480 CUDA cores just basically sitting there...but then again, the 6-core processors spend most of their time waiting around for instructions...
I think the problem comes from having to approach the code-writing differently...with processors being as powerful as they currently are, lining up your code to just go in a queue is pretty much just going down the page as you write it, however writing code to be processed simultaneously on different cores is a different beast...the guys to do this would be the gaming community, they are light years ahead of us in this area.
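To make the "different beast" point concrete, here is a rough sketch of the same gain change written both ways. It runs on plain CPU threads, not CUDA, and the function names (`apply_gain_serial`, `apply_gain_parallel`) are made up for illustration...the point is only that the parallel version has to decide how to split the buffer and when to wait for the workers, which the serial version never thinks about.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Serial version: one core walks the buffer from top to bottom.
void apply_gain_serial(std::vector<float>& samples, float gain) {
    for (float& s : samples) s *= gain;
}

// Parallel version (hypothetical helper): split the buffer into chunks
// and hand each chunk to its own worker thread.
void apply_gain_parallel(std::vector<float>& samples, float gain, unsigned workers) {
    assert(workers > 0);
    const std::size_t n = samples.size();
    const std::size_t chunk = (n + workers - 1) / workers;  // round up
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        pool.emplace_back([&samples, gain, begin, end] {
            // Each worker touches only its own slice, so no locking is needed.
            for (std::size_t i = begin; i < end; ++i) samples[i] *= gain;
        });
    }
    // The caller must wait for every worker before using the buffer.
    for (auto& t : pool) t.join();
}
```

A simple gain like this splits cleanly because no sample depends on another one. Anything with feedback (a reverb tail, an IIR filter) is harder to carve up, which is a big part of why this takes a different way of thinking.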
But yes, being able to render audio or beef up plug-in power would be a plus...I think where the Sonar developers shy away from it, though, is in requiring certain video cards to do it...ATI has their own version, so locking people into a particular brand of video card is counter-productive for a native program that is meant to run on many different machines...
That's where an independent programmer creating a general CUDA- or ATI-type multi-core parallel processing "wrapper" that you could simply chain to your plug-in would be a plus.
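Just to show the shape such a wrapper might take, here is a minimal sketch. Everything in it is hypothetical (`OffloadWrapper`, `Backend`, the `available` flag): it makes no real CUDA or ATI calls, and a real version would have to query the driver to find out what the card supports. The idea is only that the wrapper tries a GPU path if one is present and otherwise quietly runs the plug-in on the CPU.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Buffer = std::vector<float>;
using ProcessFn = std::function<void(Buffer&)>;

// Hypothetical backend: stands in for a CUDA or ATI code path.
struct Backend {
    std::string name;
    bool available;     // assumption: a real wrapper would get this from a device query
    ProcessFn process;  // device-specific version of the plug-in's processing
};

// Hypothetical wrapper: uses the first available backend, else the CPU path.
class OffloadWrapper {
public:
    OffloadWrapper(ProcessFn cpu, std::vector<Backend> backends)
        : cpu_(std::move(cpu)), backends_(std::move(backends)) {
        assert(static_cast<bool>(cpu_));  // a CPU fallback is always required
    }

    // Processes the buffer in place and returns the name of the path that ran.
    std::string process(Buffer& buf) const {
        for (const Backend& b : backends_) {
            if (b.available && b.process) {
                b.process(buf);
                return b.name;
            }
        }
        cpu_(buf);
        return "cpu";
    }

private:
    ProcessFn cpu_;
    std::vector<Backend> backends_;
};
```

Keeping the CPU path mandatory means someone with an unsupported card still gets working audio, just without the speed-up.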
Of course, people with the wrong video card would probably buy it, try to run it, and complain...see why Sonar doesn't go there? Even when you plaster warnings all over the web site, people still do it.