I don't think sample-accurate sync is really required at this level. Not even close. They are separate things, running at completely different resolutions. Audio is typically 44,100 samples/sec for most of us; midi sequences require at most a few thousand ticks per sec, and more likely the rate you need is in the hundreds.
If you look at Sonar, I don't think you will find any capability to have a midi event start at a certain sample count from the start of the track. You can only tell a midi event to happen at a BAR:BEAT:TICK position, or you can tell it to start at a SMPTE time, which is a measly 30 frames a sec and usually lower resolution than TICKS. And the BAR:BEAT:TICK resolution will take precedence, whatever it is.
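To put rough numbers on those three resolutions, here is a small sketch. The 120 BPM tempo and the 96/480/960 ticks-per-quarter-note values are just assumptions I picked for illustration, not anything Sonar-specific.

```python
SAMPLE_RATE = 44_100  # audio samples per second
SMPTE_FPS = 30        # SMPTE frames per second


def ticks_per_second(bpm: float, ppq: int) -> float:
    """MIDI ticks per second at a given tempo and ticks-per-quarter-note."""
    return bpm / 60.0 * ppq


def samples_per_tick(bpm: float, ppq: int, sample_rate: int = SAMPLE_RATE) -> float:
    """How many audio samples go by during one MIDI tick."""
    return sample_rate / ticks_per_second(bpm, ppq)


if __name__ == "__main__":
    bpm = 120  # assumed tempo
    for ppq in (96, 480, 960):  # assumed sequencer resolutions
        print(
            f"ppq={ppq}: {ticks_per_second(bpm, ppq):.0f} ticks/sec, "
            f"{samples_per_tick(bpm, ppq):.2f} samples per tick"
        )
    print(f"one SMPTE frame = {SAMPLE_RATE // SMPTE_FPS} samples")
```

Under those assumptions a tick spans dozens to hundreds of samples and a SMPTE frame spans well over a thousand, which is the gap I'm talking about.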
There is absolutely no notion that the midi event will start at a particular sample number, that I am aware of.
There is only a need to
(A) make sure the audio sample rate is stable and
(B) make sure the midi tick rate is stable and
(C) make sure their timebases are synchronized pretty closely, which does not need to be sample accurate either. By timebase, I mean a shared notion of where zero is on the timeline for both streams, so that when you rewind and hit play, they both start playing with stable clocks from the same point in time. Or if you fast forward to a particular point, you can calculate how many samples forward that is and how many midi ticks forward, and then start both clocks at the same time. Each still runs on its own clock at a completely different rate, but both are located as accurately as possible to the same starting place and play back accurately from there without drift.
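The locate calculation in (C) can be sketched like this. It assumes a constant tempo, 4 beats per bar, and made-up defaults (44,100 Hz, 120 BPM, 480 ticks per quarter note); `locate` and `tick_to_bbt` are hypothetical helper names, not anything from a real sequencer.

```python
def locate(seconds: float, sample_rate: int = 44_100,
           bpm: float = 120.0, ppq: int = 480) -> tuple[int, int]:
    """Return (sample_offset, tick_offset) for one position on the shared timeline.

    Both offsets are measured from the same zero point, so the two clocks
    can be started together from the same place.
    """
    sample_offset = round(seconds * sample_rate)
    tick_offset = round(seconds * bpm / 60.0 * ppq)
    return sample_offset, tick_offset


def tick_to_bbt(tick: int, ppq: int = 480, beats_per_bar: int = 4) -> tuple[int, int, int]:
    """Convert an absolute tick count to a 1-based (bar, beat) plus tick within the beat."""
    beat_index, tick_in_beat = divmod(tick, ppq)
    bar_index, beat_in_bar = divmod(beat_index, beats_per_bar)
    return bar_index + 1, beat_in_bar + 1, tick_in_beat


if __name__ == "__main__":
    samples, ticks = locate(10.0)  # jump 10 seconds into the project
    print(samples, ticks, tick_to_bbt(ticks))
```

After the jump, nothing needs to compare the two counters sample by sample; each stream just runs from its own offset on its own clock.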
(A) sample rate is kept stable with "word clock". By the way, audio sampling can experience its own form of jitter too; the thing is, it's happening so fast you don't even realize it. Better quality A/D converters have more stable clocks in them, which is one of many reasons they sound better. You can actually make a mediocre soundcard sound better by feeding it word clock from a more stable word clock master source such as the Apogee Big Ben. But even without word clock, most soundcards have their own internal clock that does a pretty reasonable job of sampling the audio at regular intervals, 44,100 times a second, with low enough audio jitter to satisfy most of us.
(B) midi tick clock rate is more difficult today, since with our DAWs most of us are letting WinXP be the clock, and it's not very accurate. Even at a measly 1000 times a second, WinXP can't do it very well. Most midi interfaces do not have an internal clock or the ability to timestamp events according to their own internal clock. Only a few do.
(C) Once it starts playing, no sample-by-sample synchronization can or should occur; they both just need to run with stable clocks. You could, I suppose, have the sample clock trigger the midi clock in some way, either by having them in the same device or by having a midi device that can listen to word clock...or perhaps ADAT sync? Dunno, but I don't think anything does it today. I'm not actually sure if the MOTUs with timestamping can listen to word clock, but it's kind of overkill to try to force the midi to be so tightly locked to the sample clock, and that overkill costs processing resources.
Nah, all that is really needed is for the midi interface to have its own clock that is very stable and to timestamp each event with it. The problem is how to get a timebase into the midi interface so that the timestamp can then be interpreted by the midi driver later to mean something useful. I guess with ADAT sync you could get a sample-accurate timestamp, but the driver would downgrade it to only a Bar:Beat:Tick resolution anyway in Sonar. Word clock could only be useful to establish a stable clock. So I suppose you could have a midi interface that basically timestamps everything with the same clock rate as the sample rate...and then divide it back down in the midi driver. However, again, how do you get the timebase in there to establish the location in time? ADAT sync is the only way I know of.
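Just to show what "divide it back down in the midi driver" would amount to, here is a hypothetical sketch. No real driver or interface API is being described here; `sample_stamp_to_tick`, the constant 120 BPM tempo, and the 480 ticks per quarter note are all assumptions for illustration.

```python
def sample_stamp_to_tick(sample_stamp: int, sample_rate: int = 44_100,
                         bpm: int = 120, ppq: int = 480) -> int:
    """Reduce a sample-count timestamp to the tick it falls in.

    Integer arithmetic only, so the result is exact:
        ticks = samples * (bpm * ppq) / (60 * sample_rate), rounded down.
    """
    return sample_stamp * bpm * ppq // (60 * sample_rate)


if __name__ == "__main__":
    # Events stamped anywhere from sample 0 to sample 45 all land on tick 0.
    for stamp in (0, 45, 46, 44_100):
        print(stamp, "->", sample_stamp_to_tick(stamp))
```

With these numbers, about 46 different sample timestamps collapse into each tick, so the extra precision of a sample-rate timestamp is thrown away as soon as the sequencer stores the event.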
Anyway, we're waxing theoretical again...not discussing anything that Cakewalk has any control over. The stuff I highlighted in blue above is what I think we should be focusing on.