OK, I disabled all FX (by clicking the FX button) and recorded the synth input again. Same issue; the delay is the same.
Interestingly, I tried each of my hardware synths off the MIDIsport 4x4 interface, and the Waldorf Q seems to be the quickest responder. By "quickest responder" I mean only 1 to 2ms difference!
I have a spare MIDIsport 2x2 available, so I thought I'd give that a try for comparison, to rule out the chance that the older 4x4 interface is slow. After testing and recording again, it seems that the 2x2 interface offers a slight improvement of about 1ms. However, my measurement technique could easily be introducing an error of that size, so I would consider the 2x2 and the 4x4 interfaces the same.
So am I just seeing the combination of the MIDI data rate delay plus the round-trip latency of the ASIO driver and audio interface? If so, is there a way to compensate for this with buffer size, a manual offset, etc.? Have any hardware folks out there experienced this problem?
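To put rough numbers on that, here is a back-of-the-envelope sketch. The MIDI figures come from the MIDI 1.0 serial rate (31,250 bits per second, 10 bits on the wire per byte). The buffer size and sample rate are example values I picked, not my actual settings, and the function names are just made up for illustration. It ignores the synth's own response time, USB transfer time and converter delay, so treat it as a lower bound rather than a real measurement.

```python
# Rough estimate of MIDI wire time and audio buffer latency.
# Assumptions: standard 5-pin MIDI serial rate; example buffer/sample-rate values.

MIDI_BAUD = 31250        # bits per second on a standard MIDI cable
BITS_PER_MIDI_BYTE = 10  # 1 start bit + 8 data bits + 1 stop bit


def midi_message_ms(num_bytes=3):
    """Time to send one MIDI message; a Note On is 3 bytes without running status."""
    return num_bytes * BITS_PER_MIDI_BYTE / MIDI_BAUD * 1000.0


def buffer_ms(buffer_frames, sample_rate):
    """Latency contributed by one audio buffer of the given size."""
    return buffer_frames / sample_rate * 1000.0


def estimated_delay_ms(buffer_frames, sample_rate, buffers_in_path=1):
    """MIDI wire time plus the audio buffers the signal passes through.

    buffers_in_path=1 models recording only (input side);
    buffers_in_path=2 models a full round trip (input plus output).
    """
    return midi_message_ms() + buffers_in_path * buffer_ms(buffer_frames, sample_rate)


if __name__ == "__main__":
    print("Note On wire time: %.2f ms" % midi_message_ms())
    print("One 256-frame buffer at 44.1 kHz: %.2f ms" % buffer_ms(256, 44100))
    print("Estimated round trip: %.2f ms" % estimated_delay_ms(256, 44100, 2))
```

If those assumptions are anywhere near right, the MIDI cable itself accounts for only about 1ms per note, and the audio buffers make up most of the rest.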
Maybe there is just no way around the issue, since no matter what, there will always be a delay somewhere in the electronics and the signal chain. I figure the only way to compensate is to somehow make the MIDI notes fire earlier automatically. However, will this cause bigger issues down the road as projects become more and more complex?
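Just to illustrate what I mean by firing the notes sooner, here is a small sketch of the arithmetic, assuming the delay has already been measured in milliseconds. The tempo, the 480 PPQ resolution and the function names are all hypothetical choices of mine. I would expect a sequencer's own track delay or offset setting to do the same job, but I have not confirmed how mine handles it.

```python
# Sketch: convert a measured delay into sequencer ticks and pull notes earlier.
# Assumptions: constant tempo, note start times given in ticks, 480 PPQ.


def ms_to_ticks(delay_ms, bpm, ppq=480):
    """Convert a delay in milliseconds to the nearest whole tick at a fixed tempo."""
    beats_per_second = bpm / 60.0
    return round(delay_ms / 1000.0 * beats_per_second * ppq)


def shift_notes_earlier(start_ticks, offset_ticks):
    """Move every note start earlier by offset_ticks, never before tick 0."""
    return [max(0, t - offset_ticks) for t in start_ticks]


if __name__ == "__main__":
    offset = ms_to_ticks(5, bpm=120)  # a 5 ms delay at 120 BPM
    print(offset)
    print(shift_notes_earlier([0, 480, 960], offset))
```

One obvious limit: a note sitting right at the start of the project cannot be moved any earlier, so the first downbeat would still be late unless everything starts a bar in.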
Sorry for the complexity of my problems and questions.

Overall, BIG THANKS for the help and conversation thus far.