If you're looking for the lowest possible round-trip latency, you need an audio interface that has a small hidden safety-buffer.
External USB/FireWire units often use a large hidden safety-buffer. While this helps to ensure glitch-free playback (under less than ideal circumstances), it comes at the expense of greater round-trip latency. Other than increasing the sample-rate, there's nothing you can do to mitigate an audio interface's round-trip latency. Units that use DICE-II (Alesis, TC, Focusrite, etc.) tend to use a larger hidden safety-buffer.
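To see why a higher sample-rate helps, here's a rough sketch of how long one buffer lasts at different rates. It assumes the buffer is a fixed number of samples, so its duration shrinks as the rate goes up. The function name buffer_ms is just made up for illustration, and real drivers may behave differently.

```python
def buffer_ms(samples: int, sample_rate_hz: float) -> float:
    """Duration of one buffer in milliseconds (samples / rate)."""
    return samples / sample_rate_hz * 1000.0

# Same 64-sample buffer at three common sample rates.
for rate in (44100, 48000, 96000):
    print(f"{rate} Hz: {buffer_ms(64, rate):.2f} ms per 64-sample buffer")
```

The same reasoning would apply to a hidden safety-buffer if it is also sized in samples, which is an assumption here, not something the drivers document.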
In an external unit:
RME, MOTU, and Presonus Audiobox VSL series would all fit the bill.
The RME USB units offer round-trip latency of 4.9ms at a 48-sample ASIO buffer size/44.1kHz.
MOTU and Presonus units would be slightly above this.
The Octacapture and siblings would be slightly above the MOTU and Presonus units (7.4ms)
With PCI/e units:
Most offer round-trip latency of ~5ms at a 64-sample ASIO buffer size/44.1kHz.
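If you want a feel for how much of those figures is "hidden", here's a back-of-the-envelope sketch. It assumes the nominal round trip is one input buffer plus one output buffer at the ASIO buffer size, and treats everything above that (safety-buffers, converter and driver delay) as overhead. The function names are hypothetical, and the split is only an estimate.

```python
def nominal_round_trip_ms(buffer_samples: int, sample_rate_hz: float) -> float:
    """One input buffer plus one output buffer, in milliseconds (assumed model)."""
    return 2 * buffer_samples / sample_rate_hz * 1000.0

def hidden_overhead_ms(reported_ms: float, buffer_samples: int,
                       sample_rate_hz: float) -> float:
    """Reported round-trip latency minus the nominal two-buffer figure."""
    return reported_ms - nominal_round_trip_ms(buffer_samples, sample_rate_hz)

# Figures quoted above: RME USB (4.9 ms at 48 samples), PCI/e (~5 ms at 64 samples).
print(f"RME USB overhead: {hidden_overhead_ms(4.9, 48, 44100):.2f} ms")
print(f"PCI/e overhead:   {hidden_overhead_ms(5.0, 64, 44100):.2f} ms")
```

Under that assumption, the RME USB figure leaves roughly 2.7ms beyond the two ASIO buffers, and the PCI/e figure roughly 2.1ms.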