I've tried it with and without the 64-bit engine; no change.
My "sound card" is the Focusrite Scarlett 18i20. It defaults to a 4 ms latency. I've also adjusted that to as low as 1 ms, with no change.
ASIO Reported Latencies (includes buffers and hardware latencies)
Input = 8.9msec 394 samples
Output = 12.9msec 571 samples
Round Trip = 21.9msec 965 samples
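
As a sanity check on those figures, here is a small sketch that converts the reported sample counts back to milliseconds. The 44.1 kHz sample rate is an assumption (the readout doesn't state it), but it is the rate at which the sample counts match the reported times. The helper name `samples_to_ms` is made up for illustration.

```python
# Assumption: the project runs at 44.1 kHz; the readout does not state the rate.
SAMPLE_RATE_HZ = 44_100


def samples_to_ms(samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Convert a latency given in samples to milliseconds."""
    return samples / sample_rate_hz * 1000.0


# Sample counts copied from the ASIO readout above.
reported = {"Input": 394, "Output": 571, "Round Trip": 965}

latencies_ms = {name: samples_to_ms(n) for name, n in reported.items()}

for name, ms in latencies_ms.items():
    print(f"{name}: {ms:.2f} ms")

# The round trip is simply input plus output, counted in samples.
round_trip_matches = reported["Input"] + reported["Output"] == reported["Round Trip"]
```

If the assumption holds, the computed values round to the 8.9, 12.9, and 21.9 msec shown in the readout, and the round-trip figure is just the input and output counts added together.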