I missed the 32-sample ASIO buffer size when I first responded.
The problem is trying to run heavy loads at a 32-sample ASIO buffer size.

With buffer sizes that low (about 0.73 ms at 44.1 kHz), there's no margin for anything less than ideal circumstances.
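For anyone who wants to check the arithmetic, here's a quick sketch of how buffer size translates to time per buffer. The function name is just something I made up for illustration, and this only covers the nominal buffer period, not driver safety buffers or converter latency.

```python
def buffer_period_ms(buffer_samples: int, sample_rate_hz: float) -> float:
    """Time (in ms) the system has to fill one ASIO buffer."""
    return buffer_samples / sample_rate_hz * 1000.0


# Compare a few common ASIO buffer sizes at 44.1 kHz.
for size in (32, 64, 128, 256):
    print(f"{size:>3} samples @ 44.1 kHz = {buffer_period_ms(size, 44100):.2f} ms")
```

At 32 samples, every plugin, driver, and disk read has to finish its work inside roughly three quarters of a millisecond, every single time.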
Low, consistent DPC latency (not just acceptable) is a must.
BFD-3 is a wonderful drum plugin, but its disk streaming isn't the most efficient (compared to Kontakt).
If you've got a fast ride cymbal passage (with lots of ghost notes), polyphony can pile up fast.
With several disk-streaming sample libraries, you could easily exceed the capability of a 7200RPM "Samples" HD.
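To get a feel for how polyphony adds up, here's a rough back-of-envelope sketch. The voice counts, stereo channels, and 24-bit depth are placeholder assumptions on my part, not measured BFD-3 figures, and it assumes every voice is streaming from disk at once (no RAM preload).

```python
def streaming_rate_mb_per_s(voices: int, channels_per_voice: int,
                            bit_depth: int, sample_rate_hz: int) -> float:
    """Raw data rate (MB/s) needed if every voice streams from disk."""
    bytes_per_sample = bit_depth // 8
    bytes_per_second = voices * channels_per_voice * bytes_per_sample * sample_rate_hz
    return bytes_per_second / 1_000_000


# Hypothetical voice counts; real numbers depend on the library and the part.
for voices in (32, 128, 512):
    rate = streaming_rate_mb_per_s(voices, channels_per_voice=2,
                                   bit_depth=24, sample_rate_hz=44100)
    print(f"{voices:>3} voices: about {rate:.1f} MB/s")
```

Keep in mind those reads are scattered across many sample files rather than one sequential stream, which is where a spinning drive tends to struggle most.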
The first thing I'd do is get the BFD-3 library onto an SSD.
The MR816 is a nice audio interface, but it uses a large hidden safety-buffer.
With a smaller safety-buffer, you'd be able to (more comfortably) run it at a 64-sample ASIO buffer size.
For running heavy loads at a 32-sample ASIO buffer size, you'd do better with a PCIe audio interface (RME, MOTU, Lynx).
A recent, well-configured machine would allow running heavier loads at a 32-sample ASIO buffer size.
I don't want to discourage the OP. Effectively working at a 32-sample ASIO buffer size is demanding.
The solution isn't simple, as it involves every facet of the system.
Any weak link throws a monkey wrench into the works.