I was curious, since I have never seen the Scarlett version. The numbers you are getting seem higher than I would expect. Have you tried the lowest setting (1msec) to see if you get any pops/crackles/dropouts?
As far as the "math" goes, the easiest way to remember things is with units. Sample rate is in samples/second and your buffer is in milliseconds, so if you convert the buffer time to seconds, the units cancel out and make sense.
Example: 44.1kHz = 44100 samples/sec, and a buffer of 1msec = .001 seconds. Multiplying them gives units of "samples * seconds / seconds," and the "seconds" cancel top and bottom, leaving just "samples." So in 1msec at 44.1kHz you would see 44100 samples/second * .001 second = 44.1 samples.
Similarly, 96000 samples/second * .0112 seconds (the 11.2msec RT you mentioned above) = 1075.2 samples (which you said was 1078, close enough since the RT is rounded off to 11.2).
Sample rate * time = samples (in that time)
Samples / sample rate = time
Samples / time = sample rate (this one is easy: samples/seconds = samples/second)
Just be sure the units used all match.
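If it helps to see it as code, here is a rough Python sketch of the same three relationships. The function names are just ones I made up for illustration, and it assumes buffer times are given in milliseconds, so each function converts to or from seconds before the units cancel.

```python
# Hypothetical helpers for illustration only; names are not from any audio library.

def samples_in(sample_rate_hz, time_ms):
    """Sample rate * time = samples. Time is given in msec, so divide by 1000."""
    return sample_rate_hz * time_ms / 1000.0

def time_ms_for(samples, sample_rate_hz):
    """Samples / sample rate = time (seconds), then convert to msec."""
    return samples / sample_rate_hz * 1000.0

def sample_rate_for(samples, time_ms):
    """Samples / time = sample rate. Time is converted from msec to seconds."""
    return samples / (time_ms / 1000.0)

# 1 msec buffer at 44.1kHz
print(f"{samples_in(44100, 1):.1f} samples")
# 11.2 msec round trip at 96kHz
print(f"{samples_in(96000, 11.2):.1f} samples")
# Going the other way: 1078 samples at 96kHz comes out to about 11.23 msec
print(f"{time_ms_for(1078, 96000):.2f} msec")
```

The last line shows why 1078 samples and 11.2msec agree: the time is just being displayed rounded to one decimal place.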