For audio, there is no reason to use dither when the output is greater than ~16 bits; it will simply never make an audible difference. The rule about always dithering when reducing resolution is a general rule about preserving as much data as possible, but it's irrelevant when the data being discarded is already noise, or is never going to get used because it's completely inaudible and buried in analog noise anyway.
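That "buried in analog noise" point can be sanity-checked with a quick back-of-envelope sketch (not from the post, just the textbook ideal-quantizer formula, assuming digital full scale = ±1.0):

```python
import math

def quantization_noise_floor_db(bits):
    # RMS noise of an ideal uniform quantizer, relative to full scale (±1.0):
    # step size q = 2 / 2**bits, noise RMS = q / sqrt(12)
    q = 2.0 / (1 << bits)
    return 20 * math.log10(q / math.sqrt(12))

for bits in (16, 24):
    print(f"{bits}-bit noise floor: {quantization_noise_floor_db(bits):.1f} dBFS")
```

The 16-bit floor lands around -101 dBFS and the 24-bit floor around -149 dBFS, far below the analog noise floor of any real playback chain, which is the point above.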
TheMaartian
32-bit float (23-bit fraction, 8-bit exponent, 1-bit sign) to 24-bit integer seems like an up-sample to me.
32-bit float = 23 fraction bits + 1 implied bit (for non-subnormals, but subnormals are irrelevant for audio*) + 1 sign bit + 8 exponent bits.
24-bit integer = 24 bits of resolution, including 1 sign bit, but you only get all 24 bits for individual samples within ~6 dB of full scale.
In floating point you always (except for subnormals*) get the full resolution because the exponent scales things. IOW, you always get 25 bits of resolution with 32-bit floating point vs. 24 bits only for the loudest samples with 24-bit fixed point.
* I only mention subnormals so that someone doesn't insist on needlessly correcting me on a technical subtlety that doesn't matter at all. If you don't already know what they are it means you don't need to.
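The resolution difference above can be demonstrated numerically. A sketch, assuming the conventional ±1.0 full-scale mapping; `quantize_int24` and `roundtrip_float32` are hypothetical helpers, not from any audio library:

```python
import struct

FULL_SCALE = 1 << 23  # 24-bit signed integer: ±8388608 steps map to ±1.0

def quantize_int24(x):
    # Round to the nearest 24-bit integer step, then back to float
    n = max(-FULL_SCALE, min(FULL_SCALE - 1, round(x * FULL_SCALE)))
    return n / FULL_SCALE

def roundtrip_float32(x):
    # Round-trip a Python float (double) through IEEE-754 single precision
    return struct.unpack('<f', struct.pack('<f', x))[0]

for level_db in (0, -60, -120):
    x = (10 ** (level_db / 20)) * 0.123456789  # arbitrary sample at that level
    rel_int24 = abs(quantize_int24(x) - x) / x
    rel_f32 = abs(roundtrip_float32(x) - x) / x
    print(f"{level_db:5d} dB: int24 rel. error {rel_int24:.1e}, "
          f"float32 rel. error {rel_f32:.1e}")
```

The float32 relative error stays at roughly the 24-bit-mantissa level (~1e-7) no matter how quiet the sample is, while the 24-bit integer's relative error grows as the signal drops, which is exactly the "full resolution only near full scale" point.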