I will try to be short...
16 and 24 bit refer to fixed-point number representation.
32 and 64 bit refer to floating-point number representation.
Let's say you only have 3 digits with fixed point. In the range 0-1 you can use: 0.000, 0.001, 0.002, ... 0.999.
Now let's say you have floating point with 3 digits plus 1 extra digit for a power of 10. In the same range (0-1) you can use:
0.001 * 10^-9 = 0.000000000001, 0.002 * 10^-9 = 0.000000000002, ... 0.999 * 10^0 = 0.999.
In the second case we always have 3 digits of precision, independent of how small the number is. In the first case, we lose precision as the numbers get smaller; e.g. we cannot represent anything between 0.001 and 0.002.
But we pay for that with the extra "power of 10" digit.
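As a sketch of the example above (the helper names are mine, not standard):

```python
import math

def fixed_3(x):
    """3-digit decimal fixed point: round to the nearest multiple of 0.001."""
    return round(x * 1000) / 1000

def float_3(x):
    """3 significant decimal digits plus a power-of-10 exponent."""
    if x == 0:
        return 0.0
    exp = math.floor(math.log10(abs(x)))   # the extra "power of 10" digit
    mant = round(x / 10**exp, 2)           # keep 3 significant digits
    return mant * 10**exp

print(fixed_3(0.0012))   # 0.001 -- falls between steps, precision lost
print(float_3(0.0012))   # ~0.0012 -- the exponent shifts, 3 digits kept
```

The fixed-point version has one absolute step size (0.001); the floating-point version keeps the same *relative* precision no matter how small the number is.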
A 32-bit floating-point number has 24 bits of precision and 8 bits for the power (of 2). A 24-bit fixed-point number also has 24 bits of precision, but the power is always zero.
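You can see that layout directly, assuming standard IEEE-754 single precision (which is what practically all DAWs and CPUs use):

```python
import struct

def float32_fields(x):
    """Split a number's 32-bit float encoding into sign / exponent / mantissa."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign     = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8 bits: the "power of 2" (biased by 127)
    mantissa = bits & 0x7FFFFF       # 23 stored bits (+1 implicit = 24 bits precision)
    return sign, exponent, mantissa

print(float32_fields(1.0))   # (0, 127, 0)
print(float32_fields(0.5))   # (0, 126, 0) -- same mantissa, smaller power of 2
```

Note that halving the value only changed the exponent, not the mantissa: that is exactly why lowering the volume in floating point costs no precision.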
AD/DA converters work in "fixed point" mode and have ~20-bit precision. So by
recording into 24-bit fixed point you do not lose anything; in fact you save ~3-4 bits of garbage. Saving more bits or using 32-bit floating point does not improve the quality, it just saves more garbage bits.
With a chain of analog equipment, try lowering the volume by -50dB and then amplifying by +50dB. You can easily notice the difference compared to the original signal. Because the noise level is absolute, if you push the signal down toward the noise floor and then amplify it, you amplify the noise as well.
The same thing (but with "digital noise") happens if you process the signal with fixed-point numbers. The digital noise floor is (again, absolute) -96dB for 16 bits and -144dB for 24 bits. At -50dB in 16 bits, you will "bit-crush" the sound.
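Those noise-floor figures are just the ~6.02 dB-per-bit rule of thumb (ignoring the extra +1.76 dB term for a full-scale sine):

```python
# Fixed-point quantization noise floor: roughly 6.02 dB per bit.
for bits in (16, 20, 24):
    print(f"{bits} bits: about -{6.02 * bits:.0f} dB")
# 16 bits: about -96 dB
# 20 bits: about -120 dB
# 24 bits: about -144 dB
```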
Now try the same experiment within Sonar. Lower the output of a track by -50dB and then "amplify it" back (e.g. with several buses at +6dB). You will notice NO signal degradation. Why? Because DAWs use floating-point numbers internally. So when you lower by -50dB you still preserve the original precision (you are just changing the "power of 2").
But if, after lowering by -50dB, you render the track into 16/24-bit fixed point, that precision is LOST. To avoid this, use
32-bit (float) files for intermediate processing formats.
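The -50dB experiment is easy to reproduce numerically. A sketch, where quantizing to 16-bit steps stands in for the "render to fixed point" step and float64 stands in for the DAW's internal engine:

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.uniform(-1.0, 1.0, 48000)     # one second of test signal

att = 10 ** (-50 / 20)                     # -50 dB as a linear gain factor

# Path 1: stay in floating point the whole time.
fp = (signal * att) / att
print(np.max(np.abs(fp - signal)))         # ~1e-16: no audible degradation

# Path 2: "render" the attenuated signal to 16-bit fixed point, then amplify.
q = np.round(signal * att * 32767) / 32767 # snap to 16-bit quantization steps
fx = q / att
print(np.max(np.abs(fx - signal)))         # ~5e-3: the low bits are gone
```

In path 2 the signal only spans about 100 of the 32767 available steps while attenuated, so amplifying it back also amplifies a quantization error roughly 50dB above the 16-bit noise floor.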
The push to use 64 bits instead of 32 (still floating point) comes from the fact that many plug-ins are written by musicians, not mathematicians. With complicated algorithms, keeping the accumulated calculation error under one bit is an enormous job. From the beginning, PCs preferred to simply calculate with more bits rather than deal with that problem (the x87 co-processor worked internally in an 80-bit format, while the result was normally converted back to 64 or even 32 bits). But that primarily makes sense for calculations, not for exporting. So 64-bit export is an almost paranoid "safe side" choice (even for a top studio).
How can it be that people "clearly hear" badly dithered 24-bit files (such claims are easy to google), while the equipment completely ignores everything above 20-21 bits? Simple: they DIGITALLY amplify the fixed-point signal, so they "bit-crush" it the way I described above. If the original signal is not yet mastered (not maximized), there is "free space" for such amplification (especially in the almost silent places), where the precision is nowhere near 24 bits. So it has nothing to do with "golden ears"; it is pure cheating. But that "cheating" is exactly what happens during the next processing step of not-yet-finalized files when they are saved in a 24-bit format (they do not really have 24 bits of precision).
For the final 16-bit format, correct dithering is a real thing. At the loudest places the theoretical SNR is 96dB, so the error is already technically reproducible. In quieter passages the effective precision falls fast, and so the "noise"/"distortion" from incorrectly dithered material becomes audible even to noobs, without any "cheating".
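A sketch of why dither matters for quiet material. With a sine below half a quantization step, plain rounding erases the signal completely; adding TPDF dither (two uniform random values, a standard choice) before rounding trades that hard truncation for a constant, benign noise floor through which the signal survives:

```python
import numpy as np

rng = np.random.default_rng(1)
step = 1 / 32767                               # one 16-bit quantization step
t = np.arange(48000) / 48000
x = 0.4 * step * np.sin(2 * np.pi * 440 * t)   # sine quieter than half a step

# No dither: every sample rounds to zero -- the signal vanishes entirely.
plain = np.round(x / step) * step

# TPDF dither (+-1 LSB triangular) added before rounding.
dither = (rng.random(x.size) - rng.random(x.size)) * step
dithered = np.round((x + dither) / step) * step

print(np.max(np.abs(plain)))                   # 0.0 -- the sine is gone
print(np.corrcoef(x, dithered)[0, 1])          # clearly positive: sine survives
```

The dithered output is noisier sample-by-sample, but the sine is still there (audible, and recoverable by averaging), whereas without dither it is unrecoverably truncated away.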