One explanation I've read that makes sense to me:
As already said, converting to a format with fewer quantization steps introduces artifacts: many 24-bit values become one 16-bit value (256 of them, since 8 bits are dropped).
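To make the "many values become one" part concrete, here is a minimal sketch. It assumes the 24-bit to 16-bit conversion simply drops the lowest 8 bits with round-to-nearest and no dither; the function name `to_16bit` is hypothetical, just for illustration:

```python
def to_16bit(sample_24: int) -> int:
    """Requantize a signed 24-bit sample to 16 bits (hypothetical helper, no dither)."""
    # One 16-bit step spans 2**8 = 256 steps of the 24-bit grid.
    step = 1 << 8
    # Round to nearest: add half a step, then floor-divide (also correct for negatives).
    q = (sample_24 + step // 2) // step
    # Clamp to the signed 16-bit range.
    return max(-32768, min(32767, q))

# Count how many distinct 24-bit values end up as the single 16-bit value 100.
collapsed = [s for s in range(100 * 256 - 512, 100 * 256 + 512) if to_16bit(s) == 100]
print(len(collapsed))  # 256
```

Every one of those 256 inputs comes out identical, so the fine detail between them is lost and replaced by a rounding error.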
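Our ears and brains pick out patterns in anything that keeps happening at the same spot, such as a particular quantization step. With a quiet or slowly changing signal, the rounding error repeats in step with the signal, so it is heard as distortion rather than as neutral noise.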
Adding a little noise (dither) before rounding puts some randomness at exactly that spot: sometimes the lowest bit comes out as a ONE, sometimes as a ZERO.
So there is less for the brain to make patterns from.
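A rough sketch of that idea, under a few assumptions: the post only says "adding noise", so this uses TPDF (triangular) dither, a common choice but not the only one, and the names `STEP` and `requantize` are hypothetical. A steady 24-bit value sitting about 0.3 of a 16-bit step above a step is requantized many times, with and without dither:

```python
import math
import random

STEP = 1 << 8  # 256 units of 24-bit resolution per 16-bit step (assumed plain bit shift)

def requantize(sample_24: int, dither: bool, rng: random.Random) -> int:
    """Round a 24-bit sample to the nearest 16-bit step, optionally with TPDF dither."""
    x = sample_24 / STEP  # value measured in 16-bit steps (fractional)
    if dither:
        # TPDF dither: sum of two independent uniform values of +-0.5 step each,
        # giving a triangular distribution spanning +-1 step.
        x += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return math.floor(x + 0.5)  # round to nearest 16-bit step

rng = random.Random(0)
sample = 100 * STEP + 77  # steady value about 0.30 step above 16-bit step 100
plain = [requantize(sample, False, rng) for _ in range(10000)]
dithered = [requantize(sample, True, rng) for _ in range(10000)]

print(set(plain))                     # {100}: same output every time, a fixed error
print(sum(dithered) / len(dithered))  # close to 100.30
```

Without dither the output is stuck at 100 and the error is the same every sample, which is exactly the kind of repeating pattern described above. With dither the lowest bit flips back and forth, and on average the output tracks the true value instead of a fixed wrong one.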
In short, it sounds better to our ears.