The short technical answer is this:
When you quantize a signal to a fixed number of values/steps, each sample ends up with a difference between its actual value and its quantized value, known as quantization error. The more bits you have, the smaller the quantization error.
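If you want to see that in numbers, here's a quick Python sketch (just an illustration, not from any particular library) that quantizes a sine wave at a few bit depths and prints the worst-case error:

```python
import numpy as np

def quantize(x, bits):
    """Round samples in [-1, 1] to 2**bits evenly spaced levels."""
    step = 2.0 / 2**bits              # size of one quantization step
    return np.round(x / step) * step

fs = 48000
t = np.arange(fs) / fs
signal = 0.5 * np.sin(2 * np.pi * 1000 * t)    # a 1 kHz sine at half scale

for bits in (8, 16, 24):
    err = quantize(signal, bits) - signal      # per-sample quantization error
    print(f"{bits:2d} bits: worst-case error = {np.max(np.abs(err)):.1e}")
```

More bits means a smaller step, and the worst-case error is always about half a step, which is why each extra bit buys you roughly 6 dB less error.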
Generally speaking, quantization error sounds like a weird, nasty distortion (assuming you can hear it at all).
But if you add just enough random noise (this is the dither) before quantizing or reducing bit depth, that weird, nasty distortion turns into plain noise, which is much less annoying (again, assuming you can hear it at all).
Dither also allows signals smaller than one quantization step (one LSB) to be preserved, and some of that signal remains audible below the noise floor.
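Here's a rough sketch of both points, assuming simple TPDF dither spanning ±1 LSB (my own toy example, not any particular converter's behaviour): a tone sitting well below one quantization step vanishes completely without dither, but with dither it survives, buried in the noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, step):
    return np.round(x / step) * step

def dither_quantize(x, step):
    # TPDF dither: sum of two uniform noises, peaking at +/- one step (1 LSB)
    d = (rng.uniform(-0.5, 0.5, x.shape) + rng.uniform(-0.5, 0.5, x.shape)) * step
    return quantize(x + d, step)

fs = 48000
t = np.arange(fs) / fs
step = 2.0 / 2**16                              # one 16-bit step
tone = np.sin(2 * np.pi * 1000 * t)
tiny = 0.25 * step * tone                       # a tone well below one step

plain = quantize(tiny, step)                    # rounds every sample to zero
dithered = dither_quantize(tiny, step)          # noisy, but the tone survives

print("undithered output is silence:", np.all(plain == 0))
print("dithered output still carries the tone (correlation):",
      round(np.corrcoef(dithered, tone)[0, 1], 2))
```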
Different types of dither just differ in the statistical character of the dither noise and in the tricks used to make that noise less audible while still doing its job.
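One common trick is noise shaping: the quantization error is fed back into the next sample so the noise piles up at high frequencies, where our hearing is least sensitive. A very rough first-order sketch (not any particular product's shaping curve) looks like this:

```python
import numpy as np

def noise_shaped_quantize(x, step, rng=None):
    """TPDF dither plus first-order error feedback (simple noise shaping)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    out = np.empty_like(x)
    e = 0.0                              # error carried over from the previous sample
    for n in range(len(x)):
        d = (rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)) * step   # TPDF dither
        w = x[n] - e                     # subtract the previous error before quantizing
        y = np.round((w + d) / step) * step
        e = y - w                        # this error gets pushed onto the next sample
        out[n] = y
    return out
```

The total noise power doesn't go down; it just gets moved toward the top of the spectrum so it's harder to hear.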