The direct sound and the room reflections add together at different levels and phases, simple as that.
A filter's impulse response in the time domain and its frequency and phase response in the frequency domain are two views of the same thing (one is the Fourier transform of the other), so the room's IR can be thought of as an FIR filter that modifies the frequency and phase response at the listening position. All an FIR filter does is convolve the signal with the filter's IR, just like (in theory) a convolution reverb does with its IR.
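To make that concrete, here is a rough sketch in plain Python. The "room" is a made-up toy, not a measurement: one direct arrival plus a single reflection 4 samples later at half the level. The names `convolve`, `dft` and `room_ir` are just for illustration.

```python
import cmath

def convolve(signal, ir):
    """Direct-form FIR filtering: every input sample launches a scaled copy of the IR."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

def dft(x):
    """Naive discrete Fourier transform, fine for a handful of samples."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Assumed toy room: direct sound at sample 0, one reflection at sample 4, gain 0.5.
room_ir = [1.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]

# A single click through the "room" comes back as the IR itself.
heard = convolve([1.0], room_ir)

# The same IR seen in the frequency domain: magnitude and phase per bin.
response = dft(room_ir)
magnitudes = [abs(h) for h in response]
phases = [cmath.phase(h) for h in response]
```

In this toy case the magnitudes alternate between 1.5 (reflection arrives in phase and adds) and 0.5 (reflection arrives out of phase and cancels), which is the comb filtering you get from one strong reflection.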
So if you think of the room as a filter, you just need another filter to undo its effects at the listening position without screwing things up too much in the process.
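As a sketch of what "another filter to undo its effects" could look like, here is the same kind of toy room (direct sound plus one reflection at half level, an assumption for illustration) with a truncated inverse filter. This only works so neatly because the toy IR is trivially invertible; measured room IRs are generally not this well behaved, and the function name `truncated_inverse` is made up for the example.

```python
def convolve(signal, ir):
    """Direct-form FIR filtering, same idea as a convolution reverb."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

def truncated_inverse(delay, gain, terms):
    """FIR approximation of 1 / (1 + gain * z^-delay).

    The exact inverse is an infinite series of alternating echoes
    (-gain)^n at n * delay samples; we keep only the first `terms` of them.
    """
    inv = [0.0] * (delay * (terms - 1) + 1)
    for n in range(terms):
        inv[delay * n] = (-gain) ** n
    return inv

delay, gain = 4, 0.5  # assumed toy reflection, not measured data
room_ir = [1.0] + [0.0] * (delay - 1) + [gain]

correction = truncated_inverse(delay, gain, terms=6)

# Room followed by correction: ideally a single clean impulse.
combined = convolve(room_ir, correction)

# Whatever is left after the first sample is the error the truncation leaves behind.
residual = max(abs(v) for v in combined[1:])
```

Here `combined` is a clean impulse followed by one small leftover echo of level 0.5^6 (about 0.016) at sample 24. More terms push that leftover lower and later, which is the trade-off: a longer correction filter for a cleaner result, and only at the one position the IR describes.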