The whole thing sounds very distorted, like it was recorded too "hot". I'm also pretty sure there are some short bursts of harsh digital clipping in places where they overloaded the analogue-to-digital converter when recording it. The voice-over is pretty distorted and harsh as well, and as for the mix of voice and backing music...
I'm not sure I can hear anything I'd think of as "recording booth noise" though.
Was this recorded as speech only, with the music mixed in afterwards, or by playing the music loudly in the booth while the voice was recorded, by any chance? Either way, I'd start by EQing to make the whole thing less bright, harsh and in-your-face.
Actually, no, I'd tell them to go back and do it again: keep the recording levels lower, and keep the individual tracks so a proper mix can be done.
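
If you'd rather confirm the digital clipping than trust my ears, here's a rough sketch of one way to look for it: scan for runs of consecutive samples stuck at (or very near) full scale. It assumes the audio has already been loaded as a list of floats normalised to the range -1.0 to 1.0; the function name, the 0.999 threshold and the 3-sample minimum run are just my own illustrative choices, not any kind of standard.

```python
def find_clipped_runs(samples, threshold=0.999, min_run=3):
    """Return (start_index, length) for each run of at least `min_run`
    consecutive samples whose absolute value is >= `threshold`.

    Assumes `samples` are floats normalised to -1.0..1.0.
    `threshold` and `min_run` are illustrative guesses; tune to taste.
    """
    runs = []
    start = None  # index where the current near-full-scale run began
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
        else:
            # Run just ended; keep it only if it was long enough.
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # Handle a run that reaches the very end of the audio.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - start))
    return runs


if __name__ == "__main__":
    demo = [0.1, 0.5, 1.0, 1.0, 1.0, 1.0, 0.3, -1.0, -1.0, 0.2, -1.0, -1.0, -1.0]
    for start, length in find_clipped_runs(demo):
        print(f"possible clipping: {length} samples starting at index {start}")
```

A single sample touching full scale is normal enough, which is why short runs are ignored; long flat-topped runs are the ones that sound like those harsh bursts. It won't tell you anything about distortion that happened in the analogue stage before the converter, though.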