If you're bringing up the music purely for emotional effect, then simple volume automation is the way it's normally done: long, smooth volume increases that start just before the dialog ends. Listen closely to any big-budget movie or TV show and you'll hear it being done. It's rarely done automatically with a dynamics processor, though, except in radio commercials that get knocked out by the dozen each day.
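To make "long, smooth volume increase" concrete, here is a minimal sketch of what one hand-drawn automation move amounts to. The function names, the raised-cosine curve shape, and the dB values are my own illustrative assumptions, not a standard; in practice you'd just draw the curve in your DAW.

```python
import math


def ramp_gain_db(t, start, length, floor_db, target_db):
    """Gain in dB at time t (seconds) for a ramp from floor_db to target_db.

    Illustrative assumption: a raised-cosine shape, which has zero slope at
    both ends, so the move has no audible "corner" where it starts or stops.
    """
    if t <= start:
        return floor_db
    if t >= start + length:
        return target_db
    x = (t - start) / length                 # progress through the ramp, 0.0 -> 1.0
    s = 0.5 - 0.5 * math.cos(math.pi * x)    # smooth 0 -> 1 curve
    return floor_db + (target_db - floor_db) * s


def db_to_linear(db):
    """Convert a dB gain to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


def apply_ramp(samples, sample_rate, start, length, floor_db, target_db):
    """Apply the automation curve to a mono list of samples."""
    out = []
    for i, x in enumerate(samples):
        t = i / sample_rate
        out.append(x * db_to_linear(ramp_gain_db(t, start, length, floor_db, target_db)))
    return out
```

For example, if the last line of dialog ends at 12.0 s, you might start the move at 11.5 s and bring the music from -18 dB up to -6 dB over 3 s with `apply_ramp(music, 48000, 11.5, 3.0, -18.0, -6.0)`. Those numbers are only placeholders; the real start point, length, and levels are judged by ear for each scene.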
When you do listen closely to movies, you'll notice that the music is NOT brought up during every pause in the dialog. Doing so would draw the audience's attention to the music, when you probably want it to stay subliminal. That's why it has to be done by hand, not with a sidechained gate or compressor.
If, on the other hand, you do want to duck the music to improve the dialog's clarity, then that's another thing entirely. In that case, you want to duck only the frequencies that mask the dialog, rather than running the whole music track through a broadband gate or compressor. Otherwise, it'll sound like a radio commercial. For that, you'd be on the right track with a multi-band compressor, but there are even more elegant and transparent methods.
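To illustrate the difference from broadband ducking, here is a rough sketch of sidechained, band-limited ducking, which is roughly the idea behind using a multi-band compressor keyed from the dialog. It's deliberately crude and every specific is an assumption of mine: the 1 to 4 kHz "masking" band, the one-pole filters used as a band split, the attack/release times, the threshold, and the -6 dB maximum cut. A real multi-band compressor or dynamic EQ would use proper crossover filters and a real gain computer.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Simple one-pole low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y, out = 0.0, []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


def envelope(samples, sample_rate, attack_s=0.01, release_s=0.25):
    """Peak follower on the dialog (the sidechain), with separate attack/release smoothing."""
    att = math.exp(-1.0 / (attack_s * sample_rate))
    rel = math.exp(-1.0 / (release_s * sample_rate))
    env, out = 0.0, []
    for x in samples:
        level = abs(x)
        coeff = att if level > env else rel
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out


def duck_band(music, dialog, sample_rate, low_hz=1000.0, high_hz=4000.0,
              threshold=0.05, max_cut_db=-6.0):
    """Attenuate only one band of the music while the dialog is present.

    All defaults are hypothetical placeholders, not recommended settings.
    """
    # Crude band split: band = LP(high_hz) - LP(low_hz).
    # Everything outside the band passes through untouched.
    band = [h - l for h, l in zip(one_pole_lowpass(music, high_hz, sample_rate),
                                  one_pole_lowpass(music, low_hz, sample_rate))]
    env = envelope(dialog, sample_rate)
    max_cut = 10.0 ** (max_cut_db / 20.0)
    out = []
    for m, b, e in zip(music, band, env):
        # Band gain: 1.0 with no dialog, down to max_cut once dialog reaches the threshold.
        amount = min(e / threshold, 1.0)
        g = 1.0 - amount * (1.0 - max_cut)
        # Swap the band for an attenuated copy of itself: m - b + g * b.
        out.append(m - (1.0 - g) * b)
    return out
```

With no dialog, the music comes out unchanged; with dialog present, a tone in the assumed band is pulled down by a few dB while low-frequency content barely moves. That's the whole point: the bass and the top end keep their level, so the duck is much harder to notice than a broadband one.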