Thank you! I have been messing around with it a bit more and found a couple more nuances that would be nice to have in the documentation (or maybe they are there, and I just cannot find them).
Using the setup from the OP, the song track is the Dry mix and the voiceover is the Wet mix. With side chaining enabled, it appears the threshold is triggered by the voiceover (wet mix), but the compression acts on the song (dry mix). The two knobs that were messing me up were Make Up (compression compensation) and Dry/Wet. It was not until I took those to extremes that I could understand what the "guts" were doing. The 4:1 ratio and 500 ms release were perfect for what I was doing, thanks SuperG.
The Make Up was what was doing me in at first, since it was defeating the point of ducking in the first place (and auto makes it worse!!). The Dry/Wet comes in very handy to tailor the mix, and for "strict ducking" it works best at settings above 70%.
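For anyone who wants to see this behavior outside the plugin, here is a rough sketch of a sidechain ducker in plain Python. It is not the plugin's actual code. The function name `duck`, its parameters, and the 10 ms attack are all made up for illustration, and I am assuming the Dry/Wet knob blends the untouched song with the ducked song, which may not be exactly what the plugin does. The idea is that the voiceover level drives the gain reduction and the song is what gets turned down.

```python
import math

def duck(song, voice, sample_rate=44100, threshold_db=-30.0, ratio=4.0,
         attack_ms=10.0, release_ms=500.0, makeup_db=0.0, mix=1.0):
    """Turn the song down whenever the voice rises above the threshold.

    song, voice: equal-length lists of float samples in the range -1..1.
    mix: 0.0 returns the untouched song, 1.0 returns the fully ducked song
         (an assumption about how a Dry/Wet knob behaves).
    """
    # One-pole smoothing coefficients for the level detector.
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    makeup = 10.0 ** (makeup_db / 20.0)

    envelope = 0.0
    out = []
    for s, v in zip(song, voice):
        # The detector listens to the voice (sidechain), not the song.
        level = abs(v)
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level

        # Amount by which the voice exceeds the threshold, in dB.
        level_db = 20.0 * math.log10(max(envelope, 1e-9))
        over_db = max(0.0, level_db - threshold_db)

        # At 4:1, every 4 dB over the threshold is cut down to 1 dB.
        reduction_db = over_db * (1.0 - 1.0 / ratio)
        gain = 10.0 ** (-reduction_db / 20.0) * makeup

        # The gain is applied to the song, then blended with the raw song.
        out.append((1.0 - mix) * s + mix * s * gain)
    return out
```

With a loud voiceover and `makeup_db=0`, the song ends up well below its original level. Raising `makeup_db` pushes it right back up, which is the "defeating the point of ducking" effect, and lowering `mix` lets more of the untouched song through, which softens the ducking.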
The application I was using this in was a Billy Joel song where I had used R-Mix to punch out most of the vocals, in order to test out a new microphone. Although R-Mix is "pretty good," it is not perfect. Using "mild" ducking on that original track effectively removed the vocal artifacts R-Mix missed and melded the performance much better.