Yeah... all those are probably easier options.
The easiest is likely just downloading the vid using one of those YouTube download apps. Then you can import the vid into Sonar and the audio will get stripped out into a new audio track, where you can do whatever you want with it.
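If you'd rather pull the audio out of the downloaded file yourself before bringing it into the DAW, here's a rough sketch of one way to do it. It assumes you have ffmpeg installed and on your PATH (that's my assumption, not something Sonar needs), and the function names and the file name "clip.mp4" are just made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video_path, wav_path=None):
    """Build an ffmpeg command that drops the video and writes a 16-bit PCM WAV."""
    video = Path(video_path)
    # Default to the same name as the video, with a .wav extension.
    wav = Path(wav_path) if wav_path else video.with_suffix(".wav")
    return [
        "ffmpeg",
        "-i", str(video),        # input file
        "-vn",                   # ignore the video stream
        "-acodec", "pcm_s16le",  # uncompressed 16-bit PCM audio
        str(wav),
    ]


def extract_audio(video_path, wav_path=None):
    """Run ffmpeg and return the path of the WAV file it wrote."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    cmd = build_ffmpeg_cmd(video_path, wav_path)
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

Calling extract_audio("clip.mp4") should leave a clip.wav next to the video, which you can then drag into an audio track like any other WAV.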
I, however, usually just hook the headphone output of my laptop (a separate computer from my DAW) to my mixer using a Y splitter, send the direct outs of those channels into my interface (they're always hooked up via a snake anyway), and record the track in real time. I can use the mixer to control levels and add some EQ if I want as well.
I don't do that often, but it works nicely.
Now that I own a half-decent screen capture program, I can actually just record the vid (or just the audio, using a special feature) in the screen recorder and then do whatever I want with the audio afterward.
So... definitely lots of ways to do this.
I just saw the Voicemeeter thing mentioned in your OP, and since I've been messing with it lately, I figured I'd point out how you could do that.
Cheers.