Yes, that's how I'd go about it: invert one signal and sum it with the other. The author does indeed start there, but then goes further, analyzing the spectral differences precisely by programmatically comparing each FFT bin individually. You wouldn't need to go to such lengths if you just wanted to do some experiments for yourself.
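For anyone who wants to try it, here's a rough sketch of both steps in plain Python: the invert-and-sum "null test", and a crude per-bin comparison of magnitude spectra. It is not the author's code. It assumes both signals are already decoded to float samples at the same sample rate and trimmed so they line up sample-for-sample (real MP3 encode/decode adds delay and padding you'd have to remove first). It uses a naive DFT with no windowing, which is only practical for short demo frames; `null_test`, `magnitude_spectrum` and `bin_differences` are made-up names.

```python
import cmath
import math


def null_test(original, decoded):
    """Invert the decoded signal and sum it with the original.

    Assumes both lists are time-aligned and share a sample rate.
    Whatever survives is the "ghost": what the codec changed or removed.
    """
    n = min(len(original), len(decoded))
    return [original[i] + (-decoded[i]) for i in range(n)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow; fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def bin_differences(original, decoded, frame_size=256):
    """Per frame, the magnitude difference (original minus decoded) in each bin."""
    n = min(len(original), len(decoded))
    result = []
    for start in range(0, n - frame_size + 1, frame_size):
        a = magnitude_spectrum(original[start:start + frame_size])
        b = magnitude_spectrum(decoded[start:start + frame_size])
        result.append([x - y for x, y in zip(a, b)])
    return result


# Toy demo: the "decoded" version has simply lost a quiet high tone.
fs, size = 8000, 256
low = [math.sin(2 * math.pi * 8 * t / size) for t in range(2 * size)]        # 250 Hz
high = [0.1 * math.sin(2 * math.pi * 96 * t / size) for t in range(2 * size)]  # 3000 Hz
original = [x + y for x, y in zip(low, high)]
decoded = low

ghost = null_test(original, decoded)
diffs = bin_differences(original, decoded, size)
loudest_bin = max(range(len(diffs[0])), key=lambda k: abs(diffs[0][k]))
print(loudest_bin * fs / size)  # frequency of the bin that differs most
```

In this contrived case the ghost is just the 3 kHz tone, and bin 96 (3000 Hz) is the only bin that differs. With real music and a real encoder the residual is messier and is spread across many bins.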
What would surprise me is if that "ghost" information turned out to be musically significant. I wouldn't expect it to be. But it might be a fun experiment to try.
People tend to freak out when they hear what's been excised by MP3 encoding. Lots of high-frequency information and transients are given up. Your first reaction is "that stuff doesn't sound insignificant!". But in theory, it's all information that your ears would have naturally filtered out anyway, so nothing of importance is lost.
The problem with theory versus practice is that between what's clearly audible and what's clearly inaudible lies a large grey area where an element may or may not be audible depending on many factors.
Pick up a book on perceptual encoding and look at the standard graph given for the temporal-masking "shadow". That graph was arrived at by averaging the results of subjective listening tests by many people. Chances are, it doesn't match your own shadow exactly, just as you probably don't match the average height, weight, eye color, tolerance to cold weather or hot peppers.
The encoder has to assume that your perception is close enough to the average. It also has to ignore some variables in the name of efficiency. It cannot know with certainty whether an element is audible or inaudible to any specific listener.
Anyhow, that's all academic, because for me 192 kb/s or higher sounds OK, and (again, for me) 320 kb/s is indistinguishable from the original wave file. The author actually had to jump through some hoops to get any usable information from his "ghost" data.
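To put those bitrates in perspective, here's a quick back-of-the-envelope calculation. It assumes the original wave file is CD-quality PCM (44.1 kHz, 16-bit, stereo), which the thread doesn't actually state. Also note that a bitrate ratio overstates what is perceptually discarded, because part of any codec's savings comes from lossless entropy coding rather than from throwing information away.

```python
def pcm_bitrate_kbps(sample_rate=44100, bit_depth=16, channels=2):
    """Raw PCM bitrate in kb/s; defaults assume CD-quality audio."""
    return sample_rate * bit_depth * channels / 1000


def compression_ratio(mp3_kbps, **pcm_format):
    """How many times smaller the MP3 stream is than the raw PCM stream."""
    return pcm_bitrate_kbps(**pcm_format) / mp3_kbps


for kbps in (192, 320):
    print(f"{kbps} kb/s: about {compression_ratio(kbps):.2f}:1 vs. 1411.2 kb/s PCM")
```

So even at 320 kb/s the MP3 stream is roughly 4.4 times smaller than the raw PCM, and at 192 kb/s roughly 7.35 times smaller.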