The Ghost in the MP3

Author
Paul P
2015/05/02 22:52:36 (permalink)

Heard an interesting interview on the radio about an academic computer music guy who composes using what is lost when a wav is converted to an mp3.  The ghost (what is 'removed' from the original) is then used as raw material for composition and effects.  Sounds pretty cool.  There's a web page with lots of info and the paper that describes the project in more detail.  Here's the interview (7min), but the composer doesn't say how he does what he does.
 
I know nothing of how an mp3 is produced, but it looks like it's a lot more complicated than I'd imagined, and so is the means of bringing the ghost back to life, though I haven't read the documentation in detail.  I would have thought you could just invert the mp3 and sum it with the original.  I'm going to have to try that out.
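For anyone wanting to try the invert-and-sum idea outside a DAW, here is a minimal sketch in plain Python. It assumes both files have already been decoded to lists of float samples at the same sample rate; the function name `residual` and the `offset` parameter are made up for illustration. The offset is there because MP3 encoding and decoding typically add some delay, so the decoded signal usually has to be shifted before the two will line up.

```python
def residual(original, decoded, offset=0):
    """Invert the decoded signal and sum it with the original.

    `offset` (hypothetical parameter) is the number of leading samples of
    `decoded` to skip so the two signals are time-aligned.
    """
    shifted = decoded[offset:]
    # Only compare the region both signals cover.
    n = min(len(original), len(shifted))
    # original + (-decoded): whatever the encoder changed is what remains.
    return [original[i] + (-shifted[i]) for i in range(n)]


if __name__ == "__main__":
    wav = [0.5, -0.25, 0.0, 0.75]
    mp3 = [0.5, -0.20, 0.0, 0.70]
    print(residual(wav, mp3))
```

If the two signals are identical the result is all zeros; anything left over is the "ghost". A misaligned pair will leave a large residual that has nothing to do with the codec, so alignment is worth checking first.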
post edited by Paul P - 2015/05/02 23:03:12

Sonar Platinum [2017.10], Win7U x64 sp1, Xeon E5-1620 3.6 GHz, Asus P9X79WS, 16 GB ECC, 128gb SSD, HD7950, Mackie Blackjack

    bitflipper
    Re: The Ghost in the MP3 2015/05/03 10:09:54 (permalink)
    Yes, that's how I'd go about it: invert and sum. The author does indeed start there, but then applies a precise analysis of the spectral differences by programmatically comparing each FFT bin individually. It would not, however, be necessary to go to such lengths if you just wanted to do some experiments for yourself.
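    To give a rough idea of what comparing bin by bin could look like, here is a small sketch. It is not the author's code: it uses a naive DFT from the standard library in place of a real FFT, works on a single frame with no windowing, and the function names are invented for the example. It also assumes the two frames are already time-aligned and the same length.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of each bin from 0 to N/2 (naive O(N^2) DFT, not an FFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def bin_differences(original_frame, decoded_frame):
    """Per-bin magnitude lost (positive) or gained (negative) by encoding."""
    if len(original_frame) != len(decoded_frame):
        raise ValueError("frames must have the same length")
    a = dft_magnitudes(original_frame)
    b = dft_magnitudes(decoded_frame)
    return [x - y for x, y in zip(a, b)]


if __name__ == "__main__":
    n = 8
    # Toy "original": a strong low tone plus a weaker high tone.
    orig = [math.cos(2 * math.pi * t / n) + 0.5 * math.cos(2 * math.pi * 3 * t / n)
            for t in range(n)]
    # Toy "decoded": the high tone has been discarded.
    dec = [math.cos(2 * math.pi * t / n) for t in range(n)]
    print([round(d, 6) for d in bin_differences(orig, dec)])
```

    In the toy example only the bin holding the discarded high tone shows a difference. On real audio you would repeat this over many overlapping, windowed frames, and a proper FFT routine would be far faster than the loop above.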
     
    What would surprise me is if that "ghost" information turned out to be musically significant. I wouldn't expect it to be. But it might be a fun experiment to try.
     
    People tend to freak out when they hear what's been excised by MP3 encoding. Lots of high-frequency information and transients are given up. Your first reaction is "that stuff doesn't sound insignificant!" But in theory, it's all information that your hearing would have masked or filtered out anyway, so nothing of importance is lost.
     
    The problem with theory versus practice is that between what's clearly audible and what's clearly inaudible lies a large grey area where an element may or may not be audible depending on many factors.
     
    Pick up a book on perceptual encoding and look at the standard graph given for the temporal-masking "shadow". That graph was arrived at by averaging the results of subjective listening tests by many people. Chances are, it doesn't match your own shadow exactly, just as you probably don't match the average height, weight, eye color, tolerance to cold weather or hot peppers.
     
    The encoder has to assume that your perception is close enough to the average. It also has to ignore some variables in the name of efficiency. It cannot know with certainty that an element is definitely audible or inaudible to any specific listener.
     
    Anyhow, that's all academic because for me, 192 kb/s or higher sounds OK, and - for me - 320 kb/s is indistinguishable from the original wave file. The author actually had to jump through some hoops to get any usable information from his "ghost" data.


    All else is in doubt, so this is the truth I cling to. 
