craigb
What do you think is necessary to replicate a performance to levels beyond what a human can hear? 24/96? Better filters before (or after) the AD conversion?
As a playback format, 16/44.1kHz is already better than pathetic humans for the overwhelming majority of real world listening conditions. Unfortunately this is upsetting to those who wish to use audio as a means of separating themselves from the herd, but it is what it is.
Filters - you can look for yourself at the detailed filter responses of DAC chips on the spec sheets at any chip manufacturer's site. Typically they are excellent - almost ruler flat - below ~20 kHz. The anti-aliasing is generally designed to be good enough in the real world to put it below the noise floor caused by other factors (i.e. for noisier chips the anti-aliasing isn't as good). So the filters are typically quite good up to ~20kHz, where they begin to roll off and allow more aliasing/imaging.
Some humans can indeed hear test tones up into the mid-20 kHz range, but only when those tones are played back at 100 dB SPL or more. There is also some evidence that filter ringing at high frequencies may be audible, but these are borderline cases involving steep filters, and most chip makers make the filters' transition bands as wide as possible to minimize ringing. Otherwise, despite much discussion, many misunderstandings and much misinformation, there is little objective evidence that frequencies > 20 kHz are audible to humans under real world listening conditions.
Bit depth -
a. How loud are you listening (in dB SPL)?
b. What's the average (RMS) level of your audio relative to 0dBFS*?
c. How far below 0dBFS is your quantization error plus dither**?
From this, you can calculate how loud the QE + dither is - in dB SPL - and compare it to your threshold of hearing. If it isn't below that, you can then compare it to the noise in your listening environment.

If you don't want to do these calculations, you just need to understand that noise shaping curves are typically based in large part on the curves above for somewhere between the threshold of hearing and ~15 phon (the equal loudness contour that equals 15 dB SPL at 1 kHz) - because that's the level they expect the QE + dither to be played back at in the real world.
The shape of the noise matters, but the noise level for typical homes/offices is supposedly around 40dB SPL. A professionally soundproofed studio is supposedly around 20 dB SPL.
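For anyone who wants to plug numbers into steps a-c, here is a rough Python sketch of the arithmetic. The function name and the example inputs (85 dB SPL listening level, -20 dBFS average program level) are made up for illustration; only the -90 dBFS dither figure and the 40 / 20 dB SPL room numbers come from this post. It ignores weighting curves and the spectrum of the noise, so treat the result as a ballpark only.

```python
def noise_level_db_spl(listening_db_spl, program_rms_dbfs, noise_dbfs):
    """Estimate how loud the QE + dither is, in dB SPL.

    listening_db_spl: average listening level (step a)
    program_rms_dbfs: average RMS level of the audio re 0 dBFS (step b)
    noise_dbfs:       RMS level of QE + dither re 0 dBFS (step c)
    """
    # The SPL that a 0 dBFS signal would produce at this volume setting.
    full_scale_db_spl = listening_db_spl - program_rms_dbfs
    # The noise sits noise_dbfs below that (noise_dbfs is negative).
    return full_scale_db_spl + noise_dbfs


# Hypothetical example inputs, not measurements.
listening = 85.0      # dB SPL, assumed listening level
program_rms = -20.0   # dBFS, assumed average program level
dither = -90.0        # dBFS, ballpark for unshaped 16-bit dither

noise = noise_level_db_spl(listening, program_rms, dither)
print(f"QE + dither plays back at about {noise:.1f} dB SPL")

# Compare against the ballpark room noise figures quoted above.
for room, ambient in [("typical home/office", 40.0),
                      ("soundproofed studio", 20.0)]:
    margin = ambient - noise
    print(f"{room}: noise floor is {margin:.1f} dB below room noise")
```

With these assumed inputs, 0 dBFS lands at 105 dB SPL and the dither at 15 dB SPL, which is under both room noise figures. Louder listening or quieter program material moves the dither up by the same number of dB.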
And if you do ABX testing, the results will match up with what you'd expect from understanding the above.
*if you want to do it properly, you have to worry about weighting curves and the integration time of your dB meter, but it's only going to amount to a few dB of difference if you just want ballpark numbers here.
**you can ignore noise shaping, as the whole idea is for noise shaping to make things less audible. Based on this, you can just use -90 dBFS for the RMS level of unshaped 16 bit dither as a ballpark.
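As a sanity check on that -90 dBFS ballpark, here is a small sketch of the textbook noise-power arithmetic. It rests on assumptions that are not in the post: 0 dBFS is taken as the RMS level of a full-scale sine, and the dither is unshaped TPDF spanning 2 LSB peak to peak. The function name is just for illustration.

```python
import math


def dithered_noise_dbfs(bits, tpdf_dither=True):
    """RMS level of quantization error (plus optional TPDF dither),
    relative to a full-scale sine, in dB."""
    # Uniform quantization error has a power of 1/12 LSB^2.
    noise_power = 1.0 / 12.0
    if tpdf_dither:
        # TPDF dither spanning +/-1 LSB adds 1/6 LSB^2 (assumed dither type).
        noise_power += 1.0 / 6.0
    # A full-scale sine has a peak of 2**(bits - 1) LSB; power = peak^2 / 2.
    full_scale_power = (2.0 ** (bits - 1)) ** 2 / 2.0
    return 10.0 * math.log10(noise_power / full_scale_power)


print(f"16-bit, TPDF dither: {dithered_noise_dbfs(16):.1f} dBFS")
print(f"16-bit, no dither:   {dithered_noise_dbfs(16, tpdf_dither=False):.1f} dBFS")
```

Under those assumptions the dithered figure comes out a few dB below -90 dBFS, so -90 is a slightly pessimistic round number, which is fine for a ballpark.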