Remember that 96K TH2 thread? I just had my mind blown, big-time
2014/06/03 13:53:47
abb
drewfx1
gswitz
DrewFX, what would the time resolution of 24-bit 48kHz be? Can you figure it out? I'm curious.
 



Something on the order of 0.0000000000002 seconds.


How did you compute this value?
2014/06/03 14:39:48
Anderton
drewfx1
An experiment to try:
1. Make a stereo waveform at 44.1/48kHz where every sample in L and R is absolutely identical (i.e. L=R).
2. Upsample by 2x (or higher).
3. Shift L by 1 sample in time at the higher rate. 
4. Downsample back to the original SR.
5. Zoom all the way in and compare L and R. 
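
(As an illustration only, here is roughly how the experiment above could be scripted in Python with numpy/scipy; the filtered-noise test signal, the 4x ratio, and the function choices are assumptions of mine, not anything specified in the thread.)

# Sketch of the 5 steps above, assuming numpy and scipy are installed.
import numpy as np
from scipy.signal import resample_poly, butter, sosfiltfilt

sr = 48000
noise = np.random.default_rng(0).standard_normal(sr)   # 1 second of noise
sos = butter(8, 20000 / (sr / 2), output="sos")
mono = sosfiltfilt(sos, noise)          # keep content below ~20 kHz

L, R = mono.copy(), mono.copy()         # step 1: L and R sample-identical

up = 4
L_hi = resample_poly(L, up, 1)          # step 2: upsample to 192 kHz
R_hi = resample_poly(R, up, 1)

L_hi = np.roll(L_hi, 1)                 # step 3: shift L by one 192 kHz sample (~5.2 us)

L_lo = resample_poly(L_hi, 1, up)       # step 4: back down to 48 kHz
R_lo = resample_poly(R_hi, 1, up)

# step 5: the channels are no longer sample-identical; the sub-sample
# offset is carried in the 48 kHz sample values themselves.
print(np.max(np.abs(L_lo - R_lo)))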



I understand how reconstruction and smoothing works. The question is about capture. How can something with a duration of 20 microseconds encode two events that are 10 microseconds apart?
2014/06/03 14:47:59
John
If I have this right, the above from Craig is the argument about resolution, or how many slices are being made at any given time.
 
From what I understand, it has no impact.
2014/06/03 15:00:14
Anderton
bitflipper
Keep in mind that anything that happens entirely inside a 10-microsecond timeframe is much too fast to worry about. You only worry about those frequencies after you've bought your $2,000 oxygen-free polarized cables.



This has nothing to do with frequency. Moorer's paper was about binaural hearing and the perception of delays between events, not listening to continuous frequencies. I would guess this is a refinement of the precedence effect (the "law of the first wavefront"). He maintains people with average acuity can recognize a time differential between impulses hitting each ear of as little as 15 microseconds, and some could discriminate down to 5-8 microseconds.
 
I don't find that hard to believe. If I nudge a waveform one sample at a time compared to the same waveform, it's clear the jump between samples is quite large. If you then switch back to the offset version, the earlier one does "weight" toward one side of your hearing, as predicted by the precedence effect. Certainly 15 microseconds meets the requirement of being below the listener's echo threshold.
 
I didn't do the research, I'm just referencing his. I certainly don't believe we know everything there is to know about hearing and the subsequent processing of that information by the brain. Just remember how freaked out people were when they realized we see things upside down, and the brain does the needed corrections so we see images right side up.
2014/06/03 15:26:56
Anderton
John
If I have this right, the above from Craig is the argument about resolution, or how many slices are being made at any given time.
 
From what I understand, it has no impact.



That's where the controversy lies. Of course reconstruction will reconstruct a waveform; no question about that, otherwise the outputs of digital audio systems would be stair-stepped instead of continuous. This isn't about reconstructing a waveform, but about reconstructing a characteristic of the binaural listening experience which is, after all, how we hear sound.
 
The question is whether reconstruction is sufficiently precise to reconstruct the timing difference between two signals that are, say, 8 microseconds apart. I don't see how that's possible if the capture medium can't resolve differential timings under 21 microseconds.
 
Let me explain what I think is going on.
 
A 48kHz sample clock samples an incoming voltage, which is at "x" volts. So far, so good. 5 microseconds later, "y" volts is present at the input. 8 microseconds after that, "z" volts is present at the input. 5 microseconds later, "w" voltage is present at the input and that voltage lasts for 10 microseconds.
 
When the next sample occurs 21 microseconds after the first one, it will read the "w" voltage, but it will ignore the "y" and "z" values because they occurred between samples. I don't see any way the "y" and "z" values could factor into the encoding process because the system never sees them.
 
So then the question becomes: does reconstruction reproduce those "ignored" variations successfully, and if not, does it matter? The argument that says it doesn't matter maintains that smoothing will accurately fill in the values between the "x" and "w" voltages, and will therefore reconstruct the frequency that was present at those times.
 
However, my understanding of Moorer's argument is that if there were spatial cues in between "x" and "w," they will be lost. Whether that matters or not depends on whether you accept Moorer's contention that people can discriminate between extremely short time delays when signals hit both ears. As I doubt anyone in this thread has verified or disproven these experiments, I don't think it's possible to accept or dismiss them out of hand. However, if (I emphasize "if," although I don't know what his motivation would be for making things up) he is correct, then given the sampling scenario presented above, I simply don't see any way that a 21-microsecond window can encode incoming delay-based events whose duration is significantly less than that.
 
2014/06/03 15:33:37
Anderton
abb
drewfx1
gswitz
DrewFX, what would the time resolution of 24-bit 48kHz be? Can you figure it out? I'm curious.
 



Something on the order of 0.0000000000002 seconds.


How did you compute this value?




I don't think he's talking about time resolution, I think he's talking about amplitude resolution after 24-bit D/A conversion. Whether he's considering 24 bits as the actual value, or taking into account circuit board layout issues, noise, laser trimming tolerances, etc., I don't know, but I don't see how it relates to the question I'm asking.
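
(One guess at where a figure of that size might come from, offered only as my reading and not something drewfx1 stated: take one 24-bit amplitude step on a full-scale sine near the top of the audio band and divide it by the sine's maximum slew rate, which converts amplitude resolution into an equivalent timing resolution.)

# Back-of-envelope only: one 24-bit LSB of amplitude error on a full-scale
# sine, expressed as a timing error at the sine's steepest point.
import math

bits = 24
f = 20000.0                       # assumed: a tone near the top of the audio band
dt = 1.0 / (2 * math.pi * f * 2 ** bits)
print(dt)                         # ~4.7e-13 s, the same order as the quoted figure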
2014/06/03 16:31:01
drewfx1
Anderton
drewfx1
An experiment to try:
1. Make a stereo waveform at 44.1/48kHz where every sample in L and R is absolutely identical (i.e. L=R).
2. Upsample by 2x (or higher).
3. Shift L by 1 sample in time at the higher rate. 
4. Downsample back to the original SR.
5. Zoom all the way in and compare L and R. 



I understand how reconstruction and smoothing works. The question is about capture. How can something with a duration of 20 microseconds encode two events that are 10 microseconds apart?


 
Consider again the pure sine wave sampling example: not just the frequency and amplitude of the sine wave are captured by sampling, but the phase as well. Your 10 microseconds in the time domain equates to a phase shift in the frequency domain.
 
So take the 12kHz sine wave example again:
 ~10 microseconds = 45 degrees
 
So in this case it just means you are sampling a sine wave with 45 degrees of phase shift (~10 microseconds) and the sampled values will be different because the samples are taken at different points in the cycle. 
 
So if you sample two sine waves 45 degrees out of phase with each other they will be reconstructed as 45 degrees out of phase with each other.
 
 
And you know that Fourier says that any complex waveform is just a combination of sine waves at various frequencies, amplitudes and phases.
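
(To illustrate the point, here is a sketch that is not from the thread: sample a 12 kHz sine and a copy delayed by ~10 microseconds at 48 kHz, reconstruct both on a much finer grid, and recover the offset. The scipy calls and the 64x reconstruction factor are assumptions of mine.)

# A ~10 us offset between two 12 kHz tones survives 48 kHz sampling,
# because it is carried as phase. Assumes numpy and scipy.
import numpy as np
from scipy.signal import resample_poly

sr = 48000
f = 12000.0
delay = 10e-6                                  # ~45 degrees at 12 kHz
n = np.arange(4096)
a = np.sin(2 * np.pi * f * n / sr)             # reference channel
b = np.sin(2 * np.pi * f * (n / sr - delay))   # same tone, 10 us later

up = 64                                        # reconstruct at 64x oversampling
a_hi = resample_poly(a, up, 1)
b_hi = resample_poly(b, up, 1)

# Locate the offset by cross-correlating a mid-signal segment.
c = np.correlate(b_hi[1000:3000], a_hi[1000:3000], mode="full")
lag = np.argmax(c) - 1999
print(lag / (sr * up) * 1e6, "microseconds")   # close to 10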
2014/06/03 16:34:26
abb
Anderton
abb
drewfx1
gswitz
DrewFX, what would the time resolution of 24-bit 48kHz be? Can you figure it out? I'm curious.
 



Something on the order of 0.0000000000002 seconds.


How did you compute this value?




I don't think he's talking about time resolution, I think he's talking about amplitude resolution after 24-bit D/A conversion. Whether he's considering 24 bits as the actual value, or taking into account circuit board layout issues, noise, laser trimming tolerances, etc., I don't know, but I don't see how it relates to the question I'm asking.




I'm similarly confused as to the relevance of the number; hence my question.
2014/06/03 16:39:22
John
Anderton
John
If I have this right, the above from Craig is the argument about resolution, or how many slices are being made at any given time.
 
From what I understand, it has no impact.



That's where the controversy lies. Of course reconstruction will reconstruct a waveform; no question about that, otherwise the outputs of digital audio systems would be stair-stepped instead of continuous. This isn't about reconstructing a waveform, but about reconstructing a characteristic of the binaural listening experience which is, after all, how we hear sound.
 
The question is whether reconstruction is sufficiently precise to reconstruct the timing difference between two signals that are, say, 8 microseconds apart. I don't see how that's possible if the capture medium can't resolve differential timings under 21 microseconds.
 
Let me explain what I think is going on.
 
A 48kHz sample clock samples an incoming voltage, which is at "x" volts. So far, so good. 5 microseconds later, "y" volts is present at the input. 8 microseconds after that, "z" volts is present at the input. 5 microseconds later, "w" voltage is present at the input and that voltage lasts for 10 microseconds.
 
When the next sample occurs 21 microseconds after the first one, it will read the "w" voltage, but it will ignore the "y" and "z" values because they occurred between samples. I don't see any way the "y" and "z" values could factor into the encoding process because the system never sees them.
 
So then the question becomes: does reconstruction reproduce those "ignored" variations successfully, and if not, does it matter? The argument that says it doesn't matter maintains that smoothing will accurately fill in the values between the "x" and "w" voltages, and will therefore reconstruct the frequency that was present at those times.
 
However, my understanding of Moorer's argument is that if there were spatial cues in between "x" and "w," they will be lost. Whether that matters or not depends on whether you accept Moorer's contention that people can discriminate between extremely short time delays when signals hit both ears. As I doubt anyone in this thread has verified or disproven these experiments, I don't think it's possible to accept or dismiss them out of hand. However, if (I emphasize "if," although I don't know what his motivation would be for making things up) he is correct, then given the sampling scenario presented above, I simply don't see any way that a 21-microsecond window can encode incoming delay-based events whose duration is significantly less than that.
 


I see what you are getting at. The problem is when you introduce binaural into it. Spatial data is not held in one mono audio stream, but in stereo pairs. This is the way I understand it. You have two streams of data that interact with one another to give a sense of space. Mono doesn't do this. There is nothing in the mono signal to give a sense of space.
 
BTW, binaural is not the same as stereo. From my experimenting many years ago, it is a technique for recreating the aural experience of being there. You use two mics placed about as far apart as one's head, with a baffle between them representing the ears. You need headphones to listen to the resulting recording. On loudspeakers it just sounds like mono.
 
You may not have meant that and were only referring to our two ears.
2014/06/03 16:39:54
drewfx1
Anderton
Let me explain what I think is going on.
 
A 48kHz sample clock samples an incoming voltage, which is at "x" volts. So far, so good. 5 microseconds later, "y" volts is present at the input. 8 microseconds after that, "z" volts is present at the input. 5 microseconds later, "w" voltage is present at the input and that voltage lasts for 10 microseconds.
 
When the next sample occurs 21 microseconds after the first one, it will read the "w" voltage, but it will ignore the "y" and "z" values because they occurred between samples. I don't see any way the "y" and "z" values could factor into the encoding process because the system never sees them.
 
So then the question becomes: does reconstruction reproduce those "ignored" variations successfully, and if not, does it matter? The argument that says it doesn't matter maintains that smoothing will accurately fill in the values between the "x" and "w" voltages, and will therefore reconstruct the frequency that was present at those times.




What the sampling theorem says is that for a signal band limited to one half the sampling frequency, theoretically* everything is captured. This includes everything at any point in time between the samples for a properly band limited signal.
 
So "y" and "z" are indeed stored in the sampled signal - that's how come it can be reconstructed.
 
 
*In practice it isn't perfect, but neither is human hearing.
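
(A sketch of that claim under assumptions of my own: band-limit a noise signal, delay one channel by roughly 8 microseconds, which is well under one 48 kHz sample period, store both channels at 48 kHz, then reconstruct and measure the inter-channel delay that comes back out.)

# The "in-between" timing information is stored in the 48 kHz samples.
import numpy as np
from scipy.signal import resample_poly, butter, sosfiltfilt, correlate

hi_sr = 768000                          # 16 x 48 kHz stands in for "analog"
noise = np.random.default_rng(1).standard_normal(hi_sr)
sos = butter(8, 20000 / (hi_sr / 2), output="sos")
x = sosfiltfilt(sos, noise)             # band-limit below ~20 kHz

d = 6                                   # 6 samples at 768 kHz = 7.8 us
left = x[d:]                            # left channel
right = x[:-d]                          # same signal arriving 7.8 us later

L48 = resample_poly(left, 1, 16)        # "record" at 48 kHz; every value
R48 = resample_poly(right, 1, 16)       # between the samples is discarded

L_hi = resample_poly(L48, 16, 1)        # reconstruct on a fine grid
R_hi = resample_poly(R48, 16, 1)

seg = slice(100000, 300000)             # stay clear of filter edge transients
c = correlate(R_hi[seg], L_hi[seg], mode="full", method="fft")
lag = np.argmax(c) - (200000 - 1)
print(abs(lag) / hi_sr * 1e6, "microseconds")  # ~7.8, recovered from 48 kHz data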