Recently, there has been growing interest in high sample rate recording for films. I had pondered over this and wondered why people took such an interest in it. Was it simply that a bigger number seemed to promise higher quality? Was it just for the sake of saying so? Because if you think about it, going from 48 kHz to 96 kHz roughly doubles the file size, track counts drop, CPU load rises, not all plugins work, you record unwanted frequencies, and so on. So I wanted to know more about the real pros and cons. Surely it can't be just a number that people like the look of.

Sampling

Sampling frequency is a very important factor in this. I have sometimes heard it said that high-frequency tones above 30 kHz were used in scenes of a film that was mixed at 48 kHz. Any perceived impact would be purely a placebo effect, but yes, if such a tone reaches the sampler it will be reproduced, at a different frequency. The reason is the Nyquist theorem. The theorem states that a sampled waveform contains ALL the information, without any distortion, when the sampling rate exceeds twice the highest frequency contained in the waveform. This means that if your session runs at 48 kHz, it can only reproduce audio up to 24 kHz. There is no way to reproduce 30 kHz.

In fact, some interesting things happen at higher frequencies. Let's say there is a tone at 30 kHz. Because it is higher than 24 kHz, it ends up being reproduced at 18 kHz, that is, 48 kHz minus 30 kHz (look at the image). As the frequency gets nearer to the sample rate, the reproduced frequency approaches 0. So if you have a 48 kHz tone at a sample rate of 48 kHz, as far as the sampler is concerned, it is 0 Hz.

It's like film. Film runs at 24 frames per second. So if you shoot a light that flashes 24 times a second, it will appear either always on or always off, because every frame captures the same state. (If you look at a TV screen in an old movie, you can see the image shaking and a line travelling across the TV. This is the reason.)
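The folding described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the tone is a pure sine, no anti-aliasing filter removes it before sampling, and the function name alias_frequency is made up for this example. Under those assumptions, a 30 kHz tone sampled at 48 kHz lands at 48 - 30 = 18 kHz, and a 48 kHz tone lands at 0 Hz.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency at which an unfiltered pure tone shows up after sampling.

    Hypothetical helper for illustration; assumes no anti-aliasing filter.
    """
    # Sampling cannot tell apart tones that differ by a whole multiple
    # of the sample rate, so first fold the tone into one such period.
    folded = tone_hz % sample_rate_hz
    # Anything in the upper half of that period mirrors back below
    # the Nyquist limit (half the sample rate).
    return min(folded, sample_rate_hz - folded)


for tone in (10_000, 30_000, 40_000, 48_000):
    print(tone, "Hz is reproduced as", alias_frequency(tone, 48_000), "Hz")
```

Tones below 24 kHz come out unchanged, which is the case the Nyquist theorem covers.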

Sampling at different rates. The third picture corresponds to a frequency higher than the Nyquist limit.

This reproduction at a different frequency of anything above the Nyquist limit is called aliasing. (From the word alias, which means a false or assumed identity. This is exactly the case when 30 kHz is represented as 18 kHz!)

Many people compare a high sample rate to a high-resolution picture, the analogy being that just as a higher pixel count (resolution) means a clearer picture, a higher sample rate means a clearer sound. No. If the source does not produce anything beyond 24 kHz, then recording it at 96 kHz will not yield any extra information, because 96 kHz can record frequencies up to 48 kHz and there is nothing above 24 kHz in the source!

That being said, aliasing is a very noticeable issue, and how it is handled is what differentiates many analog-to-digital converters. To prevent anything above half the sample rate from being recorded, the converter uses very steep filters. (Let's take 48 kHz as our base sample rate.) So, to block frequencies above 24 kHz, a filter is introduced. Now, steep filters bring phase issues, and whatever they fail to remove around the cutoff will alias. So, realistically, the converter has to start filtering from around 22 kHz to be on the safe side. This brings tonal changes into the audible region.
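To get a feel for how steep that filter has to be, here is a rough sketch of the room a converter has between the top of the audible band and the Nyquist limit. The 20 kHz audible limit is an assumption used for illustration, and transition_band is a hypothetical helper, not part of any converter's design tools.

```python
import math

AUDIBLE_LIMIT_HZ = 20_000  # assumed upper limit of hearing, for illustration


def transition_band(sample_rate_hz, passband_edge_hz=AUDIBLE_LIMIT_HZ):
    """Room between the audible band and the Nyquist limit.

    Returns the width in Hz and in octaves. The narrower this band,
    the steeper the anti-aliasing filter has to be.
    """
    nyquist_hz = sample_rate_hz / 2
    width_hz = nyquist_hz - passband_edge_hz
    width_octaves = math.log2(nyquist_hz / passband_edge_hz)
    return width_hz, width_octaves


for rate in (48_000, 96_000):
    width_hz, width_octaves = transition_band(rate)
    print(f"{rate} Hz: {width_hz:.0f} Hz wide, {width_octaves:.2f} octaves")
```

At 48 kHz the filter has only 4 kHz (about a quarter of an octave) to go from passing everything to blocking everything. At 96 kHz it has 28 kHz, or a little over one octave.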

Why 96 kHz?

If we follow the reasoning above, the converter filter in this case would sit at 48 kHz. That is far above the audible range, so even if aliasing occurs around the filter it stays out of what we hear. This means that we get much clearer highs.

Another benefit shows up with pitch changes. Let's say we are shifting down by an octave. In the earlier 48 kHz scenario, the highest frequency we have is 24 kHz, which ends up at 12 kHz after the shift. There is nothing above that. This is also why pitch shifting in sound design sounds dull. But think of 96 kHz. We have information up to 48 kHz. Pitching down an octave would still give us content up to 24 kHz, and that means cleaner highs!

But we can't hear anything above 20 kHz, so why so much effort? Well, we can't hear it, but we can certainly feel it. How much of it amps and speakers can reproduce is another question. But still, we get so much detail without aliasing that EQing, compression, pitching and processing become really clean.

Not only that, it can also distinguish between arrival times. The human brain can discern a difference in a sound's arrival time between the two ears (interaural time difference) of better than 15 microseconds, which is around the time between samples at 96 kHz sampling. (96,000 samples make up 1 second. So 1 sample is roughly 10 microseconds long, or the distance between samples is roughly 10 microseconds. This doubles at a 48 kHz sample rate.)
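The arithmetic in this section is easy to check. The sketch below uses two hypothetical helpers: sample_period_us gives the time between samples, and top_frequency_after_shift gives the highest frequency left after pitching down, assuming the recording holds content right up to half the sample rate and the shift is a plain resampling-style pitch change.

```python
def sample_period_us(sample_rate_hz):
    """Time between two consecutive samples, in microseconds."""
    return 1_000_000 / sample_rate_hz


def top_frequency_after_shift(sample_rate_hz, octaves_down=1):
    """Highest frequency left after pitching down by whole octaves.

    Assumes the recording has content up to the Nyquist limit
    (half the sample rate); each octave down halves every frequency.
    """
    nyquist_hz = sample_rate_hz / 2
    return nyquist_hz / (2 ** octaves_down)


for rate in (48_000, 96_000):
    print(
        f"{rate} Hz: {sample_period_us(rate):.1f} microseconds per sample, "
        f"top frequency after one octave down: "
        f"{top_frequency_after_shift(rate):.0f} Hz"
    )
```

This gives about 20.8 microseconds per sample and a 12 kHz ceiling at 48 kHz, against about 10.4 microseconds and a 24 kHz ceiling at 96 kHz.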