The race for 16/44, 24/192, ... formats in search of the right sound: vinyl, tapes, cassettes, CDs, etc.

Once, while surfing the Internet, I was looking for answers to two questions: what is the dynamic range of a vinyl record, and does it make sense to digitize records to get better-than-CD quality? I came across an article arguing that “distributing audio in 24-bit/192 kHz format makes no sense”, along with a couple of very interesting video lectures on the topic.

The author of the article and videos is Chris "Monty" Montgomery (Red Hat, Fedora), creator of Ogg and Vorbis and founder of xiph.org. He currently works at Mozilla as a multimedia programmer, and is also a musician.

What to do?

1) Look for high-quality original CDs or high-quality “rips” of them, for example in FLAC format.
2) Listen to music through a high-quality digital-to-analog converter (DAC). One example is our USB audio DAC “Goldsmith”, not to mention much more expensive devices. Even a homemade DAC based on the PCM2705 will make a real and quite noticeable difference compared with a laptop's built-in sound card. Many laptops and mobile phones suffer from mediocre sound quality, and the audio solutions built into motherboards are far from ideal.

3) Use good headphones or high-quality active acoustics.

4) Use a dedicated headphone amplifier: not all sound cards cope well with low-impedance loads.

Perhaps all this together will allow you to hear your favorite music in a new way!

And finally, a video to lift your spirits, where it doesn’t matter how many bits and kilohertz there are: Tom Jones & Jerry Lee Lewis, Rockin' Medley, 1969.

Thank you for your attention!

Expert listening

The following were used as “reference” devices:

Sharp HP-MD33-S in-ear headphones (with replaceable silicone tips), nominal impedance 16 Ohm
Sennheiser 265 Linear monitor headphones (150 Ohm)

Speaker systems involved:

Active stereo pair JetBalance JB-382

Listening showed that headphones with a nominal impedance of 16 Ohms are strictly contraindicated. The system reliably detects when they are plugged in, but the bass vanishes, and all background noise (including noise from spinning drives) seems to be selectively amplified. The 150-Ohm headphones play with full bass, but the system did not automatically detect when this professional-impedance pair was plugged in.

On good speakers, nonlinear distortion becomes all too noticeable. Connecting “high-fidelity” multichannel speakers makes no sense if you only have stereo recordings. Thank heavens, software playback (WinDVD 5.0 Platinum) of six-channel 24-bit/96 kHz DVD-Audio happens without a degrading resample down to a 16-bit signal. If only Azalia also supported multichannel SACD! Then computer audio could be considered a backup for hi-fi equipment.

Artificial surround virtualization, even on a good six-channel speaker set, is unlikely to make the right impression. The codec is not to blame: there have simply been no revolutionary advances in algorithms for extracting additional channels from ordinary stereo.
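To make this concrete, here is a minimal sketch of the simplest kind of upmix, a fixed sum/difference (“passive”) matrix. It is a hypothetical illustration, not the algorithm used by the C-Media codec or WinDVD, and the function name `passive_upmix` and the sample values are made up. Every derived channel is only a recombination of the same two stereo channels, so no new spatial information appears: a guitar panned hard left leaks into both the center and the surround feed.

```python
def passive_upmix(left, right):
    """Derive center and surround feeds from stereo with a fixed sum/difference matrix."""
    center = [(lt + rt) / 2 for lt, rt in zip(left, right)]
    surround = [(lt - rt) / 2 for lt, rt in zip(left, right)]
    return center, surround

# Hypothetical stereo mix: a centered voice (identical in L and R)
# plus a guitar panned hard left (present only in L)
voice = [0.2, 0.4, -0.1]
guitar = [0.3, -0.3, 0.1]
left = [v + g for v, g in zip(voice, guitar)]
right = list(voice)

center, surround = passive_upmix(left, right)
print("center:  ", [round(c, 2) for c in center])    # voice plus half of the guitar
print("surround:", [round(s, 2) for s in surround])  # the other half of the guitar

# The matrix is invertible (L = C + S, R = C - S): nothing new was created
restored_left = [c + s for c, s in zip(center, surround)]
print("restored L:", [round(x, 2) for x in restored_left])
```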

The C-Media codec's software settings panel has an interesting button labeled Dolby Digital Live. Is this real-time encoding mode really “live”? I have to disappoint you: we did not test this mode, because the current version of the software exposes no encoding quality settings. Although the C-Media 9880 codec supports such exotic features, all the Dolby Laboratories technologies, including DDICE (Dolby® Digital Interactive Content Encoder), are optional software.

Perhaps a more capable audio codec, with all sorts of filtering added when it is built into the motherboard, would improve the situation, but 32-bit/192 kHz audio is needed here like “skis in a bathhouse” (that is, not at all). Moreover, smaller motherboards and cases are becoming fashionable, and there will simply be nowhere to cram bulky “hi-fi” audio hardware.

Links

• An interesting hearing test from Philips: Philips Golden Ears (in Russian)
“The unique Golden Ears training program was developed for our engineers to hone their skills as acoustic experts. Thanks to their ability to evaluate sound, we create devices with superior sound quality that reveal all the nuances of musical compositions.”

• Translation of Chris Montgomery's article into Russian:
  • Digital audio format 24/192, and why it makes no sense. Part 1
  • Digital audio format 24/192, and why it makes no sense. Part 2
  • Digital audio format 24/192, and why it makes no sense. Part 3

Digital audio format 24/192, and why it makes no sense

### Back to your ears

We've discussed the range of frequencies your ears can detect, but what about dynamic range (the range from the quietest sound to the loudest)?

One way to define absolute dynamic range is to look again at the threshold of hearing and threshold of pain curves. The distance from the highest point of the pain threshold curve to the lowest point of the hearing threshold curve is about 140 dB for a young, healthy person. You could not listen at such levels for long, though: +130 dB is enough to damage your hearing within minutes or even seconds. For reference, a jackhammer at a distance of one meter is 100-110 dB.
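To put these levels in physical terms, the short sketch below converts between dB SPL and sound pressure. It assumes the standard 0 dB SPL reference of 20 µPa; the function names are illustrative. It also shows why 140 dB is such a wide span: it is a ten-million-fold ratio in pressure.

```python
import math

P_REF = 20e-6   # 0 dB SPL reference pressure: 20 micropascals (RMS)

def spl_to_pascals(spl_db):
    """RMS sound pressure in pascals for a level given in dB SPL."""
    return P_REF * 10 ** (spl_db / 20)

def pascals_to_spl(pressure_pa):
    """Level in dB SPL for an RMS sound pressure in pascals."""
    return 20 * math.log10(pressure_pa / P_REF)

for label, level in (("quietest audible sound", -8),
                     ("jackhammer at 1 m", 110),
                     ("damage within seconds", 130)):
    print(f"{label:>22}: {level:4d} dB SPL = {spl_to_pascals(level):.3g} Pa")

# The ~140 dB span from threshold of hearing to threshold of pain, as ratios
print(f"140 dB = x{10 ** (140 / 20):.0e} in pressure, x{10 ** (140 / 10):.0e} in power")
```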

An interesting point: the threshold of hearing rises with age and hearing loss, while the threshold of pain falls with age. The hair cells of the cochlea cover only part of the full 140 dB range, so the muscles of the ear continuously regulate the amount of sound reaching the cochlea by shifting the auditory ossicles, much as the iris regulates the amount of light entering the eye. This mechanism stiffens with age, which limits the ear's dynamic range and reduces the effectiveness of its protective mechanisms.

### Ambient Noise

Few people realize how quiet a sound at the threshold of hearing really is.

The quietest sound a person can perceive is about -8 dB SPL. Using the A-weighted scale, the hum of a 100 W incandescent light bulb at a distance of one meter is about 10 dB SPL, or about 18 dB above that threshold. The bulb hums much louder when connected through a dimmer.

A level of 20 dB SPL (28 dB above the quietest audible sound) is often cited as the noise level of an empty recording studio or soundproofed room. Quieter places are hard to find, and since the bulb's 10 dB SPL hum sits below even this background, you have probably never heard it.

### 16-bit dynamic range

16-bit linear pulse-code modulation (PCM) has a dynamic range of 96 dB by the most common calculation, which takes dynamic range to be (6 * number of bits) dB. Many people therefore believe that 16-bit audio cannot represent any sound quieter than -96 dB. This is a serious misconception.
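The “6 dB per bit” rule comes from the ratio between full scale and one quantization step, 20·log10(2^bits), which is about 6.02 dB per bit. A minimal sketch (the function name is illustrative) reproduces the familiar figures; note that this is exactly the broadband number questioned in what follows.

```python
import math

def pcm_dynamic_range_db(bits):
    """Full scale divided by one quantization step, in dB: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(f"{bits:2d} bits: {pcm_dynamic_range_db(bits):6.2f} dB "
          f"(rule of thumb: {6 * bits} dB)")
```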

Below are links to two 16-bit audio files. One contains a 1 kHz tone at 0 dB (where 0 dB is the loudest possible level), and the other a 1 kHz tone at -105 dB.

Sample 1: 1 kHz tone at 0 dB (16-bit / 48 kHz WAV)
Sample 2: 1 kHz tone at -105 dB (16-bit / 48 kHz WAV)

Above is a graph of the spectral analysis of the -105 dB tone encoded as 16-bit/48 kHz PCM. 16-bit PCM clearly reaches deeper than -96 dB; otherwise a -105 dB tone could be neither represented nor heard.

How is this possible? How can this signal be encoded without distortion, and well above the noise floor, when its peak amplitude is only a third of a bit?

Part of the puzzle is solved by proper dither, a carefully chosen pseudo-random signal added before quantization, which makes the quantization noise independent of the input signal. By implication, dithered quantization introduces no distortion, only uncorrelated noise. That in turn means we can encode signals of arbitrary depth, even those whose peak amplitude is less than one bit. However, dither does not change the fact that once a signal sinks below the noise floor, it should effectively disappear. So how can a -105 dB tone still be heard above a -96 dB noise floor?
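A minimal sketch of the dither argument, using a -105 dB, 1 kHz tone like the one in the samples above. It assumes TPDF (triangular) dither of ±1 LSB, a common choice that may differ from whatever was used to make the sample files, and measures single DFT bins by direct correlation rather than with an FFT library; all names are illustrative.

```python
import math
import random
from typing import Optional

FS = 48_000          # sample rate, Hz
N = FS               # one second of audio, so DFT bins are 1 Hz apart
FULL_SCALE = 32767   # largest positive 16-bit sample value
TONE_HZ = 1000
LEVEL_DBFS = -105

def quantize_16bit(x: float, rng: Optional[random.Random]) -> int:
    """Round to a 16-bit integer, optionally adding TPDF dither first."""
    if rng is not None:
        # TPDF dither: difference of two uniform values, triangular on (-1, +1) LSB
        x += rng.random() - rng.random()
    return max(-32768, min(32767, round(x)))

def dft_bin_magnitude(samples, freq_hz: float) -> float:
    """Magnitude of one DFT bin, computed by direct correlation with a complex sinusoid."""
    re = im = 0.0
    for n, s in enumerate(samples):
        phase = 2 * math.pi * freq_hz * n / FS
        re += s * math.cos(phase)
        im -= s * math.sin(phase)
    return math.hypot(re, im)

amp = FULL_SCALE * 10 ** (LEVEL_DBFS / 20)   # about 0.18 LSB peak
tone = [amp * math.sin(2 * math.pi * TONE_HZ * n / FS) for n in range(N)]

plain = [quantize_16bit(x, None) for x in tone]
rng = random.Random(1)
dithered = [quantize_16bit(x, rng) for x in tone]

expected = amp * N / 2                       # bin magnitude a clean sine would give
tone_bin = dft_bin_magnitude(dithered, TONE_HZ)
noise_bin = dft_bin_magnitude(dithered, 3000)

print("undithered: nonzero samples =", sum(1 for s in plain if s != 0))
print(f"dithered 1 kHz bin: {tone_bin:.0f} (clean sine: {expected:.0f})")
print(f"dithered 3 kHz bin (noise only): {noise_bin:.0f}")
```

Without dither, every sample of this 0.18-LSB tone rounds to zero and the signal is simply gone. With dither, the 1 kHz bin should land close to the value a clean sine would give, well above the noise-only bin at 3 kHz.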

The answer is that we are mischaracterizing the -96 dB noise floor by using an inappropriate definition of dynamic range. The (6 * number of bits) dB formula gives the RMS noise across the entire broadband signal, but each hair cell in the ear is sensitive to only a narrow slice of the total bandwidth. Since each hair cell hears only a fraction of the total noise energy, the noise floor at that cell is much lower than the broadband figure of -96 dB.
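The per-band argument can be put into numbers. This sketch assumes the 16-bit noise is flat (white) from 0 to 24 kHz, and approximates the narrow band that one place on the cochlea responds to with the Glasberg & Moore equivalent rectangular bandwidth (ERB). That is a swapped-in model for illustration; the text itself does not name a bandwidth model, and the function names are illustrative.

```python
import math

NYQUIST_HZ = 24_000        # 48 kHz sampling: noise spread over 0..24 kHz
BROADBAND_NOISE_DB = -96   # the (6 * bits) dB figure for 16-bit PCM

def erb_hz(center_hz):
    """Glasberg & Moore equivalent rectangular bandwidth of an auditory filter."""
    return 24.7 * (4.37 * center_hz / 1000 + 1)

def in_band_noise_db(center_hz):
    """Portion of flat broadband noise that falls inside one auditory filter."""
    fraction = erb_hz(center_hz) / NYQUIST_HZ
    return BROADBAND_NOISE_DB + 10 * math.log10(fraction)

level = in_band_noise_db(1000)
print(f"ERB at 1 kHz: {erb_hz(1000):.0f} Hz")
print(f"noise inside that band: {level:.1f} dB")
print(f"a -105 dB tone sits {-105 - level:.1f} dB above it")
```

Under these assumptions only about 0.55% of the noise power falls into the band around 1 kHz, putting the in-band noise near -118.6 dB, roughly 13.6 dB below the -105 dB tone.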

Thus 16-bit audio can go considerably deeper than 96 dB. With shaped dither, which moves the quantization noise energy into frequencies where it is harder to hear, the effective dynamic range of 16-bit audio reaches about 120 dB in practice.
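Shaped dither can be sketched with the simplest possible noise shaper: first-order error feedback, where each sample's quantization error is subtracted from the next input, so the noise power spectrum is multiplied by |1 - e^(-jω)|². Real shaped-dither curves are usually higher order and follow the ear's sensitivity; this first-order version, the TPDF dither, and all names here are illustrative assumptions. Quantizing silence shows the same noise energy being pushed out of the midrange and up toward 20 kHz.

```python
import math
import random

FS = 48_000
N = 4_800            # 0.1 s of audio, so DFT bins are 10 Hz apart

def requantize(samples, rng, shaped):
    """Round to integers with TPDF dither; if shaped, feed back the previous error."""
    out, err = [], 0.0
    for x in samples:
        v = x - err if shaped else x           # first-order error feedback
        y = round(v + rng.random() - rng.random())
        err = y - v                            # total error added at this step
        out.append(y)
    return out

def band_noise_db(samples, lo_hz, hi_hz, step_hz=10):
    """Mean DFT-bin power over [lo_hz, hi_hz), in dB (arbitrary reference)."""
    powers = []
    for f in range(lo_hz, hi_hz, step_hz):
        w = 2 * math.pi * f / FS
        re = sum(s * math.cos(w * n) for n, s in enumerate(samples))
        im = sum(s * math.sin(w * n) for n, s in enumerate(samples))
        powers.append(re * re + im * im)
    return 10 * math.log10(sum(powers) / len(powers))

rng = random.Random(7)
silence = [0.0] * N
results = {}
for name, shaped in (("plain TPDF", False), ("first-order shaped", True)):
    sig = requantize(silence, rng, shaped)
    results[name] = (band_noise_db(sig, 400, 600), band_noise_db(sig, 19_900, 20_100))
    low, high = results[name]
    print(f"{name:>18}: ~500 Hz {low:5.1f} dB, ~20 kHz {high:5.1f} dB")
```

For this filter the theory predicts roughly 24 dB less noise around 500 Hz and about 6 dB more around 20 kHz than plain TPDF dither.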

120 dB is more than the difference between a mosquito somewhere in the room and a jackhammer a foot away from you. It is also more than the difference between an empty soundproofed room and a sound loud enough to damage your hearing in seconds.

16 bits are enough to store everything we can hear, and always will be.

### Signal-to-Noise Ratio

It's worth noting briefly that the ear's signal-to-noise ratio is smaller than its absolute dynamic range. Within a given critical band, the signal-to-noise ratio is typically estimated at only about 30 dB. Even across widely separated bands, the relative signal-to-noise ratio does not reach the full dynamic range of hearing. This ensures that linear 16-bit PCM provides more resolution than is actually needed.

It is also worth noting that increasing the bit depth from 16 to 24 bits does not increase the perceptible resolution or “fineness” of the sound. It only extends the dynamic range, the distance between the quietest and loudest possible sounds, by lowering the noise floor. And a 16-bit noise floor is already below what we can hear.
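A quick experiment supports this: rounding the same sine to 16 and to 24 bits changes nothing about the signal except the size of the rounding error. The sketch below uses plain (undithered) rounding of a -20 dBFS, 997 Hz tone for simplicity; the test tone and the function name are arbitrary choices.

```python
import math

def rounding_noise_dbfs(bits, level_dbfs=-20.0, freq_hz=997.0, fs=48_000, n=48_000):
    """RMS error from rounding a sine to `bits`, in dB relative to a full-scale sine."""
    full_scale = 2 ** (bits - 1) - 1
    amp = full_scale * 10 ** (level_dbfs / 20)
    err_sq = 0.0
    for i in range(n):
        x = amp * math.sin(2 * math.pi * freq_hz * i / fs)
        err_sq += (round(x) - x) ** 2
    return 20 * math.log10(math.sqrt(err_sq / n) / (full_scale / math.sqrt(2)))

n16 = rounding_noise_dbfs(16)
n24 = rounding_noise_dbfs(24)
print(f"16-bit rounding noise: {n16:.1f} dBFS")
print(f"24-bit rounding noise: {n24:.1f} dBFS")
print(f"difference: {n16 - n24:.1f} dB")
```

The two noise floors should come out roughly 48 dB apart (8 extra bits × 6.02 dB), with the 16-bit floor already near -98 dB relative to a full-scale sine.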

### When do 24 bits matter?

Professionals record music at 24 bits for the lower noise floor and for convenience.

16 bits are enough to span the audible range with room to spare, but they do not span the entire possible signal range of audio equipment. The main reason to use 24 bits when recording is to prevent mistakes. Rather than carefully centering a 16-bit recording, risking clipping if the level is set too high and adding noise if it is set too low, 24 bits let the operator set an approximate level and not worry about it further. Missing the optimal gain by a couple of bits has no consequences, and effects that dynamically compress the recorded range have a deep floor to work with.
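The arithmetic behind “missing a couple of bits has no consequences” is simple: every 6.02 dB of unused headroom costs one bit. The recording levels in this sketch are hypothetical examples, and the function name is illustrative.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB per bit

def bits_in_use(word_length, headroom_db):
    """Bits still spanned by a signal whose peaks sit headroom_db below full scale."""
    return word_length - headroom_db / DB_PER_BIT

# Hypothetical recording levels: peaks at -6 dBFS and a cautious -18 dBFS
for word in (16, 24):
    for headroom in (6, 18):
        print(f"{word}-bit, peaks at -{headroom} dBFS: ~{bits_in_use(word, headroom):.1f} bits")
```

Even with a cautious 18 dB of headroom, a 24-bit recording still spans about 21 bits, more than a perfectly levelled 16-bit one.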

Engineers also need more than 16 bits when mixing and mastering. Modern workflows can involve literally thousands of effects and operations. The quantization noise and noise floor of a 16-bit sample may be inaudible during playback, but once that noise is multiplied thousands of times it eventually becomes noticeable; the 24-bit format keeps the accumulated noise at a very low level. Once the music is ready to be released on disc, there is no reason to keep more than 16 bits.
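A rough model of the accumulation, assuming every processing stage rounds back to the working word length and that the rounding errors are independent, so their powers add. Real DAWs typically process in floating point, so treat this only as an illustration of why extra bits give a large margin; the function names are illustrative.

```python
import math

def requantization_noise_db(bits):
    """Noise of one rounding step (uniform error, step/sqrt(12)) relative to a full-scale sine."""
    step = 2.0 / 2 ** bits                 # full scale spans -1.0 .. +1.0
    noise_rms = step / math.sqrt(12)
    return 20 * math.log10(noise_rms / (1 / math.sqrt(2)))

def accumulated_noise_db(bits, stages):
    """Noise after `stages` independent requantizations: powers add, +10*log10(stages)."""
    return requantization_noise_db(bits) + 10 * math.log10(stages)

for bits in (16, 24):
    for stages in (1, 1000):
        print(f"{bits}-bit, {stages:4d} stage(s): {accumulated_noise_db(bits, stages):7.1f} dB")
```

Under this model, a thousand 16-bit requantizations raise the noise from about -98 dB to about -68 dB, while the same chain at 24 bits stays near -116 dB.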

### Listening Tests

Understanding lives where theory and reality meet; a question is settled only when the two agree.

Empirical data from listening tests confirm that 44.1 kHz/16-bit provides the highest possible playback fidelity. Many controlled tests have confirmed this, but I recommend the recent paper "Audibility of a CD-Standard A/D/A Loop Inserted into High-Resolution Audio Playback" by the local folks at the Boston Audio Society.

Unfortunately, to access the full text of the paper you must be a member of the Audio Engineering Society (AES). However, the paper has been widely discussed in articles and on forums, with the authors themselves taking part. Here are some links:

New sampling rate: how high is the quality of modern CDs?
Hydrogen Audio forum thread
Supplementary information on the Boston Audio Society page, including the equipment list and sample list

The study presented listeners with a choice between high-resolution DVD-Audio/SACD recordings, chosen by high-definition audio advocates to demonstrate its superiority, and the same recordings resampled on the fly down to 16-bit/44.1 kHz CD format. Listeners had to identify any difference at all between the two using an ABX methodology. The Boston Audio Society ran the experiment on high-end equipment in noise-isolated listening rooms, with both amateur and trained listeners.

Across 554 trials, subjects chose correctly 49.8% of the time. In other words, they were guessing. Not a single listener during the entire test was able to tell which version was 16/44.1 and which was high-resolution. And the 16-bit signal wasn't even dithered!
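To see why 49.8% means guessing, one can run a one-sided binomial test against a coin-flip listener. This is an illustrative reading of the reported numbers, not the paper's own statistical analysis; the number of correct answers (276) is derived by rounding 49.8% of 554, and the function name is illustrative.

```python
from math import comb

def tail_probabilities(n):
    """tails[k] = P(X >= k) for X ~ Binomial(n, 1/2), i.e. scoring k+ by pure guessing."""
    tails = [0.0] * (n + 1)
    running = 0
    for k in range(n, -1, -1):
        running += comb(n, k)          # exact integer arithmetic
        tails[k] = running / 2 ** n
    return tails

TRIALS = 554
correct = round(0.498 * TRIALS)        # 49.8% correct -> 276 trials
tails = tail_probabilities(TRIALS)
needed = next(k for k in range(TRIALS + 1) if tails[k] < 0.05)

print(f"{correct}/{TRIALS} correct: one-sided p = {tails[correct]:.2f}")
print(f"score needed for p < 0.05: {needed}/{TRIALS} ({needed / TRIALS:.1%})")
```

A score of 276 out of 554 is entirely consistent with chance (p of roughly 0.55); reaching p < 0.05 would have taken a little under 300 correct answers, around 54%.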

Another recent study investigated whether ultrasound is audible, as earlier studies had suggested. The test was designed to maximize the chance of detection by placing the intermodulation products where they would be most audible. It found that the ultrasonic tones themselves could not be heard, but the intermodulation distortion products introduced by the loudspeakers could be.

This paper spawned a series of follow-up studies, many with conflicting results. Some of the confusion was resolved when it was discovered that ultrasound can cause more intermodulation distortion in power amplifiers than expected. For example, David Griesinger repeated the experiment and found that his loudspeaker setup did not introduce audible intermodulation distortion, but his amplifier did.
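The mechanism is easy to demonstrate numerically. In this sketch two hypothetical ultrasonic tones (26 and 27 kHz, both inaudible) pass through a hypothetical amplifier with a small second-order nonlinearity, y = x + 0.01·x². The nonlinear stage creates a difference tone at 27 - 26 = 1 kHz, right in the audible band; the tone frequencies, the coefficient, and the names are all assumptions for illustration.

```python
import math

FS = 96_000                  # high enough to carry the ultrasonic tones
N = FS                       # one second, so DFT bins are 1 Hz apart
F1, F2 = 26_000, 27_000      # hypothetical ultrasonic test tones, Hz
A2 = 0.01                    # hypothetical second-order nonlinearity of an amplifier

def dft_bin_magnitude(samples, freq_hz):
    """Magnitude of one DFT bin by direct correlation."""
    w = 2 * math.pi * freq_hz / FS
    re = sum(s * math.cos(w * n) for n, s in enumerate(samples))
    im = sum(s * math.sin(w * n) for n, s in enumerate(samples))
    return math.hypot(re, im)

tones = [0.5 * math.sin(2 * math.pi * F1 * n / FS) + 0.5 * math.sin(2 * math.pi * F2 * n / FS)
         for n in range(N)]
amplified = [x + A2 * x * x for x in tones]   # weakly nonlinear stage: y = x + a*x^2

diff_hz = F2 - F1                             # 1 kHz: squarely in the audible band
linear_1k = dft_bin_magnitude(tones, diff_hz)
nonlinear_1k = dft_bin_magnitude(amplified, diff_hz)
level_db = 20 * math.log10((nonlinear_1k / (N / 2)) / 0.5)

print(f"1 kHz bin, linear path:    {linear_1k:.2e}")
print(f"1 kHz bin, nonlinear path: {nonlinear_1k:.1f} ({level_db:.1f} dB re each tone)")
```

The linear path shows essentially nothing at 1 kHz, while the nonlinear path produces a 1 kHz product about 46 dB below each ultrasonic tone.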

### Reader beware

It is very important not to cherry-pick individual papers or “expert comments” out of context, or only from sources that suit your views. Not all papers agree completely with these results (and a few disagree with most of them), so it is easy to stumble across minority opinions that seem to support any conclusion you can imagine. Regardless, the papers and references above represent the great weight and breadth of the experimental record. No known paper that has stood the test of time seriously disputes these results. The controversy exists only among consumers and within audiophile communities.

If anything, the number of ambiguous, incomplete, and outright invalid experimental results that a Google search turns up underscores how hard it is to conduct accurate, unbiased research. The differences researchers look for are tiny, and rigorous statistical analysis is needed to reveal subconscious choices that subjects made without realizing it. The fact that we are most likely trying to prove a difference that doesn't exist makes things even harder. Proving the null hypothesis is akin to solving the halting problem: it can't be done. The only way forward in this case is to collect a large body of empirical data.

Despite this, papers confirming the null hypothesis are especially strong evidence, because confirming “inaudibility” experimentally is much harder than disputing it. Undiscovered errors in test procedures and equipment almost always produce false positives (by accidentally introducing audible differences) rather than false negatives.

If professional researchers have such a hard time detecting subtle audio differences, you can imagine how hard it is for amateurs.

### How to (inadvertently) ruin the results of a listening test

The "best" comment I've heard from high-resolution audio believers (paraphrased): “I've heard high-resolution audio in person, and the improvement is obvious. Are you seriously asking me not to believe my own ears?”

Of course you can trust your own ears. The trouble is that the brain is far too trusting. I'm not trying to offend anyone; this is a problem for everyone.

#### Bias, the Placebo Effect, and Double-Blind Testing

Any test in which the listener can tell the two options apart by anything other than sound will usually produce the result the listener expected in advance. This is called confirmation bias, and it is similar to the placebo effect. It means that people "hear" differences because of subconscious cues and preferences that have nothing to do with sound, such as preferring a more expensive (or better-looking) amplifier over a cheaper one.

The human brain is designed to notice patterns and differences, even where there are none. This tendency cannot be switched off simply by asking a person to make objective decisions, because it operates at a subconscious level. Nor can bias be defeated by mere skepticism: controlled experiments show that being aware of confirmation bias can actually increase its effect! A test that has not eliminated confirmation bias is worthless.

In single-blind testing, the listener knows nothing in advance about the options and receives no feedback during the test. This is better than a casual comparison, but it does not eliminate the experimenter's bias. The person administering the test can inadvertently influence the result or pass their own bias on to the listener through careless cues (for example, a remark like "Are you sure that's what you hear?", body language suggesting a "wrong" choice, and so on). The influence of experimenter bias on listener performance has also been confirmed experimentally.

Double-blind tests are the gold standard: neither the experimenter nor the listener has any information about the content of the test or the results so far. The best-known example is computer-run ABX testing, and free tools are available for running ABX tests on your own PC [19]. ABX is considered the minimum bar for a listening test to be meaningful; reputable audio forums such as Hydrogen Audio often do not allow discussion of listening test results unless they meet this minimum requirement for objectivity [20].

Above is a screenshot of Squishyball, a simple command-line ABX tool, running in an xterm.

Personally, I never run a quality comparison test during development, however casual, without an ABX tool. Science is science; there is no room for slacking.
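The core of an ABX session is small: the computer secretly decides whether X is A or B on each trial, the listener answers, and only the final score is checked against chance. This is a minimal sketch with hypothetical function names, assuming a one-sided binomial test at the common 5% level; it is not how foobar2000's ABX plugin or Squishyball are implemented.

```python
import random
from math import comb

def run_abx(answer, trials, seed=0):
    """Score an ABX session in which X is secretly A or B on each trial.

    `answer(trial)` returns the listener's "A" or "B"; it never sees the hidden choice.
    """
    rng = random.Random(seed)
    correct = 0
    for t in range(trials):
        hidden_x = rng.choice("AB")        # known only to the computer
        if answer(t) == hidden_x:
            correct += 1
    return correct

def min_correct_for_significance(trials, alpha=0.05):
    """Smallest score whose probability under pure guessing is below alpha."""
    for k in range(trials + 1):
        p = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
        if p < alpha:
            return k
    return trials + 1   # significance is unreachable with this few trials

# A listener who hears no difference can only guess
guesser = random.Random(42)
score = run_abx(lambda t: guesser.choice("AB"), trials=16, seed=1)
print(f"guessing listener: {score}/16, need {min_correct_for_significance(16)}/16 for p < 0.05")
```

For a 16-trial session, 12 correct answers are needed before guessing becomes an unlikely explanation (p ≈ 0.038).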

#### Loudness Tricks

The human ear can consciously detect loudness differences of about 1 dB, and experiments show subconscious sensitivity to differences as small as 0.2 dB. People almost universally perceive louder sound as better, and 0.2 dB is enough to create that preference. Any comparison in which the levels are not carefully matched will show a strong preference for the louder version, even when the difference is too small to notice consciously. Audio salespeople have known this trick for a very long time.

The professional testing standard is to match levels to within 0.1 dB or better. This often requires an oscilloscope or signal analyzer, because turning knobs until two sources sound about the same is not good enough.
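For two versions of the same material, matching by RMS level is a reasonable sketch of this step. The example assumes digital signals that can be measured directly (rather than an oscilloscope on the analog output), uses a hypothetical 0.3 dB mismatch, and checks the result against the 0.1 dB tolerance from the text; the helper names are illustrative.

```python
import math

TOLERANCE_DB = 0.1   # matching target quoted in the text

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_difference_db(a, b):
    """How much louder `a` is than `b`, in dB, from RMS levels."""
    return 20 * math.log10(rms(a) / rms(b))

def match_gain(reference, other):
    """Linear gain that brings `other` to the RMS level of `reference`."""
    return rms(reference) / rms(other)

# Hypothetical example: the same 1 kHz tone, one copy 0.3 dB hotter
fs = 48_000
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
hotter = [s * 10 ** (0.3 / 20) for s in tone]

before = level_difference_db(hotter, tone)
g = match_gain(tone, hotter)
matched = [s * g for s in hotter]
after = level_difference_db(matched, tone)

print(f"before: {before:+.2f} dB")
print(f"after:  {after:+.2f} dB, within tolerance: {abs(after) <= TOLERANCE_DB}")
```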

#### Signal Clipping

Clipping is another easy mistake to make, and sometimes it becomes apparent only in hindsight. Even a few clipped samples, or their after-effects, are easy to hear in comparison with an unclipped signal.

Clipping is especially dangerous in tests that create, resample, or otherwise manipulate digital signals on the fly. Suppose we want to compare the fidelity of 48 kHz sampling with a 192 kHz source. A common approach is to downsample from 192 kHz to 48 kHz, upsample back to 192 kHz, and then compare the result with the original 192 kHz sample in an ABX test [21]. This arrangement rules out any influence of equipment differences or sample switching on the results: we can use the same DAC to play both samples and switch between them without changing the hardware's operating mode.

Unfortunately, most samples are mastered to use the entire digital range. Naive resampling can, and often will, clip occasionally. It is important either to watch for clipping (and discard clipped audio) or to avoid it by other means, such as attenuating the audio.
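A contrived worst case shows how a master whose samples never exceed full scale can still clip after resampling: a tone at a quarter of the sample rate, sampled 45° off its peaks, so every stored sample reads only 0.707 of the true peak. The signal, the sample counts, and the helper names are assumptions for illustration; for a single pure tone below Nyquist, the ideal upsampled values are just the sine evaluated between the old samples.

```python
import math

def clipped_count(samples, limit=1.0):
    """Number of samples beyond full scale (small tolerance for float rounding)."""
    return sum(1 for s in samples if abs(s) > limit + 1e-12)

# fs/4 sine sampled at 45 degrees: normalized so the largest *sample* is exactly 1.0
true_peak = 1 / math.sin(math.pi / 4)          # ~1.414, i.e. ~3 dB over full scale
original = [true_peak * math.sin(math.pi / 2 * n + math.pi / 4) for n in range(64)]

# Ideal 2x upsampling of this pure tone: the same sine at twice the sample density
upsampled = [true_peak * math.sin(math.pi / 4 * m + math.pi / 4) for m in range(128)]

# One remedy from the text: attenuate before resampling
headroom_db = 20 * math.log10(true_peak)       # ~3.01 dB
safe = [s * 10 ** (-headroom_db / 20) for s in upsampled]

print("original samples over full scale: ", clipped_count(original))
print("upsampled samples over full scale:", clipped_count(upsampled))
print(f"after {headroom_db:.2f} dB attenuation:  ", clipped_count(safe))
```

The upsampled version exceeds full scale by about 3 dB, and attenuating by that amount beforehand avoids the clipping.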

#### Different Media, Different Masters

I have come across several articles and blog posts that argue the merits of 24-bit or 96/192 kHz audio by comparing a CD with an audio DVD of the “same” recording. This comparison is invalid, because the two are usually made from different masters.

#### Unintended Cues

Unintended audible cues are almost inevitable in older analog and hybrid digital/analog test setups. Purely digital setups can eliminate the problem entirely for some kinds of tests, but they also multiply the opportunities for complex software bugs. Such limitations and bugs have a long history of producing false positive results in testing [22].

The article "Digital Testing - More About ABX Testing" tells the fascinating story of a listening test conducted in 1984 to challenge the audiophile authorities of the time, who claimed that CD was inherently inferior to vinyl. The article is less about the results (I suspect you can guess them) and more about the real-world messiness of conducting such a test. For example, a mistake by the organizers inadvertently revealed that an invited expert listener was choosing not by sound quality, but by the slightly different clicks that the switch's relays made.

Anecdotes are no substitute for real data, but this story shows how easily undiscovered flaws can bias listening tests. Some of the audiophile beliefs discussed in it are also highly entertaining; one hopes that some of today's beliefs will look just as silly 20 years from now.

##### Notes for Part 3

Everyone knows the feeling of the ears “unclenching” after loud music is switched off.
Some excellent graphs can be found on the HyperPhysics website.
20 µPa is conventionally taken as 0 dB for auditory measurements; it is approximately the threshold of hearing at 1 kHz. Between 2 and 4 kHz, however, the ear is up to 8 dB more sensitive.
The following paper contains the best explanation of dither I have come across, although it is mostly about dithering images. Its first half covers the theory and practice of dither in audio before moving on to images: Cameron Nicklaus Christou, “Optimal Dither and Noise Shaping in Image Processing.”
DSP engineers may point out, as one of my brilliant colleagues did, that 16-bit audio has a theoretically infinite dynamic range for a pure tone if you are allowed to use an infinitely long Fourier transform to extract it. This concept is very important in radio astronomy. Although the ear works not unlike a Fourier transform, its resolution is relatively limited, which places a limit on the maximum practical dynamic range of a 16-bit signal.
Music production increasingly uses 32-bit floating point, both because it is very convenient on modern processors and because it completely eliminates the risk of accidental clipping going unnoticed and ruining a mix.
Several readers asked how Meyer and Moran's 2007 test could have produced a null result if ultrasound can cause intermodulation distortion. Keep in mind that “can” and “sometimes” are not the same as “will” and “always.” Intermodulation distortion from ultrasound may or may not occur in a given system under a given set of conditions. Meyer and Moran's null result means that intermodulation distortion was inaudible on the systems they used during the test. Readers are encouraged to try a simple intermodulation distortion test to check the intermodulation potential of their own equipment.
Kaoru Ashihara and Shogo Kiryu, “Detection Threshold for Tones above 22 kHz” (2001). AES Convention Paper 5401, presented at the 110th Convention, May 12-15, 2001, Amsterdam.
David Griesinger, “Perception of Mid-Frequency and High-Frequency Intermodulation Distortion in Loudspeakers, and Its Relationship to High-Definition Audio.”
Since publication, several commenters have sent me variations of the same anecdote (paraphrased): “I once listened to some headphones/amps/recordings expecting result A, but was very surprised to get result B! So much for bias!” I have two things to say. First, confirmation bias does not replace every correct result with an incorrect one. It skews results in a direction that is hard to predict, by an unknown amount. How can you know for sure what is true if the test was rigged by your subconscious? Say you expected to hear a big difference but were surprised to hear a small one. What if there was no difference at all? Or there was a difference, but your well-intentioned skepticism, born of awareness of possible bias, canceled it out? Or maybe you were completely right? Objective testing, such as ABX, removes all of these uncertainties. Second: “You think you're not biased? Great! Prove it!” The value of an objective test lies not only in its ability to convince us, but also in its ability to convince others. Claims require evidence. Extraordinary claims require extraordinary evidence.
Probably the simplest tools for ABX testing: foobar2000 with its ABX plugin, and Squishyball, the Linux command-line tool we use at Xiph.
At Hydrogen Audio, the objective testing requirement is known as TOS8, because it is item 8 of the forum's terms of service.
It is commonly assumed that resampling irreparably damages a signal. That is not the case, at least unless someone makes an obvious mistake such as clipping the signal. A signal that has been downsampled and then upsampled again is indistinguishable from the original. This is the usual test used to show that higher sampling rates are unnecessary.
This may not be audio-related, but... faster-than-light neutrinos, seriously?
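The note on 32-bit floating point can be illustrated with a toy mix. In this sketch (hypothetical track values, illustrative names) two tracks sum past full scale. Stored as 16-bit integers, the overs are clipped for good; stored as 32-bit floats (Python's `array` type code `"f"`), they are kept, so pulling the master fader down afterwards restores the original shape.

```python
from array import array

def to_int16(values):
    """Store as 16-bit integers; anything beyond full scale saturates (clips)."""
    return array("h", (max(-32768, min(32767, round(v * 32767))) for v in values))

def to_float32(values):
    """Store as 32-bit floats; values beyond +/-1.0 are kept as they are."""
    return array("f", values)

# Two hypothetical tracks whose peaks coincide, so the mix exceeds full scale
track_a = [0.8, -0.5, 0.3]
track_b = [0.8, -0.6, 0.2]
mix = [a + b for a, b in zip(track_a, track_b)]

# Pulling the master fader down by half (about 6 dB) after the fact
from_int = [s / 32767 * 0.5 for s in to_int16(mix)]
from_float = [s * 0.5 for s in to_float32(mix)]

print("int16 path:  ", [round(v, 3) for v in from_int])    # overs already flattened
print("float32 path:", [round(v, 3) for v in from_float])  # original shape recovered
```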

Noises

Some musicians study audio interface specifications to find out which sound card is better. Specifications do matter when choosing equipment, but they are not everything: a spec sheet won't tell you how good a particular audio interface sounds.

The most important characteristics of any audio interface are its dynamic range and signal-to-noise ratio. There is still some confusion around them, because each can be measured in several ways. Even so, the measurement methods that audio interface manufacturers use can generally be trusted.

In audio interface terminology, "signal-to-noise ratio" compares the maximum level of a signal you can apply to the interface's input (that is, the level at which the input meters read 0 dB) with the level of background noise when no signal is present. However, some clever sound card manufacturers have made their devices automatically mute the output when nothing is fed to the input, which inflates the signal-to-noise figure considerably. Dynamic range, by contrast, measures the noise level while a low-level signal (usually a 1 kHz sine wave at -60 dB) is present, which prevents the audio interface from muting the channel. This is why dynamic range is the more honest metric.
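The dynamic range measurement described above can be sketched in a few lines, roughly in the spirit of the AES17-style method: apply a -60 dBFS, 1 kHz tone, remove it, measure what remains, and add back the 60 dB. The "interface" here is a hypothetical ideal 16-bit converter with TPDF dither, the measurement is unweighted and full-band (a real analyzer would filter and often A-weight), and all names are illustrative.

```python
import math
import random

FS = 48_000
N = FS                  # one second: the 1 kHz test tone spans whole cycles
FULL_SCALE = 32767

def simulated_16bit_capture(level_dbfs, seed=3):
    """Hypothetical ideal 16-bit converter: 1 kHz tone plus TPDF dither, rounded."""
    rng = random.Random(seed)
    amp = FULL_SCALE * 10 ** (level_dbfs / 20)
    return [round(amp * math.sin(2 * math.pi * 1000 * n / FS) + rng.random() - rng.random())
            for n in range(N)]

def dynamic_range_db(capture, test_level_dbfs=-60.0, tone_hz=1000):
    """Remove the test tone, measure what is left, and refer it to full scale."""
    w = 2 * math.pi * tone_hz / FS
    c = [math.cos(w * n) for n in range(len(capture))]
    s = [math.sin(w * n) for n in range(len(capture))]
    # Least-squares fit of the tone (exact here because N covers whole cycles)
    a = 2 / len(capture) * sum(x * ci for x, ci in zip(capture, c))
    b = 2 / len(capture) * sum(x * si for x, si in zip(capture, s))
    residual = [x - a * ci - b * si for x, ci, si in zip(capture, c, s)]
    tone_rms = math.hypot(a, b) / math.sqrt(2)
    resid_rms = math.sqrt(sum(r * r for r in residual) / len(residual))
    # Tone-to-residual ratio, plus the 60 dB the test tone sits below full scale
    return -test_level_dbfs + 20 * math.log10(tone_rms / resid_rms)

dr = dynamic_range_db(simulated_16bit_capture(-60.0))
print(f"ideal dithered 16-bit channel: {dr:.1f} dB dynamic range (unweighted)")
```

For this ideal channel the result should be roughly 93 dB. Because the test tone is present throughout, a converter that mutes itself on silence gains nothing here.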

When comparing the dynamic ranges of different audio interfaces, though, keep in mind what you will mainly be recording. External synthesizers, for example, most likely have a dynamic range of no more than 80 dB. If you plan to record a live band with microphones, the noise of the microphones and microphone preamps will most likely already exceed the noise of the audio interface, especially given how hard it is to deaden and soundproof the room where the recording takes place.

So while it makes sense to choose the audio interface with the lowest noise you can, the reality is that most people will not hear the difference between an interface with 110 dB of dynamic range and one with 120 dB. Excess noise in a recording is most often caused by entirely different factors, not by the quality of the audio interface.
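Why 110 versus 120 dB rarely matters: uncorrelated noise sources add in power, so the loudest one dominates. The sketch assumes a hypothetical -80 dBFS noise floor from microphones, preamps and room, and treats the interface's dynamic range as its noise floor in dBFS, which is a simplification; the function name is illustrative.

```python
import math

MIC_CHAIN_NOISE = -80.0   # hypothetical noise of mics, preamps and room, in dBFS

def sum_noise_dbfs(*levels_dbfs):
    """Combine uncorrelated noise sources: add their powers, then convert back to dB."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_dbfs))

for interface_dr in (110, 120):
    # Assumption: interface noise floor sits at -(dynamic range) dBFS
    total = sum_noise_dbfs(MIC_CHAIN_NOISE, -interface_dr)
    print(f"interface at {interface_dr} dB DR -> total noise {total:.3f} dBFS")
```

In this scenario, swapping a 110 dB interface for a 120 dB one changes the total noise by only a few thousandths of a decibel.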

Just as we want an audio interface with the lowest possible noise, we also want minimal distortion: the recorded waveform should have the same shape as the signal at the input, with no added harmonics. But don't worry: most modern audio interfaces have extremely low distortion (0.001% or better).
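Distortion percentages map onto decibels as 20·log10(percent / 100), so 0.001% corresponds to distortion 100 dB below the signal. A tiny helper (the name is illustrative):

```python
import math

def percent_to_db(distortion_percent):
    """Convert a distortion figure in percent to a level relative to the signal, in dB."""
    return 20 * math.log10(distortion_percent / 100)

for pct in (1.0, 0.1, 0.01, 0.001):
    print(f"{pct:g}% -> {percent_to_db(pct):.0f} dB")
```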

Conclusions

Contrary to what some people think, there is no single best, universal audio interface. Choose one based on your own needs, not because someone told you to. The action plan is as follows:

1) First of all, decide how many inputs and outputs you need.
2) Decide which data transfer format suits you best.
3) Check which audio interface models meet your requirements and add them to a shortlist. Most musicians end up with 3 or 4 candidates.
4) Now learn as much as possible about each model and study its specifications in detail. At this stage you can also ask friends for their opinions and read reviews, articles, and specialized forums.
5) When making your final decision, make sure the audio interface you choose is fully compatible with your computer.