The race for 16/44, 24/192 and other formats in search of the right sound: vinyl, tapes, cassettes, CDs, etc.

Once, while surfing the Internet, I was looking for answers to two questions: what is the dynamic range of a vinyl record, and does it make sense to digitize records in order to get better-than-CD quality? I came across an article arguing that “distributing audio in 24-bit/192 kHz format makes no sense”, and a couple of very interesting video lectures on the topic.

The author of the article and the videos is Chris "Monty" Montgomery (Red Hat, Fedora), creator of Ogg and Vorbis and founder of xiph.org. He currently works at Mozilla as a multimedia programmer and is also a musician.

Something about human psychology

Last year, Neil Young* and Steve Jobs discussed creating a service for downloading audio in "uncompromising studio quality", and some time later Neil Young introduced the Pono player, intended to play this audio.

Investors generally like the idea, and they recently put up $500,000 to popularize the format. What, essentially, is this money for? Deceptive marketing. Why does this kind of marketing work? For a couple of reasons.

First, when people read news like this, they often make assumptions about how digital audio works instead of learning how it actually works. They assume that increasing the sample rate is like increasing the number of frames per second in video. In fact, it is more like adding infrared and ultraviolet colors, which we will never see and in principle cannot see. (This is the central point of the article; we will get to it a little later.) Second, people may believe that they hear a difference in sound when in fact there is none.
Such thinking errors are normal for humans; they are called cognitive biases. Confirmation bias, the herd instinct, the placebo effect and trust in authority are just some of the biases that can lead a person to believe they hear a difference. Confirmation bias: “There is more information in 24/192, which means I should hear it; oh, I do hear it!” The herd instinct somehow magically makes people believe in something that does not and cannot exist. Trust in authority either makes you completely uncritical of information or, when it conflicts with your own honest opinion, makes you prefer someone else's. The Soviet popular science film “Me and Others” clearly demonstrates some social cognitive biases. In one experiment shown in the film, a group of students is shown several portraits and asked which two of them show the same person. All the students except one are confederates who point to two portraits of completely different people. The real subject initially did not consider that option, yet he often goes along with the majority. You will say: “No, I'm not like that.” Actually, that's unlikely. We are all human; we just differ in how aware we are of such things. In any case, if people were not subject to cognitive biases, marketing would have stopped working long ago. Look around: people buy unreasonably expensive goods and are happy about it.

So 24/192 usually doesn't improve quality, and that sounds like bad news. The good news is that it is easy to improve your sound quality: you just need to buy good headphones**. The improvement is immediately noticeable, it is not illusory, and it is pleasing. If you buy headphones in the $100 to $200 range, you will be happy and thank me for the advice. The exception is beautiful, expensive fashion headphones that are not designed for high-quality audio playback. Now let's get to the fun part.

* Yes, I had no idea who Neil Young was either. It turns out he is a famous Canadian musician... and has been famous for 50 years.
** This is my personal opinion; I do not represent any store and have no commercial interest.

The 24/192 digital audio format and why it makes no sense. Part 4 [Translation]



Translator's note:

This is a translation of the last part of an extensive article by Christopher "Monty" Montgomery (creator of the free Ogg and Vorbis formats). The article explains why it makes no sense for ordinary listeners to store and play music in 24/192 format, and what can actually improve the playback quality of your favorite recordings.

[First part]

[Second part]

[Third part]

Finally, some good news!

What does it take to improve the quality of the digital audio we listen to?

Better headphones

The easiest improvement isn't digital at all. The biggest improvement in sound quality for the money is a good pair of headphones. On-ear or in-ear, open-back or closed-back: for the most part it doesn't matter. They don't even have to be expensive, although expensive headphones can be worth the money.

Remember that some headphones are expensive because they are well made, durable and sound great. Others are expensive because they are $20 headphones with hundreds of dollars of styling, hype and brand name added on top. I won't make specific recommendations, but I will say that you probably won't find good headphones in big-box stores, even ones that specialize in music equipment.

Lossless formats

It is true that a properly encoded Ogg Vorbis file (or MP3 or AAC) will be indistinguishable from the original at a moderate bitrate.

But what about poorly encoded files?

Twenty years ago, all MP3 encoders were very poor by modern standards. Many of these bad encoders are still in use, presumably because their licenses are cheap and most people can't tell the difference or don't care. Why would a company spend money fixing something when people don't even know it's broken?

Moving to newer formats such as Vorbis or AAC doesn't fundamentally change this. For example, many companies and individuals used (and still use) FFmpeg's low-quality built-in Vorbis encoder because it comes as the default in FFmpeg and they don't care how bad it is. AAC has an even longer history of widely deployed low-quality encoders, and the same is true of every mainstream lossy format.

Lossless compression formats such as FLAC eliminate any possibility of audio quality being compromised [23] by a bad encoder, or even a good one used incorrectly.

The second reason to prefer lossless formats is to avoid generational loss. Every encoding and re-encoding loses more information. Even if the first encoding was perfect, audio artifacts are very likely to appear after the second. This matters to anyone who wants to remix or sample music, and especially to us codec researchers: we need clean audio to work with.

Better masters

In the Boston Audio Society (BAS) test I mentioned earlier, it was noted in passing that the SACD version of a recording can sound significantly better than the CD version. This is not due to the higher sampling rate or bit depth. It is because a higher-quality master was used to make the SACD. Converted to CD format and burned to a CD-R, the SACD version still sounds as good as the original SACD, and better than the CD, because the source audio used to make the SACD was better. Good mastering and production practices obviously contribute to the quality of music [24].

The recent press coverage of "Mastered for iTunes" and similar initiatives from other labels is somewhat encouraging. What remains to be seen is whether Apple and the others actually "get it". It may instead be just a hook to sell consumers music they already own, at a higher price.

Surround

Another “sales hook” I would happily fall for is surround recordings. Unfortunately, there are some technical pitfalls here.

Old-fashioned discrete surround sound with many channels (5.1, 7.1, etc.) is a technical relic dating back to the movie theaters of the 1960s. The surround image is limited, and it collapses toward the nearest speakers when the listener moves out of position or sits in the wrong place to begin with.

We can capture and reproduce excellent, robust localization using tools such as Ambisonics. The first problem is the cost of the playback equipment. The second is that a recording encoded for a natural sound field sounds bad when mixed down to stereo, and cannot be convincingly created artificially. Ambisonic or holographic audio is very hard to fake. The result is like 3D video: it turns into a tacky gimmick and makes 5% of the population sick.

Binaural sound is similarly difficult. You can't fake it, because it sounds different to different people. People also subconsciously move their heads to pin down the source of a sound, and without that they have trouble locating it. A binaural recording cannot account for this, although it can still work in a fixed environment.

These are hardly insurmountable technical obstacles. Discrete surround has already proven itself in the market, and I am personally delighted with what Ambisonics can offer.

Coda

“I never did care for music much; it's the high fidelity!” - Flanders and Swann, “A Song of Reproduction”

The most important thing is to enjoy the music, right? Modern playback quality is incomparably better than the best analog systems of a generation ago. Is this just another first-world problem? Perhaps, but bad mixes and encodings bother me and distract me from the music, and I'm not alone.

Why am I against 24/192? Because it is a solution to a problem that doesn't exist: a business model built on ignorance and on deceiving people. The more pseudoscience spreads unchallenged, the harder it is for the truth to win out over mere plausibility. That holds even in a small and completely insignificant case like this one.

“For me, it is far better to grasp the Universe as it really is than to persist in delusion, however satisfying and reassuring.” - Carl Sagan

Further reading

Readers pointed me to a couple of great papers I didn't know about before writing my article. They cover many of the same issues, but in more detail.

“Coding High Quality Digital Audio” by Bob Stuart of Meridian Audio is remarkably insightful, though lengthy. Our conclusions differ somewhat: he assumes a slightly wider frequency range and greater bit depth without much justification. Still, his reasoning is clear and easy to follow. [Edit: I may disagree with much of his other work, but I really like this paper.]

“Sampling Theory For Digital Audio” by Dan Lavry of Lavry Engineering is another paper that several readers pointed out. It covers what I said in two pages about sampling, resampling and filtering, but in far more detail over 27 pages, with lots of graphs, examples and references.

Stephane Pigeon of audiocheck.net wrote browser-based hearing tests and posted them on the company's website. The set of tests is still relatively small, but some are directly relevant to the context of this article. They work well and I found the quality to be quite good.

Notes for Part 4

23.

Wired magazine suggests that lossless formats such as FLAC are not always truly lossless:

Some audio purists will suggest skipping FLAC altogether and buying WAV directly. […] By buying WAV, you avoid the potential losses that can occur when converting to FLAC. It's rare, but it happens.

This is not true. Lossless compression never changes the original data under any circumstances, and FLAC is no exception.
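The point can be made concrete with a minimal Python sketch of a lossless round trip. It uses zlib from the standard library as a stand-in for FLAC; this substitution is an assumption for illustration only, and FLAC's own codec is not shown. A hypothetical block of 16-bit PCM samples is compressed, decompressed and compared sample by sample.

```python
import array
import math
import zlib

# Hypothetical test signal: one second of a 440 Hz tone as 16-bit PCM at 44.1 kHz.
RATE = 44100
pcm = array.array(
    "h",
    (int(round(10000 * math.sin(2 * math.pi * 440 * n / RATE))) for n in range(RATE)),
)
raw = pcm.tobytes()

# zlib stands in for FLAC here: both are lossless, although FLAC uses
# audio-specific prediction and therefore compresses music much better.
packed = zlib.compress(raw, 9)
restored = array.array("h")
restored.frombytes(zlib.decompress(packed))

print(len(raw), len(packed))  # the compressed copy is smaller
print(restored == pcm)        # True: every sample is bit-identical
```

Whatever the compressor, "lossless" means exactly this check passes for every input.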

If Wired meant hardware corruption of the file (disk failures, memory errors, sunspots), then WAV and FLAC files will both be damaged. But FLAC has checksums, so the damage can be detected. FLAC files also take up less space than WAV, which reduces the chance of accidental corruption simply because there is less data to corrupt.
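FLAC, for example, stores an MD5 signature of the unencoded audio in its STREAMINFO metadata block, so a decoder can verify its output. The sketch below only illustrates that idea with Python's hashlib on a hypothetical block of samples. It does not read or write real FLAC files, and `audio_digest` is a helper invented for this example.

```python
import array
import hashlib


def audio_digest(samples: array.array) -> str:
    """MD5 of the raw sample bytes (hypothetical helper, illustration only)."""
    return hashlib.md5(samples.tobytes()).hexdigest()


# Hypothetical decoded audio and the digest stored alongside it.
samples = array.array("h", range(-1000, 1000))
stored = audio_digest(samples)

# Simulate a single flipped bit caused by a disk or memory fault.
damaged = array.array("h", samples)
damaged[500] ^= 1

print(audio_digest(samples) == stored)  # True: the intact copy verifies
print(audio_digest(damaged) == stored)  # False: the corruption is detected
```

A plain WAV file carries no such digest, so the same flipped bit would go unnoticed.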

24.

The "loudness wars" are the most frequently cited example of poor mastering in the industry today, though far from the only one. They are a much older phenomenon than Wikipedia suggests. Artists and producers were insisting on the loudest possible records back in the 1950s, and equipment manufacturers researched and built new technology to please the record makers. More advanced vinyl mastering equipment of the 1970s and 1980s, for example, monitored the grooves and packed them more tightly where possible. This allowed higher cutting amplitudes than the space on the record would normally permit.

Modern digital technology makes it possible to raise the volume to absurd levels. There is also a variety of automated, highly complex, proprietary digital processing modules for broadcast stations. They are deployed everywhere without a full understanding of how they work or what they actually do.
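A rough way to see what "raising the volume" does to a recording is to compare its crest factor (the peak-to-RMS ratio) before and after crude limiting. The sketch below is a toy built on stated assumptions. Hard clipping plus make-up gain stands in for real mastering limiters, which are far more sophisticated. The decaying 220 Hz note is a made-up test signal.

```python
import math

RATE = 44100


def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def crest_factor_db(x):
    """Peak-to-RMS ratio in dB; heavily limited masters have a low value."""
    return 20 * math.log10(max(abs(v) for v in x) / rms(x))


# Hypothetical test signal: one second of a decaying 220 Hz note peaking near 1.0.
note = [math.exp(-3 * n / RATE) * math.sin(2 * math.pi * 220 * n / RATE)
        for n in range(RATE)]

# Toy "loudness war" processing: clip at 0.2, then apply make-up gain
# so that the new peak is back at full scale (1.0).
clipped = [max(-0.2, min(0.2, v)) for v in note]
loud = [v / 0.2 for v in clipped]

print(round(crest_factor_db(note), 1), round(crest_factor_db(loud), 1))
print(rms(loud) > rms(note))  # True: about the same peak, much higher average level
```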


Nyquist-Shannon theorem

To avoid falling into a thinking trap, let's try to understand from the very basics why digital audio works. First, let's define the terms clearly (we will phrase them as if they were used only for analyzing sound).

A signal is a function of time. For example, the electrical voltage in the wires of audio equipment, or the sound pressure on the eardrum at each moment in time, can be expressed as a signal.

A spectrum is a representation of a signal as a function of frequency rather than time.
This means that the signal is expressed not as a "loudness" recorded over time, but as the loudnesses of an infinite number of harmonics (cosine waves), all present at once. That is, the original signal can be represented as a sum of harmonic signals of different frequencies and amplitudes ("loudnesses"). Yes, physical quantities can often (in fact, almost always) be represented in this "strange" way, by applying a Fourier transform to the original function. (Displaying the spectrum at the current moment is one of the most popular ways to visualize music in an audio player. Note, however, that the spectrum I am talking about describes the entire time interval, not some instantaneous value: from the set of harmonics, the spectrum, you can recreate the entire passage of sound.)

The Nyquist-Shannon theorem states the following. If a signal has a limited spectrum, it can be reconstructed from samples taken at a frequency $f$ strictly greater than twice its upper frequency $f_c$: $f > 2f_c$. Increasing the sampling frequency only means that the digital audio format can record higher frequencies, ones we do not perceive at all. Incidentally, the theorem applies to signals made up not of a finite set of frequencies but of an infinite, continuous range of them, as real sound is. In simple terms: take an audio signal containing only frequencies below $f_c$ and write its values to a file every $1/f$ seconds. We can later recreate the original signal from those values. Yes, recreate it completely, without losing any quality at all. But this formulation doesn't explain how to recreate the sound. The theorem comes from Nyquist's 1928 paper "Certain Topics in Telegraph Transmission Theory", which says nothing about how to recreate the signal. Kotelnikov's theorem, formulated and proven by V.A. Kotelnikov in 1933, explains this quite clearly.
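Why the condition $f > 2f_c$ matters can be checked numerically. In this plain-Python sketch, a hypothetical 30 kHz tone sits above half the 44.1 kHz sampling rate. It produces the same samples as a 14.1 kHz tone, so after sampling the two can no longer be told apart. This is why an analog-to-digital converter low-pass filters its input before sampling.

```python
import math

FS = 44100.0  # CD sampling rate, Hz


def sample(freq_hz, n_samples, fs=FS):
    """Sample a unit-amplitude cosine of the given frequency at rate fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]


# 30 kHz violates f > 2 * fc at 44.1 kHz (2 * 30 kHz = 60 kHz > 44.1 kHz).
ultrasonic = sample(30000.0, 64)
# Its samples coincide with those of a 14.1 kHz tone (44.1 - 30 = 14.1 kHz).
alias = sample(FS - 30000.0, 64)

print(max(abs(a - b) for a, b in zip(ultrasonic, alias)) < 1e-9)  # True
```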

Kotelnikov's theorem
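In its usual textbook form, Kotelnikov's theorem says the following. A signal $x(t)$ containing no frequencies above $f_1$ is completely determined by its samples $D_k = x\big(k/(2f_1)\big)$, taken every $1/(2f_1)$ seconds, and can be rebuilt from them exactly:

$$
x(t) = \sum_{k=-\infty}^{\infty} D_k \,\mathrm{sinc}\!\left(2\pi f_1\left(t - \frac{k}{2f_1}\right)\right), \qquad \mathrm{sinc}(t) = \frac{\sin t}{t}.
$$

At a sample instant $t = m/(2f_1)$, the argument of every term with $k \ne m$ is a nonzero multiple of $\pi$. Those terms vanish, and the sum returns exactly $D_m$. Here the sampling rate is exactly $2f_1$; the strict inequality $f > 2f_c$ in the Nyquist formulation only excludes a component lying exactly at the band edge.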

What does this mean? First, note the function $\mathrm{sinc}(t) = \sin(t)/t$. Visually, it looks just like a Mexican hat.


Subtracting $k/(2f_1)$ from $t$ moves the hat to the right place: the moment at which the $k$-th sample was recorded. Multiplying by $D_k$ stretches the hat vertically so that its top coincides with the sample value.
In other words, the theorem states that to recreate the sound, it is enough to place a hat at each sample point, scale it so that its top coincides with the value of that sample, and add all the hats up. We will leave the theorem without proof; it can be found in almost any signal processing textbook. However, I would like to point out that recreating a function with Kotelnikov's theorem is not just smoothing. Each hat is zero at the neighboring samples and does not affect their values, but it does affect the values between them. With a low-frequency signal this may look like smoothing. Now take, say, a high-frequency cosine. Drawn as steps, it would not even be recognizable as a cosine; it looks like a chaotic set of samples. Yet reconstruction produces a real, perfectly smooth cosine.

So, mathematically, it is clear that the sound can be restored, at least in theory. This does not mean that digital playback devices recreate sound indistinguishable from the original; it only means that the audio format makes this possible. Correctly "throwing Mexican hats" at the output of a digital-to-analog converter is a completely different kind of magic. So is delivering the resulting sound to the ear with minimal distortion. Neither has anything to do with this article. Fortunately, good engineers have already thought long and hard about how to solve this problem for us.
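The "hat throwing" described above can be tried directly. This is a minimal sketch, not how a real DAC works. It sums sinc hats over a finite block of samples of a hypothetical 19 kHz cosine, using a 44.1 kHz sampling rate, so $f_1$ = 22.05 kHz. The result is evaluated halfway between two samples, where a step plot would show nothing useful. Because only a finite number of hats can be summed, a small truncation error remains.

```python
import math


def sinc(t):
    """Unnormalized sinc, sin(t)/t, as defined in the text (value 1 at t = 0)."""
    return 1.0 if t == 0 else math.sin(t) / t


def reconstruct(samples, f1, t):
    """Sum one scaled, shifted sinc hat per sample (truncated Kotelnikov series).

    samples[k] is the value D_k recorded at time k / (2 * f1).
    """
    step = 1.0 / (2.0 * f1)
    return sum(d * sinc(2 * math.pi * f1 * (t - k * step))
               for k, d in enumerate(samples))


F1 = 22050.0    # band limit, i.e. a 44.1 kHz sampling rate
TONE = 19000.0  # hypothetical high-frequency cosine
N = 4000        # number of samples in the block


def tone(t):
    return math.cos(2 * math.pi * TONE * t)


samples = [tone(k / (2 * F1)) for k in range(N)]
print([round(d, 2) for d in samples[:8]])  # looks like an erratic sequence

# Evaluate halfway between two samples near the middle of the block.
t_mid = (N // 2 + 0.5) / (2 * F1)
error = abs(reconstruct(samples, F1, t_mid) - tone(t_mid))
print(error)  # small; it comes from truncating the infinite sum
```

Adding more samples on both sides of `t_mid` shrinks the error further, which is what the infinite sum in the theorem promises.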

What do 24 bits provide?

When discussing how Kotelnikov's theorem applies to digital audio, we conveniently forgot something for the sake of simplicity. After quantization (digitization), the values $D_k$ are numbers stored in a computer. Their precision is therefore not arbitrary but fixed: the precision we choose for our audio format. This means that the values of the original signal are not recorded exactly, so in general the original signal cannot be recreated exactly.

But how does this actually affect the sound a person perceives when 16-bit and 24-bit signals are compared fairly? Studies have compared 24/44 with 16/88 (yes, that's right!). Doubling the sampling rate did not improve quality, but the subjects detected the increase in bit depth without difficulty. No one is looking toward 32 or 64 bits yet; no devices exist that could realize the potential of 64-bit audio. Music editors, however, use high bit depths of 64 bits or more for internal processing.

Let's talk about loudness. The loudness of a sound is a subjective quantity that grows very slowly with increasing sound pressure. It depends on both the sound pressure (amplitude) and the frequency. The loudness level of a sound is a relative quantity expressed in phons. It is numerically equal to the sound pressure level of a 1 kHz sine wave that sounds as loud as the sound being measured.
Sound pressure level (SPL) is measured in dB relative to the threshold of hearing for a 1 kHz sine wave. When the sound pressure doubles, the sound pressure level increases by 6 dB. Here are a few sound pressure levels:

  • 20-30 dB SPL is a very quiet room (yes, a room in which nothing happens).
  • 40-50 dB SPL – normal conversation.
  • 75 dB SPL – screaming, laughing at a distance of 1 meter.
  • 85 dB SPL - harmful to hearing: prolonged exposure of 8 hours per day causes damage, and less may be enough for some people. This is about the level of a freeway at rush hour [Sound pressure levels]. I don't know about you, but I never listen to music this loud. That becomes obvious when I walk past a highway wearing closed-back over-ear headphones and try to listen to music.
  • 91 dB SPL – hearing damage with exposure 2 hours per day.
  • 100 dB SPL is the maximum permissible sound pressure for headphones according to European Union standards.
  • 120 dB SPL - almost unbearable - pain threshold.
  • 140 dB SPL and above - rupture of the eardrum, barotrauma or even death.
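The levels in the list follow the usual definition $L_p = 20\log_{10}(p/p_0)$, with the reference pressure $p_0$ = 20 µPa, roughly the threshold of hearing at 1 kHz. A small Python sketch converts in both directions and confirms the 6 dB per doubling mentioned above. The function names are chosen for illustration only.

```python
import math

P0 = 20e-6  # reference pressure in pascals (20 µPa)


def spl_db(pressure_pa):
    """Sound pressure level in dB SPL: 20 * log10(p / p0)."""
    return 20 * math.log10(pressure_pa / P0)


def pressure_pa(level_db):
    """Inverse: RMS sound pressure in pascals for a given dB SPL."""
    return P0 * 10 ** (level_db / 20)


# Doubling the pressure adds 20 * log10(2), roughly 6 dB.
print(round(spl_db(2 * P0), 2))  # 6.02

# Some levels from the list above, converted to pascals.
for level in (30, 50, 85, 100, 120, 140):
    print(level, "dB SPL ->", round(pressure_pa(level), 4), "Pa")
```

The conversion shows how compressed the decibel scale is: 120 dB SPL corresponds to 20 Pa, a million times the reference pressure.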

This summary table was designed for loudspeaker playback, where high sound pressure affects the entire body. With headphones, many people listen at 130-140 dB without their eardrums rupturing, though it is certainly possible to ruin your hearing this way. Most pain-threshold data come from loudspeakers. There the greatest harm is caused by low frequencies, which act not so much on the ear as on the whole body, setting internal organs into resonance and damaging them. Headphones simply cannot damage your chest with low frequencies, although a car subwoofer certainly can. More importantly, the table was originally created for industrial noise in factories. At high volumes, headphones damage the ear mainly in the upper midrange, where the ear has its own resonance.

The effective dynamic range of 16-bit audio is 96 dB. Comparing 130 dB with 96 dB, it might seem clear that we could hear the difference, but only in theory. First, 96 dB is about the signal-to-noise ratio of typical sound sources. Second, to promote high-resolution formats, studios often mix material for CD and for DVD-Audio with different levels of care. As a result, the buyer hears a mediocre mix in the first case and a good mix in the second. Recently it has become fashionable to release remasters of artists' albums. Yet many of these remasters, made on newer equipment and in heavyweight formats, sound noticeably worse than the old recordings... One suspects that careful mixing by a talented sound engineer has simply been replaced by high-end equipment and the confidence that it will give a better result. And if it doesn't, everything will sell anyway.
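The 96 dB figure for 16-bit audio is simply the ratio between full scale and one quantization step: $20\log_{10}(2^{16}) \approx 96.3$ dB. The sketch below is a simplified model that assumes a plain rounding quantizer with no dither. It computes that ratio and also measures the error of a quantized full-scale sine wave. The measured value comes out near the textbook figure of roughly $6.02N + 1.76$ dB for $N$ bits.

```python
import math


def ideal_range_db(bits):
    """Ratio of full scale to one quantization step: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)


def measured_snr_db(bits, n=48000):
    """Quantize a full-scale sine to `bits` (rounding, no dither) and
    measure the signal-to-quantization-error ratio in dB."""
    scale = 2 ** (bits - 1) - 1
    signal, error = 0.0, 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * 997 * i / 48000)  # 997 Hz avoids a short repeat
        q = round(x * scale) / scale                 # rounding quantizer
        signal += x * x
        error += (q - x) ** 2
    return 10 * math.log10(signal / error)


for bits in (16, 24):
    print(bits, round(ideal_range_db(bits), 1), round(measured_snr_db(bits), 1))
```

Each extra bit buys about 6 dB, so the step from 16 to 24 bits adds roughly 48 dB of range below an already very quiet noise floor.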
So, in terms of technical specifications, 24 bits will always be better than 16. But you can only hear the difference on high-quality recordings; in a recording made from the radio, it will be very hard to tell 16 and 24 bits apart. It is therefore worth pursuing not heavyweight formats but well-recorded and well-mixed recordings, and striving to improve the quality of your equipment. The race toward heavyweight formats is like the megapixel race in cameras, where any professional knows that final image quality depends rather little on megapixels.

Expensive systems sometimes use separate sample-rate conversion (SRC), as in the Colorfly C4 Pro, which converts 44.1/16 to 192/24. This lets the DAC switch to a different operating mode and replaces its built-in digital (anti-aliasing) filter with a more advanced external SRC converter. Files converted separately from 44.1/16 to 192/24 can also sometimes sound better. That, however, is due to the characteristics of the particular DAC, and it is a reason to think about upgrading the system as a whole. Note also that tests of various DVD-Audio discs have sometimes produced disappointing results, because the source for the heavyweight format was taken from ordinary CD-Audio.
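Two pieces of arithmetic help explain why a 44.1/16 to 192/24 conversion cannot add information. The sketch below reduces the sample-rate ratio to lowest terms, which gives the factors a rational-ratio converter would interpolate and decimate by. It also shows that widening 16-bit samples to 24 bits only appends zero bits. The helper names are invented for illustration. How the Colorfly's converter actually works is not described here, so this covers only the underlying arithmetic.

```python
from fractions import Fraction

# Resampling ratio for 44.1 kHz -> 192 kHz, reduced to lowest terms.
ratio = Fraction(192000, 44100)
print(ratio)  # 640/147: interpolate by 640, then decimate by 147


def widen_16_to_24(sample16):
    """Place a 16-bit sample in a 24-bit word; the 8 new low bits are zero."""
    return sample16 << 8


def narrow_24_to_16(sample24):
    """Drop the 8 low bits again (exact for values produced by widen_16_to_24)."""
    return sample24 >> 8


samples = [-32768, -1, 0, 1, 12345, 32767]
widened = [widen_16_to_24(s) for s in samples]
print(all(narrow_24_to_16(w) == s for w, s in zip(widened, samples)))  # True
```

Since the extra bits are all zero and the round trip is exact, any audible change after such a conversion must come from the resampling filter or the DAC, not from new information.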

Additionally

Well, if our goal is to enjoy the sound, then the news that 24/192 is pointless is not really bad news at all. It actually tells us that sound quality can be improved, just not by chasing heavyweight formats. But since there are at least two opinions on "16/44.1 versus 24/192", maybe there are other interesting ones? Yes, there are. At least two more interesting articles reach unexpected conclusions. One is “Coding High Quality Digital Audio” by J. Robert Stuart (in English). The other is “24/192 Music Downloads... and why they make no sense” by Monty, developer of the Ogg format (also in English; it argues that 24 bits are pointless as well).

The Classical Shop

Formats: WMA, WAV, AIFF, FLAC, FLAC 5.1

Resolution: up to 24bit/96kHz

Website

The site offers a large collection of classical and jazz music (over a million tracks) from approximately 200 labels. According to the store's owners, about 80,000 tracks are added to the catalog every month! However, many recordings are only available in CD quality.

Fortunately, the site has a filter that lets you specify exactly the parameters you want, including sample rate and bit depth. The classical and jazz section has many subgenres with hi-res studio master recordings.

Summary

  • There is no point in storing audio in 24/192: it will not improve the audio quality.
  • 192 kHz is pointless because it lets us record frequencies that we cannot hear; all audible sounds are already captured at 44.1 kHz.
  • By the way, if these frequencies did contain anything and a digital-to-analog converter reproduced it, it would introduce additional distortion (noise) into the audible frequency range. Do you know why an audio system behaves this way?
  • 24 bits lets us record sounds at levels that we cannot hear on conventional equipment. Put another way, it records the levels of audible sounds with a precision indistinguishable from 16 bits.
  • Due to cognitive biases, we may believe that the difference between 16/44.1 and 24/192 exists and is noticeable.
  • Many marketing moves and strategies are based on cognitive biases and ignorance.
  • The sound quality can be improved, but in other ways.
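One well-known answer to the question in the summary is intermodulation distortion. Any slight nonlinearity in an amplifier or speaker mixes ultrasonic components and produces new tones at their sum and difference frequencies. The difference frequencies can land in the audible band. The sketch below assumes a simple second-order nonlinearity $y = x + 0.05x^2$ (real equipment is more complex) and two hypothetical ultrasonic tones at 30 kHz and 33 kHz. After the nonlinearity, a 3 kHz component appears that was not in the input.

```python
import cmath
import math

FS = 192000      # a 24/192-style sampling rate that can carry ultrasonics
N = FS // 10     # 0.1 s of audio, so every multiple of 10 Hz is an exact DFT bin


def tone(f, n):
    return math.cos(2 * math.pi * f * n / FS)


# Hypothetical content: two inaudible tones at 30 kHz and 33 kHz.
clean = [0.5 * tone(30000, n) + 0.5 * tone(33000, n) for n in range(N)]

# Assumed weak second-order nonlinearity standing in for an amplifier or speaker.
distorted = [x + 0.05 * x * x for x in clean]


def level_at(signal, f):
    """Amplitude of the component at frequency f (a single DFT bin)."""
    acc = sum(x * cmath.exp(-2j * math.pi * f * n / FS) for n, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)


print(round(level_at(clean, 3000), 6))      # 0.0: nothing audible in the input
print(round(level_at(distorted, 3000), 4))  # about 0.0125: a 3 kHz difference tone
```

The 0.0125 amplitude follows from expanding $x^2$: the cross term of the two tones contributes $0.25\cos(2\pi \cdot 3000\,t)$, scaled by the 0.05 coefficient.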

Author: Roman Kuznetsov, 12/14/2012


Links

Interesting hearing test from Philips:
Philips Golden Ears (Russian)

The unique Golden Ears training program has been developed for our engineers to develop their skills as acoustic experts. Thanks to their ability to evaluate sound, we create devices with superior sound quality that reveal all the nuances of musical compositions.

Translation of Chris Montgomery's article into Russian:
  • Digital audio format 24/192, and why it makes no sense. Part 1
  • Digital audio format 24/192, and why it makes no sense. Part 2
  • Digital audio format 24/192, and why it makes no sense. Part 3
