7 elements of high-quality digital sound

What affects the sound quality of digital recordings?

My grandfather listened to the gramophone.
My father spent his youth listening to music coming from the speaker of a reel-to-reel tape recorder. My youth saw the rise and fall of cassette recorders. My son is growing up in the era of digital sound. To keep up with the times and give my son good “sound,” I decided to figure out what determines the playback quality of a digital audio signal. I talked to my music-loving friends and searched the Internet. As a result, I concluded that high-quality sound in the digital era can be achieved if you correctly choose the 7 main elements of a modern music center:

the format in which the music is recorded;
player;
digital-to-analog converter;
amplifier;
acoustics;
cables;
power supply.

Below I will share my observations and conclusions regarding achieving high-quality sound from recordings in digital formats.

What will we use?

1. foobar2000 - for decoding, playing, and viewing the technical characteristics of music files. Add-ons for foobar2000: fooCDTect (a shell for auCDTect, which checks lossless files for upconversion), AuSpec (shows a spectrogram at the press of one button), and MP3 Packer (shows MP3-specific parameters).

Note: to avoid installing piles of extra decoders and add-ons, I recommend downloading my build right away. I don’t recommend alternatives to foobar2000 and these add-ons, because they are significantly less capable.

2. EncSpot Professional - for viewing the technical characteristics of MP3 files.

3. Adobe Audition 2 - for viewing spectrograms with convenient scaling.

What does digital sound like?

A brief digression; experts can skip it.

I’ll explain in a nutshell where digital sound comes from. During the sound recording process, the microphone converts mechanical vibrations (sound itself) into an analog electrical signal. An analog signal, in the most general case, is similar to the sine wave that we are all familiar with from high school. In the era of analogue sound, it was this signal that was recorded on various media and then reproduced.

With the development of microprocessor technology, it became possible to record and store audio information in digital formats. These formats are obtained using an analog-to-digital conversion (ADC) process.

During analog-to-digital conversion, the analog signal (our sine wave from high school) is first sampled, that is, measured at regular intervals, which turns it into a discrete signal. At the next stage, the discrete signal is quantized: each sample is assigned the nearest value from a fixed set of levels. At the third stage, the quantized signal is encoded as a sequence of 0s and 1s. In digital audio recording, what gets digitized is the amplitude of the sound at each sampling instant; the frequency content is captured by how that amplitude changes from sample to sample.
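The three stages can be sketched in a few lines of JavaScript. This is only an illustration under assumed values (a 1 kHz sine, an 8 kHz sampling rate, 16-bit signed codes); the function names `digitize` and `toBits` are made up for this example and do not come from any audio library.

```javascript
// Sketch of the three ADC stages: sampling, quantization, binary encoding.
// All parameter values used below are illustrative assumptions.
function digitize(signal, sampleRate, durationSec, bits) {
  const maxCode = 2 ** (bits - 1) - 1;            // largest signed code, 32767 for 16 bits
  const count = Math.round(sampleRate * durationSec);
  const codes = [];
  for (let n = 0; n < count; n++) {
    const t = n / sampleRate;                     // 1. sampling: measure at regular intervals
    const amplitude = Math.max(-1, Math.min(1, signal(t)));
    codes.push(Math.round(amplitude * maxCode));  // 2. quantization: snap to the nearest level
  }
  return codes;
}

// 3. Encoding: write a signed code as a fixed-width two's-complement bit string.
function toBits(code, bits) {
  const unsigned = code < 0 ? code + 2 ** bits : code;
  return unsigned.toString(2).padStart(bits, "0");
}

const sine1k = (t) => Math.sin(2 * Math.PI * 1000 * t);
const codes = digitize(sine1k, 8000, 0.001, 16);  // one period of the sine, 8 samples
console.log(codes);
console.log(codes.map((c) => toBits(c, 16)));
```

Real converters add anti-aliasing filters and dithering, which this sketch leaves out.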

How to feel 100% immersed in the sound of your favorite music? All the answers are here!


Nowadays, almost any reasonably well-known track can be listened to for free somewhere on the Internet, and that is what most modern music lovers do. Alas, many of them do not even suspect how much of the original recording has been lost.

Moreover, the file often suffers even more when it is downloaded and transcoded on its way to your headphones.

Now we will tell you how to make sure that you hear exactly what the author intended. (And if you happen to be that rare vinyl lover, we also have a guide to choosing a record player.)

How to store good sound?

Digital audio formats are used to record and store digital audio information. An audio format is a set of requirements for the representation of audio data in digital form.

When discussing sound quality, digital formats are divided into 3 categories:

Formats without additional compression (CDDA, DSD, WAV, AIFF, etc.);
Formats compressed without loss of quality (FLAC, WavPack, Monkey's Audio, etc.);
Formats that use lossy compression (MP3, AAC, RealAudio, etc.).

High-quality sound is obtained when playing music saved in formats from the first and second categories. In formats of the third category, some information is deliberately discarded to reduce the amount of data, for example information about inaudible frequencies.

Inaudible frequencies are those that lie outside the range the average person can perceive, roughly 20 Hz to 20 kHz. Some audiophiles say that, owing to individual psychophysiological characteristics, their own range is wider.

To complete your home audio library, you should select recordings saved in files with the extensions:

*.wav, *.dff, *.dsf, *.aif, *.aiff – these are uncompressed audio files;
*.flac and *.ape are the most common files with losslessly compressed audio; *.mp4 and *.wma qualify only when they hold ALAC or WMA Lossless, since both usually carry lossy audio.

A bit of history. It is said that the very first experiments in preserving sound were carried out by the ancient Greeks, who tried to store it in amphorae: words were spoken into an amphora, which was then quickly sealed. Alas, not a single such recording has survived to this day.

RED

If we need real redundancy, RTP has a solution called RTP Payload for Redundant Audio Data, or RED.
It is quite old: RFC 2198 was written in 1997. The solution allows multiple RTP payloads with different timestamps to be placed in the same RTP packet at relatively low cost. Using RED to put one or two redundant audio frames in each packet would give much greater packet loss tolerance than Opus FEC. But this comes at the price of doubling or tripling the audio bitrate, from 30 kbps to 60 or 90 kbps (plus an additional 10 kbps for headers). Compared with more than 1 megabit of video data per second, though, this is not so bad.
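The bitrate arithmetic above can be written out as a small helper. This is a back-of-the-envelope sketch, not a measurement: the function name `redBitrateKbps` is made up for this example, and the flat 10 kbps header overhead is simply the rough figure quoted in the text.

```javascript
// Rough bitrate for Opus wrapped in RED, using the figures quoted in the text.
// headerKbps = 10 is the text's approximate overhead, treated here as an assumption.
function redBitrateKbps(opusKbps, distance, headerKbps = 10) {
  // Each packet carries the current frame plus `distance` earlier frames.
  const payloadKbps = opusKbps * (1 + distance);
  return { payloadKbps, totalKbps: payloadKbps + headerKbps };
}

for (const distance of [0, 1, 2]) {
  const { payloadKbps, totalKbps } = redBitrateKbps(30, distance);
  console.log(`distance ${distance}: payload ${payloadKbps} kbps, total about ${totalKbps} kbps`);
}
```

In practice the redundant frames are not always the same size as the primary frame, so real traffic will deviate from this estimate.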

The WebRTC library already included a second encoder and decoder for RED (now that is redundant!). Despite earlier attempts to remove the unused audio RED code, I was able to use this encoder with relatively little effort. The full history of the solution is available in the WebRTC bug tracker.

It is available as a field trial, enabled by launching Chrome with the following flag:

--force-fieldtrials=WebRTC-Audio-Red-For-Opus/Enabled/

RED can then be enabled via SDP negotiation, where it appears like this:

a=rtpmap:someid red/48000/2

It is not enabled by default because there are environments where using extra bandwidth is not a good idea. To use RED, change the order of the codecs so that it comes before the Opus codec. This can be done using the RTCRtpTransceiver.setCodecPreferences API. The obvious alternative is to change the SDP manually. The SDP format could also provide a way to configure the maximum redundancy level, but the offer/answer semantics in RFC 2198 were not entirely clear, so I decided to put that aside for now.
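Reordering the codecs so that RED comes first might look roughly like the sketch below. The helper `preferRed` and the stand-in codec list are made up for this example; the browser calls shown in the comment (`RTCRtpReceiver.getCapabilities` and `setCodecPreferences`) are standard WebRTC APIs, but `pc` is a hypothetical peer connection and the snippet is not taken from the original post.

```javascript
// Move RED ahead of every other codec while keeping the rest in their original order.
// `codecs` has the shape returned by RTCRtpReceiver.getCapabilities("audio").codecs.
function preferRed(codecs) {
  const isRed = (codec) => codec.mimeType.toLowerCase() === "audio/red";
  return [...codecs.filter(isRed), ...codecs.filter((codec) => !isRed(codec))];
}

// Browser usage (not run here; `pc` is a hypothetical RTCPeerConnection):
//   const transceiver = pc.getTransceivers().find((t) => t.receiver.track.kind === "audio");
//   const { codecs } = RTCRtpReceiver.getCapabilities("audio");
//   transceiver.setCodecPreferences(preferRed(codecs));

// Stand-in capability list so the helper can be tried outside a browser.
const sampleCodecs = [
  { mimeType: "audio/opus", clockRate: 48000, channels: 2 },
  { mimeType: "audio/red", clockRate: 48000, channels: 2 },
  { mimeType: "audio/PCMU", clockRate: 8000, channels: 1 },
];
console.log(preferRed(sampleCodecs).map((codec) => codec.mimeType));
```

The helper returns a new array and leaves the capability list untouched, so it can be called again if the preferences need to be restored.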

You can see how this all works by running the audio sample. This is what an early version with one redundant packet looks like:

By default, the payload bitrate (red line) is almost twice as high as without redundancy, at almost 60 kbps. DTX (Discontinuous Transmission) is a bandwidth-saving mechanism that sends packets only when voice is detected. As expected, with DTX the bitrate increase is mitigated somewhat, as we see at the end of the call.

Checking the length of the packets shows the expected result: packets are on average twice as long as in the normal payload length distribution shown below.

This is still a little different from what Zoom is doing, where we saw partial redundancy. Let's repeat the Zoom packet length graph from earlier to see how they compare:

Player – searching for a win-win solution

Choosing a player must begin with deciding what form your home audio library will take. You can buy CDs the old-fashioned way or switch to purchasing your favorite music online. The latter option has two significant advantages. It is compact and environmentally friendly:

You do not need to find space in the apartment for storing CDs.
You do not need to throw faulty discs in the trash.

Have you decided how to buy music? Great! If you buy CDs, you need a CD player. If you prefer online shopping, look for a player with a hard drive or flash memory. Undecided? No problem! Look for a universal player, on which you can listen to both discs and files purchased online.

Naturally, you can also turn a personal computer into a player. But this option is convenient only when the computer is truly personal. Competing for a place at the keyboard, and the conflicts that may follow, will significantly reduce the pleasure of listening to music in good quality.

When choosing a player, pay special attention to the available connectors. The more connector options, the easier it will be to select other elements of the music center.

Good external acoustics

Headphones are good, but sound from speakers at home is sheer delight. So if you have the opportunity, be sure to set up powerful acoustics at home to listen to your favorite tracks at your leisure.

It is best to buy something modern and network-enabled so that the speaker can be used with several devices. And if you buy several speakers, arrange them in a circle; the stereo effect still matters. If the sound is correctly distributed in space, the effect will be maximal.

Briefly about the advantages of external acoustics:

a pleasant background for work and performing routine tasks;
the perfect atmosphere for parties with friends and family at home;
a hundred times greater effect from watching films (especially spectacular ones);
aesthetic orgasm from listening to your favorite music alone with a glass of something tasty (and not necessarily strong).

DAC! And the digital turns... into an analog signal

The player has read a digital sequence from a CD or file. Now comes the most mathematical moment of digital audio reproduction. The digital signal is converted to analog. This math happens in a DAC, or digital-to-analog converter.

The DAC can be built into the player or implemented as a separate unit. If you want to get high quality sound, you need to opt for the second option. The built-in converter is usually inferior in quality to a separate one. The external DAC has its own power supply, the built-in one is powered from a common source with the player. When using an external DAC, its operation is almost unaffected by interference from the player and amplifier.

By circuit design, external DACs come in 4 main types:

Pulse-width modulator;
Oversampling circuit;
Weighted type;
Ladder type, or R-2R chain circuit.

Among all these options, the R-2R type appears to have no real alternative for achieving high-quality sound. Thanks to a special circuit built from precision resistors, a ladder-type DAC can achieve very high conversion accuracy.
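Why resistor precision matters becomes clearer from the ideal transfer function of a ladder DAC, where every bit contributes a fixed binary fraction of the reference voltage. The sketch below assumes an idealized 4-bit ladder and a 2.0 V reference; both values and the function name `r2rOutput` are illustrative and do not describe any real device.

```javascript
// Ideal output of an N-bit R-2R ladder DAC: the most significant bit contributes Vref / 2,
// the next one Vref / 4, and so on. Resistor errors shift these weights in a real device.
function r2rOutput(code, bits, vref) {
  let vout = 0;
  for (let i = 0; i < bits; i++) {
    const bit = (code >> (bits - 1 - i)) & 1; // i = 0 is the most significant bit
    vout += (bit * vref) / 2 ** (i + 1);
  }
  return vout;
}

// Illustrative 4-bit ladder with an assumed 2.0 V reference.
for (const code of [0b0000, 0b0001, 0b1000, 0b1111]) {
  console.log(code.toString(2).padStart(4, "0"), r2rOutput(code, 4, 2.0));
}
```

In a 24-bit converter the smallest bit weighs about one sixteen-millionth of the reference voltage, which is why the tolerance of each resistor limits the achievable accuracy.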

When choosing an external digital-to-analog converter, you should pay attention to two main characteristics:

Bit depth. It is good if the selected model supports 24 bits.
Maximum sampling rate. 96 kHz is a very good value, 192 kHz is excellent.
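What these two characteristics mean in numbers can be estimated with standard textbook formulas for uncompressed PCM. The helper `pcmFigures` is made up for this example, and the dynamic range is the theoretical value for an ideal quantizer, which real DACs do not reach.

```javascript
// Rough figures implied by bit depth and sampling rate for uncompressed PCM.
function pcmFigures(bits, sampleRateHz, channels = 2) {
  return {
    // Theoretical dynamic range of an ideal quantizer with a full-scale sine input.
    dynamicRangeDb: 6.02 * bits + 1.76,
    // Highest frequency that can be represented (Nyquist limit).
    maxFrequencyHz: sampleRateHz / 2,
    // Raw data rate in kilobits per second.
    bitrateKbps: (bits * sampleRateHz * channels) / 1000,
  };
}

console.log("CD 16/44.1:", pcmFigures(16, 44100));
console.log("24/96:", pcmFigures(24, 96000));
console.log("24/192:", pcmFigures(24, 192000));
```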

Really high quality headphones

As you already understand, good sound requires not only a proper audio recording but also, of course, equipment capable of reproducing the original track without distortion. Let's start with professional headphones. And if you immediately thought of bulky over-ear models, think again: in-ear headphones, contrary to popular myth, can also be high-quality.

For example, the young company FiiO skillfully combines in its products the highest quality components, excellent assembly of each product, excellent sound and very affordable prices. The company produces high-quality players, headphones, DACs and amplifiers.

FIIO F9


As an example of in-ear headphones, consider this model: three-driver hybrid monitors with a very durable design. The waves visible on the body are not just a design decision; they significantly increase the strength of the headphones.

It is important that there are three drivers per channel: a dynamic driver is responsible for deep bass, and two balanced armature drivers are responsible for clean high and mid frequencies.

The dynamic driver is made of PEK polymer nanocomposite, which has sufficient rigidity and light weight. This allows the driver to produce fast, detailed and extended bass.


The maximum performance of the F9 drivers is achieved by taking into account the laws and principles of physics and psychoacoustics when designing the headphones. Thus, the drivers work in perfect harmony to achieve absolute sound in the frequency range from 15 Hz to 40 kHz.

For more advanced audiophiles using an advanced sound source, the package also includes a 2.5mm balanced cable made of silver-plated copper wire.


Both cables feature MMCX connectors for a secure fit, and the kit comes with two sets of ear tips in different sizes: one optimized for bass, the other designed for a more balanced sound.

Plus, the kit includes a special hard case for carrying headphones.


An amplifier is a speaker system's best friend

To achieve high-quality sound, you need to buy an amplifier along with the speaker system. Essentially, these two elements of the audio center work as one.

A little theory. An amplifier is a device designed to increase the power of analog audio signals. It matches the signal received from the DAC to the capabilities of the acoustics. By type of power element, power amplifiers are divided into tube and transistor ones. Each group contains devices with and without feedback. Feedback is introduced to correct the distortion that the amplifier itself adds to the amplified signal. However, the price of lower distortion is the loss of part of the sound's dynamic range.

From the point of view of matching acoustics and amplifier, it is important to classify amplifiers by the characteristic of the power element. There are amplifiers with triode and pentode characteristics. Pentode amplifiers come in tube and transistor versions. They are suitable for bookshelf or simple floor-standing speaker systems. For sensitive floor-standing acoustics with a sensitivity of 90 dB or more, it is better to choose amplifiers with a triode characteristic.

Even before purchasing, you need to try to achieve the ideal balance between the capabilities of the amplifier and acoustics. It is best to ask the consultants directly in the store to test the selected speaker system together with different amplifiers. You need to choose the set that suits your ear best.

Finding the Right Distance

“Distance” is the number of redundant frames, that is, how many previous frames are carried in the current packet.
While looking for the right distance, we found that RED with distance 1 was good, but RED with distance 2 was even better. Our lab evaluation simulated 60% random packet loss. In this environment, Opus + RED produced excellent sound, while Opus without RED performed significantly worse. The WebRTC getStats() API provides a very useful way to measure this: the share of concealed samples, obtained by dividing concealedSamples by totalSamplesReceived. On the audio samples page, this data is easy to obtain by pasting the following JavaScript into the console:

(await pc2.getReceivers()[0].getStats()).forEach(report => {
  // Print the concealment counters and the concealed-to-received ratio.
  if (report.type === "track") console.log(report.concealmentEvents, report.concealedSamples, report.totalSamplesReceived, report.concealedSamples / report.totalSamplesReceived)
})

I ran a couple of packet loss tests using the obscure but very useful WebRTCFakeNetworkReceiveLossPercent field trial:

--force-fieldtrials=WebRTC-Audio-Red-For-Opus/Enabled/WebRTCFakeNetworkReceiveLossPercent/20/

With 20% packet loss and FEC enabled by default, there wasn't much difference in audio quality, but there was a slight difference in the metric:

scenario	concealed samples
without RED	18%
without RED, FEC disabled	20%
RED with distance 1	4%
RED with distance 2	0.7%

Without RED or FEC, the metric is almost the same as the requested packet loss. There is an effect from FEC, but it is small.

Without RED, at 60% loss, the sound quality became quite poor, a bit metallic, and the words difficult to understand:

scenario	concealed samples
without RED	60%
RED with distance 1	32%
RED with distance 2	18%

There were some audible artifacts with RED at distance 1, but almost perfect sound at distance 2 (which is the amount of redundancy currently used). It seems that the human brain can tolerate a certain amount of silence that occurs irregularly. (And Google Duo apparently uses a machine learning algorithm to fill the silence with something.)
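The measured figures are close to what a simple probability model predicts. If packets are lost independently with probability p, a frame is unrecoverable only when the packet that carried it and all of the following packets that carry its copies are lost, which gives p^(distance + 1): 4% and 0.8% for 20% loss, and 36% and 21.6% for 60% loss. The sketch below assumes this independent-loss model, which real networks with bursty loss do not follow; `residualLoss` is a name made up for this example.

```javascript
// With independent random loss p, a frame is lost for good only when the packet that
// carried it AND the next `distance` packets (which carry its copies) are all lost.
function residualLoss(lossRate, distance) {
  return lossRate ** (distance + 1);
}

for (const p of [0.2, 0.6]) {
  for (const distance of [0, 1, 2]) {
    const percent = (residualLoss(p, distance) * 100).toFixed(1);
    console.log(`loss ${Math.round(p * 100)}%, distance ${distance}: about ${percent}% unrecoverable`);
  }
}
```

The measured values in the tables are somewhat lower than the model at 60% loss, which may reflect FEC and jitter-buffer behavior that this sketch ignores.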

Acoustics: three roads, three ways

What makes a good speaker system is the most confusing question of all. The choice of acoustics depends on the individual characteristics of a person’s hearing, the parameters of the room in which the system will be placed, and financial capabilities. With three variables, finding a middle ground is very difficult. Therefore, we will consider three fundamental options for solving the problem.

Solution one. Budget. You can equip your home audio with bookshelf speaker systems. These small systems can be placed on a bookshelf. They are convenient for a small room. Thanks to their small size, they are also an inexpensive option. A significant disadvantage of this solution is that “shelf” acoustics will not produce proper bass.

Solution two. Luxurious. If the dimensions of the room and your finances allow, you can buy floor-standing acoustics. Thanks to its size, such a system can contain a large-diameter woofer. This means there is a chance to enjoy good bass.

Solution three. "Golden" compromise. This solution is suitable for large and small rooms and is affordable. It consists of purchasing a subwoofer and satellites. The subwoofer is responsible for high-quality bass reproduction. Satellites reproduce high frequencies.

When choosing acoustics, do not rely on anyone's advice. Rely only on your own hearing. You also need to be prepared for the fact that the acoustics will sound different in the store and in your apartment.

FAQ

How much more effective is a balanced connection than an unbalanced one?

If we are talking about good* cables 1-3 meters long, then the type of connection does not matter much. If there is a lot of equipment in the room or it is necessary to lay a cable route more than 3 meters long, then a balanced connection is preferable, since it better protects the signal from external interference. Details are in the next article.

* For switching it is worth using high-quality “instrument” or “microphone” cables from well-known brands. The price of such conductors is $1-2 per meter. In a professional environment, this price for a wire is considered normal, in contrast to the audiophile environment, where a cable under $300 is considered “table lamp wire.” More details in the next article.

What to choose - an external card or an internal one?

Internal solutions have a more attractive price/quality ratio. In other words, at the same cost, the internal card will always sound better, and for the same sound, the external one will always be more expensive. In addition, an external card introduces higher latency when working with professional audio programs.

They say that there is a lot of electromagnetic interference inside the PC and an external card will sound better outside this bunch of interference. Is it true?

Only people with no real information reason this way. High-quality internal sound cards, without any additional shielding, can produce phenomenally low distortion (ten-thousandths of a percent) and noise. Many records were made using internal cards, and this applies even to serious studios. What can really spoil the sound is poor shielding of the card’s power circuits (usually found in inexpensive multimedia models), incorrect routing of tracks on the motherboard, or “dirty” current from a low-quality power supply. Such interference may be subtle (there seems to be less clarity at high frequencies than there should be) or obvious (you can hear crackling in your speakers or headphones when you move the mouse or when the hard drive heads move). The problem can usually be solved by moving the card to another slot or by replacing the power supply or motherboard. You can check the system for spurious interference with, for example, the RMAA program, which will show noise on the graph that is uncharacteristic of the card. It is also advisable to give the “music” computer a proper ground and put it on a separate phase in the electrical panel. But be careful! Observe safety precautions and do not carry out any work unless you have the appropriate qualifications.

To be continued.

Cables - brevity is the sister of talent

The choice of connecting cables is an issue that will inevitably have to be resolved to achieve high-quality sound. Many articles have been written about the effect of cables on sound. The only thing their authors agree on is the requirement for cable length. The shorter the better: this is the golden rule when choosing connecting cables.

A little theory. Cables are divided into interconnect and speaker cables. Interconnects are used to connect the units of an audio center, such as a player and a DAC. Speaker cables connect the speaker system to the power amplifier.

By conductor material, cables are divided into OFC, OCC and composite. OFC cables are made of oxygen-free copper produced by drawing. OCC cables are made from monocrystalline copper obtained directly from the melt. Composite cables are those in which the conductor consists of several materials.

If you set out to create the perfect audio center from units from different manufacturers, try to use connecting cables that are as short as possible. And be prepared to experiment to achieve the perfect sound quality.

An easy way to improve

The standard volume control is located on the taskbar next to the clock. It performs two main functions:

Adjusts the volume. To do this, left-click the icon and set the level you need;
Slightly improves the sound parameters through a separate window. To open the properties with a number of additional options, right-click the device.

Check the mixer first, since it sets the volume level of the entire speaker system and is also responsible for the sound level in games, the browser, etc.

If everything is fine here and the sliders are set high, then look for the problem in the additional properties. To do this, open “Sound Options”, select the device you are interested in, and open it. A window will appear with two useful tabs: “Enhancements” and “Special Features.”

To increase the volume, just turn on “Sound Leveling”; in some cases, the “Loud Compensation” option helps. These options give a high gain with minimal distortion, which lets you safely increase the volume.

If this is not enough, you can use an equalizer. However, this tool is best left to those who know what they are doing: pushing all the sliders to the top will not give the desired effect, but it is guaranteed to produce distortion and wheezing.

An equalizer is a versatile tool that will help you shape the sound you want. However, this takes a lot of time. In addition, do not push the output to extreme power: the speakers will not withstand such loads and will fail, especially if they are laptop speakers.
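The distortion that appears when every slider is pushed up comes largely from clipping: once the boosted signal exceeds full scale, its peaks are cut off. The sketch below illustrates this with a plain gain stage rather than a real equalizer; the +12 dB figure, the test tone, and the function names `dbToLinear` and `applyGain` are assumptions made for this example.

```javascript
// Convert a gain in decibels to a linear amplitude factor.
function dbToLinear(db) {
  return 10 ** (db / 20);
}

// Apply gain to samples in the range [-1, 1]; anything beyond full scale is clipped.
function applyGain(samples, gainDb) {
  const factor = dbToLinear(gainDb);
  let clipped = 0;
  const out = samples.map((sample) => {
    const boosted = sample * factor;
    if (Math.abs(boosted) > 1) clipped++;
    return Math.max(-1, Math.min(1, boosted));
  });
  return { out, clipped };
}

// A test tone peaking at 80% of full scale, boosted by an assumed +12 dB.
const tone = Array.from({ length: 8 }, (_, n) => 0.8 * Math.sin((Math.PI * n) / 4));
console.log("clipped at 0 dB:", applyGain(tone, 0).clipped);
console.log("clipped at +12 dB:", applyGain(tone, 12).clipped);
```

Lowering the overall level before boosting individual bands leaves headroom and avoids this kind of clipping.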

Some users may not have this tab. Instead, there is full-fledged software that is installed on the computer automatically. Most often it is present on laptops and PCs with a Realtek sound card.

If there is no such application, you can easily download it from the manufacturer’s official website. When you open the program, two panels appear: the right one controls the equalizer, which offers far more settings than the standard one, and the left one gives access to amplifying individual frequencies.

To achieve maximum effect, you can raise all sliders to the maximum value. Of course, distortion and crackle will increase, but the sound will also be 200% louder. Whether to do this is up to you.

When you need a more pronounced effect, use the “Voice”, “Cinema” or “Music” presets. They contain preset values that adjust the settings automatically.

Good power is the key to comfortable sound

Finally, our home system for high-quality playback of digital music is assembled. Only one small matter remains. Good equipment requires a high-quality power supply. If even the most expensive brand-name amplifiers, DACs, and players are powered from a shared mains circuit, high-quality sound is out of the question. Voltage contaminated with interference will undo all your efforts in selecting and purchasing high-quality units for the audio center.

Power each unit through a separate cable. The cables should be connected directly to the distribution panel at the entrance to the home. The sockets must hold plugs firmly. It is wise to use a surge protector; it will make the power supply, and therefore the sound, cleaner.


Measuring performance in the real world

We hope that enabling RED for Opus will improve the sound quality, although in some cases it may make it worse.
Emil Ivov volunteered to conduct a couple of listening tests using the POLQA-MOS method. This has already been done for Opus, so we have baseline data for comparison. If initial tests show promising results, we will run a larger experiment on Jitsi Meet's main deployment using the concealment metrics we used above. Note that for media servers and SFUs, enabling RED is a little more complex, because the server may need to forward RED only to selected clients, for example when not all clients in a conference support RED. Also, some clients may be on a bandwidth-constrained link where RED is not desirable. If an endpoint does not support RED, the SFU can strip the redundant encoding and send plain Opus without the wrapper. Likewise, it can implement RED itself and use it when forwarding packets from an Opus-only endpoint to a RED-enabled endpoint.

Many thanks to Jitsi/8×8 Inc for sponsoring this exciting adventure and to the guys from Google who analyzed and provided feedback on the necessary changes.

And without Natalie Silvanovich, I would still be sitting there, looking at the encrypted bytes!