Hi-Res Audio Tutorial

The graphic illustrates one of many digital artifacts: pre- and post-ringing. As can be seen, the relative magnitude and the duration of pre- and post-ringing decrease with increased resolution. In this example, DSD most closely resembles the original analog waveform.

Another consideration is the rise time of the impulse. Notice how the higher resolutions provide a faster rise time. State-of-the-art condenser microphones can go from nothing to full output in about 25 µs. A practical rule of thumb in the field of metrology is to use a measurement system ten times more precise than the variable being measured. With a sampling interval of 22.7 µs, the CD standard fails to meet this criterion. Even at a 180 kHz sampling rate, the sampling interval is 5.56 µs, only about 4.5 times finer than the microphone's rise time. That's why the impulse responses look much better for the cases of 192-24 and DSD in the graphic above.
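The comparison between sampling interval and microphone rise time can be checked with a few lines of arithmetic. This is only an illustrative sketch: it takes the 25 µs rise time quoted above as given, and the function names are hypothetical.

```python
# Sampling interval compared with the 25 µs microphone rise time quoted in the text.
MIC_RISE_TIME_US = 25.0  # figure taken from the text


def sampling_interval_us(sample_rate_hz: float) -> float:
    """Time between consecutive samples, in microseconds."""
    return 1e6 / sample_rate_hz


def precision_ratio(sample_rate_hz: float) -> float:
    """How many sampling intervals fit into the microphone rise time."""
    return MIC_RISE_TIME_US / sampling_interval_us(sample_rate_hz)


for label, rate in [("44.1 kHz", 44_100), ("96 kHz", 96_000),
                    ("180 kHz", 180_000), ("192 kHz", 192_000)]:
    print(f"{label}: interval {sampling_interval_us(rate):.2f} µs, "
          f"ratio {precision_ratio(rate):.2f}")
```

A ratio of 10 or more would satisfy the metrology rule of thumb; none of the PCM rates listed here reaches it.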

Comparison of bit rates:

Rather than complicated digital nomenclature, consumers may be more familiar with bit rates. Typically, the bit rate of music is expressed in kilobits per second ("kbps"). [1] Here's a comparison of various CODECs and resolutions:

MP3 and AAC: 320 kbps

CD: 1,411 kbps

48-24: 2,304 kbps

96-24: 4,608 kbps

192-24: 9,216 kbps

2.82DSD: 5,640 kbps

11.2DSD: 22,580 kbps

[1] kbps = kilobits per second = sample rate x bit depth x 2 channels
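The formula in footnote [1] can be applied directly to reproduce the list above. The sketch below is illustrative; the function name is hypothetical, and the DSD rates of 2.8224 MHz and 11.2896 MHz (64 and 256 times 44.1 kHz) are used in place of the rounded "2.82" and "11.2" labels, which is why the results differ slightly from the rounded figures in the list.

```python
# Bit rate in kbps = sample rate (kHz) x bit depth x channels, per footnote [1].
def bit_rate_kbps(sample_rate_khz: float, bit_depth: int, channels: int = 2) -> float:
    return sample_rate_khz * bit_depth * channels


FORMATS = {
    "CD (44.1-16)": (44.1, 16),
    "48-24": (48.0, 24),
    "96-24": (96.0, 24),
    "192-24": (192.0, 24),
    "2.82DSD": (2822.4, 1),    # 1-bit stream at 2.8224 MHz
    "11.2DSD": (11289.6, 1),   # 1-bit stream at 11.2896 MHz
}

cd_kbps = bit_rate_kbps(44.1, 16)
for name, (rate_khz, bits) in FORMATS.items():
    kbps = bit_rate_kbps(rate_khz, bits)
    print(f"{name}: {kbps:,.1f} kbps ({kbps / cd_kbps:.2f} x CD)")
```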

In terms of bit rate, Hi-Res carries from 1.63 times (48-24) to 16 times (11.2DSD) as much data per second as CD!

Subjective listening:

Subjectively, it is easy to hear the difference between 44.1-16 and higher resolutions. 48-24 sounds significantly better than 44.1-16, and 88-24 and 96-24 sound even better. There are diminishing returns, however. It becomes difficult to subjectively discern 192-24 from 96-24, and 5.6 MHz DSD from 192-24. Often, at these resolutions, the recording, mixing, and mastering process becomes more important than the resolution itself. But for those who demand the best, 11.2DSD or 352-32 reduces digital artifacts nearly to the point of insignificance.

Today's "digital age":

The use of physical storage media, especially CDs and DVDs, is declining rapidly and will soon be defunct.
Advances in digital storage capacity and digital data transmission rate easily accommodate large file sizes associated with Hi-Res Audio.
Solid-state media (solid-state drives and USB memory sticks, for example) consume less space and provide for higher quality playback. For example, a 512 GB USB memory stick can hold 640 CDs.
Music software allows rapid access to albums, songs, and album artwork.
Physical media degrade with age - digital files last forever.
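As a rough check on the storage figure in the list above, the per-disc size it implies can be worked out from the CD data rate. This is only a sketch: it assumes decimal gigabytes and uncompressed 44.1-16 stereo, and the variable names are hypothetical.

```python
# Implied size and playing time of one CD if 640 CDs fill a 512 GB stick.
STICK_BYTES = 512e9                          # 512 GB, decimal gigabytes assumed
CDS_ON_STICK = 640                           # figure from the text
CD_BYTES_PER_SECOND = 44_100 * 16 * 2 / 8    # uncompressed 44.1-16 stereo

bytes_per_cd = STICK_BYTES / CDS_ON_STICK
minutes_per_cd = bytes_per_cd / CD_BYTES_PER_SECOND / 60

print(f"{bytes_per_cd / 1e6:.0f} MB per CD, "
      f"about {minutes_per_cd:.1f} minutes of audio each")
```

Under these assumptions the figure corresponds to full-length discs stored without compression.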

Recommendations:

Rather than get hung up on PCM versus DSD, or on CODECs, just go with 96-24 AIFF (or better) for PCM or 2.82DSD (or better) for PDM. Meticulously crafted music of elegant provenance is available in 352-32 or 11.2DSD, which is presently state-of-the-art and provides the ultimate listening experience.

Musicians who continue to release physical CDs are strongly encouraged to consider recording at 96-24, or better, and releasing Hi-Res versions of their albums for digital download. Eventually, all music will be streamed in Hi-Res - it is the future.

Where to buy Hi-Res:

There are dozens of companies that provide Hi-Res content. Simply search the web. The following are a few of the prominent retailers:

High-Resolution Audio ("Hi-Res Audio" or "Hi-Res") refers to a collection of digital processes and formats that allow the encoding and playback of music using higher sampling rates than the standards used in CDs.

Hi-Res Audio brings the studio experience to the consumer, thereby presenting the music as the artist intended.

A common response from those who learn about Hi-Res for the first time is to assume that only passionate or experienced audiophiles can hear the difference. Not true! Anyone with normal hearing can hear the difference. There is a certain irony in this response. Individuals typically spend thousands of dollars on high-resolution televisions because they can see the difference. Likewise, individuals value the taste of good food and wine and will not hesitate to spend considerable money on such things. Why is it that individuals don't doubt their sense of vision or taste, yet immediately doubt their auditory sense? It might surprise them to learn that the auditory sense has a dynamic range of about 140 dB, while the sense of sight has about 90 dB! It could be argued that our auditory capabilities may be the best of our senses!

There are two basic types of digital audio: Pulse Code Modulation ("PCM") and Pulse Density Modulation ("PDM").

Pulse Code Modulation

Employs multi-bit values, typically 16, 24, or 32 bits, at sampling rates of nominally 44.1, 48, 88.2, 96, 176.4, 192, 352, and 384 kHz, to name a few.
CDs employ 16 bits at a sampling rate of 44.1 kHz. The Compact Disc was developed by SONY and Philips and released in 1982.
The minimum bit depth and sample rate necessary to be considered Hi-Res is 24 bits and 48 kHz, respectively.

The image below is a graphical representation of PCM. The ordinate ("y-axis") and abscissa ("x-axis") are voltage and time, respectively. Shown is a sine wave sampled at discrete integer intervals in both voltage and time. For CD, the ordinate is divided into 65,536 (2^16) discrete levels and the abscissa into intervals of 1/44,100 second.
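The two kinds of discretization described above, in time and in voltage, can be sketched in a few lines. This is an illustrative example rather than a description of any particular converter; the function name and the 1 kHz test tone are assumptions.

```python
import math

SAMPLE_RATE_HZ = 44_100                 # CD sampling rate
BIT_DEPTH = 16                          # CD bit depth
FULL_SCALE = 2 ** (BIT_DEPTH - 1) - 1   # largest positive signed 16-bit value


def pcm_samples(freq_hz: float, count: int) -> list[int]:
    """Sample a full-scale sine every 1/44,100 s and round to 16-bit integers."""
    return [
        round(FULL_SCALE * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE_HZ))
        for n in range(count)
    ]


print(2 ** BIT_DEPTH)           # number of discrete voltage levels
print(pcm_samples(1_000, 8))    # first eight samples of a 1 kHz tone
```

Each integer in the output is one "code" in Pulse Code Modulation: the nearest available voltage level at that sampling instant.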

CODECs

There are numerous CODECs used for PCM. A CODEC is a device or computer program for encoding or decoding a digital data stream or signal. CODEC is a portmanteau of coder-decoder. Common CODECs include MP3, AAC, AIFF, WAV, FLAC, ALAC, and DXD to name a few.

Consumers are most familiar with the MP3 and AAC CODECs, which are "lossy" compressed formats and are not as good as CD-quality. These formats are considered low-resolution because they compromise playback quality in exchange for reduced file size.

CODECs used for Hi-Res typically include AIFF, WAV, FLAC, and ALAC. AIFF and WAV are uncompressed file formats, while FLAC and ALAC are "lossless" compressed audio formats. AIFF files are preferred because they support album artwork.

When describing Hi-Res Audio, the following nomenclature is useful:

SSS-bb or bb-SSS

where SSS is the sample rate in kHz, and bb is the bit depth. For example, 44.1-16 or 16-44.1 both interchangeably represent 44.1 kHz - 16 bit audio data.
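Because the nomenclature allows either order, a reader (or a program) has to decide which number is the bit depth. One possible approach, sketched below, is to treat 16, 24, and 32 as the only bit depths, as they are the ones named in this tutorial. The function name is hypothetical.

```python
BIT_DEPTHS = {16, 24, 32}   # bit depths named in this tutorial


def parse_resolution(label: str) -> tuple[float, int]:
    """Return (sample rate in kHz, bit depth) for 'SSS-bb' or 'bb-SSS'."""
    first, second = (float(part) for part in label.split("-"))
    if second in BIT_DEPTHS:       # SSS-bb, e.g. "44.1-16"
        return first, int(second)
    if first in BIT_DEPTHS:        # bb-SSS, e.g. "16-44.1"
        return second, int(first)
    raise ValueError(f"no recognised bit depth in {label!r}")


print(parse_resolution("44.1-16"))
print(parse_resolution("16-44.1"))
```

When both numbers could be bit depths, this sketch reads the label as SSS-bb; that tie-break is a choice made here, not part of the nomenclature.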

The CD was released in 1982 and was limited by the technology available at that time. For 35 years, music has been limited by the 44.1-16 standard. The following objective and subjective information supports this assertion:

Dynamic range:

Human hearing has a dynamic range of about 140 dB.
Condenser microphones have a dynamic range of about 125 dB.
CD-quality sound (16 bit) has a theoretical dynamic range of 96 dB.
Hi-Res Audio at 24 and 32 bits has a theoretical dynamic range of 144 and 192 dB, respectively.
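The theoretical figures in the list above are consistent with the usual rule of roughly 6 dB per bit, that is, 20 x log10(2^N) for N bits. The tutorial does not state the formula, so the sketch below is an assumption about how the numbers were obtained; the function name is hypothetical.

```python
import math


def dynamic_range_db(bit_depth: int) -> float:
    """Ratio of full scale to one quantization step, expressed in decibels."""
    return 20 * math.log10(2 ** bit_depth)


for bits in (16, 24, 32):
    print(f"{bits} bit: {dynamic_range_db(bits):.1f} dB")
```

The results land a fraction of a decibel above 96, 144, and 192 dB, which matches the rounded values quoted.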

Point: Why use the CD format when it can match neither the performance of the microphone nor that of human hearing?

Digital resolution:

Let's assume the goal of digital playback is to accurately reproduce all frequencies up to and including 20 kHz. For the sake of simplicity, let's consider the basic sine wave. Clearly, music is much more complex.

First, let's examine the CD at 44.1-16. Using 44,100 samples per second results in a sampling interval of 22.7 µs (1 second / 44,100 samples). The period of a 20 kHz sine wave is 50 µs, thus only about two samples (50 / 22.7 = 2.2) can be obtained per period:

It is evident that nine samples, equivalent to a sampling rate of 180 kHz, crudely approximate a sine wave.
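The relationship between sampling rate and the number of samples per period of a 20 kHz tone is simple division. The sketch below is illustrative and the function names are hypothetical.

```python
TONE_HZ = 20_000   # highest frequency considered in the text


def samples_per_period(sample_rate_hz: float, tone_hz: float = TONE_HZ) -> float:
    """Number of samples that fall within one period of the tone."""
    return sample_rate_hz / tone_hz


def rate_for_samples(samples: int, tone_hz: float = TONE_HZ) -> float:
    """Sampling rate needed to obtain a given number of samples per period."""
    return samples * tone_hz


print(samples_per_period(44_100))    # CD: a little over two samples per period
print(samples_per_period(192_000))   # 192 kHz PCM
print(rate_for_samples(9))           # rate needed for nine samples per period, in Hz
```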

This simplistic analysis demonstrates the benefits of increased digital resolution. The graphic below compares the impulse responses of various resolutions to the reference analog waveform:

Pulse Density Modulation

Employs a sequence of single-bit values at high sampling rates, nominally 2.82, 5.64, and 11.2 MHz.
DSD (Direct Stream Digital) is a trademarked name used by SONY and Philips for a PDM format originally developed in 1999.
All DSD is considered Hi-Res.

The image below is a graphical representation of PDM. As with PCM, the ordinate ("y-axis") and abscissa ("x-axis") are voltage and time, respectively. Notice the different way the waveform is sampled. In the case of PDM, the voltage is either "on" or "off" and the duration of the sampling corresponds to the magnitude of the signal. In the case of DSD, the time interval of one sample is 1/2,822,400 seconds.
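To make the "on"/"off" idea concrete, the sketch below encodes a signal as a 1-bit stream using a first-order sigma-delta loop. This is a deliberately simplified stand-in chosen for illustration: it is not taken from this tutorial, and it is not a description of how DSD encoders are actually built. The function name is hypothetical.

```python
import math


def first_order_pdm(signal: list[float]) -> list[int]:
    """Encode samples in [-1, 1] as 1-bit values with a first-order sigma-delta loop."""
    bits = []
    integrator = 0.0
    feedback = 0.0
    for x in signal:
        integrator += x - feedback            # accumulate the running error
        bit = 1 if integrator >= 0 else 0     # 1 = "on", 0 = "off"
        feedback = 1.0 if bit else -1.0       # value the bit represents
        bits.append(bit)
    return bits


# One period of a 1 kHz sine at the DSD sample rate of 2,822,400 Hz.
RATE_HZ = 2_822_400
tone = [0.5 * math.sin(2 * math.pi * 1_000 * n / RATE_HZ) for n in range(2_822)]
stream = first_order_pdm(tone)
print("".join(str(b) for b in stream[:64]))
```

In this sketch the fraction of "on" bits over a short window tracks the signal level: a steady input of 0.5 yields about three "on" bits in every four, and silence yields about one in two.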

Clearly, 44.1-16 is incapable of properly digitizing a 20 kHz sine wave.

Next, let's see how nine sampling intervals do: