What is spatial audio? How does it relate to binaural audio?


You’ve probably heard a bit about spatial audio before, but what exactly is spatial audio? What’s so compelling about this feature that big companies like Google, Apple and Samsung are integrating it into their products? Is spatial audio the same as Dolby Atmos? This article will go into detail about what spatial audio is and why we should care about it.

The History of Audio

When we are not wearing headphones or earbuds, we hear sound in three dimensions (Source: Easybom). Sounds arrive from every direction (up, down, left, right, front, back, and everything in between), and our brain decodes them to work out where each one is coming from.

For more than a century, engineers have pursued technology that can recreate this natural experience anywhere. In 1881, French engineer Clément Ader invented the Théâtrophone ("theater phone"), which used 80 transmitters spread across the stage of the Paris Opera. These transmitters produced an early form of binaural stereo sound (binaural recording uses two microphones to replicate the three-dimensional sound people experience in real life). As a result, opera lovers as far as two kilometers away could hear the performances.

Acoustics also played a major role in locating aircraft during World War I and the early years of World War II. Each country had its own method of picking up and amplifying sound to detect aircraft engines and determine their direction. In hindsight these devices may look comical, but audio was clearly a key wartime technology.

Some 30 years later, in 1972, Neumann released its first commercially available binaural recording system, making it simpler and more consistent to capture spatial sound for a variety of applications. Since then, both the technology and the methodology have improved, including new techniques that use microphone arrays, rather than just two microphones, to record the sound of a space in greater detail.

Today, advanced audio technologies are built into a wide range of devices, including soundbars, headphones, true wireless stereo (TWS) earbuds, cars, and XR devices, for applications ranging from music to gaming.

Spatial Audio Series Tree

The way we listen to audio has changed over the years. It started with mono output, as with early radio, where all the sound came from a single source. Playback then gradually moved to more speakers to give listeners a more engaging, enveloping sound experience.

The earliest of these was stereo, with two speakers, which later grew to four (quadraphonic sound). This in turn evolved into 5.1 and 7.1 surround sound (five and seven speakers, respectively, plus a subwoofer for low-frequency sounds), as well as large speaker arrays (more than seven speakers) for even greater spatial output.

5.1 and 7.1 surround sound systems do place sound around you, but because their speakers all sit at the same height, the sound field is confined to a flat plane around you. Dolby Atmos adds a vertical dimension, providing audio cues from above as well as around you to create a more immersive experience.
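Layout names such as 5.1 and 7.1 follow a simple convention: the first number counts the main speakers around the listener and the second counts the low-frequency (subwoofer, or LFE) channels. Height-capable formats such as Dolby Atmos home setups are commonly written with a third number for overhead speakers, as in 5.1.2. The sketch below decodes these names; `SpeakerLayout` and `parse_layout` are hypothetical helpers written only to illustrate the convention, not part of any audio library.

```python
from typing import NamedTuple


class SpeakerLayout(NamedTuple):
    """Speaker counts encoded in a layout name such as "5.1" or "5.1.2"."""
    main: int    # full-range speakers around the listener (ear level)
    lfe: int     # low-frequency effects channels (subwoofers)
    height: int  # overhead speakers (third number in Atmos-style names)


def parse_layout(name: str) -> SpeakerLayout:
    """Hypothetical parser for "main.lfe[.height]" names, e.g. "7.1" or "5.1.2"."""
    parts = [int(p) for p in name.split(".")]
    if len(parts) not in (2, 3) or any(p < 0 for p in parts):
        raise ValueError(f"unrecognised layout: {name!r}")
    if len(parts) == 2:
        parts.append(0)  # classic surround: no height speakers
    return SpeakerLayout(*parts)


for name in ("2.0", "5.1", "7.1", "5.1.2"):
    layout = parse_layout(name)
    plane = "adds a height layer" if layout.height else "horizontal plane only"
    print(name, layout, plane)
```

A name without a third number parses with `height == 0`, reflecting that classic surround layouts keep every speaker on one horizontal plane.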

What is spatial audio?

So what makes this feature compelling enough for Apple and others to build it into their products, and how does spatial audio differ from Dolby Atmos? You may have noticed that I haven't called any of the sound experiences above "spatial audio." Some might argue that, by the dictionary definition of spatial (pertaining to or occupying space), any setup with two or more speakers could justifiably be called "spatial". Well… I agree.

However, in this industry, spatial audio refers to a very specific type of experience. You may also hear it called 3D audio, or, as Samsung calls it, 360-degree audio. The technical term for this experience is head-tracking binaural audio. Let's break that down. Binaural audio is audio recorded with two microphones placed at the ear positions of a dummy head, like the Neumann head shown below.

Neumann KU-100

Because the microphones sit exactly where a listener's ears would be, the recording matches what a listener in that spot would actually hear.

To learn more, check out Rit Rajarshi’s great illustration:

Audio cues from binaural audio

Consider the bird sound on the left as it travels toward the listener's head. Because one ear is farther from the source than the other, the sound reaches the two ears at slightly different times. The brain processes this time difference to work out the relative position of the sound. Putting these cues together creates a "sound map" that places you in an immersive space. This is really cool, but it's still not very realistic, and this is where the head-tracking component comes into play.
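The time difference described above is known as the interaural time difference (ITD). A common textbook approximation, Woodworth's spherical-head model, treats the head as a rigid sphere of radius a and estimates ITD ≈ (a/c)(θ + sin θ) for a source at lateral angle θ (in radians), where c is the speed of sound. The sketch below assumes an average head radius of 8.75 cm and c = 343 m/s; real heads vary, and the model ignores level differences and ear-shape effects entirely. `itd_seconds` is a hypothetical helper for illustration.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average adult head radius
SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air at roughly 20 °C


def itd_seconds(azimuth_deg: float,
                head_radius: float = HEAD_RADIUS_M,
                c: float = SPEED_OF_SOUND_M_S) -> float:
    """Interaural time difference from Woodworth's spherical-head model.

    azimuth_deg: 0 = straight ahead, +90 = right, -90 = left, 180 = behind.
    A positive result means the sound reaches the right ear first.
    """
    # Fold rear angles onto the front: in this model a source at 150°
    # produces the same ITD as one at 30° (front/back ambiguity).
    lateral = math.asin(math.sin(math.radians(azimuth_deg)))
    return (head_radius / c) * (lateral + math.sin(lateral))


for az in (0, 30, 90, -90, 150):
    print(f"{az:>4}°: {itd_seconds(az) * 1e6:7.1f} µs")
```

Folding rear angles onto the front reflects a real limitation: the time difference alone cannot tell a source at 30° from one at 150°, which is one reason the head-shape cues covered later in this article matter.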

In the real world, objects stay put around you, and each sound comes from that object's position relative to your head (and therefore your ears). For example, suppose you hear cheers erupting from a sports field behind you. If you turn your head left or right to focus on the sound, its direction relative to your ears changes, but its position in the world does not.

Likewise, a sound stays in place when you move. The animation below shows the difference between binaural audio and head-tracking binaural audio: when you turn your head, the world doesn't rotate with you; it stays where it is.
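In a renderer, the core of head tracking can be sketched as one subtraction: the tracker reports the head's yaw (left/right rotation), and each source's fixed world direction is re-expressed relative to the head before binaural rendering. This minimal sketch assumes yaw-only tracking (no pitch or roll) and angles measured clockwise from a fixed world "forward" direction; `head_relative_azimuth` is a hypothetical helper, not a real product API.

```python
def head_relative_azimuth(source_azimuth_deg: float, head_yaw_deg: float) -> float:
    """Direction of a world-fixed source as seen from the listener's head.

    Both angles are clockwise from the same world forward direction
    (0 = ahead, 90 = right). The result is wrapped to [-180, 180).
    """
    relative = source_azimuth_deg - head_yaw_deg
    # Wrap so that, e.g., 270° is reported as -90° (to the left).
    return (relative + 180.0) % 360.0 - 180.0


# Cheering from a sports field directly behind you (world azimuth 180°).
FIELD_AZIMUTH = 180.0
for yaw in (0.0, 45.0, 90.0):
    tracked = head_relative_azimuth(FIELD_AZIMUTH, yaw)
    untracked = FIELD_AZIMUTH  # plain binaural: the scene turns with your head
    print(f"head yaw {yaw:5.1f}°  tracked {tracked:7.1f}°  untracked {untracked:7.1f}°")
```

With tracking, the cheering moves from directly behind (reported as -180°, the same direction as 180°) to your right as you turn to the right, just as it would in the real world; without tracking it stays fixed behind your head no matter how you turn.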

Why Head Tracking Matters: Binaural Audio vs Head Tracking Binaural Audio

Combining binaural audio with head tracking creates a truly immersive experience, which is what we in the industry call spatial audio.

Another part of this experience is the head-related transfer function (HRTF). An HRTF describes how sound arriving from a given direction is reflected, scattered, and diffracted by the head, torso, and outer ear as it travels toward the ear canal; the distance between the two ears is also taken into account. In short, it captures how a particular head shape affects incoming sound, and spatial audio processing applies it to make each audio cue as realistic as possible.
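In practice, an HRTF is often stored as a pair of head-related impulse responses (HRIRs), one per ear, for each measured direction, and rendering a mono sound at a given direction amounts to convolving it with that pair. The sketch below shows that rendering step with a deliberately crude, hypothetical HRIR pair (`toy_hrir_pair`) that keeps only an arrival-time difference (Woodworth's spherical-head approximation with an assumed 8.75 cm head radius) and an assumed, frequency-independent far-ear attenuation. Real HRIRs are measured per direction, ideally per listener, and also encode the head and outer-ear filtering described above.

```python
import math


def convolve(signal, kernel):
    """Full linear convolution of two sample lists (direct form, O(n*m))."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def toy_hrir_pair(azimuth_deg, sample_rate=48_000, far_ear_gain=0.5,
                  head_radius=0.0875, c=343.0):
    """HYPOTHETICAL stand-in for a measured HRIR pair; returns (left, right).

    Each ear gets a single scaled impulse. The far ear is delayed by the
    Woodworth ITD and attenuated by an assumed gain.
    Azimuth: 0 = ahead, +90 = right, -90 = left.
    """
    lateral = math.asin(math.sin(math.radians(azimuth_deg)))
    itd = (head_radius / c) * (lateral + math.sin(lateral))
    delay = round(abs(itd) * sample_rate)  # far-ear lag in whole samples
    # Blend from equal level (straight ahead) to far_ear_gain (fully lateral).
    gain = 1.0 - (1.0 - far_ear_gain) * abs(math.sin(lateral))
    near = [1.0] + [0.0] * delay
    far = [0.0] * delay + [gain]
    # Positive ITD means the source is on the right, so the right ear is near.
    return (far, near) if itd >= 0 else (near, far)


def render_binaural(mono, azimuth_deg, sample_rate=48_000):
    """Place a mono signal at azimuth_deg by convolving it with an HRIR pair."""
    left_ir, right_ir = toy_hrir_pair(azimuth_deg, sample_rate)
    return convolve(mono, left_ir), convolve(mono, right_ir)


click = [1.0, 0.0, 0.0, 0.0]
left, right = render_binaural(click, 90.0)  # source directly to the right
print("right ear peak at sample", right.index(max(right)))
print("left ear peak at sample", left.index(max(left)), "with level", max(left))
```

A production renderer would instead look up measured HRIRs for the head-relative direction (for example, from a SOFA file) and typically use FFT-based convolution for speed; the direct-form loop here is only for clarity.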

By combining all of these components (binaural recording, HRTFs, and head tracking), you get a complete, fully immersive audio experience.