Your Guide to Spatial Audio in VR

Last Updated: October 28, 2021


Audio in VR is incredibly important and cannot be neglected. If a VR experience doesn’t have audio that matches the visuals, immersion breaks. Since the benefits of VR are most pronounced when you’re fully immersed, developers have every reason to learn how humans hear and how sound can be realistically recreated in VR apps.

Have you ever thought about building your own XR application? Check out the XR Development for Unity Program.

How Do Humans Hear?

Despite having only two ears, humans are remarkably good at pinpointing the location of a sound in the 3D environment we live in. This ability is called sound localization, and it’s made possible by our brain interpreting and analyzing the information it receives through our ears.

We developed sound localization because it was crucial to our ancestors’ survival. After all, our vision is restricted to a limited field of view. Without sound, we’d be essentially blind to what’s going on behind us, beside us, or in the dark. Without sound localization, we’d have been easy targets for predators.

In order to pinpoint a sound in space, we need to know the angle and the distance of the sound. Our brain uses a few important cues to figure out both variables. For the angle, or the lateral localization of a sound, our brain primarily uses two cues: the interaural level difference (ILD) and the interaural time difference (ITD).

The ILD is the difference in loudness in one ear when compared to the other ear. A sound coming from the left will sound somewhat louder to our left ear than it does to our right ear. Our brain picks up on that to determine where a sound is coming from. 

The ITD is the difference in the timing of the sound entering one ear when compared to the other ear. Our brain determines that a sound came from the left if it entered our left ear before it entered our right ear.
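To get a feel for the scale of the ITD, here is a minimal sketch using Woodworth’s classic spherical-head approximation (the function name and the default head radius of about 8.75 cm are my assumptions, not anything from a specific engine):

```python
import math

def itd_seconds(azimuth_deg, head_radius_m=0.0875, speed_of_sound_ms=343.0):
    """Approximate the interaural time difference with Woodworth's
    spherical-head model: ITD = (r / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound_ms) * (theta + math.sin(theta))

# A sound 90 degrees to one side reaches the near ear roughly 0.66 ms
# before the far ear; a sound straight ahead produces no difference.
print(f"{itd_seconds(90) * 1000:.2f} ms")  # roughly 0.66 ms
print(f"{itd_seconds(0) * 1000:.2f} ms")   # 0.00 ms
```

Sub-millisecond differences like these are all the brain has to work with for lateral localization, which is why a spatializer has to reproduce them precisely.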

How we hear

Source: Soumyasch at English Wikipedia

It’s a more complex story when it comes to sounds directly ahead of or behind us, because those sounds arrive at both ears at the same time and at the same loudness. Here, our brain relies on spectral modifications: the minute ways sound is reflected and filtered by the size of our head, neck, shoulders, and torso, and by the shape of our outer ears.

Because all bodies differ in small ways, every individual has their own unique spectral modifications. We can capture these modifications in a head-related transfer function (HRTF), which will prove important for audio in VR. But more on that later.

To determine how far away a sound is, our brain uses the loudness of the sound, its frequency (particularly to tell whether it’s moving toward or away from us, via the Doppler effect), the amount of reverb, and a myriad of smaller cues.
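Two of these distance cues are simple enough to put into formulas: the inverse-distance loudness falloff and the Doppler shift of a moving source. A rough sketch (the function names are mine, and this free-field model deliberately ignores reverb and air absorption):

```python
import math

SPEED_OF_SOUND_MS = 343.0  # speed of sound in air at roughly 20 degrees C

def attenuation_db(distance_m, reference_m=1.0):
    """Level drop relative to the reference distance: sound pressure
    falls off as 1/distance in a free field, which works out to
    about 6 dB quieter per doubling of distance."""
    return 20 * math.log10(distance_m / reference_m)

def doppler_hz(source_hz, source_velocity_ms):
    """Perceived frequency for a source moving toward (+v) or away
    from (-v) a stationary listener."""
    return source_hz * SPEED_OF_SOUND_MS / (SPEED_OF_SOUND_MS - source_velocity_ms)

print(attenuation_db(2.0))    # about 6 dB quieter than at 1 m
print(doppler_hz(440.0, 10))  # an approaching 440 Hz source sounds higher
```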

Creating Realistic Audio in VR

For a fully immersive audio experience in VR, you’re looking to create spatial audio. This is audio that allows you to accurately pinpoint sounds in VR. But we cannot simply take a sound as we hear it in real life, place it somewhere in a VR simulation, and expect our brains to be able to localize it.

After all, most people wear headphones in VR. A sound played only in the left channel is either not heard in the right ear at all, or heard at exactly the same time and volume in both ears, which makes it hard for our brain to figure out where it’s coming from. As such, we need to account for the different cues our brain uses in order to create spatial audio.

One way to do this in Unity is through the audio spatializer setting, which you can enable under Edit > Project Settings > Audio. The Oculus Spatializer and the Microsoft HRTF Spatializer are included by default, but you can find others by searching for “spatializer plugin”. These spatializers adjust the timing and volume of a sound separately for the left and right ear of the user’s headphones, based on the sound’s position relative to the listener.

Enable Audio in Unity3d

Source: Unity Manual

But you can go a step further too. As it stands today, using HRTFs for sound is the best technique for directional localization. While everyone has their own unique HRTF (think of it as your audio fingerprint), most people’s HRTFs are similar enough that a generic HRTF is good enough for our brain to pinpoint sound with reasonable accuracy.

There are a few online databases with HRTF samples that you can use to create spatial audio in your VR app. While these databases are often hard to find and far from complete, they’re worth investigating if you’re dedicated to realistic audio. Keep in mind that the user of your HRTF-enabled VR app will need to wear headphones; otherwise the sound is filtered twice, once by the generic HRTF in software and again by the listener’s own body on the way from the speakers to their ears, something you want to avoid.

However, because HRTFs are recorded in an anechoic chamber, they often sound dry and unnatural when used in isolation: there is no reverb at all. Ideally, you want to combine HRTFs with some environmental modeling. This is where a lot of innovation is happening right now.
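In practice, applying an HRTF means convolving a mono signal with a pair of head-related impulse responses (HRIRs), one per ear, taken from such a database. A minimal sketch using NumPy with toy impulse responses (real HRIRs are hundreds of samples long; the tiny arrays here are only stand-ins):

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Render a mono signal binaurally by convolving it with a
    left-ear and a right-ear head-related impulse response (HRIR)."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])  # stereo buffer: (2, n_samples)

# Toy HRIRs standing in for database entries: the right ear hears the
# sound two samples later and quieter, mimicking a source on the left.
mono = np.array([1.0, 0.5, 0.25])
hrir_l = np.array([1.0, 0.0, 0.0])
hrir_r = np.array([0.0, 0.0, 0.6])
stereo = spatialize(mono, hrir_l, hrir_r)
```

The same convolution, run per sound source with HRIRs chosen for each source’s direction, is essentially what a spatializer plugin does under the hood.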

Developers and sound engineers currently use what’s called the shoebox model, which lets you specify the distance and the reflectivity of the six walls around the user in VR. But this is quite a limited model for realistic audio: the walls might not all have the same reflectivity, and it doesn’t take into account objects in the room that might interfere with the sound.
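The idea behind the shoebox model can be sketched with a first-order image-source approximation: mirror the source across each wall and treat every mirror image as a delayed, attenuated echo. The function and parameter names below are mine, and real implementations trace higher-order reflections with frequency-dependent reflectivity; this sketch also shares the single-reflectivity limitation described above.

```python
import math

SPEED_OF_SOUND_MS = 343.0

def shoebox_first_reflections(source, listener, room_dims, reflectivity):
    """First-order image-source sketch of the shoebox model for an
    axis-aligned room with one corner at the origin. Returns a
    (delay_seconds, gain) pair for each of the six wall reflections.
    A single reflectivity coefficient is applied to every wall."""
    reflections = []
    for axis in range(3):
        for wall in (0.0, room_dims[axis]):
            image = list(source)
            image[axis] = 2 * wall - source[axis]  # mirror across the wall
            dist = math.dist(image, listener)
            delay = dist / SPEED_OF_SOUND_MS
            gain = reflectivity / max(dist, 1e-9)  # 1/r spreading loss
            reflections.append((delay, gain))
    return reflections

# A 5 x 4 x 3 m room, source and listener near the middle.
refs = shoebox_first_reflections((2.5, 2.0, 1.5), (2.0, 2.0, 1.5),
                                 (5.0, 4.0, 3.0), reflectivity=0.8)
```

Each returned pair tells you when an echo arrives and how loud it is; feeding these into delay lines is how a simple shoebox reverb is built.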

However, during Oculus Connect 5 in 2018, Oculus introduced what they call propagation, something that the Facebook Reality Lab had been working on. It allows developers to create audio that takes into account the objects in the room and how sound reverberates around a room with different levels of reflectivity. Watch the video below from minute 23:40 to understand what that sounds like.

Source: Oculus Connect 5

While this already sounds good in a YouTube video (when you listen with headphones), it sounds incredible in VR and really adds to the immersion. Though the technology is still relatively new, it’s a big step toward immersive VR.

In Conclusion

Audio is incredibly important for feeling fully immersed in a VR experience. Creating spatial audio in VR is difficult because our brain uses several cues to determine where a sound is coming from. However, it’s not impossible. Audio spatializers in Unity, generic HRTF databases, and propagation are all techniques that can make your VR app sound incredible. 

Interested in how to create and build your own XR application with Spatial Audio? Check out the XR Development with Unity course.


Dejan Gajsek

Content Manager
