
Binaural sound, or how to restore natural 3D listening with headphones: a demonstration in images (and sound), in a video that introduces you to the world of spatialized sound.

"Binaural" literally means "relating to both ears". Binaural listening is a very simple technique for the listener: it only requires the user to wear headphones and gives a very natural 3D listening experience. Test it yourself!

Put on your headphones and immerse yourself in binaural sound!

What is binaural sound?

To understand this technique for reproducing sound in three dimensions, here is a written explanation, starting from the way we naturally listen.

Binaural sound is based on the principles of our everyday natural listening

We see what is in front of us; we perceive the rest of our environment through what we hear all around us. Sound propagating through the air spreads in three-dimensional space, regardless of its source and the number of emission points. Unlike an image, which can be reproduced on a surface, sound reproduces itself in space without needing any support, as long as there is air or matter to carry it.

When we are in a room or any other place, open or closed, our gaze tells us where we stand in relation to other objects and within the space. But we only see what is in front of us. Blindfolded, in the dark, or simply with our back turned to a person or object, we can still interpret our surroundings thanks to sound cues that our brain analyzes more or less consciously. These cues inform us about everything we do not see and clarify what we do see.

In real life, we hear in 3 dimensions: we perceive sounds coming from in front, behind, right or left, but also from above or even below. Yet we have only 2 ears, and these 2 ears are the only sound inputs our brain receives (setting aside bone conduction). This means that just 2 signals contain enough information for our brain to separate the origin of each sound source and "distribute" them around our head, giving us the 3D listening that seems so natural to us.

During natural listening, our two ears do not receive exactly the same information. In short, our brain relies on 3 cues, mostly differences between what the left ear and the right ear receive.

ITD (interaural time difference) = difference in the time of arrival of a sound from one ear to the other: our two ears are not in exactly the same place. This offset induces a difference in the time at which a sound reaches each ear, depending on the position of the source relative to our head.

ILD (interaural level difference) = difference in the loudness of a sound from one ear to the other: certain frequencies that are perfectly perceived by one ear are altered when they reach the ear opposite the sound source, because they are attenuated by the masking effect of the head.

[Figures: difference in arrival time = t2 - t1; loss of intensity between right and left ear]

Spectral cue, also called monaural cue: a frequency filtering that results from the effect of the shape of the pinna on sounds as they reach our eardrum.

Different shapes of ears
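To give an order of magnitude for the first two cues, here is a minimal Python sketch. It is not taken from the original article: it assumes a simplified spherical head (the classic Woodworth approximation), a distant source in the horizontal plane, an average head radius of 8.75 cm and a speed of sound of 343 m/s. The function names `itd_seconds` and `ild_db` are hypothetical, chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed)
HEAD_RADIUS = 0.0875    # m, a commonly quoted average (assumed)


def itd_seconds(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference for a distant source (spherical-head model).

    azimuth_deg: 0 = straight ahead, 90 = fully to one side.
    The approximation is meant for azimuths between -90 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))


def ild_db(near_ear_rms, far_ear_rms):
    """Interaural level difference in decibels from two RMS amplitudes."""
    if near_ear_rms <= 0 or far_ear_rms <= 0:
        raise ValueError("RMS amplitudes must be positive")
    return 20.0 * math.log10(near_ear_rms / far_ear_rms)


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        print(f"azimuth {azimuth:>2} deg -> ITD {itd_seconds(azimuth) * 1e6:.0f} microseconds")
    print(f"half amplitude at the far ear -> ILD {ild_db(1.0, 0.5):.1f} dB")
```

Under these assumptions, a source fully to one side gives an ITD of roughly 0.66 ms, which shows how small the timing differences decoded by the brain are. The third cue, the filtering by the pinna, depends on each listener's ear shape and is not captured by such a simple formula.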

The most efficient spatialization tool is still our brain. The material it decodes consists of only 2 acoustic signals containing left-right differences, called "interaural" differences. The principle of binaural listening is to reproduce these interaural differences over headphones.
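As a rough illustration of this principle, the sketch below turns a mono signal into a left/right pair by delaying and attenuating the signal sent to the ear farthest from the source. It is a deliberately crude model and an assumption on our part, not the method used in actual binaural productions: it applies the same attenuation to all frequencies and ignores the filtering by the pinna, whereas real renderers typically rely on measured head-related filters. The function name `pan_binaural` and its parameters are hypothetical.

```python
def pan_binaural(mono, sample_rate, itd_s, ild_db, source_on_right=True):
    """Return (left, right) sample lists carrying a crude ITD and ILD.

    mono: list of float samples
    itd_s: interaural time difference in seconds
    ild_db: interaural level difference in decibels
    """
    # Delay for the far ear, rounded to a whole number of samples.
    delay = round(abs(itd_s) * sample_rate)
    # Frequency-independent attenuation for the far ear (a simplification).
    gain = 10 ** (-abs(ild_db) / 20)

    # Pad both channels so they keep the same length.
    near = list(mono) + [0.0] * delay
    far = [0.0] * delay + [sample * gain for sample in mono]

    return (far, near) if source_on_right else (near, far)


if __name__ == "__main__":
    click = [1.0, 0.0]  # a single impulse
    left, right = pan_binaural(click, 48000, itd_s=0.0005, ild_db=6.0)
    print("left channel starts after", left.index(max(left)), "samples")
```

With a 0.5 ms ITD at 48 kHz, the far-ear channel is delayed by 24 samples. Played over headphones, such a pair shifts the perceived source to one side, but it usually still sounds as if it were inside the head, which is why the pinna-related cue matters for externalization.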

Unlike almost all other reproduction systems, binaural listening also allows close or extremely close listening: one can, for example, whisper in the listener's ear in a deeply unsettling way. Conversely, it can also give a sensation of externalized sound sources, so that the perceived sounds seem to come from well beyond the headphones and beyond the walls of the room in which the listener is standing. Just like natural listening, which seems innate and instinctive, it informs the listener very realistically about the acoustics of the place, and therefore about its architectural features, including the texture of the building materials.

The brain is a signal decoder, capable of placing sounds in space from two signals heard simultaneously. It is therefore essential to maintain a minimum quality for the initial audio file, both in the sound recording (microphones, recorders, recording formats) and in transmission or reception.

Binaural listening in a multi-sensory context

Vision

Our ability to "decode space" is acquired over the course of our human experience through a multi-sensory approach. Our personal spatialization "schema" was built by matching the information received by our eyes with the information received by our ears. Once this schema is built in our brain, we naturally perceive the sounds around us, and our brain has no difficulty imagining that a sound source is really present when it is outside our field of vision. On the other hand, it is difficult to make us believe in a sound source that should be visible but is not, or that is not visually credible because it is shown on a screen, is of poor quality or is not synchronized.

This multi-sensory phenomenon must be taken into account in audiovisual production: it can be an asset under certain conditions, but it becomes a disturbance when the match between image and sound is not faithful to what the viewer expects of reality.

In a film, a documentary or a video game, different keys are used to help the viewer/user understand the space in which he finds himself: wide shots, changes of camera angle, visual clues to the setting, cultural clues. We use sound cues just as we use visual ones: birdsong in a forest, the sound of cicadas when it is hot, traffic noise in a big city. Other, artificial sound cues are created to make us feel the space: reverberation and certain frequencies tell us about the nature of the place.

Thanks to multi-channel playback, we can even hear sounds all around us in home cinema installations, and above us in suitably equipped cinemas. However, these systems, no matter how expensive and sophisticated, do not yet let us feel the natural space around us.

The real listening environment may also affect how realistic the scene feels. The problem arises from a mismatch between what the listener hears and the environment he knows he is in, whose acoustics he does not recognize in the sound.

Example of audiovisual content in binaural

Dernier round from Binaural Circus / David Kleinman on Vimeo

Movement

Another multi-sensory phenomenon is linked to movement, insofar as moving, and being aware of or at the origin of one's own movement, can be considered a sense. Let us say that it is rather an aggregate of senses forming a coherent whole. Our perception of a space that moves, or rather of ourselves moving through a space, is most often tied to the fact that we initiate the movement. Changes of viewpoint and of listening point remain coherent because they are anticipated and consistent with what is "expected".

Audiovisual content must therefore take into account the viewer's expectation, or habit, of consciously or unconsciously anticipating a movement before perceiving it. This notion comes into play in interactive content, in which the user-viewer is the actor of his own navigation and sometimes the master of his "movements" or position. It may also involve using a head tracker (a device that monitors head movements) in binaural rendering, so that the scene stays stable and fixed even when the person wearing the headphones moves his head.
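The core of head-tracked rendering can be sketched in a few lines: to keep a source fixed in the room, the renderer subtracts the listener's head rotation from the source direction before spatializing it. The Python sketch below is illustrative only and limited to horizontal rotation (yaw). It assumes angles in degrees, positive to the listener's right, and the function name `relative_azimuth` is hypothetical.

```python
def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Direction of a room-fixed source as seen from the rotated head.

    source_azimuth_deg: source direction in the room (0 = initial front).
    head_yaw_deg: current head rotation reported by the head tracker.
    Returns an angle wrapped to the range [-180, 180).
    """
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    # A source straight ahead; the listener turns 90 degrees to the right.
    # The renderer must now place the source 90 degrees to the left.
    print(relative_azimuth(0.0, 90.0))
```

The rendered direction is then updated continuously as the tracker reports new head angles, so the scene appears anchored to the room rather than to the headphones.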

Like any new medium, this new paradigm needs shared codes, and therefore a certain amount of learning by authors as well as listeners, before it can be understood and adopted by its intended audience. A grammar of binaural writing will gradually emerge, with its movements and schools, its followers and detractors.

For what kind of content?

Two types of content need to be distinguished: interactive content, which lets the listener/user control his movement and therefore benefits greatly from binaural reproduction, and all the rest.

The first case is virtual reality content, whose interactivity is only in its infancy but whose future developments look promising. There is more and more of it, often built on the principle of video games, in which the user can move around as he wishes and change his point of view and of listening. The difference lies in the editorial proposal, which often moves away from the game to focus on narrative and feeling. However, this frontier is increasingly porous, with video games tending towards realism while television tends to become virtual and to interact with its viewers.

Binaural sound has much to contribute to these proposals. Because it can touch the listener in a largely unconscious way, it can provoke emotions and transport the user quite effectively.

The second form of interactivity is most often used for musical or theatrical content. Here a head tracker is used on scenes in which the listener is a spectator: he cannot interact with the scene, but he can choose his listening axis. Unlike VR, this places the viewer "audibly" at the center of a 3D listening room (as if speakers were placed all around him). The user wearing headphones can then turn in different directions and have the feeling that the scene remains stable. This technique can also be used to reproduce over headphones films mixed for multi-channel speaker configurations, as well as in the theatre.

Binaural listening is an essential element of immersive storytelling. Technical and technological know-how certainly matters, but narrative innovation, with or without associated images, matters even more. Creators with good hearing...

Article originally published on Méta-Media

Also read the rest of the dossier: Binaural sound: how to produce it? (in French)

and "BiLi", the collaborative research project on binaural listening (in French)