Hi. This time, I am presenting a transcript of the presentation I did at IBC 2016 regarding my whitepaper on Reverse Engineering Emotions in an Immersive Mix Format. I was very fortunate to have been able to do this and also engage with a lot of people who are actively pursuing this particular area of thought. (Makes me think I should be doing more workshops for the interaction!)

All of us reading this have a certain expectation of what this is about. I have a certain excitement, happiness, anxiety. All of these (happiness, excitement, joy, sorrow, expectation, tragedy, comedy, boredom) fall under one single umbrella: emotions.




I have always wondered: what would art be without emotions? What would our life be without emotions? We cry while watching a movie, yet can sometimes withstand a muscle tear. Today, I would like to briefly talk about the how and the why of using this in storytelling. A third of my life has been spent mixing sound for films, and as a re-recording mixer, I have to deal with varying kinds of emotions onscreen.


Image Copyright of http://www.benedictcumberbatch.co.uk

This is a favourite character of mine played by Benedict Cumberbatch. What is the emotion you can associate with the image? Inquisitive? Accusing? Thinking? Reflecting? Calculating? Questioning? Content? Unhappy? Accomplished? The truth is it can be anything. But it can only be confirmed with the context. Our mind constantly seeks an emotion from a context.

That’s why it is very important to understand that, for a listener, sound has very little emotional relation or meaning unless there is a context with which it can be associated. For example, the sound of a crow in a movie may not indicate much and could be taken as part of an ambience, but when it is shown in the context of, let’s say, witchcraft, the meaning associated with it changes. On the other hand, there are sounds that are conditioned into us. An example is the sound of a wolf howling. This creates an eerie atmosphere only because the sound has been used multiple times for this very purpose. But how is this response created?


Psychological Mechanisms


Patrik N. Juslin, in “Seven Ways in which the Brain Can Evoke Emotions from Sounds”, Sound, Mind and Emotion, The Sound Environment Centre at Lund University, Report no. 8, Sweden, 2009, has presented a study on how the brain creates an emotional response to sound. Six of these psychological mechanisms are described below.

Brain Stem reflex. The loudness or acoustic character defines the proximity of the sound. It can help tell us whether the audience is first person, second person or third person in the story. Examples: an earthquake or an accident.

Evaluative conditioning. This is a very commonly used technique in film and sports. For example, the theme music in films, or the rising pitch of a commentator just before a goal to build up the moment.

Emotional Contagion. This is where the listener physically reciprocates the emotion created by the sound, for example by tensing the muscles or internally creating the emotion. This is very similar to anchoring.

Visual Imagery. This is where elements of sound design like a crowd in a stadium, or a single cricket sound can evoke the image of the context. Another good example is the cover of music albums in certain genres that can evoke a certain emotion when listening to that music.

Episodic Memory. This is another interesting method where the memory of a past event can be triggered. Like “Hey, that’s our tune they are playing.” The sound of old video games can sometimes bring back the image of being young and the excitement associated with it.

Expectancy. This is where a certain sequence of sounds or musical structure creates an expectancy about the way it will go forward which, when broken, causes an emotional response. It is very similar to conditioning.

(I did an experiment with the audience where I could demonstrate this on stage which unfortunately is not possible to reproduce in writing.)

Positioning and Response


One of the ways we accept and emotionally respond to sound depends a lot on its position and, more importantly, its proximity. I often ask students and professionals which they perceive as more intimidating: the sound of a twig breaking or a tiger growling. Usually, the response is the twig break. This is because of the amount of imagination we put into the context of the twig break and the reason we create for it based on the environment we are in at that moment. We can identify a tiger growl. But since the twig break can have multiple causes, we feel more fear. The second question I posed was where the sound would be scarier: in front or in the surrounds. The majority answered the surrounds. Again, the reason I would put forward for this is based on episodic memory, because a sound without a known source is more intimidating than one we know.

This is a very important factor because it helps establish a way to anchor the audience in the emotion based on the position of the sound. This is what is achieved to a fairly large extent with immersive mixing techniques. One of the primary methods to achieve this is through dynamics. Dynamics help us convey the story in a much more emotionally engaging way because cinema is a very suggestible medium. The audience in a theatre usually has an expectation of the genre of the movie. This can be used to great effect through the expectancy mechanism I explained earlier. As a mix engineer, sound designer or composer, one can choose the finer timings to break that expectation and carry the audience along the story. This is why dynamics are quite important.

In my view, there are 3 main kinds of dynamics one can achieve in sound: Volume, Frequency and Position. We are very well versed in the first 2, which are quite common in a stereo mix or music mix. Position is the one that can be quite interesting. This is what brings the audience into becoming first person, second person or third person in the experience of viewing.

Methods in an Immersive Mix


There are a few methods one can use when doing an immersive mix.

Sense of Position

The closer and more realistic the positioning is, the more effective it becomes in putting the audience in the space of the character. Of course, while this is also related to the visual image onscreen to maintain the distance, it helps the audience identify with the character or situation much more easily. For example, one can construct a street or restaurant that is hyper-real and then take it out of context for an emotional sound design need. You can have the height layer and the lower layer, or very specific objects, in say a Dolby Atmos mix with very accurate spatial positioning. Unlike the traditional mixing methods of 5.1 or 7.1 Surround, this gives huge scope for realism and believability.

Sense of Proximity

There are 2 main factors that affect the sense of proximity to a sound. The first is the Space, the second is the Attack. Space is something that can be manipulated using reverbs and EQ. Space also gives the listener the distance that one has from the character in the given scene of the film. This means that it can be effectively used to determine how much engagement the audience has at that given point in the story and whether it should be broken for dynamics. Proximity in immersive mix formats allows for much higher accuracy because of the height, depth and positional accuracy (in object-based panning systems). A combination of these three provides the listener with the current space they need to experience.
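As a rough sketch of the Space factor, the snippet below estimates two distance cues for a dry sound: how much the direct level drops with distance (using the inverse-distance law, about 6 dB per doubling) and how the direct-to-reverb balance shifts as a result. The function name `proximity_cues`, the 1 m reference distance and the fixed -18 dB reverb level are assumptions made for illustration, not values from the paper or from any mixing tool.

```python
import math


def proximity_cues(distance_m, reference_m=1.0, reverb_level_db=-18.0):
    """Estimate simple distance cues for a sound source.

    All parameter defaults are illustrative assumptions.
    """
    # Assumption: treat anything closer than the reference as being at the reference.
    d = max(distance_m, reference_m)
    # Inverse-distance law: direct level falls about 6 dB per doubling of distance.
    direct_db = -20.0 * math.log10(d / reference_m)
    # Assumption: the reverb bed stays at a constant level, so the
    # direct-to-reverb ratio falls as the source moves away.
    return {
        "direct_db": direct_db,
        "direct_to_reverb_db": direct_db - reverb_level_db,
    }
```

For example, `proximity_cues(4.0)` gives a direct level of roughly -12 dB, so against the same reverb bed the sound reads as further from the listener.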

The Attack of a sound also determines how close it is to us. An object-based panning system can give us a much more accurate representation of this. A technique quite often used in music and film is to apply compression on the attack of a given sound to reduce its proximity, for example on tambourines or snares. This can also be used on Foley sounds in film, for footsteps or other sounds that can be very close or far. It can also be used effectively to subconsciously invite the audience into the soundscape. A trick I usually use here is to gradually reduce the level of dialogue or other elements in the mix in a very emotionally intense scene. This can draw the audience into the screen and help the focus. Subliminally introducing other elements gradually will help ease the tension on the screen when needed.
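To make the idea of compressing the attack concrete, here is a minimal sketch of a feed-forward peak compressor with an instantaneous attack, so the initial transient of a hit is the part that gets turned down most. It is a simplified illustration, not the processing used in any particular mix; the function name `compress_attack` and the threshold, ratio and release values are assumptions.

```python
import math


def compress_attack(samples, threshold_db=-12.0, ratio=4.0, release_coeff=0.999):
    """Reduce peaks above the threshold; all defaults are illustrative assumptions."""
    out = []
    env = 0.0  # peak envelope of the signal, in linear amplitude
    for x in samples:
        level = abs(x)
        # Instant attack: the envelope jumps straight up to any new peak,
        # then decays slowly (release) when the signal drops.
        env = level if level > env else env * release_coeff
        env_db = 20.0 * math.log10(env) if env > 0 else -120.0
        over_db = env_db - threshold_db
        # Above the threshold, reduce the overshoot according to the ratio.
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0 else 0.0
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out
```

With these settings a full-scale transient (1.0) is pulled down by 9 dB, while material below the threshold passes through unchanged, which softens the attack and pushes the sound further back.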

Music and placement

Music is one component that has a huge emotional impact on the listener. There have been various studies on the emotion of instruments, some with interesting results, such as the trumpet sounding happier in character than the horn. Although that study is not within the scope of this paper, the placement is something we can explore. This is also a much debated topic when it comes to mixing. The placement and movement of sound is something that can be, and is, experimented with in the context of the story. In places where the audience needs to reflect and concentrate on the screen itself, the score can be placed very near to the character on screen. This brings a form of personalization and identification with the character. Of course, for this to work correctly, the music must be in the genre that has been established. Otherwise, the audience will dissociate themselves from the piece and the emotional context of the music. Using this technique, it is easier to have the instruments “speak” to the audience and have the score complement the shots on screen, for example a close shot or a sweeping pan shot.

Sense of Reality and Dissociation

One of the best advantages of the Immersive format is constructing realistic scenes and then using this to bring the audience in and out of the storytelling as an art. For example, creating a realistic scene of a street in India that has a character walking through it and slowly shifting it to a character perspective or POV. This is something that gives immense scope for the positional dynamics that I earlier mentioned. So, it is very easy to craft an environment that can be extremely engaging and slowly change or remove elements that influence the listener thereby shifting the focus to the screen or the character or the location as needed.
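The slow shift from a realistic scene to a character's perspective could be sketched as an equal-power crossfade between two stems. This is a common crossfade technique swapped in for illustration, not necessarily how the scenes described here were mixed; the function name `equal_power_crossfade` and the idea of a single "realistic" stem and a single "POV" stem are hypothetical simplifications.

```python
import math


def equal_power_crossfade(progress):
    """Return (realistic_gain, pov_gain) for a transition progress in [0, 1].

    The two gains always satisfy a^2 + b^2 = 1, so the combined power
    stays roughly constant while the balance shifts.
    """
    p = min(max(progress, 0.0), 1.0)  # clamp so out-of-range values are safe
    angle = p * math.pi / 2.0
    return math.cos(angle), math.sin(angle)
```

Stepping `progress` slowly from 0 to 1 over the length of the shot fades the realistic street down while the point-of-view elements come up, without an obvious dip in overall level at the midpoint.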




The above techniques can be expanded into a wider and better use of sound design and music when it comes to storytelling in cinema. With the advances happening in reactive experiences of sound, such as Virtual Reality headsets, and the techniques being employed there, we can only get closer to achieving the emotional response we want from the audience. This new field has the advantage of being reactive, unlike cinema, which is more of a guided experience. That was one of the primary reasons, I feel, that immersive audio was created for film. Being able to do this, and reverse engineering the way we emotionally react to sound and its placement, is something that has a lot of scope for development in the technical as well as the artistic field. The paper has also shown various studies and their interpretation for mixing sound, which give the listener the experience and, more importantly, give the creator the tools and understanding for better storytelling. By using primed association of audio along with positional dynamics, getting the audience to enjoy the creator’s original intent is closer to reality now than it was before.


I hope this was a useful read. If you would like to read my paper, you can find it here.

Happy Mixing!