After watching Mike Thornton’s excellent video on loudness, I thought I should write about this topic based on what I know, especially since we all have to deal with this part of the chain at some point. This writeup is quite detailed and I know it is a long read, but I feel it is useful because it covers what many of us run into when dealing with this matter. As always, I would like to thank my colleagues and friends at Avid who have been so kind, patient and generous in educating me in this area. A lot of the reference material has also been taken from this pdf.
Loudness and Metering
It is very important to understand how and why loudness measurement has come about and what affects it. In the end, we all want to meet the specs we have to adhere to and still get a good sounding mix. For that, we need to understand the philosophy behind these measurements. We will then take what we learn about metering and loudness and apply it to mixing techniques. Please note that the techniques I will be talking about are not rules, but guidelines.
What Is Loudness?
Loudness is the characteristic of a sound that is primarily a psychological correlate of physical strength (amplitude). More formally, it is defined as “that attribute of auditory sensation in terms of which sounds can be ordered on a scale extending from quiet to loud.” That is the formal definition. The key points are:
- It is relative.
- Not just peak level.
- Multiple factors affect it.
But in reality, loudness is relative and varies from person to person. Loudness is not just peak level, and that is very important to know. If it were, a horse's footstep would be louder than a scream, and we all know that isn’t the case. We will come to that shortly, and you will see that peak normalisation and loudness normalisation can give different results. Loudness is judged by the listener. But for measurement, we must be able to assign a number to it.
So, loudness is relative and very subjective. In fact, there are two kinds of variation: BLV (Between Listener Variable), which includes factors like age group, culture and gender, and WLV (Within Listener Variable), which depends on mood, focus on the material, and so on. There is also variation in reproduction equipment, but we are not considering that now. If you watch a movie after listening to a concert, the movie may not seem that loud. This is also something to keep in mind when it comes to ear fatigue. Of course, if you’ve watched Gordon Ramsay in Hell's Kitchen, you would know that it also varies from person to person. So, how does our ear translate the sound that we hear?
Fletcher-Munson Curve and Our Brain
This is the Fletcher-Munson curve. It represents our sensitivity to loudness across frequency. Yes, frequency does affect our perception of how loud a piece of material is. (Remember that loudness is subjective?) This graph shows the original Fletcher-Munson curve in blue and the revised ISO 226:2003 curve in red. You can see that the two differ, with the newer curve showing lower sensitivity at some frequencies. The Fletcher-Munson curve was made in the 1930s, and the difference is generally attributed to improved measurement methods rather than to a change in our hearing.
What is interesting is that if we invert the graph, we get the EQ curve that is in our brain. (Edit: Not that this is something applied in our mind; it just represents how our response to frequencies looks. Thanks to Stefano Campello for the constructive feedback on this.) Looking at that, you can see an obvious bump in the upper mid frequencies, around 2 kHz to 8 kHz. Translating this into a mix and a number is interesting: the more of these frequencies a given program contains, the louder it will be perceived. The graph also shows how our sensitivity changes as signal levels increase. So one thing we can deduce as mixers, from this graph and from the weighting curve on which the loudness calculation is based, is how to achieve a good mix through the overall frequency distribution of a program.
So what affects our perception of loudness?
We now know that frequency affects us. Our monitoring level in the studio is also an important factor. According to the Fletcher-Munson curve, the response of our ears to the same frequency is different at different levels, so if we monitor at the wrong level, the meters may show one thing while we mix according to something quite different in our hearing.
Time is another factor, as it is directly related to ear fatigue. You can notice this if you listen to people talking among themselves after a rock concert. They talk louder even when there is no external noise.
The meter scale is something we need to understand in order to deliver the mix we need to. If we don’t know what we are reading and what affects it, the mix becomes difficult.
Standards are a much debated topic. Different broadcasters have different standards, and this affects the end delivery to the consumer, although the variation is not always extreme.
That being said, loudness, though relative, has a number for measurement. There are some terms we need to become familiar with. But first, there is one more concept we need to be aware of.
The PPM measurement has a problem. It only identifies the peaks of the digital audio samples, the red dots. Why is that important? Think of this. We don’t actually listen to digital audio. Digital audio is converted to analog by the system’s DAC, and is then converted again to acoustic sound by the monitors. The analog section is where things matter. While digital signals can change instantly, an analog system has to fight momentum. A peak in your digital audio could actually be much higher in the analog realm, as seen in the graph, in extreme cases as much as 10 dB or more! This means we could be facing unwanted distortion in the analog components and transducers. Such a peak sits above 0 dBFS. This causes issues not only for audio reproduction but also with codecs. So headroom is needed in a lot of areas, such as D/A conversion and data reduction codecs. This is why True Peak is also part of the ITU standards.
LUFS and LKFS
LUFS and LKFS are two terms that are commonly used. LKFS stands for Loudness, K-weighted, relative to Full Scale, and LUFS stands for Loudness Units relative to Full Scale (also K-weighted). They are essentially the same. Around 5 years or so back, the ITU introduced LKFS along with an algorithm to measure program loudness and true peak level. It described a way to measure the average loudness of a piece of audio and gave you a single value for the audio as a whole. A few years later, the EBU found an issue with this measurement: long quiet sections would pull down the average loudness. So they introduced R128, which corrected for this and also introduced terms like LU, LRA and LUFS. The LRA measurement was actually developed by TC Electronic, who are part of the EBU PLOUD group. The major difference at that point was the introduction of a gate at -8 LU. LU is a relative loudness measure. If you are targeting -23 LUFS, then -23 LUFS is 0 LU, and -26 LUFS is -3 LU, or 3 dB below that target.
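The LU arithmetic above is just a subtraction, and a tiny sketch makes that explicit. The function name `lufs_to_lu` and the default target of -23 LUFS are my own choices for illustration.

```python
def lufs_to_lu(lufs, target_lufs=-23.0):
    """Express an absolute loudness (LUFS) relative to a target, in LU."""
    return lufs - target_lufs


# Against a -23 LUFS target, -23 LUFS reads 0 LU and -26 LUFS reads -3 LU.
print(lufs_to_lu(-23.0))
print(lufs_to_lu(-26.0))
```

Because LU is only an offset, the same reading in LUFS gives a different LU value the moment the target changes.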
The gate the EBU introduced stops data from being taken into account when the level drops below a certain threshold. It is also an adaptive gate. Because consumers judge the loudness of any piece of audio by its loud parts, what the EBU called the foreground loudness, this gate would ignore any data that was 8 LU below the ungated measurement, so quiet sections would not skew the overall measurement. So, at that point, there was a difference between LKFS and LUFS. LKFS was the ungated measurement, and LUFS was the measurement that included the effect of the gate. On an average TV programme (over a period of, say, 20-30 minutes), the difference between the LKFS and LUFS measurements is usually 1-2 LU. However, in March 2011, the ITU updated its recommendation to ITU-R BS.1770-2, incorporating the change the EBU recommended in its paper R128, namely the gate. The ITU and the EBU agreed that they would both change the threshold of the gate to -10 LU. The current version is ITU-R BS.1770-3, which has a slight change in the true peak calculation. (ITU-R BS stands for ITU Radiocommunication Sector, Broadcasting Service.)
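To see what the gate does to a reading, here is a minimal sketch that works on a list of per-block loudness values rather than on audio. The function names are hypothetical, the block values are made up, and the absolute gate at -70 LUFS is taken from BS.1770-2 rather than from the description above.

```python
import math

ABSOLUTE_GATE_LUFS = -70.0  # assumption: absolute gate as in BS.1770-2


def energy_mean_lufs(blocks):
    """Average block loudness values in the power domain, not in dB."""
    mean_power = sum(10 ** (b / 10) for b in blocks) / len(blocks)
    return 10 * math.log10(mean_power)


def gated_loudness(blocks, relative_gate_lu=-10.0):
    """Drop blocks below the adaptive gate, then average what is left."""
    audible = [b for b in blocks if b > ABSOLUTE_GATE_LUFS]
    threshold = energy_mean_lufs(audible) + relative_gate_lu
    foreground = [b for b in audible if b > threshold]
    return energy_mean_lufs(foreground)


# Made-up programme: half of it at -20 LUFS, half of it very quiet at -50 LUFS.
blocks = [-20.0] * 50 + [-50.0] * 50
print(round(energy_mean_lufs(blocks), 2))  # ungated reading
print(round(gated_loudness(blocks), 2))    # gated reading
```

In this made-up case the quiet half drags the ungated figure down by about 3 LU, while the gated figure stays at the level of the loud sections. Passing `relative_gate_lu=-8.0` models the original R128 gate.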
ITU-R BS.1770 is the parent of ATSC RP A/85 and EBU R128. It is the loudness metering specification used in those two broadcast television specifications (A/85 and R128), and it is the one we just looked at. Because we don’t perceive loudness as instantaneously as a Peak Program Meter registers audio peaks, perceived loudness measurements also need a time variable. The specification includes this, and meters that comply often offer multiple measurement windows, anywhere from 2 seconds to “Infinite” (meaning the duration of your entire program). ITU-R BS.1770 does not stop here, as there are two other important considerations built into the spec.
The first is True Peak which we discussed earlier.
The other consideration is scalability. Loudness perception can change as you add or subtract channels. Part of the specification is a system that ensures that metering can be accurately applied to mono, stereo or multichannel material. Let's glance at the calculation for this.
Our hearing is not flat; we have a pronounced rise in our hearing sensitivity, peaking around 4 kHz and almost two octaves wide (it varies with the amplitude of the sound, as shown in the Fletcher-Munson curve or the ISO 226:2003 curve). To approximate that rise, the measurement algorithm uses a high-frequency shelf that hinges at about 1 kHz and boosts the entire spectrum above 3 kHz by about 4 dB. This is the pre-filter shown in the graph above. The RLB (Revised Low-frequency B-weighting) filter is a modified B-weighting curve and is intended to approximate human hearing at moderately loud levels (it is, I think, modeled roughly on the inverse of the 70 phon equal loudness contour, which is anchored to 70 dB SPL at 1 kHz). In any case, the revised B-weighting filter used for BS.1770-3 begins to roll off below 200 Hz, and is down about 15 dB at 20 Hz. The two filters together form the K-weighting. The theory is that when a response curve such as K-weighting is applied, the resulting changes in program level follow subjective human estimates of changes in loudness more closely than simple amplitude changes do. As a result, such levels can be used successfully to estimate the relative “loudnesses” of different programs for a wide range of end-listeners. This means that if we use these measured levels (which we call “loudness” even though they aren’t) carefully, we should get more uniform and predictable loudness for our end-users.
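As a rough illustration of the two filter stages, here is a minimal pure-Python sketch of K-weighting at 48 kHz. The biquad coefficients are the 48 kHz values as I recall them from the BS.1770 tables, so please check them against the standard before relying on them. The function names, the test tones, and the choice to skip the first half of the signal are all my own assumptions.

```python
import math

FS = 48000  # the coefficients below are only valid at this sample rate

# Stage 1: high-frequency shelf (the pre-filter). b = numerator, a = denominator.
PRE_B = (1.53512485958697, -2.69169618940638, 1.19839281085285)
PRE_A = (1.0, -1.69065929318241, 0.73248077421585)
# Stage 2: RLB high-pass.
RLB_B = (1.0, -2.0, 1.0)
RLB_A = (1.0, -1.99004745483398, 0.99007225036621)


def biquad(x, b, a):
    """Direct-form I second-order filter."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def k_weight(x):
    return biquad(biquad(x, PRE_B, PRE_A), RLB_B, RLB_A)


def loudness_lkfs(x):
    """Ungated, single-channel loudness of x (full scale = 1.0)."""
    y = k_weight(x)
    y = y[len(y) // 2:]  # skip the start so the filters have settled
    mean_square = sum(v * v for v in y) / len(y)
    return -0.691 + 10 * math.log10(mean_square)


def sine(freq_hz, seconds=1.0):
    return [math.sin(2 * math.pi * freq_hz * n / FS) for n in range(int(seconds * FS))]


for f in (100, 997, 8000):
    print(f, "Hz:", round(loudness_lkfs(sine(f)), 2), "LKFS")
```

All three tones have the same peak level, yet the 8 kHz tone reads loudest and the 100 Hz tone quietest, which is the frequency dependence described above turned into a number.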
True Peak Calculation
Like I mentioned earlier, True Peak is used to estimate the peaks that appear once the signal is converted to analog, which is where distortion can happen. It is an approximation, because it isn’t possible to find the peak exactly once the signal has been converted. The True Peak level of an audio signal indicates the maximum (positive or negative) value of the signal waveform in the continuous time domain; this value is, in most cases, higher than that shown by a quasi-peak meter or even a sample-peak meter, both of which miss the true peaks that potentially lie between samples. This is why oversampling is used. If the audio is at 48 kHz, 4x oversampling means it is resampled to 192 kHz for the measurement. Doing this directly could overload the calculation, so the signal is first reduced by about 12 dB, which is one quarter of its amplitude. (Halving the amplitude is a 6 dB reduction, so a quarter is two halvings, or 12 dB. Ah, the joy of logarithms!) This value is then displayed after a filter (which was introduced in 1770-3) to give the true peak display. The maximum permissible level is -1 dBTP, to allow for the inaccuracy. It is interesting to note that the true peak level needed is also determined by the bitrate of the coder used in broadcast.
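Here is a minimal sketch of why oversampling finds peaks that a sample-peak meter misses. It is not the BS.1770 filter: I am assuming a simple Hann-windowed sinc interpolator, the function names are hypothetical, and the 12 dB attenuation step is left out because floating point has plenty of headroom. The test signal is a sine at a quarter of the sample rate, phased so that every sample lands at about 0.707 while the waveform in between reaches 1.0.

```python
import math


def kernel(d, half_width):
    """Hann-windowed sinc, evaluated d samples away from the centre."""
    if d == 0:
        return 1.0
    if abs(d) >= half_width:
        return 0.0
    sinc = math.sin(math.pi * d) / (math.pi * d)
    window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
    return sinc * window


def sample_peak_db(x):
    return 20.0 * math.log10(max(abs(v) for v in x))


def true_peak_db(x, oversample=4, half_width=16):
    """Estimate the peak of the reconstructed waveform by interpolation.

    The first and last half_width samples are skipped so the kernel
    always has real data on both sides.
    """
    peak = 0.0
    for n in range(half_width, len(x) - half_width):
        for k in range(oversample):
            t = n + k / oversample  # position between samples
            value = sum(x[m] * kernel(t - m, half_width)
                        for m in range(n - half_width + 1, n + half_width + 1))
            peak = max(peak, abs(value))
    return 20.0 * math.log10(peak)


# Sine at fs/4 with a 45 degree phase offset: every sample is +/-0.707.
signal = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(256)]

print(round(sample_peak_db(signal), 2), "dBFS sample peak")
print(round(true_peak_db(signal), 2), "dBTP estimated true peak")
print(round(20 * math.log10(4), 2), "dB is a factor of 4 in amplitude")
```

The sample-peak reading sits about 3 dB below the estimated true peak, so a limiter that only looks at samples would let this signal through roughly 3 dB hotter than intended.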
So essentially there are 3 values to remember:
- The Program Loudness
- The Loudness Range
- The True Peak level
We have to change from peak normalisation to loudness normalisation. If you look at the figure above, you will see that a movie has over 60 dB of dynamic range. So, while the peak may hit 0 dBFS, the loudness is around 24 dB lower. With peak normalisation, a cinema mix and a broadcast commercial will therefore differ by nearly 18 LU in loudness. On the other hand, when we achieve uniform loudness, the peaks may vary. That also changes the headroom.
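The difference between the two approaches comes down to which number the gain is calculated from. Below is a minimal sketch. The film figures (peak at 0 dBFS, loudness 24 dB lower) follow the text above, while the commercial's figures and the -24 LUFS target are hypothetical values I picked so that the gap comes out at 18 LU and the film needs no gain.

```python
def peak_normalise(peak_dbfs, loudness_lufs, target_peak_dbfs=0.0):
    """Gain that brings the peak to the target, and the loudness that results."""
    gain_db = target_peak_dbfs - peak_dbfs
    return gain_db, loudness_lufs + gain_db


def loudness_normalise(peak_dbfs, loudness_lufs, target_lufs=-24.0):
    """Gain that brings the loudness to the target, and the peak that results."""
    gain_db = target_lufs - loudness_lufs
    return gain_db, peak_dbfs + gain_db


# (peak in dBFS, programme loudness in LUFS)
programmes = {
    "film": (0.0, -24.0),
    "commercial": (0.0, -6.0),  # hypothetical, heavily compressed
}

for name, (peak, loudness) in programmes.items():
    _, loudness_after = peak_normalise(peak, loudness)
    gain, peak_after = loudness_normalise(peak, loudness)
    print(name,
          "| peak normalised -> loudness", loudness_after, "LUFS",
          "| loudness normalised -> gain", gain, "dB, peak", peak_after, "dBFS")
```

With peak normalisation both programmes keep their 18 LU gap. With loudness normalisation they play back equally loud, and the commercial simply ends up with its peaks 18 dB lower.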
Imagine that all these tools are different programmes. If we do a peak normalisation, we end up like this. But there is an issue. The loudness density (the center of gravity of each tool) is different for every one of them. So something like the sword may be a dialogue based film with occasional loud gunshots that peak, which is why the blade extends so far. Playing these programs back to back will introduce a huge variation. Studies have shown that 95% of viewers will change the volume on their TV or receiver if the loudness jumps up by 5 LU or drops by 8 LU. Looking at this graph, we can immediately see that the viewer will be annoyed by variation that goes well beyond that tolerance. So, let's say we come across this issue and want to maintain a loudness standard and yet do a peak normalisation.
Let's assume this is the loudness we want to achieve. To do this, we would have to reduce the size of all the tools. This makes them practically useless as tools. The sword becomes a toothpick!
So, if we go back to the initial scenario and create the loudness standard, we can see that it is very easy to align all the centers of gravity to it. The peaks may differ, but at least the relevance of all the tools remains. This is why we need to understand how and what is to be done to maintain the value in a mix, and why the numbers and terms are important. Coming to terms, these are the measurements we need to look at.
Program Loudness, or the I (Integrated) value, is the total loudness of the program as a whole. It starts measuring at the beginning of the program and ends at the end of the program.
The Momentary Loudness (M) is an instant loudness over a short window (400 ms in EBU mode). This helps to see if there are gunshots or short bursts of loud sound that may push up the average.
The Short Term Loudness (S) uses a 3 second measurement window that deals with every 3 seconds of an audio clip. The M and S values are important to keep in mind if you are mixing a live to air show and need to achieve a consistent LUFS value.
The Loudness Range (LRA) gives the variation of loudness in the program. It is essentially the difference between the loudest and quietest parts, calculated statistically from the short-term loudness (between the 10th and 95th percentiles) so that brief extremes do not dominate. This means that if there is an LRA of 10, the average fader ride needed to keep that program at a constant loudness will be between +5 dB and -5 dB. This is again a good way of translating a reading into a mix value. The LRA also predicts whether a program fits consumer requirements:
- Theatrical TV: Below 20 LU
- Casual TV: Below 12 LU
- Mobile TV: Below 8 LU
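The LRA reading and the fader-ride rule of thumb can be sketched as below. This is a simplified version, assuming the gating and percentile scheme of EBU Tech 3342 as I understand it (absolute gate at -70 LUFS, relative gate 20 LU below the gated average, 10th to 95th percentile). The simple percentile pick, the function names, and the short-term values are all made up for illustration, and the limits in `LRA_LIMITS_LU` are the three figures listed above.

```python
import math

LRA_LIMITS_LU = {"Theatrical TV": 20.0, "Casual TV": 12.0, "Mobile TV": 8.0}


def percentile(sorted_values, p):
    """Simple percentile by picking the closest index in a sorted list."""
    index = round(p * (len(sorted_values) - 1) / 100)
    return sorted_values[index]


def loudness_range(short_term_lufs):
    """LRA in LU from a list of short-term (3 s) loudness readings."""
    audible = [v for v in short_term_lufs if v >= -70.0]
    if not audible:
        return 0.0
    mean_power = sum(10 ** (v / 10) for v in audible) / len(audible)
    relative_gate = 10 * math.log10(mean_power) - 20.0
    kept = sorted(v for v in audible if v >= relative_gate)
    return percentile(kept, 95) - percentile(kept, 10)


def fits(lra_lu):
    """Names of the delivery types whose LRA limit this programme stays below."""
    return [name for name, limit in LRA_LIMITS_LU.items() if lra_lu < limit]


# Made-up readings: a programme sweeping from -30 to -20 LUFS, plus some
# near-silence that the gates are expected to ignore.
readings = [-30.0 + 0.1 * i for i in range(101)] + [-60.0] * 20 + [-80.0] * 10

lra = loudness_range(readings)
print("LRA:", round(lra, 1), "LU")
print("Average fader ride: +/-", round(lra / 2, 2), "dB")
print("Fits:", fits(lra))
```

Note how the near-silent readings do not widen the range, since they fall below the gates before the percentiles are taken.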
The example readings of these values can be seen on this plugin which is the Pro-Limiter from Avid. The earlier mentioned values can be seen on the display here. In addition, the Pro-Limiter also has a histogram display to show you the overall loudness distribution over time. There are other plugins like the Nugen VisLM and the Waves WLM that also show these values.
One way of looking at a mix is by comparing it with a DAW. It has an input, processing and an output. The input is the stage where we receive the tracks or record them. It is always a good idea to start tidying up here and to have a clean and manageable session from the start, because a mix is a multitude of adjustments that we make constantly. Having a clear session and a track record of the various channels at this stage helps us get there faster. The second stage is the mixing or processing stage, where the mix happens. This is the core part of the process, and it is where all the numbers we referred to earlier make a lot of difference. It is important to have an idea of the dynamic range, the tonality, and the proper levels we need in the mix based on the delivery. The delivery or output is the last and key stage. The mix translation should be checked constantly, and any discrepancies should be rectified as the process goes along. It is also important to know the mode of delivery: whether any codecs or compression are being used, which ones, how they deal with frequencies, and so on.
Regarding the input, there are various kinds of material and each has its own challenges. Some challenges are unique to a source, while others are common across sources. We have to be familiar with the solutions to all of them. That being said, the solution ideas and techniques are merely guidelines and not rules. They can and must change per engineer, based on how he or she works and handles a session.
I have some ways that I set up the session.
Naming tracks. I have VCAs start with v, reverbs with x, and so on.
Color coding. Color coding tracks is important for visual feedback. It shows us at a quick glance where the tracks are. So I color my dialogues differently from effects, ambience, music and so on. The advantage of this is also evident on our S6 console, where the tracks can be arranged according to color on the matrix display.
Track arrangement. I arrange tracks based on type: usually dialogues, followed by their reverb send, then their VCA master, followed by effects in the same manner, then ambience, music and so on. The master tracks are towards the end of the session so that I can quickly get to them if I need to.
Sends. The sends are configured in a particular way. I always have my sends at 0 dB. By default, Pro Tools sets the send level to -inf. You can change that to 0 by going to Setup-Preferences and unticking the checkbox that says Sends Default to ‘-INF’. The reason I do this is so that the signals sent to the reverbs or processing arrive at full strength, so the gain the plugins work on is good and optimised. It also helps me if I need to set room reverbs. Usually the reverbs will be at the same level for all the characters, so I can control the final level or the wet/dry level using the reverb master fader.
Final mix tracks. The final mix tracks are arranged to be towards the beginning or the end. I usually have them at the end because it helps me to check the levels and prevents any accidental edits on the tracks.
I know this has been a long read and has various references from around the Web, some of which I couldn’t credit because I was unable to trace them back. My apologies for this. I hope the concepts explained here and the techniques I shared are put to use. Remember, there are no rules. Just ideas. Share if you found this article useful; I would love to hear your comments. I am not a perfect mixer (in fact a lazy one if you follow my blog, hence the AppleScripts!), so if there are mistakes I made above, I would love to correct them and share that! So, till next time, Happy Mixing!