Mastering audio for DVD is one of the least explored aspects of our film industry. Many films are released on DVD with audio simply taken from the print. Most people don’t realise that a few weeks or months after release, the movie is going to live on DVD. Directors and producers who put in the effort to get a film into theaters should appreciate that, ultimately, people will keep it in their collections on DVD or, now, Blu-ray.

But then, what needs to be looked into? And why master for DVD? The answer is simple.

Home viewing and theater viewing are entirely different worlds. Both have different dynamic ranges. A home is an uncontrolled environment. One more factor to keep in mind is that a room at home is usually very live, so the high-frequency response is different.

When mixing a movie for theatrical release, there is something called the X-Curve, or eXtended Curve. This curve came from the research done by C.P. and C.R. Boner. They found that in a large room, the frequency response would be on the brighter side for a normally balanced program. In other words, the sound would be perceived as bright in a large room, even though the material itself may not be.

To compensate for this, movie theaters and mixing studios are calibrated using an X-Curve, where the response is flat up to 2 kHz and then rolls off at 3 dB per octave. The EQ is measured using wideband pink noise played through the speaker. The larger the room, the longer and greater the reverberation that builds up over time from this steady-state signal. If we measure EQ on a fixed-bandwidth basis, such as full octaves or third octaves, the reverb, which tends to be stronger at low frequencies, makes the meter read louder there than at the higher frequencies, where it is better absorbed. By knowing how much the meter reading is influenced by this effect, a curve can be used to compensate. That’s what the X-Curve does at both the higher and lower frequencies.
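To put numbers on that roll-off, here is a minimal Python sketch. It models only the two figures quoted above (flat up to 2 kHz, then 3 dB per octave) and ignores the low-frequency side and any room-size adjustment, so treat it as an illustration rather than the standard curve itself. The function name and its parameters are my own.

```python
import math

def x_curve_db(freq_hz, knee_hz=2000.0, slope_db_per_octave=3.0):
    """Simplified X-Curve target in dB relative to the flat region.

    Assumption: only the high-frequency roll-off described in the text
    is modelled (flat up to the knee, then a constant slope per octave).
    """
    if freq_hz <= knee_hz:
        return 0.0
    # Number of octaves between the knee and the frequency of interest.
    octaves_above_knee = math.log2(freq_hz / knee_hz)
    return -slope_db_per_octave * octaves_above_knee

for f in (1000, 2000, 4000, 8000, 16000):
    print(f"{f:>6} Hz: {x_curve_db(f):+.1f} dB")
```

With these assumptions, 16 kHz sits three octaves above the knee, so the target there is 9 dB down. That gives a feel for how much high-frequency energy is at stake when the same mix is played on a flat home system.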

The X-Curve also varies between rooms because it depends on room size. Calibrating to it brings uniformity to the sound mix across theaters. (Technically!!)

Now, a home environment is small, and the speakers are close to the listener. So if a program mixed for cinema were played there, it would sound much brighter, because the mix was EQ’d against the 3 dB/octave dip in theaters. So the theater mix cannot be played directly on home speakers. It needs some amount of EQ to sound right.

At this point, it could be argued: isn’t it enough to just apply a general HF shelf to the material while mastering? Personally, I feel that should be done only when time is short. I wouldn’t advise it, because so many things in music (high strings or synths) and Foley (shuffles or intimate sounds) lose clarity when you do. Remember that at this point I am only talking about 5.1. Stereo is another matter that we will come to in a while.

It should be noted that 5.1 speaker layouts differ between home and theater. In a theater, the surrounds are not a point source but are spread out as a long array. In a home theater, the surrounds are single satellite speakers on either side of the viewer. Also, most of the time they are very near the listener, because the front speakers are placed near the TV and the surrounds, well, just beside the sofa!

So, what levels should be maintained for the surrounds, and what frequency ranges? Level-wise, I don’t interfere much, but if a passage is too loud, I take it down 2-3 dB at that point. I also roll off a bit of the low frequency in the surrounds. There is a reason for this. If the audio is being encoded in DTS, note that whatever low frequency is in the surrounds will go into the sub, whether that was intended or not. So cleaning up the surrounds helps in getting a cleaner low end after decoding.

There are two other things to take care of now: one is converting to 25 fps, and the other is making a stereo mix.

There are again two schools of thought on the 25 fps conversion. Both agree that the length of the audio must change, but they differ when it comes to pitch. The length changes because film runs at 24 frames per second, while PAL broadcast runs at 25 frames per second. The total number of frames in a film doesn’t change, so the running time depends on whether you play 24 or 25 of them each second. So how is this conversion achieved? If you have the money, you could very well buy a Dolby 585 processor, which does both time scaling and pitch shifting.

Why does pitch shifting come into the picture here? Well, simple. When converting audio to 25 fps, we are literally speeding up the playback. This raises the pitch. Now, personally, I have found an issue with conventional pitch shifting.
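For a sense of scale, the speed-up can be worked out in a few lines of Python. This is only a sketch of the arithmetic for a plain 24-to-25 fps speed-up with no pitch correction; the function and variable names are illustrative.

```python
import math

FILM_FPS = 24
PAL_FPS = 25

def pal_speedup(duration_s, src_fps=FILM_FPS, dst_fps=PAL_FPS):
    """Return (new duration in seconds, pitch rise in semitones)
    when the same frames are simply played back faster."""
    ratio = dst_fps / src_fps                     # playback speed factor
    new_duration_s = duration_s / ratio           # same frames, played faster
    pitch_shift_semitones = 12 * math.log2(ratio)
    return new_duration_s, pitch_shift_semitones

new_len, semitones = pal_speedup(10 * 60)         # a 10-minute reel
print(f"{new_len:.0f} s, {semitones:+.2f} semitones")
```

A 10-minute reel shrinks to 9 minutes 36 seconds, and the pitch rises by roughly 0.7 of a semitone, which is small but audible on music and familiar voices.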

Conventional methods, such as using Pitch ’n Time or the Waves time shift to compress audio, may work very well in a stereo situation, but the same doesn’t hold when handling multichannel audio. The reason is that they are simply not designed for multichannel audio. If you were to process a 6-track mix for such a time compression, the audio would be treated two channels at a time. The problem is this: if the layout is a 6-channel track in, let’s say, Pro Tools, the two-channel pairing would be L-C, R-Ls, Rs-LFE. Those are in no way correlated channels.

OK, let’s then use a pairing like L-R, Ls-Rs and C-LFE. Good, in a way. But there is another issue here. The reverbs usually used are mostly made up of coherent waves, meaning the same signal wave is present in several channels. Separating it out is not possible, but introducing flanging is very possible. This is because the waveform structure changes to accommodate both the length and pitch changes, which alters the phase relationships between channels. None of this is very noticeable when played on a dedicated 5.1 system. But the moment the player starts to fold the mix down into stereo, the flanging and phasing effects become prominent, especially where no channel lag has been taken into consideration and the source audio in the mix is just two tracks (for example, a commercial song panned 50% between front and surrounds). The same effect happens when divergence is used, especially on ambiences and monotonal sounds.
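The fold-down problem can be illustrated with a simplified model. Assume the same signal sits in two channels and the processing has left one copy offset by a tiny fixed amount of time. When the two are summed, the result is a comb filter. Real time-stretch artefacts drift over time, which is what makes the comb sweep and sound like flanging, so the fixed offset here is purely an assumption for illustration.

```python
import math

def folddown_gain_db(freq_hz, offset_s):
    """Level of two summed copies of one signal, one delayed by offset_s,
    relative to summing them perfectly in phase.

    Assumption: a fixed time offset between otherwise identical channels.
    """
    # |1 + e^(-j*2*pi*f*d)| / 2 simplifies to |cos(pi*f*d)|
    magnitude = abs(math.cos(math.pi * freq_hz * offset_s))
    if magnitude < 1e-12:
        return float("-inf")                      # complete cancellation
    return 20 * math.log10(magnitude)

offset = 0.0005                                   # hypothetical 0.5 ms offset
for f in (100, 500, 1000, 2000):
    print(f"{f:>5} Hz: {folddown_gain_db(f, offset):.1f} dB")
```

With no offset the sum is untouched at every frequency. With even half a millisecond of offset, some frequencies are attenuated and others cancel completely once the channels are combined, while each channel on its own still sounds fine on a 5.1 system.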

Now, I am going to divulge a method that I have been using for a long time and that is not widely known. The downside is that a pitch change occurs. But the upside is that the audio phase relationship remains exactly as it was in the multichannel mix. Another advantage is that even if the audio is an encoded channel, the phase relationship will be exact.

There is a small amount of maths involved, but it’s very elementary (Mr. Watson!).
Let’s assume that you have a clip whose properties are:

Length: 10 Minutes at 24 Fps
Sample Rate: 48kHz
Bit Depth: 24
File Format: .wav

This file will have a finite file size. When converting, the only thing we can change without processing the audio is the sample rate. So, if we change the sample rate while keeping the file size on disk the same, the length will inevitably vary.

Why? The sample rate is nothing but a value entered in the header of the file. Let’s do a simple calculation here.

For 24 fps, the sample rate is 48000 (48 kHz).
So, what’s the sample rate for 25 fps? Simple:
24/48000 = 25/x
so x = 25*48000/24 = 50000 (50 kHz)

Using audio at 48 kHz is better because you end up with a round figure of 50 kHz. (Use 44.1 and you do the math!)
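For anyone who would rather let the computer do the math, here is a small Python sketch of the same proportion. It uses exact fractions so that a non-integer result is obvious. The function name is my own.

```python
from fractions import Fraction

def relabel_rate(session_rate_hz, src_fps=24, dst_fps=25):
    """Sample rate to write in the header so the file plays back
    dst_fps/src_fps times faster (same proportion as 24/48000 = 25/x)."""
    return Fraction(session_rate_hz) * Fraction(dst_fps, src_fps)

for rate in (48000, 44100):
    new_rate = relabel_rate(rate)
    kind = "whole number" if new_rate.denominator == 1 else "fractional"
    print(f"{rate} Hz -> {float(new_rate)} Hz ({kind})")
```

The 48 kHz case lands exactly on 50000. The 44.1 kHz case comes out as 45937.5, and since the sample rate field in a WAV header is an integer, that value cannot be stored exactly.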

Once the file’s sample rate is changed in the header (you can use SoundHack or even Pro Tools for that), the entire waveform structure is maintained perfectly. No phasing issues can crop up. So, import the wav files (convert if asked to) into a 48 kHz session, and you have a perfect 25 fps wave file.
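If neither tool is at hand, the relabelling step can be sketched with Python’s standard wave module. This is an illustration under assumptions, not a replacement for those tools: the function name and file names are hypothetical, the demo uses a throwaway one-second stereo file, and the wave module writes a plain PCM header, so a 6-channel file with an extensible header may need a different tool. Older Python versions may also refuse to read such files.

```python
import os
import tempfile
import wave

def relabel_wav(src_path, dst_path, new_rate_hz=50000):
    """Copy the audio bytes untouched and change only the header sample rate.
    No resampling happens, so every sample keeps its value and position."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(new_rate_hz)             # the only value that changes
        dst.writeframes(frames)
    return params.nframes

# Demo: one second of 24-bit stereo silence at 48 kHz (hypothetical file names).
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "mix_24fps.wav")
dst_path = os.path.join(tmp_dir, "mix_25fps.wav")
with wave.open(src_path, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(3)                             # 3 bytes per sample = 24-bit
    w.setframerate(48000)
    w.writeframes(bytes(48000 * 2 * 3))

relabel_wav(src_path, dst_path)
with wave.open(dst_path, "rb") as out:
    out_rate, out_frames = out.getframerate(), out.getnframes()
print(out_rate, out_frames)
```

The output file has the same number of frames as the input but claims 50 kHz, so it plays back in 24/25 of the original time. The conversion back to 48 kHz then happens on import into the session, applied identically to every channel.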

Now, after doing the required EQ and processing, you can create perfectly mastered audio for DVD. This audio will have no artefacts when a player mixes it down to stereo.

That said, I feel it is equally important to include a stereo mix on the DVD rather than leaving the downmix to the player. Why? Simple. Many people buy DVDs for the sake of video quality and crystal-clear sound. But most of them also listen on a 2-channel reproduction system like a TV. Including a mix for that on a DVD is not much of a hassle. But then, we as audio engineers must stress the importance of having it on the DVD.

Doing a mixdown or stereo mix is neither that easy nor that difficult, because ultimately it depends on the type of material: whether it’s a talkie or an action film. I usually work from my mix stems for this. (I create dialogue, music and FX stems which, at 0 level, add up to the final mix.) There is less bandwidth on a TV. Personally, I have always felt that dialogue takes the biggest hit in a flat downmix, because the left and right channels are invariably heavier. (On some DVDs the songs are far louder than the dialogue, so if we set the TV volume for the songs, the dialogue turns out very soft, and so on…) This calls for control of the dynamic range. I would rather not leave that to the player!

What happens to the low-frequency channel in a downmix? Well, if you are mastering your own DVD, you have control over it. Unless absolutely necessary, I don’t include much of it level-wise, because it increases the low-frequency content in the channel, which then takes more energy to transmit or reproduce on home speakers. So rather than just plain filtering, I use discrete automation on it. (Actually, I automate the EQs and the multibands too!!)
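For comparison, this is roughly what a flat, automatic fold-down looks like, sketched in Python for a single sample per channel. The -3 dB centre and surround gains are the commonly used ITU-style coefficients, and I am assuming them here; they are not taken from any particular player. The LFE gain defaults to zero, in line with keeping it mostly out of the stereo mix. All names are illustrative.

```python
import math

MINUS_3_DB = 1 / math.sqrt(2)                     # about 0.707

def downmix_sample(l, r, c, lfe, ls, rs,
                   center_gain=MINUS_3_DB,
                   surround_gain=MINUS_3_DB,
                   lfe_gain=0.0):
    """Static 5.1-to-stereo fold-down of one sample per channel.

    Assumption: fixed gains with no dynamics, EQ or automation,
    which is what a mastered stereo mix is meant to improve on.
    """
    lo = l + center_gain * c + surround_gain * ls + lfe_gain * lfe
    ro = r + center_gain * c + surround_gain * rs + lfe_gain * lfe
    return lo, ro

print(downmix_sample(0.0, 0.0, 1.0, 0.0, 0.0, 0.0))   # dialogue in centre only
print(downmix_sample(1.0, 1.0, 0.0, 0.0, 0.0, 0.0))   # music in left/right only
```

Dialogue in the centre comes out 3 dB down in each output channel, while left and right material passes through at full level. A static formula like this cannot react to the material, which is why I prefer to ride the levels, EQs and multibands by hand.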

Well, that’s a start on mastering audio for DVD. I know I haven’t covered the topic completely, but I feel this is enough to get anyone started.