Audio Clocking and the Whatabouteries

As mix engineers, we are usually well versed in clocking and its importance. But as a beginner, I remember struggling to understand exactly how it worked and what its nuances were. This time, I will attempt to break it down so that we can understand why, how and when we need clocking. This is by no means comprehensive, but it should serve as a starting point for further reading. The fundamental thing to remember is that clocking is not about time as a position, but about intervals of time.

What is a wordclock?

Well, the most used example to explain this is the idea of a conductor for an orchestra. The conductor is the one who makes the different players, handling different instruments, follow the same tempo and thus stay in time. It’s the same in audio when we have multiple systems that need to run in sync. But how does the IO know that?

In Digital Audio, we know that audio is sampled when converting from Analog to Digital. (We can go through how that process works in a later post, or I may cover it in detail in one of my workshops.) For now, we know that at, say, 48kHz, the audio is sampled 48000 times a second and the voltage values are converted to bits to be stored digitally. Now, we assume that a sample is taken exactly every 1/48000th of a second. This is true only theoretically. For it to be accurate, there has to be a clock that sends a signal saying "sample now" every 1/48000th of a second. This clock signal is usually a pulse. If it isn’t accurate, the amplitude is sampled at the wrong time, and when that is represented digitally it becomes a different waveform altogether. This is also true for Digital to Analog conversion. This timing error, by the way, is called Jitter. So, essentially, a wordclock is a pulse that is sent by a generator.

Jitter Timing

Dotted Arrow shows where the clock is supposed to be.

How is it Generated?

The simplest clock has a Resistor and a Capacitor connected to an inverter. The Capacitor charges and then discharges to ground. If we place a resistor between the capacitor and ground, the rate of discharge can be controlled. Good. So why an inverter? Basically, an inverter sends out a voltage when it does not receive one at its input, and sends out nothing when it does. It works inversely (hence inverter). When the Capacitor has discharged, the inverter sees a low voltage at its input, so it sends out a voltage from its output. This output can be split: one branch goes back to the capacitor and the other is used as the signal. The branch that returns to the capacitor charges it again, which closes the inverter output, and the capacitor starts discharging once more. This becomes cyclical. It is a crude description, but this is how a clock can be generated: the output of the inverter keeps opening and closing.

RC Clock

Now, this is not enough for our audio application, because the clock has to be both high speed and accurate. Inaccuracy comes in because the voltage supplied to the inverter can be unstable, so the capacitor charge and discharge times vary, thereby changing the rate at which pulses are generated. An alternative is to use a crystal that oscillates. This is the same way a quartz watch works. When a voltage is applied to the crystal, it deforms slightly, and when it springs back it produces a voltage of its own. This is called the Piezo-Electric Effect, and it lets the crystal vibrate at its resonant frequency inside an oscillator circuit. The frequency of oscillation is set by the cut and shape of the crystal. It pretty much acts like a very accurate version of the Resistor-Capacitor clock we just saw above. The oscillation can then be amplified to a voltage that can be transmitted, thereby producing the clock we need.
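
To see why an unstable supply upsets an RC clock, here is a minimal sketch. It assumes an ideal capacitor that discharges through the resistor from the supply voltage down to a fixed switching threshold, so the time taken is t = RC * ln(V_start / V_threshold). The component values and voltages are made up for illustration and do not describe any real device.

```python
import math

def discharge_time(r_ohms, c_farads, v_start, v_threshold):
    """Time for an ideal RC circuit to discharge from v_start to v_threshold."""
    return r_ohms * c_farads * math.log(v_start / v_threshold)

R = 10_000     # 10 kOhm resistor (hypothetical value)
C = 1e-9       # 1 nF capacitor (hypothetical value)
V_TH = 2.5     # fixed switching threshold in volts (an assumption)

t_nominal = discharge_time(R, C, 5.0, V_TH)   # supply at exactly 5.0 V
t_sagging = discharge_time(R, C, 4.8, V_TH)   # supply sagging to 4.8 V

print(f"nominal supply: {t_nominal * 1e6:.3f} us")
print(f"sagging supply: {t_sagging * 1e6:.3f} us")
print(f"timing shift:   {(t_nominal - t_sagging) * 1e9:.1f} ns")
```

Under these assumptions, a small dip in the supply moves the switching moment by hundreds of nanoseconds, which is a very long time next to a 48kHz sample period of about 20.8 microseconds.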


Crystals like the quartz used above are susceptible to temperature variations that can make the clock vary. This is because, when heated, the crystal expands. This means its shape changes, and when the shape changes, the resonant frequency changes, thereby causing a variation in the clock. So, some designers build what is called a Temperature Compensated Crystal Oscillator (TCXO), which measures the temperature and applies a correction that cancels out the crystal’s drift. A different way of doing this is an Oven Controlled Crystal Oscillator (OCXO), which places the crystal in a small oven that holds it at a constant temperature above ambient.
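
To put clock variation into numbers, oscillator accuracy is often quoted in parts per million (ppm). The sketch below turns a ppm figure into a frequency error and into the number of samples a free-running 48kHz clock would gain or lose in an hour. The ppm values are hypothetical round numbers, not the specification of any particular oscillator type.

```python
def frequency_error_hz(nominal_hz, ppm):
    """Frequency offset in Hz for a clock that is off by the given ppm."""
    return nominal_hz * ppm * 1e-6

def samples_drift_per_hour(sample_rate_hz, ppm):
    """Samples gained or lost in one hour against a perfect clock."""
    return frequency_error_hz(sample_rate_hz, ppm) * 3600

SAMPLE_RATE = 48_000

for ppm in (0.1, 1, 20):   # hypothetical accuracy figures
    error = frequency_error_hz(SAMPLE_RATE, ppm)
    drift = samples_drift_per_hour(SAMPLE_RATE, ppm)
    print(f"{ppm:>5} ppm: {error:.4f} Hz off, {drift:.0f} samples per hour")
```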


Some manufacturers, such as Antelope, offer Atomic Clocks that use Cesium or Rubidium instead of crystals, as the accuracy of these is far higher. But they are also quite expensive.

Jitter, and what it sounds like.

As I mentioned earlier, Jitter is the timing error of the clock that causes samples to be taken at the wrong time. By definition, this means it matters only at a point where sampling has to occur, and that is the A-D or D-A conversion. Now, what is interesting is that for Jitter to show up to a large extent, it has to occur where a slight change in timing yields a large change in the amplitude value captured. If, for example, you look at a low-frequency waveform, the variation in amplitude over time is quite gradual. Compare that to a high-frequency waveform. Since its period is short, the amplitude changes quickly, so the same small timing error changes the measured amplitude much more. This means Jitter affects the high end of the spectrum rather than the low end, and it has the effect of making the sound rather brittle. Now, as humans, our perception of what sounds clean and tight is based on how clearly the transients come through. This is also what impacts the stereo field and width. Transients depend on an accurate representation of High Frequency, and Jitter causes that to be skewed.
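
The low versus high frequency argument can be checked with a little arithmetic. A sine wave of amplitude A and frequency f is steepest at its zero crossing, where the slope is 2*pi*f*A, so for a small timing error the worst-case amplitude error is roughly that slope multiplied by the error. The sketch below assumes a full-scale sine and a timing error of 1 nanosecond, which is a made-up figure chosen only to compare frequencies.

```python
import math

def worst_case_error(freq_hz, timing_error_s, amplitude=1.0):
    """Approximate largest amplitude error a small timing error causes on a sine."""
    steepest_slope = 2 * math.pi * freq_hz * amplitude   # slope at the zero crossing
    return steepest_slope * timing_error_s

JITTER = 1e-9   # 1 ns timing error (hypothetical)

for freq in (100, 1_000, 10_000, 20_000):
    error = worst_case_error(freq, JITTER)
    level_db = 20 * math.log10(error)
    print(f"{freq:>6} Hz: error {error:.2e} of full scale ({level_db:.1f} dB)")
```

The error grows in direct proportion to frequency, so the same clock error does 100 times more damage at 10kHz than at 100Hz.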

What is interesting is that the variation of Jitter over time can be plotted. This yields the frequency of the Jitter, which shows how quickly the clock speeds up and slows down over a given time period. This Jitter frequency causes new frequencies to be generated. How? Think about it. When Jitter happens, the clock runs faster or slower, so the captured amplitude belongs to a slightly different point in time. But once digitized, the samples are treated as evenly spaced, so the waveform they describe contains frequencies that were not in the original. These generated frequencies are called Sidebands. So if, for example, the Jitter frequency is 2kHz and you are recording a 10kHz tone, it will generate sidebands at 10+2 and 10-2, that is 12kHz and 8kHz. But we aren’t recording one sinewave in real life! So sidebands will be generated for all the frequencies we record. Now, again, we are dealing with Digital Audio. That means we have a sample rate, which implies a Nyquist frequency, and any sideband above that Nyquist frequency will alias. This creates distortion. And again, most of this is in the high-frequency spectrum! The generated sidebands are also not musically related to the signal, and hence are heard as brittle sounding. Since the generated frequencies depend on the input signal, they are not random, and hence they are not noise but distortion. Therefore, Jitter causes Distortion.
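
The 10kHz and 2kHz example can be tried out numerically. The sketch below samples a 10kHz sine at 48kHz with a clock whose timing error wobbles sinusoidally at 2kHz, then measures the level at a few frequencies with a single-bin DFT. The 1 microsecond peak error is deliberately exaggerated so the sidebands are easy to see, and all names and values here are assumptions made for the illustration.

```python
import cmath
import math

FS = 48_000           # sample rate in Hz
N = 4_800             # 0.1 s of audio, so every multiple of 10 Hz is an exact bin
TONE = 10_000         # tone being recorded, in Hz
JITTER_FREQ = 2_000   # how fast the clock error wobbles, in Hz
JITTER_PEAK = 1e-6    # peak clock error in seconds (exaggerated on purpose)

def jittered_samples():
    """Sample the tone at instants that are nudged by a sinusoidal clock error."""
    samples = []
    for n in range(N):
        ideal_time = n / FS
        clock_error = JITTER_PEAK * math.sin(2 * math.pi * JITTER_FREQ * ideal_time)
        samples.append(math.sin(2 * math.pi * TONE * (ideal_time + clock_error)))
    return samples

def level_at(samples, freq_hz):
    """Amplitude of the component at freq_hz, from a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / FS)
              for n, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)

samples = jittered_samples()
for freq in (8_000, 9_000, 10_000, 11_000, 12_000):
    print(f"{freq:>6} Hz: {level_at(samples, freq):.5f}")
```

With these settings the tone stays close to full level, new components of roughly 0.03 of full scale appear at 8kHz and 12kHz, and 9kHz and 11kHz stay empty.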

Now, how does an IO unit look at the clock? We have just established that clock signals are pulses, which means they are square waves. The detector in the IO usually looks for the incoming signal to cross a threshold. When the pulse comes, the voltage crosses the threshold, and after the width of the pulse it goes back to 0. From that, the IO can get the clock. Since it is a square wave, there are 2 things one needs to look at. One is the spacing, or timing, of the square wave. The other is its shape. If it is not exactly a square wave and the edges are slanted, the time at which the voltage crosses the threshold shifts, thereby causing the IO to sample at the wrong time.
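
How much a slanted edge moves the trigger point can be estimated with a simple model. The sketch below assumes the rising edge is a straight ramp from 0 V to the full amplitude and that the detector fires at a fixed threshold. The voltages and rise times are hypothetical and only meant to show the trend.

```python
def crossing_time(amplitude_v, rise_time_s, threshold_v):
    """Time after the start of a linear rising edge at which it crosses the threshold."""
    if threshold_v > amplitude_v:
        raise ValueError("the edge never reaches the threshold")
    return rise_time_s * threshold_v / amplitude_v

THRESHOLD = 1.0   # detector threshold in volts (hypothetical)

sharp = crossing_time(5.0, 1e-9, THRESHOLD)      # clean edge, 1 ns rise time
slanted = crossing_time(5.0, 20e-9, THRESHOLD)   # slanted edge, 20 ns rise time
weak = crossing_time(2.5, 20e-9, THRESHOLD)      # same edge at half the amplitude

print(f"sharp edge:     {sharp * 1e9:.2f} ns")
print(f"slanted edge:   {slanted * 1e9:.2f} ns")
print(f"half amplitude: {weak * 1e9:.2f} ns")
```

A constant delay would be harmless on its own. The trouble is that a slow edge also turns any noise or level change on the line into a timing change, because the threshold is crossed while the voltage is still rising slowly.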

The Cables

Now, Clocks, as we saw above, are made of Square waves. A Square wave is the sum of a frequency and every odd harmonic of it, each at a level that falls off with the harmonic number. When you take, for example, 1kHz and add 3kHz at a third of the level, 5kHz at a fifth, 7kHz at a seventh, 9kHz at a ninth, etc, it becomes a square wave. This means a square wave has a lot of high-frequency content, and any change to that content will cause the wave to stop being square. We just saw what happens when the shape of the wave changes and how it affects the clock. A Cable usually acts as a filter for high frequencies, and the longer the cable, the stronger the effect. This means that the length of the cable is crucial. Not only that, when sending clock like this, the cable can also pick up noise, which adds unwanted signal to the square wave, thereby changing its shape, and that converts to jitter. So, the cable used and the length of the cable are very important.
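
The link between high-frequency content and edge steepness can be sketched by building a square wave from its odd harmonics and timing how long the edge takes to rise. The code below uses the standard Fourier series of a square wave and scans forward from the zero crossing until the wave reaches half its level. Dropping harmonics stands in, very roughly, for a cable that filters out the top end, which is an assumption of this sketch and not a cable model.

```python
import math

def square_partial_sum(t, freq_hz, n_harmonics):
    """Value at time t of a square wave built from its first n odd harmonics."""
    total = 0.0
    for k in range(n_harmonics):
        n = 2 * k + 1                      # harmonic numbers 1, 3, 5, 7, ...
        total += math.sin(2 * math.pi * n * freq_hz * t) / n
    return 4 / math.pi * total

def time_to_reach(level, freq_hz, n_harmonics, step_s=1e-8):
    """Scan forward from the zero crossing until the edge first reaches level."""
    t = 0.0
    while square_partial_sum(t, freq_hz, n_harmonics) < level:
        t += step_s
    return t

FREQ = 1_000   # 1kHz fundamental, as in the text

for count in (1, 3, 50):
    t = time_to_reach(0.5, FREQ, count)
    print(f"{count:>2} harmonics: reaches 0.5 after {t * 1e6:.2f} us")
```

The fewer harmonics survive, the later the wave reaches the same level, which is exactly the slanted edge that shifts the threshold crossing.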

Digital Clock

Very often, we clock from the input signal if it is digital, say if it’s an AES/EBU IO or a MADI IO. How does this work? Understanding this is quite simple once we understand how audio bits are transmitted on AES or MADI. (MADI is a variation of the AES structure.)

This can be explained in a very simple way. Let’s consider a 48kHz sample rate. It means there are 48000 samples in a second. Now, each AES cable transmits stereo, so we need to figure out how 2 channels are transmitted. If we consider one sample, it consists of 24 bits. In addition to these 24 bits that represent the sample, it carries 8 more bit slots: a synchronisation preamble and a few status bits. The preamble is basically a kind of header that tells the receiver where a block of data starts and ends, whether the sample is left or right, etc. The AES format uses what is called a Bi-Phase Mark Code. How this works is that every bit is transmitted as a pair of values, like 00 or 11 or 01. So, for example, the bit 0 is transmitted as 00 or 11. Bit 1 is transmitted as 01 or 10. The only requirement is that each pair has to start with the opposite of the value the previous pair ended with. Say, for example, we were transmitting a bit value of 110010. According to the Biphase Mark Code, it would be the following.

Split each bit: 1 - 1 - 0 - 0 - 1 - 0

Now, as per our rules, 1 can be represented as 01 or 10.

Let’s take the first as 01. The next number is also 1. Since the previous biphase pair was 01 and it ends in the digit 1, the next pair has to start with 0, so we represent this 1 as 01.

So it is now 01-01. The next digit is 0, which can be represented as 00 or 11. Since the previous pair ended in 1, this would be 00. (We can’t use 11 because that would mean we have 3 consecutive 1s, and that violates the Biphase Mark Code.)

Taking all this together, it would be 01-01-00-11-01-00.
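
The rule can be written as a few lines of code, which also makes it easy to try other bit patterns. This is a minimal sketch of the encoding as described here, not a full AES transmitter. The `previous_level` parameter is an assumption that stands for the value the line was at before the first bit, and setting it to 1 reproduces the worked example above.

```python
def biphase_mark_encode(bits, previous_level=1):
    """Encode a string of '0' and '1' bits as biphase mark pairs."""
    level = previous_level
    pairs = []
    for bit in bits:
        first = 1 - level                               # every pair starts with the opposite value
        second = 1 - first if bit == "1" else first     # a 1 changes value again mid-pair
        pairs.append(f"{first}{second}")
        level = second                                  # remember where this pair ended
    return "-".join(pairs)

print(biphase_mark_encode("110010"))
print(biphase_mark_encode("110010", previous_level=0))
```

The second call shows that the same bits come out inverted when the line starts from the other value, which is why each bit has two valid pairs.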


Now, we just spoke about the synchronization preamble, which is sent as 8 values. What it represents is basically a collection of values that violates the Biphase Mark Code on purpose. That is, it will have 3 values in a row that are the same. So, for example, the preamble can be

00010111 if the previous end value was 1, and 11101000 if the previous end value was 0, etc. Because ordinary data can never produce this pattern, the receiver can use it to tell where one block of data ends and the next begins.
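
A receiver can spot such a pattern simply by counting how many identical values arrive in a row. Ordinary biphase data never has more than 2, so a run of 3 gives the preamble away. The sketch below is an illustration of that idea only, and `longest_run` is a hypothetical helper name.

```python
def longest_run(values):
    """Length of the longest run of identical characters in a string."""
    longest = current = 1
    for previous, value in zip(values, values[1:]):
        current = current + 1 if value == previous else 1
        longest = max(longest, current)
    return longest

encoded_audio = "010100110100"   # the bits 110010 in biphase mark code

for pattern in (encoded_audio, "00010111", "11101000"):
    run = longest_run(pattern)
    kind = "preamble (rule broken)" if run > 2 else "ordinary data"
    print(f"{pattern}: longest run {run}, {kind}")
```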

So, coming back to the whole transmission, there are 24 bits of audio, plus 8 bit slots for the sync preamble and status bits, and this makes a subframe. Since 2 channels are transmitted, there will be a subframe for the second channel as well. So 2 subframes make a frame. A frame carries one sample for each channel, so frames are sent at exactly the sample rate. At 48kHz, there will be 48000 frames in a second. On top of this, 192 frames are grouped into what is called 1 audio block. So, 48000/192 = 250 audio blocks are sent in 1 second. The frames are still sent continuously; the block grouping lets status information about the channels be spread across the 192 frames, one bit per subframe.

Interestingly, 1 subframe has 32 bits. 2 subframes make a frame, so a frame is 64 bits. 192 frames make an audio block, so 1 audio block is 192*64 = 12288 bits. There are 250 audio blocks in 1 second, so 12288*250 = 3072000 bits per second, or a data rate of 3072kbps. Since the Biphase Mark Code sends every bit as a pair of values, the biphase clock is 2*3072000 = 6.144MHz.
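
The same arithmetic, written out so that other sample rates can be tried by changing one number:

```python
SAMPLE_RATE = 48_000        # frames per second
BITS_PER_SUBFRAME = 32
SUBFRAMES_PER_FRAME = 2     # one per channel
FRAMES_PER_BLOCK = 192

bits_per_frame = BITS_PER_SUBFRAME * SUBFRAMES_PER_FRAME
bits_per_block = bits_per_frame * FRAMES_PER_BLOCK
blocks_per_second = SAMPLE_RATE // FRAMES_PER_BLOCK
bit_rate = bits_per_block * blocks_per_second
biphase_rate = 2 * bit_rate   # biphase clock, twice the bit rate

print(f"bits per frame:    {bits_per_frame}")
print(f"bits per block:    {bits_per_block}")
print(f"blocks per second: {blocks_per_second}")
print(f"data rate:         {bit_rate / 1000:.0f} kbps")
print(f"biphase clock:     {biphase_rate / 1e6:.3f} MHz")
```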


Now, in all of the above, the clock is extracted by looking for the sync preamble that we just mentioned. The issue here is that, since this is digital, the transmission does not need highly accurate timing to work; it only needs the data to arrive intact at the other end. This means the clock recovered from it can vary. That equates to jitter.

Daisy Chaining and Splitting

We have often daisy chained a clock, and sometimes split it (using a T split). Is this something that’s recommended? Well, to understand this, let’s have a quick look at how an IO looks at the external clock. Every IO has its internal clock, and that is what it uses for sampling the audio. When an external clock is supplied, the IO locks its internal clock to the external one using what is called a Phase Locked Loop (PLL). Basically, it checks periodically to see if the internal clock has drifted and pulls it back to match the incoming clock. This means there will be a small variation between checks, but since it keeps re-locking, the timing no longer depends on the internal clock but on the external one. This, however, is still subject to the noise and cable issues we saw earlier. Now, when daisy chaining, we feed the output clock of the first IO to the second one. The issue is that the output clock is the output of the internal clock that is PLL’d to the incoming master clock. This means it is not the exact clock that was sent, and it will have its own jitter. This is then used to PLL the second IO, and so on. This means that as we daisy chain, the clock problems can increase! But what about splitting, then? When you split a signal, the amplitude decreases. This causes jitter, as we saw above with the threshold crossing. So, the best method is to feed an individual clock to each piece of equipment.
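
A rough way to picture how jitter builds up along a daisy chain is sketched below. It is a deliberately simplified model with two assumptions: every device adds the same amount of independent jitter through its own PLL, and it passes on whatever jitter it received without filtering any of it. Under those assumptions the contributions add as the square root of the sum of squares. Real PLLs do filter some incoming jitter, and the picosecond figures are invented for the example.

```python
import math

def chain_jitter_ps(master_jitter_ps, added_per_device_ps, devices):
    """RMS jitter at each device in a daisy chain, under the simplified model."""
    results = []
    total_squared = master_jitter_ps ** 2
    for _ in range(devices):
        total_squared += added_per_device_ps ** 2   # each hop adds its own jitter
        results.append(math.sqrt(total_squared))
    return results

MASTER = 10.0    # jitter of the master clock in ps (hypothetical)
PER_HOP = 30.0   # jitter added by each device's PLL in ps (hypothetical)

chained = chain_jitter_ps(MASTER, PER_HOP, 4)
for position, jitter in enumerate(chained, start=1):
    print(f"device {position} in the chain: {jitter:.1f} ps")

# Fed individually from the master, every device sits at the first-hop figure.
print(f"each device fed individually: {chained[0]:.1f} ps")
```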

Now, this is by no means a comprehensive explanation. There are many more topics, like jitter shaping, Asynchronous Clock Jitter Correction, SuperClocks, etc, that can be talked about. But at a high level, this should be good enough for us to know about clocking in our studios.


Happy Mixing!