Understanding Audio Compressors

1   Why Do We Need a Compressor?

Over the years, I realized that any mix lives inside a window. At the bottom of that window is the noise floor – the hiss of the recording medium, the rumble of the theatre’s air conditioning, the rustle of the audience, and so on. At the top are the pain threshold and the physical limits of the playback system. This means everything the audience hears must fit between these two boundaries.

The trouble is that real-world sound doesn’t respect boundaries. You see, a whispered confession might sit 50 dB below a gunshot. In a theatre, that gunshot would hit 135 dB, well beyond the system’s headroom and the audience’s tolerance. And the ambience of a quiet scene, at 35 dB SPL, would disappear under the theatre’s ambient noise. The full dynamic range of the performance simply cannot be reproduced as-is.

A compressor solves this by watching the signal level and automatically turning it down when it gets too loud, so you can then raise the overall level to suit your needs. You set a threshold, the level above which the compressor starts working. You then set a ratio that determines how aggressively the signal is turned down. At 4:1, for example, for every 4 dB the signal exceeds the threshold, only 1 dB comes through; the dynamic range above the threshold is reduced to a quarter of its original span. You then add makeup gain to bring the overall level back up. This way, the loud parts are tamed, and the quiet parts become relatively louder. The attack is the time it takes the compressor to apply that gain reduction once the threshold is crossed. The release is the time it takes to let go afterwards.
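
The threshold-and-ratio arithmetic can be sketched in a few lines of Python. This is a minimal illustration of the static curve only (no attack, release, or makeup gain), with a hypothetical helper name:

```python
def compressor_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static compression curve: input level (dB) -> output level (dB).

    Below the threshold the signal passes unchanged; above it, every
    `ratio` dB of input over the threshold yields only 1 dB of output.
    """
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

# A signal 8 dB over a -20 dB threshold at 4:1 comes out only 2 dB over,
# i.e. 6 dB of gain reduction.
out_db = compressor_gain_db(-12.0)
reduction_db = -12.0 - out_db
```

Makeup gain would then simply add a constant number of dB to every output level, shifting the whole curve upward.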

Now, though this sounds simple in concept, it can be extraordinarily complex in execution. This is because the compressor has to make these gain changes smoothly, sometimes thousands of times per second, without the audience hearing it work. Every aspect of how it does this – like how fast it responds, what circuit element does the actual turning down, where it measures the signal level, etc. – changes the sound. This is why there are hundreds of compressor designs and why audio engineers like us have such strong preferences about which one to reach for.

2   How A Compressor Works

Let me try to explain how a compressor works in a very intuitive way, using a remote control. My elder one wanted to watch her favorite cartoon on TV. And as always, the person with the remote wins any argument. So, I told her it was late and she had to go to sleep while I decided to watch TV. She went, unwillingly, but told me she would come back if she heard anything loud. I agreed. She left and I put on a movie. I turned off the lights, set the volume to where I could hear the dialog clearly but not loudly, set the remote aside, and settled in to watch.

Suddenly there’s a gunshot and a fight sequence starts. I take the remote, reduce the volume in steps, wait till the action is done, and bring it back up again. The truth is, if you have ever done this, you have understood an audio compressor.

The level at which you decide the sound is too loud is called the threshold. The amount by which you reduce the volume (it’s not the same every time, is it?) corresponds to the ratio, since it depends on how loud the sound was. The time it takes you to bring the volume down to a comfortable level on the remote is the attack time. The time you take to bring it back up once things are quiet again is the release.

What else do we infer?

If we are late to reduce the levels on the remote (a longer attack time), more loud sound passes (transients get through). If we have seen the movie before, we know exactly when to drop the volume, giving us smooth audio (look-ahead). If we begin to lower the levels gradually, anticipating the loud portion, that’s the knee. If you always watch TV with the level at, say, 30 and for this film you kept it at 40, that’s input gain. If you have no neighbors and are blaring the TV as loud as it will go, that is brick-wall limiting.

So, if we strip away the brand names and the vintage mystique, every compressor ever built follows the same architecture. Understanding this architecture makes the differences between compressor types easier to understand.

2.1   Two Signal Paths

[Block diagram: the audio splits into two paths – one through the gain element to the output, and one through the control/sidechain, which sends a control signal back to the gain element]
Figure 1. The two-path architecture at the heart of every compressor. The audio path passes through the Gain element; the sidechain path analyses the signal and sends a Control Signal telling the Gain element how much to reduce.

This is how it starts. The audio enters and immediately splits into two copies. One copy travels to the output, passing through a gain element on the way. This is the component that actually turns the signal up or down, and this is the audio path we hear.

The other copy goes to the sidechain, a separate analysis path whose job is to measure the audio level and generate what is called a control signal, which tells the gain element how much to turn things down. It listens, decides, and sends instructions.

The gain element follows the sidechain’s instructions and changes the level of the audio path. In a digital compressor, this is a simple multiplication where the audio sample is multiplied by a number between 0 and 1. In an analog compressor, this element could be a VCA chip, a FET transistor, a photocell, a vacuum tube, or even a magnetic transformer. The choice of gain element is actually what defines the type of compressor and gives it its sonic color.
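
In digital form, the gain element really is just that per-sample multiplication. A toy sketch (the names and values are purely illustrative):

```python
def apply_gain_element(samples, control):
    """Digital gain element: multiply each audio sample by the sidechain's
    control signal, a linear gain between 0.0 (silence) and 1.0 (unity)."""
    return [s * g for s, g in zip(samples, control)]

audio   = [0.5, 0.9, -0.8, 0.2]
control = [1.0, 0.5, 0.5, 1.0]      # sidechain says: halve the loud samples
compressed = apply_gain_element(audio, control)
```

Everything interesting about a digital compressor lives in how the sidechain computes `control`; the gain element itself stays this simple.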

2.2   Inside the Sidechain

The sidechain has several stages, and each one affects the sound.

Measurement. We know that raw audio swings positive and negative. You can’t make gain decisions from a waveform that crosses zero hundreds of times a second. So the sidechain first converts the audio into a smooth, positive representation of its level. There are two ways to do this.

Peak detection follows the instantaneous amplitude – the absolute value of the waveform. It’s fast and responsive, catching every transient. This is what makes the 1176 so aggressive on drums: it sees and reacts to every single peak. The other is RMS detection, which tracks the average power over a short window and corresponds more closely to perceived loudness. This is why opto compressors like the LA-2A feel smoother and more musical: the photocell (the opto part) naturally averages the signal rather than tracking every peak.
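
The difference between the two detector styles is easy to see in code. Here is a deliberately simplified sketch of each (real detectors add smoothing and calibration; the decay and window values are arbitrary):

```python
import math

def peak_detect(samples, decay=0.999):
    """Peak detector: track the absolute value, falling back slowly."""
    env, out = 0.0, []
    for s in samples:
        env = max(abs(s), env * decay)   # jump up instantly, decay down
        out.append(env)
    return out

def rms_detect(samples, window=64):
    """RMS detector: square root of the average power over a short window."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1):i + 1]
        out.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return out

# A single sharp transient: the peak detector reaches the full 1.0,
# while the RMS detector, averaging power, reports a much lower level.
click = [0.0] * 10 + [1.0] + [0.0] * 53
peak_env = peak_detect(click)
rms_env = rms_detect(click)
```

Run on the same click, the peak envelope hits the transient’s full amplitude while the RMS envelope stays well below it – the numerical version of “aggressive” versus “smooth”.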

Compression. The gain computer takes the measured level and applies the threshold and ratio to calculate how much gain reduction is needed. Below the threshold, no compression. Above it, the signal is reduced according to the ratio. A soft knee makes the transition gradual rather than abrupt, so the compression eases in over a range around the threshold. A hard knee provides a sharp, definite onset of compression that is more audible but more controllable.
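
A soft knee is commonly implemented as a quadratic blend around the threshold. This sketch uses that common quadratic-knee formulation, returning gain reduction in dB (negative numbers mean “turn it down”):

```python
def gain_computer_db(level_db, threshold_db=-20.0, ratio=4.0, knee_db=6.0):
    """Return the gain reduction (in dB, <= 0) for a measured level.

    Within +/- knee_db/2 of the threshold, a quadratic blends from no
    compression to full-ratio compression; knee_db = 0 gives a hard knee."""
    over = level_db - threshold_db
    if over * 2 <= -knee_db:
        return 0.0                        # below the knee: untouched
    if over * 2 >= knee_db:
        return -(over - over / ratio)     # above the knee: full ratio
    x = over + knee_db / 2                # inside the knee: ease in
    return -(1 - 1 / ratio) * x * x / (2 * knee_db)
```

Setting `knee_db=0` collapses the blend region to nothing, which is exactly the hard-knee case with its sharp onset.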

Smoothing. If the gain computer’s decisions were applied instantly to the audio, you’d hear every gain change as a click or a thump. An envelope follower therefore smooths these changes over time. The attack time controls how quickly the compressor applies gain reduction once the signal exceeds the threshold. The release time controls how quickly it lets go when the signal drops back down.

Right here in these timing controls are where much of the art of compression lives. Fast attack catches transients but can kill the punch of a drum hit. Slow attack lets the initial transient through (preserving the attack of the sound) but may allow brief peaks to pass uncompressed. Fast release tracks the signal closely but can cause audible “pumping” as the compressor rapidly ducks and recovers. Slow release provides a gentler, more transparent compression but may hold the signal down too long after a peak, suppressing the material that follows.
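
The envelope follower behind these controls is typically a one-pole smoother with separate attack and release coefficients. A bare-bones sketch (smoothing the gain-reduction signal in dB is one of several possible designs; the time constants are arbitrary):

```python
import math

def smooth_gain(target_gr_db, fs=48000.0, attack_ms=10.0, release_ms=100.0):
    """Smooth a requested gain-reduction signal (in dB, <= 0) over time.

    When more reduction is demanded, the fast attack coefficient applies;
    when the demand falls away, the slower release coefficient applies."""
    a_att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    state, out = 0.0, []
    for target in target_gr_db:
        coeff = a_att if target < state else a_rel
        state = coeff * state + (1 - coeff) * target
        out.append(state)
    return out

# Demand -6 dB of reduction for 2000 samples, then let go: the gain
# eases down at the attack rate and recovers at the slower release rate.
demand = [-6.0] * 2000 + [0.0] * 2000
smoothed = smooth_gain(demand)
```

Shortening `attack_ms` makes the envelope grab transients faster (and risks distortion on low frequencies); lengthening `release_ms` trades pumping for a slower recovery, exactly the trade-offs described above.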

The Giannoulis, Massberg, and Reiss study (JAES, 2012), which to me was the most thorough academic analysis of compressor design ever published, tested four different envelope follower designs and found that they produce very different amounts of distortion. The smoothest designs (which they called “smooth decoupled” and “smooth branching”) generated much less harmonic distortion than the simpler designs used in many classic analog compressors. To be fair, the distortion from the simpler designs is part of what gives those compressors their character, so it’s not a defect but rather it’s the sound.

They also found that where you place the level detector in the sidechain matters enormously. If the smoothing happens before the gain calculation (the way most classic analog compressors do it), the compressor has a brief delay before it starts compressing. This is because the envelope has to “charge up” to the threshold level first. If the smoothing happens after the gain calculation (working directly on the control signal in decibels), this lag disappears and the release behavior becomes much smoother. Most modern digital compressors use this post-gain-computer placement specifically because of these advantages.

This is all good. But where does the sidechain get its input from?

2.3   Feedforward vs Feedback

[Two block diagrams: top, a feedforward compressor, where the sidechain reads from the input before the gain element; bottom, a feedback compressor, where the sidechain reads from the output after the gain element]
Figure 2. Feedforward (top) vs Feedback (bottom) topology. In feedforward, the sidechain analyses the clean input signal. In feedback, it analyses the already-compressed output, creating a self-correcting loop.

In a feedforward design, the sidechain listens to the input signal before any gain reduction has been applied. It calculates what needs to happen and applies it. This is precise, predictable, and allows features like look-ahead (where the audio is delayed slightly so the sidechain can start compressing before the transient arrives). The SSL G-series bus compressor and the dbx 160 are feedforward designs.

In a feedback design, the sidechain listens to the output signal after gain reduction. This creates a self-correcting loop: if the compressor over-compresses, the output drops, the sidechain sees a lower level, and it eases off. The result is a softer, more “musical” compression that we can describe as the compressor “breathing” with the music. The 1176 and the LA-2A both use feedback topology. The downside is that feedback designs cannot achieve true limiting (infinite ratio) and cannot use look-ahead.
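
The self-correcting character of feedback can be demonstrated with a toy loop in which the detector reads the already-compressed output. The coefficients here are arbitrary and purely illustrative, not modeled on any real unit:

```python
def feedback_compress(samples, threshold=0.5, strength=0.8):
    """Toy feedback topology: the detector watches the output, so if the
    loop over-compresses, the detector sees a lower level and eases off."""
    gain, out = 1.0, []
    for s in samples:
        y = s * gain
        out.append(y)
        over = abs(y) - threshold
        if over > 0:
            gain = max(0.1, gain - strength * over)   # turn down
        else:
            gain = min(1.0, gain + 0.001)             # creep back up
    return out

# A sustained 0.9 input settles near the 0.5 threshold within a few samples.
settled = feedback_compress([0.9] * 100)
```

Notice that the first sample passes at full level – the loop can only react after it has heard the output, which is also why a feedback design cannot look ahead.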

3   How Compressors Color the Sound

No compressor is transparent. Every compressor changes the tone and character of what passes through it, and understanding the mechanisms of that coloration can help you choose the right tool.

3.1   Distortion from Compressing

The very act of compression – multiplying the audio by a time-varying gain signal – creates new frequencies that weren’t in the original. When you multiply two signals together, their frequency content interacts, producing sum and difference frequencies (intermodulation products). The faster the gain changes, the wider the control signal’s own spectrum, and the more new content appears.
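
This is amplitude-modulation arithmetic, and the product-to-sum identity shows exactly where the new frequencies land. A quick numerical check in Python (the frequencies are chosen arbitrarily for illustration):

```python
import math

# A 1 kHz tone multiplied by a 100 Hz gain wobble contains sidebands at
# 900 Hz and 1100 Hz: sin(A)·cos(B) = 0.5·(sin(A+B) + sin(A-B)).
fs, f_sig, f_gain = 48000.0, 1000.0, 100.0
for n in range(16):
    t = n / fs
    product = math.sin(2 * math.pi * f_sig * t) * math.cos(2 * math.pi * f_gain * t)
    sidebands = 0.5 * (math.sin(2 * math.pi * (f_sig + f_gain) * t)
                       + math.sin(2 * math.pi * (f_sig - f_gain) * t))
    assert abs(product - sidebands) < 1e-12
```

A real control signal is not a clean sinusoid, so instead of two tidy sidebands you get a smear of intermodulation products around every component of the audio.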

This is why fast attack and release times produce more audible coloration than slow ones. A fast attack that pulls the gain down on the first peak and holds it merely reduces the level. But if the release is also fast enough to recover within a single cycle, the gain moves up and down inside that cycle, reshaping the waveform itself – and because that reshaping depends on the signal, it is distortion by definition. With a 1 ms attack, the gain is changing rapidly within each cycle of a 100 Hz signal whenever the threshold is crossed. For very low frequencies, even moderate attack times can reshape individual cycles, which is one reason sub-bass content requires careful compressor settings.

3.2   Distortion from the Gain Element

In analog compressors, the component doing the gain reduction is never a perfect multiplier. Each type colors the sound differently.

FET transistors have a curved resistance characteristic. This means the relationship between the control voltage and the amount of reduction is not perfectly linear. This curve generates harmonics, predominantly second and third order, that add a warm, slightly aggressive edge to the sound. The 1176’s character comes partly from this FET nonlinearity.

Vacuum tubes in vari-mu compressors produce predominantly even-order harmonics (second, fourth) when driven asymmetrically, which we interpret as warmth. In a push-pull configuration (like the Fairchild 670), the even harmonics cancel and the remaining odd harmonics (third, fifth) add a different kind of density.

Another type called Diode bridges generate substantial distortion. The Neve 2254 addresses this by attenuating the signal by 40 dB before the diode bridge, operating the diodes in a tiny region where they’re nearly linear, then amplifying by 60 dB after. The architecture works, but the 60 dB of makeup amplification brings up the noise floor significantly. The distinctive thickness and punch of Neve compressors comes partly from this architecture.

VCA chips (particularly the THAT 2180/2181 series) have the lowest distortion of any analog gain element. When a VCA compressor adds coloration, it’s usually from the surrounding circuitry – the input transformers, the output amplifiers, the power supply – rather than the VCA itself.

3.3   Distortion from Timing

When the attack time is shorter than one cycle of the signal’s fundamental frequency, the compressor begins reshaping individual waveform cycles. For a 100 Hz tone, one cycle is 10 ms; a 1 ms attack time means the compressor changes the gain within each cycle, causing distortion.

The release time creates different problems. If it’s too fast, the compressor tries to follow individual half-cycles of the waveform, generating severe harmonic distortion. Too slow, and the compressor stays compressed during quiet passages, squashing the dynamics and raising the noise floor.

4   Seven Ways to Turn the Signal Down

The thing that gives each compressor family its identity is the gain element. This is the physical component doing the level reduction. Broadly speaking, seven methods or ideas have been used, each with a different speed, distortion profile, and sonic character.

4.1   VCA (Voltage-Controlled Amplifier)

A VCA compressor uses a specialized integrated circuit, typically built around the Blackmer gain-cell topology, in which matched transistors control gain in response to a control voltage. The defining characteristic here is precision: every 6 mV change in control voltage produces exactly 1 dB of gain change. This makes VCA compressors the most accurate and predictable of all types.

David Blackmer developed the first audio VCA at dbx in 1971 using discrete, hand-matched transistors sealed in a temperature-controlled oven (temperature affects transistor behavior). Later, THAT Corporation put the design onto a single chip (the THAT 2180 and 2181), and the matching improved because all the transistors share the same silicon.

VCA compressors could be called the workhorses of mixing and mastering. The SSL G-series bus compressor is probably the most influential single compressor in popular music production. VCAs give you the most control, a fast and predictable response, and the cleanest sound – which can also make them sound clinical if you’re looking for character.

Example units: dbx 160, SSL G-series, API 2500, Focusrite Red 3, Shadow Hills (VCA stage).

4.2   FET (Field-Effect Transistor)

A FET compressor uses the drain-to-source resistance of a field-effect transistor as a variable resistor in the audio path. When the sidechain applies voltage to the FET’s gate, the resistance drops, pulling the signal level down.

The 1176, designed by Bill Putnam in 1967, defines this category. Its attack time can go as low as 20 microseconds which is roughly a thousand times faster than a typical VCA compressor. The FET responds at the speed of the electric field itself. Combined with its feedback topology and Class A output stage (which adds its own saturation) the 1176 has a distinctive aggressive, exciting character that has made it the go-to for vocals, drums, electric guitars, and anything where you want the compression to be felt.

As a side note, when Gerat, Eichas, and Zolzer (2017) built a detailed digital model of the 1176, they discovered something interesting. The 1176’s envelope shape could not be reproduced by a single smoothing filter; they needed three parallel smoothing filters with different attack constants, blended together with carefully weighted mixing. This multi-path envelope is part of what gives the 1176 its distinctive “grab”. It’s remarkable that this is not a simple exponential response but a complex, multi-phase trajectory that no other compressor architecture naturally produces.

Example units: UREI/UA 1176, API 525, Empirical Labs Distressor.

4.3   Optical (Opto)

An optical compressor uses a light source and a photocell (light-dependent resistor) sealed in a single package. The sidechain drives the light source; the photocell’s resistance changes with the illumination, attenuating the audio signal.

The photocell’s response is program-dependent. It responds relatively quickly to increasing light (the attack) but recovers slowly and in two phases: the first is fast, and the second gradually tapers off. This two-stage release tracks the rhythm of speech and music naturally, which is why opto compressors are so popular on vocals and dialog. The LA-2A could be called the defining unit. It uses a PerkinElmer T4B optocoupler, whose electroluminescent panel (not an LED) adds its own thermal lag to the photocell’s response. The result is a compressor where the attack, release, and effective ratio are all determined by the physics of the optical element rather than by user controls.

There is an ongoing challenge for hardware manufacturers in that the cadmium sulfide in traditional photocells is restricted under EU environmental regulations (RoHS), and no alternative material exactly replicates the response.

Example units: Teletronix LA-2A, LA-3A, Tube-Tech CL 1B.

4.4   Variable-Mu (Vari-Mu)

In a vari-mu compressor, the vacuum tube itself is the gain element. The sidechain changes the tube’s bias, which changes its amplification factor (mu), which changes the output level. The tubes used are “remote cutoff” types that allow gain to be varied smoothly over a wide range. The ratio is not fixed because it increases automatically as the input gets louder. At low levels, you might get 1.5:1 compression. At higher levels, 4:1 or more. This progressive behavior is what gives vari-mu compressors their reputation for musical, forgiving compression that tightens as the signal pushes harder.

The Fairchild 670 is the iconic vari-mu compressor. Its push-pull tube configuration cancels the low-frequency “thump” that would otherwise occur when the bias changes, using common-mode rejection at the output transformer.

Example units: Fairchild 660/670, Manley Variable Mu, Retro Instruments 176.

4.5   Diode Bridge

Four diodes in a balanced bridge configuration act as a variable attenuator. Here, a DC control current biases the diodes, controlling the signal path’s attenuation. The Neve 2254’s 40 dB pre-attenuation / 60 dB post-amplification architecture, as mentioned before, keeps the diodes operating in their most linear region while producing a thick, punchy, harmonically rich character.

Example units: Neve 2254, Neve 33609.

4.6   PWM (Pulse Width Modulation)

This is basically a high-speed electronic switch that runs at ultrasonic frequencies (around 200 kHz). The duty cycle (on-time vs off-time) controls the average signal energy. A precision filter reconstructs the audio. Because the switch is either fully on or fully off, there are no nonlinear transfer curves to generate distortion. The resulting gain control is mathematically clean, with artifacts 118 dB below the signal.

The Crane Song STC-8 is the best-known PWM compressor. It uniquely allows the user to switch between second-harmonic (warm) and third-harmonic (aggressive) distortion characteristics.

Example units: Crane Song STC-8, EMT 156 PDM.

4.7   OTA (Operational Transconductance Amplifier)

This is a simpler, cheaper alternative to the Blackmer VCA. OTAs (such as the LM13700) have more distortion, more noise, and less dynamic range, but their low cost makes them common in budget compressors, synthesizers, and effects pedals.

5   Digital Compressors

The same architecture we have seen in analog – sidechain analysis, control signal, gain element – applies in digital. But the digital domain brings new capabilities and new problems.

5.1   Look-Ahead

In a digital feedforward compressor, the audio can be delayed by a few milliseconds while the sidechain processes the un-delayed signal. The compressor knows what’s coming and starts reducing gain before the transient arrives. This eliminates overshoot entirely and enables transparent peak limiting with zero distortion. Impossible in analog.
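
A toy look-ahead limiter makes the idea concrete: the sidechain scans a few samples ahead, so the gain is already down by the time the peak arrives. This is a simplified sketch, not any product’s algorithm:

```python
def lookahead_limit(samples, ceiling=0.8, lookahead=4):
    """Toy look-ahead limiter: the gain at each sample is set from the
    peak over the next `lookahead` samples, so no transient overshoots."""
    out = []
    for i in range(len(samples)):
        window = samples[i:i + lookahead + 1]       # present plus the future
        peak = max(abs(s) for s in window)
        gain = min(1.0, ceiling / peak) if peak > 0 else 1.0
        out.append(samples[i] * gain)
    return out

sig = [0.1] * 10 + [1.0] + [0.1] * 10
limited = lookahead_limit(sig)
# The output never exceeds the ceiling, and the gain starts dipping
# a few samples *before* the 1.0 transient arrives.
```

A production limiter would also smooth the gain trajectory so the pre-dip is a gentle ramp rather than a step, but the principle – act before the peak – is the same.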

In fact, the FabFilter Pro-C 3 introduced an adjustable maximum lookahead, letting us trade latency against transient capture. In a mixing context with delay compensation, this is purely beneficial.

5.2   Aliasing

When the control signal modifies the audio, new frequencies are created. If those new frequencies exceed half the sample rate (the Nyquist frequency), they fold back into the audible band as aliasing: inharmonic, metallic-sounding distortion that has no analog equivalent.

The traditional solution is oversampling where we run the compressor at 2x, 4x, or higher sample rate internally. This gives room for the new frequencies to exist without folding back. After processing, a steep anti-aliasing filter can remove everything above the original Nyquist, and the signal is downsampled back.

Sierra (2023) showed that this brute-force approach is not always the most efficient. He tested three simpler techniques: filtering the input signal to create a small frequency gap below Nyquist, filtering the control signal to remove its sharp edges, and gently smoothing the control signal with a simple one-pole filter. The combination of all three achieved better alias rejection than 2x oversampling, at one-fifth the CPU cost. His work suggests that a well-designed compressor can sound clean without expensive multi-rate processing, which matters for real-time applications and sessions with dozens of compressor instances.

The Three-Body Technology compressor Cenozoix takes yet another approach. Instead of oversampling the entire signal, it applies antiderivative antialiasing (ADAA) – a technique developed at Native Instruments for waveshapers – directly to the rectifier stage, calculating the anti-aliased output analytically without oversampling. It also detects when the compressor is in the middle of an attack or release transition (where aliasing actually happens) and applies oversampling only during those moments, dropping back to the normal rate during steady-state periods. The result is clean, alias-free compression at a fraction of the usual CPU cost.

6   Multiband Compression

6.1   Why Multiband Exists

In a broadband compressor, the sidechain responds to the total level of the signal. If a bass note is 20 dB louder than the rest of the mix, the compressor ducks everything, including the vocals, the strings, and the cymbals to control that one bass note. Multiband compression splits the signal into frequency bands and compresses each one independently. The bass can be compressed without affecting the treble.

6.2   The Phase Problem

There is a challenge. The crossover filters used to split the signal into bands introduce phase shift; all analog-style (minimum-phase) filters do, because it’s inherent in how they work. When you split a signal into bands with these filters and then sum the bands back together, the phase shifts must cancel perfectly to produce a flat output. And they do – as long as nothing changes the level of the individual bands between the split and the sum.

But that’s exactly what multiband compression does. The moment one band is compressed and another isn’t, the gain difference breaks the phase relationship at the crossover frequency. The bands no longer sum to flat. You get frequency-dependent coloration that varies with the compression depth. This is a subtle but audible smearing or thickening around the crossover regions.
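
You can see the principle with the simplest possible complementary split: a one-pole lowpass and its residual highpass. This is a deliberately simplified stand-in for a real Linkwitz-Riley crossover (which relies on phase cancellation rather than subtraction), but it shows the same effect: the bands reconstruct the input exactly until you change one band’s gain.

```python
def one_pole_lowpass(samples, coeff=0.9):
    """Crude one-pole lowpass, used here only to split off a low band."""
    state, out = 0.0, []
    for s in samples:
        state = coeff * state + (1 - coeff) * s
        out.append(state)
    return out

signal = [0.3, -0.7, 0.5, 0.9, -0.2, 0.1]
low = one_pole_lowpass(signal)
high = [s - l for s, l in zip(signal, low)]        # complementary band

flat = [l + h for l, h in zip(low, high)]          # sums back to the input
ducked = [l + 0.5 * h for l, h in zip(low, high)]  # "compress" the highs...
# ...and the sum no longer equals the original signal.
```

The moment `high` is scaled, the recombined signal differs from the input at every sample where the bands used to cancel – the numerical version of the coloration around crossover regions.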

Higher-order crossover filters make this worse. The Avid Pro Multiband Dynamics uses eighth-order Linkwitz-Riley crossovers (two cascaded 24 dB/octave slopes), which provide excellent band separation and sum perfectly flat when no processing is applied. But those filters rotate the phase by 360 degrees through the crossover region, which is specifically why the Avid Pro Multiband Dynamics cannot be used for parallel compression.

In parallel compression, you blend the compressed signal with the uncompressed original. But the compressed signal has gone through the crossover filters (with their phase shift), while the original hasn’t. When you sum them, the phase difference produces comb filtering. These are frequency-dependent cancellations and additions that can change with the compression depth. The deeper the compression, the worse the comb filtering. Any minimum-phase crossover will have this problem and higher-order crossovers make it more severe.

Linear-phase crossovers eliminate the phase problem entirely. The Waves Linear Phase Multiband Compressor uses linear-phase filters that introduce no frequency-dependent phase shift at all – just a flat delay. Bands can be recombined without coloration regardless of compression depth, and parallel compression works correctly. The cost is latency (typically hundreds of milliseconds) and pre-ringing on transient material, which makes linear-phase crossovers unsuitable for real-time monitoring.

FabFilter’s Pro-MB takes a different approach entirely. Instead of splitting the whole signal into contiguous bands, it lets you create floating processing bands anywhere on the spectrum. Areas outside the bands pass through untouched. Each band can independently use minimum-phase or linear-phase filtering. In minimum-phase mode, the filter engages only when compression is active, so the phase coloration is present only when the compressor is working. When it releases, the filter disappears and the signal returns to its original phase response.

6.3   The Waves C4 vs C6

The C4 has four contiguous crossover bands. These bands are always in the signal path, even when no dynamics processing is happening. If you insert a C4 on a track and set all the dynamics controls to zero, the signal is still passing through the crossover filters.

The C6 adds two “floating” bands on top of the four crossover bands. These floating bands are not part of the crossover network. They’re independent parametric filters with adjustable bandwidth that can be placed anywhere on the spectrum. Unlike the crossover bands, they’re true-bypass: when they’re not actively compressing, they’re completely removed from the signal path.

This true-bypass design creates its own artifact. When a floating band engages, its filter is inserted into the signal path; when it disengages, the filter is removed. That is a phase discontinuity: the signal’s phase relationship changes instantaneously, which can produce an audible click at the moment the band disengages. It’s most noticeable on material with sustained energy near the band’s center frequency, and it’s an inherent consequence of true-bypass architecture, because you can’t remove a filter from a signal path without a phase change.

7   Compression for Immersive Audio

A Dolby Atmos 7.1.2 bed has 10 channels. Compressing them raises a fundamental question: should the channels be compressed independently, or should they share a sidechain? If each channel has its own independent sidechain, a sound panning from left to right will be compressed by different amounts at different times. The spatial image shifts and collapses.

If all channels share a single sidechain, the loudest channel (usually the center, carrying dialog) drives the compression for every channel including the quiet surrounds and heights. Dialog peaks pump the entire room ambience. The practical solution is grouped linking: the front L/C/R share a sidechain group, the surrounds share another, the height channels share a third. The center channel can be excluded from the surround group so dialog doesn’t pump the ambience.
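
Grouped linking is simple to express in code: each group shares one detector (here, the group’s loudest channel), so every channel in the group receives identical gain reduction. A hypothetical sketch, not any plugin’s actual scheme:

```python
def linked_gains_db(channel_levels_db, groups, threshold_db=-10.0, ratio=3.0):
    """Per-channel gain reduction with grouped sidechain linking.

    Channels in a group share one detector (the group's loudest channel),
    so compression never shifts the image *within* a group."""
    reduction_db = {}
    for group in groups:
        detector = max(channel_levels_db[ch] for ch in group)
        over = max(0.0, detector - threshold_db)
        gr = -(over - over / ratio)
        for ch in group:
            reduction_db[ch] = gr
    return reduction_db

levels = {"L": -6.0, "C": -2.0, "R": -8.0, "Ls": -20.0, "Rs": -18.0}
groups = [("L", "C", "R"), ("Ls", "Rs")]   # fronts linked; surrounds linked
gr = linked_gains_db(levels, groups)
# Dialog peaks in C pull down L/C/R together but leave the surrounds alone.
```

Excluding the center channel from the surround group, as described above, is just a matter of which channels appear in which tuple.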

Several plugins are designed specifically for this:

PSP auralComp, designed with multichannel specialist Ronald Prent, supports 16 channels with eight independent sidechain submixes. You can configure exactly which channels drive compression for which other channels.

Fiedler Audio Gravitas MDS supports up to 128 channels and uses a triple-detector system that simultaneously tracks the loudest channel, the average level, and the quietest channel in a group.

FabFilter Pro-C 3 supports full Atmos up to 9.1.6 width with continuously adjustable channel linking from fully independent (0%) to fully linked (100%).

8   Modeling Analog Compressors

When you load a plugin that claims to emulate an LA-2A or an 1176, something has to happen between the audio going in and the audio coming out. That something is a mathematical model running in real-time on your CPU. There are different ways to build these models, and understanding them helps you evaluate which plugins are doing what.

8.1   Circuit Modeling (White-Box)

The most ambitious approach starts with the actual schematic of the hardware unit. Every component – every resistor, capacitor, transistor, transformer winding, and diode – is described by a mathematical equation that captures how it behaves electrically. A resistor follows Ohm’s law. A capacitor stores charge. A transistor follows the Ebers-Moll or Gummel-Poon equations. The entire circuit is then represented as a system of interconnected equations.

The challenge is solving these equations fast enough. When audio enters the plugin at 44,100 samples per second, the plugin must solve the entire circuit model 44,100 times per second – once for every sample. Each time, it feeds in the current input voltage, works through all the component interactions, and produces an output voltage. For a simple circuit like a diode clipper (two components), this is easy. For a Fairchild 670 with 20 tubes, 14 transformers, and hundreds of passive components, the system of equations becomes enormous. Two main techniques are used to solve circuit models efficiently.

SPICE-derived nodal analysis works the way circuit simulators like LTspice do. It writes an equation for every node (junction point) in the circuit, expressing Kirchhoff’s current law: the sum of currents entering each node must equal zero. This produces a matrix of linear equations (for the resistors and capacitors) with nonlinear equations embedded (for the tubes, transistors, and diodes). The linear part can be solved directly with matrix algebra. The nonlinear parts require iterative solving: the algorithm makes a guess, checks how far off it is, adjusts, and repeats until it converges on the right answer. This iteration is what makes circuit modeling expensive. A tube amplifier model might need 3–10 iterations per sample to converge, and each iteration involves solving the full matrix.
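
The guess-check-adjust loop is easy to show on the simplest nonlinear circuit: a resistor feeding an antiparallel diode pair (a diode clipper). This sketch uses plain bisection for robustness where a real simulator would use a damped Newton iteration, and the component values are illustrative:

```python
import math

def diode_clipper_out(v_in, r=2200.0, i_s=1e-12, vt=0.02585):
    """Solve the clipper's single node equation
        (v_in - v) / r  =  2 * i_s * sinh(v / vt)
    (resistor current = diode-pair current) by guess-and-check bisection."""
    def residual(v):
        return (v_in - v) / r - 2 * i_s * math.sinh(v / vt)
    lo, hi = (0.0, v_in) if v_in >= 0.0 else (v_in, 0.0)
    for _ in range(200):                 # halve the bracket until it converges
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Small inputs pass almost unchanged; large ones are pinned near the
# diodes' forward voltage. A full circuit model solves equations like this
# once per sample, for every nonlinear element at once.
```

Scale this single equation up to hundreds of coupled nodes and the cost of iterating on every sample becomes obvious.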

Wave Digital Filters (WDF) take a different approach entirely. Instead of modeling voltages and currents at circuit nodes, WDFs model the propagation of “wave” signals through the circuit. Each component (resistor, capacitor, inductor) is replaced by a “port” element that scatters incoming waves into reflected waves according to the component’s impedance. The circuit’s topology (which components connect to which) determines a tree structure of “adaptors” that route waves between ports.

The advantage of WDFs is that the linear parts of the circuit are solved in a single pass through the tree with no iteration needed. Only the nonlinear elements (tubes, diodes, transistors) require iterative solving, and they can be grouped at the “root” of the tree where they’re solved together. Kurt Werner’s PhD dissertation at Stanford (2016) extended WDF theory to handle circuits with multiple nonlinearities and complex topologies, making it possible to model complete circuits like the Roland TR-808 bass drum or the Fender Bassman preamp with near-perfect accuracy compared to SPICE simulations.

The practical result is that WDF-based models can run in real-time on modern CPUs for moderately complex circuits. Arturia, u-he, and several other manufacturers use circuit modeling approaches for their analog emulations. The limitation is that truly complex circuits (like the Fairchild 670) push the boundaries of what can be solved in real-time, especially at higher sample rates.

8.2   Sampling-Based Modeling

Acustica Audio’s approach is fundamentally different from circuit modeling. Instead of building a mathematical model of the circuit, they sample the hardware’s behavior directly, much like how a convolution reverb captures a room’s acoustics by recording its impulse response.

To understand how this works, we start with regular convolution. If you send an impulse (a single click) through a reverb unit and record what comes out, you get an impulse response – a “fingerprint” of that reverb. You can then convolve any audio with that impulse response to make it sound like it went through the reverb. This works beautifully for linear systems, where doubling the input doubles the output.

But compressors are not linear. Their behavior changes depending on how hard you drive them. A gentle signal gets treated differently from a loud one. A standard impulse response cannot capture this level-dependent behavior.

This is where Volterra series come in. The mathematician Vito Volterra formulated a general theory of nonlinear systems in 1887. The core idea is an extension of convolution: instead of convolving the input with a single kernel (the impulse response), you convolve it with multiple kernels of increasing “order.” The first-order kernel captures the linear behavior (like a regular impulse response). The second-order kernel captures how pairs of input samples interact to produce distortion. The third-order kernel captures three-way interactions, and so on. Each higher order captures more complex nonlinear behavior: harmonics, intermodulation, saturation.
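A minimal sketch of that idea, using only a first-order kernel plus a small second-order kernel. Real systems like Acustica’s use vastly larger, multi-level kernel sets measured from hardware; the kernels here are placeholders:

```python
import numpy as np

def volterra_truncated(x, h1, h2):
    """Toy truncated Volterra model:
    y[n] = sum_k h1[k] x[n-k]  +  sum_{k,l} h2[k,l] x[n-k] x[n-l].
    h1 is a length-K linear kernel, h2 a K-by-K quadratic kernel."""
    N, K = len(x), len(h1)
    xp = np.concatenate([np.zeros(K - 1), x])   # zero-pad the past
    y = np.zeros(N)
    for n in range(N):
        frame = xp[n:n + K][::-1]               # x[n], x[n-1], ..., x[n-K+1]
        y[n] = h1 @ frame + frame @ h2 @ frame  # linear + quadratic terms
    return y
```

With h2 set to zero this collapses to ordinary convolution; a nonzero h2 adds level-dependent (second-harmonic) behavior that no single impulse response can represent.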

Acustica Audio’s proprietary Vectorial Volterra Kernels Technology (VVKT) implements this concept. They send carefully designed test signals through the target hardware at multiple levels, recording the output each time. From these recordings, the software extracts a set of Volterra kernels that collectively describe the hardware’s behavior across its operating range. When audio passes through the plugin, it’s processed through these kernels simultaneously. The first-order kernel handles the frequency response, while higher-order kernels add the harmonic distortion and saturation characteristics.

The system stores these kernels in tree data structures capable of managing up to 100,000 elements in real-time, with smooth interpolation between kernels when parameters change. For dynamic processors like compressors, the system adds an envelope follower and control logic on top of the kernel engine: the kernels capture the sonic character of the gain element and signal path, while conventional DSP handles the sidechain behavior.

Acustica’s most recent development is NOVA (Neural Optimized Volterra Audio), introduced in their YELLOW plugin in March 2026. This hybrid architecture combines Volterra kernels with WaveNet-style neural networks. At lower signal levels, the Volterra component handles the precise frequency and phase response. As the signal is driven harder, the neural network progressively takes over, reproducing the complex saturation behavior that emerges in real analog gear at high levels. The two models run simultaneously, blending their contributions based on signal level.

The advantage of sampling-based approaches is that they can capture the exact sound of a specific piece of hardware without any knowledge of its internal circuit. The disadvantage is CPU cost because processing audio through thousands of kernels simultaneously is computationally expensive, which is why Acustica plugins have historically been heavier on CPU than other manufacturers.

8.3   Block Modeling (Gray-Box)

Gray-box modeling is the middle ground used by most commercial plugin manufacturers. You know the general structure of the device (it’s a compressor with a sidechain, a gain element, and a smoothing filter), but you don’t model every component. Instead, you build a simplified digital model of each functional block and optimize its parameters to match measured behavior from the real hardware.

Gerat, Eichas, and Zölzer’s 2017 study of the UREI 1176LN is the textbook example. They decomposed the 1176 into blocks: a linear frequency response filter, a peak level detector, a gain mapping function, and a smoothing filter. Then they fed test signals through both the real hardware and the digital model, measured the difference, and used an optimization algorithm to adjust the model’s parameters until the difference was minimized.

What makes this approach powerful is that the optimization can discover things the designer didn’t expect. Gerat’s team found that the 1176’s smoothing filter required three parallel paths with different attack speeds blended together – a structure not obvious from the schematic. The optimization algorithm found it because it produced the closest match to the hardware’s measured output.

When audio passes through a gray-box plugin, it goes through each block in sequence. The input is filtered to match the hardware’s frequency response, the sidechain extracts the level and computes gain reduction, the smoothing filter shapes the envelope, and the gain element applies the result. Each block uses relatively simple DSP operations – IIR filters, lookup tables, multiplications – that are computationally cheap. This is why gray-box plugins tend to be much lighter on CPU than circuit models or sampling-based approaches.
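The block chain can be sketched in a few lines. The parameter values below are generic placeholders, not the optimized 1176 values from the paper:

```python
import math

def graybox_compress(x, fs=48000.0, threshold_db=-20.0, ratio=4.0,
                     attack_ms=1.0, release_ms=100.0):
    """Block-model compressor sketch: peak detector -> static gain curve
    -> one-pole smoothing -> gain element. (Toy values for illustration.)"""
    a_att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    gr_state = 0.0                       # smoothed gain reduction, in dB
    out = []
    for s in x:
        # 1. detector: instantaneous peak level in dB
        level_db = 20.0 * math.log10(max(abs(s), 1e-9))
        # 2. static curve: gain reduction above threshold
        over = level_db - threshold_db
        gr_target = over * (1.0 - 1.0 / ratio) if over > 0.0 else 0.0
        # 3. smoothing filter: attack when GR rises, release when it falls
        coeff = a_att if gr_target > gr_state else a_rel
        gr_state = coeff * gr_state + (1.0 - coeff) * gr_target
        # 4. gain element
        out.append(s * 10.0 ** (-gr_state / 20.0))
    return out
```

Every operation here is a comparison, a multiply, or a table-friendly function – which is exactly why this architecture is so cheap compared to solving circuit equations per sample.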

8.4   Neural Network Modeling

The most recent approach treats the hardware as a complete black box. No schematic, no functional decomposition. It is just audio in and audio out. A neural network is trained to reproduce the input-output relationship.

Steinmetz and Reiss (2022) demonstrated this with the LA-2A. Their model uses a Temporal Convolutional Network (TCN). This is a neural network that processes audio as a sequence, looking at the current sample and a window of past samples to predict the output. The network’s “receptive field” – which is how far back it can look – determines how much context it can use. For a compressor with a release time of several hundred milliseconds, the network needs to see at least that far into the past.
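The receptive field of such a stack is easy to compute: each causal dilated convolution layer adds (kernel_size − 1) × dilation samples of context. The kernel size and dilation pattern below are illustrative numbers, not necessarily the paper’s exact configuration:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked causal dilated convolutions:
    1 + sum over layers of (kernel_size - 1) * dilation."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# e.g. 4 layers, kernel 13, dilations growing 10x per layer (illustrative):
samples = receptive_field(13, [1, 10, 100, 1000])
ms_at_44k1 = 1000.0 * samples / 44100.0   # roughly 300 ms of context
```

With these numbers the network sees about 300 ms into the past – the order of magnitude needed to track an LA-2A-style release.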

A separate small neural network (a multilayer perceptron) takes the hardware’s control settings like the LA-2A’s peak reduction knob and compress/limit switch and generates adaptation parameters that modify the main network’s behavior at each layer. This is how a single model can emulate the hardware across its entire range of settings.

When audio passes through the trained model, each input sample flows through the four convolutional layers, each one looking at a progressively wider window of past context. From training on 10 minutes of input-output recordings from the real LA-2A, the network has learned what output to produce for any given input in any given context. In listening tests, the results were close enough to the real hardware that most listeners could not distinguish them.

8.5   Which Approach Sounds Best?

This depends entirely on what you value. Circuit modeling is the most physically “correct” but limited by computational cost for complex circuits. Sampling-based modeling (Acustica’s VVKT) captures the exact sonic fingerprint of a specific piece of hardware but is CPU-heavy. Gray-box modeling offers the best balance of accuracy, CPU efficiency, and flexibility. This is why most plugin manufacturers use it. Neural network modeling achieves remarkable accuracy with minimal hardware knowledge but produces a model that is essentially a mathematical black box. This means you can’t tweak individual circuit components because the model doesn’t know they exist.

For us, the truth is that the differences between well-executed implementations of any of these approaches are subtle. A carefully optimized gray-box 1176 plugin and a well-trained neural network 1176 model, both calibrated against the same piece of hardware, will sound more similar to each other than either sounds different from the hardware. The audible differences between compressor plugins have more to do with the care taken in the calibration and the design decisions (which aspects of the hardware behavior were prioritized) than the underlying modeling technology.

8.6   Intelligent Compression

The idea of a compressor that automatically adapts to the audio or matches the dynamics of a reference recording did not originate in the academic space. It came from the mind of a legend named Paul Frindle.

Frindle is one of the most important yet least celebrated figures in audio engineering. He designed the SSL E and G series consoles – the desks that defined the sound of modern mixing. He was a key architect of the Sony Oxford OXF-R3, widely regarded as the finest digital mixing console ever built. After leaving Sony Oxford, he co-founded Pro Audio DSP and in 2008 released the Dynamic Spectrum Mapper (DSM) – a plugin that was, and arguably still is, years ahead of the research that eventually caught up to it.

The DSM captures both the spectral response AND the dynamic behavior of a reference audio signal and uses that captured information as a threshold curve for a multiband dynamics processor with far more bands than any conventional multiband compressor. You play a section of audio that sounds the way you want your material to sound – perhaps the best phrase of a vocal performance, or a reference master – and the DSM analyzes its frequency distribution and dynamic characteristics. It then uses that analysis as the baseline for compression. This means gain reduction occurs in each frequency band only when the current audio exceeds the captured threshold in that band. The result is that the processed audio’s spectral balance and dynamics are pushed toward the reference. The clever part is that this is achieved not through static EQ matching, which only addresses average frequency balance, but through dynamic spectral processing that responds to the signal in real-time. And critically, it does this without the artifacts that trouble conventional multiband processors. Since the band thresholds are derived from the actual spectral content of the programme, the processing is musical and not arbitrary. Bands that aren’t being compressed pass through bit-for-bit identical to the input.

It is important to know that this was 2008. Giannoulis, Massberg, and Reiss published their parameter automation paper in 2013. Singh, Bromham, Sheng, and Fazekas published their neural network reference-matching compressor in 2021. Steinmetz, Bryan, and Reiss published their differentiable style transfer system in 2022. All of these academic works address aspects of what Frindle had already shipped as a commercial product years earlier. The DSM is now at version 3, distributed through Plugin Alliance, and remains one of the most sophisticated dynamics processors available.

The academic contributions that followed are still valuable. Giannoulis’s work formalized the automation approach with rigorous analysis, Singh’s neural network removes the need for manual capture by automatically extracting features, and Steinmetz’s differentiable framework enables training without labeled data. But the lineage should be acknowledged. In my opinion, Frindle got there first by solving real mixing and mastering problems with nearly 50 years of engineering experience behind him.

Following Frindle’s pioneering work, the academic community developed several complementary approaches. The most recent research goes beyond emulation into territory that analog hardware simply cannot reach.

Giannoulis, Massberg, and Reiss (2013) built a compressor where every parameter except threshold is automatically set based on real-time analysis of the input signal. The compressor extracts features like the signal’s crest factor, spectral content, and transient density, then uses those features to set the ratio, attack, release, and knee. The user only decides how much compression they want.
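A hedged sketch of the feature-extraction idea: compute the crest factor of a block and map it to a release time, so transient-rich material releases faster. The mapping endpoints below are invented for illustration and are not Giannoulis’s published rules:

```python
import math

def crest_factor_db(block):
    """Crest factor of a block: peak level minus RMS level, in dB."""
    peak = max(abs(s) for s in block)
    rms = math.sqrt(sum(s * s for s in block) / len(block))
    return 20.0 * math.log10(max(peak, 1e-9) / max(rms, 1e-9))

def auto_release_ms(crest_db, lo=50.0, hi=500.0):
    """Toy mapping: transient-rich audio (high crest factor) gets a faster
    release; steady audio gets a slower one. Endpoints are illustrative."""
    t = min(max((crest_db - 3.0) / 17.0, 0.0), 1.0)   # 3..20 dB -> 0..1
    return hi + (lo - hi) * t

# a steady sine has a crest factor near 3 dB; an isolated click is much higher
sine_like = [math.sin(2 * math.pi * 100 * n / 48000) for n in range(480)]
```

The user-facing result is the point: one "amount" control, with the timing parameters following the material automatically.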

Singh et al. (2021) built a plugin that matches compression to a reference recording. You provide a track that sounds the way you want your material to sound, and the plugin analyzes it and sets its parameters to match. User studies with amateur and professional producers showed broad acceptance – professionals liked the speed boost, amateurs liked the removal of the knowledge barrier.

Brunet, Li, and Kim (2023) proposed the most radical departure: using machine learning to predict what the signal will do in the near future, then applying compression “just-in-time” without any attack or release time constants at all. The ML model forecasts the upcoming level, and the compressor applies the appropriate gain reduction before the signal arrives. This eliminates the fundamental trade-off between fast attack (catches transients but distorts) and slow attack (preserves shape but misses peaks). In listening tests, this approach was preferred over traditional compression for its transparency and natural transient handling.

8.7   Does Modeling Technology Actually Help?

This is a question I have always had. After decades of research, do neural networks and machine learning actually give us better compressors?

For emulating specific hardware, the answer is yes. We’ve reached the point where the differences between the real LA-2A and a well-trained neural model are almost indistinguishable. If you want an LA-2A sound and can’t afford the hardware (or need 40 instances simultaneously), the technology can deliver.

But “accurate emulation” may not be the most important benefit – every hardware compressor sounds different from every other anyway. The bigger wins are in the intelligent automation features. A compressor that automatically adapts its timing to the signal content, as Giannoulis showed, saves time and avoids artifacts. A plugin that matches the dynamics of a reference track, as Singh showed, removes barriers for less experienced users. An auto-threshold feature like FabFilter Pro-C 3’s, which dynamically adjusts the threshold to maintain consistent compression regardless of input level, can save a dialog mixer hours of threshold automation across hundreds of edits from different scenes and takes.

In fact, Campbell, Paterson, and van der Linde (2017) tested 130 listeners and found moderate compression is preferred over heavy compression, limiting on the master bus is the most disliked configuration, and compression applied to fewer signals simultaneously sounds better than compression applied to pre-mixed groups. These preferences held regardless of the compression algorithm used. This means they’re about how much and where, not which algorithm.

Technology has solved the accuracy problem for hardware emulation. The genuine benefit is now in intelligent automation, style transfer, and ML-based prediction that eliminates the compromises present in traditional compressor architecture.

9   Modern Compressor Plugins

I have chosen two products to illustrate the two directions of modern compressor design.

9.1   FabFilter Pro-C 3

In my opinion, this is the most feature-complete single-plugin compressor currently available. But beyond the feature list, several of Pro-C 3’s innovations show genuine shifts in how modern compressors work.

Pro-C 3 offers 14 compression algorithms, including six new styles: Versatile (general-purpose), Smooth (bus glue), Upward (pumping upward compression), TTM (combined downward and upward across multiple bands), Op-El (optical-tube emulation), and Vari-Mu (variable-mu tube character). What’s significant here is that each algorithm is a different sidechain architecture with its own detector type, gain computation method, and envelope shape. Switching between styles changes the entire compression engine, not just the character.

Character Mode adds analog-style saturation with adjustable Drive and Pre/Post compression routing. The Pre/Post routing is the radical part. When saturation is placed before compression (Pre), the sidechain sees the saturated signal and makes different compression decisions than it would on the clean signal. The compression envelope responds to the distortion. This creates a fundamentally different interaction between saturation and dynamics than simply putting a saturator after a compressor on the insert chain. When saturation is placed after compression (Post), the harmonics are added to the already-compressed signal, so the compression behavior is unaffected. These are two genuinely different sounds from the same controls, and the distinction matters in practice.

Auto Threshold is perhaps the most radical feature for film work. Traditional compressors have a fixed threshold: you set it at, say, -20 dBFS, and the compressor engages when the signal crosses -20 dBFS. If the input level changes – a different take, a different scene, a different actor – the amount of compression changes. A quiet whisper might never reach the threshold, while a shout might trigger 15 dB of gain reduction. We end up having to automate the threshold or manually adjust it for every change in input level.

Auto Threshold dynamically shifts the threshold to track the input level, maintaining a consistent amount of gain reduction regardless of how loud or quiet the input is. If the input drops 10 dB, the threshold drops 10 dB. The compressor always applies the same depth of compression. For dialog in film, where input levels can vary 20 dB or more between a whispered confession and a shouted argument, across different takes, different microphones, different ADR sessions – this eliminates hours of manual threshold riding. It solves at the plugin level what Giannoulis et al. tried to solve at the research level in their 2013 parameter automation paper.
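The concept can be sketched as a slow level tracker that carries the working threshold with it, keeping compression depth constant. This is only an illustration of the idea, not FabFilter’s actual algorithm:

```python
import math

def auto_threshold_gain_db(levels_db, rel_threshold_db=-6.0, tau_blocks=50.0):
    """Concept sketch of an auto threshold: a slow one-pole tracker follows
    the input level, and the working threshold rides at a fixed offset below
    it, so the gain reduction depth stays constant as the input level moves.
    Takes per-block levels in dB, returns per-block gain reduction in dB."""
    a = math.exp(-1.0 / tau_blocks)
    tracked = levels_db[0]
    out = []
    for lvl in levels_db:
        tracked = a * tracked + (1.0 - a) * lvl     # slow level tracker
        threshold = tracked + rel_threshold_db      # threshold rides the level
        over = lvl - threshold
        out.append(over * 0.75 if over > 0.0 else 0.0)  # 4:1 above threshold
    return out
```

The point of the sketch: shift the whole program 10 dB quieter and, once the tracker settles, the gain reduction is unchanged – which is exactly the behavior that saves the dialog mixer hours of threshold riding.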

9.2   Three-Body Technology Cenozoix

While Pro-C 3 is about features and ecosystem, Cenozoix is about solving fundamental problems in digital compression that have been accepted as unavoidable for decades.

The core problem is that digital compression creates aliasing. Every time the control signal modulates the audio signal, new frequencies are generated. If those frequencies exceed the Nyquist limit, they fold back as inharmonic distortion. The standard industry solution for 20 years has been oversampling: running the entire process at 2x or 4x the sample rate to push the Nyquist frequency higher, giving the new harmonics room to exist without folding back. It works, but it’s expensive – doubling the sample rate doubles the computation for every operation in the signal chain.

Cenozoix takes a very different approach. Its innovation is something called Anti-Derivative Anti-Aliasing (ADAA), a technique originally developed by Julian Parker, Vadim Zavalishin, and Efflam Le Bivic at Native Instruments, first published at the DAFx conference in 2016. The technique was initially created for reducing aliasing in nonlinear waveshaping (distortion, saturation, clipping) and has since been adopted more widely in virtual analog modeling. Cenozoix applies it specifically to the rectifier and gain stages of the compressor which are the nonlinear operations where aliasing is actually generated.

To understand what ADAA does, think about what happens when a compressor processes audio normally. At each sample, the plugin evaluates a nonlinear function – a rectifier in the sidechain, a soft clipper, or the gain multiplication itself. The problem is that between samples, the signal is moving, and the nonlinear function is generating harmonics that the sample rate can’t capture. When you only evaluate the function at each discrete sample point, you miss everything happening between those points. Those missed harmonics fold back as aliasing.

The idea behind ADAA is elegant. Instead of evaluating the nonlinear function at each sample point, compute the area under the function between consecutive samples. What this gives you is the average value of the nonlinear function over the interval between samples and not just its value at the sample points.

This matters because the continuous-time behavior between samples is exactly where the aliasing-generating harmonics live. So by computing the average over the interval rather than point-sampling, ADAA effectively applies a continuous-time lowpass filter to the nonlinear function’s output before it gets sampled back to discrete time. This means that the harmonics that would have exceeded Nyquist are suppressed analytically without any oversampling, without any multi-rate processing, without any FIR or IIR anti-aliasing filters. This is very clever.
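Here is first-order ADAA applied to a simple tanh nonlinearity, whose antiderivative is ln(cosh(x)) – a standard textbook instance of the Parker et al. technique, not Cenozoix’s implementation. When consecutive samples are nearly equal the division is ill-conditioned, so the code falls back to evaluating the function at the midpoint:

```python
import math

def tanh_adaa(x, eps=1e-6):
    """First-order ADAA applied to tanh. F1(x) = ln(cosh(x)) is the
    antiderivative; each output is the average of tanh over the interval
    between consecutive inputs: (F1(x[n]) - F1(x[n-1])) / (x[n] - x[n-1])."""
    F1 = lambda v: math.log(math.cosh(v))
    out, x_prev = [], 0.0
    for xn in x:
        if abs(xn - x_prev) < eps:
            # denominator nearly zero: use the midpoint rule instead
            out.append(math.tanh(0.5 * (xn + x_prev)))
        else:
            out.append((F1(xn) - F1(x_prev)) / (xn - x_prev))
        x_prev = xn
    return out
```

Note that the output is the interval average of the nonlinearity, not its point value – that averaging is the implicit continuous-time lowpass that suppresses the fold-back.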

Cenozoix applies this technique to the specific nonlinear stages within the compressor where aliasing originates like the rectifier, the gain mapping, and the waveshaping. This results in a compressor that sounds clean without requiring expensive multi-rate processing.

The second radical feature is Adaptive Oversampling Interpolation. The oversampling choice is decided in real-time. The plugin monitors its own control signal. When the control signal is steady – meaning the compressor is either fully engaged at a constant gain reduction, or not compressing at all – there is no spectral expansion happening and oversampling serves no purpose. The plugin drops to 1x processing, using minimal CPU. The moment the control signal starts changing – say during an attack or release transition – the plugin ramps up to higher oversampling factors for the duration of that transition. When the transition settles, it drops back down.

This works because aliasing in a compressor is not constant – it only happens during gain changes. In a typical mix, a compressor might be in steady state 70-80% of the time. Adaptive oversampling means the plugin uses full CPU resources only during the 20-30% of the time when aliasing is actually being generated, cutting the average CPU load dramatically.
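The decision logic can be sketched as a per-block policy on the control signal’s slew rate; the thresholds and factors below are invented for illustration:

```python
def oversampling_factor(gain_db_block, steady_db_per_sample=0.001,
                        fast_db_per_sample=0.05):
    """Toy policy: pick an oversampling factor per block from how fast the
    gain-reduction control signal is moving. Thresholds are illustrative."""
    max_slew = max(
        (abs(b - a) for a, b in zip(gain_db_block, gain_db_block[1:])),
        default=0.0,
    )
    if max_slew < steady_db_per_sample:
        return 1   # steady state: no spectral expansion, no oversampling
    if max_slew < fast_db_per_sample:
        return 2
    return 4       # fast attack/release transition: full oversampling
```

A real implementation would also have to cross-fade between rates to avoid clicks at the switch points, which this sketch ignores.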

10   Choosing Compressors for Film Sound

10.1   Dialog

The overriding requirement for dialog compression in film is invisibility. The audience should never hear the compressor working.

Optical compression is the natural first choice. The LA-2A’s program-dependent release tracks the rhythm of speech naturally. Its soft, increasing ratio means gentle compression on normal dialog and firmer control on shouts, without an abrupt transition. The photocell’s averaging behavior ignores brief transients (consonant pops, lip smacks) and responds to the overall vocal energy. For dialog, the “inaccuracy” of the opto element is a feature.

For bus-level dialog compression in a stem mix, a VCA compressor with slow attack (10-30 ms), medium release (100-200 ms), and a low ratio (2:1 to 3:1) provides gentle level control without adding character. The SSL G-series bus compressor or the FabFilter Pro-C 3 in Clean or Smooth mode work well here.

FabFilter Pro-C 3’s Auto Threshold is particularly valuable when processing dialog from multiple scenes, takes, and actors. The input level can vary by 20 dB or more, and Auto Threshold maintains consistent compression depth without manual threshold automation. For ADR sessions where the recording level may differ significantly from the production sound, this can save hours.

10.2   Guns

Gunshots are among the most challenging sounds to compress because they combine an extremely fast transient (the initial crack, which can be as short as 50 microseconds) with a complex reverberant tail that varies enormously with the space.

A FET compressor is the primary tool. The 1176’s 20-microsecond attack time can capture the leading edge of the transient before it clips the output. Fast attack (under 1 ms), medium release (50-100 ms), and a moderate ratio (4:1 to 8:1) will control the peak while letting the body and tail breathe naturally. The 1176’s harmonic distortion adds weight and density to the impact, which helps the gunshot feel powerful rather than just loud.

After the FET, a VCA limiter provides a hard ceiling. A fast-attack brickwall limiter (like the FabFilter Pro-L 2 or Avid Pro Limiter) catches any peaks that escape the FET compressor. The limiter should have look-ahead enabled to catch the absolute peak before it passes.

For automatic weapons (rapid sustained fire), consider a slightly slower attack on the FET (2-5 ms) to let the initial transient of each shot punch through before the compressor engages. This preserves the rhythmic character of the gunfire while controlling the overall level.

10.3   Explosions

Explosions occupy a huge frequency range from sub-20 Hz pressure waves through midrange debris to high-frequency shrapnel. Broadband compression on an explosion is almost always wrong because the massive low-frequency energy will dominate the sidechain and pump everything else.

Multiband compression is the answer. A VCA multiband compressor (FabFilter Pro-MB, or Avid Pro Multiband) allows independent control of the sub-bass rumble (below 80 Hz), the midrange body and crack (200 Hz – 2 kHz), and the high-frequency debris and air (above 4 kHz). The sub-bass can be limited firmly to prevent speaker damage or clipping without affecting the mid and high frequencies that give the explosion its detail and spatiality.

On the sub-bass band, use a fast attack with a moderate ratio (4:1) to control the pressure wave. On the mid band, use a slower attack (10-20 ms) to let the initial crack through. On the high band, use gentle compression or none at all because the debris and air give the explosion its sense of scale.

For the initial impact transient, a FET compressor on the attack layer of the explosion (if the explosion has been designed with separate attack and sustain layers) adds weight and aggression.

10.4   Footsteps

Footsteps are transient-heavy signals with very little sustain. The key challenge is controlling the peak level without removing the natural attack character that makes a footstep sound real.

VCA compression with a fast attack (1-5 ms) and fast release (30-50 ms) provides clean, transparent peak control. The ratio should be moderate (3:1 to 4:1) since heavy compression on footsteps makes them sound unnatural and “squashed.” A VCA is preferred over a FET here because the goal is transparent control, not character. The FET’s harmonic distortion can make footsteps sound colored and unnatural.

Sidechain filtering is important. High-pass the sidechain at 100 Hz to prevent the low-frequency thump of the step from over-triggering the compressor (which would suppress the higher-frequency detail like the scrape, the texture, the surface character).

For bare feet, soft shoes, or any footstep where delicacy matters, use less compression or none at all and rely on volume automation instead.

10.5   Foley (Taps, Grabs, Cloth, Props)

Foley is the most varied category. Cloth movement, object handling, body touches, prop manipulation and so on. All these sounds range from extremely quiet (cloth rustle) to moderately loud (a glass dropping onto a table).

For general foley handling (picks, sets, grabs), use VCA compression with gentle settings. Slow attack (10-20 ms) to preserve the natural transient of the pick or set. Medium release (100-150 ms). Low ratio (2:1 to 3:1). The goal is to bring up the quiet detail (the finger texture on a glass, the cloth sliding over a surface) while preventing the louder moments (the glass being set down) from jumping out. Makeup gain brings the overall level up, and the compression prevents the peaks from exceeding the dynamic window.

For cloth movement and body foley, optical compression works beautifully. The LA-2A’s program-dependent behavior tracks the slow, continuous energy of cloth and body movement without reacting to brief transients. The smoothness of the opto element matches the inherently smooth character of these sounds.

For prop manipulation with sharp transients (keys, coins, weapons being handled), use a VCA with a slightly faster attack (5-10 ms) to catch the sharp metallic transients. Be careful not to over-compress since the audience needs to hear the detail of the object being handled.

For very quiet foley (breathing, subtle hand movements), consider whether compression is needed at all. These sounds are often better served by careful volume automation. If compression is used, keep it extremely gentle (1.5:1 ratio, slow attack) or use upward compression/expansion to bring up the quiet details without affecting the peaks.

10.6   Fights

Fight scenes combine impact sounds (punches, kicks, body falls), vocal exertion (grunts, shouts, breathing), and cloth/movement foley, all happening simultaneously and at widely varying levels.

For impacts (punches, body hits), FET compression captures the fast transient of the impact and adds the aggressive, dense character that makes a punch feel powerful. Fast attack (0.5-2 ms) to just let the transient through, medium release (50-80 ms) and moderate to heavy ratio (4:1 to 8:1). The 1176’s harmonic distortion adds weight. For particularly heavy impacts (body slams, thrown into walls), consider parallel compression where we can blend the heavily compressed signal with the dry signal to maintain the transient impact while adding density and sustain.

For fight dialog (grunts, shouts, strained breathing) use optical compression. The LA-2A handles the wide dynamic range of fight vocalizations naturally. So a quiet grunt and a loud shout are both controlled without the compressor pumping. The program-dependent release can track the irregular rhythm of fight vocalization without artifacts.

For fight cloth and movement, a VCA with gentle settings, similar to general foley but with slightly faster attack (5-10 ms) to handle the more energetic movements.

On the fight premix bus, where all these elements come together, a VCA bus compressor with slow attack (20-30 ms), medium release (100-200 ms), and a low ratio (2:1) provides gentle “glue” that makes the impacts, vocals, and cloth feel like they inhabit the same space. Do not over-compress the fight bus because the dynamic range between a quiet body shift and a devastating punch is what makes the fight feel real. Campbell et al.’s research (2017) applies directly here. Moderate compression on the individual elements, minimal compression on the combined bus.

10.7   Music Stems

The listener preference research (Campbell et al., 2017) indicates compression on individual instrument groups is preferred over heavy compression on the music bus. Limiting on the music bus was the most disliked configuration in their study of 130 listeners. Compression applied to fewer signals simultaneously sounds better than compression applied to pre-mixed groups, because in a pre-mixed bus, tracks whose levels would not have reached the threshold alone are affected by peaks in other tracks they are grouped with.

For a film music stem, compress the individual instrument groups (strings, brass, woodwinds, percussion, synths) with appropriate compressors for each, then apply only gentle bus compression to the combined music stem.

10.8   Volume Automation

But keep this in mind. In film post-production, the most effective dynamics control is not compression. It is volume automation. A properly leveled dialog track, where every phrase has been adjusted by hand, needs very little compression. The compressor’s job in film is to catch what automation cannot – the unpredictable peaks, the sibilance, the unexpected shout. Think of the compressor as the safety net, not the primary tool. This is the opposite of music mixing, where compression is often a creative instrument. In film, the goal is for the audience to forget that technology exists between them and the story.

11   The Art and Science of Limiting

11.1   How a Limiter Differs from a Compressor

A limiter is a compressor with an infinite (or near-infinite) ratio. That single difference changes everything about how the device behaves, how it distorts, and how it must be designed.

With a compressor at 4:1, a signal that exceeds the threshold by 20 dB comes out exceeding it by 5 dB. There is headroom above the threshold for the signal to move. With a limiter at infinity-to-one, a signal that exceeds the threshold by 20 dB comes out at the threshold. Period. No headroom, no movement, no exceptions. The output level is clamped to a hard ceiling.
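
The arithmetic can be sketched as a static transfer curve. This is a minimal illustration (the function name and the hard-knee assumption are mine, not taken from any particular design):

```python
def output_level_db(in_db, threshold_db, ratio):
    """Static (hard-knee) transfer curve: input level in dB -> output level in dB.
    A limiter is the same formula with the ratio pushed to infinity."""
    if in_db <= threshold_db:
        return in_db  # below threshold: signal passes unchanged
    return threshold_db + (in_db - threshold_db) / ratio

# A signal 20 dB over a -10 dB threshold:
print(output_level_db(10, -10, 4))             # 4:1 compressor -> -5.0 (still 5 dB over)
print(output_level_db(10, -10, float("inf")))  # limiter -> -10.0 (clamped to threshold)
```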

This means the limiter must be fast – in fact, much faster than a typical compressor. If a transient exceeds the ceiling by even a fraction of a dB for even a single sample, the limiter has failed. On top of that, it must achieve this speed without audibly distorting the audio, which is the fundamental challenge at the heart of all limiter design.

In a compressor, you have the luxury of a slow attack where you can let transients through because the ratio provides only partial reduction. In a limiter, there is no such luxury. If the attack is not fast enough to catch the transient before it exceeds the ceiling, the output clips. This is why limiters almost universally use look-ahead where the audio is delayed by a few milliseconds so the sidechain can begin reducing gain before the transient arrives.

11.2   Waveshaping Functions

At the heart of every limiter is a transfer function. This is the mathematical curve that maps input levels to output levels. In a simple hard clipper, anything above the ceiling is cut flat. This produces a sharp corner in the waveform that generates harmonic distortion. The sharper the corner, the more high-frequency harmonics are created, and the harsher the sound. Think of a square wave.

The goal of a transparent limiter is to round that corner, so the waveform approaches the ceiling smoothly rather than hitting it like a wall. Different mathematical functions produce different curve shapes, and each shape has a different distortion profile.

Figure 3. Hard clipper transfer curve. Output gain is linear below the ceiling, then clamps flat above it, producing a sharp corner at the clipping point. That corner generates high-order odd harmonics (3rd, 5th, 7th…), perceived as harsh and buzzy distortion.

Hard clipping is the simplest approach: output = sign(input) × min(|input|, ceiling). The waveform is chopped flat at the ceiling. This generates odd-order harmonics (3rd, 5th, 7th, 9th…) similar to a square wave, which the ear perceives as harsh and buzzy. Hard clipping is used deliberately in mastering when a few dB of invisible peak shaving is needed, but it becomes audible quickly.
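
As a one-line sketch, with the ceiling expressed as a linear amplitude rather than dB:

```python
def hard_clip(x, ceiling=1.0):
    """Symmetric hard clipper: the magnitude is never allowed past the ceiling.
    The flat top this creates is what generates the square-wave-like odd harmonics."""
    return max(-ceiling, min(ceiling, x))

print(hard_clip(0.3))   # 0.3  (below the ceiling: untouched)
print(hard_clip(1.7))   # 1.0  (chopped flat)
print(hard_clip(-2.5))  # -1.0
```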

Figure 4. The tanh (hyperbolic tangent) transfer curve. A smooth S-curve that is linear for small inputs, then curves gracefully toward a ceiling it approaches but never quite reaches, with no sharp corners anywhere. This smooth approach generates predominantly low-order harmonics (2nd and 3rd), perceived as warm and musical rather than harsh.

The tanh (hyperbolic tangent) function is perhaps the most important waveshaping curve in digital audio. It starts linear for small inputs (below the threshold, the signal passes through unchanged), then curves smoothly toward a ceiling that it approaches but never quite reaches. Think of it as a natural, mathematically elegant soft clipper. The key property is that the curve has no sharp corners anywhere since the transition from linear to saturated is perfectly smooth, and the harmonics generated are predominantly low-order (2nd and 3rd), which the ear perceives as warm and musical rather than harsh.
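
A minimal tanh soft clipper might look like this (the `drive` parameter is an illustrative addition for pushing the signal harder into the curve, not part of the bare function):

```python
import math

def tanh_clip(x, ceiling=1.0, drive=1.0):
    """Soft clipper built on tanh: linear for small inputs, then a smooth,
    corner-free approach toward +/- ceiling that it never quite reaches."""
    return ceiling * math.tanh(drive * x / ceiling)

print(tanh_clip(0.05))  # ≈ 0.05: small signals pass almost unchanged
print(tanh_clip(10.0))  # just under 1.0: saturated, but never at the ceiling
```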

Figure 5. Standard tanh (red) versus a “stateful” tanh implementation (black/blue). The curves diverge slightly as the signal approaches saturation: the stateful version adjusts its shape based on recent signal history, mimicking the thermal memory of analog components.

The tanh function has become something of a secret weapon in DSP development. The Instruo tanh[3] is a hardware Eurorack module that does nothing but apply the tanh function to three channels of audio. It’s used as a limiter, a saturator, a feedback controller, and a waveshaper because the function is that versatile. In the software world, the developer behind Variety of Sound (known for high-quality free plugins) wrote extensively about implementing tanh-based saturation, noting that a naive implementation is computationally expensive. He developed a “stateful” version of tanh where the function remembers recent signal history and adjusts its behaviour accordingly. This means the curve changes subtly based on what happened in the previous few samples. This makes the saturation respond more like analog hardware (where components have thermal memory) than a static mathematical function.

Figure 6. arctan(x) (red) and arccot(x) (green) waveshaping curves. Arctan has an S-shape similar to tanh but approaches its ceiling at a different rate, producing a slightly different harmonic balance – it is used in sinusoidal shaping approaches to limiting.

Soft clipping with polynomial curves uses quadratic or cubic functions to round the transition. Simpler to compute than tanh but with a less elegant harmonic profile. Many budget limiters use polynomial soft clipping.

Sinusoidal shaping uses sin(x) or arctan(x) for the clipping region. Arctan has a similar shape to tanh but generates a slightly different harmonic balance.
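
Illustrative versions of both shapers, normalised so they saturate smoothly (the cubic is the classic x - x^3/3 curve, which flattens at ±2/3; the scaling choices here are mine, not from any named product):

```python
import math

def cubic_soft_clip(x):
    """Classic cubic shaper x - x^3/3: near-linear for small x, flat at
    |x| >= 1 (output 2/3). Adds mainly low-order odd (3rd) harmonics."""
    if x >= 1.0:
        return 2.0 / 3.0
    if x <= -1.0:
        return -2.0 / 3.0
    return x - x ** 3 / 3.0

def arctan_clip(x, drive=1.0):
    """Arctan shaper, scaled so the ceiling is +/- 1. Similar S-shape to
    tanh, but with a slightly different harmonic balance."""
    return (2.0 / math.pi) * math.atan(drive * x)
```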

The choice of waveshaping function is one of the primary factors that determines whether a limiter sounds “transparent” or “coloured.” Developers spend significant effort tuning these curves and their transitions for the specific character they want.

11.3   Attack and Release in a Limiter

In a compressor, we know attack and release times are user-controlled creative parameters. But in a limiter, they are engineering challenges.

Attack

The attack of a limiter must be essentially instantaneous. The gain must be reduced to the required level before the transient reaches the output. This is achieved through look-ahead where the audio is buffered (typically 1-5 ms), and the sidechain analyzes the buffered audio to predict the required gain reduction. By the time the audio reaches the gain element, the gain has already been set.

The problem is that reducing gain instantaneously creates its own distortion. If the gain drops by 6 dB in a single sample, that’s a step change in the audio waveform – essentially a click. The attack must be smoothed over some minimum number of samples to avoid this artifact. But the more you smooth the attack, the more likely a fast transient will slip through before the gain reduction is fully applied.

This is the challenge. Fast attack catches everything but distorts; slow attack is clean but lets peaks go. Every limiter design is a different solution to this tension.
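
One standard solution to this tension can be sketched as follows. This is a generic textbook approach, not any specific product's algorithm: a sliding minimum over the look-ahead window anticipates the gain each upcoming peak needs, and a moving average of the same length converts the gain step into a ramp that completes just as the peak arrives. Offline we can simply index forward; in real time, that forward access is exactly what the look-ahead delay buffer provides.

```python
def lookahead_gain(samples, ceiling, L):
    """Per-sample gain for a brickwall limiter with L samples of look-ahead.
    Guarantees samples[i] * gain[i] never exceeds the ceiling, with no
    single-sample gain steps."""
    n = len(samples)
    # Gain needed at each sample to pin it at the ceiling (1.0 = no reduction).
    raw = [ceiling / abs(s) if abs(s) > ceiling else 1.0 for s in samples]
    # Sliding minimum over the next L samples: start reducing early.
    anticipated = [min(raw[i:i + L]) for i in range(n)]
    # L-sample moving average: a smooth ramp instead of a click-inducing step.
    smoothed = []
    for i in range(n):
        window = anticipated[max(0, i - L + 1):i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

sig = [0.0, 0.5, 2.0, 0.5, 0.0]
g = lookahead_gain(sig, 1.0, 2)
print([round(x * y, 3) for x, y in zip(sig, g)])  # peak held at 1.0, neighbours eased down
```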

The FabFilter Pro-L 2, Musik Hack Master Plan, and similar modern limiters offer multiple limiting “styles” that represent different attack strategies. A “transparent” style uses a longer, more gradual attack with aggressive look-ahead. An “aggressive” style uses a sharper attack that catches transients more completely but introduces more gain-change distortion. A “modern” style might combine look-ahead with soft clipping at the output to catch anything that leaks through.

Release

The release of a limiter is equally critical and often more audible than the attack. After a loud transient has been limited, the gain must return to unity (no reduction). If the release is too fast, the gain recovers between individual cycles of the waveform, producing severe harmonic distortion. This is because the limiter is essentially modulating the waveform shape at audio frequencies. If the release is too slow, the limiter stays “ducked” after the transient, suppressing the quieter material that follows and creating the dreaded “pumping” or “breathing” effect.

Most modern limiters use program-dependent release where the release time adapts automatically based on the signal content. Short transients get a fast release (so the gain recovers quickly before the next note). Sustained loud passages get a slower release (to avoid the gain pumping up and down on every cycle). Some limiters even use multiple release stages – a fast initial release followed by a slower final release, much like the two-stage release of optical compressors.

The Newfangled Audio Elevate limiter, designed by Dan Gillespie (a 15-year Eventide DSP veteran), takes release to an extreme. It independently adapts the attack, release, and look-ahead parameters for each of its 26 frequency bands. A kick drum transient in the low band gets a different release profile than a cymbal crash in the high band. This per-band adaptive approach is why Elevate avoids the artifacts that plague broadband limiters, where the low end dictates the release behaviour of the entire signal.

11.4   Limiting in Analog vs Digital

Analog Limiting

In the analog world, limiting is constrained by physics. The gain reduction element (FET, VCA, tube) has a physical response time. The sidechain has capacitors and resistors that define the attack and release constants. And there is no look-ahead because the circuit can only respond to what has already happened.

The 1176 in “all buttons in” mode is the closest thing to a limiter in the classic analog world, but it’s not a true brickwall limiter. Peaks can and do exceed the threshold. The Fairchild 670 can achieve very high ratios, but again, not infinity-to-one. True brickwall limiting in analog requires either very fast gain elements (like FETs) or clipping diodes at the output, which introduce their own distortion.

Analog tape also acts as a natural limiter. As the signal level approaches the tape’s saturation point, the magnetic particles can’t accept any more flux, and the signal is soft-clipped. This “tape limiting” generates predominantly third-harmonic distortion and has been a foundational part of the analog recording sound for decades. The gentle, progressive saturation of tape is one of the reasons analog masters from the 1960s-1980s could be pushed louder without sounding harsh.

Digital Limiting

Digital limiters can do things that are physically impossible in analog. Look-ahead eliminates the attack problem entirely. The limiter knows what the signal will do before it happens and can prepare accordingly. No analog circuit can do this.

Sample-accurate gain control means the gain can be adjusted at every single sample with mathematical precision. No analog gain element has this level of control. Oversampling allows the limiter to work at higher-than-native sample rates, reducing the aliasing generated by the waveshaping process. Multiband limiting (like Elevate’s 26-band approach) is impractical in analog because of the crossover phase issues, but digital linear-phase crossover filters make it viable. True Peak detection and limiting is impossible in analog because the concept of inter-sample peaks doesn’t exist in the continuous analog domain.

11.5   Transparent Limiting

A truly transparent limiter would control peaks without any audible change to the audio other than level reduction. In practice, this is extraordinarily difficult because limiting is, by definition, a nonlinear process, and nonlinear processes generate distortion.

The distortion from limiting comes from several sources. First, every time the gain changes, the shape of the waveform is modified; the faster and deeper the gain change, the more the waveform is altered. This is fundamental: you cannot change the gain of a signal without changing its waveshape, and changing the waveshape is distortion. Second, when the limiter reduces gain on a complex signal (with multiple frequency components), the gain reduction modulates all components simultaneously, generating sum and difference frequencies that weren’t in the original. Third, the FIR filters used in look-ahead processing can introduce pre-ringing – a faint reverse echo before transients. This is most audible on percussive material and is one reason some limiters offer adjustable look-ahead times. Finally, during the release phase, the time-varying gain modulates the signal, generating spectral content that varies with the release profile.

Developers take different approaches to minimise these artifacts. The AOM Invisible Limiter minimises the difference between the limited and original signals. The FabFilter Pro-L 2 offers eight different limiting algorithms, each representing a different balance of transparency, aggression, and transient handling. Elevate’s 26-band approach reduces intermodulation by limiting each frequency band independently.

As one plugin developer noted on KVR: “limiter is the one plugin category where the gap between ‘sounds fine in isolation’ and ‘holds up at the final stage’ is brutal.” The difference between a good limiter and a great one only becomes apparent at the very end of the mastering chain, where every artifact accumulates.

11.6   True Peak Limiting

What Are Inter-Sample Peaks?

Digital audio exists as a series of sample values: discrete measurements taken at regular intervals (44,100 or 48,000 times per second). Between these samples, the actual analog waveform that a DAC reconstructs can peak higher than any individual sample value. These are called inter-sample peaks (ISPs) or True Peaks.

Figure 7. True Peak vs digital samples. The dots are the discrete sample values a standard peak meter sees; the smooth curve is the continuous waveform a DAC reconstructs between them, which can exceed the sample values by up to 6 dB in worst-case scenarios.

Imagine two consecutive samples, both at -0.5 dBFS. The analog waveform reconstructed between them could peak at 0 dBFS or even higher, depending on the waveform shape. A standard sample-peak meter would read -0.5 dBFS and report no clipping. But when a DAC reconstructs this signal, the output clips.

Inter-sample peaks can exceed the sample values by up to 6 dB in theoretical worst cases. And if a broadcast delivery specification requires a True Peak maximum of -1.0 dBTP (as many streaming platforms do), an inter-sample peak just 0.5 dB over that ceiling is a delivery failure.

True Peak Meters

The ITU-R BS.1770 standard (now at revision 5) specifies the measurement method. The signal is upsampled by 4x using a specific FIR interpolation filter (with coefficients defined in the standard). At this higher sample rate, the previously “invisible” peaks between samples become visible as actual sample values. The absolute value of the upsampled signal is taken, and the result is the True Peak level.
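
The mechanism can be sketched in a few lines. One hedge up front: BS.1770 defines exact FIR interpolation coefficients, while this sketch substitutes a generic Hann-windowed sinc, so the numbers are illustrative rather than standard-compliant:

```python
import math

def true_peak(samples, oversample=4, taps=48):
    """Sketch of a BS.1770-style true-peak estimate: evaluate the
    reconstructed waveform at 4x the sample rate using windowed-sinc
    interpolation, then take the largest magnitude."""
    half = taps // 2
    n = len(samples)
    peak = 0.0
    for i in range(n * oversample):
        t = i / oversample                      # position in original-sample units
        acc = 0.0
        for k in range(int(t) - half, int(t) + half + 1):
            if 0 <= k < n:
                x = t - k
                if abs(x) > half:
                    continue
                sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
                hann = 0.5 * (1.0 + math.cos(math.pi * x / half))
                acc += samples[k] * sinc * hann
        peak = max(peak, abs(acc))
    return peak

# A sine at fs/4, sampled 45 degrees off its peaks: every sample reads
# 0.707 (-3 dBFS sample peak), but the waveform between them reaches ~1.0.
sig = [math.sin(math.pi * i / 2 + math.pi / 4) for i in range(64)]
print(max(abs(s) for s in sig))  # ≈ 0.707
print(true_peak(sig))            # ≈ 1.0, about 3 dB above the sample peak
```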

True Peak Limiters

A True Peak limiter operates at the upsampled rate. It upsamples the audio (typically 4x), performs the limiting at the higher rate where inter-sample peaks are visible as real samples, then downsamples back. Any peak that would have existed between the original samples is caught and limited at the upsampled rate.

FabFilter Pro-L 2 implements this as a dedicated post-limiting stage. The main limiter operates normally, then a True Peak limiting process catches any remaining ISPs. FabFilter’s documentation notes that combining 4x oversampling with a minimum 0.1 ms look-ahead keeps inter-sample peaks within 0.1 dB of the ceiling in most cases.

The additional processing for True Peak limiting introduces about 5 ms of extra latency. For offline mastering this is invisible (DAW delay compensation handles it). For real-time monitoring, it’s usually acceptable.

11.7   The State of the Art in 2026

Newfangled Audio Elevate

Dan Gillespie’s Elevate represents the most radical rethinking of limiter architecture in the past decade. Its core innovation is using 26 filter bands based on the Mel scale (which models the critical bands of human hearing) rather than the 3-6 bands of traditional multiband limiters. Each band has independently adaptive attack, release, and look-ahead parameters.

The philosophy is that a limiter sounds transparent when it reduces gain in only the specific frequency region that needs it, rather than pulling down the entire signal to control a peak in one band. When a kick drum peaks in the low end, only the low-frequency bands reduce gain while the mids and highs are unaffected. When a snare crack peaks in the upper mids, only those bands respond. The listener perceives preserved dynamics because the majority of the frequency spectrum is not being limited at any given moment.

Elevate also includes a spectral clipper at the end of its chain – a waveshaper that can be driven for additional loudness after the multiband limiting. The Shape control morphs between soft-knee saturation and hard-knee clipping, and the interaction between the multiband limiter (which preserves dynamics) and the spectral clipper (which adds loudness through harmonic saturation) is where much of Elevate’s distinctive sound comes from.

Musik Hack Master Plan

Master Plan, by Sam Fischmann, represents a different philosophy entirely. It’s not a precision mastering limiter for engineers who want granular control, but an integrated mastering chain for producers who want results fast. At its centre is a limiter/clipper circuit that combines soft clipping with look-ahead limiting, controlled by a single “Loud” knob that can deliver up to 24 dB of boost while keeping the output below ceiling.

What makes Master Plan interesting from a technical standpoint is the combination of approaches it uses under the hood. The limiter circuit blends clipping and limiting – two fundamentally different peak control strategies. Clipping (waveshaping) handles the fastest transients with zero latency by simply reshaping them, while the look-ahead limiter handles the sustained peaks that would distort under clipping alone. When TruePeak mode is engaged, it also catches inter-sample peaks so the output stays within dBTP delivery ceilings.

FabFilter Pro-L 2

The most widely used mastering limiter in current production, offering eight limiting styles from transparent to aggressive, full True Peak metering and limiting compliant with ITU-R BS.1770, and support for surround formats up to Dolby Atmos 7.1.2.

11.8   Film Post-Production

In film post-production, limiters serve two distinct purposes:

Safety limiting on the mix bus prevents the output from ever exceeding 0 dBFS (or -1 dBTP for True Peak compliance). This should be transparent so that the audience never hears it engage. Set the ceiling at -0.3 dBFS (for sample-peak safety) or -1.0 dBTP (for True Peak compliance with streaming and broadcast specifications). The limiter should catch only the rarest, most extreme peaks. If the limiter is engaging regularly, the mix needs level automation, not more limiting.

Dialog protection limiting prevents sudden shouts or unexpected transients from exceeding the dialog level range. A fast limiter with look-ahead on the dialog bus catches these before they hit the mix bus. This is a safety net much like the compressor, where the primary dynamic control should be volume automation, with the limiter catching what automation cannot.

For Dolby Atmos deliverables, True Peak limiting is mandatory. The Dolby specification requires -1.0 dBTP maximum. The limiter must be True Peak-aware and the metering must comply with ITU-R BS.1770.

The automation-first principle applies even more strongly to limiting than to compression. A well-automated mix that arrives at the limiter with controlled dynamics will sound dramatically better through any limiter than a poorly automated mix pushed hard into the best limiter ever made. The limiter is the last safety net, not the primary tool.

12   References

12.1   Academic Papers (AES E-Library)

  • Giannoulis, Massberg & Reiss (2012). “Digital Dynamic Range Compressor Design – A Tutorial and Analysis.” JAES 60(6).
  • Giannoulis, Massberg & Reiss (2013). “Parameter Automation in a Dynamic Range Compressor.” JAES.
  • Sierra (2023). “Is oversampling always the solution?” AES Express Paper.
  • Steinmetz & Reiss (2022). “Efficient neural networks for real-time modeling of analog dynamic range compression.” AES Convention Paper.
  • Gerat, Eichas & Zölzer (2017). “Virtual Analog Modeling of a UREI 1176LN.” AES Convention Paper.
  • Colonel & Reiss (2022). “Approximating Ballistics in a Differentiable Dynamic Range Compressor.”
  • Bromham, Moffat, Sheng & Fazekas (2022). “Measuring Audibility Threshold Levels for Attack and Release.”
  • Steinmetz, Bryan & Reiss (2022). “Style Transfer of Audio Effects.” JAES Open Access.
  • Singh, Bromham, Sheng & Fazekas (2021). “Intelligent Control Method for the DRC: A User Study.” JAES.
  • Sheng & Fazekas (2018). “Feature Selection for DRC Parameter Estimation.”
  • Campbell, Paterson & van der Linde (2017). “Listener Preferences for Alternative DRC Configurations.” JAES.
  • Brunet, Li & Kim (2023). “ML-Based Time Series Forecasting for Audio DRC.” AES Open Access.
  • Jot, Smith & Thompson (2015). “Dialog Control in Object-Based Audio Systems.”

12.2   Hardware

12.3   Plugins