Conceptual Explanation of FIR Filters

As technology marches on, FIR loudspeaker presets are becoming increasingly common, and with good reason – they’re not just a marketing buzzword. FIR filters offer some real benefits in pro audio applications – and, like any other tool, they come with their own drawbacks and considerations. This post is adapted from a forum explanation I gave that attempts to describe how FIR filters work on a conceptual level. For a more technically rigorous explanation, I highly recommend this article from Eclipse Audio, developer of the excellent FIR Creator software. For a look at how manufacturers leverage the benefits of FIR for loudspeaker design, see this interview with my friend Sam Feine.

We start with a very important, foundational principle: time and frequency are inextricably linked. Mathematically, T = 1/f. That means that anything we can describe in terms of frequency, we can also describe in terms of time. Different frequencies take different lengths of time to go through a cycle. So we could say “hey, let’s talk about that signal that cycles 200 times per second.” That’s 200 Hz. We could also say “Hey, let’s talk about that signal that takes 5 milliseconds to complete a cycle.” That’s also 200 Hz. We’re just describing it in terms of the time domain.
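
If you want to sanity-check that arithmetic, here it is as a couple of lines of Python:

```python
# T = 1/f for the 200 Hz example above
f = 200.0          # frequency in cycles per second (Hz)
T = 1.0 / f        # period in seconds
print(T * 1000)    # 5.0 -> one cycle at 200 Hz takes 5 milliseconds
```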

These are two sides of the same coin – in fact, there’s a set of mathematical operations called transforms that take frequency data and spit out time data, or the other way around. This is how spectrum analyzers work – even the basic RTA app on your phone – it looks at the audio signal picked up by the microphone (level varying over time) and transforms that into the frequency domain, displaying the frequency content of that signal.
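
If you’d like to see that transform in action, here’s a minimal Python sketch of the same idea an RTA implements. I’m using numpy and a made-up 200 Hz test tone standing in for the microphone signal:

```python
import numpy as np

fs = 48000                                    # sample rate in Hz
t = np.arange(fs) / fs                        # one second of time values
signal = np.sin(2 * np.pi * 200 * t)          # a 200 Hz tone: level varying over time

spectrum = np.fft.rfft(signal)                # time domain -> frequency domain
freqs = np.fft.rfftfreq(len(signal), 1 / fs)  # frequency axis to go with it

print(freqs[np.argmax(np.abs(spectrum))])     # ~200.0 -> the energy shows up at 200 Hz
```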

So let’s say we design a device that changes the level of different frequencies in an audio signal, or, in other words – a filter. Perhaps our filter boosts the bass and reduces the highs.  By changing the frequency content of the signal, we are also changing how the signal behaves over time. It’s a mathematical certainty. This gives us a lot of flexibility and lets us do helpful and non-intuitive things in audio.

Think about being in a reverberant space like a church. Let’s say we want to evaluate or describe the acoustic properties of the space. You could go into the space and fire a gun, pop a balloon, or clap your hands (for many of us who work in live sound reinforcement, walking into a venue and clapping our hands to evaluate the reverberance is a matter of habit). This would send out a wave of energy (we call it an impulse to illustrate the fact that it’s a very brief, very loud signal), and then we could record the energy as it came back from bouncing off the walls, etc. Guns and hands and balloons are physical devices so there are limitations but let’s assume for the sake of argument that the impulse we produce is incredibly loud for a very short amount of time, and has a flat frequency response (contains all frequencies equally). So you might imagine creating an audio file that has every sample value set to 0, except a single sample value at 1 (Full Scale). In fact, you can open an audio editing program and create such a file. Here’s what it looks like viewed in Smaart’s Impulse Response mode:
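
You don’t even need the audio editor. Here’s a quick numpy sketch of that same file: all zeros except a single sample at Full Scale:

```python
import numpy as np

fs = 48000
impulse = np.zeros(fs)     # one second of digital silence...
impulse[0] = 1.0           # ...except a single sample at Full Scale

spectrum = np.abs(np.fft.rfft(impulse))
print(spectrum.min(), spectrum.max())   # 1.0 1.0 -> every frequency at the same level: flat
```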

In the middle pane you can see that all the energy is concentrated around a single moment. In the bottom pane you can see the flat frequency response. Interestingly enough, those two statements are joined at the hip: if one is true, the other must be too. If the energy were spread out over a longer time period, the frequency response could no longer be flat. Likewise, if the frequency response is not flat, the impulse cannot be infinitely short.
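
To see that trade-off for yourself, here’s the same sketch with the energy smeared across four samples instead of one (the values 4 and 0.25 are arbitrary choices for the demo). The magnitude response immediately stops being flat:

```python
import numpy as np

spread = np.zeros(48000)
spread[0:4] = 0.25                 # the energy now smeared across 4 samples

mag = np.abs(np.fft.rfft(spread))
print(mag.max(), mag.min())        # 1.0 at DC, ~0 near 12 kHz -> no longer flat
```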

Why? Remember that we’re looking at the same audio file in both plots here – we’re just displaying the data in different ways. Put another way – if we change the frequency content of a signal, we change its waveform, and the converse is also true. If you’re stuck on this, here’s a ProSoundWeb article I wrote a while ago that explores this concept in more detail.

Pull The Trigger

So – back to measuring the acoustic behavior of our hypothetical venue. Let’s create an impulse in the space (firing a starting pistol, clapping our hands, or popping a balloon are all reasonable approximations for our purposes here). We’ll set up a measurement microphone in the space, and record the result. As the energy bounces around the room and arrives back at the microphone, our recording will capture the level over time, and that tells us something about the space. Think echolocation, or a Batman gadget. This is called an Impulse Response and it shows us the acoustic properties of the space. (Or, more generally, an impulse response is exactly what it sounds like: it describes a system’s response to an impulse input.)

In the early days of studying room impulse responses, this data was recorded using a machine that trailed a pen over a scrolling roll of paper. Thanks to some cool mathematical tricks, modern audio analyzers can actually use a variety of different test signals (most commonly sweep tones and specially-derived pink noise) to produce the same resulting IR recording. So we don’t have to create the impulse sound in the space directly – the analyzer can produce the IR using other acquisition methods, which basically means the resulting impulse response file is “here’s the result you would get if you did play a pure impulse in this space.” If you play back the resulting IR (usually a .wav file), it sounds exactly like you might think: a snap or a pop with a reverb tail.
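
I won’t dig into the math of those acquisition methods here, but if you want a rough feel for how an analyzer can turn a sweep recording back into an IR, here’s a toy Python sketch. The "room" is faked with a single reflection, and the simple spectral-division deconvolution shown is only a conceptual stand-in for what real analyzers do:

```python
import numpy as np
from scipy.signal import chirp, fftconvolve

fs = 48000
t = np.arange(fs) / fs
sweep = chirp(t, f0=20, t1=1.0, f1=20000, method="logarithmic")   # 1-second sweep tone

# Fake "room": a direct arrival plus one reflection 20 ms later at half level
room = np.zeros(2400)
room[0] = 1.0
room[960] = 0.5
recorded = fftconvolve(sweep, room)      # what the measurement mic would capture

# Deconvolve: divide the spectra (with a touch of regularization to keep it stable)
n = len(recorded)
S = np.fft.rfft(sweep, n)
R = np.fft.rfft(recorded, n)
ir = np.fft.irfft(R * np.conj(S) / (np.abs(S) ** 2 + 1e-6), n)

print(np.argmax(np.abs(ir)))               # 0   -> the direct arrival
print(np.argmax(np.abs(ir[100:])) + 100)   # 960 -> the reflection, 20 ms later
```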

Acoustic measurement software can then analyze this impulse response data in multiple ways depending on what we’re trying to study. The top pane shows a linear view – this is level (vertical axis) over time (horizontal axis), and is basically the same thing you would see if you opened the IR file in a DAW or other audio editing software – viewing the waveform directly.

The middle pane is the same idea, only the vertical scale is in dB (logarithmic display) which makes low-level signal and decay behavior a lot easier to see. The bottom pane is a spectrogram rendering, with time moving left to right, frequency from low to high, and color representing level. We can see that this venue has longer reverberation at lower frequencies than at higher frequencies, which is pretty typical.
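
If you like poking at this kind of data in code, here’s a small scipy sketch of those three views. The IR here is just made-up noise with a decaying envelope, standing in for a real measurement:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 48000
rng = np.random.default_rng(0)
ir = rng.normal(size=fs) * np.exp(-np.arange(fs) / (0.3 * fs))   # noise with a decaying envelope

linear_view = ir                               # level over time, like the waveform in a DAW
db_view = 20 * np.log10(np.abs(ir) + 1e-12)    # same data on a dB scale: decay is easier to see
f, t, level = spectrogram(ir, fs=fs)           # time vs. frequency vs. level (color)
```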

Now let’s look at a different room Impulse Response so I can point something out.

Again, time runs left to right. You can see the initial impulse at around 18 ms, followed by three distinct secondary arrivals (the following peaks). Those are reflections off walls. (At this point, then, you may be wondering about the previous IR we looked at, above, and why it shows a peak before the main impulse. That “pre-peak” is the result of using a sweep test signal to measure the IR of a system that is exhibiting harmonic distortion – which is an interesting topic for another time. πŸ˜‰)

Now that we’ve captured the reverberant decay of the space, we can use it as a mixing tool. If we wanted to, we could get a good, clean, “dry” recording in our home recording studio, and then combine that recording with the impulse response of the desired acoustic environment. Technically speaking, this process of combining a signal with an impulse response (a running multiply-and-sum) is called convolution, and it’s how some reverb plugins work.
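
Here’s what that looks like as a minimal Python sketch using scipy. The file names "dry.wav" and "church_ir.wav" are placeholders, and I’m assuming mono, 16-bit files with matching sample rates to keep it short:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

fs_dry, dry = wavfile.read("dry.wav")        # clean studio recording (placeholder name)
fs_ir, ir = wavfile.read("church_ir.wav")    # measured IR of the space (placeholder name)
assert fs_dry == fs_ir                       # this sketch assumes matching sample rates (and mono files)

wet = fftconvolve(dry.astype(float), ir.astype(float))   # convolve the dry signal with the IR
wet /= np.max(np.abs(wet))                                # normalize so we don't clip
wavfile.write("wet.wav", fs_dry, (wet * 32767).astype(np.int16))
```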

But for now, let’s revisit that first IR measurement we looked at – and I’ll swap the bottom spectrograph pane into a frequency response plot.

Here’s the key takeaway: these are both different views of the same data. Mathematically speaking, if you give us the impulse response of a room, filter, or system, we can generate the frequency response. That goes the other way, too: we can take frequency response data and mathematically produce an impulse response that contains the same information. We’re just transforming the data into a different format. (That’s the Transform bit of Fast Fourier Transform.)
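
In code, that round trip is literally one function call in each direction. Here’s a tiny numpy sketch using the pure impulse from earlier:

```python
import numpy as np

ir = np.zeros(512)
ir[0] = 1.0                              # the pure impulse again

freq_response = np.fft.rfft(ir)          # impulse response -> frequency response
ir_again = np.fft.irfft(freq_response)   # frequency response -> impulse response

print(np.allclose(ir, ir_again))         # True: same information, different format
```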

Let’s Talk About Tech, Baby

So now that we have a little more context, let’s talk about filters. Most of us are pretty familiar with EQ filters, so let’s think of those as an example. EQ filters are used to change the frequency response of an audio signal. Thanks to what we discussed above, we know that, since the filter has a frequency response, it also has an impulse response. If we have the frequency response of a filter, we can determine its impulse response as well (that’s the transform bit again).

Remember the first image from this post – the pure impulse with the flat frequency response? Let’s compare that to a filter. I used a bandpass filter (a filter that passes energy in the midband and rejects the low and high frequency components of a signal).

The bottom pane shows the frequency response of the bandpass filter – rolling off below 500 Hz and above 2 kHz – and the top pane shows the impulse response of that filter. Notice how the energy is no longer concentrated into a single clean peak. It can’t be, because we’ve created a filter that treats energy differently at different frequencies, which is the same thing as saying it treats that energy differently in the time domain as well. That ripple and time smear in the IR isn’t a side effect or result of the filter’s response, it IS the filter’s response – just viewed in the time domain.
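
If you want to recreate that comparison yourself, here’s a quick scipy sketch. I’m using a garden-variety Butterworth bandpass as a stand-in for the filter in the screenshot (an assumption on my part), but the point about the IR is the same:

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 48000
b, a = butter(4, [500, 2000], btype="bandpass", fs=fs)   # rolls off below 500 Hz and above 2 kHz

impulse = np.zeros(4096)
impulse[0] = 1.0
ir = lfilter(b, a, impulse)          # the filter's response to a pure impulse

# The energy is no longer one clean peak: it rings and smears over time, because
# that time behavior IS the bandpass response, viewed in the time domain.
print(np.count_nonzero(np.abs(ir) > 1e-4))   # many samples carry energy, not just one
```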

Ones and Zeros

While we are used to thinking of filters in the frequency domain, an FIR filter can be thought of as the time-domain (impulse response) representation of the same filter. An FIR filter is literally nothing but a list of numbers between -1 and 1. Each number represents what the output of the filter should be at that sample: 0 = no output here, 1 = max level output here. If we made a 9-tap (9-sample) filter of a perfect impulse (think about that .wav file we started off with), it would look like this:

100000000

Or

000010000

An FIR filter simply multiplies every single sample of the incoming audio signal by these filter values. Both of the examples above would just reproduce the original input signal, with no change in level. They would behave differently in time: the first example has the impulse starting directly on the first sample, while the second one has it halfway through, producing 4 samples of delay.
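
Here’s a tiny numpy sketch of those two filters in action, with a made-up four-sample input signal:

```python
import numpy as np

fir_a = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=float)   # impulse on the first tap
fir_b = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0], dtype=float)   # impulse on the middle tap

x = np.array([0.5, -0.25, 0.75, 0.1])    # a made-up four-sample input signal

print(np.convolve(x, fir_a)[:4])   # [ 0.5  -0.25  0.75  0.1 ] -> unchanged, no delay
print(np.convolve(x, fir_b)[:8])   # four zeros, then the same signal -> 4 samples of delay
```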

So if you move from left to right along the filter’s IR and describe the magnitude at each point as a value between -1 and 1, that’s exactly what the FIR filter actually is. If you look directly at the filter’s sample values in a text editor, that’s what you’ll see.

So, conceptually, an FIR filter is built by creating the desired frequency response, then transforming that into the time domain to end up with a string of sample values that describe the impulse response. The incoming audio stream is then convolved with these values (called coefficients, or taps): every output sample is a weighted sum of recent input samples, which produces the desired change in the signal. This is a mathematically demanding process, so FIR generally requires a lot more computational resources than a “regular” (traditional IIR) filter. So why should we use it?
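
To make that concrete, here’s a naive Python sketch of the per-sample math an FIR filter performs. Real DSP hardware does the same thing (or an FFT-based equivalent), just enormously faster:

```python
import numpy as np

def fir_filter(x, coeffs):
    """Direct-form FIR: each output sample is a weighted sum of recent input samples."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                y[n] += c * x[n - k]   # one multiply-add per tap, per output sample
    return y

# Longer coefficient lists mean more multiply-adds per sample, which is why long
# FIR filters need more processing power than a handful of IIR biquads.
```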

Two main reasons. First, we can create and implement a complex frequency response with a single filter, where creating the same curve with IIR filters might take a bunch of different parametric filters, shelves, etc. combined into the target response. So if you’re doing something like designing filters that will live inside loudspeakers to correct their response, a single FIR makes it much easier to get the results you need. Loudspeaker DSP engineer Sam Feine discusses this process in the article I linked at the top of this post.

Second, we can create filters that affect magnitude and phase independently of each other if we want. Yes, they are inextricably linked, so it’s not magic. We pay the price in delay. That’s the difference between the two 9-tap impulse responses from above. The one that doesn’t start its output until halfway through will take some more time to produce the output (because of those four initial 0-value samples), but that delay allows us to manipulate the phase and magnitude independently within certain bounds.

If you look again at the FIR filter’s impulse response above, notice that the peak is followed by a ringing / smearing of energy that makes up the filter’s response. Such is the nature of things – we can’t have a filter that creates output before it receives input – that would violate causality and probably explode the universe πŸ˜… (if you figure out how to get this to work, call me and we’ll go to Vegas). But we can have a filter with a bit of delay (a bunch of 0-value samples) before the IR peak – and that delay gives us room to manipulate magnitude and phase in a way that lets the IR smear forward, ahead of the main peak. For example, here’s the same bandpass filter response from above, only with linear-phase filtering.

The magnitude response is the same, but the wiggles in the IR now extend in both directions – forward and backwards from the main peak. Because of this, the main peak arrival can’t be at the beginning of the filter’s value list – we have to push it back to allow room for some non-zero sample values before the peak happens. This is at the heart of why linear-phase FIR filters cause delay. It’s not “latency” – it’s part of the filter’s response before we get to the IR peak.
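
Here’s a small scipy sketch of that idea: a linear-phase version of the same 500 Hz to 2 kHz bandpass, designed with the window method. The 1023-tap length is an arbitrary choice for the demo:

```python
import numpy as np
from scipy.signal import firwin

fs = 48000
taps = firwin(1023, [500, 2000], pass_zero=False, fs=fs)   # 1023-tap linear-phase bandpass

print(np.allclose(taps, taps[::-1]))     # True: the IR is symmetric around its center
print((len(taps) - 1) / 2 / fs * 1000)   # ~10.6 -> ms of delay to the centered peak
```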

How much delay is required depends on the type of response we’re trying to implement, and what frequency ranges we want our filter to affect. In general, extending the filter’s response to lower frequencies requires more time (because T = 1/f). To dig deeper into this concept, here’s the first in a series of articles by Pat Brown. But this is the practical limitation that prevents FIR from being more widely used in live sound applications. We need things to happen quickly. To affect lower frequencies with the filter, we have to remember that those frequencies take longer to happen (longer cycle time) and so we need to use longer and longer filters (more samples). That means more math, more system resources, and if we’re looking to go after phase separately, more delay through the filter.

So the industry seems to have pretty much settled on a compromise where we are happy to tolerate a handful of milliseconds of delay in order to linearize the phase of our loudspeakers throughout the majority of the audible bandwidth – say, from 350 Hz on up. Below that point, we quickly get into diminishing returns, as every time we want to extend our filter resolution down an octave, we have to double the filter length.

As a closing example, here’s the same linear-phase bandpass filter as before, with the bottom pane showing the Log view of the response, which better shows us the low-level details that aren’t visible in the Linear view. We can see that the filter’s IR occupies more than 15 milliseconds worth of time, and so we need an FIR filter that is long enough to accommodate that (and we typically also end up using something called a data window to “fade down” the tails to 0 towards the ends of the filter to avoid various undesirable effects).

At 48 kHz, we might use a 1024-tap filter (1024 samples) which is 21.3 ms long, and since the IR peak is centered, will produce 512 samples (10.7ms) of delay. That’s probably too much for a typical live sound application – but if we start to use shorter filter lengths, we’ll have to start truncating the “skirts” of the IR, which causes error in the frequency response.
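
For reference, here’s that arithmetic spelled out:

```python
fs = 48000
taps = 1024

print(taps / fs * 1000)         # ~21.3 -> filter length in ms
print((taps / 2) / fs * 1000)   # ~10.7 -> delay in ms, with the IR peak centered
```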

If you’d like to explore these concepts yourself, two common choices for generating custom FIR filters are FIR Creator from Eclipse Audio, which offers a free demo version, and FilterHose from HX Audio Lab, which offers a free LT version.
