The Tech of Knee-Deep in Tech, 2020s edition - part 3
In part 1 of this blog series I outlined the gear we use to record the podcast. The second part was all about actually recording content. It’s now time to dive into the third and most technical part – post-processing.
select * from foo into bar
As a quick recap, you might remember that the starting point for this step is the raw audio files. I will typically have one file per host plus the recording of the Teams meeting. Let’s start with Teams, as we need a way to extract the audio feed from the video so I can use it for lining up the other audio files. The Teams recording results in an MP4 file, and this file has several data streams. One of them is the audio, and using a tool called ffmpeg, this stream can be extracted to a format of your choosing. On the command line I do this:
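# the input filename here is an example – the flags are explained below
ffmpeg -i TeamsChatKDiT104.mp4 -vn -acodec pcm_s16le -ar 44100 -ac 2 TeamsChatKDiT104.wav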
FFmpeg has a ton of different flags; more about them on FFmpeg’s documentation page. The flags I use do the following:
- -i specifies the input file – the MP4 we got from Teams.
- -vn means skip the video.
- -acodec pcm_s16le sets the codec, i.e. the kind of encoding for the audio we’re trying to yank out (more from Wikipedia).
- -ar 44100 sets the sample rate (the exact same one we all use when recording).
- -ac 2 means two channels.
- Finally, I specify the output filename TeamsChatKDiT104.wav.
Using ffmpeg with the settings above gives me a WAV file of the exact same type and quality as the raw audio files of the hosts. Time to go to work!
Aligning the audio tracks and doing basic edits
I open up Audacity and throw all the files in there. This is what it looks like in Audacity before I do anything. As you can see, the audio is not aligned at all – and this is what I’d be left with if I didn’t have the Teams track as a kind of “metronome”. The only file where all the claps are recorded is the Teams audio track at the bottom.
After hunting around in it a bit, I find the first claps:
…and then I just need to line this up with the audio tracks, like so:
Now I can discard the Teams track, as that audio is not useful for anything else.
Time for the least fun bit – editing out all the “ummms…”, clicks, unwanted sounds, overly long silences and such. This is the part that takes the longest, as I have to listen through the whole episode to catch them. The more obvious noises I might be able to spot in the waveforms in Audacity, but I don’t trust myself to always do so.
Dealing with noise
Noise is everywhere. Noise can be anything from something as obvious as people talking loudly nearby or the drone of an air conditioning unit, to ground hum picked up from AC wiring. Some noise can be minimized, some can be almost completely removed. As always, the key is to start with as clean a signal as possible, as there is only so much one can do in post-production. When it comes to handling noise on recordings from our respective home offices, things are fairly straightforward. I know the characteristics of Simon’s and Toni’s microphones. I know for a fact that Simon’s microphone and amplifier will produce a cleaner signal than Toni’s microphone. Armed with that knowledge, I am able to handle noise reduction in a way that is best suited to the different input signals.
For starters, I will use the first couple of seconds of dead air to give Audacity’s noise reduction effect something to chew on. I select the first few seconds of quiet, go to the effect and click “Get Noise Profile”. This “teaches” the effect what to listen for. Then I select the whole track and go to noise reduction again, this time applying these settings and pressing OK:
I won’t go into detail about what all these settings mean, but I’ve found them to work for our equipment. Your mileage may vary. Repeat this step for all the host tracks. It is vital to “train” the plug-in (using Get Noise Profile) on the specific track it is to work with, to ensure the best noise reduction possible.
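If you’d rather do this from the command line, ffmpeg ships a broadband denoiser called afftdn that can do a similar job. A minimal sketch – the filename is a placeholder and the values are illustrative, not equivalents of my Audacity settings:

# afftdn is ffmpeg’s FFT-based denoiser: nr is the noise reduction in dB,
# nf the assumed noise floor in dB – tweak both by ear
ffmpeg -i host1.wav -af "afftdn=nr=12:nf=-50" host1-denoised.wav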
Post-production on audio we’ve captured in the field can be very different. If the noise in that recording includes a faint murmur of someone talking in the background, it is impossible for the noise reduction effect to know which voice to remove and which to keep. In essence, trying the same noise reduction trick as above can lead to very garbled and strange-sounding audio. There is no perfect solution to this conundrum – you just have to experiment a bit to find a setting that gives reasonable reduction of unwanted noise while keeping the recording as clear and natural-sounding as possible. In some ways, the faint background din is part of the charm of an on-site recording.
Compression and dynamics
Time for the slightly more difficult parts of post-processing. Most people have some variation in loudness when they speak, especially over an extended time (like, for instance, the 30 minutes of an episode). This variation in loudness is generally not desirable. What we often wish to do is amplify the quieter parts and bring down the louder parts to create a consistent loudness all through the episode. There are several ways to accomplish this: one is compression, another is limiting, and a third is automatic gain control. If you’re unreasonably interested in how this actually works, take a look at Wikipedia here. In short, I use a compressor both to make the audio more uniform and to improve the general dynamics of the sound. However, I was never quite satisfied with the results from the built-in compressor.
Then I was pointed towards a third-party plug-in called “Chris’ Dynamic Compressor”, which did wonders. The Audacity Podcast has a great set of starting parameters, and I found them to suit my needs perfectly.
I select the track(s) I want to apply the compressor to, set the parameters to the following and press OK:
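If you’re curious what this looks like outside Audacity, ffmpeg’s acompressor filter does broadly the same job. A rough sketch – the filename is a placeholder and the values are illustrative, not the Chris’ Dynamic Compressor settings from the screenshot:

# acompressor: threshold is linear (0.1 is roughly -20 dB), attack/release
# are in milliseconds, makeup is linear gain – all values illustrative only
ffmpeg -i host1.wav -af "acompressor=threshold=0.1:ratio=3:attack=20:release=250:makeup=2" host1-compressed.wav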
Equalization
It’s now time to tackle unwanted frequencies. Again, I won’t go into much detail, as others have written about this in a much better way than I can (for instance the FeedPress blog here). I’ve created an equalization (EQ) curve in the equalizer plug-in that looks like this:
It’s somewhat hard to tell, but the bass roll-off sits between 60 Hz and 100 Hz (a high-pass filter that cuts out the low frequencies we want to avoid). Human hearing generally tops out around 15 kHz, so that’s where I’m rolling off the top end, as I don’t need that frequency range either.
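The same general shape can be approximated on the command line with ffmpeg’s highpass and lowpass filters. A sketch with assumed corner frequencies in the ranges mentioned above – the exact curve lives in the screenshot:

# high-pass at 80 Hz cuts rumble, low-pass at 15 kHz rolls off the top end
ffmpeg -i host1.wav -af "highpass=f=80,lowpass=f=15000" host1-eq.wav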
This step can sometimes amplify and bring out noise that previously was hard to detect. I’ve found that Toni’s microphone has a tendency to be a bit noisy after this step, so in his case I throw in an extra noise reduction step just to clean that track even more.
Finally, I cut out the dead air from the beginning of the recording that we used for the noise profile, as it won’t be needed anymore.
Loudness
The newest step in the chain came about when I grew tired of having episodes with varying volume. Technically they vary in loudness, but the end result is that our listeners can’t listen to two episodes back-to-back without having to fiddle with the volume knob. That had to go. Unfortunately, this is a pretty involved problem to crack, but here’s how I do it.
The free Audacity plug-in called “Youlean Loudness Meter” can help me figure out the loudness per track. I select the track I want to check, bring up Youlean and click “apply”. The plug-in will have a quick (silent) listen through the track and give me the integrated loudness of the track (indicated in blue in the picture below).
My target is -19 LUFS/LKFS for mono tracks. What the heck is LUFS or LKFS? Look here for an explanation of LUFS/LKFS for podcasters. Don’t worry if it says “stereo” down in the left-hand corner of the window – it is only displaying the loudness of the track I’ve selected. To get to -19 LKFS from -17.7 LKFS I use Audacity’s amplify plug-in. The thing with the amplify plug-in is that it can take both positive and negative values. In this case we’re going for a negative value, as we want to decrease the loudness of the track from -17.7 to -19. According to plain old math, 19 - 17.7 = 1.3, so we’ll put in -1.3. Finally, we check the results using Youlean one more time to make sure we’re around -19 integrated (i.e. over time, not momentary).
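The same measure-then-adjust dance can be done with ffmpeg if you prefer the command line. A sketch with placeholder filenames – loudnorm in measurement mode prints the integrated loudness, and the volume filter applies the plain gain change:

# measurement pass: prints "Input Integrated" in LUFS without writing a file
ffmpeg -i host1.wav -af loudnorm=print_format=summary -f null -
# apply the computed gain, e.g. -19 - (-17.7) = -1.3 dB
ffmpeg -i host1.wav -af "volume=-1.3dB" host1-adjusted.wav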
Since that was so much fun, we get to do it again for all the other tracks!
The end result is a set of mono tracks, all normalized to -19 LKFS/LUFS – this is both the unofficial “podcast standard” and a consistent number I can use going forward.
Mix, render and intro music
With most of the post-processing flow done, it’s time to mix and render the audio streams into one mono stream. I then add the intro music file (already normalized to -19 LUFS as above). After shifting things into place and mixing and rendering again, I’m left with a stereo file almost ready for public consumption.
The final step is to make sure the stereo track is normalized to -16 LKFS/LUFS using Youlean. Why -16? Didn’t we jump through all those hoops to get to -19? Yes we did, but that was for mono, remember? This is stereo, and a mono signal played back over two channels comes out roughly 3 LU louder, so -16 in stereo equals -19 in mono. Deal with it.
With all the post-processing done I export the results as an MP3 file using the following settings:
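If you’d rather export from the command line, ffmpeg’s libmp3lame encoder can do the same job. A sketch with an assumed bitrate – the actual settings I use are in the screenshot:

# libmp3lame MP3 export – 128 kbps is a common podcast choice, not
# necessarily the exact setting from the screenshot above
ffmpeg -i episode-final.wav -codec:a libmp3lame -b:a 128k episode.mp3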
Summary
Getting the audio to where I want it involves a lot of work. The final episode of 2019 is probably the one with the worst audio – ever. Audacity decided to record my Skype headset instead of my ProCaster, and I only realized it when we were done. That episode boils my blood to this day, and I vowed never to inflict such a crappy episode on my listeners again.
The steps I’ve gone through are the following:
- Extract audio from the Teams recording
- Use that file to line up the other files
- Edit for silences, clicks, “umm…”, coughs, breathing, etc.
- First round of noise reduction
- Chris’ Dynamic Compressor
- Equalization
- Second round of noise reduction (if needed)
- Loudness adjustment per track using Youlean
- Mix, render and add intro music
- Final check with Youlean
- Export to MP3
The plug-ins I use, apart from what comes with Audacity, are:
- Chris’ Dynamic Compressor
- Youlean Loudness Meter
Phew. With that out of the way, the fourth and final part of this blog series will deal with getting the episode and word out!