Transcript from the "Frequency Analysis Overview" Lesson
>> So, that's a bit of how to use this API to create visualizations of waveforms, and even how to use the waveform data to start creating some really interesting visualizations based on frequency, if you're using the biquad filter. But now let's talk a little bit more about analysis of frequency, what it means to analyze the frequency, and we'll do it in a way that doesn't rely on the biquad filter.
[00:00:25] We're gonna do it in a way that's a little bit lower level, a little bit rougher, maybe a little bit more mathy, but hopefully it will start to break apart what is going on and how we can use this in different ways. So, remember how I was talking earlier about low and high frequency sounds.
[00:00:45] And showing this image here, where the waves look compressed or stretched out, like a slinky that you're pulling apart or pushing together. And that corresponds to the different tones that we're hearing. Now, if we were to take the time domain data, so time domain represents the graph that we've been talking about, that waveform graph.
[00:01:07] And we were to apply what's called a fast Fourier transform to it, which is just a mathematical concept. If we were to apply that mathematical idea to that time domain data, what we would end up getting is a frequency response kind of graph. And it's broken down like this in this graphic here, where on the left is the time domain data, that's the graph with time.
[00:01:34] And then each of these little things in the middle, those are the different bands of frequencies. So, the one that's closest to the time domain, that's a rather low frequency band, then in the middle it's a mid tone, sort of mid frequency, and on the end it's a high frequency, when the slinky is pushed together. And what this frequency graph is showing is the signal strength of each of these different bands of frequency.
[00:02:02] And so right now in this example we have three different bands of frequency, and we can see the blue is peaking at those three moments. But perhaps there'd be thousands of different bands of frequency if we're analyzing the audio with the Web Audio API. That's a little bit hard to wrap your head around, so I'm gonna show it in a couple of different ways.
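As a rough illustration of the time domain to frequency bands idea, here is a minimal sketch in TypeScript. It uses a naive discrete Fourier transform rather than a true fast Fourier transform (same result, just much slower), and the function name `dftMagnitudes` and the 64-sample test wave are assumptions made up for this example, not something from the lesson.

```typescript
// Naive discrete Fourier transform: O(n^2), for illustration only.
// A real FFT produces the same magnitudes far more efficiently.
function dftMagnitudes(samples: number[]): number[] {
  const n = samples.length;
  const half = Math.floor(n / 2); // bands up to (not including) Nyquist
  const magnitudes: number[] = [];
  for (let k = 0; k < half; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += samples[t] * Math.cos(angle);
      im -= samples[t] * Math.sin(angle);
    }
    magnitudes.push(Math.sqrt(re * re + im * im) / n);
  }
  return magnitudes;
}

// Hypothetical time domain data: a sine wave with exactly 4 cycles in 64 samples.
const sampleCount = 64;
const cycleCount = 4;
const testWave: number[] = [];
for (let t = 0; t < sampleCount; t++) {
  testWave.push(Math.sin((2 * Math.PI * cycleCount * t) / sampleCount));
}

// Find the band with the strongest signal.
const bandStrengths = dftMagnitudes(testWave);
let strongestBand = 0;
for (let k = 1; k < bandStrengths.length; k++) {
  if (bandStrengths[k] > bandStrengths[strongestBand]) strongestBand = k;
}
console.log(strongestBand); // 4, the band matching the wave's 4 cycles
```

A single pure tone lights up a single band; a real recording would spread its energy over many bands at once.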
[00:02:21] This is another way of visualizing the frequency spectrum. This is very common in audio workspaces when you're trying to analyze audio and starting to EQ it, to equalize it and shape it a little bit. It looks like this: on the left, on the vertical axis, we have signal, which is in decibels.
[00:02:40] Decibels is a relative unit. For the purpose of this course we're really just gonna be talking about decibel values from negative 30, which we can assume is rather loud, to negative 100, which is the lowest that we're really gonna deal with, and which we can assume to be rather quiet, pretty much silent.
[00:02:59] But these are very relative terms, and we're just gonna be using these values for the sake of today. You might have to tweak those values depending on your audio files, depending on the sort of setup you have and how loud your sources really are. Cuz they might not be within that range, they might be quieter or they might be louder.
[00:03:21] And then on the horizontal, we have the frequency, which typically goes from zero hertz. Hertz is a measure of cycles per second, basically. And it goes from zero hertz to, let's say, some very high numbers, like 10,000 or 20,000 hertz. The exact number that it goes to is actually called the Nyquist frequency, and that has to do with the sample rate of the audio context, which we typically don't control.
[00:03:54] When we create the audio context, it's usually set up with whatever sample rate the device wants, which might be 44,100, and the Nyquist is that divided by two, which is 22,050, so around 22K in terms of hertz. This graph just maxes out at 10K, but in Web Audio you can typically imagine that it's gonna be either 10 or 20K in terms of frequency.
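The Nyquist arithmetic is small enough to write down. This is a minimal sketch in TypeScript; `nyquistFrequency` is a hypothetical helper name, and in a browser the sample rate would typically be read from the audio context's `sampleRate` property rather than hard-coded.

```typescript
// The highest frequency a given sample rate can represent is half that rate.
function nyquistFrequency(sampleRate: number): number {
  return sampleRate / 2;
}

console.log(nyquistFrequency(44100)); // 22050
console.log(nyquistFrequency(48000)); // 24000
```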
[00:04:21] And when it reaches that point, it's so high pitched that we can no longer really hear it. So typically in an audio file you won't really need any signal that goes beyond these values. I think the human hearing maxes out at something like 20K, or something like that, I don't know the exact number.
[00:04:42] But once it reaches that point it just becomes so high pitched that we can't hear it anymore. And so typically Web Audio, and just audio in digital formats, doesn't really try to capture those frequencies. Another way to visualize this, instead of it being a graph with all of these many thousands of bands and a sort of detailed line going from zero to some large value.
[00:05:06] Imagine we were to split the frequency spectrum from zero to 22k, we would split it into discrete individual bands. And so, let's say we were to use these bands that I've defined here just arbitrarily like 0, 50, 200, 1,000, 5,000, 22,000. If you were to listen to a kick drum and pass that through a spectrum analyzer, or some sort of graph like this, it might end up with a result like this where the kick drum has these bass tones, really low frequency tones.
[00:05:37] And it's pumping the signal in that end of the graph, whereas a piano might be more around the mid-tones, and it might be shifting depending on which octave you're playing on the piano. Voice again is kind of mid-tones maybe higher mid-tones, maybe lower depending on the voice. And then something like treble, so chime and high pitched tones, those might be more in the thousands of hertz kind of range.
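To make the bucket idea concrete, here is a minimal sketch in TypeScript that uses the arbitrary band edges mentioned above (0, 50, 200, 1,000, 5,000, 22,000) and reports which band a frequency falls into. The helper name `bandIndexFor` and the example frequencies are assumptions for illustration only.

```typescript
// Band edges in hertz, taken from the arbitrary split described in the lesson.
const bandEdges: number[] = [0, 50, 200, 1000, 5000, 22000];

// Returns the index of the band containing frequencyHz, or -1 if out of range.
// Band i covers edges[i] (inclusive) up to edges[i + 1] (exclusive).
function bandIndexFor(frequencyHz: number, edges: number[]): number {
  for (let i = 0; i < edges.length - 1; i++) {
    if (frequencyHz >= edges[i] && frequencyHz < edges[i + 1]) {
      return i;
    }
  }
  return -1;
}

console.log(bandIndexFor(40, bandEdges)); // 0, the 0 to 50 Hz band
console.log(bandIndexFor(440, bandEdges)); // 2, the 200 to 1,000 Hz band
console.log(bandIndexFor(8000, bandEdges)); // 4, the 5,000 to 22,000 Hz band
```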
[00:06:04] And so, just with these graphs you can start to see how we can start to sort of split apart an audio file into different buckets. And maybe do some different visualizations based on those buckets. So, maybe we have something where the circles are only changing when the kick drum hits, but then we have some lines that are only changing when the piano hits.
[00:06:26] And maybe some text is moving around when the voice is happening. You can start to visualize things based on these different frequency bins. Here's how it looks in an array. We use the function getFloatFrequencyData, which is kind of like the time domain one, just now we're using frequency.
[00:06:44] We pass in a frequency array where the lowest index, the zero index here, represents zero hertz, so the lowest frequency. Whereas the highest index in the array represents the highest frequency we're trying to capture, which might be around 22k. And the actual value within each element of the array, as I was saying earlier, is gonna go, let's say, from around negative 100 to negative 30 decibels.
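The relationship between an array index and a frequency can be sketched as follows, assuming the bins are spread evenly from zero hertz up to the Nyquist frequency. The helper names `binToFrequency` and `frequencyToBin`, and the choice of 1,024 bins at a 44,100 sample rate, are hypothetical values picked for this example.

```typescript
// Approximate frequency (in hertz) at the start of a given bin.
function binToFrequency(index: number, binCount: number, sampleRate: number): number {
  const nyquist = sampleRate / 2;
  return (index * nyquist) / binCount;
}

// Nearest bin index for a frequency, clamped to the valid array range.
function frequencyToBin(frequencyHz: number, binCount: number, sampleRate: number): number {
  const nyquist = sampleRate / 2;
  const index = Math.round((frequencyHz / nyquist) * binCount);
  return Math.min(binCount - 1, Math.max(0, index));
}

console.log(binToFrequency(0, 1024, 44100)); // 0
console.log(binToFrequency(512, 1024, 44100)); // 11025
console.log(frequencyToBin(440, 1024, 44100)); // 20
```

Note that under this assumption the last index sits just below the Nyquist frequency rather than exactly on it.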
[00:07:13] Just as a little note, if you do have a different audio signal that's within a very specific range of decibels, or within a range that's not really captured here, you can change the minimum and maximum decibels of the analyser node, so that it's giving you back values that are a little bit more in line with the type of signal you're working with.
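Here is one way that adjustment might look, as a minimal sketch in TypeScript. The analyser node exposes `minDecibels` and `maxDecibels` properties; the `DecibelRange` interface and the `setDecibelRange` helper are hypothetical names introduced here so the example can run against a plain object as well as a real node. The ordering logic is there because the node rejects a minimum that is not below its current maximum.

```typescript
// Only the two properties this sketch touches; an analyser node has both.
interface DecibelRange {
  minDecibels: number;
  maxDecibels: number;
}

// Hypothetical helper: applies a new range without ever passing through
// a state where the minimum is greater than or equal to the maximum.
function setDecibelRange(target: DecibelRange, min: number, max: number): void {
  if (min >= max) {
    throw new RangeError("min must be lower than max");
  }
  if (min < target.maxDecibels) {
    target.minDecibels = min;
    target.maxDecibels = max;
  } else {
    target.maxDecibels = max;
    target.minDecibels = min;
  }
}

// Stand-in object using the default range described in the lesson.
const fakeAnalyser: DecibelRange = { minDecibels: -100, maxDecibels: -30 };
setDecibelRange(fakeAnalyser, -80, -10);
console.log(fakeAnalyser.minDecibels, fakeAnalyser.maxDecibels); // -80 -10
```

In a page, the same call would presumably take the node returned by the audio context's `createAnalyser()` method in place of `fakeAnalyser`.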
[00:07:35] And one other little way to visualize this, just trying to really bring this home: let's say you're trying to create a graphic where you scale circles based on the different frequencies, based on kick drum, voice, and maybe something like a cymbal crash. With the red band here, what I'm doing is taking some starting point and some ending point, and averaging the whole signal between those points.
[00:08:02] So for everything within that red band I'm taking the average signal and mapping that to the size of the red circle. And then doing the same for the blue band and the same for the green band. And that's the kind of code we're gonna be looking at here in the next demos.
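The averaging step might be sketched like this in TypeScript. This is not the code from the upcoming demos; the helper names, the bin ranges, the six-element sample array, and the maximum radius of 100 are all assumptions made up for the example. The decibel range of negative 100 to negative 30 is the one used throughout the lesson.

```typescript
// Average the decibel values in bins [startBin, endBin).
function averageDecibels(data: Float32Array, startBin: number, endBin: number): number {
  let sum = 0;
  let count = 0;
  for (let i = startBin; i < endBin && i < data.length; i++) {
    sum += data[i];
    count++;
  }
  // An empty band is treated as silence.
  return count > 0 ? sum / count : -Infinity;
}

// Map a decibel value into the 0..1 range, clamping anything outside it.
function decibelsToUnit(db: number, minDb: number, maxDb: number): number {
  const t = (db - minDb) / (maxDb - minDb);
  return Math.min(1, Math.max(0, t));
}

// Hypothetical frequency data, as it might come back from the analyser.
const frequencyData = new Float32Array([-100, -90, -50, -40, -30, -100]);
const maxRadius = 100;

// Red, blue, and green bands, each covering two bins in this tiny example.
const redRadius = decibelsToUnit(averageDecibels(frequencyData, 0, 2), -100, -30) * maxRadius;
const blueRadius = decibelsToUnit(averageDecibels(frequencyData, 2, 4), -100, -30) * maxRadius;
const greenRadius = decibelsToUnit(averageDecibels(frequencyData, 4, 6), -100, -30) * maxRadius;

console.log(redRadius, blueRadius, greenRadius);
```

With this data the red band is nearly silent, the blue band is fairly loud, and the green band sits halfway, so the three circles would end up at clearly different sizes.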
[00:08:19] I have a little tool which I've found quite useful. It's modeled on a filter that is in Ableton Live. It's called spectrum.search.sh, and what it does is visualize an audio spectrum using an analyser node from the Web Audio API. It gives us back the frequency graph for any audio file that we pass into it.
[00:08:43] So, for example, I'll just open up our source audio, and I'll just put in this piano track here, drag and drop it in. And it will play the audio, and show the graph as it plays.
[00:09:04] And so something here on the bottom left to note is that you can move your mouse around and see the frequency at where your mouse is. So, here on the left is low frequency sounds, and you can hear this piano track is not too high in low frequency sounds.
[00:09:20] Really, the signal is very high in this middle range, between 170 hertz and around 1.5 kilohertz. And you can actually hear, if you listen to the sound, as the piano notes are being hit, the person who's playing the piano is going up and down the keyboard.
[00:09:45] And you can see that signal is pumping in a way that you can actually sort of see the different frequencies.
[00:09:55] So, one of the interesting things with frequency that I think is kinda cool is that, if you put your mouse, let's say, on the 440 hertz here, this one here. This band here represents the note A. And so when we're talking about music, and talking about different notes in music like C, D, A, etc., those are just representations of frequency.
[00:10:23] And so A is represented, depending on the tuning and things like that, but generally we represent the A above middle C as 440 hertz. And so if you hear a constant 440 hertz signal, you're hearing something in the note of A, or in the pitch of A. And I just find that really interesting, that these notes we're labeling are really just frequencies reaching our ears.
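The link between note names and frequencies can be sketched with a small formula. This assumes twelve-tone equal temperament with the A above middle C (A4) tuned to 440 hertz, which is a common convention but not the only tuning; `noteFrequency` is a hypothetical helper name.

```typescript
// Frequency of a note a given number of semitones above (or below) A4,
// assuming equal temperament. Each semitone multiplies by 2^(1/12).
function noteFrequency(semitonesFromA4: number, a4Hz: number = 440): number {
  return a4Hz * Math.pow(2, semitonesFromA4 / 12);
}

console.log(noteFrequency(0)); // 440, the note A4
console.log(noteFrequency(12)); // 880, one octave up (A5)
console.log(noteFrequency(-9).toFixed(2)); // "261.63", middle C (C4)
```

Combined with a frequency to bin lookup, this kind of table is one way a visualization could respond to specific notes.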
[00:10:48] And based on that you can actually start to get some kind of accurate visualizations based on different notes that are being played, if you're able to capture that. One more thing here is that the vertical position of your mouse is giving you the decibels. So, you can actually even find out, let's say you want to visualize something within this band here.
[00:11:12] You might know that it's always going from a maximum of negative 30 decibels to a minimum of around negative 70 decibels, or maybe a little lower. And so you can just look within that range and shape the data that you're getting to be within that.