Web Audio Synthesis & Visualization Visualizing a Waveform
Transcript from the "Visualizing a Waveform" Lesson
>> So let's jump into some of the code for building out these Waveforms with the Web Audio API. So in the demos here, if you scroll down, you'll see the waveform demo. [SOUND] This is showing the waveform of whatever is playing through the audio context. But if I layer it up, it gets a little bit bigger.
[00:00:23] And I'm just gonna open that up locally here. And actually, if I stretch the browser, you'll see, as this is happening, it's really just a line. May be hard to see. But the wider your browser is the more you'll start to see that it's just a line just like the graph that I've been showing on the slides.
[00:01:06] So within this demo, we have a few things that are very similar to the buffer example that we've shown earlier and a few additional things. So we have the gain node, as I showed, we create a Gain node. This is on top of the usual loading the buffer and then playing the buffer after it's been loaded.
[00:01:30] But there's another sort of set of things that we're doing here that's a little different. We first create an Analyser node, that's this one here. And this Analyser node is what's gonna allow us to actually take the signal that's coming into the node and figure out what the waveform is of that signal.
[00:01:53] And it doesn't need to be an MP3. It could be really whatever node is coming into it. So maybe that's it. The microphone or maybe that's from a video that's playing in the browser or something else like that. And so we do audioContext.createanalyser. And then the next thing we need to do is set up an array that will store all of that data that I was showing in the examples in the slides.
[00:02:16] And it's gonna be a Float32Array. And here we pass in the property called fftSize. FFT is something we'll talk about in a bit. It stands for fast Fourier transform. And all you need to know for now is that this variable here is something around 2048 or 1024 values.
[00:02:40] And we're gonna be able to change that in a bit. But for now, we just leave it as it is. It will know what to use is the best, depending on your browser, and things like that, depending on the API. And we have to use Float32 for this because I find that it's sort of the future facing API for analyzing the audio.
[00:03:04] But in some browsers, you might be forced to use the older byte data API. So if the browser is not new enough, it might not support float data for audio analysis, in which case you have to use, instead Uint8Array. And then you'll have to also change your functions.
[00:03:24] Instead of it being get float time domain data, you have to use get byte time domain data. But just for now, I'm gonna stick with Float32Array, that'll give us the highest precision and the most detail. And then once we create our analyser node and get our data, we have to make sure that we connect all of our nodes.
[00:03:45] So remember what I was saying earlier, if you create an audio context node, an audio node, if you create it without connecting it to anything or without connecting anything into it, it's not really gonna do anything. It's just floating in the middle of nowhere not doing anything. So we first have to connect our gain node, which is our master gain for the audio source.
[00:04:08] We have to connect that into the analyser. And the way it looks is just like this, connecting the gain into the analyser. And then we also wanna connect the gain node into the destination. So it's splitting off there. One of them is going to the speaker and one of them is going to the analyser.
[00:04:32] And then here, as we scroll down, when we're playing the sound from the audio buffer, we're just connecting it straight to the gain node, connecting it to that master gain, which is gonna split off to the speaker and the analyser. And then here we get into the actual analysis.
[00:04:50] So I'm gonna try and I'm just gonna copy paste this into a new tab just to make sure I don't lose anything. Whoops, and I'll start again from the ground up. So in our draw function, as we're playing this audio, we wanna start to visualize it. And the way we're gonna do that is just by drawing a graph.
[00:05:14] And so the first step is to use that API function I was mentioning, analyserNode.getFloatTimeDomainData. And it looks like that. And we just pass in our analyserData here. That's the array that's gonna receive all of this data. And we're gonna reuse the same array every frame. And now the next thing we're gonna do is loop through that array.
[00:05:50] And for each element in the array, we're gonna draw a line, a vertex in a line, polyline kind of shape. And to do that, we're gonna start with beginShape. And we're also gonna have endShape. And just note here that with p5.js, if you're using beginShape, endShape and then you create some vertices in between, you created a shape, which might be a polygon.
[00:06:17] And, in this case, we don't want to create like a filled polygon. So we say noFill. And we set the stroke to a certain color, just to make sure we're only getting a single line instead of a filled polygon. And then we're gonna loop through our array, analyserData.length, and i++.
[00:06:39] And what we'll do here is we will get the amplitude from the data. And this value goes between negative one and positive one And let's convert that to a y pixel position. So, if I was to try and draw a graph based on this amplitude, the graph would be very small because it would just be going from, one pixel up here to the next pixel down.
[00:07:12] So we have to find out, based on this amplitude, what is the pixel position that's goinna be up here at the top of the screen, all the way down to the pixel position is gonna be at the bottom of the screen. And so for that, I'd like to use p5.js has a map function where you pass in the value you pass in the minimum that the value will be and the maximum of the value will be.
[00:07:37] So our amplitude is always going from negative one to positive one. And then you pass in the output. What you want to map this from to. So you're mapping it from negative one to one into, let's say, for now we'll just do it. Actually, we'll do it this way.
[00:07:56] So we go from the very top. So I'm gonna do width / 2, which is the center of the screen minus let's say some amount. We could do width / 4 for now. We can change that later. And then, again, the same, except instead of it being minus, it's gonna be plus.
[00:08:19] And what that should do is it should say, if the amplitude is close to this, give back a result that's close to this. If the amplitude is close to this, give back a result that's close to this. So it will just linearly interpolate, which is a bit technical.
[00:08:34] But basically, it'll just kind of map the value from the top to bottom. And we also need to map the value from the start of the array, 0, to the end of the array, which might be, let's say 10, 24 or 20, 48 or whatever. And we need to make that so that it goes from the very left of the screen in pixels to the very right of the screen in pixels.
[00:08:57] So again, I'm gonna use map. I'm gonna say I, for index, starts from zero. And goes to analyserData.length. I'm actually gonna do length -1, just so that it goes all the way to the end of the screen. Just something I tend to do. And then the output of that is gonna be from 0 to the width.
[00:09:20] This is a built-in variable by the way from p5.js. Now when we try and draw this vertex, we can see it's not perfect but it's getting there. So there's a little bit of an issue here with how I'm mapping my amplitude to the pixel values, because it's a little bit higher than it should be.
[00:09:49] Of course, so I should be using height for this y value. So, because I'm mapping it from the top of the screen to the bottom of the screen, you don't wanna use the width of the screen cuz, in this case, the width is shorter than the height. You need to make sure that you're using the height, which represents a horizontal axis, just a bit of a bug.
[00:10:09] There we go. [SOUND] And so from that now, we're able to visualize this waveform. There's a little bit more of an advanced demo that I won't try to explain too much because it's just a lot of extra little details. But there is here this advanced demo, which you can see.
[00:10:33] It builds on the same ideas and I'll open it up.
[00:10:41] It does a few things differently. It has a few extra little features and browser support. So one of the things that does a little differently is that, instead of just playing the audio right away as we had in our earlier streaming examples, it only waits until the audio file can play.
[00:11:01] And then it resumes, the context and calls play. And it just makes sure that it does that once. It's a slight detail. I don't know if it's gonna really make much of a difference in these demos. But maybe it's good practice to only try and trigger these things once the audio files are actually ready to play.
[00:11:19] Because before they can be played, typically they have to load some metadata. They have to load just the basics of what is the file, what kinda file format is it. And then once that's all ready, this canplay event gets triggered. And then you can actually play the audio.
[00:11:35] So I don't think it'll make much of a difference whether you use this event or whether you just call play right away. But it might be good practice. Another thing that it's doing here is it's checking whether the browser can support float or not for things like time domain data.
[00:11:55] And that just looks like this. I won't go into detail. But basically, it's a bit of a slightly different way of coding. If you're using the byte data, it's between 0 and 255. Those are the values you're getting. And so you have to scale those from 0 to 255 into a -1 to positive 1 range to make it behave like the float data.
[00:12:21] Another thing that's actually kind of useful to know is this FFT size. So we can adjust the resolution of these kinds of analyzer nodes and analysis, we can adjust it by just giving it more samples. In the case of a waveform, when we give it more samples here, we're increasing the window size basically.
[00:12:43] Or perhaps increasing the density of samples. That's actually something that might be worth just checking the API for exact details. But in either case, we're gonna end up with a more dense sort of output here. Just more data, as opposed to less data. So if you're willing to compromise, and potentially it might run a little slower or it might not perform as well on mobile devices, but you can set this value to a higher value If you want more of that.
[00:13:16] And then another thing that this is doing that's a little different is that instead of just every single frame, getting the new data and rendering that every frame. I thought it was kinda nice. And I've done this in a few installations now and sort of audio projects now where I use smoothing.
[00:13:33] So I'll get the new audio data every 12 frames per second or something. But then 60 frames a second, I'm still rendering the visualization. And I'm just smoothing the data from one frame to the next. And that just creates a bit more of a sorta noticeable look. You can really start to see the peaks and the troughs of the waveform.
[00:13:56] Whereas if it's just going every single frame, it's happening so fast that sometimes you don't really get to see those details. It's just sort of a change of graphics that are just constantly changing. And it kinda loses some of the quality there. So yeah, that's this demo, a little bit more complicated.
[00:14:16] I won't go into it too much. But it's definitely worth a look if you want a little bit more robustness out of this kind of waveform stuff.