Transcript from the "Nginx File Compression" Lesson
>> Jem Young: And earlier, Sam mentioned Brotli, so there's two basic types of compression we're gonna do on servers and the great thing is NGINX by default has this turned on, it's called gzip. Does anybody know what compression is in general? Like just what's your general mental model on it?
>> Sam: That is do duplication, right?
>> Jem Young: Yeah, yeah, that's a good way of putting it.
>> Speaker 3: So it's taking information and putting it into a smaller container. So it could be lossy or lossless. Lossy you throw out details that you hope nobody will notice or care about.
[00:00:41] Lossless is like folding a parachute you fold it carefully and you put it into a small of container as possible. Lossy you take a chainsaw or scissors and you cut out parts and hopefully it still works as a parachute.
>> Jem Young: Yeah, no I like that. You lost me a bit about parachute cuz I was sort of like that's a good analogy should I use that?
[00:01:05] But I like that. The one you're probably all familiar with is MP3s that is a lossy compression algorithm. So if you've ever seen a WAV file, now I'm dating myself, but [LAUGH] WAV files are unmodified lossless ways of recording audio and in the days of dial-up Internet, they were huge.
[00:01:28] You couldn't share a WAV file they could be 200, 300, 400, 500 megabytes for one song. That's insane on dial-up that will take years. So they invented, what changed the game on sharing was compression. They created something called MP3 which is MPEG layer 3 which is Motion Picture Expert Group.
[00:01:48] Again, why do I know these random things? It's okay, but now you know those random things so you should feel as bad for being a nerd too. But what it did was it took all of given a WAV file, you have waveforms that look like this they go up and down.
[00:02:02] And it said, you know, humans can't even hear these things so let's just chop those out and you have an MP3. And that gets, like what was a 200 megabyte file down to 2 megabytes which is really impressive. And part of the art of compression is knowing what you can take out, so like Sam was saying lossless compression or lossy compression.
[00:02:25] In terms of server compression the what where doing is we're saying, hey, this file says gem, gem, gem, gem, gem, gem, gem, gem, gem. I can compress that by just creating a sort of pseudo programming language that says, instead of repeating gem 20 times I'm just gonna create gem times 20.
[00:02:45] And that compresses things down like this and that's the essence of compression. So if you think of 1s and 0s, if you say the next block has ten 1s and ten 0s just send those instructions down to the server and then the client can unpack those or uncompress them.
[00:03:00] And that's how you have really, really small payloads that can unload to something that is huge. Now there's something called, and this is just interesting, a gzip bomb.
>> Jem Young: What do you think gzip bomb does or any sort of, what do you think zip bomb does? This was in Silicon Valley one of the seasons.
>> Speaker 4: Like here Peter only has so much memory or something like that and then when you're uncompressing something it takes up more and more memory that your computer doesn't have anymore. Is that it? Something like that?
>> Jem Young: Yeah, you can create, because that languagry in compressing where you saying like, hey, I'm gonna take gem and that's 30 times.
[00:03:39] But what if we just created a self referential loop that says, this is gem 30 times and it unpacks to something that says this is gem 30 times. And then your file system explodes and you can take down entire systems with what are known as zip bombs. They still work funny enough [LAUGH] there's not a good way to disable that.
[00:03:58] So it's one of the things if you're on the Internet don't download random files. I don't know why I'm telling you this, but [LAUGH] don't download random zip files, don't download random things like that because you don't know what'll happen. In terms of game-changers as far as making the Internet a little bit faster NGINX will automatically compress everything in a standard format known as gzip.
[00:04:20] That's GNU zip, G-N-U that's the general something I forget, but GNU is used everywhere. But what it does is it compresses all this data going out and it reads it in that same format about just like we described earlier how if there's a lot of 1s and 0s it just rewrites that in a language that we all understand.
[00:04:40] And what's great is all the browsers so Safari, Internet Explorer, Samsung Internet they all understand how to unpack a gzip file and that's what makes your connection so much faster. In fact, let's check this out. Let's go to MDN, my favorite website and let's check out what they're doing.
[00:05:01] Go to the Network tab, and you see the size here? Let me make that a little bigger.
>> Jem Young: So generally, if you're looking at your inspector you're gonna have two sizes, one is the uncompressed size and one is the compressed size. And we see the power of compressing things in that instead of sending down a almost 300-kilobyte file that gets compressed to 37-kilobytes and that's the power of compression.
[00:05:27] And the great thing about NGINX and one of the more powerful things is it does this on the fly. So if you're sending out images things like that NGINX will automatically gzip them if you leave it on which is on by default. And it'll serve these images compressed and then when it hits the client it'll unpack them for you.
[00:05:42] And that's just something that's really cool. Yes, question?
>> Sam: I thought images were omitted from gzip because that's they're already encoded in as compressed to format as they can?
>> Jem Young: Yes-
>> Speaker 3: Once they enlarge-
>> Jem Young: Thank you. I misspoke. Images are not compressed because images by default using PNG or JPEG are already types of compression versus I think bitmap BMP, might be able to be compressed because that is an uncompressed file.
[00:06:07] But that's why you never see bitmaps served over the Internet anymore, we used to, those were dark days, really dark days [LAUGH]. But thank you for bringing up that point. Yes, when we talk about compression there's different types that are more efficient for different formats. So images in particular serve well to compression it because what is an image?
[00:06:28] It's a series of dots that are certain colors and if those dots it's just math. And if I know I have 50 blue dots or if I wanna represent say, this board right here it's all one color this will compress really well. So I can just say, how big is the size?
[00:06:45] 600 by 800 and it's all the same color I can compress that down to one line essentially. And that's what makes image compression really powerful in PNG but that's why you can't gzip them. You can if you really, really want to, you'll just actually add overhead to it because you're now adding a zip around something that's already compressed.
[00:07:04] So what you're doing is you're adding unnecessary instructions to something that it's not gonna do anything. Thank you for bringing up that point though.
>> Jem Young: It's gonna be a good class, I can feel it. So if you wanna check out your gzip settings and lots of other settings you can look at your NGINX.com.
[00:07:22] We're not gonna touch that right now, but if you look at it, you'll see that gzip is on by default. And we can set our compression levels to 1 through 9 I believe it is, I wouldn't mess too much with the compression levels. It's tempting to go all the way up to 9 or 11, if it were, [LAUGH] if anybody understands that joke.
[00:07:43] But honestly, at this point you're trading off CPU power because it takes time to compress these things and run through the compression algorithms versus the amount of bytes you're saving. And 6 4 is a pretty good level where you're trading off speed for performance. Going any higher it's really tempting but you're honestly not saving that much.
[00:08:02] There's more optimizations you can do down the line.