Digging Into Node.js

gzip Compress with zlib

Kyle Simpson

Kyle Simpson

You Don't Know JS
Digging Into Node.js

Check out a free preview of the full Digging Into Node.js course

The "gzip Compress with zlib" Lesson is part of the full, Digging Into Node.js course featured in this preview video. Here's what you'd learn in this lesson:

Kyle pulls in Node's built in zlib module, which handles gzip compression, and writes code to create a streaming gzip transform stream in order to compress incoming data and then send it into the output stream.


Transcript from the "gzip Compress with zlib" Lesson

>> Kyle Simpson: So uppercasing files is not terribly interesting or useful. What are some kinds of processing on these files that we might do that would be more interesting? One of the common sorts of transformations that you may see or encodings that you may run across on files is that they've been gzip compressed.

You may wanna unzip them, process them, and then rezip them. This happens a lot when you're say downloading a big file from an FTP site, processing it, adding some stuff to it, and then rezipping it and uploading it to some other HTTP server, that sort of streaming kind of operation.

So let's pull in some gzipping and gunzipping. And luckily, this is, again, built in for us in Node. They have a streaming implementation of the gzip algorithm built into Node and it's called zlib. So we're gonna pull in zlib,
>> Kyle Simpson: From Node.
>> Kyle Simpson: Now let's think about this, let's first talk about compressing our output, and then we'll talk about decompressing input.

Let's say that we want to allow an option for compressing the output. So we are gonna add a Boolean flag called compress. Before I forget, let's go ahead and update our help. We can say console.log{"--compress, we'll gzip the output.
>> Kyle Simpson: So here we are in our process file, and we need to set up a streaming gzip transform stream basically.

So it turns out that's a whole lot easier than that sentence sounded like it would be. We need to figure out where in this function, do we wanna do the gzipping of the compression before or after we've done our transformation? We wanna do it after, so we wanna make our capitalization of stuff, and then we wanna do our gzipping.

So here we're going to say if (args.compress), then we need to insert a step into our stream processing here. We need to say here, if {args.compress}, we need to pipe our outStream into a gzipping stream and then have it. So we're gonna say outStream = outStream.pipe. And the thing that we wanna pipe it to is going to be a gzipStream.

So how do we define a gzipStream?
>> Kyle Simpson: It's as easy as zlib.createGzip. That makes us a transform stream, kinda like the one we did manually. It makes this a transform stream that already understands the streaming gzipped protocol. By the way, I never knew this about gzip, but the compression protocol was literally designed for streaming.

That's why it's so easy for them to have a streaming gzip because the actual protocol for gzipping was designed with this use case in mind, to be able to do it streaming. All right, so we are conditionally adding an additional step of processing which is that we want to gzip things.

Now if we're going to gzip an output file, then we might want to change the file extension from saying out.txt to having like .gz extension because that's a pretty common convention for that file name. So if we are doing a right stream here, let's go ahead and say output stream, if we are compressing.

Let's go ahead and say that output stream is equal to whatever output stream already is, but then let's just appended a .gz on the end of it. I'm sorry, not out stream, it's out file.
>> Kyle Simpson: Let's test this out. We're gonna pull up our ex2.js. And here I'm gonna switch back, just for the time being, I'm gonna switch back to the hello.txt, and I'm gonna use standard out to see things.

And then let's go ahead and add the compress, so we should see all the gzipping junk show up in our shell. And sure enough we see that, okay. But if I were to take off the out and just do the compression, now we won't see it but it'll end up creating a file called out.txt.gz.

And if we look, we now have an out.txt.gz. And here's something interesting that I found when I was first teaching this workshop, because I went to open up the file and I was expecting to see all the gzipping junk, and it turns out that we can see the plain text of the file.

And so I spent a good half a day trying to figure out why is my ziping not working? What's happening? It turns out that them, automatically if you have a file extension that says .txt.gz, it automatically gunzips it for you when you open it. So this is them's magic showing us the contents of the gzip file.

If you wanted to verify that you could a actual cat of it. And then you'll see the junk is in the file. Okay, so we're able to compress on the fly. Let's try it again but let's try it with Lorem, and we'll see that it creates a somewhat bigger file.

But that still is significantly smaller at 226K, significantly smaller than the 1 gigabyte that we started with.

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now