Networking and Streams Introducing Streams
Transcript from the "Introducing Streams" Lesson
>> Now we can learn about streams in Node.js. Streams are a handy way of shuffling data around. So if you need to build some IO glue, where you've got some sources of data, you've got some destinations where the data needs to go. You've got some steps in between, streams are pretty good fit.
[00:00:22] Some examples of that might be if you need to use compression, if you need to add different kinds of transformations to build the data pipeline. Streams are fantastic for that kind of problem. Streams are a really old idea. This is a quote from 1964 from Doug McIlroy who is working on what would become the Unix system.
[00:00:50] Back then it was still called Multics, which was a group project between MIT, Bell Labs, and General Electric. But then AT&T split off and made Unix in 1869. But anyways, just read this quote because it's a pretty fantastic idea about like chaining programs together with pipes which is one of the main influences of node js streams.
[00:01:15] So, we should have some ways of connecting programs like a garden hose, screw in another segment when it becomes necessary to massage data in another way. This is the way of IO also. This idea is that instead of building these like big pieces that will kind of like deal with the entire data pipeline internally, you can split them out into separate pieces.
[00:01:43] That's kind of like do one thing and do it well. And handoff that information to the next piece in the pipeline, which is what pipes let you do in Unix. And it's also what the stream implementation in Node.js which has a method called pipe does. It's very similar.
[00:02:00] Another great reason to use streams aside from just the logical separation, which is useful, is that streams operate on data in a kind of like piecewise manner. So chunk by chunk. So instead of having to passing very large objects and to keep a lot of things in memory, with streams you can deal with like much smaller pieces.
[00:02:22] So this is really useful if you have to build something like a web server, where you might have to stream out 100 megabyte video file, right? So you wouldn't want to have to read in the entire file and then write it out because maybe you have thousands of video files or millions on your servers.
[00:02:42] And that would be an awful lot of expensive RAM to serve up that kind of thing. So streams will let you pick apart those video files chunk by chunk and only keep, like kilobytes or maybe like a megabyte in main memory instead of the whole thing. So, in the Unix workshop I gave, we came up with this little example of a downloaded Moby Dick from Project Gutenberg, which is gzipped.
[00:03:11] We came up with this little pipeline to count the number of times that whale was said in Moby Dick. And it's really a useful way of thinking about breaking apart problems on the command line. And this is also kind of how you can solve a lot of problems in Node.js in a similar way.
[00:03:30] So what that might look like with some hypothetical, with some real and some as yet unwritten methods, is you could use some methods like FSX create read stream to read in that file chunk by chunk. And it would return a readable stream that you can call .pipe on.
[00:03:49] And you can pipe that into createGunzip which is a transformed stream that takes gzipped input as output, so compressed input and provides uncompressed output. The next step might be a little function that can split on whitespace, and turn those any whitespace character into a new line. And then you could turn that output into the input of a filter program, that would search the number of times that you see whale.
[00:04:18] Then maybe you could have another thing at the very end that just counts up the number of occurrences. So this is kind of like how you could decompose the problem in a similar way with the more Node.js style interface. So this version in Unix this version yeah, question?
>> So all of those commands will be running at the same time?
>> Or would they be sequential?
>> So some of these would operate on chunks of information. So like gzip, I think operates in chunks, I don't know maybe like 500 bytes, a kilobyte in size.
[00:04:54] I'm not exactly sure. But basically yes, so every one of those steps is going to be emitting little chunks of data. Sometimes they have to buffer a little bit up to like fill up like a chunk size. This is especially the case with things like compression and with encryption, but otherwise, yeah, things are mostly happening all at the same time and sort of like feeding through this pipeline.
>> For each chunk it like goes through each step, like one at a time.
>> Yeah, each chunk, like goes in order from one step into the next. Yes, just like on the command line. It's how it works. So, here's an example of just something that really simple that you can try out yourself.
[00:05:50] So if you make a file called greets.txt or whatever you like, you can just print it out. So this is basically the Unix cat commands without the concatenation part, just print a file. So if we type that out, I'll call it print.js. Maybe instead of hard coding the file, I'll take the file name as an argument.
[00:06:15] That'll be, so would you createReadStream, and then I'll read process argv2, which is the first command that you specify after the node and then the file name and then the arguments. And I'll call that pipe to process.standard out. And now if I run this program on a file, maybe I'll provide the program itself as the input file.
[00:06:44] So here we go, the program prints its own source, that's kind of fun. Also, if I had another file like hello.txt, I could run that as well. Great, so we're just copying input into output. It's all that this program does. It's good place to start.