Real-Time Web with Node.js

File Streams

Kyle Simpson
You Don't Know JS
The "File Streams" Lesson is part of the full, Real-Time Web with Node.js course featured in this preview video. Here's what you'd learn in this lesson:

Rather than reading an external file in one large chunk, data from the file can be loaded piece by piece using the file stream API. The file stream API uses event listeners. For example, the "data" event is fired each time a chunk of data is received.

Transcript from the "File Streams" Lesson

[00:00:00]
>> [MUSIC]

[00:00:04]
>> Kyle Simpson: We have our 3.js, our hello world 3.js. Let's take one more step of evolution before we get out of the command line and start working our way into the web, where things can get more interesting. The last thing I wanna address is that we are reading a file in one big chunk.

[00:00:22] That's what the readFile function does. It reads the entire file all into one big chunk. If it was a giant file, that could be really slow, and also if we needed to do some stream processing on it, like we needed to dump output to the browser quicker or things like that, waiting for the entire thing to load before we get any notification that it's finished is not very optimal.

[00:00:47] So, another way of going about this would be to treat this as loading it up with a file stream. And that's my light introduction that I'm gonna give you to streams in Node, because that's another incredibly important understanding you need to have about, sort of, the core mechanics of Node is how it deals with streams.

[00:01:09] Streams are a kind of abstraction on top of passing data around in a highly efficient manner. They use buffers, which are highly memory efficient and put little pressure on the garbage collector, and they send the data back and forth. Whenever I was talking about the HTTP request and I said there was a req and a res, those were an input and an output stream for a web request.

[00:01:34] So there's a request stream, which represents the requests that I make to a server, and there's a response stream, which represents the stream of data that I'll send back out as the response. Well, streams are really all over Node and they're getting more and more so. So it's probably something that you should start to get more familiar with.

[00:01:52] As a very simple example, how would we start to use streams here? So the first thing we're gonna do is we're not gonna call fs.readFile all at once in one big shot like that anymore. In fact, we're not even gonna set up our sequence in quite the same way; I'll return my ASQ.

[00:02:16]
>> Kyle Simpson: All right, so the first thing I want to do is I want to set myself up a stream variable and I'm gonna call fs.createReadStream. And I need to give it the file name to create a stream from.
>> Kyle Simpson: Now at this moment it has not actually loaded anything from the file.

[00:02:39] It's not gonna start loading something from the file until we ask it to start pumping us some data. And the way we're gonna do that is we're gonna listen for events on our stream. So, we might say stream.on, which is listening for events, I wanna listen for the data event.

[00:02:57] And every time the data event fires, I'm gonna get a chunk of data from my file. So I will also set myself up with a contents = "" that is empty to start out with, and every time the data event fires, I want to append that chunk onto the end of my contents string.

[00:03:24]
>> Kyle Simpson: Is everybody with me so far? So if the data event fires once, it's only gonna append once, but if it fired a thousand times, it would append a thousand chunks in our code. By default the stream is gonna run to completion. There are ways to pause the stream and control it, but that's beyond what we'll talk about.

[00:03:41] But by default, it will run to completion, which means it'll keep going and keep firing data events. It'll pull in each chunk up to its maximum buffer size, fire the data event, empty the buffer, and do it again, and it'll keep going until it's exhausted the entire contents of the file.

[00:03:58] Now, we would like to know when the file is finished. So there's an event called end. We'd say stream.on, listening for the end event. That handler gets no parameters, but it just tells us that we have finished listening to the stream, and it's at this point that we could say to our sequence, you're done, and here are the contents.

[00:04:21] We've loaded the entire contents of our file. Does everybody see that?
>> Kyle Simpson: Now I want to just, for temporary purposes, prove that the data handler is getting called. I'm gonna put a console.log statement in here, okay? It's just a temporary debugging step to prove that this thing is being called.

[00:04:50]
>> Kyle Simpson: We don't need to make any changes to 3.js, cuz all we did was change the internals of how we deal with getting the contents out of the stream. Let's go back to our command line and say node 3.js --file=test.txt. Oops, what did I miss?

[00:05:12] I forgot that. I wasn't supposed to change that.
>> Kyle Simpson: Yeah, what am I doing here?
>> Kyle Simpson: Wow, that's right. Right, what's wrong with that returning sequence?
>> Kyle Simpson: Wait just a moment, what's it talking about here?
>> Kyle Simpson: Not loading the right module of course. All right, so it called it once, waited a thousand milliseconds, then gave us our data.

[00:06:07] Did everybody roughly get the same result when they tried it themselves? Okay, let's try it, I also provided you a test3.txt which is much bigger. Let's try that. If we call it with test3.txt, you'll notice that the data event gets called several times before it prints out the 200K contents of our file.

[00:06:29] Let me show it one more time in case you missed it. It prints out data five times, and then a thousand milliseconds later, it prints out the contents of that file. So that shows you that internally the data event had to fire several times, because the contents of that file were bigger than Node's internal default buffer size.

[00:06:58]
>> Kyle Simpson: Now we can go back and take out that console.log statement, now that it's no longer necessary.
>> Kyle Simpson: All right, questions about the read stream?
>> Speaker 2: Can we set the size of the chunk?
>> Kyle Simpson: I believe there is a way to change the default buffer size but I don't actually know off the top of my head.

[00:07:23] I'll try to maybe think about looking that up over a break. I think there is a way to change it but I don't remember how. Usually, you wanna let Node handle that, though. It's pretty good at figuring that stuff out for you but I think there is a way.