
The "Transform Data" Lesson is part of the full, Networking and Streams course featured in this preview video. Here's what you'd learn in this lesson:

To transform data with a stream, James shows how to convert text case within a file and display the result in standard output. James also takes questions from students.


Transcript from the "Transform Data" Lesson

[00:00:00]
>> A way that we might modify this program now is to transform the data before we print it out. So a fun thing that we might be able to do is maybe we want to convert those chunks of data to uppercase. That's a really simple kind of transformation that we could do.

[00:00:19]
So for that example, we're gonna use a module called through2 and we're gonna insert it into our pipeline. So we have the readable stream, which was the first part of our pipeline, and the final part was to print out the data to standard output. And now we're gonna insert this part in the middle, so you call through with a write function and an end function that's optional.

[00:00:44]
And the write function is going to transform each chunk. So let's just write that out, our modified program. Maybe I'll call it loud.js because it's going to uppercase things. So we require the through2 package. And now we pipe to through, and we're going to have a write function.

[00:01:10]
So with the function that you provide to through, and this is also true if you use the core transform stream method, you get these three arguments: buffer, encoding, and next. Buffer is a Node.js Buffer object, which is kind of like a binary representation of the data that's coming in.

[00:01:32]
Encoding, you can just ignore encoding. I don't think that I've ever used that option in any program, maybe like once, that's it. And then the next function is what you call when you want the next piece of data. So there are two ways to write out how you want the output to be formatted.

[00:01:53]
The first way I'll show you is to call the next function. The first parameter that you pass to the next function is going to be an error object if you have one. So this is similar to the Node.js callback convention, the errback, where the error is the first argument to a function. So we're going to call it with null because we don't have an error.

[00:02:15]
The next piece is the piece of data that we want to send out on our transform stream. So if we convert that buffer object to a string, then we can call toUpperCase on it. And in this way of dealing with transform streams, the output has to be a buffer or a string.
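Putting those pieces together, a minimal sketch of the loud.js program being described might look like this (assuming, as in the earlier examples, that the input file path comes in as the first command-line argument):

var fs = require('fs')
var through = require('through2')

// read the file given on the command line, uppercase each chunk,
// and write the result to standard output
fs.createReadStream(process.argv[2])
  .pipe(through(function (buf, enc, next) {
    // first argument is the error (null here), second is the transformed chunk
    next(null, buf.toString().toUpperCase())
  }))
  .pipe(process.stdout)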

[00:02:35]
So this should give us a string of output, so that should work. Okay, question.
>> I just don't quite understand why you couldn't just do the toString, like pipe that in directly, instead of going through the whole through.
>> Why you couldn't pipe toString in directly?
>> [COUGH] Yeah, like you've got your file, and then you just do like a .pipe to toUpper or something, I don't know, and that does it. I don't know what the through gets us, that's the question.

[00:03:15]
>> The through is gonna turn this function into a stream, a transform stream in particular.
>> Okay.
>> We could write it that way. Maybe you're thinking about it like this: if we had a toUpper function, and we define that function, then we could call through with that function. That would be another way that we could write it, if that makes more sense.

[00:03:42]
But through is going to take these functions that we provide that can act on our data, and it builds a stream out of those functions. But both of these are fine ways to write it. It just kind of depends on what your needs are in particular.
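A sketch of that alternative arrangement, with the transform pulled out into a named toUpper function and handed to through:

var fs = require('fs')
var through = require('through2')

// the same transform, written as a named function that through()
// turns into a transform stream
function toUpper (buf, enc, next) {
  next(null, buf.toString().toUpperCase())
}

fs.createReadStream(process.argv[2])
  .pipe(through(toUpper))
  .pipe(process.stdout)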
>> Keep in mind it's working on a chunk of data, not the whole file.

[00:04:00]
>> Yeah.
>> So these chunks can also be arbitrarily sized. So you can't really, in this case shouldn't, have assumptions about the size of the data or how that data is going to be broken up. Because a chunk could have newlines, or maybe it doesn't, or it could be a single byte, or it could be kilobytes.

[00:04:27]
That's more up to the operating system in this case than it is to your program. There are ways of chunking data with streaming transforms, and we'll get into some of those. But if you deal with input, then the chunk sizes aren't really up to you.

[00:04:44]
So you have to deal with them yourself.
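As one example of dealing with chunking yourself, the userland split2 module (not used in this lesson) re-chunks a stream into one chunk per line before the transform sees it; a rough sketch:

var split = require('split2')
var through = require('through2')

process.stdin
  .pipe(split()) // each chunk downstream is now a single line
  .pipe(through(function (line, enc, next) {
    // split2 strips the newlines, so add one back after uppercasing
    next(null, line.toString().toUpperCase() + '\n')
  }))
  .pipe(process.stdout)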
>> So anytime we want to modify the stream we need to use through and then do whatever it is that we want to do?
>> Yeah, you can use through or you can use some of the other userland modules, or you can use core streams.

[00:04:59]
I've got an example of using core streams. I like to use through, it's kind of just what I have a habit of using, but the core stream module's Transform is actually a lot more ergonomic than it used to be. So you could also use that if you like.
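A rough sketch of the same uppercase transform using the core stream module's Transform, assuming the constructor-options style:

var { Transform } = require('stream')

// core-streams equivalent of the through2 example: the transform option
// receives the same (chunk, encoding, callback) arguments
var upper = new Transform({
  transform (buf, enc, next) {
    next(null, buf.toString().toUpperCase())
  }
})

process.stdin.pipe(upper).pipe(process.stdout)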

[00:05:16]
Okay, so there's a question in the chat: is it suggested to use Node's built-in modules or userland npm packages? So I would say generally that I think it's better to use the userland packages because they're just nicer. There's kind of like less boilerplate, they're less particular, and I myself tend to use the userland stuff a lot more.

[00:05:42]
And when I do use the core modules, I actually don't tend to require stream. I tend to require the readable-stream package from npm, because that way you don't have this implicit peer dependency on node core. Node core can change things, and they've done that and broken my packages, and I didn't like that very much.

[00:06:03]
But when you use a userland package like readable-stream, and most of these userland packages like through2 depend on the readable-stream package behind the scenes, they don't require stream, then you actually have explicit version control. So you don't have to worry about the next version of node breaking your package, and you don't have to worry about people on different versions of node who have different versions of the stream interface in core.
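Following that pattern, the only change from the core-streams sketch above is where Transform is required from, for example:

// same code as the core example, but the stream implementation now comes
// from the readable-stream package on npm and is pinned by semver
var { Transform } = require('readable-stream')

var upper = new Transform({
  transform (buf, enc, next) {
    next(null, buf.toString().toUpperCase())
  }
})

process.stdin.pipe(upper).pipe(process.stdout)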

[00:06:34]
This is kind of a general pattern that I think works a lot better, especially over time: use userland packages that have semver, because otherwise, who knows what version of stream you get from node core, it's whatever is there. So, okay, if you wanna try this out, you can change this program in different ways too.

[00:06:56]
You might make it lowercase, or you could replace all a's with at signs or something with a replace. But let's just run this program to make sure that it works. Oops, right, so we have to give it a file as input. So if I run that program with our file, it indeed converts it to uppercase.
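One of those variations might look like this, a hypothetical at.js that swaps the uppercasing for a replace:

var fs = require('fs')
var through = require('through2')

// replace every "a" in each chunk with "@" instead of uppercasing
fs.createReadStream(process.argv[2])
  .pipe(through(function (buf, enc, next) {
    next(null, buf.toString().replace(/a/g, '@'))
  }))
  .pipe(process.stdout)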

[00:07:28]
Another fun way that we might change our program is, maybe instead of reading from a file, we can read from standard input. So if you read from standard input, then the input of your program doesn't have to be a file, it could actually be the output of another program, which is sometimes a lot more useful.

[00:07:49]
So we can just take this piece right here and replace it with process.stdin. Now if we run the program, it's gonna sit and wait for us to type stuff in on the keyboard, so I can type hi, hello. [Types gibberish on the command line] And it indeed buffers everything up.

[00:08:14]
We could also pipe the contents of a file into it, and now it's a loud version of our program we just typed, it's kinda fun.
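The stdin version described here is just the same pipeline with the file read stream swapped out for process.stdin, roughly:

var through = require('through2')

// read from standard input instead of a file, so the program can be fed
// from the keyboard or by another program's output, e.g.:
//   node loud.js < file.txt
//   cat file.txt | node loud.js
process.stdin
  .pipe(through(function (buf, enc, next) {
    next(null, buf.toString().toUpperCase())
  }))
  .pipe(process.stdout)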
