The "Unix Tools" Lesson is part of the full, Developer Productivity, v2 course featured in this preview video. Here's what you'd learn in this lesson:

ThePrimeagen provides a quick overview of several Unix tools, including sed, find, exec, xargs, parallel, and awk. He explains how these tools can be used for various tasks such as text transformation, file manipulation, and data processing.

Transcript from the "Unix Tools" Lesson

[00:00:00]
>> ThePrimeagen: We're going to kinda do like a little quick little roundup of a couple cool Unix tools that you should just know about, because knowing about Unix tools is very, very, very fantastic and will just save your life a whole bunch of time. Remember that same file that we had, I'm gonna transform it so that you can do this, foo foo, bar bar, and still all be the same.

[00:00:19]
But we just wanna do a basic transformation on top of it. And so this is using something called sed. If you're not familiar with sed, you could always open up Vim and effectively create the regex in Vim, and then use it in sed. And the reason why is that if you have your kind of Vim, even basically configured, you can have, you know.

[00:00:40]
You see how that thing is highlighting as I'm doing the search? It makes it a little bit easier when you're trying to handcraft a bit of a macro, or, I mean, a bit of a search and replace, because it's not gonna be nearly as confusing, because you're like, this is where I messed up, right?

[00:00:54]
Because, once you start getting into the \d part, and then you're trying to put in all these things, and I'm like, is it like a \+, is it? You start going into that, then all of a sudden it becomes really confusing, and you start getting everything incorrect.

[00:01:07]
You don't wanna have that, right? Because then it's just awful. So what I can do right here is I can do the following, I'm gonna do a little find and replace. And I wanna find foo effectively and replace multiple times. So I can do something like this. I can go colon and then I can do what I like to refer to as a fighting one eyed Kirby.

[00:01:28]
And so a fighting one eyed Kirby is you escape the opening parenthesis dot star escape the closing parenthesis. So it kinda looks like a fighting one eyed Kirby, right? There's only one eye on this poor guy. And so, that will create what is known as a capture group.

[00:01:43]
These are very important in your regex for ad hoc kind of regex stuff. By the way, I highly recommend you don't use regex in production software. Highly recommend using them on the command line and for ad hoc query-fu. It's always good to have it in your belt, not good to have it in production, okay?

[00:01:56]
Unless your belt is production, then that's crazy. You're Batman. Okay, you're not Batman. Not Batman. All right, so now that I have that, I can match that group and I can repeat it with a \1. So notice that says foo, whatever I captured, it repeated. I can reference it and put it back in.

[00:02:11]
So this is just such a fun skill to have because I can't tell you how many little things I've found throughout my life where it's just like, I need to grab certain parts and just move them. And it's either a hugely manual operation, or it can become just a find and replace, which is very, very great.

[00:02:27]
And so we theoretically could create this exact same thing and go, \1, \1, and then end it right there. And so there we go. We've now created kind of that exact same thing. So if I take this and I'm gonna highlight this and I'm gonna yoink this whole thing that I've created right here, right?

[00:02:43]
And I go cat out, right? Cat out. You could go to sed, and I could actually take this exact same thing, and you'll notice it does the same thing. So often whenever I'm working for a really good sed query, I'll use Vim to kind of help myself through it.
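
A minimal sketch of the substitution described here, assuming a two-line input of foo and bar. The file name `out` and its contents are stand-ins, since the transcript does not show the actual file. The escaped parentheses form the capture group and `\1` writes the captured text back.

```shell
# Hypothetical stand-in for the file used in the lesson.
workdir=$(mktemp -d)
printf 'foo\nbar\n' > "$workdir/out"

# \(.*\) captures the whole line; "\1 \1" writes the capture back twice.
doubled=$(cat "$workdir/out" | sed 's/\(.*\)/\1 \1/')
printf '%s\n' "$doubled"

# sed can also read the file directly instead of taking piped input.
doubled_from_file=$(sed 's/\(.*\)/\1 \1/' "$workdir/out")

rm -rf "$workdir"
```

The same `s/\(.*\)/\1 \1/` expression can be tried in Vim first (after the colon) and then pasted into the sed command once it matches what you expect.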

[00:03:00]
But sed allows you to do this in line, in a pipe throughout it. Again, I know you can also just pass in the file to sed itself. You can replace in a file with sed. I think at one point people used to program using things as arcane as sed, and ed, and grep, and all these kind of things.

[00:03:16]
I never lived in that world. That just sounds nuts, though it does sound a little fun to spend one week trying to become good at ed. Can you become good at the standard editor? That'd be just so much fun. Maybe it wouldn't be fun at all.

[00:03:29]
But in my head, romantically, it sounds fun. So it's good to just have a little bit of sed in you because you never know where you're gonna need to replace one thing with something else or you're gonna need to do some sort of transformative operation on a stream of input coming in.

[00:03:42]
It's just a really good thing to have available to you. So I just wanna make sure you know about that. We've used find a lot in this, but I've never really talked much about it. Find is really, really great. And when you have to do something over a lot of stuff, it's really good to know about this exec.

[00:03:58]
It'll just save you seconds upon seconds of execution when you have to do this a bunch. Instead of piping the results of find into something else, you can use exec, and it'll just save you a lot of time. So it's very, very fantastic. You don't even need to know how to use it.
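
A small sketch of what `-exec` looks like in practice. The scratch directory and file names are made up for illustration; only `find`, `-name`, and `-exec` come from the lesson.

```shell
# Hypothetical scratch directory with a few files to search.
workdir=$(mktemp -d)
printf 'one\n' > "$workdir/a.txt"
printf 'one\ntwo\n' > "$workdir/b.txt"
printf 'ignored\n' > "$workdir/c.log"

# '{} \;' runs the command once per matching file.
per_file=$(find "$workdir" -name '*.txt' -exec basename {} \; | sort)

# '{} +' hands many matches to a single invocation of the command.
batched_lines=$(find "$workdir" -name '*.txt' -exec cat {} + | wc -l | tr -d ' ')

printf '%s\n' "$per_file"
printf '%s\n' "$batched_lines"
rm -rf "$workdir"
```

The `{} +` form is usually the faster of the two, because it starts the command far fewer times.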

[00:04:11]
You just need to know that it exists. So someday in the future when you do this, just use that and your life will be much, much better. All right, fantastic. Okay, xargs. This is where things get really exciting, okay? Xargs, all right. If you're not familiar with xargs, what xargs allows you to do is, say you have some input coming in, and you want to do something with that input, but you want to do something over and over again.

[00:04:34]
Have you ever tried to remove stuff you're finding? You have a list of files you wanna remove. Say you had a file that contained 10 files that you wanted to remove. How would you do that? Well, it'd be a little bit more complicated if you didn't have xargs, cuz xargs allows you to take something, and then capture it, and then do something with it.
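
The "file that contains a list of files to remove" case can be sketched like this. All names here (`to_remove.txt`, the `.tmp` files) are hypothetical, and the sketch assumes file names without spaces or quotes, since plain xargs splits its input on whitespace.

```shell
# Hypothetical scratch directory: three files to delete, one to keep.
workdir=$(mktemp -d)
touch "$workdir/keep.txt" "$workdir/old1.tmp" "$workdir/old2.tmp" "$workdir/old3.tmp"
printf 'old1.tmp\nold2.tmp\nold3.tmp\n' > "$workdir/to_remove.txt"

# xargs reads the names from stdin and appends them to the rm command line.
(cd "$workdir" && xargs rm -f < to_remove.txt)

remaining=$(ls "$workdir")
printf '%s\n' "$remaining"
rm -rf "$workdir"
```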

[00:04:51]
So I could theoretically go curl.com. So notice that I curled 123.com. So it allows you to take input and then pipe it into something else. So if you wanted to remove those files, you could actually have rm, capture it, and it would actually remove those files. So it's like one of those nice things if you ever find yourself in.

[00:05:12]
Somehow I find myself needing to remove a group of things regularly, say, here's a good example. Have you ever accidentally not had your tsconfig correctly specified, somehow hit tsc, and now your project's filled with a bunch of JavaScript and TypeScript files? Well, you can literally find source and then you could actually technically do delete if you wanted to do it.

[00:05:33]
It actually worked that way, but I'm just trying to give you an example here, okay? We're not gonna do that. You could technically find source, list everything out that matches some sort of pattern, and then you could xargs and say, delete on those ones, right? You could delete on top of everything that you find that say is a JavaScript file.
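
A hedged sketch of that cleanup, run against a throwaway directory rather than a real project. The `src` layout and file names are invented for the example. `-print0` and `-0` are used so that unusual file names survive the pipe; GNU and BSD `find` also offer `-delete`, which skips xargs entirely.

```shell
# Hypothetical project layout with stray compiled .js files next to .ts sources.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/lib"
touch "$workdir/src/index.ts" "$workdir/src/index.js" \
      "$workdir/src/lib/util.ts" "$workdir/src/lib/util.js"

# List every .js file under src, NUL-separated, and hand the list to rm.
find "$workdir/src" -name '*.js' -print0 | xargs -0 rm -f

left=$(cd "$workdir" && find src -type f | sort)
printf '%s\n' "$left"
rm -rf "$workdir"
```

Running the `find` half on its own first is a cheap way to check what would be deleted.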

[00:05:50]
So you can kinda do some things to kinda compose yourself in this way. Xargs is super, super useful. But there's also one thing that's kinda sucky about xargs. Notice that, as I did that it just executes it one at a time, bap, bap, bap, bap, bap, right? So if I add in 4 or new line 4, new line 5, new line 6, new line 7, it's just da, da, da, da, da, every single time, right?

[00:06:11]
Maybe that's not what you want to do. Maybe you want something a little bit more robust, right? Yeah, am I the only one that says robust that way? Okay, so we're gonna move on to something called parallel. Parallel is a fantastic tool. If you don't have it you should get it.

[00:06:27]
It can make your life again really easy. If you ever have input that's coming in. If you're doing anything with logs that you need to go somewhere else and check something, but you don't wanna do it all at once and actually make like 9,000 connections to a database or something.

[00:06:40]
You wanna do say 3 at a time, 5 at a time, 8 at a time, 10 at a time. You can do this all with parallel and turn a very pain in the ass program to write into a simple program to write. Not only that, say you have some sort of tool you developed that takes a line of input, does something that takes a long time and then gives you out a result.

[00:06:57]
Well, it would really suck if you had to do a thousand lines. You're like, now I have to write this program to be multi-threaded if you wrote it in Node, or if you wrote it in Go, it would just work. But hey, we can't always write things in Go, but like pretend you had to write it in Node, now you want to be able to spawn like 8 of those or 10 of those instances, but you want them plucking as fast as possible because some run short, some run long.

[00:07:17]
Parallel again, if you write your script just to do one thing, parallel can parallelize it for you, so you don't have to worry about that. And you can even determine how much you want it to parallelize. All right, so let's go like this. This is just like such a great one.

[00:07:31]
I'm just gonna call count. All right, there we go. There you go, 1 through 9. Fantastic, I can go like this, cat count, right? There you go, 1 through 9. I can take count, and I can go into parallel, right? And I can say, hey, I want three at a time, and I want to go curl.com.

[00:07:47]
So now we just did three at a time, and so we could do it. We can do it one at a time, okay, that's not that fast. We can go two at a time. We can preserve, I believe -k preserves input ordering. So that means 1 will always be in the one spot, 2 will always be in the two spot, 3 will always be in the three spot.
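
A sketch of the "three at a time, in order" idea, assuming GNU parallel is installed. `echo` stands in for the curl command, whose exact form the transcript does not show, and the guard is there because some systems ship a different program under the name `parallel`.

```shell
# Hypothetical count file holding 1 through 9, one number per line.
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 6 7 8 9 > "$workdir/count"

# Only run the example when the `parallel` on PATH is GNU parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # -j 3 runs at most three jobs at once; -k keeps output in input order.
  # {} is replaced by each input line.
  ordered=$(parallel -j 3 -k echo "job {}" < "$workdir/count")
else
  ordered="GNU parallel not installed"
fi

printf '%s\n' "$ordered"
rm -rf "$workdir"
```

Changing `-j 3` to `-j 1` or `-j 2` gives the one-at-a-time and two-at-a-time runs mentioned above.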

[00:08:05]
So that way, even if some of them start running really fast, it will buffer up all the input for you and make sure that it remains in the order that you specify, which is, again, that's a really hard problem for you to program. If you had to program that yourself, it's gonna take some effort, right?

[00:08:18]
That's a non trivial amount of effort, because you want to pipe it out as fast as you can. But anything that runs ahead, you don't wanna pipe it out. So that way you could have parallel program to another parallel program to another parallel program, and you could just keep on doing these things, and it'll just go through as fast as possible.

[00:08:32]
So absolutely fantastic. Just such a fantastic thing to have in your tool bag, because anytime you write a script, you never have to think about the multi-threaded side of things, if you're bound to a language that only has one thread, right? And so, it's just a lifesaver. It's just such a lifesaver, especially when you have a lot of stuff.

[00:08:51]
I had this right before I left my previous company that I'm wearing right here, I had to do a bunch of stuff with games, and on it, we were getting millions of rows of data per like, every couple minutes. And so I had to go through all these things and try to figure out, say, what is the frame rate doing?

[00:09:07]
How healthy is the machine doing? All these kind of things. And so it's just like, that is such a hard problem when there's also 50 games at the same time. And so I wrote a simple script, and I could use parallel to do certain levels of querying out to the database, other sources, make sure I'm not flooding that and just doing like a million requests at once, right?

[00:09:25]
I can control this entire system and then write a series of small simple programs that just work on input outputs and just daisy chain them all together as a single command and just made life really, really easy as opposed to just being really huge pain in the ass.

[00:09:38]
Anyways, there you go. I think that's great. Awk. If you don't know awk, awk's kinda weird. I'm not very good at awk, but I only use it in one way. You can get really expressive with awk. I'm just not that great with awk. But let's say, if you go a little ps aux, and we're gonna grab vim, right?

[00:09:52]
I'm gonna get a couple processes that are running Vim, right? What happened if you had some output and you wanted to sum up the second item throughout all these things? How would you do that? Well, likely, a lot of people would just reach for, again, a program. Split by space, take the first entry, convert them to numbers, sum them all together, print them out.

[00:10:12]
Well, I mean, you technically don't have to do that. You can use something like awk, which would be sum plus equals, I think it's, I wanna say it's two, two, and print sum, right? If I've done this all correctly, it should just sum them all together. And so you can do things where you can pluck stuff out, add it together.
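
A minimal sketch of that awk one-liner. Fixed sample lines stand in for live `ps aux | grep vim` output so the result is predictable; the names and numbers are invented.

```shell
# Hypothetical sample standing in for `ps aux | grep vim` output.
sample='alice 101 0.0 vim notes.txt
alice 202 0.1 vim todo.md
bob 303 0.0 vim main.go'

# $2 is the second whitespace-separated field; END runs after the last line.
total=$(printf '%s\n' "$sample" | awk '{ sum += $2 } END { print sum }')
printf '%s\n' "$total"
```

Against real output the shape would be `ps aux | grep vim | awk '{ sum += $2 } END { print sum }'`. Note that `sum` is never declared: awk creates it on first use and treats it as zero.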

[00:10:33]
It's like a pretty squirrely language, considering that awk is a genuinely old kind of program, and you just create variables and things just work. And you can assign variables. There's dictionaries. You can do quite a bit of ad hoc scripting in a really simple way in this program.
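
The "dictionaries" are awk's associative arrays. A small sketch, again with invented sample data, that counts lines per user:

```shell
# Hypothetical sample: user, pid, command.
sample='alice 101 vim
bob 202 vim
alice 303 bash'

# count[$1] is an associative array keyed by the first field.
# for (user in count) visits keys in no fixed order, so the result is sorted.
per_user=$(printf '%s\n' "$sample" |
  awk '{ count[$1]++ } END { for (user in count) print user, count[user] }' |
  sort)
printf '%s\n' "$per_user"
```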

[00:10:50]
So it's just good, just to have a little bit of knowledge that it exists, and then maybe some ChatGPT if you don't use it very often. One of the hard parts about any of these things is that often you will not use them at all for months on end, and then all of a sudden you have to use jq every day for like a week straight, and then you won't touch it again for six months.

[00:11:09]
And so you don't have to memorize everything. It's just good to know that it exists, that you can do these operations, cuz it's easy to go recall that knowledge, especially now in the day and age of LLMs. It just makes it super, super simple to find effectively what you want.

[00:11:26]
You describe it in your dumb caveman style, it produces an answer that's relatively close to it or completely hallucinated, and you can eventually get your way there. And so it's just so easy, just as long as you know that this thing exists, your life will be dramatically better when you need to do them.

[00:11:41]
And then just get in the habit of doing them, because then you start getting a little bit faster. You get faster and faster, and eventually you kind of know the basics. And once you know the basics, you can get through quite a bit before you need to go look up.

[00:11:52]
So, I just wanted to give you that little bit of like encouragement of how awesome that stuff is.
>> Speaker 2: Something like jq in those tools. Do you use those explicitly only in scripts and bash stuff or do you ever use them in an equivalent of that syntax in server side code?

[00:12:11]
Because that syntax for jq seems pretty powerful to kinda use all the time for JSON transformation and stuff.
>> ThePrimeagen: Yeah, I mean, it is really convenient. I don't know if jq is a library. I don't know the performance characteristics of it. I've only ever used it as just ad hoc query-fu.

[00:12:30]
And so it's always me bringing down a bunch of stuff, need to do a bunch of stuff, and then that's that. And so I'd be a bit hesitant to do anything beyond that and incorporate it into something else, because I really just don't know its exact run time or the consequences of that run time.

[00:12:47]
Because you could be just massively incurring something that's much, much slower. And if you're spawning, and if you just have to use it as like a process that you're spawning, spawning additional processes every single time, requests are coming in would also be massively bad. I would probably suggest not doing that.

[00:13:03]
And so my guess is it's no, but you never know. You just never know. And anytime you use in a language, typically things tend to be slower, right? Cause now you have two problems. Now you have your DSL that you're debugging and often debugging a DSL is really awful.

[00:13:20]
Even to the worst of enemies I wouldn't recommend debugging some DSL, it's just unfun, and so, I mean, you've even experienced this with JSON when you have an error in your JSON and it's just like error. You're just like, crap, where's the comma, right? Now you're gonna play the game of look through your package.json, just try to figure out where the comma is, because it's just unfun to look at DSL errors.

[00:13:42]
They just are never good. They're always very frustrating.
