Transcript from the "Curl, Grep, and Pipeline" Lesson
>> James Halliday: So here's another fun command, it's called curl. So curl will take any url, so here's the project Gutenberg from Moby Dick that I've been using for my example. You can get it from the notes or if someone could paste it into the chat that would be handy.
>> Speaker 2: I just did and I was playing with this and I had to add a dash dash compressed flag cuz it was coming back gzipped.
>> James Halliday: Yeah. Well, there is another command that we can do with a pike that might even be a better example, but, yeah, okay.
[00:00:31] I think when I last did this workshop it wasn't gzipped by default. So anyway, if we do curl with the url, what the curl command does is it loads the url and it prints the contents to standard out before whatever URL you've done. And I recommend to use the dash -s flag with curl because by default, curl prints a progress bar that can kind of get in your way sometimes.
[00:01:00] So, if I do if I do curl -s and give it a URL, like a text file, or whatever kind of file that you want, then it's going to print the contents to standard out, which happens to be compressed with gzip. So there's a program that should be already installed on your system called gunzip.
[00:01:21] All right, so if you happen to have done the curl command and your terminal looks so weird like this, and if you type and it looks like gibberish, you can fix it by typing reset. And now you're terminal should be better. Okay, so there's a program called gunzip, which is spelled like this.
[00:01:46] And that will read from standard input, g-zipped, compressed data. And it will write uncompressed data to standard out. So we can combine these existing tools to do something that we want. So here we go. And now I see the uncompressed output. Pretty sweet. All right,
>> James Halliday: So we can do some other fun things.
[00:02:09] I'm gonna cover this more in the regular expression section later. But here I have a little program using sed, which replaces all whitespace with new lines. So if there's something like a space, it's gonna turn that into a new line. And this is really useful because it means that every word will be on its own lines so we can start to do things like a filter those lines or a count the number lines, for example.
[00:02:41] So if I add that a little bit at the end,
>> James Halliday: at the end of my program, then now, I see every word gets its own line. And I can use another program called grep that's gonna filter the output. So suppose that I only wanna see lines that contain whale.
[00:03:05] It's Moby Dick so of course there's a lot of them. Also sometimes, whale is at the beginning of a sentence. So like a capital W. There is an other flag that you can pass to grep that does case insensitive matching, -i, which you can put in the middle or you can put it at the end because the order doesn't matter.
[00:03:27] So if I want to see all of the instances of the word whale, lowercase or uppercase can do, we curl this text file, which is compressed. Then we gunzip it, and then we turn all of the whitespace into new lines, so that every word is on it's own line.
[00:03:45] And then we search for all of the lines that contain the word whale, like that. And now we can pipe that to the wc -l command to know how many times in Mobby Dick the word whale appears, which happens to be 1,692. So you can answer question like this about files just by kind of composing all of these very small tools that kind of do one thing and do it well.
[00:04:19] Again, I'll get more into how this works in the regular expression section but It's not just whale, but also words like whaling, and different kinds of verbs that include the word whales. So if we wanna see everything that matches whale, or whaling, we can use this group right here.
[00:04:43] And the pipe character inside of the string there means either an e or ing.
>> James Halliday: And so, whoa. All right, and you have to give grep the capital E command if you don't want to back space the parentheses. Again I'll get more into how that all works in the regular expression section.
[00:05:06] So there are actually 1,823 instances of whale words in Moby Dick, which is a fun question that we can answer with our command line pipeline here, which is a little program that counts whales.
>> James Halliday: Yeah, so we can use some of the other stuff that we just talked about as well.
[00:05:31] So for example, maybe we wanna, that's really good pieces of information we wanna save for later so what we can also do is we could use the redirect character so we're gonna use > to write the whale counts to a file. We'll call it whale-count.txt. And so now, we have that information for later.
>> James Halliday: And you can combine and remix all of these sorts of pieces. So I think that should give you a lot to play with hopefully. I've got a little breakdown here of what every piece does just to kind of review and then hopefully everyone can get a chance to play around with some of these commands because they're really fun, I think.
[00:06:20] So curl -s, again, fetches Moby Dick from Project Gutenberg. It so happens that you get compressed output to your terminal so you can pipe that to the gunzip commands, which reads from standard in and prints the uncompressed text to standard out. Then we have this little program here that runs sed which does substitutions.
[00:06:46] So in this case /S means non, or no, sorry, / s means white space characters and + means one or more white space characters, so that's what it matches. It replaces one or more white space characters with a single new line. And the g means it does that for every whitespace character that it finds, not just the first one.
[00:07:13] So if you want to play around with this little sed thingie, you can sort of see what this program does in isolation by typing in some data with echo. So here I have just some one two three, whatever. And I can see that in fact it works like I expect.
[00:07:30] That's a really nice thing about pipelines in Unix is that you can always test what every piece does in isolation before building something really big and complicated. So, that's what this little piece of our pipeline does. There's also grep which filters the output. So remember before we had all of the words in Moby Dick and now if we put this rep piece on the end of our pipeline, we only see words with whale in them.
>> James Halliday: And then finally, wc -l will count the number of lines in the output. So, we can count the number of times that whale appears.