Introduction to Bash, VIM & Regex

Anchors, Groups, and sed

Introduction to Bash, VIM & Regex

Check out a free preview of the full Introduction to Bash, VIM & Regex course

The "Anchors, Groups, and sed" Lesson is part of the full, Introduction to Bash, VIM & Regex course featured in this preview video. Here's what you'd learn in this lesson:

To have a search pattern check at the start of the beginning of a string, James shows anchoring within Regular Expressions. Then James demonstrates capture groups, which allow for targeting a number of letters or numbers. Next James introduces sed, a stream editor used to perform basic text transformations.


Transcript from the "Anchors, Groups, and sed" Lesson

>> James Halliday: I have been using this a bit already but there is this concept of anchoring the regular expression. So if you want the regex engine, it's walking along your string like character by character. If you want it to start at the beginning you use a caret. This is, unfortunately caret is also used for character class negations but this is a different one, that is not at the start of a character class.

So you can anchor at the beginning with a caret and anchor at the end with a dollar sign. And later in Vim we will learn that these conventions are actually rather more widespread than it might seem. So, why don't we play around with that a little bit, so may be I'll stick to node here.

So suppose we have a sting like cool beans and we want to replace beans but only if it's like the start of the word, we wanna capitalize it. So that's not gonna match normally, right? Unless we start the string out that way, then it matches. Same thing with dollar sign, but at the end.

So we had it the first way, the dollar sign matches. Makes sense, okay. There's also similar in spirit to how character classes work, how it kind of enumerates a bunch of options. We have this notion of groups, both captured groups and noncaptured groups. This is a way of telling the engine like either one of these patterns is valid.

So, but these patterns can be in multiple characters, not just a single character like with the character class. So if we have something like, say we have beep and boop and we want to match b and then either ee or oo.
>> James Halliday: We can do that.
>> James Halliday: Another way of writing this might be you could do this and then these are quantifier to specify two of them be the same thing.

So, capture groups are a little bit special. So, capture groups can also be used to match content. So if I wanted, so suppose I had, if we go back to that example earlier with the brackets, so suppose that this is our string. And now we want to match everything that is inside of the brackets.

How can we do that? Capture group, so I'll put that. That's how we had it earlier, right? So you can put that middle part in parentheses. This tells the regex engine to save that text and make it available and so we can use .exec with the string. And now, the match object that we get has the contents that were captured as m[1], which is the part that is inside of here.

Everyone of these capture groups that you specify in your regular expression is going to be available with an integer index. So if we have two of them, the second one would be m with square brackets, 2, and it goes one, by one, by one. You can also use these in your replacement.

I'll show you an example in JavaScript, and also an example in sed. So for example, if we have the string cool <beans> .replace that, maybe globally sure, you can refer to this capture group by doing $1. So maybe we wanna change angle brackets to quotes, for example. So I can put quotes in there, and then the replacement engine is gonna substitute $1 for whatever I put in the first capture group there, which is now cool beans with quotes around it zzz.

>> James Halliday: Maybe we want to also capture the next word as the second group. So you can put $2 in there as well, and that is also captured.
>> James Halliday: There's also a non-capture groups, so sometimes you need the, like if you need to use the OR character, inside of a group.

For example, if we want to match either angle brackets or maybe curly brackets, we can do that. If I'd have done that with, let's do it like this.
>> James Halliday: So if we have this first pattern, which is gonna be angle brackets, and then our second pattern maybe will be the double quotes, right?

So we'll just do the same kind of thing. It would be good if we could maybe not match the first bit into a pattern. You can do that by specifying a question mark colon around, and that just basically means don't capture this pattern. Just a nice little thing sometimes if need, if you have to add a bit that is a branch into your regex.

But not something that you have to use all the time. We can also do capture groups in sed, there's a few ways, you can use the $ syntax or you can also use a / in front of the number. So if we go back to sed,
>> James Halliday: And we do sed -E and then do this replacement.

>> James Halliday: If we want to just get rid of the angle brackets, for example. Whoops. All right, and you've gotta do the parens around it if you want it to work. There we go. We could also, like, put around like tildes or something,around the quoted bit, that works. Dollar sign I guess doesn't work in sed but it does work in JavaScript.

Related to this is the idea of back references. So, you can actually use a capture group that you've captured in your regex inside of the regex itself, so it's still inside of the pattern. For example a good use case is this if you wanna search for all words that are kinda of repeated twice, like the the or something, we can do the regex.

So the the whatever cats dogs dogs etcetera. We can now search for all words that are, immediately after the word is the same word, which is what this means because \1 refers to the first capture group that was just matched. And we can replace that with the first word again.

So this regex right here, this substitution, will replace all of the repeated words. So it should replace the the with just a single the and it should replace dogs dogs with a single dogs, did it do it? All right you have to do it in -E mode, of course.

There we go, so that's what we expect. Back references also works in node, so we can take this example and I'll just show you how to do that in JavaScript really quick.
>> James Halliday: Do just w+ \1/, and then you have to use $1, but otherwise I think they ought to be the same.

Yeah, there we go.
>> James Halliday: Right, and -1 in JavaScript land means something else, so you have to use dollar sign.
>> James Halliday: Okay, that should be a lot for people to play around with. So there is more that I'm gonna get to in a bit but just play around.

If you have any questions, if you're getting stuck you can either use the command line, you can use something like the node repo. You can try out filtering files with grep to search for patterns, so you can try using sed. There's also some handy flags for sed like you can use -i which does an in place substitution, so if you wanna modify a file this a pretty nifty trick.

So suppose in this file hello.txt we want to modify it to replace the word, maybe we'll replace the word cool with the capital, maybe the word cool spelled with a k, because that's really tacky. So if we do cool with be also 0s and 1, that's extremely tacky -i.

And now instead of printing out to standard out, sed will go ahead and update that file first in place. And if I open it again, yep, we can see that the file was modified.
>> James Halliday: And, that's just pretty handy if you have a file that has a bunch of stuff that you want to fix really quick.

Just, you can do it all in one place like that,

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now