Introduction to Bash, VIM & Regex

QA: Regular Expressions Reference

Introduction to Bash, VIM & Regex

Check out a free preview of the full Introduction to Bash, VIM & Regex course

The "QA: Regular Expressions Reference" Lesson is part of the full, Introduction to Bash, VIM & Regex course featured in this preview video. Here's what you'd learn in this lesson:

James shows the Perl Regular Expressions Reference as a useful tool that is probably already installed on your OS for researching syntaxes for Regular Expression. James takes questions from students.


Transcript from the "QA: Regular Expressions Reference" Lesson

>> James Halliday: Another good tip, if you want a handy reference that might already be installed on your system, you can do perldoc perlreref. Even though we're not doing Perl, it's a very handy regular expression reference. If you just scroll passed all of the Perl-specific part, you see the syntax section is a really good, quick reference for something that's already on your system.

So you can see that here the dollar sign is anchor at the end of the string, different kinds of range quantifiers, character classes. It gets into all of the kinds of escape codes as well. Here's a reference for digit, nondigit, word, non-word, whitespace, non-whitespace. This horizontal whitespace as well is something that might be supported in some engines.

>> James Halliday: I know that I personally consult this now and then. Especially if I need some rather more esoteric kind of capturing, if the engine supports it. We're not gonna go into the quite so fancy stuff though, because that's.
>> Speaker 2: You showed that example that you had up, where you were replacing a word within a file.

>> James Halliday: Yeah.
>> Speaker 2: Using sed.
>> James Halliday: This one.
>> Speaker 3: Can you comment on how, I think this is all oriented around lines. And is this, what if you wanna match something on multiple lines, or?
>> James Halliday: Right, so sed and grep are oriented around lines, generally. If you do this kind of thing in JavaScript, it's not oriented on lines, unless you pass in a modifier, I think m.

If you pass the m flag into your regex at the end, then that will turn the regular expression from non-line mode into line mode. But the default behavior in JavaScript is that things will generally process as a big single string, single line, even if there's new lines in it.

But that's not true of sed and grep, I don't know if it's even possible. It's possibly possible to configure sed and grep not to work in line mode, but they're really oriented around line mode operation. If that answers, yep. Yeah, regular expressions are kind of one of those things that, there's a bunch of rules.

We've more or less gone through all of the rules, but it's kind of like the implications of those rules are not always obvious. And sometimes it takes just a lot of experimentation to kind of work through how you can use those rules to solve problems in different ways.

And how the different pieces kind of work with other pieces. You can, for example, always put a quantifier on a group, or you can put a character class inside of something and it's sort of like, it's a language like any other. So there's this synthesis that happens, but I don't know, it's hard to communicate that without having people go in and try stuff out, I guess.

>> Speaker 3: When you did the matching for the password or the password rules.
>> James Halliday: Yeah.
>> Speaker 3: You put a quantifier on the end, the {8,}..
>> James Halliday: Here it is, yes, so this means 8 or more characters
>> Speaker 3: Yeah, so the comma, and then no character after that, no number after that, means unlimited?

>> James Halliday: Yes.
>> Speaker 3: So if you put a limit on it, you could say a max size of something?
>> James Halliday: Yeah, a max size of 16, for example, would be just comma 16. It means anything between 8 and 16, inclusive. And the thing on the left of the quantifier might not even be a character class.

You could even have something that is not a single character, it could be like a group, for example, so if you had something else there.
>> James Halliday: That's kinda what I was trying to get at with how these operations sort of compose.
>> James Halliday: There we go.
>> Speaker 3: So can you help me understand the difference if the quantifier with the 8 applied to a group that you just had when you put the, what's the difference?

I don't think I quite got it.
>> James Halliday: Yeah, why don't I come up with an example for that. So if we have something like,
>> James Halliday: Something that's multi-character that repeats would be,
>> James Halliday: So say we have,
>> James Halliday: Oops, some string, this is an extremely contrived example, but I'm sure there are much better examples of this.

So if we want to match at least, actually, why don't I do a test for this particular case? If we wanna match this pattern at least twice, for example, so we could do maybe something a little more complicated. Capital or lowercase c, and then a t, and then maybe d, o, o or e, I don't know.

And then we take that whole group, right? Or maybe it's or, right, so we'll match cat or dog, and with these variations exactly two times. So that would match because there's cat and dog. Hang on. All right, and then you've gotta have another quantifier actually, like a space or a white space character.

There we go. So, [LAUGH] here's an example of that. So this says, all right, so the regex engine, it's like, all right, I get a start. Whenever I see the start of this pattern, either a cat or dog, I have to match, and I'm gonna keep matchings. If I hit c, then I better see an A, a capital or lower case a.

If I see a d, then I better see an o or an e followed by either g or a t. And then I'd better see a white space character, and then I'd better see two of those whole patterns, right? So things that will match are like cat cAt, hang on a sec.

>> Speaker 3: You need a space after the second cat?
>> James Halliday: I have a white space character, that ought to match.
>> Speaker 2: What about the second cat?
>> James Halliday: Yeah, you're right, it needs a space after it. So here's an even fancier way to make this happen. You could do another group that's like either whitespace, or the end would match in both cases then, so.

Yeah, these things, you can stack them in all kinds of ways, maybe two or more would match. Or maybe you wanna do a replacement, so we can turn this into replacement.
>> James Halliday: And just replace it with something,
>> James Halliday: So that takes all of those out.
>> James Halliday: And because we put in the other grouping, it'll also just go to the end, and also match.

So it doesn't have to be whitespace, necessarily. There's yet another shortcut for this kind of thing, where you have either whitespace or the end. You could do, in some engines it's \y, but in other it's \b. I think in JavaScript it's \b, which is a break word character.

All right, maybe JavaScript doesn't have that one. I guess it doesn't. Anyway, so you can always enumerate the possibilities instead.

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now