Transcript from the "Character Class Ranges" Lesson
>> James Halliday: Okay, so why don't we talk now a little bit about character classes. So, character classes are ways of specifying different characters that could match. So we've already seen an example of dot which matches any character. But we couldn't match for example just several of a set of characters.
[00:00:21] So maybe we want to match, so if we have a string
>> James Halliday: 'dog
>> James Halliday: days'.
>> James Halliday: If we wanna match both D-O and D-A, we could make a little regex that will take d and then start with a square brace. And then just list out the characters that it could possibly be.
[00:00:53] And then we can replace that with
>> James Halliday: Capital DA, for example. So matches these two things.
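A minimal sketch of the replacement just described, assuming it is run in Node with `String.prototype.replace` (the variable names here are illustrative, not from the lesson):

```javascript
// The sample string from the lesson.
const input = 'dog days';

// "d" followed by exactly one character from the set {a, o}.
// The g flag replaces every match rather than only the first.
const replaced = input.replace(/d[ao]/g, 'DA');

console.log(replaced); // DAg DAys
```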
>> James Halliday: Where it starts to get really fun is that you can put quantifiers around character classes. So, for example, if we wanna match one or more of either a or o, we can put a + after that character class.
[00:01:24] And now it's gonna take that whole string of D-O-O-O-O-O and it's gonna replace it with DA.
>> James Halliday: That's something you can try.
>> James Halliday: So we might also wanna do something like put all the vowels, aeiou, and replace it. So now if we have like deg, it also matches.
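Here is a small sketch of the quantifier and the vowel class in JavaScript. The input strings are made up for illustration, and it is an assumption that the `+` stays on the vowel class:

```javascript
// One or more of "a" or "o" after a "d": the whole run of o's is consumed.
const collapsed = 'dooooog days'.replace(/d[ao]+/g, 'DA');
console.log(collapsed); // DAg DAys

// Widening the class to all five vowels lets "de" match as well.
const vowels = 'deg'.replace(/d[aeiou]+/g, 'DA');
console.log(vowels); // DAg
```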
[00:01:55] You can also do the negation of a character class by putting a caret at the start, so that will match any non-vowel. In this case, we don't have any of the letter d that's followed by a non-vowel, so it won't match anything. But we could put some in there, so, like, dx matches now because d is followed by a non-vowel.
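A quick sketch of the negated class, using `RegExp.prototype.test` in JavaScript. The second input string is an assumption, built by appending `dx` as described:

```javascript
// "d" followed by any single character that is NOT a vowel.
const dThenNonVowel = /d[^aeiou]/;

// Both d's in "dog days" are followed by vowels, so nothing matches.
const withoutDx = dThenNonVowel.test('dog days');
console.log(withoutDx); // false

// Adding "dx" gives a "d" followed by a non-vowel.
const withDx = dThenNonVowel.test('dog days dx');
console.log(withDx); // true
```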
>> James Halliday: More about character classes, you can specify ranges that the character classes can be using a hyphen. So for example, if we wanna match things like numbers,
>> James Halliday: So if we wanna match all of the numbers in here, we can do that by doing 0-9. And then if we want one or more numbers, we can use a +.
>> James Halliday: And so this will remove all of the numbers from that output.
>> James Halliday: So we just see the word hello what. You can also do the same thing with letters, so if we have things like a-z, so we'll replace all of the letters. You can mix and match ranges.
[00:03:24] So you can have capital letters and lower case letters all in a replacement, like that.
>> James Halliday: So if we had some capital letters in there, let's put a few. It replaces them, whereas before, it would not. You can also put more characters in there if you wanna match, like, underscores as well.
>> James Halliday: So if we wanna match A through Z capital, a through z lowercase, and underscore. And if there are, like, underscores in here that we wanna get rid of, then that also gets rid of the underscores.
>> Speaker 2: Do you have to escape spaces there?
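The range examples above can be sketched in JavaScript like this. The input strings are invented for illustration, since the transcript does not show the exact text on screen:

```javascript
// Digits only: one or more characters in the range 0 through 9.
const noDigits = 'hello 123 what 456'.replace(/[0-9]+/g, '');
console.log(JSON.stringify(noDigits)); // "hello  what "

const mixed = 'Hello_World 42';

// Lowercase range only: capitals and the underscore survive.
const noLower = mixed.replace(/[a-z]+/g, '');
console.log(noLower); // H_W 42

// Two ranges in one class: capitals are removed too.
const noLetters = mixed.replace(/[A-Za-z]+/g, '');
console.log(noLetters); // _ 42

// Ranges plus a literal underscore.
const noLettersOrUnderscore = mixed.replace(/[A-Za-z_]+/g, '');
console.log(JSON.stringify(noLettersOrUnderscore)); // " 42"
```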
>> James Halliday: You have to escape spaces-
>> Speaker 2: Or what happens if you wanna put spaces in those brackets?
>> James Halliday: You can do that. It's fine.
>> Speaker 2: Or tabs?
>> James Halliday: Yeah, you can do that. I don't know how to type a tab but you can do it somehow. [LAUGH] Probably by writing a program to do it.
[00:04:24] So that also got rid of the space. We can see.
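As a rough illustration in JavaScript, a space can go inside the brackets unescaped, and a tab can be written with the `\t` escape rather than typed literally. The input string is an assumption made up for this sketch:

```javascript
const spaced = 'hello_big world\tagain';

// A literal space inside the class needs no escaping; the tab is left behind.
const keepsTab = spaced.replace(/[A-Za-z_ ]+/g, '');
console.log(JSON.stringify(keepsTab)); // "\t"

// Adding \t to the class removes the tab as well.
const keepsNothing = spaced.replace(/[A-Za-z_ \t]+/g, '');
console.log(JSON.stringify(keepsNothing)); // ""
```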
>> James Halliday: I see on the chat that someone asks is RegEx knowledge useful or very useful in REST API development. RegEx is one of those things that if you don't know RegEx, you're not gonna use it, and you can get by without it.
[00:04:47] But if you do know RegEx, you're gonna be so much better off when you see a problem that RegEx is best suited to. Recently I read a bunch of code, for example, it was a parser, and it had all of these slice calls in it and it was doing indexOf everywhere and it was just all of this integer math.
[00:05:07] When a handful of RegExs would probably be much simpler and much easier to understand and probably also faster too because these RegEx engines are heavily, heavily optimized. And could avoid doing a bunch of allocations and things of that sort. It's sort of the kind of topic that is as useful as you make it.
[00:05:34] I don't know. I don't have a good answer to that question, but I know that I personally write RegEx like every single day and all the kinds of stuff that I do. So
>> James Halliday: You can certainly get away with not knowing it, but it's probably best to know it [LAUGH] rather than not.
[00:05:56] I'll show you how do some of this in Node as well. So if we have like this string, we can do .replace would be the word. So much of the time when you're programming something, you have a little bit of string that you just need to slice up a bit.
[00:06:15] And RegEx is
>> James Halliday: Oftentimes the best way to do that. So you have some prefix you wanna get rid of, or some kind of string format you need to split up and repackage. So we can just get rid of all of the certain characters. So if you have to test whether a password field has the right characters in it, you could do that with a RegEx pretty easily with a character class and maybe some quantifiers.
[00:06:44] You could use curly braces. Here let's build one of those, that would be kind of fun. So we can test whether a password field is acceptable under some policies. So suppose that we need to have at least eight characters and it has to contain a number and a lower case letter and a capital letter and one of a handful of symbols.
[00:07:10] So we can do that.
>> James Halliday: So if we have,
>> James Halliday: This is gonna be our test password,
>> James Halliday: Which, autopass I think. If we have a capital in it. So we can do something like, starting at the beginning, I'll get a little more into how that works. But we can use quantifiers now.
[00:07:39] So it has to have at least eight characters and $ means go to the end. That's in the notes as well. So we need to have a pattern that's going to be capitals, and lowercase and digits. And there is a shortcut for this called \w for word character.
[00:08:06] So we can test
>> James Halliday: Whether that condition is true with their password. It's not in this case because there's non-word characters there like an exclamation mark, so we can modify the character class to include some symbols.
>> James Halliday: I'll just hit the top key there. And if you have a dash, it's little bit special because you have to escape that with backslash if you don't want it to be a range.
[00:08:37] So in this case that part is true. So how do we enforce the second part of our password policy, which is that it has to have at least a capital letter, at least a lower case letter, at least a number, and at least a symbol? So we can do that by testing each one of those cases. Doing it all in the same RegEx is probably possible, but I wouldn't do that.
[00:09:02] I would just do it the simpler way. So we need to make sure that there's at least one lowercase letter. And at least one uppercase letter. And at least one digit, which you can do 0 through 9, or there's another shortcut called \d that matches any digit character.
[00:09:27] So do it that way, or you can do \d. And at least one symbol, so we have a character class where we can type out all of the symbols that are allowed, being a bit careful about the dash because you have to escape that one.
>> James Halliday: Then, the password that we've defined in pw matches.
[00:09:53] If we do another password,
>> James Halliday: Password doesn't match our policy. So this is kind of a more practical example of when you might need RegEx in sort of a REST server or an HTTP layer, for example. I'll paste that into the chat, if you wanna mess with it.
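The snippet pasted into the chat is not in the transcript, so the following is only a sketch of the policy as described, with one simple test per rule. The function name `isAcceptablePassword`, the sample passwords, and the exact list of allowed symbols are all assumptions:

```javascript
// Hypothetical helper: every rule is its own small regex, combined with &&.
function isAcceptablePassword(pw) {
  return (
    // Anchored: at least eight characters, all from the allowed set.
    // The dash is escaped so it is not read as a range.
    /^[\w!@#$%^&*\-]{8,}$/.test(pw) &&
    /[a-z]/.test(pw) &&        // at least one lowercase letter
    /[A-Z]/.test(pw) &&        // at least one uppercase letter
    /\d/.test(pw) &&           // at least one digit, same as [0-9]
    /[!@#$%^&*\-]/.test(pw)    // at least one symbol (assumed allow-list)
  );
}

const pw = 'Passw0rd!'; // made-up test password
console.log(isAcceptablePassword(pw));         // true
console.log(isAcceptablePassword('password')); // false
```

Keeping the checks separate makes it easy to report which rule failed, at the cost of scanning the string several times.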
>> James Halliday: So hopefully this makes sense. We didn't cover yet what the anchors do, but I can just get into that in a sec. Negating character classes is sometimes useful if you want to do things like match brackets. So for example, if you have some text and you wanna match whatever is in the brackets.
[00:10:48] I would caution you not to use RegExs for parsing things like HTML because they're actually nested. And so RegEx is not really a good tool for that, you should use a parser. But if you have just some simple formatted text that's like relatively uniform, then you can use it for that as well.
[00:11:07] So if we wanna replace anything that begins with a left angle bracket, then has a character class that's any character that's not a right angle bracket, one or more of those, up to a closing angle bracket, then we will replace it with the string XXX. Does that make sense? It's kind of a lot to digest.
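A minimal JavaScript sketch of that pattern. The input string is an assumption pieced together from how the example is described out loud:

```javascript
const text = 'cool <beans> whatever';

// "<", then one or more characters that are not ">", then the closing ">".
const masked = text.replace(/<[^>]+>/, 'XXX');

console.log(masked); // cool XXX whatever
```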
>> James Halliday: So here we're just using the negation of that. It's also useful if you wanna, for example, if you wanna match where some quotes begin and where the quotes end.
>> James Halliday: Then we can modify this so that we're doing quotes instead. And it's the same idea.
>> James Halliday: So oftentimes you might have, for whatever reason, some strings quoted for whatever reason.
[00:12:07] But this is a handy way to do that. It gets more complicated if you wanna address things, like if you have a backslash quote that you don't want to show up. Maybe we can do that at the end, once I've covered kind of all of the rules for how to build RegExs.
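The same idea with double quotes might look like the sketch below. The sample sentence is invented, and this simple version does not handle backslash-escaped quotes inside the string:

```javascript
const quotedText = 'she said "hello there" twice';

// An opening quote, one or more non-quote characters, then the closing quote.
const unquoted = quotedText.replace(/"[^"]+"/g, 'XXX');

console.log(unquoted); // she said XXX twice
```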
>> Speaker 2: That top line there, can you just verbally say what you're doing there again.
>> James Halliday: Yeah, so all right. So we have the string, cool and then beans is inside of angle brackets, right? Open angle bracket, close angle bracket. And then some other fluff at the end. So what this RegEx says, is it says, okay, RegEx engine, I want you to find starting from a left angle bracket, once you've found one of those, here's what you gotta do.
[00:13:00] After the opening angle bracket, search for this character class. This character class means anything that is not a right angle bracket. And then it says, one or more of those kind of characters. So basically, step through that pattern, which is like anything that's not a right bracket, I want you to keep going, keep going, keep going until you hit one and then stop.
[00:13:31] And then I want you to match that right angle bracket that you stopped on. And that whole pattern, that you found RegEx engine, I want you to replace that with the string XXX which is what it does. So I can maybe walk through a couple of cases where it wouldn't do that, like if there's no closing angle bracket, it won't replace, because the RegEx engine will actually go all the way to the end of the string and it never found a closing angle bracket.
[00:14:01] So it's not gonna replace it. It could also be that the opening bracket is at the end of the string, same deal.
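To make the non-matching cases concrete, here is a short JavaScript sketch. The input strings are assumptions chosen to hit each case, and when nothing matches, `replace` returns the string unchanged:

```javascript
const anglePattern = /<[^>]+>/;

// No closing bracket: the class runs to the end of the string and ">" is never found.
const unclosed = 'cool <beans whatever'.replace(anglePattern, 'XXX');
console.log(unclosed); // cool <beans whatever

// Opening bracket at the very end: nothing is left for the class to match.
const trailing = 'cool beans <'.replace(anglePattern, 'XXX');
console.log(trailing); // cool beans <

// Empty brackets: "+" needs at least one character between them.
const empty = 'cool <> beans'.replace(anglePattern, 'XXX');
console.log(empty); // cool <> beans
```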
>> James Halliday: Does that help? Are people kind of getting how this little machine works? Cuz we are building little machines here with this RegEx language. Okay, cool.
>> James Halliday: So it takes a lot of practice to kind of internalize these, but very powerful.
[00:14:36] I use them all the time. I personally vouch for them. [LAUGH] Okay, so we used a couple of these already in some of the other sections, but there are a bunch of shortcuts you can use for different character classes. There are word characters, whitespace characters, and digits, which are \w for word, \s for whitespace, and \d for digit.
[00:14:59] And there's also the negations, which are capitalized versions of those shortcuts. So if you have \W, it means anything that's not a word character. So it's the same as doing each of these character classes. So here we have, for words, A through Z, capital and lower case, and digits 0 through 9 and also underscore.
[00:15:20] But then the negation just puts a caret inside of the character class as the first element. And it does that for all of these different things.
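As a sketch of that equivalence in JavaScript, each shortcut can be compared with a long-form class on a sample string. The sample and the long form used for `\s` are assumptions. JavaScript's `\s` also covers some Unicode whitespace that the ASCII class below leaves out, so the two only agree on plain ASCII input like this:

```javascript
const sample = 'user_42 logged in\tat 09:15';

// Each pair: [shortcut, long-form character class].
const pairs = [
  [/\w/g, /[A-Za-z0-9_]/g],
  [/\W/g, /[^A-Za-z0-9_]/g],
  [/\d/g, /[0-9]/g],
  [/\D/g, /[^0-9]/g],
  [/\s/g, /[ \t\r\n\f\v]/g],
  [/\S/g, /[^ \t\r\n\f\v]/g],
];

// Stripping matches with either form should leave the same remainder.
const allAgree = pairs.every(
  ([shortcut, longForm]) =>
    sample.replace(shortcut, '') === sample.replace(longForm, '')
);
console.log(allAgree); // true

// The negated shortcuts keep only the named kind of character.
const wordCharsOnly = sample.replace(/\W/g, '');
console.log(wordCharsOnly); // user_42loggedinat0915
const digitsOnly = sample.replace(/\D/g, '');
console.log(digitsOnly); // 420915
```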
>> Speaker 2: Can you mix that? Can I start, I don't know if you'd wanna do this. But can you start with positives and then obviously put a caret in and anything after that is a negation?
[00:15:41] Or would you not do it that way?
>> James Halliday: No, the negations can only be at the start of the character class.
>> Speaker 2: Got it.
>> James Halliday: So you probably have to have a couple of different character classes if you wanna do something like that. Okay, so these are just kind of the stuff you get for free in almost any engine, or you might have to, like, toggle extended mode, something silly like that, to get it.
[00:16:00] But if the engine you're using doesn't support these, you can always use the more long form version of these rules.