Transcript from the "Flags, Metacharacters, and Quantifiers" Lesson
>> James Halliday: So I talked a little bit about flags. These are how you can configure the RegEx engine. I don't really use any other flags aside from these. I use the first two all the time and the second two pretty rarely. So, if you know i and g, I'd say you're pretty set for the most part.
[00:00:24] And m and s are rather more obscure. Okay, so the main thing that you need to know when dealing with RegEx is these things which are called metacharacters. Some of these characters operate a bit like wildcards, like the dot.
[00:00:46] Others are called quantifiers. So here's an example of how the dot character works. Dot means any character. So you can do things like, if we have the input hello beep boop, then we can replace anything that comes between a b and a p, so any two characters, dot dot, with something else.
>> James Halliday: So if we don't pass in g, that's going to replace the first instance, which should be beep. So it says hello whatever boop. But if we put a g at the end, sed is going to replace every occurrence. So we get hello whatever whatever.
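The dot example above can be sketched in Python's re module rather than sed. This is only an illustration under that assumption: Python has no g flag, so count=1 stands in for leaving g off, and the default (replace every match) stands in for adding it.

```python
import re

text = "hello beep boop"

# "b..p" means: a "b", then any two characters, then a "p".
pattern = re.compile(r"b..p")

# count=1 stands in for leaving off the g flag: only the first match is replaced.
first_only = pattern.sub("whatever", text, count=1)

# With no count, sub() replaces every match, like adding the g flag in sed.
every_match = pattern.sub("whatever", text)

print(first_only)   # hello whatever boop
print(every_match)  # hello whatever whatever
```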
>> James Halliday: The next thing we can cover, is this idea of quantifiers.
[00:01:47] So quantifiers are a way of specifying how many times something should appear. A question mark means 0 or 1 times, a star means 0 or more times, and a plus means 1 or more times. And there are other ways of doing quantifiers with curly brackets. I don't think I put them in the notes, but I'll cover them as well.
[00:02:14] So here are just some examples that you can try out, and I can explain what each one of these does. So if we have this string, dog and doge, then we wanna match everything that is dog, and it may or may not have an e at the end. So you can use sed -r here to not have to put that backslash, or -E, I guess, on Mac.
[00:02:40] And this will match anything that's dog or doge, cuz the e can appear 0 or 1 times. Similarly, you can match things like here, where the e can appear 0 or more times. So you might have no e's, like you might just have bp, or you might have a lot of e's like this, or you might have 2 e's or 1 e.
[00:03:07] It's all going to match and be replaced by beep. So that works. And then plus is 1 or more times, which means bp is not going to match. But bep and beeeep will match, because there has to be at least one occurrence for the plus quantifier to match.
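The three quantifiers can be tried out with a short Python sketch. The input strings and the replacement text DOGE are assumptions made for illustration, since the exact commands from the notes are not in the transcript.

```python
import re

# e? : the "e" may appear 0 or 1 times, so both "dog" and "doge" match.
question = re.sub(r"doge?", "DOGE", "dog and doge")

samples = "bp bep beep beeeep"

# e* : 0 or more "e" characters, so even "bp" matches.
star = re.sub(r"be*p", "beep", samples)

# e+ : 1 or more "e" characters, so "bp" is left alone.
plus = re.sub(r"be+p", "beep", samples)

print(question)  # DOGE and DOGE
print(star)      # beep beep beep beep
print(plus)      # bp beep beep beep
```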
>> Speaker 2: Question.
>> James Halliday: Yeah.
>> Speaker 2: Is that s slash at the beginning of that string technically a flag, where you would be able to do m slash if it was multiline?
>> James Halliday: For the trailing bit, the part with the g?
>> Speaker 2: No, the part with the s.
>> James Halliday: No, this is specific to sed and Perl, and vim as well, where this just means that we're performing a substitution.
>> Speaker 2: Got it, a substitution.
>> James Halliday: Yes, so that's not a flag. The flags go on the end.
>> Speaker 2: Got it. You could do gm then, for multiple flags.
>> James Halliday: Yes, you could do gm, and also the order of the flags doesn't matter at all. So there are i, g, m, and s, and more depending on what engine you are using.
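As a rough illustration of combining flags, here is a sketch using Python's re module, which is an assumption on the editor's part and not the engine used in the lesson. Python spells the flags re.I, re.M, and re.S and has no g flag, because findall() and sub() already work on every match.

```python
import re

text = "Beep\nbeep"

# i and m combined: ignore case, and let ^ and $ anchor at every line.
im = re.findall(r"^beep$", text, re.I | re.M)

# Same flags in the opposite order: the order makes no difference.
mi = re.findall(r"^beep$", text, re.M | re.I)

# Without m, ^ and $ only anchor at the ends of the whole string.
no_m = re.findall(r"^beep$", text, re.I)

# s lets the dot match a newline too, which it does not do by default.
without_s = re.findall(r"p.b", text)
with_s = re.findall(r"p.b", text, re.S)

print(im)         # ['Beep', 'beep']
print(no_m)       # []
print(without_s)  # []
print(with_s)     # ['p\nb']
```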
[00:04:12] Okay, so if you wanna try things out, the regex.md file should be in the same repo, if you wanna pull up some of these examples. And while all of you do that, I'm just gonna fiddle around on the command line with some stuff, so you can see what I'm up to if you like.
[00:04:33] And if you have any questions online or in person, then just ask them.
>> James Halliday: So if you find any good files or anything else that you would like to do some processing on, column formats are good for playing around with RegEx. I'll see if I have any in my data directory.
>> James Halliday: The CSV files are pretty good. Sweet, I do have one, landsat.csv.
>> James Halliday: It's a very big file.
>> James Halliday: Probably not the best to use as an example though.
>> James Halliday: [LAUGH] Or The CAT by Herman Melville.
>> James Halliday: [LAUGH]
>> James Halliday: If you want to experiment with a certain number of characters, you can use braces for your quantifier.
[00:07:24] So if I want between 2 and 4, I can do {2,4}. Or if you want 2 or more, you can just leave off the second number, so {2,}. So that will match, plus g.
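A small Python sketch of the brace quantifiers follows. The helper matching_lengths is hypothetical and exists only for this illustration: it reports which run lengths of the letter o a pattern accepts as a whole.

```python
import re

def matching_lengths(pattern, max_len=6):
    """Return the run lengths (1..max_len) of "o" that the pattern matches in full."""
    return [n for n in range(1, max_len + 1) if re.fullmatch(pattern, "o" * n)]

# {2,4}: at least 2 and at most 4 repetitions.
bounded = matching_lengths(r"o{2,4}")

# {2,}: leave off the upper bound to mean 2 or more.
open_ended = matching_lengths(r"o{2,}")

print(bounded)     # [2, 3, 4]
print(open_ended)  # [2, 3, 4, 5, 6]
```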
>> James Halliday: Wait, sed, right, needs to be put into enhanced mode to do that.
>> James Halliday: Cool, -E works on Linux, so I'm gonna use that from now on. [LAUGH]
>> James Halliday: Right, so see, in this case it's gonna match four, because RegEx will try to expand out to be the longest sequence that it can. That's called the greediness of the RegEx. There are also non-greedy RegExes, which are kind of an enhanced thing.
>> James Halliday: So it's gonna match 4 and 4, and then there's gonna be a leftover o in this case. So that's why there's an o in the output right here.
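The greedy behavior can be reproduced with a hedged Python sketch. The input of nine o characters and the replacement X are assumptions chosen to give the "4 and 4, then a leftover o" result described above; the on-screen input is not in the transcript. The lazy form {2,4}? is Python syntax and is not available in plain sed.

```python
import re

text = "o" * 9  # assumed input: nine "o" characters

# Greedy: each match grabs as many as allowed (4), so 4 + 4, then one "o" is left over.
greedy = re.sub(r"o{2,4}", "X", text)

# Non-greedy: each match takes as few as allowed (2), so four matches, then one "o" is left over.
lazy = re.sub(r"o{2,4}?", "X", text)

print(greedy)  # XXo
print(lazy)    # XXXXo
```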
>> James Halliday: Yeah, so with grep and sed, unless you pass in -E, these commands are going to require that you escape the metacharacters, except for star, for reasons.
[00:09:15] So it's probably good to just use -E all the time, I would say, unless you wanna keep in your head which metacharacters you have to escape, which is not very useful information.