Full Stack for Front-End Engineers, v3 Security & Hashing
Transcript from the "Security & Hashing" Lesson
>> So when it comes to creating a server, or an application, or anything really, you have to have a way to log in. Generally, most people use username and password, that's pretty well established. Pretty low friction, we're all pretty used to it. But what's the problem with using a username and password?
[00:00:19] Anybody, throw it out.
>> Someone else has the credentials, they can get in.
>> Yeah, someone can steal your credentials very easily, what else?
>> Someone could guess them or steal them, change them on you.
>> Yeah. What else?
>> It can be sniffed over the wire?
>> They can be sniffed, yeah.
[00:00:43] Yeah, also, you can lose them pretty easily, you can forget your password. And humans tend to do this, so we all tend to fall back to the same set of patterns. So we're talking about computer science here in software engineering, but really we're talking about human behavior. In human behavior, we're very predictable creatures.
[00:01:02] In fact, these passwords here to the left are the most commonly used passwords on the Internet. Password is still the most commonly used password, it's incredible. We all know, we all hear about hacks, we all hear about losing credentials, things like that. Yet, humans are such that we are so against changing our own behavior that we do things that are counter to our benefit.
[00:01:30] It's really incredible honestly. So, yes, there's a question.
>> A couple of people online said they can easily be forgotten, and then one person said you can't use them with the command line.
>> Yeah, you can use them with the command line. You wouldn't want to, there's better ways if you're already on command line.
[00:01:53] There's better ways to do it, but that's good thinking. So username and password, probably not the best idea to secure our future billion dollar application on DigitalOcean. So we can use biometric login. The problem with biometric login, it's not super portable, not all devices support it. Actually, I'd say most devices don't support it.
[00:02:12] You have to enact its own standards. It's pretty much not standard right now, that is changing slowly. I would say in ten years, we will see a lot more biometric authentication, cuz it actually just makes sense. A lot simpler using your thumb, or your face, or something like that, but we're not there yet.
[00:02:28] So what we're gonna use is, we're gonna use an SSH key. SSH is one of the more powerful concepts when it comes to server authentication, probably the most powerful. I won't say you can't go wrong with an SSH key, but it's really hard to do things badly.
[00:02:47] And no one's ever gonna guess your SSH key ever. I won't say ever, quantum computing, who knows? But we'll say, in our lifetimes, it is unlikely your SSH key will ever get guessed, unlike your password123, which I've probably already guessed your password. And we talked about why we don't use passwords, even a long one is pretty easy to crack.
[00:03:12] So even one made of symbols and characters, it's about two years on a basic desktop computer, so something I have at home. Now if I had dedicated hardware, running an entire server dedicated to getting into your password, I could do it in a couple of seconds. What this table doesn't show you is that people, again, are really predictable.
[00:03:33] We don't use the underscore, ampersand, P capital thing, capital Y, and it's like gibberish. We use things that are recognizable to ourselves. So we might use password, but we'll use an @ instead of the A. Again, I probably just guessed some of your passwords right there, and we think we're being smart.
[00:03:50] But the problem is, the people of the world, I won't say bad people. I hate saying bad and good. But the people in the world that are dedicated to getting your password know this. And they know all the possible combinations of password. They know it's password123, capital P, all that.
[00:04:07] So there's not a lot of entropy in the passwords we actually use. The real way to make a password is completely computer-generated. And what a random one looks like is, let's generate a new password here, and I'm gonna say symbols too. So I'm using something called 1Password, a password manager, which I highly recommend.
[00:04:30] If you don't use a password manager, use a password manager today. Don't use LastPass, their security is very questionable. Use 1Password or KeePass, whatever you want. Don't use LastPass, professional tip. If I'm using 1Password, I use it to generate all my passwords. I don't know the password to any of my logins.
[00:04:48] I know one password that goes to 1Password, hence, the name. But this is the way you should do it, because I am not reliable. I don't trust myself. I know that if you ask me to make a password, I will make some version of the same passwords even if I think I'm being random.
[00:05:01] If I think I'm being random, I'm being very predictable. So this is what a real random password looks like. But that'd be a pain to type in, and you're like, is that a T, is that an I, is that an L? So we use SSH keys for logging into our server, because we want it to be as secure as possible.
[00:05:22] More secure than your bank, more secure than your phone, more secure than anything. Because if someone gets a hold of your server, they can do anything they want to it. They can mine Bitcoin, they can use it to launch DDoS attacks against other people. They can use it to serve illegal content.
[00:05:38] They can do all sorts of bad things on your server. And when the FBI comes, it's on you, cuz it's your server. Hence, we don't use usernames and passwords for servers. And we talked about kind of the predictable passwords. There's something in security called a dictionary attack. In a dictionary attack, it takes all of your randomness.
[00:06:02] Like this one with the password, instead of a, you add a 12, that's in the dictionary somewhere, that's really easy to guess. And computers can generate every single possible combination of the word password. So you think you're being secure and random, but you're not. Real randomness looks like that, which doesn't look anything like password.
[00:06:21] But again, humans are gonna do this, you're not gonna remember this. So whenever you think you're being clever and you have like, my password is onomatopoeia564, that's super easy to guess, that's all in the dictionary. The dictionary takes a couple of kilobytes to load the entire English language.
[00:06:36] And it takes milliseconds for a computer to run through all those possible combinations. Again, I'd say I'm not trying to scare you, but I am trying to scare you. You should not rely on passwords for most things you want. If you do, they should be very long and very random, generated by a computer.
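A minimal sketch of what a long, computer-generated password can look like on the command line, assuming OpenSSL is installed. This is only an illustration; it is not how 1Password generates passwords, and the variable name `password` is made up for the example.

```shell
# Ask OpenSSL for 24 random bytes and print them base64-encoded.
# 24 bytes encode to exactly 32 base64 characters, with no padding.
password="$(openssl rand -base64 24)"

# The result is different on every run, which is the point.
echo "$password"
```

Base64 output only contains letters, digits, `+`, and `/`, so a site that demands other symbols would need a different generator.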
[00:06:58] But at the core, really, when we talk about SSH keys, we're talking about this concept of hashing. Does anybody know what hashing is? And not hash browns like you have at Waffle House. What's a hash?
>> Like a one-way algorithm.
>> It can be one-way, it can be two-way too.
[00:07:22] Yeah, anybody else, Mark?
>> Encryption method.
>> It can be an encryption method, it can also not be an encryption method. We do use it for encryption, but that's not all it's used for, that's a good-
>> Verify or validate the information in the file, for instance.
>> That's a great use case.
[00:07:45] Yeah, verifying and validating the file is actually intact. Yeah, so hashing is an amazing way of storing a larger amount of information in a small bunch of characters. So what we're doing is we're taking some data, which is just a string of bits and bytes. We're running it through a mathematical function called a hash function.
[00:08:05] And what's output is called a hash. And there's different kinds of hashing functions. So if you want a simple definition of hashing, you take some sort of input, you run it through some math, and you get output of numbers, and letters, and characters. That's probably the easiest way of putting it.
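The "input, some math, output" description can be sketched on the command line. The example below assumes OpenSSL is installed and uses SHA-256 purely as an illustration; the variable names are made up. It shows that a short input and a longer input both produce a digest of the same length.

```shell
# Hash a one-character input and a much longer input with SHA-256.
# awk '{print $NF}' keeps only the last field, which is the hex digest.
short_hash="$(printf 'a' | openssl dgst -sha256 | awk '{print $NF}')"
long_hash="$(printf 'a much longer piece of input data than before' | openssl dgst -sha256 | awk '{print $NF}')"

echo "$short_hash"
echo "$long_hash"

# Both digests are 64 hex characters, regardless of the input size.
echo "${#short_hash} ${#long_hash}"
```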
[00:08:21] And there's different types of hash for different ends. So the first hash that probably most people may have heard of, or maybe not, I don't know, I'm kind of a nerd, is called MD5. MD5 used to be what we used as a cryptographic hash. So we take your password, and we say, here's your password, we're gonna run it through MD5, and then now it's a hash of random numbers.
[00:08:46] The problem with that, and we learned this very early on, in the early days of the Internet, is MD5 is very predictable, it's not long enough. So you can take your password, and we run it through MD5, and we're going to get this number. Actually, let me just show you.
[00:09:05] Why am I talking? I'll show you this is all built into your operating system. This is what's great about Unix or Linux-based systems. So I'm gonna say clear. Oops, may kill my server. First, Ctrl+C, if I got it still running. Say, clear, we can say temp, it's fine.
[00:09:23] So I want to hash my password, but how do I do that? I need to write it to a file first. So I can write a file, or I can be really lazy and shortcut. So I'm gonna say echo password, and I'm gonna use double arrow. And we'll talk a bit more about redirection later in the course.
[00:09:46] But double arrow essentially means append to a file, and I'm just gonna call the file foo. [LAUGH] I forgot when I was playing around, I made a directory called foo. Okay, you probably won't run into that, all right, so there we go. There's foo, we can cat foo, we see password's in there.
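A small sketch of the append redirection described here. It works in a temporary directory (an assumption added for the example) so that nothing named `foo` exists beforehand; if `foo` were a directory, as happened in the lesson, the shell would refuse to write to it.

```shell
# Work in a throwaway directory so nothing named foo already exists.
workdir="$(mktemp -d)"
cd "$workdir"

# >> creates foo if it is missing and appends to it if it exists.
echo password >> foo
cat foo

# Running it again appends a second line; a single > would overwrite instead.
echo password >> foo
line_count="$(wc -l < foo | tr -d ' ')"
echo "$line_count"   # prints: 2
```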
[00:10:11] So we're gonna use a program called OpenSSL. And remember, if you don't trust me, which you shouldn't, cuz I've got a very, very sketchy face, you can always man the commands that I'm telling you. So again, I want you to know these things and why they exist, not just following me along.
[00:10:28] So OpenSSL is a cryptography toolkit implementing the TLS security layer that follows cryptography standards. So what we're gonna use that for today is, it has all these hashes. It has MD5, it has SHA-1, it has SHA-256 in there. So let's play around with this a little bit, we'll have a little fun.
[00:10:49] Again, I'm a nerd, so my idea of fun is maybe a little different from yours. So we want to hash our foo file with the word password in it. So we're gonna say openssl, and the hash we're gonna use is md5. And then, the file we're gonna use is called foo.
[00:11:07] And we're gonna pipe this to something called awk, A-W-K. And again, don't take my word for it, let's man awk. Awk is a program built in, I'll just read it, pattern-directed scanning and processing language. Essentially, you can take an output, you can parse it to make it look like some other output.
[00:11:28] So output in Linux using standard out is usually column-based. So we can use awk to pick those columns of data, and then output that. So we can do a lot of things on the fly. When you see people who are really good at Unix in the command line, they'll be awk, sed, grep, they'll do all these commands.
[00:11:46] And you're like, what did they do? It's magic. They somehow took this giant log file and outputted just the things they care about. And again, that's the power of being really good at these tools. But anyways, we won't go too deep into that, oops. Jump back to where I was, clear.
[00:12:06] So we were on openssl, and we chose MD5, we chose foo as the file. And we're gonna pipe, that bar is called the pipe. So we're just taking all the output from OpenSSL in this command, and we're gonna pipe it into the next command. So it's awk, we're gonna take the file.
[00:12:25] And we're gonna say, this one I'm using my notes, this is not a natural thing I do every day. So we're gonna use awk, and we're gonna say, and I know the syntax is a little strange here. So we're gonna say we're gonna print, and we want the second column from the output there.
[00:12:42] So we're gonna say number $2, and then we're gonna close that off. Oops, I put in too many here. And let's see what this says, awk: no program given. Doing something wrong here, no program given. So let's just try looking at the output first just to make sure we're getting that far.
[00:13:07] Okay, md5 foo, so what we want is the second output here. Actually, we actually won't need it, I will figure this out later. So we're running that foo with the word password in it through MD5, and this is the output that we get. But you know what the problem here is.
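For reference, a sketch of the pipeline the lesson was reaching for. The "no program given" message generally means awk did not receive a program as its argument, which often comes down to a quoting slip; the exact command typed on screen is not in the transcript, so this is an assumption about the intended form. It also assumes `openssl md5` prints the digest as the second whitespace-separated field, which is the usual `MD5(foo)= <digest>` layout.

```shell
# Recreate the lesson's file in a throwaway directory.
workdir="$(mktemp -d)"
cd "$workdir"
echo password >> foo

# Full output, typically of the form: MD5(foo)= <32 hex characters>
openssl md5 foo

# The awk program is passed as one single-quoted argument.
# $2 is the second whitespace-separated field, here the digest itself.
digest="$(openssl md5 foo | awk '{print $2}')"
echo "$digest"
```

Because `echo` adds a trailing newline, the digest is for the word password followed by a newline, not for the bare word.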
[00:13:28] This output, this hash, it looks kind of random, but are we all getting the same hash? Are we all getting this number? Yeah, so maybe MD5 isn't the best hash. Because I can run through all the common passwords, and I can MD5 hash them, and then I have now what's known as a rainbow table.
[00:13:47] A rainbow table is the output of hash files that map to a specific word or phrase that's really commonly used. Again, computers can do this really, really easily. So there are rainbow tables for probably every word in the English language, MD5, every password combination, it's already there. So we don't use MD5 anymore, it's very easily breakable.
[00:14:07] Again, we always arrive at the same results, so not very good.
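The idea of precomputing hashes for common passwords can be sketched as a plain lookup table. This is a simplification: real rainbow tables use chains of hashes to save space, while the example below just stores every pair. The helper `md5_of`, the four-word list, and the variable names are hypothetical, and OpenSSL is assumed to be installed.

```shell
# Hypothetical helper: print the MD5 hex digest of its first argument.
md5_of() {
  printf '%s' "$1" | openssl md5 | awk '{print $NF}'
}

# Precompute "digest word" pairs for a tiny, illustrative word list.
table="$(for word in password 123456 qwerty letmein; do
  echo "$(md5_of "$word") $word"
done)"

# A "leaked" hash: MD5 of the string password (no trailing newline).
leaked="5f4dcc3b5aa765d61d8327deb882cf99"

# Reverse the hash by looking it up in the precomputed table.
cracked="$(printf '%s\n' "$table" | grep "^$leaked " | awk '{print $2}')"
echo "$cracked"   # prints: password
```

The lookup works only because the same input always hashes to the same output, which is exactly the weakness described above. The digest here differs from the one for the lesson's `foo` file, since that file also contains a trailing newline.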