Transcript from the "Hashing with Salt" Lesson
>> So next, the next thing that came along was something called SHA1. SHA1 is much more rigorous, and we'll look at the difference here. So we'll say OpenSSL, we'll use SHA1, and that's quite a bit longer. So the MD5 hash was about 33 characters or so. SHA-1will give us 41, that's a lot more entropy when you talk about cryptographic hashing.
[00:00:27] But again, you probably all got the same output as me. So again, you feel good implementing these things. You're like, I hash the passwords using SHA1. But it's again, I can create a rainbow table of all these things already very, very easily, and it stores well in memory.
[00:00:44] So we're gonna use what is now currently the golden standard, which is SHA256. And your output there is gonna be the same as mine cuz we're all using the same password. We're all using the same hashing algorithm. However, it becomes a lot more difficult to implement a rainbow table when the sizes are so big.
[00:01:05] So MD5 is 33 characters, SHA1 was 44. SHA256 is 65 characters, that's pretty hard. Now, we're getting to the randomness zone where it's much harder to crack or save all these passwords cuz it will just blow up your memory really quickly. While we're still not there yet, that's still not very secure.
[00:01:27] And this is just a common SHA1 patterns for password, p@ssword, password1, 2, 3. Again, this art exists. I didn't have any of these things. There are people in the world that dedicate themselves to security and cracking security. So what we need is something called a SALT. So the SALT is randomness.
[00:01:50] It's just a random number or hash or it doesn't really matter. That is completely unique to that particular process at the time. So we take the input, we take the SALT which is that random number, then we give it to SHA or any hash function, and then we get what's called the salted hash.
[00:02:05] So now, if we were to take that foo password, we give it some random number, doesn't really matter. And we pass through SHA1, now that password's unrecogniZable. you will never know what it is. And it didn't take that much more work. We're still using cryptographic hashing, but the bit of randomness means no one can ever create a rainbow table of all these things.
[00:02:27] Now I'm spending a lot of time on hashing and cryptographic functions because this is where you see a lot of mistakes breaking down. You see companies that get hacked and they store their passwords in plain text. They don't hash any of them, why, like why? I just demonstrated why you shouldn't do that and I demonstrated that in five minutes, yet people still do it.
[00:02:47] People still use MD5 as hashing, so when you're creating a web application, you never store your password in plain text. I think most people know that but you still see it. People say, I've hashed it using MD5. We just explained why that's not security there because people are bad at passwords.
[00:03:03] And you know what I've also seen? Well, I'll just MD5 it ten times. That way, I'm hashing them hashing and hashing and that's totally secure. Computers can do that, too, really, really easily. And it takes very little processing power to guess that. The only secure solution is to have a strong cryptographic function that's long and hard to crack with a random SALT sort for every single user.
[00:03:28] That way, even if someone cracked your password database, it's just gibberish cuz they don't know the SALTs. They can't reverse that cryptographic hash. I know, that's a lot on hashing, why it's important. It's kind of roll today. But security is one of the most basic things that we fail on and we fail on and we fail on.
[00:03:49] And as increasingly as more of our infrastructure is built on top of these web applications and servers and processes you. We have to remember, security is the number one thing we care the most about. Yes, question.
>> What's the advantage of SHA1 over MD5 if they both give the same output for the same input?
>> The same output, for the same input.
>> Meaning they both essentially do the same thing.
>> So all hash functions will always give you the same output for the same input. It's a mathematical function. Why we use a SALT is you can't reverse it. And that's what makes a cryptographic hash versus just a regular hash function.
[00:04:30] A regular hash function, you can reverse it if you know what the input was. Or if you knew hash function they use, essentially, you can reverse that and run it backwards. That's not true for every hash function, but MD5 is particularly pretty easy.
>> Where do we get the SALT, or what is typically used for the SALT?
>> It could be anything, math.random, another. Math.random applied to a hash function, which will give you another hash. Combine those together and you solve or you run those against the hash function that will give you something completely random. Unguessable, unbreakable with modern technology, and it doesn't take that long to implement.
[00:05:12] What's important is don't do the thing. Don't roll your own security. Don't roll your own hash function. I should have started with that. Do not write your own hash function. It's easy to say like, I'm gonna bit shift this and do this and add in this magic number.
[00:05:26] It seems like, how hard can it be? And you have something that looks random, but it turns out this stuff is called really hard. I think, if I'm remembering right, it's called elliptical curve cryptography. So what that means is generally given a hash function, if you run it enough times, you'll start seeing a curve of the it looks random, but it's actually no.
[00:05:48] Because all mathematical functions follow some sort of linear break. So I took an entire semester of hashing or not hashing but kind of think about elliptical curve, think about randomness, how difficult randomness. I took an entire course in my degree, because randomness is actually a very, very hard problem.
[00:06:06] Humans aren't random, computers aren't random. It's all predictable. If you follow that curve long enough, you can figure out what the pattern is, and you can reverse it. Two questions.
>> What about Bcrypt?
>> Bcrypt, I want to say bcrypt uses, let's look it up. What hash does Bcrypt use?
[00:06:31] Blowfish cipher. Yeah, Blowfish is pretty common. I try remember what Bcrypt is use in, but SHA256 is the main one I know most about. SHA1, you see, MD5 is useful. Someone pointed out earlier for validating the contents of a file. So instead of saying, hey, John, I sent you an email.
[00:06:53] Did you get it? I'm like, how do I know it's from you? Or how do I know what that file was? You can run MD5 on it and that's gonna give you a hash function that's about 33 characters or so. Doesn't matter how big the file is, MD% is always gonna collapse it down to a set of characters.
[00:07:09] And then you can say, here's the MD5 of that. I can validate on my end, and that way, I know that the file was not tampered with at all, which is pretty cool. Again, that's a really common use case of validating files and saying, what I sent you is what you got.
>> Picking up on that previous SALT question, you mentioned that you could generate a salt with math.random. But you wouldn't wanna do that every time because it has to be the same SALT that's always used. So that when you're comparing the password with the ones locking in, the SALT of the same way to compare against the hash so you need to store the SALT somewhere, correct?
>> Exactly, yeah, you need to store the SALT in the database or something so it knows what to hash against. But that database is usually encrypted and usually itself is hashed in some form. So it's just layers and layers and layers. If someone really wants get into your system and they got root access, there's easier ways of getting passwords.
[00:08:07] They can just sniff them over the wire, in this case, but yeah, you're right. You'd have to store the SALT somewhere, otherwise, you can't apply it later down the line. I have really good thing on MD5. Yes, MD5 is still useful. You don't use it. It's not considered cryptographically secure.
[00:08:26] Again, we explain why, it's pretty easy to crack. I can create every pulse combination using MD5, but it is good for validation of files with the wire. Sometimes you see open source projects, things like that. They'll say, here's the file, here's the MD5 hash, and just compare it on your end, too.
[00:08:44] So you run MD5 and you can validate, the file is what they say it is. Security or what's the right term for it? Security chain software, no, the actual word escapes me. But the idea of you need to secure every single step of your software from when you download it to when it's running on the server is really important.
[00:09:07] We hear people about hijacking NPM libraries, things like that. And you're saying like, hey, I'm installing left pad. Well, how do you know? How do you know I can just take that domain and I created my own version of left pad which does the functionality? But also has this little other functionality that would sniff your traffic and sends it back to me.
[00:09:23] You don't know if I've tampered with the file, but if you use something like MD5, you know the file is what? Here's what the author wrote. Here's what I have on the other end. So there is something called now we're getting into it a little bit. And if you space out for a minute, that's totally okay.
[00:09:40] There's something called collisions and something with MD5 is pretty short at only 33 characters. It's possible to create a collision, so I can create an identical file or I create a file that's different than your file but it hash in such a way that matches your MD5. Again, because MD5 is not very complex.
[00:09:57] With SHA1 collisions are extremely rare, I don't know if it's ever happened. SHA256, they're pretty much impossible. So again, there's pros and cons of everything. MD5 is pretty quick, SHA256 takes longer but one's more secure than the other. Jem, you spent a long time on hashing. We talked about some of the use cases.
[00:10:17] Where else do you see it? Cryptocurrencies, it's all about hashing. Bitcoin uses SHA256, you solve the puzzle, you run the hash, you get a prize called a bitcoin. We, as humans, decided that bitcoins are worth something, hence the foundation of modern capitalism. Or in where are we at, post capitalism?
[00:10:36] Is that the right term for where we are today? Doesn't matter. Ethereum uses something called Keccak-256. You may hear that referred to as SHA3. We don't use SHA3 because it's not considered cryptographically secure. But Ethereum uses a version of that. But don't say it's SHA3 cuz the Ethereum people get annoyed, but that's really what Keccak is.
[00:11:00] Anybody heard of Ransomware? I'm sure you've heard. What is ransomware?
>> Software that gets maliciously installed on in a computer system that encrypts all the data on that system.
>> Yeah, and how does it encrypt all that data?
>> Yeah, exactly, so instead of taking your 100 Gigabytes of files and try to put them in another directory and put a password around it, I'll just take all those files and run it through my hash function.
[00:11:31] And that way, I just have this giant block of text that represents all of your data but you now can't get to it cuz I've deleted the original copy and only I know the key to unlock it using my SALT in the hash function. And that's what Ransomware is, and that's why it's particularly malicious and it's not easy to crack.
[00:11:47] Because as again, this is cryptographically secure hashing, you're not supposed to crack it. Ransomware is a blight, it will never go away. And if I can go on a rant here, there is a very strong correlation between the rise of Ransomware the rise of cryptocurrencies. Cuz I can pay so anonymously, hence, where these things exist.
[00:12:09] But crypto is a very interesting, I don't know it's pushing the bounds of what we know about, one, cryptography, two, distributed computing. Interesting topic, another day.