Check out a free preview of the full Four Semesters of Computer Science in 5 Hours, Part 2 course:
The "Bloom Filters Setup" Lesson is part of the full, Four Semesters of Computer Science in 5 Hours, Part 2 course featured in this preview video. Here's what you'd learn in this lesson:

Brian introduces bloom filters and how they give you a very fast way to make sure something isn't in a set.

Get Unlimited Access Now

Transcript from the "Bloom Filters Setup" Lesson

[00:00:01]
>> Brian Holt: So this is kind of our layout for the whole day. Sections 3 through 7 are really tightly tied together. So we'll kinda be going through those as kinda one unit. But I wanted to start off on something that to me, is an impressive use of a data structure that is not, perhaps it's complicated as the rest of these.

[00:00:22] Then we'll do some searching in an array which is something that was kind of an omission from the last one, so it's kind of me just saying I want to cover that. And the last two are two new sorts because I really like sorting algorithms, I like taking things and putting them in order.

[00:00:38] Maybe that's OCD or something, I don't know, but it's fun. So we're gonna do those last two before the end of the day. Cool, so let's talk about BFilters. Bloom Filters are a data structure and they are peculiar.
>> Brian Holt: The peculiar in the sense that they have two answers.

[00:01:03] The answers they give are no and maybe. [LAUGH] So, what's really cool about this is they are a set or a collection, or they at least act similar to a set or collection. And if you remember from the previous course I said a collection is basically this amorphous cloud of data, right?

[00:01:22] So I put things into it and then later I asked the set, do you have this in the set? And it's gonna say yes or no, right? Bloom filters kind of seek to fulfill that same space but they do it in slightly different way. They do this by-

[00:01:39]
>> Brian Holt: Making it very compact in memory, and very fast to look up, but at the cost of not being able to say definitely something is in the set, or not. The only thing that a Bloom filter can tell you is that something, so the only thing it can tell you is it's definitely not in the set, which I think is what I said.

[00:01:58] So they do this at the trade off, or what you're gaining by losing the certainty is that something is, or the Bloom filter is very compact in memory and it's very, very fast to look up. So let's talk about where this would be useful. There's a great medium article that I linked.

[00:02:16] It's quite long for reasons unknown to me. But the general premise behind it is quite fascinating of how medium, which is a online blogging platform, right? Does recommendations. When you're going and you read an article and you see the recommendation at the bottom, you don't wanna see things that you've already seen before, right?

[00:02:37] If I've already read this article I don't want it to keep showing up on my recommendations. And furthermore, if I go to like five or six different articles and it keeps recommending the same thing. Eventually, I'm not gonna read that article so stop recommending it to me, right?

[00:02:49] So medium uses Bloom filters in this capacity and it basically says has this person seen this particular article too much or how they read it before. And then they'll add it to the Bloom filter once you've read it or once you've seen it too many times. And then later, they'll go back and ask the Bloom filter hey, has this person read this article?

[00:03:09] And it's gonna say, no, this person definitely has not read this article, therefore you can show it, or it says maybe. They might have read it before, and therefore they won't show it. Now, the reason why it's a false positive. Sorry, rather the false negative is acceptable, is that it might ask, sometimes, have you read this article?

[00:03:30] And it's gonna say, this person might have read this article. It's like, okay, well, show them a different article, right? The fact that you're not showing them the article, despite the fact that they actually haven't read it is fine, right? You're just filling it with more different articles.

[00:03:42] So that's kind of the point of these Bloom filters is when you have some sort of tolerance for false negatives, but you have no tolerance for false positives, right? We follow up more or less kind of the premise here? Cool.
>> Brian Holt: So I linked some Wikipedia articles here, if you wanna go check out some more real use cases of Bloom filters.

[00:04:07] The other thing people wondering if Bloom is descriptive, why is it called a Bloom filter, the guy that invented it his last name is Bloom. So that's just good to keep in mind. I was trying to picture flowers, and I don't know, it went to a weird place.

[00:04:21] So picture for a second you have, well, let's just diagram this out actually.