Complete Intro to Databases

HyperLogLog and Streams

Brian Holt

SQLite Cloud

The "HyperLogLog and Streams" Lesson is part of the full, Complete Intro to Databases course featured in this preview video. Here's what you'd learn in this lesson:

Brian introduces HyperLogLog, a probabilistic Redis data type that he compares to Bloom filters, which trade a small margin of error for speed and a low memory footprint. He also explains that streams are useful when a source is continually adding data and something else needs to subscribe to updates of that data.


Transcript from the "HyperLogLog and Streams" Lesson

>> There's two more data types that I just wanna talk about, just at a very high level. I'm not gonna show you how to operate on them, but I left links in the docs here so you can go check out how to use them. The first of them is HyperLogLog, which I just love the name.

So it's often abbreviated HLL. So in another one of my courses, the four semesters of computer science part 2, I talk about a thing called Bloom filters. So Bloom filters are these kind of things where you can add things to them and they're very fast for lookups, but they have a margin of error.

This is great for extremely large data sets where you have a tolerance for false positives. So HyperLogLogs are actually that same idea implemented inside of Redis. It doesn't use Bloom filters, it uses something called HyperLogLog. I don't know why they're called that. Did I put that in there?

No, I didn't. So anyway, you can probably go search through the docs. It's another way of doing these kind of Bloom filter type of ideas. And the nice thing about HyperLogLogs versus Bloom filters is that Bloom filters can have very high margins of error, up to 30, 20%, something like that.

HyperLogLogs have like a 0.8% margin of error. So let me just give you an example of when you might use something like this. There's a great article from Medium on how they used Bloom filters to decide when to recommend a new article to a user. They'll use a Bloom filter to say, should I show this, or should I recommend this article to this user?

And they'll use a Bloom filter to determine whether or not they should show that to the person. Whenever a user has read an article, they don't wanna recommend that same article to that user again. So what they do is they add it to this Bloom filter, or in this case, they could add it to a HyperLogLog.

And sometimes it's going to tell them, this user has read this article when in fact that user hasn't read that article. But the consequence of that false positive is that that particular user won't be recommended an article that they haven't read before, which is fine. Because they'll just get recommended another article that they definitely haven't read before.
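The "has this user read this article" check described above can be sketched with a small Bloom filter in plain Python. This is a minimal illustration rather than Medium's or Redis's implementation: the class name `BloomFilter`, the method names, the 1,024-bit size, the three hash functions, and the `article:123` style ids are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: a fixed-size bit array plus k hash functions."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One byte per "bit" keeps the sketch readable (a real one packs bits).
        self.bits = bytearray(size_bits)

    def _positions(self, item: str):
        # Derive k positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical usage: one filter per user, holding the articles they have read.
read_by_user = BloomFilter()
read_by_user.add("article:123")

print(read_by_user.might_contain("article:123"))  # True: it was added
print(read_by_user.might_contain("article:456"))  # never added, so usually False
```

An added item always comes back as present, so an article the user has read is never recommended again. The only possible error is the false positive from the transcript: an unread article whose positions all happen to be set already.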

It's kind of a strange thing. But the trade-off here is that by having some tolerance for that false positive, you're able to go way faster and have a much smaller memory footprint. So, that's what that is. If it seems a little weird, that's because it is weird. Someday in the future, you might have that moment where it's like, this is the perfect time, I've been waiting my whole damn life.
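To make the memory trade-off concrete, here is a minimal Python sketch of the HyperLogLog idea. It is not Redis's implementation: the class name `TinyHyperLogLog`, the use of SHA-256, and the 1,024 registers are assumptions made for illustration. One detail worth knowing when reading the docs is that Redis's HyperLogLog commands (`PFADD`, `PFCOUNT`, `PFMERGE`) estimate how many distinct items have been added, rather than answering whether one particular item was added.

```python
import hashlib
import math


class TinyHyperLogLog:
    """Estimates the number of distinct items using a fixed set of registers."""

    def __init__(self, precision: int = 10) -> None:
        self.p = precision
        self.m = 1 << precision        # number of registers (1024 by default)
        self.registers = [0] * self.m  # memory use never grows past this

    def add(self, item: str) -> None:
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        index = h >> (64 - self.p)              # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


# Hypothetical usage: count unique visitors, where every visitor shows up twice.
visitors = TinyHyperLogLog()
for _ in range(2):
    for n in range(10_000):
        visitors.add(f"user:{n}")

estimate = visitors.count()
print(estimate)  # an estimate near 10000, not an exact count
```

The 20,000 calls to `add` only ever touch the same 1,024 registers, and repeated items leave them unchanged. That is where the small footprint comes from, at the cost of an answer that is only approximately right.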

I wanna use a HyperLogLog. Or that moment may never come, and I'm sorry that you won't have such a glorious moment with HyperLogLogs. Okay, so that's HyperLogLogs. The other thing is Streams. These are much newer to Redis and I have not personally used them. But it's basically a way you can do pub/sub, so publish/subscribe, with Redis.

You can subscribe to a key where something can be publishing new things to that key. And then using a Stream, you can just be continually reading things out of Redis. So basically you can subscribe to like a read-only Stream and get keys out of it. So the example that they use is like logging, right?

So if you're logging something from your app service, you can log it to Redis. And you can have something else subscribing to that continual logging, to that Stream. So another good use for that would be like reading from an IoT device, say you have like a temperature sensor that's getting a new reading like every three minutes.

You could be subscribed to that temperature update through a Stream in Redis. I don't know if that's how I would implement it. Obviously I haven't so I can't really tell you but that is a data type that is available to you in Redis as well.
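The temperature sensor idea can be sketched in plain Python as an append-only log that a reader polls for anything newer than the last entry it saw. This is an in-memory stand-in, not a Redis client: `TinyStream`, `add`, and `read_after` are hypothetical names, and the integer ids are a simplification. In Redis itself the corresponding commands are `XADD`, which appends an entry to a stream, and `XREAD`, which returns entries after a given id.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Entry = Tuple[int, Dict[str, str]]


@dataclass
class TinyStream:
    """An append-only list of (id, fields) entries, oldest first."""

    entries: List[Entry] = field(default_factory=list)

    def add(self, fields: Dict[str, str]) -> int:
        # Ids simply count up from 1 in this sketch.
        entry_id = len(self.entries) + 1
        self.entries.append((entry_id, fields))
        return entry_id

    def read_after(self, last_id: int) -> List[Entry]:
        # Ids are 1..n in order, so everything after last_id starts at index last_id.
        return self.entries[last_id:]


# Producer side: the sensor appends readings as they arrive.
temps = TinyStream()
temps.add({"celsius": "21.5"})
temps.add({"celsius": "21.7"})

# Consumer side: remember the last id seen and ask only for newer entries.
last_seen = 0
for entry_id, reading in temps.read_after(last_seen):
    print(entry_id, reading["celsius"])
    last_seen = entry_id

temps.add({"celsius": "22.0"})
new = temps.read_after(last_seen)
print(new)  # [(3, {'celsius': '22.0'})]
```

Because entries stay in the log after they are read, a second consumer with its own `last_seen` could replay the same readings from the start.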
