Complete Intro to Computer Science Graphs
Transcript from the "Graphs" Lesson
>> So this data structure is ends up being one of the more useful ones and it's used a lot. A graph is basically just a tree that doesn't have a root, right? So you're gonna have a node, and that node is gonna be connected to lots of other notes, right?
[00:00:16] Whereas trees we have this sort of hierarchy right where you have like a root and it has children and those have children's Graphs are more other similar to that, but they don't have hierarchy, right? You have this graph or this node and this node and this node, and they're connected to each other.
[00:00:33] But not any one of them is going to be the parent. Right, they're just connected. So a really good way of thinking about that would be your friends on Facebook, right? You have this person, and they're all connected to each other via friendships, right? But they're not necessarily have any sort of hierarchy, right?
[00:00:54] There's no one friend that owns the other friend. I don't know, maybe you have a weird friend group, my friends don't own me. Like in the Facebook example we can have a node and the node can be represented as a row in a database, right? So I could have 100 people that exist in my friends database right?
[00:01:11] And then I probably have like a separate table, that would probably call something like edges. So an edge is typically what you would call the thing that connects to different things. And an edge describes a relationship. So if I have person here, person here, and then I have an edge between them, that edge would be the friendship, right?
[00:01:31] But, you can imagine there might be multiple different kinds of edges, right? So in a previous course that I taught on front of masters called the complete intro to databases, I talk about a database called Neo for j which is a graph database. And, the example that I given that is you have movies, actors and directors right?
[00:01:52] Directors are connected to movies via, they are the director, right? So that's the name of the edge, there's directed. Actors are connected to movies via acted, right? So an actor, its relationship to a movie is that it acted in the movie, so. As you can see, you can have different types of edges as well.
[00:02:13] Another interesting kinda caveat, there is a Facebook friendship is bidirectional, right? So if I add you on Facebook, you have to approve me adding you on Facebook, and therefore, we become friends with each other, right? So that relationship exists in both ways. I am your friend, you are my friend.
[00:02:32] Let's consider Twitter for example, where I just follow you. So that's unidirectional, that I follow you but you don't necessarily have to follow me back, right? So that would be an example of a unidirectional edge. Cool. So that kind of just describes at a very base level of like what a graph data structure is.
[00:02:52] And a graph database like Neo for j is just a very large graph data structure. If you're interested in that, I highly suggest taking my course the complete intro to databases. I talked a lot about it. It's kind of fun. I think it's approachable to anyone. If you're this far in this course, you can definitely take that course and be okay.
[00:03:12] I'm gonna assert that I think this course is harder. Graphs are everywhere. So we talked about social networks. That's kind of the easiest one to think about, but your Internet of Things, right. So all of your various devices that talk to each other, those are almost always represented as graphs.
[00:03:29] Neural network, machine learning libraries like TensorFlow, or pytorch. Those have tons of graphs in them. Everything that we're kind of just modeling in terms of like sensors, and mapping the virtual world or sorry, mapping the natural world to the virtual world. Just everything is a graph, right? Everything is just describing a relationship between two different objects.
[00:03:54] So, let's talk about social network cuz I think that's kinda of an easier one to conceptualize. We're going to be tracing a virtual made up social network. That's definitely not LinkedIn. So the idea is that you have a bunch of co workers that are connected to each other, right?
[00:04:14] And each one of those co workers has a title, right? So this person is a programmer, this person is an architect, right. And they have various different connections that are on this social network. What if we wanna do some sort of analysis, right? So let's say, I want to find out the most common job title in my social network, right?
[00:04:36] I'm connected to five people, and I wanna find out how many are the most common job title, and not only my connections, but my connections, connections, right? So how would you go about thinking about doing that, right? So let's say I have me here, and I wanna find out the most common in two degrees, right?
[00:04:57] So I wanna find out what Maria's, Bob's, Alice's and Sally's are, right? I'm not connected to Alice, so I have no edge between me and Alice. And I'm not connected to Sally. I'm just connected to Bob and Maria. Well, I don't wanna analyze who Maria's connected to, or sorry, who Alice is connected to and who Sally is connected to, I just wanna analyze these four people.
[00:05:21] Cuz Alice and Sally could be connected to a lot more people. Well, let's think back to the the algorithms that we learned earlier in this course. If I did a depth first traversal of this social network, I would not only analyze Sally's, I would analyze Sally's connections, and that person's connections, and that person's connections because I'd be going deep first, right?
[00:05:45] So it doesn't make any sense for us to do any sort of depth first traversal. But what if we did breadth first traversal, and just limited how many degrees we went, right? With breadth first traversal, we can say, I wanna look at the first layer, or the second layer, or the third layer, right?
[00:06:02] And we can look at just the people in those kinda connections. So we're actually gonna use that exact same algorithm, breadth first traversal. And we're just gonna apply to a graph where there's no root. But everything works exactly the same, right? We're still gonna use a queue and all that kind of stuff.
[00:06:16] So when you're looking at the problem here, maybe take a gander at the code that you wrote for your breadth first traversal, cuz it's gonna be pretty similar. So, if I wanted to find out. All these people's job titles, I would me to the queue first, right? Just like a breadth first traversal.
[00:06:41] Then I would dequeue me and I would add my job to the tally, so program manager. Then I would queue my connections, Bob and Maria. Then I would dequeue Bob, and add Bob's job to the tally, Bob to designer. And then I would queue Bob's connections, Sally and Alice.
[00:07:02] Then I would dequeue Maria, cuz she'd be the next person in my queue here, so I queue her so I'm connected to Maria. And that Maria she's also a program manager. And then I would queue Maria's connections. So, that would be just Alice, right? But here's the thing.
[00:07:22] I already queued Alice because I'm connected to Bob, and Bob is also connected to Alice. So I would have to make sure that I ignore Alice or she'd get double count, right. So that's one thing about graphs is whereas trees, I can guarantee this branch doesn't connect to this other branch, right?.
[00:07:39] So I can be guaranteed that all these nodes that are being entered into my queue are unique. That's not necessarily true in a graph, right? I can be friends with you, you can be friends with that person, right? And also others of my connections can be friends with that person without me ever being friends with that person.
[00:07:55] So we don't have any sort of guarantee, right? So, that's just one necessary concern that you should concern yourself with is like, hey, make sure I don't add duplicate people to my queue. Okay, so after I do that and after I've processed brought Bob and Sally, I've actually already done my first iteration one degree of separation, right?
[00:08:19] So if I was just doing one degree of separation, then I'd already be done. out it would be one for loop, right? And then if I wanted to do a second degree, I would just do through that for loop again, and I would just enter the queue again.
[00:08:34] And, that's how you kind of do these various different degrees of separation. So does anyone have any questions about how to process a graph?
>> So quick question about duplicates. I guess they would need to have some sort of unique value, right? In this case, that would be Alice.
[00:08:54] So we would check if this is already in our array, we just ignore it, right? We just keep it?
>> Yeah, so the question is it's like, what's a good mechanism for detecting duplicates in this. In the code that, cuz this is actually the problem that I'm gonna have you solve here in just a second.
[00:09:14] All of the people have an ID. And I just add that ID to a set and I say, hey, if this is already in my set, don't add it to the queue. That's a pretty elegant way of handling it, but it's up to you. You can handle it.
[00:09:29] There's a lot of ways to to handle that. It's a good question. Cool, and again, feel free to pop over here to the Complete Intro to Databases. Down here there is these whole section on graphs. We talk a lot about connection and edges, and how to do this at super large scales, right?
[00:09:56] How does Facebook manage its social graph, those kind of things. So, definitely take a look at that. Okay, so question?
>> Yeah, question about graphs. How do you limit the amount of loops when you don't know the node to stop at?
>> How do you limit the amount of loops when you don't know the user to stop that.
[00:10:21] Okay, so that's a good question. Let's go take a look at traversals here, and the solution here. So we're gonna be looking at this one, the iterative one. So we have the while (queue.length), okay. So one degree of separation is everyone that I know, right? So what I can do to start this queue here, say skip here, so don't do anything weird.
[00:10:58] I have this queue that I'm making right now. So if I'm just going to go for just one degree of separation, all I have to do is say const. My connections = me.connections right? Then I would just say my connections.length. Bam, this would just be one degree of separation, right?
[00:11:24] Now, if inside of this particular while loop. Let's go back and say this is queue now. And here inside of my while loop here, what I end up doing is I say, alright, queue.push, connection.connections, right? So my connection's connections, let's just call this person. So, and I'll just say this is a const newQueue.
[00:11:59] Or let. Okay? And I'll put some actually new queue. So what I'm gonna be doing when I do this new queue.pushperson.connections. What I'm doing, alright this is actually not be pushed but can cap, whatever it's fine. This is basically preparing the queue to process again, right? And then instead of having a while loop here, you're just going to have a for loop.
[00:12:36] That's going to go for only so many loops, right? So by just tracking how many times you're queuing up the users, or how many times you're processing your connection's connections, that's how you can limit how many degrees of separation you're gonna go. But basically, when you're processing one degree of separation, you're getting ready to process the next one by queuing up all of their connections, until eventually get to the point is like, okay, I've now processed three degrees of separation.
[00:13:06] I'm gonna start processing that right, despite the fact you actually have queued up a fourth degree of separation process. Hopefully that kinda gives you some indication. That's how I did it. I think it's as far as I know the only way to do how to do it. And then if that still doesn't make sense, then we'll come back and we'll do it together and we can talk through it.
[00:13:31] Right, let me unbreak my breadth first traversal.