
Lesson Description
The "Boundaries in a Decision Model" Lesson is part of the full, Hard Parts of AI: Neural Networks course featured in this preview video. Here's what you'd learn in this lesson:
Will explains that the term for the converter is a "model". This model represents a decision tree in which a computer would evaluate each condition to reach a prediction. The process of Machine Learning and AI is then generalized to include tools for finding the best model, minimizing the effort it takes the model to reach a prediction, and ensuring the sample reflects the overall population.
Transcript from the "Boundaries in a Decision Model" Lesson
[00:00:00]
>> Will Sentance: So, if you're finding this all very trivial so far, we are building out here the foundations that will allow us to implement gradient descent in a neural network to make predictions about classifying images. We'll be able to, by the end of the session today, be able to speak to pre-processing our data to ensure it has appropriate dimensionality for our production model to ensure accurate inference.
[00:00:29]
Okay, that sounds pretty fancy. Thumbs if that sounds fancy. That's gonna be so, okay, Joe doesn't think that sounds fancy. Maybe it doesn't, maybe you're all so experienced with this. But that should hopefully be the kinda language you're able to speak to by the end of this session.
[00:00:44]
So if you're finding this rather trite, or maybe you're not, certainly I could imagine either way, know that these are the foundations for all of those fancy things to come. So let's get some names on some of these things so that before we introduce how this links to the domain of machine learning, let's get some names on these things.
[00:01:08]
Maria, no, well, people, raise your hand if you have an idea what this converter might be called as a general term. Joe, Ben, raise your hand if you know what a converter's fancy name is.
>> Kayla: Model.
>> Will Sentance: Model, who said that? Well done, Kayla. Exactly, model, that's it, the set of rules to convert our data, our background data, into the associated label.
[00:01:35]
This is a model, this is it. What type of model is this with a bunch of boundaries, maximums and minimums, above and below? Anyone know what type of model that's called? It could be called a classifier, a prediction. It's actually a particular type of model called a decision tree, and you could say, wow, it's a fairly trivial model.
[00:01:58]
I say trivial in that it's quite like, what is it? Bunch of if else statements, right? And essentially, that is how you can think of a decision tree, except that by being so, quote, trivial, and again, I'm not saying trivial in a bad way, that's a good thing.
[00:02:13]
It's incredibly good for interpretation, and especially when you're deciding whether to give somebody their DoorDash refund, or if their insurance claim is fraudulent. Even more highly sensitive, especially when you're deciding whether someone's insurance claim is fraudulent. You better have the ability to explain to the compliance division in your company why you gave one person the refund, gave one person the payout and another not.
[00:02:45]
And we're gonna see that many models would dream of having the interpretability. That is to say we could look at each rule and go, okay, so the reason this person did is they got years longer than that, dollar spend less than that, and number of accounts on the IP less than that.
[00:03:02]
So that when the head of compliance goes, excuse me, why have we got this pattern of people not getting refunds? It's possible to say, well, let me just show you, for each individual person, why. So for each model, there are gonna be trade-offs, back to Ben's point, of benefits and costs associated with a given model, but this is a model, a decision tree.
[00:03:25]
What are the names of the numbers that go into a model? The numbers or the rules that we may or may not have different values for, which is not a million miles away from the term we use in functions? What are the names of the values, Kayla, that go into a model?
[00:03:42]
>> Kayla: Arguments?
>> Will Sentance: Or the one, I guess, yeah, it's not. What's the other one we used a bit before you put the data in? Parameters, exactly, so how many parameters, Kayla, do we have here? Would you say?
>> Kayla: Four.
>> Will Sentance: Spot on, got four parameters, four things that we are considering, four numbers associated with our converter, with our model.
[00:04:09]
Okay, how many input data values do we have, Paul?
>> Speaker 3: Three or four.
>> Will Sentance: Three, I'd say three. I guess there's a meta one, right, of how many we have, but that's interesting, actually. But we have here 3, and what do we call these things, people? What do we call these input values?
[00:04:38]
Features, spot on, three features, spot on from Ben. Was it Ben or Joe? Yes, three features. Now, by the way, in practice, in a decision tree, very rarely would we have a multi-input parameter, one that's checking multiple things. Typically, a decision tree will go, do this, left or right.
[00:05:01]
And decision tree is if years greater than 0.9 go towards refund, if dollars less than 200 go towards refund, if accounts less than 3 go towards refund, it really does split them out one by one. But how do we go about identifying these boundaries? Well, in machine learning, the machine, the computer, tries different values here.
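The "bunch of if else statements" view of this model can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `predict_refund`, the constant names, and the choice that all three checks must pass before a refund is given are assumptions, while the boundary values (0.9 years, 200 dollars, 3 accounts) are the ones mentioned in the lesson.

```python
# Hypothetical names; the boundary values are the ones quoted in the lesson.
YEARS_MIN = 0.9      # customer must have been around longer than this
DOLLARS_MAX = 200    # order value must be below this
ACCOUNTS_MAX = 3     # accounts on the same IP must be below this


def predict_refund(years, dollars, accounts_on_ip):
    """Check one feature at a time and return (decision, reason).

    Assumption: every check has to pass for a refund to be predicted.
    """
    if years <= YEARS_MIN:
        return "no refund", f"years {years} is not above {YEARS_MIN}"
    if dollars >= DOLLARS_MAX:
        return "no refund", f"dollars {dollars} is not below {DOLLARS_MAX}"
    if accounts_on_ip >= ACCOUNTS_MAX:
        return "no refund", f"accounts {accounts_on_ip} is not below {ACCOUNTS_MAX}"
    return "refund", "all three checks passed"


decision, reason = predict_refund(years=1.4, dollars=120, accounts_on_ip=2)
print(decision, "-", reason)
```

Because every branch returns the rule that decided the outcome, the reason for any single prediction can be read back directly, which is the interpretability benefit described earlier.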
[00:05:34]
The computer tries different boundaries, or combinations of our boundaries, and assesses, for each combination, how many of our sample convert their background info to the label given by a person, the customer service rep. It wants as many as possible. There are so many combinations of parameter values. Again, how many parameters do we have here, Kayla?
[00:06:00]
>> Kayla: Four.
>> Will Sentance: Four, so many combinations of parameter values to try before the machine has learned the best set, the set that maximizes the number of correct conversions for the sample. In a decision tree, the technique that's used actually is check the first one, try different values for it, and see how many convert.
[00:06:22]
Check the next one, and we'll see in neural networks, the technique is using something called gradient descent, but we need backpropagation. We need a bunch of techniques. Okay, here's the fun thing, people, here's some math. Okay, say, we just take the first three parameters, let's say we just have three parameters, okay, how many possible values would we want to check for each?
[00:06:50]
Give me, I don't know, we tried one. Let's say our sample was 100,000. Let's say we're not gonna try every dollar value. We don't think it makes much difference whether it's 199, 198, 197. We're just gonna do tens maybe. And then for these, maybe do this in 0.1, that's about a month, right?
[00:07:06]
So in months, and then accounts on IP, maybe we check 1 through 30, and we know that high ones are really important, how many possible values might we wanna check for years? Just give me a rough number, how many possible values. If not number of years, but possible numbers to check if we're gonna go 0.1, 0.2.
[00:07:27]
How many might wanna check, Maria?
>> Maria: Ten.
>> Will Sentance: Ten, we don't have to go between 0 and 1, right? 0.1, 0.2, 0.3, 0.4, and suppose we do it in 0.1 increments, what do you think maybe? Let's say 50, because that's the number I learned. 50, and then for this, let's say 50 as well.
[00:07:45]
So we go in like, I don't know, 10, 20, 30, 40, up to 500 in increments of 10. That'd be 50 values, right, to check, so that'd be 50. And let's just say we go for 50 on this. Does anyone know the math we do to calculate what the total number of different combinations of parameter values we need to check for those three, where we have 50 possible values would be knowing that we already tweaked one by this to make it work for a sample of four?
[00:08:12]
So we could have any possible combination of those 50 values be the ones that convert the maximum number of our sample. There's 100,000 inputs to our sample, it could be that it's, I don't know, 1.4 years, $120, and accounts on IP 17 that converts the maximum background info to the historic decision, and that maximizes conversion.
[00:08:32]
And assuming the sample represents a population, there's your best converter. Okay, let's say we had 10 things you wanna check, and it would be 50 to the power of 10. That's big, how big is that? Very big, how big is that? [LAUGH] I don't know this now, anyone know?
[00:08:50]
So it's 9 with 16 0s, and that is 9 quadrillion, is that correct? I think that's 9 quadrillion maybe, or is that 0 trillion? That's 9 quadrillion, that's a lot to check. Okay, thank goodness, it's a lot to check. That's a lot to check. If we had 10 different parameters, each we wanna check 50 and all the combinations of it.
[00:09:15]
The number we'd have to check, and then for every single one of them, run our entire sample through and see which converts the most, how many converts. Then try slightly tweaking for the other. Okay, let's write it so everyone gets to see, 8, no, should I write it?
[00:09:35]
No, I'm not writing 9 quadrillion. 9, here we go. That's six, is this good content, people, everyone together? Yes, I mean, there's one more than that, that's a lot. We don't have to check all of those, right, people, not a chance. Well, so let's have a look, there's a lot to check.
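The counting above can be checked with a short calculation. The sketch below assumes 50 candidate values per parameter, as in the lesson, and a sample of 100,000 rows; the variable names are made up for illustration. By this arithmetic, 50 to the power of 10 comes out at roughly 9.8 × 10^16, which in short-scale naming is closer to 98 quadrillion than 9 quadrillion.

```python
values_per_parameter = 50   # candidate boundaries tried for each parameter
sample_size = 100_000       # rows run through the model per combination

# Every combination of boundaries is one candidate model.
combos_three_params = values_per_parameter ** 3
combos_ten_params = values_per_parameter ** 10

# Each candidate model is scored by running the whole sample through it.
row_checks_three_params = combos_three_params * sample_size

print(f"3 parameters:  {combos_three_params:,} combinations")
print(f"10 parameters: {combos_ten_params:,} combinations")
print(f"3 parameters x sample: {row_checks_three_params:,} row evaluations")
```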
[00:09:57]
The discipline of machine learning and artificial intelligence has three major buckets, one, techniques and tools for finding the best converter, the model. Here, it's a decision tree, but there are so many to choose from, support vector machines, different types of regressions, and the one that's really changed the game, neural networks.
[00:10:18]
There are so many approaches to neural networks. This all comes down to finding the best converter that converts our sample, captures the pattern in our sample, converts our sample data. Ben's background info, Joe's, Kayla's, Maria's, have I said that enough times, into the respective decision whether to give a refund or not.
[00:10:40]
And there might be a ton of data. What is gonna convert that? The techniques and tools we use to find that best converter. Two, ways, techniques and tools, to minimize the time and effort it takes to get the converter's details, the model parameters, the ones that work for our sample.
[00:11:03]
It is computationally profoundly demanding to check them all.
>> Will Sentance: So we have to find shortcuts, optimizations. In a decision tree, the optimization is, do not try them all in combination with each other. Just try the first one and see which boundary for the first one works best for the sample.
[00:11:29]
Then if you're happy with it, it's called a greedy algorithm, gets it right at every stage. If you're happy with it, move to the next one, find the best boundary there for your sample. If you're happy, move on to the next one, and you use what's called a purity metric, Gini impurity.
[00:11:42]
We're not here to learn decision trees inside out, but rather that there are techniques for every single model to get to the optimal conversion set of parameters for your sample. And that's a huge part of machine learning and artificial intelligence, because even with incredible optimization techniques like gradient descent and backpropagation, it is still an unprecedented amount of compute power that's needed to do so for, for example, a large language model.
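The greedy shortcut described above, where each boundary is searched on its own and scored with Gini impurity, might look roughly like the sketch below. The tiny sample, the feature names, and the helper names `gini` and `best_boundary` are all hypothetical, and a real decision tree would also split the data after each choice and repeat the search on each side, which is left out here.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels; 0.0 means every label agrees."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_boundary(values, labels, candidates):
    """Try each candidate boundary for ONE feature and keep the purest split."""
    best = None
    for boundary in candidates:
        left = [lab for v, lab in zip(values, labels) if v < boundary]
        right = [lab for v, lab in zip(values, labels) if v >= boundary]
        # Weight each side's impurity by how many rows landed on that side.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (boundary, score)
    return best


# Made-up sample: 1 means the rep gave a refund, 0 means they did not.
sample = {
    "years": [0.2, 0.5, 1.1, 1.4, 2.0, 3.0],
    "dollars": [450, 300, 120, 90, 150, 40],
}
refunded = [0, 0, 1, 1, 1, 1]
candidates = {
    "years": [k / 10 for k in range(1, 51)],    # 0.1, 0.2, ... 5.0
    "dollars": [k * 10 for k in range(1, 51)],  # 10, 20, ... 500
}

chosen = {}
checks = 0
for feature, values in sample.items():
    chosen[feature] = best_boundary(values, refunded, candidates[feature])
    checks += len(candidates[feature])

print(chosen)
print("boundaries checked:", checks)
```

Searching the two features one after the other costs 50 + 50 checks here, rather than the 50 × 50 that trying every combination would need, which is the saving the greedy approach is after.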
[00:12:15]
Final bucket of work, all of this conversion, magic or not magic, set of rules that converts our sample is only interesting, cuz if we capture how to convert our sample from its background info to its respective label, well, we can use that on our population, future refund requests.
[00:12:43]
Where we do not have the decision, we do not have the label, we have missing data, and we can try to predict. We can only do that if the sample is representative of the population. Ensuring that is the case is absolutely vital. And that ties to being aligned with our goals and values, determining what our population even is, what our sample is.
[00:13:13]
Suppose that the customer service reps were not very good at labeling, refund or not. They maybe even had implicit or explicit biases, or no knowledge at all of them, in whether they decided to refund or not. They base it on things that maybe the company doesn't want incorporated, zip code, some things that might even be illegal.
[00:13:36]
You can look at that very much as, well, wait, that means our sample is not actually truly representative of our true patterns of the population as a whole. If we're getting people being determined no refund for reasons that are specific to the person who labeled the refund request.
[00:13:59]
In other words, by labeled here, I mean, said no refund, sorry, then that's not reflective of the population. That pattern there is associating the wrong background information with don't give a refund. So, absolutely crucial, as important as finding the right converter, as important as working out how to minimize the time it takes to get the details of it or the right model.
[00:14:25]
Working out the parameters is ensuring that the sample and population reflect each other. That the sample data is accurate and is aligned with the values and goals of the organization. And in practice, we'll see everything we do in prediction is an extension of this. A sample of data with its labels or patterns, and a population from which we wanna make a prediction based on the insights we have on the sample.
[00:14:52]
So when you have models that developed their parameters to make predictions for, let's say, we'll see later, image generation. Gemini from Google's image generation tool made predictions about soldiers from the 1940s in Germany that did not mirror historical accuracy by any means. What is the failure there? Well, the failure is to define population and sample.
[00:15:25]
If you just want to be able to generate, as we'll see later on, any possible set of pixels, maybe those are pixels that could be generated. If, in this case, the goal was to generate, and the population you wanna capture is all possible historically accurate pixel combinations, images, then it's a failure of that model development.
[00:15:49]
And that final piece of alignment, fighting bias or misalignment, and of course, overfitting, we saw here we might have got a bit too perfect for our sample, we then tried to validate it. If our sample does not reflect our population as a whole, we have a problem.