
Lesson Description
The "Creating the Data Converter" Lesson is part of the full, Hard Parts of AI: Neural Networks course featured in this preview video. Here's what you'd learn in this lesson:
Will creates a "converter" which processes the data points or features from each person to determine if a refund is approved or denied. The conditions or boundaries within the converter and crafted from the observed examples in the current population.
Transcript from the "Creating the Data Converter" Lesson
[00:00:00]
>> Will Sentance: So, these are all things that could matter. Here we go people, let's get building. We've got to explore the nature of the data, know your data and how to convert it. One, select sample of historic refund requests. Each has data on, there's a whole database of it, background of the request.
[00:00:16]
Account history, request size, IP patterns, we saw a list of things that might matter. And then it has a corollary data set. It has,what would be the word, like a associated data set. It has a parallel data set of the decision to whether the request was refunded, yes or no.
[00:00:36]
Got it? Okay, pick the background info that matters. There's a bunch of it. Let's pick some that we think really matters. We'll go into in deploying this model. Shouldn't say the name of it yet, but deploying this in time, we'll decide exactly which, what do we call these things?
[00:00:54]
Anyone know what these inputs are called? Anyone know they're called features? So we'll decide which of these features we want to actually use in making our prediction, in making our converter, sorry, making our converter that will be used for making our decision, making our prediction. Okay, so let's get to it, people.
[00:01:17]
I'm going to, on this side, it wouldn't be a session of hard parts. If we were not to have a good delineation on our board. This side here, people, we are going to go about creating our converter. All right, up here we're gonna go about creating our converter.
[00:01:39]
Here it is, create converter. Excellent, and let's start with good old Ben. Ben is a very noble customer. Actually, we've got to pick our features, haven't we? So let's pick our features. I don't know, amount feels pretty important, right? Shout out to Maria for that one. I think account length feels pretty important and what a surprise the other one I suggested would get chosen accounts on IP there we go.
[00:02:13]
So Ben's a very good customer Ben how long have you had do you think your door dash or Uber Eats account?
>> Ben: Four years.
>> Will Sentance: Four years, sounds good to me. And your refund request was?
>> Ben: What?
>> Will Sentance: What do you think? You're like, don't get it wrong, yeah, no.
[00:02:32]
>> Ben: They didn't send me my Indian food, I don't know what-
>> Will Sentance: So maybe you had a big meal, it's like a, I don't know, $80. How about that? That's not that big, I guess, on DoorDash, right? And how many accounts on your IP address you've got, you and your partner maybe, or whatever.
[00:02:47]
So two, okay, good. The refund we know by the customer service rep was given one, excellent. People gonna drag this out. Next up, Joe. Joe, Joe, bad actor. How long have you had your DoorDash account, Joe?
>> Joe: Six years.
>> Will Sentance: No, Joe, you're a bad actor.
>> Ben: How long have you had your DoorDash account?
[00:03:14]
>> Joe: Six weeks.
>> Will Sentance: I get the feeling it's more like six hours. So I think we're gonna go for 0.01 years. And how much are you trying to refund, Joe?
>> Joe: $25.
>> Will Sentance: Joe, you are a really, really bad criminal, [LAUGH].
>> Joe: [LAUGH]
>> Will Sentance: Joe is trying to refund 500.
[00:03:34]
I love how pure you are, Joe. Even though, you know you can't even cosplay as criminal. And you've got not that many accounts on the IP, but you're trying it, you've got seven, okay? And this led the customer service rep to determine no refund, refund, minus one, because honestly, Joe.
[00:03:52]
The examples you gave otherwise, would be like a very unpleasant, customer service rep who didn't give you a refund for your six year account. A $25 request like that was quite harsh. So, Joe, you're a bad actor. All right, let's have Kayla. Kayla's, just very reasonable. Kayla, maybe, what do you think?
[00:04:16]
How many years?
>> Kayla: Five years.
>> Will Sentance: Five years, sure. Maybe you're just refunding 45 bucks and I don't know, you've got one account on your IP, no problem. And so no problem, Kayla got their refund back. Maria has, let's say, I think, a year and then Maria maybe was trying to refund 200 and let's say two IP accounting IP, and they gave Maria a refund.
[00:04:52]
Because why wouldn't they? Very reasonable. Maria seems fab. And so there we go. We have our sample with the background features things, that we think matter, features being the fancy word. Background info that we think matter, select your features. Now, people, we have to, and blue is going to be my creation process.
[00:05:16]
We have to somehow develop. Some set of rules, I'm gonna call it a converter, some set of rules that successfully convert, really, 482 into the number 1. Different numbers, 0.01, right? Joe, 507 into what output? What's our target here? Joe, what do we wanna make sure these kind of we make sure our rules know to take these numbers into getting a what?
[00:05:54]
>> Joe: Neg.
>> Will Sentance: Negative one exactly. Kayla, these numbers, 5, 45, 1, we want to be able to convert them into what number, Kayla?
>> Kayla: One.
>> Will Sentance: One, exactly. And Maria, your numbers into what number?
>> Maria: One.
>> Will Sentance: One, exactly. So that's our target output. The target output we want to achieve with our converter is respectively 1, -1, 1, and 1.
[00:06:22]
Because if we can work out the rules to convert these numbers into 1, these into minus 1, these into 1, these into 1. Then for Paul, whatever Paul's numbers are, we can apply those rules to and generate out a 1 or a- 1, and make a prediction. For whether Paul should get a refund or not based on the core assumption that this sample is representative of the overall population.
[00:06:50]
Okay, so what rules might we have? Let's turn to Nick. Nick, Ruff, actually, everybody make it really hard for the editing. What do we think we want our let's say, years to be greater than if we are to be greater than this, we will have an output of 1?
[00:07:15]
That is to say, we'll give a refund. What roughly Nick would you say might be some a good boundary for our years? You gonna be too good, that's a problem. Okay, what might be a good boundary for our years? Everyone shout out a number, cuz I need one of them to be the one that we choose.
[00:07:33]
What
>> Ben: 1?
>> Will Sentance: What 1? Yes, years greater than one feels very good. Watch out. [LAUGH] Good stuff, exactly. Well done, Ben. I feel a bit bad for Maria, potentially. But what about dollars? Spend less than what, Kayla, do you think, to be able to give a refund?
[00:07:52]
Maybe $200? Why do I keep talking over? I'm so sorry, Kayla, less than what?
>> Kayla: $200.
>> Will Sentance: $200, nice one, Kayla. And then accounts on IP, let's say, less than what people? We don't want Joe's to pass, do we? So what do we think, people? Less than what?
[00:08:13]
>> Ben: 3.
>> Will Sentance: 3, sounds good to me, amazing. All right, there's our set of rules, people. Let's apply them to each person starting with Ben, you got an account longer than a year?
>> Ben: Yep.
>> Will Sentance: Yep, dollars less than 200 in your refund request?
>> Ben: Yes.
>> Will Sentance: Accounts less than three on your IP?
[00:08:34]
Therefore, what?
>> Ben: 1.
>> Will Sentance: Therefore,1, so our converted output is 1. And people, this may all seem like aha, I got it. But this is gonna be the foundation of a building out by the end of the workshop neural networks with gradient descent. And all is gonna be an extension of even this fraud detection tool.
[00:09:00]
So Joe, years greater than 1 for your accounts.
>> Joe: No.
>> Will Sentance: I'm so sorry, Joe. Yes, no, exactly. Dollars less than 200, Joe?
>> Joe: No.
>> Will Sentance: No, accounts on IP less than 3?
>> Joe: Nope.
>> Will Sentance: No, therefore, we will say else -1. There is, okay, it's looking good.
[00:09:22]
Looking good, our converter's doing really well for our sample so far. If it works for our sample, the assumption, hopefully with a larger sample than four, the assumption is it will work for our population as a whole. Okay, Kayla, you're up. Years greater than 1?
>> Kayla: Yes.
>> Will Sentance: Dollars less than 200 in your refund request?
[00:09:43]
>> Kayla: Yes.
>> Will Sentance: Yes, accounts less than three?
>> Kayla: Yes.
>> Will Sentance: So do we have our converter, output? A one or a minus one for Kayla?
>> Kayla: One.
>> Will Sentance: One, well done. Okay, Maria, years greater than 1?
>> Maria: No.
>> Will Sentance: No, okay, okay. Dollars less than 200?
>> Maria: No.
[00:10:08]
>> Will Sentance: No, council IP, yeah.
>> Maria: Yes.
>> Will Sentance: So we know that this was meant to or should because the door dash customer service rep historically gave refund one. We know that this converter should return out one for our sample. Right now, it's returning out what for Maria?
>> Maria: Negative one.
[00:10:27]
>> Will Sentance: Negative one, exactly. It has been an adjustment for me, people, doing this workshop, realizing that in America. I love that even years later I discover new things that I didn't know that it's not minus 1 but negative 1 there you go there's a there's a lesson for all of us.
[00:10:47]
So our accuracy is right now out of four what is it Maria? What's our accuracy out of 4 right now? This one is correct. This one is what, Maria? Yeah, well done, exactly. Well done Maria, 3 out of 4. That's not good enough for this tiny sample. In this sample, we can definitely get 4 out of 4.
[00:11:09]
So what do we do? We tweak our converter. Maria, convert this. Let's assume we'll keep it at greater than, not like go to greater than, or equal to, what could we tweak this to?
>> Maria: 0.5.
>> Will Sentance: 0.5, sure, yeah or if you just wanna just make it work what I mean?
[00:11:24]
Just make it work, we could tweak it to 0.9, right? Yeah, but just make it work. Yeah, brilliant. So now Maria is passing here, 200 rather than, I'm gonna be a bit cheeky here. I'm gonna say it doesn't matter, you don't pass that one. Let's add another rule, we'll learn the name of these rules in a second, but add another rule, maybe 2 of 3 must be true.
[00:11:51]
So if one of them isn't true, still give out a one. So now Maria, account greater than 0.9 years, dollars less than 200. No accounts less than 3, yes, 2 out of 3, true, yes. So now Maria is outputs 1, and we have successful conversion and our accuracy.
[00:12:15]
Maria is now?
>> Maria: 4.
>> Will Sentance: Yay, [LAUGH] good stuff. Let's give Maria a hand.
>> [APPLAUSE].
>> Will Sentance: Well done, Maria.
>> [APPLAUSE].
>> Will Sentance: Maria's helped fix our converter. We've identified the rules, the boundaries, the max or min values for our background data. This background information on each request that convert as many of the sample correctly as possible to their labels.
[00:12:39]
1 or -1 a best fit, we fitted this set of this converter for our sample.
Learn Straight from the Experts Who Shape the Modern Web
- In-depth Courses
- Industry Leading Experts
- Learning Paths
- Live Interactive Workshops