Refund Request Filtering

Codesmith

Lesson Description

The "Refund Request Filtering" Lesson is part of the full, Hard Parts of AI: Neural Networks course featured in this preview video. Here's what you'd learn in this lesson:

Will explains that predictive systems require sample data. Using an AI system that automates DoorDash refund request as an example, a sample population is identified and potential data points the system can use are established.

Join Now or Start a Free 5-Day Trial

Preview

Transcript from the "Refund Request Filtering" Lesson

[00:00:00]
>> Will Sentance: So in this workshop, we'll build out the full mental models of prediction, model development, deployment, so that you can be part of it. But folk, it all starts with predicting whether our DoorDash refund request is legit or not, so that we can automate that decision rather than back and forth chat with a customer service rep.

[00:00:25]
You've seen this maybe in your DoorDash or in your Upwork, no, well, [LAUGH] that's my favorite app. Your DoorDash or your Uber Eats, where you can no longer go and have a chat with a customer service rep and say, hey, my food did not show, can I get a refund?

[00:00:42]
But instead you can say my food did not show and automatically you get a refund. They ain't giving that to everybody, there's something about the characteristics of your request that's enabling that. How is that automation happening? To automate that decision rather than back and forth chat, how? By using a sample.

[00:01:03]
Now folk, [LAUGH] it wouldn't be [LAUGH] a workshop without whiteboarding so here we are back on our blackboard here, back to our familiar blackboard, and we are going to get building a sample of historic refund requests. So that is previous refund requests made that are gonna each have two sets of data.

[00:01:33]
One, a bunch of background info on each request. What sort of things might that be? We're gonna see a bunch, some sort of details about the request, maybe, Kayla give me one right now, detail on the request that might be interesting. How many dollars? Excellent from Kayla, exactly.

[00:01:54]
The background info on the refund request and an associated data set, absolutely crucial, the decision from the customer service rep whether or not to have given the refund. That's gonna be our historic data, that's gonna be our sample. Let's see, let's start here, folks, here we go. We're gonna have a sample, yay.

[00:02:21]
Of some refund requests, some historic, refund requests that are themselves, a subset that means some of, a total population. Population doesn't mean people, it means all possible of this type of thing, in this case, refund requests. Population of, this is one of only two fancy symbols we're gonna have in all of this otherwise very math heavy domain of machine learning, and that is the upside down A.

[00:02:58]
Raise your hand if you've seen this symbol before. Hey, it just means all. So [LAUGH] it's just a shorthand for all, but start using this in your slack messages. That's right, hi, all, people are gonna love you for that. You're gonna love the person who uses that symbol.

[00:03:19]
You'll be so popular, you will say, wow, down to earth, what a down to earth member of our team. All possible refund requests, including future ones, okay? In our sample though, because that's gonna hopefully provide us the background, the material to maybe predict or generalize to the rest of our population, including future refund requests.

[00:03:50]
But we have to start with a sample. So who should we start with? We will start with.
>> Will Sentance: I don't know who I can get. Ben, here we go, Ben. Ben, when did you do your refund request?
>> Ben: Yesterday.
>> Will Sentance: Okay, November 24th, great. And we know Ben is a very good, noble DoorDash user, so did Ben get a refund?

[00:04:20]
You bet Ben did. Absolutely, Ben got a refund, we'll say, yes, one. I feel bad [LAUGH] now, [LAUGH] given that I've- Okay, what about Joe? What do we think of Joe? Joe's great, but, Joe, when did you make your refund request?
>> Joe: November 28th.
>> Will Sentance: No, this is 2024, sorry, November 28, is that in the future.

[00:04:46]
>> Joe: November 18.
>> Will Sentance: That's in the future. [LAUGH Okay, so, Joe made their refund request in october 24, there we go, that'll do. And Joe, what do they call this month? They call this the month of fraud. Joe [LAUGH].
>> [LAUGH]
>> Will Sentance: DoorDash has your face on that- you know where they print out your face at the Bodega?

[00:05:10]
>> [LAUGH]
>> Joe: Yeah.
>> Will Sentance: DoorDash has, so Joe, did you get a refund?
>> Joe: No, I did not.
>> Will Sentance: Not a chance, [LAUGH] -1, no refund, absolutely, there we go. Ben, refund, yes, Joe, refund, no. This is our sample of all possible refund requests. In reality, maybe we have 100,000 in our sample, but for now, just Ben and Joe.

[00:05:44]
Here's the goal, if we can figure out, there's a bunch of background information on Ben's refund request, the size of the refund request, but maybe even thousands of different pieces of detail, pieces of data, thousands of it, I'll do a little symbol of the database, they got a whole database on Ben.

[00:06:13]
Joe, [LAUGH] they got a whole database on Joe. There's all the background info on Joe's refund request. If we can work out some rules, some instructions, some boundaries, maybe maximums or min-, some way of converting the background info on Ben's refund request to the decision refund one on that historic refund from Ben, and Joe's, the background Joe's refund request to the decision refund minus one, no refund if we can work that out, we can potentially create some sort of set of rules.

[00:07:00]
Some way to convert, let's call it a converter, which I know can be spelled with an Or, I checked, with an OR or an [LAUGH]. Some sort of converter, then we can potentially use that converter on the rest of the population, meaning, and by population, I don't mean people, though, per say, in the way it is here, but the rest of the refund requests.

[00:07:28]
Meaning that when Paul in December of 24, we're talking next month, the future, makes a refund request, we have a whole bunch of background info on Paul and their refund request and we can potentially use that converter to automate the decision to make a prediction. If the sample is representative, has the same sort of patterns in it, of what is associated with a refund in background data, what's associated with we shouldn't give a refund in background data, if we can find those patterns and apply them also to convert the background data on Paul to a decision of what would it be, what would be the decision here Kayla?

[00:08:25]
We got a refund of what two choices do we have?
>> Kayla: One or negative one.
>> Will Sentance: One or negative one, right, yes or no, then perhaps we can make a prediction of one or negative one, that is to say yes or no on giving the refund to Paul, and we can do it automatically.

[00:08:45]
No customer service rep required. People, that is it, that is it. If we can work out a way, a set of rules, instructions to convert the info on each of the requests into yes or no, one or minus one, that we have, right? We've got two parallel data sets here, a bunch of info on Ben, a bunch of info on Joe.

[00:09:05]
And in parallel, sort of corollary data sets, contiguous, I don't know what these words mean, [LAUGH] I think I know what they mean. Refund, one, refund, minus one, what the customer service rep gave, then for our population, all possible, including future refund requests, as long as that sample mirrors the population, as long as that sample is representative of the population of all future refund requests, the associated data of Ben, typically with a refund one, Joe, typically with a minus one and 100,000 of those, maybe then for Paul, we can apply that same conversion and make a prediction, the right decision, one or minus one.

[00:09:46]
Maria, then I'm gonna come to you, what info on each request might actually matter? And I realize I'm a spit squished down here, but I'm gonna put a few of them down here.
>> Maria: So the name of the person, the-
>> Will Sentance: Yeah, and I don't know if we should be inferring too much from name, but we could.

[00:10:04]
>> Maria: To pick up the background data, right?
>> Will Sentance: Yeah, absolutely, yes. I thought, what things might be important to determine whether or not to give a refund, the name might allow us to find all the background information, definitely. I don't know if we want to use the name to make our prediction.

[00:10:21]
Those Marias, don't get me started. Let's put this then, I think for now, here. So we have, let's put it in because it's good to know that this could be something we could consider, let's have name, What other things about the request might be useful? I understand Maria's- yeah, absolutely, date of the request.

[00:10:42]
What other things?
>> Maria: Amount.
>> Will Sentance: Amount, absolutely, that feels more fair than name, [LAUGH] but I understand absolutely that Maria was suggesting name, as in to go and get the information on their profile, absolutely, yeah, anyway, excellent. Really good stuff from Maria there, any more you wanna add?

[00:11:04]
>> Maria: Maybe items.
>> Will Sentance: Yeah, absolutely, maybe the types of items. Yeah, actually that's a really good one. I've not heard that one before, and that's a spot on one, isn't it? Types of items, excellent from Maria, let's have, yeah, please Joe.
>> Joe: What about description of the refund request?

[00:11:22]
>> Will Sentance: Yeah, exactly, the text string of the refund request spot on, yeah. Nick, sorry Nick.
>> Nick: Frequency of refunds, like-
>> Will Sentance: Yes, absolutely frequency of refunds, yes absolutely, Ben?
>> Ben: And if he has any historical refund?
>> Will Sentance: Historical refund, maybe historical spend, absolute daily spend, yeah, please Jamie.

[00:11:42]
>> Jamie: How about time since the original purchase.
>> Will Sentance: People who've actually worked in fraud detection, always bring this one, I'm not saying maybe Jamie has but that is absolutely such a great predictor that doesn't always come to mind, which is time since event, time since it's happening. Other things, Kayla, anyone's come to mind?

[00:12:14]
>> Kayla: Which restaurant?
>> Will Sentance: Yeah, maybe delivery driver, so yeah, these are all really good. Yeah, these are all really good people. There's so many great options on there. What else we got? We got amount, account length, how long you've been a member, right? How long you've used, so let's say account length.

[00:12:34]
I always got another one in mind actually, so I could be in years perhaps. I always got another one in mind which is number of accounts on that given IP, okay? Accounts on IP, let's see, accounts on IP. If you're trying to vote manipulate on, if any of you have tried that, [LAUGH] if any of you are trying to boost any of your content, you'll know that's a key one that gets you spotted.

Learn Straight from the Experts Who Shape the Modern Web

In-depth Courses
Industry Leading Experts
Learning Paths
Live Interactive Workshops

Start a 5-Day Free Trial