
Lesson Description
The "Batches & Attention Masks" Lesson is part of the full, Open Source AI with Python & Hugging Face course featured in this preview video. Here's what you'd learn in this lesson:
Steve discusses how neural networks require input sequences to be the same length, which is achieved by padding shorter strings with special tokens. Attention masks mark real content with ones and padding with zeros, letting the model ignore the filler when processing, which enables batching and accurate comparison between sequences.
Transcript from the "Batches & Attention Masks" Lesson
[00:00:00]
>> Steve Kinney: There are some challenges, though, because you're doing the math between two different strings to see if they relate. And this is true if you do, like we talked about, that retrieval-augmented generation, which is: hey, I want to take all my data and turn it into vectors, numbers.
[00:00:17]
And I want to, on a prompt, go figure out which of my data I want to add onto the prompt so I can get a response back from ChatGPT that's more related to my stuff. The problem with a lot of the algorithms is that they're expecting mathematical sets of numbers that are the same length, so what do you do when you have two strings that are not the same length?
[00:00:44]
You just make them the same length, and you do that by filling them up with nothingness. It's kind of like in JavaScript, if you do new Array(10), it's weird because you get some weird undefined things and then you've got to map over it, it's not important. You can make arrays of nothingness, and effectively what you do in this case is use a special token, that pad token at the bottom. Say we had "the cat sat" and "the cat sat on the bed".
[00:01:12]
You would just fill in the shorter sentence with the pad tokens, and then you'd have two that were the exact same length, and now you can do all of the math that you need to do to figure it out. In which a liberal arts major discusses math: you do all the math you need to do. But effectively, for our purposes, that is accurate, and so basically we just add nothingness onto the end.
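To make the padding idea concrete, here's a minimal Python sketch with made-up token IDs (the numbers are purely illustrative, not any particular tokenizer's vocabulary). The shorter sequence just gets pad IDs appended until both are the same length:

```python
PAD_ID = 0  # assumption: we're pretending 0 is the pad token's ID

seq_a = [7, 12, 9]            # "the cat sat" (hypothetical token IDs)
seq_b = [7, 12, 9, 4, 7, 31]  # "the cat sat on the bed"

target_len = max(len(seq_a), len(seq_b))

def pad(seq, length, pad_id=PAD_ID):
    """Append pad tokens until the sequence reaches the target length."""
    return seq + [pad_id] * (length - len(seq))

print(pad(seq_a, target_len))  # [7, 12, 9, 0, 0, 0] -- same length as seq_b now
print(pad(seq_b, target_len))  # [7, 12, 9, 4, 7, 31] -- already the longest, unchanged
```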
[00:01:37]
And then, a lot of times, we have what's called an attention mask, which is, let's say you had a string of tokens and it needed to be 10, but it was only 3, 4, 5 tokens long. You'd also have a mask, where however many tokens of real, actual content would be ones, and then the padding would be zeros.
[00:01:56]
So you can figure out how much of it to ignore when we actually go deal with it again and decode it. And so we have a way to normalize everything, and then we have a cheat code to go back to what it actually was, to denormalize it.
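Here's the same sketch extended with the attention mask and the "cheat code" back. This is just illustrative Python, not any library's API: ones line up with the real tokens, zeros with the padding, and filtering on the mask recovers the original sequence:

```python
PAD_ID = 0  # assumption: 0 stands in for the pad token's ID

def pad_with_mask(seq, length, pad_id=PAD_ID):
    """Return the padded sequence plus a mask: 1 for real tokens, 0 for padding."""
    n_pad = length - len(seq)
    return seq + [pad_id] * n_pad, [1] * len(seq) + [0] * n_pad

padded, mask = pad_with_mask([7, 12, 9], 6)
print(padded)  # [7, 12, 9, 0, 0, 0]
print(mask)    # [1, 1, 1, 0, 0, 0]

# The "cheat code" back to what it actually was: keep only positions where the mask is 1.
original = [tok for tok, keep in zip(padded, mask) if keep]
print(original)  # [7, 12, 9]
```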
[00:02:09]
And we'll see this in a second, we'll see it on a slide, but we'll also see it in action. We're doing the game where we talk about the concepts, get a little confused, and then see them in practice, where we already have some flags planted in the ground.
[00:02:20]
>> Student: Is the reason that they're all the same length because you're feeding them into a neural network?
>> Steve Kinney: Yeah.
>> Student: And it's just, like, you have this many inputs, so you have to give every input something.
>> Steve Kinney: Exactly. And since it's comparing the relationship of two strings, they need to be the same length.
[00:02:33]
And so in this case, if we had one where these are all the actual tokens, in this case I think the period counted. I did this on purpose, I definitely measured it out, I forget my rationale on it, and you had two pads on there. And because of the beginning-of-sentence token and then the separator, you end up with the special tokens in there too, so you're like, that's three words, plus beginning and end, five.
[00:03:01]
You would end up with ones for all the tokens we actually care about and zeros representing the ones we don't actually care about. We just, again, had to have equal-length vectors to pass into the neural network. And so those are our initial tokenizations. We've got a tokenization playground that we're going to play with before we get too excited and talk about Transformers.
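Here's roughly what that looks like with a real Hugging Face tokenizer. I'm assuming bert-base-uncased here; any checkpoint with a pad token behaves similarly. With padding=True, the shorter sentence gets [PAD] tokens, the [CLS] and [SEP] special tokens count as real content and get ones in the mask, and the pad positions get zeros:

```python
from transformers import AutoTokenizer

# Assumption: bert-base-uncased; any checkpoint with a pad token works the same way.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = tokenizer(
    ["The cat sat.", "The cat sat on the bed."],
    padding=True,  # pad the shorter sequence up to the length of the longest one
)

for ids, mask in zip(batch["input_ids"], batch["attention_mask"]):
    print(tokenizer.convert_ids_to_tokens(ids))  # [CLS] ... [SEP], then [PAD]s on the short one
    print(mask)                                  # 1s for real tokens (including [CLS]/[SEP]), 0s for [PAD]
```

The model uses that mask to skip the pad positions when it processes the batch, which is what makes batching sequences of different lengths work.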