A Practical Guide to Machine Learning with TensorFlow 2.0 & Keras Natural Language Processing
Transcript from the "Natural Language Processing" Lesson
>> The next notebook which will demonstrate another, let's say technique in natural language processing. That's what NLP stands for. And by the way, if you wanna play with a demo, [LAUGH] this one is pretty interesting. It basically recognize toxicity, meaning that is there an insult or sexual explicit.
[00:00:28] I don't wanna type anything here but you can play with it on your own and see it's actually pretty fun. [LAUGH] And the source code is available on GitHub as well under tfjs models. All right, so get back to different word representations. So the main problem or I guess problem with text is that we need to somehow convert those letters and words into numerical representation, which then can be used by the narrow network for training.
[00:01:06] And one of the ways to actually create the embeddings, remember, I was talking about dimensionality reductions but I was doing it with images. And when we're saying that for handwritten digits, we can probably reduce the dimensionality to only I know 11 degrees of freedom and still it will preserve the most important information.
[00:01:29] With the words, you can kinda do the similar thing. So you can just specify that you have a particular degrees of freedom, and each word will be just a point in this multidimensional space. And the idea is that our neural network somehow might even figure out where to put kinda words with a similar meaning.
[00:01:56] If they are synonyms, they probably gonna be really close to each other. And then relationship, remember I said for instance, almost this kinda Euclidean distance between boy and man can be similar to woman and girl. So embedding, what embedding is this technique, it's basically just by analyzing text, we'll try to assign a particular coordinates to all the words it will meet in this multidimensional space.
[00:02:29] And the thing is it's part of the keras itself. So in layers, you can use embedding and specify how many, well, how many dimensions you want to have, right? And I think that's the how many words we will be providing. But let me quickly just run this code.
[00:02:47] And we can use the imdb data set once again to train our model. So the text in this case, we actually have slightly different data set, right? There is words, this underscore basically means that that's the space so we are looking at the end of the word. But it can be part of the words.
[00:03:11] Yeah, I don't have the, br for instance, it doesn't really have the under words. It's just part of some other words. All right, without going too deep into the details, we can say that I want 16 dimensions in my embedding and my keras will be Sequential, right? So the first layer will be this Embedding so kinda space which will transform the vocabulary into these 16 dimensional representation.
[00:03:42] Then what I want to do, I want just to kinda average, find the average among all of those channels. And finally, just with one Dense layer with the sigmoid, it will tell me if it's positive or negative review. So the idea here I'm actually just training this embedding layer, right, to somehow grasp the representation of different words.
[00:04:09] So let's see how long it will take me to train, one minute. So okay, I can probably just run this code and wait for it. So you can see that loss function for my training actually was decreasing for the training data set and once again kinda starts fluctuating quite a bit even after probably seven epochs.
[00:04:33] So once again, it's indicator of somewhat overfeeding. So if we look at the accuracy, it's not decreasing that drastically. So I can look like it's still fluctuating quite a bit. So I guess I can even stop after probably fourth epoch to have more kinda stable solution, generalized solution.
[00:04:57] All right, is it done? Six, we about half through the training Let's just wait for it to finish. It should take another 20 seconds.
>> Is this sort of training probably the same that Amazon uses? For example, on their ratings, it's like when it says this is the most critical review or this is the most positive review?
>> They use slightly different algorithm. So this dataset particularly is just taking the reviews provided and just saying if it's positive or negative. You're probably talking about recommender system, right, recommendation system or-
>> Just on each product that says here's all the reviews but here's the top most critical and here's the most positive one.
>> Yeah, but I felt they don't use machine learning algorithm for that. I think there's a system of weighted voting, the comments up and down, right? And I think it's actually not using machine learning there. It's just simply showing you the most voted up positive review or something like that.
[00:06:10] But technically yes, you can apply deep learning to recognize the positivity of the comment. But the thing is this system, the one we're considering right now, they are somewhat pretty simple. And yeah, I just want to demonstrate this quickly. So on Google, there is a thing called Embedding Projector.
[00:06:36] What I did right now, I just grabbed our embedding layer, right, our embedding layer was layer number 0. And I simply saved the vector, so those kinda 16 dimensional data and also the words into those two files, vecs.tsv. And I can use Embedding Projector to visualize all of that.
[00:07:01] So right now, it's Word2Vec data set. And I can load my own. So for instance, on the first one, it should be vectors. So I need to go to Documents, practical, and file's vec, should be uploaded and also [INAUDIBLE] the data from here, the one we just graded.
[00:07:28] All right, and that's it, right, Click outside to dismiss. Perfect, so we just loaded our own data set. But it's not, so we had 16 dimensions. It kinda reduced to the most important ones, only three dimensions because we cannot visualize 16 dimensions on my screen. But the most interesting part if we just put the words explicitly, so for instance, what is interesting here?
[00:07:57] I definitely can see that great, beautiful, wonderful on these parts, right? And the opposite side there is unhappy, poorly, terrible. So somehow our system already can separate and assign negative values to those words positive to those, right? And then basically it will influence how we're doing the prediction.
[00:08:19] If you kinda turn this around, maybe we can find some odd there, interesting dependencies, well, grades, perfect. I don't see anything. Excellent, yeah, it seems it basically tried to find the most important words which helped us to distinguish if we're looking at the positive or negative feedback. But the thing is I found some very interesting dependencies.
[00:08:55] For instance, DVD and VHS, [LAUGH] were on different sides of the spectrum. I couldn't explain why it was happening like that, but it was. So somehow ,yeah, probably words, DVDs were in more positive feedbacks and VHS negatives, I don't know. But that's one of the way to kinda to try to understand what the embedding is actually doing, right?
[00:09:23] And so it allows you embedding. When you're training like this, it will might even learn some dependencies between the words. For instance, you can type kinda well, word well, and see what's the words with the kinda similar meaning or similar condition. I actually like Euclidean. Well, unique, well, job, I'm not sure how to interpret this.
[00:09:49] Let's find something else like good. Convention, Powell-
>> Why are there some words with underscores after them?
>> So it's the way how the dataset was prepared. So interested with underscore means that we are looking at the word interested but split, it's only part of the word. So for instance, where it's splitful will be represented by this word.
[00:10:21] Is there a word splitful? Maybe not but you've got the idea is basically means that it was only part of the work as well as nte. It's not that a word but it was part of the word and sometimes they're doing it. They're just building parts of the word.
[00:10:36] Because they're trying to find maybe just some combination were part of the word still gonna preserve some meaning about the whole maybe even sentence. So it's just the tricks that you're doing with text to once again, kinda convert this just written form into digital representation. And the embedding is just the additional obstruction on whatever bag of words you created by kinda cutting and chopping parts of them.