Open Source AI with Python & Hugging Face

Pipeline Basics: Sentiment Analysis

Steve Kinney
Temporal

Lesson Description

The "Pipeline Basics: Sentiment Analysis" Lesson is part of the full, Open Source AI with Python & Hugging Face course featured in this preview video. Here's what you'd learn in this lesson:

Steve discusses accessing libraries like transformers, which simplify tasks like text generation and sentiment analysis by abstracting tokenization, embeddings, and model processing. He also demonstrates sentiment analysis using a pipeline function that determines if a statement is positive or negative.


Transcript from the "Pipeline Basics: Sentiment Analysis" Lesson

[00:00:00]
>> Steve Kinney: We got our little Hugging Face token set up, for those keeping track at home. It lives right in Secrets. That's where the switch is. If you jump to another notebook, it'll say you don't have a token. You just flip that switch on. It'll actually ask, do you want to grant access?

[00:00:10]
And everything will be easy and great. But if you're curious where it is, that's where it is. And so the first thing that we had here is where we pulled in two libraries. One, honestly, is just so I can clear the console. That's what this one is. But the more interesting one that we're going to play around with is Transformers, which is a library from our friends at Hugging Face.

[00:00:31]
We're going to pull in the pipeline function. If you are normally somebody who writes JavaScript or TypeScript, it's just backwards from what you're used to. You would probably do import React from 'react' or whatever. In this case it's the other way around. It's the second part of an ES module import, but this time it's written this way.

[00:00:54]
Cool. With these pipelines, we can do all the things that we saw in those slides earlier. We can do our text generation, we can do our sentiment analysis. We can do all those fun things. What a pipeline basically is, it's an abstraction where it will grab all the various pieces that you need.

[00:01:14]
It will take the text, it will tokenize it. You're like, what is tokenization? That's why we're starting with the pipeline, because I will show you tokenization a little bit. It will then turn it into the embeddings. It will then feed it to the model. It will then take out the results.

[00:01:26]
It will turn it back into text. So a pipeline is effectively a bunch of other smaller pieces that we'll take a look at later, all made easy for you. What the Hugging Face library will do is effectively just pull down that open source model for you, download it.

[00:01:44]
If you're doing it locally on your machine, it goes into some dot directory. I forget exactly what it is, but various different tools will put it somewhere in your home directory and just get everything set up for you super easily. Right? Could you do all of this without Hugging Face?

[00:02:01]
Absolutely. You could do it, and you'd go and download everything yourself. It's the same way you could use Git without GitHub. Right? That doesn't mean you want to, it just means that you can. And there's nothing necessarily special or Hugging Face-specific about anything we're talking about today.

[00:02:15]
Just as when we pull down Git repos or use NPM or what have you, there's nothing necessarily specific about any of those tools. They just make our life a little bit easier so that we can focus on the important parts. So all the things we talked about earlier are the things we're going to play around with today.
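As a rough illustration of the smaller pieces a pipeline bundles together, here is a minimal sketch that does the steps by hand: tokenize the text, feed it to the model, and turn the raw scores back into a label. It assumes PyTorch is installed and uses the `distilbert/distilbert-base-uncased-finetuned-sst-2-english` checkpoint. The function names `classify_by_hand` and `logits_to_prediction` are made up for this example, and this is not necessarily how the library implements its pipeline internally.

```python
import math


def logits_to_prediction(logits, id2label):
    """Turn raw model scores into a {label, score} dict using softmax."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


def classify_by_hand(
    text,
    model_name="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
):
    """Hypothetical helper: run the pipeline's steps one at a time."""
    # Imported here so the helper above works without these libraries installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Both calls download the files on first use and cache them locally.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    inputs = tokenizer(text, return_tensors="pt")  # text -> token ids
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()  # token ids -> raw scores
    return logits_to_prediction(logits, model.config.id2label)
```

Calling `classify_by_hand("I love this course.")` should return a dictionary with a `label` and a `score`, which is the same shape of answer the pipeline gives with far less code.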

[00:02:33]
So we'll play around with some sentiment analysis, some text generation, some question answering, and we'll talk about how that's different than what you might think with something like ChatGPT, and we'll see all of those as well. And then we'll kind of jump into stuff like tokenization and other fun stuff like that at some point as well.

[00:02:55]
So like I said, the hello world of all of this machine learning stuff, the easiest thing to do is sentiment analysis, right? And like I said, it has all the downsides of it doesn't really do all the stuff that Transformers do that we're now taking for granted with a lot of the tools that we use.

[00:03:14]
But that's like the next example. So we'll start to kind of see how that stuff works. But it's a great way to at least wrap our head around how this little abstraction works: it will pull down the model, take care of the tokenization, take care of the embeddings, and convert it all back into stuff

[00:03:31]
we can read. That's how that library works, which is this first part right here. So we have this pipeline function, and the argument we're going to give this one is like, hey, we're going to do sentiment analysis. This kind of piece that you see over here, again, if you know TypeScript or JavaScript a little bit better than Python, is like if you had an argument that was an object with a bunch of properties that you're destructuring into the named arguments.

[00:04:01]
So like one could squint and probably turn that into JavaScript in their brains fairly easily. And that is effectively a theme. Like these are called lists in Python. They look like arrays, so on and so forth. You do not need to know a lot of Python to play along today, okay?
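For anyone coming from JavaScript, the named-argument idea can be sketched in plain Python. The function `describe_pipeline` below is a hypothetical stand-in written only for this illustration; it is not part of any library.

```python
def describe_pipeline(task, model=None, device="cpu"):
    """Hypothetical stand-in for a function that takes named arguments."""
    chosen = model if model is not None else "library default"
    return f"{task} using {chosen} on {device}"


# One positional argument first, then named (keyword) arguments in any order.
# In JavaScript this would often be an options object that gets destructured.
summary = describe_pipeline("sentiment-analysis", device="cpu", model="some-model")

# A Python list looks and behaves much like a JavaScript array.
texts = ["I love this.", "I hate this."]
texts.append("It's fine.")
```

Leaving out `model` falls back to the default value in the function signature, which is the same idea as omitting the model when creating a pipeline.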

[00:04:19]
In fact, you can omit the model if you want. For any given basic thing that we're going to do, like sentiment analysis or text generation, there is a very small, lightweight default model that exists. You will get a warning along the lines of: you didn't specify a model, so we are going to default to this model.

[00:04:43]
So you can omit this. You could literally say pipeline("sentiment-analysis"). And again, this works in the TypeScript SDK just as well as it does in the Python one. You could just say pipeline("sentiment-analysis") and all of a sudden you have a function that will take strings and do sentiment analysis.
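A minimal sketch of the two ways to create the analyzer, with and without naming a model. The helper names `pipeline_arguments` and `make_analyzer` are made up for this example; `pipeline`, its `task` argument, and its `model` argument come from the Transformers library.

```python
def pipeline_arguments(model_name=None):
    """Build the keyword arguments passed to transformers.pipeline."""
    arguments = {"task": "sentiment-analysis"}
    if model_name is not None:
        # Pin a specific model instead of relying on the library's default.
        arguments["model"] = model_name
    return arguments


def make_analyzer(model_name=None):
    """Hypothetical helper that returns a ready-to-call sentiment analyzer."""
    # Imported lazily; the first call downloads the model if it is not cached.
    from transformers import pipeline

    return pipeline(**pipeline_arguments(model_name))
```

Calling `make_analyzer()` is equivalent to `pipeline("sentiment-analysis")` and should print the warning about falling back to a default model. Passing a name such as `"distilbert/distilbert-base-uncased-finetuned-sst-2-english"` pins the choice, which is usually the safer option for anything beyond a quick experiment.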

[00:05:00]
That is effectively, again, the abstraction that Hugging Face is giving us. It's not necessarily doing anything you couldn't do yourself, but it is making it way easier than spending the first half of our day downloading things off the Internet and putting them in the right folders. Once we do that, we've got this pipeline.

[00:05:17]
We can do sentiment analysis. I have various strings of text that I wrote. Theoretically, you could probably do the sentiment analysis in your head of how they should rate. And that's the idea: we should be able to see whether the model agrees. And then all we're going to do is, again, we had that.

[00:05:38]
We have that sentiment analyzer, which is effectively a function at this point. We feed it the text, and the first thing in the response is the result, at which point we're going to look at the label and the score. Let's go ahead. We'll hit this little play button and, one, it's going to go ahead and download that model.
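The loop described here can be sketched roughly as follows. The notebook's actual code is not shown in the transcript, so the helper names `format_result` and `analyze_all` are assumptions. The `analyzer` argument is expected to be whatever `pipeline("sentiment-analysis")` returns, which yields a list of dictionaries with `label` and `score` keys.

```python
def format_result(text, result):
    """Render one prediction as a readable line."""
    return f"{result['label']:<8} {result['score']:.4f}  {text}"


def analyze_all(analyzer, texts):
    """Run each string through the analyzer and keep the first result."""
    lines = []
    for text in texts:
        # The analyzer returns a list of results; take the first item.
        result = analyzer(text)[0]
        lines.append(format_result(text, result))
    return lines
```

With a real analyzer, something like `print("\n".join(analyze_all(analyzer, ["I love this!", "This is terrible."])))` prints one labeled, scored line per string. The example strings are placeholders, not the ones used in the lesson.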

[00:06:00]
Part of the reason that we chose small, lightweight models is because I'm impatient. Could you download a 22-gigabyte model? In fact, you could download it to Colab, because you have a quarter terabyte of space. Do I want to wait for that? No. And I wouldn't even say it's about our Wi-Fi, because it's actually all happening on Google's servers.

[00:06:21]
So what happens is it goes ahead and pulls down this DistilBERT base uncased model, fine-tuned for sentiment analysis in English. And then we will take our very small array of strings and run each one through, and then you can see that it will take each string and we have that label and the result and it will give it a.

[00:06:47]
Not only does it think it's positive or negative, it will go ahead and give it a score. Now obviously there are some downsides here. It's not super good at sarcasm, as we mentioned earlier. So we could do something like we can add one more. It's the best thing since.

[00:07:11]
Awesome. What's your least favorite programming language and why? Is it Perl?
>> Austin: Yeah, Perl's up there. I'll say C.
>> Steve Kinney: C. It's the best thing since C. What I gathered from the tone of Austin's voice is he didn't mean that. Right? And yet it got a confidence of 1.0, 100% confidence that that was a positive statement.

[00:07:41]
Right? Because sentiment analysis is not doing the stuff that we will talk about later to try to figure that out. I mean, this is one string of text. You need a little more context to figure out that that was salty. But you get the basic idea that it's not a.

[00:07:57]
It's not a perfect tool. Right.
>> Austin: Could just be wrong.
>> Steve Kinney: Could just be wrong. To be clear, there's somebody who truly believes that, you know what I mean? I just hope they're not watching, otherwise I'll get an email. But yeah, again, sentiment analysis is a good hello world.

[00:08:15]
Because it also answers the question: does my thing work? So if you ran that and you see the output, we're in good shape for probably a good portion of the day, until we switch over to images and then we play with GPUs and everything becomes new and different again. So some models have a neutral category.

[00:08:36]
This kind of base one does not. This model, the default model, works best with English. So obviously if you hand it a bunch of German, you will probably not get accurate results. So there are things that you need to consider, so on and so forth. Right? So here are the ones I had ready to go.

[00:08:59]
Like it thinks this is negative. It's not. Now there are other models, there are bigger models that maybe have a little bit more like dimensionality to them where you can do that stuff. So model selection in a lot of cases and what model you choose of the like many thousands insofar that people can just fork one and push one up, probably millions at this point.

[00:09:23]
I don't know, we'll have some of those things. But you always play a game which is do some of the more sophisticated models that maybe can do like neutral sentiment. Yeah, but at what cost, right? Like are they bigger, so on and so forth, are they slower? You play the game of what you're looking for at this point, for right now, for the simplest thing, where we're just trying to see if our setup works, I went for fast and easy.

[00:09:48]
Right? What you're looking for might be different. Okay, so with that, if no one's complaining my thing doesn't work, then we know.
>> Speaker 3: I had a question just about sentiment analysis. Is it always on a scale from positive to negative or can you use other describing words like angry versus sad.

[00:10:09]
Or something like that?
>> Steve Kinney: I think most of the sentiment analysis ones that I played with have positive, negative and some have neutral. But there is also when we get to stuff like the classification, when you want to have labels and stuff like that. That is usually what I have traditionally used that for.

[00:10:26]
So there might be some models that can do that, but I have always reached for a slightly different tool when it comes to that.
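The speaker does not name the "slightly different tool" here. One candidate in the same library is the `zero-shot-classification` pipeline, which scores a text against labels you supply. The sketch below assumes that is the kind of tool meant; the helper names and the example labels are made up for illustration.

```python
def top_label(result):
    """Return the best label and its score from a zero-shot result."""
    # Labels and scores arrive sorted from most to least likely.
    return result["labels"][0], result["scores"][0]


def classify_emotion(text, labels=("angry", "sad", "happy")):
    """Hypothetical helper: pick the best-fitting label for a text."""
    # Imported lazily; a default model is downloaded on first use.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification")
    return top_label(classifier(text, candidate_labels=list(labels)))
```

The default zero-shot model is much larger than the sentiment model used earlier, so the first call takes noticeably longer. That is the same size-versus-capability trade-off discussed above.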
