Open Source AI with Python & Hugging Face

Google Colab Setup & Configuration

Temporal

Lesson Description

The "Google Colab Setup & Configuration" Lesson is part of the full, Open Source AI with Python & Hugging Face course featured in this preview video. Here's what you'd learn in this lesson:

Steve walks through using Google Colab notebooks for working with Hugging Face tokens and models. He explains how to access tokens, create new tokens, and set up the environment for running code in the notebooks. Steve also touches on runtime options, running code blocks, and managing sessions in Google Colab.

Transcript from the "Google Colab Setup & Configuration" Lesson

[00:00:00]
>> Steve Kinney: With that, let's actually play with them now that we've got, like, the lore locked into our heads. So a few tasting notes: I will kind of guide you towards where to get your Hugging Face token. But before that, let's do the housekeeping of the day.

[00:00:18]
Everything that we're gonna talk about is in one of these Google Colab notebooks, which, yeah, if you've ever used a Jupyter notebook or Python notebook, you have seen something like this before. So we'll kind of work through some of those so we can see some code and tweak some code and mess around with some code, and you can make your own copies of these, do whatever you want with them, so on and so forth.

[00:00:46]
But you will probably need some kind of Hugging Face token at some point or another. Not immediately, but it makes sense to do it now. So let's talk about that. With that, we can just go to huggingface.co, and it gives a whole bunch of really interesting stuff here.

[00:01:06]
There are data sets of all sorts of wild stuff. I was bored looking at it last night and there's just a bunch of really fun things in here, and we will see when we get to fine-tuning that you could probably find some interesting data sets and make some curious things along the way.

[00:01:23]
They also have models. We will be using the kind of small, easy-to-download, definitely-run-on-the-free-tier-of-Colab ones, but you can pull in open source models that are ginormous and, effectively, if you've got the video card and the RAM for it and the space on your computer, pull in models that are not going to make Claude Opus blush but are kind of impressive for something that you just downloaded and are running on your machine as well.

[00:01:59]
But the thing that we care about today is going into our account and going down to access tokens and creating a new token, right? We're not gonna push anything up today. So really you could get away with just a read token. The responsible thing to tell you is, you know, least access principle: get the smallest token you can get away with, so on and so forth.

[00:02:24]
I'm not gonna tell you. Do whatever you want, right? If you think you're gonna wanna push up some of your models. Cause one thing you can do when we fine-tune a model, it's like the whole idea of Git and forking and stuff like that, is you can pull down a model and you can fine-tune it based on a data set.

[00:02:39]
Whether it's a data set you pulled down from Hugging Face or one where you maybe have some corpus of your own logs or something along those lines. And then you can push up your own model as well. You do need the write access to push up to Hugging Face.

[00:02:56]
So you decide what you wanna do there. You can get away with solely a read token. And then once you have that token, which I already have, and I don't feel like broadcasting a brand new token to y'all, you can go into this first notebook. So if you go here, Pipeline Basics, you can go to this little key. That key means secrets.

[00:03:18]
If you hover over it, it says the word secrets. You can go in there and you can add your secret. You don't actually need to even do any of this. Like, if the library needs it, a lot of the Hugging Face SDKs for various things will go looking for this environment variable.

[00:03:33]
So it does need to be named HF_TOKEN, right? And you can pop it in there and then you can basically decide if a notebook does or does not have access to it, right? In this case I do. But you pop that in there and that should be basically if I've done my job well, which unclear.

[00:03:55]
It's like one of those things of all plans are good until you face the enemy. Or as Mike Tyson put it, everyone has a plan until they get punched in the face. So ideally that should be all of the setup that you need. But we'll find out together if that's not true and we'll deal with it.
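As a rough sketch of the lookup described above, the snippet below checks for HF_TOKEN as an environment variable first and only then tries Colab's secrets. The helper name `get_hf_token` is hypothetical, and it assumes `google.colab.userdata` is only importable inside Colab, so outside Colab it simply returns None.

```python
import os


def get_hf_token(name: str = "HF_TOKEN"):
    """Return the token from the environment, else from Colab secrets, else None."""
    token = os.environ.get(name)
    if token:
        return token
    try:
        # Assumption: this module is only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        return None
    try:
        return userdata.get(name)
    except Exception:
        # The secret is missing, or this notebook was not granted access to it.
        return None


token = get_hf_token()
# Report only whether a token was found; never print the token itself.
print("HF_TOKEN found" if token else "HF_TOKEN not set")
```

If the message says the token is not set inside Colab, the usual suspects are the secret's name or the notebook access toggle in the secrets panel.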

[00:04:12]
The other piece that I alluded to earlier, mostly because it was in the slide so I wouldn't forget to mention it now, is if you do need to change the runtime type, you don't have as many options as I do. Unless you want to open your wallet, in which case you will have as many options as I do.

[00:04:29]
Because the only reason I got those options is cause I paid. You should have the CPU and this T4 GPU. TPU for those keeping track at home is a Tensor processing unit, which is I think something Google makes. But we don't need those today. We need for some cases a GPU, for some a CPU.
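To confirm which runtime type a notebook actually got, a quick check like the following can help. It is a minimal sketch: `detect_accelerator` is a hypothetical helper, it assumes PyTorch is installed (as it typically is on Colab), and it falls back to looking for the `nvidia-smi` tool on the PATH when PyTorch is missing.

```python
import shutil


def detect_accelerator() -> str:
    """Return "cuda" if an NVIDIA GPU looks usable, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch is available in this runtime
    except ImportError:
        # Fallback heuristic: the NVIDIA driver tool is on the PATH.
        return "cuda" if shutil.which("nvidia-smi") else "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(detect_accelerator())
```

On a CPU runtime this prints cpu; after switching the runtime type to the T4 GPU and reconnecting, it should print cuda.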

[00:04:48]
I'm gonna stick with the GPUs today, mostly because no one wants to see my entire thing be slow. And if you wanted to write R or Julia instead, those are options. I'm not going to do those options. And if you've never used a Jupyter notebook before.

[00:05:09]
That's okay, too, because as I begin to kind of show you some of these, I will, like, take you on a little tour, right? They are really, they're really cool. Now, let's do the immature part of the show first. I'll show this to you now and I guarantee you will all do it, because I only turned this off this morning.

[00:05:27]
Go into settings. You got a bunch of settings in here. Whether or not you want Gemini to help you with things, you can actually hook it up to GitHub. So you can either pull in notebooks or push them out to GitHub. Miscellaneous is where the magic is. In Miscellaneous, you have really two important settings. I actually don't know how to see

[00:05:51]
if you check multiple of these boxes. You have your power level, where you can choose no power, some power, or many power. This one I would recommend caution with. Let's pick that for a moment, and then you can engage in either corgi mode, kitty mode, or crab mode.

[00:06:09]
Despite how much time I've been spending in Google Colab lately, it did not occur to me that I could turn more than one of these on. I thought it was a binary choice. I have been rocking kitty mode again up until 10 minutes before I left today.

[00:06:24]
But I will show you what both of these do. And you hit Save. And now you go down into your notebook and let's say we wanna find a given section, right? So you got some markdown in here. And I can say, as you can see, it is both shaking and shooting out various explosions.

[00:06:44]
And that is the power mode. You see, I got a combo down there from the typing. At some point, I expected some cats to be walking around the top. There they are. So I'm gonna turn that off because it's incredibly distracting. But be very clear. I keep power mode off.

[00:07:03]
I keep the kitties on, right? And there are some important downsides that I should make you aware of. If you are working and your, you know, friends, family, partner, spouse, whoever walks in and you're trying to convince them that you're doing real serious work, it will undermine that, all right?

[00:07:26]
At one point, preparing for this, I was rendering images of Moo Deng the pygmy hippo whilst having cats walking across the top on a Saturday, convincing my wife that I was doing work. I think she believed me, but I think there was some hesitation in there as well.

[00:07:46]
You can turn those settings on, or you can not turn those settings on. Some other tasting notes: as you can see, effectively, all a Jupyter notebook is, is pieces of code intermixed with pieces of markdown, which is basically my love language to begin with. They do run in sequential order. Now you can hit something like Run all, which will run the first code block followed by the second code block.

[00:08:10]
You can also choose to just run a given code block. The one thing that I will warn you, because it will bite you at some point today, is if you get really into it and you forget to run the code block where the variable is declared, it will not be in memory, but you can run the first one and then it's loaded into memory in the runtime and then you can run one later.

[00:08:37]
And if you don't change anything in the first one, you can change stuff. It's actually a really great way to work. You can also, if you use VS Code or Cursor, something like that, Python notebooks will open in there as well. You can literally use this in VS Code as well.
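The warning about forgetting to run the cell that declares a variable can be simulated outside a notebook. In this sketch a plain dict stands in for the runtime's memory and two strings stand in for code cells; the names `runtime`, `cell_1`, and `cell_2` are made up for illustration.

```python
# A shared dict stands in for the notebook runtime's memory.
runtime = {}

cell_1 = "greeting = 'hello'"
cell_2 = "result = greeting.upper()"

# Running the second cell first fails: `greeting` was never declared.
try:
    exec(cell_2, runtime)
except NameError as error:
    print("NameError:", error)

# Run the declaring cell once, and later cells can use the variable.
exec(cell_1, runtime)
exec(cell_2, runtime)
print(runtime["result"])  # HELLO
```

The same thing happens in a real notebook: once the first cell has run, its variables stay in the runtime until the session is restarted or disconnected.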

[00:08:50]
The only reason we're not doing that is, again, GPUs. Do you need to auth separately to do that, or can you just open it? You can just open it. You can go over to File, Download, and download it. The only thing is, occasionally I use a library in Colab called google.colab.

[00:09:15]
The only feature I use is output.clear, and that won't work outside Colab, but there's a different Python version as well. As you can see, this library is exclusive to the Google Colab notebooks, but there is literally one in the IPython library that will also clear the output. If there's a big pip install, which is Python's version of npm install, I don't want to see that in the notebook and have to scroll past it, especially as I'm zoomed in a little bit as well.
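One way to keep a notebook portable between Colab and something like VS Code is to try the Colab-only helper first and fall back to the IPython one. This is a sketch, not the course's own code: `clear_cell_output` is a hypothetical name, and the return value is only there to show which path ran.

```python
def clear_cell_output() -> str:
    """Clear the current cell's output and report which backend was used."""
    try:
        # Only importable inside Google Colab.
        from google.colab import output
    except ImportError:
        pass
    else:
        output.clear()
        return "google.colab"
    try:
        # Available wherever IPython is installed, including local Jupyter.
        from IPython.display import clear_output
    except ImportError:
        return "none"
    clear_output()
    return "IPython"


backend = clear_cell_output()
```

Calling this at the end of a cell that runs a noisy install hides the install log in either environment.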

[00:09:43]
For instance, if you want to run a given code block, you just hit that little plus sign and it will begin to run. There's a little tax you pay at the very beginning as you go and get what we call a runtime. Remember we hit Change runtime and you can switch between a CPU, a GPU, so on and so forth.

[00:10:02]
It will kind of give you a virtual machine somewhere in the cloud and then begin to run it. And you can kind of see, for instance, like what your resources are, so on and so forth. So like I am on the T4, which is available on the free plan, and I think I don't have the high RAM version right, so you get like 15 gigs of RAM or.

[00:10:25]
Yeah, 12 gigs of system RAM, 15 gigs of GPU and like a quarter terabyte of disk space, which is pretty nice. Dustin, are you able to connect a local GPU? Yes. Good questions already. Where is it? Yes, connect to a local runtime, or if you have like some Google Compute Engine VM that you want to connect to, you can use that as well.
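The resource numbers mentioned above can also be checked from code rather than from the Colab sidebar. The sketch below uses only the standard library to report CPU count and disk space; `describe_runtime` is a hypothetical helper, and GPU memory is left out because it needs a GPU-specific tool such as `nvidia-smi`.

```python
import os
import shutil


def describe_runtime(path: str = "/") -> dict:
    """Summarize CPU count and disk space for the machine running this code."""
    total, used, free = shutil.disk_usage(path)
    gib = 1024 ** 3  # bytes per gibibyte
    return {
        "cpu_count": os.cpu_count(),
        "disk_total_gib": round(total / gib, 1),
        "disk_free_gib": round(free / gib, 1),
    }


info = describe_runtime()
print(info)
```

The exact figures depend on the runtime you were assigned, so treat the numbers quoted in the lesson as approximate.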

[00:10:49]
But you can, yeah, hit connect to a local runtime. You have to do some other like fun stuff to like spin that server up. But you can 100%. So if you have fancy GPUs on your computer, you're sitting there with like a rig when you just finished mining a bunch of crypto and you're ready to like, you know, switch to the new hotness.

[00:11:08]
You can also do that as well and harness all of that power too. Cool. One or two other tasting notes as we're in here: one setting that I think I have turned on almost all the time is to automatically run the first cell or section on any execution.

[00:11:26]
As you see, the first cell for me is importing any libraries that we need. So it will run that first one as well. Then it will save your outputs. The one thing I'm going to do for future viewers is I'm going to open mine in playground mode.

[00:11:43]
You do not need to. That way I'm not doing the thing that I have done at workshops in the past, where I edit everything live and then have to go clean up my mess after the workshop. At one point I will forget to do that and I will have to clean up my mess.

[00:11:59]
But it's the beginning of the day and I still remember my good habits. Cool, cool, cool. All right, so like I said before, you can also hit Run all if you just wanna run everything, so on and so forth. But that is the kind of high level of these Google Colab/Jupyter notebooks that we're gonna be using.

[00:12:20]
Other things that come up in my head are at some point or another, particularly on the free plan, you will get yelled at where it's like, hey, person who's not paying us, you want to spin up another runtime? And you already have one running, right? As we jump from the first notebook to the second notebook, you can hit this fun little thing up here, hit manage sessions and you can close out another session.

[00:12:48]
Yeah, so if you get yelled at on the free plan and you're like, now I don't know what to do? That's what you do. How do you find out which version of Python's running in the Jupyter notebook? I mean, you've got a terminal and you've got the ability to just ask Python.

[00:13:02]
I know that it's 3.11. So yeah, there's that as well. There might even be an interesting place to go see it, but I don't remember off the top of my head. Right, cool, cool, cool. Yeah, you got a terminal down here. The other cool thing, we don't have any yet, but as variables are declared in memory, you can see them all here as well.
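On the Python version question, "just ask Python" can look like this in a code cell. It is a small sketch using only the standard library; the 3.11 comparison mirrors the version mentioned in the lesson, and your runtime may report something newer.

```python
import platform
import sys

# The version of the interpreter that is running this cell.
version = platform.python_version()
print(version)

# Compare against a minimum version as a tuple instead of parsing the string.
is_at_least_3_11 = sys.version_info >= (3, 11)
print(is_at_least_3_11)
```

In the terminal panel, running `python --version` answers the same question.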

[00:13:25]
So you can kind of see the entirety of what's going on, so on and so forth. And if you ever break something, this little guy down here is Toggle Gemini, where there's two really great things. There's obviously, like, explain this error. There's, like, transform my code, but there's also explain this code to me.

[00:13:46]
So I definitely encourage you. It lives both up here, down there, lots of other places as well. Cool, so with that kind of tour, we probably wanna get into it.
