Practical Prompt Engineering

What is Prompt Engineering?

Lesson Description

The "What is Prompt Engineering?" Lesson is part of the full, Practical Prompt Engineering course featured in this preview video. Here's what you'd learn in this lesson:

Sabrina introduces LLMs, or Large Language Models. Prompt engineering is the science behind crafting prompts to return high-quality results from LLMs, which are non-deterministic, or unpredictable. Transformer architectures are discussed, and Sabrina explains why scaling an LLM by 10x delivers 100x the capabilities.

Transcript from the "What is Prompt Engineering?" Lesson

[00:00:00]
>> Sabrina Goldfarb: Let's get into what prompt engineering actually is, right? So I'm going to start with a plain old definition from OpenAI, who we probably all have heard of, because I'm sure everybody here has used ChatGPT at least once. They say prompt engineering is the process of writing effective instructions for a model such that it consistently generates content that meets your requirements. Because the content generated from a model is nondeterministic, which is something we'll get into in a minute, prompting to get your desired output is a mix of art and science.

[00:00:35]
However, you can apply techniques and best practices to get good results consistently. So like I said, we're going to talk about what nondeterministic means in just a minute. You may know about it, you may not. Totally fine. But what I want to point out here is that even the creators of ChatGPT are saying that this is a mix of art and science. Today we will be covering a lot of the science of this.

[00:00:59]
We're going to go into research papers, we're going to talk about a bunch of techniques that have been empirically proven to work. But also there is art to it, right? And what I give to an LLM and what you give to an LLM is going to look different, even if our prompts are exactly the same. It may depend on the model we use, but even if we use the same model, we might get different outputs. So if at any point you're struggling with the code that we've written because your LLM has decided to do its own thing, don't worry about it.

[00:01:27]
We'll push the code. You can pull down that code and we can all get back on the same track. Or maybe you really like what your LLM decided to create. Either way, it just reminds us that this is a mixture of both art and science. So today we cover the science, and then it's up to us to kind of create the art around that science. I want to also make sure we understand what prompt engineering is not, and the first thing I want to say is it's not magic, right?

[00:01:53]
We can't suddenly make these LLMs deterministic, which again we'll talk about in a minute, and we can't just change these LLMs, right? They were built in a certain way, they have limitations, and we just have to deal with that, right? But with systematic approaches, we can get measurably better and more consistent outputs from these LLMs. We also have to know that prompt engineering is not our only tool, right?

[00:02:21]
There are plenty of other tools. I'm sure you've heard lots of acronyms, MCP, RAG, right, all these other things we can add to make these tools better. But prompt engineering is going to be the most accessible tool that we have. We can spend all of today just talking about prompting and getting you 70 or 80% of the way there with nothing but the words that we already have. We have plenty of ability to test our prompts, assuming we don't get rate limited, and even if we do, we can always just sign up for another tool and try it there, right?

[00:02:58]
So it's a really accessible tool. But it is not a way to make LLMs deterministic. And again, we'll talk about that in one second, but as you can probably guess, being deterministic versus nondeterministic comes down to how consistently you'll get those same exact outputs. So we'll talk about that more in just a second. So now, let's talk about LLMs, right? LLMs are pattern predictors that generate one token at a time.

[00:03:30]
Predicting next tokens is all they do. If I type an input into an LLM, it's going to predict the next most likely token, one at a time, and give it back to me. A lot of the time it will predict the next most likely token, but there are times that it won't. So let's say that I go to an LLM and I say, what color is the sky today, right? Or what color is the sky in general?

[00:03:58]
Maybe I don't even say today because today can have some nuance to it. Let's say I just say what color is the sky. Most of the time the LLM is going to say blue. But other times it might say gray, or it might give me a paragraph about how sunsets can be orange and pink and yellow, right? So I'm not always going to necessarily get that next most likely token of the word blue. Something to keep in mind is that token by token generation means there's no planning ahead.
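
To make that sky example concrete, here's a minimal sketch of next-token sampling in Python. The probabilities are completely made up for illustration; a real model scores tens of thousands of candidate tokens, but the mechanic is the same: the most likely token wins most of the time, not every time.

```python
import random

# Toy next-token distribution for the prompt "What color is the sky?"
# These weights are invented purely for illustration.
next_token_probs = {
    "blue": 0.70,
    "gray": 0.15,
    "orange": 0.10,
    "It": 0.05,  # might start a longer answer, e.g. "It depends..."
}

def sample_next_token(probs):
    """Pick one token, proportionally to its probability."""
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Ten samples: mostly "blue", but occasionally something else.
print([sample_next_token(next_token_probs) for _ in range(10)])
```

Run it a few times and the mix shifts from run to run, which is exactly the behavior described above.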

[00:04:31]
So this means that LLMs only think, and I put think in quotes, while they are typing, and this is just something that we're going to have to keep in mind later when we're talking about a prompting technique and how LLMs really think. LLMs are also trained on data that is collected up to a certain point, right? They do have a training cutoff date, so we need to be aware of that. Now, a lot of them have additional capabilities now, like web search, right?

[00:05:01]
So LLMs have the ability to search the internet, which means that, okay, even if I ask the LLM about something that happened after the cutoff date, it can usually still find some sources on the internet and tell me the answer that I want. But this is just really important to know that they have cutoff dates, because the information after that cutoff date may be slightly less reliable, right? If the only people talking about a certain event are like two people on Reddit after the event, and the LLM has not been trained past that cutoff date, there's a decent chance that the answers I'm going to get aren't nearly as good as if it was information I'm asking for prior to that cutoff date.
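
As Sabrina notes next, one practical check is simply to ask the model itself. A minimal sketch, assuming the OpenAI Python SDK (`pip install openai`) and an example model name; this is an illustration, not the course's code:

```python
# Asking a model about its training cutoff.
# Assumes the OpenAI Python SDK; "gpt-4o-mini" is an example model name.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is your knowledge cutoff date?"}],
)
print(response.choices[0].message.content)
```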

[00:05:45]
You should be able to ask your models what their cutoff date was, and they're consistently updated as well. But they can't just be trained forever and uploaded every single day, right? So they are going to have a cutoff date at a certain time. Maybe that's two months ago, maybe that's four months ago. But this is just again important to know because maybe I'm using the very latest framework, or a totally new language, right?

[00:06:10]
The LLM may not have been trained on that data, which is why it might struggle a little bit more if we're trying to use it on something so new. LLMs actually work a lot like autocomplete, and I am definitely not going to go super deeply into LLMs today, so take this minimized version of how I explain them with a grain of salt. You can definitely take full courses on this, but at its most basic, an LLM works really similarly to our phone's autocomplete, right?

[00:06:44]
But autocompletes are terrible. So how are LLMs any different? We will talk about that in just a moment on the next slide. But just keep in mind that again, same as autocomplete, we are just typing something. The LLM sees that we've typed something and it's just trying to predict that next token that we want. Folks online are asking for a definition of LLMs. Yeah, so Large Language Models is what we're talking about, right?

[00:07:14]
So if you think about what a large language model is, I'm talking about GPT-4, Claude, basically the things that we are currently calling AI, right? If I'm using an AI program, under the hood I'm likely just talking to an LLM. There are other language models out there, right? There are smaller ones, there are larger ones. The "large" in Large Language Model just refers to how much data the model was actually trained on, right?

[00:07:43]
And then the language part of it is the fact that we train these models on natural language, right? Natural language processing. So when I say LLM, I'm using that pretty much exactly the way that most people are using the word AI now. So the last thing I want to talk about with LLMs is the fact that they're nondeterministic. I have brought up this word so many times, and I'm so excited to finally explain what it is, right?

[00:08:13]
So let's think about things that are deterministic and things that are nondeterministic. A calculator is deterministic. If I type in 2 + 2 into a calculator, I should get 4 every time. And if I don't, I should return that calculator and I should get a new calculator, right? Because it should always be 4. If I type in 8 + 8, I'm always going to get 16. So a calculator is deterministic. For every input that I give it, there is an output that is correct, and every other output is incorrect.

[00:08:42]
LLMs are nondeterministic, right? We talked about the fact already that they're pattern predictors. They're just predicting the next token, and they may not even predict the next most likely token. So every time I use an LLM, I might get a slightly different answer. Even if we all, at this exact moment, typed in what color is the sky to the exact same provider, the exact same LLM on the exact same website, well, we might all get totally different answers.

[00:09:15]
I might just get the word blue. Someone might get "the sky is blue when it's clear outside." Someone might get a whole paragraph about how sunsets are beautiful and the sky is orange and pink and yellow, right? LLMs are nondeterministic, which means if I enter the same prompt 10 times, I'm going to get different answers 10 out of 10 times, most of the time, right? So this is just really important for us to know and understand today, because when we use our prompts later, we're all going to get something slightly different, and that's okay.
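
Here's a minimal sketch of that nondeterminism in practice, again assuming the OpenAI Python SDK and an example model name. The temperature parameter shown is a common developer-facing knob for how varied the sampling is; naming it here is an assumption on my part, since the transcript only alludes to such knobs below.

```python
# Send the identical prompt several times and compare the answers.
# Assumes the OpenAI Python SDK; "gpt-4o-mini" is an example model name.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

for i in range(3):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What color is the sky?"}],
        # Higher temperature -> more varied sampling. temperature=0 is much
        # closer to deterministic, though still not strictly guaranteed.
        temperature=1.0,
    )
    print(f"Run {i + 1}: {response.choices[0].message.content}")
```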

[00:09:50]
That's kind of the beauty of LLMs. And we're going to talk about what makes LLMs more or less deterministic and how we can actually adjust that as developers. First, we have to cover one more piece of LLMs that I did promise we would talk about, right? Which is how LLMs are like our phones when it comes to autocomplete. So when it comes to autocomplete on our phones, our phones are able to remember like five or 10 words. That's why, if I type in "I love", my phone pops up, you know, "my house", "pandas", or "penguins", right?

[00:10:29]
It's kind of three words that you can pick from, right? That's what I'm talking about when I'm talking about this autocomplete feature on our phones. And our phones can keep track of like five or 10 words. But with the "Attention Is All You Need" research paper that came out in 2017, we got a transformer architecture that can pay attention to not just more than five or 10 words, but significantly more.

[00:11:02]
Now we're talking about being able to pay attention to thousands of words at a time, right? So when Google came out with this "Attention Is All You Need" research paper talking about this transformer architecture, we had this new attention mechanism where not only could the model pay attention to more of these words, but it also learned which tokens mattered for predictions. I'm using the term words a lot, but you'll see me throw in the word tokens, and we're going to just explain the difference between the two shortly, but just for now, just words, right?

[00:11:38]
Something that was really interesting when this research came out was the scaling laws, too. When we 10X the size, we saw that the models actually became something like 100 times more capable. And if you think about models over the past couple of years, we've gone from models that can pay attention to about 4,000 words or tokens, right, to over a million. So if 10Xing the size makes models that much more capable, this is why we've seen AI just absolutely become an everyday part of our lives and every single application that we use, right?
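
The scaling laws alluded to here are usually attributed to Kaplan et al. (2020), "Scaling Laws for Neural Language Models", where test loss falls as a smooth power law in model size; "100 times more capable" is an informal gloss on that trend rather than a number from the paper. A quick sketch of the arithmetic, with the exponent taken from that paper:

```python
# Power-law scaling sketch (parameter-count exponent from Kaplan et al., 2020).
# Test loss L(N) is roughly proportional to N ** -alpha_N for parameter count N.
alpha_N = 0.076

def loss_ratio(scale_factor):
    """How much the test loss shrinks when the model gets scale_factor times bigger."""
    return scale_factor ** -alpha_N

print(loss_ratio(10))   # ~0.84: a 10x bigger model has roughly 16% lower loss
print(loss_ratio(100))  # ~0.70: 100x bigger, roughly 30% lower loss
```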

[00:12:18]
It's incredibly capable now of helping us with anything we need, whether we're studying for an interview, whether we're using it to write code at our job. Maybe I'm just using it because I can't fall asleep, right? And I want to learn about pandas and things like that, but it's incredibly capable now.
