Check out a free preview of the full First Look: ChatGPT API for Web Developers course

The "Embeddings" Lesson is part of the full, First Look: ChatGPT API for Web Developers course featured in this preview video. Here's what you'd learn in this lesson:

Maximiliano discusses the process of converting large amounts of text into numerical statistical data as a means of efficient processing. The discussion highlights searching the segmented data, returning a corresponding slice of data, and then utilizing the data as context for the prompt.


Transcript from the "Embeddings" Lesson

>> So vector-based databases, they let you store and index high dimensional vector representing text data allowing for efficient similarity search and retrieval of documents or phrases based on their embedding representations. So, what, first before trying to understand that concept let's go back and try to understand what's an embedding?

It's a method of converting text into numerical vectors, enabling efficient processing and comparison of text data, learn through training neural network models on large amount of text data is, still weird to understand what they do. We will take a text, a chunk of text, for example, my last ten minutes of speech, and we can send that to open AI not to the CBT model to a different API that is called embeddings.

In return, open AI will give us an array of numbers. The array of numbers, that's an embedding represents the text and the semantic relationship of the text with other words, it's difficult to understand actually. But the idea is that we store that in a database, but it's not a normal database, it's a vector-based database.

So that vector-based database will let us then make semantic search. So then when we have the question from the user, we can find exactly which vector is more semantically related to that question. The representation looks like a dense vector of real numbers. It's actually an array of numbers, a large array of numbers.

And because it's number is also fast to process for many algorithms, for example, a word embedding the word cat may look like an array of 0.2, 0.5 minute blah, blah, blah, blah. Typically we're talking about really large numbers and infinite numbers when you're looking at them, so that's why we needed an especial database for that.

So the method to work with this is known as a split and embed. So we're going to split our document in slices in chunks. By any character length, you pick one, 1000 characters 1500 even in fact, you can play with different numbers and see which one works better.

It doesn't need to be a semantic cut so you can cut anywhere well here from here to here, that's a slice, okay. It can be a PDF, videos caption, frequently asked questions, wherever document lash document you have. It can be even also like an output from classic database with all your customers, with all your products, with prices, so you have cut it in slices, in chunks.

Sometimes you even overlap a little bit, like 10% of the slice of the next slice also contains the last 10% of the previous one. We convert each slice into its embedded representation. How we pay Open AI for that? We make a request, you do that once only. We make a request to the open API for embeddings, that's a different API from the ones that we have been using.

And in return GPT will give us the vector representation, the embedding of that chunk. We then with that we store the embeddings in a vector database that you need to install, you need to use, or you can use from the cloud. There are a lot of cloud based solutions that will solve you that problem.

And it's just numbers, it's numbers, okay? So, we do that when we want to create our database. But then, the user is asking questions. So, when we need a prompt, and we need the prompt to answer the question from our data, we search in the vector database based on what the user needs.

The database will return the slice, the chunk of the information that is closer semantically to the query, and that's the power of embeddings. Then we checked that chunk of information, that slice, as the context of the prompt. We go back to the idea of filing the context, right, prompt engineering we are injecting that slice of information that is semantically close to the query into the prompt.

This is how it looks like in a diagram, so we have a use of prompt, we have our app, it's a server-side app, for example. We have a vector database in our server or in the Cloud, and then we have the Open AI APIs. So this is how it works.

We receive the users prompt, it's a query, it's a question. Then our app, we'll take the prompt and talk to the Open AI embeddings API. I have this question from the user. The question is, what's the price of the Open AI? So what's the price of a token?

That's something that I said at the beginning of the course. So Open AI doesn't know about that. But it receives a string, and it returns me the embedding of that question. Numbers, statistical numbers are just numbers. We then go with those embeddings to our vector database and say, hey, I have this embedding.

Find me the chunk that is semantically closer to that embedding. And your vector database will give you the slice of relevant information, the chunk of strings, the 1000 characters, where I was talking about the price. And then with that, you will add that as a context of the use of prompt and I'm making a final query to the GPT API and that will give me the output that I essentially user.

That's the process today to create a chatbot that will answer questions about a lot of things that you have in your system to documentation, for example, if you have a documentation of your API or if your app, you have a big manual of how something works. It can be documentation about a product that you are selling.

I'm selling laptops and I have a manual. Well, how can I make a chatbot that can answer questions about the manual? Well, you split the manual in chunks, you create the embeddings of those of each chunk, you store that in a vector database and when there is a question, you do that process.

That's how embedding works. It's a lot of work, yeah, I know that for you as developer are expecting something simpler. Even a lot of people is expecting actually fine-tuning Chat GPT, but that's not possible with that model as of today, and I don't think it's the idea. So you can still fine-tune, and you pay for that as well on Open AI.

Previous models not just CBT but for church CBT it's everything has to do with the prompt.

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now