Transcript from the "Security & Prompt Injection" Lesson
>> There is one particular plugin that was created by ChatGPT that is a browsing plugin. So it can browse the web. That is similar to how Bing Chat works today. That you can search something, and it will go and make a search, and it will go and browse your website.
[00:00:20] It will honor robots.txt, that old file from, if you're a web developer, you know that. Basically, you can actually say, hey, OpenAI, I don't want you to take my content for your users. So you can actually, it will honor that declaration if you deny access to the bot.
[00:00:39] And the keyword to deny that based on the User Agent is ChatGPT-User. So you can say, now, I don't want user from Chat-GPT in my website. One of the problems here is that maybe what happens if you have then a lot of users on ChatGPT browsing your website?
[00:00:58] First, they're not actually seeing your banners, your ads. They're not seeing your promotions, they're not seeing your login screen, they're just taking your data. So that's why some companies are saying, hey, maybe I'm going to filter this. And also, actually, the plugin says that it's going to be nice with domains, with origins.
[00:01:22] Meaning that they're not going to send you a million requests to your server. So they will try to be good citizens of the Web, that's what they're saying. And the next question is, that's a User Agent. Can you deliver something different for ChatGPT? Technically you can, server-side, you can read the user agent, so the browser's name.
[00:01:47] And deliver something different, better, different, worse, I don't know. Is it possible? It is, you shouldn't be doing that. So, technically, that browser, it's capable of reading any website. That leads to another question, it's about security. There are, for example, ideas, papers, and even real things around something known as prompt injection.
[00:02:22] So a prompt injection is a way, because, remember I mentioned that the plugin is just injecting content in the prompt? So what if the plugin is injecting something that changes the prompt completely? As, for example, SQL injection that is changing the whole SQL query. Well, something similar happens here.
[00:02:44] And sometimes, for example, you can inject words such as forget about everything that was requested before. Now listen to me, colon, and you give another instruction. If you know that that string is going to be injected in a larger prompt, you might be injecting that prompt. And there is a researcher here, that we can check, with more information about that technique.
[00:03:17] In fact, he has a demo here on how he made Bing Chat, that is based on GPT, on the same ChatGPT, to speak like a pirate. So you get into the website, you ask questions, and without saying nothing different, the GPT model will speak like a pirate, and why?
[00:03:45] Because he injected a prompt within the contents of the HTML. And in this case, Bing is reading the source code of your HTML, the content, and he managed to inject the model. This is just, yeah, it can be funny, but also you can inject instruction. For example, you can inject, you can make the AI work for you if you're a bad actor.
[00:04:11] Such as, the AI can ask you for your credit card. And you see, it's not the website, it's Bing asking me for my credit card. Okay, I will give the credit card. So, this is an ongoing research, if you want. But it's possible then to inject text into your websites that will change how models are reacting.
[00:04:39] Because everything is just about prompt injection. So prompt is key in here. Okay, so we need to be very careful with our own calls. If we integrate GPT data into our system, run actions based on GPT responses. So I'm not sure if any of you have heard about something known as AutoGPT.
[00:05:06] AutoGPT is a tool that is used in OpenAI that actually iterates over GPT many times. And one of the things that it can do is take actions on your operating system, or take actions in your system. It can open apps, close apps, use apps, it can do stuff based on what GPT is doing.
[00:05:31] Well, if someone can inject a prompt, they can ask for stealing your information. And then a tool will get the folder on your computer, make a SIP, and send that folder over email to a bad actor. So you need to be very careful with prompt injection, right? So if you iterate responses with GPT, call GPT many times, have in mind that some security risks are there.
[00:06:03] So you should always validate the format and the intention before acting. In fact, you can use GPT, again, to check if an action is valid or invalid, or if it was changed, for example.