Transcript from the "Caching" Lesson
>> Jem Young: We covered HBS, we covered Nginx tuning on the client side so now let's talk about caching, aka, making it a bit faster on the server side.
>> Jem Young: So nginx has the ability, one of the unique things is it can setup, we can cache on nginx side. Normally we'd have to do that on the node side or things like that With Nginx.
[00:00:22] You know I think I did this with of all my paths. I don't know why I put [INAUDIBLE]. Just in case you're confused. Okay, much better. Nginx has the ability to cache things for us. So really long requests, things that are slow. We can edit our sites available.
[00:00:41] And we can put a cache in there. So to start with, this is all actually one long block. This whole area here, is one long block. You see it terminates with the semicolon? So I'm just gonna kind of walk through what we're doing. Proxy cache path is just saying where do we want the cache to live, because the cache is a physical file on your hard drive.
[00:01:03] Where should that exist? And we're saying we want it in /tmp/rginx. It's a good spot. It's a cache. We don't have to worry about maintaining it. Temp is a good location because temp periodically gets cleaned by the file system. So you don't have to worry about it kind of existing too long.
[00:01:19] Temp will clean itself up for you. But if you wanted to put your cache somewhere else, you can do that too. I don't know why you would, but, you could. The levels equal one and two. We're just saying, how far and how specific should the caches be within that file system.
[00:01:36] So if I request slash cuts and I have more requests there, slash cuts, slash Queue cuts, fuzzy cuts, how deep do we want the cache to go? We're just gonna do one and two for now, but we can get four, five, six. We can get really, really specific on our cache.
[00:01:50] And with that we have the ability to invalidate like very specific portions of our cache. But we don't have to worry about that for now. This is just throwing it out there so you know it exist. Our key zone is saying, what do we want our cache to be called and how big should it be?
[00:02:07] This is a little confusing, because I'm calling it slowfile_cache, cuz that's the cache we wanna make a cache for. And we want it to only be 10 megabytes. It may look like 10 minutes, but it's 10 Megabytes. Again, you can go crazy with your cache and make it infinitely large, you generally don't want to do that because you can eat-up all of your disk space with your cache.
[00:02:32] And you generally don't need to cache that much stuff. In act of saying, how long should this cache be enacted for? And let's call it 60 minutes. Done. So we want to invalidate our cache after a certain amount of time. And then it's gonna refresh itself with a new request coming in.
[00:02:49] Use temp path is something new to the version of nginx that we're using. It's just saying, instead of writing to a temporary file then writing to the cache, we're just gonna keep it in memory and write it to the cache. Now you wouldn't necessarily want to do this if you had a lot of requests coming in, because you can eat up a lot of memory as Nginx is writing the cache.
[00:03:05] But in our instance we'll just use_temp_path=off. So go ahead and add that into your sites-available.
>> Jem Young: And I believe
>> Jem Young: I wonder if you can?
>> Jem Young: I'll fix it later. I will put all this into the comments so that you can just copy and paste it in. I don't want you to work too hard.
[00:03:34] But I do believe that if you write something, you're more likely to remember it.
>> Speaker 2: Does this go under location or?
>> Jem Young: No, that goes outside of your server block, so.
>> Speaker 2: Outside the server block.
>> Jem Young: Yes. Let me just do it now.
>> Jem Young: And let's just put it up here at the top of the file.
[00:04:04] We can put a width in a service block, but proxy cache, I think it's okay to put it outside. So we're saying proxy
>> Jem Young: That's the key zone. Key zone equals so file_cache and that will be 10
>> Jem Young: That what it got me, keys_zone,
>> Jem Young: And we're gonna say use_tmp_path=off
>> Jem Young: Look good so far.
>> Jem Young: Right, everybody there just one long statement terminated by a semicolon.
>> Jem Young: All right. Next, we're our proxy_cache_key, so how do we wanna differentiate things that we're gonna cache? We can get very fine grains, so we can say a request coming to HTTP use a different cache then something at HTTPS.
[00:05:48] You can get actually very, very nuanced on how fun For now we're just gonna say request URIs. So, the URL that you're requesting, we're gonna use that as a cache too, that's it nothing too complicated. But again if you wanna get nuanced, you can, I wouldn't recommend it.
[00:06:05] But hey that's what I say. So, let's add in the proxy cache key.
>> Jem Young: And we're gonna say that was say, request_uri, one word or two, underscore, awesome.
>> Jem Young: Okay, okay, looking good so far. I'll give everybody a minute.
>> Speaker 3: Does it use tmp or temp for template?
>> Jem Young: Temp.
>> Jem Young: Does that answer the question? Sorry.
>> Speaker 3: Yeah I think you have tmp on the /tmpnginx.
>> Jem Young: On the last part of it where it says use_tmp.
>> Speaker 3: Yeah.
>> Jem Young: Good catch, thank you. That would have been one of those things that I'm like, why? I copied everything correctly.
[00:07:14] Thank you. And save.
>> Jem Young: All right, so right now we've told the Nginx kind of where the cache should be, what the key should be. Now let's tell it how to cache things and which path we want it to cache for.
>> Jem Young: So we only want the cash to run on slowfile because that's the file that's really slow.
[00:08:04] Adding expires headers to it won't really help at all. But we can solve that problem with nginx caching. So what we're doing is we're saying proxy_cache_valid. How long is this cache gonna be valid before the expiry? I'm gonna say one minute, which is fine. Beware of over caching and setting your time to expire too long.
[00:08:26] Because at this point, once I'm caching on the server, there's nothing I can do on the client side to get a fresh copy if I set my cache way too long. I've done this before so I'm telling you. I set my cache like, yeah, set it for a couple days.
[00:08:54] So, don't make your cache too long, just long enough. So we're saying proxy_cache_valid 1m fine.
>> Jem Young: We're adding ignore headers in here and I'll talk about that in a minute. But for now we're saying cache control which is normally something honored by nginx saying, hey, I've had some cache control.
[00:09:14] Don't cache it. It won't read a cache. It will respect that and get a fresh copy every single time. In this instance, we don't because browsers, Chrome by default when you first hit a page, will always send cache control no cache, try to get a fresh copy. In this particular instance, we don't wanna do that.
[00:09:32] We wanna cache copy every single time.
>> Jem Young: So then we can add our header. So if you want to add any header to nginx just add _header. And remember how we said the x status, x whatever is a way of just sending out information to the client, it's not necessarily relevant to the processing of the request?
[00:09:52] In this case we want to send a header that says are we hitting the cache or are we not hitting the cache? We do that with the X-Proxy-Cache header. And we're just gonna say _cache_status. So it's gonna say either, hit, miss, or expired. In the first case it's gonna be missed because nothing is cached.
[00:10:11] The second time it should be a hit. Really useful for debugging if you have an issue with, is it hitting a cache? Is it not hitting a cache? Cuz otherwise it's very difficult to tell other than the speed of the request. Next, what's the cache we're gonna use?
[00:10:24] Remember in key zone, we called it slow file cache, we're just saying this is the cache we wanna use. And as always, we proxy_pass to node, and if we don't do this it will just try to hit slash. So when you try it in slow file, that's gonna redirect the slash, so we make sure we pass it to slow file, the actual place where it should be going.
>> Speaker 4: Is there any reason we're not using HTTPS there?
>> Jem Young: Yes, because at that point we'd have to add HTTPS to node to decrypt that. And so we'd have to add certificate to be able to do that. We could do it. It's just kind of excess work that-
>> Speaker 4: That's what you were talking about with like tunnelling to-
>> Jem Young: Yeah, so we could do it, but since we own the server and I'm not too worried about someone who's on my server reading traffic coming in. Because if they're on my server at this point I have bigger problems.
[00:11:08] But if we wanted to or if we wanted to redirect to a complete different server we can add HTTPS to keep it encrypted. But in general I think most organizations do not tunnel HTTPS traffic around because it adds load on the application server. So when I say server server, I mean node, like a node server return something versus that's an application server which is nginx which is just your server server reverse proxy.
[00:11:35] Again if you do it in node it's gotta decrypt those, it's gonna add loads to your pipeline that you probably don't need to be, but great question though. I like questions. Okay, everybody done editing their slow file block, location block? Okay, so I am gonna go ahead and do that now.
[00:11:57] And I'll just put it above here
>> Jem Young: What was the first thing I said? Proxy cache valid. Wait, did I cheat? Did I put this in? Darn it. [LAUGH] I thought I put it in the speaker notes so I could copy and paste. Now we're gonna do it the hard way, it's okay, proxy_cache_valid let's say one minute what's my next slide, proxy_ignore_headers.
>> Jem Young: Then we're gonna add the downstream header, it's called X-Proxy cache.
>> Jem Young: Proxy cache.
>> Jem Young: And proxy-pass it all in two node.
>> Jem Young: Awesome, well we'll see if it's awesome if I did this correctly. And let's go ahead and write save and let's run our tester, invalid param, so I typed something incorrectly.
[00:13:59] Let's see what I mistyped, levels. Darn it, I knew something was gonna get me
>> Jem Young: Actually, I'll leave that for now. So if I said sudo service nginx restart, it just gives me this generic error. Because there's something error but it's not telling me what. That's why again, it's always been officially run in nginx [UNKOWN] just to validate your configuration file.
[00:14:33] And it tells me about I misspelled something. So now let's just fix that levels and nope. Still did something wrong again. Inactive.
>> Jem Young: It's an equal sign, not a colon. Yes, apologies if you copied exactly what I just did because It was wrong.
>> Jem Young: Equals, all right. Darn it.
>> Jem Young: Headers, it's so pedantic.
>> Jem Young: Okay I'm feeling good. Feeling good about this one. Think I got it. Anybody over under, $10 got it right, no.
>> Speaker 2: Well I had-
>> Speaker 2: You got it.
>> Jem Young: Yeah.
>> Speaker 2: So-
>> Jem Young: [LAUGH] Nobody's perfect. I make mistakes the same as everybody even though when I'm copying from a slide.
[00:15:48] All right, so we should be good now. If you have any errors ngnx should tell you where they're at. But let's make sure the cache is working. So now let's reload Nginx. sudo service nginx. Restart or reload.
>> Speaker 2: Did anyone else get invalid number of arguments and proxy_cache_directive?
>> Jem Young: Proxy_cache_directive.
>> Speaker 2: Yeah, it was from actually from the set of changes we made after I got it. And it looks like I have the stuff correct.
>> Jem Young: Proxy_cache_directive, so you have proxy_cache_path,
>> Jem Young: Then you have your cache key.
>> Speaker 2: No underscore between path.
>> Jem Young: It's very, very temperamental, but that's okay.
[00:16:37] I'd rather it fail loudly than fail quietly and I'll just be like, what's happening?
>> Jem Young: So now let's give it a shot, let's see if if I did things correctly. First load file, it's still gonna take about six seconds. Hopefully, the second time it's gonna be faster. So we see in the, now the response header, we're setting down it's a cache miss, so let's see if I can't.
[00:17:07] Look how fast that was. Remember this used to take six seconds, now it's 200 milliseconds? And so it's a cache hit every single time. So this is the power of caching on the client side and we cache on the server side. But the tricky part, as stated before, if you cache on the server side.
[00:17:24] Yeah it's fast, yeah it may seem magical, but if you set your cache too long, incorrectly. The client's always gonna be getting a cache copy, a stale copy of your site. And there's nothing you can do about it other than you'd have to reload nginx, clear the cache, things like that.
[00:17:38] That's why you don't get too overzealous with caching, only cache the things that you know you're gonna turn instantaneously. Everybody with me? We learned that caching is, it's a hard problem in computer science.
>> Jem Young: Excellent, so if I come back to this page in a minute because the cache is only valid for a minute.
[00:17:58] We're gonna get a cache miss again because the cache expired, it has to rebake that cache. Ever heard the term warming the cache? Anybody ever heard that term? So the process of warming cache is the saying, I'm gonna make a bunch of requests to make sure that cache is populated by Nginx.
[00:18:15] That way the next person that comes to the site is gonna have a cache hit every single time. So at this point I'm paying the cost of warming the cache for everybody else. Whereas if you go to jem.party right now, you're gonna get a cache hit, because the cache is warm.
[00:18:27] So warm cache versus cold cache, make sense? Like the cache is hot, it's being used. Sort of part of your build process often is warming the cache. So it'll just make automatic, it'll make query requests to your page just so nginx knows how to cache those requests. And then later it'll be hot for for everybody else.
[00:18:48] So if you ever hear the phrase warming the cache, now you know what they're talking about. It been a minute yet? Let's give it a shot, it's been a minute. So the cache is already expired, this why slow file is gonna be slow again. But once I hit it again, yep, it's expired.
[00:19:02] So if I hit it again, cache hit, yes?
>> Speaker 5: Is there a general rule of thumb along the assets to be cached, stay away from caching? Again, it sounds like it's a hard problem.
>> Jem Young: It's a hard problem, I think easy things like favorite icons, which aren't gonna change that often.
[00:19:23] If you have some base styles you wanna cache, but again, you're caching it and then there'll be that one time you need to fix something and the cache is stale. Then you have to clear your cache and restart nginx, which in our case is trivial, because it's one nginx server.
[00:19:39] But imagine thousands and restarting that and cleaning a cache across 1,000 servers. That's the kinda scale you have to think about why we do things that hard way sometimes. But the pay off is because, at scale, things get pretty gnarly. But great question, and if I knew the flat answer, I would be a very rich man.
[00:19:55] There is no one strategy for cache invalidation, things like that.