sidki's config set: 2005-06-09
The Un-Official Proxomitron Forum (https://www.prxbx.com/forums), Forum: Proxomitron Config Sets > Sidki
- z12 - May. 09, 2005 12:54 PM
Hi sidki
Here are my caching filters with some comments:
Resources won't cache if the server sends a Content-Type header that incorrectly identifies the resource as html. To get around this problem, I use a combination of the ETag, If-None-Match & Last-Modified headers. The basic idea is to have the browser store a variable that indicates whether the cached resource is html or not. We can't really store a "variable" in the browser, so the ETag header is used to simulate one.
The first time a resource is requested, the "Cache-Control" header is determined by the "Content-Type", which at this point may not be correct. After the "Content-Type" filters have had a chance to run, incorrect MIME types should have been fixed, and the ETag filters are applied. The first ETag filter removes the ETag header if the "Content-Type" is not html. The second ETag filter adds an ETag header if none is present: if the "Content-Type" is html, the added ETag has a field value of prxETag1; otherwise, it has a field value of prxETag2. To make all future requests conditional, the Last-Modified header is added if it was missing.
The next time the same resource is requested, the browser sends an If-Modified-Since and an If-None-Match header. The If-None-Match header contains the value that was in the ETag header. If the value is prxETag2, variable hvINM is set to 2 (not html); otherwise, hvINM is set to 1 (html). The header is removed whenever the value is prxETag*, so as not to pass a bogus value back to the server.
When the reply is received and the caching filters are enabled, the "Cache-Control:" header defaults to a setting of 1 day. This setting is replaced with a value of 1 second if either of the following conditions is satisfied: the value of hvINM is not 2 and the Content-Type is html, or the value of hvINM is 1.
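As a rough illustration of the logic described so far, here is a minimal Python model of the three steps (tag the response, read the marker back on the next request, choose Cache-Control). It is a sketch under stated assumptions, not Proxomitron filter syntax: headers are plain dicts, the function names are hypothetical, and `is_html()` only checks for `text/html`, whereas the real filters may match more types.

```python
# Hypothetical Python model of the ETag-as-variable scheme (not Proxomitron syntax).
# Assumptions: headers are plain dicts; "html" means a text/html Content-Type.

ONE_DAY = 86400
ONE_SECOND = 1

def is_html(content_type):
    """True if the (already corrected) Content-Type is text/html."""
    return content_type.split(";")[0].strip().lower() == "text/html"

def tag_response(headers, now_http_date):
    """Response side, run after the Content-Type filters have fixed the type."""
    out = dict(headers)
    html = is_html(out.get("Content-Type", ""))
    # ETag filter 1: drop the server's ETag if the resource is not html.
    if not html:
        out.pop("ETag", None)
    # ETag filter 2: add a marker ETag if none is present.
    if "ETag" not in out:
        out["ETag"] = "prxETag1" if html else "prxETag2"
    # Add Last-Modified if missing, so the next request is conditional.
    out.setdefault("Last-Modified", now_http_date)
    return out

def read_request(headers):
    """Request side: derive hvINM from If-None-Match and strip fake values.

    Returns (headers_to_send, hv_inm); hv_inm is None on a first request."""
    out = dict(headers)
    inm = out.get("If-None-Match")
    if inm is None:
        return out, None
    hv_inm = 2 if inm == "prxETag2" else 1
    if inm.startswith("prxETag"):
        del out["If-None-Match"]  # don't pass a bogus ETag to the server
    return out, hv_inm

def cache_control(hv_inm, content_type):
    """Default is 1 day; 1 second when the reply is treated as html."""
    if (hv_inm != 2 and is_html(content_type)) or hv_inm == 1:
        return f"max-age={ONE_SECOND}"
    return f"max-age={ONE_DAY}"
```

On a repeat request the marker wins over whatever Content-Type the server sends, so a script mislabeled as html (hvINM=2) still gets `max-age=86400`, while an html page (hvINM=1) gets `max-age=1` even on a 304 without a Content-Type.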
Since this is not the first time this resource has been requested, hvINM will have a value of either 1 or 2, which determines whether or not the reply should be cached. If hvINM=2 (non-html), the server's Content-Type is ignored and the reply is cached. If hvINM=1 (html), the server's Content-Type is again ignored (it may be missing or invalid on 304 replies) and the reply is not cached (OK, it is, but only for 1 second).
Here are the modifications I made:
# modified URL, so 304 replies won't remove Cache-Control: 2 filters replacement value
Code: In = TRUE
# modified URL, so local.ptron html in cache would be reused
# modified Match, so filter would match based on value of variable hvINM
Code: In = TRUE
# added this filter, removes ETag if it's not html
Code: In = TRUE
# added this filter, adds fake etag if ETag header not present
Code: In = TRUE
# added this filter, sets variable hvINM, removes header if etag was faked
Code: In = FALSE
# modified the existing If-None-Match filter Key so it had a number
Code: In = FALSE
# added this filter, adds header if it's missing, makes all future requests conditional
Code: In = TRUE
You can see the effects of these filters at wired.com. Click on a link to an article, use the browser's back button, then click on the link again. Without these tweaks, each time you click on the link, global.js is reloaded with max-age=1. After these tweaks, on the second reply, the javascript is cached with max-age=86400.
Mike
- sidki3003 - May. 09, 2005 04:32 PM
Hi Mike,
Very interesting! Thanks for the helpful explanations. It will take me some time to get familiar with these filters.
The added filters are just now active for both options, "Always Cache" and "Always Cache except for HTML", but they aren't needed for the first option, right?
A few more questions to get me started:
Did you check whether servers that don't send a Last-Modified actually accept an If-Modified-Since? Along the same line, I would feel a bit uncomfortable with removing all non-html ETags.
What about servers that base their responses on just this header? Or doesn't this happen? Hmm... just had an idea, see below.
Excluding local.ptron: shouldn't it just get the "i_cache:2" (always cache) keyword?
Re "Cache-Control: 3 Kill if 3xx Response", there was a reason why I added that, but I can't remember which. Do you remember a situation where not killing it for 304s helped? Hmm... maybe I did it because the counter was always reset to 86400 and I never got a fresh copy... gonna check that.
Out of curiosity, what does "hvINM" stand for?
When hitting back to http://www.pcworld.com/ from another page I still get this:
Code: RESP 6001 : Last-Modified added
Re using the ETag field as a var stored by the browser, how about making an "array" (a la keyword/flag/volat) out of it? The first field would be the original ETag, if present. All others would get a "prx-" prefix and be stripped on the way out while going through Prox. That way the old ETag stays intact, and we would have a nice spot for future purposes.
Later, sidki
- z12 - May. 09, 2005 10:05 PM
Hi sidki
Quote: The added filters are just now active for both options, "Always Cache" and "Always Cache except for HTML", but they aren't needed for the first option, right?
I've never run "Always Cache", but I can't think of any reason you would need the others. Maybe Last-Modified, if you wanted to send conditional requests after the cache expires.
Quote: Did you check whether servers that don't send a Last-Modified actually accept an If-Modified-Since?
Not all do. Sometimes you get the full monty back with a 200 reply, but this only seems to happen when the cache has expired and you're requesting the resource again, or when they sent the wrong mime-type and you have to ask for it again.
Quote: Excluding local.ptron: shouldn't it just get the "i_cache:2" (always cache) keyword?
Well, you got me there. The truth is, I'm rather clueless about Keywords & IncludeExclude.
I've never used a filter set that had those features, so I need to spend some time reading your documentation and looking at the list to really get a handle on it.
I know one thing for sure: if you browse cnn with these filters, you'll eventually request your popup.wav file, which I cannot seem to cache.
Code: BlockList 204: in IncludeExclude, line 64
It might be a Firefox thing, but I see my User-Agent string change to the real one on this request, so I dunno.
Quote: Re "Cache-Control: 3 Kill if 3xx Response", there was a reason why I added that, but I can't remember which. Do you remember a situation where not killing it for 304s helped? Hmm... maybe I did it because the counter was always reset to 86400 and I never got a fresh copy... gonna check that.
I know why you did it. If you get a 304 to html, it will get cached for a day if they don't send back an html Content-Type header, which occurs often with 304s. Been there, done that.
Quote: Out of curiosity, what does "hvINM" stand for?
header variable If None Match. Not very original.
Quote: When hitting back to http://www.pcworld.com/ from another page I still get this:
I'm going to check that out after this post.
Quote: Re using the ETag field as a var stored by the browser, how about making an "array" (a la keyword/flag/volat) out of it? The first field would be the original ETag, if present. All others would get a "prx-" prefix and be stripped on the way out while going through Prox. That way the old ETag stays intact, and we would have a nice spot for future purposes.
I never thought of that. That seems like a good idea, hmm...
Mike
Edit: checked out pcworld, got a couple of images that wouldn't cache. Firefox didn't even send the If-Modified-Since header. Curious....
- z12 - May. 10, 2005 03:01 AM
Hi sidki
I made the following changes so the original ETag is preserved.
The previous ETag & If-None-Match header filters should be replaced with the following:
Code: In = TRUE
I must admit, I like this better, good idea.
Mike
- z12 - May. 10, 2005 09:13 AM
Hi sidki
Simplified the following filter:
Code: In = FALSE
Mike
- z12 - May. 10, 2005 10:19 AM
Hi sidki
Modified the match in the following filter to account for xhtml & such; w3.org serves this to Firefox.
Code: In = TRUE
Also at w3.org, the "Content-Location: Show real Onsite URL" filter kicks in often. When this filter matches, the response won't cache. At w3.org, clicking on the css link loads several css files that won't cache. Any thoughts on this?
Mike
- sidki3003 - May. 10, 2005 11:14 AM
Quote: I made the following changes so the original ETag is preserved.
Cool! Although appending the Prox field may be better, in case something goes wrong.
Quote: I know why you did it. If you get a 304 to html, it will get cached for a day if they don't send back an html Content-Type header, which occurs often with 304s. Been there, done that.
Right! That was it. So "Cache-Control: 3 Kill if 3xx Response" isn't needed while using your added filters?
Why do you add prxETag2 (i.e. no HTML) to all 304 responses (incl. HTML)?
Re "Last-Modified: 2 Add if Missing", I find it a bit confusing to always see a Last-Modified in the title now. I used it as an indicator that the server would send a 304 for the next request. Do you have a few example links where this filter turns a 200 into a 304, so that I can get an idea of the pros and cons?
Quote: Modified the match in the following filter to account for xhtml & such; w3.org serves this to Firefox.
XHTML is fine of course. But there's a bunch of other XML Content-Types that I personally wouldn't want to include here.
Quote: Also at w3.org, the "Content-Location: Show real Onsite URL" filter kicks in often. When this filter matches, the response won't cache. At w3.org, clicking on the css link loads several css files that won't cache.
Gonna check that.
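sidki's "array" idea, with the Prox field appended after the server's ETag rather than put in front of it, can be sketched like this. Everything here is an assumption for illustration: the `", "` separator, the `prx-` field values and the function names are hypothetical, and the sketch relies on the browser storing the ETag value as-is and echoing it back verbatim in If-None-Match (strictly, an ETag header carries a single entity-tag, so the combined value is non-standard). Fields are peeled off from the end, so a comma inside the server's own quoted ETag is left alone.

```python
# Hypothetical sketch of the ETag "array": server ETag first, Prox fields appended.
# Assumptions: ", " separator, "prx-" prefix, browser echoes the value verbatim.

PRX_PREFIX = "prx-"
SEP = ", "

def append_prx_field(etag, field):
    """Response side: append a prx- field, keeping the original ETag first."""
    prx = PRX_PREFIX + field
    return prx if not etag else etag + SEP + prx

def split_if_none_match(value):
    """Peel prx- fields off the end of an echoed If-None-Match value.

    Returns (original_etag_or_None, list_of_prx_fields)."""
    rest, prx_fields = value.strip(), []
    while rest:
        head, _, tail = rest.rpartition(",")
        tail = tail.strip()
        if not tail.startswith(PRX_PREFIX):
            break  # what remains is the server's own ETag
        prx_fields.insert(0, tail[len(PRX_PREFIX):])
        rest = head.strip()
    return (rest or None), prx_fields

def strip_request(headers):
    """Request side: remove prx- fields before the request leaves the machine."""
    out = dict(headers)
    inm = out.get("If-None-Match")
    if inm is None:
        return out, []
    original, prx_fields = split_if_none_match(inm)
    if original is None:
        del out["If-None-Match"]  # only Prox fields were present
    else:
        out["If-None-Match"] = original  # the server sees its own ETag again
    return out, prx_fields
```

Because the server's value stays in front, a request where stripping fails still starts with the ETag the server issued, which is presumably the kind of "something goes wrong" case appending protects against.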
sidki
- sidki3003 - May. 10, 2005 12:19 PM
z12 Wrote: I know one thing for sure: if you browse cnn with these filters, you'll eventually request your popup.wav file, which I cannot seem to cache.
That works for me, also with your added filters. It's loaded, but I don't see it in the log the second time; IOW Cache-Control max-age=86400 is in effect. I didn't manage to get the popup again there on reload to test for a conditional response, but here: http://www.proxomitron.info/tests/poptest.html
Code: GET 1549 : If-None-Match: prxETag stripped: oth
Quote: It might be a Firefox thing, but I see my User-Agent string change to the real one on this request, so I dunno.
The User-Agent filter (in fact, everything except the Content-Type and caching filters) is bypassed for local.ptron.
sidki
- sidki3003 - May. 10, 2005 01:08 PM
z12 Wrote: Also at w3.org, the "Content-Location: Show real Onsite URL" filter kicks in often. When this filter matches, the response won't cache. At w3.org, clicking on the css link loads several css files that won't cache.
The server isn't sending any conditional caching headers, nor does it accept any. Even when unticking the Content-Location filter, I get:
Code: GET 1827 : Cache-Control killed: max-age=0
One more thing about XHTML caching. While at W3.org, I remembered that I'm changing the Content-Type with "Content-Type: 3 Sel. XML to text/html" like:
Code: Content-Type: text/html; charset=utf-8; PrxMsg=Changed - Filterable XML: application/xhtml+xml
edit: There is another way to include XHTML for web filters: testing for e.g. text/html and application/xhtml+xml just once, setting a var hFilter:1, and letting the web filters test for this var instead of $TYPE(htm). But I guess I'll leave that for the moment when it becomes necessary.
sidki
- z12 - May. 10, 2005 04:53 PM
Hi sidki
Quote: appending the Prox field may be better, in case something goes wrong.
Good point.
Quote: Why do you add prxETag2 (i.e. no HTML) to all 304 responses (incl. HTML)?
Doh, now that shouldn't be happening. Looks like when I was fooling around with the ETag filters, I overlooked that.
Quote: So "Cache-Control: 3 Kill if 3xx Response" isn't needed while using your added filters?
Not when it's working right.
Quote: Re "Last-Modified: 2 Add if Missing", I find it a bit confusing to always see a Last-Modified in the title now. I used it as an indicator that the server would send a 304 for the next request.
I suppose for html there's no need to do that, since it's only getting a 1 sec cache. My main concern was for non-html.
Quote: XHTML is fine of course. But there's a bunch of other XML Content-Types that I personally wouldn't want to include here.
I didn't want to cache rss feeds, which might not even be xml for all I remember.
Fixes to follow
Mike
- sidki3003 - May. 11, 2005 03:07 PM
Hi Mike,
z12 Wrote: Quote: So "Cache-Control: 3 Kill if 3xx Response" isn't needed while using your added filters? Not when it's working right.
Oh right, it all depends on the browser sending an If-None-Match along with the If-Modified-Since. Mozilla, IE6, and Opera 7 (IIRC) do that. The new Opera 8 isn't sending any conditional headers for the main doc any more on reload. Netscape 4.x is sending an If-Modified-Since, but doesn't know what an ETag is. So in the latter I get HTML 304s cached for one day without "Cache-Control: 3 Kill if 3xx Response". Considering that, I wouldn't see a reason to *not* strip Cache-Control from 304s, as a safety measure so to speak.
I couldn't find a file where the added Last-Modified turned a 200 into a 304. Maybe you added it because you were previously removing ETags from non-html docs, in which case it's indeed useful (hmm... I think it is; it's been a while since I tested that). So to me it appears that filter isn't needed anymore, unless I finally find a working example.
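One possible reading of the 304 safety measure, sketched in Python: when the browser sent back no marker (as with Netscape 4.x, which the thread says sends If-Modified-Since but no If-None-Match), a 304 gets its Cache-Control stripped instead of a guessed value; otherwise the same hvINM condition from z12's first post applies. The function names and the `None` convention are hypothetical, and `is_html()` again only checks for `text/html`. Stripping works because a browser updates its stored headers from the 304, so a header that isn't there leaves the freshness stored with the original 200 untouched.

```python
# Hypothetical sketch of Cache-Control handling with a 304 safety rule.
# hv_inm is None when the browser sent no If-None-Match (no marker came back).

ONE_DAY = "max-age=86400"
ONE_SECOND = "max-age=1"

def is_html(content_type):
    """Assumption: only text/html counts as html here."""
    return content_type.split(";")[0].strip().lower() == "text/html"

def reply_cache_control(status, hv_inm, content_type):
    """Return the Cache-Control value to send, or None to strip the header."""
    if status == 304 and hv_inm is None:
        # No marker, and 304s often lack an html Content-Type, so a 1-day
        # value could cache an html page for a day. Strip it instead; the
        # browser keeps the freshness it stored with the earlier 200.
        return None
    # Same condition as the marker scheme: 1 second for html, else 1 day.
    if (hv_inm != 2 and is_html(content_type)) or hv_inm == 1:
        return ONE_SECOND
    return ONE_DAY
```

With a marker present, 304s still get a Cache-Control value, so a mislabeled script (hvINM=2) moves to the 1-day lifetime on its second reply.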
Which brings me to another thing that helps caching (maybe that was why you removed ETags): removing the If-None-Match *if* there is an If-Modified-Since (in response to a *real* Last-Modified). That may sound like the opposite of my first statement, but we just rely on the If-None-Match to grab the previously determined content-type; it doesn't necessarily need to leave the machine. See for instance the PayPal logo on this board, or various files (images, CSS, JS) at MSN.
I tested a bit with Firefox/Opera/IE, and it looks like adding a Prox ETag field is only needed for 200 responses: the old field doesn't get overwritten by a different ETag value if it comes from a 304.
Regarding my set, I still think that checking for "*xml*" isn't needed because of "Content-Type: 3 Sel. XML to text/html".
Thoughts?
sidki
- z12 - May. 11, 2005 07:12 PM
Hi sidki
Quote: Considering that, I wouldn't see a reason to *not* strip Cache-Control from 304s, as a safety measure so to speak.
I had some new filters to post, but after thinking about that, I think I'll wait. Are you saying reverse the filter order for these two?
Code: Cache-Control: 3 Kill if 3xx Response:
Sounds interesting, I'm going to check that out.
Still having problems caching anything when $RDIR is used, including local.ptron. Don't know what to do about this. I emptied my cache, went to cnn, then viewed "about:cache" in another tab. This caught my eye:
Code: Key: http://local.ptron/killed.gif
I'm sure the first one is the result of a $RDIR, but I'm not sure why the two are different.
Mike
- sidki3003 - May. 12, 2005 07:53 AM
z12 Wrote: I had some new filters to post, but after thinking about that, I think I'll wait.
Just thoughts, open to discussion. I'll try to boil them down to (modified) filters and actually test them; probably won't have enough time today.
Quote: Still having problems caching anything when $RDIR is used, including local.ptron. Don't know what to do about this.
I emptied my cache, went to cnn, then viewed "about:cache" in another tab.
Usually those killed.gif calls aren't $RDIRs but replacements for (in most cases harmless positioning) GIFs by the webbug filter (red.gif for a real webbug).
I cleared the cache and went to CNN. Got "Fetch count: 1" for the 55-byte killed.gif, which is the real (compressed) image. Got "Fetch count: 20" for the 832-byte killed.gif, which is a memory-only uncompressed copy. That's how Mozilla handles images.
Reloaded the CNN page: "Fetch count: 2" for the 55-byte pic, and a 304 for killed.gif in Prox's log window. So 304s are included in the fetch count.
Went to some other page with a lot of killed.gif replacements and back to CNN; fetch count still 2. So "Cache-Control 1 day" is in effect.
sidki
- sidki3003 - May. 12, 2005 12:47 PM
Okay, I did a few tests. That global.js at Wired is answering two of my questions, d'oh!
- Your Last-Modified filter is needed.
- Our Cache-Control header has to be added to 304s as well.
I've no solution to that "browsers not sending If-None-Match" thingy, so I unticked "Cache-Control: 4 Kill if 3xx Response: Cache!" for now.
Here is a barely tested WIP set (most log lines are temporary):
Code: [HTTP headers]
I've no more time just now, gonna test them in more detail later...
sidki
- Seikatsu - May. 13, 2005 01:34 AM
Wow, there's some serious conferring going on in this thread! Well, if somebody could stop for a moment and look at why the fields on Weather.com's detail page are empty, I would appreciate it. Thanks.