Sunday 17 May 2009

Optimization, BLOB caching and HTTP 304s

There's been an interesting mini-debate going on recently in terms of where to store static assets used by your site - images, CSS, JS files and so on. Broadly the two approaches can be characterized as:

  • Developer-centric - store assets on the filesystem, perhaps in the 12 hive
  • Author-centric - store assets in the content database, perhaps in the Style Library which comes with publishing sites

Needless to say, these options offer different pros and cons depending on your requirements. Servé Hermans offers a good analysis in To package or not to package: that is the question. However, I want to throw another point into the debate: performance, specifically for anonymous users. This is often an audience I care deeply about, since some of the WCM sites I work on have forecast ratios of 80% anonymous to 20% authenticated users. Recently I was asked to help optimize an under-performing airline site built on MOSS. As usual the problem was a combination of several things, but one of the high-impact items was the decision about where to store assets. In this post I'll explain the effect on performance and why you should consider it when building your site.

The problem

Once they've been loaded the first time, most of the static files a website uses should be served from the user's local browser cache ("Temporary Internet Files"). Without this, the internet would be seriously slow. Consider how much slower a web page loads when you do a hard refresh (Ctrl+F5) compared to a normal one: the hard refresh forces all the images to be re-downloaded rather than served from the browser cache. Unfortunately, SharePoint doesn't get this quite right in some scenarios for files stored in some common SharePoint libraries/galleries (i.e. the author-centric approach). Most of the gain is still there, but even though the browser has the image locally, it still makes a request for it. The conversation goes like this (for EACH image on the page!):

Browser: I need this image please - I cached it last time I came at [date/time], but for all I know it's changed since then.
Server: No need dude, it's not changed so just use your local copy (in the form of an HTTP 304 - "Not Modified")
Browser: Fair enough, cheers.
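The exchange above is an HTTP conditional GET. The browser sends If-None-Match (the ETag) and/or If-Modified-Since, and the server answers 304 if its copy hasn't changed. Below is a minimal sketch of the server's side of that decision. It assumes the file's ETag and last-modified time are already known, it is not SharePoint's actual implementation, and the function name `conditional_get_status` is hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200.

    `last_modified` must be a timezone-aware datetime. As in HTTP/1.1,
    If-None-Match (ETag) takes precedence over If-Modified-Since.
    Weak ETags ("W/...") are not handled in this simplified sketch.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        tags = [t.strip() for t in if_none_match.split(",")]
        return 304 if ("*" in tags or etag in tags) else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: just send the full file
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so compare at that granularity.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200
```

Answering 304 is cheap for the server. The browser still pays a full network round trip per file, though, and that round trip is the cost this post is about.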

This happens because the file was not served with a "cacheability" HTTP header to begin with. It adds significant time to the page load when you have 30+ images/CSS/JS files referenced on your page. In my experience this can be several seconds under some circumstances, which of course is a huge deal. If, say, the user is in Europe but the servers are in the U.S., this kind of network chatter suddenly becomes something we need to address. In the majority of cases we're happy to cache these files for a period, since most of them don't change often, and we get better performance as a result.
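Here is a back-of-the-envelope estimate of the "several seconds". The inputs are assumptions for illustration, not measurements from the airline site. The first is a 150 ms Europe-to-U.S. round trip. The second is two parallel connections per host, which is the HTTP/1.1 (RFC 2616) recommendation that IE6/IE7 follow. The helper name is hypothetical.

```python
import math


def revalidation_delay_seconds(num_files, rtt_seconds, connections_per_host):
    """Rough extra page-load time spent on conditional GETs that end in 304.

    Assumes each 304 costs exactly one network round trip and that the
    browser issues requests in parallel "waves" of `connections_per_host`.
    Ignores server processing, DNS/TCP setup and multiple hostnames.
    """
    if connections_per_host < 1:
        raise ValueError("connections_per_host must be at least 1")
    waves = math.ceil(num_files / connections_per_host)
    return waves * rtt_seconds
```

With 30 files, `revalidation_delay_seconds(30, 0.15, 2)` gives 15 waves of round trips. That is about 2.25 seconds before any real content has been downloaded.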

The Solution (for some SharePoint libraries *)

Mike Hodnick points us to part of the solution in his highly-recommended article Eliminating "304" status codes with SharePoint web folder resources. Essentially, SharePoint's BLOB caching feature saves the day because it serves the image with a "max-age" value in the HTTP header. That value tells the browser it can use its local copy of the file until the period has elapsed. This only happens when BLOB caching is enabled and the max-age attribute is set, like this (here set to 86400 seconds = 24 hours):

<BlobCache location="C:\blobCache" path="\.(gif|jpg|png|css|js|aspx)$" maxSize="10" enabled="true" max-age="86400" />
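In the configuration above, `path` appears to be a regular expression matched against the requested file URL, and `max-age` is in seconds (86400 = 24 × 60 × 60). The sketch below models how the two settings combine. It is a Python model of the configuration, not SharePoint's code, and the helper names are hypothetical. It matches case-sensitively, which may differ from what SharePoint does.

```python
import re
import xml.etree.ElementTree as ET

# The BlobCache element from the configuration above, verbatim.
BLOB_CACHE_XML = r'<BlobCache location="C:\blobCache" path="\.(gif|jpg|png|css|js|aspx)$" maxSize="10" enabled="true" max-age="86400" />'


def blob_cache_settings(xml_text):
    """Parse a <BlobCache> element into the settings this sketch cares about."""
    elem = ET.fromstring(xml_text)
    return {
        "enabled": elem.get("enabled", "false").lower() == "true",
        "path_regex": re.compile(elem.get("path", "")),
        "max_age": int(elem.get("max-age", "0")),
    }


def max_age_for(url_path, settings):
    """Return the max-age (seconds) this config would apply to url_path, or None."""
    if settings["enabled"] and settings["path_regex"].search(url_path):
        return settings["max_age"]
    return None
```

This only models the configuration. As the rest of this post shows, SharePoint does not actually emit the header for anonymous users in some libraries, even when the path matches.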

When we configure the BLOB cache like this we are, in effect, specifying that it's OK to cache static files for a certain period, so the "cacheable" header gets added. HOWEVER, what Mike doesn't cover is that this only happens for authenticated users. Files served out of common content DB locations such as the Style Library and Master Page Gallery are still not served correctly to anonymous users. Note that this doesn't apply to all SharePoint libraries, though, so we need to be clear on exactly when the problem occurs.

* Scope of this problem/solution

Before drilling down any deeper, let's stop for a moment and consider the scope of what we're discussing - a site with:

  • Anonymous users
  • Files stored in certain libraries - the Style Library and Master Page Gallery are known culprits. I'm not 100% sure of the pattern (I discuss it later), but other OOTB libraries such as SiteCollectionImages do not have the problem.

If you don't have this combination of circumstances, you likely don't have the problem. For those who do, we're now going to look closer at what's going on, before concluding with how we can work around the issue at the end.

Drilling deeper

For a site which does have the above combination of circumstances, we can see the issue with Fiddler. As an anonymous user browsing to a page I've already visited, I see a stack of 304s, meaning the browser is re-requesting all these files:

[Screenshot: Fiddler trace, anonymous user - many HTTP 304s (BlobCachingDisabled_304s)]

However, if I'm authenticated and I navigate to the same page, I only see the HTTP 200 for the actual page, no 304s:

[Screenshot: Fiddler trace, authenticated user - HTTP 200 for the page, no 304s (BlobCachingEnabled_No304s)]

Hence we can conclude it works fine for authenticated users but not for anonymous users.
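What Fiddler shows comes down to one question per response: does its Cache-Control header let the browser reuse the file without asking again? The sketch below answers that question in simplified form. It ignores Expires and the heuristic freshness some browsers derive from Last-Modified, and it looks up the header name case-sensitively. The function names are hypothetical. The `private,max-age=0` value is the one reported for Style Library files in the comments on this post.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def will_revalidate(response_headers, age_seconds):
    """True if the browser must send a conditional GET (and so likely get a
    304) before reusing its cached copy once it is `age_seconds` old."""
    cc = parse_cache_control(response_headers.get("Cache-Control", ""))
    if "no-cache" in cc or "no-store" in cc:
        return True
    max_age = cc.get("max-age")
    if max_age is None:
        return True  # simplification: no explicit lifetime means treat as stale
    try:
        # A response is fresh only while its age is below max-age.
        return age_seconds >= int(max_age)
    except ValueError:
        return True
```

For example, `will_revalidate({"Cache-Control": "private,max-age=0"}, 10)` is `True`, which matches the anonymous case. A one-day max-age checked an hour later is `False`, which matches the authenticated case.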

So what can we do for our poor anonymous users (who might be in the majority) if we're storing files in the problematic libraries? Well, here's where I draw a blank unfortunately. Optimizing Office SharePoint Server for WAN environments on TechNet has this to say on the matter:

Some lists don't work by default for anonymous users. If there are anonymous users accessing the site, permissions need to be manually configured for the following lists in order to have items within them cached:

  • Master Page Gallery
  • Style Library

Aha! So we need to change some permissions - fine. This seems to indicate that it is, in fact, possible to get the correct cache headers added to files served from these locations. Unfortunately, I simply cannot find which permissions need to be changed, and nobody on the internet (including the TechNet article) seems to say. The only logical setting is the list's Anonymous Access options. These are all unchecked by default, and adding the 'View Items' permission (as shown below) does not change anything:

[Screenshot: list Anonymous Access settings with 'View Items' checked (AnonPermissions)]

As a side note, the setting above (I believe) effectively grants read permissions to the identity used for anonymous access to the associated IIS site. So in IIS 7.0, I'm fairly sure you'd achieve the same thing by doing this:

[Screenshot: granting read permissions to the anonymous access identity in IIS 7.0 (AddPermsIUsr)]

So the problem does not go away when anonymous users are granted the 'View Items' permission. What I find interesting is that a closer look with Fiddler reveals some inconsistencies. The image below shows me browsing to a page anonymously for the first time. To save you the hassle of reading it, here's what it shows:

  • Files served from the 'SiteCollectionImages' library are given the correct max-age header (perhaps expected, since it isn't one of the known 'problem libraries' such as the Style Library)
  • Files served from the '_layouts' folder are given a different max-age header (expected, since the settings from the IIS site are used here)
  • Some files in the Style Library are in fact given the correct max-age header! (not expected)

[Screenshot: Fiddler trace, anonymous first visit - mixed max-age headers (MixedHeaders_Anonymous)]

So the 2 questions which strike me here are:

  • Why are some files being served from 'Style Library' with the correct header when most aren't?
  • Why can SharePoint add the 'max-age' header to files in the 'SiteCollectionImages' library but not the 'Style Library'?

The first one is a mystery to me - it's perhaps not too important, but I can't work it out. The second one might be down to how the libraries are provisioned - the 'Style Library' is provisioned by declarative XML in the 'PublishingResources' Feature, whereas the 'SiteCollectionImages' library is provisioned in code using the same Feature's activation receiver. Could this be the key factor? I don't know, but I'd certainly be interested if anyone can put me straight - either on this or the mystery "permissions change" required to make BLOB caching deal with libraries such as the 'Style Library'.

Conclusion

The key takeaway is that sites which want browser caching for static files (for performance reasons), and which have anonymous users, need to be careful where they put their images/CSS/JS files, as per Mike Hodnick's general message. If we want to use the author-centric approach and store things in SharePoint libraries, we need to consider which libraries to use and test whether they have the 304 problem. Alternatively, we can store these files on the filesystem (the developer-centric approach) and use a virtual directory with cacheability settings that suit our needs. My suggestion would be a custom virtual directory for full control, since the default settings on the '_layouts' directory ("cache for 1 year") are unlikely to be appropriate.
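In IIS, a virtual directory's caching behaviour is set through configuration rather than code. The following is therefore only an illustration of the idea, using Python's standard library: serve a folder of static files and attach a Cache-Control header with a lifetime you choose. The default here is one day, a hypothetical policy in place of the year used on '_layouts'. The class and function names are hypothetical.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CachingStaticHandler(SimpleHTTPRequestHandler):
    """Serve files from a directory and tell browsers they may reuse them
    for `max_age` seconds without revalidating."""

    max_age = 86400  # hypothetical policy: one day

    def send_response(self, code, message=None):
        # Remember the status so end_headers() can decide what to add.
        self._status = code
        super().send_response(code, message)

    def end_headers(self):
        # Only successful (200) and not-modified (304) responses get the
        # caching header; errors such as 404 are left uncached.
        if getattr(self, "_status", None) in (200, 304):
            self.send_header("Cache-Control", f"public, max-age={self.max_age}")
        super().end_headers()


def make_server(directory, max_age=86400, port=0):
    """Build (but don't start) a server for `directory` on 127.0.0.1."""
    handler_cls = type("ConfiguredHandler", (CachingStaticHandler,), {"max_age": max_age})
    handler = functools.partial(handler_cls, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

For a local experiment, `make_server("static", max_age=86400, port=8000).serve_forever()` would serve a hypothetical `static` folder. The design choice to notice is that the lifetime is a per-directory decision you make deliberately, which is the control the custom virtual directory gives you.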

10 comments:

Anonymous said...

Personally I think this is a huge bug. Someone noticed it back in 2007 in a comment on this post:
http://blogs.msdn.com/ecm/archive/2006/11/08/how-to-make-your-moss-2007-web-site-faster-with-caching.aspx

I don't understand why MS doesn't fix this. Even SP2 doesn't.

Tyler Holmes said...

Does this behave consistently across browsers?

In Firefox, yes, it's true that an asset pulled from the Style Library gets a Cache-Control of private,max-age=0. But there's also an ETag and a Last-Modified header sent in the original response.

Other browsers will pick these up and the next time they send a request they'll send If-Modified-Since and If-None-Match headers (in addition to Cache-Control).

When SharePoint gets these it will hand out the appropriate response code 304 (Not Modified).

Could it be that this "bug" is in IE 6/7 and is fixed in future MS browsers?

Put Fiddler away for a sec and have another look with Firefox and Tamper Data. The fresh set of eyes may change your mind.

My Best,
Tyler

Michael Hanes said...

We work around this on www.westernaustralia.com and other sites by doing some cache header rewriting on our reverse proxy using FilterProxy. The reverse proxy sits between the WFEs and the internet and handles all content being shunted down the line. As all of our sites are served through the proxy this also makes life easier as we don't have to configure caching at the IIS site level.

I've also noticed differences between IE and FF (FF always seems to issue the conditional get but maybe that's just due to browser settings).

Chris O'Brien said...

@Tyler, @michhes,

Unless I'm missing something, I see the same behaviour in both browsers. If I check the request for a certain Style Library file in both IE/Fiddler and FF/Tamper, I see the same If-Modified-Since and If-None-Match data. So that's one thing, but in any case - isn't the issue that the request/subsequent 304 is happening in the first place for anonymous users? Regardless of the lower level headers, surely the thing I care about is that the round-trip is occurring and is therefore slowing down page loads - the browser wouldn't make this request if the file was initially served correctly.

It sounds to me like solutions like michhes's reverse proxy approach are the only way to combat this currently - I'm curious if this was implemented here because the servers are in Western Australia but site users might be in other continents, so this is exactly the kind of optimization required?

Thanks for the comments.

Chris.

Anonymous said...

I see this behaviour also in multiple browsers (even IE8). It is strange that authenticated users do get a max-age according to the max-age in the blobcache and anonymous users get a max-age of 0. The internet is full of anonymous users...
And the performance bottleneck is as Chris says the round trip.
Perhaps a custom http adapter that adds/changes the max-age header can also be a solution if there is no reverse proxy involved?

Chris O'Brien said...

Doing something with an HTTP module or similar - yep, that's a good thought.

Although I guess one could probably argue it would be simpler/more efficient to just store the files somewhere else. Still, good to have the workaround if for some reason the files have to be stored in the Style Library e.g. for an existing site.

Thanks,

Chris.
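The HTTP module idea from the last two comments boils down to intercepting responses for static files and replacing their Cache-Control header. Michael's reverse-proxy rewriting does the same. A real SharePoint implementation would be an ASP.NET HTTP module. The sketch below swaps in Python WSGI middleware purely to show the technique. The class name, the extension list and the one-day lifetime are all assumptions. Only apply this to files that are genuinely safe to cache publicly.

```python
import re

# Hypothetical list of "static" extensions whose caching header we override.
STATIC_PATH = re.compile(r"\.(gif|jpg|png|css|js)$", re.IGNORECASE)


class CacheHeaderRewriter:
    """WSGI middleware that replaces Cache-Control on successful responses
    for static-looking URLs (analogue of an HTTP module or proxy rule)."""

    def __init__(self, app, max_age=86400):
        self.app = app
        self.max_age = max_age

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")

        def rewriting_start_response(status, headers, exc_info=None):
            if STATIC_PATH.search(path) and status.startswith("200"):
                # Drop whatever the application sent (e.g. private,max-age=0)...
                headers = [(k, v) for k, v in headers if k.lower() != "cache-control"]
                # ...and substitute our own lifetime.
                headers.append(("Cache-Control", f"public, max-age={self.max_age}"))
            return start_response(status, headers, exc_info)

        return self.app(environ, rewriting_start_response)
```

Responses for anything else, such as the .aspx pages themselves, pass through untouched.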

Mike Hodnick said...

Thanks for your reference to my post on Blob caching and 304's Chris! - Mike

Maxime Bombardier said...

Thanks Chris for the info, good post.

It's a known bug that isn't fixed, and no fix is planned until SharePoint 2010. I'd suggest using a different library to store your custom CSS/JS/XSL.

The problem with file system folders will be keeping them synchronized across multiple servers, especially if you allow designers to touch your styles.

Maxime

Chris O'Brien said...

@Maxime,

Aha - very interesting to hear that's a confirmed bug, particularly from an MS person!

Many thanks for the info.

Chris.

P.S. Agree that if designers need access to the files, an alternative library which Fiddler shows does BLOB cache properly might be more appropriate than filesystem.

Jason Ramoutar said...

Chris, thanks for this write up. It's very helpful and informative.