Thursday 31 March 2016

Office 365 performance – our Azure CDN image renditions solution

In the last post image renditions causing slow page loads in SharePoint Online, I talked about how Office 365/SharePoint Online has some sub-optimal performance around images and image renditions, at least at the present time. Numerous people got in touch to say they also see the same issue. However, we are implementers - and we bring solutions, not problems! So in this post I’ll go into a bit more detail on our way of working around this challenge, and how it can improve page load times.

To recap, the problem is related to the image renditions functionality of SharePoint. This is a useful feature which automatically creates additional versions (sizes) of images in a publishing site such as an intranet. However, when a user hits a page which has these images - often a home page or section page – and they need to be downloaded, we see a big delay of up to 3 seconds. Clearly if a page is taking say, 5 or 7 seconds to download in total, this is a big chunk of this time. Surprisingly, the delay is NOT the actual image being sent over the wire to the user. Instead, analysis shows the 3 seconds or so pause happens on the server in SharePoint Online – most likely because of “cache misses” due to the fact that the renditions framework wasn’t originally designed for architectures used in Office 365. So, performance of this bit of the platform isn’t optimal - our solution was to roll our own renditions framework, and this post describes what we did.

Using Azure to implement renditions

Before delving into the implementation, here’s how I described the process last week:

  1. An intranet author adds or changes an image in SharePoint
  2. A remote event receiver executes, and adds an item to a queue in Azure (NOTE – RERs are not failsafe, so we supplement this approach with a fall-back mechanism which ensures broken images never happen. More on this below).
  3. An Azure WebJob processes the queue item, taking the following steps:
    1. Fetches the image from SharePoint (using CSOM)
    2. Creates different rendition sizes (using the sizes defined in SharePoint)
    3. Stores all the resulting files in Azure BLOB storage
  4. The Azure CDN infrastructure then propagates the image file to different Azure data centers around the world.
  5. When images are displayed in SharePoint, the link to the Azure CDN is used (courtesy of a small tweak to our display template code). Since CDNs work by supplying one URL, the routing automatically happens so that the nearest copy of the image to the user is fetched.

For those interested, let’s go into the major elements:

The Remote Event Receiver and associated SharePoint app/add-in

There are two elements here:

  • A SharePoint add-in used for our remote code (hosted in Azure) to authenticate back to SharePoint
    • We register this add-in using AppRegNew.aspx, specifying a Client ID and client secret which will be used for SharePoint authentication
  • A remote event receiver used to detect when new images are added

The two are related because we implement the RER code as a provider-hosted add-in using the Visual Studio template (which gives 2 projects, one for the app package and one for the app web). In actual fact, this particular RER doesn’t need to communicate back to SharePoint – when it fires, it simply adds an item to a queue in Azure which we created ahead of time. The object added to the queue contains the URL of the image which was just added or modified, and we use the Azure SDK to make the call.

We apply this RER to all the image libraries in the site which needs the solution. We do this simply as a one-off setup task with a PowerShell/CSOM script that iterates through all the subsites, and for each image library it finds it binds the RER. My post Using CSOM in PowerShell scripts with Office 365 shows some similar snippets of code which we extended to do this. The script can be run on a scheduled basis if needed, so that any new image libraries automatically “inherit” the event receiver.

The Azure WebJob

The main work is done here. The job is implemented as a “continuous” job in Azure, and we use an Azure QueueTrigger to poll the queue for new items. This is a piece of infrastructure in Azure that means that a function in our WebJob code is executed as soon as a new item is added to the queue – it’s effectively a monitor. We initially looked at using a BlobTrigger instead (and having the RER itself upload the image to Azure BLOB storage to facilitate this), but we didn’t like the fact that BlobTrigger can have a bigger delay in processing – we want things to be as immediate as possible. Additionally, remote event receivers work best when they do minimal processing work – and since a quick async REST call is much more lightweight than copying file bytes around, we preferred this pattern. When a new item is detected, the core steps are:

  1. Fetch details of the default rendition sizes defined in SharePoint for this site. This tends to not change too much, so we do some caching here.
  2. Fetch details of the *specific* rendition sizes for this image, using a REST call to SharePoint. We need to do this to support the cool renditions functionality which allows an author to specifically zoom-in/crop on a portion of the image for a specific rendition – y’know, this thing:

    Image renditions - crop image

    If an author uses this feature to override the default cropping for a rendition, these co-ordinates get stored in the *file-level* property bag for the item, so that’s where we fetch them from.
  3. Fetch the actual image from the SharePoint image library. We use CSOM and SharePoint add-in authentication to make this call – our WebJob knows the Client ID and client secret to use. We obtain the file bytes i.e. effectively downloading the file from Office 365 to where our code is running (Azure), because of course we’re going to need the file to be able to create different versions of it.
  4. For each rendition size needed:
    1. Resize the image to these dimensions, respecting any X and Y start co-ordinates we found (if the author did override the cropping for this image). There are many ways to deal with image resizing, but after evaluating a couple we chose to use the popular ImageProcessor library to do this.
    2. Upload each file to Azure BLOB storage. We upload using methods in the Azure SDK, and ensure the file has a filename according to a URL convention we use – this is important, because our display templates need to align with this.

Once the files have been uploaded to Azure BLOB storage, that’s actually all we need to worry about. The use of Azure CDN comes automatically from files stored there, if you’ve configured Azure CDN in a certain way. I’ll cover this briefly later on.

Authentication for the Azure WebJob

I thought long and hard about authentication. In the end, we went with SharePoint app-only authentication, but we also considered using Office 365/Azure AD authentication for our remote code. Frankly that’s my “default” these days for any kind of remote code which talks to SharePoint (assuming we’re talking about Office 365) – as discussed in Comparing Office 365 apps with SharePoint add-ins, there are numerous advantages in most cases, including the fact that there is no “installation” of an add-in, and the authentication flow can be started outside of SharePoint.

However, one advantage of using SharePoint authentication is that we aren’t tied to using the same Azure subscription/directory as the one behind the Office 365 tenant. Our clients may not always be able to support that, and that was important for us in this case – using this approach means we don’t have that dependency.

Display templates

As mentioned previously, a big part of the solution is ensuring SharePoint display templates align with file URLs in the CDN. So if we’re using roll-up controls such as Content Search web parts around the site and these reference rendition images, these also need to “know the arrangement”. Effectively it’s a question of ensuring the thing that puts the file there and the thing that requests the file are both in on the deal (in terms of knowing the naming convention for URLs). It’s here that we also implement the fall-back mechanism (more on this later) to deal with any cases where a requested image isn’t found on the CDN. In terms of the swapping out the default behaviour of fetching images from SharePoint to fetching them from the CDN instead, it just comes down to how the value used within the <img src> attribute is obtained:

<img src="_#= imgSrc =#_" />

Simply implement a function to get that value according to your URL convention, and you’re good. Although not shown in the snippet above, it’s here that our fall-back mechanism is called, courtesy of the ‘onerror’ handler on the <img> tag.

WebAPI

Since we’re talking about architecture pieces, there’s some WebAPI thrown in there too – this is part of the fall-back mechanism, described later.

Azure CDN configuration

As mentioned earlier, the CDN part is easy with Azure. When a file gets uploaded to Azure BLOB storage, it gets a URL in the form:

https://[MyAzureContainer].blob.core.windows.net/MyImage.jpg

..but if you configure the CDN, it can also be accessed on something like:

https://[MyCDNEndPoint].azureedge.net/MyImage.jpg 

When the latter URL is used, in fact the file will be requested from the nearest Azure CDN data center to the user. If the file hasn’t propagated to that location yet, then the first user to be routed through that location will force the file to be cached there for other users in that geographical region. Our testing found this additional delay is minimal. There are a few more CDN things to consider than I’ll go into detail on here, but initial configuration is easy – simply create a CDN configuration in Azure, and then specify it is backed by Azure storage and select the container where you’re putting your files. The images below show this process:

image

Create CDN endpoint from BLOB storage

The fall-back mechanism

So I mentioned a few times what we call “the fall-back mechanism”. I was always worried about our solution causing a broken image at some point – I could just imagine this would be on some critical news article about the CEO, on a big day for one of our clients. Fortunately, we were able to implement a layer of protection which seems to work well. In short, we “intercept” a broken image using the HTML 5 ‘onError’ callback for the <img> tag. This fires if an image isn’t found on the CDN for any reason, and this kicks off our mechanism which does two things:

  1. Substitutes the original rendition image from SharePoint - this means we’re “back to the original situation”, and we haven’t made anything worse.
  2. Makes a background async call to our WebAPI service – this adds an item to our queue in Azure, meaning the image gets processed for next time. This is the same as if the RER fired against this particular file.

The image below shows what happens (click to enlarge):

CDN image renditions - fallback mechanism 2

One nice thing about this mechanism is it works for existing images in a site. So if the mechanism is implemented in an existing site with lots of images, there’s no need to go round “touching” each image to trigger the remote event receiver. Instead, all that needs to happen is for someone to browse around the site, and the images will be migrated to the CDN as they are requested.

Challenges we encountered

Along the way we faced a couple of challenges, or at least things to think about. A quick list here would include:

  • Thinking about cache headers from Azure CDN, cache expiration and so on – this relates to scenarios where an author may update an image in SharePoint but not change the filename. Clearly end-user browsers may cache this image (and an end-user can’t be expected to press CTRL F5 to do a hard refresh just because you’ve updated a file!). My colleague Paul Ryan wrote a great post on this at Azure CDN integration with SharePoint, cache control headers max-age, s-maxage
  • Parallel uploads to Azure (e.g. if we’re creating 8 different sizes for image found, may as well upload them in parallel!)
  • Ensuring we understand how to handle different environments (dev/test/production tenants with different Azure subscriptions)
  • Implementing a nice logging solution
  • Testing

Summary

As I summarized last time, it would be great if the original performance issue in Office 365 didn’t occur. But CDNs have always been useful in optimizing website performance, and in many ways all we’re doing is broadening Microsoft’s existing use of CDNs behind Office 365. The building blocks of Azure WebJobs, Azure file storage, CDN, SharePoint add-ins, remote event receivers, WebAPI and so on mean that the Office 365/SharePoint Online platform can be extended in all sorts of ways where appropriate. This was a solution developed for clients of Content and Code so it’s not something I can provide the source code for, but hopefully these couple of articles help awareness of the issue and architectural details of one way of working around it.

2 comments:

Kristian M said...

This looks great Chris!

Is there any chance you will be sharing parts of it on your GitHub page? I would love to have a look at "the sharepoint nuts an bolts" so to speak :)

Chris O'Brien said...

@Kristian,

Thanks :) But I'm afraid I can't share the code, it's effectively IP that belongs the company I'm working with (Content and Code).

Good luck if you decide to create something similar!

Cheers,

COB.