Wednesday 15 June 2011

SP2010 Continuous Integration–pt 1: Benefits

On the back of my talk at the SharePoint Best Practices Conference, this is the first article in a series on implementing automated builds (aka Continuous Integration) in SharePoint development projects – specifically using Team Foundation Server 2010. In actual fact, the rest of the series may appear on a Microsoft blog (or in a whitepaper) rather than this one, as some Microsoft folks and I are still working through the details, but I wanted to at least discuss the contents and get things kicked off here. With this particular topic, I can imagine some folks dismissing it as irrelevant to them (and not reading anything in this series), when in fact there may be pieces which would work for them and can be implemented fairly easily. Let’s be clear though – Continuous Integration (CI) is probably best suited to projects with the following characteristics:

  • Development-oriented – perhaps with more than, say, 3 Visual Studio projects
  • Multiple developers
  • Fairly long-running (e.g. > 2-3 months; to give the team a chance to implement CI alongside the actual deliverables)

The original plan for this blog series/whitepaper (which may change as Kirk Evans, Mike Morton and I evolve it) looks something like this:

  1. CI benefits - why do it? (this post)
  2. TFS 2010 Build installation/config
  3. Creating your first TFS build process for SharePoint projects
  4. Implementing assembly versioning
  5. Using PowerShell to deploy WSPs from build output
  6. Running coded UI tests as part of a build
  7. Integrating tools such as SPDisposeCheck and Code Analysis into the build

Benefits of Continuous Integration

Although CI has become a fairly standard approach in the plain .NET world, it’s fair to say that SharePoint brings some additional considerations (as usual), and things are slightly more involved. Consider that in .NET the output is typically either an executable or a website which can often be deployed with XCOPY. SharePoint, on the other hand, has Solutions and Features and logical constructs such as web applications and site collections. It is also often heavily “state-based” (e.g. activate a Feature, then create a site, then create a list and so on). So CI is more involved in many cases, but all the feedback I hear tells me more and more people want to adopt it, and rightly so. Here are some of the benefits as I see them:

  • Consistent builds
    • E.g. no human error leading to debug (rather than release) assemblies being deployed
  • Automatically versioned assemblies
  • Ability to track code versions in production back to release labels in source control
  • Less time spent by developers keeping their environments up-to-date (this alone can be huge)
  • Team cohesion through build notifications
    • Upon each build (e.g. after every check-in), everyone on the team sees a pop-up in the system tray letting them know if the build is currently passing/failing
  • Automated testing
    • Once WSPs have been deployed, run a series of tests (e.g. unit tests/UI tests/load tests) to check what would be deployed to the customer
    • Also run checks on the code (e.g. code analysis, SPDisposeCheck) to keep on top of any code smells from that day
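Automatic versioning, and tracing a production assembly back to the build that produced it, usually comes down to deriving the assembly version from the build number. The minimal Python sketch below shows one possible mapping. It assumes the TFS 2010 default build number format, `$(BuildDefinitionName)_$(Date:yyyyMMdd)$(Rev:.r)` (e.g. `MySPBuild_20110615.3`). The `assembly_version` function, its default major/minor values and the "yy * 1000 + day of year" encoding are hypothetical choices for illustration, not the approach planned for article 4. The encoding exists because each assembly version component is a 16-bit value, so the full yyyyMMdd date cannot be used directly.

```python
import re
from datetime import date

# TFS 2010's default build number format is
#   $(BuildDefinitionName)_$(Date:yyyyMMdd)$(Rev:.r)
# e.g. "MySPBuild_20110615.3"; the regex below assumes that format.
BUILD_NUMBER = re.compile(r"^(?P<name>.+)_(?P<date>\d{8})\.(?P<rev>\d+)$")

# Assembly version components are 16-bit; stay below UInt16.MaxValue (65535).
MAX_COMPONENT = 65534


def assembly_version(build_number: str, major: int = 1, minor: int = 0) -> str:
    """Hypothetical scheme: major.minor.(yy * 1000 + day of year).revision."""
    match = BUILD_NUMBER.match(build_number)
    if match is None:
        raise ValueError(f"unexpected build number format: {build_number!r}")
    stamp = match.group("date")
    built_on = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:]))
    # e.g. 2011-06-15 is day 166 of 2011 -> 11 * 1000 + 166 = 11166
    build = (built_on.year % 100) * 1000 + built_on.timetuple().tm_yday
    revision = int(match.group("rev"))
    for part in (major, minor, build, revision):
        if not 0 <= part <= MAX_COMPONENT:
            raise ValueError(f"version component out of range: {part}")
    return f"{major}.{minor}.{build}.{revision}"


print(assembly_version("MySPBuild_20110615.3"))  # 1.0.11166.3
```

Because the mapping can be reversed (11166 means day 166 of 2011, revision 3 means the third build that day), a DLL found in production points back to one specific build. That build in turn points to the label TFS can apply to the sources it built.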

What it looks like

Different ‘styles’ of CI

Once you have the capability to use TFS Build (remember we’ll talk about topologies/installation in the next post), one of the first steps is to establish what you’re trying to achieve with CI. In my mind, there are two broad styles of Continuous Integration:

Style | Best suited for
“Rebuild everything” | Pre-live
“Build current sprint against production state” | Post-live (or frequent delivery)

The former is typically simpler than the latter, so conveniently that’s the style I’m focusing on in this series (though I’m really discussing the core mechanics which apply to both styles). That said, let’s discuss the more complex “building against the current state” type for a moment. This model is probably the best choice if your solution is in production but still evolving – after all, would you really want to focus all your effort on testing the “build everything from scratch” process when you might only do that in a disaster recovery situation? Probably not. In this model, the idea of state is clearly very important and, unsurprisingly, the best way of dealing with this is to use virtual machine snapshots. The idea is that step 1 of the build process rolls the target machine back to a snapshot which reflects the current state of production (same code versions, solutions/features etc.). From there, your build process deploys the resulting WSPs to this target, and you get to see whether your current work will deploy to and run in production successfully.

Microsoft recognize this as an important scenario, and the ‘Lab Management’ capability in TFS 2010 is designed to help. It provides special ‘activities’ for the build workflow to roll back to and take snapshots. These can be dropped into the workflow (N.B. we’ll discuss the build workflow in article 3), and there’s also a wizard UI to help you select available snapshots etc. Now, there is a catch. There’s a wide range of virtualization platforms out there these days, and Microsoft only go the extra mile for their own: specifically, this requires Hyper-V with System Center Virtual Machine Manager. The good news is that it shouldn’t be too difficult to achieve the same result with another platform. It may be less slick, but an InvokeProcess activity which calls out to the command line (e.g. something like vmrun revertToSnapshot C:\VMs\MyVM.vmx MySnapshotName for VMware) should do the trick just fine.
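For non-Hyper-V platforms, the InvokeProcess step really just needs the right command line and a check on the exit code. The Python sketch below assembles and runs that command purely to illustrate what the activity would be configured to do. It assumes VMware's `vmrun` tool is installed and on the PATH of the build agent. The .vmx path and snapshot name are the placeholders from the example above, and the helper function names are hypothetical.

```python
import subprocess
from typing import List, Optional


def revert_command(vmx_path: str, snapshot: str, host_type: Optional[str] = None) -> List[str]:
    """Arguments for 'vmrun revertToSnapshot', as an InvokeProcess step might pass them."""
    cmd = ["vmrun"]
    if host_type:  # optional -T flag, e.g. "ws" for VMware Workstation
        cmd += ["-T", host_type]
    cmd += ["revertToSnapshot", vmx_path, snapshot]
    return cmd


def revert_to_snapshot(vmx_path: str, snapshot: str, host_type: Optional[str] = None) -> bool:
    """Run vmrun (must be on PATH) and treat exit code 0 as success."""
    completed = subprocess.run(revert_command(vmx_path, snapshot, host_type))
    return completed.returncode == 0


if __name__ == "__main__":
    # Only builds the command line; call revert_to_snapshot() to actually run it.
    print(revert_command(r"C:\VMs\MyVM.vmx", "MySnapshotName", host_type="ws"))
```

Passing the arguments as a list avoids quoting problems when the .vmx path contains spaces. In the InvokeProcess activity itself, the equivalent is to quote the path inside its Arguments string.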

Configuring the build (quick overview)

To get started, you create a new build definition in Visual Studio Team Explorer, and configure which Visual Studio projects/solutions should be built. Many other settings live here too, such as whether code analysis should be performed during the build:

[Screenshot: build definition, Process tab]

From there, we need to select when builds happen – manually, on every check-in, every night and so on:

[Screenshot: build definition, Trigger tab]
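Choosing a trigger is mostly a trade-off between build-server load and how quickly a broken check-in gets noticed. The small Python sketch below puts numbers on that trade-off for a list of check-in times. It is just arithmetic, not TFS configuration, and it rests on these assumptions:

- the nightly build runs at 03:00 and only when something has been checked in;
- the per-check-in trigger builds each change immediately, with no queueing and zero build time.

The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def next_nightly_run(checkin: datetime, hour: int = 3) -> datetime:
    """First scheduled run at hour:00 strictly after the check-in."""
    run = checkin.replace(hour=hour, minute=0, second=0, microsecond=0)
    if run <= checkin:
        run += timedelta(days=1)
    return run


def compare_triggers(checkins: List[datetime], hour: int = 3) -> Dict[str, Dict[str, float]]:
    """Builds queued, and worst-case hours before a check-in is first built."""
    runs = [next_nightly_run(c, hour) for c in checkins]
    waits = [(run - c).total_seconds() / 3600 for run, c in zip(runs, checkins)]
    return {
        # Assumes each check-in is built straight away (no queueing, zero build time).
        "every_checkin": {"builds": len(checkins), "worst_wait_hours": 0.0},
        # Assumes a nightly run only happens when something was checked in.
        "nightly": {"builds": len(set(runs)), "worst_wait_hours": max(waits, default=0.0)},
    }


if __name__ == "__main__":
    day = datetime(2011, 6, 15)
    checkins = [day.replace(hour=9, minute=30), day.replace(hour=14),
                day.replace(hour=17, minute=45), day.replace(day=16, hour=10)]
    # every_checkin: 4 builds, no wait; nightly: 2 builds, worst wait 17.5 hours
    print(compare_triggers(checkins))
```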

In TFS 2010 Build, the actual steps which happen during a build (e.g. retrieving files from source control, compiling, deploying, testing etc.) are configured as a .NET 4.0 (Windows Workflow Foundation) workflow. For SharePoint builds, we need to make some modifications to the default build process templates which ship with TFS. This can take some figuring out, but I’ll post mine as a potential starting point in article 3 ‘Customizing the build workflow for SharePoint builds’:

[Screenshot: build workflow (zoomed in)]
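Conceptually, the workflow is an ordered list of activities. In this sketch, a failure stops everything downstream, since there's little point running UI tests against a site that failed to deploy. The minimal Python sketch below shows only that fail-fast sequencing; the real thing is made of XAML workflow activities, not Python. The step names are hypothetical stand-ins. The optional first step marks where a snapshot revert would go in the "build against production state" style.

```python
from typing import Callable, List, Tuple

# A step is a display name plus a callable that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_build(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order, stopping at the first failure (fail-fast)."""
    log: List[str] = []
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'succeeded' if ok else 'FAILED'}")
        if not ok:
            return False, log
    return True, log


if __name__ == "__main__":
    steps: List[Step] = [
        ("Revert target VM to snapshot (optional)", lambda: True),
        ("Get latest from source control", lambda: True),
        ("Compile and package WSPs", lambda: True),
        ("Deploy WSPs via PowerShell", lambda: False),  # simulated failure
        ("Run coded UI tests", lambda: True),
    ]
    passed, log = run_build(steps)
    print(passed)  # False
    print("\n".join(log))  # the UI test step never runs
```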

By this point, each build will compile assemblies from the latest code, generate WSP packages and copy them to a remote SharePoint box (assuming you’re not using an ‘all-in-one’ topology). We now need something to actually deploy/upgrade the WSPs in SharePoint, and perhaps do some other steps such as creating a test site. For this, the build workflow will hand off to a PowerShell script. I’ll supply my PowerShell script later in the series and discuss how to pass data from the workflow to the script and collect its success/failure return value. For now, it’s just important to realize two things: a big part of automated builds is just standard PowerShell which you might already be using, and every implementation will vary somewhat here depending on what you’re building for your client.
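The hand-off boils down to two decisions. The first is which cmdlets each WSP needs. The second is how the script's result gets back to the workflow. The Python sketch below illustrates both; the real logic would live in the PowerShell script and the workflow.

- **Cmdlet choice.** It uses the SharePoint 2010 cmdlets `Update-SPSolution` (upgrade a solution already deployed to the farm) and `Add-SPSolution` plus `Install-SPSolution` (first-time deployment).
- **Result reporting.** It assumes the script reports failure through its exit code, e.g. with `exit 1`, which `powershell.exe -File` passes back to the caller.

The `-DropFolder` and `-SiteUrl` script parameters, the helper names and the .wsp file names are hypothetical.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import Dict, Iterable, List, Set


def plan_deployment(wsp_files: Iterable[str], deployed: Set[str]) -> Dict[str, List[str]]:
    """Pick SharePoint 2010 cmdlets per WSP: upgrade if already deployed, else add + install."""
    already = {name.lower() for name in deployed}  # compare solution names case-insensitively
    plan: Dict[str, List[str]] = {}
    for path in wsp_files:
        name = PureWindowsPath(path).name.lower()
        plan[name] = ["Update-SPSolution"] if name in already else ["Add-SPSolution", "Install-SPSolution"]
    return plan


def powershell_command(script: str, drop_folder: str, site_url: str) -> List[str]:
    """Command line for the hand-off; -DropFolder and -SiteUrl are hypothetical script parameters."""
    return ["powershell.exe", "-NoProfile", "-NonInteractive", "-File", script,
            "-DropFolder", drop_folder, "-SiteUrl", site_url]


def run_deploy_script(script: str, drop_folder: str, site_url: str) -> bool:
    """Success/failure comes back as the process exit code (e.g. 'exit 1' in the script)."""
    completed = subprocess.run(powershell_command(script, drop_folder, site_url))
    return completed.returncode == 0


if __name__ == "__main__":
    wsps = [r"\\buildserver\drops\Contoso.Social.wsp", r"\\buildserver\drops\Contoso.Branding.wsp"]
    print(plan_deployment(wsps, deployed={"contoso.social.wsp"}))
```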

So by now, we would have a working build – WSPs are being built and deployed automatically, and any test site is being recreated to check the validity of the build. Now we can think about some automated tests to help us verify this. After all, we might be able to get a WSP deployed but is the actual functionality working or would a user see an error somewhere? Is there any way we can establish this without a tester manually checking each time a build completes?

Running automated tests (e.g. UI tests) as part of build

Now, it’s not that I’m against unit testing or anything… heck, we even have some in our current project! But it definitely feels like this area is still challenging enough with SharePoint that most projects don’t do it, e.g. due to the reliance on mocking frameworks. I’m thinking more and more that the UI testing capabilities in Visual Studio 2010 offer a great alternative. Coded UI tests can simulate a user on the website by using the controls on the page (e.g. navigation, buttons, a form, whatever) in exactly the same way a user would. Sure, it’s not a unit test, and ideally you’d have both, but it does seem to have a much lower barrier to entry than unit testing – and with a few tests you could get good real-life test coverage pretty quickly. Here’s what it looks like – first the test gets recorded in the browser (a special widget appears):

[Screenshot: adding an assertion while recording a coded UI test]

After pressing some buttons to imitate a specific user action (something I want to test), I then add an assertion to check that something is present on the page. In this case, I’ve pressed a ribbon button which has done something, and the assertion tests that the green status bar is visible with a specific message, meaning the action was successful. If the action wasn’t successful, the green status message would not be present and the test would fail. What’s interesting about this is what’s behind the ribbon button. It’s some functionality developed for our current client around the social features of SharePoint, and a lot is happening there. The button calls into some jQuery which calls an HTTP handler, which invokes a custom service application, which talks to the data access layer, which then writes a record to a custom SQL database. You see what I mean about integration tests rather than unit tests?

But it does mean I can test a lot of code with minimal effort. There are some things I need to guard against, like the fact that the UI could report success when something underneath failed. One thing which can mitigate this (in addition to simply recording many tests) is that the tests generate .NET code, meaning I can supplement them with other checks if required. The cool part, of course, is that it’s very easy to integrate these tests into the automated build. If tests like these are automatically running every night/every few hours, you get to find regression bugs very quickly.
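An example of such a supplementary check is to verify the persisted record as well as the UI signal, so a green status bar over a silent back-end failure still fails the test. Real coded UI tests are generated as C# or VB.NET. This Python sketch only shows the shape of the extra check. An in-memory `sqlite3` database stands in for the custom SQL database, and the `ActivityRecord` table, its columns and the user name are hypothetical.

```python
import sqlite3


def action_really_succeeded(ui_shows_success: bool, conn: sqlite3.Connection, user: str) -> bool:
    """Pass only if the UI reported success AND the expected record was written."""
    if not ui_shows_success:
        return False
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM ActivityRecord WHERE UserName = ?", (user,)
    ).fetchone()
    return count > 0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the custom SQL database
    conn.execute("CREATE TABLE ActivityRecord (UserName TEXT, Action TEXT)")
    # The green status bar appeared, but nothing reached the database:
    print(action_really_succeeded(True, conn, "alice"))  # False
    conn.execute("INSERT INTO ActivityRecord VALUES (?, ?)", ("alice", "follow"))
    print(action_really_succeeded(True, conn, "alice"))  # True
```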

Getting the payback - finding the cause of a failed build

So if we had automated builds in place, what happens when a bug is checked in? How do we find out? Well, that might depend upon the specifics of the bug and the tests you have in place, but let’s work through an example. Let’s say there’s an issue in the DAL in the example used above. We could imagine, for example, a mismatch of stored procedure parameters, but for simplicity I’m just going to add a dummy exception to the method which is behind the ribbon button:

throw new ApplicationException("Dev did something silly");

The developer checks in because everything builds/packages fine, and y’know, because he/she has been testing throughout and that little last change couldn’t possibly have broken anything! So the build runs. If you’re in the office at this time (you might not be if it’s configured to run nightly), team members have the option of seeing a system tray notification pop-up whenever a build succeeds or fails:

[Screenshot: build failure notification]

The main entry point is the build report – this gives us a view onto any build warnings/errors and a summary of test results:

[Screenshot: build results summary]

There is also a more detailed log, and if UI tests are included then you’ll see the result of these – note the ‘Error Message’ section below shows that one of our asserts failed in a test:

[Screenshot: coded UI test result showing the failed assertion]

That’s useful, but we kinda need more detail to understand why we have a bug. Depending on how the build is configured, we should see links to some files at the bottom of the build report.

[Screenshot: links to collected files at the bottom of the build report]

Let’s walk through these files in turn:

  1. A .png file – this is a screenshot of what the UI looked like at the time of the failing test. This is incredibly useful, and we can see that the UI did not show the green success bar – so we get an understanding of how the bug manifested itself in the UI:

    [Screenshot: UI at the time of the failed test]
  2. An XML file – this is data collected from the Windows event log at the time of the failing test. In many cases, this is how you’ll find the actual bug – we can clearly see the class/method and stack trace of where the problem occurred:

    [Screenshot: event log data collected for the failing test]
    Note that this requires your code to report exceptions to the event log – we’re using the SharePoint Patterns & Practices libraries to do this. You might extrapolate that what the SharePoint world probably needs is a ULS data collector. I’ve mentioned this to Microsoft, and folks seemed to agree that could be a good idea. In any case, we now have a screenshot of the failed test and event log data which should locate our bug, woohoo! But it doesn’t stop there – other files captured include…
  3. An .iTrace file – this is Visual Studio 2010’s incredible IntelliTrace feature. IntelliTrace is a ‘historical debugger’, which allows a developer to step into a debugging session even if he/she wasn’t there at the start of it. In this context, it’s used from a test result. The image below doesn’t quite show it, but I can select an exception which occurred during a UI test (or any other kind of test for that matter) and then press a ‘Start debugging’ button. This takes me into a debugging session where I can see the values IntelliTrace recorded during the test and step through line-by-line. This works despite the fact that the test ran at 3am from an overnight build, so there’s no worry that a developer won’t be able to reproduce an issue the build encountered:

    [Screenshot: IntelliTrace debugging from a test result]
  4. A .vsp file – these are the results of a code profiling session, where the code which executed during the tests was evaluated for performance. The first image below shows me ‘hot paths’ in the codebase, i.e. where we spent a lot of time and could therefore consider optimizing:

    [Screenshot: profiling report showing hot paths]

    This tells me that our WeatherWebPart class and some code in our ActivityFeedSubscription namespace might benefit from some attention. Drilling down, we can see some graphical representations of individual methods:

    [Screenshot: profiling detail for individual methods]
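In the walkthrough above, the useful part of the event log entry is the exception type plus the innermost stack frame (the first "at" line of a .NET stack trace, where the exception was thrown). The small Python sketch below pulls those out of the entry's text. It assumes that text is in the usual .NET `Exception.ToString()` layout and has already been extracted from the collector's XML. It makes no assumption about that XML's schema. The `locate_failure` helper and the class, method and file names in the sample are hypothetical.

```python
import re
from typing import Dict, Optional

EXCEPTION_LINE = re.compile(r"^(?P<exception>[\w.`]+): (?P<message>.*)$")
FRAME_LINE = re.compile(
    r"^\s*at (?P<member>[^(]+)\((?P<args>[^)]*)\)"
    r"(?: in (?P<file>.+):line (?P<line>\d+))?\s*$"  # file/line only present when PDBs are available
)


def locate_failure(entry_text: str) -> Optional[Dict[str, str]]:
    """Return the exception type/message plus the class, method and location of the first frame."""
    lines = entry_text.strip().splitlines()
    if not lines:
        return None
    head = EXCEPTION_LINE.match(lines[0])
    for line in lines[1:]:
        frame = FRAME_LINE.match(line)
        if frame:
            type_name, _, method = frame.group("member").strip().rpartition(".")
            return {
                "exception": head.group("exception") if head else "",
                "message": head.group("message") if head else "",
                "class": type_name,
                "method": method,
                "file": frame.group("file") or "",
                "line": frame.group("line") or "",
            }
    return None


SAMPLE = r"""System.ApplicationException: Dev did something silly
   at Contoso.Social.Data.ActivityRepository.AddActivity(String userName) in C:\Builds\Social\ActivityRepository.cs:line 42
   at Contoso.Social.Handlers.FollowHandler.ProcessRequest(HttpContext context)"""

if __name__ == "__main__":
    print(locate_failure(SAMPLE))
```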

Going further still – possible things to include in the build

The last area, code profiling, wasn’t so much about identifying the bug as perhaps a daily check on any new alarm bells in our codebase. Expanding that theme further, consider the following as things we could fairly easily include in an automated build:

  • Unit tests (of course)
  • SPDisposeCheck
  • Code analysis (FxCop)
  • Documentation builds
  • Creating TFS work items from build/test failures
  • Etc.

Really, the sky’s the limit given that you can shell out to PowerShell or the command-line during the build.
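Shelling out for extra checks has one twist compared with the main build steps. For a daily health report you usually want every check to run and be reported, rather than stopping at the first failure. The Python sketch below captures that idea. It assumes each tool signals problems through a non-zero exit code, which is tool-specific and worth confirming for SPDisposeCheck, FxCopCmd or anything else you plug in. The demo uses stand-in commands rather than real tool invocations, and the helper names are hypothetical.

```python
import subprocess
import sys
from typing import Callable, Dict, List, Sequence

Runner = Callable[[Sequence[str]], int]


def default_runner(cmd: Sequence[str]) -> int:
    """Run a command-line tool and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_quality_checks(checks: Dict[str, List[str]], runner: Runner = default_runner) -> Dict[str, bool]:
    """Run every check even if an earlier one fails, so the report lists them all."""
    return {name: runner(cmd) == 0 for name, cmd in checks.items()}


if __name__ == "__main__":
    # Stand-in commands using the current Python interpreter; real entries would be
    # e.g. an SPDisposeCheck or FxCopCmd invocation (arguments deliberately omitted).
    demo = {
        "passing check": [sys.executable, "-c", "raise SystemExit(0)"],
        "failing check": [sys.executable, "-c", "raise SystemExit(3)"],
    }
    print(run_quality_checks(demo))  # {'passing check': True, 'failing check': False}
```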

Summary

Although automated builds are fairly common in the .NET world, SharePoint projects which use them are fairly rare, arguably due to the other complexities that come with SharePoint development. Team Foundation Server 2010 makes build automation easier to achieve on the Microsoft stack than it was previously, and has superb integration with related Visual Studio capabilities such as testing. Implementing these techniques can make a big difference on SharePoint development projects. Although this post presents an overview of the end result, a couple of Microsoft folks and I are working on detailed content (roughly following the article series listed at the beginning). When I know the final home for this content (blog/whitepaper), I’ll keep you updated and will amend this post with appropriate links.