Thursday 1 May 2014

Office 365 hybrid and search – presenting results from both on-premises and SharePoint Online sites

In my previous post about Office 365 and on-premises SharePoint hybrid deployments, we discussed the idea that, whilst it can be the right solution in many cases, a hybrid deployment is NOT simple and really only encompasses search, BCS, and Duet (for SAP) integration. Other aspects around document management, sites/navigation, metadata, customizations, and perhaps any other thing you care to mention are not synchronized or joined-up in any way. As a reminder, these articles are structured like this:

  1. Office 365 SharePoint hybrid – what you DO and DON’T get
  2. Office 365 hybrid and search – presenting results from both sides [this article]

Even in search, one of the cornerstones of hybrid, what you get might not be what you’re hoping for – results from the “remote” environment are displayed in a separate Result Block or Promoted Result. Here’s a reminder of the (more powerful) Result Block approach:

SharePoint hybrid search results - result block

Clearly this isn’t the best user experience. And the alternative, Promoted Results, is more about defining an individual result to appear at the top for a given search (i.e. a Best Bet), so it’s great for individual results but not for entire sets.

Last time, I discussed some of the questions and issues that can be raised by the current experience:

  • How many results should be shown (from each source) on page 1?
    • HINT – you’re only allowed a maximum of 10 for a Result Block anyway :)
  • What about paging? What happens when I go to page 2?
  • How can I easily see more results from both sources?
  • How do I determine which results are more relevant, from the local or remote environment?
  • Are there any options for seeing the results in a more integrated way?

Important news on this topic (April 2014)

At the recent SharePoint Conference 2014, Microsoft stated that they were looking at this problem (of disjointed search results in hybrid deployments), and hope to have a solution later this year. So, hopefully this whole problem will go away at some point. In any case, I think it’s still worthwhile discussing the current situation and possible interim techniques to work around it.

Investigation – can results be merged using code?

So, given the limitations of the current out-of-the-box user experience, I was keen to see if there are other options. Indeed, for some things we are putting together for my current client, the requirement is very much “just search across everything and show me the most relevant things!” I think there’s nothing wrong with wanting that as the user experience. So as a developer, I wondered if going direct to the search API would provide any other options – perhaps the standard search page is built in a certain way, but the API has more flexibility. I’d also previously noticed a very interesting property in the API, so I started there:

Investigation 1 – using the EnableInterleaving property on the Query class

The SharePoint search API has an interesting option which can be used when writing custom code for SharePoint search. The name of the property in the API is “EnableInterleaving” on the KeywordQuery object – this sounded promising in terms of potentially being a native way of merging the result sets. The Microsoft documentation is (currently) not hugely clear on what the setting does, at least not to me anyway:

“A Boolean value that specifies whether the ResultTable objects in the ResultTableCollection produced by running this query should be interleaved.” (from http://msdn.microsoft.com/en-us/library/office/microsoft.sharepoint.client.search.query.query.enableinterleaving.aspx).

To start testing, I configured hybrid search in my dev environment – I used the outbound hybrid model, where the search center site exists on-premises but reaches out to Office 365/SharePoint Online to bring in results from there. (The TechNet documentation is at “Configure hybrid search for SharePoint Server 2013”, and there are numerous sessions on this at the SharePoint Conference next week for those attending.) At this point, the out-of-the-box search results page gave me the experience shown in the image at the start of the article, with the separate Result Blocks (highlighted in red/blue boxes).

I then wrote some custom code which allowed me to test this config option – effectively I have a custom web part with some CSOM (JSOM) code which runs a search query using the API. I have some URL parameters which enable me to specify the query text and whether EnableInterleaving is enabled or not. Unfortunately, EnableInterleaving does NOT interleave results between Office 365 and on-premises SharePoint. The main finding is that multiple result sets are still returned (i.e. one for Office 365 and one for on-premises), and the results are not combined in any way. One difference I did observe is that an additional set of results (a ResultTable) is returned for personalized results (“PersonalFavoriteResults”), but it’s empty for me, most likely because this is a non-production environment where analytics aren’t in use. I’m sure SharePoint does natively populate/maintain this under real usage: it stores results previously clicked on by the user for this search.

Conclusion: the EnableInterleaving property does NOT help us combine search results across Office 365 and on-premises SharePoint.

Investigation 2 – manually merging results in custom code

Anyone very familiar with SharePoint search will know that for any given query (e.g. a search for the query text “SharePoint”), the search engine can return different “tables” of results – perhaps think of these as “flavors” of results. They are:

  • Relevant results (“RelevantResults”)
  • Best Bet results (“SpecialTermResults”)
  • Personal favorite results (“PersonalFavoriteResults”)

Usually it’s the “relevant results” that we care most about – this table has the core set of results for the query. If our goal is to combine results from Office 365 and on-premises SharePoint, really what we want is a single “relevant results” table containing items from both locations. However, when we’re in hybrid mode, what we actually get by default is two separate “relevant results” tables – the image below shows my custom code rendering these tables, and you can see the similarities between this and the out-of-the-box search results page (which displays a separate Result Block for the Office 365 results):

Hybrid search - EnableInterleaving on 2
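To make the "two relevant results tables" point concrete, here is a minimal sketch that groups the tables returned by a query by their type. It assumes each table is a plain object with TableType and ResultRows properties (the shape JSOM hands back); the function name groupTablesByType and the sample data are hypothetical and exist only for illustration.

```javascript
// Group the tables returned by a search query by their TableType.
// Assumes each table looks like { TableType: "...", ResultRows: [...] }.
function groupTablesByType(resultTables) {
  var groups = {};
  resultTables.forEach(function (table) {
    if (!groups[table.TableType]) {
      groups[table.TableType] = [];
    }
    groups[table.TableType].push(table);
  });
  return groups;
}

// Invented sample data: under hybrid, two "RelevantResults" tables come back,
// one for the local environment and one for the remote one.
var hybridTables = [
  { TableType: "RelevantResults", ResultRows: [{ Title: "On-premises doc" }] },
  { TableType: "RelevantResults", ResultRows: [{ Title: "SharePoint Online doc" }] },
  { TableType: "PersonalFavoriteResults", ResultRows: [] }
];

var groupedTables = groupTablesByType(hybridTables);
console.log("Relevant results tables: " + groupedTables.RelevantResults.length);
```

In a non-hybrid environment you would expect a single "RelevantResults" entry here, so a count greater than one is a simple signal that the results have come back separated.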

So, if we want to display combined results (rather than separated), we *could* write some code to merge the result tables together (e.g. in memory). But what logic would we use? How should the results be sorted? Well, one of the columns returned in the result tables is “Rank” – this is the “relevancy score” of the result compared to the query which was used (all search engines have such a calculation – like Google’s PageRank). So the question is – can we use Rank to compare results from different search indexes? Is it valid to merge the two result sets together, and then sort the full list by rank? After all, this can definitely be achieved with custom code – here’s my custom web part which goes direct to the search API to do this:

Hybrid search - custom interleaving 2

I had a feeling it wasn’t valid to do this. So I asked some folks including Microsoft, other MVPs etc. for input and my suspicions were confirmed – Wictor summed it up best with this response:

“Interleaving results from two different indexes would produce very strange results. Since the ranking is based on the analytics you basically need to "normalize" the rank and analytics data between the two (or more) different indexes. Assume the user behaves differently in the different environments (which is not uncommon - one environment might be collaboration and the other some other workload such as BI).”

So in effect, you can mash up the results, but not in a way which gives you the most relevant results first (across both environments) – the sorting by rank is artificial, and shouldn’t be trusted. However, it is worth considering that if you *don’t* care about relevance/rank, then this custom approach works fine. For example, if the requirement is to display results sorted by last modified date (or by something else, such as author, site URL or whatever) then happy days! You could use this custom code approach to merge results from your Office 365 and on-premises sites just fine in those cases.

The code

If you’re interested in this approach, here’s the code to do it. I used JSOM code to execute the search, but REST should be another option. When I have the two result tables, I iterate the results in each and add to a combined collection. Finally I sort the results, with a little help from Underscore.js. All this JavaScript is referenced by a custom web part, that I deploy to my on-premises site and add to a page there. So long as hybrid is properly configured, my code can reach across to the associated Office 365 environment and obtain search results from there too.

This is the initial code to execute the search – the main thing to note here is that there is NOTHING different needed in the code to “use hybrid mode” or “enable hybrid” or anything. The results automatically come back from both sources if the infrastructure is configured. You might notice I pick up some URL parameters for the query text and EnableInterleaving properties – you would need to integrate this with your UI as appropriate:
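As a rough sketch of what such query code could look like (not the original listing), the snippet below reads the two URL parameters and runs a JSOM keyword query. The parameter names "k" and "interleave" and the function names readQueryOptions and runSearch are hypothetical. It assumes sp.js and sp.search.js are already loaded on the page, and that URLSearchParams is available (a modern browser or a polyfill).

```javascript
// Read the query text and the EnableInterleaving flag from a URL query string.
// The parameter names "k" and "interleave" are assumptions for illustration.
function readQueryOptions(search) {
  var params = new URLSearchParams(search);
  return {
    queryText: params.get("k") || "",
    enableInterleaving: params.get("interleave") === "true"
  };
}

// Execute the query via JSOM. Nothing here is hybrid-specific: if hybrid is
// configured, the remote results simply arrive as an extra result table.
// Assumes the SP and Microsoft.SharePoint.Client.Search namespaces are loaded.
function runSearch(options, onSuccess, onError) {
  var context = SP.ClientContext.get_current();
  var keywordQuery = new Microsoft.SharePoint.Client.Search.Query.KeywordQuery(context);
  keywordQuery.set_queryText(options.queryText);
  keywordQuery.set_enableInterleaving(options.enableInterleaving);

  var searchExecutor = new Microsoft.SharePoint.Client.Search.Query.SearchExecutor(context);
  var results = searchExecutor.executeQuery(keywordQuery);

  context.executeQueryAsync(
    function () { onSuccess(results.m_value.ResultTables); },
    function (sender, args) { onError(args.get_message()); }
  );
}
```

On a page, this might be wired up as runSearch(readQueryOptions(window.location.search), renderTables, showError), where renderTables and showError are whatever handlers your UI provides.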

In the success handler, some different code *is* required here. Since 2 result tables are returned, here we do some processing to combine the results into a new JavaScript array, and implement any sorting. The code below provides different options for sorting – by rank (even though we’re saying it’s not valid under hybrid), and last modified date:
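The combining step might be sketched as follows. This is a minimal illustration rather than the original listing: it uses the built-in Array sort instead of Underscore.js, the function names are hypothetical, and it assumes each result row exposes Rank and LastModifiedTime (the latter as a Date or a date string).

```javascript
// Collect the rows from every "RelevantResults" table into one array.
function mergeRelevantResults(resultTables) {
  var merged = [];
  resultTables.forEach(function (table) {
    if (table.TableType !== "RelevantResults") {
      return; // skip Best Bets, personal favorites etc.
    }
    table.ResultRows.forEach(function (row) {
      merged.push(row);
    });
  });
  return merged;
}

// Return a sorted copy of the rows. "rank" is included only for comparison;
// rank values from two different indexes are not really comparable.
function sortResults(rows, sortBy) {
  var copy = rows.slice();
  if (sortBy === "rank") {
    copy.sort(function (a, b) { return b.Rank - a.Rank; });
  } else {
    // Default: most recently modified first.
    copy.sort(function (a, b) {
      return new Date(b.LastModifiedTime).getTime() - new Date(a.LastModifiedTime).getTime();
    });
  }
  return copy;
}

// Invented sample data standing in for the two tables returned under hybrid.
var sampleTables = [
  { TableType: "RelevantResults", ResultRows: [
    { Title: "On-prem A", Rank: 12.5, LastModifiedTime: "2014-03-01T10:00:00Z" },
    { Title: "On-prem B", Rank: 9.1, LastModifiedTime: "2014-04-20T08:30:00Z" }
  ] },
  { TableType: "RelevantResults", ResultRows: [
    { Title: "Online C", Rank: 15.2, LastModifiedTime: "2014-04-01T12:00:00Z" }
  ] },
  { TableType: "PersonalFavoriteResults", ResultRows: [] }
];

var combinedRows = mergeRelevantResults(sampleTables);
var byModified = sortResults(combinedRows, "modified");
var byRank = sortResults(combinedRows, "rank");
console.log(byModified.map(function (row) { return row.Title; }).join(", "));
```

Sorting a copy rather than the merged array itself keeps the original order available, which is handy if the UI lets the user switch between sort options.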

...and here’s the full JavaScript code with all my helper functions etc. (NOTE: I’m still leaving it as an exercise for the reader to implement this somewhere – I used two custom web parts which referenced this JavaScript, so providing the exact implementation you need is up to you):


Summary

At the current time the user experience for search in hybrid mode isn’t necessarily as optimal as we would like it to be, because results are displayed separately for Office 365 sites and on-premises SharePoint sites. In some cases, users will just want to “search across everything” and not care where the results live.

However, under some circumstances you might be able to use custom code to “join-up” the results. Specifically, this should work well if you are happy to display results sorted by something *other* than rank – perhaps last modified date, the author name or some other piece of metadata. Sorting by rank across two SharePoint environments is not valid. One reason is that rank relies on SharePoint’s analytics framework, which operates within the scope of an individual environment only (i.e. it doesn’t do anything magical for hybrid deployments).

In the future, Microsoft will hopefully provide capability which will address this problem “natively”, without the need for custom code. Then, it will be possible to show results across Office 365 sites and on-premises SharePoint sites together. Until then, the solution presented here might be an option for you.

3 comments:

Vinod Manuel said...

There is the option of using a search server to merge the search results, say Google Search Appliance or any of its competitors. The search engine, if available, would return a common result set with appropriate ranking.

Denis said...

Hey,

Sorry for raising such an old post.

Would it not be possible to create a Content source on your SP'on'prem environment that will index the SPOnline content, and have a search results page that would merge the two content sources?

Of course you would lose the security trimming on the SPOnline results...

Chris O'Brien said...

@Denis,

No, unfortunately it's not possible to index SharePoint Online from an on-premises SharePoint environment - this is blocked. After all, if this was allowed, the whole world could be hitting SharePoint Online with their search crawlers, which would add massive load to Office 365 and make the service very hard to run.

Hope that makes sense.

Cheers,

COB.