Friday, 25 October 2013

Waiting for a search crawl in Office 365 – plan search-driven sites carefully

Here in autumn/fall 2013, if you’re working with Office 365 you might notice that content changes (such as new pages and documents) take some time to appear in search results. I spent a little time thinking about this recently, as my team and I finished building a search-driven news site. On this project, we are mainly developing against Office 365 – we also use local virtual machines, but since O365 is the target we are deploying our customisations there frequently as we develop.

We noticed that “index latency” – the time taken for new content to appear in the search index – was poorer than we expected on Office 365. We run several tenancies on different subscription levels (e.g. SharePoint P2, Office 365 E3 etc.), and we experience the problem across all of them. Some days are good, some days are bad. One memorable (read: stressful) time, we had an “end of sprint demo” - our solution was provisioned 2 days before the demo, giving us lots of time to create test content so that the demo to the business users would go well. We finished adding our pages, documents, pictures and videos a full 24 hours before the demo, and waited for our home page to “light up” as content was crawled in Office 365.

Unfortunately, only some of the content was indexed in time. The demo itself went well, but perhaps only because a bit of narrative helped the business users imagine the ‘full’ picture. Overall, it’s difficult not to feel that 24 hours is a long time to wait for content to be indexed in SharePoint! Business users these days have higher expectations, and most on-premises environments I’ve worked with have used incremental crawls with a frequency of 15 or 30 minutes.

How long is normal in Office 365?

The poor performance surprised us somewhat. My colleagues and I thought that we had originally read that a delay of up to 15 minutes was expected in Office 365, perhaps suggesting that SharePoint 2013’s “Continuous Crawl” is used. The Office 365 Service Descriptions – Search page now suggests that isn’t the case, but however it is managed in the back-end, we certainly weren’t expecting such long delays. Some further digging will lead you to this KB article:

Search doesn't return all results in SharePoint Online – KB2008449

“Search crawls occur continuously to make sure that content changes are available through search results as soon as possible. Recently uploaded documents may not immediately be displayed in search results because of the time that's required to process them. SharePoint Online targets between 15 minutes and an hour for the time between upload and availability in search results (also known as index freshness). In cases of heavy environment use, this time can increase to six hours.”
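To make those figures concrete, here is a minimal sketch of how you might compare an observed delay against the numbers in the KB text. The thresholds come straight from the quote above; the function name and labels are my own illustration, not anything Microsoft provides.

```python
from datetime import datetime, timedelta

# Thresholds taken from the KB2008449 text quoted above.
TARGET_MAX = timedelta(hours=1)      # upper end of the 15-minute-to-1-hour target
HEAVY_LOAD_MAX = timedelta(hours=6)  # stated ceiling under heavy environment use


def classify_index_latency(uploaded_at: datetime, first_seen_in_search: datetime) -> str:
    """Compare an observed index latency with the published figures.

    The three labels are hypothetical names chosen for this sketch.
    """
    latency = first_seen_in_search - uploaded_at
    if latency < timedelta(0):
        raise ValueError("first_seen_in_search is earlier than uploaded_at")
    if latency <= TARGET_MAX:
        return "within target"
    if latency <= HEAVY_LOAD_MAX:
        return "within heavy-load allowance"
    return "beyond published figures"


uploaded = datetime(2013, 10, 24, 9, 0)
print(classify_index_latency(uploaded, uploaded + timedelta(minutes=40)))
print(classify_index_latency(uploaded, uploaded + timedelta(hours=24)))
```

By this measure, the 24-hour wait before our demo falls well outside anything the KB article describes.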

OK, so at least that’s something official, even if it’s not necessarily what we wanted to hear. But why are we sometimes seeing delays even longer than six hours? I raised a Service Request with Microsoft to find out.

The support line

In short, I didn’t get a 100% satisfactory answer from Office 365 support. Ultimately it sounds like this kind of thing is fairly normal in Office 365 right now. I asked if other customers were reporting this issue, and the answer was “yes, but we just ask them to wait another day”. Hmm, OK then! Of course, if your site deals with time-sensitive content (or you are just looking for fresh content to be shown in search in a reasonable timeframe) this isn’t a great situation.

Working around the issue

So you may need to consider alternatives:

  • If you are dealing with search-driven functionality, could the same thing be provided with query rather than search (e.g. if you do not need to aggregate across site collections)?
  • If you are in a hybrid situation, could the functionality be delivered by an on-premises environment?
  • Do you need a solution right now, or can you afford to wait for improvements? (I personally am hopeful that upgrades to Office 365 will improve the situation in the future.)
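To illustrate the first option, here is a rough sketch of the difference between asking a list directly and asking the search index, using the SharePoint 2013 REST endpoints as one possible route (a Content Query web part or CAML query would be others). The site URL, list title and query text are hypothetical, and the sketch only builds the request URLs; authentication and the HTTP call itself are left out.

```python
from urllib.parse import quote


def _odata_literal(value: str) -> str:
    """Escape a value for use inside single quotes in the URL (' becomes '')."""
    return quote(value.replace("'", "''"))


def list_items_url(site_url: str, list_title: str, top: int = 5) -> str:
    """Query a single list directly; results do not depend on a crawl."""
    return (
        f"{site_url.rstrip('/')}/_api/web/lists/getbytitle('{_odata_literal(list_title)}')/items"
        f"?$select=Title,Created&$orderby={quote('Created desc')}&$top={top}"
    )


def search_url(site_url: str, query_text: str, row_limit: int = 5) -> str:
    """Query the search index; results lag behind by the index latency."""
    return (
        f"{site_url.rstrip('/')}/_api/search/query"
        f"?querytext='{_odata_literal(query_text)}'&rowlimit={row_limit}"
    )


# Hypothetical tenant and list names, for illustration only.
site = "https://contoso.sharepoint.com/sites/news"
print(list_items_url(site, "Pages"))
print(search_url(site, "ContentType:Article"))
```

The trade-off is scope: the list query only sees one list in one site, whereas the search query can aggregate across site collections, which is exactly the caveat in the first bullet.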

For us, in fact, all three are options we could use. In our situation the 2nd option could be the simplest if we need an immediate solution - everything we are building for this client can be deployed to Office 365 or on-premises SharePoint. This requires quite a lot of careful engineering (not only in terms of the solution, but also deployment scripts/processes etc.), but results in a nice position to be in for a hybrid deployment.

In general though, let’s hope that Microsoft work on this in Office 365. I’ll keep you posted if we see improvements - and if anyone has any useful information in this area, feel free to share in the comments below.

26 comments:

Jasper Oosterveld said...

Good post Chris. This is really a serious issue especially with the release of the Content Search Web Part.

I really hope this gets fixed fast before customers start to notice this issue and don't want to use Office 365.

Did you know audience compiling is only once a week? haha. Pretty embarrassing.

Chris O'Brien said...

@Jasper,

Yes, agree with all that. I think for now, the key thing is awareness of the problem (one reason for this post) - knowing about it can definitely help you side-step it in some cases.

And yes, the audience compilation thing isn't great either. We had to use another approach for personalization because of this recently.

The support technician I spoke to alluded to the fact that several background processes currently happen at the weekend on Office 365. In addition to audience compilation, I think sometimes a full sync of user profiles might fall into this category, maybe other operations too.

Cheers,

Chris.

TGITM said...

Yep unfortunately this pretty much mirrors what we see on SPOL.

We do most aggregation and content using search, but have had to use CQWP in some cases because of stale indexes.

Often audiences can be replaced by Query Parameters such as {User.} in your KQL.

Provisioning-wise, we have developed a framework (PowerShell cmdlets built in C# and using XAML) that makes it transparent whether we deploy and import/export to either SPOL or on-premises.

Mikael Svenson said...

Hi,
My sources tell me that they are constantly working on improving how search works in a multi-tenancy scenario, which Office 365 is all about.

I have no idea about the time frame, but I'm sure the index and query issues you might experience today in 365 will be resolved. Remember, 365 is what MS is betting on these days, and they have to make it work.

Thanks,
Mikael

Chris O'Brien said...

@TGITM/Anders,

Yes, sounds like we've followed a similar path. We are using {User.} heavily in our queries, and it was great to find this works fine in the Result Script web part - so no need to worry about Content Search web part in many cases.

And for provisioning, our PowerShell scripts accept a parameter of "Online" or "OnPremises" and just do the right thing from there. Fun times!

Chris.

Chris O'Brien said...

@Mikael,

Yes, this is pretty much my expectation too - I just cannot believe they won't solve this soon.

It's good to hear you've heard things along these lines :)

Chris.

Andrew Burns said...

Sorry, possibly off-topic, but...

"And for provisioning, our PowerShell scripts accept a parameter of "Online" or "OnPremises" and just do the right thing from there. Fun times!"

I took a look at the Microsoft SharePoint Online cmdlets yesterday, and was pretty disappointed - I have to say, it seemed pretty useless. Did you write your own as TGITM described and use those? Or perhaps use CSOM directly in PowerShell?

Hugh Wood said...

This is an issue but not as much of an issue as it used to be. I must also add you can send things to the continuous crawl in SPO at list/library or site level. For sites and libraries it is located under Advanced Options in the list/library settings and for sites it is in Site Settings, Search and offline availability.


Also at 4am UK time today there was maintenance on Search and this has been happening quite frequently over the last couple of months. With changes to SPO like the CSWP and Cross Site Publishing as well new things to come increasing the number of lookup fields in a list search not only must but will get upgrades.

They have just been splitting out the sharepoint instances so perhaps this will change sooner rather than later.

Rule of thumb is new columns or sites take up to 48 hours to appear. New items can take between a few seconds and 15 minutes (More if the server is under heavy load). Not all too shabby considering.


Regards,
Hugh

Charlie Normand said...

Two things @Jasper I thought Content Search Web Part was not available in 365? Is there a hack?

@Chris are the community scoring stats driven off search or a separate timer job too? (tried to find the answer on google yesterday, no joy), our scores are days out of date (probably going to raise a ticket).

We also had a day where our farm was v. bad performance for 48 hours 'Server busy' clearly WFEs or something falling over, but it took a long time for them to sort it...

Is the whole 'scalable' thing happening?!

Phil Childs said...

Thanks Chris - I thought it was just me!

Chris O'Brien said...

@Hugh,

It's still early to say, but *perhaps* things have improved since I wrote this post. In our experience, the "reindex this library"/"reindex this site" options haven't solved the problem - on many occasions, we still waited several hours after having used this option.

It's encouraging if you really are seeing consistent times of less than 15 minutes, but up until yesterday (November 6 2013), we were still seeing delays of several hours before new content was indexed (in sites which have existed for several days/weeks, with no schema changes).

Cheers,

Chris.

Chris O'Brien said...

@Andrew,

All PS + CSOM. We're pretty hooked on that crack already :)

Chris.

Thomas Magnussen said...

Good Post.

We had the exact same experience with O365 and indexing, and also ended up using CAML for content retrieval.

However in our case, this posed other major challenges. We had various content segmentation and security requirements, meaning we had content in a large number of webs, with some users having access to one web, others to several, with a need for some aggregated views (all custom ui w/ responsive design).

This proved problematic when using JS CSOM and CAML, since you can no longer do _site-level queries_. You have to specify exactly _where_ you want to retrieve data from when using JS CSOM CAML, and this might severely impact how you need to structure your content and your logical IA.

We had to structure the content in a way that meant we could construct query-rules that implicitly knew where content was stored for a particular user or context. While it worked, it certainly wasn't very elegant, when you know what can be done with traditional search.

So ppl need to be aware of the limitations in CAML with the JS CSOM.

Friendly regards,
Thomas

Andrew Burns said...

@Chris - Thanks, yeah, I've ended up going the, well, CSOM route mostly, for deployment, which hasn't been as bad as it might have been.

Unfortunately, I've now hit a point where I really need to use Search. Query ain't gonna cut it - and nor does waiting 24 hours.

Office 365 feels like trying to knit with one hand.

Mark Slavik said...

Hi Chris, reading this article confirms that we are not the only people experiencing this issue. In our case it was taking several days to re-index. Adding new site columns was a very random affair as to whether the field would appear in the list of crawled fields. Then eventually, once it did appear, it was equally random (days not hours) to get a usable managed property.
SharePoint is a search driven framework. This is far from an ideal situation to be in. It is going to have to change.

Jagan said...

Hi Chris,

I found a similar problem and figured out a way out of it.

Basically, my items were not appearing on the search for quite a while, later, I realized that I deleted few terms and the columns that were referring to that were broken. I feel this will be a problem, especially, when setting up environments.

Once I deleted column and asked for a re-index, the items started picking (including the new columns) within the 15 mins time window that MS told.

Don't know if it is a coincidence.

Cheers,
Jag

John Guilbert said...

Totally Agree. Microsoft better pull their finger out. The search on a 3 itemed List and still no update after 2 hours. Crazy. Is Search even reliable in this case?

Rosanne said...

Has anyone noticed any improvement with this? I have normally experienced latency of about a half-hour or less, but just recently I have noticed latency of many hours. Being new to SharePoint in general, but also SharePoint Online, do we know what the Office 365 crawl schedule is? Is there any rhyme or reason to when they are performed? Why are some items indexed in the crawl and others not (such as documents added/modified at the same time and not being included in the same crawl)?

toby said...

Yup... Ditto... I've had various issues with index/crawls and latency between post/publish and search results for a while... since O365 beta... Now I set up a new account and new domain just this last week and it's taken (at least!) 6 days to get any search results at all, and now the results are only partial and do not contain info that was updated about 4 days ago.

This is a really bad experience and difficult to deal with as a consultant trying to help others use Office 365. I even remember reading this post a while ago when having trouble before and just assumed that it would have to be fixed by now because this is simply required and relied on and it has to be there. It's very frustrating and extremely time consuming.

Y'all are creative in work arounds some have posted here, but I have to have just basic pages and basic content on pages show up and even that does not work. (No column customizations. No mumbo jumbo. Just OOTB as much as possible.)

Pinging for more MSFT traction.

Bruce said...

I've gotta say that this post back in October '13 guided my whole SP dev practice direction since (so thanks @Chris!). Do we invest in doing as much as possible using search/display templates (as MS seems to want us to) which is fine for On-Prem but falls over in SPO, or do we instead invest in leveraging REST as our go-to pattern for most things for maximum reusability? Glad I went with the latter.

Lars Lynch said...

I was also waiting for a new managed property mapping to show up after a full crawl.

Resetting the site index did the trick: https://support.office.com/en-us/article/Manually-request-crawling-and-re-indexing-of-a-site-a-library-or-a-list-9afa977d-39de-4321-b4ca-8c7c7e6d264e?CorrelationId=b019310d-fa92-4bab-9846-1743ae133e41&ui=en-US&rs=en-US&ad=US

Chris O'Brien said...

@Lars,

Yes, that's a good tip but remember that it doesn't actually trigger a recrawl (so that search picks up the changes). All you're doing is *flagging* some content for search, so that on the next crawl it does indeed get re-indexed.

From the page you link to:

"The content will be re-indexed *during the next scheduled crawl.*"

Still, this can be important as it does allow you to force a full re-index of a site - it's just that you're not actually changing anything in terms of the timing.

Thanks for the note!

Cheers,

COB.

Jean Marie Geeraerts said...

Hi, we're on O365 and we still face the issue of slow crawling.
I really do feel this is unacceptable and Microsoft should address this in the very near future.
Especially for user profiles it takes a long time to pick up new properties and the managed properties mapped to the crawl properties takes several hours to appear.
And there's no way to flag user profiles or managed properties for reindexing :(

Chris O'Brien said...

@Jean Marie,

Agree - speed of indexing of user profiles/people data is still something of a problem for us too..

Cheers,

COB.

arut said...

I've been playing around with managed metadata fields, Content Search web part and search refiners for the past few months on O365 SharePoint. For the most part the crawling seems to happen every 15 minutes, but I've seen quite a few instances when it took a couple of hours and in one instance, I think it took nearly 2 days for the content to show up in content search web part. But, I'm not even concerned about that for now.

Yesterday, I observed that the content that was showing up in the CSWP and related refiners disappeared and I opened a case with Microsoft. However, when I re-indexed the document library that had the content, it all started showing up in the CSWP after 15 minutes.

It seems scary to me that content from the search index would disappear!!!

Unknown said...

Chris,

And 3 years later, end of November 2016, and the issue is still not resolved nor up to expectations. This is so ridiculous. We still have to wait 22 hours for a freaking blog post to show up.

Mikael Svenson, what are your sources telling you this time? Shall we wait for a century?