Chris O'Brien: February 2022

The rise of low-code business applications continues, with industry analysts such as Gartner predicting that by 2025 70% of new applications developed by enterprises will use low-code or no-code technologies, and Microsoft claiming that 92% of Fortune 500 clients are now using Power Apps to provide tools to their employees and customers. With this backdrop I'm excited to be speaking at the inaugural European Power Platform Conference this year, held in April in Berlin.

The conference is from the same team behind the European SharePoint, Office 365 and Azure Conference (ESPC), one of the biggest events in the Microsoft cloud space across Europe for the last few years and one which I've enjoyed speaking at for the last 6 or 7 years. I've no doubt that the Power Platform event has emerged to fulfill the huge demand for quality learning and guidance on best practices and lessons learnt which are specific to the Power Platform family. The surface area and capability of the platform has expanded massively in recent years, and in each release wave Microsoft are driving innovation hard in each of the areas depicted below:

The European Power Platform Conference event

The headlines of the event are:

When - April 6-8 2022
Where - Berlin, Germany
Language - English
Who - lots of community experts, Microsoft MVPs, and keynotes from Charles Lamanna (Microsoft Corporate Vice President, Business Applications & Platform) and Marios Stavropolous (ex co-founder and CEO of Softmotive, now at Microsoft as Partner, Microsoft Power Automate following the Softmotive acquisition)
Pricing - tickets from €695, see the conference pricing page

My session - Automate the Impossible with RPA and Power Automate Desktop

My session focuses on RPA (Robotic Process Automation) and uses a fun scenario to demonstrate serious technology. The abstract is:

Automate the Impossible with RPA and Power Automate Desktop

An alternative title for this session could be “How I frustrate my kids' ambitions of unauthorised gaming through automation!” Robotic Process Automation (RPA) has been around for a while but is now part of Microsoft’s Power Platform and easier to use than ever. Power Automate Desktop allows “island” or legacy applications and systems without an API to be automated – even my home Wi-Fi settings though my ISP’s terrible portal in my case.

In this session you’ll learn how to get started with automating web applications using Power Automate Desktop. We’ll focus on how to interact with web sites like a human with a mouse and keyboard does – using navigation links, entering text into textboxes and clicking buttons. You’ll learn the difference between web automation and UI automation in Power Automate Desktop, and how to decide which approach is best. Finally, we’ll cover how to integrate with Alexa at home, how to schedule your processes and discuss licensing considerations. The automation wave is about to hit – make sure you’re part of it!

For more details see:

Automate the Impossible with RPA and Power Automate Desktop

Conference link

Hope to see you there! Here's that link to the conference site again:

https://www.sharepointeurope.com/european-power-platform-conference/

SharePoint Syntex is the AI-powered ability for Microsoft 365 to learn and understand your documents, enabling knowledge to be extracted and automated processes to be implemented around your content. Building on earlier work I did, I've worked on a couple of Syntex implementations recently, one for a client and one internal to us at Content+Cloud. Putting Syntex into action in the real world is quite different to general experimentation, and in this post I want to share some learnings as I've worked with different scenarios. Before that, if you're looking for some fundamentals content on Syntex some of my earlier articles might be useful:

Syntex is an evolving technology and although Microsoft have worked hard to democratise the AI and put it in the hands of non-developers, there's definitely a learning curve and the constructs used in the training of AI models take some time and experience to get the most from. It starts with making the right choice between Syntex' two approaches (as discussed in choosing between structured and unstructured document understanding) and then requires a good understanding of the constructs and some trial and error (including interpreting the feedback received) during the AI training process.

Background - real world scenarios these tips are based on

Before I go into the tips I think it's useful to understand the types of documents I trained Syntex to understand. The first is known as a 'method statement' and is used in construction and engineering in the UK - it's required by law in some scenarios but is also commonly used on practically any service task where risk has been identified. Examples I've seen include replacing a building's alarm system or even cleaning windows with a tall ladder. The method statement goes alongside a risk assessment, and the overall process is often known as RAMS (Risk Assessment - Method Statement). The method statement describes the 'safe system of work', detailing the control measures taken for the identified hazards and risks. Here are the first pages from a couple of examples - as you can see the formats are quite different, but that's OK for Syntex:

For these types of document, Syntex document understanding works best.

The other work I'm doing revolves around an Excel format we use commonly at Content+Cloud which looks like this:

As you can see this one is very tabular in nature, and unsurprisingly if you know Syntex I found that the unstructured document processing model works best here.

My top 5 Syntex tips

1. To extract from Excel, convert to PDF first

Syntex structured document processing actually doesn't directly support Excel files so a quick conversion to one of the supported formats (JPG, PNG or PDF) is needed - I recommend PDF to avoid the uncertainties of image processing. Power Automate offers a couple of simple ways to convert an Office file to PDF and because you call into Syntex unstructured document processing through a Power Automate Flow you create, this is convenient all round. Some options are:

Move the file to OneDrive temporarily and use the 'Convert file' action with a target type of 'PDF', then copy or move back
Use a 3rd party conversation service which has a Power Automate connector. Encodian works great and is free for small volumes

To avoid a recursive loop you should add a check that the file being created is indeed an Excel file before converting. You'll need a check such as:

Content-Type is equal to 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'

Your file is now ready for Syntex processing.

2. When working with tables, try structured document processing first

From the two Syntex approaches, document understanding often isn't well-suited to extracting content from tables. The main reason for this is that Syntex document understanding strips away all the structural and formatting elements of the document to leave just the raw content. Consider the following document:

What Syntex actually 'sees' for this document has all of the formatting removed - we see this when labelling documents in the training process:

As a result, extracting contents from a certain row or column in a table can be difficult in document understanding. You might be able to do something with before/after labels and a proximity explanation, but accuracy might vary. In these situations, it's worth trying a switch to structured processing. The model there has special support for processing tables and it's capable of detecting rows and columns in a table - this makes it possible to zone in on 'row 4, column 2' or whatever you need.

Of course, what comes with a change to structured processing is a completely different approach (with its Power Automate integration) and pricing. But, you might need to go this way if the content you're trying to extract is tabular.

3. Implement a 'proximity explanation' and train on more documents when extracting content in a less describable formats

In Syntex document understanding models, when creating a Syntex extractor to recognise some content in a document and pull it out, it's worth understanding the following - unless specific steps are taken Syntex is best at extracting content in known, describable formats. For example, an extractor you create may be accurate immediately if you're extracting a consistent format such as a postcode or set of product IDs for example. However, training a model to recognise arbitrary data can be more difficult even if it's in a consistent place in the document. This becomes an important aspect of working with Syntex and something to plan for.

Consider two scenarios:

Extracting a product description (lots of arbitrary phrases)
Extracting a product ID in form XXX-123456-YY (distinct pattern)

In addition to the AI machine teaching you do when providing Syntex with some sample documents, the 'explanations' you create are vital here:

My tips for extracting more arbitrary formats are:

Implement a proximity explanation in conjunction with something just before or just after the content you're trying to extract
Train on larger numbers of documents - don't just stop at five sample docs, provide Syntex with ten or more
Expect to spend a bit more time on these scenarios

With some tuning, you should be able to get the extractor accuracy to where you need it to be. If not, see tip #2 - setting up a quick test with Syntex unstructured document processing to see if that's more effective is a good idea at this point.

4. Combine Syntex with formatting in SharePoint lists and libraries

Syntex and SharePoint JSON column formatting are a winning combination - I use them together frequently. Consider the case where you're extracting some content from documents with Syntex, and of course the values are extracted from the document into list item values - when some column formatting is applied we can really bring out the value of what Syntex has found.

From this:

To this:

We can now get an instant view across many documents at once:

Engagements with a higher value become obvious from the Excel 'data bar' style formatting on that column
The business manager is highlighted in bold, blue text
Any missing values from the project manager field are highlighted in red

In another example, I use very similar formatting but the emphasis is on highlighting the missing gaps across the documents (as shown by the red):

This provides an intuitive, instant picture of how complete the documents are according to what Syntex found in them. This can be a useful technique when Syntex is used with documents where it's important that certain information is present - which applies to many of the world's is commonly the case when you think about it.

5. Combine Syntex with Power BI reporting to measure document compliance

Expanding on the theme behind the formatting approaches shown above, many business processes rely on documents being completed to a certain standard - the document represents an important form of information exchange or capture. The RAMS review process I discuss above is a great example of this. I'll talk about that particular scenario in more detail in a future article, but as we're discussing general concepts today this idea of "Syntex + reporting" is, I feel, worthy of being a top Syntex tip.

Consider that for any process where information completeness is important, measuring and reporting on this is likely to be valuable. For my client's scenario where fully completed RAMS method statements are a guardrail to help ensure safe working, we implemented a Power BI dashboard to provide insights on to what Syntex was finding. As you can see in the image below, this measures overall 'process compliance' levels, and measures such as the average fidelity of documents from individual subcontractor companies. This allows us to create a subcontractor 'leader board', helping the team understand where there may be greater risk and what to investigate:

This does rely on your Syntex AI models being accurate for the documents you are processing, but even without full accuracy the insights help guide you to where improvements can be made.

Summary

Syntex is an amazing technology and while other AI-based document processing solutions have existed for a while (often vertical-specific, such as legal or contracts management solutions), the advantage Syntex has of being baked into your core collaboration platform is profound and opens many doors. With a good understanding of the constructs and capabilities together with some imagination, Syntex can be the foundation for some high value solutions. Hopefully this post provides some inspiration and useful tips!

Chris O'Brien

Monday, 28 February 2022

I'm speaking at the European Power Platform Conference 2022 in Berlin - RPA and Power Automate

The European Power Platform Conference event

My session - Automate the Impossible with RPA and Power Automate Desktop

Conference link

Wednesday, 2 February 2022

SharePoint Syntex AI - my top 5 real-world tips

Background - real world scenarios these tips are based on

My top 5 Syntex tips

1. To extract from Excel, convert to PDF first

2. When working with tables, try structured document processing first

3. Implement a 'proximity explanation' and train on more documents when extracting content in a less describable formats

4. Combine Syntex with formatting in SharePoint lists and libraries

5. Combine Syntex with Power BI reporting to measure document compliance

Summary

MVP profile

Popular articles

RSS feed

Previous articles

About Me