Thursday, 8 July 2021

Using RPA to control my ISP settings for kids gaming - Power Automate Desktop for web automation - part 2

In the last post I talked about how I automated my home wi-fi/ISP settings with Power Automate Desktop so that my kids can only use gaming websites like Roblox at permitted times. Since the scheduler in the portal doesn't support what I need, I had been manually visiting the site many times each week to block and unblock sites, which was a huge pain to both me and the kids. Enter Robotic Process Automation - RPA technologies are designed to "automate what cannot be automated", in other words applications without APIs, applications which run on the desktop and other legacy systems. By driving the keyboard and mouse directly, RPA opens the door to wider automation since it simply replicates how a human would interact with the application. Power Automate Desktop is now free with Windows 10 and Windows 11 - Microsoft's move to bring RPA technology to the masses and gain market share as RPA moves from exotic and niche to democratised and commonplace.

In this follow-up post I want to cover a few things:

  • Turning a Power Automate Desktop flow into a cloud flow which can run on a schedule
  • Sending a mobile notification once the flow has run
  • Licensing considerations for Power Automate Desktop
  • Authentication challenges and solutions - working around Captcha requests

In the last post I walked through how I got Power Automate Desktop to open a browser, navigate to my ISP's portal, find the page with the URL blocking, enter the two gaming URLs I want to block into a textbox and then click 'Apply' - with the final step being an announcement to Alexa. This now runs like clockwork every day once the kids have had their prescribed hour of fun. To give you a sense of it, the portal looks like this:

By the end of the last post my process was automated, but only in the sense that I had to click a button manually to trigger all the steps. At this point I have a couple of "desktop flows" (one to block the URLs and one to unblock), and they show up in the relevant area in the Power Automate portal:

Of course, what I really want is for this to run in a 100% automated way and on a schedule - with no manual intervention involved. For that I need a cloud flow, so let's start with the first point above.

Calling a Power Automate Desktop flow from a cloud flow for full automation

Once you have an RPA Flow it can be "wrapped" in a standard cloud flow - a cloud flow being the type of flow that you've been using all along if you've been working with Power Automate previously. Of course, a cloud flow can be triggered in a huge number of ways including when some data is changed in Microsoft 365, SQL or Dataverse or perhaps when something has happened in Salesforce, Workday or another cloud service, or even when an e-mail is received. In my case I just need a simple schedule. 

To get started I go into Flow and create a new scheduled cloud flow:

I then give my flow a name and define the schedule:

Once in the flow I can use the 'Desktop flows' connector to call the flow on my machine - this is the critical thing that links a cloud flow to a desktop flow:

The connector offers a few Microsoft RPA actions - note that all are premium actions (more on licensing later):

This action allows me to select from desktop flows I've already created - they are detected because of the connection previously established, which links my Microsoft 365 identity with my physical laptop:

Once I select one of my desktop flows, the action is intelligent enough to recognise any input parameters to that flow. In my case I parameterised the following to avoid them being hard-coded into my desktop flow:
  • RobloxUrl
  • ScratchUrl
  • AlexaAccessCode
The UI adapts to provide a way for me to enter these:

Take note of the "Run Mode" parameter above which is used to specify whether the desktop flow should run in attended or unattended mode - this is important for usage and licensing. We'll come to this later. 

Once things are saved, that's all I need to schedule and fully automate my desktop flow.

However, there's one more thing - the kids find out whether gaming is open or closed through the automated Alexa announcement I talked about last time. But I'd like to know that everything happened as expected too, so I add a Flow mobile notification:

...and specify the message:

Now, wherever I am, the notification comes through on my phone each time one of my flows runs:

So now everything is fully automated and I have confidence that my automated process is behaving properly. 

Choosing between attended and unattended mode 

When we were creating a cloud flow to "wrap" our desktop flow earlier, we chose "Attended" for the run mode. When working with Power Automate Desktop and RPA in general, attended vs unattended mode is an important decision to plan for - on the surface, unattended might seem sensible because you can't guarantee that a user will be at the machine when the flow runs, especially for a process that runs a few times through the day. However, an unattended flow can only run under the following circumstances:
  • All users are completely signed out
  • The screen is locked
  • The gateway connection between the cloud flow and the desktop flow has user sign-in information
In fact, a common pattern for unattended flows is to run them on a virtual machine, which gives you more control over ensuring these conditions are met. Unattended flows can also run concurrently on the same device if different user accounts are used, which is useful for long-running processes or RPA at scale.

Notably, licensing is significantly different for unattended vs. attended flows - so this is another reason why planning is required here. Let's turn to licensing now.

Licensing for Power Automate Desktop

I mentioned earlier that Power Automate Desktop is free with Windows 10/11, and that's true. What that gets you is the ability to click a button in the Power Automate Desktop client on your machine and have your process run - the further automation that comes with a cloud flow, however, requires pay-for Power Automate licensing. As usual with Power Automate there are two ways of licensing this - per user and per flow. For per user licensing, you need a specific plan named the "Per-user plan with attended RPA", as shown below on the Microsoft pricing page:

Notice the additional "per bot" consideration if you want to run unattended (the second row). If you're doing per-user licensing you add the "unattended RPA add-on" to the per-user price:

So in summary, the options are:

Mode | License approach | Base cost per month | Unattended add-on per month | Total per month
Attended | Per user | $40 per user ($15 until end of September 2021) | - | $40 ($15 until end of September 2021)
Attended | Per flow | $500 | - | $500
Unattended | Per user | $40 per user ($15 until end of September 2021) | $150 per bot | $190 ($165 until end of September 2021)
Unattended | Per flow | $500 | $150 per bot | $650
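To make the arithmetic behind the table explicit, here's a minimal Python sketch. It simply adds the base price and the unattended add-on using the mid-2021 list prices quoted above; the constant and function names are my own, and the prices will certainly have changed since, so treat it as illustrative only.

```python
"""Monthly Power Automate cost for the four mode/licensing combinations.

Prices are the mid-2021 USD list prices quoted in this post and are
illustrative only - always check the official pricing page.
"""

PER_USER_ATTENDED_RPA = 40   # per-user plan with attended RPA, per user
PER_USER_PROMO = 15          # promotional per-user price (until end of September 2021)
PER_FLOW = 500               # per-flow plan
UNATTENDED_ADDON = 150       # unattended RPA add-on, per bot


def monthly_cost(mode: str, licensing: str, promo: bool = False) -> int:
    """Return the total monthly cost for one user (per user) or one plan (per flow).

    The promo flag only affects per-user licensing; per-flow pricing ignores it.
    """
    if licensing == "per user":
        base = PER_USER_PROMO if promo else PER_USER_ATTENDED_RPA
    elif licensing == "per flow":
        base = PER_FLOW
    else:
        raise ValueError(f"unknown licensing approach: {licensing}")

    if mode == "attended":
        return base
    if mode == "unattended":
        # Unattended runs need the add-on on top of the base plan.
        return base + UNATTENDED_ADDON
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("attended", "unattended"):
        for licensing in ("per user", "per flow"):
            print(f"{mode:<10} {licensing:<8} ${monthly_cost(mode, licensing)}")
```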

As ever, you should refer to the official Power Automate licensing page for the current pricing and considerations between per user and per flow licensing.

From Microsoft's pricing you might observe:
  • There's a fairly significant premium for unattended processing. That's because you can automate at scale with this option with many processes running concurrently - if you're using RPA to process 10,000 invoices at month end, this incremental cost is likely to be justified of course.
  • The decision between per user and per flow licensing for Power Automate overall is relevant in this area too - most orgs decide this based on automation strategy and requirements they have in view  
  • In case you're wondering, although there are some incentives at the time of writing (June/July 2021), the new pricing recently applied to Power Apps doesn't seem to carry over to Power Automate
Overall, you can see this is designed for organizational use rather than anything personal or home-based. In that context of course, if a high value process is being automated the value of this to the organization could be many multiples of the cost - so comparing unattended RPA in Power Automate Desktop with non-enterprise solutions doesn't really make sense. And of course, my scenario of using this business technology to automate my wi-fi settings at home doesn't really make sense either - I created my solution on trial licensing, but if I needed to license this for the long-term the costs (even at $15 per month, which is all I'd need) would be a factor. None of this takes away from the power of the technology however - in a business context Power Automate Desktop is extremely powerful.

And when the trial licensing expires on my solution, I'll just click a single button each day and it will still be less work than going to my ISP portal and entering/removing URLs!

A final challenge for RPA - Captcha prompts

One final topic to address is how to use RPA to automate around visual Captcha challenges like these: 

Of course, the whole point of this kind of check is to ensure there's a human making the request rather than an automated bot - but this is to foil a malicious bot of some kind, rather than our wholesome, legitimate and approved automation. The image above shows the Captcha challenge presented when logging into my ISP's portal.

I suspect it may be possible to use a combination of DOM scraping and image recognition (e.g. Azure Cognitive Services Vision API) to get past something like this - however, reliability could be an issue since these images are small and frankly difficult enough for humans. So, my solution is far less exotic unfortunately:

  • Create a dedicated browser profile and pre-authenticate to the ISP site
  • Ensure the RPA process always uses this browser profile:
  • Enjoy the fact that persistent cookies last a long time with my ISP

So perhaps working around the Captcha problem more than solving it, but since I only need to take action every few weeks when the cookies expire that's good enough in this case. Clearly this could be a challenge for some automation scenarios which target public SaaS services in the enterprise, but wouldn't be common with other targets such as legacy non-cloud applications.
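If you want to reproduce the dedicated-profile idea outside Power Automate Desktop (for example, to open the browser for the occasional manual re-authentication), Chrome's `--user-data-dir` command-line switch is one way to pin every launch to the same profile directory. The sketch below assumes a default Windows install path and a hypothetical profile folder; adjust both for your machine.

```python
import subprocess
from pathlib import Path

# Hypothetical locations - adjust for your machine.
CHROME = r"C:\Program Files\Google\Chrome\Application\chrome.exe"
PROFILE_DIR = Path.home() / "rpa-chrome-profile"


def chrome_command(url: str, user_data_dir: Path = PROFILE_DIR) -> list[str]:
    """Build a Chrome command line that always uses the same dedicated profile.

    --user-data-dir points Chrome at a separate profile directory, so the
    persistent cookies from the one-off manual sign-in are reused on every launch.
    """
    return [CHROME, f"--user-data-dir={user_data_dir}", url]


def launch(url: str) -> subprocess.Popen:
    """Start Chrome with the dedicated profile and return the process handle."""
    return subprocess.Popen(chrome_command(url))
```

Keeping the profile in its own directory also means the automation never touches your everyday browser profile, and vice versa.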


Over these posts we've seen how effective Power Automate Desktop can be in "automating what cannot be automated". We've discussed the basic "Power Automate Desktop only" approach vs. bringing a desktop flow into a cloud flow to facilitate scheduling or integration with other cloud processes - the latter method requires pay-for licensing in the form of a Power Automate per-user or per-flow license, and unattended processing comes with an extra cost but does enable automation at scale.

As we noted, the technology is pitched at business automation rather than personal automation. The potential is clear, and it's great to see Microsoft have a strong easy-to-use offering in this space.

Wednesday, 9 June 2021

Using RPA to control my ISP settings for kids gaming - Power Automate Desktop for web automation - part 1

Like many families, we have annoying kids who would spend all their time gaming or on screens given half the chance. So, our tyrannical regime approach essentially grants them a certain amount of time per week, with controls put in place through our ISP which allows me to block or allow individual websites. Unfortunately with my ISP (Virgin Media here in the UK) there's no scheduling capability for this, only for parental controls overall - which means I have to manually go to the portal and tap in the sinful URLs many times per week, and then unblock later on. A recurring conversation in our house goes something like this:

  • Kid - "Dad, you know how you block our gaming websites when it's not gaming time so we can't sneak on?"
  • Me - "Yes?"
  • Kid - "Well, you're always on calls when it's 5pm on a gaming day and we're losing lots of time. It's so unfair! Can't you schedule it or something?"
  • Me - "Sorry son. Despite using a company who claim to be the UK's leading ISP and paying them through the nose each month, they chronically underinvest in their management portal and it was built by interns in the 1990s. There are no APIs either, so it's not possible unfortunately. Trust me, I'd like nothing more than to never visit that thing again or have notes shoved under the bedroom door when I'm on calls - but that's life son."
  • Kid - "Erm. Could we not just......turn the controls off?"   
  • Me - "Hahahahaha!"

After the 86th iteration of this conversation, I decided to spend a weekend looking at Power Automate Desktop for this since it does web automation. Microsoft announced in March 2021 that Power Automate Desktop is now free with Windows 10 as they expand the Power Platform into other forms of automation. In truth, the licensing means that if you want true unattended execution (rather than attended automation where you manually press a button and all the steps execute for you), Microsoft's RPA technology is more suited to workplace scenarios than home or personal automation. I'll talk more about licensing in the next article. Nevertheless, my ISP scenario was a good excuse to automate something else and try another scenario with Power Automate. 

What I'm automating

The basic process can be described as:

Once logged into the Virgin Media portal I can manage my "web safe" settings through a series of tabs. I'd need the automation to load the website (ensuring the session is authenticated with my credentials), and firstly navigate to the "Websites" tab shown below:

The websites tab provides an interface to specify the sites to block. I leave some sites permanently blocked, but what my automation needs to do is come in here and add or remove the "scheduled" websites. As the numbers in the image below depict, it's a 3-step process involving the URL being typed into the box, the "Add" button click and then settings applied with the "Apply" button once all changes are in:

The unblocking is effectively a reversal - this time the automation needs to find the right entries in the list and click the "Remove" link next to each. This involves a slightly different series of steps and there are a couple of ways in Power Automate Desktop to do this, but we'll get to that later. 

Getting started with Power Automate Desktop

Given that the tool is all about automation from the desktop (although cloud-integrated), we need to download some software. There are a number of ways you can do this, including from the "Install" menu within the Power Automate portal:

Once installed and signed-in, your machine effectively has a binding to a Power Platform environment:

Power Automate Desktop capabilities

The overall capability set is very powerful indeed, and of course the whole premise of desktop automation is that you can record (or manually create) steps which drive keyboard and mouse actions. You can open applications and browser windows, interact with their controls, and perform steps in systems that would be otherwise difficult to automate. Here are some of the things you can do:

  • Control elements of the desktop PC
  • Work with files and folders
  • Control applications through UI automation
  • Control applications through web automation

This distinction between "web automation" and "UI automation" is important - we'll come back to this later, but notice there are some similarities and some differences between the possible actions.

Overall, the toolbox covers many different areas of automation and there are almost infinite permutations of how these things can be combined:

Now let's look at the specific process to implement the automation I need.

Recording the process in Power Automate Desktop

Like other desktop automation software, Power Automate Desktop brings a recorder so you can record your actions once to create your automated process. Whilst you *can* build your automation by dragging and dropping actions directly from the toolbox, in most cases you'll use the recorder at some point. In fact, there are two:

  • A web recorder
  • A UI recorder
These are represented as two icons at the top of the designer:

As we're talking about this distinction, here's a tip:

Tip #1 - Web automation vs. UI automation
Since it's web automation I need to perform, using the web recorder is likely to give me the best results since it has a deeper understanding of browsers, web page structure and HTML input elements such as textboxes, radio buttons and dropdowns. The UI recorder is more suited to automation of desktop apps. This rule of thumb works in most cases, though I could imagine the occasional need to use UI automation in the browser (e.g. for a particularly complex web UI where pixel locations are more effective than navigating a DOM structure or dealing with content represented in images).

When using the web recorder you choose which browser to use - regardless, you'll always need the Power Automate Desktop browser extension installed and enabled (here are the links for Edge and Chrome). You select your browser of choice to get started:

Tip #2 - Chrome vs. Edge
At the moment, web automation with Chrome seems to be much more reliable than with Edge. The notorious "failed to get window" error seems much more common with Edge, and operations that fail in Edge tend to just work in Chrome. For now, I recommend switching to Chrome if you're trying with Edge but run into unexplained challenges.

As you record the steps you want to automate, they are captured by the Web Recorder:

In the end, my automated process for blocking the websites looks like this:

Tip #3 - use variables in your automated steps
Whilst the recorder will do a good job of detecting text you type into forms, replacing these with variables to more clearly separate out the values is a good idea. They become easier to replace and less buried in your script - just like in code.

In my case I have variables for the two websites I want to block and my Alexa access code:

The result

The video below shows the automated process in action - some observations:
  • It happens very fast :) 
  • It's hard to see what happens because some UI elements are off-screen or only just on screen - however, the web automation doesn't care about this

Reversing the process - unblocking the sinful websites

As you might expect, I have a similar process to reverse the actions - in essence I have two distinct Flows (which can be scheduled independently):

In my case, unblocking is a little more challenging because I have to identify the correct sites in the list. This proved to be one of those web automation challenges where it takes more than the recorder to nail the steps. I found myself with two options:
  • Identify the right CSS selector to find the element containing the site's URL, and navigate in the DOM to a sibling element which is the "Remove" hyperlink
  • Leverage the fact that the websites I'm removing will always be the last ones in the list. This was far simpler!
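The "last entries in the list" idea is simple enough to sketch. This is not the JavaScript used in the flow - it's a Python illustration of the selection logic, with a sanity check added as my own assumption so that nothing is removed if the tail of the list doesn't contain exactly the scheduled sites (for example, because someone added a site by hand). The function name is hypothetical.

```python
def entries_to_remove(blocklist: list[str], scheduled: list[str]) -> list[int]:
    """Return the indices to remove, assuming the scheduled sites are the
    last len(scheduled) entries in the portal's blocklist.

    Raises ValueError instead of removing the wrong sites if the end of the
    list doesn't match the scheduled sites.
    """
    n = len(scheduled)
    if n == 0:
        return []
    tail = blocklist[-n:]
    if sorted(tail) != sorted(scheduled):
        raise ValueError(f"expected {scheduled} at the end of the list, found {tail}")
    start = len(blocklist) - n
    # Work from the bottom up so earlier indices stay valid after each removal.
    return list(range(len(blocklist) - 1, start - 1, -1))
```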
In the end I used Power Automate Desktop's ability to run some JavaScript on the page you're automating - very powerful! My actions looked like this:

The final step - announcing to Alexa

Since the whole problem scenario is the fact that I'm often on calls, I needed to signal to the kids that gaming was either available or unavailable. Alexa automation is a world I hadn't really dug into until now, but I found there are essentially two ways of getting what I wanted -
  • Alexa Notifications - this pings the Alexa devices, but only in a "you have a notification" way. To listen to the notification, someone has to say "Alexa, read me my messages", which is sub-optimal for what I wanted
  • Alexa Routines - this allows you to do any number of things, including make an announcement in the true sense with no "pull" of the message required
To integrate with Alexa Routines, I used the Virtual Buttons Alexa skill - this is a service which abstracts the triggering of Alexa Routines and gives you a REST endpoint to call. If you want more than one virtual button, and I did because I wanted two different announcements, the service is chargeable but the cost is fairly negligible and it does seem to simplify things.

There's some setup work to do, but the steps are documented in an e-mail the service creators send. Part of the process involves defining the Routine in the Alexa app - in this case, to announce "Sorry kids, gaming is blocked" or similar:   

To integrate this into my automated process, I made use of the "Run PowerShell script" action in Power Automate Desktop and called the REST API through Invoke-RestMethod, passing the appropriate JSON in the body to make the blocked/unblocked announcement as needed (via the Virtual Button ID):
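For comparison, the same kind of call can be sketched in Python with only the standard library. The endpoint URL and JSON field names are assumptions based on my understanding of the Virtual Buttons service (they mirror the Virtual Button ID and Alexa access code used in the flow), so check them against the setup e-mail before relying on them.

```python
import json
import urllib.request

# Assumption: endpoint and field names as I understand the Virtual Buttons API;
# confirm against the setup instructions the service sends you.
VIRTUAL_BUTTONS_URL = "https://api.virtualbuttons.com/v1"


def announcement_request(button_id: int, access_code: str) -> urllib.request.Request:
    """Build a POST request that presses one virtual button, which in turn
    triggers the Alexa Routine linked to it (e.g. the "gaming is blocked" one)."""
    body = json.dumps({"virtualButton": button_id, "accessCode": access_code}).encode("utf-8")
    return urllib.request.Request(
        VIRTUAL_BUTTONS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def announce(button_id: int, access_code: str) -> int:
    """Send the request and return the HTTP status code."""
    request = announcement_request(button_id, access_code)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Using one button per announcement keeps the script identical for blocking and unblocking; only the button ID changes.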


Solution complete! I now have an automated process which opens a browser, drives the keyboard and mouse to make the changes to my ISP settings, and announces via Alexa when done - fantastic.

Except at this point, things aren't fully automated in an unattended sense - up to now, I have a button to click in Power Automate Desktop which will fire up the browser and execute the steps. I have to press the button manually though. The next step is to make this an automated process that runs from the cloud on a schedule. So next time, we'll talk about:

  • Turning this into a cloud Flow which runs on a schedule
  • Licensing considerations for Power Automate Desktop
  • Authentication challenges and solutions
For the last point, I'm talking about things like this - Captcha image verifications which are designed to foil malicious automation:

More on this next time!

Thursday, 13 May 2021

Speaking at ESPC21 - using Power Apps and AI for Incident Reporting

The European SharePoint, Office 365 and Azure Conference has always been one of my favorite events in Microsoft technology, and I'm looking forward to delivering a session at the event again this year. The event is running on June 1-2, and the good news of course is that you can attend with no travel required. I can guarantee you will have no problems with travel AC adapters! ESPC always has amazing content, and as usual there are keynotes and announcements from Microsoft execs such as Jeff Teper, Karuana Gatimu, Charles Lamanna, Adam Harmetz and others. 

I really like what the conference has done with pricing when in "virtual mode" - there's a free registration option, but also pay-for choices which get you many extras including on-demand session access, certain bonus sessions and a choice of pre-event tutorials. There's a link to pricing and the conference schedule at the end. 

Some details on my session:

Building an Incident Reporting Solution with Power Apps and AI

AI is no longer a high-end concept that only applies to organisations with large I.T. budgets. Instead, it is readily available in numerous ways in Microsoft cloud technologies, and when your content is already in Microsoft 365 it's easy to tap into. The scenario used here is incident reporting, but the approaches shown in this session can apply to *many* common applications.

Using a combination of Power Platform and Azure Cognitive Services, we'll show how to add image recognition and tagging to an app in a few easy steps. This session is aimed at developers, citizen developers, and anyone else building solutions in Microsoft 365.

Session link: Building an Incident Reporting Solution with Power Apps and AI

I will be taking any questions you may have live during my session, so come prepared! 

As a speaker at ESPC21 Online I can share with you a special 25% discount on Pro Access tickets, which includes a pre-event tutorial of your choice, all event sessions on demand and more. If interested, just use code ESPCPRO when booking here.

You can find the full event schedule here and check out some of ESPC’s reasons why you don’t want to miss this event here. I hope to see you at ESPC21 Online!

Thursday, 22 April 2021

SharePoint Syntex - teaching AI to extract contents of structured documents with Form Processing

In previous articles on SharePoint Syntex I've talked mainly about the document processing approach - in this post I'll discuss its counterpart, form processing. For those following along, my overall set of articles on this theme so far are:

Syntex - document processing

Syntex - general
That last article in particular is designed to help you understand the difference between the two models and when to use each one. As you read about Syntex you might form the view that "form processing is for things like invoices and order forms and document understanding is for everything else" - certainly some of the guidance implies this. However, that position is far too simplistic - there are differences in licensing, capabilities, supported file types and more - and you'll want to get this decision right to avoid having to rework AI models. My "tips for choosing" article might be helpful since it has a table of differences and details of licensing aspects to look out for.

But today, we focus on form processing!

Syntex form processing - integrated AI Builder

As the briefest of recaps, Syntex form processing is typically more suited to highly-structured and consistent document formats compared to document processing, that much is true. Since the AI Builder technology within Microsoft's Power Platform is used, there are a few implications to consider:
  • To use, AI Builder credits are needed in addition to Syntex licenses (see AI Builder calculator). However, if your org has 300+ Syntex licenses you receive a generous allowance of 1m credits - this more than gets you started
  • Supported file types include JPG, PNG or PDF - but not Office files
  • Entire tables can be extracted from the document (in contrast to document processing)
  • The model is applied via a Power Automate Flow to the SharePoint document library where your documents reside (i.e. where you create the model from) - but there is no easy way to use this in other locations
In short, it's AI Builder conveniently built into SharePoint document libraries - so you don't have to do the integration or somehow pass each document to the model, it's taken care of for you.

Our invoice format

Before we get started on the process, it's worth seeing the format of documents used in this process. Like many classic examples of this type, they are invoices:

Implementing Syntex form processing

The approach followed here can be summarised as:
  1. Define the information to extract (i.e. teach Syntex what the fields are e.g. "Invoice Reference", "Invoice Date" etc.)
  2. Add documents for analysis
  3. Tag documents (i.e. teach Syntex where to find the relevant content in the document)
  4. Train the model
  5. Test
  6. Use in your document library
In your SharePoint document library, find "Automate" > "AI Builder" > "Create a model to process forms":

You'll see this message alerting you that AI Builder credits are needed:

Give your model a name - I'm using "COB invoice" for now. I want a new SharePoint content type to be created with this name so these documents are easily identified and classified amongst any others:

Syntex then begins to create your AI model:

Once the model has been created we define which information within the document we want to extract:

As the image shows, I start specifying some things I want to extract such as:
  • The invoice date
  • The invoice reference
  • The VAT number
Syntex now allows me to supply a collection of documents to train the model:

I create a new collection of documents for my invoice scenario:

I have some invoices ready to go, so I select those to upload:

Once uploaded I'm ready to analyze!

Once the analysis is complete we move into the tagging phase.

The tagging phase

As you move your mouse, Syntex allows you to highlight portions of the document by drawing boxes around identified pieces of text. By doing this, you map them to the fields you defined at the beginning - these appear in a picker for selection, with a checkbox indicating whether you've already mapped this item. So I move through the document teaching Syntex what is the invoice reference, what is the date, the supplier name and so on.

As you can see, Syntex allows me to pick something as granular as an individual word or even character, or expand to pick a phrase or string of characters. Items with a green border are already tagged:

Tables can also be tagged in this way:
Once I'm done tagging I'm presented with a summary of the model, with a list of the fields I've defined:

We're now ready to move into the training phase.

The training phase

We start by hitting the Train button:

Once the model has finished training you can either run a quick test against a new document (not one used for training) or go ahead and publish it to your SharePoint document library:

Let's go ahead and publish the model. Once I have a published version, any subsequent changes will create a draft - this allows me to test things out (and get them wrong) whilst not disrupting the extraction that's already in place.

Once a model is published, we can go ahead and use it:

This makes the model available for use in a Power Automate Flow, and the person using it will need to consent to the connections being used:

The resulting Flow looks like this:

If you're interested in the mechanics, the piece that does the extraction is this - the "Predict" action for AI Builder which links to the model we just created:

The results

So let's go back to the invoice format we are using:

When this file is uploaded to SharePoint, initially it's just any old document:

..but then after a couple of minutes the document is correctly identified and classified as a "COB Invoice" and the values I trained the model for are extracted:

Excellent. Now I can drag in many old invoices and have them properly classified and summarised:

..and after a couple of minutes:


Syntex is hugely powerful in automatically unlocking critical data from documents - it doesn't need to be buried inside them any more. At the beginning of this series, we discussed how the best research suggests knowledge workers spend 20-30% of their time just searching for information or expertise, and many of us would recognise that having to open many documents to check their contents can contribute to this. As above, I can build SharePoint document library views so that information is readily-accessible or the view is sorted, filtered or grouped according to extracted information.

These benefits go far beyond search and views though. Having my documents correctly identified means that I can apply security and compliance policies to them, for example a conditional access policy which means employees can't print or download sensitive contracts from an unmanaged device, or a retention policy that means a Master Services Agreement is retained for 6 years. Syntex can drive these approaches so that policies are applied by the AI recognising the document, and this can work across documents of wildly varying formats so long as there's some consistency that a document understanding rule can be applied to.

Being able to automatically extract information also means I can build process automation around my documents, for example if something comes in for a certain region or above a certain value, I can route approval processes or notifications accordingly. There are many possibilities here alone. 
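In a real solution this decision would typically be a condition in the Power Automate flow that runs when a classified document lands in the library; the Python sketch below just shows the shape of such a rule. The field names, threshold and destinations are all hypothetical - the invoice model in this post extracts fields like the date, reference and VAT number, not necessarily a region or total.

```python
from dataclasses import dataclass

# Hypothetical values - in practice these would be driven by the columns the
# form processing model populates in the document library.
APPROVAL_THRESHOLD = 10_000.0
REGIONAL_TEAMS = {"north": "north-finance", "south": "south-finance"}


@dataclass
class ExtractedInvoice:
    reference: str
    region: str
    total: float


def route(invoice: ExtractedInvoice) -> str:
    """Decide where an invoice goes based on its extracted values."""
    if invoice.total > APPROVAL_THRESHOLD:
        # High-value invoices go through an approval process first.
        return "approval"
    # Otherwise notify the team for the invoice's region, with a fallback.
    return REGIONAL_TEAMS.get(invoice.region.lower(), "central-finance")
```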

Ultimately it comes down to classification and extraction, and there are so many possible use cases around CVs, proposals, statements of work, RFPs, employee contracts, invoices, sales/purchase orders, service agreements, HR policies and just about any other document type you can think of. This is democratised AI in action, and it's great to have it so accessible in SharePoint.