Tuesday, 3 March 2020

AI in Office 365 apps – choosing between Power Apps AI Builder, Azure Cognitive Services and Power Automate: Part 2

In the last article we talked about some examples of using AI in Office 365, and looked in detail at the idea of building an incident reporting app which combines common Office 365 building blocks with AI. Whilst Power Apps and SharePoint underpin our solution, we use AI to triage the incident by understanding what is happening in the image. Is this a serious emergency? Are there casualties or emergency services involved? Our image processing AI can interpret the picture, and add tags and a description to the file where it is stored in Office 365 - this can then drive automated actions, such as alerting particular teams or having a human review the incident. We also fed a series of images into the AI to see how the results vary across different scenarios.

Overall, this article is part two of a three-part series:

  1. AI in Office 365 apps - a scenario, some AI examples and a sample Power App 
  2. AI in Office 365 apps - choosing between Power Apps AI Builder, Azure Cognitive Services and Power Automate (this article)
  3. AI in Office 365 apps - pricing and conclusions
In this article, we'll focus on the how - in particular, comparing three approaches we could use in Office 365 to build our incident reporting app and make use of AI. Which is easier? What skills are required? How long might we expect it to take for each approach?

Choosing between Power Apps AI Builder, Azure Cognitive Services and Power Automate

As mentioned in the last article, there are a number of ways we could build this app:
  • Use of Power Apps AI Builder
  • A Power App which talks directly to Azure Cognitive Services (via a Custom Connector)
  • A Power App which uses a Power Automate Flow to consume AI services
For each approach, we’re looking at how the app would be built, pricing, and any constraints and considerations which come with this option.

Option 1 - Power Apps AI Builder

AI Builder is still in preview at the time of writing (February 2020 - release will be April 2020) with four models offered:

As you might expect, the idea is to make AI relatively easy to use and to focus on common scenarios. No coding is required, and anyone with Power Apps experience can now take advantage of what would previously have been highly-complex application capabilities.
How the app would be built
In our scenario, it’s the “Object Detection” model that is relevant. This can detect specific objects in images that are supplied to it, as well as count the number of times the recognized object is found in the image. The first step is to define a model and supply some sample images:



You'll need an entity pre-defined in CDS to use AI Builder - I'm skipping through a few screens here, but ultimately you select a CDS entity representing the object you are detecting:


In my case, I was building an app related to transport and my entity is "Bus". The next step is to start to train the AI model by providing some images containing the entity:



We then tag the object in each image with the entity:



Once all this is done, you can use the Object Detector control in your Power App and configure it to use this model:



Since the whole topic of how to build apps with AI Builder is interesting, I'll most likely go through this process in more detail in a future article - but hopefully you get a feel for what the process looks like.

In the case of our scenario, we said that we wanted the images to be tagged in SharePoint - and here's where we run into a consideration with AI Builder:

Power Apps AI Builder - great for SOME forms of image detection

The Object Detector capability allows us to detect whether a certain object is in an image or not, and how many times. However, our scenario demanded the capability to recognize *what* was happening in an image, not simply whether a predefined object is present or not! And that's all AI Builder provides here - a percentage certainty of whether your single object is present or not. This is much less flexible than other forms of AI image processing, and we'd need to somehow supplement this to achieve the goals of our application. After all, we can't provide an AI model with every known object in the universe.
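To make the limitation concrete, here is a minimal Python sketch. The shape of the detection results (a list of `tag`/`confidence` entries) and the threshold are assumptions for illustration only, not AI Builder's actual output format. The point is that a detector trained on one entity can only ever report that entity:

```python
from collections import Counter
from typing import Dict, List

# Hypothetical detector output: one entry per object found, using only the
# tags the model was trained on (a single "Bus" entity in this walkthrough).
sample_detections: List[Dict] = [
    {"tag": "Bus", "confidence": 0.93},
    {"tag": "Bus", "confidence": 0.71},
    {"tag": "Bus", "confidence": 0.18},
]


def count_detections(detections: List[Dict], min_confidence: float = 0.5) -> Counter:
    """Count detected objects per trained tag, ignoring low-confidence hits."""
    return Counter(d["tag"] for d in detections if d["confidence"] >= min_confidence)


counts = count_detections(sample_detections)
print(counts["Bus"])        # 2 (the 0.18 hit falls below the threshold)
print(counts["Ambulance"])  # 0 (never trained, so never reported)
```

An incident photo showing an ambulance or a crowd would simply yield no detections here, which is why this model alone can't triage arbitrary scenes.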

Option 2 - Azure Cognitive Services

Another way of bringing AI into an app is to plug directly into Azure Cognitive Services. As you might expect, this is a developer-centric approach which is more low-level - we're not in the Power Platform or other low-code framework here. The big advantage is that there's a wider array of capabilities to use. Compared to the other approaches discussed here, we're not restricted to whatever Microsoft have integrated into the Power Platform. The high-level areas of Cognitive Services currently extend to:
  • Decision - detect anomalies, do content moderation etc.
  • Language - services such as LUIS, text analytics (e.g. sentiment analysis, extract key phrases and entities), translation between 60+ languages
  • Speech - convert between text and speech (both directions), real-time speech translation from audio, speaker recognition etc.
  • Vision - Computer Vision (e.g. tag and describe images, recognize objects, celebrities, landmarks, brands, perform OCR, generate thumbnails etc.), form data extraction, ink/handwriting processing, video indexing, face recognition and more
    • NOTE - this is the service that's relevant to the scenario in this article, in particular the Computer Vision API's ability to tag and describe images
  • Web search - Bing autosuggest, Bing entity/image/news/visual/video search and more
In terms of what this looks like for our scenario, let's take the following image:


How the app would be built
If I write some code to consume the Computer Vision API and send the above image to it, I get a response that looks like this (notice the tags such as "person", "indoor", "ceiling", "event", "crowd" and so on):

The code I used to do this is:

Code sample:
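As an illustration only (this is not the listing used to produce the results above), here is a minimal Python sketch of calling the Computer Vision "analyze" REST endpoint for tags and a description. The endpoint URL, the key and the `v3.2` version segment are assumptions that depend on your own Azure resource, and the helper names are hypothetical:

```python
import json
import urllib.request

API_VERSION = "v3.2"  # assumption: use whichever version your resource supports


def build_analyze_request(endpoint: str, key: str, image_bytes: bytes) -> urllib.request.Request:
    """Build a POST request asking Computer Vision for tags and a description."""
    url = f"{endpoint.rstrip('/')}/vision/{API_VERSION}/analyze?visualFeatures=Tags,Description"
    headers = {
        "Ocp-Apim-Subscription-Key": key,            # key for your Cognitive Services resource
        "Content-Type": "application/octet-stream",  # raw image bytes go in the body
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")


def summarize_analysis(analysis: dict, min_confidence: float = 0.5) -> dict:
    """Reduce the JSON response to a caption plus the tags above a confidence threshold."""
    captions = analysis.get("description", {}).get("captions", [])
    caption = captions[0]["text"] if captions else ""
    tags = [t["name"] for t in analysis.get("tags", []) if t["confidence"] >= min_confidence]
    return {"caption": caption, "tags": tags}


def analyze_image(endpoint: str, key: str, image_bytes: bytes) -> dict:
    """Send the image to the service and return the summarized result (needs network access)."""
    request = build_analyze_request(endpoint, key, image_bytes)
    with urllib.request.urlopen(request) as response:
        return summarize_analysis(json.load(response))
```

Calling `analyze_image("https://<your-resource>.cognitiveservices.azure.com", "<your-key>", photo_bytes)` would return a dictionary with a `caption` string and a `tags` list, which is the information the incident app needs for triage.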

The relationship between Azure Cognitive Services and other options

Whilst we're talking about Cognitive Services, it's worth recognising of course that all of the options listed in this article use these services underneath. Power Apps AI Builder, the Power Automate activities discussed here, and many other facilities in Microsoft cloud technologies are all built on Azure Cognitive Services. When you're thinking about technology options, it's worth considering that the more direct your approach to Azure is, the cheaper it is likely to be.

Option 3 - Using Power Automate (Flow) to consume AI

The final option presented here is to create a Flow which will do the work of tagging and describing the incident report images. This is by far the easiest way I think, and is perhaps an overlooked approach for building AI into your apps - I recommend it highly. Power Automate is your friend. Note, however, that these are premium Flow actions - we'll cover licensing and pricing more in the next post, but for now understand that bringing AI capabilities this way does incur additional cost (as it does with the other two approaches).

In the scenario of our Power App for incident reporting, the simplest implementation is probably this:
  1. Power App uploads image to SharePoint document library
  2. Flow runs using the "SharePoint - when a file is created in a folder" trigger
  3. The Flow calls Azure Cognitive Services (using the native Flow actions for this)
  4. Once the tags and image descriptions have been obtained, they are written back to the file in SharePoint as metadata
The beauty is really in step 3. Since Microsoft provide hooks into many AI services as Flow actions, infusing AI into your app this way is extremely simple - no code is required, and it's as simple as lining up some actions in the Flow. Some skills and experience in Power Automate are certainly required, but the bar is much lower than for the other options.
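In the Flow itself this needs no code, but the logic of step 4 (and the triage it enables) can be sketched in Python for clarity. The column names and the list of alert tags below are purely hypothetical and would need to match your own SharePoint library and triage rules:

```python
from typing import Dict, List

# Hypothetical tags that should cause an incident to be escalated.
ALERT_TAGS = {"fire", "ambulance", "police", "smoke"}


def to_sharepoint_fields(caption: str, tags: List[str]) -> Dict[str, object]:
    """Map the AI output to (hypothetical) metadata columns on the uploaded file."""
    matched = sorted(ALERT_TAGS.intersection(tags))
    return {
        "ImageDescription": caption,      # text from the describe step
        "ImageTags": "; ".join(tags),     # tags from the tag step, in original order
        "NeedsReview": bool(matched),     # True if any alert tag was found
        "AlertReason": ", ".join(matched),
    }


fields = to_sharepoint_fields("a fire engine on a street", ["outdoor", "fire", "road", "smoke"])
print(fields["NeedsReview"])  # True
print(fields["AlertReason"])  # fire, smoke
```

A column such as `NeedsReview` could then drive a further Flow, for example one that alerts a particular team or requests a human review.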

Here's what my end-to-end Flow looks like:

In more specific terms, the trigger is on a new file being created in the SharePoint site and list where my Power App pushes the image to:

For each file that has been added, I get the file and then call:
  • Describe Image Content 
  • Tag Image
The Flow section below shows how we get the details of the file added to SharePoint to be able to pass the contents to the Azure actions for processing:
On the Power App side of course, I need something to use the camera on the device to take the photo and upload the file to SharePoint - but that's not too complex, and the capture part is just a question of adding the Power Apps camera control to my app. Of course, a major capability of Power Apps is being able to plug into device facilities such as camera, GPS and display functions, so it should be no surprise that's simple. If you remember, I showed my sample app briefly in the last post:



However, I do need to do some work to get my image into SharePoint once it has been captured - in my case I use the integration between Power Apps and Power Automate to do this. I create a Flow which uses the Power Apps trigger and ultimately uses the SharePoint "Create File" action. The important part though, is the Compose action in the middle which uses the "DataUriToBinary" function to translate the image data from how Power Apps captures it to how SharePoint needs it:

I then link the Flow to the Power App:

I can then use this in my formulas, as:

UpdateContext( { PictureFilename: "Incident_" & Text( Now(), "[$-en-US]yyyy-mm-dd-hh-mm-ss" ) & ".jpg" } );

IncidentPhotoProcessing.Run(PictureFilename, First(Photos).Url);

...and there we go, a fairly quick and easy way to get the photo for my incident into SharePoint so that the AI can do its processing.
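For anyone curious about what the "DataUriToBinary" step in the upload Flow is doing conceptually, here is a rough Python equivalent. It assumes the captured image arrives as a base64 data URI (for example `data:image/jpeg;base64,...`), and the function name is hypothetical:

```python
import base64


def data_uri_to_bytes(data_uri: str) -> bytes:
    """Decode a base64 data URI (e.g. 'data:image/jpeg;base64,...') into raw bytes."""
    # Split "data:<media type>;base64" from the encoded payload at the first comma.
    header, separator, payload = data_uri.partition(",")
    if not header.startswith("data:") or not separator:
        raise ValueError("not a data URI")
    if not header.endswith(";base64"):
        raise ValueError("only base64-encoded data URIs are handled in this sketch")
    return base64.b64decode(payload)


# "/9j/" is the base64 form of the first three bytes of a JPEG file.
jpeg_start = data_uri_to_bytes("data:image/jpeg;base64,/9j/")
print(jpeg_start == b"\xff\xd8\xff")  # True
```

The resulting bytes are what a file store such as SharePoint expects as file content, which is why the conversion sits between the Power Apps trigger and the "Create File" action.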

Summary


We've looked at three possible approaches in this post to building an Office 365 application which uses AI - Power Apps AI Builder, use of Azure Cognitive Services from code and use of actions in Power Automate which relate to AI. The findings can be summarized as:
  • Different skills are needed for each approach:-
    • Power Automate is the simplest to use because it provides actions which plug into AI easily - just build a Flow which can receive the image, and then use the Computer Vision actions shown above
    • Direct use of Azure Cognitive Services APIs requires coding skills (either use provided SDKs for .NET and JavaScript etc. or make your own REST requests to the Azure endpoints), but is a powerful approach since the full set of Microsoft AI capabilities is exposed
  • Capabilities are different across the options:-
    • Power Apps AI Builder has some constraints regarding our image processing scenario. The "object detection" model is great for identifying if a known object is present in the image, but can't help with identifying any arbitrary objects or concepts in the image
    • Azure Cognitive Services underpins all of the AI capabilities in Office 365, and offers many services not exposed in Power Apps or Power Automate. As a result, it offers the most flexibility and power, at the cost of more effort and different skills to implement
  • Requirements and context are important:-
    • In our scenario we're talking about a Power App which captures incident data and stores it in SharePoint - in other words, we've already specified that the front-end should be Power Apps. In this context, integrating Azure Cognitive Services directly would be a bit more challenging than the other two approaches (but is relatively simple from a coded application). In Power Apps, we'd need a custom connector to bring the code in (probably in the form of an Azure Function), and that's certainly more complex than staying purely in the Power Platform 
    • In the design process, other requirements could lead to one technical approach being much more appropriate than another. As another example, if the app needed a rich user interface that was difficult to build in Power Apps, the front-end may well be custom code. At this point, there's an argument for saying that using code for the content services back-end of the application makes sense too
As usual, having a team or partner who understands this landscape well and has capability across all options can lead to the best result. In this case, all options can be on the table rather than being limited to one or two because of skills. The architecture decision can be merit-based, considering the use cases and scenarios for the app, the AI capabilities needed, the user experience, the range of devices used, speed to implement and cost.

And cost, of course, is the element that we haven't covered yet! Let's discuss that in the next post: