So… another Microsoft Ignite event! Today I want to make a quick post covering all the new stuff coming to Azure Cognitive Services.
What will you see in this post?
- Spatial Analysis – Computer Vision at the Edge (Read More)
- OCR – Computer Vision Read OCR On-Premise Containers (Read More)
- Form Recognizer v2 – Public Preview (Read More)
- Speech Containers (Read More)
- Metrics Advisor (Read More)
- Neural TTS – New language support, more voices and flexible deployment options (Read More)
Spatial Analysis – Computer Vision at the Edge (Read More)
We still live in a world where organizations use manual processes to understand their physical spaces and meet their business requirements. Computer Vision is introducing the new spatial analysis capability as a Cognitive Service with AI Edge container support. It runs Computer Vision AI on real-time video and offers the ability to understand people's movements in a physical space. The ready-to-use, high-quality AI models for spatial analysis are trained to understand people's movement across a wide variety of scenarios, camera types, angles, and lighting conditions. Typical operations include:
- Counting people in a space for maximum occupancy
- Understanding the distance between people for social distancing measures
- Determining footfall such as in retail spaces
- Understanding dwell time such as in front of a retail display or other designated location
- Determining wait time in a queue
- Determining when people are in a forbidden zone such as near industrial equipment
- Determining trespassing in protected areas
Its advanced AI models aggregate video from multiple cameras to count the number of people in the room, measure the distance between individuals, and monitor wait and dwell times.
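Since the container runs at the edge and publishes its insights as JSON events (typically routed through Azure IoT Hub), here is a minimal Python sketch of consuming those events downstream via IoT Hub's Event Hubs-compatible endpoint. The connection string, and the `personCountEvent` type and `personCount` property in the payload, are assumptions based on the person-counting scenario; adapt them to what your deployment actually emits.

```python
# A minimal sketch: reading spatial analysis events from the IoT Hub
# Event Hubs-compatible endpoint. Connection string and event shape
# are assumptions to adapt to your own deployment.
import json
from azure.eventhub import EventHubConsumerClient

# Hypothetical Event Hubs-compatible connection string of your IoT Hub.
CONNECTION_STR = "Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=...;EntityPath=..."

def on_event(partition_context, event):
    payload = json.loads(event.body_as_str())
    # Assumed payload shape for a person-counting operation.
    for evt in payload.get("events", []):
        if evt.get("type") == "personCountEvent":
            print("people in zone:", evt["properties"]["personCount"])

client = EventHubConsumerClient.from_connection_string(
    CONNECTION_STR, consumer_group="$Default"
)
with client:
    client.receive(on_event=on_event, starting_position="-1")  # from start
```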
OCR – Computer Vision Read OCR On-Premise Containers (Read More)
Microsoft’s Computer Vision OCR (Read) capability is available as a Cognitive Services Cloud API and as Docker containers.
Optical Character Recognition (OCR) is the foundational technology that drives the digitization of content today by extracting text from images, documents, and screens. There are several OCR technology providers that provide this capability as services, tools, and solutions, both in the cloud and for deployment within your environment.
The Read 3.1 container preview is the on-premises version of the same Read 3.1 cloud API preview. Moving forward, Read 3.1 and newer versions will add expanded language coverage and enhancements.
The Read 3.1 container preview includes everything that Read 3.0 has and adds the following capabilities (a call sketch follows the list):
- Support for Simplified Chinese and Japanese
- Print vs. handwriting appearance for each text line with confidence scores
- Extract text from only selected page(s) from a large multi-page document
- Unified code and architecture ensure the container will stay in step with future cloud API releases.
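As a quick illustration, here is a minimal Python sketch of calling the Read API against a locally running container. The host, port, and version path segment are assumptions; check the container documentation for the exact route your image exposes.

```python
# A minimal sketch of calling Read against a local container.
# Host/port and the version path are assumptions.
import time
import requests

ENDPOINT = "http://localhost:5000"  # assumed local container address
ANALYZE_URL = f"{ENDPOINT}/vision/v3.1/read/analyze"

with open("invoice.png", "rb") as f:
    resp = requests.post(
        ANALYZE_URL,
        headers={"Content-Type": "application/octet-stream"},
        # 'language' and 'pages' reflect the 3.1 features listed above.
        params={"language": "ja", "pages": "1-3"},
        data=f.read(),
    )
resp.raise_for_status()

# Read is asynchronous: poll the Operation-Location URL until done.
op_url = resp.headers["Operation-Location"]
while True:
    result = requests.get(op_url).json()
    if result["status"] in ("succeeded", "failed"):
        break
    time.sleep(1)

for page in result["analyzeResult"]["readResults"]:
    for line in page["lines"]:
        print(line["text"])
```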
Form Recognizer v2 – Public Preview (Read More)
The Form Recognizer service has added support for new and exciting features: multiple form models (Model Compose), language expansion, a pre-built business card model, selection mark detection, and lots more are now available in the Form Recognizer v2.1 release. Some of the new features:
- REST API reference is available.
- New languages supported for Layout and Train Custom Model: English (en), Chinese (Simplified) (zh-Hans), Dutch (nl), French (fr), German (de), Italian (it), Portuguese (pt), and Spanish (es).
- Checkbox / selection mark detection: Form Recognizer supports detection and extraction of selection marks such as check boxes and radio buttons. Selection marks are extracted in Layout, and you can now also label and train on them in Train Custom Model – Train with Labels to extract key-value pairs for selection marks.
- Model Compose: allows multiple models to be composed and called with a single model ID. When a document is submitted to be analyzed with a composed model ID, a classification step is first performed to route it to the correct custom model. Model Compose is available for Train Custom Model – Train with Labels.
- Model name: add a friendly name to your custom models for easier management and tracking.
- New pre-built model for business cards, extracting common fields from English-language business cards.
- New locales for pre-built receipts: in addition to EN-US, support is now available for EN-AU, EN-CA, EN-GB, and EN-IN.
- Quality improvements for Layout and for Train Custom Model – Train without Labels and Train with Labels.
- Table visualization: the sample labeling tool now displays tables that were recognized in the document. This lets you view the tables that have been recognized and extracted, prior to labeling and analyzing. This feature can be toggled on/off using the layers option.
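To make the business card model concrete, here is a minimal Python sketch against the REST API. The endpoint, key, and preview version segment (`v2.1-preview.1`) are placeholders/assumptions; substitute the values for your own Form Recognizer resource.

```python
# A minimal sketch of the prebuilt business card model via REST.
# Endpoint, key, and preview version segment are placeholders.
import time
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-key>"
ANALYZE_URL = f"{ENDPOINT}/formrecognizer/v2.1-preview.1/prebuilt/businessCard/analyze"

with open("business-card.jpg", "rb") as f:
    resp = requests.post(
        ANALYZE_URL,
        headers={
            "Ocp-Apim-Subscription-Key": KEY,
            "Content-Type": "image/jpeg",
        },
        data=f.read(),
    )
resp.raise_for_status()

# Analysis is asynchronous: poll the returned operation until done.
op_url = resp.headers["Operation-Location"]
while True:
    result = requests.get(op_url, headers={"Ocp-Apim-Subscription-Key": KEY}).json()
    if result["status"] in ("succeeded", "failed"):
        break
    time.sleep(1)

# Extracted fields, e.g. contact names, company names, emails.
fields = result["analyzeResult"]["documentResults"][0]["fields"]
for name, value in fields.items():
    print(name, "->", value.get("valueString") or value.get("valueArray"))
```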

Speech Containers (Read More)
Speech to Text and Text to Speech containers from Azure Cognitive Services are now Generally Available (GA). Using the combination of these containers, customers can build a speech application architecture that is optimized for both robust cloud capabilities and edge locality. Deploying your first container takes about two minutes: you create a resource in the Azure portal, download the image, and run the container with the required environment variables.
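For example, once a Speech to Text container is running locally (started with the usual `Eula`, `Billing`, and `ApiKey` settings), a minimal Python sketch with the Speech SDK looks like this; the `ws://localhost:5000` host is an assumption based on the default container port.

```python
# A minimal sketch: pointing the Speech SDK at a local speech-to-text
# container instead of the cloud endpoint. Assumes the container is
# already running and listening on localhost:5000.
import azure.cognitiveservices.speech as speechsdk

# With containers, you set "host" rather than subscription/region.
speech_config = speechsdk.SpeechConfig(host="ws://localhost:5000")
audio_config = speechsdk.audio.AudioConfig(filename="sample.wav")

recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config, audio_config=audio_config
)
result = recognizer.recognize_once()
print(result.text)
```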
Metrics Advisor (Read More)
Metrics Advisor is now part of Azure Cognitive Services to address the need for metrics intelligence. The service ingests data from various sources, uses machine learning to automatically find anomalies in sensor, product, and business metrics, and provides diagnostic insights. You are able to ingest data, find anomalies, and send alerts; it can also analyze the root cause, showing whether other metrics are causing or influencing the anomalies.
To get timely insights into their business, organizations need to monitor metrics proactively and quickly diagnose issues as they arise. Metrics Advisor, a new Azure Cognitive Service, helps customers do this through a powerful combination of real-time monitoring, auto-tuning AI models, alerting, and root cause analysis. It allows organizations to fix issues before they become significant problems. No machine learning expertise is required.
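As a rough sketch of what querying the service might look like with the preview `azure-ai-metricsadvisor` Python SDK (the endpoint, keys, and detection configuration ID below are placeholders, and exact method names may differ across preview versions):

```python
# A minimal sketch with the preview azure-ai-metricsadvisor SDK.
# All identifiers below are placeholders for your own resource.
import datetime
from azure.ai.metricsadvisor import MetricsAdvisorClient, MetricsAdvisorKeyCredential

credential = MetricsAdvisorKeyCredential("<subscription-key>", "<api-key>")
client = MetricsAdvisorClient(
    "https://<resource>.cognitiveservices.azure.com", credential
)

# List the anomalies a detection configuration found in September.
anomalies = client.list_anomalies(
    detection_configuration_id="<detection-config-id>",
    start_time=datetime.datetime(2020, 9, 1),
    end_time=datetime.datetime(2020, 9, 30),
)
for anomaly in anomalies:
    print(anomaly.timestamp, anomaly.severity)
```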
Neural TTS – New language support, more voices and flexible deployment options (Read More)
Neural Text to Speech (Neural TTS), a powerful speech synthesis capability of Cognitive Services on Azure, enables you to convert text to lifelike speech that is close to human parity. Neural TTS has now been extended to support 18 new languages/locales: Bulgarian, Czech, German (Austria), German (Switzerland), Greek, English (Ireland), French (Switzerland), Hebrew, Croatian, Hungarian, Indonesian, Malay, Romanian, Slovak, Slovenian, Tamil, Telugu, and Vietnamese. Fourteen additional voices have also been released to enrich the variety.
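Trying one of the new voices with the Speech SDK is a one-line change: you set the synthesis voice name on the config. The voice name below is a hypothetical placeholder; query the voices list for the real names available in your region.

```python
# A minimal sketch of synthesizing speech in one of the new locales.
# The voice name is a placeholder, not a real voice identifier.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="<your-region>")
# Hypothetical neural voice name for one of the 18 new locales.
speech_config.speech_synthesis_voice_name = "vi-VN-<SomeVoice>Neural"

# No audio_config given: output goes to the default speaker.
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_text_async("Xin chào!").get()
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized to the default speaker.")
```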
How Do Cognitive Services Containers Manage Connectivity and Billing?
The way these particular Azure Cognitive Services containers are set up, they require Internet connectivity to reach back to Azure for billing purposes. One solution is to create your own containers using the Azure Cognitive Services you need and run them on Azure IoT Edge.
Azure IoT Edge supports extended offline operations on your IoT Edge devices, and enables offline operations on non-IoT Edge child devices too. As long as an IoT Edge device has had one opportunity to connect to IoT Hub, it and any child devices can continue to function with intermittent or no internet connection.
Here is a tutorial that shows how to get up and running with Cognitive Services (Computer Vision), but you can swap in Speech and other services.

