Extract Text From Images Using Computer Vision API and Azure Functions
See how to extract text from images using Computer Vision API and Azure Functions.
Join the DZone community and get the full member experience.
Join For FreeI started to work on a project that is a combination of a lot of intelligent APIs and machine learning. One of the things I have to accomplish is to extract the text from the images that are being uploaded to the storage. To accomplish this part of the project, I plan to use the Microsoft Cognitive Service Computer Vision API. Here is the extract of it from my architecture diagram.
Let's get started by provisioning a new Azure Function.
I named my Function App quoteextractor
and selected the Hosting Plan as Consumption Plan
instead of App Service Plan
. If you choose Consumption Plan
, you will be billed only what you use or whenever your function executes. On the other hand, if you choose App Service Plan, you will be billed monthly based on the Service Plan you choose even if your function executes only a few times a month or not even executed at all. I have selected the Location
as West US
because the Cognitive Service subscription I have has the endpoint for the API from West US
. I then also created a new Storage Account
. Click on Create
button to create the Function App.
Now change the Storage account connection
, which is basically a connection string for your storage account. To create a new connection, click on the new
link and select the storage account you want your function to get associated with. The storage account I am selecting is the same one that got created at the time of creating the Function App.
Once you select the storage account, you will be able to see the connection key in the Storage account connection
dropdown. If you want to view the full connection string, then click the show value
link.
Click on Integrate
, and under Triggers
, update the Blob parameter name
from myBlob
to quoteImage
or whatever the name you feel like having. This name is important as this is the same parameter I will be using in my function. Click Save
to save the settings.
With this step, you should be done with all the configurations, which are required for Azure Function to work properly. Now let's get the Computer Vision API
subscription. Go to https://azure.microsoft.com/en-in/try/cognitive-services/, and under the Vision
section, click Get API Key
, which is next to Computer Vision API
. Agree with all the terms and conditions and continue. Log in with any of the preferred accounts to get the subscription ready. Once everything goes well, you will see your subscription key and endpoint details.
The result that I'm going to get as a response from the API is in JSON format. Either I can parse the JSON request in the function itself or I can save it directly in the CosmosDB or any other persistent storage. I am going to put all the response in CosmosDB in raw format. This is an optional step for you, as the idea is to see how easy it is to use Cognitive Services Computer Vision API with Azure Functions. If you are skipping this step, then you have to tweak the code a bit so that you can see the response in the log window of your function.
Now that we have everything ready, let's get started with the code!
Let's start by creating a new function called SaveResponse
, which takes the API response as one of the parameters and is of type string. In this method, I am going to use the Azure DocumentDB library to communicate with CosmosDB. This means I have to add this library as a reference to my function. To add this library as a reference, navigate to the KUDU portal by adding scm in the URL of your function app like so: https://quoteextractor.scm.azurewebsites.net/. This will open up the KUDU
portal.
mkdir bin
You can now see the bin
folder in the function app directory. Click the bin
folder and upload the lib(s).
In the very first line of the function, use the #r
directive to add the reference of external libraries or assemblies. Now you can use this library in the function with the help of using
directive.
#r "D:\home\site\wwwroot\QuoteExtractor\bin\Microsoft.Azure.Documents.Client.dll"
Right after adding the reference of the DocumentDB assembly, add namespaces with the help of using directive. In the later stage of the function, you will also need to make HTTP POST calls to the API endpoint and therefore make use of the System.Net.Http
namespace as well.
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using System.Net.Http;
using System.Net.Http.Headers;
The code for the SaveResponse
function is very simple and just makes use of the DocumentClient class to create a new document for the response we receive from the Vision API. Here is the complete code.
private static async Task<bool> SaveResponse(string APIResponse)
{
bool isSaved = false;
const string EndpointUrl = "https://imgdocdb.documents.azure.com:443/";
const string PrimaryKey = "QCIHMgrcoGIuW1w3zqoBrs2C9EIhRnxQCrYhZOVNQweGi5CEn94sIQJOHK3NleFYDoFclB7DwhYATRJwEiUPag==";
try
{
DocumentClient client = new DocumentClient(new Uri(EndpointUrl), PrimaryKey);
await client.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri("VisionAPIDB", "ImgRespCollcetion"), new {APIResponse});
isSaved = true;
}
catch(Exception)
{
//Do something useful here.
}
return isSaved;
}
private static async Task<string> ExtractText(Stream quoteImage, TraceWriter log)
{
string APIResponse = string.Empty;
string APIKEY = "b33f562505bd7cc4b37b5e44cb2d2a2b";
string Endpoint = "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/ocr";
HttpClient client = new HttpClient();
client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", APIKEY);
Endpoint = Endpoint + "?language=unk&detectOrientation=true";
byte[] imgArray = ConvertStreamToByteArray(quoteImage);
HttpResponseMessage response;
try
{
using(ByteArrayContent content = new ByteArrayContent(imgArray))
{
content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
response = await client.PostAsync(Endpoint, content);
APIResponse = await response.Content.ReadAsStringAsync();
log.Info(APIResponse);
//TODO: Perform check on the response and act accordingly.
}
}
catch(Exception)
{
log.Error("Error occured");
}
return APIResponse;
}
There are few things in the above function that need our attention. You will get the subscription key and the endpoint when you register for the Cognitive Services Computer Vision API. Notice the endpoint I am using also had ocr
at the end, which is important, as I want to read the text from the images I am uploading to the storage. The other parameters I am passing are the language
and detectOrientation
. Language
has the value as unk
, which stands for unknown
, which in turn tells the API to auto-detect the language, and detectOrientation
checks text orientation in the image. The API responds in JSON format, which this method returns back in the Run
function, and this output becomes the input for SaveResponse
method. Here is the code for ConvertStreamtoByteArray
method.
private static byte[] ConvertStreamToByteArray(Stream input)
{
byte[] buffer = new byte[16*1024];
using (MemoryStream ms = new MemoryStream())
{
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);
}
return ms.ToArray();
}
}
Now comes the method where all the things will go into, the Run method. Here is how the Run method looks:
public static async Task<bool> Run(Stream quoteImage, string name, TraceWriter log)
{
string response = await ExtractText(quoteImage, log);
bool isSaved = await SaveResponse(response);
return isSaved;
}
I don't think this method needs any kind of explanation. Let's execute the function by adding a new image to the storage and see if we are able to see the log in the Logs window and a new document in the CosmosDB. Here is the screenshot of the Logs
window after the function gets triggered when I uploaded a new image to the storage.
In CosmosDB, I can see a new document added to the collection. To view the complete document, navigate to Document Explorer
and check the results. This is what my view looks like:
To save you all a little time, here is the complete code.
#r "D:\home\site\wwwroot\QuoteExtractor\bin\Microsoft.Azure.Documents.Client.dll"
using System.Net.Http.Headers;
using System.Net.Http;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
public static async Task<bool> Run(Stream quoteImage, string name, TraceWriter log)
{
string response = await ExtractText(quoteImage, log);
bool isSaved = await SaveResponse(response);
return isSaved;
}
private static async Task<string> ExtractText(Stream quoteImage, TraceWriter log)
{
string APIResponse = string.Empty;
string APIKEY = "b33f562505bd7cc4b37b5e44cb2d2a2b";
string Endpoint = "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/ocr";
HttpClient client = new HttpClient();
client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", APIKEY);
Endpoint = Endpoint + "?language=unk&detectOrientation=true";
byte[] imgArray = ConvertStreamToByteArray(quoteImage);
HttpResponseMessage response;
try
{
using(ByteArrayContent content = new ByteArrayContent(imgArray))
{
content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
response = await client.PostAsync(Endpoint, content);
APIResponse = await response.Content.ReadAsStringAsync();
//TODO: Perform check on the response and act accordingly.
}
}
catch(Exception)
{
log.Error("Error occured");
}
return APIResponse;
}
private static async Task<bool> SaveResponse(string APIResponse)
{
bool isSaved = false;
const string EndpointUrl = "https://imgdocdb.documents.azure.com:443/";
const string PrimaryKey = "QCIHMgrcoGIuW1w3zqoBrs2C9EIhRnzZCrYhZOVNQweIi5CEn94sIQJOHK2NkeFYDoFcpB7DwhYATRJwEiUPbg==";
try
{
DocumentClient client = new DocumentClient(new Uri(EndpointUrl), PrimaryKey);
await client.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri("VisionAPIDB", "ImgRespCollcetion"), new {APIResponse});
isSaved = true;
}
catch(Exception)
{
//Do something useful here.
}
return isSaved;
}
private static byte[] ConvertStreamToByteArray(Stream input)
{
byte[] buffer = new byte[16*1024];
using (MemoryStream ms = new MemoryStream())
{
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
ms.Write(buffer, 0, read);
}
return ms.ToArray();
}
}
References
Published at DZone with permission of Prashant Khandelwal, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments