Using Amazon Rekognition for OCR and Image Identification with the Go API

Recipes for OCR and Image Identification

by Hadley Bradley

Amazon Rekognition is a highly scalable, deep learning technology that lets you identify objects, people, and text within images and videos. It also provides highly accurate facial analysis and facial search capabilities. In this short article we’ll explore two use cases with the Go API: the first is to OCR (extract text from) an image, and the second is to identify the contents of an image.

The first step is to install the AWS software development kit (SDK) for Go, using the following command issued at a terminal or command prompt.

go get github.com/aws/aws-sdk-go/...

Once the AWS SDK has been installed, you’ll then need to import the relevant sections into your program to be able to interact with the Rekognition service.

package main

import (
	"bufio"
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/rekognition"
)

The first step in any interaction with an AWS service is to initialise a session by passing in your AWS access key and AWS secret key. For security reasons these aren’t included within the program but are obtained from environment variables. After calling the NewSession function we check the error state; if an error has occurred, we log the error message to the terminal and terminate the program.

    s, err := session.NewSession(&aws.Config{
        Region: aws.String("eu-west-2"),
        Credentials: credentials.NewStaticCredentials(
            os.Getenv("AccessKeyID"),
            os.Getenv("SecretAccessKey"),
            ""),
    })

    if err != nil {
        log.Fatal(err.Error())
    }

OCR — Optical Character Recognition

The image below is the lateral flow packaging for a Covid-19 test. Let’s assume we want to analyse this image and extract the text into a machine-readable format so that we can use the data within our application or database. This is a good test for the API, as the packaging is quite shiny and so reflects light in different ways.

Photo of LFA packaging

We initialise the Rekognition API and build a DetectTextInput struct. We need to pass in the raw bytes of the image we want to process. This can either be a reference to an S3 object (ideal for running this type of analysis as a Lambda function) or, as in this example, an image loaded from the local file system.

    svc := rekognition.New(s)
    input := &rekognition.DetectTextInput{
        Image: &rekognition.Image{Bytes: loadLocalFile("lfa.jpg")},
    }

The function to load the image from local disk is shown below.

func loadLocalFile(filename string) []byte {
    // Open the file, terminating with a logged error on failure.
    f, err := os.Open(filename)
    if err != nil {
        log.Fatal(err.Error())
    }
    defer f.Close()

    // Read the entire file contents into memory.
    reader := bufio.NewReader(f)
    data, err := ioutil.ReadAll(reader)
    if err != nil {
        log.Fatal(err.Error())
    }

    return data
}

We can then run the DetectText function and capture the results. Once we have the results we can loop over the entries and print out the text that has been detected within the image.

    result, err := svc.DetectText(input)
    if err != nil {
        log.Fatal(err.Error())
    }

    for _, entity := range result.TextDetections {
        if *entity.Type == "LINE" {
            fmt.Println(strings.TrimSpace(*entity.DetectedText))
        }
    }

Which produces the following output. As you can see, it’s done a very good job of detecting the text within the image.

INNOVA
1 Test
Antigen Test Cartridge
LOT X2010008
2020. 10. 30
2022. 10. 29
CE
INNOVA

As with all AWS Rekognition results, a confidence score is returned, which you can use to decide whether to trust a particular result. You can also return the bounding box coordinates for the detected text by accessing the Geometry values. I plan to do a follow-up article which demonstrates how to use these values to draw highlighting boxes back onto the original image.

    for _, entity := range result.TextDetections {
        if *entity.Type == "LINE" {
            fmt.Println(strings.TrimSpace(*entity.DetectedText))
            fmt.Println(*entity.Confidence)
            fmt.Println(*entity.Geometry.Polygon[0].X)
            fmt.Println(*entity.Geometry.Polygon[0].Y)
        }
    }

Image Identification

Let’s try using the image identification service to see what it makes of this image:

Photo of a Blue Tit bird sat on a fence

Preparing the input is slightly different this time. We use the DetectLabelsInput struct which, again, lets us load the image content from either an S3 bucket or a local file. However, DetectLabelsInput has two additional parameters to specify the maximum number of identifications (labels) and the minimum confidence score to use.

svc := rekognition.New(s)

input := &rekognition.DetectLabelsInput{
    Image:         &rekognition.Image{Bytes: loadLocalFile("input.jpg")},
    MaxLabels:     aws.Int64(123),
    MinConfidence: aws.Float64(70.000000),
}

result, err := svc.DetectLabels(input)
if err != nil {
    log.Fatal(err.Error())
}

for _, detectedLabel := range result.Labels {
    fmt.Println(*detectedLabel.Name)
}

Once we’ve defined the input, we call the DetectLabels function and loop over the results, printing out the label names. For the above image the following identifications were returned:

Bird
Animal
Jay, Finch
Blue Jay

Costs

At $1.16 for every 1,000 images, it would cost around $60 to categorise 50,000 images. So I’m going to look at categorising my entire photo collection so that I can search it by the generated labels.

Need Help or Advice?

If you need any help or advice using AWS Rekognition with Go then please get in touch; I’d be happy to help.