Receipt Optical Character Recognition (OCR)

Asprise Receipt OCR detects and extracts receipt information from images.

To perform receipt OCR, send a POST request containing the image file and any optional parameters to one of the API endpoints below. The API returns the results as JSON within seconds.

Receipt OCR endpoints

  • http://ocr.asprise.com/api/v1/receipt (HTTP)

  • https://ocr.asprise.com/api/v1/receipt (HTTPS)

  • http://ocr2.asprise.com/api/v1/receipt (HTTP - backup)

  • https://ocr2.asprise.com/api/v1/receipt (HTTPS - backup)

You may perform receipt OCR from Windows, macOS and Linux command consoles or from any of your favorite programming languages.

The example below shows how to OCR a receipt from the command line with curl. The same request can be made in C#, VB.NET, Java, JavaScript/Node.js, PHP or Python.

curl -X POST -F "api_key=TEST" -F "recognizer=auto" -F "ref_no=my_ref_123" -F "file=@receipt.jpg" https://ocr.asprise.com/api/v1/receipt

The complete source code of the receipt OCR sample programs in C#, Java, JavaScript, PHP and Python can be found at github.com/Asprise/receipt-ocr
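For readers who want to see what the curl command sends, here is a minimal Python sketch that builds the same multipart/form-data request using only the standard library. It is an illustration, not the official sample code from the GitHub repository. The function name build_receipt_ocr_request is hypothetical. The request is only constructed here, not sent.

```python
import mimetypes
import urllib.request
import uuid

# Primary HTTPS endpoint listed above (ocr2.asprise.com is the backup host).
RECEIPT_OCR_URL = "https://ocr.asprise.com/api/v1/receipt"


def build_receipt_ocr_request(image_bytes, file_name, api_key="TEST",
                              recognizer="auto", ref_no=None,
                              url=RECEIPT_OCR_URL):
    """Build, but do not send, a multipart/form-data POST like the curl example."""
    boundary = uuid.uuid4().hex
    fields = {"api_key": api_key, "recognizer": recognizer}
    if ref_no is not None:
        fields["ref_no"] = ref_no  # optional; copied back into the response

    parts = []
    for name, value in fields.items():
        # One part per plain-text form field (curl's -F "name=value").
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The image goes into the "file" field, like curl's -F "file=@receipt.jpg".
    content_type = mimetypes.guess_type(file_name)[0] or "application/octet-stream"
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n".encode("utf-8")
        + image_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary

    return urllib.request.Request(
        url,
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, read the image in binary mode, pass the resulting request to urllib.request.urlopen, and decode the JSON body with json.load.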

Request Parameters

When sending a receipt OCR request, you may pass along the following parameters:

  • api_key (string, required)

  • recognizer (string, required)

  • file (file, required)

  • ref_no (string, optional)

  • mapping_rule_set (string, optional)

api_key

api_key (string, required) is used to identify the client who makes the OCR request. If you don’t have one, you may simply set it to TEST.

recognizer

A recognizer is implemented as a set of machine learning algorithms that optimize receipt recognition for a particular country or a specific scenario.

You use recognizer (string, required) to select the recognizer to be used for the given receipt.

Country-specific recognizers offered by the OCR API:

  • AU for recognizing receipts from Australia

  • DE for recognizing receipts from Germany

  • GB for recognizing receipts from the United Kingdom

  • JP for recognizing receipts from Japan

  • MY for recognizing receipts from Malaysia

  • SG for recognizing receipts from Singapore

  • US for recognizing receipts from the United States

If all the receipts you need to recognize are from a single country, simply set recognizer to one of the values above.

If you need to recognize receipts from a country not in the above list, please contact us so that we can add it for you.

Multiple countries

If the receipts come from two or more countries, set recognizer to a comma-separated list of country codes. For example, if the receipts are from either Germany or the UK, set recognizer to DE,GB.

When a receipt is detected, the OCR API will first select the top match from the list of the recognizers. The selected recognizer is then used to recognize the receipt.

“auto”

When recognizer is set to auto, the OCR API will try to find a top match from all of the available recognizers.

This is a convenient value if you aren’t sure where a receipt is from. However, it comes at a cost: it is usually slower, because the OCR API needs to find a match among all the recognizers. Always specify a recognizer or a list of recognizers if you can.
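The recognizer value can take three forms: a single country code, a comma-separated list of codes, or auto. The hypothetical helper below builds that string from a list of codes. It normalizes case, drops duplicates, and rejects codes that are not in the list above. The known codes are hard-coded to mirror this page, so the set would need extending if Asprise adds a recognizer for you.

```python
# Country codes listed in this section; "auto" lets the API choose among all of them.
KNOWN_RECOGNIZERS = {"AU", "DE", "GB", "JP", "MY", "SG", "US"}


def recognizer_value(countries=None):
    """Return the value for the recognizer parameter.

    None or an empty list gives "auto"; otherwise a comma-separated list of codes.
    """
    if not countries:
        return "auto"
    codes = []
    for code in countries:
        code = code.strip().upper()
        if code not in KNOWN_RECOGNIZERS:
            raise ValueError(f"unknown recognizer code: {code!r}")
        if code not in codes:  # drop duplicates, keep the caller's order
            codes.append(code)
    return ",".join(codes)
```

For example, recognizer_value(["de", "GB"]) returns "DE,GB", and recognizer_value() returns "auto".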

file

This is the image file that contains one or more receipts. Supported file formats:

  • JPEG

  • PNG

  • PDF

  • TIFF
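Only JPEG, PNG, PDF and TIFF are supported, so you may want to check a file locally before uploading it. The minimal sketch below recognizes the four formats by their standard file signatures (magic bytes). The helper name is hypothetical. The API's own validation may accept or reject files differently.

```python
# Standard file signatures ("magic bytes") of the four supported formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]


def detect_supported_format(head):
    """Return "JPEG", "PNG", "PDF" or "TIFF" from the leading bytes, else None."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return None
```

The first 8 bytes of a file are enough for all four checks. You could read them in binary mode and skip the upload when the helper returns None.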

ref_no

You can use ref_no (string, optional) to identify an OCR request for your own reference. The ref_no from the request is copied to the response; it does not affect the OCR process in any way.

mapping_rule_set

Mapping rule sets can be used to post-process receipts after they have been recognized. For example, a mall operator may use a mapping rule set to identify each store accurately by matching the merchant address (unit number) or phone number, and then set a custom merchant ID property accordingly.

A mapping rule set defines a set of rules. Each rule defines matching criteria and properties to be set if a receipt is matched.

You use mapping_rule_set (string, optional) to specify the id of the receipt mapping rule set that should be applied to a receipt after it has been recognized.

Before you can use mapping_rule_set to specify a rule set, you must first define it and then submit it to the OCR API.

Define mapping rule sets

A mapping rule set is represented in JSON. Below is a sample:

{
  "mapping_rule_set_id": "MY_MALL",
  "rules": [
    {
      "matching" : {
        "merchant_name": "MCD",
        "merchant_tax_reg_no" : "TAX1234"
      },
      "set_props": {
        "merchant_name": "McDonald's Restaurant",
        "my_custom_store_id": "mcdonald_123",
        "my_custom_prop": "US Food"
      }
    },
    {
      "matching" : {
        "merchant_phone": "6362"
      },
      "set_props": {
        "merchant_name": "Another Great Store"
      }
    }
  ]
}

A rule set has two top-level properties:

  • mapping_rule_set_id (string, required, minimum 2 characters) the ID of the rule set

  • rules the list of rules, as an array

Each rule object contains two main parts:

matching (the matching criteria) contains pairs of receipt property names and keywords. At runtime, the OCR API checks each pair to see whether the value of the corresponding receipt property contains the keyword (case-insensitive). Each keyword must be at least 2 characters long. By default, a rule matches a receipt if at least one of these pairs matches.

For the list of supported property names, please refer to Receipt Object.

set_props contains properties to be set only if a receipt is matched. The properties can be either the standard receipt properties as defined in Receipt Object or your own custom properties.

By default, the OCR API stops further matching once a rule has matched a receipt. You can change this behavior with stop_if_matched, whose default value for a rule is true. Setting it to false allows the OCR API to continue matching even if the current rule has matched.
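You can simulate the matching behavior described above locally, for example to see what a rule set will do to a sample receipt before submitting it. The sketch below covers only the documented semantics: a case-insensitive substring match, any one pair being enough, and stop_if_matched defaulting to true. It is not Asprise's implementation, and the function names are hypothetical. It also assumes two things: rules are tried in array order, and each rule is tested against the receipt as recognized rather than against properties set by earlier rules.

```python
def rule_matches(receipt, matching):
    """True if at least one property value contains its keyword, ignoring case."""
    for prop, keyword in matching.items():
        value = receipt.get(prop)
        if value is not None and keyword.lower() in str(value).lower():
            return True
    return False


def apply_rules(receipt, rules):
    """Return a copy of the receipt with set_props of matched rules applied in order."""
    result = dict(receipt)
    for rule in rules:
        # Assumption: rules are tested against the receipt as recognized,
        # not against properties set by earlier rules.
        if rule_matches(receipt, rule.get("matching", {})):
            result.update(rule.get("set_props", {}))
            if rule.get("stop_if_matched", True):  # documented default: true
                break
    return result
```

With the sample rule set, a receipt whose merchant_name contains "mcd" gets the first rule's set_props, and matching stops there. This happens even though the sample receipt's phone number 62596362 also contains "6362".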

Draft each receipt mapping rule set in a JSON file. Keeping these rule set files in a version control system such as Git helps you track all changes.

To get started, you may refer to the following samples: Sample 1 | Sample 2

When writing rules, we recommend using an editor that provides code assist and validation against the JSON schema: Receipt Mapping Rule Set JSON Schema. Attributes that are not defined in the schema are silently ignored; no error is thrown.
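Unknown attributes are ignored without an error, so a quick local check before submission can catch mistakes early. The sketch below parses a rule set file's text and reports violations of the constraints stated on this page. Those constraints are: valid JSON, an ID of at least 2 characters, a rules array, keywords of at least 2 characters, and a boolean stop_if_matched. The function name is hypothetical. It is not a replacement for validating against the official JSON schema.

```python
import json


def check_rule_set_json(text):
    """Parse rule set JSON text and return a list of problems (empty if none found).

    Only constraints stated in this documentation are checked; this is not
    a substitute for validating against the official JSON schema.
    """
    try:
        rule_set = json.loads(text)
    except json.JSONDecodeError as exc:  # e.g. a trailing comma after the last property
        return [f"invalid JSON: {exc}"]
    if not isinstance(rule_set, dict):
        return ["the rule set must be a JSON object"]

    problems = []
    rule_set_id = rule_set.get("mapping_rule_set_id")
    if not isinstance(rule_set_id, str) or len(rule_set_id) < 2:
        problems.append("mapping_rule_set_id must be a string of at least 2 characters")
    rules = rule_set.get("rules")
    if not isinstance(rules, list):
        problems.append("rules must be an array")
        return problems

    for i, rule in enumerate(rules):
        if not isinstance(rule, dict):
            problems.append(f"rules[{i}] must be an object")
            continue
        matching = rule.get("matching")
        if not isinstance(matching, dict) or not matching:
            problems.append(f"rules[{i}].matching must be a non-empty object")
            continue
        for prop, keyword in matching.items():
            if not isinstance(keyword, str) or len(keyword) < 2:
                problems.append(f"rules[{i}].matching.{prop}: keyword needs at least 2 characters")
        if not isinstance(rule.get("stop_if_matched", True), bool):
            problems.append(f"rules[{i}].stop_if_matched must be true or false")
    return problems
```

Python's json module rejects trailing commas, so the helper also flags the most common hand-editing mistake in rule set files.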

Submit mapping rule sets

After defining a rule set, you need to submit it to the OCR API to take effect.

Using Web GUI

Visit the web GUI URL we provided to you:

Receipt OCR Mapping GUI

Enter your API key, keep the endpoint URL unchanged, select mapping_rule_set_update as the action, and copy your entire rule set into the content box (alternatively, drag and drop your JSON file onto the web page to set the content). Then click ‘Execute Action’.

Once submitted, a rule set takes effect immediately. To delete it, submit a rule set with no rules defined (mapping_rule_set_id must still be present).

Using the REST API

If you update rule sets frequently or want to automate the process, you can use the REST API instead.

To create or update a receipt mapping rule set, please make a POST request to https://ocr.asprise.com/api/v1/receipt with the following parameters:

  • api_key your API key

  • action must be set to mapping_rule_set_update

  • content the entire content of the rule set
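Here is a minimal Python sketch of that request. It uses only the standard library and builds the request without sending it. Two assumptions apply. First, the fields are sent URL-encoded (application/x-www-form-urlencoded), because this page does not state the encoding; switch to multipart/form-data if the API requires it. Second, the function name is hypothetical.

```python
import json
import urllib.parse
import urllib.request

RECEIPT_OCR_URL = "https://ocr.asprise.com/api/v1/receipt"


def build_rule_set_update_request(api_key, rule_set, url=RECEIPT_OCR_URL):
    """Build, but do not send, a POST that creates or updates a mapping rule set.

    rule_set may be a dict or the JSON text of the rule set file.
    Assumption: the API accepts URL-encoded form fields for this action.
    """
    content = rule_set if isinstance(rule_set, str) else json.dumps(rule_set)
    data = urllib.parse.urlencode({
        "api_key": api_key,
        "action": "mapping_rule_set_update",
        "content": content,  # the entire rule set as JSON text
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the request to urllib.request.urlopen submits it. The format of the response to this action is not described on this page.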

Receipt OCR Results

When the OCR API receives a receipt OCR request, it processes the request. On failure (e.g., missing required request parameters), it responds with an error message and HTTP status code 400 (Bad Request). On success, it returns the result as JSON with HTTP status code 200.

Below is a sample JSON result:

{
  "request_id" : "...",
  "ref_no" : "123",
  "file_name" : "receipt.jpg",
  "request_received_on" : 1610077103664,
  "success" : true,
  "recognition_completed_on" : 1610077104172,
  "receipts" : [ {
    "merchant_name" : "Merchant A", // receipt object #1
   ...
  }, {
    "merchant_name" : "Merchant B", // receipt object #2
   ...
  } ]
}

Top level result properties include:

  • request_id System generated ID

  • ref_no the reference number passed in the request by the client

  • file_name Name of the uploaded image file

  • request_received_on Epoch time in milliseconds when the request is received

  • success Whether the OCR is performed successfully

  • recognition_completed_on Epoch time in milliseconds when the OCR is complete

  • receipts an array of receipt objects; each receipt is represented by a receipt object
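The sketch below handles a response the way this section describes it. A non-200 status carries an error message. A 200 status carries a JSON result whose timestamps are epoch milliseconds. Raising RuntimeError and treating a false success flag as a failure are choices made for illustration, not documented API behavior. The function names are hypothetical.

```python
import json
from datetime import datetime, timezone


def summarize_result(status_code, body):
    """Check an OCR response and return (ref_no, processing_ms, receipts).

    Raises RuntimeError for non-200 responses or when success is false.
    """
    if status_code != 200:
        # A 400 (Bad Request) response carries an error message, not a result object.
        raise RuntimeError(f"OCR request failed with HTTP {status_code}: {body}")
    result = json.loads(body)
    if not result.get("success"):
        raise RuntimeError(f"OCR was not successful: {result}")
    # Both timestamps are epoch milliseconds, so their difference is in ms.
    processing_ms = result["recognition_completed_on"] - result["request_received_on"]
    return result.get("ref_no"), processing_ms, result.get("receipts", [])


def epoch_ms_to_utc(ms):
    """Convert an epoch-milliseconds timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```

With the sample result above, the processing time is 1610077104172 - 1610077103664 = 508 ms. epoch_ms_to_utc(1610077103664) falls on 8 January 2021 (UTC).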

Receipt Object

A receipt object has many properties and contains its line items. Sample receipt object:

{
 "merchant_name" : "McDonald's",
 "merchant_address" : "600 @ Toa Payoh #01-02, Singapore 319515",
 "merchant_phone" : "62596362",
 "merchant_website" : null,
 "merchant_tax_reg_no" : "M2-0023981-4",
 "merchant_company_reg_no" : null,
 "merchant_logo" : null,
 "region" : null,
 "mall" : "600 @ Toa Payoh",
 "country" : "SG",
 "receipt_no" : "002201330026",
 "date" : "2016-01-13",
 "time" : "15:49",
 "items" : [ {
   "amount" : 2.95,
   "description" : "Med Ice Lemon Tea",
   "flags" : "",
   "qty" : 1,
   "remarks" : null,
   "unitPrice" : null
 }, {
   "amount" : 2.40,
   "description" : "Coffee with Milk",
   "flags" : "",
   "qty" : 1,
   "remarks" : null,
   "unitPrice" : null
 } ],
 "currency" : "SGD",
 "total" : 5.35,
 "subtotal" : null,
 "tax" : 0.35,
 "service_charge" : null,
 "tip" : null,
 "payment_method" : "cash",
 "payment_details" : null,
 "credit_card_type" : null,
 "credit_card_number" : null,
 "ocr_text" : "...",
 "ocr_confidence" : 96.82,
 "width" : 1940,
 "height" : 2395,
 "avg_char_width" : null,
 "avg_line_height" : null,
 "source_locations" : {
   "date" : [ [
     { "x" : 1024, "y" : 1396 },
     { "x" : 1971, "y" : 1390 },
     { "x" : 1972, "y" : 1522 },
     { "x" : 1024, "y" : 1528 }
    ] ],
   "total" : [ [
     { "x" : 1909, "y" : 1958 },
     { "x" : 2123, "y" : 1955 },
     { "x" : 2124, "y" : 2057 },
     { "x" : 1910, "y" : 2060 }
    ] ]
  }
}

Receipt object properties include:

  • merchant_name Name of the merchant

  • merchant_address Address of the merchant

  • merchant_phone Phone number of the merchant

  • merchant_website Website of the merchant, if any

  • merchant_tax_reg_no Tax registration number

  • merchant_company_reg_no Company registration number

  • merchant_logo URL of the merchant logo image

  • region Region or area

  • mall Name of the mall

  • country Two-letter country code

  • receipt_no Receipt number (can be used for duplicate detection)

  • date Date of the receipt

  • time Time of the receipt, if available

  • items An array of line item objects (see below for more details)

  • currency Currency used

  • total Total amount

  • subtotal Subtotal amount

  • tax Tax amount

  • service_charge Service charge amount

  • tip Tip amount

  • payment_method Payment method: cash, credit card, etc.

  • payment_details Payment details

  • credit_card_type Credit card type: amex, master, visa

  • credit_card_number Usually the last 4 digits of the credit card number

  • ocr_text The complete OCR text, with the layout retained

  • ocr_confidence A number less than 100; the higher, the better

  • width Width of the input image in pixels

  • height Height of the input image in pixels

  • source_locations Map of key fields to the polygon locations from which their values were retrieved

Note that not all properties are present on all receipts.
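Judging by the sample receipt object above, each source_locations entry maps a field name to an array of polygons. Each polygon is an array of x/y points. To highlight or crop a field in the original image, the axis-aligned bounding box of each polygon is often enough. Here is a minimal sketch with hypothetical helper names. It tolerates a missing or null source_locations, because not every property is present on every receipt.

```python
def bounding_box(polygon):
    """Return (left, top, right, bottom) enclosing a polygon of {"x", "y"} points."""
    xs = [point["x"] for point in polygon]
    ys = [point["y"] for point in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def field_boxes(receipt, field):
    """Bounding boxes of every polygon recorded for a field (empty list if none)."""
    locations = receipt.get("source_locations") or {}
    return [bounding_box(polygon) for polygon in locations.get(field) or []]
```

For the date polygon in the sample receipt object, field_boxes returns [(1024, 1390, 1972, 1528)].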

Line Item Object

Properties of a line item object:

  • amount Amount of the line item

  • description Description of the line item

  • flags Text after the amount indicating tax status

  • qty Quantity

  • remarks Remarks

  • unitPrice Unit price
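A simple consistency check on a recognized receipt is to compare the sum of the line item amounts with total. In the sample receipt object, 2.95 + 2.40 = 5.35, which equals total, so the tax of 0.35 is already included in the item amounts there. On other receipts, tax, service_charge or tip may be added on top of the items. A non-zero difference is therefore a prompt for review rather than proof of an OCR error. The sketch below uses Decimal to avoid binary floating-point rounding. The helper names are hypothetical.

```python
from decimal import Decimal


def to_decimal(value):
    """Convert a JSON number to Decimal via str() to avoid binary float artifacts."""
    return Decimal(str(value))


def items_total_difference(receipt):
    """Return total minus the sum of line item amounts, or None if data is missing."""
    total = receipt.get("total")
    amounts = [item.get("amount") for item in receipt.get("items") or []]
    if total is None or not amounts or any(a is None for a in amounts):
        return None
    return to_decimal(total) - sum((to_decimal(a) for a in amounts), Decimal("0"))
```

For the sample receipt object, the function returns a difference of zero.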

If you need to recognize other properties, please get in touch with us.