OCR


This page describes the API of the OCR (Optical Character Recognition) service and of the combination of OCR on images with ChatGPT (a large language model applied to the OCR results). The API follows the general rules of the Ximilar API as described in the section First steps.

If you are looking for a more detailed analysis of texts, you can use the /v2/read_gpt endpoint.

This endpoint supports many use cases, for example:

Datamining information from invoices.
Automatically reading texts from trading cards, posters, comics, ...
Reading nutrition information from labels on food products, ...

Specify the prompt field (str) in the record. This prompt is sent to the ChatGPT API together with full_text (the text read from the image by OCR).
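As a minimal sketch of the record shape described above, the following Python builds a request body for /v2/read_gpt. The helper name make_gpt_record, the example image URL, and the prompt text are illustrative assumptions, not part of the API; only the _url and prompt fields come from this page.

```python
import json

# Endpoint URL as listed in the Endpoints section of this page.
READ_GPT_URL = "https://api.ximilar.com/ocr/v2/read_gpt"


def make_gpt_record(image_url, prompt):
    """Build one record carrying an image URL and a prompt (hypothetical helper)."""
    if not isinstance(prompt, str):
        raise TypeError("prompt must be a string")
    return {"_url": image_url, "prompt": prompt}


# Example values below are placeholders for illustration only.
payload = {
    "records": [
        make_gpt_record(
            "https://example.com/invoice.jpg",
            "Return the invoice number and the total amount.",
        )
    ]
}
body = json.dumps(payload)  # JSON string to send as the POST body
```

The body would be sent with the same Content-Type and Authorization headers as a /v2/read request.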

This service also supports an asynchronous API with webhooks. If you need to process a large number of requests per day or month (millions of requests) and do not need the results immediately, use our asynchronous requests with webhooks.

Endpoints

This service API has two endpoints running at URLs:

https://api.ximilar.com/ocr/v2/read      (for basic OCR reading)
https://api.ximilar.com/ocr/v2/read_gpt  (for OCR reading with GPT analysis)

POST /v2/read

OCR Reading


Given a list of image records, this method returns the text read from the images by the OCR system. For each image it predicts the positions of text (each defined by a polygon) and reads the text according to the selected language. The result is stored in the _ocr field.

Required attributes

Name: records
Type: dict
Description: A batch of JSON records (max 10). Each record represents one image and is defined by _url or _base64.
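The record format above can be sketched in Python as follows. The helper names make_record and make_batches are hypothetical, and encoding the raw image bytes as standard Base64 text for _base64 is an assumption about what the field expects. The batch size of 10 comes from the description above.

```python
import base64

MAX_BATCH = 10  # maximum number of records per request, as stated above


def make_record(url=None, image_bytes=None):
    """Build one record from either an image URL or raw image bytes."""
    if (url is None) == (image_bytes is None):
        raise ValueError("provide exactly one of url or image_bytes")
    if url is not None:
        return {"_url": url}
    # Assumption: _base64 carries the image file encoded as Base64 text.
    return {"_base64": base64.b64encode(image_bytes).decode("ascii")}


def make_batches(records, size=MAX_BATCH):
    """Split a list of records into chunks of at most `size` records."""
    return [records[i:i + size] for i in range(0, len(records), size)]
```

Each chunk returned by make_batches can then be sent as the records field of one request.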

Optional attributes

Name: lang
Type: string
Default: en
Description: Language setting for the OCR model. Currently supported: "en", "zh" (Chinese), "ko" (Korean), "ja" (Japanese), "ru" (Russian).

Returns

HTTP status code 2XX if the method succeeded, and another HTTP status code if the method failed. The body of the response is a JSON object (map) with the following fields:

Name: records
Type: dict
Description: JSON array with the input records, each record enriched by the field _ocr containing detected text with positions and probabilities.

Name: status
Type: dict
Description: A JSON map/dictionary with the status of the method processing. It contains these subfields: code (numeric code of the operation status; it follows the concept of HTTP status codes) and text (text describing the status code).
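A minimal sketch of checking these status fields in Python, assuming the response body has already been parsed into a dictionary. The function name failed_records is hypothetical. The per-record _status field appears in the example response on this page; treating it as able to differ from the top-level status is an assumption.

```python
def is_ok(code):
    """Return True for codes in the 2XX range."""
    return isinstance(code, int) and 200 <= code < 300


def failed_records(response):
    """Raise if the whole call failed, else return records whose own status is not 2XX."""
    status = response.get("status", {})
    if not is_ok(status.get("code")):
        raise RuntimeError(f"request failed: {status.get('code')} {status.get('text')}")
    return [
        record
        for record in response.get("records", [])
        # Assumption: each record may carry its own _status with a code subfield.
        if not is_ok(record.get("_status", {}).get("code"))
    ]
```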

Request

curl https://api.ximilar.com/ocr/v2/read -H "Content-Type: application/json" -H "Authorization: Token __API_TOKEN__" -d '{
  "lang": "en",
  "records": [
    {
      "_url": "__PATH_TO_IMAGE_URL__"
    }
  ]
}'
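The same request can be sketched in Python with only the standard library. The function names build_read_request and read_text are hypothetical, and the 60 second timeout is an arbitrary choice; the URL, headers, and body mirror the curl command above.

```python
import json
import urllib.request

READ_URL = "https://api.ximilar.com/ocr/v2/read"


def build_read_request(token, image_urls, lang="en"):
    """Prepare (but do not send) a POST request equivalent to the curl call."""
    payload = {"lang": lang, "records": [{"_url": url} for url in image_urls]}
    return urllib.request.Request(
        READ_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="POST",
    )


def read_text(token, image_urls, lang="en"):
    """Send the request and return the parsed JSON response body."""
    request = build_read_request(token, image_urls, lang)
    # urlopen raises urllib.error.HTTPError for non-2XX responses.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling read_text("__API_TOKEN__", ["__PATH_TO_IMAGE_URL__"]) with real values performs the network call. Separating request construction from sending makes the payload easy to inspect without contacting the service.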

Response

{
  "records": [
    {
      "_url": "https://images.ximilar.com/examples/cards/mew_pokemon.jpeg",
      "_status": {
        "code": 200,
        "text": "OK",
        "request_id": "f9cec41f-dd6e-4aa9-93b3-4f2510fd5eb6"
      },
      "_id": "4d903663-a348-4d9b-be76-4df38df7d66a",
      "_width": 1140,
      "_height": 1520,
      "_ocr": {
        "texts": [
          {
            "polygon": [
              [285.0, 221.0],
              [384.0, 229.0],
              [382.0, 256.0],
              [283.0, 248.0]
            ],
            "text": "BASIC",
            "prob": 0.9564334750175476
          }
        ],
        "full_text": "BASIC MewV 180 ASTRIKE EnergyMix...",
        "lang": "en",
        "lang_name": "english"
      }
    }
  ],
  "status": {
    "code": 200,
    "text": "OK",
    "request_id": "f9cec41f-dd6e-4aa9-93b3-4f2510fd5eb6",
    "proc_id": "1a088e14-714a-41bd-9a89-4d732becafcd"
  },
  "statistics": {
    "processing time": 5.054640769958496
  }
}
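To illustrate how the _ocr field in a response like the one above might be consumed, the following Python sketch keeps only confidently read text fragments and reduces each polygon to an axis-aligned bounding box. The function names and the 0.9 probability threshold are assumptions chosen for illustration.

```python
def polygon_bbox(polygon):
    """Return (min_x, min_y, max_x, max_y) for a list of [x, y] points."""
    xs = [point[0] for point in polygon]
    ys = [point[1] for point in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def confident_texts(response, min_prob=0.9):
    """Collect (text, bounding box) pairs whose prob is at least min_prob."""
    results = []
    for record in response.get("records", []):
        for item in record.get("_ocr", {}).get("texts", []):
            if item["prob"] >= min_prob:
                results.append((item["text"], polygon_bbox(item["polygon"])))
    return results


# A trimmed copy of the example response above.
sample = {
    "records": [
        {
            "_ocr": {
                "texts": [
                    {
                        "polygon": [[285.0, 221.0], [384.0, 229.0], [382.0, 256.0], [283.0, 248.0]],
                        "text": "BASIC",
                        "prob": 0.9564334750175476,
                    }
                ],
                "full_text": "BASIC MewV 180 ASTRIKE EnergyMix...",
            }
        }
    ]
}
print(confident_texts(sample))  # [('BASIC', (283.0, 221.0, 384.0, 256.0))]
```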

POST /v2/read_gpt

OCR Reading with GPT Analysis