OCR

This page describes the API for the OCR (Optical Character Recognition) service and for combining OCR on images with ChatGPT (a large language model applied to the OCR results). The API follows the general rules of the Ximilar API described in the First steps section.

Endpoints

The service has two endpoints:

https://api.ximilar.com/ocr/v2/read      (for basic OCR reading)
https://api.ximilar.com/ocr/v2/read_gpt  (for OCR reading with GPT analysis)

POST /v2/read

OCR Reading

Quickstart

Given a list of image records, this method reads the text in the images with the OCR system. For each image, it detects the positions of text regions (each defined by a polygon) and reads the text in the selected language. The result is stored in the _ocr field of each record.

Required attributes

  • Name: records
    Type: array
    Description: A batch of JSON records (at most 10). Each record represents one image and is defined by either an _url or a _base64 field.
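The records list can mix URL-based and file-based images. The following Python sketch builds such records and splits a longer list into request-sized batches of at most 10. It assumes that _base64 takes the plain Base64 encoding of the file bytes (not a data: URI); make_record and batch_records are hypothetical helper names, not part of any Ximilar SDK.

```python
import base64
from pathlib import Path


def make_record(source: str) -> dict:
    """Build one image record for the "records" list.

    Sources starting with http:// or https:// are sent as _url;
    anything else is read as a local file and sent as _base64.
    """
    if source.startswith(("http://", "https://")):
        return {"_url": source}
    data = Path(source).read_bytes()
    # Assumption: _base64 expects the plain Base64 string of the raw file bytes.
    return {"_base64": base64.b64encode(data).decode("ascii")}


def batch_records(records: list[dict], batch_size: int = 10) -> list[list[dict]]:
    """Split records into consecutive batches of at most batch_size (the API allows max 10)."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]
```

Each batch can then be sent as the records field of a separate request.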

Optional attributes

  • Name: lang
    Type: string
    Default: en
    Description: Language of the OCR model. Currently supported values: "en" (English), "zh" (Chinese), "ko" (Korean), "ja" (Japanese), "ru" (Russian).
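Before sending a request, a client can check the lang value and the batch size locally. A minimal Python sketch, assuming the set of language codes listed above (the page says "currently", so the set may change); treating an empty batch as an error is a choice made here, not a documented rule. build_read_payload is a hypothetical helper, and the server still performs its own validation.

```python
import json

# Language codes listed in this documentation (may change over time).
SUPPORTED_LANGS = {"en", "zh", "ko", "ja", "ru"}
MAX_RECORDS = 10  # maximum batch size per request


def build_read_payload(records: list[dict], lang: str = "en") -> str:
    """Return the JSON body for POST /v2/read after basic client-side checks."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported lang {lang!r}; expected one of {sorted(SUPPORTED_LANGS)}")
    # Assumption: an empty batch is rejected here as pointless.
    if not 1 <= len(records) <= MAX_RECORDS:
        raise ValueError(f"expected 1 to {MAX_RECORDS} records, got {len(records)}")
    for record in records:
        if "_url" not in record and "_base64" not in record:
            raise ValueError("each record needs an _url or a _base64 field")
    return json.dumps({"lang": lang, "records": records})
```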

Returns

HTTP status code 2XX if the method succeeded, or another HTTP status code if it failed. The response body is a JSON object (map) with the following fields:

  • Name: records
    Type: array
    Description: JSON array of the input records, each enriched with an _ocr field containing the detected texts with their positions and probabilities.

  • Name: status
    Type: dict
    Description: A JSON map (dictionary) with the processing status of the method. It contains the subfields code (the numeric status code of the operation, following the concept of HTTP status codes) and text (a description of the status code).
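A client typically checks the top-level status first. The response examples on this page also carry a per-record _status, which suggests that individual images can succeed or fail independently; the Python sketch below assumes this and collects failed records instead of raising. check_status is a hypothetical helper name.

```python
def check_status(response: dict) -> list[dict]:
    """Raise if the overall status is not 2xx; return the records whose own _status is not 2xx."""
    status = response.get("status", {})
    code = status.get("code")
    if not isinstance(code, int) or not 200 <= code < 300:
        raise RuntimeError(f"request failed: {code} {status.get('text')}")
    failed = []
    for record in response.get("records", []):
        # Assumption: each record carries its own _status, as in the examples.
        record_code = record.get("_status", {}).get("code")
        if not isinstance(record_code, int) or not 200 <= record_code < 300:
            failed.append(record)
    return failed
```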

Request

POST /v2/read
curl https://api.ximilar.com/ocr/v2/read -H "Content-Type: application/json" -H "Authorization: Token __API_TOKEN__" -d '{
  "lang": "en",
  "records": [
    {
      "_url": "__PATH_TO_IMAGE_URL__"
    }
  ]
}'
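The same request can be built in Python with only the standard library. This sketch assembles a urllib.request.Request object equivalent to the curl call without sending it; API_URL and make_read_request are names introduced for illustration, and __API_TOKEN__ and __PATH_TO_IMAGE_URL__ are the placeholders from the curl example.

```python
import json
import urllib.request

API_URL = "https://api.ximilar.com/ocr/v2/read"


def make_read_request(token: str, image_url: str, lang: str = "en") -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = json.dumps({"lang": lang, "records": [{"_url": image_url}]}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="POST",
    )


# Sending it requires a valid token and network access:
# with urllib.request.urlopen(make_read_request("__API_TOKEN__", "__PATH_TO_IMAGE_URL__")) as resp:
#     result = json.load(resp)
```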

Response

{
  "records": [
    {
      "_url": "https://images.ximilar.com/examples/cards/mew_pokemon.jpeg",
      "_status": {
        "code": 200,
        "text": "OK",
        "request_id": "f9cec41f-dd6e-4aa9-93b3-4f2510fd5eb6"
      },
      "_id": "4d903663-a348-4d9b-be76-4df38df7d66a",
      "_width": 1140,
      "_height": 1520,
      "_ocr": {
        "texts": [
          {
            "polygon": [
              [285.0, 221.0],
              [384.0, 229.0],
              [382.0, 256.0],
              [283.0, 248.0]
            ],
            "text": "BASIC",
            "prob": 0.9564334750175476
          }
        ],
        "full_text": "BASIC MewV 180 ASTRIKE EnergyMix...",
        "lang": "en",
        "lang_name": "english"
      }
    }
  ],
  "status": {
    "code": 200,
    "text": "OK",
    "request_id": "f9cec41f-dd6e-4aa9-93b3-4f2510fd5eb6",
    "proc_id": "1a088e14-714a-41bd-9a89-4d732becafcd"
  },
  "statistics": {
    "processing time": 5.054640769958496
  }
}
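Each entry of _ocr.texts holds a polygon (four [x, y] points in the example), the read text, and its probability prob. The Python sketch below turns each polygon into an axis-aligned bounding box and drops low-confidence entries. It assumes the coordinates are pixel positions in the image (consistent with _width and _height in the example); the 0.5 threshold is an arbitrary default, not a value recommended by Ximilar, and read_texts is a hypothetical helper.

```python
def read_texts(record: dict, min_prob: float = 0.5) -> list[tuple[str, tuple[float, float, float, float]]]:
    """Return (text, (x_min, y_min, x_max, y_max)) pairs from a record's _ocr field."""
    results = []
    for item in record.get("_ocr", {}).get("texts", []):
        if item["prob"] < min_prob:
            continue  # skip low-confidence detections
        xs = [x for x, _ in item["polygon"]]
        ys = [y for _, y in item["polygon"]]
        # Smallest axis-aligned box that contains the whole polygon.
        results.append((item["text"], (min(xs), min(ys), max(xs), max(ys))))
    return results
```

For the example record above, this yields [("BASIC", (283.0, 221.0, 384.0, 256.0))].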

POST /v2/read_gpt

OCR Reading with GPT Analysis

Quickstart

Given a list of image records, this method reads the text in the images with the OCR system and then analyzes the read text with ChatGPT according to your prompt. For each image, it detects the positions of text regions (each defined by a polygon) and reads the text in the selected language. The results are stored in the _ocr and _gpt fields of each record. This endpoint requires at least a business plan, activated via app.ximilar.com.

Required attributes

  • Name: records
    Type: array
    Description: A batch of JSON records (at most 10). Each record represents one image and is defined by either an _url or a _base64 field. Each record must also contain a prompt field with the instruction for the GPT service. Processing is limited to at most 1,000 tokens per record (image).
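Since every record must carry its own prompt, a common pattern is to attach the same prompt to each image in a batch. A minimal Python sketch; build_gpt_records is a hypothetical helper, and it does not try to enforce the 1,000-token limit because this page does not specify how tokens are counted.

```python
MAX_RECORDS = 10  # maximum batch size per request


def build_gpt_records(image_urls: list[str], prompt: str) -> list[dict]:
    """Attach the same prompt to every image record for POST /v2/read_gpt."""
    if not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if len(image_urls) > MAX_RECORDS:
        raise ValueError(f"at most {MAX_RECORDS} records per request, got {len(image_urls)}")
    return [{"_url": url, "prompt": prompt} for url in image_urls]
```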

Optional attributes

  • Name: lang
    Type: string
    Default: en
    Description: Language of the OCR model. Currently supported values: "en" (English), "zh" (Chinese), "ko" (Korean), "ja" (Japanese), "ru" (Russian).

Returns

HTTP status code 2XX if the method succeeded, or another HTTP status code if it failed. The response body is a JSON object (map) with the following fields:

  • Name: records
    Type: array
    Description: JSON array of the input records, each enriched with the fields _ocr (detected texts) and _gpt (results of the GPT analysis).

  • Name: status
    Type: dict
    Description: A JSON map (dictionary) with the processing status of the method. It contains the subfields code (the numeric status code of the operation, following the concept of HTTP status codes) and text (a description of the status code).

Request

POST /v2/read_gpt
curl https://api.ximilar.com/ocr/v2/read_gpt -H "Content-Type: application/json" -H "Authorization: Token __API_TOKEN__" -d '{
  "records": [
    {
      "_url": "__PATH_TO_IMAGE_URL__",
      "prompt": "based on the following result from ocr system what is the name and type of the card as json result ({'\''name'\'':'\'''\'', '\''type'\'': '\'''\''})"
    }
  ],
  "lang": "en"
}'
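In the curl example, the whole body is wrapped in single quotes, so every single quote inside the prompt has to be written as '\''. Serializing the payload programmatically avoids that escaping. A Python sketch using the prompt from the example; __PATH_TO_IMAGE_URL__ is the placeholder from the curl command.

```python
import json

prompt = (
    "based on the following result from ocr system what is the name and type "
    "of the card as json result ({'name':'', 'type': ''})"
)
payload = {
    "records": [{"_url": "__PATH_TO_IMAGE_URL__", "prompt": prompt}],
    "lang": "en",
}
# json.dumps keeps the single quotes inside the prompt as-is; no '\'' escaping is needed.
body = json.dumps(payload)
print(body)
```

The printed body can be saved to a file and passed to curl with -d @filename instead of an inline string.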

Response

{
  "records": [
    {
      "_url": "https://images.ximilar.com/examples/cards/mew_pokemon.jpeg",
      "prompt": "based on the following result from ocr system what is the name and type of the card as json result ({'name':'', 'type': ''})",
      "_status": {
        "code": 200,
        "text": "OK",
        "request_id": "5d735c9b-71e7-425f-a4c7-4423230b3748"
      },
      "_id": "903ae611-2df4-4385-989e-146fa4d0fd7a",
      "_width": 1140,
      "_height": 1520,
      "_ocr": {
        "texts": [...],
        "full_text": "BASIC MewV 180 ASTRIKE EnergyMix...",
        "lang": "en",
        "lang_name": "english"
      },
      "_gpt": {
        "full_prompt": "Instruction: based on the following result from ocr system what is the name and type of the card as json result ({'name':'', 'type': ''}) \nInput: BASIC MewV 180 ASTRIKE EnergyMix... \nResult:",
        "result": "{'name': 'Mew V', 'type': 'Basic'}",
        "json_result": {
          "name": "Mew V",
          "type": "Basic"
        }
      }
    }
  ],
  "status": {
    "code": 200,
    "text": "OK",
    "request_id": "5d735c9b-71e7-425f-a4c7-4423230b3748",
    "proc_id": "b366485a-8ed9-462d-9ade-cdb817cae03a"
  },
  "statistics": {
    "processing time": 6.970479965209961
  }
}
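In the example, _gpt.result is the raw model output and _gpt.json_result is its parsed form. Note that result uses single quotes, so it is not valid JSON. The Python sketch below prefers json_result and falls back to parsing result as a Python literal; it assumes json_result may be absent when the model's answer cannot be parsed, which this page does not state. gpt_result is a hypothetical helper.

```python
import ast
from typing import Optional


def gpt_result(record: dict) -> Optional[dict]:
    """Return the structured GPT answer of one record, or None if it cannot be parsed."""
    gpt = record.get("_gpt", {})
    if isinstance(gpt.get("json_result"), dict):
        return gpt["json_result"]
    raw = gpt.get("result")
    if not isinstance(raw, str):
        return None
    try:
        # "result" uses Python-style single quotes, which json.loads rejects, so parse it
        # as a Python literal instead (ast.literal_eval does not execute code).
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        return None
    return value if isinstance(value, dict) else None
```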
