Optical Character Recognition (OCR)

This page describes the API of the OCR (Optical Character Recognition) service and its integration with ChatGPT, which applies a large language model to the OCR results. The API follows the general rules of the Ximilar API described in the First steps section.

Endpoints

This service API provides two endpoints:

https://api.ximilar.com/ocr/v2/read      (basic OCR reading)
https://api.ximilar.com/ocr/v2/read_gpt  (OCR reading with GPT analysis)

POST /v2/read

OCR Reading

Quickstart

Given a list of image records, this method returns the texts extracted by the OCR system from the images. For each image, it predicts the position of the text (marked with a polygon) and extracts the text based on the specified language. The result is returned in the _ocr field.

Required attributes

  • Name
    records
    Type
    dict
    Max
    10
    Description

    A batch of JSON records (maximum of 10). Each record represents a single image and must include either _url or _base64.
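A minimal sketch of how records could be prepared and split into batches that respect the limit of 10. The helper names `make_record` and `batches` are hypothetical, and the sketch assumes that `_base64` carries the base64-encoded image bytes as ASCII text; for simplicity it accepts exactly one of the two sources per record.

```python
import base64
from typing import Dict, Iterator, List, Optional

MAX_RECORDS_PER_REQUEST = 10  # batch limit stated in the attribute description


def make_record(url: Optional[str] = None,
                image_bytes: Optional[bytes] = None) -> Dict[str, str]:
    """Build one record from either an image URL or raw image bytes."""
    if (url is None) == (image_bytes is None):
        raise ValueError("provide exactly one of url or image_bytes")
    if url is not None:
        return {"_url": url}
    # Assumption: _base64 holds the base64-encoded bytes as ASCII text.
    return {"_base64": base64.b64encode(image_bytes).decode("ascii")}


def batches(records: List[dict],
            size: int = MAX_RECORDS_PER_REQUEST) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]
```

Each yielded batch can then be sent as the `records` value of one request.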

Optional attributes

  • Name
    lang
    Type
    string
    Default
    en
    Description

    Language setting for the OCR model. Currently supported values: "en" (English), "zh" (Chinese), "ko" (Korean), "ja" (Japanese), "ru" (Russian).

Returns

HTTP status code 2XX if the request succeeded, or another HTTP status code if it failed. The response body is a JSON object (map) with the following fields:

  • Name
    records
    Type
    dict
    Description

    JSON array of input records. Each record contains an _ocr field with the detected text, its position polygons, and confidence scores.

  • Name
    status
    Type
    dict
    Description

    JSON object describing the processing status, with two subfields:

    • code – numeric status code (aligned with HTTP status codes)
    • text – message describing the status
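As an illustration, the top-level `status` object could be checked like this before the records are used. This is a sketch only: `is_ok` and `check_response` are hypothetical helper names, and raising `RuntimeError` is one possible choice of error handling.

```python
def is_ok(status: dict) -> bool:
    """True when the status code is in the 2XX range."""
    return 200 <= int(status.get("code", 0)) < 300


def check_response(body: dict) -> None:
    """Raise RuntimeError if the top-level status reports a failure."""
    status = body.get("status", {})
    if not is_ok(status):
        raise RuntimeError(
            "OCR request failed: %s %s" % (status.get("code"), status.get("text"))
        )
```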

Request

POST /v2/read
curl https://api.ximilar.com/ocr/v2/read -H "Content-Type: application/json" -H "Authorization: Token __API_TOKEN__" -d '{
  "lang": "en",
  "records": [
    {
      "_url": "__PATH_TO_IMAGE_URL__"
    }
  ]
}'
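The same request could be sent from Python roughly as follows. This sketch uses only the standard library; `build_read_request` and `read_text` are hypothetical names, and the token and image URLs are placeholders you supply.

```python
import json
import urllib.request
from typing import List

OCR_READ_URL = "https://api.ximilar.com/ocr/v2/read"


def build_read_request(token: str, image_urls: List[str],
                       lang: str = "en") -> urllib.request.Request:
    """Prepare the POST request with the same headers and body as the curl call."""
    payload = {"lang": lang, "records": [{"_url": u} for u in image_urls]}
    return urllib.request.Request(
        OCR_READ_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Token " + token,
        },
        method="POST",
    )


def read_text(token: str, image_urls: List[str],
              lang: str = "en", timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_read_request(token, image_urls, lang)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4XX and 5XX responses, so a failed request surfaces as an exception rather than a returned body.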

Response

{
  "records": [
    {
      "_url": "https://images.ximilar.com/examples/cards/mew_pokemon.jpeg",
      "_status": {
        "code": 200,
        "text": "OK",
        "request_id": "f9cec41f-dd6e-4aa9-93b3-4f2510fd5eb6"
      },
      "_id": "4d903663-a348-4d9b-be76-4df38df7d66a",
      "_width": 1140,
      "_height": 1520,
      "_ocr": {
        "texts": [
          {
            "polygon": [
              [285.0, 221.0],
              [384.0, 229.0],
              [382.0, 256.0],
              [283.0, 248.0]
            ],
            "text": "BASIC",
            "prob": 0.9564334750175476
          }
        ],
        "full_text": "BASIC MewV 180 ASTRIKE EnergyMix...",
        "lang": "en",
        "lang_name": "english"
      }
    }
  ],
  "status": {
    "code": 200,
    "text": "OK",
    "request_id": "f9cec41f-dd6e-4aa9-93b3-4f2510fd5eb6",
    "proc_id": "1a088e14-714a-41bd-9a89-4d732becafcd"
  },
  "statistics": {
    "processing time": 5.054640769958496
  }
}
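A short sketch of how the `_ocr` field of one record might be consumed. The helper names are hypothetical, the 0.9 threshold is an arbitrary example value, and the sketch assumes each polygon point is an `[x, y]` pair in image pixel coordinates.

```python
from typing import List, Sequence, Tuple


def confident_texts(record: dict, min_prob: float = 0.9) -> List[str]:
    """Return detected strings whose prob is at least min_prob, in response order."""
    texts = record.get("_ocr", {}).get("texts", [])
    return [t["text"] for t in texts if t.get("prob", 0.0) >= min_prob]


def bounding_box(polygon: Sequence[Sequence[float]]) -> Tuple[float, float, float, float]:
    """Axis-aligned (x_min, y_min, x_max, y_max) enclosing the polygon points."""
    # Assumption: each point is [x, y].
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)
```

For the "BASIC" detection in the response above, `bounding_box` gives `(283.0, 221.0, 384.0, 256.0)`.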

POST /v2/read_gpt

OCR Reading With GPT Analysis

Quickstart

This endpoint runs OCR on a list of image records and then analyzes the extracted text with ChatGPT according to your input prompt.

For each image, the system:

  • Detects text regions (marked as polygons)
  • Extracts text based on the specified language
  • Applies ChatGPT analysis to the recognized text (if a prompt is provided)

The results are stored in the _ocr (raw text) and _gpt (ChatGPT output) fields.

Required attributes

  • Name
    records
    Type
    dict
    Max
    10
    Description

    A batch of JSON records (maximum of 10). Each record represents a single image and must include either _url or _base64.
    Each record must also contain a prompt field, which triggers the GPT analysis.
    Limit: 1000 tokens per record (image).
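A minimal sketch of building the request body for this endpoint in Python. `gpt_record` and `gpt_payload` are hypothetical helper names. The sketch does not check the token limit, since how tokens are counted is not specified here.

```python
import json
from typing import List


def gpt_record(image_url: str, prompt: str) -> dict:
    """Build one record that carries both the image URL and the prompt."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"_url": image_url, "prompt": prompt}


def gpt_payload(records: List[dict], lang: str = "en") -> str:
    """Serialize the request body; json.dumps handles quotes inside the prompt."""
    if not 1 <= len(records) <= 10:
        raise ValueError("a request must contain between 1 and 10 records")
    return json.dumps({"records": records, "lang": lang})
```

Serializing with `json.dumps` means a prompt containing quote characters needs no manual escaping.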

Optional attributes

  • Name
    lang
    Type
    string
    Default
    en
    Description

    Language setting for the OCR model. Supported values: "en" (English), "zh" (Chinese), "ko" (Korean), "ja" (Japanese), "ru" (Russian).

Returns

HTTP status code 2XX if the request succeeded, or another HTTP status code if it failed. The response body is a JSON object (map) with the following fields:

  • Name
    records
    Type
    dict
    Description

    JSON array of input records. Each record contains an _ocr field with the detected text, its position polygons, and confidence scores, and a _gpt field with the ChatGPT output.

  • Name
    status
    Type
    dict
    Description

    JSON object describing the processing status, with two subfields:

    • code – numeric status code (aligned with HTTP status codes)
    • text – message describing the status

Request

POST /v2/read_gpt
curl https://api.ximilar.com/ocr/v2/read_gpt -H "Content-Type: application/json" -H "Authorization: Token __API_TOKEN__" -d '{
  "records": [
    {
      "_url": "__PATH_TO_IMAGE_URL__",
      "prompt": "based on the following result from ocr system what is the name and type of the card as json result ({'\''name'\'':'\'''\'', '\''type'\'': '\'''\''})"
    }
  ],
  "lang": "en"
}'

Response

{
  "records": [
    {
      "_url": "https://images.ximilar.com/examples/cards/mew_pokemon.jpeg",
      "prompt": "based on the following result from ocr system what is the name and type of the card as json result ({'name':'', 'type': ''})",
      "_status": {
        "code": 200,
        "text": "OK",
        "request_id": "5d735c9b-71e7-425f-a4c7-4423230b3748"
      },
      "_id": "903ae611-2df4-4385-989e-146fa4d0fd7a",
      "_width": 1140,
      "_height": 1520,
      "_ocr": {
        "texts": [...],
        "full_text": "BASIC MewV 180 ASTRIKE EnergyMix...",
        "lang": "en",
        "lang_name": "english"
      },
      "_gpt": {
        "full_prompt": "Instruction: based on the following result from ocr system what is the name and type of the card as json result ({'name':'', 'type': ''}) \nInput: BASIC MewV 180 ASTRIKE EnergyMix... \nResult:",
        "result": "{'name': 'Mew V', 'type': 'Basic'}",
        "json_result": {
          "name": "Mew V",
          "type": "Basic"
        }
      }
    }
  ],
  "status": {
    "code": 200,
    "text": "OK",
    "request_id": "5d735c9b-71e7-425f-a4c7-4423230b3748",
    "proc_id": "b366485a-8ed9-462d-9ade-cdb817cae03a"
  },
  "statistics": {
    "processing time": 6.970479965209961
  }
}
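One way to read the `_gpt` field of a record is sketched below. `gpt_result` is a hypothetical helper. It assumes that `json_result` may be missing when the model output could not be parsed, and it falls back to the raw `result` string, which in the example above uses single quotes and is therefore a Python-style literal rather than strict JSON.

```python
import ast
import json
from typing import Any, Optional


def gpt_result(record: dict) -> Optional[Any]:
    """Return the structured GPT answer if available, else the raw text, else None."""
    gpt = record.get("_gpt", {})
    if "json_result" in gpt:
        return gpt["json_result"]
    raw = gpt.get("result")
    if raw is None:
        return None
    try:
        return json.loads(raw)  # strict JSON with double quotes
    except json.JSONDecodeError:
        pass
    try:
        return ast.literal_eval(raw)  # single-quoted, Python-style dict
    except (ValueError, SyntaxError):
        return raw  # free text, returned unchanged
```

`ast.literal_eval` only evaluates literals, so it is a safer fallback than `eval` for text produced by a model.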
