Optical Character Recognition (OCR)
To access the Ximilar API, first register at Ximilar App to get your API token.
This page describes the API for the OCR (Optical Character Recognition) service and its integration with ChatGPT (large language models applied to OCR results). This API follows the general rules of Ximilar API as described in Section First steps.
Endpoints
This service API provides two endpoints:
https://api.ximilar.com/ocr/v2/read (basic OCR reading)
https://api.ximilar.com/ocr/v2/read_gpt (OCR reading with GPT analysis)
For more advanced text analysis, we recommend using the /v2/read_gpt endpoint.
Common use cases include:
- Extracting data from invoices, receipts, and other documents.
- Reading text from images, such as product labels, posters, and comics.
- Reading nutritional information from food product labels.
To use GPT analysis, specify the prompt field (string) in the record.
This prompt will be sent, along with the full_text (OCR-extracted text), to the ChatGPT API.
If you need to process a large number of requests per day or month and don’t require immediate results, use our asynchronous requests with webhooks.
OCR Reading
Given a list of image records, this method returns the texts extracted by the OCR system from the images.
For each image, it predicts the position of the text (marked with a polygon) and extracts the text based on the specified language.
The result is returned in the _ocr field.
Required attributes
- Name: records
- Type: dict
- Maximum: 10
- Description: A batch of JSON records (maximum of 10). Each record represents a single image and must include either _url or _base64.
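The record shape and the 10-record limit described above can be sketched in Python as follows. This is a minimal illustration, not part of the Ximilar client: the helper names are hypothetical, and it assumes that _base64 holds the standard Base64 encoding of the raw image bytes.

```python
import base64

MAX_RECORDS_PER_REQUEST = 10  # batch limit stated in the attribute description


def record_from_url(url):
    """Build a record that points at a publicly reachable image."""
    return {"_url": url}


def record_from_bytes(image_bytes):
    """Build a record that embeds the image (assumed: plain Base64 string)."""
    return {"_base64": base64.b64encode(image_bytes).decode("ascii")}


def batch_records(records, size=MAX_RECORDS_PER_REQUEST):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]
```

Each slice produced by batch_records can then be sent as the records value of one request.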
Optional attributes
- Name: lang
- Type: string
- Default: en
- Description: Language setting for the OCR model. Currently supported values: "en" (English), "zh" (Chinese), "ko" (Korean), "ja" (Japanese), "ru" (Russian).
Returns
HTTP status code 2XX if the request succeeded, or another HTTP status code if it failed. The response body is a JSON object (map) with the following fields:
- Name: records
- Type: dict
- Description: JSON array of input records. Each record contains an _ocr field with the detected text, its position polygons, and confidence scores.
- Name: status
- Type: dict
- Description: JSON object describing the processing status, with two subfields: code (numeric status code, aligned with HTTP status codes) and text (message describing the status).
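A small sketch of how a client might check these status objects in Python. The helper names are hypothetical. The per-record _status field is taken from the sample response on this page; treating it the same way as the top-level status is an assumption.

```python
def is_ok(status):
    """True when a status object carries a 2XX code."""
    return 200 <= status.get("code", 0) < 300


def failed_records(response):
    """Return (index, status) pairs for records whose _status is not 2XX."""
    failures = []
    for index, record in enumerate(response.get("records", [])):
        status = record.get("_status", {})
        if not is_ok(status):
            failures.append((index, status))
    return failures
```

Checking both levels is useful when a batch is accepted as a whole but an individual image cannot be processed.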
Request
curl https://api.ximilar.com/ocr/v2/read \
  -H "Content-Type: application/json" \
  -H "Authorization: Token __API_TOKEN__" \
  -d '{
    "lang": "en",
    "records": [
      {
        "_url": "__PATH_TO_IMAGE_URL__"
      }
    ]
  }'
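The same request can be sketched in Python with only the standard library. The endpoint, headers, and body mirror the curl call above; the function names and the client-side lang check are illustrative assumptions, not part of an official client.

```python
import json
import urllib.request

READ_URL = "https://api.ximilar.com/ocr/v2/read"
SUPPORTED_LANGS = {"en", "zh", "ko", "ja", "ru"}  # values listed on this page


def build_read_request(token, image_urls, lang="en"):
    """Prepare (but do not send) a POST request for the basic OCR endpoint."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported lang: {lang!r}")
    payload = {"lang": lang, "records": [{"_url": url} for url in image_urls]}
    return urllib.request.Request(
        READ_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="POST",
    )


def ocr_read(token, image_urls, lang="en"):
    """Send the request and return the decoded JSON body."""
    request = build_read_request(token, image_urls, lang)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the payload easy to inspect or log before any network call is made.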
Response
{
  "records": [
    {
      "_url": "https://images.ximilar.com/examples/cards/mew_pokemon.jpeg",
      "_status": {
        "code": 200,
        "text": "OK",
        "request_id": "f9cec41f-dd6e-4aa9-93b3-4f2510fd5eb6"
      },
      "_id": "4d903663-a348-4d9b-be76-4df38df7d66a",
      "_width": 1140,
      "_height": 1520,
      "_ocr": {
        "texts": [
          {
            "polygon": [
              [285.0, 221.0],
              [384.0, 229.0],
              [382.0, 256.0],
              [283.0, 248.0]
            ],
            "text": "BASIC",
            "prob": 0.9564334750175476
          }
        ],
        "full_text": "BASIC MewV 180 ASTRIKE EnergyMix...",
        "lang": "en",
        "lang_name": "english"
      }
    }
  ],
  "status": {
    "code": 200,
    "text": "OK",
    "request_id": "f9cec41f-dd6e-4aa9-93b3-4f2510fd5eb6",
    "proc_id": "1a088e14-714a-41bd-9a89-4d732becafcd"
  },
  "statistics": {
    "processing time": 5.054640769958496
  }
}
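One way to consume the _ocr field of a record like the one above is sketched below in Python. The helper names and the 0.5 confidence threshold are arbitrary assumptions for illustration; the field names (texts, polygon, text, prob) come from the sample response.

```python
def bounding_box(polygon):
    """Axis-aligned (x_min, y_min, x_max, y_max) enclosing a polygon."""
    xs = [point[0] for point in polygon]
    ys = [point[1] for point in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def confident_texts(record, min_prob=0.5):
    """Return (text, bounding_box) pairs whose prob is at least min_prob."""
    texts = record.get("_ocr", {}).get("texts", [])
    return [
        (item["text"], bounding_box(item["polygon"]))
        for item in texts
        if item.get("prob", 0.0) >= min_prob
    ]
```

Because the polygons may be slightly rotated, the axis-aligned box is only an approximation of the detected text region.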
OCR Reading With GPT Analysis
This endpoint processes a list of image records by applying OCR (Optical Character Recognition) and then analyzing the extracted text using ChatGPT based on your input prompt.
For each image, the system:
- Detects text regions (marked as polygons)
- Extracts text based on the specified language
- Applies ChatGPT analysis to the recognized text (if a prompt is provided)
The results are stored in the _ocr (raw text) and _gpt (ChatGPT output) fields.
This endpoint is available under the Business and Professional user plans, which can be activated in the Ximilar App.
To compare plans and available services, visit our Pricing page.
Required attributes
- Name: records
- Type: dict
- Maximum: 10
- Description: A batch of JSON records (maximum of 10). Each record represents one image and must be defined by either _url or _base64. The record must also contain a prompt field to trigger the GPT analysis. Maximum limit: 1000 tokens per record (image).
Optional attributes
- Name: lang
- Type: string
- Default: en
- Description: Language setting for the OCR model. Supported values: "en" (English), "zh" (Chinese), "ko" (Korean), "ja" (Japanese), "ru" (Russian).
Returns
HTTP status code 2XX if the request succeeded, or another HTTP status code if it failed. The response body is a JSON object (map) with the following fields:
- Name: records
- Type: dict
- Description: JSON array of input records. Each record contains an _ocr field with the detected text, its position polygons, and confidence scores, and a _gpt field with the ChatGPT output.
- Name: status
- Type: dict
- Description: JSON object describing the processing status, with two subfields: code (numeric status code, aligned with HTTP status codes) and text (message describing the status).
Request
curl https://api.ximilar.com/ocr/v2/read_gpt \
  -H "Content-Type: application/json" \
  -H "Authorization: Token __API_TOKEN__" \
  -d '{
    "records": [
      {
        "_url": "__PATH_TO_IMAGE_URL__",
        "prompt": "based on the following result from ocr system what is the name and type of the card as json result ({'\''name'\'':'\'''\'', '\''type'\'': '\'''\''})"
      }
    ],
    "lang": "en"
  }'
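The '\'' sequences in the curl call exist only to escape single quotes for the shell. When the body is built programmatically, a JSON serializer handles the quoting. The Python sketch below shows this for the same prompt; build_gpt_payload is a hypothetical helper, and rejecting an empty prompt on the client side is an assumption.

```python
import json

PROMPT = (
    "based on the following result from ocr system what is the name and "
    "type of the card as json result ({'name':'', 'type': ''})"
)


def build_gpt_payload(image_urls, prompt, lang="en"):
    """Serialize a read_gpt request body with one prompt shared by all records."""
    if not prompt:
        raise ValueError("each record needs a non-empty prompt")
    records = [{"_url": url, "prompt": prompt} for url in image_urls]
    return json.dumps({"records": records, "lang": lang})
```

The returned string can be sent as the POST body to https://api.ximilar.com/ocr/v2/read_gpt with the same headers as in the curl call.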
Response
{
  "records": [
    {
      "_url": "https://images.ximilar.com/examples/cards/mew_pokemon.jpeg",
      "prompt": "based on the following result from ocr system what is the name and type of the card as json result ({'name':'', 'type': ''})",
      "_status": {
        "code": 200,
        "text": "OK",
        "request_id": "5d735c9b-71e7-425f-a4c7-4423230b3748"
      },
      "_id": "903ae611-2df4-4385-989e-146fa4d0fd7a",
      "_width": 1140,
      "_height": 1520,
      "_ocr": {
        "texts": [...],
        "full_text": "BASIC MewV 180 ASTRIKE EnergyMix...",
        "lang": "en",
        "lang_name": "english"
      },
      "_gpt": {
        "full_prompt": "Instruction: based on the following result from ocr system what is the name and type of the card as json result ({'name':'', 'type': ''}) \nInput: BASIC MewV 180 ASTRIKE EnergyMix... \nResult:",
        "result": "{'name': 'Mew V', 'type': 'Basic'}",
        "json_result": {
          "name": "Mew V",
          "type": "Basic"
        }
      }
    }
  ],
  "status": {
    "code": 200,
    "text": "OK",
    "request_id": "5d735c9b-71e7-425f-a4c7-4423230b3748",
    "proc_id": "b366485a-8ed9-462d-9ade-cdb817cae03a"
  },
  "statistics": {
    "processing time": 6.970479965209961
  }
}
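A possible way to read the _gpt field in Python is sketched below. In the sample, result is a string that uses single quotes (so it is not valid JSON), while json_result is already a parsed object. The sketch prefers json_result and falls back to parsing result as a Python-style literal. Whether json_result can be missing is an assumption, and gpt_answer is a hypothetical helper name.

```python
import ast


def gpt_answer(record):
    """Return the GPT output as a dict, or None if it cannot be recovered."""
    gpt = record.get("_gpt", {})
    if isinstance(gpt.get("json_result"), dict):
        return gpt["json_result"]
    # Fallback (assumed case): parse the single-quoted dict text in "result".
    try:
        parsed = ast.literal_eval(gpt.get("result") or "")
    except (ValueError, SyntaxError, TypeError):
        return None
    return parsed if isinstance(parsed, dict) else None
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for text produced by a language model.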